Technology is once again accelerating the need for bioinformatics, and the current need isn\u2019t diminishing.<\/p>\n
This post was meant to be just a brief snapshot aimed at students wondering where bioinformatics is going in 2008 and beyond. What\u2019s the future of bioinformatics? What kind of focus should you develop in the near future? What kinds of skills will you need?<\/p>\n
More after the jump, below.<\/p>\n<\/div>\n
Recent (but probably solvable) problems in bioinformatics include:<\/p>\n
1) Pipelines that generate new de novo copies of the human genome as part of massively parallel sequencing projects. I\u2019m not talking about just aligning bits of nucleotide sequence to a reference sequence, but also about using that information to completely re-assemble each new genome, variations and all.<\/p>\n
These algorithms must, as part of their task, align massively parallel sequence reads (50 bp and above) quickly and efficiently. I believe that analysis of 50-100 bp reads will become important in the near future. For a nice overview of some of the recent developments, see the review page on massively parallel sequence alignment by Heng Li at Sanger<\/a>.<\/p>\n Alignment problems are partially solved by programs such as ELAND, MAQ, SSAHA2, etc., but there\u2019s another complication to add to the mix: the aligner must be both fast AND efficient. Some systems are drowning in the flood of next-gen sequencing<\/a>, so all you compbio efficiency geeks who are also interested in being bioinformaticians have a big open playing field there. Even for those of us with clusters, the flood of data will be overwhelming.<\/p>\n So, to regenerate an individual human genome sequence, any pipeline containing these methods must find the most likely location of each read and, on top of this, determine A) single-nucleotide (SNP) genetic variation, B) copy number variation, and C) large-scale chromosomal rearrangements from such reads, measured against what is certainly (at this point) once again the draft copy of the human genome. Which brings me to:<\/p>\n 2) Integration of sequence and genomic data: Enter the concept of the \u201cHuman Statistical Genome\u201d. You will need statistics. How do you integrate data across many different individual genomes? How do you integrate functional genomic information back to these genome sequences?<\/p>\n