Medbioinformatics? Biomedinformatics?

Deanne Taylor — Sat, 07 Feb 2009 04:53:48 +0000

Things have changed for me, radically. I left Harvard School of Public Health back in September, with a heavy heart in some ways because of the great opportunity that HSPH offered. Some people might think I’m crazy, jumping out of Harvard to go to a clinic at a smaller institution like UMDNJ/RWJ. But I had my reasons.

The reason was opportunity: the chance to make a big impact on patients’ day-to-day lives. At HSPH, I was working on more esoteric problems, though many of them could lead to changes in public health policy or in our understanding of public health. At RMA/RWJ, I will be changing patients’ lives daily, and my bioinformatics work and research will end up in many babies being born. Many parents will be walking away from my bioinformatics analyses with babies in their arms…how cool is that?

I took a leap, nearly sight-unseen, into what was, for me, an exotic metaworld of informatics. It’s not really ‘medical informatics’. It’s medical bioinformatics. Medbioinformatics?

Giving it a name is actually important, in a way. If you care to read about why, jump down below.

I speak generally here — as scientists, we are usually able to work with patient data on a limited basis — for example, with genetic information tied to a few clinical series or endpoints.

However, what happens when the entire detailed patient clinical record is opened up to tie directly to the patient’s genetic and genomic high-throughput data through an IRB-guided study? (I do want to mention that IRB involvement, with appropriate levels of ethics review and oversight, is always assumed in this post.)

In research clinical situations, the high-throughput data can pile up fast, and improving patient health and quality of life are the direct goals. The high-throughput technologies are affordable enough, now, that clinics can set up studies across a number of endpoints and end up with huge amounts of genetics and genomics data to mine.

Previously, I’ve worked with groups where a few clinical endpoints were tested against genetics data through association studies — and I imagine an entire silo of patient records and genetic data could be approached with association studies, but adding molecular biology to the mix makes for some interesting research. The high-throughput data gains additional dimensions from the clinical data.

Medicine meets bioinformatics with a shake of the hand in plenty of medical schools right now, but it’s probably not enough. Bioinformatics is well established in medical centers, like those for cancer or genetic diseases, where informatics has of necessity already gained a foothold, but there are many barriers in other areas of clinical study.

The largest barriers I have found are in language and communication, and in differences in methods and ways of thinking about data. Clinicians have their own statistical language that doesn’t always apply to what high-throughput methods require. We need to open wider channels of communication between medical clinicians and molecular and computational biologists, for that time when we must all work together.

Yet, the time’s already here when the full patient clinical endpoint record is integrated with the full patient genomic/genetic record, and where the ethics, the informatics, and the medicine all meet — and we often must work to understand each other. We can start by defining our own new language as a fusion of the two.

Medinformatics? Biomedinformatics? Translational bioinformatics? It might not have a name that fits best, but it’s a necessary effort. And when the end result is babies, you can’t go wrong.

Future Bioinformatics

Deanne Taylor — Fri, 23 May 2008 21:22:45 +0000

Technology is accelerating bioinformatics needs, again, while the current need isn’t diminishing.

This post was meant to be just a brief snapshot aimed at students wondering where bioinformatics is going in 2008 and beyond. What’s the future of bioinformatics? What kind of focus should you develop in the near future? What kinds of skills will you need?

More after the jump, below.

Recent (but probably solvable) problems in bioinformatics include:

1) Pipelines that generate new de novo copies of the human genome as part of massively parallel sequencing projects. I’m not talking about just aligning bits of nucleotide sequences to a reference sequence, but also using that information to completely re-assemble each new genome, variations and all.

These algorithms must yield, as part of their task, fast and efficient alignment of massively parallel sequence reads (50 bp and above). I believe that 50-100 bp analysis will become important in the near future. For a nice review of some of the recent developments, see the review page on massively parallel sequence alignment by Heng Li at Sanger.

Alignment problems are partially solved by programs such as ELAND, MAQ, SSAHA2, etc., but there’s another complication to add to the mix: the methods must be both fast AND efficient. Some systems are drowning in the flood of next-gen sequencing, so all you compbio efficiency geeks who are also interested in being bioinformaticians have a big open playing field there. Even for those of us with clusters, the flood of data will be overwhelming.

So, to regenerate an individual human genome sequence, any pipeline containing these methods must find the most likely location of such reads and on top of this, determine A) single nucleotide (SNP) genetic variation B) copy number variation and C) large-scale chromosomal rearrangements from such reads against what is certainly (at this point) once again the draft copy of the human genome. Which brings me to —

2) Integration of sequence and genomic data: Enter the concept of the “Human Statistical Genome”. You will need statistics. How do you integrate data across many different individual genomes? How do you integrate functional genomic information back to these genome sequences?

Who’s to say which chromosomal inversion is really atypical or rare, given only a small sample of the population? A thousand genomes can be sequenced, but that sample might not be wide enough for an outbred population like humans. Or, we might have a hard time estimating the true number of variations from limited samples.

How do we estimate a sequence representing the ‘most popular’ human genome as a new true reference? Our human genome could change from a nice modular set of (mostly) contiguous chromosomes to a draft copy that contains probabilities at each nucleotide position, representing the polymorphism in the population. The highest resolution for the human genome might not stop at the HapMap level (sequence blocks)…it may reach down to nucleotide level as we discover more about human genetic structure.
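To make the idea of a genome with "probabilities at each nucleotide position" a bit more concrete, here is a minimal sketch in Python. It assumes something real data never gives you for free: that the individual genomes are already aligned to the same coordinates and contain only substitutions (no indels, copy number changes, or rearrangements). The function names `statistical_genome` and `consensus` are made up for this illustration and do not come from any existing tool.

```python
from collections import Counter


def statistical_genome(aligned_genomes):
    """Return one dict per position, mapping nucleotide -> observed frequency.

    Assumption: every sequence is already aligned to the same coordinates
    and has the same length (substitutions only).
    """
    if not aligned_genomes:
        return []
    length = len(aligned_genomes[0])
    if any(len(genome) != length for genome in aligned_genomes):
        raise ValueError("all sequences must have the same aligned length")

    profile = []
    # zip(*...) walks the sequences column by column, one position at a time.
    for column in zip(*aligned_genomes):
        counts = Counter(column)
        total = sum(counts.values())
        profile.append({base: n / total for base, n in counts.items()})
    return profile


def consensus(profile):
    """Pick the most frequent base per position (ties broken alphabetically)."""
    return "".join(max(sorted(column), key=column.get) for column in profile)


if __name__ == "__main__":
    # Four toy "individuals"; hypothetical data for illustration only.
    genomes = ["ACGT", "ACGA", "ACTT", "ACGT"]
    profile = statistical_genome(genomes)
    for position, frequencies in enumerate(profile):
        print(position, frequencies)
    print("most popular sequence:", consensus(profile))
```

With only four toy sequences, the frequencies are very rough estimates, which is exactly the sampling problem raised above: a variant seen once in a small sample could be rare or common in the wider population.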

3) Microarrays. We love them, we hate them. They plague us with variance and give us insights nevertheless. Will microarrays be replaced by sequence methods? Or are things like ChIP-Seq going to become standard? Are there standards in microarrays? Will microarrays be around in 10 years, or will everything fall to massively parallel sequencing? Will microarrays stick around because of their precedents, and because they’re simple to work with and don’t require huge computational facilities to process their data?

4) Computational Systems Biology. The ENCODE project, though not directly a sysbio project, is doing a lot for the field by showing that the big picture is a lot more complex than the early attempts at modeling could hint at. CSB is pretty much where it was a few years ago. This isn’t to say that it’s not going anywhere and that systems biology is ‘dead’…far from it. You’ll find that systems biology (or, if you prefer, integrative biology) is just in that low slow lingering dawn right before a summer day.

5) Medical informatics. Integration of clinical data back to biomedical/experimental data. There’s so much to say here, and it’s still in its infancy. If you have any good resources on this, please post them in the comments.
