The promise of bioinformatics

Now and again, you hear the concern that bioinformatics will fail to “fulfill its promise”. I find this statement to be both a bit scary and a little preposterous. Scary because the success of the field will have an effect on my own personal success. Preposterous because, well, the advantages of high throughput computation, structured biological databases, etc. are so abundantly clear, how could bioinformatics possibly fail?

There are certainly success stories. Important approaches to biological analysis in use today were not available ten years ago. I think some of the frustration arises because, in spite of these successes, some users feel that much of the biomedical software being churned out today just isn’t quite useful enough to justify the cost (in money and time) of using it. Assuming this is the case, we must then ask, why? Is biological analysis too difficult to capture in a set of machine instructions? Are bioinformaticians just a bunch of good-for-naughts?

The latter response, though intended to be humorous, is actually probably more common among biologists than we bioinformaticians might like. This answer also suggests a dichotomy too often encountered in organizations undertaking software development, that of users vs. developers, or, in the domain-specific vernacular, scientists/clinicians vs. bioinformaticians. Users, angered by the failure of software, blame developers for not working hard enough, for not listening, for being idiots (you know they think this sometimes), etc. Developers, for their part, are no more charitable. Developers blame users for not knowing what they want, for using software inconsistently, for not being able to work around seemingly trivial problems, and of course for being idiots. Much of the naturally occurring tension churned up in the process of building software finds its release in similar fits of whinging by one camp or the other. When we’re more reasonable, we are still honestly perplexed by the question, why isn’t this working out better?

In the past I’ve heard people say that bioinformaticians just need to be trained very well in both biology and computer science, that this would alleviate a lot of the problem of getting them to build biologically relevant and valuable software. This may work in some cases. A couple weeks ago I was having lunch with a biologist colleague, and he told me that I needed to learn the biology better, otherwise I would always be beholden to biologists to come up with interesting problems to work on. I see what he was getting at, but I don’t think that is the solution. The truth is both biology and software development are so complex that I don’t think it’s possible to gather into one person’s head all the expertise necessary to produce all the products that bioinformatics promises. Rather, I think the answer is better communication between biology experts and software experts.

Rather than focusing solely on algorithms and technologies, we must focus more on the people side of building biomedical software. You sometimes read this very comment in the bio-IT business literature, taken from the mouths of venture capitalists, in the form of something like “companies can no longer expect to get funding simply for having cool technology”; their software has to solve a biologically relevant problem, i.e. it has to be useful. I am reminded of something I heard during a talk at an agile conference in New Orleans in 2003. Josh Kerievsky said, and I paraphrase, “Some think we’re in the technology business, but we’re not. We’re in the communication business”. Communicating effectively with users is surprisingly difficult to do, and requires wisdom and dedication to get right. Effective communication is a much bigger challenge than the algorithms and the technology usually are. What’s more, it’s a two-way street, and both developers and scientists have to be committed.

I think biologists and bioinformaticians want to communicate better. I think part of the problem is organizational. For instance, at Northwestern, like at any university, it’s very difficult to get good office space. We were stuck in a converted greenhouse for the three years I was in Evanston, on the top floor of the Hogan building. At first we were in the same building as many of the biologists with whom we worked, but not all of them. Over the course of the project, a nice new building opened up, and a number of them moved into the new building. We typically saw these people once a week, if that, at weekly user meetings. We tried to keep contact with them by going to see them individually on a periodic basis (although I think we probably could have done a better job at that). But these people are very busy, and it’s difficult to fit into their schedules. As my biologist colleague pointed out at lunch, it would have been ideal if we could have had shared office space, so that spontaneous discussions would have been more frequent. I think the software we built would have become more useful as a result.

There are many other organizational problems (the difficulty of funding ongoing bioinformatics groups on grants; the lack of a history of operational management positions within academic groups). I imagine some of these get easier in industry. But these are not the only problems. I also think that we don’t yet focus enough attention on communication, on doing it right, and on getting help from outside to do it right. Most of us assume that, hey, we’re smart people, we should be able to communicate. Part of the problem is we’re too smart. We’re trying to communicate complex information, information we’re used to communicating with peers within our field who usually understand us even if we’re not clear. The level of tacit knowledge in biology, like in software, is very high, and it often takes people outside the field quite a while to get a feel for a problem.

This post is long enough. We talk about some of these communication issues in the paper on agile software methods in bioinformatics that David Kane, several others, and I are trying to get published. I think agile principles help, at the very least because they get software engineers to focus less on impressing people with their bulletproof processes and more on people and on communication that works (i.e. “Individuals and interactions over processes and tools”). Although our application domain is more technical, the general software industry and its customers have been grappling with these issues for years, and there are a number of very smart and capable people out there who could really help transfect bioinformatics groups with good approaches for tackling these problems.

4 Responses to “The promise of bioinformatics”

  1. Deepak Says:

    It is not just bioinformatics that struggles with this, but computation in itself. Even after so many years, computer modeling is seen as something different from experiment and somewhat inferior. Part of it stems from the days when computational models (this is mostly true for molecular modeling) were so bad, due to the number of approximations required, that a generation of scientists grew up not trusting them. I think that could change if today’s scientists learn how to apply bioinformatics and other computational techniques as part of their college education. IMO that fundamental change, i.e. incorporating informatics and modeling into lab work for all undergraduates, and maybe even in high school, will make a big difference.

  2. Moses Says:

    Thanks, Deepak. You make a good point about the historical effects of past failures. I agree that having biologists get some exposure to informatics during their education would help. Ideally this would not just be a few weeks’ crash course in Perl or one class on using NCBI’s eUtils, but would involve working with actual developers to produce something together, perhaps a cross-discipline lab course between computer science and biology.

    I also agree that poor communication is a wider problem that exists anywhere people build software for other people. I think the problem is more severe in any field like bioinformatics where the domain is very complex. The onus is on software and (in this case) biologist leaders to make improved communication a priority.

  3. Mark Says:

    Lately I’ve been helping some students in my lab by reviewing their theses and dissertations and giving them feedback. One of the biggest shortcomings of the program is that they don’t get enough biology throughout the course of the program, and their theses reflect that. Questions that should have been asked and answered over the course of their projects were never addressed. The biologists that they worked with often didn’t spend enough time on the project with them discussing the relevance and application of their work. This leaves gaping holes in what they could have discovered and places serious limitations on the scientific relevance and value of their work.

    From an interaction standpoint, it’s best to learn as much as you can about the problem space you’re dealing with, in order to ask the right questions of your collaborators. That’s often difficult to do if you speak Perl/Python/Java and they speak Molecular Biology/Microbiology/Virology. You have to speak the same language to be on the same page. Even if you’re not the one formulating the questions, you have to be able to understand the question and understand its implications.

  4. Moses Says:

    Thanks for your comments, Mark. It certainly helps to be able to speak the language of biology if you’re writing biological software. There are many kinds of bioinformatics software projects, and in the scenario you describe above, a student writing a bioinformatics thesis, I think it’s fair to expect the programmer to have a deeper understanding of the biology. However, for other projects this may not be realistic. You see this all the time in other application domains. For example, at ThoughtWorks our team wrote equipment leasing software. None of the programmers was a leasing expert (or really knew much about leasing at all). This situation was (and is often) mitigated by hiring “business analysts”, people who understand the business and have some understanding (sometimes a deep understanding) of software. These people are in a position to notice when programmers don’t ask the right questions, can act as proxies for the customer when the customer is not available to answer questions, and in general can translate between programmer-speak and business-speak.

    Most bioinformatics projects have very small software development teams, so it is perhaps impossible for many of these projects to expect to have someone who could just play a “science analyst” role. However, I think it’s perfectly reasonable to try to form a team composed of experts in software who don’t understand the biology fully and bioinformaticists with less training in software but a good understanding of the science. This is in fact what my colleague Mike McCormick’s group did at the Hutch, and he reports that it works quite well. The nice thing is that the more senior developers teach the bioinformaticists more about writing code, the bioinformaticists teach the developers more about biology, and everyone gets better at what they do in the end.
