SciLnk

April 20th, 2006

Some friends of mine have been working on an interesting new startup called SciLnk. Essentially, the site is LinkedIn for life scientists. It will also allow you to browse PubMed abstracts via the network of authors. Often when you’re researching a subject you want to read all the papers published by a particular person and his or her collaborators on the topic, and it’s not as easy as it should be to collect everything via PubMed. SciLnk hopes to leverage the people network to improve the literature browsing experience and vice versa. There are lots of other interesting directions to take such a resource: improving the conference-going experience, grant searching/suggestions, job searching, etc.

They recently posted screenshots (and, more recently, these) of the product under development, and you can sign up to get in on internal beta testing at scilnk.com.

Full disclosure: I was personally working with the SciLnk team in its early days, but quickly realized I had bitten off more than I could chew, what with having a full-time day job and not living in Boston with the rest of the crew. I no longer have any financial interest in the company; I just think it’s a cool idea.

A few more thoughts on communication, tech posts to come

March 21st, 2006

I discovered recently on postgenomic.com that mine is one of the wordiest life science blogs around, so I’m going to try to be a little pithier. We’ll see if I can constrain myself.

In my last post I argued for the central importance of effective collaboration and communication in biomedical informatics. I wanted to list a few things that have worked for my teams in those areas. At Northwestern we worked on two projects. Neuromice.org is a phenotype database and virtual storefront for the mutant lines produced by three neurologically-focused whole genome mutagenesis efforts at Northwestern, the Jackson Laboratory and the Tennessee Mouse Genome Consortium. The other application, MouseDB, is an intranet (i.e. you can’t see it) colony and phenotyping management system for the mice under study at Northwestern (10,000 mice/year when we were in full swing). Each project had different challenges, but here are a few things I learned from those experiences. Some are pretty standard agile ideas, others less so.

  • Each distinct customer/user subgroup should appoint a representative who speaks for that subgroup in all discussions of feature definition and priority. Keep the number of subgroups as small as possible (ideally, one). This greatly reduces the uncertainty and difficulty of scope decisions.
  • Some users in the group might have no reason to use your software. Make this fact explicit, and don’t factor their interests into the product.
  • Be completely open with your user community. Give them the opportunity to know everything you’re working on, and the reasons for (and the opportunity to contribute to) any decisions made about features going into the product.
  • A development team should avoid making any decisions about scope or feature priority. Emphasize to users that it is in their power to steer the software toward the greatest possible utility. Technical improvements are a sticking point here, but we’ve found if you make a good argument for them, users understand their value and will prioritize them appropriately.
  • If you let academics’ busy schedules eat away at your face time with them, you will eventually suffer for it. Be creative.

I think I’ve reached my word limit. Over the next couple weeks I’m going to let loose a flurry of technical posts on various topics that have been on my mind lately.

The promise of bioinformatics

March 3rd, 2006

Now and again, you hear the concern that bioinformatics will fail to “fulfill its promise”. I find this statement to be both a bit scary and a little preposterous. Scary because the success of the field will have an effect on my own personal success. Preposterous because, well, the advantages of high throughput computation, structured biological databases, etc. are so abundantly clear, how could bioinformatics possibly fail?

There are certainly success stories. Important approaches to biological analysis in use today were not available ten years ago. I think some of the frustration arises because, in spite of these successes, some users feel that much of the biomedical software being churned out today just isn’t quite useful enough to justify the cost (in money and time) of using it. Assuming this is the case, we must then ask, why? Is biological analysis too difficult to capture in a set of machine instructions? Are bioinformaticians just a bunch of good for naughts?

The latter response, though intended to be humorous, is actually probably more common among biologists than we bioinformaticians might like. This answer also suggests a dichotomy too often encountered in organizations undertaking software development, that of users vs. developers, or, in the domain-specific vernacular, scientists/clinicians vs. bioinformaticians. Users, angered by the failure of software, blame developers for not working hard enough, for not listening, for being idiots (you know they think this sometimes), etc. Developers, for their part, are no more charitable. Developers blame users for not knowing what they want, for using software inconsistently, for not being able to work around seemingly trivial problems, and of course for being idiots. Much of the naturally occurring tension churned up in the process of building software finds its release in similar fits of whinging by one camp or the other. When we’re more reasonable, we are still honestly perplexed by the question, why isn’t this working out better?

In the past I’ve heard people say that bioinformaticians just need to be trained very well in both biology and computer science, that this would alleviate a lot of the problem of getting them to build biologically relevant and valuable software. This may work in some cases. A couple weeks ago I was having lunch with a biologist colleague, and he told me that I needed to learn the biology better, otherwise I would always be beholden to biologists to come up with interesting problems to work on. I see what he was getting at, but I don’t think that is the solution. The truth is both biology and software development are so complex that I don’t think it’s possible to gather into one person’s head all the expertise necessary to produce all the products that bioinformatics promises. Rather, I think the answer is better communication between biology experts and software experts.

Rather than focusing solely on algorithms and technologies, we must focus more on the people side of building biomedical software. You read this very comment in the bio-IT business literature sometimes, taken from the mouths of venture capitalists, in the form of something like “companies can no longer expect to get funding simply for having cool technology”: their software has to solve a biologically relevant problem, i.e. it has to be useful. I am reminded of something I heard during a talk at an agile conference in New Orleans in 2003. Josh Kerievsky said, and I paraphrase, “Some think we’re in the technology business, but we’re not. We’re in the communication business”. Communicating effectively with users is surprisingly difficult to do, and requires wisdom and dedication to get right. Effective communication is a much bigger challenge than the algorithms and the technology usually are. What’s more, it’s a two-way street, and both developers and scientists have to be committed.

I think biologists and bioinformaticians want to communicate better. I think part of the problem is organizational. For instance, at Northwestern, like at any university, it’s very difficult to get good office space. We were stuck in a converted greenhouse for the three years I was in Evanston, on the top floor of the Hogan building. At first we were in the same building as many of the biologists with whom we worked, but not all of them. Over the course of the project, a nice new building opened up, and a number of them moved into the new building. We typically saw these people once a week, if that, at weekly user meetings. We tried to keep contact with them by going to see them individually on a periodic basis (although I think we probably could have done a better job at that). But these people are very busy, and it’s difficult to fit into their schedules. As my biologist colleague pointed out at lunch, it would have been ideal if we could have had shared office space, so that spontaneous discussions would have been more frequent. I think the software we built would have become more useful as a result.

There are many other organizational problems (the difficulty of funding ongoing bioinformatics groups on grants; the lack of a history of operational management positions within academic groups). I imagine some of these get easier in industry. But these are not the only problems. I also think that we don’t yet focus enough attention on communication, on doing it right, and on getting help from outside to do it right. Most of us assume that, hey, we’re smart people, we should be able to communicate. Part of the problem is we’re too smart. We’re trying to communicate complex information, information we’re used to communicating with peers within our field who usually understand us even if we’re not clear. The level of tacit knowledge in biology, like in software, is very high, and it often takes people outside the field quite a while to get a feel for a problem.

This post is long enough. We talk about some of these communication issues in a paper on agile software methods in bioinformatics that David Kane, several others and I are trying to get published. I think agile principles help, at the very least because they get software engineers to focus less on impressing people with their bulletproof processes and more on people and on communication that works (i.e. “Individuals and interactions over processes and tools”). Although our application domain is more technical, the general software industry and its customers have been grappling with these issues for years, and there are a number of very smart and capable people out there who could really help transfect bioinformatics groups with good approaches for tackling these problems.

Northwestern to host biomedical software development meeting

January 7th, 2006

Good news! Northwestern University will host the second BRIITE meeting in 2006, to take place sometime around August or September of this year. We are planning on offering a second component to the meeting exclusively focused on software development issues, and for that we will solicit a wider audience than the one that typically attends BRIITE. Instead of just managerial folks, we also want the software developers, testers, scientists, etc., because this will make for a much more lively, interesting and meaningful discussion.

This idea is still very new, but I will keep you posted on developments as things unfold.

BRIITE 2005 La Jolla

December 9th, 2005

Last month I attended the BRIITE meeting for the first time. As its website says, the meeting’s mission is to:

  • Establish personal contacts, bringing together those responsible for research computing activities at biomedical research institutions
  • Identify and document common problems and interests
  • Seek opportunities for partnership / consortium activities
  • Identify common issues that should be brought to attention of home institutions, government and other funding agencies

One of the things I really liked about BRIITE was its focus on stimulating offline discussions. I went to the meeting hoping to find out if there were others in the biomedical informatics community who cared about software development issues. The focus of the meeting was “IT Support for Multi-Institution Collaborative Research”. We heard talks on federated identity management, Globus, GridShib, BIRN, and, slightly off-topic but still pretty neat, the Research Channel. So, the meeting was mostly about distributed security, and many attending were not primarily software development people.

Due to the focus on offline discussions, attendees are encouraged to suggest topics of interest, and then others sign up to discuss these topics as a group (a conference practice I’ve heard called a “Birds of a Feather” session). I proposed software development practices as a topic, and a small group of us got together for a lively two-hour discussion. I will summarize what we talked about here.

People were interested in the topic for a variety of reasons. One person wanted to know how to introduce better development practices to his group. Another wanted input on how to manage many small projects at once (a common problem in biomedical informatics). Another was plagued with the problem of funding shared informatics resources at a biomedical research institution (people clearly need these resources, but no one wants to or can pay for them). Another wanted to learn more about agile software methods. Finally, I wanted to talk about how better to promote awareness and discussion of software practices in the biomedical informatics community.

In the interest of brevity I’ll just list bullet points rather than go on and on about each item.

Funding IT

While not strictly a software development practice topic, anyone who wants to assemble a decent-sized software team at a biomedical research institution runs up against the problem of funding it. Most of the helpful tips here came from Charles Donnelly of the Jackson Laboratory.

  • If these resources don’t exist, who provides seed funding to get them started? Investigators typically cannot fund these things from their research grants. Institutional funding and commitment is necessary. Where it exists, people have seen some success in developing shared informatics resources (e.g. the Jackson Laboratory).
  • Funding improves if core informatics staff reviews research grants to help make them more realistic about informatics needs and funding. Encouraging awareness among scientific investigators and administrators is key.
  • Once you have institutional funding, keep track of the time you spend on various grants. Calculate the percentage of your total work this amounts to, and tell the PIs on those grants. This will help them understand how much their informatics costs, even if they aren’t paying for it yet.
  • Generally, PIs that have done some computing in the past are more forward-looking, so these are good people to start with (no-brainer).
  • Funding informatics from research grants is difficult, because all budgeting is done ahead of time. Often, however, the informatics needs change over the course of the scientific project. How do you come up with specifics like $ and number of FTEs before a project begins? (The answer, it seems to me, is you do what scientists do. You make your best guess, and then you adjust what you produce based on what you have the money to produce once you learn what is possible during the course of the project.)
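The time-tracking tip above comes down to simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration, not something presented at the meeting: the function name `grant_effort_shares`, the grant labels, and the hours are all made up for the example, and it assumes you already keep a log of (grant, hours) entries.

```python
from collections import defaultdict


def grant_effort_shares(time_log):
    """Return each grant's share of total logged hours, as a percentage."""
    hours_by_grant = defaultdict(float)
    for grant, hours in time_log:
        hours_by_grant[grant] += hours

    total = sum(hours_by_grant.values())
    if total == 0:
        # Nothing logged yet, so there are no shares to report.
        return {}

    return {
        grant: round(100.0 * hours / total, 1)
        for grant, hours in hours_by_grant.items()
    }


# Hypothetical log entries: (grant label, hours worked).
time_log = [
    ("Grant A", 12.0),
    ("Grant B", 6.0),
    ("Grant A", 18.0),
    ("Unfunded/core", 4.0),
]

shares = grant_effort_shares(time_log)
for grant, pct in sorted(shares.items()):
    print(f"{grant}: {pct}% of informatics effort")
```

With these invented numbers, Grant A accounts for 30 of 40 logged hours, or 75%. That is the kind of figure you would pass along to the PI on that grant.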

Project Management

Mix of suggestions and problems here . . .

  • One way to scale your capabilities effectively is to develop shared resources used by several groups. Supporting shared resources can be a challenge, however, because PIs may want results formatted their own way, not in a common format.
  • Before embarking on a project, draw up a project charter. This document establishes the level of involvement of key people (on both the informatics and science side) and the high-level outline of the work planned. The process of preparing this document will let you know if science people are invested enough in the project, or if there are reasons to worry. One cannot overestimate the importance of buy-in and personal investment in the success of an informatics endeavor by the science leaders on the project.
  • Promote awareness of software development: Jax has a software lifecycle process approved by the institution and posted up everywhere.

Requirements/Communication

  • Some used a detailed requirements gathering process (at Jax they draw up 150 page documents) to firm up what would be developed.
  • Others used agile approaches, which are interactive with the scientists throughout the life of the project and make it a specific point to allow the requirements to evolve along with the scientists’ understanding of their needs.
  • All agreed that requirements gathering in biomedical software has to be an ongoing, interactive endeavor – biologists sometimes want to wait until features are in production before they give feedback, because they want to see something working and tinker with it.
  • Underestimated opportunity for software people: helping biologists with their ad-hoc solutions (Access, Excel, FileMaker Pro). Biologists will always use Excel, so is there some way we can help them use Excel better? Can we leverage their familiarity with these technologies? A software development background teaches you to look askance at “low-tech” solutions like these, but in this domain there is something to be said for them.
  • Starting with the front end (UI) and working back helps user interaction — you can show a prototype of an interface and get better feedback earlier.
  • Best ideas don’t always come from biologists: we are doing our job best when we are helping scientists understand what is possible with software, and helping them focus on their goals and how software could help achieve them.

Looking Forward

We talked about possible content for a meeting or conference on biomedical and bioinformatics software development. People showed interest in the following.

  • Documenting typical staffing models — help us all understand our options for organizing our work and getting it funded
  • Questionnaire — find out what people are doing, try to get a picture of the current state of affairs
  • Present experience reports: What working examples are out there?
  • People preferred a hands-on workshop to a conference: plenty of short talks that don’t give answers but raise questions, followed by open discussions.
  • People would like to see defined milestones for the meeting.
  • Tutorials on some subjects (e.g. project management, quality assurance and testing, good development practices) would be welcome
  • Q&A sessions – what have you done that can benefit others, what are you doing?