Why is BioPerl the favorite?
BioPerl is the most successful (as far as I can tell) of the various Open Bioinformatics Foundation projects. To me this is strange, because most programmers I know (and here I am including programmers outside the field of bioinformatics) find Perl either a little distasteful or a little passé. There certainly are some programmers worth their salt who don't feel this way, but they're not the majority.
I think this has to do with a) the users of these code libraries, and b) the kinds of problems these code libraries best solve in general. This is a wild guess, because I don't know a representative cross-section of users of these libraries, nor do I have a solid handle on what sorts of problems they're used to solve. Nevertheless, I'll be wild and continue this train of thought.
The users are typically people who are not programmers by profession. They came to programming from biology or some other field. Their shtick is finding scientifically interesting patterns in masses of biological data, and they know how to wire together a bunch of scripts to do this. They don’t have to create stable, production-quality software. They just have to get the right answer. Perl is great for these people, because it’s basically a procedural language (yeah, bless me a hash all you want, Perl is really used primarily as a procedural language), so the programming model is easy for someone with less programming experience, and it lends itself very well to creating a library of reusable scripts.
Java, on the other hand, is a language better suited to production-quality applications. It requires a bigger upfront investment than Perl, essentially because there's more structure, and that structure pays off down the road for larger, long-lived applications. But these applications are not typically a bunch of scripts strung together (if they are, you really shouldn't be using Java). The Java libraries that do thrive are usually one of several competing solutions to a smaller, focused problem, not a huge mass of single solutions to a bunch of loosely related problems. BioJava falls more into the latter category, which may explain why it trails BioPerl.
So, why is BioRuby so far behind? Ruby is a language for programming geeks. People with less programming experience tend not to appreciate Ruby's pure object-orientedness, its blocks/closures, etc. For the typical bioinformatician, these features are more confusing than valuable. Ruby is also a relative newcomer to the language scene, and a lot of the documentation is in Japanese.
Just in case it needs to be said, there’s nothing wrong with being a bioinformatician with less programming experience. These people are a whole lot better at science than your average Java business software development guru. Everyone has their strengths and their weaknesses.
In a glaring omission, I haven't even mentioned BioPython. It is perhaps the most puzzling runner-up of them all, since Python offers a nice compromise between Perl and Ruby, although BioPython isn't so far behind after all.