Korean Pronunciation: 몇

September 27th, 2004

Korean pronunciation is pretty difficult. Here is a not atypical example of the kinds of craziness that can occur.

The word for “how many” in Korean is 몇. For the uninitiated, that character is a Hangul syllable composed of three “letters”: ㅁ = “m”, ㅕ = “yuh” (a vowel) and ㅊ = “ch” (sort of). So you’d think it might be pronounced “myuch”. However, when the “ch” occurs at the end of a word (in a “final” position), it changes sound to a final “t” sound. So, this word, pronounced alone, sounds like “myut”.

However, if you want to ask the date, you use this word plus the word for day, “eel” (일). Putting these two words together, you’d think you’d get “myut eel”, but not so. Because the “ch” sound now precedes a word beginning with a vowel, the “ch” is pronounced as the beginning of the following word and goes back to having its “ch” sound. Thus, you get “myuh cheel”.

This, however, is not the end of the story. If you want to ask how many people (are in someone’s family, for example), you put 몇 in front of one of the counter words for people, 명 (“myung”). Because “myung” starts with a nasal consonant, “m”, the preceding syllable’s final consonant (our “ch”) undergoes nasalization. You determine this nasalization from the consonant’s normal final sound, that final “t” sound I discussed earlier. “t” nasalizes to become “n”, so in formal speech you’d pronounce “how many people” as “myun myung”. Colloquially, however, people quite often change a nasalized final “n” to “m” if the following word begins with an “m”. Thus, you shouldn’t be surprised if you hear someone say “myum myung”.
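
For the programmers reading, the three contexts above can be boiled down to a toy Ruby sketch. It uses the rough romanization from this post rather than real phonetics, the method name pronounce_myeot is made up for illustration, and it only covers the cases discussed here, not Korean sound rules in general.

```ruby
# Toy model of how 몇 sounds in the contexts described above.
# Hypothetical helper; romanization follows this post, not a standard system.
def pronounce_myeot(next_word = nil, colloquial: false)
  # Alone, the final "ch" is pronounced as a final "t".
  return "myut" if next_word.nil?

  case next_word[0]
  when "a", "e", "i", "o", "u"
    # Before a vowel, "ch" moves onto the next word and keeps its sound.
    "myuh ch#{next_word}"
  when "m", "n"
    # Before a nasal, the final "t" nasalizes to "n";
    # colloquially it may become "m" before "m".
    final = (colloquial && next_word[0] == "m") ? "m" : "n"
    "myu#{final} #{next_word}"
  else
    # Other contexts are not covered in this post; fall back to the "t" sound.
    "myut #{next_word}"
  end
end

puts pronounce_myeot                             # myut
puts pronounce_myeot("eel")                      # myuh cheel
puts pronounce_myeot("myung")                    # myun myung
puts pronounce_myeot("myung", colloquial: true)  # myum myung
```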

Whew. And like I said, that isn’t an unusual example at all. There are lots and lots of contextual pronunciation rules, and of course plenty of exceptions to those rules. This makes speaking Korean correctly and understanding Korean speech kind of difficult. I guess I can say I’m lucky it’s not as bad as English, though. I also have discovered a great resource for learning about these rules and their exceptions, a book out of the impressive and prolific Korean language department at the University of Hawaii: The Sounds of Korean: A Pronunciation Guide by Miho Choo and William O’Grady.

Ruby’s block/closure support

September 27th, 2004

I’ve been playing around with Ruby lately, because I hear it’s an interesting scripting language. Like many other people, I have enjoyed the block/closure support that the language offers. For example, when creating an array, you can pass a block to the constructor to populate the Array’s values. So, if you wanted to populate an array initially with integers equal to the indices at which they occur, in Java you would have to write

int[] order = new int[52];
for(int i=0; i < order.length; i++) {
  order[i] = i;
}

whereas in Ruby you just write

order = Array.new(52) { |i| i }
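
One thing to watch for: the block is only called when Array.new is given a size; without one you just get an empty array. Here is a minimal sketch of that behavior. The 52 mirrors the Java example, and the variable names are only for illustration.

```ruby
# Array.new(size) calls the block once per index and stores the result.
order = Array.new(52) { |i| i }

# The block can compute anything from the index, e.g. squares.
squares = Array.new(5) { |i| i * i }

# For a plain run of integers, a range does the same job.
same_order = (0...52).to_a

p order.first(3)        # [0, 1, 2]
p squares               # [0, 1, 4, 9, 16]
p order == same_order   # true
p Array.new { |i| i }   # [] (no size given, so the block is never called)
```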

Another nice example of block usage is the following way to “shuffle” an array:

a.sort_by { rand }

where rand returns a random number between 0 and 1. In this case, the value of the block is the key by which sort_by orders each element of the array, so random keys produce a random order. In general, blocks seem to lead to much more compact code that is still readable.
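
To make the sort_by idea concrete, here is a small sketch. The sample data is made up. Later Ruby versions (1.8.7 and up) also provide a built-in Array#shuffle, which is the more direct choice where it is available.

```ruby
words = ["kiwi", "fig", "banana", "apple"]

# The block's value is the key used for sorting: here, the word length.
by_length = words.sort_by { |w| w.length }
p by_length   # ["fig", "kiwi", "apple", "banana"]

# With a random key per element, the resulting order is random.
shuffled = words.sort_by { rand }
p shuffled.sort == words.sort   # true: same elements, only the order differs
```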

A multiline block is specified like so:

a.collect do |element|
  if element == "capitalize me"
    next element.upcase
  else
    next element.downcase
  end
end

That code returns an array of the elements of a with the block applied to each one. Note that inside a block you use next, not return, to hand back a value early: return would try to return from the enclosing method (and raises a LocalJumpError at the top level of a script). Even next is unnecessary here, because those lines are the last expressions evaluated inside the block. Thus, a more Ruby-like way to write the above would be:

a.collect do |element|
  if element == "capitalize me"
    element.upcase
  else
    element.downcase
  end
end
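
The “closure” half of the name deserves a quick sketch too: a block can see and change local variables from the code around it. The variable names and sample data here are just for illustration.

```ruby
a = ["capitalize me", "Leave Me", "capitalize me"]
capitalized = 0   # local variable defined outside the block

result = a.collect do |element|
  if element == "capitalize me"
    capitalized += 1   # the block updates the outer variable
    element.upcase
  else
    element.downcase
  end
end

p result        # ["CAPITALIZE ME", "leave me", "CAPITALIZE ME"]
p capitalized   # 2
```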

Why is BioPerl the favorite?

September 11th, 2004

BioPerl is the most successful (as far as I can tell) of the various Open Bioinformatics Foundation projects. To me this is strange, because most programmers I know (and here I am including programmers outside the field of bioinformatics) find Perl either a little distasteful or a little passé. There certainly are some saltworthy programmers who don’t feel this way, but they’re not the majority.

I think this has to do with a) the users of these code libraries, and b) the problems these types of code libraries best solve in general. This is a wild guess, because I certainly don’t know a good cross-section of users of these libraries, nor do I have a solid handle on what sorts of problems they’re used to solve. Nevertheless, I’ll be wild and continue this train of thought.

The users are typically people who are not programmers by profession. They came to programming from biology or some other field. Their shtick is finding scientifically interesting patterns in masses of biological data, and they know how to wire together a bunch of scripts to do this. They don’t have to create stable, production-quality software. They just have to get the right answer. Perl is great for these people, because it’s basically a procedural language (yeah, bless me a hash all you want, Perl is really used primarily as a procedural language), so the programming model is easy for someone with less programming experience, and it lends itself very well to creating a library of reusable scripts.

Java, on the other hand, is a language better suited to production-quality applications. It requires a bigger upfront investment than Perl, essentially because there’s more structure, and that structure pays off down the road for larger, long-lived applications. But these applications are not typically a bunch of scripts strung together (otherwise you really shouldn’t be using Java). The Java libraries that do thrive are usually one of several competing solutions to a smaller, focused problem, not a huge mass of single solutions to a bunch of loosely related problems. BioJava fits more in the latter category.

So, why is BioRuby so far behind? Ruby is a language for programming geeks. People with less programming experience tend not to appreciate Ruby’s pure object-orientedness, its blocks/closures, etc. These features are really more confusing than valuable for your typical bioinformatician. It’s also a relative newcomer to the language scene, and a lot of the documentation is in Japanese.

Just in case it needs to be said, there’s nothing wrong with being a bioinformatician with less programming experience. These people are a whole lot better at science than your average Java business software development guru. Everyone has their strengths and their weaknesses.

In a glaring omission, I didn’t even mention BioPython. It is perhaps the most puzzling runner-up of them all, since it offers a nice compromise between Perl and Ruby, although it isn’t so far behind after all.