R makes NYT

January 7th, 2009

Nice: http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html

Pair programming and microarrays

July 9th, 2008

Yesterday I met with folks at Lawrence Berkeley Lab. The PI entered the room full of energy, clearly ducking briefly out of the fray to speak with us. Part of the discussion revolved around microarray experiments. We’ve all heard how notoriously difficult it is to reproduce them. People have proposed minimum information standards (really they’re guidelines) to combat this problem, and we’ve also all heard that these standards often aren’t enough. Even when people follow the guidelines, inevitably some crucial piece of information doesn’t look critical at the time and therefore isn’t communicated.

The PI noted that he has found it helpful when more than one lab conducts an experiment, so that each can help the other spot finicky and/or tacit experimental conditions that would prevent others from reproducing their results. I have wondered for some time (and for microarrays in particular) whether something like the “pair programming” we practice in software development would do more than minimum information standards to increase the reproducibility of complex experiments. The problem, as the PI pointed out, is that duplicating every experiment gets expensive, and in the world of soft money (especially today’s world), people are always looking for ways to make the research dollar go farther. The possible long-term efficiency of duplicating some effort, to increase the value of the data and reduce the tendency to go down blind alleys, is not easy to quantify, and thus not easy to weigh against the immediate penalty of “getting half as much work done”. (That’s certainly true in software.)

The PI pointed out that even if direct duplication were too expensive, he would still advocate some kind of collaboration on experiments. In particular, he advocated getting people into the same room to watch the experiment as it is performed, so that the collaborator might catch important things that aren’t immediately apparent to the person performing it. At most, this costs a small amount of travel funds.

I asked the PI if others shared his views, and he said that most of the larger microarray efforts had some sort of distributed work going on, but he wasn’t sure that this idea had been formalized anywhere.

I’m interested in this not only because of its parallel with software work, but also because I work for a company focused on facilitating collaborative science. I’m very interested in the different forms that scientific collaboration can take, and how best to help them along.

DTrace predicate hack

July 6th, 2008

One of the things I keep wanting in DTrace’s D language, but which isn’t there (right?), is a richer set of string comparison functions. Ideally I want full regular expression support, so that I can predicate actions on, say, regex matches against a class and/or method name. For instance, a while back, while profiling some Java, I wanted to count time spent only in methods of classes that belonged to a particular package (org/apache/solr) or to its subpackages. There is no “starts with” string operator in D. However, the following did the trick:

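/*
 * "Starts with" emulated as a lexicographic range check: a class name that
 * begins with "org/apache/solr/" sorts at or after that prefix and before
 * "org/apache/sols".
 */
hotspot$target:::method-entry
/(self->class = copyinstr(arg1)) != NULL && self->class >= "org/apache/solr/" && self->class < "org/apache/sols"/
{
  /* action goes here */
}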

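A little ugly, but it worked. The choice of “org/apache/sols” as the upper bound was somewhat arbitrary: any string that sorts after every name beginning with “org/apache/solr/” would do, as long as no unwanted class name falls below it.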
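
For anyone who wants to convince themselves the range trick is sound, here is a minimal sketch in Ruby rather than D, on the assumption that both compare strings byte by byte. The helper names (in_prefix_range?, prefix_upper_bound) and the sample class names are made up for illustration; none of this is DTrace API.

```ruby
# Same shape as the D predicate: lower <= name < upper.
def in_prefix_range?(name, lower, upper)
  name >= lower && name < upper
end

# Tightest possible upper bound for a prefix: bump its last byte by one.
# Assumes a non-empty prefix whose last byte is not 0xFF.
def prefix_upper_bound(prefix)
  bytes = prefix.bytes
  bytes[-1] += 1
  bytes.pack("C*").force_encoding(prefix.encoding)
end

LOWER = "org/apache/solr/"
LOOSE_UPPER = "org/apache/sols"            # the bound used in the predicate
TIGHT_UPPER = prefix_upper_bound(LOWER)    # "/" (0x2F) becomes "0" (0x30)

# Hypothetical class names, in the slash-separated form seen in the probe.
SAMPLES = [
  "org/apache/solr/core/SolrCore",
  "org/apache/lucene/Foo",
  "org/apache/solrx/Foo"
]

SAMPLES.each do |name|
  loose = in_prefix_range?(name, LOWER, LOOSE_UPPER)
  tight = in_prefix_range?(name, LOWER, TIGHT_UPPER)
  puts "#{name}: loose=#{loose} tight=#{tight}"
end
```

The made-up org/apache/solrx/Foo is there to show what “somewhat arbitrary” costs: the looser bound lets a sibling package through, while the bumped-last-byte bound does not. If no such sibling exists in your app, either bound behaves the same.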

Too many mock objects == ruby refactoring death

July 6th, 2008

It’s a question we face as test-driven Ruby programmers: should we use mock objects or real objects in our tests?

Both approaches have trade-offs, and the biggest downside of each has to do with wasting programmer time. If you test with real objects, your tests run slowly, especially if you use an ORM like ActiveRecord that binds your domain objects tightly to the database, because your tests hit the database. There are other sources of slowness, but nothing has anywhere near as great an effect as hitting the DB.

If you test with mock objects, then once your app has any real complexity, your refactoring and test writing become slow. This is not immediately apparent when you start using mock objects. But as you write more and more code, eventually you have to come up with a crazy number of mock expectations just to test some of your methods. It is true that this is good feedback: it tells you that the class you’re testing presents too complex an interface to its collaborators, or that it collaborates with too many objects, etc. Still, what starts simple will eventually become too complex, and at some point you’re going to need to refactor.

More insidious, however, is the effect this web of mocks you’ve wrapped yourself in has on refactoring. You don’t notice how thoroughly you’ve painted yourself into a corner until you want to refactor some ugly aspect of a core class that collaborates with many objects in your system. Suddenly all of those collaborating objects’ tests break, because they expect certain method calls on this core object. They break because you’ve changed a method signature, a method name, or, even worse, just an implementation. No matter what those BDDers tell you, if you test with mocks then to too great a degree you’re testing implementation, not behavior (or is that behaviour?).

So now you have to go through all of those mock-based tests and “correct” them, i.e. change their expectations to fit the new method name/signature/implementation. This is horrible. The whole point of tests during refactoring is to verify that your refactoring hasn’t changed the behavior of the system (that being half of the definition of refactoring). The tests should pass before you refactor, and they should pass after you refactor. Rewriting tests mid-refactoring not only breaks that fundamental process, it also takes a lot of time, because you have to recall the context of each test case you fix.
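
To make the failure mode concrete, here is a small self-contained sketch. Cart, PriceList and the hand-rolled StrictMock are hypothetical stand-ins (a real project would use a mocking library), and the scenario assumes PriceList#lookup has just been renamed to price_for with no change in behavior.

```ruby
# Hypothetical core class. Its `lookup` method was just renamed to `price_for`.
class PriceList
  def initialize(prices)
    @prices = prices
  end

  def price_for(item)
    @prices.fetch(item)
  end
end

# Hypothetical collaborator, already updated to call the new name.
class Cart
  def initialize(price_list)
    @price_list = price_list
    @items = []
  end

  def add(item)
    @items << item
    self
  end

  def total
    @items.sum { |item| @price_list.price_for(item) }
  end
end

# Hand-rolled strict mock: answers only the calls it was told to expect.
class StrictMock
  def initialize
    @expectations = {}
  end

  def expect(method_name, returns:)
    @expectations[method_name] = returns
    self
  end

  def method_missing(name, *args)
    return @expectations[name] if @expectations.key?(name)
    raise NoMethodError, "unexpected call: #{name}"
  end

  def respond_to_missing?(name, include_private = false)
    @expectations.key?(name) || super
  end
end

# Check against the real collaborator: only the observable result matters.
def state_based_check
  cart = Cart.new(PriceList.new("apple" => 3, "pear" => 2))
  cart.add("apple").add("pear").total == 5
end

# Check written before the rename: the mock still expects `lookup`.
def mock_based_check
  stale_mock = StrictMock.new.expect(:lookup, returns: 3)
  Cart.new(stale_mock).add("apple").total == 3
rescue NoMethodError
  false
end

puts "state-based check passes: #{state_based_check}"
puts "mock-based check passes:  #{mock_based_check}"
```

Nothing about what Cart does has changed, yet the mock-based check fails until someone edits its expectation by hand. Multiply that by every test that mocks the core class.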

You can do something about slow-running tests that hit a database (in extreme cases you can use a faster in-memory database, or even parallelize your tests). Of course they won’t run as fast as they would if they didn’t hit a database, but in my opinion that’s something you can live with. Dav Yaginuma has a good suggestion for what to do with this time: go write that email you need to write, go take the bathroom break, go walk around the office and stretch your legs. It’s not like that’s really wasted time. It is wasted time, however, if you’re sitting there squinting at the screen fixing all of your mock expectations. You can’t do anything else with that time.

I’m kind of half-convinced now that people who advocate the heavy use of mocking either have really nice IDEs that make fixing the expectations a breeze, or they don’t refactor. And if they’re using Ruby, that means they don’t refactor. OK, tongue out of cheek. Seriously, I’d love to hear from folks who have used and continue to use mocks heavily on long-running projects, to hear how they handle the refactoring issue. Because of it, I have pretty much sworn off mocks except in old-school traditional cases (“mocking out an external dependency too expensive to call directly”).

MacPorts Ruby, now with DTrace

February 21st, 2008

We are gearing up to do some profiling/performance improvement at work, and we use MacPorts (mostly at my stubborn insistence) to install Ruby on our OS X dev boxes. Unfortunately, the MacPorts version of Ruby is not DTrace-enabled, so we were faced with the decision to either go with the Apple-installed Ruby or not use DTrace.

Fortunately, there was a third option. I spent some time massaging Joyent’s Ruby DTrace patch so that it would compile with Apple’s version of DTrace (subtly different from Solaris’s), and so that it would play nice with the other patches in the official Ruby MacPort distribution. Anyway, long story short: you can get it via my newest RubyForge project, rubyport-dtrace. You can install either from the tarfile or by checking out from Subversion; see the instructions in the distribution.

Why I like MacPorts: I like being able to cleanly remove software I install. I also like that I can compile Ruby with DTrace and other patches that I might want, such as the Railsbench GC patch, which I’m also working into the rubyport-dtrace (dare I call it) code. It might already work, but I haven’t tested it.

