Keeping Rails migrations happy

May 9th, 2007

Two quick things we’ve learned about migrations at CDD:

  • Avoid using your model objects in your migrations, e.g. stuff like Group.create!(:name => "Watson Lab"). The problem with this is that later you might add a required field to your model, and then this migration will throw an exception. Occasionally you need some logic from a model in a migration, but if at all possible I’d suggest exposing that logic in a way that doesn’t require creating or loading model objects in your migration itself. The migration should just know about SQL, nothing else.
  • Say you branch your code base for a release, and you anticipate needing to support that branch for any length of time. Sometimes you’ll need to address an issue in the production code that requires another migration. What we’ve found works best with ActiveRecord migrations is:
    1. In the trunk, delete all existing migrations when you branch.
    2. Dump a version of the branch schema, and make that migration #1 (001_production_branch_schema.rb) in your trunk.
    3. Start your next trunk migration several numbers higher than your last migration on the production release branch. So, if your last migration on the branch was 40, start 40+N, where N gives you enough cushion to accommodate any additional migrations needed for the branch until your next release.
    4. Any time you add another migration to the branch, in the trunk replace 001_production_branch_schema.rb with a new dump of your branch schema.

Kind of a hack, but it works better than anything else we’ve come up with. My former colleague, Rhett Sutphin, took a different approach to this problem when he wrote a Java/Groovy port of migrations called bering (to which I minimally contributed in its early stages). In bering, migrations are specific to a particular release. Each release is numbered and gets its own separate migration directory, and migrations start at one again for each new release.
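
Here is a minimal sketch of both tips above, not our actual migrations. So that it runs without Rails, it uses a stand-in base class (RecordingMigration, hypothetical) that records SQL instead of executing it; in a real app the migration would inherit from ActiveRecord::Migration. The groups table and name column are assumed from Rails naming conventions, and the two numbering helpers and the add_assay_index name are made up for illustration.

```ruby
# Stand-in for ActiveRecord::Migration so this sketch runs without Rails.
# In a real application the migration would inherit from
# ActiveRecord::Migration, whose execute sends SQL to the database.
class RecordingMigration
  # SQL statements "executed" so far, recorded instead of run.
  def self.executed
    @executed ||= []
  end

  def self.execute(sql)
    executed << sql
  end
end

# Seeds a row with plain SQL instead of Group.create!, so later changes to
# the Group model (new required fields, validations) cannot break it.
# Table and column names are assumptions based on Rails conventions.
class AddWatsonLabGroup < RecordingMigration
  def self.up
    execute "INSERT INTO groups (name) VALUES ('Watson Lab')"
  end

  def self.down
    execute "DELETE FROM groups WHERE name = 'Watson Lab'"
  end
end

# Hypothetical helper for the numbering scheme: the first trunk migration
# after branching starts `cushion` numbers above the branch's last one.
def first_trunk_migration_number(last_branch_number, cushion)
  last_branch_number + cushion
end

# Formats a file name in the zero-padded style of
# 001_production_branch_schema.rb.
def migration_filename(number, name)
  format("%03d_%s.rb", number, name)
end

AddWatsonLabGroup.up
puts AddWatsonLabGroup.executed.inspect
puts migration_filename(first_trunk_migration_number(40, 10), "add_assay_index")
```

With a last branch migration of 40 and a cushion of 10, the first trunk migration would be number 50, which leaves 41 through 49 free for fixes on the branch.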

BDD: Forces and the “given X, when Y, then Z” pattern

May 7th, 2007

Thinking more about the issue mentioned in my previous post, I’ve come up with a possible set of forces that push you in one direction or the other (that is, toward organizing your specifications around method behavior or toward organizing them around object state):

  1. Clearly, if you’re specifying procedural code (Rails helpers, many class-level methods), there’s no state, and you should organize your specifications around methods.
  2. If your object has more attributes/more states and fewer methods/less business logic, then it’s probably clearer to organize your specifications around the behavior of the methods. If, however, the object has fewer attributes and more methods/business logic, then it is clearer to organize around the various object states, each with a number of specifications of the behavior of each method in that state. This is debatable, and perhaps there’s a better characterization of this force, but following it definitely produces less code, and less code (expressing the same meaning) is usually clearer. It also seems to me that fewer contexts are usually easier to understand, because there is less information to parse.
  3. If your objects have states that are expensive to set up (e.g. lots of mock expectations), then it’s better to organize specifications around those states. However, this might also be a code smell suggesting that your classes should be more loosely coupled.

Another way to think about this is in terms of how each spec reads. Dan North in his Introducing BDD article talks about formulating specifications using the language pattern: “Given X, when Y, then Z”, so for example, “Given an assay criterion with nothing specified, when asked for its corresponding SQL conditions, it should return nil.” There are several ways you could express this in RSpec (leaving out the before block and the body of the spec). Organized around the object state, it would be:

    describe "Given an assay criterion with nothing specified" do
      it "should return nil when asked for its corresponding SQL conditions (:to_conditions)" do
        ...
      end
    end

or, organized around the method:

    describe "When an AssayCriterion is asked for the corresponding SQL conditions (:to_conditions)" do
      it "should return nil if nothing was specified" do
        ...
      end
    end

or Dustin’s hybrid approach:

    describe "Given an assay criterion with nothing specified, when asked for the corresponding SQL conditions (:to_conditions)" do
      it "should return nil" do
        ...
      end
    end

I’m not sure that tells you which one is better, but there it is.
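
To make the state-organized variant concrete, here is a minimal sketch with the before block and spec bodies filled in. It is not our real code: the AssayCriterion class below is a hypothetical stand-in, and its assay_name attribute and the shape of the returned conditions are assumptions. So that it runs on a stock Ruby install, it swaps RSpec for Minitest’s spec DSL, whose describe/before/it blocks have the same shape, and uses assert_nil and assert_equal in place of RSpec’s should.

```ruby
require "minitest/autorun"

# Hypothetical stand-in for the AssayCriterion discussed above. The real
# class is not shown in the post, so the attribute is an assumption.
class AssayCriterion
  attr_accessor :assay_name

  # Returns an [sql_fragment, value] pair, or nil when nothing has been
  # specified.
  def to_conditions
    return nil if assay_name.nil? || assay_name.empty?
    ["assays.name = ?", assay_name]
  end
end

# Each describe block sets up one object state in its before block, then
# specifies how the method behaves in that state.
describe "Given an assay criterion with nothing specified" do
  before do
    @criterion = AssayCriterion.new
  end

  it "should return nil when asked for its corresponding SQL conditions (:to_conditions)" do
    assert_nil @criterion.to_conditions
  end
end

describe "Given an assay criterion with an assay name specified" do
  before do
    @criterion = AssayCriterion.new
    @criterion.assay_name = "Kinase panel"
  end

  it "should return conditions on the assay name (:to_conditions)" do
    assert_equal ["assays.name = ?", "Kinase panel"], @criterion.to_conditions
  end
end
```

Organizing around the method instead would flip the nesting: one describe for :to_conditions, with one example per state and the setup moved into each example.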

Also, I found a couple more good links about BDD. I’ll list them here, along with other useful general BDD material I’ve found; please let me know about any others:

CDD community meeting on open R&D for developing world disease

May 6th, 2007

Last August I moved out to San Francisco to join a great cheminformatics startup, Collaborative Drug Discovery, as director of software development. Two months ago (March 5th) we had our first user community meeting on open R&D for developing world disease drug discovery. It was an inspiring event, both because of the evident energy of the community and because it made it so much clearer to me how important our customers’ work is.

Prof. Jim McKerrow at UCSF gave a nice overview of the scope of the work our customers face, and how collaboration (through CDD and otherwise) helps them arrive at cures sooner and more efficiently (the slides are blurry, so download them separately). We put up several other talks from the meeting on Google Video, available along with PDF slides from our website, including one by the famous medicinal chemist, Chris Lipinski, who is a member of our customer advisory board. Cool stuff.

BDD: specifying domain objects de novo

May 6th, 2007

The vanilla example used in most blog posts for BDD is some incarnation of de novo domain object specification, that is, specifying the behavior of a simple domain object from scratch. David Chelimsky’s stack example is a decent online example of this sort of situation (the comments are interesting to read as well). Stack is an independent class without any collaborators, and its behavior is not extensive. This results in a state-oriented set of contexts with specifications that are very readable and give you a good idea of what Stacks do.

Recently my colleague Kurt Schrader posted something in reaction to a discussion he and I had about a specification I had written. In his post he gives a simple example of de novo domain object specification, and asks (here I paraphrase) whether the example should be the method or the object in a particular state, i.e. should we describe the behavior of an object in a particular state (in his example, “A new sword”) or describe the behavior of the method Sword#sharp?. I would agree with him that in this case describing new and old swords is better than describing sharp?. However, the original specification that provoked the most recent iteration of this discussion was a bit more complex, so I’ll present it (correction: I’ll present a similar specification, see Note at the bottom) here.

Continue reading →

Behavior-driven development

May 6th, 2007

About two months ago at CDD we decided to start using the RSpec Behavior-driven development (BDD) framework instead of the standard Test::Unit unit-testing library. My initial interest in using RSpec was that it provided “contexts” for a bundle of tests/specifications (hereafter, “specs”), and that seemed a cleaner way to group specifications/tests than throwing everything in one big test class. Our existing Test::Unit test classes were getting very long (some with 60+ test methods if I remember correctly), and related tests were grouped just by placing them next to each other in the file, which wasn’t always maintainable/maintained. And of course, when you have sixty tests in one class, the setup method has to be too general to be used properly. So we needed to do something, and RSpec seemed like it would help. In addition, I liked how specs in RSpec read better than how a Test::Unit assertion reads, i.e. I liked assigns[:assay].should have(4).runs more than assert_equal(4, assigns(:assay).runs.size) and the like.

I have to admit at the time I only vaguely knew what BDD was supposed to be about. The main thing I knew was that BDD was an attempt to change the words we use to talk about automated developer testing/specification/test-driven development (TDD), to make clearer an under-appreciated purpose of such activity, to help developers write code intentionally using better design (loose coupling, etc.). As such, BDD is less a change of practice from TDD (if TDD is practiced correctly) than a clarification of the practice.

After two months using a BDD framework, I have found that while BDD does clarify high-level principles, it still leaves plenty unclear. I have been unit testing for many years now, and I’ve always felt that automated developer testing is a rich subject that takes a long time to fully appreciate, and is not a discipline that can be covered adequately by a few high-level principles. The details matter, because most realistic testing situations involve tradeoffs and context-specific considerations that should lead a developer to take one approach over another. That said, BDD is a significant step in the right direction. Over the next week or so I plan to write a series of blog posts examining some of the detailed situations we’ve encountered at CDD in light of the principles and tradeoffs, with the hope both that these details will be useful to others and that some more experienced BDDers out there will give us some feedback to help us make better choices about how we specify our code.

Before ending this post, I’ll list some of the high-level principles, so I can refer to them over the next week:

  1. Specs should be valuable.
  2. Specs should be acceptable.
  3. Corollary of #2: Some code duplication in specs is ok; the focus should be on clarity/readability/acceptability.
  4. Specs should specify behavior not implementation (the classic interface vs. implementation distinction). Unfortunately, we’ve discovered that “behavior” is still a fairly vague term (leading to some intense discussions within our team), and what “the interface” is varies according to context.
  5. Contexts/examples should set up a particular state (of an object, etc.), and specifications should then describe the behavior in that state. This is typically accomplished by setting up state in the setup method or before(:each) block, and then writing many short descriptions of behavior in separate test methods/specs.
  6. Specs should be loosely coupled to application code, so that refactoring app code doesn’t cause lots of tests to break. There is at least a hope here that by specifying behavior/interfaces you’re likely to get loose coupling as well.
  7. Specs should encourage developers to think about interface-centric, just-in-time design of their code. This is TDD/BDD’s major benefit #1.
  8. Finally, I still believe (and here I perhaps depart from some BDDers) that the other major benefit of TDD/BDD is that specs help you verify that your application code works. This is particularly true for small development teams that don’t have a ruthless army of QA people keeping a lid on bugs.

Stay tuned for a specific example later today.