Z factor refactored

November 11th, 2007

I recently reread the original Z factor paper (Zhang et al). The Z factor is a measure of assay reliability and comes in two flavors: the Z’ factor, based entirely on controls (those with and without the desired effect); and the Z factor, based on experimental data compared with the controls that should have the desired effect.

Rereading a paper months later often makes you wonder whether you read the paper at all the first time. This reading really clarified for me what the Z factor is and that it is not just for high-throughput screening. It also raised a number of questions (especially after discussion with colleagues) that the paper does not address.

The Z factor is the ratio of the “separation band” of the data to the assay dynamic range. A picture helps:

separation band image

where μ+ is the mean of the positive controls (in this case, the controls with the desired effect), μs is the mean of the data, and σ+ and σs are the corresponding standard deviations. The assay dynamic range in this diagram is μ+ – μs. The separation band (the screening window) is then (μ+ – μs) – (3σ+ + 3σs), and the ratio of this to the dynamic range is the Z factor: Z = 1 – (3σ+ + 3σs)/(μ+ – μs).
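
To make the formula concrete, here is a minimal Ruby sketch of the calculation. The method names and the readings are made up for illustration. Two assumptions go beyond the formula as written above: the sketch uses the sample standard deviation (n – 1 denominator), and it takes the absolute value of μ+ – μs so that the result does not depend on which mean is larger.

```ruby
# Arithmetic mean of an array of numbers.
def mean(values)
  values.sum.to_f / values.size
end

# Sample standard deviation (n - 1 denominator). This choice of estimator
# is an assumption; the post does not say which one to use.
def standard_deviation(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / (values.size - 1))
end

# Z = 1 - (3 * sd_positive + 3 * sd_sample) / |mean_positive - mean_sample|
def z_factor(positive_controls, samples)
  dynamic_range = (mean(positive_controls) - mean(samples)).abs
  spread = 3 * standard_deviation(positive_controls) +
           3 * standard_deviation(samples)
  1.0 - spread / dynamic_range
end

positive_controls = [100, 102, 98, 101, 99] # made-up readings
samples           = [10, 12, 8, 11, 9]      # made-up readings

puts z_factor(positive_controls, samples).round(3)
```

With these well-separated, low-noise readings the result is close to 1. As the spread grows relative to the dynamic range, the value falls toward zero and can go negative once the two 3σ bands overlap.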

(If you’re reading this in an RSS reader, the story continues on my website.)


Chained Selenium RSpec examples

August 8th, 2007

From the RSpec documentation:

It is very tempting to use before(:all) and after(:all) for situations in which it is not appropriate. before(:all) shares some (not all) state across multiple examples. This means that the examples become bound together, which is an absolute no-no in testing. You should really only ever use before(:all) to set up things that are global collaborators but not the things that you are describing in the examples.

Well-known conventional wisdom says that different test cases (in spec-speak, “examples”) should not depend on one another for state, should be runnable in any order, etc. I certainly agree with this wisdom in general, but I think there’s one case where this rule should be broken. We’ve been writing a fair number of Selenium RC tests lately for our app, using RSpec to drive Selenium RC. When writing integration tests like this, each example (in test-speak, “test method”) is often a very long script with lots of shoulds/asserts in it. We lose the nice descriptive power of small examples with specific, descriptive text, and instead face a choice between vague, high-level example descriptions and really long example descriptions using ugly here documents that can easily fall out of sync with the example code.

Instead, we want to be able to do something like this:

    describe "A user customizing a car" do
      use_chained_examples
 
      before(:all) do
        @model = models(:spiffy)
        log_in
        go_to_car_customization_start_page
      end
 
      it "should first be required to select car model" do
        page_title.should == "Select a model"
        droplist("model_id").should be_present
        droplist("model_id").options.should == [
          "(Please select a model)",
          models(:spiffy).name,
          models(:sporty).name
        ]
        droplist("model_id").selected_value.should == ""
 
        droplist("model_id").select(@model.id)
        click_next_button
      end
 
      it "should then be required to select paint color" do
        page_title.should == "Select #{@model.name} color"
        droplist("color_id").should be_present
        # etc.
      end
    end

A before(:each) block can be used to reset the page between examples, which we’ve found useful when testing a bunch of validations or something similar. Note too that chaining examples is not the default behavior; it must be explicitly requested by the developer, who should “know what they are doing” if they do this.

Anyway, we did figure out how to make this happen, and once we’ve refined it a bit we’ll open source our Selenium RSpec code as a plugin if people are interested. Note that our Selenium specs use a different spec_helper.rb than the rest of our normal specs, so we’re keeping the ability to chain examples out of our standard specs, as conventional wisdom would recommend. Let me know what you think.
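
For readers who want to see the idea of chaining in isolation, here is a toy runner in plain Ruby. It is not RSpec, and it is not the code described above; the ChainedScript class, its step method, and the hash standing in for the browser page are all hypothetical. It only illustrates the behavior being asked for: steps run in the order declared, share one piece of state, and each keeps its own short description. Skipping the remaining steps after a failure is an assumption of this sketch.

```ruby
# Toy runner: NOT RSpec and NOT the plugin mentioned in the post.
class ChainedScript
  Step = Struct.new(:description, :body)

  def initialize
    @steps = []
  end

  # Register a step; steps run in the order they are declared.
  def step(description, &body)
    @steps << Step.new(description, body)
  end

  # Run every step against one shared context object. After the first
  # failure the remaining steps are skipped, since they depend on it.
  def run(context = {})
    failed = false
    @steps.map do |s|
      next [s.description, :skipped] if failed
      begin
        s.body.call(context)
        [s.description, :passed]
      rescue StandardError => e
        failed = true
        [s.description, :failed, e.message]
      end
    end
  end
end

script = ChainedScript.new
script.step("should first be required to select car model") do |page|
  raise "wrong page" unless page[:title] == "Select a model"
  page[:title] = "Select Spiffy color" # stands in for clicking "next"
end
script.step("should then be required to select paint color") do |page|
  raise "wrong page" unless page[:title] == "Select Spiffy color"
end

results = script.run({ title: "Select a model" })
results.each { |description, status| puts "#{status}: #{description}" }
```

The second step only passes because the first one left the shared state on the color page, which is exactly the coupling that conventional wisdom warns about and that a long Selenium script has anyway.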

Developer testing is not primarily about design

July 27th, 2007

With the advent a few years ago of test-driven development (TDD, a.k.a. “test-first” development), people started parroting the phrase “TDD isn’t about testing, it’s about design” like a mantra. As is often the case when a phrase is repeated too many times, eventually (for some people anyway) this seems to have come to mean “(developer) testing isn’t about testing, it’s about design”, i.e. the testing part of developer testing isn’t that important.

This is wrong. One of the key practices that enables agile software development is the production and maintenance of a robust, thorough set of automated tests written and run often by developers. This set of tests allows developers to make frequent and significant code changes without causing unintended bugs in related code. This practice fundamentally enables software teams to “embrace change”.

It is true that writing code test-first adds another dimension to the benefits of the practice of automated testing, by focusing the developer on the design/interface of their code before they write the actual code. Thus, it is true that TDD, considered as an enhancement to plain old automated testing, is primarily an enhancement that addresses the issue of design. Nevertheless, the automated testing part of TDD is still very much about testing, testing that is a very important part of an agile team’s toolbox.

Corollary: The primary criterion for evaluation of any approach to automated developer testing should be its ability to provide this robust, thorough test harness that protects against regression.

Keeping Rails migrations happy

May 9th, 2007

Two quick things we’ve learned about migrations at CDD:

  • Avoid using your model objects in your migrations, e.g. stuff like Group.create!(:name => "Watson Lab"). The problem with this is that later you might add a required field to your model, and then this migration will throw an exception. Occasionally you need some logic from a model in a migration, but if at all possible I’d suggest exposing that logic in a way that doesn’t require creating or loading model objects in your migration itself. The migration should just know about SQL, nothing else.
  • Say you branch your code base for a release, and you anticipate needing to support that branch for any length of time. Sometimes you’ll need to address an issue in the production code that requires another migration. What we’ve found works best with ActiveRecord migrations is:
    1. In the trunk, delete all existing migrations when you branch.
    2. Dump a version of the branch schema, and make that migration #1 (001_production_branch_schema.rb) in your trunk.
    3. Start your next trunk migration several numbers higher than your last migration on the production release branch. So, if your last migration on the branch was 40, start at 40+N, where N gives you enough cushion to accommodate any additional migrations needed for the branch until your next release.
    4. Any time you add another migration to the branch, in the trunk replace 001_production_branch_schema.rb with a new dump of your branch schema.

Kind of a hack, but it works better than anything else we’ve come up with. My former colleague, Rhett Sutphin, took a different approach to this problem when he wrote a Java/Groovy port of migrations called bering (to which I minimally contributed in its early stages). In bering, migrations are specific to a particular release. Each release is numbered and gets its own separate migration directory, and migrations start at one again for each new release.
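
Going back to the trunk numbering in the steps above, the arithmetic can be sketched in a few lines of Ruby. Both helper methods, the cushion of 10, and the add_color_to_cars migration name are hypothetical and only there for illustration; nothing here is part of Rails.

```ruby
# Hypothetical helpers illustrating the numbering scheme; not part of Rails.

# First migration number to use on the trunk after branching: the last
# branch migration plus a cushion N reserved for branch-only migrations.
def first_trunk_migration_number(last_branch_migration, cushion)
  last_branch_migration + cushion
end

# Numbered migration file name with a zero-padded three-digit prefix,
# matching 001_production_branch_schema.rb above.
def migration_filename(number, name)
  format("%03d_%s.rb", number, name)
end

last_branch_migration = 40
cushion = 10 # N: assumed room for migrations added to the branch later

# Migration #1 on the trunk is the dumped branch schema.
puts migration_filename(1, "production_branch_schema")

# The next real trunk migration starts at 40 + N.
next_number = first_trunk_migration_number(last_branch_migration, cushion)
puts migration_filename(next_number, "add_color_to_cars")
```

With these numbers the branch can still add migrations 41 through 49 without colliding with anything on the trunk.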

BDD: Forces and the “given X, when Y, then Z” pattern

May 7th, 2007

Thinking more about the issue mentioned in my previous post, I’ve come up with a possible set of forces that push you in one direction or another (that is, toward organizing your specifications around method behavior vs. organizing around object state behavior, or vice versa):

  1. Clearly, if you’re specifying procedural code (Rails helpers, many class-level methods), there’s no state, and you should organize your specifications around methods.
  2. If your object has more attributes/more states and fewer methods/less business logic, then it’s probably clearer to organize your specifications around the behavior of the methods. If, however, the object has fewer attributes and more methods/business logic, then it is clearer to organize around the various object states, each with a number of specifications of the behavior of each method in that state. This is probably somewhat debatable, but it might be true. Perhaps there’s a better characterization of this force. Definitely less code if you follow this approach, and less code (expressing the same meaning) is usually clearer. It also seems to me that fewer contexts are usually easier to understand, because there is less information to parse.
  3. If your objects have states that are expensive to set up (e.g. lots of mock expectations), then it’s better to organize specifications around those states. However, this might also be a code smell suggesting that your classes should be more loosely coupled.

Another way to think about this is in terms of how each spec reads. Dan North in his Introducing BDD article talks about formulating specifications using the language pattern: “Given X, when Y, then Z”, so for example, “Given an assay criterion with nothing specified, when asked for its corresponding SQL conditions, it should return nil.” There are several ways you could express this in RSpec (leaving out the before block and the body of the spec). Organized around the object state, it would be:

    describe "Given an assay criterion with nothing specified" do
      it "should return nil when asked for its corresponding SQL conditions (:to_conditions)" do
        ...
      end
    end

or, organized around the method:

    describe "When an AssayCriterion is asked for the corresponding SQL conditions (:to_conditions)" do
      it "should return nil if nothing was specified" do
        ...
      end
    end

or Dustin’s hybrid approach:

    describe "Given an assay criterion with nothing specified, when asked for the corresponding SQL conditions (:to_conditions)" do
      it "should return nil" do
        ...
      end
    end

I’m not sure that tells you which one is better, but there it is.

Also, I found a couple more good links about BDD. I’ll list them here, along with other useful material on BDD in general that I’ve found; please let me know about any others: