BDD: specifying domain objects de novo
The vanilla example used in most blog posts for BDD is some incarnation of de novo domain object specification, that is, specifying the behavior of a simple domain object from scratch. David Chelimsky’s stack example is a decent online example of this sort of situation (the comments are interesting to read as well). Stack is an independent class without any collaborators, and its behavior is not extensive. This results in a state-oriented set of contexts with specifications that are very readable and give you a good idea of what Stacks do.
Recently my colleague Kurt Schrader posted something in reaction to a discussion he and I had about a specification I had written. In his post he gives a simple example of de novo domain object specification, and asks (here I paraphrase) whether the example should be the method or the object in a particular state, i.e. should we describe the behavior of an object in a particular state (in his example, “A new sword”) or describe the behavior of the method Sword#sharp?. I would agree with him that in this case describing new and old swords is better than describing sharp?. However, the original specification that provoked the most recent iteration of this discussion was a bit more complex, so I’ll present it (correction: I’ll present a similar specification, see Note at the bottom) here.
Before I start complaining about situations where the canonical approach breaks down, I should note that I’m well aware that breakdown sometimes (often?) indicates that there’s something wrong with the code, i.e. breakdown pushes you to refactor. This is part of TDD/BDD’s good effect on the design of your application code. I’m open to that being the answer here.
I was writing code to support CDD’s data mining functions. Specifically, I was writing a class that could take a set of form attributes describing bioassay criteria and turn them into SQL conditions. For example, a customer might want to find all compounds in his CDD database that were found to have a percent inhibition (a measure of a molecule’s ability to inhibit the normal binding activity of an enzyme) greater than 80% when tested in a particular cruzain enzyme assay (cruzain is an important protease enzyme in the Chagas disease parasite). The customer could also be much less specific, for example just requesting all molecules ever screened in any assay of type “enzyme”. As a result, assay criteria have five attributes: assay type, assay id (if specified), readout definition id (what we call the concept of “percent inhibition” measured using this particular assay), readout value (the value measured in the assay corresponding to the readout definition), and finally a comparator (the “greater than” in the above example).
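To picture the two customer requests above as data, here is a minimal sketch. The attribute names follow the constructor call shown later in this post; the Struct name and the id values (12 and 34) are made up for illustration and are not taken from the real codebase.

```ruby
# Hypothetical value object holding the five attributes of an assay criterion.
AssayCriterionAttributes = Struct.new(
  :type, :id, :readout_definition_id, :comparator, :readout_value,
  keyword_init: true
)

# "Percent inhibition greater than 80% in one particular assay".
# The ids 12 and 34 are invented placeholders.
specific = AssayCriterionAttributes.new(
  type: "enzyme", id: 12, readout_definition_id: 34,
  comparator: ">", readout_value: 80
)

# "Anything ever screened in an assay of type enzyme": only the type is set,
# every other attribute stays nil.
broad = AssayCriterionAttributes.new(type: "enzyme")
```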
An assay criterion has a few different kinds of behavior, but I’ll focus on one: its ability to give you a set of SQL WHERE conditions that can be used to retrieve the molecules matching the criterion. Using the canonical approach, we would come up with specifications something like this:
An AssayCriterion with nothing specified
- should return nil conditions when sent :to_conditions
An AssayCriterion with only the assay type specified
- should return conditions that constrain the assay type when sent :to_conditions
An AssayCriterion with both assay type and assay id specified
- should return conditions that constrain the assay (ignoring assay type) when sent :to_conditions
An AssayCriterion with assay type, assay id and readout definition id specified
- should return conditions that constrain the readout definition (ignoring assay and assay type) when sent :to_conditions
An AssayCriterion with assay type, assay id, readout definition id, comparator and readout value specified
- should return conditions that constrain readout definition and readout value when sent :to_conditions, normalizing the specified value using the readout definition's unit
An AssayCriterion with everything specified and comparator "="
- should return conditions constraining the readout value to be equal to the specified value when sent :to_conditions
An AssayCriterion with everything specified and comparator "<"
- should return conditions constraining the readout value to be less than the specified value when sent :to_conditions
An AssayCriterion with everything specified and comparator ">"
- should return conditions constraining the readout value to be greater than the specified value when sent :to_conditions
An AssayCriterion with everything specified and comparator some unallowed string (say "something possibly malicious")
- should sanitize the comparator by using "=" by default when returning conditions when sent :to_conditions
An AssayCriterion with assay id specified but no assay type specified
- should ignore the missing assay type and still constrain the assay
An AssayCriterion with readout definition id specified but no assay type or assay specified
- should ignore the missing assay information and still constrain the readout definition
An AssayCriterion with assay and readout value specified but no readout definition specified
- should ignore the readout value and just constrain the assay
An AssayCriterion with assay and readout value specified but no comparator specified
- should ignore the readout value and just constrain the assay
Is the canonical approach better than the following?
AssayCriterion#to_conditions
- should be nil if nothing is specified
- should constrain assay when only assay type is specified
- should constrain assay when assay id is specified
- should constrain readout when readout definition id is specified
- should only constrain readout and use unit to normalize readout value when readout_value is specified
- should constrain the readout value to be equal to the specified value when the comparator is =
- should constrain the readout value to be less than the specified value when the comparator is <
- should constrain the readout value to be greater than the specified value when the comparator is >
- should sanitize comparator by returning equals if unrecognized
- should ignore missing assay type but still constrain the assay when assay id is specified
- should ignore missing assay type and assay id but still constrain the readout definition when readout definition id is specified
- should ignore readout value if readout definition id is not specified and just constrain the assay if specified
- should ignore readout value if comparator is not specified and just constrain the assay if specified
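To make the behaviors listed above concrete, here is a rough sketch of a to_conditions that would satisfy them. It is not the real CDD implementation. The table and column names, the Rails-style [sql, *binds] return value, and the normalize stand-in (which just converts to a Float rather than using a unit) are all assumptions made for illustration.

```ruby
# Hypothetical sketch of the class under discussion, not the real code.
class AssayCriterion
  ALLOWED_COMPARATORS = %w[= < >].freeze

  def initialize(attrs = {})
    @type                  = attrs[:type]
    @id                    = attrs[:id]
    @readout_definition_id = attrs[:readout_definition_id]
    @comparator            = attrs[:comparator]
    @readout_value         = attrs[:readout_value]
  end

  # Returns nil when nothing is specified, otherwise [sql_fragment, *binds]
  # built from the most specific attribute that was supplied.
  def to_conditions
    if present?(@readout_definition_id)
      readout_conditions
    elsif present?(@id)
      ["assays.id = ?", @id.to_i]            # assumed column name
    elsif present?(@type)
      ["assays.assay_type = ?", @type]       # assumed column name
    end
  end

  private

  # The readout value is only used when a readout definition, a value and a
  # comparator are all present; otherwise it is ignored.
  def readout_conditions
    sql   = "readouts.readout_definition_id = ?"
    binds = [@readout_definition_id.to_i]
    if present?(@readout_value) && present?(@comparator)
      sql += " AND readouts.value #{safe_comparator} ?"
      binds << normalize(@readout_value)
    end
    [sql, *binds]
  end

  # Anything outside the whitelist falls back to "=" so that user input is
  # never interpolated into the SQL fragment.
  def safe_comparator
    ALLOWED_COMPARATORS.include?(@comparator) ? @comparator : "="
  end

  # Stand-in for unit-based normalization, which is not shown in this post.
  def normalize(value)
    Float(value)
  end

  def present?(value)
    !value.nil? && !value.to_s.strip.empty?
  end
end

criterion = AssayCriterion.new(:type => "enzyme", :id => "7",
                               :readout_definition_id => "3",
                               :comparator => ">", :readout_value => "80")
conditions = criterion.to_conditions
```

Written this way, every behavior in the list corresponds to one branch of a single method, which is part of why the method-centric listing reads so compactly.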
In code, the canonical, object behavior-centric approach is 140 lines long, whereas the method behavior-centric approach is 81. The question is, which approach better fits the principles described in my earlier post? Which principles are more important than others? The real downfall of the object behavior approach is that the contexts in which the other behavior of an AssayCriterion is relevant don’t overlap well with the contexts relevant for producing SQL conditions. An AssayCriterion has two other behaviors: it can give you its assay id or its readout definition id as an integer (returning nil if the value is not an integer). The relevant contexts for those are assay_id/readout_definition_id being nil, blank, an integer as String, and an integer as Fixnum. So the to_conditions contexts above are doomed to only ever have one spec in each of them, which is verbose and, in my opinion, leads to a less readable set of specs for the AssayCriterion class.
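For reference, the id-as-integer behavior could look something like the following sketch. The helper name integer_or_nil is hypothetical, and the real class may do this differently. On current Rubies, Fixnum has been folded into Integer, so the sketch checks for Integer.

```ruby
# Hypothetical helper: returns the value as an Integer, or nil when the value
# is nil, blank, or not a whole number.
def integer_or_nil(value)
  return value if value.is_a?(Integer)

  text = value.to_s.strip
  text.match?(/\A-?\d+\z/) ? text.to_i : nil
end

# The four contexts mentioned above, plus one non-numeric string.
results = [nil, "", "42", 42, "abc"].map { |v| integer_or_nil(v) }
```

Note that none of these four contexts lines up with the "which attributes are specified" contexts used for to_conditions, which is the overlap problem described above.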
The downside of the method behavior approach is that each spec carries its own short setup, for example:
it "should be nil if nothing is specified" do
  criterion = AssayCriterion.new(:type => "", :id => "", :readout_definition_id => "", :comparator => "", :readout_value => "")
  criterion.to_conditions.should be_nil
end
This violates the principle that an example should be written by setting up state in the before block and testing behavior in each spec (#5 in my prior post).
I hope the way I’ve framed the discussion makes it clear that I think it’s less an issue of “outside-in” vs. “inside-out” and more an issue of whether focusing on method behavior or object state behavior makes it clearer to the developer how the class is supposed to behave. I’d appreciate any comments from anyone out there who has done any serious specifying.
Update: I updated the specifications in both approaches above to include four edge cases that were missing (the “should ignores”: the AssayCriterion should “do its best” if given an inconsistent set of attributes).
Note: The above example is actually not the one that originally led to the discussion that produced Kurt’s post. Instead, it was another method (blank?) on the same class. However, when I wrote this up I discovered that the blank? method was only actually used once in the codebase, in another test to make that test a little clearer, so I decided to remove blank?. I felt that a method that wasn’t really useful would make a bad example, but my decision to change the method under discussion does confuse the issue somewhat. I’m hoping that this example is similar enough that the same issues apply.
Update #2: Oops. I discovered while coding today that the blank? method was actually used by real code, so I could have just used it as the example in this post. I’m not sure the example is that fundamentally different, but in case someone thinks it is, I’ve posted my original version of that part of this post on a separate WordPress page so you can compare, if you want.