Friday, July 24, 2009

Domain Model Testing

To me, measuring coverage is not about the coverage measured (regression, certainty); it's about what's left uncovered (risk, agility) and why I should care (cost, agility). I'd prefer to have 100% coverage and be confident that it's due to ninja coders writing ninja code. In real life I only ask that model coverage be in line with the project's overall coverage, starting with the most complex methods. Maximize coverage and minimize cost, a.k.a. the simplest (smallest) code that could possibly work.

There is always debate over testing getters and setters, or even over measuring their coverage at all or including it in reports. Measuring them is seen as unfair and as damaging to one's stats. I think that misses the point.

So here's what I do and why.

There's a lot to a domain model. Herein I'm referring to persistent models, i.e., Hibernate-mapped classes (or similar, depending on the platform). If you've worked with Hibernate applications in the real world then you have likely seen it load the entire database, cascade deletes you didn't expect, and throw the occasional exception complaining about possibly unsafe thread access. These have nothing to do with getters and setters, though in exercising your model you will eventually exercise nearly all such methods.

So what are some common issues that persistent models face? Some that I've seen include:
  • Forgetting metadata (@Enumerated !!)
  • Omitting constraints - either on the model or -- more importantly -- in the database
  • Inappropriate eagerness and laziness
  • Unnecessary relationships, particularly bidirectional ones
  • Missing or buggy equals and hashCode methods
  • Impact of broken transactional schemes, particularly as you make changes later
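
To make the equals/hashCode item concrete, here is a minimal sketch in plain Java, with no Hibernate involved. Everything in it is an assumption made for illustration: the IdKeyedSurvey and NameKeyedSurvey classes are hypothetical, and setting the id by hand stands in for the ORM assigning a generated key at persist time.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeSketch {
    /** Hypothetical entity whose identity is its generated id (the buggy variant). */
    static class IdKeyedSurvey {
        Long id;             // null until "persisted"; stands in for a generated key
        final String name;

        IdKeyedSurvey(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof IdKeyedSurvey && Objects.equals(id, ((IdKeyedSurvey) o).id);
        }
        @Override public int hashCode() { return Objects.hashCode(id); }
    }

    /** Hypothetical entity whose identity is a business key that never changes. */
    static class NameKeyedSurvey {
        Long id;
        final String name;

        NameKeyedSurvey(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof NameKeyedSurvey && name.equals(((NameKeyedSurvey) o).name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    /** Adds the entity to a set, then simulates the id being assigned on persist. */
    static boolean idKeyedStillFound() {
        Set<IdKeyedSurvey> set = new HashSet<>();
        IdKeyedSurvey survey = new IdKeyedSurvey("My Test Survey");
        set.add(survey);     // hashed while id is still null
        survey.id = 42L;     // simulated persist: the hash code changes
        return set.contains(survey);
    }

    /** Same steps, but the hash code does not depend on the id. */
    static boolean nameKeyedStillFound() {
        Set<NameKeyedSurvey> set = new HashSet<>();
        NameKeyedSurvey survey = new NameKeyedSurvey("My Test Survey");
        set.add(survey);
        survey.id = 42L;
        return set.contains(survey);
    }

    public static void main(String[] args) {
        System.out.println("id-keyed found after persist:   " + idKeyedStillFound());
        System.out.println("name-keyed found after persist: " + nameKeyedStillFound());
    }
}
```

The id-keyed entity goes missing from its own set once the id is assigned, which is the kind of bug that only shows up when a test actually persists and reloads the model.
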
Some of the waste that I see when reviewing code involves creating test data specific to a test. Not only is this redundant RY-ism (Repeating Yourself, the opposite of DRY), it also markedly increases the cost of changing a model and obscures the wider impact of those changes.

Instead, over time I have come to use a pattern that I find very useful and inexpensive.
  1. Create a test scenario data service
  2. Maintain a domain model test
  3. Pervasively use scenarios in tests

Test Scenario Data Service

Except for very simple instances I always create data using a test data service. Something like the following:

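import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.util.Assert;

// Survey, Account, TextQuestion and TestDataService are the project's own types.
// getMBUN() and persist() are helpers that are not shown here.
@Service
@Transactional
public class TestDataServiceImpl implements TestDataService
{
    /** {@inheritDoc} Thorough, accurate. */
    public Survey createScenario1(Account owner) {
        // Persist by default
        return createScenario1(owner, true);
    }

    /** {@inheritDoc} Thorough, accurate. */
    public Survey createScenario1(Account owner, boolean persist) {
        Assert.notNull(owner, "owner must not be null");
        Survey survey = new Survey();
        survey.setOwner(owner);
        survey.setName("My Test Survey #" + getMBUN());
        if (persist) {
            persist(survey);
        }

        return survey;
    }

    /** {@inheritDoc} Thorough, accurate. */
    public Survey createScenario2(Account owner, boolean persist) {
        Assert.notNull(owner, "owner must not be null");
        // Build on scenario 1 without persisting it prematurely
        Survey retval = createScenario1(owner, false);
        retval.addQuestion(new TextQuestion(retval, "Please enter your email address:"));
        if (persist) {
            persist(retval);
        }

        return retval;
    }
}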

This class is then wired into pretty close to 100% of my test classes. As you can see, the createScenarioX methods often have a variant with a "boolean persist" flag, true by default. The flag lets scenarios build on each other without prematurely persisting information, and it covers those times when a unit test is in order. Consistency in the scenario model maintains velocity in testing.
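
As a rough illustration of how the persist flag behaves, here is a self-contained sketch in plain Java. It is not the real service: the Account and Survey classes are simplified stand-ins, the persisted list is an assumption that replaces the database, and the counter is a hypothetical substitute for getMBUN().

```java
import java.util.ArrayList;
import java.util.List;

class ScenarioCompositionSketch {
    static class Account {
        final String email;
        Account(String email) { this.email = email; }
    }

    static class Survey {
        Account owner;
        String name;
        final List<String> questions = new ArrayList<>();
    }

    static class InMemoryTestDataService {
        final List<Survey> persisted = new ArrayList<>(); // stand-in for the database
        private int counter = 0;                           // stand-in for getMBUN()

        Survey createScenario1(Account owner, boolean persist) {
            if (owner == null) throw new IllegalArgumentException("owner must not be null");
            Survey survey = new Survey();
            survey.owner = owner;
            survey.name = "My Test Survey #" + (++counter);
            if (persist) persisted.add(survey);
            return survey;
        }

        Survey createScenario2(Account owner, boolean persist) {
            // Build on scenario 1 without persisting the half-built survey
            Survey survey = createScenario1(owner, false);
            survey.questions.add("Please enter your email address:");
            if (persist) persisted.add(survey);
            return survey;
        }
    }

    public static void main(String[] args) {
        InMemoryTestDataService data = new InMemoryTestDataService();
        Account owner = new Account("owner@example.com");

        // Unit-test style: take the scenario, customize it, never touch storage
        Survey draft = data.createScenario2(owner, false);
        draft.name = "Customized for one unit test";
        System.out.println("persisted after unit-style use: " + data.persisted.size());

        // Integration style: the finished scenario is stored exactly once
        data.createScenario2(owner, true);
        System.out.println("persisted after integration-style use: " + data.persisted.size());
    }
}
```

Because createScenario2 asks createScenario1 not to persist, the survey is stored once, fully built, rather than once per layer of the scenario.
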


Maintain a Domain Model Test

In each of my projects you can typically find a class called DomainModelTest. It contains at least one test for each scenario in my TestDataService. These tests invoke the createScenarioX() method, flush and clear the Hibernate session, then reload the root object and walk it, comparing values and references along the way. Although hard coded, these tests not only exercise accessor methods (metrics for management), they also:
  1. document the domain model
  2. document variants, use cases and negative tests
  3. alert you to damage done to the test database e.g., by corporate DBAs or by inappropriate or incompetent resources, typically after you're off the project
  4. fail (often fantastically) due to performance issues when the suite runs against a copy of production - therefore, run it against a copy of production from time to time.
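
The shape of such a test can be sketched without a database. In the plain-Java sketch below, FakeSession is a hypothetical stand-in: its persist() and load() store and return copies, which imitates what flushing and clearing the Hibernate session and then reloading the root would give you. The Survey and Question classes are simplified assumptions, not the real model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DomainModelTestSketch {
    static class Question {
        final Survey survey;   // back-reference to the owning survey
        final String text;
        Question(Survey survey, String text) { this.survey = survey; this.text = text; }
    }

    static class Survey {
        Long id;
        String name;
        final List<Question> questions = new ArrayList<>();
    }

    /** Stand-in for flush + clear + reload: stores and returns detached copies. */
    static class FakeSession {
        private final Map<Long, Survey> rows = new HashMap<>();
        private long nextId = 1;

        long persist(Survey survey) {
            survey.id = nextId++;
            rows.put(survey.id, copy(survey));
            return survey.id;
        }

        Survey load(long id) {
            Survey row = rows.get(id);
            return row == null ? null : copy(row);
        }

        private static Survey copy(Survey source) {
            Survey target = new Survey();
            target.id = source.id;
            target.name = source.name;
            for (Question q : source.questions) {
                target.questions.add(new Question(target, q.text));
            }
            return target;
        }
    }

    static Survey createScenario2(FakeSession session) {
        Survey survey = new Survey();
        survey.name = "My Test Survey";
        survey.questions.add(new Question(survey, "Please enter your email address:"));
        session.persist(survey);
        return survey;
    }

    /** Reloads the root and walks it; throws AssertionError on the first mismatch. */
    static void testScenario2(FakeSession session) {
        Survey created = createScenario2(session);
        Survey reloaded = session.load(created.id);

        check(reloaded != null, "survey should reload");
        check(reloaded != created, "reload should return a fresh instance");
        check("My Test Survey".equals(reloaded.name), "name should round-trip");
        check(reloaded.questions.size() == 1, "exactly one question expected");
        Question question = reloaded.questions.get(0);
        check("Please enter your email address:".equals(question.text), "question text should round-trip");
        check(question.survey == reloaded, "back-reference should point at the reloaded root");
    }

    static void check(boolean ok, String message) {
        if (!ok) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        testScenario2(new FakeSession());
        System.out.println("scenario 2 round-trips");
    }
}
```

In a real project the copy-based FakeSession goes away and the same walk runs against the actual session, so that missing metadata, bad cascades and lazy-loading surprises have a chance to fail the test.
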

Pervasively Use Scenarios in Tests

Any test that I write that involves data being loaded, changed, deleted, queried or otherwise interacting with a model will always use the test service and usually in a persistent way. No mocks here, thanks.

The trick (and goal) is to use the scenarios you created earlier pervasively throughout your entire test suite. Not only does this reduce the cost of scenario creation, it reduces the cognitive load of inspecting, changing and maintaining your test suite because all references to createScenario3() do the same thing, cost only 1 line of code, and set the expectation for the context of the rest of the test in question. This in turn enables estimating the impact of changes to your model in a more deterministic manner.


Lessons Learned

The big lesson that I've learned applying this pattern is that at first you will want to create the most complete scenario possible. Resist that urge and heed my warning: each time you change or extend the model, Scenario #1 gets updated, and so do the corresponding domain model test and most other tests in the suite. All green, all good, at first. But after a while (say, 500 tests later) you have a test suite that takes on the order of 45 seconds to run. Come on, I don't have all minute!

Here are some recommendations:
  • Don't be afraid of multiple scenarios - basicScenario1, complexScenario2, basicScenario3WithBranchingLogic, etc. Start small and expand, and consider having a "maximum scenario" which implements every possible model component. Use this maximum scenario often in the domain model test, and whenever you want to generate a largish dataset for performance testing
  • Start small - Scenario #1 should be the minimal dataset required to be persisted without errors
  • boolean persist - being able to construct a scenario without persisting it means you can customize a particular scenario or write a unit test. This is not always possible, particularly if you have any logic in the backend - document that clearly in the javadoc
  • Have a @Test createReallyBigDataSetSanityTest() and make sure you flush(). It should create n scenarios, of differing types, where n is something that fits in a reasonable timeout. You will catch missing indexes, Hibernate loading too much data and various other sundry issues
  • Iterating a test implementation against multiple scenarios is a powerful technique; consider having a List<Type> getAllScenarios() method or multiple ones if your various scenarios create trees with different roots. It really sucks to have test failures dependent on backend data because users will inevitably create all possible scenarios in production
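
The getAllScenarios() idea from the last recommendation can be sketched as follows. This is plain Java under stated assumptions: the Survey class is a simplified stand-in, summarize() is a hypothetical piece of code under test, and the second question text is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AllScenariosSketch {
    static class Survey {
        final String name;
        final List<String> questions = new ArrayList<>();
        Survey(String name) { this.name = name; }
    }

    static Survey basicScenario1() {
        return new Survey("basic");
    }

    static Survey complexScenario2() {
        Survey survey = new Survey("complex");
        survey.questions.add("Please enter your email address:");
        survey.questions.add("How did you hear about us?"); // made-up second question
        return survey;
    }

    /** One place that lists every scenario sharing the same root type. */
    static List<Survey> getAllScenarios() {
        return Arrays.asList(basicScenario1(), complexScenario2());
    }

    /** Hypothetical code under test: a one-line summary of a survey. */
    static String summarize(Survey survey) {
        return survey.name + " (" + survey.questions.size() + " questions)";
    }

    /** Runs the same assertions against every scenario; returns how many were checked. */
    static int testSummarizeAgainstAllScenarios() {
        int checked = 0;
        for (Survey survey : getAllScenarios()) {
            String summary = summarize(survey);
            if (!summary.startsWith(survey.name)) {
                throw new AssertionError("summary should start with the name: " + summary);
            }
            if (!summary.contains(String.valueOf(survey.questions.size()))) {
                throw new AssertionError("summary should mention the question count: " + summary);
            }
            checked++;
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println("scenarios checked: " + testSummarizeAgainstAllScenarios());
    }
}
```

Adding a new scenario to getAllScenarios() then extends every test written this way, without touching the tests themselves.
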