If you’ve been bitten by the testing bug, you’ve surely encountered the problem of testing a database-intensive application. The problem this presents isn’t specific to SQL databases, nor is it just a database problem. Any data-driven application can be hard to test, regardless of how that data is stored and retrieved.
The problem is that in order to test your code, you need data that at least passably resembles data that the app would work with in reality. With a complex schema, that can be a lot of data spread out across many tables. I often find that trying to test each class in isolation becomes very difficult, since the data is not confined to one class.
For example, the app I’m working on now is a wiki. I’m trying to test the Page class, but that involves interactions with many tables. Pages have revisions, they have links to other pages, to files, and to not-yet-created pages. Pages also belong to a wiki, and are created by a user. To test page creation, I need to already have a wiki to add the page to, and a user to create the page.
There are a various solutions to this problem, all of which suck in different ways.
You can try mocking out the database entirely. I’ve used DBD::Mock for this, but I’ve never been happy with it. DBD::Mock has one of the most difficult to use APIs I’ve ever encountered. Also, DBD::Mock doesn’t really solve the fundamental problem. I still have to seed all the related data for a page. I’d even go so far as to say that DBD::Mock makes things worse. Because inserts don’t actually go anywhere, I have to re-seed the mock handle for each test of a SELECT, and since a single method may make multiple SELECT calls, I have to work out in advance what each method will select and seed all the data in the right order!
My experience with DBD::Mock has largely been that the test code becomes so complex and fragile that maintaining it becomes a huge hassle. The test files become so full of setup and seeding that the actual tests are lost.
I wrote Fey::ORM::Mock to help deal with this, but it only goes so far. It partially solves the problem with DBD::Mock’s API, but I still have to manage the data seeding, and that is still fragile and complicated.
The other option is to just use a real DBMS in your tests. This has the advantage of actually working like the application. It also helps expose bugs in my schema definition, and lets me test triggers, foreign keys, and so on. This approach has several huge downsides, though. I have to manage (re-)creating the schema each time the tests run, and it will be much harder for others to run my tests on their systems. Also, running the tests can be rather slow.
For the app I’m working on I’ve decided to mostly go the real DBMS route. At least this way the tests will be very useful to me, and anyone else seriously hacking on the application. I can isolate the minimal data seeding in a helper module, and the test files themselves will be largely free of cruft. Making it easier to write tests also means that I’ll write more of them. When I was using DBD::Mock, I found myself avoiding testing simply because it was such a hassle!
Some people might want to point out fixtures as a solution. I know about those, and that’s basically what I’m using now, except that there’s only one fixture for now, a minimally populated database. And of course, fixtures still don’t fix the problems that come with the tests needing to talk to a real DBMS.
I am going to make sure that tests which don’t hit the database at all can be run without connecting to a DBMS. That way, at least a subset of the tests can be run everywhere.
Are there any better solutions? I often feel like programming involves spending an inordinate amount of time solving non-essential problems. Where’s my silver bullet?