I’ve been on vacation for the past week, and I decided to take a look at using Test2 to reimplement the core of Test::Class::Moose.
Test::Class::Moose (TCM) lets you write tests in the form of Moose classes. Your classes are constructed and run by the TCM test runner. For each class, we construct instances of the class and then run the test_* methods provided by each instance. We run the class itself in a subtest, as well as each method. This leads to a lot of nested subtests (I’ll tell you why this matters later). Here’s an example TCM class:
Currently, TCM is implemented on top of the existing Perl test stack, meaning Test::Builder and friends.
The fundamental problem with the existing test stack is that it is not abstract enough. The test stack is all about producing TAP (the Test Anything Protocol). This is the text-based format (mostly line-oriented) that you see when you run prove -v. It’s possible to produce other types of output or capture the test output to examine it, but it’s not nearly as easy as it should be.
TAP is great for end users. It’s easy to read, and when tests fail it’s usually easy to see what happened. But it’s not so great for machines. The line-oriented protocol isn’t great for things like expressing a complex data structure, and the output format simply doesn’t allow you to express certain distinctions (diagnostics versus error messages, for example). Even worse is how the current TAP ecosystem handles subtests, which can be summarized as “it doesn’t handle subtests at all”. Here’s an example program:
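A minimal sketch of such a program, assuming Test::More and assuming the bogus lines are printed by hand inside the subtest so that they bypass Test::Builder. Apart from “this gets weird”, the test names are made up for illustration.

```perl
use strict;
use warnings;
use Test::More;

# Flush hand-written lines immediately so they interleave with the
# output that Test::Builder produces.
$| = 1;

ok 1, 'a normal top-level test';

subtest 'this gets weird' => sub {
    # Emitted through Test::Builder, so it is counted.
    ok 1, 'a real test inside the subtest';

    # Hand-written, indented TAP (an assumption for this sketch).
    # Test::Builder never sees these lines, so they are not counted.
    print "    not ok 2 - printed by hand\n";
    print "    ok 3 - also printed by hand\n";
};

done_testing;
```

Under these assumptions Test::Builder counts only one test inside the subtest, so the top-level summary line is ok even though the indented text contains a failure and more test lines than the subtest’s plan allows.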
If we run this via prove -v we get this:
What happened there? Well, the TAP ecosystem more or less ignores the contents of a subtest. Any line starting with a space is treated as “unknown text”. What Test::Builder does is keep track of the subtest’s pass/fail status in order to print a test result at the next level up the stack summarizing the subtest. That’s the ok 2 - this gets weird line up above. Because the harness is not actually parsing the contents of the subtest, it doesn’t see that the test count is wrong or that some tests have failed.
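To see the harness’s view of this kind of output, the sketch below feeds a hand-written TAP string to TAP::Parser (the parser behind prove) and tallies the type it assigns to each line. The TAP text is made up for illustration.

```perl
use strict;
use warnings;
use TAP::Parser;

# Illustrative TAP: the indented subtest contains a failure and a plan
# that does not match, but its summary line claims success.
my $tap = <<'END_TAP';
ok 1 - a normal top-level test
    ok 1 - inside the subtest
    not ok 2 - a failure inside the subtest
    1..5
ok 2 - this gets weird
1..2
END_TAP

my $parser = TAP::Parser->new( { tap => $tap } );

# Walk the stream one line at a time, recording how each line is classified.
my %count;
while ( my $result = $parser->next ) {
    $count{ $result->type }++;
    printf "%-8s %s\n", $result->type, $result->raw;
}

printf "unknown lines: %d, failed tests: %d\n",
    $count{unknown} // 0, scalar $parser->failed;
```

The indented lines come back with the type unknown, so neither the nested failure nor the mismatched nested plan is counted against the run.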
In practice, this won’t affect most code. As long as all your tests are emitted via Test::Builder you’re good to go. It does make life much harder for tools that want to actually look at the contents of subtests, in particular tools that want to emit a non-TAP format.
The core test stack tooling around concurrency is also fairly primitive. The test harness supports concurrency at the process level. It can fork off multiple test processes, track their TAP output separately, and generate a summary of the results. However, you cannot easily fork from inside a test process and emit concurrent TAP.
This concurrency issue really bit Test::Class::Moose. Unlike traditional Perl test suites, with TCM you normally run all of your tests starting from a single whatever.t file. That file contains just a few lines of code to create a TCM runner. The runner loads all of your test classes and executes them. Here’s an example:
Ovid is a smart guy. He realized that once you have enough test classes, you’d really want to be able to run them concurrently. So he wrote TAP::Stream. This module lets you combine multiple streams of subtest-level TAP into a single top-level TAP stream.
This is completely and utterly insane! This is not Ovid’s fault. He was doing the best he could with the tools that existed. But it’s terribly fragile, and it’s way more work than it should be. It also made it incredibly difficult to provide feature parity between the parallel and sequential TCM test execution code. The parallel code has always been a bit broken, and there was a lot of nearly duplicated code between the two execution paths.
Enter Test2, which is Chad Granum’s (Exodist’s) project to implement a proper event-level abstraction on top of all the test infrastructure. With Test2, our fundamental layer is a stream of events, not TAP. An event is a test result, a diagnostic, a subtest, etc. Subtests are proper first-class events which can in turn contain other events.
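A minimal sketch of what that event stream looks like, assuming a Test::More recent enough to run on top of Test2 and using intercept from Test2::API to capture events instead of printing them. The test names and the dump_events helper are made up for illustration.

```perl
use strict;
use warnings;
use Test::More;
use Test2::API qw(intercept);

# Run some assertions, capturing their events rather than emitting TAP.
my $events = intercept {
    ok 1, 'a passing test';
    subtest 'outer subtest' => sub {
        ok 1, 'nested pass';
        ok 0, 'nested failure';
    };
};

# Recursively print each event's class and name (when it has one).
# A subtest event carries its own list of child events.
sub dump_events {
    my ( $list, $depth ) = @_;
    for my $event (@$list) {
        my $name = $event->can('name') ? $event->name // '' : '';
        printf "%s%s %s\n", '  ' x $depth, ref $event, $name;
        dump_events( $event->subevents, $depth + 1 )
            if $event->isa('Test2::Event::Subtest');
    }
}

dump_events( $events, 0 );
```

Nothing here parses text. The nested failure is an object inside the subtest event, so a tool can inspect it directly.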
Working at this level makes writing TCM much easier. There’s still some trickiness involved in starting a subtest in one process but executing its contents in another, but the amount of duplicated code is greatly reduced, and it’s much easier to achieve feature parity between the parallel and sequential paths.
As a huge, huge bonus, testing tools built on top of Test2 is a pleasure instead of a chore. The sad truth about TCM is that it was never as well tested as it should have been. The tools for testing with Test::Builder are primitive at best, and because subtests are ignored by TAP, the testing tools were nearly useless for TCM.
With Test2, we can capture and examine the event stream of a test run in incredible detail. This lets me write very detailed tests for the behavior of TCM in all sorts of success and failure scenarios, which is fantastically useful. Here’s a snippet of what this looks like:
The test_events_is sub is a helper I wrote using the Test2 tools. All it does is add some useful diagnostic output if the event stream from running TCM contains Test2::Event::Exception events. And the diagnostics from Test2 are simply beautiful:
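As a rough sketch of the idea behind such a helper (not the actual test_events_is), the code below scans a captured event stream for Test2::Event::Exception events and surfaces their error text as diagnostics. The names simulate_crash and exception_errors are hypothetical, and the exception is injected by hand with send_event rather than produced by TCM.

```perl
use strict;
use warnings;
use Test::More;
use Test2::API qw(intercept context);

# Hypothetical stand-in for code that reports a caught exception as an event.
sub simulate_crash {
    my ($message) = @_;
    my $ctx = context();
    $ctx->send_event( 'Exception', error => $message );
    $ctx->release;
    return;
}

# Hypothetical helper: pull the error text out of every exception event.
sub exception_errors {
    my ($events) = @_;
    return map { $_->error }
        grep { $_->isa('Test2::Event::Exception') } @$events;
}

my $events = intercept {
    ok 1, 'runs before the crash';
    simulate_crash("simulated failure in test_something\n");
};

# Surface any captured exceptions before checking the rest of the stream.
my @errors = exception_errors($events);
diag "captured exception: $_" for @errors;

is scalar(@errors), 1, 'one exception event was captured';
done_testing;
```

Printing the exception text first means that when an event comparison fails because the code under test died, the cause is visible straight away.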
It’s a lot to read, but it’s incredibly detailed and makes understanding why a test failed much easier than with the current test stack.
Chad is currently working on finishing up Test2 and making sure that it’s stable and backwards-compatible enough to replace the existing test stack. Once Test::Builder and friends are all running on top of Test2, it will be much easier to write new test tools that integrate with this infrastructure.
The future of testing in Perl 5 is looking bright! And Perl 6 isn’t being left behind. I’ve been working on a similar project in Perl 6 with the current placeholder name of Test::Stream. This is a little easier than the Perl 5 effort since there’s no large body of test tools with which I need to ensure backwards compatibility. I want Perl 6 to have the same excellent level of test infrastructure that Perl 5 is going to be enjoying soon.
Timm Murray, on 2016-02-29 06:48, said:
There was a proposal on the old testanything.org web site that would have handled forking streams:
Not the prettiest output, but it should work.