The Real Dirt on Tidyall

Sorry, couldn’t resist the pun.

First, a bit of background. Tidyall (aka Code::TidyAll) was first released in June of 2012 by Jon Swartz. It’s a code quality meta-tool that orchestrates other code quality tools. With a single configuration file you can enable many different tools for a single project. By “code quality tools” I mean both pretty printers and linting tools. Each tool is supported via a plugin, implemented as a Perl class, that knows how to talk to that tool. In some cases the plugin is the tool, as with the SortLines plugin, which sorts lines in text files.

The first version supported perltidy, perlcritic, and podtidy, and as you can tell from that list it was very much a Perl-specific tool at first. But Jon added support for JavaScript tools like js-beautify a few months after the initial release, in September of 2012. Over time, it’s grown to include plugins for PHP, Go, YAML, Postgres SQL, and many more. Just search MetaCPAN for Code::TidyAll::Plugin to see them.

Within the first few months after its release, it also acquired support for use as an SVN or Git pre-commit hook, which is a natural use for this type of tooling. Over time, I think a fair number of Perl people have used it, but I don't know if it's ever gotten much traction (if any) outside the Perl community. In November of 2014, Jon gave me primary ownership of the distribution, and I've maintained it ever since.

But it's starting to show its age. The primary issues come from its initial design and are not easy to fix. Note that this is not Jon's fault. Like most successful projects, it eventually attracted people who wanted to use it in ways the initial design doesn't support. Projects that never reach this point usually avoid it because nobody uses them, not because the creator was a visionary genius who foresaw all possible future uses.

The basic algorithm behind tidyall is as follows:

1. Find all the files that tidyall could apply to, based on each plugin's include/exclude rules as well as any global exclude rules.
2. Loop over each file, find all the plugins that apply to that file, and apply each plugin to it in turn. Specifically, tidyall reads in the source of the file, passes it to the plugin, and then writes that source back out to disk. Remember this detail; it's important!

That's it. There's not much to it conceptually.
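As a rough illustration, here is a minimal Python sketch of those two steps. It is not tidyall's actual code (tidyall is written in Perl). The Plugin class, its include/exclude globs, and the function names are hypothetical, and plugins are modeled as plain str-to-str functions.

```python
import fnmatch
from pathlib import Path


class Plugin:
    """Hypothetical in-memory plugin: include/exclude globs plus a str -> str transform."""

    def __init__(self, name, transform, include, exclude=()):
        self.name = name
        self.transform = transform
        self.include = include
        self.exclude = exclude

    def applies_to(self, path):
        path = str(path)
        included = any(fnmatch.fnmatch(path, pat) for pat in self.include)
        excluded = any(fnmatch.fnmatch(path, pat) for pat in self.exclude)
        return included and not excluded


def select_files(paths, plugins, global_excludes=()):
    """Step 1: keep paths that some plugin applies to and no global exclude matches."""
    return [
        p
        for p in paths
        if not any(fnmatch.fnmatch(str(p), pat) for pat in global_excludes)
        and any(plugin.applies_to(p) for plugin in plugins)
    ]


def process_file(path, plugins):
    """Step 2: read the source, pass it through each applicable plugin, write it back."""
    # Assuming UTF-8 here is an encoding guess the orchestrator is forced to make.
    source = Path(path).read_text(encoding="utf-8")
    for plugin in plugins:
        if plugin.applies_to(path):
            source = plugin.transform(source)
    Path(path).write_text(source, encoding="utf-8")


def tidy_all(paths, plugins, global_excludes=()):
    """Run step 1, then step 2 for each selected file."""
    for path in select_files(paths, plugins, global_excludes):
        process_file(path, plugins)
```

Every plugin in this model only ever sees one file's source at a time, which is the root of the problems described below.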

There are two issues that occur because of this design.

First, because it’s based on files, not directories or any sort of generic notion of paths, it has serious problems with Go. In Go, packages are directory-based. All of the files in a directory are part of that package, and when it comes to linting (as opposed to pretty printing), many linters must consider the whole directory at once.

This also causes issues for linters that want to look at multiple packages at once. For example, the Rust clippy tool runs across your entire crate, and a crate can contain many directories. Clippy doesn't even have a way to run on individual directories or files, because that doesn't make sense in Rust.

Relatedly, remember that in step #2 I said it reads the file source, passes it to the plugin, and then writes it back? Well, that also causes problems for some languages. If a plugin only works on files (versus in-memory source as a string), then tidyall will write the source out to a file in a temporary directory, then pass that temporary file name to the plugin. The plugin is expected to change the file it’s given, after which tidyall reads the source back into memory from the temp file and moves on to the next plugin.

Again, this causes issues with some languages, notably Go, which expects files to live in directories named after the package they are in, alongside the other files that make up the package.
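Here is a hedged Python sketch of that temp-file round trip for a plugin that can only work on files. The function name, and the idea of a plugin as a callable that rewrites a path in place, are assumptions for illustration; tidyall's real temp-file naming may differ. What it shows is that the plugin sees a file sitting alone in a fresh temporary directory, away from the rest of its package.

```python
import tempfile
from pathlib import Path


def run_file_based_plugin(source, filename, plugin):
    """Round-trip in-memory source through a hypothetical plugin that edits a file in place."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Only the base name is kept, so "pkg/util/a.go" becomes "<tmpdir>/a.go",
        # with none of the other files from its original directory next to it.
        tmp_path = Path(tmpdir) / Path(filename).name
        tmp_path.write_text(source, encoding="utf-8")
        plugin(tmp_path)  # the plugin is expected to modify tmp_path
        return tmp_path.read_text(encoding="utf-8")
```

A Go tool handed that path has no way to see the other files in `pkg/util`, even if the rest of the round trip works perfectly.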

This makes using tidyall for tools in some languages more or less impossible without a major redesign.

The other big issue with all of this reading and writing is that it's really hard to keep track of the file's encoding. Tidyall has no way of knowing whether the file it's reading is binary data, UTF-8 text, or something else. Combine this with Perl's sometimes idiosyncratic Unicode handling and you get all sorts of confusing issues.
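The underlying problem is not specific to Perl. As a small Python illustration (the helper name is made up), here is what happens when an orchestrator assumes UTF-8 but the file was actually saved as Latin-1: a strict decode fails outright, and a lenient one silently changes the bytes.

```python
def assume_utf8_round_trip(raw: bytes) -> bytes:
    """Hypothetical orchestrator step: decode as UTF-8, (tidy), re-encode as UTF-8."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


latin1 = "café".encode("latin-1")  # b"caf\xe9", which is not valid UTF-8

try:
    latin1.decode("utf-8")  # the strict default raises
except UnicodeDecodeError as err:
    print("strict decode failed:", err.reason)

# The é byte is replaced with U+FFFD, so writing this back would corrupt the file.
print(assume_utf8_round_trip(latin1))  # b'caf\xef\xbf\xbd'
```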

Fixing these issues basically involves throwing out the core of the system and starting from scratch. Once you do that (which I am, so far purely as an experiment), the fixes are pretty easy. Instead of operating on source in memory, we only operate on paths. And instead of implementing plugins as Perl classes, they're implemented via configuration telling the tool how to invoke an executable. We can leave string encoding handling to the individual tools, which are in the best position to do so.
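To make the path-only, configuration-driven idea concrete, here is a minimal Python sketch. The configuration keys (`cmd`, `invoke`) and the three invocation modes are hypothetical, not tidyall's or precious's real config format, and the commands are just plausible examples. Nothing is executed; the function only builds argument lists, and it never opens a file, so encoding never comes up.

```python
from pathlib import PurePosixPath

# Hypothetical plugin definitions: an executable plus how it wants to be invoked.
PERLTIDY = {"cmd": ["perltidy", "-b"], "invoke": "per-file"}
GOLANGCI = {"cmd": ["golangci-lint", "run"], "invoke": "per-dir"}
CLIPPY = {"cmd": ["cargo", "clippy"], "invoke": "once"}


def commands_for(plugin, paths):
    """Build argv lists for one configured executable; nothing is run here."""
    mode = plugin["invoke"]
    if mode == "per-file":
        return [plugin["cmd"] + [p] for p in sorted(paths)]
    if mode == "per-dir":
        # One call per directory, e.g. one Go package at a time.
        dirs = sorted({str(PurePosixPath(p).parent) for p in paths})
        return [plugin["cmd"] + [d] for d in dirs]
    if mode == "once":
        # Whole-project tools like clippy get no paths at all.
        return [list(plugin["cmd"])]
    raise ValueError(f"unknown invoke mode: {mode!r}")
```

Each resulting list could then be handed to `subprocess.run`. The per-directory and single-invocation modes map directly onto the Go and clippy cases that the file-based design struggles with.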

Along the way, I'm also making my rewrite loop over plugins before files. In other words, rather than taking each file one at a time and passing it to each plugin in sequence, it takes each plugin one at a time and passes it the appropriate set of paths. This isn't that big a deal, but I think it makes for slightly better output on failures.
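The two orderings can be compared with a small Python sketch. Plugins here are hypothetical `(name, predicate)` pairs, and each step records which plugin runs on which paths. With plugin-first ordering, each tool runs once with its whole batch, so its output comes out together rather than interleaved with other tools.

```python
from typing import Callable, List, Tuple

Plugin = Tuple[str, Callable[[str], bool]]
Step = Tuple[str, List[str]]


def file_first(files: List[str], plugins: List[Plugin]) -> List[Step]:
    """tidyall's order: the outer loop walks files, one step per applicable plugin."""
    return [(name, [f]) for f in files for name, matches in plugins if matches(f)]


def plugin_first(files: List[str], plugins: List[Plugin]) -> List[Step]:
    """The rewrite's order: one step per plugin, given every path it matches."""
    steps = []
    for name, matches in plugins:
        batch = [f for f in files if matches(f)]
        if batch:
            steps.append((name, batch))
    return steps
```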

But making any of these changes to tidyall, much less all of them, would break every existing plugin. In essence I’d be writing tidyall2. And if I’m going to do that, I’m going to use it as an excuse to learn Rust, which is exactly what I’m doing with precious.

In a follow-up post I'll compare and contrast tidyall, precious, and pre-commit. If you know of any other tools in this category (code quality meta-tools), I'd love to hear about them. Please add a comment with a link!

Comments

Dotan Dimet, on 2020-04-26 02:23, said:
I ran across a git hooks manager called lefthook, which seems to fit the same niche.

Dave Rolsky, on 2020-04-26 11:06, said:
Thanks for the pointer. The docs for lefthook also mention Husky, lint-staged, and Overcommit.