This was a great conference, and the organizers did a great job. This is my first visit to the EU, and so far I've had a great time.

Over the last day or so, I've had some interesting conversations with people about how we can improve our conferences, and I wanted to write down some notes before I forget these ideas. Apologies in advance for rambling and incoherence. It's 1:30am here in Pisa and I'm beat.

  • Shortening the auction - at YAPC::EU this year the auction was done as a competition between three teams, UK, US, and EU. Each team had 4 lots to sell, and we competed to see who could raise the most money. Smylers made the excellent point that a competition incentivizes each team to take a long time on each lot to maximize the price. One way around this might be to simply impose a very hard time limit on each lot (2-3 minutes?). Another might be a softer time limit (5 minutes), but to measure the winner based on dollars/euros raised per minute used.

  • The high-value/interest auction items should be announced well in advance, so people make sure to reserve money for items that interest them. There would still be room for a surprise item or two, of course.

  • Raising prices - I think YAPC is too cheap. We've been at the 100 dollar/euro price point for quite some time. Raising prices just 20% would raise an additional $4000-6000 at a YAPC::NA, which is more than the auction raises. We could still do an auction focused on a very few entertaining items (maybe 3-7 items), just for fun.

  • The YAPC::EU schedule started at 10am, which was fantastic. We need to stop trying to pack so much stuff into the conference.

We discussed a number of ideas for improving the social aspect of the conference. We all agreed that the social aspect of the conference is as valuable as (or more valuable than) the technical aspect. Ultimately, I think that people who have a good social experience will feel like the conference was a good value for them.

I suggested seating plans for the sit-down dinner. However, upon further discussion, we seemed to reach the conclusion that a buffet-style dinner with less seating might encourage more mingling. I think YAPC::NA 2008 in Chicago was a great example of this. The dinner was in a big game room in the student center, so people ate, drank, bowled, played Wii, rocked out with Guitar Hero, and generally ended up mingling, rather than just sitting at a table.

Smylers suggested another way to encourage people to approach more people would be a sort of "human scavenger hunt". Instead of silent auctioning off tons of books for very little money, we could offer them as prizes. The hunt would ask people to do things like ...

  • Find a first-time YAPC attendee
  • Find two people from Europe (or UK, or US, as appropriate)
  • Find a person with 10+ modules on CPAN
  • Find an author of a Perl book (and a non-Perl book)
  • Find someone who has attended at least 5 YAPCs and workshops
  • Find someone who learned Perl within the last two years

Goals like these would do a good job of encouraging newbies and experienced attendees to interact.

Another idea I had to encourage mingling would be some sort of "speed dating" event. This would have to be broken up into smaller groups, since you can't really have 150 people in one speed dating event. Maybe we could encourage groups of 30 or so to split off and do this. Maybe this could be scheduled as one of the sessions. We could even do this as a plenary session, and split people up based on something arbitrary (value of ACT user_id % 7).

If you had a YAPC idea you're afraid you'll forget, please leave a comment here!

New Moose Blog

The Moose Cabal now has our own blog. We plan to use this as a source of news about Moose development and usage, so add it to your feed reader if you're interested.

At Compassionate Action for Animals, we explicitly do not promote veganism using arguments about human health. We are happy to talk about how to be a healthy vegan, but we don't try to convince people to go vegan for their own health.

Some people find this odd. Isn't veganism obviously the healthiest diet? Why wouldn't we use such a powerful argument? Shouldn't we make the best case we can for veganism?

I came across a blog post titled "The China Study: Fact or Fallacy?" that reminded me so well why we don't engage in this argument.

Go ahead, take a moment to read (or at least skim) that blog post.

Are you back? Great.

The China Study was big news in the animal rights world when the book first came out. I haven't read it, but from what I've heard it basically says "go (mostly?) vegan". Wow, a whole book backed by lots of data telling people that veganism is the way to go! How exciting!

That blog post is a perfect illustration of why this isn't exciting. The blog post contains 9,000 words of statistical analysis, complete with tables, charts, and more. In the end, the author of the post concludes that The China Study is extremely flawed.

Is she right? Who the f*ck knows?

And that's the real problem. It is incredibly difficult for someone without expertise to assess claims about health. How do I know if the blog post author has any credibility? For that matter, how do I know if T. Colin Campbell (author of The China Study) has any credibility? I am not a biologist, epidemiologist, statistician, or dietitian. That blog post sure has a lot of numbers and charts, though! I bet The China Study has some too.

It's trivial to find health arguments for dozens of radically different diets (vegan, Atkins, paleo, raw, and more). If I, as an animal rights activist, start making claims about human health, why should anyone listen to me? There are lots of people with better credentials ready to disagree with me. I can cite sources, but so can others. Without a lot of independent research, it's very difficult for a layperson to figure out the truth, and that assumes there is one truth to figure out. Scientific research is full of contradictions, especially in a field as complex as diet and human health.

Health arguments are a distraction from the real key issue, animal suffering. Animal suffering in factory farms is undeniable and easily proved. It doesn't take a Ph.D. to understand that being crammed in a tiny cage unable to move is torture. Few people in the general public will argue the opposite. An argument based on animal suffering appeals to the fundamental empathy all of us possess, and doesn't require statistics or studies to support it.

I realized that the migrations I wrote were very buggy. Now I've written a test system to help me test future migrations, but the existing releases are problematic.

I can create a set of schema changes to fix up a schema which has been migrated, but the changes will have to be applied manually.

Note that if you're comfortable wiping your existing schema because you're just playing with Silki then this is a non-issue.

Please email me if you are using Silki.

There's been a lot of discussion about the role of TPF lately, both at YAPC and on blogs. The most recent discussion is in the comments of a recent blog post by Gabor Szabo asking people to weigh in on what TPF should be doing.

In the comments, Casey West says:

It's a striking sign that The Perl Foundation is expected to pay for open source contributors

...

Right now TPF is using money to demotivate the Perl Community! It's killing the Perl [sic].

This is a bold and, in my opinion, incorrect statement.

Casey is no doubt referring to the well-known research suggesting that payment reduces performance by replacing intrinsic motivation with extrinsic motivation. Let's assume that this research is true for the sake of this blog post.

Does it necessarily follow that TPF grants reduce motivation? I don't think so. There are a number of ways grants can help people get more work done. In fact, I think there are several ways that grants can boost intrinsic motivation.

Public Promises

When a grant is approved, the recipient is promising to do something with the community's money. I can't speak for others, but I know that when my grant was approved, I had made a promise to the Perl community to follow through.

My experience with volunteers suggests that people are more likely to follow through when they make a firm commitment to someone. My understanding is that this is also backed up by modern psychological research.

I think this is one reason why regular grant reports are crucial to the grant process. This follow up makes it clear that the community is paying attention to the grant recipient.

The public nature of the grants should motivate the grant recipient. If the recipient doesn't find this motivational, I don't think they should be getting a grant in the first place!

Validation of Competence

Getting a grant can be an external validation of one's self-worth. I know that I felt good about the fact that my grant proposal got a lot of public support, and was eventually approved. Effectively, the Perl community agreed that my skills were worth $3,000 of their money.

I can't speak for others, but this sort of ego boost is definitely motivational for me.

Resume Building

A successfully completed grant is a nice bit of resume building. How many developers out there have been paid by their peers to work on a project? I make a point of mentioning the Moose docs grant in my bio, and I would hope that this helps sell my Moose class.

Money = Time

One big obstacle to getting stuff done is lack of time. This is one area where a grant can help, by effectively allowing a person to take unpaid leave from a job, or a sabbatical from self-employment. In practice, most TPF grants don't do this. The grants program limits grant requests to $3,000, which doesn't compensate for much time off, at least for people living in a large chunk of the world.

David Mitchell's grants are a good example of a grant that aims to provide time. His current grant pays for 500 hours of his time at $50/hour. This is probably a lot less than he could earn freelancing, but is definitely enough to allow him to live comfortably while working on the grant.

It's hard for me to see how a grant like this could be de-motivating. In this case, the grant isn't about the money per se, it's about freeing up time that would otherwise have to be spent on paying work.

Forcing Me to Plan

While not directly connected to motivation, I found that the grant proposal process was very useful because it forced me to think about my project. My grant proposal was my project plan after the grant was approved, and it gave me a lot of direction for working on the Moose docs.

I imagine that other grant recipients also benefited from going through a planning process. I'm not sure I would have done as much thinking if I'd written the docs without having to write a proposal first.

Summary

In my final grant report for the Moose docs grant, I wrote:

I'd like to thank the Perl Foundation again for sponsoring this work. The grant was motivational for me, because this was a huge amount of work. I might have done some of it over time, but I doubt I would have done all or done it nearly as quickly without the grant.

There are probably other ways that grants affect recipients. I'd love to hear from other grant recipients and/or submitters, either in the comments or on their own blogs.

In a comment on my entry about Dist::Zilla pros and cons, Phred says:

I'm not clear on the value Dist::Zilla provides other than some versioning auto-incrementing and syntactic sugar for testing.

This brings up a good question. What the heck does dzil do?

Let's walk through a dist.ini file from a real project. I'll use the dist.ini from my Markdent distribution. This should answer the "what does it do" question quite well.

Here's the whole file:

name    = Markdent
author  = Dave Rolsky <autarch@urth.org>
license = Perl_5
copyright_holder = Dave Rolsky
copyright_year   = 2010

version = 0.13

[@Basic]
[InstallGuide]
[MetaJSON]

[MetaResources]
bugtracker.web    = http://rt.cpan.org/NoAuth/Bugs.html?Dist=Markdent
bugtracker.mailto = bug-markdent@rt.cpan.org
repository.url    = http://hg.urth.org/hg/Markdent
repository.web    = http://hg.urth.org/hg/Markdent
repository.type   = hg

[PodWeaver]

[KwaliteeTests]
[NoTabsTests]
[EOLTests]
[Signature]

[CheckChangeLog]

[Prereq]
Digest::SHA1                   = 0
HTML::Stream                   = 0
List::AllUtils                 = 0
Moose                          = 0.92
MooseX::Params::Validate       = 0.12
MooseX::Role::Parameterized    = 0
MooseX::SemiAffordanceAccessor = 0.05
MooseX::StrictConstructor      = 0.08
MooseX::Types                  = 0.20
namespace::autoclean           = 0.09
Tree::Simple                   = 0
Try::Tiny                      = 0

[Prereq / TestRequires]
File::Slurp                          = 0
Test::Deep                           = 0
Test::Differences                    = 0
Test::Exception                      = 0
Test::More                           = 0.88
Tree::Simple::Visitor::ToNestedArray = 0

[@Mercurial]

That's a mouthful. Let's step through it in tiny chunks ...

name    = Markdent
author  = Dave Rolsky <autarch@urth.org>

Setting these does several things. First, these values will end up in the generated Makefile.PL for the distro. Second, these values are available for plugins which do POD munging, which we'll look at shortly. In particular, the author will end up in every module's POD.

license = Perl_5

The license setting is used for several things. First, the License plugin will use it to add a LICENSE file to the distro. Second, it is also available to POD mungers.

copyright_holder = Dave Rolsky
copyright_year   = 2010

This is another bit for the POD mungers. Together with the license, we'll end up with this POD section in each module:

=head1 COPYRIGHT AND LICENSE

This software is copyright (c) 2010 by Dave Rolsky.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

version = 0.13

Again, this ends up in both my Makefile.PL and my POD.

[@Basic]

This is a plugin bundle, which is a name for a pre-defined set of plugins. The Basic bundle contains:

[GatherDir]
[PruneCruft]
[ManifestSkip]
[MetaYAML]
[License]
[Readme]
[ExtraTests]
[ExecDir]
[ShareDir]
[MakeMaker]
[Manifest]
[TestRelease]
[ConfirmRelease]
[UploadToCPAN]

Whoa, that's a lot. So what do these do?

[GatherDir]

This tells dzil that it should include all the files in the current directory (the root of my distro) in the generated distro. I have to include this, or I won't end up with a distro at all!

[PruneCruft]

This prunes the gathered files to remove generated files like a Build file, files that start with a dot (.), etc.

[ManifestSkip]

This prunes the gathered files based on the contents of a MANIFEST.SKIP file.

[MetaYAML]

This generates a META.yml file for the distro (using version 1.4 of the CPAN Meta format).

[License]

This plugin generates a LICENSE file, based on the value I set for the license earlier.

[Readme]

This one generates a fairly minimal README. Arguably, it's so minimal it's useless. It could probably be improved ;)

[ExtraTests]

This looks for tests under my working copy's xt directory. This directory can contain subdirectories for three different types of "extra" tests: smoke tests, author tests, and release tests. Each of these directories has its tests rewritten so that they only run under specific circumstances (based on environment variables). The tests are rewritten into the t directory.

Typically, I only use xt/release. The tests in the release directory are run when $ENV{RELEASE_TESTING} is true. The dzil release command makes sure this is true, so my release tests are run before I do a release, but not when the module is installed from CPAN. This is perfect for things like POD tests.
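To make the environment-variable gating concrete, here is a minimal sketch of a test that only does real work when RELEASE_TESTING is set. It is not the code that ExtraTests actually writes out; the helper name release_testing_enabled is made up for this example, and the "check" is just a placeholder.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper (not part of Dist::Zilla): true when release-only
# tests should run for the given environment hash.
sub release_testing_enabled {
    my ($env) = @_;
    return $env->{RELEASE_TESTING} ? 1 : 0;
}

SKIP: {
    skip 'release tests only run when RELEASE_TESTING is set', 1
        unless release_testing_enabled( \%ENV );

    # A real release test would check POD syntax, spelling, and so on.
    pass('placeholder for a release-only check');
}

done_testing();
```

Running this with RELEASE_TESTING=1 in the environment executes the placeholder check; without it, the check is reported as skipped.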

[ExecDir]

This plugin arranges for a directory's contents to be installed as executables. Well, actually, it just marks the files as executables, and another plugin does something useful with them. By default, it looks for a directory named bin.

[ShareDir]

Just like ExecDir but for "share" files (non-executable content like templates, images, etc).

[MakeMaker]

This generates Makefile.PL for the distro. This plugin is pretty smart, and generates a file with lots of conditionals so that it does the best job it can for the version of ExtUtils::MakeMaker that is available on the installing user's machine. If you've ever written this sort of conditional crap you know how annoying it is to maintain. Now I don't have to deal with this. As a bonus, future versions of dzil will account for new versions of EUMM, and I'll get a better Makefile.PL for free.

This plugin makes use of the information provided by the ExecDir and ShareDir plugins we saw earlier. It arranges to have these files installed in the right place via ExtUtils::MakeMaker and File::ShareDir.

There is also a ModuleBuild plugin, but dzil really makes the difference between the two minimal. Unless I want to integrate a custom Module::Build subclass, as I did with Silki, there isn't much difference between EUMM and MB for a project which uses dzil.

[Manifest]

This plugin creates the MANIFEST.

[TestRelease]

This runs the tests when I run dzil release.

[ConfirmRelease]

This prompts me to ask if I'm really sure I want to upload a distro when I run dzil release.

[UploadToCPAN]

I bet you can figure out what this does.

[InstallGuide]

This generates a nice INSTALL file. This plugin is smart. It generates the right instructions regardless of whether the distro is using EUMM or Module::Build.

[MetaJSON]

This generates a META.json file for the distro (using version 2.0 of the CPAN Meta format).

[MetaResources]
bugtracker.web    = http://rt.cpan.org/NoAuth/Bugs.html?Dist=Markdent
bugtracker.mailto = bug-markdent@rt.cpan.org
repository.url    = http://hg.urth.org/hg/Markdent
repository.web    = http://hg.urth.org/hg/Markdent
repository.type   = hg

This adds a "resources" section to my META.* files. There are some plugins on CPAN which will automate this. For the repository settings, the plugin looks at your working copy to figure out your VCS and remote VCS uris. I might switch over to these plugins in the future, although I think I'd actually have to add Mercurial support first.

[PodWeaver]

I mentioned "POD mungers" several times. Pod::Weaver is a POD rewriting module which does all sorts of fancy stuff, though I'm just using a subset of its default behavior.

First, it looks in my module files for a comment in the form:

# ABSTRACT: Some text here

It uses this to generate the "NAME" section in the POD.

It also inserts "VERSION", "AUTHOR", and "COPYRIGHT AND LICENSE" sections. Pod::Weaver also lets you do even fancier stuff, like use POD dialects, add custom sections, etc. I'll be investigating this further in the future. Really, this module deserves its own blog entry or three.

[KwaliteeTests]
[NoTabsTests]
[EOLTests]

These add some release tests for various sanity checks. I never need to customize these tests, so I can let the plugins write them out for me.

[Signature]

This signs the distro using Module::Signature.

[CheckChangeLog]

This checks my Changes file to ensure that I have an entry for the version mentioned in my dist.ini. It could be smarter and check for a date as well. I'm sure patches are welcome ;)

[Prereq]
...

This should be obvious. It lists the prerequisites for my distro. There is also an AutoPrereq module. I don't use this because it generates a lot of prereqs I think are cruft, like core modules, or multiple modules in the same distro.

[Prereq / TestRequires]
...

Again, this is pretty obvious.

[@Mercurial]

Another plugin bundle. I wrote some plugins to automate some release tasks for a Mercurial-using project.

When I run dzil release, it will check to make sure that my repository is in a clean state (no changes that haven't yet been checked in). After the release is uploaded, it tags my working copy and then pushes the changes back to the remote.

Summary

At a high level, dzil does a couple different tasks.

It ensures that support files like the MANIFEST and LICENSE stay up to date. It also helps improve compatibility by generating a "smart" Makefile.PL. Basically, it takes distribution metadata and generates all the support files I need. Of course, both EUMM and Module::Build already did that, but dzil takes this several steps further.

The pod munging is similar. It includes standard POD boilerplate that should be in all my modules, but can be annoying to maintain.

It also helps me include various "sanity tests". Since the plugin writes them out anew each time I build the distro, I don't have to worry about keeping them up to date with changes to the testing modules, I just have to update the plugin.

Besides automating support, dzil also helps automate the actual release process. It adds some sanity checks like checking the changelog and the working copy state, and after the release it automates tagging and pushing.

Whereas I previously had to maintain various support files and update them as the toolchain changed, I can now update my plugins and get the updated support files "for free" in every distro I maintain. Overall, the number of steps that go into a release has been hugely reduced, and the possibility of error is much lower. That means it's easier to make a new release, and the release quality is higher. Faster and better!

At last count, I maintain (for some value of maintain) 66 distros, so anything I can do to reduce busy work is very welcome!

I've been playing with Dist::Zilla lately, and while I like it, I've also realized there are some perhaps not-so-obvious cons to using it. There are also some obvious cons, and some obvious pros.

In talking about cons, there are really two categories. Some of the cons are essential to the design of dzil. Others are non-essential, and can easily be fixed in the future, given a sufficient supply of round tuits. Obviously, the essential cons are most important.

Let's get the non-essential ones out of the way. The obvious one is that the docs are pretty minimal right now. I found that to really get what I wanted, I had to mix together cargo-culting and source diving. I still don't understand how the heck I can make use of Pod::Weaver.

A closely related problem is that while there are lots of dzil plugins, they too are mostly poorly documented, and they're also insufficiently flexible. A good example is the Dist::Zilla::Plugin::PodSpellingTests plugin. Spell checking your pod is great, and I'd love to automate it as much as I can. However, if you're doing spell checking you must include a custom dictionary that includes things like your name.

This plugin adds a wordlist that the author created in the form of a CPAN module. That's not very useful when the wordlist module doesn't include a word you want to whitelist. There's no way to provide an alternate module. Of course, the real problem is that this is a terrible interface. I don't want to release a new distro every time I add a word to my wordlist. The right way to do this is to look for a .pod-spelling file in the distro root.
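As a rough sketch of the .pod-spelling idea, here is a small reader for such a file. The file format (one word per line, with blank lines and "#" comments ignored) and the function name read_stopwords are assumptions made for this example; no plugin currently defines them. The resulting list could then be handed to something like Test::Spelling's add_stopwords in a hand-written spelling test.

```perl
use strict;
use warnings;

# Read custom stopwords from a per-distro file. The format is an assumption
# of this sketch: one word per line, blank lines and "#" comments ignored.
sub read_stopwords {
    my ($file) = @_;

    # No wordlist file simply means no custom words.
    return () unless -e $file;

    open my $fh, '<', $file or die "Cannot read $file: $!";
    my @words;
    while ( my $line = <$fh> ) {
        $line =~ s/#.*//;           # strip comments
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @words, $line if length $line;
    }
    close $fh;

    return @words;
}

my @stopwords = read_stopwords('.pod-spelling');
print scalar(@stopwords), " custom stopwords loaded\n";
```

Because the wordlist lives in the distro root, adding a word is a one-line change to the working copy rather than a new release of a wordlist module.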

Ultimately, I skipped this plugin and created a POD spelling test "by hand".

Let's not pick on Marcel too much. My own dzil Mercurial plugin is pretty minimal too. It works for me, but may not satisfy anyone else.

Also, dzil is slow. It uses Moose for a CLI app, which is a known-slow combination. Someone should improve Moose startup speed ;)

But as I said, these are non-essential problems, and all entirely fixable.

So what can't be fixed?

Ultimately, using dzil to its utmost means creating a sharp divide between the source repository and released code. Dzil is in part a big ol' pre-processor. It does things like add a $VERSION to each module, add boilerplate to the POD, generate a LICENSE file, etc.

Of course, Perl module authors are already accustomed to this. I'm sure that most authors don't check their META.yml files into source control and edit them by hand. Instead, they're updated as part of the release process. Dzil just takes this several steps further.

However, some of these steps can be particularly problematic. If you allow dzil to add the $VERSION line, that means that when you use the distro's modules directly from the lib directory, they have no version. This can be a problem if you're trying to test some other module against the source repo, and that other module has a minimum version requirement.
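Here is a small illustration of that failure mode, using two made-up packages: one with no $VERSION (as it would sit in the source repo) and one with the version that a build would insert. The package names and the meets_minimum helper are hypothetical. The check itself uses the standard Class->VERSION($min) method, which is what "use Class $min" relies on.

```perl
use strict;
use warnings;

# Hypothetical module as it sits in the source repo: no $VERSION line yet.
package My::Unversioned;
sub new { my ($class) = @_; return bless {}, $class }

# The same kind of module after a build has added the version.
package My::Versioned;
our $VERSION = '0.13';
sub new { my ($class) = @_; return bless {}, $class }

package main;

# True if $class satisfies the minimum version, false otherwise.
# Class->VERSION($min) dies when the requirement is not met.
sub meets_minimum {
    my ( $class, $min ) = @_;
    return eval { $class->VERSION($min); 1 } ? 1 : 0;
}

for my $class (qw( My::Unversioned My::Versioned )) {
    printf "%s %s the 0.10 minimum\n", $class,
        meets_minimum( $class, '0.10' ) ? 'meets' : 'fails';
}
```

The unversioned package fails the check even though its code is identical, which is exactly what happens when another module with a minimum version requirement is tested against a lib directory that has no $VERSION.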

Similarly, when you run tests with prove, you're testing something that isn't quite what gets released. Don't worry too much; when you dzil release, it runs the tests against the post-processed code, so you're not likely to incur bugs this way.

You can choose to not use the $VERSION-inserting plugin, and maintain the $VERSION manually, and dzil still has lots of other useful features. Nonetheless, this sort of issue is likely to crop up with other plugins.

So what are the pros? Ultimately, it makes maintaining modules easier. The less non-essential work I have to do in order to make a new release, the better. Also, some of the plugins do things to ensure that my releases are not broken, like checking for an update to Changes that matches the current module version, or ensuring that I have pod syntax tests as part of the release.

For someone like me, who has dozens of modules on CPAN, these time savings really add up.

Overall, I'm pretty happy with dzil, and I consider the eliminated drudgery a win, despite the hassles. I'm hoping that this entry will give people a better idea of what they're getting into if they explore Dist::Zilla.

I also look forward to rjbs finally finishing the much-discussed configuration system overhaul so he can finally write some damn docs ;)

I released the first version of Text::TOC, so now we can revisit my earlier design in light of an actual implementation.

From a high level, what's released is pretty similar to what I thought I would release. Here's what I said the high level process looked like:

  • Process one or more documents for "interesting" nodes.
  • Assemble all the nodes into a table of contents.
  • Annotate the source documents with anchors as needed.
  • Produce a complete table of contents in the specified output format.

This is more or less exactly what the released code does.

However, I was also wrong in some cases. I said that "adding anchors and generating a table of contents will also be roles". In fact, this became one role, Text::TOC::Role::OutputHandler.

The output handler is responsible for iterating over the nodes that were deemed interesting. It adds anchors to the source document via the nodes themselves, which are assumed to somehow connect back to the source document. In the HTML case, I'm using HTML::DOM, so given any node in the document, I can alter the source document in place.

At the same time, as it iterates over the nodes, the output handler generates a table of contents.

I still might go back and split these responsibilities up, but for now I wanted to get something released rather than futzing around to find the perfect architecture. Even if I do split them up, the OutputHandler abstraction is useful. In the future an OutputHandler could just delegate to an AnchorInserter and TOCBuilder.
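To show what that delegation might look like, here is a plain-Perl sketch. It is not the Text::TOC code: the My::* class names, the method names, and the hash-based nodes are all invented for this example, and the real implementation uses roles and DOM nodes. It also skips HTML escaping to stay short.

```perl
use strict;
use warnings;

# Hypothetical class: assigns an anchor to each node. A real inserter
# would also modify the source document.
package My::AnchorInserter;
sub new { my ($class) = @_; return bless { seen => 0 }, $class }
sub insert_anchor {
    my ( $self, $node ) = @_;
    $node->{anchor} = 'toc-' . ++$self->{seen};
    return $node->{anchor};
}

# Hypothetical class: collects entries and renders an unordered list.
package My::TOCBuilder;
sub new { my ($class) = @_; return bless { items => [] }, $class }
sub add_entry {
    my ( $self, $node ) = @_;
    push @{ $self->{items} },
        qq{<li><a href="#$node->{anchor}">$node->{text}</a></li>};
    return;
}
sub as_html {
    my ($self) = @_;
    return join "\n", '<ul>', @{ $self->{items} }, '</ul>';
}

# The output handler keeps its single entry point but delegates each job.
package My::OutputHandler;
sub new {
    my ($class) = @_;
    return bless {
        inserter => My::AnchorInserter->new,
        builder  => My::TOCBuilder->new,
    }, $class;
}
sub process_nodes {
    my ( $self, @nodes ) = @_;
    for my $node (@nodes) {
        $self->{inserter}->insert_anchor($node);
        $self->{builder}->add_entry($node);
    }
    return $self->{builder}->as_html;
}

package main;

my @nodes = ( { text => 'Introduction' }, { text => 'Usage' } );
print My::OutputHandler->new->process_nodes(@nodes), "\n";
```

Callers would keep talking to the output handler, so splitting the responsibilities later would not have to change the public API.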

I got some other parts right too. I said ...

Different types of source documents will produce different types of nodes. For an HTML document, the node contents will probably be a DOM fragment representing the content of a given tag.

That is exactly how the released code works.

I also said that finding "interesting" nodes would be a role. It is, and in the HTML implementation there are sane defaults for single- and multi-document tables of contents.

I planned to have an API for managing the formatting of the TOC, but I punted on that for now. Your current choices are unordered or ordered lists. This is good enough for my needs, and therefore good enough for a first release.

Finally, the shortcut API I proposed was a bit off. I eventually realized that the key decision is whether we're making a single- versus multi-document table of contents. That decision determines what is the sane default for a node filter, and a sane link generation strategy. In the multi-doc case you'll always have to provide your own link generator, since I can't know your URI space.

I also punted entirely on embedding the table of contents in the output document. You can do that yourself for now.

The code is on CPAN or in my Mercurial repo, so feel free to take a closer look. I hope this will be of use to others as well. I don't know if there will ever be interest in working with non-HTML documents, but even as it is I think it's more useful than the other HTML TOC tools that previously existed on CPAN.

A while ago, I wrote an entry on the idea of breaking problems down as a strategy for building good tools.

Today, I started writing a new module, Text::TOC. The goal is to create a tool for generating a table of contents from one or more documents. I'm going to write up my initial design thoughts as a "how-to" on problem break down.

First, a little background. I've already looked at some relevant modules on CPAN. Both HTML::Toc and HTML::GenToc have awkward and/or insufficiently powerful APIs. Their internals are also nothing to write home about, so I ruled out patching them. At a certain point, I just can't stomach wading through a bad design, even if that might get me to my goal quicker.

I started this project wanting to generate a table of contents for an HTML document, but I quickly realized that with a little extra work, I could make a table of contents tool that worked for different document formats. A table of contents is a pretty generic concept, so there's no reason not to generalize it.

The ultimate product will also include a shortcut module to facilitate extremely common cases for HTML documents.

Producing a set of low-level components, and then tying them together in convenience modules makes for very good tools. With this approach, if I can build one convenience module, I can build five. Just as importantly, it will also be possible to handle more complicated cases. I believe in following the Perl spirit of making simple tasks simple, and complicated tasks possible. Too many CPAN modules solve one specific problem case at the expense of locking the code into a single-use API.

Roles Rock

I started by thinking about the process that goes into generating a table of contents:

  • Process one or more documents for "interesting" nodes.
  • Assemble all the nodes into a table of contents.
  • Annotate the source documents with anchors as needed.
  • Produce a complete table of contents in the specified output format.

This is all very generic. What kind of nodes? What makes a node interesting? What do anchors look like? What does the table of contents look like for a given format?

This project will make extensive use of roles in its API, and this list of steps gives me a good idea of what those roles will be. I'll create a role for nodes. There will also be roles for input and output handling. Anything that does input processing will also do input filtering to find "interesting" nodes. This filtering is also a role.

Finally, adding anchors and generating a table of contents will also be roles.

You'll notice that I haven't talked about anchor names. For now, I'm going to hardcode an algorithm to generate these based on combining the anchor's display text with a unique id. There's no need to solve every problem up front. Patches will always be welcome.
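One possible shape for such a hardcoded algorithm is sketched below. The exact rules (lowercasing, collapsing non-alphanumeric runs to hyphens, appending a counter) are my assumptions for illustration, and the function name anchor_name is hypothetical.

```perl
use strict;
use warnings;

# Counter that makes every generated name unique within one run.
my $next_id = 0;

# Build an anchor name from the display text plus a unique id.
# The slug rules here are assumptions of this sketch.
sub anchor_name {
    my ($text) = @_;

    my $slug = lc $text;
    $slug =~ s/[^a-z0-9]+/-/g;    # collapse anything else into hyphens
    $slug =~ s/^-+|-+$//g;        # trim leading and trailing hyphens
    $slug = 'anchor' unless length $slug;

    return $slug . '-' . $next_id++;
}

print anchor_name('What is a Table of Contents?'), "\n";
print anchor_name('Roles Rock'), "\n";
```

The unique id means two headings with the same text still get distinct anchors, while the text portion keeps the names readable.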

What is a Table of Contents?

For this project, I'm going to represent the table of contents as a list of nodes. Each node will consist of a type ("h2", "h3", "image"), a link, and the node's contents.

Different types of source documents will produce different types of nodes. For an HTML document, the node contents will probably be a DOM fragment representing the content of a given tag.

This is a very minimal representation. I want to avoid encoding things like a "level" in the node list itself. Instead, I'll defer decisions on how to handle this to the output generation stage. This will make it easier to produce different table styles. Of course, there will be a default which handles common node types (heading) in a sane way.
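Here is one way the output stage could derive nesting from the node type, so that the node list never has to carry a level. The %depth_for mapping and the render_outline function are hypothetical names for this sketch, and it emits indented plain text rather than any real output format.

```perl
use strict;
use warnings;

# Hypothetical default mapping from node type to nesting depth. It lives in
# the output code, so the node list itself carries no "level".
my %depth_for = ( h2 => 0, h3 => 1, h4 => 2 );

sub render_outline {
    my @nodes = @_;

    my $out = '';
    for my $node (@nodes) {
        # Types without an entry (an image, say) fall back to the top level.
        my $depth = $depth_for{ $node->{type} } // 0;
        $out .= ( '  ' x $depth ) . $node->{contents} . "\n";
    }

    return $out;
}

print render_outline(
    { type => 'h2', contents => 'Input Handling' },
    { type => 'h3', contents => 'Custom Filters' },
);
```

A different table style would only need a different mapping or a different renderer. The node list stays the same.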

Input Handling

Concrete input handlers will take a document in a given format and find the interesting nodes in that document. As I mentioned earlier, finding "interesting" nodes will be a role. However, since this is something that people will often want to tweak, I want to make sure that providing a custom filter is as easy as possible.

Instead of requiring that people instantiate a concrete class which implements the filtering role, I will define a type coercion from a code reference to an object. Callers of the module can provide a simple code reference as a filter:

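sub {
    my $node = shift;

    # Let authors opt a node out of the table with a "skip-toc" class.
    return 0 if $node->className() =~ /\bskip-toc\b/;

    # Keep second- through fifth-level headings and images.
    return 1 if $node->tagName() =~ /^H[2-5]$/ || $node->tagName() eq 'IMG';

    return 0;
}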

Internally, we'll take the code reference and wrap it with an object which implements the filter role's API.
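The wrapping could work roughly like the sketch below. It uses plain Perl rather than a real role or type-coercion system, and the class name My::TOC::Filter::Code, the method node_is_interesting, and the coerce_filter function are all placeholders invented for the example. To keep it self-contained, the "node" here is just a tag name string.

```perl
use strict;
use warnings;

# Hypothetical wrapper class that gives a code reference a filter object's API.
package My::TOC::Filter::Code {
    sub new {
        my ( $class, $code ) = @_;
        die 'A filter must be a code reference' unless ref $code eq 'CODE';
        return bless { code => $code }, $class;
    }

    # The single method a filter object is assumed to provide.
    sub node_is_interesting {
        my ( $self, $node ) = @_;
        return $self->{code}->($node) ? 1 : 0;
    }
}

# Stand-in for the coercion: code references get wrapped, and anything else
# is assumed to be a filter object already and passes through untouched.
sub coerce_filter {
    my $filter = shift;
    return ref $filter eq 'CODE' ? My::TOC::Filter::Code->new($filter) : $filter;
}

my $filter = coerce_filter( sub { $_[0] =~ /^H[2-5]$/ } );
print $filter->node_is_interesting('H3') ? "keep\n" : "skip\n";
```

The rest of the code only ever sees an object with a known method, so it never needs to care whether the caller passed a code reference or a full filter class.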

Output Handling

There are two distinct output tasks. First, we need to annotate an existing document with anchors, so that we have something to which we can link the table of contents.

Second, we need to produce the table of contents itself.

It's tempting to create a single interface that does both, because these tasks both depend on the output format. However, there's a lot of variation in the way a table of contents can be represented, so I think these will be two separate interfaces.

Another important part of the output interface is the formatting of links in the table of contents, and this will have its own API.

This makes things a little more complicated, but the shortcut modules can gloss over the details in most cases.

The Shortcut API

Now that I have a handle on the low-level components, I want to consider the shortcut API. The shortcut API needs to expose some implementation detail, but not all of it. Understanding what's most important for users helps me in turn understand exactly how to break down the low-level pieces.

I'm going to assume that most users of this module will be inputting and outputting the same format, so we'll have a single API setting for the format. I'll simply encode this in the class name, since the choice of format decides many of the low-level classes.

The shortcut API should support generating a table of contents for either a single document or multiple documents. This affects the generation of links for the table. We also want to support embedding the table in the generated document, at least for the single document case.

Finally, we can offer a few different styles of output for the table of contents. Two obvious choices which come to mind are unordered versus ordered lists.
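As a sketch of how little the style setting might need to do, the renderer below lets the style name choose the list tag and nothing else. The render_list function and the 'ordered-list' style name are assumptions for this example, and the code skips HTML escaping and nesting to stay short.

```perl
use strict;
use warnings;

# Hypothetical mapping from style name to HTML list tag.
my %tag_for = ( 'unordered-list' => 'ul', 'ordered-list' => 'ol' );

sub render_list {
    my ( $style, @nodes ) = @_;

    my $tag = $tag_for{$style} or die "Unknown style: $style";

    # One list item per node, each linking to that node's anchor.
    my $items = join '',
        map { qq{  <li><a href="$_->{link}">$_->{contents}</a></li>\n} } @nodes;

    return "<$tag>\n$items</$tag>\n";
}

print render_list(
    'unordered-list',
    { link => '#roles-rock-0',     contents => 'Roles Rock' },
    { link => '#input-handling-1', contents => 'Input Handling' },
);
```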

Given all that, our API might look something like this:

my $generator = Text::TOC::HTML->new(
    filter         => 'single-document',
    link_generator => undef,
    style          => 'unordered-list',
);

$generator->add_file($path_to_html);

$generator->embed_table_of_contents();

for my $file ( $generator->files() ) {
    open my $fh, '>', $file
        or die "Cannot write to $file: $!";
    print {$fh} $generator->document_for_file($file);
    close $fh
        or die "Cannot close $file: $!";
}

The "single-document" filter will find second- through fourth-level headings. My assumption is that a single document has only one <h1> tag, which is the document's title. There's no reason to put this in the table of contents.

If we were generating a table of contents for multiple documents, we would want to include the first-level heading, necessitating a different filter.
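The two named filters could be as simple as the sketch below. The %filter_for table is a hypothetical name, the upper bound of fourth-level headings for the multi-document case is my assumption, and the filters take a bare upper-case tag name instead of a real node object.

```perl
use strict;
use warnings;

# Hypothetical table of named filters, keyed by the shortcut API's names.
my %filter_for = (
    'single-document' => sub { $_[0] =~ /^H[2-4]$/ ? 1 : 0 },
    'multi-document'  => sub { $_[0] =~ /^H[1-4]$/ ? 1 : 0 },
);

# Show which heading tags each filter keeps.
for my $name ( sort keys %filter_for ) {
    my @kept = grep { $filter_for{$name}->($_) } qw( H1 H2 H3 H4 H5 );
    print "$name: @kept\n";
}
```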

Since we're only linking within a single document, we don't need to do anything intelligent with the links; we can just use the anchor name directly.

For a multi-document table, I'll need a code reference that does something smart based on the file name. I'm not sure it's worthwhile trying to provide a shortcut for this part of the API, since there may not be any common patterns here. Every application has its own URI patterns.

Instead, I'll probably just take a code reference:

my $link_gen = sub {
    my $file   = shift;
    my $anchor = shift;

    # The fragment goes directly after the file's path, with no extra slash.
    return 'file://' . $file->absolute() . '#' . $anchor->name();
};

my $generator = Text::TOC::HTML->new(
    filter         => 'multi-document',
    link_generator => $link_gen,
    style          => 'unordered-list',
);

$generator->add_file($_) for @files;

for my $file ( $generator->files() ) {
    open my $fh, '>', $file
        or die "Cannot write to $file: $!";
    print {$fh} $generator->document_for_file($file);
    close $fh
        or die "Cannot close $file: $!";
}

This shortcut API isn't set in stone, but it's a good start for something useful, and it gives me some good clues about the low-level API.

Writing the Code

Writing this blog entry has been a good way to clarify how this tool should work. Stay tuned for a release of Text::TOC to a CPAN mirror near you.

We'll see how much of the design survives the fires of implementation.

In my last entry, I proposed doing away with DateTime::Locale entirely.

I've since realized that I will want to keep it around as a place to integrate both CLDR and glibc locale data in one unified interface. I'm still going to work on my new Locale::CLDR module, but the DateTime::Locale API will probably stick around more or less as-is.

The one thing I will want to get rid of is the custom locale registration system. However, custom locales would still be usable. They would be loadable by id, or you could pass an already-instantiated custom locale object to a DateTime object.
