Recently in Programming Category

Moose Class in Minneapolis - Friday, February 5, 2010

I'm doing my one-day Moose class here in Minneapolis again, as part of Frozen Perl. The class is even cheaper this time, as a special deal for the workshop. It's a mere $100 per person!

The class is an interactive course, meaning you bring your laptop and do exercises in between lecture sections. It covers all the basics of Moose, and even gets into some of the more advanced bits.

Here's what students who took the class back in September said:

  • "The exercises were awesome! I really love how they were set up as test cases--there is really no other way to give this much feedback!" - wu
  • "Using the test framework to drive the exercises was brilliant, providing feedback and building confidence. The exercises weren't too difficult, and the detailed, step-by-step instructions helped to make them friendly." - Ken O

You can sign up and pay for the class at the Frozen Perl site. I'd also encourage you to join us for the workshop the next day. There's a good slate of presentations scheduled, and it should be a lot of fun.

Finally, on Sunday, February 7, we'll be having a hackathon. See the Frozen Perl site for details.

Project Stack Push/Pop

I have an amazing ability to get distracted from my goals when programming. Sometimes it feels like each project I work on is just the latest distraction from what I was working on. Usually this happens because I'm happily hacking away on project A until I hit a roadblock. That roadblock might be a missing feature in a module I'm using, or maybe a module I need that doesn't exist. Sometimes the roadblock is a gap in my understanding. I don't know how to do what I want in a satisfactory way, so I need to learn more about a tool, or just experiment with ways to approach the problem.

I push a new project onto the stack and off I go. I don't know how deep the stack is now. There are probably items that were on there long ago that have already been forgotten.

Here's an example of where I am in my stack right now:

  • Working on [VegGuide](http://www.vegguide.org) and other things, I've become thoroughly sick of Alzabo ...

    • so I play with DBIx::Class but it doesn't grab me ...

    • I write Fey::ORM ...

  • and back to VegGuide, but I really don't like a lot of things about it, I need to explore new ways of writing webapps ...

    • I start working on a new webapp, a donor/volunteer management app for nonprofits. By building an app from scratch I can get a better understanding of how I want to write modern webapps. But this type of app is rather complicated so ...

      • having been unhappy with MojoMojo, I start working on a wiki designed for non-geeks (target audience, my animal rights group). I'm using Markdown as the wikitext language, but the existing Markdown tools in Perl don't do what I want so ...

      • now I'm running into major issues with HTML::WikiConverter, which I use to turn GUI-generated HTML back to Markdown. The temptation to fix/rewrite is strong ...

        • sigh

Of course, it's not really quite as simple as this might imply. It's not like the only reason for working on a donor management application is to explore webapps. It's useful all on its own.

Even scarier is the fact that there are other unrelated projects that keep trying to intrude, like making ACT run on mod_perl2 so I can upgrade my server from Dapper to Hardy. I've managed to put that one off for a while, at least, but it keeps nagging at me.

My capacity for adding projects to my stack is simultaneously impressive and disturbing. There's no problem so compelling that it can't be superseded by a new problem uncovered in the course of solving the original.

What's the Point of Markdent?

Markdent is my new event-driven Markdown parser toolkit, but why should you care?

First, let's talk about Markdown. Markdown is yet another wiki-esque format for marking up plain text. What makes Markdown stand out is its emphasis on usability and "natural" usage. Its syntax is based on things people have been doing to "mark up" plain text email for years.

For example, if you wanted to list some items in a plain text email, you'd write something like:

* List item 1
* List item 2
* List item 3

Well, this is how it works in Markdown too. Want to emphasize some text? *Wrap it in asterisks* or _underscores_.

So why do you need an event-driven parser toolkit for dealing with Markdown? CPAN already has several modules for dealing with Markdown, most notably Text::Markdown.

The problem with Text::Markdown is that all you can do with it is generate HTML, but there's so much more you could do with a Markdown document.

If you're using Markdown for an application (like a wiki), you may need to generate slightly different HTML for different users. For example, maybe logged-in users see documents differently.

But what if you want to cache parsing in order to speed things up? If you're going straight from Markdown to HTML, you'd need to cache the resulting HTML for each type of user (or even for each individual user in the worst case).

With Markdent, you can cache an intermediate representation of the document as a stream of events. You can then replay this stream back to the HTML generator as needed.
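The idea can be sketched in plain Perl. This is not Markdent's actual API: the events are plain hashes rather than objects, and the event names, the `events_to_html` function, and the "Edit this page" link are all made up for illustration. The point is only that the frozen event stream is stored once and the per-user HTML is produced on replay.

```perl
use strict;
use warnings;
use Storable qw( freeze thaw );

# Hypothetical event stream. Real Markdent events are objects, but plain
# hashes are enough to show the idea.
my @events = (
    { type => 'start_paragraph' },
    { type => 'text', text => 'Hello, ' },
    { type => 'start_emphasis' },
    { type => 'text', text => 'world' },
    { type => 'end_emphasis' },
    { type => 'end_paragraph' },
);

# Turn a stream of events into HTML. The $logged_in flag stands in for the
# per-user differences described above. HTML escaping is omitted for brevity.
sub events_to_html {
    my ( $events, $logged_in ) = @_;

    my %tag_for = (
        start_paragraph => '<p>',
        end_paragraph   => '</p>',
        start_emphasis  => '<em>',
        end_emphasis    => '</em>',
    );

    my $html = q{};
    for my $event ( @{$events} ) {
        $html .= $event->{type} eq 'text'
            ? $event->{text}
            : $tag_for{ $event->{type} };
    }
    $html .= '<p><a href="/edit">Edit this page</a></p>' if $logged_in;

    return $html;
}

# Parse once (not shown), then cache the serialized stream ...
my $frozen = freeze( \@events );

# ... and later replay it for each kind of user without re-parsing.
my $anon_html   = events_to_html( thaw($frozen), 0 );
my $member_html = events_to_html( thaw($frozen), 1 );

print "$anon_html\n$member_html\n";
```

Only `$frozen` needs to live in the cache. The two HTML variants are cheap to regenerate from it.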

What's the Impact of Caching?

Here's a benchmark comparing three approaches.

  1. Use Markdent to parse the document and generate HTML from scratch each time.
  2. Use Text::Markdown
  3. Use Markdent to parse the document once, then use Storable to store the event stream. When generating HTML, thaw the event stream and replay it back to the HTML generator.

                               Rate  parse from scratch  Text::Markdown  replay from captured events
parse from scratch           1.07/s                  --            -67%                         -83%
Text::Markdown               3.22/s                202%              --                         -48%
replay from captured events  6.13/s                475%             91%                           --

This benchmark is included in the Markdent distro. One feature to note about this benchmark is that it parses 23 documents from the mdtest test suite. Those documents are mostly pretty short.

If I benchmark just the largest document in mdtest, the numbers change a bit:

                               Rate  parse from scratch  Text::Markdown  replay from captured events
parse from scratch           2.32/s                  --            -58%                         -84%
Text::Markdown               5.52/s                138%              --                         -63%
replay from captured events  14.8/s                538%            168%                           --

Markdent probably speeds up on large documents because each new parse requires constructing a number of objects. With 23 documents we construct those objects 23 times. When we parse one document the actual speed of parsing becomes more important, as does the speed of not parsing.

What Else?

But there's more to Markdent than caching. One feature that a lot of wikis have is "backlinks", which is a list of pages linking to the current page. With Markdent, you can write a handler that only looks at links. You can use this to capture all the links and generate your backlink list.
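A backlink collector along these lines might look like the sketch below. It assumes a handler is simply an object with a `handle_event` method that receives one event at a time. The class name `Example::LinkCollector`, the `start_link` event type, and the `uri` key are hypothetical stand-ins, not Markdent's real class or event names.

```perl
use strict;
use warnings;

# Hypothetical handler: it receives every event but only cares about links.
package Example::LinkCollector;

sub new { return bless { links => [] }, shift }

sub handle_event {
    my ( $self, $event ) = @_;
    return unless $event->{type} eq 'start_link';
    push @{ $self->{links} }, $event->{uri};
    return;
}

sub links { return @{ $_[0]->{links} } }

package main;

# A made-up event stream for a page that links to two other wiki pages.
my @events = (
    { type => 'text',       text => 'See ' },
    { type => 'start_link', uri  => '/wiki/Moose' },
    { type => 'text',       text => 'Moose' },
    { type => 'end_link' },
    { type => 'text',       text => ' and ' },
    { type => 'start_link', uri  => '/wiki/Catalyst' },
    { type => 'text',       text => 'Catalyst' },
    { type => 'end_link' },
);

my $collector = Example::LinkCollector->new();
$collector->handle_event($_) for @events;

my @links = $collector->links();
print "$_\n" for @links;
```

Storing `@links` against the current page's name would give you the raw data for a backlink index.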

How about a full text search engine? Maybe you'd like to give a little more weight to titles than other text. You can write a handler which collects title text and body text separately, then feed that into your full text search tool.

There's a theme here, which is that Markdent makes document analysis much easier.

That's not all you can do. What about a Markdown-to-Textile converter? How about a Markdown-to-Markdown converter for canonicalization?

Because Markdent is modular and pluggable, if you can think of it, you can probably do it.

I haven't even touched on extending the parser itself. That's still a much rougher area, but it's not that hard. The Markdent distro includes an implementation of a dialect called "Theory", based on some Markdown extension proposals by David Wheeler.

This dialect is implemented by subclassing the Standard dialect parser classes, and providing some additional event classes to represent table elements.

I hope that other people will pick up on Markdent and write their own dialects and handlers. Imagine a rich ecosystem of tools for Markdown comparable to what's available for XML or HTML. This would make an already useful markup language even more useful.

Want Good Tools? Break Your Problems Down

I've been working on a new project recently: Markdent, an event-driven Markdown parser toolkit.

Why? Because the existing Perl Markdown tools just aren't flexible enough. They bundle up Markdown parsing with HTML conversion all in one API, and I need to do more than convert to HTML.

This sort of inflexibility is quite common when I look at CPAN libraries. Looking back at the Perl DateTime Project, one of my big problems with all the other date/time modules on CPAN was their lack of flexibility. If I could have added good time zone handling to an existing project way back then, I probably would have, but I couldn't, and the Perl DateTime Project was born.

If there is one point I would hammer home to all module authors, it would be "solve small problems". I think that the failure to do this is what leads to the inflexibility and tight coupling I see in so many CPAN distributions.

For example, I imagine that in the date/time world some people thought "I need a bunch of date math functions" or "I need to parse lots of possible date/time strings". Those are good problems to solve, but by going straight there you lose any hope of a good API.

Similarly, with Markdown parsers, I imagine that someone thought "I'd like to convert Markdown to HTML", so they wrote a module that does just that.

I can't really fault their goal-focused attitudes. Personally, I sometimes find myself getting lost in digressions. For example, I'm currently writing a webapp with the goal of exploring techniques I want to use in another webapp!

But there's a lot to be said for not going straight to your goal. I'm a big fan of breaking a problem down into smaller pieces and solving each piece separately.

For example, when it comes to Markdown, there are several distinct steps on the way from Markdown to HTML. First, we need to be able to parse Markdown. Parsing Markdown is a step of its own. Then we need to take the results of parsing and turn it into HTML.

If we think of the problem as consisting of these pieces, a clear and flexible design emerges. We need a tool for parsing Markdown (a parser). Separately, we need a tool for converting parse results to HTML (a converter or parse result handler).

Now we need a way to connect these pieces. In the case of Markdent, the connection is an event-driven API where each event is an object and the event receiver conforms to a known API.

It's easy to put these two things together and make a nice simple Markdown-to-HTML converter.
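To make the shape of that connection concrete, here is a toy sketch in plain Perl. It is not Markdent's code: the "parser" only understands paragraphs separated by blank lines, the events are hashes rather than objects, and every class name is invented for the example. The one thing it shares with the real design is the contract that a parser talks to any receiver implementing `handle_event`.

```perl
use strict;
use warnings;

# A toy parser: it only understands paragraphs separated by blank lines.
package Example::Parser;

sub new {
    my ( $class, %args ) = @_;
    return bless { handler => $args{handler} }, $class;
}

sub parse {
    my ( $self, $text ) = @_;
    for my $para ( grep { length } split /\n{2,}/, $text ) {
        $self->{handler}->handle_event( { type => 'start_paragraph' } );
        $self->{handler}->handle_event( { type => 'text', text => $para } );
        $self->{handler}->handle_event( { type => 'end_paragraph' } );
    }
    return;
}

# One possible receiver: builds HTML (escaping omitted for brevity).
package Example::Handler::HTML;

sub new { return bless { html => q{} }, shift }

sub handle_event {
    my ( $self, $event ) = @_;
    my $type = $event->{type};
    $self->{html}
        .= $type eq 'start_paragraph' ? '<p>'
        : $type eq 'end_paragraph'    ? "</p>\n"
        :                               $event->{text};
    return;
}

sub html { return $_[0]->{html} }

# Another receiver: just records the events so they can be replayed later.
package Example::Handler::Capture;

sub new { return bless { events => [] }, shift }
sub handle_event { push @{ $_[0]->{events} }, $_[1]; return }
sub events { return @{ $_[0]->{events} } }

package main;

my $text = "First paragraph.\n\nSecond paragraph.";

my $html_handler = Example::Handler::HTML->new();
Example::Parser->new( handler => $html_handler )->parse($text);

my $capture = Example::Handler::Capture->new();
Example::Parser->new( handler => $capture )->parse($text);

my @captured = $capture->events();
print $html_handler->html();
print scalar(@captured), " events captured\n";
```

The parser never knows which handler it is feeding, which is what makes the pieces swappable.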

But since I took the time to break the problem down, you can also do other things with this tool. For example, I can do something else with our parse results, like capture all the links or cache the intermediate result of the parsing (an event stream).

And since the HTML generator is a small piece, I can also reuse that. Now that I've cached our event stream, I can pull it from the cache later and use it to generate HTML without re-parsing the document. In the case of Markdent, using a cached parse result to generate HTML was about six times faster in my benchmarks!

Because Markdent has small pieces, there are all sorts of interesting ways to reuse them. How about a Markdown-to-Textile converter? Or how about adding a filter which doesn't allow any raw HTML?

We've all heard that loose coupling makes good APIs. But just saying that doesn't really help you understand how to achieve loose coupling. Loose coupling comes from breaking a big problem down into small independent problems.

As you solve each problem, think about how those solutions will communicate. Design a simple API or communications protocol. You'll know the API is simple enough if you can imagine easily swapping out each piece of the problem with another API-conformant piece. A loosely coupled API is one that makes replacing one end of the API easy.

And best of all, when you break problems down into loosely coupled pieces, you'll make it much easier for others to contribute to and extend your tools. Moose is a great example of this. Its fancy sugar layer exists on top of loosely coupled units known as the metaclass protocol. By separating the sugar from the underlying pieces, we've enabled others to create a huge number of Moose extensions.

The same goes for the Perl DateTime Project. I wrote the core pieces, but there have been many, many great contributions. This wealth of extensions wouldn't be possible without the loosely coupled core pieces and a well-defined API for communicating between components.

Support Me in a Leaflet-a-thon

First off, there's no technical content in this blog post. Sorry.

I'll be participating in a leaflet-a-thon next week with my animal advocacy group, Compassionate Action for Animals. This is like a walkathon, but with less walking and more handing stuff out.

To those within the light of my pixels, if you'd like to support me, you can do so by making a donation online. Even if you don't particularly support the cause, please consider doing this to support me. If you've used a module I've written, you could say thanks by making a donation.

Thanks,

-dave

Moose Class in Minneapolis - September 23, 2009

The class is scheduled for Wednesday, September 23, 2009, from 8:30am to 5:00pm. The class will be at the Days Inn in Minneapolis near the U of MN campus. There is free on-site parking at the hotel.

The class will run all day, with an hour or so lunch break. I will not be serving food or drinks, but you are welcome to bring your own, of course.

In order to attend, you must register in advance. You can register through the Perl Review and pay online. The class costs $120 per person. This is a very low introductory rate, and future classes will cost substantially more, so take advantage of this opportunity while you can.

Enrollment is limited to 16 people in total.

Here's the full class description ....

Intro to Moose

Join us for an interactive hands-on course all about Moose. Moose is an OO system for Perl 5 that provides a simple declarative layer of "sugar" on top of a powerful, extensible meta-model.

With Moose, simple classes can be created without writing any subroutines, and complex classes can be simplified. Moose's features include a powerful attribute declaration system, type constraints and coercions, method modifiers ("before", "after", and "around"), a role system (like mixins on steroids), and more. Moose also has a vibrant ecosystem of extensions as seen in the variety of MooseX:: modules on CPAN.

This course will cover Moose's core features, dip a toe into the meta-model, and explore some of the more powerful MooseX:: modules available on CPAN.

Students are expected to bring a laptop, as you will be writing code during the class. You will also be provided with a tarball a week or so before the class is scheduled, which will contain a directory tree skeleton and test files.

More on Catalyst Models

Marcus Ramberg responded to my post on How I Use Catalyst, and I'd like to respond to a few points he made.

Marcus wrote:

I disagree that $schema->resultset('Person') is a significant improvement on $c->model('DBIC::Person').

Me too! I don't think the former is a significant improvement over the latter. They are, after all, more or less the same. The one big problem is that the latter version uses a nonexistent DBIC::Person namespace. There are no DBIC classes anywhere in the app. I think the model version would be much better if it was just written as $c->model('Person').

Marcus also points out that the model layer lets you configure multiple models and access them in a unified way. That is indeed nice. Unfortunately, that has the problem of tying yourself to Catalyst's config, which is problematic for reasons I already described. Similarly, the unified layer only exists inside Catalyst, which is really only accessible during a web request. So now we're stuck with recreating all of this if we need to access our models outside of a web request.

The long-term Catalyst roadmap includes the much-talked-about application/context split. Once this is done, presumably you will be able to access the application, which I take to mean config and models, outside of the context (of a web request). Once that is in place, I think many of my objections will go away. Unfortunately, for now I have to write my own application/context splitting code.

She Said What?!

I created a new website as a fun little personal project, She Said What?!

It was a fun experiment both in minimal web design, and also in minimal code. I can update it from the command line just by typing:

ssw 'A quote goes here|and commentary goes here'

This adds a quote to the quote "database", which is just a directory of timestamped flat files on my desktop. Then it regenerates the site as static HTML and pushes it to the live server.
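The storage half of such a command could be as small as the sketch below. This is a guess at the shape, not the actual ssw code: the `add_quote` function, the file naming, and the file layout are assumptions, and a temporary directory stands in for the real quote directory. Site regeneration and upload are left out.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );

# Store one quote as a timestamped flat file and return the file's path.
sub add_quote {
    my ( $dir, $input ) = @_;

    # Everything before the first "|" is the quote, the rest is commentary.
    my ( $quote, $commentary ) = split /\|/, $input, 2;

    # One-second resolution: two quotes added in the same second would
    # collide, which is fine for a sketch.
    my $file = File::Spec->catfile( $dir, time() . '.txt' );

    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} "$quote\n\n", $commentary // q{}, "\n";
    close $fh or die "Cannot close $file: $!";

    return $file;
}

my $dir  = tempdir( CLEANUP => 1 );
my $file = add_quote( $dir, 'A quote goes here|and commentary goes here' );
print "Wrote $file\n";
```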

The code is in my mercurial repository for anyone who might care.

How I Use Catalyst

Now that I've written about My Way of the Webapp and what Catalyst really is, I'll explain what I don't like about Catalyst.

I'm not going to talk about things I know the Catalyst developers are aware of. In particular, the use of subroutine attributes for dispatching is horrible, and they know it. I'm excited to see CatalystX::Declare, since something like that should be the future of Catalyst controllers. Another well-known misfeature is the rampant use of subclassing for plugins and the lack of well-defined APIs. Yuval Kogman explained why this is so problematic very nicely already.

Instead, I'm going to focus on what I consider "Catalyst Worst Practices", in particular misfeatures of Catalyst (and/or plugins) that many people use.

Configuration File (Mis)Handling

Catalyst::Plugin::ConfigLoader loads a config file and merges it into the application config set via MyApp->config(...). "Wonderful", you say, "I'm sick of dealing with config files". Me too! Unfortunately, if you embrace this style of config handling you're setting yourself up for problems later.

It is absolutely crucial that your configuration file be available outside of a web environment. Yes, we're writing webapps, but any sufficiently complex web application will expand to include a cron job or job queue or some sort of asynchronous task. Usually this will involve sending email.

Unfortunately, ConfigLoader's config handling is very tightly integrated into its web components. First, it gets things conceptually wrong by combining all sorts of config into one massive hash. When you call $c->config you can find configuration items for ...

  • Configuration info from your config file
  • Configuration info set in a call to MyApp->config(...)
  • Configuration info for the current controller and its parents

When you use ConfigLoader, your config file can contain both non-web things like database connections, as well as configuration specific to your app, and configuration for plugins you use.

All of this gets jumbled together into one simplistic API. This API just gives you back the config info as a giant data structure, with no opportunity to add logic to the mix. Worse, it's only available from inside an instantiated webapp via $c->config. This is wrong, wrong, wrong.

How I Do It Instead

I always write my own app-specific config module. This module will use a CPAN module for the actual reading of files. I like to stick with a simple format, so Config::INI works nicely, but that's a small detail.

The configuration file contains the most minimal set of things it can in order to bootstrap the application. Typically, this will include database connection info and not much else. Maybe it also includes a hostname for the application, which may sometimes be necessary.

This module also includes logic for determining various application configuration values. Note that it does not allow (or require) the end user to configure these things. The fact that ConfigLoader lets you configure everything from a configuration file is a nightmare. A configuration file is something that non-developers see, and should have a well-defined, small set of options.

I then use this module to generate configuration data for various parts of my application. In my webapp class, I use it to feed configuration data into Catalyst. That looks something like this:

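package R2;

use R2::Config;

use Moose;

my $Config;

# Build the config object at compile time so it can supply the list of
# things Catalyst should import.
BEGIN
{
    extends 'Catalyst';

    $Config = R2::Config->new();

    Catalyst->import( @{ $Config->catalyst_imports() } );
}

# The rest of Catalyst's configuration also comes from the config object.
__PACKAGE__->config( name => 'R2',
                     %{ $Config->catalyst_config() },
                   );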

Most of the configuration passed to Catalyst is not user-settable. For example, I don't want people installing an app to have control over how the Catalyst Session plugin is configured! This is part of the application internals, and users have no business messing with it.

This R2::Config module just works both inside Catalyst and outside of it. When I need application-wide config I simply need to write R2::Config->new()->share_dir() and it works. This means I can take advantage of my configuration in any context, not just inside a web request. This makes writing cron jobs and other non-web pieces trivially easy, although there is a bigger investment up front in designing the configuration module's API.
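For readers who want a feel for what such a module involves, here is a stripped-down sketch in plain Perl. It is not the real R2::Config: the class is called `Example::Config`, the constructor arguments replace the config file, and the keys returned by `catalyst_config` are purely illustrative. Only the method names `share_dir` and `catalyst_config` are borrowed from the text above.

```perl
use strict;
use warnings;

package Example::Config;

sub new {
    my ( $class, %args ) = @_;

    # In a real module these values would be read from a small config
    # file. They are passed in here to keep the sketch self-contained.
    return bless {
        is_production => $args{is_production} // 0,
        root_dir      => $args{root_dir}      // '/opt/example',
    }, $class;
}

# Derived value: computed by code, never set by the end user.
sub share_dir {
    my $self = shift;
    return $self->{root_dir} . '/share';
}

# The hash handed to the web framework is also computed, so session
# settings and similar internals stay out of the user-visible file.
# The keys here are illustrative only.
sub catalyst_config {
    my $self = shift;
    return {
        root    => $self->share_dir(),
        session => { expires => $self->{is_production} ? 3600 : 86400 },
    };
}

package main;

# A cron job or queue worker can use the same object. No web request needed.
my $config = Example::Config->new( root_dir => '/srv/r2' );
print $config->share_dir(), "\n";
```

Because the logic lives in methods rather than in a merged hash, adding a new derived value means adding a method, not another knob in the config file.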

BTW, the "R2" example comes from a real app in progress.

The Maleficent "Model"

Have you ever looked in a Catalyst class and seen something like $c->model('DBIC::Person')->find(...)? What is it doing? Well, not much, but it's just enough to make a mess.

A good example is the MojoMojo source, which I've been hacking on recently. If you look at the source tree, you'll see that the model code lives under MojoMojo::Schema::ResultSet::* and MojoMojo::Schema::Result::*. The MojoMojo::Schema class ties this all together. In any sane world you'd be writing $schema->resultset('Person')->find(...). But this is not a sane world.

You might argue that the Model bit is solving a problem, which is that we need to instantiate a schema object before we can get at the database. That is a problem that needs solving, but the model API adds nothing to this.

What is wrong with something like this?

package MojoMojo;

use Moose;

# Connect lazily, the first time something asks for the schema.
has schema =>
    ( is      => 'ro',
      lazy    => 1,
      default => sub { MojoMojo::Schema->connect() },
    );

Then later in our controllers we can write:

$c->schema()->resultset('Person')->find(...);

If we've done our work on configuration handling as I described above, then MojoMojo::Schema knows just where to look for connection info. All that the model API adds is a useless layer of redirection (aka confusion) and a useless 'DBIC::' prefix to our resultset names.

(Nosy readers might point out that the R2 code does have a Model class. That was an experiment which must die.)

$c->uri_for? Not for Me!

Here comes my ultimate heresy. I never use $c->uri_for. I always write application-specific logic for generating URIs. Once again, this comes back to being able to use my application outside of a web environment. For example, I may want to generate email from a cron job that includes application URIs. If I rely on $c->uri_for I would then need to duplicate its logic outside of Catalyst.

My current approach is to simply make generating URIs a responsibility of each object in the system. I don't love this, because it inflicts "web-ness" on my model, but I can rationalize this by considering the URI a persistent unique identifier. In the age of REST that actually makes sense.

This also lets me do things like install the application under a path prefix like "/r2". If the application supports adding an arbitrary prefix to all outgoing paths, this works nicely. I can strip the prefix before any controllers see it, so it requires very little code to support, just some configuration.

This approach is especially handy when an application is designed to be served from multiple hostnames. If you're doing this, you need to account for this in the above-mentioned emails. With R2, each Account (a group of Users) is associated with a Domain. A domain can have separate web and email hostnames, and those hostnames are always used when generating URIs for anything associated with the account.

If I used $c->uri_for I'd still need a way to go from a web hostname to an email hostname.
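A rough sketch of objects that build their own URIs follows. The class names, the `/account/` path, and the accessor names are invented for the example and are not taken from R2. What matters is that nothing in it needs a Catalyst context, so the same code works from a cron job.

```perl
use strict;
use warnings;

package Example::Domain;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

sub web_hostname   { $_[0]->{web_hostname} }
sub email_hostname { $_[0]->{email_hostname} }
sub path_prefix    { $_[0]->{path_prefix} // q{} }

package Example::Account;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# Each object knows how to build its own URI, with no web request involved.
sub uri {
    my $self   = shift;
    my $domain = $self->{domain};
    return sprintf(
        'http://%s%s/account/%d',
        $domain->web_hostname(), $domain->path_prefix(), $self->{account_id},
    );
}

package main;

my $domain = Example::Domain->new(
    web_hostname   => 'www.example.org',
    email_hostname => 'example.org',
    path_prefix    => '/r2',
);
my $account = Example::Account->new( account_id => 42, domain => $domain );

# Something a cron job might put into an email.
printf "From: noreply\@%s\n", $domain->email_hostname();
printf "Visit %s\n",          $account->uri();
```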

Summary

I encourage you to think twice before adopting every feature you see someone else use in a Catalyst app. Catalyst is great, but not everything about it supports long-term maintainable applications.

Some of its features make getting started with small apps really easy, but they will bite you in the ass as your app grows. With a little more work up front, you can build a cleaner app that won't require major hacks or rewriting later.

Moose Class in Minneapolis?

I have this one-day Moose class I've developed. I was supposed to give it at YAPC, but I was sick and cancelled my trip.

Here's the class summary I wrote for YAPC:

This will be an interactive hands-on course all about Moose. Moose is an OO system for Perl 5 that provides a simple declarative layer of "sugar" on top of a powerful, extensible meta-model.

With Moose, simple classes can be created without writing any subroutines, and complex classes can be simplified. Moose's features include a powerful attribute declaration system, type constraints and coercions, method modifiers ("before", "after", and "around"), a role system (like mixins on steroids), and more. Moose also has a vibrant ecosystem of extensions as seen in the variety of MooseX:: modules on CPAN.

This course will cover Moose's core features, dip a toe into the meta-model, and if there's time, explore some of the more powerful MooseX:: modules available on CPAN.

Students are expected to bring a laptop, as you will be writing code during the class. You will also be provided with a tarball a week or so before the class is scheduled, which will contain a directory tree skeleton and test files.

The class ended up being taught by Shawn Moore, and I got a lot of good feedback about the class slides and exercises.

I'd really like to give the class myself, and I am trying to figure out if there's interest in the Twin Cities, MN area. I'd reserve a conference room at a hotel or on the UMN campus in Minneapolis or St Paul for the class, and I'm hoping to do this some time in August or September.

Normally, a one-day training session like this would run about $500 per person, but since this would be my first time actually giving the class, the rate would be a mere $120 per person.

In return, I want each attendee to commit to answering a short survey I'll give them after the class so I can get feedback on the class and my teaching. So there is a small price for the discount ;)

In order to make this happen, I'd need at least 5 people to sign up.

If you have questions, feel free to ask in a comment or email me or email the Minneapolis Perl Mongers list.

If you're interested in attending, please let me know what days of the week are best. Would you prefer a weekday, Saturday, or Sunday? Or maybe it doesn't matter. Again, leave a comment or email me.

If there is enough interest, I'll schedule it and then announce it on this blog and the Minneapolis Perl Mongers list.