I decided to give Perl 6 a go today at the Frozen Perl Hackathon. It was a great opportunity because I had Patrick Michaud sitting across the table from me, and I was able to pick his brain both about Perl 6 and the Rakudo/Parrot issues I was seeing.

The last time I looked at Perl 6 was about 2.5 years ago, when Pugs was still active. I started working on some DateTime code, but didn’t get too far because of various missing features.

Perl 6 is a really cool language, at least the parts I’ve played with. However, I still had trouble figuring out how to do what I wanted for a couple of reasons. First, there’s no up-to-date, comprehensive user documentation. The synopses (basically the Perl 6 language spec) are readable, but not really user-level docs. Second, there’s not a huge body of existing libraries, apps, and one-liners like there is with Perl 5. Because Rakudo doesn’t yet support the full Perl 6 language, the Perl 6 code that does exist is often not written in the most natural way.

I encountered a few barriers to getting going with Perl 6. First, there is the documentation issue. Second, Rakudo doesn’t support all of Perl 6 yet, and because I am pretty much a Perl 6 noob, it’s very hard for me to distinguish between “not supported” and “incorrect code”. This is further compounded by the fact that Rakudo’s error reporting is very rough.

I think if I had more than a few hours to devote to this, I could probably pick it up pretty quickly. A few days hacking on Perl 6 with blead Rakudo would give me a better idea of how to interpret Rakudo’s error messages, and a better sense of what parts of Perl 6 are actually supported.

For now I’m too busy to put the time in. I’ve got my existing Perl 5 projects, animal rights activism, and rocking Rock Band to do.

I’ll probably come back to Rakudo in a few months and try again, maybe at YAPC. I’ll be less busy then, and I expect that Rakudo will continue to advance quickly. I’m optimistic that we’ll see a Perl 6 alpha or beta in 2009, though I would be surprised to see a real 1.0 release. Of course, I’d be thrilled to be wrong!

I’m very excited to announce the first release of the new Moose::Manual documentation as part of the Moose 0.66 release. This is part of the documentation work being funded by the Moose docs grant from TPF.

One of the complaints we often hear about Moose is that the docs are not good enough, and that it’s hard to get started with it. The Manual was created to address this issue. I’m hoping that it will be a good starting point for people to understand Moose concepts and features. After reading it, you should be able to start creating Moose-based classes right away.

I also hope that people will give us feedback on the Manual. It’s still new, and I’m sure there are parts that need polish, and concepts which were left out. If there’s something you’re confused about or wish were documented, please email me or visit the Moose IRC channel and tell us.

People want many things from software, and those desires are often contradictory. There’s a constant back and forth about what people want from CPAN modules, in particular. It seems like we have the same arguments year after year. I think talking about priorities before talking about why something is good or bad is crucial.

So what are these priorities? How do they work together? Which ones are contradictory? Which ones are most important to you, and when do the priorities shift?

(Note: when I say library below, mentally substitute “module, library, distro, or framework”)

  • Well-vetted – When looking for a library, you might want something other people have already used for a while. You want to know that it does what it says on the box, and that most of the big bugs have been found.
  • Cutting Edge – Some folks like to use new stuff. It’s fun to experiment, and often the new stuff is the most advanced, most interesting, most time-saving. It could also be the biggest new piece of shit, the buggiest, slowest, etc.
  • Dependency free – The CPAN dependency discussion never goes away. Some people really, really don’t like large dependency chains. When you want to use a module as part of an app, and you want people who aren’t Perl gurus to install that app, this becomes a huge issue. Telling them “just install these 100 modules from CPAN” doesn’t cut it.
  • Small (does one thing) – Less code means fewer bugs. It also means less documentation to read, and makes a library simpler to learn.
  • Easy to integrate – Some libraries are designed to be integrated with other modules (Catalyst), some want you to embrace their world (Jifty).
  • Complete – Some libraries come with a complete solution (Jifty) and some require you to put together a bunch of pieces into a whole (Catalyst).
  • Fast – Sometimes speed (of compilation and/or execution) really matters.
  • Memory frugal – Just like with speed, sometimes memory usage matters.
  • No XS – Sometimes you’re stuck using a system where you can’t compile anything. Or maybe you have a compiler, but the module requires external C libraries, and you can’t install them (a hosted account).
  • Active development – Maybe you feel more comfortable knowing the module has a future, even if that means a higher rate of change.
  • Stable – On the other hand, maybe you want something that’s just done, where you know new releases will be infrequent and backwards compatible.

I’m sure there are more priorities (feel free to mention some in the comments). It’s easy to say we want all of these things, but there are many, many conflicts here. I won’t go into all of them, but here’s a few examples.

If you want well-vetted, you’re not going to be using cutting edge code.

If you want dependency free, that code is probably not well-vetted. That dependency free code probably has some reinvented wheels, and those wheels are probably less round than the dependency they avoid.

If you want fast or memory frugal, you probably can’t also insist on no XS. If you want complete solutions, then small and easy to integrate may go out the window.

Personally, my top priorities are usually small, easy to integrate, and active development. I’d rather learn several small pieces and put them together than try to digest a big framework all at once. And I’d rather have an active community, even if I have to keep up with API changes.

I don’t care too much about fast or memory frugal. I work on a lot of webapps, which are often less demanding performance wise, at least if you can count on a dedicated server or two. Contrast this to a small “throw it in cgi-bin” app. Webapps also often have a lot of opportunities for speed improvements at the application level with caching and other strategies, so I worry less about the underlying libraries.

I’d much prefer well-vetted to dependency free. I think the latter is an entirely false economy, and what we really need are much, much better installer tools.

But these are just my priorities for the work I do most often. If I were working on an embedded data-crunching app, I’m sure my priorities would change quite a bit!

I’d like to see people state their priorities up front, and explain why it’s important for the work they do. Often this gets left out of the discussion. Without this information, we often end up just talking past each other.

I enjoy reading a good epic fantasy from time to time. Sure, it’s a well-worn genre, but I like a big story, and if it’s well-written, it can be fun.

I just finished re-reading Tad Williams’ Memory, Sorrow, and Thorn trilogy (for the first time since it was published 20 years ago). It was enjoyable, despite a bunch of cliche bits.

But it got me thinking about how ridiculous many fantasy worlds are when you look a little deeper.

The first example is the Sitha (Tad Williams’ elves). In these books, the Sitha are immortal, and it’s stated that they give birth approximately every 500 years. They migrated to the continent they’re on thousands of years ago, but it doesn’t say how many. For some reason, their population is ridiculously small, but that really doesn’t make much sense, especially considering that they were the unchallenged rulers of that continent for a long time.

If we assume that 1,000 Sitha were in the first migration, and that migration occurred 10,000 years ago, how many Sitha should there be “now”? Let’s assume that the every 500 years birth pattern is true. Let’s also assume, since they’re immortal, that the females can continue to have children indefinitely. That means that every 500 years, half of the population will give birth to a child, of whom half will be female, and so on and so forth.

In other words, every 500 years the population should increase by 50 percent. After 10,000 years, the initial population of 1,000 should be over 3,000,000 (that’s 3 million)! That’s a lot of Sitha! In the books, however, they’re a dying race. Yes, there’s been a bunch of wars and such, but those wars started a long time after their initial migration, when their population should already have been in the hundreds of thousands.
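That back-of-the-envelope estimate is easy to check with a few lines of Python (using the simplified model above: the population grows 50 percent every 500-year “generation”):

```python
# Simplified Sitha population model: every 500 years, half the
# population (the females) each bear one child, so the population
# grows by 50 percent per 500-year "generation".
def population(initial, years, period=500, growth=1.5):
    generations = years // period
    return initial * growth ** generations

final = population(1000, 10_000)
print(round(final))  # well over 3 million
```

Twenty generations of 50 percent growth turns 1,000 Sitha into more than three million, as claimed.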

The other goofy bit of ecology is a dragon that supposedly lived in a system of tunnels underneath a castle. The dragon is described as being very large, presumably bigger than an elephant. While there are some big spaces in the tunnel system, there’s no giant pathway into the part where the dragon is, which seems to be pretty far into the tunnel system. Maybe it was born there and grew too big to leave? I can buy that, but what does it eat?

People know about this dragon, so excluding the occasional foolhardy hero, I don’t think there’s a lot of traffic down there. Certainly there is probably nothing bigger than mice and bats, and even they would probably avoid a large predator’s living space.

Stuff like this does kind of annoy me, because it seems like the author adopts some fantasy convention (immortal elves who are dying out) without actually figuring out how to make that make any sort of sense, other than “it’s a magic world, and I say so”.

A good example of doing better is Robin Hobb’s Elderlings trilogy of trilogies. She actually comes up with a very interesting and sane life-cycle for various fantastic creatures (I don’t want to be too specific), and even includes things like natural disasters in this fantasy ecology. It all makes sense and ties into the story very nicely.

I’m still stuck on the whole problem of the requirement that URIs for REST APIs be discoverable, not documented. It’s not so much that making them discoverable is hard, it’s that making them discoverable makes them useless for some (common) purposes.

When I last wrote about REST, I got taken to task and even called a traitor (ok, I didn’t take that very seriously ;) Aristotle Pagaltzis (and Matt Trout via IRC) told me to take a look at AtomPub.

I took a look, and it totally makes sense. It defines a bunch of document types, which along with the original Atom Syndication Format, would let you easily write a non-browser based client for publishing to and reading from an Atom(Pub)-capable site. That’s cool, but this is for a very specific type of client. By specific I mean that the publishing tool is going to be interactive. The user navigates the Atom workspaces in the client, finds the collection they’re looking for, POSTs to it, and they have a new document on the site.

But what about a non-interactive client? I just don’t see how REST could work for this.

Let me provide a very specific example. I have this site VegGuide.org. It’s a database of veg-friendly restaurants, grocers, etc., organized in a tree of regions. At the root of the tree, we have “The World”. The children of that node are things like “North America”, “Europe”, etc. In turn “North America” contains “Canada”, “Mexico” and “USA”. This continues until you find nodes which only contain entries, not other regions, like “Chicago” and “Manhattan”.

(There are also other ways to navigate this space, but none of them would be helpful for the problem I’m about to outline.)

I’d like for VegGuide to have a proper REST API, and in fact its existing URIs are all designed to work both for browsers and for clients which can do “proper” REST (and don’t need HTML, just “raw” data in some other form). I haven’t actually gotten around to making the site produce non-HTML output yet, but I could, just by looking at the Accept header a client sends.

Let’s say that Jane Random wants to get all the entries for Chicago, maybe process them a bit, and then republish them on her site. At a high level, what Jane wants is to have a cron job fetch the entries for Chicago each night and then generate some HTML pages for her site based on that data.

How could she do this with a proper REST API? Remember, Jane is not allowed to know that http://www.vegguide.org/region/93 is Chicago’s URI. Instead, her client must go to the site root and somehow “discover” Chicago!

The site root will return a JSON document something like this:
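Something along these lines, perhaps (this is a hypothetical sketch; the field names and region URIs are made up, but the shape matches the region tree described above):

```json
{
  "name": "The World",
  "regions": [
    { "name": "North America", "uri": "http://www.vegguide.org/region/1" },
    { "name": "Europe", "uri": "http://www.vegguide.org/region/2" }
  ]
}
```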

Then her client can go to the URI for North America, which will return a similar JSON document:

Her client can pick USA and so on until it finally gets to the URI for Chicago, which returns:
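Perhaps something like this (again a hypothetical sketch; the entry is invented, and only the Chicago region URI comes from the actual site):

```json
{
  "name": "Chicago",
  "uri": "http://www.vegguide.org/region/93",
  "entries": [
    { "name": "Some Vegan Diner", "uri": "http://www.vegguide.org/entry/1001" }
  ]
}
```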

Now the client has the data it wants and can do its thing.

Here’s the problem. How the hell is this automated client supposed to know how to navigate through this hierarchy?

The only (non-AI) possibility I can see is that Jane must embed some sort of knowledge that she has as a human into the code. This knowledge simply isn’t available in the information that the REST documents provide.

Maybe Jane will browse the site and figure out that these regions exist, and hard-code the client to follow them. Her client could have a list of names to look for in order: “North America”, “USA”, “Illinois”, “Chicago”.

If the names changed and the client couldn’t find them in the REST documents, it could throw an error and Jane could tweak the client. A sufficiently flexible client could allow her to set this “name chain” in a config file. Or maybe the client could use regexes so that some possible changes (“USA” becomes “United States”) are accounted for ahead of time.
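A sketch of what that name-chain client might look like (in Python rather than Perl, and assuming one plausible shape for the documents: each contains a list of regions with hypothetical "name" and "uri" keys):

```python
import json

def find_uri(fetch, root_uri, name_chain):
    """Follow a chain of human-known region names from the site root.

    `fetch` is any callable that takes a URI and returns a JSON body.
    Returns the final region's URI, or None if a name can't be found
    (e.g. "USA" was renamed "United States"), in which case Jane has
    to tweak her config.
    """
    uri = root_uri
    for name in name_chain:
        doc = json.loads(fetch(uri))
        next_uris = [r["uri"] for r in doc.get("regions", [])
                     if r["name"] == name]
        if not next_uris:
            return None
        uri = next_uris[0]
    return uri
```

Note that all the human knowledge lives in that name chain. The REST documents tell the client *how* to follow links, but not *which* links lead to Chicago, which is exactly the problem.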

Of course, if Jane is paying attention, she will quickly notice that the URIs in the JSON documents happen to match the URIs in her browser, and she’ll hardcode her client to just GET the URI for Chicago and be done with it. And since sites should have Cool URIs, this will work for the life of the site.

Maybe the answer is that I’m trying to use REST for something inherently outside the scope of REST. Maybe REST just isn’t for non-interactive clients that want to get a small part of some site’s content.

That’d be sad, because non-interactive clients which interact with just part of a site are fantastically useful, and much easier to write than full-fledged interactive clients which can interact with the entire site (the latter is commonly called a web browser!).

REST’s discoverability requirement is very much opposed to my personal concept of an API. An API is not discoverable, it’s documented.

Imagine if I released a Perl module and said, “my classes use Moose, which provides a standard metaclass API (see RFC124945). Use this metaclass API to discover the methods and attributes of each class.”

You, as an API consumer, could do this, but I doubt you’d consider this a “real” API.

So as I said before, I suspect I’ll end up writing something that’s only sort of REST-like. I will provide well-documented document types (as opposed to JSON blobs), and those document types will all include hyperlinks. However, I’m also going to document my site’s URI space so that people can write non-interactive clients.

The “Perl is Dead” meme has been going around for some time. It seems like one of those self-reinforcing things that people keep repeating, but where’s the evidence? The other half of the meme is that other dynamic languages, specifically Ruby, Python, and PHP are gaining market/mind share.

That is true. I hear a lot more about Python, Ruby, and even PHP these days than I did five or ten years ago. Does that mean Perl is dead? No, it just means Python, Ruby, and PHP are doing better now than in the past. That’s not bad for Perl. On the contrary, my theory is that a rising “dynamic languages” tide will lift all boats.

Tim Bunce wrote about job posting trends in February of 2008, and it’s interesting reading. Unsurprisingly (to me), all of Perl, PHP, Ruby, and Python jobs are growing, and while Ruby and Python are growing faster than Perl, Perl is still way ahead of them. My guess is that eventually they’ll level out around Perl’s percentage and start growing slower.

Today I was thinking about Perl’s reported morbidity (in the context of a relatively stupid “Perl 6 is Vaporware” article (that I don’t care to link to because it was lame)).

Perl could have a lot of jobs and still be dead. After all, COBOL has a lot of jobs, but no one thinks of COBOL as a “living” language, it’s just undead.

I decided to take a quick look at books instead. My theory is that if people are buying books on a topic, it must have some life, because that means someone wants to learn about said topic.

The flagship Perl book is O’Reilly’s Learning Perl. The fifth edition was just released in June of this year.

It’s currently #3,984 amongst all books, which isn’t bad. Even more impressive, it’s #1 in the Amazon category of “Books > Computers & Internet > Programming > Introductory & Beginning”. This would be more impressive if this category included Learning Python, but I don’t think it does.

O’Reilly’s Learning Python is also doing well, at #3,357 among all books. In fact, this is the highest rank book of those I looked at.

O’Reilly’s Learning Ruby is at #194,677, which I can only assume reflects the book, not Ruby itself. The best-selling intro-level Ruby book is (I think) Beginning Ruby: From Novice to Professional, at #23,024.

So Perl seems to be holding its own, and for some reason the intro Ruby books aren’t selling well.

On to O’Reilly’s Programming Perl, which is the Perl reference, despite being rather old (8 years). It’s at #12,428.

O’Reilly’s Programming Python is at #32,658. I would’ve expected Dive Into Python to do much better than #177,394. It has very high ratings, much better than Programming Python, and I’ve heard good things about it on the net. Go figure.

O’Reilly’s The Ruby Programming Language is at #5,048 and Programming Ruby is at #13,125. My guess is that many people skip the intro level Ruby books in favor of these two.

So what’s the summary? Each of these three languages has at least one book in the top 10,000, and the best selling books for each language are all relatively close. Certainly, Perl is looking pretty good in this light.

Another interesting thing about the Perl book market is the sheer number of niche Perl books out there, one of which I co-wrote. Compare O’Reilly’s Python book page to their Perl page. Of course, the Python page has more recent books, but maybe they’re just catching up on topics Perl had covered years ago.

This is all quite unscientific, but I think there’s some value here. My conclusion is that Perl is not quite dead yet, and is in fact doing reasonably well. While it may not have the same buzz that the new kids have, people still want to learn it.

Roy Fielding, the inventor of REST, wrote a blog post recently titled REST APIs must be hypertext-driven. It’s quite hard to understand, being written in pure academese, but I think I get the gist.

The gist is that for an API to be properly RESTful it must be discoverable. Specifically, you should be able to point a client at the root URI (/) and have it find all the resources that the API exposes. This is a cool idea, in theory, but very problematic in practice.

A consequence of this restriction is that any sort of documentation that contains a list of URIs (or URI templates, more likely) and documentation on accepted parameters is verboten.

Presumably, if I had a sufficiently smart client that understood the media types used in the application, I’d point it at the root URI, it’d discover all the URIs, and I could manipulate and fetch data along the way.

That’s a nice theory, but it has very little to do with how people want to use these APIs. For a simple example, let’s take Netflix. Let’s assume that I want to use the Netflix API to search for a movie, get a list of results and present it back for a human to pick from, and add something from that list to my queue.

Without prior documentation on what the URIs are, how would I implement my client? How do I get those search results? Does my little client go to the root URI and then look at the returned data for a URI somehow “labeled” as the search URI? How does my client know which URI is which without manual intervention?

If I understand correctly, this would somehow all be encoded in the definition of the media types for the API. Rather than define a bunch of URI templates up front, I might have a media type of x-netflix/resource-listing, which is maybe a JSON document containing label/URI/media type triplets. One of those triples may be “Search/http://…”. Then my client POSTs to that URI using the x-netflix/movie-search media type. It gets back an x-netflix/movie-listing entity, which contains a list of movies, each of which consists of a title and URI. I GET each movie URI, which returns an x-netflix/movie document, which contains a URI template for posting to a queue? Okay, I’m lost on that last bit. I can’t even figure this out.

Resource creation and modification seems even worse. To create or modify resources, we would have a media type to describe each resource’s parameters and type constraints, but figuring out how to create one would involve traversing the URI space (somehow) until you found the right URI to which to POST.

Of course, this all “just works” with a web browser, but the whole point of having a web API is to allow someone to build tools that can be used outside of a human-clicks-on-things-they’re-interested-in interface. We want to automate tasks without requiring any human interaction. If it requires human intervention and intelligence at each step, we might as well use a web browser.

I can sort of imagine how all this would work in theory, but I have trouble imagining this not being horribly resource-intensive (gotta make 10 requests before I figure out where I can POST), and very complicated to code against.

Worse, it makes casual use of the API much harder, since the docs basically would say something like this …

“Here’s all my media types. Here’s my root URI. Build a client capable of understanding all of these media types, then point it at the root URI and eventually the client will find the URI of the thing you’re interested in.”

Compare this with the Pseudo-REST API Fielding says is wrong, which says “here is how you get information on a single Person. GET a URI like this …”

Fielding’s REST basically rules out casual implementers and users, since you have to build a complete implementation of all the media types in advance. Compare this to the pseudo-REST API he points out. You can easily build a client which only handles a very small subset of the API’s URIs. Imagine if your client had to handle every URI properly before it could do anything!

In the comments in his blog, Fielding throws in something that really makes me wonder if REST is feasible. He says,

A truly RESTful API looks like hypertext. Every addressable unit of information carries an address, either explicitly (e.g., link and id attributes) or implicitly (e.g., derived from the media type definition and representation structure). Query results are represented by a list of links with summary information, not by arrays of object representations (query is not a substitute for identification of resources).

Look at the last sentence carefully. A “truly RESTful API”, in response to a search query, responds not with the information asked for, but a list of links! So if I do a search for movies and I get a hundred movies back, what I really get is a summary (title and short description, maybe) and a bunch of links. Then if I want to learn more about each movie I have to request each of 100 different URIs separately!

It’s quite possible that I’ve completely misunderstood Fielding’s blog post, but I don’t think so, especially based on what he said in the comments.

I’m not going to argue that REST is something other than what Fielding says, because he’s the expert, but I’m not so sure I really want to create true REST APIs any more. Maybe from now on I’ll be creating “APIs which share some characteristics with REST but are not quite REST”.

I just got back from seeing The Magnetic Fields, and it was a great show. It got me thinking about the most memorable concerts I’ve seen over the years.

In no particular order …

  • Weird Al at Toad’s Place in New Haven, 1991 (or 1992). I know how deeply uncool it is to admit this, but I’ve seen Weird Al live, and it was great. I think this was the first rock concert I ever went to, in fact. Weird Al did a great live show, with all sorts of wacky costume changes, weird dances, and a generally kick-ass performance.
  • Most of the They Might Be Giants shows I’ve seen. I think they may be the second band I saw live, and I’ve seen them many times since.
  • The first time I saw Einstürzende Neubauten. I was amazed at how good the sound was for such a complicated set of instruments. I also appreciated the fact that it was loud, but not way too fucking loud, like many concerts I’ve been to.
  • Tokyo Incidents at the Kamakura Cultural Center. This may be the single best concert I’ve ever been to. This band is amazing, and the singer, Ringo Shiina, is one of the best singers I’ve ever heard. A lot of what she sang was quite vocally demanding, and she and the band nailed every note. Combine that with great sound and acoustics (yay for concert halls).
  • Seeing the Minneapolis Orchestra perform Messiaen’s Turangalila, and a few years later Britten’s War Requiem. These are two of my all time favorite pieces. I’ve also loved seeing George Crumb’s chamber works live. I saw Music for a Summer Evening (Makrokosmos III) and then years later Vox Balanae, and both were amazing.
  • Seeing Low perform at Orchestra Hall. The acoustics of the hall worked incredibly well with their minimalist music. I’ve seen Low many times live, but I think this was my favorite, just cause it sounded so good.

And finally, one dishonorable mention.

  • The Polyphonic Spree at The Fine Line. This wasn’t the band’s fault, I think they might have been doing a fine job. However, the sound was so amazingly loud that I couldn’t really hear any music, just a roar of noise from which I could sort of pick out musical sounds. This was a huge disappointment, because I’d been very excited to see them. I think some sound engineers are deaf, and they crank everything to 11. They need to be fired.

Programmers like to talk about scaling and performance. They talk about how they made things faster, how some app somewhere is hosted on some large number of machines, how they can parallelize some task, and so on. They particularly like to talk about techniques used by monster sites like Yahoo, Twitter, Flickr, etc. Things like federation, sharding, and so on come up regularly, along with talk of MogileFS, memcached, and job queues.

This is a lot like gun collectors talking about the relative penetration and stopping power of their guns. It’s fun for them, and there’s some dick-wagging involved, but it doesn’t come into practice all that much.

Most programmers are working on projects where scaling and speed just aren’t all that important. It’s probably a webapp with a database backend, and they’re never going to hit the point where any “standard” component becomes an insoluble bottleneck. As long as the app responds “fast enough”, it’s fine. You’ll never need to handle thousands of requests per minute.

The thing that developers usually like to finger as the scaling problem is the database, but fixing this is simple.

If the database is too slow, you throw some more hardware at it. Do some profiling and pick a combination of more CPU cores, more memory, and faster disks. Until you have to have more than 8 CPUs, 16GB RAM, and a RAID5 (6? 10?) array of 15,000 RPM disks, your only database scaling decision will be “what new system should I move my DBMS to”. If you have enough money, you can just buy that thing up front.

Even before you get to the hardware limit, you can do intelligent things like profiling and caching the results of just a few queries and often get a massive win.

If your app is using too much CPU on one machine, you just throw some more app servers at it and use some sort of simple load balancing system. Only the most short-sighted or clueless developers build apps that can’t scale beyond a single app server (I’m looking at you, you know who).

All three of these strategies are well-known and quite simple, and thus are no fun, because they earn no bragging rights. However, most apps will never need more than this. A simple combination of hardware upgrades, simple horizontal app server scaling, and profiling and caching is enough.

This comes back to people fretting about the cost of using things like DateTime or Moose.

I’ll be the first to admit that DateTime is the slowest date module on CPAN. It’s also the most useful and correct. Unless you’re making thousands of objects with it in a single request, please stop telling me it’s slow. If you are making thousands of objects, patches are welcome!

But really, outside your delusions of application grandeur, does it really matter? Are you really going to be getting millions of requests per day? Or is it more like a few thousand?

There are a whole lot of sites and webapps that only need to support a couple hundred or thousand users. You’re probably working on one of them ;)

… because who doesn’t love a good Venn Diagram?

CAA is committed to focusing on just one issue, and we avoid taking stances on other issues. Sometimes people question why, and I often see calls among the greater animal rights/social justice world for a multi-issue movement.

There are lots of problems with any multi-issue group, and the bigger your scope, the bigger the problems. For example, what are your goals, what are your strategies? How do you know you’ve won?

But I think the most serious problem is simply recruiting volunteers. I’ve illustrated that problem with this handy Venn Diagram.

Each circle represents your potential volunteer base, and the intersections are the potential volunteer base of a hypothetical group, either an AR/anti-abortion group or an AR/pro-abortion group.

Now, some people might dispute the relatively small size of each intersection. That’s not the point (and I mostly did it so the text stayed readable). The real key is that the two intersections are mutually exclusive. Our hypothetical multi-issue group can only pick one of those two intersections.

Even in cases where the intersections are not exclusive, you still need to find people who fall into that intersection, and the more issues you add, the smaller the intersection becomes.

I consider this to be entirely proved by this blog entry, because it includes math. And math is always right.