The “Perl is Dead” meme has been going around for some time. It seems like one of those self-reinforcing things that people keep repeating, but where’s the evidence? The other half of the meme is that other dynamic languages, specifically Ruby, Python, and PHP, are gaining market/mind share.

That is true. I hear a lot more about Python, Ruby, and even PHP these days than I did five or ten years ago. Does that mean Perl is dead? No, it just means Python, Ruby, and PHP are doing better now than in the past. That’s not bad for Perl. On the contrary, my theory is that a rising “dynamic languages” tide will lift all boats.

Tim Bunce wrote about job posting trends in February of 2008, and it’s interesting reading. Unsurprisingly (to me), job postings for Perl, PHP, Ruby, and Python are all growing, and while Ruby and Python are growing faster than Perl, Perl is still way ahead of them. My guess is that eventually they’ll level out around Perl’s percentage and their growth will slow.

Today I was thinking about Perl’s reported morbidity, in the context of a relatively stupid “Perl 6 is Vaporware” article that I don’t care to link to, because it was lame.

Perl could have a lot of jobs and still be dead. After all, COBOL has a lot of jobs, but no one thinks of COBOL as a “living” language; it’s just undead.

I decided to take a quick look at books instead. My theory is that if people are buying books on a topic, it must have some life, because that means someone wants to learn about said topic.

The flagship Perl book is O’Reilly’s Learning Perl. The fifth edition was just released in June of this year.

It’s currently #3,984 amongst all books on Amazon, which isn’t bad. Even more impressive, it’s #1 in the Amazon category of “Books > Computers & Internet > Programming > Introductory & Beginning”. This would be more impressive if this category included Learning Python, but I don’t think it does.

O’Reilly’s Learning Python is also doing well, at #3,357 among all books. In fact, it’s the highest-ranked book of those I looked at.

O’Reilly’s Learning Ruby is at #194,677, which I can only assume reflects the book, not Ruby itself. The best-selling intro-level Ruby book is (I think) Beginning Ruby: From Novice to Professional, at #23,024.

So Perl seems to be holding its own, and for some reason the intro Ruby books aren’t selling well.

On to O’Reilly’s Programming Perl, which is the Perl reference, despite being rather old (8 years). It’s at #12,428.

O’Reilly’s Programming Python is at #32,658. I would’ve expected Dive Into Python to do much better than #177,394. It has very high ratings, much better than Programming Python, and I’ve heard good things about it on the net. Go figure.

O’Reilly’s The Ruby Programming Language is at #5,048 and Programming Ruby is at #13,125. My guess is that many people skip the intro level Ruby books in favor of these two.

So what’s the summary? Each of these three languages has at least one book in the top 10,000, and the best selling books for each language are all relatively close. Certainly, Perl is looking pretty good in this light.

Another interesting thing about the Perl book market is the sheer number of niche Perl books out there, one of which I co-wrote. Compare O’Reilly’s Python book page to their Perl page. Of course, the Python page has more recent books, but maybe they’re just catching up on topics Perl had covered years ago.

This is all quite unscientific, but I think there’s some value here. My conclusion is that Perl is not quite dead yet, and is in fact doing reasonably well. While it may not have the same buzz that the new kids have, people still want to learn it.

Roy Fielding, the inventor of REST, wrote a blog post recently titled REST APIs must be hypertext-driven. It’s quite hard to understand, being written in pure academese, but I think I get the gist.

The gist is that for an API to be properly RESTful it must be discoverable. Specifically, you should be able to point a client at the root URI (/) and have it find all the resources that the API exposes. This is a cool idea, in theory, but very problematic in practice.

A consequence of this restriction is that any sort of documentation listing URIs (or, more likely, URI templates) along with their accepted parameters is verboten.

Presumably, if I had a sufficiently smart client that understood the media types used in the application, I’d point it at the root URI, it’d discover all the URIs, and I could manipulate and fetch data along the way.

That’s a nice theory, but it has very little to do with how people actually want to use these APIs. For a simple example, let’s take Netflix. Let’s assume that I want to use the Netflix API to search for a movie, get a list of results to present back for a human to pick from, and then add something from that list to my queue.

Without prior documentation on what the URIs are, how would I implement my client? How do I get those search results? Does my little client app go to the root URI and then look at the returned data for a URI somehow “labeled” as the search URI? How does my client know which URI is which without manual intervention?

If I understand correctly, this would somehow all be encoded in the definition of the media types for the API. Rather than define a bunch of URI templates up front, I might have a media type of x-netflix/resource-listing, which is maybe a JSON document containing label/URI/media type triplets. One of those label/URI pairs might be “Search/http://…”. Then my client POSTs to that URI using the x-netflix/movie-search media type. It gets back an x-netflix/movie-listing entity, which contains a list of movies, each of which consists of a title and URI. I GET each movie URI, which returns an x-netflix/movie document, which contains a URI template for posting to a queue? Okay, I’m lost on that last bit. I can’t even figure this out.
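
To make that concrete, here’s roughly what that first resource-listing document might decode to in Perl. Every label, URI, and field name here is invented for illustration; Netflix doesn’t actually document anything this way.

    # Hypothetical x-netflix/resource-listing document, decoded from
    # JSON. The client is expected to find what it needs by label or
    # media type rather than by a documented URI.
    my $listing = {
        resources => [
            {   label      => 'Search',
                uri        => 'http://api.example.com/search',
                media_type => 'x-netflix/movie-search',
            },
            {   label      => 'Queues',
                uri        => 'http://api.example.com/queues',
                media_type => 'x-netflix/queue-listing',
            },
        ],
    };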

Resource creation and modification seem even worse. To create or modify resources, we would have a media type to describe each resource’s parameters and type constraints, but figuring out how to create one would involve traversing the URI space (somehow) until you found the right URI to which to POST.
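
Presumably the media type definition would amount to a machine-readable form, something like this sketch (again, all names and constraints are invented):

    # A hypothetical x-netflix/queue-item-form document, decoded.
    # The media type tells the client what a queue item accepts; the
    # client still has to discover where to POST it by traversal.
    my $form = {
        accepts => {
            movie_uri => { type => 'uri',     required => 1 },
            position  => { type => 'integer', required => 0 },
        },
    };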

Of course, this all “just works” with a web browser, but the whole point of having a web API is to allow someone to build tools that can be used outside of a human-clicks-on-things-they’re-interested-in interface. We want to automate tasks without requiring any human interaction. If it requires human intervention and intelligence at each step, we might as well use a web browser.

I can sort of imagine how all this would work in theory, but I have trouble imagining this not being horribly resource-intensive (gotta make 10 requests before I figure out where I can POST), and very complicated to code against.
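
Here’s a sketch of what that traversal might look like as a Perl client. Every URI, label, and field is hypothetical; the point is the number of round trips, not the details.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    use JSON qw( decode_json );
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new;

    # Request 1: GET the root URI and hope it returns a resource
    # listing we understand.
    my $root = get_json('http://api.example.com/');

    # The client has to know, somehow, that the link labeled
    # "Search" (or with the right media type) is the one it wants.
    my ($search)
        = grep { $_->{label} eq 'Search' } @{ $root->{resources} };

    # Request 2: POST the query to the URI we just discovered. (A
    # real client would send a body in the x-netflix/movie-search
    # media type; plain form data keeps the sketch short.)
    my $response = $ua->post( $search->{uri}, { title => 'Casablanca' } );
    die $response->status_line unless $response->is_success;
    my $movies = decode_json( $response->decoded_content );

    # Requests 3..N: the listing only has links and summaries, so we
    # GET each movie's URI to learn anything more about it.
    for my $link ( @{ $movies->{links} } ) {
        my $movie = get_json( $link->{uri} );
        print "$movie->{title}\n";
    }

    sub get_json {
        my $uri = shift;

        my $res = $ua->get($uri);
        die $res->status_line unless $res->is_success;

        return decode_json( $res->decoded_content );
    }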

Worse, it makes casual use of the API much harder, since the docs basically would say something like this …

“Here’s all my media types. Here’s my root URI. Build a client capable of understanding all of these media types, then point it at the root URI and eventually the client will find the URI of the thing you’re interested in.”

Compare this with the Pseudo-REST API Fielding says is wrong, which says “here is how you get information on a single Person. GET a URI like this …”

Fielding’s REST basically rules out casual implementers and users, since you have to build a complete implementation of all the media types in advance. Compare this to the pseudo-REST API he points out. There, you can easily build a client which only handles a very small subset of the API’s URIs. Imagine if your client had to handle every URI properly before it could do anything!

In the comments in his blog, Fielding throws in something that really makes me wonder if REST is feasible. He says,

A truly RESTful API looks like hypertext. Every addressable unit of information carries an address, either explicitly (e.g., link and id attributes) or implicitly (e.g., derived from the media type definition and representation structure). Query results are represented by a list of links with summary information, not by arrays of object representations (query is not a substitute for identification of resources).

Look at the last sentence carefully. A “truly RESTful API”, in response to a search query, responds not with the information asked for, but with a list of links! So if I do a search for movies and get a hundred movies back, what I really get is summary information (title and short description, maybe) and a bunch of links. Then if I want to learn more about each movie, I have to request each of 100 different URIs separately!
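
In other words, a movie search would return something like this hypothetical structure, not the movies themselves:

    # A query result under Fielding's rule: links plus summary
    # information, not full movie representations (fields invented).
    my $results = {
        links => [
            {   uri     => 'http://api.example.com/movie/12345',
                title   => 'Some Movie',
                summary => 'A one-line description ...',
            },
            # ... 99 more links, each needing its own GET for details
        ],
    };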

It’s quite possible that I’ve completely misunderstood Fielding’s blog post, but I don’t think so, especially based on what he said in the comments.

I’m not going to argue that REST is something other than what Fielding says it is, because he’s the expert, but I’m not so sure I really want to create true REST APIs any more. Maybe from now on I’ll be creating “APIs which share some characteristics with REST but are not quite REST”.

I just got back from seeing The Magnetic Fields, and it was a great show. It got me thinking about the most memorable concerts I’ve seen over the years.

In no particular order …

  • Weird Al at Toad’s Place in New Haven, 1991 (or 1992). I know how deeply uncool it is to admit this, but I’ve seen Weird Al live, and it was great. I think this was the first rock concert I ever went to, in fact. Weird Al did a great live show, with all sorts of wacky costume changes, weird dances, and a generally kick-ass performance.
  • Most of the They Might Be Giants shows I’ve seen. I think they may be the second band I saw live, and I’ve seen them many times since.
  • The first time I saw Einsturzende Neubauten. I was amazed at how good the sound was for such a complicated set of instruments. I also appreciated the fact that it was loud, but not way too fucking loud, like many concerts I’ve been to.
  • Tokyo Incidents at the Kamakura Cultural Center. This may be the single best concert I’ve ever been to. This band is amazing, and the singer, Ringo Shiina, is one of the best singers I’ve ever heard. A lot of what she sang was quite vocally demanding, and she and the band nailed every note. Combine that with great sound and acoustics (yay for concert halls).
  • Seeing the Minnesota Orchestra perform Messiaen’s Turangalila, and a few years later Britten’s War Requiem. These are two of my all-time favorite pieces. I’ve also loved seeing George Crumb’s chamber works live. I saw Music for a Summer Evening (Makrokosmos III) and then years later Vox Balaenae, and both were amazing.
  • Seeing Low perform at Orchestra Hall. The acoustics of the hall worked incredibly well with their minimalist music. I’ve seen Low live many times, but I think this was my favorite, just because it sounded so good.

And finally, one dishonorable mention.

  • The Polyphonic Spree at The Fine Line. This wasn’t the band’s fault; I think they might have been doing a fine job. However, the sound was so amazingly loud that I couldn’t really hear any music, just a roar of noise from which I could sort of pick out musical sounds. This was a huge disappointment, because I’d been very excited to see them. I think some sound engineers are deaf, and they crank everything to 11. They need to be fired.

Programmers like to talk about scaling and performance. They talk about how they made things faster, how some app somewhere is hosted on some large number of machines, how they can parallelize some task, and so on. They particularly like to talk about techniques used by monster sites like Yahoo, Twitter, Flickr, etc. Things like federation, sharding, and so on come up regularly, along with talk of MogileFS, memcached, and job queues.

This is a lot like gun collectors talking about the relative penetration and stopping power of their guns. It’s fun for them, and there’s some dick-wagging involved, but it doesn’t come into practice all that much.

Most programmers are working on projects where scaling and speed just aren’t all that important. It’s probably a webapp with a database backend, and they’re never going to hit the point where any “standard” component becomes an insoluble bottleneck. As long as the app responds “fast enough”, it’s fine. You’ll never need to handle thousands of requests per minute.

The thing that developers usually like to finger as the scaling problem is the database, but fixing this is simple.

If the database is too slow, you throw some more hardware at it. Do some profiling and pick a combination of more CPU cores, more memory, and faster disks. Until you have to have more than 8 CPUs, 16GB RAM, and a RAID5 (6? 10?) array of 15,000 RPM disks, your only database scaling decision will be “what new system should I move my DBMS to”. If you have enough money, you can just buy that thing up front.

Even before you get to the hardware limit, you can do intelligent things like profiling and then caching the results of just a few queries, which often gets you a massive win.
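
As a sketch (the module choice, query, key, and TTL are all invented for illustration), caching one hot query with memcached might look like this:

    use Cache::Memcached;

    my $memd = Cache::Memcached->new( { servers => ['127.0.0.1:11211'] } );

    # Cache one expensive, frequently-run query for five minutes
    # rather than hitting the database on every request.
    sub top_sellers {
        my $dbh = shift;

        my $rows = $memd->get('top_sellers');
        return $rows if $rows;

        $rows = $dbh->selectall_arrayref(
            'SELECT title, copies_sold FROM books
             ORDER BY copies_sold DESC LIMIT 10'
        );

        $memd->set( 'top_sellers', $rows, 300 );

        return $rows;
    }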

If your app is using too much CPU on one machine, you just throw some more app servers at it and use some sort of simple load balancing system. Only the most short-sighted or clueless developers build apps that can’t scale beyond a single app server (I’m looking at you, you know who).

All three of these strategies are well-known and quite simple, and thus are no fun, because they earn no bragging rights. However, most apps will never need more than this. A combination of hardware upgrades, simple horizontal app server scaling, and profiling and caching is enough.

This comes back to people fretting about the cost of using things like DateTime or Moose.

I’ll be the first to admit that DateTime is the slowest date module on CPAN. It’s also the most useful and correct. Unless you’re making thousands of objects with it in a single request, please stop telling me it’s slow. If you are making thousands of objects, patches are welcome!
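
If you want to know what DateTime actually costs you, measure it rather than guessing. A quick sketch using the core Benchmark module:

    use Benchmark qw( cmpthese );
    use DateTime;

    # Compare constructing a DateTime against a plain localtime()
    # call. The -2 means "run each sub for at least 2 CPU seconds";
    # cmpthese() prints a table of rates and relative speeds.
    cmpthese(
        -2,
        {   datetime  => sub { my $dt = DateTime->now },
            localtime => sub { my @lt = localtime },
        }
    );

DateTime->now will lose badly on a per-call basis, but unless those calls happen thousands of times per request, the difference disappears into the noise.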

But really, outside your delusions of application grandeur, does it really matter? Are you really going to be getting millions of requests per day? Or is it more like a few thousand?

There’s a whole lot of sites and webapps that only need to support a couple hundred or thousand users. You’re probably working on one of them ;)