Tuesday, February 23, 2010

The Evolution of the CPAN



In the last week something interesting has happened: The venerable CPAN client has gained a new sibling. It's called "cpanminus" and Tatsuhiko Miyagawa recently tweeted about writing it to solve an issue he was having, but warned that he did not believe it to be a good solution at all. However, it sparked enough interest that he decided to release the code to the CPAN and more people are beginning to think it's not such a bad idea after all!

cpanminus is not the first alternative CPAN client, and it's too new to know if it will gain enough adoption to be considered successful, but I'm beginning to see signs that people like it and want to use it. The only other CPAN client to achieve widespread adoption has been CPANPLUS, which took many years and a huge amount of effort - and still not many folks use it by default, even though it's now bundled with perl.

That got me thinking about the history of the CPAN, the services surrounding it, and the ideas and needs that have driven the evolution of the many little pieces that make up what we colloquially call "CPAN."

Now, what do I mean by that? Let me explain a bit about the CPAN first. CPAN is an acronym for "Comprehensive Perl Archive Network" and it's the primary means by which users of Perl obtain additional modules, frameworks, language extensions, libraries and even whole applications. When people say "the CPAN" what they are usually referring to is this massive online network of mirrors, and probably the supporting websites and services, like http://search.cpan.org.

Please note: I will henceforth refer to these bundles of software of any kind or category on the CPAN as "dists," short for "distributions." It's analogous to a package on *nix systems and technically can contain code, data, documentation, whatever, so "dist" is the easiest way to say what I mean and it's the terminology used across the CPAN itself.

Over the years, additional services that leverage the CPAN have sprung up, each adding information and value to all the software stored there, making life better for users and authors, and all without the need for them to change or update anything!

Some of my favorites from this constellation of services are:

Let me just point out http://cpantesters.org again. It seriously rocks the casbah. If you're evaluating a dist on http://search.cpan.org and you do not visit the links to http://cpantesters.org and its subdomains, you are doing yourself a huge disservice!

Such a repository of free software is a terrific resource on its own, but it's made even more powerful by a client-side library and a script, both sharing the same name, that are distributed with perl itself! When you hear somebody say "just use cpan" or "install it with cpan" what they're referring to is the cpan command bundled with perl. The CPAN library and cpan script make installing these resources and their dependencies pretty much bone-easy.

However, it's not always just get-up-and-go. The default CPAN client must support a dizzying array of operating systems, perl configurations, myriad installation quirks, and a near-infinite combination of these and other variables that could affect its ability to operate. In addition, this venerable piece of software has been around since 1996, and we all know a LOT has changed since then. As software ages it usually becomes harder and harder to keep up-to-date. Even today, with all the techniques for probing a system to determine the necessary info, the old CPAN.pm could do a lot better. It could be lighter, faster, easier to extend and maintain. It could be more automated and automatable. There are features the Perl community constantly dreams up, but even trying them out is prohibitively difficult due to the original design and the aforementioned factors of complexity.

Back in 2001, Michael Schwern was promoting an idea to create a CPAN Testing service (CPANTS) that would automatically test and analyze everything on the CPAN across a matrix of platforms and configurations. This would require a highly automated CPAN client, plus extensions for capturing the output of the build process, probing for a bazillion things and sending back the results... The first attempts to do this with the existing CPAN.pm were quickly abandoned, and thus was born CPANPLUS.

CPANPLUS has been evolving over the years and, while not entirely backwards-compatible with CPAN, it has eventually reached the point where it is trusted and used by enough of the Perl community that it can be considered successful. It has a modular, object-oriented architecture, numerous plugins and additions, and is widely considered to be superior to the old CPAN client in just about every way. However, CPANPLUS was not even included with perl itself until recently, with the release of perl 5.10.0! Still, CPAN.pm remains the default when you type in the command "cpan" on your system, and CPANPLUS is only used if you type the command "cpanp". This will likely remain the case for the foreseeable future in the interests of compatibility and predictability.

So... now we have a somewhat new, improved CPAN client available to the masses, and it's considered a big advancement over the old. So why the sudden excitement over another CPAN client? Simple: CPANPLUS, despite being newer, is still pretty darn old and it's now got its own cruft and limitations. In addition, both the old CPAN and CPANPLUS made some design decisions that have led to higher memory use as a function of the size of the CPAN itself, which continues to grow rapidly year after year! Using either one on a system with limited RAM can quickly exhaust your free memory, and then you're back to installing things by hand. Also, there's a lot of code to handle corner cases and relatively uncommon environments, not to mention the need for supporting versions of perl back to 5.005 (and perhaps earlier!).
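To make the memory point a little more concrete, here is a toy sketch (in Python rather than Perl, and emphatically not how CPAN.pm or CPANPLUS are actually written). It assumes a made-up index of "module version path" lines; the module names, paths, and the function names load_whole_index and find_one are all hypothetical. The first function keeps every entry in memory, so its footprint grows with the size of the index. The second scans for a single module and keeps only the match.

```python
import io

# Hypothetical index: one "module version dist-path" entry per line.
# These entries are invented for illustration only.
SAMPLE_INDEX = """\
Foo::Bar 1.02 A/AU/AUTHOR/Foo-Bar-1.02.tar.gz
Foo::Baz 0.10 A/AU/AUTHOR/Foo-Bar-1.02.tar.gz
Quux 2.00 O/OT/OTHER/Quux-2.00.tar.gz
"""


def load_whole_index(stream):
    """Eager approach: hold every entry in a dict.

    Memory use grows with the number of lines in the index.
    """
    index = {}
    for line in stream:
        module, version, path = line.split()
        index[module] = (version, path)
    return index


def find_one(stream, wanted):
    """Lazy approach: scan line by line and keep only the match.

    Memory use stays roughly constant however large the index gets.
    """
    for line in stream:
        module, version, path = line.split()
        if module == wanted:
            return version, path
    return None


eager = load_whole_index(io.StringIO(SAMPLE_INDEX))
lazy = find_one(io.StringIO(SAMPLE_INDEX), "Quux")
```

The eager version answers repeated lookups quickly once loaded, which is a reasonable trade when RAM is plentiful; the lazy version gives that up in exchange for a small, predictable footprint.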

There has definitely been a need for a lightweight, simpler CPAN client, even if it only works for 80% of the users, or only 80% of the dists on the CPAN. When you can't use CPAN or CPANPLUS, that could be saving you at least 80% of the time and effort it might take to proceed manually!

Now that Miyagawa++ has written cpanminus, the community has leaped on it and it is being put through the wringer, tested on all sorts of installation scenarios as we speak. I'm relatively new to the Perl community, and I'm sure this is nothing new to the battle-scarred old gurus, but I'm jazzed and impressed at the sheer speed at which I am seeing this new tool develop. It's like witnessing the birth of a star - will it reach critical mass and support its own ecosystem, or will it fizzle out as a failed experiment of the Perl Universe?

So, where to go from here? What's in the future?

For example, features: cpanminus has already gotten plugin support and a bevy of plugins for features that even CPANPLUS lacks!

Next, testing: We'll probably soon see support for sending reports à la CPAN::Reporter, and with that find out where cpanminus does and does not work properly. Folks can try to install the Phalanx 100, and even Bundle::Everything, and report back dists that do not build or that fail tests under cpanminus.

Finally, compatibility: How much of the CPAN can you install with it? On how many platforms? Can you use local::lib? Module::Build? Module::Install? Module::AutoInstall? So far, things are actually looking pretty good...

I will certainly be watching, even if it's only as an observer, but who knows... maybe this is my chance to get in on the beginning phases of something big. Either way, I know I am learning a lot about what makes open source fun and exciting. It's especially great because, though Perl has "grown up" and been stable for many years, little things like this show me that there's plenty of youthful zeal and vigor left out there!