Tuesday, February 23, 2010

The Evolution of the CPAN



In the last week something interesting has happened: The venerable CPAN client has gained a new sibling. It's called "cpanminus" and Tatsuhiko Miyagawa recently tweeted about writing it to solve an issue he was having, but warned that he did not believe it to be a good solution at all. However, it sparked enough interest that he decided to release the code to the CPAN and more people are beginning to think it's not such a bad idea after all!

cpanminus is not the first alternative CPAN client, and it's too new to know if it will gain enough adoption to be considered successful, but I'm beginning to see signs that people like it and want to use it. The only other CPAN client to achieve widespread adoption has been CPANPLUS, which took many years and a huge amount of effort - and still not many folks use it by default, even though it's now bundled with perl.

That got me thinking about the history of the CPAN, the services surrounding it, and the ideas and needs that have driven the evolution of the many little pieces that make up what we colloquially call "CPAN."

Now, what do I mean by that? Let me explain a bit about the CPAN first. CPAN is an acronym for "Comprehensive Perl Archive Network" and it's the primary means by which users of Perl obtain additional modules, frameworks, language extensions, libraries, and even whole applications. When people say "the CPAN" what they are usually referring to is this massive online network of mirrors, and probably the supporting websites and services, like http://search.cpan.org.

Please note: I will henceforth refer to these bundles of software of any kind or category on the CPAN as "dists," short for "distributions." It's analogous to a package on *nix systems and technically can contain code, data, documentation, whatever, so "dist" is the easiest way to say what I mean and it's the terminology used across the CPAN itself.

Over the years, additional services that leverage the CPAN have sprung up, each adding information and value to all the software stored there, making life better for users and authors, and all without the need for them to change or update anything!

Of my favorites from this constellation of services, let me point out http://cpantesters.org in particular. It seriously rocks the casbah. If you're evaluating a dist on http://search.cpan.org and you do not visit the links to http://cpantesters.org and its subdomains, you are doing yourself a huge disservice!

Such a repository of free software is a terrific resource on its own, but it's made even more powerful by a client-side library and script, both by the same name, that are distributed with perl itself! When you hear somebody say "just use cpan" or "install it with cpan" what they're referring to is the cpan command bundled with perl. The CPAN library and cpan script make installing these resources and their dependencies pretty much bone-easy.
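If you've never seen it in action, installing a dist and all of its dependencies is a one-liner (the module name below is just a placeholder), and there's an interactive shell as well:

cpan Some::Module
perl -MCPAN -e shell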

However, it's not always just get-up-and-go. The default CPAN client must support a dizzying array of operating systems, perl configurations, myriad installation quirks, and a near-infinite combination of these and other variables that could affect its ability to operate. In addition, this venerable piece of software has been around since 1996; a LOT has changed since then, and as software ages it usually becomes harder and harder to keep up-to-date. Even today, with all the techniques for probing a system to determine the necessary info, the old CPAN.pm could do a lot better. It could be lighter, faster, easier to extend and maintain. It could be more automated and automatable. The Perl community constantly dreams up features, but even trying them out is prohibitively difficult due to the original design and the aforementioned factors of complexity.

Back in 2001, Michael Schwern was promoting an idea to create a CPAN Testing Service (CPANTS) that would automatically test and analyze everything on the CPAN across a matrix of platforms and configurations. This would require a highly automated CPAN client, plus extensions for capturing the output of the build process, probing for a bazillion things, and sending back the results... The first attempts to do this with the existing CPAN.pm were quickly abandoned, and thus CPANPLUS was born.

CPANPLUS has been evolving over the years, and while not entirely backwards-compatible with CPAN, it has reached the point where it is trusted and used by enough of the Perl community that it can be considered successful. It has a modular, object-oriented architecture, numerous plugins and additions, and is widely considered superior to the old CPAN client in just about every way. However, CPANPLUS was not even included with perl itself until recently, with the release of perl 5.10.0! Still, CPAN.pm remains the default when you type the command "cpan" on your system, and CPANPLUS is only used if you run "cpanp". This will likely remain the case for the foreseeable future, in the interests of compatibility and predictability.
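For the curious, the command-line experience is similar to the old client's; as far as I know, these invocations work (the module name is a placeholder):

cpanp                  # interactive CPANPLUS shell
cpanp -i Some::Module  # non-interactive install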

So... now we have a somewhat new, improved cpan client available to the masses, and it's considered a big advancement over the old. So why the sudden excitement over another CPAN client? Simple: CPANPLUS, despite being newer, is still pretty darn old and has its own cruft and limitations. In addition, both the old CPAN and CPANPLUS made some design decisions that have led to memory use that grows as a function of the size of the CPAN itself, which continues to grow rapidly year after year! Using either one on a system with limited RAM can quickly exhaust your free memory, and then you're back to installing things by hand. Also, there's a lot of code to handle corner cases and relatively uncommon environments, not to mention the need to support versions of perl back to 5.005 (and perhaps earlier!).

There has definitely been a need for a lightweight, simpler CPAN client, even if it only works for 80% of users, or only 80% of the dists on the CPAN. When you can't use CPAN or CPANPLUS, a tool that covers those cases could still save you at least 80% of the time and effort it would take to proceed manually!

Now that Miyagawa++ has written cpanminus, the community has leaped on it, and it is being put through the wringer, tested on all sorts of installation scenarios as we speak. I'm relatively new to the Perl community, and I'm sure this is nothing new to the battle-scarred old gurus, but I'm jazzed and impressed at the sheer speed at which I am seeing this new tool develop. It's like witnessing the birth of a star - will it reach critical mass and support its own ecosystem, or will it fizzle out as a failed experiment of the Perl Universe?

So, where to go from here? What's in the future?

First, features: cpanminus has already gained plugin support and a bevy of plugins for features that even CPANPLUS lacks!

Next, testing: We'll probably soon see support for sending reports à la CPAN::Reporter, and with that find out where cpanminus does and does not work properly. Folks can try to install the Phalanx 100, and even Bundle::Everything, and report back dists that do not build or that fail tests under cpanminus.

Finally, compatibility: How much of the CPAN can you install with it? On how many platforms? Can you use local::lib? Module::Build? Module::Install? Module::AutoInstall? So far, things are actually looking pretty good...
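In the meantime, trying it out is about as frictionless as it gets; if I'm reading the docs right, the script itself has no dependencies beyond perl, and basic usage looks like this (the module name is a placeholder):

cpanm Some::Module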

I will certainly be watching, even if only as an observer, but who knows... maybe this is my chance to get in on the beginning phases of something big. Either way, I know I am learning a lot about what makes open source fun and exciting. It's especially great because, though Perl has "grown up" and been stable for many years, little things like this show me that there's plenty of youthful zeal and vigor left out there!

Friday, December 11, 2009

How I manage my Perl modules on Debian

I started doing this on another server this morning and realized I can't find my notes on the process! Since I like to record things like this, I started typing and decided I'd turn it into a blog post...

Anyhow, I think I've developed a good enough set of rules and steps for managing perl modules on a Debian-based system that I feel comfortable sharing them with the public. If I'm lucky I will get some useful feedback and critique that will help me further improve!

The basic premise is this: Try as hard as possible to avoid installing modules from CPAN into the system's perl.

I do this by following these rules-of-thumb:
  • only install system-wide modules from debian packages (excepting local::lib)
  • use local::lib as much as possible
  • properly configure CPAN both system-wide and per-user
So, here's how I set up a fresh system to make following these rules as easy as possible...

I'll assume you're starting from a fresh, bare-bones installation of Debian Lenny. This should work with previous releases too, but YMMV.

Step 1: Install necessary debian packages to make the cpan client happy.
# Stuff to install for a happy CPAN client:
sudo aptitude install \
  ftp \
  tar \
  curl \
  gzip \
  less \
  lynx \
  wget \
  bzip2 \
  gnupg \
  ncftp \
  patch \
  unzip \
  makepatch \
  libwww-perl \
  libyaml-perl \
  libexpect-perl \
  build-essential \
  libyaml-syck-perl \
  libmodule-signature-perl
Step 2: (optional) Throw your favorite default settings into the system's CPAN config file.

cat <<'END_TXT' | sudo tee /etc/perl/CPAN/Config.pm >/dev/null
$CPAN::Config = {
    'build_requires_install_policy' => q[ask/yes],
    'check_sigs'           => q[0],
    'build_dir_reuse'      => q[0],
    'prefer_installer'     => q[MB],
    'prerequisites_policy' => q[ask],
    'inactivity_timeout'   => q[300],
    'build_cache'          => q[250],
    'build_dir'         => qq[$ENV{HOME}/.cpan/build],
    'cpan_home'         => qq[$ENV{HOME}/.cpan],
    'histfile'          => qq[$ENV{HOME}/.cpan/histfile],
    'keep_source_where' => qq[$ENV{HOME}/.cpan/sources],
    'prefs_dir'         => qq[$ENV{HOME}/.cpan/prefs],
    'urllist'           => [
        q[http://cpan.mirror.facebook.net/],
        q[http://cpan.yahoo.com/],
        q[http://www.perl.com/CPAN/],
        q[ftp://mirrors2.kernel.org/pub/CPAN/]
    ],
};
1;
END_TXT

Step 3: Initialize the system CPAN config; just agree to auto-config so it can fill in what's missing.
sudo -H perl -MCPAN -e 'CPAN::Shell->o(qw[conf init])'
Step 4: (optional if you did step 2) Choose and save your default CPAN mirrors.
sudo -H perl -MCPAN -e 'CPAN::Shell->o(qw[conf init urllist]);CPAN::Shell->o(qw[conf commit])';
Step 5: Install local::lib to the system perl.
sudo -H cpan local::lib
Step 6: (optional) Enable local::lib by default for all users
sudo sh -c 'echo eval \$\(perl -Mlocal::lib\) >> /etc/profile'
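(For reference, perl -Mlocal::lib prints shell code that sets up the per-user environment; on the versions I've used, the output looks roughly like this, with paths varying per user and possibly per local::lib release:)

export MODULEBUILDRC="/home/someuser/perl5/.modulebuildrc"
export PERL_MM_OPT="INSTALL_BASE=/home/someuser/perl5"
export PERL5LIB="/home/someuser/perl5/lib/perl5:/home/someuser/perl5/lib/perl5/i486-linux-gnu-thread-multi"
export PATH="/home/someuser/perl5/bin:$PATH"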
Step 7: (optional) Give new users a convenient default config, for example:
sudo mkdir -p /etc/skel/.cpan/CPAN
cat <<'END_TXT' | sudo tee /etc/skel/.cpan/CPAN/MyConfig.pm >/dev/null
$CPAN::Config = {
    'prerequisites_policy' => q[follow],
    'tar_verbosity'        => q[none],
    'build_cache'          => q[100],
    'build_dir'         => qq[$ENV{HOME}/.cpan/build],
    'cpan_home'         => qq[$ENV{HOME}/.cpan],
    'histfile'          => qq[$ENV{HOME}/.cpan/histfile],
    'keep_source_where' => qq[$ENV{HOME}/.cpan/sources],
    'prefs_dir'         => qq[$ENV{HOME}/.cpan/prefs],
    # you may not need these, I think a user's CPAN.pm
    # will pick them up from the system CPAN Config.pm
    'urllist' => [
        q[http://cpan.mirror.facebook.net/],
        q[http://cpan.yahoo.com/],
        q[http://www.perl.com/CPAN/],
        q[ftp://mirrors2.kernel.org/pub/CPAN/]
    ],
};
1;
END_TXT

Anyhow, that's a start for now. It sounds like a lot of work now that I've typed it all out, but I really feel like it saves me time and trouble later!

Thursday, October 15, 2009

Following the Wave

This is in response to a post by geoffeg: http://www.geoffeg.org/wordpress/2009/09/30/why-i-am-not-yet-convinced-about-google-wave/

I've just started looking at Google Wave myself. I'm also skeptical but the ideas behind it just may hold some promise.

I agree that email needs to be fixed... or, more like, completely replaced. Maybe Google is a big enough entity to make it happen, but I doubt it, especially if it requires complicated server- and client-side components...

My (limited, so far) look at Google Wave gives me a bit of hope, but only if it can give rise to software people want.

I personally think the only thing that could best email would have to provide everything Microsoft Exchange/Office/Sharepoint/Etc. provide (or claim to) but do more and do it better, and have available some really stellar clients/interfaces.

I'm talking person-to-person messaging, group discussions, shared calendars and scheduling, document sharing and collaboration, linking between various entities, and also end-to-end authentication and authorization, all with as little centralization as email, or nearly so...

It's a pretty tall order.

Google Wave seems like it could be capable of many of these things, but since I haven't really used it yet I can't say what it's got already. What does it take to run a server? How much work does a client have to do? Is the protocol completely open? Is the API any good? How does it scale in various use cases? How easy is it for users to get done what they want to do?

If it is easy to do things... how do users keep from getting buried in information like we all are with what we have now?

And the big question - will anybody actually write Wave software that's compelling enough to convince users everywhere that they should use it instead of the trusty standby of email?

Lots of questions, and few answers, but true to my geeky self, I still get excited over the possibilities shiny new systems bring.

Friday, October 9, 2009

Abstractions and Usability: When Practicality trumps Sophistication, both sides can win.

Something I think about a lot is abstraction. There's a lot of abstraction in programming, and sometimes it's healthy, a sign of good design and thinking.

But more often than not, I find the layers of abstraction become confusing, limiting, and frustrating.

Some recent discussion with a friend with whom I have been coding made me think some more about just how easily a powerful, flexible, ingenious set of abstractions can lead to software that is unusable by its intended audience.

I am hoping we go with a simpler, more consistent architecture that uses small, easily-understood concepts. It may be harder to say it's "enterprise grade" without the fancier ideas, but I think it will be both easier to build and easier to use.

This isn't a Perl-specific problem, of course, but we're in the design stage of a Perl application that will need to be customizable in ways we will probably never imagine!

No matter what architecture we end up implementing in order to provide a product, if it isn't immediately useful and easily understandable by the target user audience, it cannot succeed.

That makes me think about what makes a language successful... Perl is a great example of a language that shares the ideals I want in this application! It should:
  • be easy to get started for simple uses
  • be easy to transform simple uses into more complex uses
  • provide functionality that users will need to easily get common stuff done
  • provide functionality that users will need to customize generic tasks
  • allow users to easily add custom functionality
  • allow users to share and re-use custom functionality
  • be flexible enough to allow for different ways of thinking about solving problems - whatever the user finds best for *them*
  • make possible the creation of sophisticated systems - that can then easily become part of other systems.
I really see these goals as very Perlish in nature: The people who will use this software do not want to babysit a system, nor do they want to become gurus on its internals. They do not want to care about rule-sets and mini-languages and asynchronous message-passing data-flow systems...

They just want a system that is immediately useful and easy in simple cases, but that allows and encourages them to grow it into something much more powerful over time, without forcing upon them layers of new abstractions or solutions that feel unnatural and inflexible.

Perhaps I'll write more about this specific application soon... I've been reading The Mythical Man Month and it's lit a fire under my rear on actually making this thing a reality... I now want to start not with code (of which I've already written some) but with a design document detailing how an end user would see and use this system in the most common cases.

I believe that software should always be written for the users, and making the program (or language) easy to use yet still powerful and flexible is the highest goal for which a programmer can strive.

Tuesday, September 29, 2009

Crazy ideas

I'm certain that I will re-use the title of this post again, someday.

Something I've devoted a lot of brane-cycles to is the idea of a real CLEC-grade monitoring system. Crazy? Yes. But I've used all the real contenders, and many of the non-contenders, and they all fall short in one way or another. I accept that this sort of thing is not easy, but I humbly ask you to humor me while I describe this particular idea.

The system I envision is highly modular, and distributed, but still requires one (or more) "core servers" where the monitoring information is aggregated so that other components can find and use relevant information as it arrives. Well, I had an epiphany the other day and followed it up with some research today and found something very encouraging...

This part of the system will require, for lack of a better term, a "stream" of data, which will need to support hundreds (possibly thousands) of attached filters. Each filter would be an expression registered by a processing node to identify the records that node is interested in. None of the filters modify the data in any way; they simply pass a copy of matching records on to the nodes, which could be operating in any number of interesting ways.
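To make that concrete, here's a minimal sketch of the registration-and-dispatch core I have in mind; nothing here is implemented yet, and every name is hypothetical:

#!/usr/bin/perl
use strict;
use warnings;

my @filters;

# Each filter pairs a node with a predicate over records.
sub register_filter {
    my ($node, $predicate) = @_;
    push @filters, { node => $node, match => $predicate };
}

# Filters never modify a record; each matching node gets its own copy.
sub dispatch_record {
    my ($record) = @_;
    for my $f (@filters) {
        $f->{node}->enqueue( { %$record } ) if $f->{match}->($record);
    }
}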

So, poking around the CPAN (as usual) yielded a number of insights:
  1. This has been done before. (good!)
  2. This has been done *many* different ways. (also good)
  3. It does not seem that this has been done with my particular scalability and usage needs. (uh-oh?)
  4. This type of system can be made to scale with relative ease. (yaay!)
Of the modules/dists I looked at while coming to these conclusions, the most interesting to me is Boulder: its data format, "Stone", is a perfect match for the data that will comprise the records in the stream. I don't need all the whiz-bang features and OO and stuff, but it's encouraging to me that my design of this component is beginning to converge with that of something as successful and data-intensive as Boulder.

The challenge will be keeping latency down as much as possible, but I have ideas for doing that through caching, algorithmic optimizations, and evaluating filters in parallel... only if necessary, of course!

Friday, September 25, 2009

Do text editors *have* to suck?

Something I was thinking about this past week - the tools we use as programmers. Specifically, I was thinking about the all-important text editor.

When I started out programming, I would just use whatever was around. In DOS, it was EDIT.EXE. In Windows, I began with Notepad then proceeded to try a million different other things until settling on Notepad++ most of the time. I've used Eclipse quite a bit, but that's so much more than an editor, and while I liked it I found it far too heavy for my usual day-to-day work.

In parallel with my progression in M$-land, I've led a second life extensively using Linux and Solaris (and other *nix-en). When I started in 1996 or so, I think my primary text editor was pico... which I found after being baffled by vi, finding ed useless, and failing completely to figure out emacs. I eventually found joe and was happy for a year or two, until college.

In college we received cheat-sheets for vim and emacs, and both were still quite intimidating to me, so I continued to use joe, even compiling it in my home directory on systems where it was unavailable. However, around this time I found myself beginning to hit joe's limitations and realized I had to bite the bullet.

I was still afraid of emacs, so I've been a vim user ever since.

So, here I am, ten years later, still using vim, and here's the part I hate: I've never mastered it.

I have memory-mapped all the common keystrokes for moving around a file, searching, replacing, mark, cut, paste, etc... but for anything else I need to consult a cheat sheet. This feels like a failure on my part...

There are features I see emacs users have that I want *desperately*, yet they feel out of reach in vim. Sure, vim can do some of them - split-pane editing, multiple files open at once, cutting and pasting between them, macros, shells, compiler/interpreter/debugger integration - but all of this seems to never work the way I want it to.

I've ignored the elephant (gnu?) in the room for all this time: emacs. So, in the last year I made up my mind that I *must* learn to use emacs, at least as well as I do vim, and this week I did all my non-work hacking in emacs.

You know what I'm beginning to believe?

EVERY TEXT EDITOR SUCKS IN ONE WAY OR ANOTHER.

Yes, emacs is ultimately customizable. In fact, I've spent more time this week writing elisp than Perl, and yet I'm still dissatisfied. Countless searches on Google have shown me that most people eventually give up fighting its quirks and just settle.

I have to admit, I'm beginning to get comfortable with all the keystroke combinations and the macros, and I'm slowly figuring out how to customize various things to my liking...

And of course, there's one more big problem: vim (or at least vi) is installed *everywhere* and I spend most of my time working on remote servers. Sure, I have root on most of them, but that's no reason to start installing things just because I feel like it.

Also, emacs seems to be quite a different beast running in an xterm over ssh, whereas vim remains the same.

I'm torn and frustrated. I need the best of both worlds and I need to figure out how to get them before I begin an epic yak shaving expedition that will ultimately suck my life away.

Maybe I'm just writing this to vent. Yes, that's probably it. Maybe in another few weeks I'll be singing the praises of emacs. Maybe I'll return to the relative comfort of vim.

If anybody reading this has tips for a reasonably-intelligent Perl hacker who just wants to get more out of his editor without moving heaven and earth and re-mapping every key-stroke to custom macros, please, please help!!!

Wednesday, September 16, 2009

CPAN.pm, Improved

I was just reading a blog post from szabgab and rather than reply directly, I decided I should finally start making posts of my own.

The topic he raises is "CPAN client for beginners" and it's one I've thought about a lot, though not necessarily in the same way.

I've been comfortable with the CPAN shell for a long time, and I know enough about the internals that I have written my own wrappers and automations to drive module retrieval and installation.

My opinion of CPAN.pm is that it's a classic case of "It works, so don't fix it" for the Perl community at large... but I think this is a bit of a fallacy.

I'm not particularly interested in a "beginner friendly" CPAN client. I'm far more interested in a *smarter* CPAN client.

Most of the questions the cpan shell asks me when I use it can be, and are, auto-detected properly... except when I'm not root, and except when I'm running under sudo. Both scenarios seem to me to be pretty easily fixable.
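For instance, the sudo case could be sniffed out with a couple of lines like these (just a sketch; $> is the effective UID, and SUDO_USER is set by sudo itself):

if ( $> == 0 and $ENV{SUDO_USER} ) {
    # running under sudo: fine to install site-wide, but config and
    # history files should land in the invoking user's home, not root's
}
elsif ( $> != 0 ) {
    # plain user: default to a local::lib-style install automatically
}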

Another place where CPAN.pm seems to fall short: picking mirrors. This can also be done automatically, based on latency and bandwidth. I think by default it goes through the redirector, so it's probably not a big deal, but even after "auto configuring" it keeps asking me to choose mirrors. Yes, I have a preferred set, and I can easily configure them, but remember, I'm *lazy*.
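Even a crude probe would beat a menu. A sketch with LWP and Time::HiRes (the mirror URLs are just examples):

use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(gettimeofday tv_interval);

my @mirrors = (
    'http://www.perl.com/CPAN/',
    'http://cpan.mirror.facebook.net/',
);

my $ua = LWP::UserAgent->new( timeout => 10 );
my %latency;
for my $url (@mirrors) {
    my $t0  = [gettimeofday];
    my $res = $ua->head($url);
    $latency{$url} = tv_interval($t0) if $res->is_success;
}

# fastest responders first; feed this straight into urllist
my @urllist = sort { $latency{$a} <=> $latency{$b} } keys %latency;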

The third thing I would change about CPAN.pm is how it keeps track of installed files. Other package managers can determine if an installed file has been unexpectedly changed - why not CPAN.pm? Other package managers can tell you from which package (and which version) a file was installed - why not CPAN.pm? (in regards to the CPAN, package == distribution)
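The raw material is already there: every install writes a .packlist, so verifying files could start as simply as this sketch (the path is an example; real code would find packlists via @INC):

use strict;
use warnings;
use ExtUtils::Packlist;
use Digest::MD5;

my $pl = ExtUtils::Packlist->new(
    '/usr/lib/perl5/auto/Some/Module/.packlist'    # example path
);

for my $file ( keys %$pl ) {
    open my $fh, '<', $file or next;
    binmode $fh;
    # record these at install time, compare on demand later
    print Digest::MD5->new->addfile($fh)->hexdigest, "  $file\n";
}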

The fourth change, which would be relatively simple if the third were implemented, would be sane uninstall. When upgrading a distribution, the old version's files should be removed! If an author removes a file from a new version, *it should not remain after I upgrade to that version!*
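Given the old and new versions' .packlists, the cleanup is just a set difference, something like (paths are examples):

use strict;
use warnings;
use ExtUtils::Packlist;

my $old = ExtUtils::Packlist->new('/tmp/old/.packlist');
my $new = ExtUtils::Packlist->new('/tmp/new/.packlist');

for my $file ( keys %$old ) {
    next if exists $new->{$file};    # still shipped; keep it
    unlink $file or warn "could not remove $file: $!";
}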

Lastly, and this is probably the most difficult feature I want: the ability to determine *which* distributions (and their versions) are currently installed, given an @INC. This would require additional metadata available from the CPAN itself, a sort of super-manifest that the client can download and use to validate and match every file in the include path. I don't think this is something that can be done by any other package management system, but it's nice to dream. :-)
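For what it's worth, ExtUtils::Installed already gets partway there for anything that left a .packlist behind; the gap is everything that didn't:

use strict;
use warnings;
use ExtUtils::Installed;

my $inst = ExtUtils::Installed->new;
for my $module ( $inst->modules ) {
    # only covers dists that wrote a .packlist into @INC
    printf "%s %s\n", $module, $inst->version($module) || 'unknown';
}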

The other things that I would want, well, a lot of that would depend on support in the various build systems and so is probably the topic for another post. Things that would be helpful for newbies are nice, too... but I think a lot of those have already been implemented or solved. Suppressing needless output, a GUI, reporting failed installs to the authors, etc... all either currently have solutions that just need to be integrated or would be fixed with better, friendlier documentation.