Friday, December 11, 2009

How I manage my Perl modules on Debian

I started doing this on another server this morning, only to realize I can't find my notes on the process! Since I like to record things like this, I started typing and decided to turn it into a blog post...

Anyhow, I think I've developed a good enough set of rules and steps for managing Perl modules on a Debian-based system that I feel comfortable sharing them with the public. If I'm lucky I will get some useful feedback and critique that will help me improve further!

The basic premise is this: try as hard as possible to avoid installing modules from CPAN into the system perl.

I do this by following these rules-of-thumb:
  • only install system-wide modules from Debian packages (excepting local::lib)
  • use local::lib as much as possible
  • properly configure CPAN both system-wide and per-user
So, here's how I set up a fresh system to make following these rules as easy as possible...

Assume you're starting from a fresh, bare-bones installation of Debian Lenny. This should work with previous releases, but YMMV.

Step 1: Install the Debian packages needed to make the CPAN client happy.
# Stuff to install for a happy CPAN client:
sudo aptitude install \
  ftp \
  tar \
  curl \
  gzip \
  less \
  lynx \
  wget \
  bzip2 \
  gnupg \
  ncftp \
  patch \
  unzip \
  makepatch \
  libwww-perl \
  libyaml-perl \
  libexpect-perl \
  build-essential \
  libyaml-syck-perl \
  libmodule-signature-perl
Step 2: (optional) Throw your favorite default settings into the system's CPAN config file. Note the sudo tee trick, since /etc/perl isn't writable by mere mortals:
sudo tee /etc/perl/CPAN/Config.pm >/dev/null <<'END_TXT'
$CPAN::Config = {
    'build_requires_install_policy' => q[ask/yes],
    'check_sigs'           => q[0],
    'build_dir_reuse'      => q[0],
    'prefer_installer'     => q[MB],
    'prerequisites_policy' => q[ask],
    'inactivity_timeout'   => q[300],
    'build_cache'          => q[250],
    'build_dir'         => qq[$ENV{HOME}/.cpan/build],
    'cpan_home'         => qq[$ENV{HOME}/.cpan],
    'histfile'          => qq[$ENV{HOME}/.cpan/histfile],
    'keep_source_where' => qq[$ENV{HOME}/.cpan/sources],
    'prefs_dir'         => qq[$ENV{HOME}/.cpan/prefs],
    'urllist'           => [
        q[http://cpan.mirror.facebook.net/],
        q[http://cpan.yahoo.com/],
        q[http://www.perl.com/CPAN/],
        q[ftp://mirrors2.kernel.org/pub/CPAN/]
    ],
};
1;
END_TXT
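
If you want to double-check what ended up in there, the CPAN shell will happily dump its merged configuration; this optional one-liner prints every option it knows about:
sudo -H perl -MCPAN -e 'CPAN::Shell->o(qw[conf])'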

Step 3: Initialize the system CPAN config; just agree to the auto-configuration so it can fill in whatever's missing.
sudo -H perl -MCPAN -e 'CPAN::Shell->o(qw[conf init])'
Step 4: (optional if you did step 2) Choose and save your default CPAN mirrors.
sudo -H perl -MCPAN -e 'CPAN::Shell->o(qw[conf init urllist]);CPAN::Shell->o(qw[conf commit])';
Step 5: Install local::lib into the system perl.
sudo -H cpan local::lib
Step 6: (optional) Enable local::lib by default for all users.
sudo sh -c 'echo eval \$\(perl -Mlocal::lib\) >> /etc/profile'
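
With that in place, any user can install modules without root and they'll land under ~/perl5. A quick way to confirm everything works (JSON is just an arbitrary example module):
eval $(perl -Mlocal::lib)    # or just log out and back in
cpan JSON                    # builds and installs into ~/perl5
perl -MJSON -le 'print JSON->VERSION'
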
Step 7: (optional) Give new users a convenient default config, for example (the skel directory tree needs creating first):
sudo mkdir -p /etc/skel/.cpan/CPAN
sudo tee /etc/skel/.cpan/CPAN/MyConfig.pm >/dev/null <<'END_TXT'
$CPAN::Config = {
    'prerequisites_policy' => q[follow],
    'tar_verbosity'        => q[none],
    'build_cache'          => q[100],
    'build_dir'         => qq[$ENV{HOME}/.cpan/build],
    'cpan_home'         => qq[$ENV{HOME}/.cpan],
    'histfile'          => qq[$ENV{HOME}/.cpan/histfile],
    'keep_source_where' => qq[$ENV{HOME}/.cpan/sources],
    'prefs_dir'         => qq[$ENV{HOME}/.cpan/prefs],
    # you may not need these, I think a user's CPAN.pm
    # will pick them up from the system CPAN Config.pm
    'urllist' => [
        q[http://cpan.mirror.facebook.net/],
        q[http://cpan.yahoo.com/],
        q[http://www.perl.com/CPAN/],
        q[ftp://mirrors2.kernel.org/pub/CPAN/]
    ],
};
1;
END_TXT

Anyhow, that's a start for now. It sounds like a lot of work now that I've typed it all out, but I really feel like it saves me time and trouble later!

Thursday, October 15, 2009

Following the Wave

This is in response to a post by geoffeg: http://www.geoffeg.org/wordpress/2009/09/30/why-i-am-not-yet-convinced-about-google-wave/

I've just started looking at Google Wave myself. I'm also skeptical, but the ideas behind it just may hold some promise.

I agree that email needs to be fixed... more like completely replaced. Maybe Google is a big enough entity to make it happen, but I don't think so. Especially if it requires possibly complicated server and client-side components...

My (limited, so far) look at Google Wave gives me a bit of hope, but only if it can give rise to software people want.

I personally think the only thing that could best email would have to provide everything Microsoft Exchange/Office/SharePoint/etc. provide (or claim to), but do more and do it better, and have some really stellar clients/interfaces available.

I'm talking person-to-person messaging, group discussions, shared calendars and scheduling, document sharing and collaboration, linking between various entities, and also end-to-end authentication and authorization, all with as little centralization as email, or nearly so...

It's a pretty tall order.

Google Wave seems like it could be capable of many of these things, but since I haven't really used it I can't say what it's got already. What does it take to run a server? How much work does a client have to do? Is the protocol completely open? Is the API any good? How does it scale in various use cases? How easy is it for users to get done what they want to do?

If it is easy to do things... how do users keep from getting buried in information like we all are with what we have now?

And the big question - will anybody actually write Wave software that's compelling enough to convince users everywhere that they should use it instead of the trusty standby of email?

Lots of questions, and few answers, but true to my geeky self, I still get excited over the possibilities shiny new systems bring.

Friday, October 9, 2009

Abstractions and Usability: When Practicality trumps Sophistication, both sides can win.

Something I think about a lot is abstractions. There's a lot of abstraction in programming, and sometimes it's healthy and a sign of good design and thinking.

But more often than not, I find the layers of abstraction become confusing, limiting, and frustrating.

Some recent discussion with a friend I've been coding with made me think some more about just how easily a powerful, flexible, ingenious set of abstractions can lead to software that is unusable by its intended audience.

I am hoping we go with a simpler, more consistent architecture that uses small, easily-understood concepts. It may be harder to say it's "enterprise grade" without the fancier ideas, but I think it will be both easier to build and easier to use.

This isn't a Perl-specific problem, of course, but we're in the design stage of a Perl application that will need to be customizable in ways we will probably never imagine!

No matter what architecture we end up implementing in order to provide a product, if it isn't immediately useful and easily understandable by the target user audience, it cannot succeed.

That makes me think about what makes a language successful... Perl is a great example of a language that shares the ideals I want in this application! It should:
  • be easy to get started for simple uses
  • be easy to transform simple uses into more complex uses
  • provide functionality that users will need to easily get common stuff done
  • provide functionality that users will need to customize generic tasks
  • allow users to easily add custom functionality
  • allow users to share and re-use custom functionality
  • be flexible enough to allow for different ways of thinking about solving problems - whatever the user finds best for *them*
  • make possible the creation of sophisticated systems - that can then easily become part of other systems.
I really see these goals as very Perlish in nature: the people who will use this software do not want to babysit a system, nor do they want to become gurus on its internals. They do not want to care about rule-sets and mini-languages and asynchronous message-passing data-flow systems...

They just want a system that is immediately useful and easy in simple cases, but allows and encourages them to grow it into something much more powerful over time, without forcing upon them layers of new abstractions or solutions that feel unnatural and inflexible.

Perhaps I'll write more about this specific application soon... I've been reading The Mythical Man Month and it's lit a fire under my rear on actually making this thing a reality... I now want to start not with code (of which I've already written some) but with a design document detailing how an end user would see and use this system in the most common cases.

I believe that software should always be written for the users, and making the program (or language) easy to use yet still powerful and flexible is the highest goal for which a programmer can strive.

Tuesday, September 29, 2009

Crazy ideas

I'm certain that I will re-use the title of this post again, someday.

Something I've devoted a lot of brane-cycles to is the idea of a real CLEC-grade monitoring system. Crazy? Yes. But I've used all the real contenders, and many of the non-contenders, and they all fall short in one way or another. I accept that this sort of thing is not easy, but I humbly ask you to humor me while I describe this particular idea.

The system I envision is highly modular, and distributed, but still requires one (or more) "core servers" where the monitoring information is aggregated so that other components can find and use relevant information as it arrives. Well, I had an epiphany the other day and followed it up with some research today and found something very encouraging...

This part of the system will require, for lack of a better term, a "stream" of data, which will need to support the ability to have hundreds (possibly thousands) of filters attached. Each filter would be an expression registered by a processing node to identify records that node would be interested in. None of the filters modify the data in any way, they simply pass a copy of matching records on to the nodes which could be operating in any number of interesting ways.
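
To make that concrete, here's a minimal sketch of the dispatch idea in plain Perl (the names and record shape are mine, invented purely for illustration): nodes register a predicate plus a callback, and every incoming record is handed, unmodified, to each node whose predicate matches.

#!/usr/bin/perl
use strict;
use warnings;

package Stream;

sub new { bless { filters => [] }, shift }

# A processing node registers a predicate plus a callback for matches.
sub register {
    my ($self, $filter, $callback) = @_;
    push @{ $self->{filters} }, [ $filter, $callback ];
}

# Each record is tested against every filter; matching nodes receive
# their own shallow copy, so the original record is never modified.
sub publish {
    my ($self, $record) = @_;
    for my $pair (@{ $self->{filters} }) {
        my ($filter, $callback) = @$pair;
        $callback->( { %$record } ) if $filter->($record);
    }
}

package main;

my $stream = Stream->new;
$stream->register(
    sub { $_[0]->{severity} eq 'critical' },
    sub { print "pager node got: $_[0]{message}\n" },
);
$stream->register(
    sub { $_[0]->{host} =~ /^core-/ },
    sub { print "core-router node got: $_[0]{message}\n" },
);
$stream->publish({ host => 'core-rtr1', severity => 'critical', message => 'BGP session down' });

The real thing would obviously need persistence, network transport, and the latency tricks mentioned below, but the core contract really is that small.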

So, poking around the CPAN (as usual) yielded a number of insights:
  1. This has been done before. (good!)
  2. This has been done *many* different ways. (also good)
  3. It does not seem that this has been done with my particular scalability and usage needs. (uh-oh?)
  4. This type of system can be made to scale with relative ease. (yaay!)
One of the modules/dists I looked at while coming to these conclusions was Boulder. Something very interesting to me is that its data format, "Stone", is a perfect match for the data that will make up the records in the stream. I don't need all the whiz-bang features and OO and stuff, but it's encouraging to me that my design of this component is beginning to converge with that of something as successful and data-intensive as Boulder.

The challenge will be keeping latency down as much as possible, but I have ideas for doing that through caching, algorithmic optimizations, and evaluating filters in parallel... only if necessary, of course!

Friday, September 25, 2009

Do text editors *have* to suck?

Something I was thinking about this past week - the tools we use as programmers. Specifically, I was thinking about the all-important text editor.

When I started out programming, I would just use whatever was around. In DOS, it was EDIT.EXE. In Windows, I began with Notepad, then tried a million other things before mostly settling on Notepad++. I've used Eclipse quite a bit, but that's so much more than an editor, and while I liked it I found it far too heavy for my usual day-to-day work.

In parallel with my progression in M$-land I've led a second life extensively using Linux and Solaris (and other *nix-en). When I started, in 1996 or so, I think my primary text editor was pico... which I found after being baffled by vi, finding ed useless, and being completely unable to figure out emacs. I eventually found joe and was happy for a year or two, until college.

In college we received cheat-sheets for vim and emacs, and both were still quite intimidating to me, so I continued to use joe, even compiling it in my home directory on systems where it was unavailable. However at this time I found myself beginning to hit joe's limitations and realized I had to bite the bullet.

I was still afraid of emacs, so I've been a vim user ever since.

So, here I am, ten years later, still using vim, and here's the part I hate: I've never mastered it.

I have memory-mapped all the common keystrokes for moving around a file, searching, replacing, mark, cut, paste, etc... but for anything else I need to consult a cheat sheet. This feels like a failure on my part...

There are features I see emacs users have that I want *desperately*, yet they feel out of reach in vim. Sure, vim can do some of them: split-pane editing, multiple files open at once, cutting and pasting between them, macros, shells, compiler/interpreter/debugger integration... but none of it ever seems to work the way I want it to.

I've ignored the elephant (gnu?) in the room for all this time: emacs. So, in the last year I made up my mind that I *must* learn to use emacs, at least as well as I do vim, and this week I did all my non-work hacking in emacs.

You know what I'm beginning to believe?

EVERY TEXT EDITOR SUCKS IN ONE WAY OR ANOTHER.

Yes, emacs is ultimately customizable. In fact, I've spent more time this week writing elisp than I have Perl, and yet, I'm still dissatisfied. Countless searches on Google have shown me that most people eventually give up fighting its quirks and just settle.

I have to admit, I'm beginning to get comfortable with all the keystroke combinations and the macros, and I'm slowly figuring out how to customize various things to my liking...

And of course, there's one more big problem: vim (or at least vi) is installed *everywhere* and I spend most of my time working on remote servers. Sure, I have root on most of them, but that's no reason to start installing things just because I feel like it.

Also, emacs seems to be quite a different beast running in an xterm over ssh, whereas vim remains the same.

I'm torn and frustrated. I need the best of both worlds and I need to figure out how to get them before I begin an epic yak shaving expedition that will ultimately suck my life away.

Maybe I'm just writing this to vent. Yes, that's probably it. Maybe in another few weeks I'll be singing the praises of emacs. Maybe I'll return to the relative comfort of vim.

If anybody reading this has tips for a reasonably-intelligent Perl hacker who just wants to get more out of his editor without moving heaven and earth and re-mapping every key-stroke to custom macros, please, please help!!!

Wednesday, September 16, 2009

CPAN.pm, Improved

I was just reading a blog post from szabgab and rather than reply directly, I decided I should finally start making posts of my own.

The topic he raises is "CPAN client for beginners" and it's one I've thought about a lot, though not necessarily in the same way.

I've been comfortable with the CPAN shell for a long time, and I know enough about the internals that I have written my own wrappers and automations to drive module retrieval and installation.

My opinion of CPAN.pm is that it's a classic case of "It works, so don't fix it" for the Perl community at large... but I think this is a bit of a fallacy.

I'm not particularly interested in a "beginner friendly" CPAN client. I'm far more interested in a *smarter* CPAN client.

Most of the questions the cpan shell asks me when I use it can be, and are, auto-detected properly. Except when I'm not root. And except when I'm running under sudo. Both scenarios seem pretty easily fixable to me.
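
Detecting those situations takes almost no code; a smarter client could run something like this (the decision logic here is my own invention, not CPAN.pm's) before asking a single question:

#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Can we actually write where 'make install' would put things?
my $sitelib = $Config{installsitelib};

if ( -w $sitelib ) {
    print "fine, install straight into $sitelib\n";
}
elsif ( $ENV{SUDO_USER} ) {
    print "running under sudo for $ENV{SUDO_USER}; build as root,\n",
          "but keep the build dir out of their home\n";
}
else {
    print "no write access to $sitelib; offer local::lib instead\n";
}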

Another place where CPAN.pm seems to fall short: picking mirrors. This could also be done automatically, based on latency and bandwidth. I think by default it goes through the redirector, so it's probably not a big deal, but even after "auto configuring" it keeps asking me to choose mirrors. Yes, I have a preferred set, and I can easily configure them, but remember, I'm *lazy*.
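
Again, a first cut needs very little code. This rough sketch (not CPAN.pm's actual internals; the mirror URLs are just the ones from my config above) ranks mirrors by how quickly they serve MIRRORED.BY, a small file every CPAN mirror carries at its root:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(time);

my @mirrors = (
    'http://cpan.mirror.facebook.net/',
    'http://www.perl.com/CPAN/',
    'ftp://mirrors2.kernel.org/pub/CPAN/',
);

my $ua = LWP::UserAgent->new( timeout => 10 );
my %latency;

for my $mirror (@mirrors) {
    my $start = time;
    my $res   = $ua->get( $mirror . 'MIRRORED.BY' );
    $latency{$mirror} = $res->is_success ? time - $start : 9_999;
}

# Fastest first; a smarter client would save this straight to urllist.
for my $mirror ( sort { $latency{$a} <=> $latency{$b} } keys %latency ) {
    printf "%7.3fs  %s\n", $latency{$mirror}, $mirror;
}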

The third thing I would change about CPAN.pm is how it keeps track of installed files. Other package managers can determine whether an installed file has been unexpectedly changed - why not CPAN.pm? Other package managers can tell you from which package (and which version) a file was installed - why not CPAN.pm? (In CPAN terms, package == distribution.)
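
The core mechanism is tiny. Here's a sketch of the record-keeping half (the manifest location and format are hypothetical; Digest::SHA ships with perl as of 5.10): write a per-distribution manifest of checksums at install time, then verify it whenever asked:

#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical manifest format: one "checksum  path" line per file.
sub write_manifest {
    my ($manifest, @files) = @_;
    open my $fh, '>', $manifest or die "$manifest: $!";
    for my $file (@files) {
        my $sha = Digest::SHA->new(256)->addfile($file)->hexdigest;
        print {$fh} "$sha  $file\n";
    }
    close $fh;
}

# Report any installed file that has changed or gone missing.
sub verify_manifest {
    my ($manifest) = @_;
    open my $fh, '<', $manifest or die "$manifest: $!";
    while ( my $line = <$fh> ) {
        my ($expected, $file) = $line =~ /^(\S+)\s+(.+)$/ or next;
        if ( !-e $file ) {
            print "MISSING   $file\n";
        }
        elsif ( Digest::SHA->new(256)->addfile($file)->hexdigest ne $expected ) {
            print "MODIFIED  $file\n";
        }
    }
}

# e.g. right after 'make install' copies its files:
write_manifest( '/var/lib/cpan/Some-Dist-1.23.manifest', @ARGV );
verify_manifest('/var/lib/cpan/Some-Dist-1.23.manifest');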

The fourth change, which would be relatively simple if the third were implemented, would be sane uninstall. When upgrading a distribution, the old version's files should be removed! If an author removes a file from a new version, *it should not remain after I upgrade to that version!*

Lastly, and probably the most difficult feature on my list, I want the ability to determine *which* distributions (and their versions) are currently installed given an @INC. This would require additional metadata available from the CPAN itself, a sort of super-manifest that the client can download and use to validate and match every file in the include path. I don't think this is something that can be done by any other package management system, but it's nice to dream. :-)
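
The client side of that dream might look something like this (the %index lookup table is pure invention on my part; in reality it would be built from the imagined CPAN-wide super-manifest): hash every module file under @INC and ask the index which distribution claims it:

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::SHA;

# Would be loaded from the imagined super-manifest:
# sha256 checksum => "Dist-Name-1.23". Empty here, so everything
# below reports as unknown.
my %index;

my %seen;    # dist name => number of files matched
find(
    sub {
        return unless -f && /\.pm$/;
        my $sha = Digest::SHA->new(256)->addfile($_)->hexdigest;
        $seen{ $index{$sha} || '(unknown to CPAN)' }++;
    },
    grep { -d } @INC
);

print "$seen{$_} file(s)\t$_\n" for sort keys %seen;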

The other things that I would want, well, a lot of that would depend on support in the various build systems and so is probably the topic for another post. Things that would be helpful for newbies are nice, too... but I think a lot of those have already been implemented or solved. Suppressing needless output, a GUI, reporting failed installs to the authors, etc... all either currently have solutions that just need to be integrated or would be fixed with better, friendlier documentation.