Tuesday, September 29, 2009

Crazy ideas

I'm certain that I will re-use the title of this post someday.

Something I've devoted a lot of brane-cycles to is the idea of a real CLEC-grade monitoring system. Crazy? Yes. But I've used all the real contenders, and many of the non-contenders and they all fall short in one way or another. I accept that this sort of thing is not easy, but I humbly ask you to humor me while I describe this particular idea.

The system I envision is highly modular, and distributed, but still requires one (or more) "core servers" where the monitoring information is aggregated so that other components can find and use relevant information as it arrives. Well, I had an epiphany the other day and followed it up with some research today and found something very encouraging...

This part of the system will require, for lack of a better term, a "stream" of data, which will need to support the ability to have hundreds (possibly thousands) of filters attached. Each filter would be an expression registered by a processing node to identify records that node would be interested in. None of the filters modify the data in any way; they simply pass a copy of matching records on to the nodes, which could be operating in any number of interesting ways.
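To make the idea concrete, here is a minimal sketch in Python of a stream with attached filters. Everything in it is an assumption made for illustration: the `Stream` class, the `register` and `publish` names, and the record fields (`type`, `severity`, `host`) are hypothetical, and a filter "expression" is modelled as a plain callable rather than a parsed expression language.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]
Handler = Callable[[Record], None]


class Stream:
    """Fan each record out to every registered filter whose predicate matches."""

    def __init__(self) -> None:
        self._filters: List[Tuple[Predicate, Handler]] = []

    def register(self, predicate: Predicate, handler: Handler) -> None:
        """Attach a filter; the handler stands in for a processing node."""
        self._filters.append((predicate, handler))

    def publish(self, record: Record) -> int:
        """Offer a record to all filters and return how many nodes received it."""
        delivered = 0
        for predicate, handler in self._filters:
            if predicate(record):
                # Each node gets its own copy, so no node can alter the stream's data.
                handler(copy.deepcopy(record))
                delivered += 1
        return delivered


stream = Stream()
link_events: List[Record] = []
critical_events: List[Record] = []

# Two hypothetical processing nodes, each with its own filter.
stream.register(lambda r: r.get("type") == "link", link_events.append)
stream.register(lambda r: r.get("severity", 0) >= 5, critical_events.append)

stream.publish({"type": "link", "severity": 7, "host": "core-rtr-1"})
stream.publish({"type": "cpu", "severity": 2, "host": "db-3"})
```

This version evaluates every filter for every record, which is the simplest thing that works; with thousands of filters that linear scan is the obvious place where latency would start to hurt.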

So, poking around the CPAN (as usual) yielded a number of insights:
  1. This has been done before. (good!)
  2. This has been done *many* different ways. (also good)
  3. It does not seem that this has been done with my particular scalability and usage needs. (uh-oh?)
  4. This type of system can be made to scale with relative ease. (yaay!)
Some of the modules/dists I looked at while coming to these conclusions:
Something very interesting to me is that the data format used by Boulder, "Stone", is a perfect match for the data that will comprise the records in the stream. I don't need all the whiz-bang features and OO and stuff, but it's encouraging to me that my design of this component is beginning to converge with that of something as successful and data-intensive as Boulder.

The challenge will be keeping latency down as much as possible but I have ideas for doing that through caching, algorithmic optimizations and evaluating filters in parallel... only if necessary, of course!

Friday, September 25, 2009

Do text editors *have* to suck?

Something I was thinking about this past week - the tools we use as programmers. Specifically, I was thinking about the all-important text editor.

When I started out programming, I would just use whatever was around. In DOS, it was EDIT.EXE. In Windows, I began with Notepad then proceeded to try a million different other things until settling on Notepad++ most of the time. I've used Eclipse quite a bit, but that's so much more than an editor, and while I liked it I found it far too heavy for my usual day-to-day work.

In parallel with my progression in M$-land I've led a second life extensively using Linux and Solaris (and other *nix-en). When I started in 1996 or so I think my primary text editor was pico... which I found after being baffled by vi, finding ed useless, and being completely unable to figure out emacs. I eventually found joe and was happy for a year or two, until college.

In college we received cheat-sheets for vim and emacs, and both were still quite intimidating to me, so I continued to use joe, even compiling it in my home directory on systems where it was unavailable. However at this time I found myself beginning to hit joe's limitations and realized I had to bite the bullet.

I was still afraid of emacs, so I've been a vim user ever since.

So, here I am, ten years later, still using vim, and here's the part I hate: I've never mastered it.

I have memory-mapped all the common keystrokes for moving around a file, searching, replacing, mark, cut, paste, etc... but for anything else I need to consult a cheat sheet. This feels like a failure on my part...

There are features I see emacs users have that I want *desperately* yet they feel out of reach in vim. Sure, vim can do some of them (split-pane editing, multiple files open at once, cutting and pasting between them, macros, shells, compiler/interpreter/debugger integration), but none of it ever seems to work the way I want it to.

I've ignored the elephant (gnu?) in the room for all this time: emacs. So, in the last year I made up my mind that I *must* learn to use emacs, at least as well as I do vim, and this week I did all my non-work hacking in emacs.

You know what I'm beginning to believe?


Yes, emacs is ultimately customizable. In fact, I've spent more time this week writing elisp than I have Perl, and yet, I'm still dissatisfied. Countless searches on Google have shown me that most people eventually give up fighting its quirks and just settle.

I have to admit, I'm beginning to get comfortable with all the keystroke-combinations, and the macros and slowly figuring out how to customize various things to my liking...

And of course, there's one more big problem: vim (or at least vi) is installed *everywhere* and I spend most of my time working on remote servers. Sure, I have root on most of them, but that's no reason to start installing things just because I feel like it.

Also, emacs seems to be quite a different beast running in an xterm over ssh, whereas vim remains the same.

I'm torn and frustrated. I need the best of both worlds and I need to figure out how to get them before I begin an epic yak shaving expedition that will ultimately suck my life away.

Maybe I'm just writing this to vent. Yes, that's probably it. Maybe in another few weeks I'll be singing the praises of emacs. Maybe I'll return to the relative comfort of vim.

If anybody reading this has tips for a reasonably-intelligent Perl hacker who just wants to get more out of his editor without moving heaven and earth and re-mapping every key-stroke to custom macros, please, please help!!!

Wednesday, September 16, 2009

CPAN.pm, Improved

I was just reading a blog post from szabgab and rather than reply directly, I decided I should finally start making posts of my own.

The topic he raises is "CPAN client for beginners" and it's one I've thought about a lot, though not necessarily in the same way.

I've been comfortable with the CPAN shell for a long time, and I know enough about the internals that I have written my own wrappers and automations to drive module retrieval and installation.

My opinion of CPAN.pm is that it's a classic case of "It works, so don't fix it" for the Perl community at large... but I think this is a bit of a fallacy.

I'm not particularly interested in a "beginner friendly" CPAN client. I'm far more interested in a *smarter* CPAN client.

Most of the questions the cpan shell asks me when I use it can be, and are, auto-detected properly. Except when I'm not root. And except when I'm running under sudo. Both scenarios seem to me to be pretty easily fixable.

Another place where CPAN.pm seems to fall through: picking mirrors. This can also be done automatically, based on latency and bandwidth. I think by default it goes through the redirector so it's probably not a big deal, but even after "auto configuring" it keeps asking me to choose mirrors. Yes, I have a preferred set, and I can easily configure them, but remember, I'm *lazy*.
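As a rough illustration of automatic mirror selection, the Python sketch below ranks hosts by how long a TCP connection takes to open. It covers latency only, not bandwidth, and it is not how CPAN.pm works: `tcp_connect_time`, `pick_mirrors` and the `mirror-*.example` host names are all made up for the example.

```python
import socket
import time
from typing import Callable, Iterable, List, Optional


def tcp_connect_time(host: str, port: int = 80, timeout: float = 2.0) -> Optional[float]:
    """Return the seconds taken to open a TCP connection, or None if it fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def pick_mirrors(hosts: Iterable[str],
                 probe: Callable[[str], Optional[float]] = tcp_connect_time,
                 count: int = 3) -> List[str]:
    """Return up to `count` reachable hosts, fastest first."""
    timed = []
    for host in hosts:
        elapsed = probe(host)
        if elapsed is not None:  # skip mirrors that could not be reached
            timed.append((elapsed, host))
    timed.sort()
    return [host for _, host in timed[:count]]


# Canned measurements stand in for real probes so the example needs no network.
fake_latency = {
    "mirror-a.example": 0.120,
    "mirror-b.example": 0.035,
    "mirror-c.example": None,  # unreachable
}
best = pick_mirrors(fake_latency, probe=fake_latency.get, count=2)
```

Passing the probe in as a parameter keeps the ranking logic testable without touching the network; a real client would presumably also cache the result rather than re-probing on every run.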

The third thing I would change about CPAN.pm is how it keeps track of installed files. Other package managers can determine if an installed file has been unexpectedly changed - why not CPAN.pm? Other package managers can tell you from which package (and which version) a file was installed - why not CPAN.pm? (With regard to the CPAN, package == distribution.)
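A sketch of what such bookkeeping could look like, in Python: record a checksum and the owning distribution for each file at install time, then compare later. The manifest layout and the function names (`record_install`, `changed_files`, `owner_of`) are assumptions for illustration, not an existing CPAN.pm feature, and the `Foo` distribution is invented.

```python
import hashlib
import os
import tempfile
from typing import Dict, Iterable, List, Optional

# Maps an installed path to {"dist": ..., "version": ..., "sha256": ...}.
Manifest = Dict[str, Dict[str, str]]


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_install(manifest: Manifest, dist: str, version: str,
                   paths: Iterable[str]) -> None:
    """Remember which distribution installed each file, and its checksum."""
    for path in paths:
        manifest[path] = {"dist": dist, "version": version, "sha256": sha256_of(path)}


def changed_files(manifest: Manifest) -> List[str]:
    """List recorded files that are missing or no longer match their checksum."""
    changed = []
    for path, entry in manifest.items():
        if not os.path.exists(path) or sha256_of(path) != entry["sha256"]:
            changed.append(path)
    return sorted(changed)


def owner_of(manifest: Manifest, path: str) -> Optional[str]:
    """Return 'dist-version' for a recorded file, or None if it is unknown."""
    entry = manifest.get(path)
    return f'{entry["dist"]}-{entry["version"]}' if entry else None


with tempfile.TemporaryDirectory() as root:
    module = os.path.join(root, "Foo.pm")
    with open(module, "w") as fh:
        fh.write("package Foo; 1;\n")

    manifest: Manifest = {}
    record_install(manifest, "Foo", "1.00", [module])
    before = changed_files(manifest)   # nothing has been touched yet

    with open(module, "a") as fh:
        fh.write("# local hack\n")
    after = changed_files(manifest)    # the edited file is now reported
    owner = owner_of(manifest, module)
```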

The fourth change, which would be relatively simple if the third were implemented, would be sane uninstall. When upgrading a distribution, the old version's files should be removed! If an author removes a file from a new version, *it should not remain after I upgrade to that version!*
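Given per-distribution file lists like the ones described above, working out what to delete on upgrade is just a set difference. A tiny Python sketch, with hypothetical file names:

```python
from typing import Iterable, List


def stale_files(old_files: Iterable[str], new_files: Iterable[str]) -> List[str]:
    """Files shipped by the old version that the new version no longer ships."""
    return sorted(set(old_files) - set(new_files))


old_version = ["lib/Foo.pm", "lib/Foo/Legacy.pm", "bin/foo"]
new_version = ["lib/Foo.pm", "lib/Foo/Util.pm", "bin/foo"]

to_remove = stale_files(old_version, new_version)
```

The hard part is not this computation but reliably having the old version's file list available at upgrade time.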

Lastly, and this is probably the most difficult feature on my list, I would want the ability to determine *which* distributions (and their versions) are currently installed given an @INC. This would require additional metadata available from the CPAN itself, a sort of super-manifest that the client can download and use to validate and match every file in the include path. I don't think this is something that can be done by any other package management system, but it's nice to dream. :-)
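Here is one way the matching step might work, sketched in Python. The super-manifest format is purely an assumption: a lookup keyed on (relative path, SHA-256 digest) that yields (distribution, version). No such index is implied to exist on the CPAN today, and the `Foo-Bar` distribution is invented for the demonstration.

```python
import hashlib
import os
import tempfile
from typing import Dict, Iterable, Set, Tuple

# Hypothetical index: (relative path, sha256) -> (distribution, version).
SuperManifest = Dict[Tuple[str, str], Tuple[str, str]]


def file_digest(path: str) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def installed_dists(include_dirs: Iterable[str],
                    super_manifest: SuperManifest) -> Set[Tuple[str, str]]:
    """Walk each include directory and collect every distribution a file matches."""
    found = set()
    for root in include_dirs:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                hit = super_manifest.get((rel, file_digest(full)))
                if hit is not None:
                    found.add(hit)
    return found


content = b"package Foo::Bar; our $VERSION = '0.02'; 1;\n"
index: SuperManifest = {
    ("Foo/Bar.pm", hashlib.sha256(content).hexdigest()): ("Foo-Bar", "0.02"),
}

with tempfile.TemporaryDirectory() as inc:
    os.makedirs(os.path.join(inc, "Foo"))
    with open(os.path.join(inc, "Foo", "Bar.pm"), "wb") as fh:
        fh.write(content)
    with open(os.path.join(inc, "Local.pm"), "wb") as fh:
        fh.write(b"package Local; 1;\n")  # not in the index, so it is ignored
    detected = installed_dists([inc], index)
```

Matching on content digests rather than file names alone is what lets the client tell versions apart, at the cost of a much larger index to download.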

The other things that I would want, well, a lot of that would depend on support in the various build systems and so is probably the topic for another post. Things that would be helpful for newbies are nice, too... but I think a lot of those have already been implemented or solved. Suppressing needless output, a GUI, reporting failed installs to the authors, etc... all either currently have solutions that just need to be integrated or would be fixed with better, friendlier documentation.