Tuesday, September 29, 2009

Crazy ideas

I'm certain that I will re-use the title of this post again, someday.

Something I've devoted a lot of brain cycles to is the idea of a real CLEC-grade (competitive local exchange carrier) monitoring system. Crazy? Yes. But I've used all the real contenders, and many of the non-contenders, and they all fall short in one way or another. I accept that this sort of thing is not easy, but I humbly ask you to humor me while I describe this particular idea.

The system I envision is highly modular and distributed, but it still requires one (or more) "core servers" where the monitoring information is aggregated so that other components can find and use relevant information as it arrives. Well, I had an epiphany the other day, followed it up with some research today, and found something very encouraging...

This part of the system will require, for lack of a better term, a "stream" of data, which will need to support having hundreds (possibly thousands) of filters attached. Each filter would be an expression registered by a processing node to identify the records that node is interested in. None of the filters modify the data in any way; they simply pass a copy of each matching record on to the nodes, which could be operating in any number of interesting ways.
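To make the idea concrete, here is a minimal sketch of such a stream, assuming records are flat tag/value dictionaries and each filter is just a Python predicate. The class and method names (`FilteredStream`, `register`, `publish`, `inboxes`) are hypothetical, and the per-node inbox list stands in for whatever transport would actually carry records to the processing nodes.

```python
import copy
from typing import Callable, Dict, List, Tuple

# Hypothetical record shape: flat tag -> value pairs.
Record = Dict[str, str]
Predicate = Callable[[Record], bool]


class FilteredStream:
    """A stream that fans matching records out to registered nodes.

    Filters never modify records; each subscribing node gets its own copy.
    """

    def __init__(self) -> None:
        # (node name, predicate) pairs, in registration order
        self._filters: List[Tuple[str, Predicate]] = []
        # Records delivered to each node; a stand-in for a real transport
        self.inboxes: Dict[str, List[Record]] = {}

    def register(self, node: str, predicate: Predicate) -> None:
        """Attach a filter expression on behalf of a processing node."""
        self._filters.append((node, predicate))
        self.inboxes.setdefault(node, [])

    def publish(self, record: Record) -> int:
        """Evaluate every filter; return how many filters matched (copies sent)."""
        delivered = 0
        for node, predicate in self._filters:
            if predicate(record):
                # Deliver a copy so no node can alter the shared record
                self.inboxes[node].append(copy.deepcopy(record))
                delivered += 1
        return delivered


if __name__ == "__main__":
    stream = FilteredStream()
    stream.register("alarm-node", lambda r: r.get("severity") == "critical")
    stream.register("graph-node", lambda r: r.get("type") == "interface")
    sent = stream.publish({"type": "interface", "host": "core-rtr-1", "severity": "critical"})
    print(sent)
```

This naive version evaluates every filter for every record, so its cost grows linearly with the number of registered filters, which is exactly where the latency concern comes from.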

So, poking around the CPAN (as usual) yielded a number of insights:
  1. This has been done before. (good!)
  2. This has been done *many* different ways. (also good)
  3. It does not seem that this has been done with my particular scalability and usage needs. (uh-oh?)
  4. This type of system can be made to scale with relative ease. (yaay!)
Some of the modules/dists I looked at while coming to these conclusions:
Something very interesting to me is that the data format used by Boulder, "Stone", is a perfect match for the data that will comprise the records in the stream. I don't need all the whiz-bang features and OO and stuff, but it's encouraging to me that my design of this component is beginning to converge with that of something as successful and data-intensive as Boulder.
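For a sense of how simple such records are to consume without the OO layer, here is a rough reader for the flat form of the Boulder IO text format, assuming `TAG=value` lines with each record terminated by a line containing only `=`. Tags may repeat within a record, so values are kept as lists. The function name `parse_boulder_records` is hypothetical, and nested sub-records are deliberately not handled in this sketch.

```python
from typing import Dict, Iterable, Iterator, List


def parse_boulder_records(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield one dict per record from flat TAG=value text.

    Assumption: a line consisting solely of "=" ends a record.
    Nested records are out of scope for this sketch.
    """
    record: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "=":
            # Record terminator: emit what we have and start fresh
            if record:
                yield record
            record = {}
            continue
        if not line or "=" not in line:
            continue  # skip blank lines and anything this sketch doesn't handle
        tag, value = line.split("=", 1)
        # Tags may repeat, so accumulate every value in order
        record.setdefault(tag, []).append(value)
    if record:
        yield record  # tolerate a missing final terminator
```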

The challenge will be keeping latency down as much as possible but I have ideas for doing that through caching, algorithmic optimizations and evaluating filters in parallel... only if necessary, of course!
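As one example of the kind of algorithmic optimization that could help (not necessarily the one I'll end up using), filters can be indexed on a required tag/value pair so that only filters which could possibly match a record are evaluated at all. This sketch assumes each node may optionally declare such a key when registering; `IndexedFilterSet` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional, Set, Tuple

Record = Dict[str, str]
Predicate = Callable[[Record], bool]


class IndexedFilterSet:
    """Skip filters that cannot match by indexing on a required (tag, value).

    Filters registered with a key run only when the record carries that exact
    tag/value pair; filters without a key are evaluated for every record.
    """

    def __init__(self) -> None:
        self._by_key: DefaultDict[Tuple[str, str], List[Tuple[str, Predicate]]] = defaultdict(list)
        self._unindexed: List[Tuple[str, Predicate]] = []
        self.evaluations = 0  # how many predicates actually ran

    def register(self, node: str, predicate: Predicate,
                 key: Optional[Tuple[str, str]] = None) -> None:
        if key is None:
            self._unindexed.append((node, predicate))
        else:
            self._by_key[key].append((node, predicate))

    def match(self, record: Record) -> Set[str]:
        """Return the names of nodes whose filters accept this record."""
        candidates = list(self._unindexed)
        # One dict lookup per tag in the record, instead of one call per filter
        for item in record.items():
            candidates.extend(self._by_key.get(item, []))
        matched: Set[str] = set()
        for node, predicate in candidates:
            self.evaluations += 1
            if predicate(record):
                matched.add(node)
        return matched
```

The trade-off is that the per-record cost now depends on how many filters share the record's tag/value pairs rather than on the total number of filters, which only pays off if most filters can declare a selective key.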
