Something I've devoted a lot of brain cycles to is the idea of a real CLEC-grade monitoring system. Crazy? Yes. But I've used all the real contenders, and many of the non-contenders, and they all fall short in one way or another. I accept that this sort of thing is not easy, but I humbly ask you to humor me while I describe this particular idea.
The system I envision is highly modular and distributed, but still requires one (or more) "core servers" where the monitoring information is aggregated so that other components can find and use relevant information as it arrives. Well, I had an epiphany the other day, followed it up with some research today, and found something very encouraging...
This part of the system will require, for lack of a better term, a "stream" of data, which will need to support hundreds (possibly thousands) of attached filters. Each filter would be an expression registered by a processing node to identify records that node is interested in. None of the filters modify the data in any way; they simply pass a copy of matching records on to the nodes, which could be operating in any number of interesting ways.
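To make that concrete, here's a rough plain-Perl sketch of what I mean by a filter registry. Everything in it (register_filter, dispatch, the "pager-node" example) is made up for illustration, not taken from any existing module:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %filters;    # node id => predicate code ref

# A node registers an expression describing the records it wants.
sub register_filter {
    my ( $node_id, $predicate ) = @_;
    $filters{$node_id} = $predicate;
}

# Hand a *copy* of each matching record to the interested node; the
# record itself is never modified by a filter.
sub dispatch {
    my ( $record, $deliver ) = @_;
    for my $node_id ( keys %filters ) {
        next unless $filters{$node_id}->($record);
        $deliver->( $node_id, { %$record } );    # shallow copy
    }
}

# Example: a node that only cares about high-severity records.
register_filter( 'pager-node', sub { $_[0]{severity} >= 4 } );

dispatch(
    { host => 'sw01', severity => 5, msg => 'link down' },
    sub { my ( $node, $rec ) = @_; print "$node <= $rec->{msg}\n" },
);
```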
So, poking around the CPAN (as usual) yielded a number of insights:
- This has been done before. (good!)
- This has been done *many* different ways. (also good)
- It does not seem that this has been done with my particular scalability and usage needs in mind. (uh-oh?)
- This type of system can be made to scale with relative ease. (yaay!)
Some of the more interesting modules that turned up:
- HOP::Stream
- Array::Stream::Transactional::Matcher
- Data::Stream::Bulk
- DS::Transformer
- Log::Log4perl::Filter::Boolean
- Boulder::Stream
- Parallel::Iterator
The challenge will be keeping latency down as much as possible, but I have ideas for doing that through caching, algorithmic optimizations, and evaluating filters in parallel... only if necessary, of course!
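Just to convince myself the parallel idea isn't vapor, here's a hedged sketch using Parallel::Iterator from the list above. I'm assuming its iterate_as_array interface (worker gets an index and a value, results come back in input order); the filters and the record are made-up examples:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::Iterator qw( iterate_as_array );

# Hypothetical filter list: each entry pairs a node name with a predicate.
my @filters = (
    { node => 'pager',  test => sub { $_[0]{severity} >= 4 } },
    { node => 'logger', test => sub { $_[0]{facility} eq 'bgp' } },
    { node => 'graphs', test => sub { exists $_[0]{value} } },
);

my $record = { severity => 5, facility => 'bgp', msg => 'neighbor down' };

# Each worker gets an index into @filters (the forked child has its own
# copy of the list), runs that one predicate, and returns 1 or 0.
my @flags = iterate_as_array(
    sub {
        my ( $id, $index ) = @_;
        return $filters[$index]{test}->($record) ? 1 : 0;
    },
    [ 0 .. $#filters ],
);

my @matches = map { $filters[$_]{node} } grep { $flags[$_] } 0 .. $#filters;
print "deliver to: @matches\n";    # deliver to: pager logger
```

Per-record forking like this would almost certainly cost more than it saves; the point is only that partitioning filter evaluation across workers is straightforward if a single core ever becomes the bottleneck.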