What is Katydid?

Katydid is a validation language based on regular expressions and extended to match deeply nested structures. It was built to filter petabytes of serialized protocol buffers, but it can also filter XML and JSON.

Katydid is Fast

matches up to millions of records per second

Katydid can match up to millions of protocol buffers per second on a single core. To achieve this speed, it allocates no memory after initialization, except for ongoing memoization, which tapers off as more records are matched.

filters faster the more records it matches

Katydid matches faster the more records or serialized messages are passed to the same validator, because it memoizes the paths it has already seen.
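The effect of memoizing seen paths can be illustrated with a minimal sketch. This is not Katydid's implementation: the types `key` and `memo` and the function `slowStep` are hypothetical stand-ins, and the "expensive" transition is just arithmetic. The sketch only shows that a second record with the same field path never reaches the slow path.

```go
package main

import "fmt"

// key identifies one matching step: the current state and the field seen.
// Hypothetical type, for illustration only.
type key struct {
	state int
	field string
}

// memo caches transitions so each (state, field) pair is computed once.
type memo struct {
	table  map[key]int
	misses int // how many times the slow path ran
}

func newMemo() *memo { return &memo{table: make(map[key]int)} }

// slowStep stands in for the expensive work of computing the next state.
// Assumption: real validators do far more here; this is plain arithmetic.
func slowStep(state int, field string) int {
	return (state*31 + len(field)) % 1000
}

// step returns the next state, using the cache when the pair was seen before.
func (m *memo) step(state int, field string) int {
	k := key{state, field}
	if next, ok := m.table[k]; ok {
		return next // fast path: no recomputation
	}
	m.misses++
	next := slowStep(state, field)
	m.table[k] = next
	return next
}

// run walks a record's field path and returns the final state.
func (m *memo) run(path []string) int {
	state := 0
	for _, f := range path {
		state = m.step(state, f)
	}
	return state
}

func main() {
	m := newMemo()
	record := []string{"Addresses", "Number", "Street"}

	first := m.run(record)
	missesAfterFirst := m.misses
	second := m.run(record)

	// The second run reuses all three cached transitions.
	fmt.Println(first == second, missesAfterFirst, m.misses) // prints: true 3 3
}
```

The cache grows only while new (state, field) pairs keep appearing, which is in line with the statement above that memoization work becomes less the more matching is done.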

matches deeply nested and recursive structures

Katydid is also extendable to match any structure that has an implemented parser. The current implementation is in Go, but we are also working towards a Haskell implementation and proofs of correctness in LeanProver.

Benchmarks

The graph below shows how matching speeds up after matching 10, 100 and then 1000 randomly initialized protocol buffers, along with the ultimate speed achieved by memoizing all possibilities, given the filter:

.Addresses:_:[Number:456, Street: "TheStreet"]

Memoized vs Compiled Benchmarks

Katydid is Extendable

Katydid also makes it easy to add another format. Regular expressions validate only strings, and RelaxNG, DTD and XSchema validate only XML, whereas Katydid can validate any serialized data that has an implemented parser.
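To make the idea of "any format with an implemented parser" concrete, here is a rough sketch of how a format-independent validator could sit on top of a small parser interface. The `Node` interface, `jsonNode`, `parseJSON` and `hasLeaf` are all hypothetical names invented for this example and are not Katydid's actual API. Only the JSON adapter is shown, built on Go's standard `encoding/json`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Node is a hypothetical, format-independent view of one field in a record.
// A new format would only need to provide its own implementation of Node.
type Node interface {
	Label() string        // field name or list index
	Leaf() (string, bool) // value as text, and whether this node is a leaf
	Children() []Node     // nested fields, empty for leaves
}

// jsonNode adapts a decoded JSON value to the Node interface.
type jsonNode struct {
	label string
	value interface{}
}

func (n jsonNode) Label() string { return n.label }

func (n jsonNode) Leaf() (string, bool) {
	switch v := n.value.(type) {
	case map[string]interface{}, []interface{}:
		return "", false
	default:
		return fmt.Sprint(v), true
	}
}

func (n jsonNode) Children() []Node {
	var out []Node
	switch v := n.value.(type) {
	case map[string]interface{}:
		keys := make([]string, 0, len(v))
		for k := range v {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic order
		for _, k := range keys {
			out = append(out, jsonNode{k, v[k]})
		}
	case []interface{}:
		for i, e := range v {
			out = append(out, jsonNode{fmt.Sprint(i), e})
		}
	}
	return out
}

// parseJSON decodes data and wraps the result as a Node.
func parseJSON(data []byte) (Node, error) {
	var v interface{}
	if err := json.Unmarshal(data, &v); err != nil {
		return nil, err
	}
	return jsonNode{label: "", value: v}, nil
}

// hasLeaf reports whether any leaf under n has the given label and value.
// It depends only on Node, so it works for any format with an adapter.
func hasLeaf(n Node, label, want string) bool {
	if got, ok := n.Leaf(); ok {
		return n.Label() == label && got == want
	}
	for _, c := range n.Children() {
		if hasLeaf(c, label, want) {
			return true
		}
	}
	return false
}

func main() {
	root, err := parseJSON([]byte(`{"Addresses":[{"Number":456,"Street":"TheStreet"}]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(hasLeaf(root, "Street", "TheStreet")) // prints: true
}
```

In this sketch the check `hasLeaf` never mentions JSON, so supporting XML or protocol buffers would mean writing another `Node` adapter rather than another validator.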
Get a taste of Katydid in our Playground or take a Guided Tour through all the features.