Katydid Tree Grammars for Trees

Katydid is a language based on Tree Grammars. Currently it only has an assembler for a validator which uses bottom up hedge (unranked tree) automata and functions on the leaves. In future a more intuitive language will be included that translates to this assembler.

Katydid supports multiple types of trees:

Currently they all need to have a protocol buffer specification.

Protocol buffers encode data structures. The encoded protocol buffers have a semi-unordered unranked tree structure, just like XML. Tree Automata have been found to be very applicable to XML processing. Katydid tries to do the same by applying Tree Automata to Protocol Buffers and other types of trees.

Ideals

Name

True Facts About The Leaf Katydid

Language

The current language reflects the proposed inner workings (assembler) of the final language, see BNF. This will be useful for debugging in future. The assembler is based on bottom up hedge (unranked tree) automata.

Streaming Limitation

Katydid does some streaming processing which is incompatible with the merging feature of protocol buffers. Merging allows a marshaled protocol buffer to contain more than one instance of the same optional field. This assumes that the last instance will be unmarshaled as the actual value. Katydid always assumes the first instance is the actual value. This means Katydid will not always process protocol buffers, that have been marshaled with the merging feature, correctly.

For more information please, see protocol buffer encoding documetation.

NoMerge is a function in gogoprotobuf that checks whether a marshaled protocol buffer has used this feature, see gogoprotobuf fieldpath documentation.

Next Steps