Star me on GitHub

Comments and Whitespace

Whitespace can contain a comment or one or more tabs, spaces or newlines.

_ws : ' ' | '\t' | '\n' | '\r' ;
space : _comment | _ws { _ws } ;

Comments can be block of line comments.

_lineComment : '/' '/' {.} '\n' ;
_blockComment : '/' '*' {. | '*'} '*' '/' ;
_comment : _lineComment | _blockComment ;

Examples

// line comment
/*
    block comment
*/

Throughout the rest of this specification the white spaces have omitted, for the sake of brevity. Please see the BNF for the full specification, which includes all the allowable white spaces.

Identifiers

An identifier starts with an english alphabet letter or underscore, but can be followed by an interleaved sequence of english alphabet letters, underscores and numbers.

_upcase : 'A'-'Z' ;
_lowcase : 'a'-'z' ;
_id_char : _upcase | _lowcase | '_' | _decimal_digit ;
_id : (_upcase | _lowcase | '_' ) {_id_char} ;
id : _id ;

Examples

abc
a1B2
_12
A_1_b_2

Literals

All literals have been kept as close as possible to their golang syntax counterparts.

Integer literals

There are three possible representations of an integer literal: decimal, octal and hexadecimal. Any of these representations can be prefix with a - indicating a negative number. Only octal and hexadecimal must be enclosed in parentheses and prefixed with the word int. Decimal may also be enclosed and prefixed, but it is not required.

_decimal_digit : '0' - '9' ;
_octal_digit : '0' - '7' ;
_hex_digit : '0' - '9' | 'A' - 'F' | 'a' - 'f';

_int_lit     : _decimal_lit | _octal_lit | _hex_lit ;
_decimal_lit : ( '1' - '9' ) { _decimal_digit } ;
_octal_lit   : '0' { _octal_digit } ;
_hex_lit     : '0' ( 'x' | 'X' ) _hex_digit { _hex_digit } ;

_singed_int_lit : ['-'] _int_lit ;

_int : 'i' 'n' 't';
int_lit: _int '(' _singed_int_lit ')' |  '0' | ['-'] _decimal_lit ;

Examples

123
-456
int(123)
int(-456)
int(0x1)
int(07)

Unsigned Integer literals

An unsigned integer literal is an integer literal which is always enclosed in parentheses and prefixed with the word uint.

_uint : 'u' 'i' 'n' 't';
uint_lit: _uint '(' _int_lit ')' ;

The integer literal should not include the negative sign.

Examples

uint(123)
uint(456)

Double literals

Floating point numbers only need enclosing parentheses and a prefix of the word double, if no decimals or exponent are provided.

_decimals  : _decimal_digit { _decimal_digit } ;
_exponent  : ( 'e' | 'E' ) [ '+' | '-' ] _decimals ;

_float_lit : ( _decimals '.' _decimals _exponent )
            | ( _decimals '.' ( _decimals | _exponent ) )
            | ( _decimals _exponent )
            | ( '.' _decimals [ _exponent ] ) 
            ;

_double : 'd' 'o' 'u' 'b' 'l' 'e' ;
double_lit: _double '(' ['-'] ( _float_lit | _int_lit ) ')' | ['-'] _float_lit ;

Examples

123.0
double(123)
12.3
12E3
12e-3
.12e+3

String literals

Strings in backticks and double quotes are supported.

_big_u_value      : '\\' 'U' _hex_digit _hex_digit _hex_digit _hex_digit
                            _hex_digit _hex_digit _hex_digit _hex_digit ;
_little_u_value   : '\\' 'u' _hex_digit _hex_digit _hex_digit _hex_digit ;
_hex_byte_u_value   : '\\' 'x' _hex_digit _hex_digit ;

_octal_byte_u_value : '\\' _octal_digit _octal_digit _octal_digit ;
_byte_value       : _octal_byte_u_value | _hex_byte_u_value ;
_raw_string : '`' {.} '`' ;
_escaped_char     : '\\' ( 'a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v' | '\\' | '\'' | '"' ) ;
_unicode_value    : . | _little_u_value | _big_u_value | _escaped_char ;
_interpreted_string : '"' { _unicode_value | _byte_value } '"' ;

string_lit : _raw_string | _interpreted_string ;

Examples

"abc"
`abc`
`a"bc`
"a\"bc"

Bytes literals

Bytes can be composed into comma separated list of bytes, enclosed in curly braces and prefixed by []byte. Each byte can be a character or an integer literal.

_char_lit   : '\'' (_unicode_value | _byte_value) '\'' ;
_byte_elem : _int_lit | _char_lit ;
_bytes : '[' ']' 'b' 'y' 't' 'e' ;
bytes_lit : _bytes '{' { _ws } [ _byte_elem { { _ws } ',' { _ws } _byte_elem } ] { _ws } '}' ;

Examples

[]byte{1, 0xa, 'a'}
[]byte{}

Boolean literals

A boolean value is a lowercase word: true or false.

Bool
  : "true"
  | "false"
  ;

Examples

true
false

Variables

A variable is a dollar followed by a type name. It specifies which Value method on the parser will be called.

double_var : '$' _double ;
int_var : '$' _int ;
uint_var : '$' _uint ;
bytes_var : '$' _bytes ;
string_var : '$' _string ;
bool_var : '$' _bool ;

Examples

$double
$string

Terminal

A terminal is a variable or any literal

Terminal
  : Literal
  | Variable
  ;

Expression

An expression is either a terminal, list or function.

Expr
  : Terminal
  | Function
  | List
  ;

List of Expressions

A list of expressions is a comma separated list of expressions.

Exprs
  : Expr
  | Exprs Comma Expr
  ;

List

A list is zero or more expressions that are enclosed by curly braces and prefixed by a ListType.

ListType
  : "[]bool"
  | "[]int"
  | "[]uint"
  | "[]double"
  | "[]string"
  | "[][]byte"
  ;

List
  : ListType "{" Exprs "}"
  | ListType "{" "}"
  ;

Examples

[]string{"a", "bc"}
[]int{12, int(3)}
[][]byte{[]byte{0xa}}
[]uint{}

Function

A function is an identifier with its arguments, a list of expressions, enclosed by parentheses.

Function
  : id "(" Exprs ")" 
  | id "(" ")"
  ;

Examples

eq(1, $int)
contains($string, "a")
now()

Built in Functions

A built in function is a well defined prefix followed by an expression.

BuiltIn
  : "==" Expr 
  | "!=" Expr 
  | "<" Expr 
  | ">" Expr 
  | ">=" Expr 
  | "<=" Expr 
  | "~=" Expr 
  | "*=" Expr 
  | "^=" Expr 
  | "$=" Expr 
  | "::" Expr 
  ;

Examples

== "abc"
> 5
:: $bool

Name Expression

A name expression can be:

  • any name indicated by an underscore;
  • a specific name, specified by a literal or an identifier;
  • anything except the specified name expression, which is specified using an exclamation mark and enclosing parentheses;
  • a list of alternative name expressions, which is specified with grouping parentheses and delimited with pipe characters.
NameExpr
  : "_"
  | Name
  | "!" "(" NameExpr ")"
  | StartNameChoice
  ;

Name
  : Literal
  | id
  ;

StartNameChoice : "(" NameExpr "|" ContinueNameChoice ")" ;

ContinueNameChoice
  : NameExpr
  | ContinueNameChoice "|" NameExpr
  ;

The identifier is used as a shorthand for some strings.

Examples

name
(name|othername)
!(name)
_
!(name|othername|"another name")
(!(_)|(1|2|2.0))

Pattern

A Pattern is the union of all pattern types.

symbol name example
-> LeafNode ->eq($int, 1)
shorthands LeafNode == 1
* ZAny (Zero or more of anything) *
: TreeNode A:*
<empty> Empty A:<empty>
[,] Concat [A:*, *]
(|) Or (A:* | B:*)
(&) And (A:* & _ == 1)
()* ZeroOrMore (A == 1)*
# @ = Reference #main=@a #a=*
!() Not !(A == 1)
. Contains .A == 1
()? Optional (A == 1)?
{;} Interleave {A:*;B:*}

The syntactical difference between a Pattern and DepthPattern will be discussed later. This pattern is equivalent to, but not the same as the one specified in the BNF. This was done for the sake of keeping it simple, please see the BNF for a more accurate representation.

Pattern
  : ZAny
  | TreeNode
  | DepthPattern
  | StartOr
  | StartAnd
  | ZeroOrMore
  | Optional
  | Reference
  | Not
  | Empty
  ;

DepthPattern
  : LeafNode
  | StartConcat
  | StartInterleave
  | Contains
  ;

ZAny

ZAny specifies zero or more of anything, very much like .* in a regular expression.

ZAny: "*"

TreeNode Pattern

A tree node consists of a name and a pattern.

TreeNode
  : NameExpr ":" Pattern
  | NameExpr DepthPattern
  ;

A DepthPattern has enough syntactical clues to indicate that, given a name expression, a tree node is specified. Other patterns require a colon between the name expression and the pattern.

Examples

a: *
name[a: *, b: *]
name:[a: *, b: *]

LeafNode Pattern

A LeafNode applies a function, that returns a boolean, to a value. A value is a tree node without any children, which can be some type of number, a string, bytes or a boolean. These functions can be composed with other functions, see the built in functions or you can even add your own functions. Some of the built in functions also have shorthand version.

LeafNode
  : "->" Function
  | BuiltIn
  ;

Examples

== "a"
->eq($int, 1)

Empty

Empty is used internally to indicate that the pattern has been satisfied. It is also useful in the specification of XML trees, or to indicate an empty structure or list.

Empty: "<empty>"

Concat

Concat is used to describe an ordered sequence of patterns. This is done by concatenating 2 or more patterns together.

StartConcat
  : "[" Pattern "," ContinueConcat "]"
  | "[" Pattern "," ContinueConcat "," "]"
  ;

ContinueConcat
  : Pattern
  | ContinueConcat "," Pattern
  ;

Examples

[a: *, b: *]
[
  a: *,
  b: *,
  c: *,
]

Interleave

Interleave is used to describe an unordered list of patterns.

StartInterleave
  : "{" Pattern ";" ContinueInterleave "}"
  | "{" Pattern ";" ContinueInterleave ";" "}"
  ;

ContinueInterleave
  : Pattern
  | ContinueInterleave ";" Pattern
  ;

Examples

_:{a:*; b:*}
{
  a: *;
  b: *;
  c: *;
}

Contains

Contains only cares about satisfying one pattern in sequence of nodes. It is a shorthand for [*, Pattern, *].

Contains: "." Pattern;

Examples

.a:*
.a.b.c == 1

Or

Or specifies a list of alternative patterns.

StartOr
  : "(" Pattern "|" ContinueOr ")"
  ;

ContinueOr
  : Pattern
  | ContinueOr "|" Pattern
  ;

Examples

(a:*|b:*)
(a:<empty> | b == 1 | c:*)

And

And specifies a list of intersecting patterns.

StartAnd
  : "(" Pattern "&" ContinueAnd ")"
  ;

ContinueAnd
  : Pattern
  | ContinueAnd "&" Pattern
  ;

Examples

(a:* & _ == b)
(a == 1 & _ == 1 & a:*)

ZeroOrMore

ZeroOrMore specifies a repeated pattern that repeats zero or more times.

ZeroOrMore: "(" Pattern ")" "*" ;

Examples

(a:*)*
([a:*, b:*])*

Optional

Optional is shorthand for (Pattern|<empty>)

Optional: "(" Pattern ")" "?" 

Examples

(a:*)?
{(a:*)?;b:*}

Not

Not is used to take the complement of a pattern.

Not: "!" "(" Pattern ")" ;

Examples

!(a:*)
!(*)

Reference

Reference is used as an in place reference to a predefined pattern or pattern declaration.

Reference: "@" id ;

Examples

@main
@myref

Grammar

A grammar consists of a list of pattern declarations. A pattern declaration consists of an name and a pattern. A grammar can start with one pattern which does not have a name. It is implied that this first pattern’s name is main.

Grammar
  : Pattern                   
  | Pattern PatternDecls        
  | PatternDecls                
  ;

PatternDecls
  : PatternDecl               
  | PatternDecls PatternDecl
  ;

PatternDecl : "#" id "=" Pattern ;

Examples

A:*

#main = A:*

A:(@main|<empty>)

A: @myref
#myref = @myref2
#myref2 = <empty>

BNF

Dragons can view the BNF source here