Comments and Whitespace
Whitespace can contain a comment or one or more tabs, spaces or newlines.
_ws : ' ' | '\t' |'\n' | '\r' ;
space : _comment | _ws { _ws } ;
Comments can be block of line comments.
_lineComment : '/''/' {.} '\n' ;
_blockComment : '/' '*' {. | '*'} '*' '/' ;
_comment : _lineComment | _blockComment ;
// line comment
/*
block comment
*/
Throughout the rest of this specification the white spaces have been omitted, for the sake of brevity. Please see the BNF for the full specification, which includes all the allowable white spaces.
Identifiers
An identifier starts with an english alphabet letter or underscore, but can be followed by an interleaved sequence of english alphabet letters, underscores and numbers.
_upcase : 'A'-'Z' ;
_lowcase : 'a'-'z' ;
_id_char : _upcase | _lowcase | '_' | _decimal_digit ;
_id : (_upcase | _lowcase | '_' ) {_id_char} ;
id : _id ;
Literals
All literals have been kept as close as possible to their golang syntax counterparts.
Integer literals
There are three possible representations of an integer literal: decimal, octal and hexadecimal. Any of these representations can be prefixed with a - indicating a negative number. Only octal and hexadecimal must be enclosed in parentheses and prefixed with the word int. Decimal may also be enclosed and prefixed, but it is not required.
_decimal_digit : '0' - '9' ;
_octal_digit : '0' - '7' ;
_hex_digit : '0' - '9' | 'A' - 'F' | 'a' - 'f';
_int_lit : _decimal_lit | _octal_lit | _hex_lit ;
_decimal_lit : ( '1' - '9' ) { _decimal_digit } ;
_octal_lit : '0' { _octal_digit } ;
_hex_lit : '0' ( 'x' | 'X' ) _hex_digit { _hex_digit } ;
_singed_int_lit : ['-'] _int_lit ;
_int : 'i' 'n' 't';
int_lit: _int '(' _singed_int_lit ')' | '0' | ['-'] _decimal_lit ;
123
-456
int(123)
int(-456)
int(0x1)
int(07)
Unsigned Integer literals
An unsigned integer literal is an integer literal which is always enclosed in parentheses and prefixed with the word uint.
_uint : 'u' 'i' 'n' 't';
uint_lit: _uint '(' _int_lit ')' ;
The integer literal should not include the negative sign.
Examples
Double literals
Floating point numbers only need enclosing parentheses and a prefix of the word double, if no decimals or exponent are provided.
_decimals : _decimal_digit { _decimal_digit } ;
_exponent : ( 'e' | 'E' ) [ '+' | '-' ] _decimals ;
_float_lit : ( _decimals '.' _decimals _exponent )
| ( _decimals '.' ( _decimals | _exponent ) )
| ( _decimals _exponent )
| ( '.' _decimals [ _exponent ] )
;
_double : 'd' 'o' 'u' 'b' 'l' 'e' ;
double_lit: _double '(' ['-'] ( _float_lit | _int_lit ) ')' | ['-'] _float_lit ;
123.0
double(123)
12.3
12E3
12e-3
.12e+3
String literals
Strings in backticks and double quotes are supported.
_big_u_value : '\\' 'U' _hex_digit _hex_digit _hex_digit_hex_digit
_hex_digit _hex_digit _hex_digit _hex_digit ;
_little_u_value : '\\' 'u' _hex_digit _hex_digit _hex_digit _hex_digit ;
_hex_byte_u_value : '\\' 'x' _hex_digit _hex_digit ;
_octal_byte_u_value : '\\' _octal_digit _octal_digit _octal_digit ;
_byte_value : _octal_byte_u_value | _hex_byte_u_value ;
_raw_string : '`' {.} '`' ;
_escaped_char : '\\' ( 'a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v' | '\\' | '\'' |'"' ) ;
_unicode_value : . | _little_u_value | _big_u_value | _escaped_char ;
_interpreted_string : '"' { _unicode_value | _byte_value } '"' ;
string_lit : _raw_string | _interpreted_string ;
Bytes literals
Bytes can be composed into comma separated list of bytes, enclosed in curly braces and prefixed by []byte. Each byte can be a character or an integer literal.
_char_lit : '\'' (_unicode_value | _byte_value) '\'' ;
_byte_elem : _int_lit | _char_lit ;
_bytes : '[' ']' 'b' 'y' 't' 'e' ;
bytes_lit : _bytes '{' { _ws } [ _byte_elem { { _ws } ',' { _ws } _byte_elem }] { _ws } '}' ;
[]byte{1, 0xa, 'a'}
[]byte{}
Boolean literals
A boolean value is a lowercase word: true or false.
Bool
: "true"
| "false"
;
Variables
A variable is a dollar followed by a type name. It specifies which Value method on the parser will be called.
double_var : '$' _double ;
int_var : '$' _int ;
uint_var : '$' _uint ;
bytes_var : '$' _bytes ;
string_var : '$' _string ;
bool_var : '$' _bool ;
Terminal
A terminal is a variable or any literal.
Terminal
: Literal
| Variable
;
Expression
An expression is either a terminal, list or function.
Expr
: Terminal
| Function
| List
;
List of Expressions
A list of expressions is a comma separated list of expressions.
Exprs
: Expr
| Exprs Comma Expr
;
List
A list is zero or more expressions that are enclosed by curly braces and prefixed by a ListType.
ListType
: "[]bool"
| "[]int"
| "[]uint"
| "[]double"
| "[]string"
| "[][]byte"
;
List
: ListType "{" Exprs "}"
| ListType "{" "}"
;
[]string{"a", "bc"}
[]int{12, int(3)}
[][]byte{[]byte{0xa}}
[]uint{}
Built in Functions
A built in function is a well defined prefix followed by an expression.
BuiltIn
: "==" Expr
| "!=" Expr
| "<" Expr
| ">" Expr
| ">=" Expr
| "<=" Expr
| "~=" Expr
| "*=" Expr
| "^=" Expr
| "$=" Expr
| "::" Expr
;
Name Expression
A name expression can be:
- any name indicated by an underscore;
- a specific name, specified by a literal or an identifier;
- anything except the specified name expression, which is specified using an exclamation mark and enclosing parentheses;
- a list of alternative name expressions, which is specified with grouping parentheses and delimited with pipe characters.
NameExpr
: "_"
| Name
| "!" "(" NameExpr ")"
| StartNameChoice
;
Name
: Literal
| id
;
StartNameChoice : "(" NameExpr "|" ContinueNameChoice")" ;
ContinueNameChoice
: NameExpr
| ContinueNameChoice "|" NameExpr
;
The identifier is used as a shorthand for some strings.
name
(name|othername)
!(name)
_
!(name|othername|"another name")
(!(_)|(1|2|2.0))
Pattern
A Pattern is the union of all pattern types.
symbol
name
example
->
LeafNode
->eq($int, 1)
LeafNode
== 1
*
ZAny (Zero or more of anything)
*
:
TreeNode
A:*
<empty>
Empty
A:<empty>
[,]
Concat
[A:*, *]
(|)
Or
(A:* | B:*)
(&)
And
(A:* & _ == 1)
()*
ZeroOrMore
(A == 1)*
# @ =
Reference
#main=@a #a=*
!()
Not
!(A == !1)
.
Contains
.A == 1
()?
Optional
(A == 1)?
{;}
Interleave
{A:*;B:*}
The syntactical difference between a Pattern and DepthPattern will be discussed later. This pattern is equivalent to, but not the same as the one specified in the BNF. This was done for the sake of keeping it simple, please see the BNF for a more accurate representation.
Pattern
: ZAny
| TreeNode
| DepthPattern
| StartOr
| StartAnd
| ZeroOrMore
| Optional
| Reference
| Not
| Empty
;
DepthPattern
: LeafNode
| StartConcat
| StartInterleave
| Contains
;
ZAny
ZAny specifies zero or more of anything, very much like .* in a regular expression.
TreeNode Pattern
A tree node consists of a name and a pattern.
TreeNode
: NameExpr ":" Pattern
| NameExpr DepthPattern
;
A DepthPattern has enough syntactical clues to indicate that, given a name expression, a tree node is specified. Other patterns require a colon between the name expression and the pattern.
Examples
a: *
name[a: *, b: *]
name:[a: *, b: *]
LeafNode Pattern
A LeafNode applies a function, that returns a boolean, to a value. A value is a tree node without any children, which can be some type of number, a string, bytes or a boolean. These functions can be composed with other functions, see the built in functions or you can even add your own functions. Some of the built in functions also have shorthand version.
LeafNode
: "->" Function
| BuiltIn
;
Empty
Empty is used internally to indicate that the pattern has been satisfied. It is also useful in the specification of XML trees, or to indicate an empty structure or list.
Concat
Concat is used to describe an ordered sequence of patterns. This is done by concatenating 2 or more patterns together.
StartConcat
: "[" Pattern "," ContinueConcat "]"
| "[" Pattern "," ContinueConcat ",""]"
;
ContinueConcat
: Pattern
| ContinueConcat "," Pattern
;
[a: *, b: *]
[
a: *,
b: *,
c: *,
]
Interleave
Interleave is used to describe an unordered list of patterns.
StartInterleave
: "{" Pattern ";" ContinueInterleave "}"
| "{" Pattern ";" ContinueInterleave ";""}"
;
ContinueInterleave
: Pattern
| ContinueInterleave ";" Pattern
;
_:{a:*; b:*}
{
a: *;
b: *;
c: *;
}
Contains
Contains only cares about satisfying one pattern in sequence of nodes. It is a shorthand for [*, Pattern, *].
Or
Or specifies a list of alternative patterns.
StartOr
: "(" Pattern "|" ContinueOr ")"
;
ContinueOr
: Pattern
| ContinueOr "|" Pattern
;
(a:*|b:*)
(a:<empty> | b == 1 | c:*)
And
And specifies a list of intersecting patterns.
StartAnd
: "(" Pattern "&" ContinueAnd ")"
;
ContinueAnd
: Pattern
| ContinueAnd "&" Pattern
;
(a:* & _ == b)
(a == 1 & _ == 1 & a:*)
ZeroOrMore
ZeroOrMore specifies a repeated pattern that repeats zero or more times.
ZeroOrMore: "(" Pattern ")" "*" ;
Optional
Optional is shorthand for (Pattern|<empty>).
Optional: "(" Pattern ")" "?"
Not
Not is used to take the complement of a pattern.
Not: "!" "(" Pattern ")" ;
Reference
Reference is used as an in place reference to a predefined pattern or pattern declaration.
Grammar
Reference is A grammar consists of a list of pattern declarations. A pattern declaration consists of an name and a pattern. A grammar can start with one pattern which does not have a name. It is implied that this first pattern’s name is main. as an in place reference to a predefined pattern or pattern declaration.
Grammar
: Pattern
| Pattern PatternDecls
| PatternDecls
;
PatternDecls
: PatternDecl
| PatternDecls PatternDecl
;
PatternDecl : "#" id "=" Pattern ;
A:*
#main = A:*
A:(@main|<empty>)
A: @myref
#myref = @myref2
#myref2 = <empty>
BNF
Dragons can view the BNF source here.