GRM syntax
Typography
On the typography used in this document.
- A character is represented by its C character-literal representation, followed by its name and its Unicode Code Point in parentheses.
Depending on the type of character, one or more of these elements can be omitted.
For example, a newline character is represented as \n (NEWLINE, U+000A).
- When we write "from X to Z", we mean any element of the sequence X, Y and Z. In other words, Z is included when we write "from X to Z".
Glossary
- Attribute
- An element which gives additional information for a Named Node. An Attribute has an Attribute Name and optionally an Attribute Value.
- Comment
- An element that is not interpreted by a GRM parser. The purpose of a Comment is the same as that of a comment in any programming or markup language.
- GRM
- A markup language designed to be concise to write, easy to parse and customizable. GRM stands for "GeneRic Markup".
- GRM Definition Document
- A special GRM document used to describe which Named Nodes and Marks are defined. This document is optional.
- GRM Lite
- A subset of GRM which does not contain Marks.
- Mark
- A shortcut for a Named Node containing no Attributes. A Mark has a starting visible character and an ending visible character. A Mark can be a Verbatim Mark, which means that none of the characters between its starting and ending characters are interpreted as GRM language characters.
- Node
- A GRM document is a set of Nodes. A Node can be a Text Node or a Named Node.
- Text Node
- A Node which only represents text.
- Named Node
- A Node which has a Name, optionally Attributes and optionally child Nodes.
Syntax in human language
Category 1: Encoding and characters
Here are the rules for the encoding and the characters used for a valid GRM document.
GRM is case-sensitive: a and A are two distinct characters.
The following characters are completely ignored:
- From NULL (U+0000) to BACKSPACE (U+0008).
- From VERTICAL TAB (U+000B) to INFORMATION SEPARATOR ONE (U+001F).
- From DELETE (U+007F) to APPLICATION PROGRAM COMMAND (U+009F).
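Assuming the three ranges above are the characters ignored under Rule C1R03, a parser could drop them in a pre-pass before any other processing. The sketch below illustrates that idea; the pre-pass approach and the names `is_ignored` and `strip_ignored` are assumptions, not part of this specification.

```python
# Inclusive code point ranges from the list above ("from X to Z" includes Z).
IGNORED_RANGES = [
    (0x0000, 0x0008),  # NULL to BACKSPACE
    (0x000B, 0x001F),  # VERTICAL TAB to INFORMATION SEPARATOR ONE
    (0x007F, 0x009F),  # DELETE to APPLICATION PROGRAM COMMAND
]


def is_ignored(ch: str) -> bool:
    """Return True if ch lies in one of the ignored ranges."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in IGNORED_RANGES)


def strip_ignored(text: str) -> str:
    """Hypothetical pre-pass: drop ignored characters before parsing."""
    return "".join(ch for ch in text if not is_ignored(ch))


# \t (U+0009) and \n (U+000A) survive; CARRIAGE RETURN (U+000D) is removed,
# so CRLF line endings become a plain \n.
print(repr(strip_ignored("a\r\nb\tc\x00")))  # 'a\nb\tc'
```

A side effect worth noting: because U+000D falls in the second range, documents saved with Windows line endings behave exactly like documents using \n alone.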
Category 2: Syntax definition axioms
Here is the basis of the syntax definition. We only write rules that apply to a single character.
Remember that some characters are completely ignored (Rule C1R03).
Syntax for general single characters
- \n (NEWLINE, U+000A).
- \t (HORIZONTAL TAB, U+0009).
- From a to z or from A to Z.
- From 0 to 9.
- A whitespace.
- A character listed in Invisible Characters:
- NO-BREAK SPACE (U+00A0);
- SOFT HYPHEN (U+00AD);
- COMBINING GRAPHEME JOINER (U+034F);
- ARABIC LETTER MARK (U+061C);
- from HANGUL CHOSEONG FILLER (U+115F) to HANGUL JUNGSEONG FILLER (U+1160);
- from KHMER VOWEL INHERENT AQ (U+17B4) to KHMER VOWEL INHERENT AA (U+17B5);
- from MONGOLIAN FREE VARIATION SELECTOR ONE (U+180B) to MONGOLIAN VOWEL SEPARATOR (U+180E);
- from EN QUAD (U+2000) to RIGHT-TO-LEFT MARK (U+200F);
- from LEFT-TO-RIGHT EMBEDDING (U+202A) to NARROW NO-BREAK SPACE (U+202F);
- from MEDIUM MATHEMATICAL SPACE (U+205F) to NOMINAL DIGIT SHAPES (U+206F);
- BRAILLE PATTERN BLANK (U+2800);
- IDEOGRAPHIC SPACE (U+3000);
- HANGUL FILLER (U+3164);
- from VARIATION SELECTOR-1 (U+FE00) to VARIATION SELECTOR-16 (U+FE0F);
- ZERO WIDTH NO-BREAK SPACE (U+FEFF);
- HALFWIDTH HANGUL FILLER (U+FFA0);
- OBJECT REPLACEMENT CHARACTER (U+FFFC).
- A non-character listed by the WHATWG community and other sources:
- from U+FDC8 to U+FDCE;
- from U+FDD0 to U+FDEF;
- from U+FFFE to U+FFFF.
- Other invisible characters or non-characters not listed previously.
- Other invisible characters beyond U+FFFF.
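For the explicitly listed code points, a sorted table of inclusive ranges plus a binary search gives a quick membership test. This is only an illustration: it covers the two lists above, not whitespace or the open-ended "other invisible characters" entries, and `is_listed_invisible` is a hypothetical name.

```python
import bisect

# Inclusive ranges from the two lists above, sorted by start code point.
# A single character is stored as a range of length one.
LISTED_RANGES = [
    (0x00A0, 0x00A0), (0x00AD, 0x00AD), (0x034F, 0x034F), (0x061C, 0x061C),
    (0x115F, 0x1160), (0x17B4, 0x17B5), (0x180B, 0x180E), (0x2000, 0x200F),
    (0x202A, 0x202F), (0x205F, 0x206F), (0x2800, 0x2800), (0x3000, 0x3000),
    (0x3164, 0x3164), (0xFDC8, 0xFDCE), (0xFDD0, 0xFDEF), (0xFE00, 0xFE0F),
    (0xFEFF, 0xFEFF), (0xFFA0, 0xFFA0), (0xFFFC, 0xFFFC), (0xFFFE, 0xFFFF),
]
STARTS = [lo for lo, _ in LISTED_RANGES]


def is_listed_invisible(ch: str) -> bool:
    """Find the last range starting at or before ch, then check its end."""
    code = ord(ch)
    i = bisect.bisect_right(STARTS, code) - 1
    return i >= 0 and code <= LISTED_RANGES[i][1]


print(is_listed_invisible("\u200b"), is_listed_invisible("a"))  # True False
```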
Syntax for GRM single characters
- #.
- {.
- }.
- _ or -.
- \.
- [.
- ].
- =.
- _ or -.
- ".
- \.
- " or \ or n or t.
- ", \, newline or tab.
Category 3: Syntax definition for GRM Lite
Here we are defining GRM without the concept of Marks. For convenience, we call that GRM Lite.
By convention, the file extension for a GRM document is .grm.
For example: blog-post.grm.
Concept
A GRM document, defined below as grm-document, is a collection of Nodes. There are two types of Node: Named Node and Text Node.
- A Named Node, defined below as named-node, is a Node which has a Name, optionally Attributes and optionally child Nodes.
- A Text Node, defined below as text-node, is a Node which only represents text. A Text Node cannot have Attributes or child Nodes.
- An Attribute, defined below as attribute, gives more information on its Node. An Attribute has an Attribute Name, defined below as attribute-name, and optionally an Attribute Value, defined below as attribute-value.
- Just like XML or HTML, GRM has Comments. A Comment, defined below as comment, is text ignored by the GRM parser.
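The concepts above map naturally onto a small tree of records. The sketch below is one possible in-memory model, not a structure mandated by GRM: the class names are hypothetical, Attributes are kept in an ordered list on the assumption that source order matters, and Comments are simply not represented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Attribute:
    name: str                    # Attribute Name
    value: Optional[str] = None  # Attribute Value, None when absent


@dataclass
class TextNode:
    text: str  # only text: no Attributes and no child Nodes


@dataclass
class NamedNode:
    name: str
    attributes: List[Attribute] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


# A Node is either a Named Node or a Text Node.
Node = Union[NamedNode, TextNode]

# A grm-document is a collection of Nodes.
doc: List[Node] = [
    TextNode("Hello "),
    NamedNode("b", [Attribute("class", "loud")], [TextNode("world")]),
]
print(doc[1].children[0].text)  # world
```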
Syntax
From here on, we ignore all comments in the following syntax definitions.
- an att-val-str-non-escape;
- the sequence att-val-str-escape, att-val-str-escapable.
An attribute is one of:
- an attribute-name;
- the sequence attribute-name, useless, attribute-assign, useless, attribute-value.
An attribute-seq is one of:
- an attribute;
- the sequence attribute, useless, attribute-seq.
- grm-char except (text-node-escape or named-node-start or named-node-end or user-mark);
- the sequence text-node-escape, grm-char.
Rules
If a Text Node contains \a, then only a is captured by the Text Node and \ is ignored.
The same applies to \\, for which only one \ is returned.
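A Text Node reader following the text-node rule and the escape rule above might look like the sketch below. It assumes that text-node-escape is \, named-node-start is { and named-node-end is }, consistent with the examples in this document, and that the characters of user-defined Marks are passed in as a set (empty for GRM Lite). `read_text_node` is a hypothetical name.

```python
from typing import FrozenSet, Tuple

# Assumed characters: "{" starts and "}" ends a Named Node, "\" escapes.
NODE_DELIMITERS = frozenset("{}")


def read_text_node(src: str, pos: int = 0,
                   mark_chars: FrozenSet[str] = frozenset()) -> Tuple[str, int]:
    """Read Text Node content from src[pos:] and resolve escapes.

    Stops before the first unescaped "{", "}" or Mark character and
    returns the decoded text with the position where reading stopped.
    """
    out = []
    while pos < len(src):
        ch = src[pos]
        if ch == "\\":
            if pos + 1 == len(src):
                raise ValueError("escape character at end of input")
            out.append(src[pos + 1])  # keep only the escaped character
            pos += 2
        elif ch in NODE_DELIMITERS or ch in mark_chars:
            break
        else:
            out.append(ch)
            pos += 1
    return "".join(out), pos


text, stop = read_text_node(r"cost \{5\} \\ ok{b}")
print(text, stop)  # cost {5} \ ok 16
```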
- the sequence att-val-str-escape, " is interpreted as the character ";
- the sequence att-val-str-escape, \ is interpreted as the character \;
- the sequence att-val-str-escape, n is interpreted as the character \n (NEWLINE, U+000A);
- the sequence att-val-str-escape, t is interpreted as the character \t (HORIZONTAL TAB, U+0009).
An Attribute Value is null when it is not present.
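The attribute and attribute-seq definitions, together with the escape rules above, can be sketched as a small hand-written parser for the text between [ and ]. Assumptions not stated in this document: an attribute-name uses ASCII letters, digits, _ and -; useless means zero or more spaces, tabs or newlines; and a raw newline or tab inside an Attribute Value is rejected. The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumed token shapes (see the lead-in).
NAME = re.compile(r"[A-Za-z0-9_-]+")
USELESS = re.compile(r"[ \t\n]*")
ESCAPES = {'"': '"', "\\": "\\", "n": "\n", "t": "\t"}


def read_value(src: str, pos: int) -> Tuple[str, int]:
    """Decode a quoted Attribute Value; src[pos] must be the opening quote."""
    if src[pos:pos + 1] != '"':
        raise ValueError(f'expected " at {pos}')
    out, pos = [], pos + 1
    while pos < len(src):
        ch = src[pos]
        if ch == '"':
            return "".join(out), pos + 1
        if ch == "\\":
            esc = src[pos + 1:pos + 2]
            if esc not in ESCAPES:
                raise ValueError(f"invalid escape at {pos}")
            out.append(ESCAPES[esc])
            pos += 2
        elif ch in "\n\t":  # assumption: must be written as \n or \t
            raise ValueError(f"raw newline or tab in value at {pos}")
        else:
            out.append(ch)
            pos += 1
    raise ValueError("unterminated Attribute Value")


def parse_attribute_seq(src: str) -> List[Tuple[str, Optional[str]]]:
    """Parse the text between "[" and "]" into (name, value) pairs."""
    attrs, pos = [], USELESS.match(src, 0).end()
    while pos < len(src):
        m = NAME.match(src, pos)
        if m is None:
            raise ValueError(f"expected attribute-name at {pos}")
        name, pos = m.group(), m.end()
        after = USELESS.match(src, pos).end()
        if src[after:after + 1] == "=":
            pos = USELESS.match(src, after + 1).end()
            value, pos = read_value(src, pos)
        else:
            value = None  # no Attribute Value: null
        attrs.append((name, value))
        pos = USELESS.match(src, pos).end()
    return attrs


print(parse_attribute_seq('name="optional" optional nullable'))
# [('name', 'optional'), ('optional', None), ('nullable', None)]
```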
Category 4: Syntax definition for GRM
We previously defined what we called, for convenience, GRM Lite. GRM adds the notion of Marks, which GRM Lite excludes.
Concept
GRM is mainly aimed at writing prose documents. When writing, we do not want to type lengthy markup. A Mark is a way to shorten the markup syntax.
A Mark, defined below as mark, is a short way to write a specific Named Node without Attributes. A Mark has one Mark Start Character, defined below as mark-start, and one Mark End Character, defined below as mark-end.
GRM is designed to be flexible. It is up to the developer of the software which uses a GRM parser to define which Named Node a Mark represents. It is also up to this developer to define which characters are used as Mark Start Character and Mark End Character.
A Mark can be a Verbatim Mark, defined below as verbatim-mark.
In a Verbatim Mark, all characters up to its Mark End Character are part of its child Text Node.
This means that inside a Verbatim Mark, all special characters are treated as characters of a Text Node and \ is not needed to escape anything.
A non-Verbatim Mark is defined below as non-verbatim-mark.
It is up to the developer of the software which uses a GRM parser to define if a Mark is a Verbatim Mark or not.
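How a Mark table chosen by the developer could drive parsing is sketched below: non-verbatim Marks are parsed recursively, while a Verbatim Mark copies everything up to its Mark End Character without interpreting \. The configuration (`MARKS`, the node names, the dict output) is hypothetical, and Named Nodes written with { } are not handled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class MarkDef:
    node_name: str        # Named Node this Mark stands for
    end: str              # Mark End Character
    verbatim: bool = False


# Hypothetical configuration, keyed by Mark Start Character.
MARKS: Dict[str, MarkDef] = {
    "*": MarkDef("strong", "*"),
    "`": MarkDef("code", "`", verbatim=True),
}

Tree = List[Union[str, dict]]  # strings are Text Nodes, dicts are Named Nodes


def parse_marks(src: str, pos: int = 0, end: str = "") -> Tuple[Tree, int]:
    """Parse text and Marks until `end` (or the end of input at top level)."""
    children: Tree = []
    text: List[str] = []

    def flush() -> None:
        if text:
            children.append("".join(text))
            text.clear()

    while pos < len(src):
        ch = src[pos]
        if end and ch == end:  # closes the current Mark
            break
        if ch == "\\" and pos + 1 < len(src):  # escape outside verbatim Marks
            text.append(src[pos + 1])
            pos += 2
        elif ch in MARKS:
            flush()
            mark = MARKS[ch]
            if mark.verbatim:  # copy raw characters up to the end character
                close = src.find(mark.end, pos + 1)
                if close == -1:
                    raise ValueError(f"unclosed verbatim Mark at {pos}")
                inner = [src[pos + 1:close]]
                pos = close + 1
            else:  # non-verbatim: parse the content recursively
                inner, pos = parse_marks(src, pos + 1, mark.end)
            children.append({"name": mark.node_name, "children": inner})
        else:
            text.append(ch)
            pos += 1
    if end:
        if pos >= len(src):
            raise ValueError(f"missing Mark End Character {end!r}")
        pos += 1  # consume the Mark End Character
    flush()
    return children, pos


tree, _ = parse_marks("a *b* `{x}*` c")
print(tree)
# ['a ', {'name': 'strong', 'children': ['b']}, ' ', {'name': 'code', 'children': ['{x}*']}, ' c']
```

Inside the verbatim `code` Mark, { and * stay plain text, which is the behaviour described above. An unclosed Mark raises a ValueError in this sketch.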
Syntax
We define below the rest of the GRM syntax. For the following syntax definitions, we ignore all comments.
The syntax definition for GRM Lite applies here too.
Rules
For example, if Mark A is defined by * and * and Mark B is defined by < and >, then the sequence *<*> is incorrect.
Category 5: GRM Definition Document
Concept
A GRM Definition Document is a special GRM document used to describe which Named Nodes and Marks are defined. This document is optional, but it can help people writing GRM documents that are interpreted by a specific software. A GRM Definition Document could be used by linters or LSP servers and clients to help someone write a GRM document.
By convention, the file extension for a GRM Definition Document is .grmd.
For example: website.grmd.
A GRM Definition Document is written in GRM Lite. Text Nodes in a GRM Definition Document are used to comment the definition made by their parent Named Node or to comment the whole document if they are not inside a Named Node. Other Text Nodes should be completely ignored.
Note that a GRM Definition Document is merely an indication. It is not the role of the parser to ensure that the definitions in a GRM document comply with a GRM Definition Document. This role belongs to the software that uses the parser to generate documents from GRM documents.
Syntax
Below is the GRM Definition Document of what can be found in a GRM Definition Document.
Definition for GRM Definition Document.
{node [name="node"] Defines a Named Node.
{attribute [name="name"] Defines the name of the Node. This "name" Attribute is mandatory and must have a non-empty Attribute Value.}
}
{node [name="attribute"] Defines an Attribute in a Named Node. This "attribute" Node must be the child of a "node" Node.
{attribute [name="name"]
Defines the Attribute Name.
This "name" Attribute is mandatory and must have a non-empty Attribute Value.
}
{attribute [name="optional" optional nullable]
By default when an Attribute is defined, it is considered as mandatory.
Defining an Attribute as "optional" means that this Attribute can be absent.
}
{attribute [name="nullable" optional nullable]
By default when an Attribute is defined, it is considered as requiring an Attribute Value.
Defining an Attribute as "nullable" means that this Attribute Value can be absent.
}
}
{node [name="mark"] Defines a Mark.
{attribute [name="name"]
References the Named Node that this Mark is referencing.
Multiple marks can have the same name.
}
{attribute [name="start"]
Defines the Mark Start Character.
This Attribute is mandatory and its value must be one single valid mark character not used yet for other Marks.
Its value can be the same as the value of the "end" Attribute defined for the same "mark" Node.
}
{attribute [name="end"]
Defines the Mark End Character.
This Attribute is mandatory and its value must be one single valid mark character not used yet for other Marks.
Its value can be the same as the value of the "start" Attribute defined for the same "mark" Node.
}
{attribute [name="verbatim" optional nullable]
Defines if the Mark is a Verbatim Mark.
By default, a Mark is not a Verbatim Mark.
}
}
Rules
We do not define "types" (boolean, integer, number, enumerated values...) for the Attributes. The reason is simple: GRM is a simple markup language for writing text. GRM is not the right tool for structuring complex, precise data.
- The allowed Named Nodes are node, attribute and mark;
- a node Named Node must have a name Attribute;
- an attribute Named Node has the Attributes name (mandatory), optional (optional) and nullable (optional);
- a mark Named Node has the Attributes name (mandatory), start (mandatory), end (mandatory) and verbatim (optional).
- node and mark Named Nodes must be at the first level of the hierarchy. In other words, they have no parent.
- An attribute Named Node is a direct child of a node Named Node.
- If a node Named Node redefines a Named Node previously defined, the old definition is forgotten and the new one is used.
- If a mark Named Node redefines a Mark previously defined, the old definition is forgotten and the new one is used.
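A tool such as a linter could apply these rules while loading a GRM Definition Document. The sketch below assumes the top-level Named Nodes have already been parsed into (node name, attribute dict) pairs, and that a Mark is identified by its Mark Start Character when deciding whether it redefines an earlier one; this document does not say which property identifies a Mark. `build_registry` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input: top-level Named Nodes of a .grmd file, already parsed
# into (node name, {Attribute Name: Attribute Value or None}).
Definition = Tuple[str, Dict[str, Optional[str]]]


def build_registry(defs: List[Definition]):
    """Return (Named Node definitions, Mark definitions) after the rules."""
    nodes: Dict[str, dict] = {}
    marks: Dict[str, dict] = {}  # keyed by Mark Start Character (assumption)
    for kind, attrs in defs:
        if kind == "node":
            if not attrs.get("name"):
                raise ValueError('a "node" needs a non-empty "name"')
            nodes[attrs["name"]] = attrs  # a redefinition replaces the old one
        elif kind == "mark":
            for key in ("name", "start", "end"):  # mandatory Attributes
                if not attrs.get(key):
                    raise ValueError(f'a "mark" needs a non-empty "{key}"')
            start, end = attrs["start"], attrs["end"]
            if len(start) != 1 or len(end) != 1:
                raise ValueError("start and end must be single characters")
            marks.pop(start, None)  # same start character: redefinition
            used = {c for m in marks.values() for c in (m["start"], m["end"])}
            if start in used or end in used:
                raise ValueError(f"character already used by another Mark: {start!r} or {end!r}")
            marks[start] = dict(attrs, verbatim="verbatim" in attrs)
        else:
            raise ValueError(f"unknown Named Node {kind!r}")
    return nodes, marks


nodes, marks = build_registry([
    ("node", {"name": "strong"}),
    ("mark", {"name": "strong", "start": "*", "end": "*"}),
])
print(sorted(marks))  # ['*']
```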
Category 6: JSON representation
A GRM document can be represented in JSON. A parser must implement a way to export a GRM document to JSON.
- the JSON field representing its attribute-name;
- the JSON value representing its attribute-value as a JSON string if it exists, or null.
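Following the mapping above, the Attributes of a Named Node translate directly into JSON fields. This sketch covers only Attributes (the JSON shape of whole Nodes is not shown here) and assumes Attribute Names are unique within a Node, since a JSON object with duplicate keys would lose information. `attributes_to_json` is a hypothetical name.

```python
import json
from typing import List, Optional, Tuple


def attributes_to_json(attrs: List[Tuple[str, Optional[str]]]) -> str:
    """Map each Attribute to a JSON field: name -> string, or null if absent."""
    # Assumes unique Attribute Names; a repeated name keeps only the last value.
    return json.dumps({name: value for name, value in attrs}, ensure_ascii=False)


print(attributes_to_json([("name", "optional"), ("optional", None), ("nullable", None)]))
# {"name": "optional", "optional": null, "nullable": null}
```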