@document.meta
title: The 1.0 Norg Specification
authors: [
  vhyrro
  mrossinek
]
categories: specifications
version: 1.0
@end

* Norg File Format Specification

This file contains the formal file format specification of the Norg syntax version 1.0. This document is written in the Norg format in its original form and, thus, attempts to be self-documenting.

Please note that this is *not* a reference implementation - this is an established rule set that should be strictly followed.

* Introduction

Before diving into the details we will start with an introduction. The Norg file format was designed as part of the [Neorg]{https://github.com/nvim-neorg/neorg} plugin for Neovim which was started by /Vhyrro (@vhyrro)/ in April 2021. Soon after starting this work, /Max Rossmannek (@mrossinek)/ joined the development team, and, with the help of the [Neorg] community, the two have shaped the Norg syntax to what it has become today.

** What is Norg?

The Norg syntax is a /structured/ plain-text file format which aims to be human-readable when viewed standalone while also providing a suite of markup utilities for typesetting structured documents. Compared to other plain-text file formats such as Markdown, Org, RST or AsciiDoc, it sets itself apart most notably by following a strict philosophy to abide by the following simple rules:
~ *Consistency:* the syntax should be consistent. Even if you know only a part of the syntax, learning new parts should not be surprising and rather feel predictable and intuitive.
~ *Unambiguity:* the syntax should leave _no_ room for ambiguity. This is especially motivated by the use of [tree-sitter]{https://tree-sitter.github.io/tree-sitter/} for the original syntax parser, which takes a strict left-to-right parsing approach and only has single-character look-ahead.
~ *[Free-form]{https://en.wikipedia.org/wiki/Free-form_language}:* whitespace is _only_ used to delimit tokens but has no other significance!
This is probably the most contrasting feature to other plain-text formats which often adhere to the [off-side rule]{https://en.wikipedia.org/wiki/Off-side_rule}, meaning that the syntax relies on whitespace-indentation to carry meaning.

Although built with [Neorg] in mind, Norg can be utilized in a wide range of applications, from external note-taking plugins to even messaging applications. Thanks to its {* layers}[layer] system one can choose the feature set they'd like to support and can ignore the higher levels.

* Preliminaries

First, we define some basic concepts which will be used in this specification.

** Characters

A Norg file is made up of /characters/. A character is any Unicode [code point]{https://en.wikipedia.org/wiki/Code_point} or [grapheme]{https://www.unicode.org/glossary/#grapheme}.

*** Whitespace

A {** characters}[character] is considered *whitespace* if it constitutes any code point in the [Unicode Zs general category]{https://www.fileformat.info/info/unicode/category/Zs/list.htm}.

Any combination of the above is also considered whitespace.

Tabs are not expanded to spaces and since whitespace has no semantic meaning there is no need to define a default tab stop. However, if a parser must (for implementation reasons) define a tab stop, we suggest setting it to 4 spaces.

Any line may be preceded by a variable amount of whitespace, which should be ignored. Upon entering the beginning of a new line, it is recommended for parsers to continue consuming (and discarding) consecutive whitespace characters exhaustively. The "start of a line" is considered to be /after/ this initial whitespace has been parsed. Keep this in mind when reading the rest of the document.

*** Line Endings

Line endings in Norg serve as a termination character. They are used e.g. to terminate {** paragraph segments}, {** paragraphs} and other elements like the endings of {** range-able detached modifiers}. They are not considered {*** whitespace}.
The following characters are considered line endings:
- A line feed `U+000A`
- A form feed `U+000C`
- A carriage return `U+000D`

The following line ending combinations are permitted:
- A single line feed
- A single carriage return
- A carriage return immediately followed by a line feed

*** Punctuation

A {** characters}[character] is considered *punctuation* if it is any of the following:
- A standard ASCII punctuation character: `|!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~|`
- Anything in the general Unicode categories `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po` or `Ps`.

*** Escaping

A single {** characters}[character] can be escaped if it is immediately preceded by a backslash, `|\|` (`U+005C`). Escaping renders the next character /verbatim/. Any {** characters}[character] may be escaped /apart from/ {** characters} within free-form and ranged verbatim segments (see {** free-form attached modifiers} and {*** verbatim ranged tags}).

For more information about precedence, take a look at the {* precedence} section.

*** Regular Characters

Any other character not described by the preceding sections is treated as a generic code point/character.

** Words

The Norg format is designed to be parsed on a word-by-word basis from left-to-right through the entire document /in a single pass/. This is possible because the language is [free-form], meaning that whitespace has no semantic meaning, and because the markup follows strict rules which are outlined in the later sections of this document.

A *word* is considered to be any combination of {*** regular characters}.

** Paragraph Segments

{** Words} are first combined into *paragraph segments*. A paragraph segment may then contain any inline element of type:
- {* attached modifiers}
- {* linkables}

Usually, a {*** line endings}[line ending] terminates the paragraph segment. This means that a paragraph segment is simply a line of text:

|example
I am a paragraph segment.
I am another paragraph segment.
Together we form a paragraph.
|end

** Verbatim Paragraph Segments

These are structurally equivalent to regular {** paragraph segments} with a single exception. Verbatim paragraph segments are built up from /only/ {** words}. This means that attached modifiers and linkables are simply parsed as raw text within a verbatim paragraph segment.

** Paragraphs

Paragraphs are then formed of consecutive {** paragraph segments}. A paragraph is terminated by:
- A {$ paragraph break}
- Any of the {* detached modifiers}
- Any of the {** delimiting modifiers}
- Any of the {** ranged tags}
- Any of the {*** strong carryover tags}

$ Paragraph Break
A paragraph break is defined as an _empty line_. In the simplest case that means two consecutive {*** line endings} but since Norg is a /free-form/ markup language, a line which only contains whitespace is also considered empty.

* Detached Modifiers

Norg has several detached modifiers. The name sets them apart from the {* attached modifiers}, which will be discussed later. Detached modifiers make up the majority of the syntax.

All detached modifiers must abide by the following rules:
- A detached modifier can _only_ occur at the beginning of the line.
- Depending on the modifier type one, two or an arbitrary amount of the same consecutive characters may initiate the detached modifier.
- A detached modifier must be immediately followed by {*** whitespace}.

The following table outlines all valid *detached modifiers*. It also adds various possible properties to each category which will be explained in more detail below.

: .
  : Character
: >
  : Name
: >
  : Categories
: _
  : `*`
: >
  : Headings
:: >
  - Structural
  - Nestable
::
: _
  : `-`
: >
  : Unordered Lists
:: >
  - Nestable
::
: _
  : `~`
: >
  : Ordered Lists
:: >
  - Nestable
::
: _
  : `>`
: >
  : Quotes
:: >
  - Nestable
::
: _
  : `$`
: >
  : Definitions
:: >
  - Range-able
::
: _
  : `^`
: >
  : Footnotes
:: >
  - Range-able
::
: _
  : `:`
: >
  : Table cells
:: >
  - Range-able
::
: _
  : `%`
: >
  : Attributes
:: >
  - Nestable
::

** Structural Detached Modifiers

The first detached modifier type is the /structural/ modifier type. As the name suggests, modifiers under this category *structure* the Norg document in some form or another.

After a structural modifier, one {# paragraph segments}[paragraph segment] is consumed as the /title/ of the modifier.

A property of structural detached modifiers is that they consume *all* other non-structural detached modifiers, lower-level structural modifiers, inline markup and {** paragraphs}; they are the most important detached modifier in the hierarchy of modifiers.

To manually terminate a structural detached modifier (like a heading) you must use a {** delimiting modifiers}[delimiting modifier]. Structural detached modifiers are automatically closed when you use another structural modifier of the same or lower level.

*** Headings

|example
* Heading level 1
** Heading level 2
*** Heading level 3
**** Heading level 4
***** Heading level 5
****** Heading level 6
******* Heading level 7 (falls back to level 6 in the tree-sitter parser)
|end

Although headings are both structural /and/ nestable (see next section), the former takes precedence over the latter, meaning that headings only affect a single {** paragraph segments}[paragraph segment] as their title. This is for user convenience as it does not require an empty line right below a heading. Because of this precedence, headings are also non-{** grouping}.

Headings serve as a way to categorize and organize other elements into smaller chunks for better readability.
They are currently the /only/ structural detached modifier present in the Norg syntax.

** Nestable Detached Modifiers

Nestable detached modifiers may be repeated multiple times in order to produce a _nested_ object of the given type.

Furthermore, in contrast to most other {* detached modifiers}, this detached modifier type has /no/ title, and consumes the following `paragraph` instead of only the next {# paragraph segments}[paragraph segment]. Said paragraph then becomes the modifier's /content/. This means that in order to terminate the detached modifier contents, you must use a {$ paragraph break}.

Below you will find examples of nestable detached modifiers.

*** Unordered Lists

|example
- Unordered list level 1
-- Unordered list level 2
--- Unordered list level 3
---- Unordered list level 4
----- Unordered list level 5
------ Unordered list level 6
------- Unordered list level 7 (falls back to level 6 in the tree-sitter parser)

- Unordered list level 1
  This text is still part of the level 1 list item.
-- Unordered list level 2
  This text is still part of the level 2 list item.
--- Unordered list level 3
  This text is still part of the level 3 list item.
---- Unordered list level 4
  This text is still part of the level 4 list item.
----- Unordered list level 5
  This text is still part of the level 5 list item.
------ Unordered list level 6
  This text is still part of the level 6 list item.
------- Unordered list level 7 (falls back to level 6 in the tree-sitter parser)
  This text is still part of the level 7 list item.
|end

Unordered lists provide an easy way to enumerate items in an unordered fashion. They are useful for data that's categorically similar but doesn't need to follow a strict order.
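
The nesting level of a nestable modifier is simply the number of repeated characters that open it. The following Python sketch is not part of the specification; it only illustrates one way to recognise such a line under the three rules listed in {* detached modifiers} (start of line after leading whitespace, a run of one repeated character, then whitespace). The names `NESTABLE`, `is_whitespace` and `parse_nestable` are hypothetical, and treating a tab as whitespace alongside the `Zs` category is an assumption.

```python
import unicodedata

# Hypothetical mapping from modifier character to a readable kind.
NESTABLE = {"-": "unordered_list", "~": "ordered_list", ">": "quote"}


def is_whitespace(ch: str) -> bool:
    # Assumption: a tab counts as whitespace in addition to the Zs category.
    return ch == "\t" or unicodedata.category(ch) == "Zs"


def parse_nestable(line: str):
    """Return (kind, level, text) if `line` opens a nestable modifier, else None."""
    i = 0
    # Leading whitespace is ignored; the "start of the line" comes after it.
    while i < len(line) and is_whitespace(line[i]):
        i += 1
    if i == len(line) or line[i] not in NESTABLE:
        return None
    char = line[i]
    j = i
    # Count the run of identical modifier characters; its length is the level.
    while j < len(line) and line[j] == char:
        j += 1
    # The modifier must be immediately followed by whitespace.
    if j == len(line) or not is_whitespace(line[j]):
        return None
    return NESTABLE[char], j - i, line[j:].strip()
```

For example, `parse_nestable("--- Unordered list level 3")` yields the kind `unordered_list` with level `3`, while a line such as `-no space` is rejected because no whitespace follows the modifier.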
*** Ordered Lists

|example
~ Ordered list level 1
~~ Ordered list level 2
~~~ Ordered list level 3
~~~~ Ordered list level 4
~~~~~ Ordered list level 5
~~~~~~ Ordered list level 6
~~~~~~~ Ordered list level 7 (falls back to level 6 in the tree-sitter parser)

~ Ordered list level 1
  This text is still part of the level 1 list item.
~~ Ordered list level 2
  This text is still part of the level 2 list item.
~~~ Ordered list level 3
  This text is still part of the level 3 list item.
~~~~ Ordered list level 4
  This text is still part of the level 4 list item.
~~~~~ Ordered list level 5
  This text is still part of the level 5 list item.
~~~~~~ Ordered list level 6
  This text is still part of the level 6 list item.
~~~~~~~ Ordered list level 7 (falls back to level 6 in the tree-sitter parser)
  This text is still part of the level 7 list item.
|end

This list type is only useful for data that needs to be kept in sequence. In contrast to other formats which may use a syntax like `1.`/`1)`, Norg counts the items automatically - this reduces complexity and makes reordering items simple.

*** Quotes

|example
> Quote level 1
>> Quote level 2
>>> Quote level 3
>>>> Quote level 4
>>>>> Quote level 5
>>>>>> Quote level 6
>>>>>>> Quote level 7 (falls back to level 6 in the tree-sitter parser)

> Quote level 1
  This text is still part of the level 1 quote.
>> Quote level 2
  This text is still part of the level 2 quote.
>>> Quote level 3
  This text is still part of the level 3 quote.
>>>> Quote level 4
  This text is still part of the level 4 quote.
>>>>> Quote level 5
  This text is still part of the level 5 quote.
>>>>>> Quote level 6
  This text is still part of the level 6 quote.
>>>>>>> Quote level 7 (falls back to level 6 in the tree-sitter parser)
  This text is still part of the level 7 quote.
|end

Quotes are rather self-explanatory - they allow you to cite e.g. a passage from another source.
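
The quote examples show that lines following a quote modifier remain part of that quote until its paragraph ends. As a rough, non-normative Python sketch of this behaviour: the hypothetical function `collect_quotes` groups lines into quote items and closes the open item at a paragraph break. It is deliberately simplified. It only recognises `>`, it ignores every other paragraph terminator (other detached modifiers, delimiting modifiers, ranged tags and strong carryover tags), and it uses Python's `str.strip` and `str.isspace` as a stand-in for the whitespace definition of this specification.

```python
def collect_quotes(lines):
    """Group lines into (level, segments) pairs, one pair per quote item."""
    items = []
    current = None  # segments of the quote item being filled, if any
    for line in lines:
        text = line.strip()
        if not text:
            # An empty (or whitespace-only) line is a paragraph break.
            current = None
            continue
        level = len(text) - len(text.lstrip(">"))
        if level and text[level:level + 1].isspace():
            # A run of `>` followed by whitespace opens a new quote item.
            current = [text[level:].strip()]
            items.append((level, current))
        elif current is not None:
            # Any other line continues the paragraph of the open item.
            current.append(text)
    return items
```

Each returned pair holds the nesting level and the paragraph segments that make up the content of the quote. Lines outside of any quote are dropped by this sketch.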
*** Invalid Nestable Detached Modifier Examples

|example
>I am not a quote
some preceding text > I am also not a quote
>- I am not a valid detached modifier
> > I am only a level 1 quote
*
  I am not a valid heading title.
|end

** Range-able Detached Modifiers

Range-able detached modifiers can occur in two forms:
- With a single character in which case they consume:
-- The following *verbatim* paragraph segment which becomes the /title/.
-- Any following paragraph which becomes the /content/.
- With two consecutive characters in which case:
-- The following *verbatim* paragraph segment also becomes the /title/.
-- The content continues until the "closing" detached modifier is found.

Said closing modifier is made up of the same two consecutive characters that initially opened the range-able detached modifier, but is immediately followed by a {*** line endings}[line ending].

Below you may find all available range-able detached modifiers within the Norg syntax.

*** Definitions

Definitions are primarily of use to people who write technical documents. They consist of a term followed by a definition of that term.

|example
$ Term
Definition content.
|end

To create longer definitions, use the ranged definition syntax instead:

|example
$$ Term
Content of the definition.

Which scans up to the closing modifier.
$$
|end

*** Footnotes

Footnotes allow the user to give supplementary information related to some text without polluting the paragraph itself. Footnotes can be linked to using {* linkables}.

|example
^ Single Footnote
Optional footnote content.
|end

To create longer footnotes, use the ranged footnote syntax instead:

|example
^^ Ranged Footnote
Content of the footnote.

Which scans up to the closing modifier.
^^
|end

*** Table Cells

Table cells are used to procedurally build up a table. Here are a few examples of table cells:

|example
: A1
  Content of table cell at `A1`.
:: A2
  > Content of table cell at `A2` (in a quote).
::
|end

Their semantics are described in more detail in the {:1.0-semantics:* Tables}[semantics] document, which we recommend reading if you are interested in the behavior of objects as opposed to how they are represented using just syntax.

*NOTE*: In order to make tables more aesthetically pleasing, they're commonly mixed with the {* intersecting modifiers}[intersecting modifier] syntax to produce the following:

|example
: A1 : Content of table cell at `A1`.
|end

** Grouping

Both nestable and range-able detached modifiers have a unique quality - when several consecutive modifiers /of the same type/ are encountered (by consecutive we mean *not* separated via a {$ paragraph break}), they are treated as one whole