Generating code in M4: introduction

The M4 macro processor is used to generate arbitrarily complex code from simple source code. This introductory part of the series covers its history, the basic principles of the language, examples of its use and the prerequisites for mastering it.

Updated January 30, 2021

Contents
1 Introduction
1.1 Examples for readers
2 History of macro languages
3 Basics of M4
3.1 Context-free grammar
3.2 Automata
3.3 Output queues
4 Main uses of M4
4.1 The code generation
4.2 The preprocessor
5 Prerequisites for mastering M4
5.1 Fundamentals of grammars
5.2 Fundamentals of automata
5.3 (GNU) make
5.4 Vim
5.5 Talent and time
6 References
A Code generation examples
B Preprocessor examples
C M4: examples
D Why to use M4 and why not?

1 Introduction

Readers of this series will learn how to write scripts that generate code by machine. Machine-generated code can be arbitrarily complex and can contain internal dependencies. Humans can hardly keep interdependent files with complex code in a consistent state, so some code generation mechanism is necessary. Code generation is performed by a text transformation tool, a macro processor.

The series focuses on the practical use of the universal macro processor M4 (hereafter M4) through small examples. It also describes the theory common to all its implementations. The aim of the series is to acquaint the reader with both the tool and its programming language: what it is used for, how to program in it, and what its advantages and disadvantages are.

🖹 The multilingual series “Generating code in M4” is itself generated by M4 scripts[1], which may make it easier for other authors to write articles for www.root.cz. The series also yields a set of sample scripts for code generation.

The introductory part describes the basic principles of the language with simple examples of use. All examples use the rewriting rules of a context-free grammar.
Later we will learn how to use output queues, automata, associative memories, stacks and pushdown automata. We will also learn how to write testing automata that check input data.

1.1 Examples for readers

The examples complement the series and will be partly based on the discussion below the article. Each episode begins with a description of some parts of the M4 language and ends with a set of examples. The parts can be read in any order.

• Code generation examples
• Preprocessor examples
• M4: examples
• Why to use M4 and why not?
• http://github.com/jkubin/m4root – project generating this series

2 History of macro languages

Macro languages were invented when assembly language (ASM) dominated. ASM source code usually contains identical instruction sequences that differ only in operand values. Such a sequence can be grouped under one word, a macro instruction, whose name usually describes the purpose of the hidden instructions. The macro processor translates the macro instructions back into the original instruction sequences, which are then translated into executable machine code. Programming in ASM with macro instructions is simpler, faster and less error-prone.

Later, macro languages were used to extend compiled programming languages, because they make it possible to write source code at a higher level of abstraction than the language itself offers, while keeping the speed and efficiency of the lower-level language. However, it is important to understand all layers of the code well.

GPM (General Purpose Macro-generator)

Christopher Strachey introduced the basic idea of rewritable strings with arguments, which recursively rewrite to other strings, in his GPM[2] in 1965. The next generation of macro processors, M3 and M4, essentially just extended the original GPM.
The basic idea of the original proposal remained the same.

M3

Dennis Ritchie adopted the basic idea of GPM and wrote an improved macro processor for generating source code in C (1972), the language he himself designed. The new macro processor was written for the AP-3 minicomputer, hence the name M3. This direct ancestor of today's M4 saved a great deal of heavy, time-consuming work and attracted developers programming in other languages (FORTRAN, COBOL, PL/I, …). Developers customized M3 for these languages, turning it into the universally usable M4 macro processor.

Dennis Ritchie was also a co-creator of UNIX, and therefore:

• M4 is minimalist and fast; it does one thing and does it well
• it relies solely on a non-interactive command-line interface
• parameters and dependencies of M4 scripts are described in a Makefile
• the # character begins a one-line comment, as in a UNIX shell
• the variables $@, $*, $#, $0, $1, $2, $3, … have meanings similar to those in a UNIX shell
• the argument delimiter is a comma

The M3 macro processor was also extended by Jim E. Weythman, the author of a program construct used in almost every M4 script:

    divert(-1)
    …
    define(…)
    …
    divert(0)dnl
    …

🖹 The divert(ℤ) keyword switches output queues. The argument -1 discards all text output; the argument 0 switches output to stdout (standard output).

M4

Brian Kernighan extended the M3 macro processor into a FORTRAN 66 preprocessor, creating a hybrid language extension named RATFOR[3]. The basic program constructs of this extension (conditions, loops) are the same as in C, so programming in RATFOR is similar to programming in C. The macro processor converts the source code back to FORTRAN, and the compiler then performs the usual compilation to machine code.
Note the almost perfect symbiosis with the C language:

• CPP directives #define, #include, #ifdef, … are comments for M4
• most keywords separated from their parentheses by a whitespace character lose their meaning; for example, M4 ignores void define (char c, int i) {…}
• macro arguments are separated by commas, just like the arguments of C functions
• if a defined FUNC macro is called as FUNC(char c, int i), its variables are: $# → 2, $0 → FUNC, $1 → char c, $2 → int i
• the left control character ` is not part of the C family syntax
• the right control character ' does not matter if it is not part of a macro
• both control characters can be hidden in the user-defined macros LL(), RR()
• macros are written IN_UPPERCASE, just like nonterminal symbols, which delimits their namespace

The user manual[4] mentions other co-authors not named here, so it would be unfair to credit the M4 macro processor (1977) to only two people.

Picture 1: Christopher Strachey[5], Dennis Ritchie[6], Brian Kernighan[7]

GNU M4

Today there are several implementations that differ from the original only in small details. The most common one is GNU M4, used by Autotools and for translating the simple sendmail.mc configuration file into the complex sendmail.cf. The author of this implementation (1990) is René Seindal.

To install m4 (together with make and pinfo) on a dnf-based system such as Fedora, type:

    # dnf -y install make m4 pinfo

A detailed description of the keywords can be found in the documentation[8]:

    $ pinfo m4
    $ man m4
    $ m4 --help

3 Basics of M4

M4 is based on context-free grammar, automata, stacks and output queues. To understand M4, it is therefore crucial to understand the basic concepts of formal language theory: terminal symbols (terminals for short) and nonterminal symbols (nonterminals for short). These terms will be explained in more detail later. The objective here is to show the basic practical use of the M4 language through examples.
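To make the two kinds of symbols tangible, here is a minimal Python model of rewriting. It is a sketch, not M4 itself, and the rule names GREETING, HELLO, WORLD, PLANET, NOTHING and QUOTED are hypothetical. Words found in the rule table are nonterminals and are replaced by their rule A → β, whose result is rescanned; every other word or character is a terminal and is copied unchanged. Text inside the ⟦⟧ pair used in the appendix is copied without expansion and loses one level of quotes, so it is expanded only in the next pass.

```python
import re

# Hypothetical rewriting rules A → β; the keys are the nonterminals.
RULES = {
    "GREETING": "HELLO, WORLD!",
    "HELLO": "Hello",
    "WORLD": "PLANET",
    "PLANET": "world",
    "NOTHING": "",        # A → ε
    "QUOTED": "⟦HELLO⟧",  # protected: expands only in the next pass
}

# A word in the M4 sense: a letter or underscore, then letters, digits or underscores.
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand(text, rules, depth=0):
    """One pass of rewriting: nonterminals are expanded, terminals and quoted text copied."""
    if depth > 100:
        raise RecursionError("rewriting does not terminate")
    out = []
    i = 0
    while i < len(text):
        if text[i] == "⟦":
            # Find the matching ⟧ (pairs may nest) and strip only the outer pair.
            level, j = 1, i + 1
            while j < len(text) and level:
                if text[j] == "⟦":
                    level += 1
                elif text[j] == "⟧":
                    level -= 1
                j += 1
            if level:
                raise ValueError("unbalanced ⟦⟧ pair")
            out.append(text[i + 1:j - 1])
            i = j
            continue
        m = WORD.match(text, i)
        if m:
            name = m.group(0)
            # A nonterminal is replaced by its β, which is rescanned. Simplification:
            # β is rescanned on its own, not together with the following input.
            out.append(expand(rules[name], rules, depth + 1) if name in rules else name)
            i = m.end()
        else:
            out.append(text[i])  # any other character is a terminal
            i += 1
    return "".join(out)


print(expand("GREETING", RULES))                # Hello, world!
print(expand("<NOTHING>", RULES))               # <>
print(expand("QUOTED", RULES))                  # HELLO
print(expand(expand("QUOTED", RULES), RULES))   # Hello
```

Because every β is rescanned, a self-referencing rule such as LOOP → LOOP never terminates; the depth guard in this sketch reports it as an error.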
3.1 Context-free grammar

A context-free grammar (CFG for short) is a formal grammar in which every rewriting rule has the form A → β. The nonterminal A is rewritten to a string β of arbitrary length composed of nonterminals N and terminals Σ. The Kleene star allows β to be empty, so the nonterminal A can also be rewritten to ε (the rule A → ε).

    P: A → β
    A ∈ N
    β ∈ (N ∪ Σ)*

M4 rewriting rules

The rewriting rules are the same for a context-free grammar and for M4:

    # A → β
    define(`A', `β')

    # A → ε
    define(`A', `')
    define(`A')

All M4 keywords are nonterminals (macros) that perform an action and are rewritten to ε or to another symbol. All keywords can be renamed or turned off completely, which is crucial for the preprocessor mode.

    divert(ℤ) → ε
    define(`A', `β') → ε
    ifelse(`', `', `yes', `no') → yes
    ifelse(`', `', `ifdef(`dnl', `1', `0')', `no') → ifdef(`dnl', `1', `0') → 1
    …

Nonterminal expansion control

In M4, the default character pair `' controls the expansion of nonterminals. The changequote() keyword can change it to other characters, for example {[], ␂␆, ⟦⟧}. Nonterminals that should not be expanded (immediately) are enclosed in this pair of characters. When the text passes through the macro processor, everything between the pair is treated as terminal symbols and the outer pair is removed. The next pass then expands the originally protected nonterminals. The control character pair is set at the beginning of the root file.

3.2 Automata

Automata act as “switches” between grammar rules. An automaton uses rewriting rules as its nodes and changes its state according to input symbols. The rule currently in effect writes specific code to an output queue (or several output queues) until the automaton moves to another node with a different rule. Examples of generating automata are in the appendix.

3.3 Output queues

Output queues temporarily store portions of the resulting code. These portions are produced by the rewriting rules as they rewrite input symbols. The divert(ℤ) keyword selects the output queue. At the end, all non-empty queues are dumped to standard output in ascending order, composing the final code. Examples of output queues are in the appendix.

🛈 Stacks will be described later.

4 Main uses of M4

M4 is used to generate the source code of any programming language, or as a preprocessor for any source code.

4.1 The code generation

M4 transforms input data from .mc files into output data with the following command:

    $ m4 root.m4 stem.m4 branch.m4 leaf.m4 input1.mc input2.mc > output.file

Two basic operations are performed while the files are loaded:

• reading the transformation rules from files with the .m4 extension
• expanding the macros inside the .mc files

The input1.mc and input2.mc files contain input data in a format that allows it to be transformed into output data according to the rules in the preceding .m4 files. The .mc data files usually do not contain any transformation rules.

The input data may also come from a pipeline:

    $ cat input.mc | m4 root.m4 stem.m4 branch.m4 leaf.m4 - > output.file
    $ cat input.mc | m4 root.m4 stem.m4 branch.m4 leaf.m4 - | gcc -x c -o progr -

Try: Code generation examples

4.2 The preprocessor

M4 can operate in preprocessor mode and can also be part of a pipeline. The input source code passes through unchanged, except for nonterminal symbols, which are expanded to terminals and output together with the rest of the source code. M4 can thus extend any language whose preprocessor is insufficient (for example, it lacks recursion) or that has no preprocessor at all.

It is important to choose a left control character for nonterminal expansion that does not collide with any character of the input source code. Such a collision is, however, easily resolved with a regular expression.
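As a rough illustration of that regular-expression fix, here is a Python sketch that mirrors the sed commands used with the default `' pair, square brackets and ⟦⟧ characters. The left control character is hidden in the LL() macro, which is expected to be defined in the root file and to rewrite back to the left character; # and the whole word dnl are enclosed in the control pair. The function hide_for_m4 and its parameters are hypothetical.

```python
import re

# "#" starts an M4 comment and "dnl" deletes to the end of line, so both are protected.
COMMENTS = re.compile(r"#|\bdnl\b")


def hide_for_m4(src, left="`", right="'", hide_left=True):
    """Prepare source code for M4 with the control pair left/right (hypothetical helper).

    1. Every left control character becomes left + right + "LL()"; the empty pair
       separates symbols and LL() is assumed to expand back to the left character.
    2. Every "#" and every whole word "dnl" is enclosed in left ... right, so M4
       copies it literally and removes the enclosing pair.
    Step 1 runs first; otherwise the quotes added in step 2 would be hidden too.
    """
    if hide_left:
        src = src.replace(left, left + right + "LL()")
    return COMMENTS.sub(lambda m: left + m.group(0) + right, src)


# `' pair, like: sed 's/`/`'\''LL()/g;s/#\|\<dnl\>/`&'\''/g'
print(hide_for_m4("x=`ls` # dnl"))                      # x=`'LL()ls`'LL() `#' `dnl'
# [] pair, like: sed 's/\[/[]LL()/g;s/#\|\<dnl\>/[&]/g'
print(hide_for_m4("a[i] # dnlx", "[", "]"))             # a[]LL()i] [#] dnlx
# ⟦⟧ pair: the left character needs no hiding
print(hide_for_m4("# dnl", "⟦", "⟧", hide_left=False))  # ⟦#⟧ ⟦dnl⟧
```

Python's \b plays the role of GNU sed's \< and \> here, so dnl inside a longer word such as dnlx is left alone.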
    $ m4 root.m4 stem.m4 branch.m4 leaf.m4 file.c > preproc.file.c
    $ m4 root.m4 stem.m4 branch.m4 leaf.m4 file.c | gcc -x c -o progr -

`' Default characters

The conflicting character ` in the input source code is hidden in the macro `'LL(). The empty pair of control characters `' before the macro serves as a symbol separator. When the source code passes through the macro processor, `'LL() is rewritten back to the original ` character and the empty pair `' is removed.

    $ sed 's/`/`'\''LL()/g' any.src | m4 rootq.m4 leaf.m4 -

If the source code contains # or dnl, which M4 treats as comment markers, they must be hidden first. Enclosing them in `' suppresses their original meaning; the macro processor then removes the enclosing characters.

M4 # and dnl comments are hidden between the default characters: `#' `dnl'

    $ sed 's/`/`'\''LL()/g;s/#\|\<dnl\>/`&'\''/g' any.src | m4 rootq.m4 leaf.m4 -
    $ sed 's/`/`'\''LL()/g;s/#/`#'\''/g;s/\<dnl\>/`dnl'\''/g' any.src | m4 …

[] Square brackets

If square brackets control the expansion of nonterminals, the left square bracket [ is hidden in the same way. Everything else applies as for the default characters `'.

    $ sed 's/\[/[]LL()/g' any.src | m4 rootb.m4 leaf.m4 - | …

M4 # and dnl comments are hidden between square brackets: [#] [dnl]

    $ sed 's/\[/[]LL()/g;s/#\|\<dnl\>/[&]/g' any.src | m4 rootb.m4 leaf.m4 - | …
    $ sed 's/\[/[]LL()/g;s/#/[#]/g;s/\<dnl\>/[dnl]/g' any.src | m4 rootb.m4 …

␂␆ Non-printable characters

The non-printable characters ␂ (0x02) and ␆ (0x06) can be used to control the expansion of nonterminals. These characters cannot collide with the printable characters of source code.

    $ m4 rootn.m4 leaf.m4 any.src | gcc …

M4 # and dnl comments are hidden between the non-printable characters: ␂#␆ ␂dnl␆

    $ sed 's/#\|\<dnl\>/␂&␆/g' any.src | m4 rootn.m4 leaf.m4 - | gcc …
    $ sed 's/#/␂#␆/g;s/\<dnl\>/␂dnl␆/g' any.src | m4 rootn.m4 leaf.m4 - | gcc …

⟦⟧ UTF-8 characters

The expansion of nonterminals can also be controlled by a suitably chosen pair of UTF-8 characters. Ordinary source code does not contain such characters, so collisions with the left control character ⟦ do not need to be handled. UTF-8 characters offer advantages similar to those of non-printable characters.

    $ m4 rootu.m4 leaf.m4 any.src | gcc …

M4 # and dnl comments are hidden between the UTF-8 characters: ⟦#⟧ ⟦dnl⟧

    $ sed 's/#\|\<dnl\>/⟦&⟧/g' any.src | m4 rootu.m4 leaf.m4 - | gcc …
    $ sed 's/#/⟦#⟧/g;s/\<dnl\>/⟦dnl⟧/g' any.src | m4 rootu.m4 leaf.m4 - | gcc …

Try: Preprocessor examples

Mixed mode

Mixed mode combines the previous modes and is mainly used for experiments. The data is not separated from the rules for its transformation: the leaf file leaf.m4 contains the transformation rule definitions together with the input data.

    $ m4 root.m4 leaf.m4

Try: M4: examples

5 Prerequisites for mastering M4

Mastering this macro language requires several prerequisites. M4 is not a simple language, because you cannot think and program in it as in an ordinary programming language. The most important thing to realize is that you use it to program the rewriting rules of a grammar. Every string is either a terminal or a nonterminal symbol, including all language keywords (the symbols # and , are special cases of nonterminals). M4 intentionally has no keywords for loops (for/while), because its basis is quite different from procedural or functional languages. Instead:
• loops are implemented only by left or right recursion
• branching is done by symbol concatenation or by the ifelse() and ifdef() keywords

5.1 Fundamentals of grammars

All grammars are based on rewriting rules, whose general form is given by a formal grammar:

Formal grammar (Chomsky type)

    G = (N, Σ, P, S)
    N: a finite set of nonterminal symbols
    Σ: a finite set of terminal symbols
    N ∩ Σ = ∅
    P: a finite set of production (rewriting) rules
       (N ∪ Σ)* N (N ∪ Σ)* → (N ∪ Σ)*
    S: the start symbol, S ∈ N

Restricting the form of the rewriting rules yields subclasses of formal grammars; one of them is the context-free grammar, CFG for short. As mentioned earlier, CFG rewriting rules work the same way as M4 rewriting rules. Some of the following episodes of this series will focus on formal grammars in detail.

5.2 Fundamentals of automata

The ability to use small automata, predominantly with two states, is essential for writing even simple M4 scripts, because the vast majority of scripts use them.

Testing automaton

An automaton can test the order of input symbols or their context. If the input symbols have the required properties, the automaton ends in a double-ring node, which indicates the accepting state.

Picture 2: Example of an automaton[9] that accepts an even number of 0 symbols (none also counts as even) and ignores 1 symbols. The automaton is equivalent to the regular expression (1*01*01*)*1*.

The previous automaton can be drawn as ASCII art accompanying the M4 script:

    #          ____1
    #         |   /
    #      ___V__/   0    ____
    # --->// S1 \\------>/ S2 \---.1
    #     \\____//<------\____/<--'
    #                0

Generating automaton

Input symbols switch the nodes of the automaton and thereby change the rewriting rules used for code generation. See the appendix for this generating automaton:

    #      _______      ___________
    # --->/ ERROR \--->/ NEXT_ITEM \---.
    #     \_______/    \___________/<--'

5.3 (GNU) make

A well-designed code generator usually consists of several smaller files whose order, dependencies and parameters are described in a Makefile. Good knowledge of writing Makefiles is therefore a prerequisite for mastering M4. Reading and maintaining source code generally takes more time than creating it, so a well-structured Makefile contributes significantly to the overall clarity of the resulting code generator.

🖹 Running make[10] from the code editor with a shortcut key significantly speeds up M4 code development. The file ~/.vimrc contains an nnoremap key mapping for :make.

5.4 Vim

Mastering the Vim[11] editor is an important prerequisite for writing M4 code comfortably and quickly. Vim abbreviations defined with the iabbrev keyword save a great deal of unnecessary typing. They also greatly reduce almost invisible errors caused by unpaired brackets, saving time that would otherwise be lost to debugging.

5.5 Talent and time

M4 usually cannot be mastered over a weekend, especially without the fundamentals[12] of automata theory and formal grammars. Mastering M4 requires programming in it over a longer period and writing plenty of bad (overly complex) M4 code, which you then rewrite as better ideas come. In this way, practice is gained gradually.

6 References

1. Generating code in M4, a template with examples for www.root.cz. http://github.com/jkubin/m4root
2. A General Purpose Macro-generator, Computer Journal 8, 3 (1965), 225–41. http://dx.doi.org/10.1093/comjnl/8.3.225
3. RATFOR — A Preprocessor for a Rational Fortran, Brian W. Kernighan. https://wolfram.schneider.org/bsd/7thEdManVol2/ratfor/ratfor.pdf
4. The M4 Macro Processor, Bell Laboratories (1977). https://wolfram.schneider.org/bsd/7thEdManVol2/m4/m4.pdf
5. Christopher Strachey, Computer Hope – Free computer help since 1998. https://www.computerhope.com/people/christopher_strachey.htm
6. Dennis Ritchie, Zomrel tvorca Unixu a jazyka C (The creator of Unix and the C language has died). https://pc.zoznam.sk/novinka/zomrel-tvorca-unixu-jazyka-c
7. Brian Kernighan, An Interview with Brian Kernighan. https://www.cs.cmu.edu/~mihaib/kernighan-interview/
8. GNU M4 - GNU macro processor, Free Software Foundation. https://www.gnu.org/software/m4/manual/
9. Automata theory, From Wikipedia, the free encyclopedia. https://en.wikipedia.org/wiki/Automata_theory
10. GNU Make Manual, Free Software Foundation. https://www.gnu.org/software/make/manual/make.html
11. Vim – the ubiquitous text editor, that edits text at the speed of thought. https://www.vim.org/
12. Automaty a formální jazyky I (Automata and Formal Languages I), Učební text FI MU (course text, Faculty of Informatics, Masaryk University). https://is.muni.cz/elportal/estud/fi/js06/ib005/Formalni_jazyky_a_automaty_I.pdf
13. Automaty a gramatiky (Automata and Grammars), Michal Chytil, 1st edition, Praha, 1984, 331 pp. https://is.muni.cz/publication/173173

A Code generation examples

🛈 The characters {`', [], ␂␆, ⟦⟧} in an example's name indicate which pair controls the expansion of nonterminals.

A.1 ⟦⟧ Input source code
A.2 ⟦⟧ CSV: simplest example
A.3 ⟦⟧ CSV: counter
A.4 💡 Modification of special characters
A.5 ⟦⟧ C: output queue
A.6 ⟦⟧ INI: an external command
A.7 ⟦⟧ .h: hex counter
A.8 ⟦⟧ C: small automaton
A.9 ⟦⟧ C: small automaton 2
A.10 ⟦⟧ HTML: output queues
A.11 ⟦⟧ Branching by grammar
A.12 ⟦⟧ JSON: generating automaton
A.12.1 ⟦⟧ JSON: named queues
A.12.2 ⟦⟧ JSON: generated queue indexes
A.13 ⟦⟧ INI: discontinuous queue index
A.14 ⟦⟧ XML: mixed messages
A.15 ⟦⟧ XML: separated messages
A.16 ⟦⟧ Bash $ echo "string"
A.17 ⟦⟧ Bash $ echo 'string'

🖹 The examples in this appendix are more complex and are intended to demonstrate the practical use of M4. They will be explained in detail later.

A.1 ⟦⟧ Input source code

The input source code resembles CSV; it is converted into arbitrarily complex target code in another language using CFG, automata and output queues. The examples do not use stacks.
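Before looking at the data, here is a rough Python model (not M4) of how the rules of the next example, hello.csv.m4, consume such input. Each line is a macro call; a rule switches the output queue with divert, writes a record and switches back to -1, while comments and calls without a rule (QUERY and WARNING here) land in queue -1 and are discarded. The names Diversions, error_rule and run are hypothetical, the starting queue -1 is inferred from the example's output, and TAB is assumed as the separator.

```python
import re

# One record per line, e.g. ERROR(⟦SUCCESS⟧, ⟦text⟧); arguments are quoted with ⟦⟧.
CALL = re.compile(r"^(\w+)\(⟦(.*?)⟧, ⟦(.*?)⟧\)$")


class Diversions:
    """Toy model of divert(): a negative queue discards text, 0 is stdout,
    and all non-empty queues are dumped in ascending order at the end."""

    def __init__(self):
        self.current = -1  # output is discarded until a rule diverts elsewhere
        self.queues = {}

    def divert(self, n):
        self.current = n

    def emit(self, text):
        if self.current >= 0:
            self.queues.setdefault(self.current, []).append(text)

    def dump(self):
        return "".join("".join(self.queues[n]) for n in sorted(self.queues))


def error_rule(out, name, text):
    # Mirrors hello.csv.m4: divert(0)dnl, then "$1<TAB>$2", then divert(-1)
    out.divert(0)
    out.emit(f"{name}\t{text}\n")
    out.divert(-1)


def run(lines, rules):
    out = Diversions()
    for line in lines:
        m = CALL.match(line.strip())
        if m and m.group(1) in rules:
            rules[m.group(1)](out, m.group(2), m.group(3))
        else:
            out.emit(line + "\n")  # comments, notes and undefined calls go to queue -1
    return out.dump()


INPUT = [
    "# 2018/05/15 Josef Kubin",
    "",
    "ERROR(⟦SUCCESS⟧, ⟦Complex M4 code failed successfully.⟧)",
    "WARNING(⟦ADDICTIVE⟧, ⟦Programming in M4 is addictive!⟧)",
    "ERROR(⟦NO_FAULT⟧, ⟦It's not a language fault!⟧)",
]

print(run(INPUT, {"ERROR": error_rule}), end="")
```

Queues above 0 behave the same way in this model: text diverted to queue 1 appears after queue 0 in the dump even if it was written first, which is how the C example below closes its array with }; at the end.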
The input source code contains special characters that must be hidden:

ff812e6 messages/messages_raw.mc

    # 2018/05/15 Josef Kubin

    ERROR(⟦COMPLEX⟧, ⟦!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~⟧)
    QUERY(⟦READABLE⟧, ⟦Is badly written M4 code readable [N/y]?⟧)
    ERROR(⟦SUCCESS⟧, ⟦Complex M4 code failed successfully.⟧)
    WARNING(⟦ADDICTIVE⟧, ⟦Programming in M4 is addictive!⟧)
    ERROR(⟦NO_FAULT⟧, ⟦It's not a language fault!⟧)
    WARNING(⟦NO_ERRORS⟧, ⟦No other errors were found.⟧)

🖹 The input file may also contain notes that need not be hidden in the comments #, dnl, ifelse(⟦…⟧) or ⟦… somewhere inside brackets …⟧.

A.2 ⟦⟧ CSV: simplest example

This example does not use output queues; it only prints TAB-separated values to standard output.

97e45a3 messages/hello.csv.m4

    # A → β
    define(⟦ERROR⟧, ⟦

    divert(0)dnl
    ⟦$1 $2⟧
    divert(-1)
    ⟧)

    $ m4 root0u.m4 hello.csv.m4 messages_raw.mc > hello.csv

97e45a3 messages/hello.csv

    COMPLEX !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    SUCCESS Complex M4 code failed successfully.
    NO_FAULT It's not a language fault!

A.3 ⟦⟧ CSV: counter

The example uses the COUNT_UP macro from the countu.m4 file; its β is copied to the right side of the COUNTER macro. The first expansion, COUNTER(1), sets the initial value. Each further expansion returns the current value as a numeric terminal symbol and increments an internal auxiliary (global) symbol by one. COUNTER is thus a small automaton.

825d4a3 messages/counter.csv.m4

    # A → β
    define(⟦COUNTER⟧, defn(⟦COUNT_UP⟧))

    # init counter
    COUNTER(1)

    # A → β
    define(⟦ERROR⟧, ⟦

    divert(0)dnl
    ERR_⟦⟧COUNTER ⟦$1 $2⟧
    divert(-1)
    ⟧)

    $ m4 root0u.m4 countu.m4 counter.csv.m4 messages_raw.mc > counter.csv

97e45a3 messages/counter.csv

    ERR_1 COMPLEX !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    ERR_2 SUCCESS Complex M4 code failed successfully.
    ERR_3 NO_FAULT It's not a language fault!

A.4 💡 Modification of special characters

Each type of output code requires its own modification of special characters, and the M4 patsubst() keyword is not suitable for this task. First, regular expressions hide all special characters of the input file in appropriately named macros.

Modified input code

ff812e6 messages/messages.mc

    # 2018/05/15 Josef Kubin

    ERROR(⟦COMPLEX⟧, ⟦⟦⟧EX()⟦⟧DQ()⟦#⟧⟦⟧DO()%⟦⟧AMP()⟦⟧AP()()*+,-./:;⟦⟧LT()=⟦⟧GT()?@[⟦⟧BS()]^_⟦⟧BQ(){|}~⟧)
    QUERY(⟦READABLE⟧, ⟦Is badly written M4 code readable [N/y]?⟧)
    ERROR(⟦SUCCESS⟧, ⟦Complex M4 code failed successfully.⟧)
    WARNING(⟦ADDICTIVE⟧, ⟦Programming in M4 is addictive⟦⟧EX()⟧)
    ERROR(⟦NO_FAULT⟧, ⟦It⟦⟧AP()s not a language fault⟦⟧EX()⟧)
    WARNING(⟦NO_ERRORS⟧, ⟦No other errors were found.⟧)

⟦⟧ Conversion file for XML, XSLT, HTML

d2707ad messages/markup.m4

    # A → β
    define(⟦AMP⟧, ⟦&amp;⟧)
    define(⟦AP⟧, ⟦'⟧)
    define(⟦BQ⟧, ⟦`⟧)
    define(⟦BS⟧, ⟦\⟧)
    define(⟦DO⟧, ⟦$⟧)
    define(⟦DQ⟧, ⟦"⟧)
    define(⟦EX⟧, ⟦!⟧)
    define(⟦GT⟧, ⟦&gt;⟧)
    define(⟦LT⟧, ⟦&lt;⟧)

⟦⟧ Conversion file for C, JSON, INI: "string"

d2707ad messages/code.m4

    # A → β
    define(⟦AMP⟧, ⟦&⟧)
    define(⟦AP⟧, ⟦'⟧)
    define(⟦BQ⟧, ⟦`⟧)
    define(⟦BS⟧, ⟦\\⟧)
    define(⟦DO⟧, ⟦$⟧)
    define(⟦DQ⟧, ⟦\"⟧)
    define(⟦EX⟧, ⟦!⟧)
    define(⟦GT⟧, ⟦>⟧)
    define(⟦LT⟧, ⟦<⟧)

⟦⟧ Conversion file for Bash: "string"

d2707ad messages/doubleq.m4

    # A → β
    define(⟦AMP⟧, ⟦&⟧)
    define(⟦AP⟧, ⟦'⟧)
    define(⟦BQ⟧, ⟦\`⟧)
    define(⟦BS⟧, ⟦\\⟧)
    define(⟦DO⟧, ⟦$⟧)
    define(⟦DQ⟧, ⟦\"⟧)
    define(⟦EX⟧, ⟦"\!"⟧)
    define(⟦GT⟧, ⟦>⟧)
    define(⟦LT⟧, ⟦<⟧)

⟦⟧ Conversion file for Bash: 'string'

d2707ad messages/apost.m4

    # A → β
    define(⟦AMP⟧, ⟦&⟧)
    define(⟦AP⟧, ⟦'\''⟧)
    define(⟦BQ⟧, ⟦`⟧)
    define(⟦BS⟧, ⟦\⟧)
    define(⟦DO⟧, ⟦$⟧)
    define(⟦DQ⟧, ⟦"⟧)
    define(⟦EX⟧, ⟦!⟧)
    define(⟦GT⟧, ⟦>⟧)
    define(⟦LT⟧, ⟦<⟧)

⟦⟧ Conversion file for CSV, M4 (returns all characters)

d2707ad messages/unchanged.m4

    # A → β
    define(⟦AMP⟧, ⟦&⟧)
    define(⟦AP⟧, ⟦'⟧)
    define(⟦BQ⟧, ⟦`⟧)
    define(⟦BS⟧, ⟦\⟧)
    define(⟦DO⟧, ⟦$⟧)
    define(⟦DQ⟧, ⟦"⟧)
    define(⟦EX⟧, ⟦!⟧)
    define(⟦GT⟧, ⟦>⟧)
    define(⟦LT⟧, ⟦<⟧)

A.5 ⟦⟧ C: output queue

The example uses one output queue, which holds the characters }; that close the array at the end.

24fd4f3 messages/array.c.m4

    # A → β
    define(⟦ERROR⟧, ⟦

    divert(0)dnl
    "$2",
    divert(-1)
    ⟧)

    divert(0)dnl
    /*
     * DONTE()
     */

    char *error[] = {
    divert(1)dnl
    };
    divert(-1)

    $ m4 root0u.m4 array.c.m4 code.m4 messages.mc > array.c

97e45a3 messages/array.c

    /*
     * DO NOT EDIT! This file is generated automatically!
     */

    char *error[] = {
    "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~",
    "Complex M4 code failed successfully.",
    "It's not a language fault!",
    };

A.6 ⟦⟧ INI: an external command

The example runs the external date command and places its output in square brackets. The command outputs two comma-separated items. The SARG1() macro selects the first item, because the second item contains an unwanted LF (0x0a) newline character.

24fd4f3 messages/hello.ini.m4

    # A → β
    define(⟦ERROR⟧, ⟦

    divert(0)dnl
    ⟦$1⟧="$2"
    divert(-1)
    ⟧)

    divert(0)dnl
    ; DONTE()

    SARG1(esyscmd(⟦date '+⟦[hello_%Y%m%d]⟧,'⟧))
    divert(-1)

    $ m4 root0u.m4 hello.ini.m4 code.m4 messages.mc > hello.ini

97e45a3 messages/hello.ini

    ; DO NOT EDIT! This file is generated automatically!

    [hello_20210130]
    COMPLEX="!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
    SUCCESS="Complex M4 code failed successfully."
    NO_FAULT="It's not a language fault!"

A.7 ⟦⟧ .h: hex counter

The example uses the COUNTER macro to number the resulting CPP macros, and one output queue: queue number 1 contains the #endif preprocessor directive that terminates the header file. The eval() keyword converts the decimal value of the counter to a two-digit hexadecimal number.
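The conversion done by eval(COUNTER, 16, 2) can be sketched in Python as follows; this is a model under stated assumptions, not GNU M4's code. The names eval_radix and header_defines are hypothetical, itertools.count stands in for the COUNTER automaton initialized by COUNTER(0), only non-negative values with a radix of 2 to 36 are handled, and digits above 9 are printed as lowercase letters.

```python
from itertools import count

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def eval_radix(value, radix=10, width=1):
    """Model of eval(value, radix, width) for a non-negative integer:
    digits in the given radix, zero-padded to at least `width` digits."""
    if not 2 <= radix <= 36:
        raise ValueError("this sketch supports radix 2 to 36 only")
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    digits = ""
    while True:
        value, rem = divmod(value, radix)
        digits = DIGITS[rem] + digits
        if value == 0:
            break
    return digits.rjust(width, "0")


def header_defines(names, start=0):
    """Number the names like messages.h.m4: COUNTER(0), then eval(COUNTER, 16, 2)."""
    counter = count(start)  # stands in for the COUNTER automaton
    return [f"#define {name} 0x{eval_radix(next(counter), 16, 2)}" for name in names]


for line in header_defines(["COMPLEX", "SUCCESS", "NO_FAULT"]):
    print(line)
# #define COMPLEX 0x00
# #define SUCCESS 0x01
# #define NO_FAULT 0x02
```

The width argument is what keeps the constants aligned: in this model, eval_radix(0, 16) alone would give just 0.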
825d4a3 messages/messages.h.m4

    # A → β
    define(⟦COUNTER⟧, defn(⟦COUNT_UP⟧))

    # init counter
    COUNTER(0)

    # A → β
    define(⟦ERROR⟧, ⟦

    divert(0)dnl
    ⟦#define $1 0x⟧eval(COUNTER, 16, 2)
    divert(-1)
    ⟧)

    divert(0)dnl
    /*
     * DONTE()
     */

    #ifndef __ERROR_H
    #define __ERROR_H

    divert(1)
    #endif /* __ERROR_H */
    divert(-1)

    $ m4 root0u.m4 countu.m4 messages.h.m4 messages.mc > messages.h

825d4a3 messages/messages.h

    /*
     * DO NOT EDIT! This file is generated automatically!
     */

    #ifndef __ERROR_H
    #define __ERROR_H

    #define COMPLEX 0x00
    #define SUCCESS 0x01
    #define NO_FAULT 0x02

    #endif /* __ERROR_H */

A.8 ⟦⟧ C: small automaton

The example uses a small automaton, NEW_LINE, to generate the \n newline escape sequence, and one output queue (number 1) containing the closing "; characters that terminate the resulting string. The first time NEW_LINE is expanded, it is rewritten to ε; every following expansion rewrites it to \n.

24fd4f3 messages/stringl.c.m4

    # NEW_LINE automaton
    #      ___      ____
    # --->/ ε \--->/ \n \---.
    #     \___/    \____/<--'

    # A → β
    define(⟦NEW_LINE⟧, ⟦define(⟦$0⟧, ⟦\n⟧)⟧)

    # A → β
    define(⟦ERROR⟧, ⟦

    divert(0)NEW_LINE⟦⟧$2⟦⟧dnl
    divert(-1)
    ⟧)

    divert(0)dnl
    /*
     * DONTE()
     */

    char error[] =
    "divert(1)";
    divert(-1)

    $ m4 root0u.m4 stringl.c.m4 code.m4 messages.mc > stringl.c

97e45a3 messages/stringl.c

    /*
     * DO NOT EDIT! This file is generated automatically!
     */

    char error[] =
    "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~\nComplex M4 code failed successfully.\nIt's not a language fault!";

A.9 ⟦⟧ C: small automaton 2

This example is similar to the previous one, but each string is on a new line.

24fd4f3 messages/string.c.m4

    # NEW_LINE automaton
    #      ___      _________
    # --->/ ε \--->/ \n"\xa" \---.
    #     \___/    \_________/<--'

    # A → β
    define(⟦NEW_LINE⟧, ⟦define(⟦$0⟧, ⟦\n"
    "⟧)⟧)

    # A → β
    define(⟦ERROR⟧, ⟦

    divert(0)NEW_LINE⟦⟧$2⟦⟧dnl
    divert(-1)
    ⟧)

    divert(0)dnl
    /*
     * DONTE()
     */

    char error[] =
    "divert(1)";
    divert(-1)

    $ m4 root0u.m4 string.c.m4 code.m4 messages.mc > string.c

97e45a3 messages/string.c

    /*
     * DO NOT EDIT! This file is generated automatically!
     */

    char error[] =
    "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~\n"
    "Complex M4 code failed successfully.\n"
    "It's not a language fault!";

A.10 ⟦⟧ HTML: output queues

The example uses two output queues. Queue number 1 contains paragraphs, and queue number 2 contains the closing HTML tags. Navigation links do not have to be stored anywhere; they go straight to the output. QUERY and WARNING messages are processed in the same way as ERROR messages.

24fd4f3 messages/messages.html.m4 1 # vim:ft=m4 2 3 # A → β 4 # β 5 define(⟦ERROR⟧, ⟦ 6 7 divert(0)dnl 8 ⟦
  • $0: $1
  • ⟧ 9 divert(1)dnl 10

    $2

    11 divert(-1) 12 ⟧) 13 14 # A → β 15 define(⟦QUERY⟧, defn(⟦ERROR⟧)) 16 define(⟦WARNING⟧, defn(⟦ERROR⟧)) 17 18 divert(0)dnl 19 20 21 22 23 __file__ 24 25

    The power of M4

    26
      27 divert(1)dnl 28
    29 divert(2)dnl 30 31 32 divert(-1) $ m4 root0u.m4 messages.html.m4 markup.m4 messages.mc > messages.html 97e45a3 messages/messages.html 1 2 3 4 5 messages.html.m4 6 7

    The power of M4

    8 16

    !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

    17

    Is badly written M4 code readable [N/y]?

    18

    Complex M4 code failed successfully.

    19

    Programming in M4 is addictive!

    20

    It's not a language fault!

    21

    No other errors were found.

22 23

A.11 ⟦⟧ Branching by grammar

The example shows branching by grammar; the macro arguments are ignored. The input nonterminals are rewritten to the terminals ERROR → 🐛, QUERY → 🐜, WARNING → 🐝.

24fd4f3 messages/insect.txt.m4

    # A → β
    # β
    define(⟦ERROR⟧, ⟦

    divert(0)dnl
    $0_INSECT⟦⟧dnl
    divert(-1)
    ⟧)

    # A → β
    define(⟦QUERY⟧, defn(⟦ERROR⟧))
    define(⟦WARNING⟧, defn(⟦ERROR⟧))
    define(⟦ERROR_INSECT⟧, ⟦🐛⟧)
    define(⟦QUERY_INSECT⟧, ⟦🐜⟧)
    define(⟦WARNING_INSECT⟧, ⟦🐝⟧)

    $ m4 root0u.m4 insect.txt.m4 messages.mc > insect.txt

b53eafe messages/insect.txt

    🐛🐜🐛🐝🐛🐝

Branching by grammar – basic principle

The $0 variable is replaced by the name of the macro and concatenated with another symbol. The newly formed nonterminal is then rewritten to the corresponding terminal symbol (a queue number or a name).

    $0_QU   → ERROR_QU   → 2
    $0_END  → ERROR_END  → 3
    $0_NAME → ERROR_NAME → error
    $0_QU   → QUERY_QU   → 0
    $0_END  → QUERY_END  → 1
    $0_NAME → QUERY_NAME → query
    …

A.12 ⟦⟧ JSON: generating automaton

The example uses two output queues and one generating automaton. In the ERROR state, the first ERROR(⟦…⟧) message generates a header with brackets and outputs the first record. The automaton then moves to the NEXT_ITEM state, which is a β rule. Subsequent error messages in the NEXT_ITEM state output only individual records. At the end, output queues number 1 and number 2 print the characters ] and }} that close the resulting JSON.

24fd4f3 messages/atm.json.m4

    #      _______      ___________
    # --->/ ERROR \--->/ NEXT_ITEM \---.
    #     \_______/    \___________/<--'

    # A → β
    define(⟦ERROR⟧, ⟦

    # transition to the next node
    define(⟦$0⟧, defn(⟦NEXT_ITEM⟧))

    divert(0),
    "error": [
    {"⟦$1⟧": "$2"}dnl
    divert(1)
    ]
    divert(-1)
    ⟧)

    # β
    define(⟦NEXT_ITEM⟧, ⟦

    divert(0),
    {"⟦$1⟧": "$2"}dnl
    divert(-1)
    ⟧)

    divert(0)dnl
    {"generating_automaton": {
    "_comment": "DONTE()"dnl
    divert(2)dnl
    }}
    divert(-1)

    $ m4 root0u.m4 atm.json.m4 code.m4 messages.mc > atm.json

97e45a3 messages/atm.json

    {"generating_automaton": {
    "_comment": "DO NOT EDIT! This file is generated automatically!",
    "error": [
    {"COMPLEX": "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"},
    {"SUCCESS": "Complex M4 code failed successfully."},
    {"NO_FAULT": "It's not a language fault!"}
    ]
    }}

A.12.1 ⟦⟧ JSON: named queues

The example also processes the QUERY and WARNING message types. It uses three automata and six output queues. When generating more complex source code, we soon run into the problem of keeping the output queue indexes consistent. To avoid confusion, we use queue names instead of numbers. To avoid defining similar rules repeatedly, we copy the right side of ERROR (which is also a β rule) to the right side of the QUERY and WARNING rules.

24fd4f3 messages/qnames.json.m4

    # DO NOT WRITE INDEXES MANUALLY, use counter!
    define(⟦QUERY_QU⟧, 0)
    define(⟦QUERY_END⟧, 1)
    define(⟦ERROR_QU⟧, 2)
    define(⟦ERROR_END⟧, 3)
    define(⟦WARNING_QU⟧, 4)
    define(⟦WARNING_END⟧, 5)
    define(⟦LAST_QUEUE⟧, 6)

    # names of message types
    define(⟦WARNING_NAME⟧, ⟦warning⟧)
    define(⟦ERROR_NAME⟧, ⟦error⟧)
    define(⟦QUERY_NAME⟧, ⟦query⟧)

    #      _________      ___________
    # --->/  ERROR  \--->/ NEXT_ITEM \---.
    #     |  QUERY  |    \___________/<--'
    #     \_WARNING_/

    # A → β
    # β
    define(⟦ERROR⟧, ⟦

    # transition to the next node
    define(⟦$0⟧, defn(⟦NEXT_ITEM⟧))

    divert($0_QU),
    "$0_NAME": [
    {"⟦$1⟧": "$2"}dnl
    divert($0_END)
    ]dnl
    divert(-1)
    ⟧)

    # β
    define(⟦NEXT_ITEM⟧, ⟦

    divert($0_QU),
    {"⟦$1⟧": "$2"}dnl
    divert(-1)
    ⟧)

    # A → β
    define(⟦QUERY⟧, defn(⟦ERROR⟧))
    define(⟦WARNING⟧, defn(⟦ERROR⟧))

    divert(0)dnl
    {"queue_names": {
    "_comment": "DONTE()"dnl
    divert(LAST_QUEUE)
    }}
    divert(-1)

    $ m4 root0u.m4 qnames.json.m4 code.m4 messages.mc > qnames.json

97e45a3 messages/qnames.json

    {"queue_names": {
    "_comment": "DO NOT EDIT! This file is generated automatically!",
    "query": [
    {"READABLE": "Is badly written M4 code readable [N/y]?"}
    ],
    "error": [
    {"COMPLEX": "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"},
    {"SUCCESS": "Complex M4 code failed successfully."},
    {"NO_FAULT": "It's not a language fault!"}
    ],
    "warning": [
    {"ADDICTIVE": "Programming in M4 is addictive!"},
    {"NO_ERRORS": "No other errors were found."}
    ]
    }}

A.12.2 ⟦⟧ JSON: generated queue indexes

During development, the order and number of output queues often change, which requires frequent changes of their indexes. It is therefore better to generate the indexes; a virtually unlimited number of queues can then be used. The following example shows how the indexes are generated.

24fd4f3 messages/queues.m4

    # defines a counter for output queues
    # A → β
    define(⟦QUEUE_INDEX⟧, defn(⟦COUNT_UP⟧))

    # index of the first output queue (0 is stdout)
    QUEUE_INDEX(0)

    # symbolic names for indices of output queues
    # A → β
    define(⟦QUERY_QU⟧, QUEUE_INDEX)
    define(⟦QUERY_END⟧, QUEUE_INDEX)
    define(⟦ERROR_QU⟧, QUEUE_INDEX)
    define(⟦ERROR_END⟧, QUEUE_INDEX)
    define(⟦WARNING_QU⟧, QUEUE_INDEX)
    define(⟦WARNING_END⟧, QUEUE_INDEX)
    # Keep it last!
    define(⟦LAST_QUEUE⟧, QUEUE_INDEX)

    # names of message types
    # A → β
    define(⟦WARNING_NAME⟧, ⟦warning⟧)
    define(⟦ERROR_NAME⟧, ⟦error⟧)
    define(⟦QUERY_NAME⟧, ⟦query⟧)

24fd4f3 messages/messages.json.m4

    #      _________      ___________
    # --->/  ERROR  \--->/ NEXT_ITEM \---.
    #     |  QUERY  |    \___________/<--'
    #     \_WARNING_/

    # A → β
    # β
    define(⟦ERROR⟧, ⟦

    # transition to the next node
    define(⟦$0⟧, defn(⟦NEXT_ITEM⟧))

    divert($0_QU),
    "$0_NAME": [
    {"⟦$1⟧": "$2"}dnl
    divert($0_END)
    ]dnl
    divert(-1)
    ⟧)

    # β
    define(⟦NEXT_ITEM⟧, ⟦

    divert($0_QU),
    {"⟦$1⟧": "$2"}dnl
    divert(-1)
    ⟧)

    # A → β
    define(⟦QUERY⟧, defn(⟦ERROR⟧))
    define(⟦WARNING⟧, defn(⟦ERROR⟧))

    divert(0)dnl
    {"messages": {
    "_comment": "DONTE()"dnl
    divert(LAST_QUEUE)
    }}
    divert(-1)

    $ m4 root0u.m4 countu.m4 queues.m4 messages.json.m4 code.m4 messages.mc > messages.json

97e45a3 messages/messages.json

    {"messages": {
    "_comment": "DO NOT EDIT! This file is generated automatically!",
    "query": [
    {"READABLE": "Is badly written M4 code readable [N/y]?"}
    ],
    "error": [
    {"COMPLEX": "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"},
    {"SUCCESS": "Complex M4 code failed successfully."},
    {"NO_FAULT": "It's not a language fault!"}
    ],
    "warning": [
    {"ADDICTIVE": "Programming in M4 is addictive!"},
    {"NO_ERRORS": "No other errors were found."}
    ]
    }}

A.13 ⟦⟧ INI: discontinuous queue index

The example uses three automata and two output queues, number 2 and 4, defined in a separate file. INI section names are generated by symbol concatenation (see branching by grammar). The example uses the same output queue file as the JSON example.

24fd4f3 messages/messages.ini.m4 1 # _________ ___________ 2 # --->/ ERROR \--->/ NEXT_ITEM \---.
3 # | QUERY | \___________/<--' 4 # \_WARNING_/ 5 6 # A → β 7 # β 8 define(⟦ERROR⟧, ⟦ 9 10 divert($0_QU) 11 [$0_NAME] 12 ⟦$1⟧="$2" 13 divert(-1) 14 15 # transition to the next node 16 define(⟦$0⟧, defn(⟦NEXT_ITEM⟧)) 17 ⟧) 18 19 # A → β 20 define(⟦QUERY⟧, defn(⟦ERROR⟧)) 21 define(⟦WARNING⟧, defn(⟦ERROR⟧)) 22 23 # β 24 define(⟦NEXT_ITEM⟧, ⟦ 25 26 divert($0_QU)dnl 27 ⟦$1⟧="$2" 28 divert(-1) 29 ⟧) 30 31 divert(0)dnl 32 ; DONTE() 33 divert(-1) $ m4 root0u.m4 messages.ini.m4 countu.m4 queues.m4 code.m4 messages.mc > messages.ini 97e45a3 messages/messages.ini 1 ; DO NOT EDIT! This file is generated automatically! 2 3 [query] 4 READABLE="Is badly written M4 code readable [N/y]?" 5 6 [error] 7 COMPLEX="!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~" 8 SUCCESS="Complex M4 code failed successfully." 9 NO_FAULT="It's not a language fault!" 10 11 [warning] 12 ADDICTIVE="Programming in M4 is addictive!" 13 NO_ERRORS="No other errors were found." A.14 ⟦⟧ XML: mixed messages The example uses one output queue number 1 for the closing tag. 24fd4f3 messages/mixed.xml.m4 1 # A → β 2 # β 3 define(⟦ERROR⟧, ⟦ 4 5 divert(0)dnl 6 <$0_NAME> 7 ⟦$1⟧ 8 $2 9 10 divert(-1) 11 ⟧) 12 13 # A → β 14 define(⟦QUERY⟧, defn(⟦ERROR⟧)) 15 define(⟦WARNING⟧, defn(⟦ERROR⟧)) 16 17 divert(0)dnl 18 19 20 21 divert(1)dnl 22 23 divert(-1) $ m4 root0u.m4 queues.m4 mixed.xml.m4 markup.m4 messages.mc > mixed.xml 97e45a3 messages/mixed.xml 1 2 3 4 5 COMPLEX 6 !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ 7 8 9 READABLE 10 Is badly written M4 code readable [N/y]? 11 12 13 SUCCESS 14 Complex M4 code failed successfully. 15 16 17 ADDICTIVE 18 Programming in M4 is addictive! 19 20 21 NO_FAULT 22 It's not a language fault! 23 24 25 NO_ERRORS 26 No other errors were found. 27 28 A.15 ⟦⟧ XML: separated messages The example groups messages by their type using output queues. 
24fd4f3 messages/messages.xml.m4

    # A → β
    # β
    define(⟦ERROR⟧, ⟦

    # transition to the next node
    define(⟦$0⟧, defn(⟦NEXT_ITEM⟧))

    divert($0_QU)dnl
    <$0_NAME>

    ⟦$1⟧
    $2

    divert($0_END)dnl

    divert(-1)
    ⟧)

    # β
    define(⟦NEXT_ITEM⟧, ⟦

    divert($0_QU)dnl

    ⟦$1⟧
    $2

    divert(-1)
    ⟧)

    # A → β
    define(⟦QUERY⟧, defn(⟦ERROR⟧))
    define(⟦WARNING⟧, defn(⟦ERROR⟧))

    divert(0)dnl



    divert(LAST_QUEUE)dnl

    divert(-1)

    $ m4 root0u.m4 queues.m4 messages.xml.m4 markup.m4 messages.mc > messages.xml

97e45a3 messages/messages.xml





    READABLE
    Is badly written M4 code readable [N/y]?




    COMPLEX
    !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~


    SUCCESS
    Complex M4 code failed successfully.


    NO_FAULT
    It's not a language fault!




    ADDICTIVE
    Programming in M4 is addictive!


    NO_ERRORS
    No other errors were found.



A.16 ⟦⟧ Bash $ echo "string"

24fd4f3 messages/doubleq.sh.m4

    # A → β
    # β
    define(⟦ERROR⟧, ⟦

    divert(0)dnl
    echo "$2"
    divert(-1)
    ⟧)

    # A → β
    define(⟦QUERY⟧, defn(⟦ERROR⟧))
    define(⟦WARNING⟧, defn(⟦ERROR⟧))

    divert(0)dnl
    #!/bin/bash
    #
    ⟦#⟧ DONTE()

    divert(-1)

    $ m4 root0u.m4 doubleq.sh.m4 doubleq.m4 messages.mc > doubleq.sh

97e45a3 messages/doubleq.sh

    #!/bin/bash
    #
    # DO NOT EDIT! This file is generated automatically!

    echo ""\!"\"#$%&'()*+,-./:;<=>?@[\\]^_\`{|}~"
    echo "Is badly written M4 code readable [N/y]?"
    echo "Complex M4 code failed successfully."
    echo "Programming in M4 is addictive"\!""
    echo "It's not a language fault"\!""
    echo "No other errors were found."
A.17 ⟦⟧ Bash $ echo 'string'

24fd4f3 messages/apost.sh.m4

    # A → β
    # β
    define(⟦ERROR⟧, ⟦

    divert(0)dnl
    echo '$2'
    divert(-1)
    ⟧)

    # A → β
    define(⟦QUERY⟧, defn(⟦ERROR⟧))
    define(⟦WARNING⟧, defn(⟦ERROR⟧))

    divert(0)dnl
    #!/bin/bash
    #
    ⟦#⟧ DONTE()

    divert(-1)

    $ m4 root0u.m4 apost.sh.m4 apost.m4 messages.mc > apost.sh

97e45a3 messages/apost.sh

    #!/bin/bash
    #
    # DO NOT EDIT! This file is generated automatically!

    echo '!"#$%&'\''()*+,-./:;<=>?@[\]^_`{|}~'
    echo 'Is badly written M4 code readable [N/y]?'
    echo 'Complex M4 code failed successfully.'
    echo 'Programming in M4 is addictive!'
    echo 'It'\''s not a language fault!'
    echo 'No other errors were found.'

B Preprocessor examples

🛈 The characters {`', [], ␂␆, ⟦⟧} in an example's title show which quotes control the expansion of nonterminals.

B.1 `' C preprocessor and M4
B.2 `' CSS: file inclusion, comment
B.3 ␂␆ Bash: nonprintable characters

B.1 `' C preprocessor and M4

CPP directives begin with #, which M4 treats as the start of a one-line comment; this prevents unwanted expansion of identically named macros. If we define a safer SAF() macro, the similar-looking CPP call SAF () (with a space before the parenthesis) is not expanded. Thus, the CPP namespace can be completely separated from the M4 namespace. The problematic backquote character ` is hidden in the LL() macro. The apostrophe ' does not matter in the source code; the apostrophe inside the ORD() macro is hidden in the RR() macro. Note the define () and ifelse () function names and where SYMBOL is expanded.
cf9fdc8 preproc/file.c.m4

    # ORDinary and SAFe macros have different expansion:

    # A → β
    define(`ORD', `$0_M4 RR()SYMBOL`'RR()')
    define(`SAF', `ifelse(`$#', `0', ``$0'', `($1) * ($1) /* $0_M4 SYMBOL */')')

    divert(0)dnl

cf9fdc8 preproc/file.c

    /*
     * DONTE()
     */

    #include /* CPP SYMBOL */

    #define SYMBOL /* CPP SYMBOL */
    #define SAF(x) ((x) * ((x) - 1)) /* CPP SYMBOL */
    #define ORD(x) CPP SYMBOL x

    int a = SAF (1 + 1); /* CPP */
    int b = SAF(2 + 2); /* M4 */
    char chr = 'x';
    char foo[] = "Let's say: 'SYMBOL'";
    char bar[] = "ORD (args, are, ignored)";

    static void define (char *s) { puts(s);}
    static void ifelse (char *s) { puts(s);}

    int main(void)
    {

    #ifdef SYMBOL /* SYMBOL */
    puts("LL()SYMBOL'"); /* note: `LL()SYMBOL' */
    #endif

    define (foo); /* SYMBOL */
    ifelse (bar); /* SYMBOL() */

    return 0;
    }

    $ m4 -DSYMBOL='Hello, world!' root0q.m4 file.c.m4 file.c > preproc.file.c

cf9fdc8 preproc/preproc.file.c

    /*
     * DO NOT EDIT! This file is generated automatically!
     */

    #include /* CPP SYMBOL */

    #define SYMBOL /* CPP SYMBOL */
    #define SAF(x) ((x) * ((x) - 1)) /* CPP SYMBOL */
    #define ORD(x) CPP SYMBOL x

    int a = SAF (1 + 1); /* CPP */
    int b = (2 + 2) * (2 + 2) /* SAF_M4 Hello, world! */; /* M4 */
    char chr = 'x';
    char foo[] = "Let's say: 'Hello, world!'";
    char bar[] = "ORD_M4 'Hello, world!' (args, are, ignored)";

    static void define (char *s) { puts(s);}
    static void ifelse (char *s) { puts(s);}

    int main(void)
    {

    #ifdef SYMBOL /* SYMBOL */
    puts("`Hello, world!'"); /* note: LL()SYMBOL */
    #endif

    define (foo); /* Hello, world! */
    ifelse (bar); /* Hello, world! */

    return 0;
    }

B.2 `' CSS: file inclusion, comment

CSS uses the # character for color codes, but # also starts a one-line M4 comment. The changecom(/*,*/) keyword sets a multiline /* … */ comment instead and rewrites itself into ε.
Comments can be turned off with the same changecom keyword called without parameters.

9e13656 preproc/foo.css

    .foo {
    border: WIDTH 2px 1px;
    }

9e13656 preproc/file.css.m4

    # CSS preprocessor

    define(`WIDTH', `3px')
    define(`TOP', `#f00')
    define(`SIDES', `#0f0')
    define(`BOTTOM', `#00f')
    define(`SITE', `www.root.cz')
    define(`IMAGE', `m4tux.png')
    define(`PATH', `https://SITE/IMAGE')

    divert(0)dnl

3ed8f6a preproc/file.css

    /* DONTE() */changecom(/*,*/)
    /* DONTE() */

    include(`foo.css')dnl

    .bar {
    border-width: WIDTH;
    border-color: TOP SIDES BOTTOM;
    background-image: url('PATH');
    }

    /* DONTE() */
    changecom/* DONTE() */changecom(/*,*/)

    $ m4 -DSYMBOL='Hello, world!' root0q.m4 file.css.m4 file.css > preproc.file.css

41542d1 preproc/preproc.file.css

    /* DO NOT EDIT! This file is generated automatically! */
    /* DONTE() */

    .foo {
    border: 3px 2px 1px;
    }

    .bar {
    border-width: 3px;
    border-color: #f00 #0f0 #00f;
    background-image: url('https://www.root.cz/m4tux.png');
    }

    /* DONTE() */
    /* DO NOT EDIT! This file is generated automatically! */

B.3 ␂␆ Bash: nonprintable characters

Bash uses both the ` and [ characters. If we do not want to hide them in the LL() macro, we can control expansion with nonprintable characters instead:

b53eafe preproc/file.sh.m4

    # vim:mps+=␂\:␆

    # A → β
    define(␂LEFT␆, ␂$␂#␆␆)
    define(␂OP␆, ␂-eq␆)
    define(␂RIGHT␆, ␂0␆)

    divert(0)dnl

b53eafe preproc/file.sh

    #!/bin/bash
    #
    ␂#␆ DONTE()

    HELLO=`echo 'SYMBOL'`

    if [[ LEFT OP RIGHT ]]
    then
    echo $HELLO
    fi

    $ m4 -DSYMBOL='Hello, world!' root0n.m4 file.sh.m4 file.sh > preproc.file.sh

b53eafe preproc/preproc.file.sh

    #!/bin/bash
    #
    # DO NOT EDIT! This file is generated automatically!

    HELLO=`echo 'Hello, world!'`

    if [[ $# -eq 0 ]]
    then
    echo $HELLO
    fi

C M4: examples

🛈 The characters {`', [], ␂␆, ⟦⟧} in an example's title show which quotes control the expansion of nonterminals.
C.1 [] JSON: left bracket [
C.2 [] Bash: counters
C.3 [] .h: brackets [], [,], [#], [dnl]
C.4 [] AWK: examples of safer macros

C.1 [] JSON: left bracket [

Nonterminals [… inside square brackets …] are not expanded. Therefore, the left square bracket [ is written using the LL() macro defined in the root file.

39013f2 hello_world/json.m4

    # JSON

    divert(0)dnl
    {"foo": {
    "_comment": "DONTE()",
    "bar": LL()
    {"baz": "SYMBOL"}
    ]
    }}

    $ m4 -DSYMBOL='Hello, world!' root0b.m4 json.m4 > hello_world.json

b53eafe hello_world/hello_world.json

    {"foo": {
    "_comment": "DO NOT EDIT! This file is generated automatically!",
    "bar": [
    {"baz": "Hello, world!"}
    ]
    }}

C.2 [] Bash: counters

The COUNT_UP and COUNT_DOWN counters are defined in the file countu.m4. Nonterminals [… inside brackets …] are not expanded; only the outer brackets are removed. A literal left bracket must therefore be written with the LL() macro defined in the root file.

39013f2 hello_world/sh.m4

    # A → β
    define([LEFT], [$[#]])
    define([OP], [-eq])
    define([RIGHT], [0])

    # define two counters
    # A → β
    define([__COUNTUP__], defn([COUNT_UP]))
    define([__COUNTDN__], defn([COUNT_DOWN]))

    # init counters
    __COUNTUP__(10)
    __COUNTDN__(10)

    divert(0)dnl
    #!/bin/bash
    #
    [#] DONTE()

    if [ LEFT OP RIGHT ]
    then
    echo '__COUNTUP__] SYMBOL LL()__COUNTDN__'
    fi

    if test LEFT OP RIGHT
    then
    echo '__COUNTUP__] SYMBOL LL()__COUNTDN__'
    fi

    if LL()LL() LEFT OP RIGHT ]]
    then
    echo '__COUNTUP__] SYMBOL LL()__COUNTDN__'
    fi

    $ m4 -DSYMBOL='Hello, world!' root0u.m4 countu.m4 sh.m4 > hello_world.sh

b53eafe hello_world/hello_world.sh

    #!/bin/bash
    #
    # DO NOT EDIT! This file is generated automatically!

    if LEFT OP RIGHT
    then
    echo '10] Hello, world! [10'
    fi

    if test $# -eq 0
    then
    echo '11] Hello, world! [9'
    fi

    if [[ $# -eq 0 ]]
    then
    echo '12] Hello, world! [8'
    fi

C.3 [] .h: brackets [], [,], [#], [dnl]

The empty pair [] (or the empty symbol in brackets, [ε]) serves as a symbol separator. Brackets around the comment character, [#], turn off its original meaning, and the same applies to the more powerful M4 comment [dnl]. Brackets also turn off the original meaning of the comma [,] as a macro argument delimiter. These symbols become ordinary terminal symbols without any side effects.

ce5cd99 hello_world/h.m4

    # A → β
    define([HELLO], [HELLO_WORLD])

    divert(0)dnl
    /*
     * [dnl] DONTE()
     */

    [#]ifndef __[]HELLO[]_H
    [#]define __[]HELLO[]_H

    [#]define HELLO SYMBOL

    [#]endif /* __[]HELLO[]_H */

    $ m4 -DSYMBOL='Hello, world!' root0b.m4 h.m4 > hello_world.h

6b10c6c hello_world/hello_world.h

    /*
     * dnl DO NOT EDIT! This file is generated automatically!
     */

    #ifndef __HELLO_WORLD_H
    #define __HELLO_WORLD_H

    #define HELLO_WORLD Hello, world!

    #endif /* __HELLO_WORLD_H */

C.4 [] AWK: examples of safer macros

Like LL and RR, the universal alert DONTE is ignored when written without parentheses. Such macros are created explicitly by the script developer; see the root file root1b.m4.

39013f2 hello_world/awk.m4

    # AWK

    divert(0)dnl
    #!/bin/awk -f
    #
    [# DONTE()] ---> "DONTE()"

    BEGIN { print "DONTE[]() LL () LL() SYMBOL ]" }

    $ m4 -DSYMBOL='Hello, world!' root1b.m4 awk.m4 > hello_world.awk

02f55b8 hello_world/hello_world.awk

    #!/bin/awk -f
    #
    # DONTE() ---> "DO NOT EDIT! This file is generated automatically!"

    BEGIN { print "DONTE() LL () [ Hello, world! ]" }

D Why to use M4 and why not?
D.1 👍 Why to generate code in M4
D.2 👎 Why to avoid M4

D.1 👍 Why to generate code in M4

• direct use of context-free grammar (recursion for free)
  • only minimal M4 code is required for data transformation
• direct use of automata
  • possibility to model necessary algorithms (M4 does not need versions)
• direct use of stacks
  • stacks connected to automata extend the capabilities of the code generator
• direct use of output queues to temporarily store resulting pieces of code
  • individual queues are finally dumped to the output in ascending order
• significantly faster code generation (compared to XSLT)
  • low demands on computing resources

D.2 👎 Why to avoid M4

• low-level universal language (similar to the C language)
  • which, in return, provides tremendous flexibility, like UNIX
• almost nonexistent developer community (as of Autumn 2019)
  • M4 is a nearly forgotten language with a small number of existing projects
• unusual programming paradigm requiring several prerequisites
  • that is why M4 can be considered a challenging language
• productivity greatly depends on experience (a problem with short-term deadlines)
  • writing M4 scripts requires basic knowledge of automata and grammars
• maintaining badly written M4 code is not easy
  • existing M4 code easily falls into disarray (supervision required!)