// This M5 spec is generated with the help of M5 itself.
// Since M5 syntax appears throughout, we have to be careful about M5's processing of this syntax
// with careful use of quotes, etc.

m5_pragma_disable_literal_comma  /// This library was written before the existence of literal commas.
m4_define(['m5_\need_docs'], yes)
m5_use(m5-local)

m5_var(main_doc, <">
= M5 Text Processing Language User's Guide
:toc: macro
:toclevels: 3
// Web page meta data.
:keywords: Gnu, M4, M5, macro, preprocessor, TL-Verilog, Redwood +
           EDA, HDL
:description: M5 is a macro preprocessor on steroids. It is built on the simple principle of text +
              substitution but provides features and syntax on par with other simple programming languages. +
              It is an easy and capable tack-on enhancement to any text format as well as +
              a reasonable general-purpose programming language specializing in text processing. +
              Its broad applicability makes M5 a valuable tool in every programmer/engineer/scientist/AI's toolbelt.
//:library: M5
:idprefix: m5_
:numbered:
:secnums:
:sectnumlevels: 4
:imagesdir: images
:experimental:
//:css-signature: m5doc
//:max-width: 800px
//:doctype: book
//:sectids!:

ifdef::env-github[]
:note-caption: :information_source:
:tip-caption: :bulb:
endif::[]

[.text-center]
_To enrich any text format_

[.text-center]
M5 version 2.0, document subversion 1, 2024 +
by Steve Hoover, Redwood EDA, LLC +
(mailto:steve.hoover@redwoodeda.com[steve.hoover@redwoodeda.com])

This document is licensed under the https://creativecommons.org/publicdomain/zero/1.0/legalcode[CC0 1.0 Universal] license.

The M5 text processing language and tool enhances the Gnu M4 macro preprocessor,
adding features typical of programming languages.

toc::[]

== Background Information

=== Overview

{description}

This chapter provides background and general information about M5, guidance about this specification, and instructions for using M5.


=== About this Specification

This document covers the M5 language as well as its standard <>.
This document's major version reflects the language version, and the minor version reflects the library version.
There is also a document subversion distinguishing versions of this document with no corresponding language or library changes.

=== M5's Origin Story

I created M5 as a preprocessor for the https://tl-x.org[TL-Verilog] hardware language and later decoupled it as a stand-alone tool.
The original intent was to use an out-of-the-box macro preprocessor to provide stop-gap solutions for missing TL-Verilog language features for "code construction" as TL-Verilog took shape.
While other hardware languages build on existing programming languages to provide code construction, I wanted a simpler approach that would be less intimidating to hardware folks.
M4 was the obvious choice as the most broadly adopted macro preprocessor.

M4 proved to be capable, but extremely difficult to work with.
After a few years fighting with an approach that was intended to allow me to focus my attention elsewhere, I decided I needed to either find a different approach or clean up the one I had.
I felt my struggles had led to some worthwhile insights and that there was a place in the world for a better text processing language/tool, so I carved out some time to polish my mountain of hacks.
Though M5 would benefit from a fresh non-M4/Perl-based implementation, I had to draw the line somewhere.
At this point, that legacy is mostly behind the scenes, and while it's not everything I'd like it to be, it's close, and it's way better than any other text preprocessor I'm aware of.

So I hope you enjoy the language I never wanted to write.
I'm actually rather proud of it and find new uses for it every day.

[[vs_m4]]
=== M5 Versus M4

M5 uses M4 to implement a text-preprocessing language with some subtle philosophical differences.
M5 aims to preserve most of the conceptual simplicity of macro preprocessing while adding features that improve readability, manageability, and debuggability for more complex use cases.

This document is intended to stand on its own, independent of the https://www.gnu.org/software/m4/[M4 documentation].
The M4 documentation can, in fact, be confusing due to M5's philosophical differences with M4.

Beyond M4, M5 contributes:

- features that feel like a typical, simple programming language
- literal string variables
- functions with named arguments
- variable/macro scope
- an intentionally minimal amount of syntactic sugar
- document generation assistance
- debug aids such as stack traces
- safer parsing and string manipulation
- a richer core library of utilities
- a future plan for modular libraries

=== Limitations of M5

M4 has certain limitations that M5 is unable to address.
M5 uses M4 as is without modifications to the M4 implementation (though these limitations may motivate changes to M4 in the future).

==== Security

M4 has full access to its host environment (similar to most programming and scripting languages, but unlike many macro preprocessors).
Malware can easily do harm.
Third-party M5 code should be carefully vetted before use, or M5 should be run within a contained environment.
M5 provides a simple mechanism for library inclusion by URL (or it will).
This enables easy execution of public third-party code, so use it with extreme caution.

==== Modularity

M4 does not provide any library, namespace, or version management facilities.
Though M5 does not currently address these needs, plans have been sketched in code comments.

==== String processing

While macro processing is all about string processing, safely manipulating arbitrary strings is not possible in M4 or it is beyond awkward at best.
M4 provides `m4_regexp`, `m4_patsubst`, and `m4_substr`.
These return unquoted strings that will necessarily be elaborated, potentially altering the string.
While M5 is able to jump through hoops to provide <> and <> (for strings of limited length) that return quoted (literal) text, `m4_patsubst` cannot be fixed (though <> is similar).
The result of `m4_patsubst` can be quoted only by quoting the input string, which can complicate the match expression, or by ensuring that all text is matched, which can be awkward, and quoting substitutions.

In addition to these issues, care must be taken to ensure that resulting text does not contain mismatching quotes or parentheses or combine with surrounding text to result in the same.
Such resulting mismatches are difficult to debug.
M5 provides a notion of "unquoted strings" that can be safely manipulated using <>, and <>.

Additionally, the regex configuration used by M4 is quite dated.
For example, it does not support lookahead, lazy matches, or character codes.

==== Introspection

Introspection is essentially impossible.
The only way to see what is defined is to dump definitions to a file and parse this file.

==== Recursion

Recursion has a fixed (command-line) depth limit, and this limit is not applied reliably.

==== Unicode

M4 is an old tool and was built for ASCII text.
UTF-8 is now the most common text format.
It is a superset of ASCII that encodes additional characters as two or more bytes using byte codes (0x80-0xFF) that do not conflict with those defined by ASCII (0x00-0x7F).
All such bytes (0x80-0xFF) are treated as characters by M4 with no special meaning, so these characters pass through, unaffected, in macro processing like most others.

There are two implications to be aware of.
First, <> provides a length in bytes, not characters.
Second, <> and regular expressions manipulate bytes, not characters.
This can result in text being split mid-character, resulting in invalid character encodings.

==== Debugging features

M4's facilities for associating output with input only map output lines to line numbers of top-level calls.
M4 does not maintain a call stack with line numbers.
M4 and M5 have no debugger to step through code.
Printing (see <>) is the debugging mechanism of choice.

==== Performance

M5 is intended for text processing, not for compute-intensive algorithms.
Use a programming language for that.

==== Graphics

M5 is for text processing only.

==== Status

Major next steps include:

- Implementing a better library system.
- Some syntactic sugar (quotes, code blocks) should not be recognized in source context.

See the issues file in the https://github.com/rweda/M5[M5 repository] for more details.

/**
==== Futures

- All literal text must be quoted.
- Being more explicit about text that should evaluate and text that shouldn't might be better.
  Quote type:
  - `"` for literal text (Note, there is no distinction between begin and end.)
  - `{`/`}` for code (for current implementation, use `[`/`]`). Code must be used as a parameter (or use eval sugar).
  - `<'>`/`<'>` for heavily quoted text that may have `\n` and `"` in it. (same begin/end)
  - All of the above followed by a newline for block versions, excluding continuation cases (see below).
- Unquoted whitespace can be ignored!
- There's no need for `m5_` prefix! Any unquoted text is a macro or variable instantiation.
- Suddenly, there's no need for code context. There's just quoted context and unquoted context.
- There's room for other unquoted syntax, like comments, though, maybe `<'> |` for line comment in heavy block quote context, where the ending `<'>` is implied by "\n"?? Probably not.
- `"+` for literal quote character within "...". There is nothing else to escape.
- Require a prefix for "value of"? Yes, "$MyVar". Thus we can:
  - Allow unquoted text as a complete macro argument. Not all characters are permitted. This way, variable names and function parameters need not be quoted.
- Continuation cases return to quote and argument list context that existed previously. This affects parenthesis checking and quote-type matching.
  Currently, this is based on returning to a quote level within a line. Link this instead to labeled escape quotes (permitting "<>").
  Label escapes can be more generic allowing quoting within the escape, and we can permit continuation across multiple lines.
  Hmmm... don't have escape quotes. Instead have scoped variable instantiations and macro calls. E.g.:
    var(ErrorLevel, "error")  # or "warning" or "info"
    ...
    macro(report, Msg, r:{call(r:$ErrorLevel, $Msg)})
  This way there's no syntax ending the code block, so continuation is clear and parens clearly match.
- "*" (for evaluate) is no longer worth it.
- Added "3.foo" syntax for numbered function parameters to avoid "[". All other cases are handled by ending the string and starting a new one.
  And, it's better for the escape char to follow the quote since the quote gets you into code context with controlled syntax.
  Otherwise the literal escape char would need an escape char. `-` can also be thought of as code syntax for continue text with added `"`.
  Note that heavy quotes can also be used.
- We add "heavy block quotes" (`<'>`...`<'>`) to reduce the likelihood of inadvertent end quotes. Source context is heavily block quoted.
- There's no escaping (`m5_\`, `\m5_`).
- Keep "~" or drop it? Required only when beginning a line?
- Probably still check for balanced quotes and parens. (Not for constructed code.)
- Anything to enforce consistent formatting (like in code blocks)? Probably worth it.
- Allow unquoted numbers. Text that starts with a digit is automatically quoted till the end of the word. C-style numbers.
- Support operators and () in unquoted context for arithmetic? Operators work on text (like everything, and that text must represent numbers).
  Could do full order of operations, only left-to-right, or only two operands with nesting, e.g. "((1 + Val) * 3)".
- No `$1` syntax in the language itself, but this can be supported by a `macro` macro. `arg(2)` would call a builtin to access arg 2.
Provide other builtins for special M4 `$` syntax. - eval("foo(1, 2)") is fine. - foo() has zero args; foo("") has one? - But... <'>MyVar<'> is clunky. Maybe for heavy text we keep `m5_`/`\m5_` as an/the escape and `///`/`/**` for comments??? Maybe just a special case for variables. Needs regex configurability. (Post-string substitutions can be performed by wrapping the heavy string in a macro call.) - We need to use `~` for something so we can call the language Tilda and give files a `~` extension, like `model.tlv~`. Two-line make code to preprocess and run. Example: Source... blah, blah, blah.<'> # comment <'> Then, blah, blah, blah.<'> | line comment? etc. <'> set(Me, "Steve") fn(foo, f:{ # f is a label "text" if(1 || $Cnt, { hi()" more text" }) "Hey, "$Player"!" if_eq($Val, 2, {hi()"."}) # A text block with explicit indentation. chomp(" # Quote can be on this line as well, as long as the text doesn't start with whitespace. v Here's a line of text as a block. ".) # "." includes a newline at the end of the block without an extra line. fn(Hi, { "Hi,"f:Me"." # Me in f: context. on_return(<{>set(greeting, {pad("Hi"<}>$ArgList<{>"!")})<}>) }) })<'> ...and on we go. ==== Implementation Plan Extending Perl `pre_m4` to get most of the benefit (though less simplification to docs): - Keep code block `{` and `[` as they are. - Keep text blocks as they are. - Keep `