# Parakeet

## Overview

Parakeet is a .NET parser combinator library focused on Parsing Expression Grammars (PEG). It provides a declarative approach to defining grammars using C# classes and rules, where each grammar rule is a property that combines primitive parsers with combinators like sequence, choice, repetition, and lookahead. Parakeet generates parse trees from input text, which can then be traversed for interpretation or transformation. The library emphasizes simplicity and readability over raw performance.

## NuGet Package

- `Parakeet` -- core parser combinator library

## Grammar Definition

```csharp
using Parakeet;

// Define a grammar by inheriting from Grammar
public class ArithmeticGrammar : Grammar
{
    // Primitive rules
    public Rule Digit => MatchChar(char.IsDigit);
    public Rule Letter => MatchChar(char.IsLetter);
    public Rule WS => MatchChar(char.IsWhiteSpace).ZeroOrMore();

    // Number: one or more digits, optionally with decimal point
    public Rule Integer => Digit.OneOrMore();
    public Rule Decimal => Integer + MatchChar('.') + Integer;
    public Rule Number => (Decimal | Integer) + WS;

    // Identifier
    public Rule Identifier =>
        (Letter + (Letter | Digit | MatchChar('_')).ZeroOrMore()) + WS;

    // Operators
    public Rule AddOp => (MatchChar('+') | MatchChar('-')) + WS;
    public Rule MulOp => (MatchChar('*') | MatchChar('/')) + WS;

    // Expression grammar (recursive)
    public Rule Factor => Number | (MatchChar('(') + WS + Expr + MatchChar(')') + WS);
    public Rule Term => Factor + (MulOp + Factor).ZeroOrMore();
    public Rule Expr => Term + (AddOp + Term).ZeroOrMore();

    // Entry point
    public override Rule Start => WS + Expr;
}
```

## Parsing Input

```csharp
using Parakeet;

var grammar = new ArithmeticGrammar();
var input = "3 + 4 * (2 - 1)";
var parseResult = grammar.Parse(input);

if (parseResult.Success)
{
    Console.WriteLine("Parse succeeded!");
    Console.WriteLine(parseResult.Node.ToXml());
}
else
{
    Console.WriteLine($"Parse failed at position {parseResult.Position}");
    Console.WriteLine($"Expected: {parseResult.Expected}");
}
```

## Common Combinators

```csharp
using Parakeet;

public class CommonPatterns : Grammar
{
    // RuleA, RuleB, RuleC, Digit, WS, and Number are placeholders here;
    // they are assumed to be defined elsewhere in the grammar.

    // Sequence: A then B then C
    public Rule Sequence => RuleA + RuleB + RuleC;

    // Choice: A or B or C (ordered, PEG semantics)
    public Rule Choice => RuleA | RuleB | RuleC;

    // Repetition
    public Rule ZeroOrMoreDigits => Digit.ZeroOrMore();
    public Rule OneOrMoreDigits => Digit.OneOrMore();
    public Rule OptionalSign => (MatchChar('+') | MatchChar('-')).Optional();

    // Lookahead (does not consume input)
    public Rule FollowedByDigit => RuleA + Digit.Lookahead();
    public Rule NotFollowedByDigit => RuleA + Digit.NotAt();

    // String matching
    public Rule Keyword => MatchString("function") + WS;
    public Rule Arrow => MatchString("=>") + WS;

    // Character classes
    public Rule HexDigit => MatchChar(c =>
        char.IsDigit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'));

    // Named rules for better parse tree nodes
    public Rule NamedNumber => Named(Number, "Number");
}
```

## CSV Parser Example

```csharp
using System;
using System.Linq;
using Parakeet;

public class CsvGrammar : Grammar
{
    // Basic elements
    public Rule Newline => MatchChar('\n') | MatchString("\r\n");
    public Rule Comma => MatchChar(',');
    public Rule Quote => MatchChar('"');

    // Quoted field: handles escaped quotes (doubled)
    public Rule EscapedQuote => MatchString("\"\"");
    public Rule QuotedContent => (EscapedQuote | MatchChar(c => c != '"')).ZeroOrMore();
    public Rule QuotedField => Quote + QuotedContent + Quote;

    // Unquoted field: any chars except comma, quote, and newline
    public Rule UnquotedField =>
        MatchChar(c => c != ',' && c != '"' && c != '\n' && c != '\r').ZeroOrMore();

    // Field: either quoted or unquoted
    public Rule Field => QuotedField | UnquotedField;

    // Row: fields separated by commas
    public Rule Row => Field + (Comma + Field).ZeroOrMore();

    // CSV file: rows separated by newlines
    public Rule File => Row + (Newline + Row).ZeroOrMore() + Newline.Optional();

    public override Rule Start => File;
}

// Usage
var csv = new CsvGrammar();
var result = csv.Parse("name,age,city\nAlice,30,\"New York\"\nBob,25,London");

if (result.Success)
{
    // Traverse the parse tree to extract data
    foreach (var row in result.Node.Children)
    {
        var fields = row.Children
            .Where(n => n.RuleName == "Field")
            .Select(n => n.Text)
            .ToList();
        Console.WriteLine(string.Join(" | ", fields));
    }
}
```

## Simple Programming Language Parser

```csharp
using Parakeet;

public class MiniLangGrammar : Grammar
{
    // Whitespace and basics
    public Rule WS => MatchChar(c => c == ' ' || c == '\t').ZeroOrMore();
    public Rule NL => (MatchString("\r\n") | MatchChar('\n')) + WS;
    public Rule Digit => MatchChar(char.IsDigit);
    public Rule Letter => MatchChar(char.IsLetter);

    // Literals
    public Rule Integer => Digit.OneOrMore() + WS;
    public Rule StringLit =>
        MatchChar('"') + MatchChar(c => c != '"').ZeroOrMore() + MatchChar('"') + WS;
    public Rule BoolLit => (MatchString("true") | MatchString("false")) + WS;
    public Rule Literal => Integer | StringLit | BoolLit;

    // Identifiers
    public Rule Ident => Letter + (Letter | Digit | MatchChar('_')).ZeroOrMore() + WS;

    // Expressions
    public Rule Atom => Literal | Ident | (MatchChar('(') + WS + Expr + MatchChar(')') + WS);
    public Rule CompOp =>
        (MatchString("==") | MatchString("!=") | MatchString("<=") | MatchString(">=")
            | MatchChar('<') | MatchChar('>')) + WS;
    public Rule Expr => Atom + (CompOp + Atom).Optional();

    // Statements
    public Rule LetStmt =>
        MatchString("let") + WS + Ident + MatchChar('=') + WS + Expr + MatchChar(';') + WS;
    public Rule PrintStmt => MatchString("print") + WS + Expr + MatchChar(';') + WS;
    public Rule IfStmt =>
        MatchString("if") + WS + Expr + MatchChar('{') + WS + NL.ZeroOrMore()
            + Statements + MatchChar('}') + WS;
    public Rule Statement => LetStmt | PrintStmt | IfStmt;
    public Rule Statements => (Statement + NL.ZeroOrMore()).ZeroOrMore();

    public override Rule Start => WS + NL.ZeroOrMore() + Statements;
}
```

## Parse Tree Traversal

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Parakeet;

public static class ParseTreeInterpreter
{
    // Dispatch on the rule name; unknown nodes fall through to their children.
    public static object Evaluate(ParseNode node, Dictionary<string, object> env)
    {
        return node.RuleName switch
        {
            "Integer" => int.Parse(node.Text.Trim()),
            "StringLit" => node.Text.Trim('"', ' '),
            "BoolLit" => bool.Parse(node.Text.Trim()),
            "Ident" => env[node.Text.Trim()],
            "LetStmt" => EvaluateLet(node, env),
            "PrintStmt" => EvaluatePrint(node, env),
            _ => EvaluateChildren(node, env)
        };
    }

    // Assumes the first child is the identifier and the second is the expression.
    private static object EvaluateLet(ParseNode node, Dictionary<string, object> env)
    {
        var children = node.Children.ToList();
        var name = children[0].Text.Trim();
        var value = Evaluate(children[1], env);
        env[name] = value;
        return value;
    }

    private static object EvaluatePrint(ParseNode node, Dictionary<string, object> env)
    {
        var value = Evaluate(node.Children.First(), env);
        Console.WriteLine(value);
        return value;
    }

    // Evaluates every child in order and returns the last result.
    private static object EvaluateChildren(ParseNode node, Dictionary<string, object> env)
    {
        object result = null!;
        foreach (var child in node.Children)
            result = Evaluate(child, env);
        return result;
    }
}
```

## Parakeet vs Other Parsers

| Feature | Parakeet | Pidgin | FParsec | Regex |
|---------|----------|--------|---------|-------|
| Language | C# | C# | F# | Any |
| Grammar style | PEG (class-based) | Combinator functions | Combinator functions | Pattern strings |
| Parse tree | Automatic | Manual construction | Manual construction | Capture groups |
| Recursion | Direct property refs | Forward references | Forward references | Not supported |
| Error messages | Position-based | Good | Excellent | Poor |
| Best for | Grammar-oriented DSLs | High-performance parsing | F# projects | Simple patterns |

## Best Practices

- Define grammars as classes inheriting from `Grammar` with each rule as a property, using PEG operators (`+` for sequence, `|` for ordered choice) for readable grammar definitions.
- Use `Named()` to label important rules in the parse tree so traversal code can identify semantic nodes by name rather than position.
- Handle whitespace explicitly by adding `+ WS` after token rules; PEG grammars do not skip whitespace automatically.
- Use `.Lookahead()` and `.NotAt()` for zero-width assertions to disambiguate grammar rules without consuming input.
- Use `.Optional()` for optional elements rather than a choice with an empty alternative; it produces cleaner parse trees.
- Define the grammar entry point via `override Rule Start` so the parser knows which rule to begin parsing from.
- Test grammars incrementally: verify that each rule parses correctly in isolation before combining rules into complex grammars.
- Use ordered choice (`|`) carefully in PEG grammars; alternatives are tried left-to-right and the first match wins, which can prevent later alternatives from being reached.
- Traverse parse trees with pattern matching on `RuleName` to interpret or transform parsed results into domain objects.
- For performance-critical parsing of large inputs, consider Pidgin or FParsec instead; Parakeet prioritizes grammar readability over raw throughput.
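
The ordered-choice pitfall is easiest to see in isolation. The sketch below does not use Parakeet at all: `Str`, `Choice`, and `MatchesAll` are hypothetical hand-rolled helpers written only for this illustration, and a parser is modelled as a plain function from `(input, position)` to the position after the match, or `-1` on failure. It shows why the `CompOp` rule in the mini-language grammar lists `<=` before `<`, and it checks each rule on its own, in the spirit of testing grammars incrementally.

```csharp
using System;
using Parser = System.Func<string, int, int>;

// Hypothetical mini-combinators for illustration only; NOT Parakeet's API.
// A Parser returns the position after a successful match, or -1 on failure.

// Matches the literal string s at the current position.
Parser Str(string s) => (input, pos) =>
    pos + s.Length <= input.Length && input.Substring(pos, s.Length) == s
        ? pos + s.Length
        : -1;

// PEG ordered choice: try alternatives left to right and commit to the first match.
Parser Choice(params Parser[] alternatives) => (input, pos) =>
{
    foreach (var alternative in alternatives)
    {
        int next = alternative(input, pos);
        if (next >= 0) return next; // later alternatives are never tried
    }
    return -1;
};

// Succeeds only if the parser consumes the entire input.
bool MatchesAll(Parser parser, string input) => parser(input, 0) == input.Length;

// "<" is a prefix of "<=", so listing it first shadows the longer operator.
Parser shadowedOp = Choice(Str("<"), Str("<="));
Parser correctOp = Choice(Str("<="), Str("<"));

Console.WriteLine(MatchesAll(shadowedOp, "<=")); // False: "<" matched, "=" is left over
Console.WriteLine(MatchesAll(correctOp, "<="));  // True
Console.WriteLine(MatchesAll(correctOp, "<"));   // True
```

A useful rule of thumb that follows from this: when one alternative is a prefix of another, put the longer alternative first.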