---
name: parser-development
description: "Guide for implementing parsers with error recovery for new languages in Biome. Use when creating parsers for JavaScript, CSS, JSON, HTML, GraphQL, or adding new language support. Examples: user needs to add parsing support for a new language; user wants to implement error recovery in a parser; user is writing grammar definitions in .ungram format."
---

## Purpose

Use this skill when creating or modifying Biome's parsers. It covers grammar authoring with ungrammar, lexer implementation, error recovery strategies, and list parsing patterns.

## Prerequisites

1. Install required tools: `just install-tools`
2. Understand the language syntax you're implementing
3. Read `crates/biome_parser/CONTRIBUTING.md` for detailed concepts

## Common Workflows

### Create Grammar for New Language

Create a `.ungram` file in `xtask/codegen/` (e.g., `html.ungram`):

```
// html.ungram
// Legend:
//   Name =                -- non-terminal definition
//   'ident'               -- token (terminal)
//   A B                   -- sequence
//   A | B                 -- alternation
//   A*                    -- zero or more repetition
//   (A (',' A)* ','?)     -- repetition with separator and optional trailing comma
//   A?                    -- zero or one repetition
//   label:A               -- suggested name for field

HtmlRoot = HtmlElementList

HtmlElementList = HtmlElement*

HtmlElement =
  '<'
  tag_name: HtmlName
  attributes: HtmlAttributeList
  '>'
  children: HtmlElementList
  '<' '/'
  close_tag_name: HtmlName
  '>'

HtmlAttributeList = AnyHtmlAttribute*

AnyHtmlAttribute =
  HtmlSimpleAttribute
  | HtmlBogusAttribute

HtmlSimpleAttribute =
  name: HtmlName
  '='
  value: HtmlString

// Error recovery node
HtmlBogusAttribute = SyntaxElement*
```

**Naming conventions:**

- Prefix all nodes with language name: `HtmlElement`, `CssRule`
- Unions start with `Any`: `AnyHtmlAttribute`
- Error recovery nodes use `Bogus`: `HtmlBogusAttribute`
- Lists end with `List`: `HtmlAttributeList`
- Lists are mandatory (never optional), empty by default

### Generate Parser from Grammar

```shell
# Generate for specific language
just gen-grammar html

# Generate for multiple languages
just gen-grammar html css

# Generate all grammars
just gen-grammar
```

This creates:

- `biome_html_syntax/src/generated/` - Node definitions
- `biome_html_factory/src/generated/` - Node construction helpers
- Parser skeleton files (you'll implement the actual parsing logic)

### Implement a Lexer

Create `lexer/mod.rs` in your parser crate:

```rust
use biome_html_syntax::HtmlSyntaxKind;
use biome_parser::{lexer::Lexer, ParseDiagnostic};

pub(crate) struct HtmlLexer<'source> {
    source: &'source str,
    position: usize,
    current_kind: HtmlSyntaxKind,
    diagnostics: Vec<ParseDiagnostic>,
}

impl<'source> Lexer<'source> for HtmlLexer<'source> {
    const NEWLINE: Self::Kind = HtmlSyntaxKind::NEWLINE;
    const WHITESPACE: Self::Kind = HtmlSyntaxKind::WHITESPACE;

    type Kind = HtmlSyntaxKind;
    type LexContext = ();
    type ReLexContext = ();

    fn source(&self) -> &'source str {
        self.source
    }

    fn current(&self) -> Self::Kind {
        self.current_kind
    }

    fn position(&self) -> usize {
        self.position
    }

    fn advance(&mut self, _context: Self::LexContext) -> Self::Kind {
        // Implement token scanning logic.
        // `read_next_token` is a helper you write on `HtmlLexer`.
        let kind = self.read_next_token();
        self.current_kind = kind;
        kind
    }

    // Implement other required methods...
}
```

### Implement Token Source

```rust
use biome_html_syntax::HtmlSyntaxKind;
use biome_parser::lexer::BufferedLexer;
use biome_parser::token_source::TokenSourceWithBufferedLexer;

use crate::lexer::HtmlLexer;

pub(crate) struct HtmlTokenSource<'source> {
    lexer: BufferedLexer<HtmlSyntaxKind, HtmlLexer<'source>>,
}

impl<'source> TokenSourceWithBufferedLexer<HtmlLexer<'source>> for HtmlTokenSource<'source> {
    fn lexer(&mut self) -> &mut BufferedLexer<HtmlSyntaxKind, HtmlLexer<'source>> {
        &mut self.lexer
    }
}
```

### Write Parse Rules

Example: Parsing an if statement:

```rust
use biome_parser::prelude::*;
use biome_js_syntax::JsSyntaxKind::*;

fn parse_if_statement(p: &mut JsParser) -> ParsedSyntax {
    // Presence test - return Absent if not at 'if'
    if !p.at(T![if]) {
        return Absent;
    }

    let m = p.start();

    // Parse required tokens
    p.expect(T![if]);
    p.expect(T!['(']);

    // Parse required nodes with error recovery
    parse_any_expression(p).or_add_diagnostic(p, expected_expression);

    p.expect(T![')']);

    parse_block_statement(p).or_add_diagnostic(p, expected_block);

    // Parse optional else clause
    if p.at(T![else]) {
        parse_else_clause(p).ok();
    }

    Present(m.complete(p, JS_IF_STATEMENT))
}
```

### Parse Lists with Error Recovery

Use `ParseSeparatedList` for comma-separated lists:

```rust
struct ArrayElementsList;

impl ParseSeparatedList for ArrayElementsList {
    type ParsedElement = CompletedMarker;

    fn parse_element(&mut self, p: &mut Parser) -> ParsedSyntax {
        parse_array_element(p)
    }

    fn is_at_list_end(&self, p: &mut Parser) -> bool {
        // Stop at array closing bracket or file end
        p.at(T![']']) || p.at(EOF)
    }

    fn recover(
        &mut self,
        p: &mut Parser,
        parsed_element: ParsedSyntax,
    ) -> RecoveryResult {
        parsed_element.or_recover(
            p,
            &ParseRecoveryTokenSet::new(
                JS_BOGUS_EXPRESSION,
                token_set![T![']'], T![,]]
            ),
            expected_array_element,
        )
    }

    fn separating_element_kind(&mut self) -> JsSyntaxKind {
        T![,]
    }
}

// Use the list parser
fn parse_array_elements(p: &mut Parser) -> CompletedMarker {
    let m = p.start();
    ArrayElementsList.parse_list(p);
    m.complete(p, JS_ARRAY_ELEMENT_LIST)
}
```

### Implement Error Recovery

Error
recovery wraps invalid tokens in `BOGUS` nodes:

```rust
// Recovery set includes:
// - List terminator tokens (e.g., ']', '}')
// - Statement terminators (e.g., ';')
// - List separators (e.g., ',')

let recovery_set = token_set![T![']'], T![,], T![;]];

parsed_element.or_recover(
    p,
    &ParseRecoveryTokenSet::new(JS_BOGUS_EXPRESSION, recovery_set),
    expected_expression_error,
)
```

### Handle Conditional Syntax

For syntax only valid in certain contexts (e.g., strict mode):

```rust
fn parse_with_statement(p: &mut Parser) -> ParsedSyntax {
    if !p.at(T![with]) {
        return Absent;
    }

    let m = p.start();
    p.bump(T![with]);
    parenthesized_expression(p).or_add_diagnostic(p, expected_expression);
    parse_statement(p).or_add_diagnostic(p, expected_statement);

    let with_stmt = m.complete(p, JS_WITH_STATEMENT);

    // Mark as invalid in strict mode
    let conditional = StrictMode.excluding_syntax(p, with_stmt, |p, marker| {
        p.err_builder(
            "`with` statements are not allowed in strict mode",
            marker.range(p)
        )
    });

    Present(conditional.or_invalid_to_bogus(p))
}
```

### Test Parser

Create test files in `tests/`:

```
crates/biome_html_parser/tests/
├── html_specs/
│   ├── ok/
│   │   ├── simple_element.html
│   │   └── nested_elements.html
│   └── error/
│       ├── unclosed_tag.html
│       └── invalid_syntax.html
└── html_test.rs
```

Run tests:

```shell
cd crates/biome_html_parser
cargo test
```

## Tips

- **Presence test**: Always return `Absent` if the first token doesn't match - never progress parsing before returning `Absent`
- **Required vs optional**: Use `p.expect()` for required tokens, `p.eat()` for optional ones
- **Missing markers**: Use `.or_add_diagnostic()` for required nodes to add missing markers and errors
- **Error recovery**: Include list terminators, separators, and statement boundaries in recovery sets
- **Bogus nodes**: Check grammar for which `BOGUS_*` node types are valid in your context
- **Checkpoints**: Use `p.checkpoint()` to save state and `p.rewind()` if parsing fails
- **Lookahead**: Use `p.at()`
  to check tokens, `p.nth_at()` for lookahead beyond the current token
- **Lists are mandatory**: Always create list nodes even if empty - use `parse_list()` not `parse_list().ok()`

## Common Patterns

```rust
// Optional token
if p.eat(T![async]) {
    // handle async
}

// Required token with error
p.expect(T!['{']);

// Optional node
parse_type_annotation(p).ok();

// Required node with error
parse_expression(p).or_add_diagnostic(p, expected_expression);

// Lookahead
if p.at(T![if]) || p.at(T![for]) {
    // handle control flow
}

// Checkpoint for backtracking
let checkpoint = p.checkpoint();
if parse_something(p).is_absent() {
    p.rewind(checkpoint);
    parse_something_else(p);
}
```

## References

- Full guide: `crates/biome_parser/CONTRIBUTING.md`
- Grammar examples: `xtask/codegen/*.ungram`
- Parser examples: `crates/biome_js_parser/src/syntax/`
- Error recovery: Search for `ParseRecoveryTokenSet` in existing parsers
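
To see the recovery-set idea from the tips in isolation, here is a minimal, self-contained sketch that does not use Biome's crates. Everything in it (`Tok`, `Node`, `Parser`, `parse_array`) is a hypothetical toy type invented for illustration, not Biome's API. It shows the three rules working together: the element parser performs a presence test without consuming input, invalid tokens are wrapped in a bogus node rather than dropped, and recovery stops at a separator or list terminator.

```rust
// Toy token and node types (hypothetical, not Biome's real types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok { LBrack, RBrack, Comma, Num(i64), Junk, Eof }

#[derive(Debug, PartialEq)]
enum Node {
    Num(i64),
    // Invalid tokens are kept in the tree instead of being thrown away.
    Bogus(Vec<Tok>),
}

struct Parser { toks: Vec<Tok>, pos: usize, errors: Vec<String> }

impl Parser {
    fn cur(&self) -> Tok {
        self.toks.get(self.pos).copied().unwrap_or(Tok::Eof)
    }

    fn bump(&mut self) -> Tok {
        let t = self.cur();
        if t != Tok::Eof { self.pos += 1; }
        t
    }

    fn expect(&mut self, t: Tok) {
        if self.cur() == t { self.bump(); }
        else { self.errors.push(format!("expected {:?}", t)); }
    }

    // Presence test: returns None (the toy analogue of `Absent`)
    // without consuming anything when not at an element.
    fn parse_element(&mut self) -> Option<Node> {
        match self.cur() {
            Tok::Num(n) => { self.bump(); Some(Node::Num(n)) }
            _ => None,
        }
    }

    // Consume tokens into a Bogus node until a recovery token or EOF.
    fn recover(&mut self, recovery: &[Tok]) -> Option<Node> {
        let mut skipped = Vec::new();
        while self.cur() != Tok::Eof && !recovery.contains(&self.cur()) {
            skipped.push(self.bump());
        }
        if skipped.is_empty() { None } else { Some(Node::Bogus(skipped)) }
    }

    fn parse_list(&mut self) -> Vec<Node> {
        // Recovery set: the list terminator and the list separator.
        let recovery = [Tok::RBrack, Tok::Comma];
        let mut items = Vec::new();
        while self.cur() != Tok::RBrack && self.cur() != Tok::Eof {
            match self.parse_element() {
                Some(node) => items.push(node),
                None => {
                    self.errors.push(format!("expected an element at token {}", self.pos));
                    if let Some(bogus) = self.recover(&recovery) { items.push(bogus); }
                }
            }
            // Optional separator; a failed element always ends at ',' here,
            // so every iteration makes progress.
            if self.cur() == Tok::Comma { self.bump(); }
        }
        items
    }
}

// Parses `[ elem, elem, ... ]` and returns the list plus any diagnostics.
fn parse_array(toks: Vec<Tok>) -> (Vec<Node>, Vec<String>) {
    let mut p = Parser { toks, pos: 0, errors: Vec::new() };
    p.expect(Tok::LBrack);
    let items = p.parse_list();
    p.expect(Tok::RBrack);
    (items, p.errors)
}
```

Feeding this the tokens for `[1, junk junk, 3]` yields `Num(1)`, a single `Bogus` node holding both junk tokens, and `Num(3)`, with one diagnostic. The list itself is always returned, even when empty, which mirrors the "lists are mandatory" rule.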