### The 'ap' (AI-friendly Patch) Format - Version 3.1

## Abstract

This document defines the 'ap' (AI-friendly Patch) format, a declarative, human-readable specification for describing changes to source code files. Traditional patch formats have proven to be brittle when generated by LLMs. The `ap` format addresses this by using a simple, human-readable set of directives, making it syntactically unambiguous and well suited to the generative capabilities of AI models.

## 0. Philosophy

The `ap` format is built on a specific philosophy: **Adapt the tool to the AI, not the AI to the tool.**

Large Language Models are probabilistic, not deterministic. They struggle with strict counting (line numbers), rigid syntax (JSON/YAML), and perfect consistency. Traditional tools force the AI to perform tasks it is bad at. `ap` takes the opposite approach:

1. **Ambiguity is expected:** The tool should use heuristics (like context or locality) to resolve the "which function did you mean?" problem, rather than failing immediately.
2. **Flexibility is key:** If an AI omits the `CREATE` action in a `FILE` block that contains only `content`, the tool should infer the intent from the context.

## 1. Introduction

### 1.1. The Problem with Traditional and Structured Formats

For decades, developers have used `diff`/`patch`. While powerful, these formats are fragile when used in AI code generation workflows.

### 1.2. Core Principles of the 'ap' Format

The 'ap' format is designed from the ground up to be robust and AI-friendly. It is built on three core principles:

1. **Semantic Locating:** Changes are located not by line numbers, but by referencing stable, semantic constructs like function signatures or unique blocks of code.
2. **Declarative Actions:** An `ap` patch contains explicit commands like `REPLACE`, `DELETE`, or `INSERT_AFTER`.
3. **Generative Simplicity:** The format itself is so simple that it eliminates common LLM failure modes.
It has no indentation rules, no complex quoting or escaping, and uses a clear, syntactically distinct separation between its own directives and the code it carries.

### 1.3. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

- **Patch File:** A file containing a set of modifications, serialized in the `ap` format. The RECOMMENDED file extension is `.ap`.
- **Patcher:** A tool or utility that parses a Patch File and applies the specified modifications to a target source code tree.
- **Snippet:** A line or block of code that the Patcher MUST locate within a file. It SHOULD be unique within its search scope.
- **Anchor:** An OPTIONAL line or block of code to narrow the search scope for a `Snippet`.

## 2. Format Specification

### 2.1. Serialization Format

An `ap` patch file MUST be a plain text file encoded in UTF-8. The format is line-oriented and consists of directives.

### 2.2. Header and Patch ID

Any lines at the beginning of the file starting with `#` are considered comments and MUST be ignored by the Patcher. The first non-comment line of an `ap` patch file MUST be the header directive:

`[ID] AP 3.1`

- `[ID]` is a unique identifier for the entire patch, which MUST be a sequence of 8 random characters, each being a valid hexadecimal character (digit 0–9 or a lowercase letter a–f). E.g., `a0b1c2f9`.
- This `[ID]` MUST be used as a prefix for all subsequent directives within the same file.

A unique ID is needed to strictly separate `ap` format directives from content without the need for escaping and without the risk of a line similar to the directive appearing in the source or destination file. This also allows for the application of `ap` patches to `ap` patches.

### 2.3. Directives

All instructions in the patch file are given through directives. A directive is a line that starts with the patch `[ID]`.
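The header rule and the notion of a directive line can be sketched in a few lines of Python. This is only an illustration, not the reference patcher: the names `parse_header` and `is_directive` are hypothetical, and skipping blank lines before the header and tolerating surrounding whitespace on the header line are assumptions the specification does not state.

```python
import re

# Header directive: 8 lowercase hexadecimal characters, then "AP 3.1".
HEADER_RE = re.compile(r"([0-9a-f]{8}) AP 3\.1")


def parse_header(patch_text: str) -> str:
    """Return the patch ID from the header directive, or raise ValueError."""
    for line in patch_text.splitlines():
        stripped = line.strip()  # assumption: surrounding whitespace is tolerated
        # Leading '#' comment lines are ignored; skipping blank lines is an assumption.
        if not stripped or stripped.startswith("#"):
            continue
        match = HEADER_RE.fullmatch(stripped)
        if match is None:
            raise ValueError(f"invalid header directive: {line!r}")
        return match.group(1)
    raise ValueError("patch contains no header directive")


def is_directive(line: str, patch_id: str) -> bool:
    """A directive is a line that starts with the patch ID followed by a space."""
    return line.startswith(patch_id + " ")
```

Once the ID is known, every later line can be classified with `is_directive`; any line that does not start with that exact ID is content, whatever it looks like.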
There are two types of directives:

1. **KEY-VALUE Directive:** The directive line contains a key, and the content on the following lines (until the next directive) is the value. The value is automatically trimmed of leading and trailing blank lines by the Patcher. Example: `[ID] snippet`
2. **KEY-ARGS Directive:** The directive line contains a key and its arguments. It has no subsequent value block. Example: `[ID] REPLACE`, `[ID] include_leading_blank_lines 1`

### 2.4. File Block

A block of changes for a single file begins with a `FILE` directive.

`[ID] FILE [OPTIONAL_NEWLINE]`

- `[ID]`: The patch's unique identifier.
- `FILE`: The keyword indicating the start of a file block.
- `[OPTIONAL_NEWLINE]`: An OPTIONAL argument specifying the desired line ending. Allowed values are `LF`, `CRLF`, and `CR`. If omitted, the Patcher SHOULD preserve existing line endings or use the OS default for new files.

The value of this directive is the path to the target file, relative to the patch file's location. This path MUST NOT contain components that traverse parent directories (e.g., `..`).

Example:

```
a1b2c3d4 FILE CRLF
path/to/my/file.txt
```

### 2.5. Modification Block

A new modification block is signaled by an **Action Directive**. A file block can also represent a single `RENAME` operation.

**Action Directives (KEY-ARGS)**

These directives mark the beginning of a new modification. The key MUST be one of:

- `REPLACE`: Replace a snippet or range with new content.
- `INSERT_AFTER`: Insert content immediately after a snippet.
- `INSERT_BEFORE`: Insert content immediately before a snippet.
- `DELETE`: Remove a snippet or range.
- `CREATE`: Create a new file or directory.

**Content and Locator Directives (KEY-VALUE)**

These directives provide the data for a modification. A `snippet` or `anchor` MUST be ignored by the Patcher if the action is `CREATE`.

- `snippet`: A string of code to locate. If `snippet_tail` is also provided, this marks the beginning of the range.
- `anchor`: An OPTIONAL string of code to narrow the search scope for a `snippet`.
- `content`: The new code to be used for the operation. MUST be present for `REPLACE`, `INSERT_AFTER`, `INSERT_BEFORE`. For `CREATE`, its presence indicates file creation, and its absence indicates directory creation. MUST be omitted for `DELETE`.
- `snippet_tail`: An OPTIONAL string of code that marks the end of a range. It MUST be used together with `snippet`.

**File Operation Directives**

These directives define file-level operations and MUST be mutually exclusive with other Action or File Operation Directives within the same file block.

- `RENAME` (KEY-VALUE): Specifies a new name for the file or directory defined in the preceding `FILE` directive. The value of this directive is the new path.

**Contextual File Deletion**

The `DELETE` Action Directive has a special behavior for file-level operations. A Patcher MUST interpret a `FILE` block that contains **only a `DELETE` directive** and no other directives (such as `snippet` or `content`) as a command to delete the entire file or directory specified in the `FILE` directive.

Example of deleting a file:

```
a1b2c3d4 FILE
path/to/obsolete/file.txt
a1b2c3d4 DELETE
```

**Option Directives (KEY-ARGS)**

These directives provide optional parameters for a modification.

- `include_leading_blank_lines [N]`: Expands the selection to include up to `[N]` blank lines before the snippet.
- `include_trailing_blank_lines [N]`: Expands the selection to include up to `[N]` blank lines after the snippet.

## 3. Patcher Implementation Requirements

### 3.0. Atomicity

A Patcher MUST treat the application of an entire patch file as a single, atomic transaction. If any modification cannot be successfully applied, the Patcher MUST abort the entire operation and MUST NOT write any changes to any files on the filesystem.

### 3.1. Idempotency

A Patcher MUST apply modifications idempotently.
Applying the same patch multiple times to the same source tree MUST NOT cause subsequent changes or errors after the first successful application. To achieve this, each modification MUST be checked for its state before the operation is attempted, according to the following rules:

- **`RENAME`**: If the source file/directory does not exist but the destination file/directory does, the operation is considered already complete and MUST be skipped. If the source is missing and the destination is also missing, the Patcher MUST report an error.
- **`DELETE`**: If the `snippet` is not found, the operation is considered already complete and MUST be skipped.
- **`REPLACE`**: If the content at the target location is already identical to the modification's `content`, the operation MUST be skipped. If the `snippet` is not found but the `content` is already present in the search scope, this is also considered a successfully completed operation.
- **`INSERT_AFTER`**: If the code immediately following the `snippet` already matches the `content`, the operation MUST be skipped.
- **`INSERT_BEFORE`**: If the code immediately preceding the `snippet` already matches the `content`, the operation MUST be skipped.
- **`CREATE`**:
    - For files (when `content` is provided): If a file at the target path already exists and its content is identical to the provided `content`, the operation MUST be skipped. If the file exists with different content, the Patcher MUST report an error.
    - For directories (when `content` is absent): If a directory at the target path already exists, the operation is considered complete and MUST be skipped.

### 3.2. Search and Location Algorithm

A Patcher MUST use a consistent, normalized search algorithm for locating an `anchor` and `snippet`. The matching process MUST follow these steps:

1. The text to be found (`anchor` or `snippet`) is split into a list of lines.
2. Any line containing only whitespace is removed from this list.
3. Each remaining line has its leading and trailing whitespace removed.
4. The Patcher then searches the target file for a sequence of non-empty lines that, after normalization, are identical to the processed list of lines from the text being sought.

**Scoping**: If an `anchor` is provided, the Patcher MUST first locate its unique occurrence using the normalized search strategy. The subsequent normalized search for the `snippet` MUST be performed only starting from the line next to the last line of that `anchor`. If the `anchor` is not found, the Patcher MUST report an error.

**Uniqueness and Precedence**:

- **Anchor**: If an `anchor` is provided, the Patcher MUST find exactly one occurrence of it within the file. If zero or more than one occurrences are found, the Patcher MUST report an error.
- **Snippet**:
    - If an `anchor` is provided, the Patcher's search for the `snippet` begins at the line next to the last line of the located anchor and extends to the end of the file. The Patcher MUST use the *first* occurrence found within this scope. If zero occurrences are found, it MUST report an error.
    - If no `anchor` is provided, the Patcher MUST find exactly one occurrence of the `snippet` within the entire file. If zero or more than one occurrences are found, it MUST report an error.

**Range-based Search**: If both `snippet` and `snippet_tail` are provided (for `REPLACE` or `DELETE` actions):

1. The Patcher first locates the `snippet` following the same uniqueness and scoping rules as a normal `snippet` (i.e., it must be unique within the file or within its `anchor`).
2. After finding the `snippet`, the Patcher searches for the *first* occurrence of the `snippet_tail` that appears *after* the end of the `snippet`.
3. If the `snippet` is found but the `snippet_tail` is not found in the remainder of the search scope, the Patcher MUST report an error.
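The normalization steps, anchor scoping, and range rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation: `normalize`, `find_all`, and `locate` are hypothetical names, positions are 0-based line indices, and Sequential Cursoring and the robustness heuristics are left out.

```python
from typing import List, Optional, Tuple


def normalize(text: str) -> List[str]:
    """Drop whitespace-only lines and strip the remaining ones."""
    return [ln.strip() for ln in text.splitlines() if ln.strip()]


def find_all(file_lines: List[str], needle: str, start: int = 0) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) line indices of every normalized match at or after start."""
    wanted = normalize(needle)
    if not wanted:
        return []
    # Only non-blank file lines take part in matching; remember their real indices.
    idx = [i for i in range(start, len(file_lines)) if file_lines[i].strip()]
    stripped = [file_lines[i].strip() for i in idx]
    n = len(wanted)
    return [(idx[k], idx[k + n - 1])
            for k in range(len(stripped) - n + 1)
            if stripped[k:k + n] == wanted]


def locate(file_lines: List[str], snippet: str,
           anchor: Optional[str] = None,
           snippet_tail: Optional[str] = None) -> Tuple[int, int]:
    """Return the inclusive line range targeted by a modification."""
    start = 0
    if anchor is not None:
        hits = find_all(file_lines, anchor)
        if len(hits) != 1:
            raise LookupError(f"anchor matched {len(hits)} times, expected exactly 1")
        start = hits[0][1] + 1  # search begins on the line after the anchor
    hits = find_all(file_lines, snippet, start)
    if not hits:
        raise LookupError("snippet not found")
    if anchor is None and len(hits) > 1:
        raise LookupError("snippet is ambiguous without an anchor")
    first, last = hits[0]  # with an anchor, the first occurrence wins
    if snippet_tail is not None:
        tails = find_all(file_lines, snippet_tail, last + 1)
        if not tails:
            raise LookupError("snippet_tail not found after snippet")
        last = tails[0][1]
    return first, last
```

Because only stripped, non-blank lines are compared, a locator written with different indentation or without the file's blank lines still matches, which is the property the normalization steps are designed to give.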
The entire region from the beginning of the `snippet` to the end of the `snippet_tail` is considered the target for the modification.

**Sequential Cursoring**: A Patcher MUST process modifications for a single file sequentially. After a modification is successfully calculated (in memory), the patcher MUST record the end position of the change. All subsequent searches (`anchor` or `snippet`) for the same file MUST begin *after* this recorded end position. This ensures that patches are always applied from top to bottom, preventing ambiguous matches with code that has already been processed.

### 3.3. Modification Logic

- **Blank Line Inclusion**: If `include_leading_blank_lines` or `include_trailing_blank_lines` are specified, the Patcher MUST expand the region to be modified to include the specified number of consecutive blank lines before or after the located `snippet`.
- **Sequential Application**: Within a single file block, modifications MUST be processed sequentially in the order they are defined. The state of the file content *after* one modification has been calculated serves as the input for the search phase of the next modification. This sequential processing happens in memory before any files are written to disk.
- **Insertion Context Awareness**: When generating `INSERT_AFTER` or `INSERT_BEFORE` actions to insert code inside a function or block, an AI generating the patch MUST ensure that inserted content does not appear between a declaration line and its opening brace or indentation block. Insertions intended to go *inside* a block MUST use a `snippet` that includes the opening brace or the start of the indented block to guarantee syntactic correctness.

### 3.4. Error Handling

The Patcher MUST provide clear, human-readable error messages for failure conditions, including but not limited to target file not found, anchor/snippet not found, or ambiguous matches.

### 3.5. Post-processing

After all modifications for a file have been applied in memory, the Patcher MUST remove all trailing whitespace characters (spaces and tabs) from every line of the file's final content before writing it to disk.

### 3.6. Robustness Heuristics (RECOMMENDED)

To improve reliability when consuming patches generated by AI, a Patcher SHOULD implement the following heuristics to gracefully handle common generation errors.

- **Hybrid Search ("Soft Start, Strict Tail"):** When searching for a multi-line `anchor` or `snippet`, the search algorithm SHOULD match the *first line* of the locator as a **suffix** of a line in the target file, while requiring all *subsequent lines* of the locator to match **exactly**. This makes the patch resilient to minor, non-semantic prefix variations (e.g., list numbers, bullet points) that AI models often add or omit in the first line of a code block. The uniqueness requirement for the entire matched block MUST be maintained.
- **Range Auto-Correction:** When processing a `REPLACE` or `DELETE` action with a `snippet`/`snippet_tail` pair, the Patcher SHOULD first check if the `snippet_tail` is a suffix of the `snippet`. If it is, the Patcher SHOULD treat this as a single-snippet operation, use the `snippet` for it and ignore the `snippet_tail`. This corrects a frequent AI error where the entire block to be replaced is put into `snippet`, causing an "end snippet not found" failure.
- **Ambiguous Anchor Disambiguation:** If an `anchor` is found multiple times in the target file, the Patcher SHOULD NOT immediately fail. Instead, it SHOULD temporarily search for the target `snippet` within the scope of *each* anchor occurrence. If the `snippet` is found relative to exactly one of the anchor occurrences, the Patcher MUST proceed using that specific anchor, resolving the ambiguity based on the uniqueness of the anchor-snippet pair.
- **Implicit File Creation:** If a `FILE` block contains only a `content` directive and no Action Directives, the Patcher SHOULD treat this as a `CREATE` action for that file. This handles a common LLM error where `CREATE` is omitted.

## 4. AI Generation Rules

### 4.1. The "Plan-First" Principle

An AI SHOULD first generate a `Summary` and `Plan` in natural language as comments at the beginning of the patch file. This provides valuable context for human reviewers and serves as a clear guide for the AI itself.

### 4.2. Locator Selection Strategy

To create robust and minimal patches, an AI model MUST follow a specific hierarchical strategy for selecting locators.

1. **Assess the Nature of the Change**: First, the AI must determine if the modification is a "point change" or a "range change".
    - A **point change** involves a single line or a very short, atomic block. All `INSERT_AFTER` and `INSERT_BEFORE` actions are by definition point changes.
    - A **range change** involves deleting or replacing a larger block of code (three or more lines).
2. **Choose the Locator Type Based on the Change**:
    - For a **range change** (`REPLACE` or `DELETE`), the AI SHOULD use the `snippet`/`snippet_tail` pair.
    - For a **point change** (`REPLACE`, `DELETE`, `INSERT_AFTER`, `INSERT_BEFORE`), the AI MUST use a single `snippet`.
3. **Select the Snippet(s) and Anchor (if needed)**: After choosing the locator type, the AI proceeds to select the content for the fields:
    - **For a `snippet`**:
        1. Identify the shortest possible `snippet` of code that is likely to be unique.
        2. Test for Uniqueness: The AI MUST mentally check if this `snippet` is unique within the entire target file.
        3. If unique, the selection is complete. Use only the `snippet`. DO NOT add an `anchor` if it is not needed.
        4. If not unique, identify the smallest, most stable preceding semantic block (like a function/method signature) to serve as an `anchor`. The `anchor` MUST be unique within the file.
    - **For a `snippet`/`snippet_tail` pair**:
        1. Identify a `snippet` that is short, stable, and likely to be unique within its context.
        2. Identify the first corresponding `snippet_tail` that appears after the start snippet. This snippet should also be as short and stable as possible.
        3. Test the `snippet` for uniqueness using the same logic as a single `snippet`, adding an `anchor` if necessary to disambiguate it.

### 4.3. Other Generation Rules

- **Choose Stable and Independent Snippets for Ranges:** When using a `snippet`/`snippet_tail` pair, the AI MUST adhere to the following critical rules to avoid generating failing patches:
    1. **Snippets MUST be independent.** The `snippet_tail` MUST be located entirely *after* the `snippet` in the original file. They cannot overlap, and the `snippet_tail` cannot be a part of the `snippet`.
    2. **Snippets MUST be minimal.** The `snippet` and `snippet_tail` should be as short as possible while uniquely identifying the start and end of the desired block. Do NOT include the entire content of the block to be replaced inside the `snippet`.
    3. **Snippets MUST be stable.** Avoid using content that is likely to change, such as numbered list items, line counters, or timestamps, as part of your snippets. Prefer semantic anchors like comments, declarations, or unique punctuation.
- **Structure Logically:** The AI MUST structure the patch with a clear hierarchy: start with the `FILE` directive, then begin each modification with an Action Directive (`REPLACE`, `DELETE`, etc.).
- **Use Blank Lines for Readability:** The AI SHOULD insert blank lines between modification blocks to improve human readability. The Patcher is required to ignore these, so it is a safe and RECOMMENDED practice.
- **Minimalism and Focus:** Patches MUST be minimal and focused on the requested change. Unrelated refactoring MUST be avoided unless explicitly requested.
- **Use Full Lines for Locators:** Snippets and anchors MUST correspond to full lines of code (from the first non-whitespace character to the last), not partial lines. The AI MUST perform a final self-check to ensure it is not using a substring of an original line.
- **Inserting with Surrounding Blank Lines:** The Patcher automatically trims leading and trailing blank lines from `content`. Therefore, to reliably insert a block of code separated by blank lines (e.g., a new function between two existing ones), the AI SHOULD use a `REPLACE` action on the *next* stable block of code. The `content` for this action will then consist of: the new code to insert, a blank line, and then the original code from the `snippet` being replaced. This ensures precise control over formatting.
- **Code Style Consistency**: Generated `content` MUST match the existing code style of the target file (indentation, naming conventions, brace style, etc.).
- **Comment Language and Style**: New code comments MUST match the natural language and style of existing comments in the target file.
- **Ensure Unique Locators**: Before finalizing output, the AI MUST double-check that any chosen `anchor` is unique within the file and any `snippet` without an `anchor` is also unique within the file.

### 4.4. Other notices

The reference patcher ensures that output files always end with a newline. This is not required by the format, but it is preferred. All examples in the documentation take this behavior into account.

## 5. Complete Example

Given a target file `src/calculator.py`:

```python
# A simple calculator module
import math

def add(a, b):
    # Deprecated: use sum() for lists
    return a + b

def get_pi():
    return 3.14
```

The following `afix.ap` file describes the modifications:

```
# Summary: Refactor the calculator module to enhance the `add` function
# and remove deprecated code.
#
# Plan:
# 1. Import the `List` type for type hinting.
# 2. Update the `add` function to also handle summing a list of numbers.
# 3. Remove the unused `get_pi` function.
e4a2f1b8 AP 3.1

e4a2f1b8 FILE
src/calculator.py

e4a2f1b8 INSERT_AFTER
e4a2f1b8 snippet
import math
e4a2f1b8 content
from typing import List

e4a2f1b8 REPLACE
e4a2f1b8 anchor
def add(a, b):
e4a2f1b8 snippet
    return a + b
e4a2f1b8 content
    # New implementation supports summing a list
    if isinstance(a, List):
        return sum(a)
    return a + b

e4a2f1b8 DELETE
e4a2f1b8 snippet
def get_pi():
    return 3.14
e4a2f1b8 include_leading_blank_lines 1
```

After applying the patch, `src/calculator.py` MUST look like this:

```python
# A simple calculator module
import math
from typing import List

def add(a, b):
    # Deprecated: use sum() for lists
    # New implementation supports summing a list
    if isinstance(a, List):
        return sum(a)
    return a + b
```

## 6. Security Considerations

An `ap` patch file contains instructions to modify source code. Applying a patch from an untrusted source is equivalent to executing untrusted code. Patch files MUST be treated with the same level of scrutiny as any other executable code.

## 7. Rationale

- **YAML/JSON Brittleness:** Experience showed that LLMs frequently make subtle syntax errors (indentation, quoting) when generating strictly-structured data formats, leading to high failure rates.
- **Context Confusion:** Experiments with using Python's own syntax for patches (`ast.literal_eval`) showed that LLMs can confuse the "container" syntax with the "content" code when both are the same language.
- **The AP/delimit Solution:** The AP format solves these problems by:
    1. **Eliminating Syntax:** There are no indentation, quoting, or bracketing rules for an LLM to violate.
    2. **Creating Syntactic Distinction:** The directive format (`[ID] KEY`) is visually and structurally distinct from any code it contains, preventing context confusion.
    3. **Maximizing Readability:** The format is optimized for human review, with vertical alignment and clear separation of logical blocks.
    4. **Ensuring Robustness:** The unique `[ID]` prefix for each directive makes the format immune to content-based delimiter collisions.
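
The delimiter-collision point can be illustrated with a small Python sketch that splits a patch into directives using only the `[ID]` prefix. It is a simplified illustration, not the reference parser: `split_directives` is a hypothetical name, the patch ID is assumed to be already known from the header, and every directive is given a value block (empty for KEY-ARGS directives) instead of distinguishing the two directive types by key.

```python
from typing import List, Tuple


def split_directives(patch_text: str, patch_id: str) -> List[Tuple[str, List[str], str]]:
    """Split a patch into (key, args, value) triples using the ID prefix only."""
    prefix = patch_id + " "
    entries: List[Tuple[str, List[str], List[str]]] = []
    for line in patch_text.splitlines():
        parts = line[len(prefix):].split() if line.startswith(prefix) else []
        if parts:
            # A directive line: key, then optional arguments.
            entries.append((parts[0], parts[1:], []))
        elif entries:
            # Any other line is content of the most recent directive.
            entries[-1][2].append(line)
        # Lines before the first directive (leading comments) are ignored.
    result = []
    for key, args, block in entries:
        # Trim leading and trailing blank lines from the value block.
        while block and not block[0].strip():
            block.pop(0)
        while block and not block[-1].strip():
            block.pop()
        result.append((key, args, "\n".join(block)))
    return result
```

For example, a patch with ID `a1b2c3d4` whose `snippet` value is the line `e4a2f1b8 DELETE` is split correctly: the inner line carries a different ID, so it is kept as content and no escaping is required.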