# @memberjunction/metadata-sync

A library for synchronizing MemberJunction database metadata with local file system representations. This library is integrated into the MemberJunction CLI (`mj`) and is accessed through `mj sync` commands. It enables developers and non-technical users to manage MJ metadata using their preferred editors and version control systems while maintaining the database as the source of truth.

## Installation

MetadataSync is included with the MemberJunction CLI. Install the CLI globally:

```bash
npm install -g @memberjunction/cli
```

Then use the sync commands:
```bash
mj sync --help
```

## Overview

MemberJunction is a powerful metadata-driven system where configuration, business logic, AI prompts, templates, and more are stored as metadata in the database. This approach provides tremendous flexibility and runtime configurability, but it can create friction in modern development workflows.

The Metadata Sync tool bridges the gap between database-stored metadata and file-based workflows by:
- Pulling metadata entities from database to JSON files with external file support
- Pushing local file changes back to the database
- Supporting embedded collections for related entities
- Enabling version control for all MJ metadata through Git
- Supporting CI/CD workflows for metadata deployment
- Providing a familiar file-based editing experience

```mermaid
flowchart LR
    subgraph Files["Local File System"]
        JSON["JSON Metadata Files"]
        EXT["External Files\n(.md, .html, .sql)"]
        CFG[".mj-sync.json\nConfiguration"]
    end

    subgraph Engine["MetadataSync Engine"]
        PULL["PullService"]
        PUSH["PushService"]
        VAL["ValidationService"]
        SYNC["SyncEngine\n(Reference Resolution)"]
    end

    subgraph DB["MemberJunction Database"]
        ENT["Entity Records"]
        REL["Related Records"]
        META["Entity Metadata"]
    end

    DB -->|"mj sync pull"| PULL
    PULL --> Files
    Files -->|"mj sync push"| VAL
    VAL --> PUSH
    PUSH --> SYNC
    SYNC --> DB

    style Files fill:#2d6a9f,stroke:#1a4971,color:#fff
    style Engine fill:#7c5295,stroke:#563a6b,color:#fff
    style DB fill:#2d8659,stroke:#1a5c3a,color:#fff
```

### Why This Tool Matters

**For Developers:**
- **Full IDE Support**: Edit complex prompts and templates with syntax highlighting, IntelliSense, and all your favorite editor features
- **Version Control**: Track every change with Git -- see diffs, blame, history, and collaborate through pull requests
- **Branch-based Development**: Work on features in isolation, test changes, and merge when ready
- **CI/CD Integration**: Automatically deploy metadata changes as code moves through environments
- **Bulk Operations**: Use familiar command-line tools (grep, sed, find) to make sweeping changes
- **Offline Development**: Work on metadata without database connectivity

**For Non-Technical Users:**
- **Familiar Tools**: Edit prompts in Word, Notepad++, or any text editor
- **No Database Access Needed**: IT can set up sync, users just edit files
- **Folder Organization**: Intuitive file/folder structure instead of database IDs
- **Easy Sharing**: Send prompt files via email or shared drives
- **Simple Backups**: Copy/paste folders for personal backups

**For Organizations:**
- **Migration Path**: Metadata flows naturally from dev to staging to production with code
- **Compliance**: Full audit trail through version control
- **Collaboration**: Multiple team members can work on different metadata simultaneously
- **Disaster Recovery**: File-based backups complement database backups
- **Cross-System Sync**: Export from one MJ instance, import to another

### The Best of Both Worlds

This tool preserves the power of MJ's metadata-driven architecture while adding the convenience of file-based workflows. The database remains the source of truth for runtime operations, while files become the medium for creation, editing, and deployment.

## Key Features

### Hybrid File Storage
- **JSON files**: Store structured metadata for entities
- **External files**: Store large text fields (prompts, templates, etc.) in appropriate formats (.md, .html, .sql)
- **File references**: Use `@file:filename.ext` to link external files from JSON

### Embedded Collections
- **Related Entities**: Store related records as arrays within parent JSON files
- **Hierarchical References**: Use `@parent:` and `@root:` to reference parent/root entity fields
- **Automatic Metadata**: Related entities maintain their own primaryKey and sync metadata
- **Nested Support**: Support for multiple levels of nested relationships

### Synchronization Operations
- **Pull**: Download metadata from database to local files
  - Optionally pull related entities based on configuration
  - Filter support for selective pulling
- **Push**: Upload local file changes to database
  - Process embedded collections automatically
  - Verbose mode (`-v`) for detailed output
  - Directory filtering with `--include` and `--exclude` options
- **Status**: Show what would change without making modifications
  - Directory filtering with `--include` and `--exclude` options

### Directory Filtering
- **Selective Processing**: Use `--include` or `--exclude` to filter which entity directories are processed
- **Pattern Support**: Supports glob patterns like `ai-*`, `*-test`, etc.
- **Mutually Exclusive**: Cannot use both `--include` and `--exclude` together

### Development Workflow Integration
- Watch mode for automatic syncing during development
- Dry-run mode to preview changes
- CI/CD mode for automated deployments
- Integration with existing mj.config.cjs configuration

## Architecture

```mermaid
flowchart TD
    subgraph Services["Service Layer"]
        IS["InitService"]
        PS["PullService"]
        PUS["PushService"]
        SS["StatusService"]
        WS["WatchService"]
        VS["ValidationService"]
        FRS["FileResetService"]
        FS["FormattingService"]
    end

    subgraph Core["Core Engine"]
        SE["SyncEngine\n(Reference Resolution)"]
        CM["ConfigManager"]
        JP["JsonPreprocessor\n(@include handling)"]
        RDA["RecordDependencyAnalyzer\n(Topological Sort)"]
    end

    subgraph IO["I/O Layer"]
        FE["FieldExternalizer"]
        RP["RecordProcessor"]
        FWB["FileWriteBatch"]
        JWH["JsonWriteHelper"]
        FBM["FileBackupManager"]
    end

    subgraph Data["Data & Auditing"]
        TM["TransactionManager"]
        SL["SQLLogger"]
        DA["DeletionAuditor"]
        DRS["DatabaseReferenceScanner"]
    end

    Services --> Core
    Services --> IO
    PUS --> Data
    Core --> IO

    style Services fill:#2d6a9f,stroke:#1a4971,color:#fff
    style Core fill:#7c5295,stroke:#563a6b,color:#fff
    style IO fill:#2d8659,stroke:#1a5c3a,color:#fff
    style Data fill:#b8762f,stroke:#8a5722,color:#fff
```

## Supported Entities

The tool works with any MemberJunction entity -- both core system entities and user-created entities. Each entity type can have its own directory structure, file naming conventions, and related entity configurations.

### Important Limitation: Database-Reflected Metadata

**This tool should NOT be used to modify metadata that is reflected from the underlying database catalog.** Examples include:
- Entity field data types
- Column lengths/precision
- Primary key definitions
- Foreign key relationships
- Table/column existence

These properties are designed to flow **from** the database catalog **up** into MJ metadata, not the other way around. Attempting to modify these via file sync could create inconsistencies between the metadata and actual database schema.

The tool is intended for managing business-level metadata such as:
- Descriptions and documentation
- Display names and user-facing text
- Categories and groupings
- Custom properties and settings
- AI prompts, templates, and other content
- Permissions and security settings
- Any other data that is not reflected **up** from the underlying system database catalogs

For more information about how CodeGen reflects system-level data from the database into the MJ metadata layer, see the [CodeGen documentation](../CodeGenLib/README.md).

## Performance: Preloading and Caching During Push

The `push` command preloads existing records and caches files/lookups so the
per-record sync work avoids redundant DB round-trips and disk I/O.

### 1. Upfront bulk preload (batched `RunViews`)

Before processing records, the push scans every JSON file in the target
directories (in parallel, with a concurrency cap), collects primary keys
from records and nested `relatedEntities`, groups them by entity, and
issues a single filtered `RunView` per entity. This replaces the
prior per-record existence check with one batched read per entity at
push start.

### 2. Event-driven in-memory cache (`SyncMetadataEngine` extends `BaseEngine`)

`SyncMetadataEngine` plugs into MJ's `BaseEntity` event bus via `BaseEngine`.
Saves and deletes performed during the push update the in-memory cache in
place, so subsequent lookups inside the same run see the new state without
a re-fetch. The engine overrides `canUseImmediateMutation` to allow this
in-place update for its own preload configs — the filter is built from
the preloaded PKs, so updates and deletes stay safe, and newly-created
records are deliberately added to the cache so later lookups can find them
within the same push.

### 3. File and lookup caches

- **File cache**: parsed JSON and resolved `@include` output are cached on
  first read so the validation and sync passes don't reparse. The cache
  entry for a file is invalidated whenever the push writes that file back
  to disk.
- **Lookup cache**: resolved `@lookup` (and `@parent` / `@root`) values are
  memoized per `(entity, lookup-fields)` key. The cache is indexed by
  entity so deleting a record invalidates only the affected entries
  instead of clearing the whole cache. Keys are URI-encoded to prevent
  collisions when user data contains `=` / `&` / `|`.

> Real-world reductions vary with the size and shape of your metadata
> tree; expect the biggest wins on syncs that read many records of the
> same entity (which is the common case).

## File Structure

The tool uses a hierarchical directory structure with cascading defaults:
- Each top-level directory represents an entity type
- `.mj-sync.json` files define entities and base defaults
- `.mj-folder.json` files define folder-specific defaults (optional)
- Metadata JSON files follow the `filePattern` configured in `.mj-sync.json`
- External files (`.md`, `.html`, etc.) are referenced from the JSON files
- Defaults cascade down through the folder hierarchy

```mermaid
flowchart TD
    subgraph Root["metadata/"]
        RS[".mj-sync.json\n(Global Config)"]
    end

    subgraph AI["ai-prompts/"]
        AS[".mj-sync.json\nentity: AI Prompts"]
        subgraph CS["customer-service/"]
            CF[".mj-folder.json\n(Folder Defaults)"]
            GJ[".greeting.json\n(Record)"]
            GM["greeting.prompt.md\n(External Content)"]
        end
        subgraph AN["analytics/"]
            AF[".mj-folder.json"]
            DJ[".daily-report.json"]
            DM["daily-report.prompt.md"]
        end
    end

    subgraph TE["templates/"]
        TS[".mj-sync.json\nentity: Templates"]
    end

    Root --> AI
    Root --> TE
    AS --> CS
    AS --> AN

    style Root fill:#64748b,stroke:#475569,color:#fff
    style AI fill:#2d6a9f,stroke:#1a4971,color:#fff
    style CS fill:#7c5295,stroke:#563a6b,color:#fff
    style AN fill:#7c5295,stroke:#563a6b,color:#fff
    style TE fill:#2d6a9f,stroke:#1a4971,color:#fff
```

### Example Structure
```
metadata/
+-- .mj-sync.json                    # Global sync configuration
+-- ai-prompts/
|   +-- .mj-sync.json               # Defines entity: "AI Prompts"
|   +-- customer-service/
|   |   +-- .mj-folder.json         # Folder metadata (CategoryID, etc.)
|   |   +-- .greeting.json          # AI Prompt record with embedded models
|   |   +-- greeting.prompt.md      # Prompt content (referenced)
|   |   +-- greeting.notes.md       # Notes field (referenced)
|   +-- analytics/
|       +-- .mj-folder.json         # Folder metadata (CategoryID, etc.)
|       +-- .daily-report.json      # AI Prompt record
|       +-- daily-report.prompt.md  # Prompt content (referenced)
+-- templates/                       # Reusable JSON templates
|   +-- standard-prompt-settings.json
|   +-- standard-ai-models.json
+-- template-entities/
    +-- .mj-sync.json               # Defines entity: "Templates"
    +-- email/
    |   +-- .mj-folder.json
    |   +-- .welcome.json
    |   +-- welcome.template.html
    +-- reports/
        +-- .mj-folder.json
        +-- .invoice.json
        +-- invoice.template.html
```

### File Format Options

#### Single Record per File (Default)
Each JSON file contains one record:
```json
{
  "fields": { ... },
  "relatedEntities": { ... }
}
```

#### Multiple Records per File
JSON files can contain arrays of records:
```json
[
  {
    "fields": { ... },
    "relatedEntities": { ... }
  },
  {
    "fields": { ... },
    "relatedEntities": { ... }
  }
]
```

This is useful for grouping related records in a single file, reducing file clutter for entities with many small records, and maintaining logical groupings while using `@file:` references for large content.

## JSON Metadata Format

### Individual Record
```json
{
  "fields": {
    "Name": "Customer Greeting",
    "Description": "Friendly customer service greeting",
    "TypeID": "@lookup:AI Prompt Types.Name=Chat",
    "CategoryID": "@lookup:AI Prompt Categories.Name=Customer Service",
    "Temperature": 0.7,
    "MaxTokens": 1000,
    "Prompt": "@file:greeting.prompt.md",
    "Notes": "@file:../shared/notes/greeting-notes.md",
    "SystemPrompt": "@url:https://raw.githubusercontent.com/company/prompts/main/system/customer-service.md"
  },
  "primaryKey": {
    "ID": "550e8400-e29b-41d4-a716-446655440000"
  },
  "sync": {
    "lastModified": "2024-01-15T10:30:00Z",
    "checksum": "sha256:abcd1234..."
  }
}
```

### Record with Embedded Collections
```json
{
  "fields": {
    "Name": "Customer Service Chat",
    "Description": "Main customer service prompt",
    "TypeID": "@lookup:AI Prompt Types.Name=Chat",
    "TemplateText": "@file:customer-service.md",
    "Status": "Active"
  },
  "relatedEntities": {
    "MJ: AI Prompt Models": [
      {
        "fields": {
          "PromptID": "@parent:ID",
          "ModelID": "@lookup:AI Models.Name=GPT 4.1",
          "VendorID": "@lookup:MJ: AI Vendors.Name=OpenAI",
          "Priority": 1,
          "Status": "Active"
        },
        "primaryKey": {
          "ID": "BFA2433E-F36B-1410-8DB0-00021F8B792E"
        },
        "sync": {
          "lastModified": "2025-06-07T17:18:31.687Z",
          "checksum": "a642ebea748cb1f99467af2a7e6f4ffd3649761be27453b988af973bed57f070"
        }
      },
      {
        "fields": {
          "PromptID": "@parent:ID",
          "ModelID": "@lookup:AI Models.Name=Claude 4 Sonnet",
          "Priority": 2,
          "Status": "Active"
        }
      }
    ]
  },
  "primaryKey": {
    "ID": "C2A1433E-F36B-1410-8DB0-00021F8B792E"
  },
  "sync": {
    "lastModified": "2025-06-07T17:18:31.698Z",
    "checksum": "7cbd241cbf0d67c068c1434e572a78c87bb31751cbfe7734bfd32f8cea17a2c9"
  }
}
```

### Composite Primary Key Example
```json
{
  "primaryKey": {
    "UserID": "550e8400-e29b-41d4-a716-446655440000",
    "RoleID": "660f9400-f39c-51e5-b827-557766551111"
  },
  "fields": {
    "GrantedAt": "2024-01-15T10:30:00Z",
    "GrantedBy": "@lookup:Users.Email=admin@company.com",
    "ExpiresAt": "2025-01-15T10:30:00Z",
    "Notes": "@file:user-role-notes.md"
  },
  "sync": {
    "lastModified": "2024-01-15T10:30:00Z",
    "checksum": "sha256:abcd1234..."
  }
}
```

### Reserved Keys

| Key | Purpose |
|-----|---------|
| `fields` | Entity field values |
| `relatedEntities` | Embedded related entity records |
| `primaryKey` | Record identifier |
| `sync` | Sync metadata (lastModified, checksum) |
| `__mj_sync_notes` | System-managed resolution tracking |
| `deleteRecord` | Deletion directive |

Any key that is not one of the reserved keys above is preserved but ignored during sync operations. By convention, use an underscore prefix (`_`) for comment keys:

```json
{
  "_comments": [
    "This file configures encryption settings for the Test Tables entity"
  ],
  "fields": {
    "Name": "Test Tables",
    "BaseView": "vwTestTables"
  },
  "primaryKey": {
    "ID": "0fde4c2c-26b1-45e9-b504-5d4a6f4201cf"
  }
}
```

## Special Reference Types

The tool supports special reference types that can be used in any field that accepts text content. These references are processed during push/pull operations to handle external content, lookups, and environment-specific values.

```mermaid
flowchart TD
    subgraph References["@ Reference Types"]
        FILE["@file:\nExternal file content"]
        URL["@url:\nRemote URL content"]
        LOOKUP["@lookup:\nEntity ID resolution"]
        PARENT["@parent:\nParent record field"]
        ROOT["@root:\nRoot record field"]
        ENV["@env:\nEnvironment variable"]
        TEMPLATE["@template:\nJSON template merge"]
        INCLUDE["@include\nJSON composition"]
    end

    subgraph External["External Resources"]
        style External fill:#2d8659,stroke:#1a5c3a,color:#fff
        EF["Local Files"]
        UR["Remote URLs"]
    end

    subgraph Database["Database Lookups"]
        style Database fill:#2d6a9f,stroke:#1a4971,color:#fff
        EL["Entity Records"]
    end

    subgraph Context["Record Context"]
        style Context fill:#b8762f,stroke:#8a5722,color:#fff
        PR["Parent Entity"]
        RR["Root Entity"]
    end

    subgraph Runtime["Runtime Values"]
        style Runtime fill:#7c5295,stroke:#563a6b,color:#fff
        EV["Environment Vars"]
    end

    FILE --> EF
    URL --> UR
    LOOKUP --> EL
    PARENT --> PR
    ROOT --> RR
    ENV --> EV
    TEMPLATE --> EF
    INCLUDE --> EF

    style References fill:#64748b,stroke:#475569,color:#fff
```

### @file: References
When a field value starts with `@file:`, the tool will:
1. Read content from the specified file for push operations
2. Write content to the specified file for pull operations
3. Track both files for change detection
4. **For JSON files**: Automatically process any `@include` directives within them

Examples:
- `@file:greeting.prompt.md` -- File in same directory as JSON
- `@file:./shared/common-prompt.md` -- Relative path
- `@file:../templates/standard-header.md` -- Parent directory reference
- `@file:spec.json` -- JSON file with `@include` directives (processed automatically)

### @url: References
When a field value starts with `@url:`, the tool will:
1. Fetch content from the URL during push operations
2. Cache the content with appropriate headers
3. Support both HTTP(S) and file:// protocols

Examples:
- `@url:https://example.com/prompts/greeting.md` -- Remote content
- `@url:https://raw.githubusercontent.com/company/prompts/main/customer.md` -- GitHub raw content

### @lookup: References
Enable entity relationships using human-readable values:
- Basic syntax: `@lookup:EntityName.FieldName=Value`
- Multi-field syntax: `@lookup:EntityName.Field1=Value1&Field2=Value2`
- Auto-create syntax: `@lookup:EntityName.FieldName=Value?create`
- With additional fields: `@lookup:EntityName.FieldName=Value?create&Field2=Value2`
- Deferred lookup syntax: `@lookup:EntityName.FieldName=Value?allowDefer`
- Combined flags: `@lookup:EntityName.FieldName=Value?create&allowDefer`

Examples:
- `@lookup:AI Prompt Types.Name=Chat` -- Single field lookup, fails if not found
- `@lookup:Users.Email=john@example.com&Department=Sales` -- Multi-field lookup for precise matching
- `@lookup:AI Prompt Categories.Name=Examples?create` -- Creates if missing
- `@lookup:AI Prompt Categories.Name=Examples?create&Description=Example prompts` -- Creates with description
- `@lookup:Dashboards.Name=Data Explorer?allowDefer` -- Defers lookup if not found, retries at end of push

#### Multi-Field Lookups
When you need to match records based on multiple criteria, use the multi-field syntax:
```json
{
  "CategoryID": "@lookup:AI Prompt Categories.Name=Actions&Status=Active",
  "ManagerID": "@lookup:Users.Email=manager@company.com&Department=Engineering&Status=Active"
}
```

#### Deferred Lookups (?allowDefer)

The `?allowDefer` flag enables handling of circular dependencies between entities during push operations. Use this when Entity A references Entity B and Entity B references Entity A -- or any situation where a lookup target might not exist yet during initial processing.

**How it works:**

The flag is permission-based, not imperative. The lookup is always attempted first, and only deferred if it fails:

```mermaid
flowchart TD
    A["@lookup:Entity.Field=Value?allowDefer"] --> B{Try lookup now}
    B -->|Found| C[Return ID immediately]
    B -->|Not found| D{Has ?allowDefer?}
    D -->|Yes| E[Skip this field, continue processing]
    D -->|No| F[Fatal error - rollback transaction]
    E --> G[Save record without deferred field]
    G --> H[Queue record for re-processing]
    H --> I[Phase 2.5: Re-process entire record]
    I -->|Success| J[Update record with resolved field]
    I -->|Failure| F

    style C fill:#2d8659,stroke:#1a5c3a,color:#fff
    style F fill:#b8762f,stroke:#8a5722,color:#fff
    style J fill:#2d8659,stroke:#1a5c3a,color:#fff
    style I fill:#7c5295,stroke:#563a6b,color:#fff
```

**When to use `?allowDefer`:**
- When Entity A references Entity B, and Entity B references Entity A
- When you are creating related records that need to reference each other
- When the lookup target might not exist yet during initial processing

**Processing phases:**
1. During the initial push phase, if a lookup with `?allowDefer` fails (record not found), the **field is skipped** but the record still saves
2. The record IS saved during the initial pass (without the deferred field value), allowing other records to reference it
3. The record is queued for re-processing in Phase 2.5
4. After all other records are processed, deferred records are re-processed using the exact same logic
5. If retry succeeds, the record is updated with the resolved field; if it fails, an error is reported and the transaction rolls back

**Example: Application / Dashboard circular reference**

The Applications entity can have `DefaultNavItems` (a JSON field) that contains nested references to Dashboards, while Dashboards have an `ApplicationID` that references Applications.

Since Applications are processed before Dashboards (alphabetical order), the Dashboard lookup in `DefaultNavItems` needs `?allowDefer`:

```json
// .data-explorer-application.json
{
  "fields": {
    "Name": "Data Explorer",
    "DefaultNavItems": [
      {
        "Label": "Explorer",
        "ResourceType": "Dashboard",
        "RecordID": "@lookup:Dashboards.Name=Data Explorer?allowDefer"
      }
    ]
  }
}

// .data-explorer-dashboard.json
// Note: No ?allowDefer needed - Applications are processed first
{
  "fields": {
    "Name": "Data Explorer",
    "ApplicationID": "@lookup:Applications.Name=Data Explorer"
  }
}
```

**Combining flags:**
You can combine `?allowDefer` with `?create`:
```json
"CategoryID": "@lookup:Categories.Name=New Category?create&allowDefer"
```
This means: "Look up the category, create if missing, and if the lookup still fails for some reason, defer it."

**Important notes:**
- Deferred records are processed before the final commit (Phase 2.5)
- If any deferred record fails on retry, the entire push transaction is rolled back
- Use sparingly -- only for genuine circular dependencies
- The record must have a primaryKey defined in the metadata file

### @parent: References
Reference fields from the immediate parent entity in embedded collections:
- `@parent:ID` -- Get the parent's ID field
- `@parent:Name` -- Get the parent's Name field
- Works with any field from the parent entity

### @root: References
Reference fields from the root entity in nested structures:
- `@root:ID` -- Get the root entity's ID
- `@root:CategoryID` -- Get the root's CategoryID
- Useful for deeply nested relationships

### @env: References
Support environment-specific values:
- `@env:VARIABLE_NAME`
- Useful for different environments (dev/staging/prod)

### Primary Key Handling
The tool automatically detects primary key fields from entity metadata:
- **Single primary keys**: Most common, stored as `{"ID": "value"}` or `{"CustomKeyName": "value"}`
- **Composite primary keys**: Multiple fields that together form the primary key
- **Auto-detection**: Tool reads entity metadata to determine primary key structure
- **Reference support**: Primary key values can use `@lookup`, `@parent`, and other reference types

#### Using @lookup in Primary Keys
You can use `@lookup` references in primary key fields to avoid hardcoding GUIDs. This is especially useful when decorating existing records:

```json
{
  "fields": {
    "Encrypt": true,
    "AllowDecryptInAPI": false
  },
  "primaryKey": {
    "ID": "@lookup:Entity Fields.EntityID=@lookup:Entities.Name=Test Tables&Name=ServerOnlyEncrypted"
  }
}
```

In this example:
1. The inner `@lookup:Entities.Name=Test Tables` resolves to the Entity ID
2. That ID is used to find the Entity Field with the matching `EntityID` and `Name`
3. The resulting Entity Field ID becomes the primary key

**Note:** Primary key lookups must resolve immediately -- the `?allowDefer` flag is not supported in primary key fields since the primary key is needed to determine if a record exists.

### Automatic JSON Stringification with Reference Processing

When a field value is an array or object, the tool automatically:
1. **Recursively processes** all `@lookup:`, `@file:`, `@parent:`, `@root:` references inside the object
2. **Converts to JSON string** with pretty formatting (2-space indentation) for database storage
3. **Maintains clean structure** in source files while storing as strings in database

This is useful for JSON-typed fields like `Configuration`, `Settings`, `Metadata`, etc.

```json
{
  "fields": {
    "Name": "Agent Memory Manager Job",
    "CronExpression": "0 */15 * * * *",
    "Configuration": {
      "AgentID": "@lookup:AI Agents.Name=Memory Manager",
      "InitialMessage": "Analyze recent conversations",
      "Settings": {
        "MaxNotes": 5,
        "Strategy": "Relevant",
        "TargetAgentID": "@lookup:AI Agents.Name=Sage"
      }
    }
  }
}
```

When pushed to the database, the `Configuration` field is resolved (lookups become GUIDs) and stringified as JSON for storage.

### deleteRecord Directive
The tool supports deleting records from the database using a special `deleteRecord` directive in JSON files:

```json
{
  "fields": {
    "Name": "Obsolete Prompt",
    "Description": "This prompt is no longer needed"
  },
  "primaryKey": {
    "ID": "550e8400-e29b-41d4-a716-446655440000"
  },
  "deleteRecord": {
    "delete": true
  }
}
```

After successfully deleting the record, the tool updates the JSON file with a `deletedAt` timestamp. Important notes:
- **Primary key required**: You must specify the `primaryKey` to identify which record to delete
- **One-time operation**: Once `deletedAt` is set, the deletion will not be attempted again
- **SQL logging**: Delete operations are included in SQL logs when enabled
- **Foreign key constraints**: Deletions may fail if other records reference this record
- **Dry-run support**: Use `--dry-run` to preview what would be deleted
- **Takes precedence**: If `deleteRecord` is present, normal create/update operations are skipped

## Content Composition

### {@include} References in Files
Enable content composition within non-JSON files (like .md, .html, .txt) using JSDoc-style include syntax:
- Pattern: `{@include path/to/file.ext}`
- Supports relative paths from the containing file
- Recursive includes (includes within includes)
- Circular reference detection prevents infinite loops
- Works seamlessly with `@file:` references

```markdown
# My Prompt Template

## System Instructions
{@include ./shared/system-instructions.md}

## Context
{@include ../common/context-header.md}

## Task
Please analyze the following...
```

### @include References in JSON Files

Enable modular JSON composition by including external JSON files directly into your metadata files:

**Object Context -- Property Spreading (Default)**
```json
{
  "name": "Parent Record",
  "@include": "child.json",
  "description": "Additional fields"
}
```

**Multiple Includes with Dot Notation (Eliminates VS Code Warnings)**
```json
{
  "name": "Parent Record",
  "@include.data": "shared/data-fields.json",
  "description": "Middle field",
  "@include.config": "shared/config-fields.json",
  "status": "Active"
}
```
Use dot notation (`@include.anything`) to include multiple files at different positions in your object. The part after the dot is ignored by the processor but makes each key unique, eliminating VS Code's duplicate key warnings.

**Array Context -- Element Insertion**
```json
[
  {"name": "First item"},
  "@include:child.json",
  {"name": "Last item"}
]
```

**Explicit Mode Control**
```json
{
  "@include": {
    "file": "child.json",
    "mode": "spread"
  }
}
```

#### Modes

- **"spread" mode**: Merges all properties from the included file into the parent object. Only works when including an object into an object. Parent properties override child properties on conflict. This is the default mode for objects.
- **"element" mode**: Directly inserts the JSON content at that position. Works with any JSON type (object, array, string, number, etc.). Default mode for arrays when using string syntax.

#### Path Resolution
- All paths are relative to the file containing the @include
- Supports: `"child.json"`, `"./child.json"`, `"../shared/base.json"`, `"subfolder/config.json"`
- Circular references are detected and prevented

#### Processing Order
1. `@include` directives are processed first (recursively)
2. `@file` references are recursively resolved (including nested ones in JSON)
3. Then `@template` references
4. Finally, other `@` references (`@lookup`, etc.)

This ensures that included content can contain other special references that will be properly resolved.

### @template: References
Enable JSON template composition for reusable configurations:

**String Template Reference:**
```json
{
  "relatedEntities": {
    "MJ: AI Prompt Models": "@template:templates/standard-ai-models.json"
  }
}
```

**Object Template Merging:**
```json
{
  "fields": {
    "Name": "My Prompt",
    "@template": "templates/standard-prompt-settings.json",
    "Temperature": 0.9
  }
}
```

**Multiple Template Merging** (later templates override earlier ones):
```json
{
  "fields": {
    "@template": [
      "templates/base-settings.json",
      "templates/customer-service-defaults.json"
    ],
    "Name": "Customer Bot"
  }
}
```

## Default Value Inheritance

The tool implements a cascading inheritance system for field defaults, similar to CSS or OOP inheritance:

1. **Entity-level defaults** (in `.mj-sync.json`) -- Base defaults for all records
2. **Folder-level defaults** (in `.mj-folder.json`) -- Override/extend entity defaults
3. **Nested folder defaults** -- Override/extend parent folder defaults
4. **Record-level values** -- Override all inherited defaults

```mermaid
flowchart TD
    ELD[".mj-sync.json\nTemperature: 0.7\nMaxTokens: 1500"] --> FD1[".mj-folder.json\nTemperature: 0.8\n(overrides entity)"]
    FD1 --> FD2[".mj-folder.json\nTemperature: 0.6\n(overrides parent)"]
    FD2 --> REC["urgent.json\nTemperature: 0.9\n(overrides all)"]

    ELD -..->|inherited| FD1
    FD1 -..->|inherited| FD2
    FD2 -..->|inherited| REC

    FINAL["Final Values:\nTemperature: 0.9 (record)\nMaxTokens: 1500 (entity)"]
    REC --> FINAL

    style ELD fill:#2d6a9f,stroke:#1a4971,color:#fff
    style FD1 fill:#7c5295,stroke:#563a6b,color:#fff
    style FD2 fill:#7c5295,stroke:#563a6b,color:#fff
    style REC fill:#2d8659,stroke:#1a5c3a,color:#fff
    style FINAL fill:#b8762f,stroke:#8a5722,color:#fff
```

### Inheritance Example
```
ai-prompts/.mj-sync.json           -> Temperature: 0.7, MaxTokens: 1500
+-- customer-service/.mj-folder.json -> Temperature: 0.8 (overrides)
|   +-- greeting.json               -> Uses Temperature: 0.8, MaxTokens: 1500
|   +-- escalation/.mj-folder.json  -> Temperature: 0.6 (overrides again)
|       +-- urgent.json             -> Temperature: 0.9 (record override)
```

## Resolution Tracking with `__mj_sync_notes`

When you use `@lookup` or `@parent` references in your metadata files, MetadataSync can track how these references were resolved during push operations. This information is written to a `__mj_sync_notes` key in each record, providing transparency into the resolution process.

**Note:** This feature is **disabled by default** to keep metadata files clean. Enable it when you need to debug lookup resolutions or understand how references are being resolved.

### Enabling Resolution Tracking

Add `emitSyncNotes` to your `.mj-sync.json` configuration:

**Root-level configuration** (applies to all entity directories):
```json
{
  "version": "1.0",
  "emitSyncNotes": true,
  "directoryOrder": ["..."]
}
```

**Entity-level override** (in an entity directory's `.mj-sync.json`):
```json
{
  "entity": "AI Prompts",
  "emitSyncNotes": true
}
```

The inheritance works as follows:
- Entity-level `emitSyncNotes` takes precedence if explicitly set
- If not set at entity level, inherits from root `.mj-sync.json`
- Defaults to `false` if not set anywhere

### Example Output

```json
{
  "fields": {
    "Name": "ServerOnlyEncrypted",
    "Encrypt": true,
    "EncryptionKeyID": "@lookup:MJ: Encryption Keys.Name=Test Encryption Key"
  },
  "primaryKey": {
    "ID": "@lookup:Entity Fields.EntityID=@lookup:Entities.Name=Test Tables&Name=ServerOnlyEncrypted"
  },
  "sync": {
    "lastModified": "2025-12-25T16:14:32.605Z",
    "checksum": "7e989e08396f6cffb8b2d70958018b21..."
  },
  "__mj_sync_notes": [
    {
      "type": "lookup",
      "field": "primaryKey.ID",
      "expression": "@lookup:Entity Fields.EntityID=@lookup:Entities.Name=Test Tables&Name=ServerOnlyEncrypted",
      "resolved": "F501E294-5F5F-44C6-AD06-5C9754A13D29",
      "nested": [
        {
          "expression": "@lookup:Entities.Name=Test Tables",
          "resolved": "0fde4c2c-26b1-45e9-b504-5d4a6f4201cf"
        }
      ]
    },
    {
      "type": "lookup",
      "field": "fields.EncryptionKeyID",
      "expression": "@lookup:MJ: Encryption Keys.Name=Test Encryption Key",
      "resolved": "85B814C8-A01B-4AE3-A252-DC9D54C914C7"
    }
  ]
}
```

### Note Structure

| Field | Description |
|-------|-------------|
| `type` | Resolution type: `"lookup"` for `@lookup` references, `"parent"` for `@parent` references |
| `field` | Field path where the resolution occurred (e.g., `"primaryKey.ID"`, `"fields.CategoryID"`) |
| `expression` | The original reference expression before resolution |
| `resolved` | The resolved value (typically a GUID) |
| `nested` | (Optional) Array of nested resolutions for expressions containing nested `@lookup` references |

The `__mj_sync_notes` key uses a double underscore prefix (`__`) to clearly indicate it is system-managed. Do not manually edit this section -- it is regenerated on each push when `emitSyncNotes` is enabled.

## CLI Commands

All MetadataSync functionality is accessed through the MemberJunction CLI (`mj`) under the `sync` namespace:

```bash
# Initialize a directory for metadata sync
mj sync init

# Validate metadata files
mj sync validate
mj sync validate --dir="./metadata"
mj sync validate --verbose
mj sync validate --format=json
mj sync validate --save-report

# Pull metadata from database to files
mj sync pull --entity="AI Prompts"
mj sync pull --entity="AI Prompts" --filter="CategoryID='customer-service-id'"
mj sync pull --entity="AI Prompts" --multi-file="all-prompts"

# Push local changes to database
mj sync push
mj sync push --dir="ai-prompts"
mj sync push -v
mj sync push --dry-run
mj sync push --parallel-batch-size=20

# Directory filtering
mj sync push --exclude="actions"
mj sync push --exclude="actions,templates"
mj sync push --exclude="*-test,*-old"
mj sync push --include="prompts,agent-types"
mj sync push --include="ai-*"

# Status and watch
mj sync status
mj sync watch

# CI/CD mode
mj sync push --ci

# Skip validation
mj sync push --no-validate

# Reset file checksums
mj sync file-reset

# Incremental push (skip unchanged files and records)
mj sync push --incremental
mj sync push --dir ./metadata --incremental --verbose

# Incremental pull (only records updated since last pull)
mj sync pull --incremental

# Pull records updated after a specific timestamp
mj sync pull --entity "MJ: AI Prompts" --since "2026-04-07T10:00:00Z"
```

### Incremental Push (`--incremental`)

Skips unchanged files and records using stored checksums, significantly speeding up repeated pushes when only a few files have changed.

- The first run with `--incremental` establishes baseline checksums (runs at normal speed)
- Subsequent runs skip files whose content hash has not changed since the last push
- Also skips individual records within multi-record files when only some records changed (uses the per-record `sync.checksum` already present in metadata files)
- Unchanged files and records are skipped without any database calls
- State is stored in `~/.mj/sync-state/` (machine-local, not committed to version control)
- Combine freely with other flags: `mj sync push --dir ./metadata --incremental --verbose`

### Incremental Pull (`--incremental`)

Only pulls records updated since the last successful pull, avoiding a full re-pull of every record.

- Filters by the `__mj_UpdatedAt` column (maintained by database triggers on all tracked entities)
- Detects soft-deleted records (`__mj_DeletedAt`) and removes their local files
- State is stored in `~/.mj/sync-state/`

### Pull Since Timestamp (`--since`)

Explicit alternative to `--incremental` -- pulls only records updated after a given ISO timestamp.

```bash
mj sync pull --entity "MJ: AI Prompts" --since "2026-04-07T10:00:00Z"
```

### Performance Improvements

The following improvements apply automatically with no flags required:

**Lazy embedding model loading** -- The AIEngine no longer loads the embedding model (~50 MB `Xenova/all-mpnet-base-v2`) at startup. The model loads on the first semantic search call (`FindSimilarAgents`, etc.). CLI commands that never use semantic search (sync, codegen) skip the ~8 s model load entirely. API server behavior is unchanged -- embeddings generate on the first search request.

**Indexed batch context lookups** -- Push operations resolve `@lookup` references using an indexed data structure instead of a linear scan, reducing lookup cost from O(N) to O(1).

**Batched pull queries** -- Pull operations pre-fetch related entities with a single `IN` query per type, replacing the previous N+1 query pattern (one query per parent record per related type).

## Configuration

The tool uses the existing `mj.config.cjs` for database configuration and a hierarchical structure of `.mj-sync.json` and `.mj-folder.json` files for sync behavior.

### Root Configuration (metadata/.mj-sync.json)
```json
{
  "version": "1.0.0",
  "directoryOrder": [
    "prompts",
    "agent-types"
  ],
  "ignoreDirectories": [
    "output",
    "examples"
  ],
  "push": {
    "validateBeforePush": true,
    "requireConfirmation": true,
    "autoCreateMissingRecords": false,
    "alwaysPush": false
  },
  "sqlLogging": {
    "enabled": true,
    "outputDirectory": "./sql_logging",
    "formatAsMigration": false,
    "filterPatterns": ["*EntityFieldValue*"],
    "filterType": "exclude"
  },
  "userRoleValidation": {
    "enabled": true,
    "allowedRoles": ["Administrator", "Developer"],
    "allowUsersWithoutRoles": false
  },
  "watch": {
    "debounceMs": 1000,
    "ignorePatterns": ["*.tmp", "*.bak"]
  },
  "emitSyncNotes": false
}
```

### Entity Configuration (metadata/ai-prompts/.mj-sync.json)
```json
{
  "entity": "AI Prompts",
  "filePattern": "*.json",
  "defaults": {
    "TypeID": "@lookup:AI Prompt Types.Name=Chat",
    "Temperature": 0.7,
    "MaxTokens": 1500,
    "Status": "Active"
  },
  "pull": {
    "filePattern": "*.json",
    "updateExistingRecords": true,
    "createNewFileIfNotFound": true,
    "mergeStrategy": "merge",
    "filter": "Status = 'Active'",
    "externalizeFields": [
      {
        "field": "Prompt",
        "pattern": "@file:{Name}.prompt.md"
      }
    ],
    "relatedEntities": {
      "MJ: AI Prompt Models": {
        "entity": "MJ: AI Prompt Models",
        "foreignKey": "PromptID",
        "filter": "Status = 'Active'"
      }
    }
  }
}
```

### Folder Defaults (metadata/ai-prompts/customer-service/.mj-folder.json)
```json
{
  "defaults": {
    "CategoryID": "@lookup:AI Prompt Categories.Name=Customer Service",
    "Temperature": 0.8,
    "Tags": ["customer-service", "support"]
  }
}
```

### Push Configuration Options

#### autoCreateMissingRecords

When set to `true`, the push command automatically creates records when a primaryKey is specified but the record does not exist in the database. Useful when migrating data between environments or restoring records from backups.

```json
{
  "push": {
    "autoCreateMissingRecords": true
  }
}
```

#### alwaysPush

When set to `true`, forces ALL records to be saved to the database regardless of their dirty state. This bypasses the normal dirty checking mechanism.

Use cases:
- Ensuring complete synchronization after database restoration
- Bypassing dirty detection issues
- Force refreshing all database records with file content

```json
{
  "push": {
    "alwaysPush": true
  }
}
```

**Note**: This flag should be used judiciously as it causes database writes for all records. Enable temporarily when needed, then disable for normal operations.

### Directory Processing Order

Directory order is configured in the root-level `.mj-sync.json` file only (not inherited by subdirectories):

```json
{
  "version": "1.0.0",
  "directoryOrder": [
    "prompts",
    "agent-types"
  ]
}
```

- Directories listed in `directoryOrder` are processed first, in the specified order
- Remaining directories are processed after the ordered ones, in alphabetical order
- Use this to ensure parent entities are created before children that reference them

### Ignore Directories

Ignore directories are configured in `.mj-sync.json` files and are **cumulative** through the directory hierarchy:

```json
{
  "version": "1.0.0",
  "ignoreDirectories": [
    "output",
    "examples",
    "templates"
  ]
}
```

Child directories inherit parent ignore patterns and can add their own.

### SQL Logging

SQL logging captures all database operations during push commands for creating migration files, debugging, and deployment:

```json
{
  "sqlLogging": {
    "enabled": true,
    "outputDirectory": "./sql_logging",
    "formatAsMigration": true,
    "filterPatterns": ["*AIPrompt*", "/^EXEC sp_/i"],
    "filterType": "exclude"
  }
}
```

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | boolean | false | Enable SQL logging during push operations |
| `outputDirectory` | string | "./sql_logging" | Directory for SQL log files |
| `formatAsMigration` | boolean | false | Format as Flyway migration files |
| `filterPatterns` | string[] | undefined | Patterns to filter SQL statements |
| `filterType` | "exclude" or "include" | "exclude" | How to apply filter patterns |

**Filter Pattern Types:**
- **Regex patterns**: Start with `/` and optionally end with flags (e.g., `/spCreate.*Run/i`)
- **Simple wildcards**: Use `*` as a wildcard (e.g., `*AIPrompt*`)

### User Role Validation

Validates UserID fields against specific roles in the MemberJunction system:

```json
{
  "userRoleValidation": {
    "enabled": true,
    "allowedRoles": [
      "Administrator",
      "Developer",
      "Content Manager"
    ],
    "allowUsersWithoutRoles": false
  }
}
```

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | boolean | false | Enable user role validation |
| `allowedRoles` | string[] | [] | Role names that are allowed |
| `allowUsersWithoutRoles` | boolean | false | Allow users without any assigned roles |

## Pull Configuration

The pull command supports smart update capabilities with extensive configuration options:

```json
{
  "entity": "AI Prompts",
  "filePattern": "*.json",
  "pull": {
    "filePattern": "*.json",
    "createNewFileIfNotFound": true,
    "newFileName": ".all-new.json",
    "appendRecordsToExistingFile": true,
    "updateExistingRecords": true,
    "preserveFields": ["customField", "localNotes"],
    "mergeStrategy": "merge",
    "backupBeforeUpdate": true,
    "filter": "Status = 'Active'",
    "externalizeFields": [
      {
        "field": "TemplateText",
        "pattern": "@file:{Name}.template.md"
      }
    ],
    "excludeFields": ["InternalID", "TempField"],
    "lookupFields": {
      "CategoryID": {
        "entity": "AI Prompt Categories",
        "field": "Name"
      }
    },
    "relatedEntities": {
      "MJ: AI Prompt Models": {
        "entity": "MJ: AI Prompt Models",
        "foreignKey": "PromptID",
        "filter": "Status = 'Active'"
      }
    },
    "ignoreNullFields": false,
    "ignoreVirtualFields": false
  }
}
```

### Pull Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `filePattern` | string | Entity filePattern | Pattern for finding existing files |
| `createNewFileIfNotFound` | boolean | true | Create files for records not found locally |
| `newFileName` | string | -- | Filename for new records when appending |
| `appendRecordsToExistingFile` | boolean | false | Append new records to a single file |
| `updateExistingRecords` | boolean | true | Update existing records in local files |
| `preserveFields` | string[] | [] | Fields that retain local values during updates |
| `mergeStrategy` | string | "merge" | How to merge: "merge", "overwrite", or "skip" |
| `backupBeforeUpdate` | boolean | false | Create timestamped backups before updates |
| `backupDirectory` | string | ".backups" | Directory for backup files |
| `filter` | string | -- | SQL WHERE clause for filtering records |
| `externalizeFields` | array/object | -- | Fields to save as external files |
| `excludeFields` | string[] | [] | Fields to omit from pulled data |
| `lookupFields` | object | -- | Foreign keys to convert to @lookup references |
| `relatedEntities` | object | -- | Related entities to pull as embedded collections |
| `ignoreNullFields` | boolean | false | Exclude fields with null values |
| `ignoreVirtualFields` | boolean | false | Exclude virtual fields (view-only fields) |

### Merge Strategies

- **`merge`** (default): Combines fields from database and local file, with database values taking precedence for existing fields
- **`overwrite`**: Completely replaces local record with database version (except preserved fields)
- **`skip`**: Leaves existing records unchanged, only adds new records

### Understanding excludeFields vs preserveFields

| | excludeFields | preserveFields |
|--|---------------|----------------|
| **Purpose** | Completely omit fields from files | Protect local values from overwrite |
| **Use Case** | Remove system/internal fields | Keep custom local modifications |
| **Effect** | Fields never appear in JSON | Fields exist but retain local values |
| **Example** | Internal IDs, timestamps | Custom file paths, local notes |

When a preserved field contains a `@file:` reference, the tool updates the content at the existing file path rather than creating a new file with a generated name.

### Externalize Fields Patterns

The `externalizeFields` configuration supports dynamic file naming with placeholders:

```json
"externalizeFields": [
  {
    "field": "TemplateText",
    "pattern": "@file:{Name}.template.md"
  },
  {
    "field": "SQLQuery",
    "pattern": "@file:queries/{CategoryName}/{Name}.sql"
  }
]
```

Supported placeholders:
- `{Name}` -- The entity's name field value
- `{ID}` -- The entity's primary key
- `{FieldName}` -- The field being externalized
- `{AnyFieldName}` -- Any field from the entity record

All values are sanitized for filesystem compatibility (lowercase, spaces to hyphens, special characters removed).

### Virtual Fields Configuration

The `ignoreVirtualFields` option controls whether virtual fields (computed view-only fields) are included in pulled data.

With `ignoreVirtualFields: false` (default):
```json
{
  "fields": {
    "Name": "Test Action",
    "CategoryID": "@lookup:Action Categories.Name=System",
    "Category": "System",
    "UserID": "123",
    "User": "John Smith"
  }
}
```

With `ignoreVirtualFields: true`:
```json
{
  "fields": {
    "Name": "Test Action",
    "CategoryID": "@lookup:Action Categories.Name=System",
    "UserID": "123"
  }
}
```

## Embedded Collections

The tool supports managing related entities as embedded collections within parent JSON files:

```json
{
  "fields": {
    "Name": "Parent Entity"
  },
  "relatedEntities": {
    "Child Entity": [
      {
        "fields": {
          "ParentID": "@parent:ID",
          "Name": "Child 1"
        },
        "relatedEntities": {
          "Grandchild Entity": [
            {
              "fields": {
                "ChildID": "@parent:ID",
                "RootID": "@root:ID",
                "Name": "Grandchild 1"
              }
            }
          ]
        }
      }
    ]
  }
}
```

Benefits:
- **Single File Management**: Keep related data together
- **Atomic Operations**: Parent and children sync together
- **Cleaner Organization**: Fewer files to manage
- **Relationship Clarity**: Visual representation of data relationships

## Recursive Patterns

The tool supports automatic recursive patterns for self-referencing entities, eliminating the need to manually define each nesting level for hierarchical data structures.

```json
{
  "pull": {
    "relatedEntities": {
      "AI Agents": {
        "entity": "AI Agents",
        "foreignKey": "ParentID",
        "recursive": true,
        "maxDepth": 10
      }
    }
  }
}
```

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `recursive` | boolean | false | Enable automatic recursive fetching |
| `maxDepth` | number | 10 | Maximum recursion depth |

When `recursive: true` is set:
1. The tool automatically fetches child records at each level
2. Continues until no more children are found or max depth is reached
3. Circular reference protection prevents infinite loops by tracking processed record IDs
4. All recursive levels use the same `lookupFields`, `externalizeFields`, etc.

## Parallel Processing

MetadataSync supports parallel processing of records during push operations for improved performance with large datasets.

Records are automatically grouped into dependency levels:
- **Level 0**: Records with no dependencies
- **Level 1**: Records that depend only on Level 0 records
- **Level 2**: Records that depend on Level 0 or Level 1 records

Records within the same dependency level can be safely processed in parallel.

```bash
# Default processing
mj sync push

# Process 20 records in parallel
mj sync push --parallel-batch-size=20

# Maximum parallelism (50 records)
mj sync push --parallel-batch-size=50

# Conservative approach for debugging
mj sync push --parallel-batch-size=1
```

## Validation System

The MetadataSync tool includes a comprehensive validation system that checks metadata files for correctness before pushing to the database.

```mermaid
flowchart TD
    START["mj sync validate"] --> LOAD["Load .mj-sync.json"]
    LOAD --> DIRS["Discover Entity Directories"]
    DIRS --> LOOP["For Each Directory"]

    LOOP --> ENT["Validate Entity Names"]
    LOOP --> FLD["Validate Field Names & Types"]
    LOOP --> REF["Validate References\n(@file, @lookup, @template)"]
    LOOP --> DEP["Analyze Dependencies"]

    ENT --> COLLECT["Collect Errors & Warnings"]
    FLD --> COLLECT
    REF --> COLLECT
    DEP --> COLLECT

    COLLECT --> RESULT{Valid?}
    RESULT -->|Yes| PASS["Validation Passed"]
    RESULT -->|No| FAIL["Validation Failed\n(Errors Listed)"]

    style START fill:#2d6a9f,stroke:#1a4971,color:#fff
    style PASS fill:#2d8659,stroke:#1a5c3a,color:#fff
    style FAIL fill:#b8762f,stroke:#8a5722,color:#fff
    style COLLECT fill:#7c5295,stroke:#563a6b,color:#fff
```

### Automatic Validation
By default, validation runs automatically before push operations:
```bash
# These commands validate first, then proceed if valid
mj sync push
mj sync pull --entity="AI Prompts"
```

### Manual Validation
```bash
mj sync validate
mj sync validate --dir="./metadata"
mj sync validate --verbose
mj sync validate --format=json
mj sync validate --save-report
```

### Skip Validation
```bash
# Skip validation checks (use with caution)
mj sync push --no-validate
mj sync pull --entity="AI Prompts" --no-validate
```

### What Gets Validated

**Entity Validation:**
- Entity names exist in database metadata
- Entity is accessible to current user
- Entity allows data modifications

**Field Validation:**
- Field names exist on the entity
- Virtual properties (getter/setter methods) are automatically detected
- Fields are settable (not system fields)
- Field values match expected data types
- Required fields are checked intelligently (skips fields with defaults, computed fields, ReadOnly fields)
- Foreign key relationships are valid

**Reference Validation:**
- `@file:` references point to existing files
- `@lookup:` references find matching records
- `@template:` references load valid JSON
- `@parent:` and `@root:` have proper context
- Circular references are detected

**Dependency Order Validation:**
- Entities are processed in dependency order
- Parent entities exist before children
- Circular dependencies are detected and reported

### Validation Output

#### Human-Readable Format (Default)
```
Validation Report

Files:          4
Entities:       29
Errors:         2
Warnings:       5

Errors

1. Field "Status" does not exist on entity "Templates"
   Entity: Templates
   Field: Status
   File: ./metadata/templates/.my-template.json

2. File not found: ./shared/footer.html
   Entity: Templates
   Field: FooterHTML
   File: ./metadata/templates/.my-template.json
```

#### JSON Format (CI/CD)
```json
{
  "isValid": false,
  "summary": {
    "totalFiles": 4,
    "totalEntities": 29,
    "totalErrors": 2,
    "totalWarnings": 5
  },
  "errors": [
    {
      "type": "field",
      "entity": "Templates",
      "field": "Status",
      "file": "./metadata/templates/.my-template.json",
      "message": "Field \"Status\" does not exist on entity \"Templates\"",
      "suggestion": "Check spelling of 'Status'."
    }
  ]
}
```

### Common Validation Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `Field "X" does not exist` | Typo or wrong entity | Check entity definition in generated files |
| `Entity "X" not found` | Wrong entity name | Use exact entity name from database |
| `File not found` | Bad @file: reference | Check file path is relative and exists |
| `Lookup not found` | No matching record | Verify lookup value or use ?create |
| `Circular dependency` | A -> B -> A references | Restructure to avoid cycles |
| `Required field missing` | Missing required field | Add field with appropriate value |

## Programmatic Usage

### Services Overview

The MetadataSync package exports several service classes for programmatic use:

| Service | Purpose |
|---------|---------|
| `InitService` | Initialize directory structure for metadata sync |
| `PullService` | Pull metadata from database to local files |
| `PushService` | Push local file changes to database |
| `StatusService` | Compare local files with database state |
| `WatchService` | Watch for file changes and auto-sync |
| `ValidationService` | Validate metadata files for correctness |
| `FileResetService` | Reset file checksums and primary keys |
| `FormattingService` | Format validation results for display |

### Using ValidationService

```typescript
import { ValidationService, FormattingService } from '@memberjunction/metadata-sync';

// Create validator instance
const validator = new ValidationService({
    verbose: false,
    outputFormat: 'human',
    maxNestingDepth: 10,
    checkBestPractices: true
});

// Validate a directory
const result = await validator.validateDirectory('/path/to/metadata');

// Check results
if (result.isValid) {
    console.log('Validation passed!');
} else {
    console.log(`Found ${result.errors.length} errors`);

    // Format results for display
    const formatter = new FormattingService();
    const humanOutput = formatter.formatValidationResult(result, true);
    console.log(humanOutput);

    // Or get JSON output
    const jsonOutput = formatter.formatValidationResultAsJson(result);

    // Or get markdown report
    const markdownReport = formatter.formatValidationResultAsMarkdown(result);
}
```

### Using PushService

```typescript
import { SyncEngine, PushService } from '@memberjunction/metadata-sync';
import { initializeProvider, getSystemUser } from '@memberjunction/metadata-sync';

// Initialize the data provider
await initializeProvider();
const systemUser = await getSystemUser();

// Create services
const syncEngine = new SyncEngine(systemUser);
const pushService = new PushService(syncEngine, systemUser);

// Push with callbacks
const result = await pushService.push(
    { dir: './metadata', verbose: true },
    {
        onProgress: (msg) => console.log(msg),
        onSuccess: (msg) => console.log(msg),
        onError: (msg) => console.error(msg)
    }
);

console.log(`Created: ${result.created}, Updated: ${result.updated}`);
```

### CI/CD Integration

```yaml
# Example GitHub Actions workflow
- name: Validate Metadata
  run: |
    npm install @memberjunction/cli
    npx mj sync validate --dir=./metadata --format=json > validation-results.json

- name: Check Validation Results
  run: |
    if [ $(jq '.isValid' validation-results.json) = "false" ]; then
      echo "Metadata validation failed!"
      jq '.errors' validation-results.json
      exit 1
    fi

- name: Push Metadata to Production
  run: |
    npx mj sync push --ci --dir=./metadata
```

## Creating Error-Free Entity Files

### Quick Start Checklist

Before creating entity JSON files, follow this checklist:

1. **Find the Entity Definition** -- Open `packages/MJCoreEntities/src/generated/entity_subclasses.ts` and search for the entity class
2. **Check Required Fields** -- Look for fields without `?` in TypeScript definitions
3. **Validate Field Names** -- Use exact field names from the BaseEntity class (case-sensitive)
4. **Use Correct File Naming** -- Configuration files must start with dot (`.mj-sync.json`)
5. **Set Up Directory Structure** -- Create `.mj-sync.json` with proper glob patterns

### Step-by-Step Entity File Creation

#### Step 1: Research the Entity
```bash
# Open in your IDE:
packages/MJCoreEntities/src/generated/entity_subclasses.ts

# Search for your entity class (Ctrl+F):
class TemplateEntity
```

#### Step 2: Create Directory Structure
```bash
mkdir templates
cd templates

# Create entity config (dot-prefixed configuration file)
echo '{
  "entity": "Templates",
  "filePattern": "*.json"
}' > .mj-sync.json
```

#### Step 3: Create Your First Entity File
```json
{
  "fields": {
    "Name": "My First Template",
    "Description": "A test template",
    "UserID": "ECAFCCEC-6A37-EF11-86D4-000D3A4E707E"
  }
}
```

#### Step 4: Test and Validate
```bash
# Dry run to check for errors
mj sync push --dir="templates" --dry-run

# If successful, do actual push
mj sync push --dir="templates"
```

### Common Required Fields Pattern

**Always Required:**
- `ID` -- Primary key (GUID, auto-generated if not provided)
- `Name` -- Human-readable name
- `UserID` -- Creator/owner (use System User: `ECAFCCEC-6A37-EF11-86D4-000D3A4E707E`)

**Be Careful With:**
- `Status` fields -- Some entities have them, others do not
- Enum fields -- Must match exact values from database
- DateTime fields -- Use ISO format: `2024-01-15T10:30:00Z`

### Troubleshooting Quick Reference

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `No entity directories found` | Missing .mj-sync.json or wrong filePattern | Check .mj-sync.json exists and uses `"*.json"` |
| `Field 'X' does not exist on entity 'Y'` | Using non-existent field | Check BaseEntity class in entity_subclasses.ts |
| `User ID cannot be null` | Missing required UserID | Add `"UserID": "ECAFCCEC-6A37-EF11-86D4-000D3A4E707E"` |
| `Processing 0 records` | Files do not match filePattern | Check files match pattern in .mj-sync.json |
| Failed validation | Wrong data type or format | Check BaseEntity class for field types |

## Console Output

### Normal Mode
Shows high-level progress:
```
Processing AI Prompts in demo/ai-prompts
  -> Processing 2 related MJ: AI Prompt Models records
Created: 1
Updated: 2
```

### Verbose Mode (-v flag)
Shows detailed field-level operations with hierarchical indentation:
```
Processing AI Prompts in demo/ai-prompts
  Setting Name: "Example Greeting Prompt 3" -> "Example Greeting Prompt 3"
  Setting Description: "A simple example prompt..." -> "A simple example prompt..."
  -> Processing 2 related MJ: AI Prompt Models records
    Setting PromptID: "@parent:ID" -> "C2A1433E-F36B-1410-8DB0-00021F8B792E"
    Setting ModelID: "@lookup:AI Models.Name=GPT 4.1" -> "123-456-789"
    Setting Priority: 1 -> 1
    Created MJ: AI Prompt Models record
```

## Use Cases

### Developer Workflow
1. Install the MJ CLI: `npm install -g @memberjunction/cli`
2. `mj sync pull --entity="AI Prompts"` to get latest prompts with their models
3. Edit prompts and adjust model configurations in VS Code
4. Test locally with `mj sync push --dry-run`
5. Commit changes to Git
6. PR review with diff visualization
7. CI/CD runs `mj sync push --ci` on merge

### Content Team Workflow
1. Pull prompts to local directory
2. Edit in preferred markdown editor
3. Adjust model priorities in JSON
4. Preview changes
5. Push updates back to database

### CI/CD Integration
```yaml
- name: Push Metadata to Production
  run: |
    npm install -g @memberjunction/cli
    mj sync push --ci --entity="AI Prompts"
```

## Dependencies

This package depends on:
- [@memberjunction/core](../MJCore/readme.md) -- Core MemberJunction framework (Metadata, RunView, BaseEntity)
- [@memberjunction/core-entities](../MJCoreEntities/readme.md) -- Generated entity subclasses
- [@memberjunction/global](../MJGlobal/README.md) -- Global utilities and class factory
- [@memberjunction/config](../Config/README.md) -- Configuration management
- [@memberjunction/sqlserver-dataprovider](../SQLServerDataProvider/README.md) -- SQL Server data access
- [@memberjunction/graphql-dataprovider](../GraphQLDataProvider/README.md) -- GraphQL data access
- [@memberjunction/server-bootstrap-lite](../ServerBootstrapLite/README.md) -- Server-side class registration

Key third-party dependencies:
- `chokidar` -- File watching for watch mode
- `fast-glob` -- Fast file pattern matching
- `cosmiconfig` -- Configuration file discovery
- `zod` -- Runtime validation
- `chalk` -- Terminal output formatting

## Migration from Standalone MetadataSync

If you were previously using the standalone `mj-sync` command:

1. **Update your installation**: Install the MJ CLI instead of standalone MetadataSync
   ```bash
   npm install -g @memberjunction/cli
   ```

2. **Update your scripts**: Replace `mj-sync` with `mj sync` in all scripts
   ```bash
   # Old command (standalone package)
   mj-sync push --dir="metadata"

   # New command
   mj sync push --dir="metadata"
   ```

3. **Configuration unchanged**: All `.mj-sync.json` configuration files work exactly the same

## Contributing

See the [MemberJunction Contributing Guide](../../CONTRIBUTING.md) for development setup and guidelines.

---

## Notes — Push Performance Optimizations (v5.38.x)

The `push` path was substantially reworked to eliminate per-record DB
round-trips on metadata trees with thousands of records. End-to-end
measurements on a representative ~36,500-record `metadata/` tree (mostly
idempotent, including a `metadata/integrations/` dir with 23,789 records):

| Scenario | Pre-optimization | Post-optimization | Speedup |
|---|---:|---:|---:|
| Full sync (incl. integrations) | ~6m 49s | **~1m 4s** | **~6.5×** |
| Partial sync (excluding integrations) | ~1m 37s | **~30.5s** | **~3.2×** |

The changes sit entirely inside the `push` flow; behaviour, configuration,
and on-disk formats are unchanged.

### What changed

1. **Upfront preloading via `SyncMetadataEngine`** (extends `BaseEngine`).
   Before processing any file the engine scans every JSON file under each
   configured entity directory once, collects the set of entity names
   referenced (including nested `relatedEntities`), and issues a single
   unfiltered `RunView` per entity through `BaseEngine.Load`. The
   resulting `BaseEntity` instances live on dynamic per-entity property
   slots that the sync path consults instead of round-tripping the DB
   per record.

   Why unfiltered: metadata entities (Actions, Prompts, Agents,
   Templates, …) are bounded by design. Loading all rows once is faster
   than computing a giant `WHERE … IN (…)` clause, and it lets
   `@lookup:` resolution hit the cache even for records that are not
   directly mentioned in local files. An oversize warning fires if any
   single entity comes back with more than 100,000 rows so operators
   notice when something non-metadata-scale slips into a sync workflow.

2. **O(1) PK index on the preload cache.** Each preloaded slot is also
   indexed into a per-entity `Map<serializedPK, BaseEntity>`. The
   sync's `loadEntity` path uses this for O(1) hash lookups instead of
   the previous `Array.find(... serializePrimaryKey(GetAll()))` scan.

   This was the single biggest fix. The naïve `Array.find` was O(N×K)
   where N = cached rows and K = records being processed. For
   `MJ: Integration Object Fields` (~100K+ DB rows, ~23,789 records
   processed) that was ~1.2B comparisons → ~38 minutes. After the Map
   index the integrations dir alone dropped from 38 minutes to seconds.
   The index has a self-healing array-scan fallback for entries
   introduced by `BaseEngine`'s event-driven slot mutations.

3. **Resolved-lookup cache + cached file reads.** Resolved `@lookup:`
   keys are memoized in a per-engine `Map<lookupKey, ID>`, indexed by
   entity so the cache invalidates only the affected entries on delete
   instead of clearing wholesale. The parsed + `@include`-preprocessed
   contents of every file are also cached, so validation and sync
   passes don't re-read or re-parse the same file twice. File-cache
   entries are invalidated at every write site (immediate and deferred
   writes) so later passes within the same push see fresh contents.

4. **Skip preload for unresolved PK references.** Records whose
   `primaryKey` still carries an unresolved `@lookup:` / `@parent:` /
   `@root:` / `@file:` / `@env:` / `@template:` reference are skipped
   at the preload step — `SyncEngine`'s per-record path resolves them
   later. Without this guard the preload would inline literal
   `@lookup:…` strings into a `WHERE ID = '…'` filter and SQL Server
   would (correctly) reject the cast to uniqueidentifier.

5. **Provider plumbing.** The push flow no longer reaches for
   `Metadata.Provider` directly during cache and lookup writes —
   `SyncEngine.getProvider()` is the single entry point. Single-process
   today; ready for multi-provider scenarios without further surgery.

### Related core fix shipped at the same time

Fixed-width / space-padded character types (`nchar`/`char` on SQL Server;
`char`/`character`/`bpchar` on PostgreSQL) used to surface their storage
padding through `BaseEntity.Get`, causing dirty-check to compare e.g.
`"Input     "` against `"Input"` and false-positive every record as
dirty. Once preload populated the in-memory comparison this manifested
as thousands of spurious "updates" per sync (~4,279 on `MJ: Action
Params` alone).

The fix is in `@memberjunction/core`:

- New `EntityFieldInfo.FixedWidthColumn` getter, delegating to a new
  `IsFixedWidthStringSQLType` predicate in `@memberjunction/sql-dialect`
  so the list of fixed-width type names stays in one place per dialect.
- `EntityField.Value` setter and `BaseEntity.Get` raw fast-path now
  rtrim string values when `FixedWidthColumn` is true, memoizing back
  into `_raw` so the trim runs once per field per record.

This is independent of MetadataSync but was exposed by the preload work
and is required for the "Unchanged" counts to be accurate.

### Observability

`SyncMetadataEngine.drainWarnings()` exposes non-fatal warnings
collected during preload (malformed `relatedEntities` shapes, oversized
entities). `PushService` surfaces them through `callbacks.onWarn` so
they appear in the CLI's standard warning channel alongside everything
else.