# @memberjunction/ai-prompts
Advanced AI prompt execution engine for MemberJunction. Provides hierarchical template composition, intelligent model selection with failover, parallel execution with judge-based result selection, structured output validation with retry, comprehensive execution tracking, and streaming support. This is the primary interface for executing AI prompts in the MemberJunction framework.
## Guides
- **[Assistant Prefill & Stop Sequences](PREFILL_AND_STOP_SEQUENCES.md)** — How to use `assistantPrefill` and `stopSequences` to control output format, reduce token usage, and eliminate verbose format instructions from prompts.
## Architecture
```mermaid
graph TD
subgraph "@memberjunction/ai-prompts"
PR["AIPromptRunner"]
style PR fill:#2d8659,stroke:#1a5c3a,color:#fff
EP["ExecutionPlanner"]
style EP fill:#7c5295,stroke:#563a6b,color:#fff
PEC["ParallelExecutionCoordinator"]
style PEC fill:#7c5295,stroke:#563a6b,color:#fff
PE["ParallelExecution"]
style PE fill:#7c5295,stroke:#563a6b,color:#fff
end
subgraph "Execution Pipeline"
T["1. Template Rendering<br/>Handlebars + System Placeholders"]
style T fill:#b8762f,stroke:#8a5722,color:#fff
MS["2. Model Selection<br/>Default / Specific / ByPower"]
style MS fill:#b8762f,stroke:#8a5722,color:#fff
EX["3. LLM Execution<br/>With Streaming & Caching"]
style EX fill:#b8762f,stroke:#8a5722,color:#fff
VAL["4. Output Validation<br/>JSON Schema + Retry"]
style VAL fill:#b8762f,stroke:#8a5722,color:#fff
TRK["5. Execution Tracking<br/>AIPromptRun Records"]
style TRK fill:#b8762f,stroke:#8a5722,color:#fff
end
PR --> EP
PR --> PEC
PEC --> PE
PR --> T
T --> MS
MS --> EX
EX --> VAL
VAL --> TRK
subgraph Dependencies
AI["@memberjunction/ai<br/>BaseLLM"]
style AI fill:#2d6a9f,stroke:#1a4971,color:#fff
ACP["@memberjunction/ai-core-plus<br/>AIPromptParams"]
style ACP fill:#2d6a9f,stroke:#1a4971,color:#fff
AIE["@memberjunction/aiengine<br/>AIEngine"]
style AIE fill:#2d6a9f,stroke:#1a4971,color:#fff
TMPL["@memberjunction/templates<br/>TemplateEngine"]
style TMPL fill:#2d6a9f,stroke:#1a4971,color:#fff
CRED["@memberjunction/credentials<br/>CredentialEngine"]
style CRED fill:#2d6a9f,stroke:#1a4971,color:#fff
end
AI --> PR
ACP --> PR
AIE --> PR
TMPL --> PR
CRED --> PR
```
## Installation
```bash
npm install @memberjunction/ai-prompts
```
## Key Features
### Hierarchical Template Composition
Build complex prompts from reusable sub-templates with unlimited nesting depth:
```typescript
import { AIPromptRunner } from '@memberjunction/ai-prompts';
import { AIPromptParams, ChildPromptParam } from '@memberjunction/ai-core-plus';

const runner = new AIPromptRunner();

// parentPrompt, analysisParams, and summaryParams are assumed to be built elsewhere.
// Parent template uses {{ analysis }} and {{ summary }} placeholders
const params = new AIPromptParams();
params.prompt = parentPrompt;
params.childPrompts = [
  new ChildPromptParam(analysisParams, 'analysis'),
  new ChildPromptParam(summaryParams, 'summary')
];
params.data = { userInput: 'complex data to process' };

const result = await runner.ExecutePrompt(params);
```
Execution order:
1. Child prompts render depth-first (children before parents)
2. Sibling prompts at each level execute in parallel
3. Child results replace placeholders in parent template
4. Final composed prompt executes as a single LLM call
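
The execution order above can be illustrated with a small standalone sketch. It does not use the package's real classes: `PromptNode`, `Execute`, `fillPlaceholders`, and `composePrompt` are hypothetical names, and the placeholder substitution is a simplified stand-in for the real template engine. It only shows the shape of the algorithm: children resolve depth-first, siblings run concurrently, and the parent template is filled with their results.

```typescript
// Hypothetical shapes for illustration only; not the package's real types.
interface PromptNode {
  template: string;                       // uses {{ name }} placeholders
  children?: Record<string, PromptNode>;  // placeholder name -> child prompt
}

// Stand-in for a real LLM call.
type Execute = (renderedPrompt: string) => Promise<string>;

// Replace each {{ key }} placeholder with its value; unknown keys are left untouched.
function fillPlaceholders(template: string, values: Map<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    values.get(key) ?? match
  );
}

// Resolve children first (depth-first), running siblings concurrently,
// then substitute their results into this node's template.
async function composePrompt(node: PromptNode, execute: Execute): Promise<string> {
  const names = Object.keys(node.children ?? {});
  const results = await Promise.all(
    names.map(async (name): Promise<[string, string]> => {
      const child = (node.children as Record<string, PromptNode>)[name];
      const rendered = await composePrompt(child, execute);
      return [name, await execute(rendered)];
    })
  );
  return fillPlaceholders(node.template, new Map(results));
}
```

The caller would then execute the returned string once, which corresponds to the single final LLM call in step 4.
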
### Model Selection Strategies
Three strategies for selecting which AI model executes a prompt:
| Strategy | Description |
|---|---|
| `Default` | Uses the AI configuration to determine the model based on priority and availability |
| `Specific` | Uses explicitly associated models from the AIPromptModels table |
| `ByPower` | Selects the highest PowerRank model matching the prompt's model type |
Model selection precedence (highest to lowest):
1. `AIPromptParams.override` -- Runtime model/vendor override
2. `AIPromptParams.modelSelectionPrompt` -- Alternate prompt for model config
3. Prompt's own model configuration (strategy + associations)
**Credential-evaluation short-circuit (performance):** Candidates are ordered by priority, so once the runner finds the highest-priority candidate with working credentials, it **stops probing the remaining candidates** and records them as `"not-evaluated"` in the `ModelSelection` telemetry. This avoids running a credential/env-var check for every configured model on every prompt run. The full ordered candidate list is still used for failover, so the short-circuit only trims per-candidate availability *telemetry* for the tail. To force a complete availability report for every candidate (e.g. for an admin diagnostic), set `AIPromptParams.forceFullModelEvaluation = true`.
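
A minimal sketch of the short-circuit idea, under stated assumptions: `Candidate`, `SelectionReport`, and `selectModel` are hypothetical names, and the telemetry shape is invented for illustration. Only the `"not-evaluated"` label and the force-full-evaluation switch come from the description above.

```typescript
// Hypothetical types; the real ModelSelection telemetry shape may differ.
type Availability = "available" | "unavailable" | "not-evaluated";

interface Candidate {
  modelName: string;
  priority: number; // higher = preferred (assumption)
}

interface SelectionReport {
  selected: Candidate | undefined;
  telemetry: { modelName: string; availability: Availability }[];
}

function selectModel(
  candidates: Candidate[],
  hasCredentials: (c: Candidate) => boolean,
  forceFullEvaluation = false
): SelectionReport {
  const ordered = [...candidates].sort((a, b) => b.priority - a.priority);
  const telemetry: SelectionReport["telemetry"] = [];
  let selected: Candidate | undefined;

  for (const c of ordered) {
    // Once a winner exists, skip the credential probe unless a full report is forced.
    if (selected && !forceFullEvaluation) {
      telemetry.push({ modelName: c.modelName, availability: "not-evaluated" });
      continue;
    }
    const ok = hasCredentials(c);
    if (ok && !selected) selected = c;
    telemetry.push({ modelName: c.modelName, availability: ok ? "available" : "unavailable" });
  }
  return { selected, telemetry };
}
```

Note that `ordered` is untouched by the short-circuit, which mirrors the point that the full candidate list remains available for failover.
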
### Parallel Execution with Judging
Execute prompts across multiple models simultaneously and select the best result:
- Configurable execution groups with different models
- AI judge prompt evaluates and ranks results
- Automatic selection of best result based on judge scoring
- Full tracking of all parallel results
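
The flow can be sketched roughly as follows. This is not the package's `ParallelExecutionCoordinator`; `runWithJudge`, `RunModel`, and `Judge` are hypothetical names, and the judge is modeled as a function returning one numeric score per result, which is an assumption about how scoring could work.

```typescript
interface ModelResult { model: string; output: string; }
interface ScoredResult extends ModelResult { score: number; }

// Hypothetical stand-ins for an LLM call and a judge prompt.
type RunModel = (model: string, prompt: string) => Promise<string>;
type Judge = (results: ModelResult[]) => Promise<number[]>; // one score per result

async function runWithJudge(
  prompt: string,
  models: string[],
  runModel: RunModel,
  judge: Judge
): Promise<{ best: ScoredResult; all: ScoredResult[] }> {
  // Run every model concurrently; a failed model does not abort the others.
  const attempts = await Promise.all(
    models.map(async (model): Promise<ModelResult | null> => {
      try {
        return { model, output: await runModel(model, prompt) };
      } catch {
        return null;
      }
    })
  );
  const results = attempts.filter((r): r is ModelResult => r !== null);
  if (results.length === 0) throw new Error("All parallel executions failed");

  // Ask the judge to score the surviving results, then keep the top scorer.
  const scores = await judge(results);
  const all = results.map((r, i): ScoredResult => ({ ...r, score: scores[i] ?? 0 }));
  const best = all.reduce((a, b) => (b.score > a.score ? b : a));
  return { best, all };
}
```

Returning `all` alongside `best` keeps every parallel result available for tracking, in line with the last bullet above.
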
### Output Validation and Retry
Automatic validation of AI outputs with configurable retry:
- JSON schema validation against `OutputExample` definitions
- Automatic JSON repair via JSON5 parsing and LLM-based repair
- Configurable retry count with the original or repaired prompts
- Validation syntax cleaning (removes `?`, `*`, `:type` markers from JSON keys)
- Detailed validation attempt tracking
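
To make the validation syntax cleaning concrete, here is a small sketch. It assumes the markers appear as suffixes on keys (for example `name*`, `age?`, or `tags:string`); the exact marker grammar is not specified here, and `cleanKey` and `cleanValidationSyntax` are hypothetical helpers rather than the package's functions.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Strip an assumed ":type" suffix and any trailing "?" or "*" markers from a key.
function cleanKey(key: string): string {
  return key.replace(/:.*$/, "").replace(/[?*]+$/, "");
}

// Recursively rebuild the value with cleaned keys; primitives pass through unchanged.
function cleanValidationSyntax(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => cleanValidationSyntax(item));
  }
  if (value !== null && typeof value === "object") {
    const cleaned: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      cleaned[cleanKey(key)] = cleanValidationSyntax(value[key]);
    }
    return cleaned;
  }
  return value;
}
```
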

**Output type coercion** (`AIPrompt.OutputType`): the raw model text is coerced before validation:
- `string`: returned verbatim
- `number`: parsed with `parseFloat`; errors on `NaN`
- `boolean`: `true`/`yes`/`1` map to `true` and `false`/`no`/`0` map to `false` (case-insensitive, trimmed)
- `date`: parsed with `new Date`; errors on an invalid date
- `object`: parsed as JSON. The text is first run through `CleanJSON` (strips markdown fences etc.); if that fails and `attemptJSONRepair` is set, it retries via JSON5 and then LLM-based repair.

When an `OutputExample` is defined, validation-syntax markers are stripped from result keys automatically. With `skipValidation`, coercion failures return the raw output instead of throwing.
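
The coercion rules can be approximated in a few lines. This is a sketch under assumptions, not the runner's code: `coerceOutput` and `coerceOrRaw` are hypothetical names, the fence stripping is a simplified stand-in for `CleanJSON`, and the JSON5 and LLM-based repair steps are omitted.

```typescript
type OutputType = "string" | "number" | "boolean" | "date" | "object";

function coerceOutput(raw: string, outputType: OutputType): unknown {
  const text = raw.trim();
  switch (outputType) {
    case "string":
      return raw; // verbatim
    case "number": {
      const n = parseFloat(text);
      if (Number.isNaN(n)) throw new Error(`Not a number: ${raw}`);
      return n;
    }
    case "boolean": {
      const v = text.toLowerCase();
      if (v === "true" || v === "yes" || v === "1") return true;
      if (v === "false" || v === "no" || v === "0") return false;
      throw new Error(`Not a boolean: ${raw}`);
    }
    case "date": {
      const d = new Date(text);
      if (Number.isNaN(d.getTime())) throw new Error(`Not a date: ${raw}`);
      return d;
    }
    case "object": {
      // Simplified stand-in for CleanJSON: drop a surrounding markdown code fence.
      const fence = "`".repeat(3);
      const firstNewline = text.indexOf("\n");
      let body = text;
      if (text.startsWith(fence) && text.endsWith(fence) && firstNewline !== -1) {
        body = text.slice(firstNewline + 1, text.length - fence.length).trim();
      }
      return JSON.parse(body);
    }
  }
}

// With skipValidation, a coercion failure yields the raw text instead of an error.
function coerceOrRaw(raw: string, outputType: OutputType, skipValidation = false): unknown {
  try {
    return coerceOutput(raw, outputType);
  } catch (err) {
    if (skipValidation) return raw;
    throw err;
  }
}
```
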
**Validation behavior** (`AIPrompt.ValidationBehavior`): `Strict` retries up to `MaxRetries` on
validation failure; `Warn` logs and returns the (invalid) output; `None` accepts as-is. The parsed
`OutputExample` is cached by content so it isn't re-parsed on every run/retry.
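
The three behaviors can be pictured as a small retry loop. This is a hypothetical sketch (`runWithValidation` is not a package function), and it assumes that `Strict` throws once retries are exhausted, which is not stated above.

```typescript
type ValidationBehavior = "Strict" | "Warn" | "None";

interface ValidatedRun<T> {
  output: T;
  errors: string[]; // empty when the output passed or was not checked
  attempts: number;
}

async function runWithValidation<T>(
  execute: () => Promise<T>,
  validate: (output: T) => string[], // returns validation error messages
  behavior: ValidationBehavior,
  maxRetries: number
): Promise<ValidatedRun<T>> {
  for (let attempts = 1; ; attempts++) {
    const output = await execute();

    // None: accept the output without validating it.
    if (behavior === "None") return { output, errors: [], attempts };

    const errors = validate(output);
    if (errors.length === 0) return { output, errors, attempts };

    // Warn: log the problems and return the invalid output anyway.
    if (behavior === "Warn") {
      console.warn(`Validation failed: ${errors.join("; ")}`);
      return { output, errors, attempts };
    }

    // Strict: retry until maxRetries is used up (throwing afterwards is an assumption).
    if (attempts > maxRetries) {
      throw new Error(`Validation failed after ${attempts} attempts: ${errors.join("; ")}`);
    }
  }
}
```
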
### Streaming Support
Real-time streaming of LLM responses:
```typescript
const params = new AIPromptParams();
params.prompt = myPrompt;
params.onStreaming = (chunk) => {
  process.stdout.write(chunk.content);
};

const result = await runner.ExecutePrompt(params);
```
### Execution Tracking
Every prompt execution creates an `AIPromptRun` record with:
- Model and vendor used
- Template rendering results
- Token usage (prompt + completion)
- Cost tracking
- Execution time
- Parent/child relationships for hierarchical prompts
- Agent run linkage via `agentRunId`
**Persistence is fire-and-forget.** The initial `Running` INSERT and the final `Completed`/`Failed` UPDATE are queued, not awaited, so the model call is never blocked on a DB round-trip. Saves for the same run are chained (the INSERT always completes before the UPDATE, so a slow INSERT can't clobber the finalized row), and the record's ID is available immediately because `NewRecord()` client-generates the UUID. Save failures are logged but never fail the prompt (the record is observability, not part of the success contract). Callers that need the rows durably written before continuing can `await runner.WaitForPendingPromptRunSaves()`.
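
The chaining behavior can be sketched with a per-run promise chain. `PendingSaveQueue` is a hypothetical class written for illustration, not the runner's internals; it assumes that a failed save is logged and does not prevent later saves for the same run.

```typescript
class PendingSaveQueue {
  // One promise chain per run ID, so saves for a run complete in enqueue order.
  private chains = new Map<string, Promise<void>>();

  // Queue a save without awaiting it; the caller continues immediately.
  enqueue(runId: string, save: () => Promise<void>): void {
    const previous = this.chains.get(runId) ?? Promise.resolve();
    const next = previous
      .then(() => save())
      .catch((err: unknown) => {
        // Failures are logged and swallowed so they never fail the prompt.
        console.error(`Save failed for run ${runId}:`, err);
      });
    this.chains.set(runId, next);
  }

  // Resolve once every queued save has settled.
  async waitForPending(): Promise<void> {
    await Promise.all(Array.from(this.chains.values()));
  }
}
```

Because each save is appended to the previous one for the same run, a slow INSERT always finishes before the UPDATE starts, which is the ordering guarantee described above.
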
### Credential Resolution
Hierarchical credential resolution for API keys:
1. `AIPromptParams.credentialId` (per-request override)
2. `AIPromptModel.CredentialID` (prompt-model specific)
3. `AIModelVendor.CredentialID` (model-vendor specific)
4. `AIVendor.CredentialID` (vendor default)
5. `AIPromptParams.apiKeys[]` (legacy runtime keys)
6. `AI_VENDOR_API_KEY__` environment variables (legacy)
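
The precedence list amounts to a first-match lookup. A minimal sketch, assuming each level has already been reduced to an optional string; `CredentialSources` and `resolveCredential` are hypothetical names and the field names are invented for illustration.

```typescript
// Hypothetical flattened view of the six sources, highest precedence first.
interface CredentialSources {
  requestCredentialId?: string | null;     // AIPromptParams.credentialId
  promptModelCredentialId?: string | null; // AIPromptModel.CredentialID
  modelVendorCredentialId?: string | null; // AIModelVendor.CredentialID
  vendorCredentialId?: string | null;      // AIVendor.CredentialID
  legacyApiKey?: string | null;            // AIPromptParams.apiKeys[]
  envApiKey?: string | null;               // legacy environment variable
}

function resolveCredential(
  s: CredentialSources
): { source: string; value: string } | null {
  const ordered: [string, string | null | undefined][] = [
    ["AIPromptParams.credentialId", s.requestCredentialId],
    ["AIPromptModel.CredentialID", s.promptModelCredentialId],
    ["AIModelVendor.CredentialID", s.modelVendorCredentialId],
    ["AIVendor.CredentialID", s.vendorCredentialId],
    ["AIPromptParams.apiKeys[]", s.legacyApiKey],
    ["environment variable", s.envApiKey]
  ];
  // The first source with a non-empty value wins.
  for (const [source, value] of ordered) {
    if (value) return { source, value };
  }
  return null;
}
```
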
### Failover
When a model call fails because of rate limiting, an authentication error, or a transient issue, the runner can automatically retry with alternate models from the selection candidates.
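
In outline, failover is a loop over the ordered candidates. The sketch below is hypothetical (`executeWithFailover` is not a package function) and assumes the caller supplies an `isRetryable` predicate deciding which errors should move on to the next model.

```typescript
async function executeWithFailover<T>(
  candidates: string[],
  run: (model: string) => Promise<T>,
  isRetryable: (err: unknown) => boolean
): Promise<{ model: string; result: T }> {
  const errors: string[] = [];
  for (const model of candidates) {
    try {
      return { model, result: await run(model) };
    } catch (err) {
      errors.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
      // Errors that are not retryable stop the loop immediately.
      if (!isRetryable(err)) throw err;
    }
  }
  throw new Error(`All candidates failed: ${errors.join("; ")}`);
}
```
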
## Usage
### Basic Prompt Execution
```typescript
import { AIPromptRunner } from '@memberjunction/ai-prompts';
import { AIPromptParams } from '@memberjunction/ai-core-plus';
import { AIEngine } from '@memberjunction/aiengine';

// contextUser and documentText are assumed to be supplied by the caller.
// Get prompt from metadata
await AIEngine.Instance.Config(false, contextUser);
const prompt = AIEngine.Instance.Prompts.find(p => p.Name === 'Summarize Content');
if (!prompt) {
  throw new Error('Prompt "Summarize Content" not found');
}

const runner = new AIPromptRunner();
const params = new AIPromptParams();
params.prompt = prompt;
params.data = { content: documentText, maxLength: 500 };
params.contextUser = contextUser;

const result = await runner.ExecutePrompt(params);
if (result.success) {
  console.log(result.result);           // Parsed/validated result
  console.log(result.promptTokens);     // Input tokens used
  console.log(result.completionTokens); // Output tokens generated
  console.log(result.executionTimeMS);  // Execution duration
}
```
### With Progress Tracking
```typescript
params.onProgress = (progress) => {
  console.log(`[${progress.step}] ${progress.percentage}% - ${progress.message}`);
};
```
### With Effort Level
```typescript
params.effortLevel = 85; // High effort for thorough analysis (1-100 scale)
```
### With Runtime Model Override
```typescript
params.override = {
  modelId: 'specific-model-id',
  vendorId: 'specific-vendor-id'
};
```
## Dependencies
- `@memberjunction/ai` -- Core AI abstractions (BaseLLM, ChatParams)
- `@memberjunction/ai-core-plus` -- AIPromptParams, AIPromptRunResult, extended entities
- `@memberjunction/ai-engine-base` -- AIEngineBase metadata cache
- `@memberjunction/aiengine` -- AIEngine server-side operations
- `@memberjunction/core` -- MJ framework core
- `@memberjunction/core-entities` -- Generated entity classes
- `@memberjunction/credentials` -- Credential resolution
- `@memberjunction/templates` -- Template rendering engine
- `@memberjunction/templates-base-types` -- Template base types
- `json5` -- Lenient JSON parsing for repair