# ⚙️ Provider Configuration Guide

NeuroLink supports multiple AI providers with flexible authentication methods. This guide covers complete setup for all supported providers.

## Supported Providers

- **OpenAI** - GPT-4o, GPT-4o-mini, GPT-4-turbo
- **Amazon Bedrock** - Claude 3.7 Sonnet, Claude 3.5 Sonnet, Claude 3 Haiku
- **Amazon SageMaker** - Custom models deployed on SageMaker endpoints
- **Google Vertex AI** - Gemini 3 Flash/Pro (preview), Gemini 2.5 Flash, Claude 4.0 Sonnet
- **Google AI Studio** - Gemini 1.5 Pro, Gemini 2.0 Flash, Gemini 1.5 Flash
- **Anthropic** - Claude 4.5 Opus/Sonnet/Haiku, Claude 4.0 Opus/Sonnet, Claude 3.7 Sonnet
- **Azure OpenAI** - GPT-4, GPT-3.5-Turbo
- **LiteLLM** - 100+ models from all providers via proxy server
- **Hugging Face** - 100,000+ open source models including DialoGPT, GPT-2, GPT-Neo
- **Ollama** - Local AI models including Llama 2, Code Llama, Mistral, Vicuna
- **Mistral AI** - Mistral Tiny, Small, Medium, and Large models
- **DeepSeek** - deepseek-chat (V3) and deepseek-reasoner (R1)
- **NVIDIA NIM** - Llama 3.3 70B and 400+ catalog models via NVIDIA hosted or self-hosted NIM
- **LM Studio** - Any model loaded in LM Studio desktop app (local, no API key required)
- **llama.cpp** - Any GGUF model served by llama-server (local, no API key required)

## 💰 Model Availability & Cost Considerations

**Important Notes:**

- **Model Availability**: Specific models may not be available in all regions or require special access
- **Cost Variations**: Pricing differs significantly between providers and models (e.g., Claude 3.5 Sonnet vs GPT-4o)
- **Rate Limits**: Each provider has different rate limits and quota restrictions
- **Local vs Cloud**: Ollama (local) has no per-request cost but requires hardware resources
- **Enterprise Tiers**: AWS Bedrock, Google Vertex AI, and Azure typically offer enterprise pricing

**Best Practices:**

- Use `new NeuroLink()` with automatic provider selection for cost-optimized routing
- Monitor usage
  through built-in analytics to track costs
- Consider local models (Ollama) for development and testing
- Check provider documentation for current pricing and availability

## 🏢 Enterprise Proxy Support

**All providers support corporate proxy environments automatically.** Simply set environment variables:

```bash
export HTTPS_PROXY=http://your-corporate-proxy:port
export HTTP_PROXY=http://your-corporate-proxy:port
```

**No code changes required** - NeuroLink automatically detects and uses proxy settings.

**For detailed proxy setup** → See [Enterprise & Proxy Setup Guide](../enterprise-proxy-setup.md)

## OpenAI Configuration {#openai}

### Basic Setup

```bash
export OPENAI_API_KEY="sk-your-openai-api-key"
```

### Optional Configuration

```bash
export OPENAI_MODEL="gpt-4o" # Default model to use
```

### Supported Models

- `gpt-4o` (default) - Latest multimodal model
- `gpt-4o-mini` - Cost-effective variant
- `gpt-4-turbo` - High-performance model

### Usage Example

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();
const result = await neurolink.generate({
  input: { text: "Explain machine learning" },
  provider: "openai",
  model: "gpt-4o",
  temperature: 0.7,
  maxTokens: 500,
  timeout: "30s", // Optional: Override default 30s timeout
});
```

### Timeout Configuration

- **Default Timeout**: 30 seconds
- **Supported Formats**: Milliseconds (`30000`), human-readable (`'30s'`, `'1m'`, `'5m'`)
- **Environment Variable**: `OPENAI_TIMEOUT='45s'` (optional)

## Amazon Bedrock Configuration {#bedrock}

### 🚨 Critical Setup Requirements

**⚠️ IMPORTANT: Anthropic Models Require Inference Profile ARN**

For Anthropic Claude models in Bedrock, you **MUST** use the full inference profile ARN, not simple model names:

```bash
# ✅ CORRECT: Use full inference profile ARN
export BEDROCK_MODEL="arn:aws:bedrock:us-east-2:<account_id>:inference-profile/us.anthropic.claude-3-7-sonnet-20250219-v1:0"

# ❌ WRONG: Simple model names cause "not authorized to invoke this API" errors
# export BEDROCK_MODEL="anthropic.claude-3-sonnet-20240229-v1:0"
```

### Basic AWS Credentials

```bash
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-2"
```

### Session Token Support (Development)

For temporary credentials (common in development environments):

```bash
export AWS_SESSION_TOKEN="your-session-token" # Required for temporary credentials
```

### Available Inference Profile ARNs

Replace `<account_id>` with your AWS account ID:

```bash
# Claude 3.7 Sonnet (Latest - Recommended)
BEDROCK_MODEL="arn:aws:bedrock:us-east-2:<account_id>:inference-profile/us.anthropic.claude-3-7-sonnet-20250219-v1:0"

# Claude 3.5 Sonnet
BEDROCK_MODEL="arn:aws:bedrock:us-east-2:<account_id>:inference-profile/us.anthropic.claude-3-5-sonnet-20241022-v2:0"

# Claude 3 Haiku
BEDROCK_MODEL="arn:aws:bedrock:us-east-2:<account_id>:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0"
```

### Why Inference Profiles?

- **Cross-Region Access**: Faster access across AWS regions
- **Better Performance**: Optimized routing and response times
- **Higher Availability**: Improved model availability and reliability
- **Different Permissions**: Separate permission model from base models

### Complete Bedrock Configuration

```bash
# Required AWS credentials
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-2"

# Optional: Session token for temporary credentials
export AWS_SESSION_TOKEN="your-session-token"

# Required: Inference profile ARN (not simple model name)
export BEDROCK_MODEL="arn:aws:bedrock:us-east-2:<account_id>:inference-profile/us.anthropic.claude-3-7-sonnet-20250219-v1:0"

# Alternative environment variable names (backward compatibility)
export BEDROCK_MODEL_ID="arn:aws:bedrock:us-east-2:<account_id>:inference-profile/us.anthropic.claude-3-7-sonnet-20250219-v1:0"
```

### Usage Example

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();
const result = await neurolink.generate({
  input: { text: "Write a haiku about AI" },
  provider: "bedrock",
  temperature: 0.8,
  maxTokens: 100,
  timeout: "45s", // Optional: Override default 45s timeout
});
```

### Timeout Configuration

- **Default Timeout**: 45 seconds (longer due to cold starts)
- **Supported Formats**: Milliseconds (`45000`), human-readable (`'45s'`, `'1m'`, `'2m'`)
- **Environment Variable**: `BEDROCK_TIMEOUT='1m'` (optional)

### Account Setup Requirements

To use AWS Bedrock, ensure your AWS account has:

1. **Bedrock Service Access**: Enable Bedrock in your AWS region
2. **Model Access**: Request access to Anthropic Claude models
3. **IAM Permissions**: Your credentials need `bedrock:InvokeModel` permissions
4. **Inference Profile Access**: Access to the specific inference profiles

### IAM Policy Example

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": ["arn:aws:bedrock:*:*:inference-profile/us.anthropic.*"]
    }
  ]
}
```

## Amazon SageMaker Configuration

**Amazon SageMaker** allows you to use your own custom models deployed on SageMaker endpoints.
This provider is perfect for:

- **Custom Model Hosting** - Deploy your fine-tuned models
- **Enterprise Compliance** - Full control over model infrastructure
- **Cost Optimization** - Pay only for inference usage
- **Performance** - Dedicated compute resources

### Basic AWS Credentials

```bash
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-1" # Your SageMaker region
```

### SageMaker-Specific Configuration

```bash
# Required: Your SageMaker endpoint name
export SAGEMAKER_DEFAULT_ENDPOINT="your-endpoint-name"

# Optional: Timeout and retry settings
export SAGEMAKER_TIMEOUT="30000" # 30 seconds (default)
export SAGEMAKER_MAX_RETRIES="3" # Retry attempts (default)
```

### Advanced Model Configuration

```bash
# Optional: Model-specific settings
export SAGEMAKER_MODEL="custom-model-name" # Model identifier
export SAGEMAKER_MODEL_TYPE="custom" # Model type
export SAGEMAKER_CONTENT_TYPE="application/json"
export SAGEMAKER_ACCEPT="application/json"
```

### Session Token Support (for IAM Roles)

```bash
export AWS_SESSION_TOKEN="your-session-token" # For temporary credentials
```

### Complete SageMaker Configuration

```bash
# AWS Credentials
export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
export AWS_REGION="us-east-1"

# SageMaker Settings
export SAGEMAKER_DEFAULT_ENDPOINT="my-model-endpoint-2024"
export SAGEMAKER_TIMEOUT="45000"
export SAGEMAKER_MAX_RETRIES="5"
```

### Usage Example

```bash
# Test SageMaker endpoint
npx @juspay/neurolink sagemaker test my-endpoint

# Generate text with SageMaker
npx @juspay/neurolink generate "Analyze this data" --provider sagemaker

# Interactive setup
npx @juspay/neurolink sagemaker setup
```

### CLI Commands

```bash
# Check SageMaker configuration
npx @juspay/neurolink sagemaker status

# Validate connection
npx @juspay/neurolink sagemaker validate

# Show current configuration
npx @juspay/neurolink sagemaker config

# Performance benchmark
npx @juspay/neurolink sagemaker benchmark my-endpoint

# List available endpoints (requires AWS CLI)
npx @juspay/neurolink sagemaker list-endpoints
```

### Timeout Configuration

Configure request timeouts for SageMaker endpoints:

```bash
export SAGEMAKER_TIMEOUT="60000" # 60 seconds for large models
```

### Prerequisites

1. **SageMaker Endpoint**: Deploy a model to SageMaker and get the endpoint name
2. **AWS IAM Permissions**: Ensure your credentials have `sagemaker:InvokeEndpoint` permission
3. **Endpoint Status**: Endpoint must be in "InService" status

### IAM Policy Example

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["sagemaker:InvokeEndpoint"],
      "Resource": "arn:aws:sagemaker:*:*:endpoint/*"
    }
  ]
}
```

### Environment Variables Reference

| Variable                     | Required | Default   | Description               |
| ---------------------------- | -------- | --------- | ------------------------- |
| `AWS_ACCESS_KEY_ID`          | ✅       | -         | AWS access key            |
| `AWS_SECRET_ACCESS_KEY`      | ✅       | -         | AWS secret key            |
| `AWS_REGION`                 | ✅       | us-east-1 | AWS region                |
| `SAGEMAKER_DEFAULT_ENDPOINT` | ✅       | -         | SageMaker endpoint name   |
| `SAGEMAKER_TIMEOUT`          | ❌       | 30000     | Request timeout (ms)      |
| `SAGEMAKER_MAX_RETRIES`      | ❌       | 3         | Retry attempts            |
| `AWS_SESSION_TOKEN`          | ❌       | -         | For temporary credentials |

### 📖 Complete SageMaker Guide

For comprehensive SageMaker setup, advanced features, and production deployment:

**[📖 Complete SageMaker Integration Guide](../sagemaker-integration.md)** - Includes:

- Model deployment examples
- Cost optimization strategies
- Enterprise security patterns
- Multi-model endpoint management
- Performance testing and monitoring
- Troubleshooting and debugging

## Google Vertex AI Configuration {#vertex}

NeuroLink supports **three authentication methods** for Google Vertex AI to accommodate different deployment environments:

### Method 1: Service Account File (Recommended for Production)

Best for production
environments where you can store service account files securely.

```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export GOOGLE_VERTEX_PROJECT="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"
```

**Setup Steps:**

1. Create a service account in Google Cloud Console
2. Download the service account JSON file
3. Set the file path in `GOOGLE_APPLICATION_CREDENTIALS`

### Method 2: Service Account JSON String (Good for Containers/Cloud)

Best for containerized environments where file storage is limited.

```bash
export GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account","project_id":"your-project",...}'
export GOOGLE_VERTEX_PROJECT="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"
```

**Setup Steps:**

1. Copy the entire contents of your service account JSON file
2. Set it as a single-line string in `GOOGLE_SERVICE_ACCOUNT_KEY`
3. NeuroLink will automatically create a temporary file for authentication

### Method 3: Individual Environment Variables (Good for CI/CD)

Best for CI/CD pipelines where individual secrets are managed separately.

```bash
export GOOGLE_AUTH_CLIENT_EMAIL="service-account@project.iam.gserviceaccount.com"
export GOOGLE_AUTH_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\nMIIE..."
export GOOGLE_VERTEX_PROJECT="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"
```

**Setup Steps:**

1. Extract `client_email` and `private_key` from your service account JSON
2. Set them as individual environment variables
3. NeuroLink will automatically assemble them into a temporary service account file

### Authentication Detection

NeuroLink automatically detects and uses the best available authentication method in this order:

1. **File Path** (`GOOGLE_APPLICATION_CREDENTIALS`) - if file exists
2. **JSON String** (`GOOGLE_SERVICE_ACCOUNT_KEY`) - if provided
3. **Individual Variables** (`GOOGLE_AUTH_CLIENT_EMAIL` + `GOOGLE_AUTH_PRIVATE_KEY`) - if both provided

### Complete Vertex AI Configuration

```bash
# Required for all methods
export GOOGLE_VERTEX_PROJECT="your-gcp-project-id"

# Optional
export GOOGLE_VERTEX_LOCATION="us-east5" # Default: us-east5
export VERTEX_MODEL_ID="claude-sonnet-4@20250514" # Default model

# Choose ONE authentication method:

# Method 1: Service Account File
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

# Method 2: Service Account JSON String
export GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account","project_id":"your-project","private_key_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...","client_email":"...","client_id":"...","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_x509_cert_url":"..."}'

# Method 3: Individual Environment Variables
export GOOGLE_AUTH_CLIENT_EMAIL="service-account@your-project.iam.gserviceaccount.com"
export GOOGLE_AUTH_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\nMIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC...\n-----END PRIVATE KEY-----"
```

### Usage Example

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();
const result = await neurolink.generate({
  input: { text: "Explain quantum computing" },
  provider: "vertex",
  model: "gemini-2.5-flash",
  temperature: 0.6,
  maxTokens: 800,
  timeout: "1m", // Optional: Override default 60s timeout
});
```

### Timeout Configuration

- **Default Timeout**: 60 seconds (longer due to GCP initialization)
- **Supported Formats**: Milliseconds (`60000`), human-readable (`'60s'`, `'1m'`, `'2m'`)
- **Environment Variable**: `VERTEX_TIMEOUT='90s'` (optional)

### Supported Models

**Gemini 3 (Preview):**

- `gemini-3-flash-preview` - Latest Gemini 3 Flash with extended thinking support
- `gemini-3-pro-preview` - Latest Gemini 3
  Pro with extended thinking support

**Gemini 2.x:**

- `gemini-2.5-flash` (default) - Fast, efficient model

**Anthropic Models:**

- `claude-sonnet-4@20250514` - High-quality reasoning (Anthropic via Vertex AI)

**Video Generation:**

- `veo-3.1` / `veo-3.1-generate-001` - Video generation from image + text prompt (8-second videos with audio)

> **Video Generation:** Use `output.mode: "video"` with Veo 3.1 to generate videos. See [Video Generation Guide](../features/video-generation.md).

> **PPT Generation:** Use `output.mode: "ppt"` with supported providers (Vertex AI, Google AI, OpenAI, Anthropic, Azure OpenAI, or Bedrock) and compatible text models to generate PowerPoint presentations. See [PPT Generation Guide](../features/ppt-generation.md).

### Gemini 3 Extended Thinking Configuration

Gemini 3 models support **extended thinking** (also known as "thinking mode"), which allows the model to reason more deeply before providing responses. This is particularly useful for complex reasoning tasks, math problems, and multi-step analysis.

#### Environment Variables for Gemini 3

```bash
# Required: Google Vertex AI credentials (same as above)
export GOOGLE_VERTEX_PROJECT="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"

# Gemini 3 model selection
export VERTEX_MODEL_ID="gemini-3-flash-preview" # or gemini-3-pro-preview
```

#### Extended Thinking Configuration

Configure thinking level to control how much reasoning the model performs:

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Enable extended thinking with thinkingLevel configuration
const result = await neurolink.generate({
  input: { text: "Solve this complex math problem step by step: ..." },
  provider: "vertex",
  model: "gemini-3-flash-preview",
  temperature: 0.7,
  maxTokens: 4000,
  // Gemini 3 extended thinking configuration
  thinkingLevel: "medium", // Options: "minimal", "low", "medium", "high"
});
```

#### Thinking Levels

| Level     | Description                             | Best For                          |
| --------- | --------------------------------------- | --------------------------------- |
| `minimal` | No extended thinking, fastest responses | Simple queries, quick answers     |
| `low`     | Brief reasoning before responding       | Moderate complexity tasks         |
| `medium`  | Balanced reasoning depth (recommended)  | Most use cases                    |
| `high`    | Deep reasoning, thorough analysis       | Complex math, multi-step problems |

#### Usage Example with Extended Thinking

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Complex reasoning task with high thinking level
const result = await neurolink.generate({
  input: {
    text: "Analyze the following business scenario and provide strategic recommendations...",
  },
  provider: "vertex",
  model: "gemini-3-pro-preview",
  thinkingLevel: "high",
  maxTokens: 8000,
  timeout: "2m", // Extended timeout for deep thinking
});

console.log(result.content);
```

#### CLI Usage with Gemini 3

```bash
# Generate with Gemini 3 Flash
npx @juspay/neurolink generate "Explain quantum computing" --provider vertex --model gemini-3-flash-preview

# Stream with Gemini 3 Pro
npx @juspay/neurolink stream "Write a detailed analysis" --provider vertex --model gemini-3-pro-preview
```

### Claude Sonnet 4 via Vertex AI Configuration

NeuroLink provides first-class support for Claude Sonnet 4 through Google Vertex AI. This configuration has been thoroughly tested and verified working.
#### Working Configuration Example

```bash
# ✅ VERIFIED WORKING CONFIGURATION
export GOOGLE_VERTEX_PROJECT="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-east5"
export GOOGLE_AUTH_CLIENT_EMAIL="service-account@your-project.iam.gserviceaccount.com"
export GOOGLE_AUTH_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----
[Your private key content here]
-----END PRIVATE KEY-----"
```

#### Performance Metrics (Verified)

- **Generation Response**: ~2.6 seconds
- **Health Check**: Working status detection
- **Streaming**: Fully functional
- **Tool Integration**: Ready for MCP tools

#### Usage Examples

```bash
# Generation test
node dist/cli/index.js generate "test" --provider vertex --model claude-sonnet-4@20250514

# Streaming test
node dist/cli/index.js stream "Write a short poem" --provider vertex --model claude-sonnet-4@20250514

# Health check
node dist/cli/index.js status
# Expected: vertex: ✅ Working (2599ms)
```

### Google Cloud Setup Requirements

To use Google Vertex AI, ensure your Google Cloud project has:

1. **Vertex AI API Enabled**: Enable the Vertex AI API in your project
2. **Service Account**: Create a service account with Vertex AI permissions
3. **Model Access**: Ensure access to the models you want to use
4. **Billing Enabled**: Vertex AI requires an active billing account

### Service Account Permissions

Your service account needs these IAM roles:

- `Vertex AI User` or `Vertex AI Admin`
- `Service Account Token Creator` (if using impersonation)

## Google AI Studio Configuration {#google-ai}

Google AI Studio provides direct access to Google's Gemini models with simple API key authentication.
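Because authentication here is just an API key, a quick pre-flight check can catch a missing or malformed key before the first request. The sketch below is illustrative only and is not part of NeuroLink: `resolveGoogleAiKey` is a hypothetical helper that looks up the two variable names this guide uses (`GOOGLE_AI_API_KEY`, then `GOOGLE_GENERATIVE_AI_API_KEY`) and checks for the `AIza` prefix mentioned in the key-creation steps. The lookup order is an assumption.

```typescript
// Hypothetical pre-flight helper for illustration; not a NeuroLink API.
type Env = Record<string, string | undefined>;

interface KeyCheck {
  ok: boolean;
  source?: string; // which variable supplied the key
  reason?: string; // why the check failed
}

function resolveGoogleAiKey(env: Env): KeyCheck {
  // Assumed lookup order: GOOGLE_AI_API_KEY first, then the alternative name.
  const names = ["GOOGLE_AI_API_KEY", "GOOGLE_GENERATIVE_AI_API_KEY"];
  for (const name of names) {
    const value = env[name]?.trim();
    if (!value) continue; // unset or empty: try the next name
    if (!value.startsWith("AIza")) {
      return { ok: false, source: name, reason: `${name} does not start with "AIza"` };
    }
    return { ok: true, source: name };
  }
  return { ok: false, reason: "no Google AI Studio API key is set" };
}
```

In a Node.js script this could be called as `resolveGoogleAiKey(process.env)` before constructing `NeuroLink`, failing fast with the returned `reason`.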
### Basic Setup

```bash
export GOOGLE_AI_API_KEY="AIza-your-google-ai-api-key"
```

### Optional Configuration

```bash
export GOOGLE_AI_MODEL="gemini-2.5-pro" # Default model to use
```

### Supported Models

- `gemini-2.5-pro` - Comprehensive, detailed responses for complex tasks
- `gemini-2.5-flash` (recommended) - Fast, efficient responses for most tasks

### Usage Example

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();
const result = await neurolink.generate({
  input: { text: "Explain the future of AI" },
  provider: "google-ai",
  model: "gemini-2.5-flash",
  temperature: 0.7,
  maxTokens: 1000,
  timeout: "30s", // Optional: Override default 30s timeout
});
```

### Timeout Configuration

- **Default Timeout**: 30 seconds
- **Supported Formats**: Milliseconds (`30000`), human-readable (`'30s'`, `'1m'`, `'5m'`)
- **Environment Variable**: `GOOGLE_AI_TIMEOUT='45s'` (optional)

### How to Get Google AI Studio API Key

1. **Visit Google AI Studio**: Go to [aistudio.google.com](https://aistudio.google.com)
2. **Sign In**: Use your Google account credentials
3. **Create API Key**:
   - Navigate to the **API Keys** section
   - Click **Create API Key**
   - Copy the generated key (starts with `AIza`)
4. **Set Environment**: Add to your `.env` file or export directly

### Google AI Studio vs Vertex AI

| Feature                 | Google AI Studio            | Google Vertex AI             |
| ----------------------- | --------------------------- | ---------------------------- |
| **Setup Complexity**    | 🟢 Simple (API key only)    | 🟡 Complex (Service account) |
| **Authentication**      | API key                     | Service account JSON         |
| **Free Tier**           | ✅ Generous free limits     | ❌ Pay-per-use only          |
| **Enterprise Features** | ❌ Limited                  | ✅ Full enterprise support   |
| **Model Selection**     | 🎯 Latest Gemini models     | 🔄 Broader model catalog     |
| **Best For**            | Prototyping, small projects | Production, enterprise apps  |

### Complete Google AI Studio Configuration

```bash
# Required: API key from Google AI Studio (choose one)
export GOOGLE_AI_API_KEY="AIza-your-google-ai-api-key"
# OR
export GOOGLE_GENERATIVE_AI_API_KEY="AIza-your-google-ai-api-key"

# Optional: Default model selection
export GOOGLE_AI_MODEL="gemini-2.5-pro"
```

### Rate Limits and Quotas

Google AI Studio includes generous free tier limits:

- **Free Tier**: 15 requests per minute, 1,500 requests per day
- **Paid Usage**: Higher limits available with billing enabled
- **Model-Specific**: Different models may have different rate limits

### Error Handling for Google AI Studio

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

try {
  const result = await neurolink.generate({
    input: { text: "Generate a creative story" },
    provider: "google-ai",
    temperature: 0.8,
    maxTokens: 500,
  });
  console.log(result.content);
} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow it before reading the message
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes("API_KEY_INVALID")) {
    console.error(
      "Invalid Google AI API key. Check your GOOGLE_AI_API_KEY environment variable.",
    );
  } else if (message.includes("QUOTA_EXCEEDED")) {
    console.error("Rate limit exceeded. Wait before making more requests.");
  } else {
    console.error("Google AI Studio error:", message);
  }
}
```

### Security Considerations

- **API Key Security**: Treat API keys as sensitive credentials
- **Environment Variables**: Never commit API keys to version control
- **Rate Limiting**: Implement client-side rate limiting for production apps
- **Monitoring**: Monitor usage to avoid unexpected charges

## LiteLLM Configuration

LiteLLM provides access to 100+ models through a unified proxy server, allowing you to use any AI provider through a single interface.

### Prerequisites

1. Install LiteLLM:

   ```bash
   pip install litellm
   ```

2. Start LiteLLM proxy server:

   ```bash
   # Basic usage
   litellm --port 4000

   # With configuration file (recommended)
   litellm --config litellm_config.yaml --port 4000
   ```

### Basic Setup

```bash
export LITELLM_BASE_URL="http://localhost:4000"
export LITELLM_API_KEY="sk-anything" # Optional, any value works
```

### Optional Configuration

```bash
export LITELLM_MODEL="openai/gpt-4o-mini" # Default model to use
```

### Supported Model Formats

LiteLLM uses the `provider/model` format:

```bash
# OpenAI models
openai/gpt-4o
openai/gpt-4o-mini
openai/gpt-4

# Anthropic models
anthropic/claude-3-5-sonnet
anthropic/claude-3-haiku

# Google models
google/gemini-2.0-flash
vertex_ai/gemini-pro

# Mistral models
mistral/mistral-large
mistral/mixtral-8x7b

# And many more...
```

### LiteLLM Configuration File (Optional)

Create `litellm_config.yaml` for advanced configuration:

```yaml
model_list:
  - model_name: openai/gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: anthropic/claude-3-5-sonnet
    litellm_params:
      model: claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: google/gemini-2.0-flash
    litellm_params:
      model: gemini-2.0-flash
      api_key: os.environ/GOOGLE_AI_API_KEY
```

### Usage Example

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Use LiteLLM provider with specific model
const result = await neurolink.generate({
  input: { text: "Explain quantum computing" },
  provider: "litellm",
  model: "openai/gpt-4o",
  temperature: 0.7,
});

console.log(result.content);
```

### Advanced Features

- **Cost Tracking**: Built-in usage and cost monitoring
- **Load Balancing**: Automatic failover between providers
- **Rate Limiting**: Built-in rate limiting and retry logic
- **Caching**: Optional response caching for efficiency

### Production Considerations

- **Deployment**: Run LiteLLM proxy as a separate service
- **Security**: Configure authentication for production environments
- **Scaling**: Use Docker/Kubernetes for high-availability deployments
- **Monitoring**: Enable logging and metrics collection

## Hugging Face Configuration {#huggingface}

### Basic Setup

```bash
export HUGGINGFACE_API_KEY="hf_your_token_here"
```

### Optional Configuration

```bash
export HUGGINGFACE_MODEL="microsoft/DialoGPT-medium" # Default model
```

### Model Selection Strategy

Hugging Face hosts 100,000+ models.
Choose based on:

- **Task**: text-generation, conversational, code
- **Size**: Larger models = better quality but slower
- **License**: Check model licenses for commercial use

### Rate Limiting

- Free tier: Limited requests
- PRO tier: Higher limits
- Handle 503 errors (model loading) with retry logic

### Usage Example

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();
const result = await neurolink.generate({
  input: { text: "Explain machine learning" },
  provider: "huggingface",
  model: "gpt2",
  temperature: 0.8,
  maxTokens: 200,
  timeout: "45s", // Optional: Override default 30s timeout
});
```

### Timeout Configuration

- **Default Timeout**: 30 seconds
- **Supported Formats**: Milliseconds (`30000`), human-readable (`'30s'`, `'1m'`, `'5m'`)
- **Environment Variable**: `HUGGINGFACE_TIMEOUT='45s'` (optional)
- **Note**: Model loading may take additional time on first request

### Popular Models

- `microsoft/DialoGPT-medium` (default) - Conversational AI
- `gpt2` - Classic GPT-2
- `distilgpt2` - Lightweight GPT-2
- `EleutherAI/gpt-neo-2.7B` - Large open model
- `bigscience/bloom-560m` - Multilingual model

### Getting Started with Hugging Face

1. **Create Account**: Visit [huggingface.co](https://huggingface.co)
2. **Generate Token**: Go to Settings → Access Tokens
3. **Create Token**: Click "New token" with "read" scope
4. **Set Environment**: Export token as `HUGGINGFACE_API_KEY`

## Ollama Configuration {#ollama}

### Local Installation Required

Ollama must be installed and running locally.

### Installation Steps

1. **macOS**:

   ```bash
   brew install ollama
   # or
   curl -fsSL https://ollama.ai/install.sh | sh
   ```

2. **Linux**:

   ```bash
   curl -fsSL https://ollama.ai/install.sh | sh
   ```

3. **Windows**: Download from [ollama.ai](https://ollama.ai)

### Model Management

```bash
# List models
ollama list

# Pull new model
ollama pull llama2

# Remove model
ollama rm llama2
```

### Privacy Benefits

- **100% Local**: No data leaves your machine
- **No API Keys**: No authentication required
- **Offline Capable**: Works without internet

### Usage Example

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();
const result = await neurolink.generate({
  input: { text: "Write a poem about privacy" },
  provider: "ollama",
  model: "llama2",
  temperature: 0.7,
  maxTokens: 300,
  timeout: "10m", // Optional: Override default 5m timeout
});
```

### Timeout Configuration

- **Default Timeout**: 5 minutes (longer for local model processing)
- **Supported Formats**: Milliseconds (`300000`), human-readable (`'5m'`, `'10m'`, `'30m'`)
- **Environment Variable**: `OLLAMA_TIMEOUT='10m'` (optional)
- **Note**: Local models may need longer timeouts for complex prompts

### Popular Models

- `llama2` (default) - Meta's Llama 2
- `codellama` - Code-specialized Llama
- `mistral` - Mistral 7B
- `vicuna` - Fine-tuned Llama
- `phi` - Microsoft's small model

### Environment Variables

```bash
# Optional: Custom Ollama server URL
export OLLAMA_BASE_URL="http://localhost:11434"

# Optional: Default model
export OLLAMA_MODEL="llama2"
```

### Performance Optimization

```bash
# Set memory limit
OLLAMA_MAX_MEMORY=8GB ollama serve

# Use specific GPU
OLLAMA_CUDA_DEVICE=0 ollama serve
```

## OpenRouter Configuration {#openrouter}

OpenRouter provides access to 300+ AI models from 60+ providers through a single unified API with automatic failover and cost optimization.
### Basic Setup

```bash
export OPENROUTER_API_KEY="sk-or-v1-your-api-key"
```

### Optional Configuration

```bash
# Attribution for OpenRouter dashboard
export OPENROUTER_REFERER="https://yourapp.com"
export OPENROUTER_APP_NAME="Your App Name"

# Default model
export OPENROUTER_MODEL="anthropic/claude-3-5-sonnet"
```

### Supported Models

OpenRouter supports 300+ models including:

- `anthropic/claude-3-5-sonnet` (default) - Best overall quality
- `openai/gpt-4o` - Excellent code generation
- `google/gemini-2.0-flash` - Fast and cost-effective
- `meta-llama/llama-3.1-70b-instruct` - Best open source

### Usage Example

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();
const result = await neurolink.generate({
  input: { text: "Explain quantum computing" },
  provider: "openrouter",
  model: "anthropic/claude-3-5-sonnet",
  temperature: 0.7,
  maxTokens: 500,
});
```

### Complete Guide

For comprehensive OpenRouter setup including model selection, cost optimization, and best practices, see the [OpenRouter Provider Guide](./providers/openrouter.md).
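The model identifiers in this section all follow a `vendor/model` pattern. As a minimal sketch (the `splitModelId` helper is hypothetical and belongs to neither NeuroLink nor OpenRouter), an ID can be validated and split before it is passed as `model`, which catches a bare name such as `gpt-4o` early:

```typescript
// Hypothetical helper for illustration; not a NeuroLink or OpenRouter API.
interface ModelId {
  vendor: string;
  model: string;
}

function splitModelId(id: string): ModelId {
  // Split on the first slash only, so any later slashes stay in the model part.
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "vendor/model", got "${id}"`);
  }
  return { vendor: id.slice(0, slash), model: id.slice(slash + 1) };
}
```

Splitting on the first slash is a design assumption: it keeps the vendor prefix unambiguous even if a model name were to contain further slashes.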
## Mistral AI Configuration {#mistral}

### Basic Setup

```bash
export MISTRAL_API_KEY="your_mistral_api_key"
```

### European Compliance

- GDPR compliant
- Data processed in Europe
- No training on user data

### Model Selection

- **mistral-tiny**: Fast responses, basic tasks
- **mistral-small**: Balanced choice (default)
- **mistral-medium**: Complex reasoning
- **mistral-large**: Maximum capability

### Cost Optimization

Mistral offers competitive pricing:

- Tiny: $0.14 / 1M tokens
- Small: $0.6 / 1M tokens
- Medium: $2.5 / 1M tokens
- Large: $8 / 1M tokens

### Usage Example

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();
const result = await neurolink.generate({
  input: { text: "Translate to French: Hello world" },
  provider: "mistral",
  model: "mistral-small",
  temperature: 0.3,
  maxTokens: 100,
  timeout: "30s", // Optional: Override default 30s timeout
});
```

### Timeout Configuration

- **Default Timeout**: 30 seconds
- **Supported Formats**: Milliseconds (`30000`), human-readable (`'30s'`, `'1m'`, `'5m'`)
- **Environment Variable**: `MISTRAL_TIMEOUT='45s'` (optional)

### Getting Started with Mistral AI

1. **Create Account**: Visit [mistral.ai](https://mistral.ai)
2. **Get API Key**: Navigate to API Keys section
3. **Generate Key**: Create new API key
4. **Add Billing**: Set up payment method

### Environment Variables

```bash
# Required: API key
export MISTRAL_API_KEY="your_mistral_api_key"

# Optional: Default model
export MISTRAL_MODEL="mistral-small"

# Optional: Custom endpoint
export MISTRAL_ENDPOINT="https://api.mistral.ai"
```

### Multilingual Support

Mistral models excel at multilingual tasks:

- English, French, Spanish, German, Italian
- Code generation in multiple programming languages
- Translation between supported languages

## Anthropic Configuration {#anthropic}

Direct access to Anthropic's Claude models. Supports both API key and OAuth (Claude subscription) authentication.
### Basic Setup ```bash # Option 1: API key authentication export ANTHROPIC_API_KEY="sk-ant-api03-your-key-here" # Option 2: OAuth authentication (Claude Pro/Max subscribers) neurolink auth login anthropic ``` ### Optional Configuration ```bash export ANTHROPIC_MODEL="claude-3-5-sonnet-20241022" # Default model ``` ### Supported Models - `claude-opus-4-5-20251101` - Claude 4.5 Opus (most capable) - `claude-sonnet-4-5-20250929` - Claude 4.5 Sonnet - `claude-haiku-4-5-20251001` - Claude 4.5 Haiku (fastest) - `claude-opus-4-1-20250805` - Claude 4.1 Opus - `claude-opus-4-20250514` - Claude 4.0 Opus - `claude-sonnet-4-20250514` - Claude 4.0 Sonnet - `claude-3-7-sonnet-20250219` - Claude 3.7 Sonnet ### Usage Example ```typescript import { NeuroLink } from "@juspay/neurolink"; const neurolink = new NeuroLink(); const result = await neurolink.generate({ input: { text: "Explain quantum computing" }, provider: "anthropic", model: "claude-3-5-sonnet-20241022", temperature: 0.7, maxTokens: 1000, timeout: "30s", }); ``` ### Timeout Configuration - **Default Timeout**: 30 seconds - **Supported Formats**: Milliseconds (`30000`), human-readable (`'30s'`, `'1m'`, `'5m'`) - **Environment Variable**: `ANTHROPIC_TIMEOUT='45s'` (optional) ### Getting Started with Anthropic 1. **API Key**: Visit [console.anthropic.com](https://console.anthropic.com), navigate to API Keys, and export as `ANTHROPIC_API_KEY` 2. **OAuth (Subscription)**: Run `neurolink auth login anthropic` to authenticate with your Claude Pro/Max subscription ### Complete Guide For comprehensive Anthropic setup including OAuth configuration, subscription tiers, and advanced options, see the [Detailed Anthropic Provider Guide](providers/anthropic.md) and the [Claude Subscription Guide](../features/claude-subscription.md). ## Azure OpenAI Configuration {#azure} Azure OpenAI provides enterprise-grade access to OpenAI models through Microsoft Azure. 
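Azure OpenAI addresses a model by deployment rather than by model name, which is why this provider needs an endpoint, a deployment ID, and an API version. As a rough illustration, the sketch below composes those three values into the chat completions URL used by Azure's REST API. `buildAzureChatUrl` is a hypothetical helper for illustration only; NeuroLink builds its requests internally and may do so differently.

```typescript
// Hypothetical helper (not part of NeuroLink): composes the Azure OpenAI
// chat completions URL from endpoint, deployment ID, and API version.
function buildAzureChatUrl(
  endpoint: string,
  deploymentId: string,
  apiVersion: string,
): string {
  // Drop trailing slashes so "https://x.openai.azure.com/" and
  // "https://x.openai.azure.com" produce the same URL.
  const base = endpoint.replace(/\/+$/, "");
  const deployment = encodeURIComponent(deploymentId);
  const version = encodeURIComponent(apiVersion);
  return `${base}/openai/deployments/${deployment}/chat/completions?api-version=${version}`;
}
```

A mismatch in any of the three parts (wrong resource host, a deployment name that does not exist, or an unsupported API version) tends to surface as a 404 or 400 from Azure, so printing the composed URL is a quick first debugging step.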
### Basic Setup ```bash export AZURE_OPENAI_API_KEY="your-azure-openai-key" export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/" export AZURE_OPENAI_DEPLOYMENT_ID="your-deployment-name" ``` ### Optional Configuration ```bash export AZURE_OPENAI_API_VERSION="2024-02-15-preview" # API version ``` ### Supported Models Azure OpenAI supports deployment of: - `gpt-4o` - Latest multimodal model - `gpt-4` - Advanced reasoning - `gpt-4-turbo` - Optimized performance - `gpt-3.5-turbo` - Cost-effective ### Usage Example ```typescript import { NeuroLink } from "@juspay/neurolink"; const neurolink = new NeuroLink(); const result = await neurolink.generate({ input: { text: "Explain machine learning" }, provider: "azure", temperature: 0.7, maxTokens: 500, timeout: "30s", }); ``` ### Timeout Configuration - **Default Timeout**: 30 seconds - **Supported Formats**: Milliseconds (`30000`), human-readable (`'30s'`, `'1m'`, `'5m'`) - **Environment Variable**: `AZURE_TIMEOUT='45s'` (optional) ### Azure Setup Requirements 1. **Azure Subscription**: Active Azure subscription 2. **Azure OpenAI Resource**: Create Azure OpenAI resource in Azure Portal 3. **Model Deployment**: Deploy a model to get deployment ID 4. **API Key**: Get API key from resource's Keys and Endpoint section ### Environment Variables Reference | Variable | Required | Description | | ---------------------------- | -------- | ----------------------------- | | `AZURE_OPENAI_API_KEY` | ✅ | Azure OpenAI API key | | `AZURE_OPENAI_ENDPOINT` | ✅ | Resource endpoint URL | | `AZURE_OPENAI_DEPLOYMENT_ID` | ✅ | Model deployment name | | `AZURE_OPENAI_API_VERSION` | ❌ | API version (default: latest) | ## OpenAI Compatible Configuration {#openai-compatible} Connect to any OpenAI-compatible API endpoint (LocalAI, vLLM, Ollama with OpenAI compatibility, etc.) 
### Basic Setup

```bash
export OPENAI_COMPATIBLE_BASE_URL="http://localhost:8080/v1"
export OPENAI_COMPATIBLE_API_KEY="optional-api-key"  # Some servers don't require this
```

### Optional Configuration

```bash
export OPENAI_COMPATIBLE_MODEL="your-model-name"
```

### Usage Example

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

const result = await neurolink.generate({
  input: { text: "Hello from custom endpoint" },
  provider: "openai-compatible",
  model: "your-model",
  temperature: 0.7,
  maxTokens: 500,
});
```

### Compatible Servers

This works with any server implementing the OpenAI API:

- **LocalAI** - Local AI server
- **vLLM** - High-performance inference server
- **Ollama** (through its OpenAI-compatible `/v1` endpoint)
- **Text Generation WebUI**
- **Custom inference servers**

### Environment Variables

```bash
# Required: Base URL of your OpenAI-compatible server
export OPENAI_COMPATIBLE_BASE_URL="http://localhost:8080/v1"

# Optional: API key (if your server requires one)
export OPENAI_COMPATIBLE_API_KEY="your-api-key-if-needed"

# Optional: Default model name
export OPENAI_COMPATIBLE_MODEL="your-model-name"
```

## DeepSeek Configuration {#deepseek}

DeepSeek provides cost-effective access to its own frontier models: the general-purpose V3 chat model and the R1 reasoning model.
### Basic Setup ```bash export DEEPSEEK_API_KEY="sk-your-deepseek-api-key" ``` ### Optional Configuration ```bash export DEEPSEEK_MODEL="deepseek-chat" # Default: deepseek-chat export DEEPSEEK_BASE_URL="https://api.deepseek.com" # Default base URL (override for compatible proxies) ``` ### Supported Models - `deepseek-chat` (default) - DeepSeek V3, high-quality general chat at low cost - `deepseek-reasoner` - DeepSeek R1, extended chain-of-thought reasoning (thinking mode) ### Usage Example ```typescript import { NeuroLink } from "@juspay/neurolink"; const neurolink = new NeuroLink(); // General chat with DeepSeek V3 const result = await neurolink.generate({ input: { text: "Explain transformers in simple terms" }, provider: "deepseek", model: "deepseek-chat", temperature: 0.7, maxTokens: 1000, }); // Extended reasoning with DeepSeek R1 const reasoned = await neurolink.generate({ input: { text: "Solve step by step: ..." }, provider: "deepseek", model: "deepseek-reasoner", thinkingLevel: "high", }); ``` ### CLI Usage ```bash # Use DeepSeek V3 npx @juspay/neurolink generate "Explain quantum computing" --provider deepseek # Use DeepSeek R1 with alias npx @juspay/neurolink generate "Solve this math problem" --provider ds --model deepseek-reasoner ``` ### Getting Started with DeepSeek 1. **Create Account**: Visit [platform.deepseek.com](https://platform.deepseek.com) 2. **Generate Key**: Navigate to **API Keys** and create a new key 3. **Add Billing**: Top up your account balance at [platform.deepseek.com/usage](https://platform.deepseek.com/usage) 4. 
**Set Environment**: Export `DEEPSEEK_API_KEY` ### Environment Variables Reference | Variable | Required | Default | Description | | ------------------- | -------- | -------------------------- | ------------------------------------------------------- | | `DEEPSEEK_API_KEY` | ✅ | - | DeepSeek API key | | `DEEPSEEK_MODEL` | ❌ | `deepseek-chat` | Model: `deepseek-chat` (V3) or `deepseek-reasoner` (R1) | | `DEEPSEEK_BASE_URL` | ❌ | `https://api.deepseek.com` | Override for proxies or alternative endpoints | ### Provider ID and Aliases - **Provider ID**: `deepseek` - **Aliases**: `ds` --- ## NVIDIA NIM Configuration {#nvidia-nim} NVIDIA NIM provides access to 400+ optimized models through NVIDIA's hosted cloud inference API, and also supports self-hosted NIM deployments. ### Basic Setup ```bash export NVIDIA_NIM_API_KEY="nvapi-your-nvidia-api-key" ``` ### Optional Configuration ```bash export NVIDIA_NIM_MODEL="meta/llama-3.3-70b-instruct" # Default model export NVIDIA_NIM_BASE_URL="https://integrate.api.nvidia.com/v1" # Default; override for self-hosted NIM ``` ### NIM-Specific Extras (Advanced) These environment variables pass NIM-specific request body extensions. 
Leave them unset unless you have a specific need: ```bash export NVIDIA_NIM_TOP_K="" # Integer; -1 or unset = disabled export NVIDIA_NIM_MIN_P="" # Float; 0 or unset = disabled export NVIDIA_NIM_REPETITION_PENALTY="" # Float; 1.0 or unset = disabled export NVIDIA_NIM_MIN_TOKENS="" # Integer; 0 or unset = disabled export NVIDIA_NIM_CHAT_TEMPLATE="" # Override model chat template (advanced) ``` ### Supported Models - `meta/llama-3.3-70b-instruct` (default) - Meta Llama 3.3 70B Instruct - Any model from the [NVIDIA NIM catalog](https://build.nvidia.com/models) ### Usage Example ```typescript import { NeuroLink } from "@juspay/neurolink"; const neurolink = new NeuroLink(); const result = await neurolink.generate({ input: { text: "Explain GPU architecture" }, provider: "nvidia-nim", model: "meta/llama-3.3-70b-instruct", temperature: 0.7, maxTokens: 1000, }); ``` ### CLI Usage ```bash # Use NVIDIA NIM with default model npx @juspay/neurolink generate "Explain GPU architecture" --provider nvidia-nim # Use nim alias npx @juspay/neurolink generate "Hello" --provider nim --model "mistralai/mistral-7b-instruct-v0.3" ``` ### Self-Hosted NIM Endpoints Override the base URL to point at your own NIM deployment: ```bash export NVIDIA_NIM_BASE_URL="http://your-nim-server:8000/v1" ``` ### Getting Started with NVIDIA NIM 1. **Create Account**: Visit [build.nvidia.com](https://build.nvidia.com/) 2. **Open Settings**: Navigate to **Settings → API Keys** 3. **Generate Key**: Create a new Bearer token API key 4. **Browse Models**: Explore the catalog at [build.nvidia.com/models](https://build.nvidia.com/models) 5. 
**Set Environment**: Export `NVIDIA_NIM_API_KEY` ### Environment Variables Reference | Variable | Required | Default | Description | | ------------------------------- | -------- | ------------------------------------- | --------------------------------------- | | `NVIDIA_NIM_API_KEY` | ✅ | - | NVIDIA NIM API key (Bearer token) | | `NVIDIA_NIM_MODEL` | ❌ | `meta/llama-3.3-70b-instruct` | Default model | | `NVIDIA_NIM_BASE_URL` | ❌ | `https://integrate.api.nvidia.com/v1` | Override for self-hosted NIM | | `NVIDIA_NIM_TOP_K` | ❌ | - | Top-K sampling parameter | | `NVIDIA_NIM_MIN_P` | ❌ | - | Min-P sampling parameter | | `NVIDIA_NIM_REPETITION_PENALTY` | ❌ | - | Repetition penalty | | `NVIDIA_NIM_MIN_TOKENS` | ❌ | - | Minimum tokens to generate | | `NVIDIA_NIM_CHAT_TEMPLATE` | ❌ | - | Override model chat template (advanced) | ### Provider ID and Aliases - **Provider ID**: `nvidia-nim` - **Aliases**: `nim`, `nvidia` --- ## LM Studio Configuration {#lm-studio} LM Studio is a local AI provider — it runs models entirely on your machine with no data sent to any external service. No API key is required for standard (non-proxied) installations. ### Prerequisites 1. Install LM Studio from [lmstudio.ai](https://lmstudio.ai/) 2. Open LM Studio and download a model from the **Discover** tab 3. Go to **Local Server** and click **Start Server** The server starts at `http://localhost:1234/v1` by default. NeuroLink auto-discovers the currently loaded model via `/v1/models` — you do not need to specify a model name. 
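The auto-discovery step can be pictured as a single request to the OpenAI-style model listing route. The sketch below is an assumption about the general shape, not NeuroLink's actual implementation: `discoverModelId`, `extractModelIds`, and the `HttpGetJson` type are hypothetical names, and the response shape (`{ data: [{ id }] }`) follows the OpenAI list-models format that LM Studio's local server mirrors.

```typescript
// Hypothetical transport: any function that GETs a URL and returns parsed JSON.
type HttpGetJson = (url: string) => Promise<unknown>;

// Pulls model IDs out of an OpenAI-style list response: { data: [{ id: "..." }] }.
function extractModelIds(payload: unknown): string[] {
  if (typeof payload !== "object" || payload === null) return [];
  const data = (payload as { data?: unknown }).data;
  if (!Array.isArray(data)) return [];

  const ids: string[] = [];
  for (const entry of data) {
    if (typeof entry === "object" && entry !== null) {
      const id = (entry as { id?: unknown }).id;
      if (typeof id === "string" && id.length > 0) ids.push(id);
    }
  }
  return ids;
}

// Returns the forced model if one is set, otherwise the first model the server reports.
async function discoverModelId(
  baseUrl: string,
  getJson: HttpGetJson,
  forcedModel?: string,
): Promise<string | undefined> {
  if (forcedModel) return forcedModel; // mirrors a non-blank LM_STUDIO_MODEL

  const url = `${baseUrl.replace(/\/+$/, "")}/models`;
  try {
    return extractModelIds(await getJson(url))[0];
  } catch {
    return undefined; // server not running or unreachable
  }
}
```

On Node.js 18 or later, `getJson` could be `(url) => fetch(url).then((r) => r.json())`. Injecting the transport keeps the discovery logic testable without a running LM Studio server.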
### Optional Configuration

```bash
export LM_STUDIO_BASE_URL="http://localhost:1234/v1"  # Default; override if server is on a different host/port
export LM_STUDIO_MODEL=""                             # Blank = auto-discover; set to force a specific model ID
# export LM_STUDIO_API_KEY="your-key"                 # Only needed behind an auth-proxying reverse-proxy
```

### Usage Example

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Model is auto-discovered from LM Studio
const result = await neurolink.generate({
  input: { text: "Explain machine learning" },
  provider: "lm-studio",
  temperature: 0.7,
  maxTokens: 500,
});

// Or specify a model explicitly (must be loaded in LM Studio)
const result2 = await neurolink.generate({
  input: { text: "Write a poem" },
  provider: "lm-studio",
  model: "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
});
```

### CLI Usage

```bash
# Auto-discover loaded model
npx @juspay/neurolink generate "Hello from LM Studio" --provider lm-studio

# Use alias
npx @juspay/neurolink generate "Hello" --provider lmstudio
```

### Notes

- **API key**: Not required for vanilla LM Studio installs. Set `LM_STUDIO_API_KEY` only when running LM Studio behind an authenticating reverse-proxy.
- **Model auto-discovery**: If the server is not running or has no model loaded, NeuroLink logs a warning and falls back gracefully. Start LM Studio and load a model, then retry.
### Timeout Configuration - **Default Timeout**: 5 minutes (longer for local CPU/GPU inference) - **Environment Variable**: `LM_STUDIO_TIMEOUT='10m'` (optional) ### Environment Variables Reference | Variable | Required | Default | Description | | -------------------- | -------- | -------------------------- | ----------------------------------------------------- | | `LM_STUDIO_BASE_URL` | ❌ | `http://localhost:1234/v1` | LM Studio server URL | | `LM_STUDIO_MODEL` | ❌ | _(auto-discovered)_ | Force a specific model ID; blank = use loaded model | | `LM_STUDIO_API_KEY` | ❌ | - | API key — only for reverse-proxy authenticated setups | ### Provider ID and Aliases - **Provider ID**: `lm-studio` - **Aliases**: `lmstudio`, `lms` --- ## llama.cpp Configuration {#llamacpp} llama.cpp's `llama-server` is a local AI provider — it runs GGUF models entirely on your machine. No API key is required for standard (non-proxied) installations. ### Prerequisites 1. Build llama.cpp: follow the [build instructions](https://github.com/ggerganov/llama.cpp#build) 2. Download a GGUF model file (e.g., from [Hugging Face](https://huggingface.co/models?library=gguf)) 3. Start the server: ```bash # Basic usage ./llama-server -m model.gguf --port 8080 # With tool/function-call support (required for MCP tools) ./llama-server -m model.gguf --port 8080 --jinja ``` The server starts at `http://localhost:8080/v1` by default. NeuroLink auto-discovers the loaded model via `/v1/models`. 
### Optional Configuration

```bash
export LLAMACPP_BASE_URL="http://localhost:8080/v1"  # Default; override if server is on a different host/port
export LLAMACPP_MODEL=""                             # Blank = auto-discover; set to force a specific model ID
# export LLAMACPP_API_KEY="your-key"                 # Only needed behind an auth-proxying reverse-proxy
```

### Usage Example

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Model is auto-discovered from llama-server
const result = await neurolink.generate({
  input: { text: "Explain machine learning" },
  provider: "llamacpp",
  temperature: 0.7,
  maxTokens: 500,
});
```

### CLI Usage

```bash
# Auto-discover loaded model
npx @juspay/neurolink generate "Hello from llama.cpp" --provider llamacpp

# Use alias
npx @juspay/neurolink generate "Hello" --provider "llama.cpp"
```

### Notes

- **API key**: Not required for vanilla llama-server installs. Set `LLAMACPP_API_KEY` only when running behind an authenticating reverse-proxy.
- **Tool support**: llama-server must be started with the `--jinja` flag to enable tool/function-call support. Without it, tool calls return a 400 error.
- **Model auto-discovery**: llama-server hosts one model at a time. NeuroLink reads it from `/v1/models` automatically.
- **Health check**: NeuroLink validates connectivity via the `/health` endpoint with up to 3 retries.
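The retry behaviour described in the health-check note can be sketched roughly as follows. This is an illustrative assumption, not NeuroLink's code: `checkWithRetries`, the `HealthProbe` type, and the fixed delay between attempts are hypothetical. Only the `/health` route and the limit of 3 retries come from the note above, and the sketch reads "up to 3 retries" as one initial attempt plus at most three more.

```typescript
// Hypothetical probe: resolves true when the server reports healthy.
type HealthProbe = () => Promise<boolean>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs the probe once, then retries up to `maxRetries` more times on failure.
async function checkWithRetries(
  probe: HealthProbe,
  maxRetries = 3,
  delayMs = 500,
): Promise<{ healthy: boolean; attempts: number }> {
  let attempts = 0;
  while (attempts <= maxRetries) {
    attempts += 1;
    try {
      if (await probe()) return { healthy: true, attempts };
    } catch {
      // Treat network errors (for example, connection refused) as "not healthy yet".
    }
    // Wait before the next attempt, but not after the final one.
    if (attempts <= maxRetries) await sleep(delayMs);
  }
  return { healthy: false, attempts };
}
```

On Node.js 18 or later, the probe could be `() => fetch("http://localhost:8080/health").then((r) => r.ok)`. A short retry loop like this is useful because llama-server can take a while to load a large GGUF file before it reports healthy.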
### Timeout Configuration - **Default Timeout**: 5 minutes (longer for local CPU/GPU inference) - **Environment Variable**: `LLAMACPP_TIMEOUT='10m'` (optional) ### Environment Variables Reference | Variable | Required | Default | Description | | ------------------- | -------- | -------------------------- | ----------------------------------------------------- | | `LLAMACPP_BASE_URL` | ❌ | `http://localhost:8080/v1` | llama-server URL | | `LLAMACPP_MODEL` | ❌ | _(auto-discovered)_ | Force a specific model ID; blank = use loaded model | | `LLAMACPP_API_KEY` | ❌ | - | API key — only for reverse-proxy authenticated setups | ### Provider ID and Aliases - **Provider ID**: `llamacpp` - **Aliases**: `llama.cpp` --- ## Redis Configuration {#redis} Redis integration for distributed conversation memory and session state. ### Basic Setup ```bash export REDIS_URL="redis://localhost:6379" ``` ### Optional Configuration ```bash export REDIS_PASSWORD="your-redis-password" # If authentication enabled export REDIS_DB="0" # Database number (default: 0) export REDIS_KEY_PREFIX="neurolink:" # Key prefix for namespacing ``` ### Advanced Configuration ```bash # Connection settings export REDIS_HOST="localhost" export REDIS_PORT="6379" export REDIS_TLS="false" # Set to "true" for TLS connections # Pool settings export REDIS_MAX_RETRIES="3" export REDIS_RETRY_DELAY="1000" # milliseconds export REDIS_CONNECTION_TIMEOUT="5000" # milliseconds ``` ### Usage Example ```typescript import { NeuroLink } from "@juspay/neurolink"; const neurolink = new NeuroLink({ memory: { type: "redis", url: process.env.REDIS_URL, }, }); const result = await neurolink.generate({ input: { text: "Remember this conversation" }, sessionId: "user-123", // Session stored in Redis }); ``` ### Redis Cloud Setup For managed Redis (Redis Cloud, AWS ElastiCache, etc.): ```bash export REDIS_URL="rediss://username:password@your-redis-host:6380" ``` ### Docker Redis (Development) ```bash # Start Redis in Docker docker run -d -p 
6379:6379 redis:latest # Set environment export REDIS_URL="redis://localhost:6379" ``` ### Features Enabled by Redis - **Distributed Memory**: Share conversation state across instances - **Session Persistence**: Conversations survive application restarts - **Export/Import**: Export full session history as JSON - **Multi-tenant**: Isolate conversations by session ID - **Scalability**: Handle thousands of concurrent conversations ### Environment Variables Reference | Variable | Required | Default | Description | | ------------------ | --------------- | ---------- | ------------------------- | | `REDIS_URL` | Recommended | - | Full Redis connection URL | | `REDIS_HOST` | Alternative | localhost | Redis host | | `REDIS_PORT` | Alternative | 6379 | Redis port | | `REDIS_PASSWORD` | If auth enabled | - | Redis password | | `REDIS_DB` | ❌ | 0 | Database number | | `REDIS_KEY_PREFIX` | ❌ | neurolink: | Key prefix | ## Environment File Template Create a `.env` file in your project root: ```bash # NeuroLink Environment Configuration # OpenAI OPENAI_API_KEY=sk-your-openai-key-here OPENAI_MODEL=gpt-4o # Amazon Bedrock AWS_ACCESS_KEY_ID=your-aws-access-key AWS_SECRET_ACCESS_KEY=your-aws-secret-key AWS_REGION=us-east-2 AWS_SESSION_TOKEN=your-session-token # Optional: for temporary credentials BEDROCK_MODEL=arn:aws:bedrock:us-east-2::inference-profile/us.anthropic.claude-3-7-sonnet-20250219-v1:0 # Google Vertex AI (choose one method) # Method 1: File path GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service-account.json # Method 2: JSON string (uncomment to use) # GOOGLE_SERVICE_ACCOUNT_KEY={"type":"service_account","project_id":"your-project",...} # Method 3: Individual variables (uncomment to use) # GOOGLE_AUTH_CLIENT_EMAIL=service-account@your-project.iam.gserviceaccount.com # GOOGLE_AUTH_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\nYOUR_PRIVATE_KEY_HERE\n-----END PRIVATE KEY-----" # Required for all Google Vertex AI methods GOOGLE_VERTEX_PROJECT=your-gcp-project-id 
GOOGLE_VERTEX_LOCATION=us-east5 VERTEX_MODEL_ID=claude-sonnet-4@20250514 # Alternative: Gemini 3 models with extended thinking support # VERTEX_MODEL_ID=gemini-3-flash-preview # VERTEX_MODEL_ID=gemini-3-pro-preview # Google AI Studio GOOGLE_AI_API_KEY=AIza-your-googleAiStudio-key GOOGLE_AI_MODEL=gemini-2.5-pro # Anthropic ANTHROPIC_API_KEY=sk-ant-api03-your-key # Azure OpenAI AZURE_OPENAI_API_KEY=your-azure-key AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/" AZURE_OPENAI_DEPLOYMENT_ID=your-deployment-name # Hugging Face HUGGINGFACE_API_KEY=hf_your_token_here HUGGINGFACE_MODEL=microsoft/DialoGPT-medium # Optional # Ollama (Local AI) OLLAMA_BASE_URL=http://localhost:11434 # Optional OLLAMA_MODEL=llama2 # Optional # Mistral AI MISTRAL_API_KEY=your_mistral_api_key MISTRAL_MODEL=mistral-small # Optional # DeepSeek DEEPSEEK_API_KEY=sk-your-deepseek-key DEEPSEEK_MODEL=deepseek-chat # Optional (deepseek-chat or deepseek-reasoner) # NVIDIA NIM NVIDIA_NIM_API_KEY=nvapi-your-nvidia-key NVIDIA_NIM_MODEL=meta/llama-3.3-70b-instruct # Optional # LM Studio (local — no API key required) LM_STUDIO_BASE_URL=http://localhost:1234/v1 # Optional # llama.cpp (local — no API key required) LLAMACPP_BASE_URL=http://localhost:8080/v1 # Optional # Application Settings DEFAULT_PROVIDER=auto NEUROLINK_DEBUG=false ``` ## Provider Priority and Fallback ### Automatic Provider Selection NeuroLink automatically selects the best available provider when no provider is specified: ```typescript import { NeuroLink } from "@juspay/neurolink"; const neurolink = new NeuroLink(); // Automatically selects best available provider const result = await neurolink.generate({ input: { text: "Hello, world!" }, }); ``` ### Provider Priority Order The default priority order (most reliable first): 1. **OpenAI** - Most reliable, fastest setup 2. **Anthropic** - High quality, simple setup 3. **Google AI Studio** - Free tier, easy setup 4. **Azure OpenAI** - Enterprise reliable 5. 
**Google Vertex AI** - Good performance, multiple auth methods
6. **Mistral AI** - European compliance, competitive pricing
7. **Hugging Face** - Open source variety
8. **Amazon Bedrock** - High quality, requires careful setup
9. **Ollama** - Local only, no fallback

### Specifying Provider and Model

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Explicitly specify provider and model
const result = await neurolink.generate({
  input: { text: "Hello" },
  provider: "bedrock",
  model: "anthropic.claude-3-sonnet-20240229-v1:0",
});
```

### Environment-Based Selection

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Different providers for different environments
const result = await neurolink.generate({
  input: { text: "Hello" },
  provider: process.env.NODE_ENV === "production" ? "bedrock" : "openai",
  model: process.env.NODE_ENV === "production" ? undefined : "gpt-4o-mini",
});
```

## Testing Provider Configuration

### CLI Status Check

```bash
# Test all providers
npx @juspay/neurolink status --verbose

# Expected output:
# 🔍 Checking AI provider status...
# ✅ openai: ✅ Working (234ms)
# ❌ bedrock: ❌ Invalid credentials - The security token included in the request is expired
# ⚪ vertex: ⚪ Not configured - Missing environment variables
```

### Programmatic Testing

```typescript
import { NeuroLink } from "@juspay/neurolink";

async function testProviders() {
  const providers = [
    "openai",
    "bedrock",
    "vertex",
    "anthropic",
    "azure",
    "google-ai",
    "huggingface",
    "ollama",
    "mistral",
  ];

  const neurolink = new NeuroLink();

  for (const providerName of providers) {
    const start = Date.now();
    try {
      // A tiny request is enough to confirm credentials and connectivity
      await neurolink.generate({
        input: { text: "Test" },
        provider: providerName,
        maxTokens: 10,
      });
      console.log(`✅ ${providerName}: Working (${Date.now() - start}ms)`);
    } catch (error) {
      // `error` is `unknown` under strict TypeScript, so narrow before reading `.message`
      const message = error instanceof Error ? error.message : String(error);
      console.log(`❌ ${providerName}: ${message}`);
    }
  }
}

testProviders();
```

## Common Configuration Issues

### OpenAI Issues

```
Error: Cannot find API key for OpenAI provider
```

**Solution**: Set the `OPENAI_API_KEY` environment variable.

### Bedrock Issues

```
Your account is not authorized to invoke this API operation
```

**Solutions**:

1. Use the full inference profile ARN (not the simple model name)
2. Check that your AWS account has Bedrock access
3. Verify IAM permissions include `bedrock:InvokeModel`
4. Ensure model access is enabled in your AWS region

### Vertex AI Issues

```
Cannot find package '@google-cloud/vertexai'
```

**Solution**: Install the peer dependency: `npm install @google-cloud/vertexai`

```
Authentication failed
```

**Solutions**:

1. Verify service account JSON is valid
2. Check project ID is correct
3. Ensure Vertex AI API is enabled
4. 
Verify service account has proper permissions ## Security Best Practices ### Environment Variables - Never commit API keys to version control - Use different keys for development/staging/production - Rotate keys regularly - Use minimal permissions for service accounts ### AWS Security - Use IAM roles instead of access keys when possible - Enable CloudTrail for audit logging - Use VPC endpoints for additional security - Implement resource-based policies ### Google Cloud Security - Use service account keys with minimal permissions - Enable audit logging - Use VPC Service Controls for additional isolation - Rotate service account keys regularly ### General Security - Use environment-specific configurations - Implement rate limiting in your applications - Monitor usage and costs - Use HTTPS for all API communications --- ## OpenAI TTS Configuration {#openai-tts} OpenAI TTS provides text-to-speech synthesis using the same API key as the OpenAI LLM provider. No additional credentials are required. ### Basic Setup ```bash export OPENAI_API_KEY="sk-your-openai-api-key" ``` **Note:** `OPENAI_API_KEY` is shared with the OpenAI LLM provider. No separate key is needed. ### Supported Models - `tts-1` (default) - Optimized for speed, lower latency - `tts-1-hd` - Optimized for quality, higher fidelity audio ### Supported Voices `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer` ### Supported Output Formats `mp3` (default), `opus`, `wav`, `ogg` ### Usage Example ```typescript import { NeuroLink } from "@juspay/neurolink"; const neurolink = new NeuroLink(); const result = await neurolink.generate({ input: { text: "Hello, world!" }, tts: { enabled: true, provider: "openai-tts", voice: "alloy", format: "mp3", }, }); ``` ### CLI Usage ```bash npx @juspay/neurolink generate "Hello, world!" 
--tts --tts-provider openai-tts ``` ### Environment Variables Reference | Variable | Required | Default | Description | | ---------------- | -------- | ------- | ----------------------------------- | | `OPENAI_API_KEY` | ✅ | - | Shared with the OpenAI LLM provider | ### Provider ID and Aliases - **Provider ID**: `openai-tts` --- ## ElevenLabs Configuration {#elevenlabs} ElevenLabs provides high-quality, multilingual text-to-speech synthesis with a wide selection of voices and voice cloning support. ### Basic Setup ```bash export ELEVENLABS_API_KEY="your-elevenlabs-api-key" ``` ### How to Get ElevenLabs API Key 1. Visit [ElevenLabs](https://elevenlabs.io) 2. Sign up or log in to your account 3. Navigate to **Profile → API Key** 4. Copy the key ### Supported Models - `eleven_multilingual_v2` (default) - Best quality, 29 languages - `eleven_turbo_v2_5` - Low-latency streaming, 32 languages - `eleven_flash_v2_5` - Fastest, suitable for real-time applications ### Usage Example ```typescript import { NeuroLink } from "@juspay/neurolink"; const neurolink = new NeuroLink(); const result = await neurolink.generate({ input: { text: "Bonjour le monde!" }, tts: { enabled: true, provider: "elevenlabs", voice: "Rachel", model: "eleven_multilingual_v2", }, }); ``` ### CLI Usage ```bash npx @juspay/neurolink generate "Hello, world!" --tts --tts-provider elevenlabs ``` ### Notes - **Multilingual support**: ElevenLabs models support up to 32 languages with natural prosody - **Voice cloning**: ElevenLabs supports custom voice IDs from your ElevenLabs account ### Environment Variables Reference | Variable | Required | Default | Description | | -------------------- | -------- | ------- | ------------------ | | `ELEVENLABS_API_KEY` | ✅ | - | ElevenLabs API key | ### Provider ID and Aliases - **Provider ID**: `elevenlabs` --- ## Deepgram STT Configuration {#deepgram} Deepgram provides fast, accurate speech-to-text transcription with support for real-time streaming and pre-recorded audio. 
### Basic Setup

```bash
export DEEPGRAM_API_KEY="your-deepgram-api-key"
```

### How to Get Deepgram API Key

1. Visit [Deepgram Console](https://console.deepgram.com)
2. Sign up or log in to your account
3. Navigate to **API Keys**
4. Click **Create a New API Key**
5. Copy the key

### Supported Models

- `nova-3` (default) - Latest, highest accuracy
- `nova-2` - High accuracy, broad language support
- `base` - Balanced accuracy and speed

### Usage Example

```typescript
import { NeuroLink } from "@juspay/neurolink";
import { readFileSync } from "fs";

const neurolink = new NeuroLink();

const audioBuffer = readFileSync("audio.wav");

const result = await neurolink.generate({
  input: { text: "Respond to what was said" },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio: audioBuffer,
    model: "nova-3",
    language: "en",
  },
});
```

### CLI Usage

```bash
npx @juspay/neurolink generate "Respond to this" --stt --stt-provider deepgram --input-audio file.wav
```

### Notes

- **Streaming transcription**: Deepgram supports real-time audio streaming for live transcription
- **Language support**: Deepgram nova models support 30+ languages

### Environment Variables Reference

| Variable           | Required | Default | Description      |
| ------------------ | -------- | ------- | ---------------- |
| `DEEPGRAM_API_KEY` | ✅       | -       | Deepgram API key |

### Provider ID and Aliases

- **Provider ID**: `deepgram` (STT only; Deepgram's TTS product is not wired up yet)

---

## Whisper Configuration {#whisper}

Whisper is OpenAI's speech-to-text model, registered under the provider ID `whisper`. It accepts MP3, WAV, M4A, and FLAC inputs up to 25 MB.

```bash
# Required environment variable
OPENAI_API_KEY=sk-...
```

Get your API key from: **OpenAI Platform** > **API Keys**.
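Since Whisper input is limited to the formats and size stated above, it can be worth checking a file before sending it. The sketch below is a hypothetical pre-flight check (`validateWhisperInput` is not part of NeuroLink) built only from those limits: MP3, WAV, M4A, or FLAC, up to 25 MB. It assumes 25 MB means 25 × 1024 × 1024 bytes and infers the format from the file extension, which is a simplification.

```typescript
const WHISPER_FORMATS = ["mp3", "wav", "m4a", "flac"] as const;
type WhisperFormat = (typeof WHISPER_FORMATS)[number];

// Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes.
const WHISPER_MAX_BYTES = 25 * 1024 * 1024;

type AudioCheck =
  | { ok: true; format: WhisperFormat }
  | { ok: false; reason: string };

// Hypothetical helper: checks extension and size before a transcription request.
function validateWhisperInput(fileName: string, sizeBytes: number): AudioCheck {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";

  const format = WHISPER_FORMATS.find((f) => f === ext);
  if (!format) {
    return { ok: false, reason: `unsupported format: "${ext}"` };
  }
  if (sizeBytes > WHISPER_MAX_BYTES) {
    return {
      ok: false,
      reason: `file is ${sizeBytes} bytes; limit is ${WHISPER_MAX_BYTES}`,
    };
  }
  return { ok: true, format };
}
```

The returned `format` can be passed straight through as the `format` field of the `stt` options, and a failed check gives a readable reason before any network call is made.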
### Usage ```typescript const result = await neurolink.generate({ input: { text: "Repeat what was said" }, provider: "openai", stt: { enabled: true, provider: "whisper", audio: audioBuffer, format: "mp3", }, }); console.log(result.transcription?.text); ``` ### CLI ```bash neurolink generate "Repeat what was said" \ --provider openai \ --stt --stt-provider whisper --input-audio ./audio.mp3 ``` ### Provider ID - **Provider ID**: `whisper` --- ## Azure Speech Configuration {#azure-speech} Azure Cognitive Services Speech provides both TTS (`azure-tts`) and STT (`azure-stt`). ```bash # Required environment variables AZURE_SPEECH_KEY=your-speech-key AZURE_SPEECH_REGION=eastus ``` Get credentials from: **Azure Portal** > **Cognitive Services** > **Speech** > **Keys and Endpoint**. ### TTS Usage ```typescript const result = await neurolink.generate({ input: { text: "Hello world" }, tts: { enabled: true, provider: "azure-tts", voice: "en-US-JennyNeural", format: "mp3", }, }); ``` ### STT Usage > **MP3 not supported** — Azure's short-audio REST endpoint only decodes WAV > PCM and Ogg/Opus. Passing `format: "mp3"` to `azure-stt` throws > `STT_INVALID_AUDIO_FORMAT` early. Convert with > `ffmpeg -i in.mp3 -ar 16000 -ac 1 out.wav` first. ```typescript const result = await neurolink.generate({ input: { text: "" }, provider: "openai", stt: { enabled: true, provider: "azure-stt", audio: wavBuffer, format: "wav", language: "en-US", }, }); ``` ### Provider IDs - **TTS**: `azure-tts` - **STT**: `azure-stt` --- ## Fish Audio TTS Configuration {#fish-audio} Low-cost TTS provider focused on voice cloning. Wrapped as a TTSHandler so it slots into the same `generate({ tts: { provider: "fish-audio" } })` flow as OpenAI / ElevenLabs / Azure / Google AI TTS. ```bash # Required FISH_AUDIO_API_KEY=your-fish-audio-api-key # Optional: default voice (any reference_id from the Fish library) # FISH_AUDIO_VOICE_ID=... 
# Optional: base URL override # FISH_AUDIO_BASE_URL=https://api.fish.audio ``` Get an API key from [fish.audio](https://fish.audio/) → dashboard. ### Usage ```typescript const result = await neurolink.generate({ input: { text: "Hello world from Fish Audio" }, provider: "openai", tts: { enabled: true, provider: "fish-audio", format: "mp3", }, }); ``` - **Provider ID**: `fish-audio` - **Default model**: `s1` (override via `tts.model`: `speech-1.5`, `speech-1.6`, `s1`) - **Max text length**: 5000 characters - **Output formats**: `mp3` (default, 44.1 kHz), `wav` (44.1 kHz), `pcm16` (raw, 44.1 kHz) - **Languages**: 14 (English, Mandarin, Cantonese, Japanese, Korean, French, German, Spanish, Italian, Portuguese, Russian, Arabic, Hindi, Indonesian) - **Voice cloning**: 15 s of reference audio → custom `reference_id` Full guide: [Fish Audio TTS Provider](./providers/fish-audio.md). --- ## Cartesia TTS Configuration {#cartesia} Low-latency TTS provider running Cartesia's Sonic models. The synchronous `/tts/bytes` endpoint is wrapped as a TTSHandler; the realtime WebSocket flow is exposed separately as `CartesiaStream` for the voice server. ```bash # Required CARTESIA_API_KEY=sk_car_... # Optional: default voice id (any voice from your Cartesia library) # CARTESIA_VOICE_ID=... # Optional: model override (default sonic-2) # CARTESIA_MODEL=sonic-2 # Optional: API version header (default 2025-04-16) # CARTESIA_API_VERSION=2025-04-16 # Optional: base URL override # CARTESIA_BASE_URL=https://api.cartesia.ai ``` Get an API key from [play.cartesia.ai/keys](https://play.cartesia.ai/keys). 
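Both Fish Audio and Cartesia list a maximum text length of 5000 characters per request. For longer scripts, one option is to split the text before calling `generate` and synthesise each piece separately. The following is a minimal sketch under that assumption: `splitForTts` is a hypothetical helper, and cutting at sentence-ending punctuation is just one reasonable choice.

```typescript
// Hypothetical helper: splits text into chunks of at most `maxChars`,
// preferring sentence boundaries, then spaces, then a hard cut.
function splitForTts(text: string, maxChars = 5000): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");

  const chunks: string[] = [];
  let rest = text.trim();

  while (rest.length > maxChars) {
    const head = rest.slice(0, maxChars);

    // Last sentence end (., !, or ?) inside the allowed window.
    const sentenceEnd = Math.max(
      head.lastIndexOf("."),
      head.lastIndexOf("!"),
      head.lastIndexOf("?"),
    );
    const lastSpace = head.lastIndexOf(" ");

    const end =
      sentenceEnd >= 0 ? sentenceEnd + 1 : lastSpace > 0 ? lastSpace : maxChars;

    chunks.push(rest.slice(0, end).trim());
    rest = rest.slice(end).trim();
  }

  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each chunk can then be sent as its own `input.text`. Splitting at sentence boundaries tends to keep prosody natural at the joins, whereas a hard cut in the middle of a word usually produces an audible glitch.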

### Usage

```typescript
const result = await neurolink.generate({
  input: { text: "Hello world from Cartesia Sonic" },
  provider: "openai",
  tts: {
    enabled: true,
    provider: "cartesia",
    format: "mp3",
  },
});
```

- **Provider ID**: `cartesia`
- **Default model**: `sonic-2` (also `sonic`)
- **Default voice**: `694f9389-aac1-45b6-b726-9d9369183238` ("Bright Female", English)
- **Max text length**: 5000 characters
- **Output formats**: `mp3` (default, 44.1 kHz), `wav` (PCM s16le @ 44.1 kHz), `pcm16` (raw, 24 kHz)
- **Streaming**: synchronous via this handler; WebSocket via `CartesiaStream` adapter

Full guide: [Cartesia TTS Provider](./providers/cartesia.md).

---

## Google Speech Configuration {#google-speech}

Covers both Google Cloud TTS (`google-ai`, or its `google-tts` alias) and Google Cloud Speech-to-Text (`google-stt`). Both share the same service-account credentials.

```bash
# Required environment variable
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# OR (for TTS only) an API key
GOOGLE_API_KEY=AIza...
```

> **Speech-to-Text API must be enabled** in your Google Cloud project for
> `google-stt` to work. Enable it at
> [console.cloud.google.com/apis/library/speech.googleapis.com](https://console.cloud.google.com/apis/library/speech.googleapis.com).

### TTS Usage

```typescript
const result = await neurolink.generate({
  input: { text: "Hello world" },
  tts: {
    enabled: true,
    provider: "google-ai",
    voice: "en-US-Neural2-A",
    format: "mp3",
  },
});
```

### STT Usage

```typescript
const result = await neurolink.generate({
  input: { text: "" },
  provider: "openai",
  stt: {
    enabled: true,
    provider: "google-stt",
    audio: audioBuffer,
    format: "mp3",
  },
});
```

### Provider IDs

- **TTS**: `google-ai` (or `google-tts` alias)
- **STT**: `google-stt`

---

## OpenAI Realtime Configuration {#openai-realtime}

Real-time voice via the OpenAI Realtime WebSocket API.
Provider ID `openai-realtime` is registered for future use; the typical pattern is to launch the integrated voice server (`neurolink serve voice`), which wires this through Soniox/Cartesia.

```bash
OPENAI_API_KEY=sk-...
```

### Provider ID

- **Provider ID**: `openai-realtime`
- **Audio chunk format**: `pcm16` (raw 16-bit PCM at 24 kHz, **NOT** WAV-headered). Do not pass these chunks to a WAV duration parser.

---

## Gemini Live Configuration {#gemini-live}

Real-time voice via Google's Gemini Live WebSocket API. Provider ID `gemini-live` is registered for future use.

```bash
GOOGLE_API_KEY=AIza...
# OR
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```

### Provider ID

- **Provider ID**: `gemini-live`

---

## Streaming + Voice Patterns {#streaming-voice}

### `stream()` + STT (transcribe before stream)

```typescript
import { readFileSync } from "node:fs";

const audio = readFileSync("./recording.mp3");
const r = await neurolink.stream({
  input: { text: "" },
  provider: "openai",
  stt: { enabled: true, provider: "whisper", audio, format: "mp3" },
});
console.log("transcription:", r.transcription?.text); // available before iterating
for await (const chunk of r.stream) {
  if ("content" in chunk) process.stdout.write(chunk.content);
}
```

### `stream()` + TTS Mode 2 (synthesise the streamed reply)

Two ergonomic options, both delivering byte-identical audio:

```typescript
import { writeFileSync } from "node:fs";

const r = await neurolink.stream({
  input: { text: "Tell me a fact."
  },
  provider: "openai",
  tts: {
    enabled: true,
    useAiResponse: true,
    provider: "openai-tts",
    format: "mp3",
  },
});

// --- Option A: collect inline while iterating ---
const audioBufs: Buffer[] = [];
for await (const c of r.stream) {
  if ("content" in c) process.stdout.write(c.content);
  else if (c.type === "audio") audioBufs.push(c.audio.data);
}
writeFileSync("./out.mp3", Buffer.concat(audioBufs));

// --- Option B: ergonomic Promise, read after the stream completes ---
const tts = await r.audio; // resolves to TTSResult or undefined
if (tts) writeFileSync("./out.mp3", tts.buffer);
```

When `tts.useAiResponse` is `false` (Mode 1) or TTS is not enabled, `r.audio` resolves to `undefined` rather than hanging.

---

[← Back to Main README](../index.md) | [Next: API Reference →](./api-reference.md)