# Brainiall — AI APIs for Speech, Text, Image & LLM Inference

> Brainiall provides production-ready AI APIs: pronunciation assessment (17MB ONNX, sub-500ms), text-to-speech (12 voices, sub-1s), speech-to-text (compact + Whisper multilingual), NLP suite (toxicity, sentiment, NER, PII, language detection — all CPU, sub-50ms), image processing (background removal, 4x upscaling, face restoration — GPU), and an OpenAI-compatible LLM gateway with 113+ models. All APIs are also available as MCP servers for AI agents.

**Base URL:** `https://apim-ai-apis.azure-api.net`

**Authentication:** Include ONE of these headers in every request:

- `Ocp-Apim-Subscription-Key: YOUR_KEY`
- `Authorization: Bearer YOUR_KEY`
- `api-key: YOUR_KEY`

**Get API keys:** Visit https://app.brainiall.com, sign in with GitHub, purchase credits, and create an API key.

---

## Pronunciation Assessment API

Evaluates spoken pronunciation against reference text using a 17MB ONNX CTC model with GOP (Goodness of Pronunciation) scoring. Returns an overall score (0-100), per-word scores, per-phoneme scores (39 ARPAbet phonemes), confidence, and audio quality metrics. Sub-500ms latency on GPU.

### POST /v1/pronunciation/assess/base64

Request:

```json
{
  "audio": "",
  "text": "Hello world",
  "format": "wav"
}
```

Response:

```json
{
  "overallScore": 87,
  "sentenceScore": 85,
  "confidence": 0.94,
  "decodedTranscript": "hello world",
  "words": [
    {
      "word": "hello",
      "score": 90,
      "phonemes": [
        {"phoneme": "HH", "score": 92},
        {"phoneme": "AH", "score": 88},
        {"phoneme": "L", "score": 91},
        {"phoneme": "OW", "score": 89}
      ]
    },
    {
      "word": "world",
      "score": 84,
      "phonemes": [
        {"phoneme": "W", "score": 86},
        {"phoneme": "ER", "score": 82},
        {"phoneme": "L", "score": 85},
        {"phoneme": "D", "score": 83}
      ]
    }
  ],
  "audioQuality": {
    "peakDb": -6.2,
    "rmsDb": -18.4,
    "durationMs": 1800,
    "snrDb": 32.5
  },
  "metadata": {"processingTimeMs": 312}
}
```

Notes:

- `overallScore`: 0-100, higher is better. Adjusted by a confidence penalty.
- Audio mismatches (wrong audio for the text) produce near-zero scores.
- Min audio: 0.5s. Max: 60s. Min SNR: 15dB.

### POST /v1/pronunciation/assess

Multipart form-data variant. Fields: `audio` (file), `text` (string), `format` (string).

### POST /v1/pronunciation/assess/batch

Assess up to 50 clips in one call.

Request:

```json
{
  "items": [
    {"audio": "", "text": "Hello", "format": "wav", "id": "item-1"},
    {"audio": "", "text": "World", "id": "item-2"}
  ]
}
```

Response:

```json
{
  "results": [
    {"id": "item-1", "success": true, "result": {"overallScore": 90, ...}},
    {"id": "item-2", "success": false, "error": "Audio too short"}
  ],
  "totalItems": 2,
  "successCount": 1,
  "errorCount": 1
}
```

### POST /v1/pronunciation/transcribe/base64

Transcribe audio with word-level timestamps. Optionally include a `text` field to get an embedded pronunciation assessment.

Request:

```json
{"audio": "", "format": "wav", "include_timestamps": true}
```

Response:

```json
{
  "text": "hello world",
  "words": [
    {"word": "hello", "start": 0.12, "end": 0.65, "confidence": 0.98},
    {"word": "world", "start": 0.72, "end": 1.21, "confidence": 0.96}
  ],
  "audioDurationMs": 1800
}
```

### GET /v1/pronunciation/health

```json
{"status": "healthy", "modelLoaded": true, "version": "1.0.0", "modelSize": "17MB"}
```

---

## Text-to-Speech (TTS) API

Kokoro-based TTS engine. 12 English voices (American and British accents). Outputs 24kHz WAV. Sub-1 second latency.

### POST /v1/tts/synthesize

Request:

```json
{
  "text": "Hello, welcome to Brainiall.",
  "voice": "af_heart",
  "speed": 1.0,
  "format": "wav"
}
```

- `text`: 1-5000 characters (required)
- `voice`: Voice ID (default: `af_heart`)
- `speed`: 0.25-4.0 (default: 1.0)

Response: Binary `audio/wav` data (24kHz, 16-bit PCM). Headers: `X-Audio-Duration-Ms`, `X-Voice`, `X-Text-Length`.
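To sanity-check a synthesized file against the `X-Audio-Duration-Ms` header, you can compute the duration locally with Python's standard `wave` module (frames divided by sample rate). This is a minimal sketch: `wav_duration_ms` and `make_silent_wav` are hypothetical helpers, and the usage comment assumes the header carries a plain number of milliseconds.

```python
import io
import wave


def wav_duration_ms(wav_bytes: bytes) -> float:
    """Duration of a PCM WAV payload in milliseconds (frames / sample rate)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() * 1000.0 / wf.getframerate()


def make_silent_wav(seconds: float, rate: int = 24000) -> bytes:
    """Build a mono 16-bit PCM WAV of silence, handy for offline testing."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(wav_duration_ms(make_silent_wav(1.5)))

# Against a live response (assumes `response` comes from requests.post(...)):
#   local_ms = wav_duration_ms(response.content)
#   header_ms = float(response.headers["X-Audio-Duration-Ms"])
```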
### GET /v1/tts/voices

```json
{
  "voices": [
    {"id": "af_heart", "name": "Heart", "gender": "female", "accent": "american"},
    {"id": "af_bella", "name": "Bella", "gender": "female", "accent": "american"},
    {"id": "af_nicole", "name": "Nicole", "gender": "female", "accent": "american"},
    {"id": "af_sarah", "name": "Sarah", "gender": "female", "accent": "american"},
    {"id": "af_sky", "name": "Sky", "gender": "female", "accent": "american"},
    {"id": "am_adam", "name": "Adam", "gender": "male", "accent": "american"},
    {"id": "am_michael", "name": "Michael", "gender": "male", "accent": "american"},
    {"id": "bf_emma", "name": "Emma", "gender": "female", "accent": "british"},
    {"id": "bf_isabella", "name": "Isabella", "gender": "female", "accent": "british"},
    {"id": "bm_george", "name": "George", "gender": "male", "accent": "british"},
    {"id": "bm_lewis", "name": "Lewis", "gender": "male", "accent": "british"},
    {"id": "bm_daniel", "name": "Daniel", "gender": "male", "accent": "british"}
  ],
  "defaultVoice": "af_heart"
}
```

Voice ID format: `af_` = American female, `am_` = American male, `bf_` = British female, `bm_` = British male.

---

## Speech-to-Text (STT) APIs

### Compact STT (English, fast)

POST /v1/stt/transcribe/base64 — Same 17MB model as pronunciation. Word-level timestamps. Best for short English utterances.

Request:

```json
{"audio": "", "include_timestamps": true}
```

### Whisper Pro (99 languages, speaker diarization)

POST /v1/whisper/transcribe/base64 — Whisper large-v3-turbo (809M params). Supports speaker diarization via pyannote.

Request:

```json
{
  "audio": "",
  "language": "en",
  "diarize": false,
  "format": "wav"
}
```

- `language`: BCP-47 code (auto-detected if omitted). Examples: `en`, `pt`, `es`, `fr`, `de`, `ja`, `zh`, `ar`.
- `diarize`: `true` to identify speakers (adds a `speaker` field to each word).
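With `diarize: true`, a transcript is easier to read once consecutive words from the same speaker are merged into turns. A minimal sketch, assuming each word object carries `word`, `start`, `end`, and `speaker` keys as described for this endpoint; `speaker_turns` is a hypothetical client-side helper, not part of the API, and the sample words are illustrative only.

```python
from itertools import groupby
from typing import Any, Dict, List


def speaker_turns(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge consecutive words with the same `speaker` label into turns."""
    turns = []
    # groupby only groups *adjacent* items, which is exactly what a turn is
    for speaker, group in groupby(words, key=lambda w: w.get("speaker", "UNKNOWN")):
        items = list(group)
        turns.append({
            "speaker": speaker,
            "start": items[0]["start"],
            "end": items[-1]["end"],
            "text": " ".join(w["word"] for w in items),
        })
    return turns


# Illustrative input shaped like a diarized response's `words` array
words = [
    {"word": "Hello", "start": 0.08, "end": 0.42, "speaker": "SPEAKER_00"},
    {"word": "there", "start": 0.50, "end": 0.72, "speaker": "SPEAKER_00"},
    {"word": "Hi", "start": 1.10, "end": 1.30, "speaker": "SPEAKER_01"},
]
for turn in speaker_turns(words):
    print(f"[{turn['start']:.2f}-{turn['end']:.2f}] {turn['speaker']}: {turn['text']}")
```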
Response:

```json
{
  "text": "Hello, this is a test.",
  "words": [
    {"word": "Hello", "start": 0.08, "end": 0.42, "confidence": 0.99},
    {"word": "this", "start": 0.50, "end": 0.72, "confidence": 0.98}
  ],
  "metadata": {"language": "en", "languageProbability": 0.998, "processingTimeMs": 1840}
}
```

With diarization, each word also includes a speaker label such as `"speaker": "SPEAKER_00"`.

---

## NLP API

CPU-only NLP suite using ONNX Runtime. Every endpoint responds in under 50ms; no GPU required. Max text: 10,000 characters.

### POST /v1/nlp/toxicity

6-category toxicity detection (BERT-based). Sub-15ms.

Request: `{"text": "Your text here"}`

Response:

```json
{
  "toxic": 0.004,
  "severe_toxic": 0.001,
  "obscene": 0.002,
  "threat": 0.001,
  "insult": 0.003,
  "identity_hate": 0.001,
  "is_toxic": false
}
```

### POST /v1/nlp/sentiment

Sentiment analysis. Models: `general` (default), `financial`, `twitter`. Sub-10ms.

Request: `{"text": "I love this!", "model": "general"}`

Response:

```json
{"label": "positive", "score": 0.9987, "scores": {"positive": 0.9987, "negative": 0.0013}}
```

### POST /v1/nlp/entities

Named entity recognition (BERT-NER). Identifies PER, ORG, LOC, MISC. Sub-50ms.

Request: `{"text": "Elon Musk founded Tesla in California."}`

Response:

```json
{
  "entities": [
    {"text": "Elon Musk", "label": "PER", "start": 0, "end": 9, "score": 0.999},
    {"text": "Tesla", "label": "ORG", "start": 18, "end": 23, "score": 0.998},
    {"text": "California", "label": "LOC", "start": 27, "end": 37, "score": 0.997}
  ],
  "count": 3
}
```

### POST /v1/nlp/pii

PII detection (BERT + regex). Types: EMAIL, PHONE, SSN, CREDIT_CARD, IP, PERSON. Optional redaction.

Request: `{"text": "Email john@example.com or call +1-555-0123", "redact": true}`

Response:

```json
{
  "pii_found": [
    {"text": "john@example.com", "type": "EMAIL", "start": 6, "end": 22},
    {"text": "+1-555-0123", "type": "PHONE", "start": 31, "end": 42}
  ],
  "has_pii": true,
  "redacted_text": "Email [EMAIL] or call [PHONE]"
}
```

### POST /v1/nlp/language

Language detection (fastText).
176 languages. Sub-1ms.

Request: `{"text": "Bonjour, comment allez-vous?", "top_k": 3}`

Response:

```json
{
  "language": "fr",
  "confidence": 0.998,
  "predictions": [{"language": "fr", "confidence": 0.998}, {"language": "ca", "confidence": 0.001}]
}
```

---

## Image Processing API

GPU-accelerated (NVIDIA A10, 24GB VRAM). Max upload: 25MB. Max dimension: 4096px. Supports PNG, JPEG, WebP.

### POST /v1/image/remove-background/base64

BiRefNet segmentation. Sub-500ms. Handles complex edges (hair, fur, transparent objects).

Request:

```json
{"image": "", "output_format": "png", "return_mask": false}
```

Response:

```json
{"image_base64": "", "format": "png", "original_size": {"width": 1024, "height": 768}, "processing_ms": 380}
```

Multipart variant: POST /v1/image/remove-background (field: `file`).

### POST /v1/image/upscale/base64

Real-ESRGAN 4x enhancement. 2-3s on GPU. Max output: 8192x8192.

Request: `{"image": "", "scale": 4}`

Response:

```json
{"image": "", "format": "png", "width": 4096, "height": 3072, "scale": 4, "processing_time_ms": 2840}
```

### POST /v1/image/restore-face/base64

GFPGAN face restoration. Detects all faces in the image, restores their quality, and optionally enhances the background.

Request: `{"image": "", "upscale": 2, "enhance_background": true}`

Response:

```json
{"image": "", "format": "png", "width": 2048, "height": 2048, "processing_time_ms": 1920}
```

---

## LLM Gateway (OpenAI-compatible)

113+ models with competitive pricing. Fully compatible with the OpenAI SDK, LiteLLM, LangChain, Cline, Cursor, Aider, and any other OpenAI-compatible tool.
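The gateway bills per million tokens (MTok), with separate input and output prices for most models. Given the `usage` block of a chat completion response, the cost of a call can be estimated as sketched below. This is a sketch under stated assumptions: the `PRICES` table copies only a few two-price aliases from the Popular Model Aliases list and may go stale (GET /v1/models is the source of truth), and `estimate_cost` is a hypothetical helper.

```python
# $/MTok (input, output), copied from this document's alias list; prices may change.
PRICES = {
    "claude-sonnet": (3.00, 15.00),
    "claude-haiku": (1.00, 5.00),
    "deepseek-v3": (0.27, 1.10),
    "nova-micro": (0.035, 0.14),
}


def estimate_cost(model: str, usage: dict) -> float:
    """Estimated USD cost of one call from its `usage` token counts."""
    input_price, output_price = PRICES[model]
    return (
        usage["prompt_tokens"] * input_price
        + usage["completion_tokens"] * output_price
    ) / 1_000_000


# Usage block from the chat completion response example
usage = {"prompt_tokens": 28, "completion_tokens": 9, "total_tokens": 37}
print(f"${estimate_cost('claude-sonnet', usage):.6f}")
```

Keeping input and output prices separate matters because output tokens cost several times more than input tokens for most listed models.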
### POST /v1/chat/completions

Request:

```json
{
  "model": "claude-sonnet",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
```

Supported fields: `model`, `messages`, `temperature`, `top_p`, `max_tokens`, `stream`, `stream_options`, `tools`, `tool_choice`, `response_format` (`json_object` or `json_schema`), `stop`, `seed`, `reasoning_effort`.

Response:

```json
{
  "id": "chatcmpl-abc123",
  "choices": [{"message": {"role": "assistant", "content": "2 + 2 = 4."}, "finish_reason": "end_turn"}],
  "usage": {"prompt_tokens": 28, "completion_tokens": 9, "total_tokens": 37}
}
```

Streaming: Set `stream: true`. Returns SSE events ending with `data: [DONE]`.

### GET /v1/models

Returns all available models with pricing, context length, and capabilities.

### POST /v1/embeddings

Request: `{"input": "The quick brown fox", "model": "titan-embed-v2"}`

### Popular Model Aliases

**Claude:** `claude-opus` ($5/$25), `claude-sonnet` ($3/$15), `claude-haiku` ($1/$5), `opus-4-5` ($15/$75)

**DeepSeek:** `deepseek-r1` ($1.35/$5.40, reasoning), `deepseek-v3` ($0.27/$1.10)

**Llama:** `llama-3.3-70b` ($0.72), `llama-4-scout` ($0.17), `llama-4-maverick` ($0.17)

**Amazon Nova:** `nova-pro` ($0.80/$3.20), `nova-lite` ($0.06/$0.24), `nova-micro` ($0.035/$0.14)

**Mistral:** `mistral-large-3` ($2/$6), `devstral-2` ($0.50/$1.50), `ministral-3b` ($0.04)

**Qwen:** `qwen3-32b` ($0.35), `qwen3-coder-next` ($0.50), `qwen3-vl-235b` ($1/$5)

**MiniMax:** `minimax-m2` ($1/$5)

**Kimi:** `kimi-k2.5` ($0.60/$2.40)

**Embeddings:** `titan-embed-v2` ($0.02), `embed-v4` ($0.10)

Prices in $/MTok (input/output). Full list with 113+ models at GET /v1/models.

---

## MCP Servers (for AI Agents)

All servers use the Streamable HTTP transport. Required header: `Accept: application/json, text/event-stream`.
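Under the hood, an MCP client talks to these servers by POSTing JSON-RPC 2.0 messages. Under the MCP Streamable HTTP transport, a server may answer with a single `application/json` body or with a `text/event-stream` (SSE) stream, which is why both types go in `Accept`. The sketch below builds headers and a request envelope and parses either response type. It assumes the `Ocp-Apim-Subscription-Key` auth header from this document and the `Mcp-Session-Id` header defined by the MCP spec; the helper names are hypothetical, and a real session must first send `initialize` (then the `notifications/initialized` notification) before calls such as `tools/list`.

```python
import json
from typing import Any, Dict, List, Optional


def mcp_headers(api_key: str, session_id: Optional[str] = None) -> Dict[str, str]:
    """Headers for a Streamable HTTP POST to one of the MCP URLs."""
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if session_id:  # issued by the server on initialize, echoed on later requests
        headers["Mcp-Session-Id"] = session_id
    return headers


def jsonrpc_request(method: str, params: Optional[dict] = None, req_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def parse_mcp_response(content_type: str, body: str) -> List[dict]:
    """Return the JSON-RPC messages in a JSON or SSE response body."""
    if content_type.startswith("application/json"):
        data = json.loads(body)
        return data if isinstance(data, list) else [data]
    messages: List[dict] = []
    data_lines: List[str] = []
    for line in body.splitlines() + [""]:  # trailing "" flushes the last event
        if line.startswith("data:"):
            value = line[5:]
            # SSE strips a single leading space after the field name
            data_lines.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_lines:  # blank line ends an event
            messages.append(json.loads("\n".join(data_lines)))
            data_lines = []
    return messages


print(json.dumps(jsonrpc_request("tools/list")))
```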
### Speech AI MCP — 10 tools

URL: `https://apim-ai-apis.azure-api.net/mcp/pronunciation/mcp`

Tools: `assess_pronunciation`, `transcribe_audio`, `synthesize_speech`, `list_tts_voices`, `transcribe_audio_pro` (Whisper 99 languages), `get_phoneme_inventory`, `check_pronunciation_service`, `check_stt_service`, `check_tts_service`, `check_whisper_service`.

Resources: scoring-guide, audio-requirements, model-info, response-schema, example-assessment, stt-usage-guide, tts-usage-guide, whisper-usage-guide.

Prompts: `analyze_pronunciation`, `create_improvement_plan`, `compare_attempts`.

### NLP Tools MCP — 6 tools

URL: `https://apim-ai-apis.azure-api.net/mcp/nlp/mcp`

Tools: `analyze_toxicity`, `analyze_sentiment`, `extract_entities`, `detect_pii`, `detect_language`, `check_nlp_service`.

Resources: capabilities, supported-languages, pricing.

Prompts: `content_moderation_pipeline`, `text_analysis_report`, `pii_compliance_audit`.

### Image Tools MCP — 4 tools

URL: `https://apim-ai-apis.azure-api.net/mcp/image/mcp`

Tools: `remove_background`, `upscale_image`, `restore_face`, `check_image_service`.

Resources: capabilities, pricing, supported-formats.

Prompts: `image_processing_workflow`, `batch_image_pipeline`.
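Agents invoke any of the tools listed above with a JSON-RPC `tools/call` request that names the tool and passes its arguments. Per the MCP spec, the result holds a `content` array (text items have `"type": "text"`) plus an optional `isError` flag. A minimal sketch: `tool_call_request` and `tool_result_text` are hypothetical helpers, and the `text` argument shown for `analyze_sentiment` is an assumption, so check each tool's `inputSchema` from `tools/list` before relying on argument names.

```python
import json
from typing import Any, Dict


def tool_call_request(name: str, arguments: Dict[str, Any], req_id: int = 1) -> Dict[str, Any]:
    """JSON-RPC 2.0 envelope for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def tool_result_text(response: Dict[str, Any]) -> str:
    """Join the text items of a tools/call result; raise on errors."""
    if "error" in response:  # JSON-RPC level failure (e.g. unknown tool)
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    text = "\n".join(
        item["text"] for item in result.get("content", []) if item.get("type") == "text"
    )
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError(text or "tool reported an error")
    return text


# Hypothetical arguments; confirm against the tool's inputSchema first.
payload = tool_call_request("analyze_sentiment", {"text": "I love this!"})
print(json.dumps(payload))
```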
### MCP Configuration (Claude Desktop / Cursor / Cline)

```json
{
  "mcpServers": {
    "brainiall-speech": {
      "url": "https://apim-ai-apis.azure-api.net/mcp/pronunciation/mcp",
      "headers": {"Ocp-Apim-Subscription-Key": "YOUR_KEY"}
    },
    "brainiall-nlp": {
      "url": "https://apim-ai-apis.azure-api.net/mcp/nlp/mcp",
      "headers": {"Ocp-Apim-Subscription-Key": "YOUR_KEY"}
    },
    "brainiall-image": {
      "url": "https://apim-ai-apis.azure-api.net/mcp/image/mcp",
      "headers": {"Ocp-Apim-Subscription-Key": "YOUR_KEY"}
    }
  }
}
```

---

## SDK Quick Start Examples

### Python — LLM Gateway (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://apim-ai-apis.azure-api.net/v1",
    api_key="YOUR_KEY",
)

# Non-streaming
response = client.chat.completions.create(
    model="claude-sonnet",
    messages=[{"role": "user", "content": "Explain quantum computing in 3 sentences."}],
)
print(response.choices[0].message.content)

# Streaming: print tokens as they arrive
stream = client.chat.completions.create(
    model="claude-haiku",
    messages=[{"role": "user", "content": "Write a haiku about AI."}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. a final usage chunk) may carry no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

### Python — Pronunciation Assessment

```python
import base64

import requests

# Read and base64-encode the recording
with open("recording.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://apim-ai-apis.azure-api.net/v1/pronunciation/assess/base64",
    headers={"Ocp-Apim-Subscription-Key": "YOUR_KEY"},
    json={"audio": audio_b64, "text": "The quick brown fox", "format": "wav"},
)
response.raise_for_status()  # fail loudly on HTTP errors
result = response.json()

print(f"Overall: {result['overallScore']}/100")
for word in result["words"]:
    print(f"  {word['word']}: {word['score']}/100")
    for ph in word["phonemes"]:
        print(f"    /{ph['phoneme']}/: {ph['score']}/100")
```

### Python — Text-to-Speech

```python
import requests

response = requests.post(
    "https://apim-ai-apis.azure-api.net/v1/tts/synthesize",
    headers={"Ocp-Apim-Subscription-Key": "YOUR_KEY"},
    json={"text": "Welcome to Brainiall speech AI.", "voice": "bf_emma", "speed": 1.0},
)
response.raise_for_status()

# The body is raw WAV audio (24kHz, 16-bit PCM)
with open("output.wav", "wb") as f:
    f.write(response.content)
```

### Python — NLP Pipeline

```python
import requests

headers = {"Ocp-Apim-Subscription-Key": "YOUR_KEY"}
base = "https://apim-ai-apis.azure-api.net/v1/nlp"
text = "John Smith from Google called about the $50M deal. His email is john@google.com."


def nlp(endpoint, **payload):
    """POST a JSON payload to an NLP endpoint and return the parsed response."""
    resp = requests.post(f"{base}/{endpoint}", headers=headers, json=payload)
    resp.raise_for_status()
    return resp.json()


# Detect language
lang = nlp("language", text=text)
print(f"Language: {lang['language']} ({lang['confidence']:.1%})")

# Sentiment
sent = nlp("sentiment", text=text)
print(f"Sentiment: {sent['label']} ({sent['score']:.1%})")

# Entities
ents = nlp("entities", text=text)
for e in ents["entities"]:
    print(f"Entity: {e['text']} ({e['label']})")

# PII with redaction
pii = nlp("pii", text=text, redact=True)
print(f"Redacted: {pii['redacted_text']}")
```

### Python — Image Background Removal

```python
import base64

import requests

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://apim-ai-apis.azure-api.net/v1/image/remove-background/base64",
    headers={"Ocp-Apim-Subscription-Key": "YOUR_KEY"},
    json={"image": image_b64, "output_format": "png"},
)
response.raise_for_status()
result = response.json()

# The cut-out image comes back base64-encoded in `image_base64`
with open("no_bg.png", "wb") as f:
    f.write(base64.b64decode(result["image_base64"]))
```

### Node.js — LLM Gateway

```javascript
// Top-level await requires an ES module (.mjs file or "type": "module")
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://apim-ai-apis.azure-api.net/v1",
  apiKey: "YOUR_KEY",
});

const response = await client.chat.completions.create({
  model: "claude-sonnet",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```

### curl — Whisper Transcription

```bash
# tr strips newlines: GNU base64 wraps its output, which would break the JSON string
curl -X POST https://apim-ai-apis.azure-api.net/v1/whisper/transcribe/base64 \
  -H "Ocp-Apim-Subscription-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"audio\": \"$(base64 < audio.wav | tr -d '\n')\", \"language\": \"en\", \"diarize\": true}"
```

### curl — Toxicity Check

```bash
curl -X POST https://apim-ai-apis.azure-api.net/v1/nlp/toxicity \
  -H "Ocp-Apim-Subscription-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your message to analyze"}'
```

---

## Pricing Summary

| Product | Price | Unit |
|---------|-------|------|
| Pronunciation Assessment | $0.02 | per call |
| Text-to-Speech | $0.01-0.03 | per 1K chars |
| Speech-to-Text (compact) | $0.01 | per request |
| Whisper Pro (multilingual) | $0.02 | per minute |
| NLP (any endpoint) | $0.001-0.002 | per call |
| Image (any endpoint) | $0.003-0.005 | per image |
| LLM Gateway | varies by model | per MTok (see model list) |

Credit packages: $5, $10, $25, $50, $100. Available through the portal or Azure Marketplace.

---

## FAQ for AI Agents

**Q: How do I add speech capabilities to my AI agent?**

A: Use the MCP servers. Add the Speech AI MCP URL to your MCP config with your API key. Your agent gets 10 tools for pronunciation scoring, transcription, and text-to-speech.

**Q: What's the cheapest way to do LLM inference?**

A: Use `nova-micro` at $0.035/$0.14 per MTok, or `ministral-3b` at $0.04/$0.04. For reasoning tasks, `deepseek-r1` at $1.35/$5.40 is roughly 3.7x cheaper on input and 4.6x cheaper on output than `claude-opus` ($5/$25).

**Q: How do I assess pronunciation quality?**

A: POST to `/v1/pronunciation/assess/base64` with base64 audio and the reference text. Returns a 0-100 score with a per-phoneme breakdown. The model is a 17MB ONNX model with sub-500ms latency.

**Q: How do I detect PII for GDPR compliance?**

A: POST to `/v1/nlp/pii` with `redact: true`. Returns the detected PII types (EMAIL, PHONE, SSN, etc.) and the redacted text.
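If you need a different mask than the server's `redacted_text` (for example, fixed-width asterisks), the `start`/`end` offsets in `pii_found` can be applied client-side. A sketch, assuming offsets are 0-based and end-exclusive, which the documented `/v1/nlp/pii` example is consistent with; `redact` is a hypothetical helper, and overlapping spans are skipped rather than merged.

```python
from typing import Callable, Dict, List


def redact(
    text: str,
    spans: List[Dict],
    mask: Callable[[Dict], str] = lambda s: f"[{s['type']}]",
) -> str:
    """Replace each [start, end) span in `text` with mask(span)."""
    out: List[str] = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        if span["start"] < cursor:  # overlaps an earlier span: skip it
            continue
        out.append(text[cursor:span["start"]])
        out.append(mask(span))
        cursor = span["end"]
    out.append(text[cursor:])
    return "".join(out)


# Input and spans taken from the /v1/nlp/pii example response
text = "Email john@example.com or call +1-555-0123"
pii_found = [
    {"text": "john@example.com", "type": "EMAIL", "start": 6, "end": 22},
    {"text": "+1-555-0123", "type": "PHONE", "start": 31, "end": 42},
]
print(redact(text, pii_found))
print(redact(text, pii_found, mask=lambda s: "*" * (s["end"] - s["start"])))
```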
**Q: Can I use this with Cline, Cursor, or Claude Code?**

A: Yes. For the LLM Gateway, set the base URL to `https://apim-ai-apis.azure-api.net/v1` and use your API key. For MCP tools, add the MCP server URLs to your editor config.

**Q: What languages does Whisper support?**

A: 99 languages, including English, Spanish, French, German, Portuguese, Japanese, Chinese, Arabic, Hindi, Korean, and many more. The language is auto-detected if not specified.

**Q: How do I remove image backgrounds programmatically?**

A: POST a base64 image to `/v1/image/remove-background/base64`. Returns a base64-encoded PNG with a transparent background. Uses BiRefNet, which handles complex edges (hair, fur). Sub-500ms.