# langchain-brainiall: Full Documentation

## Overview

langchain-brainiall is a LangChain integration package for the Brainiall LLM Gateway. It provides the `ChatBrainiall` (chat models) and `BrainiallEmbeddings` (embeddings) classes, which give LangChain users access to 113+ AI models from 17 providers (Anthropic, DeepSeek, Meta, Qwen, Mistral, Amazon, NVIDIA, MiniMax, Moonshot, and more) through a single OpenAI-compatible API.

Key benefits:

- **One API key, 113+ models**: Access Claude, DeepSeek, Llama, Qwen, Mistral, Nova, and more
- **Drop-in replacement**: Swap `ChatOpenAI` for `ChatBrainiall`; only the import, class name, model name, and API key change
- **Full LangChain compatibility**: Streaming, tool calling, structured output, async, batching, RAG, agents
- **Cost optimization**: Use cheap models (from $0.035/MTok input) for drafting, powerful models ($5/MTok input) for refinement
- **Enterprise-grade reliability**: Response caching, automatic failover, low-latency routing

## Installation

```bash
pip install langchain-brainiall
```

For development with all extras:

```bash
pip install langchain-brainiall langchain-chroma faiss-cpu langgraph
```

## Quick Start

```python
from langchain_brainiall import ChatBrainiall

llm = ChatBrainiall(
    model="claude-sonnet-4-6",
    api_key="your-api-key",  # or set BRAINIALL_API_KEY env var
)

response = llm.invoke("Explain quantum computing in one sentence.")
print(response.content)
```

## Environment Variables

| Variable | Description |
|----------|-------------|
| `BRAINIALL_API_KEY` | API key for authentication (required) |
| `BRAINIALL_API_BASE` | Override the default API base URL (optional) |

## ChatBrainiall

Thin wrapper around `ChatOpenAI` that pre-configures the Brainiall endpoint. All `ChatOpenAI` features are supported: streaming, tool calling, structured output, multi-modal input, async, batching, and more.
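To make the "pre-configures the endpoint" idea concrete, here is a minimal, standard-library-only sketch of how such a wrapper might resolve its settings. It is not the package's actual implementation. The precedence order (explicit argument, then environment variable, then default), the `resolve_settings` helper, the `GatewaySettings` class, and the placeholder default URL are all assumptions made for illustration; only the two environment variable names come from this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical placeholder: the real default gateway URL is not stated in this document.
DEFAULT_API_BASE = "https://gateway.example.invalid/v1"


@dataclass(frozen=True)
class GatewaySettings:
    """Resolved connection settings (illustrative type, not part of the package)."""
    api_key: str
    base_url: str


def resolve_settings(
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> GatewaySettings:
    """Resolve settings with assumed precedence: argument > env var > default."""
    env = os.environ if env is None else env

    # The API key has no default, so fail early with a clear message.
    key = api_key or env.get("BRAINIALL_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set BRAINIALL_API_KEY")

    # The base URL is optional and falls back to the (placeholder) default.
    url = base_url or env.get("BRAINIALL_API_BASE") or DEFAULT_API_BASE
    return GatewaySettings(api_key=key, base_url=url)


if __name__ == "__main__":
    settings = resolve_settings(env={"BRAINIALL_API_KEY": "demo-key"})
    print(settings.base_url)
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.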
### Basic Usage

```python
from langchain_brainiall import ChatBrainiall

llm = ChatBrainiall(
    model="claude-sonnet-4-6",
    temperature=0,
    max_tokens=1024,
    # api_key="...",  # or set BRAINIALL_API_KEY env var
)

# Simple invocation
response = llm.invoke("What is the capital of France?")
print(response.content)

# With message history
messages = [
    ("system", "You are a helpful math tutor."),
    ("human", "What is the derivative of x^3?"),
]
response = llm.invoke(messages)
print(response.content)
```

### Streaming

```python
from langchain_brainiall import ChatBrainiall

llm = ChatBrainiall(model="claude-sonnet-4-6")

for chunk in llm.stream("Write a haiku about programming"):
    print(chunk.content, end="", flush=True)
```

### Async Support

```python
import asyncio

from langchain_brainiall import ChatBrainiall


async def main():
    llm = ChatBrainiall(model="claude-haiku-4-5")

    # Async invoke
    response = await llm.ainvoke("Hello!")
    print(response.content)

    # Async streaming
    async for chunk in llm.astream("Tell me a joke"):
        print(chunk.content, end="", flush=True)


asyncio.run(main())
```

### Tool Calling

```python
from pydantic import BaseModel, Field

from langchain_brainiall import ChatBrainiall


class GetWeather(BaseModel):
    """Get current weather for a location."""

    location: str = Field(description="City name")
    unit: str = Field(default="celsius", description="Temperature unit")


class SearchDatabase(BaseModel):
    """Search the database for records."""

    query: str = Field(description="Search query")
    limit: int = Field(default=10, description="Max results")


llm = ChatBrainiall(model="claude-sonnet-4-6")
llm_with_tools = llm.bind_tools([GetWeather, SearchDatabase])

response = llm_with_tools.invoke("What's the weather in Tokyo and Paris?")
for tool_call in response.tool_calls:
    print(f"Tool: {tool_call['name']}, Args: {tool_call['args']}")
```

### Structured Output

```python
from pydantic import BaseModel

from langchain_brainiall import ChatBrainiall


class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    pros: list[str]
    cons: list[str]


llm = ChatBrainiall(model="claude-sonnet-4-6")
structured = llm.with_structured_output(MovieReview)

review = structured.invoke("Review the movie Inception")
print(f"{review.title}: {review.rating}/10")
print(f"Summary: {review.summary}")
print(f"Pros: {', '.join(review.pros)}")
print(f"Cons: {', '.join(review.cons)}")
```

### Multi-Model Chains

Use different models for different steps -- cheap models for drafting, powerful models for refinement:

```python
from langchain_brainiall import ChatBrainiall

fast = ChatBrainiall(model="nova-micro", temperature=0.7)
smart = ChatBrainiall(model="claude-opus-4-6", temperature=0)

# Draft with fast model ($0.035/$0.14 per MTok)
draft = fast.invoke("Write a product description for wireless earbuds")

# Refine with powerful model ($5/$25 per MTok)
final = smart.invoke(f"Improve this product description:\n{draft.content}")
print(final.content)
```

### RAG Pipeline with Prompt Templates

```python
from langchain_brainiall import ChatBrainiall
from langchain_core.prompts import ChatPromptTemplate

llm = ChatBrainiall(model="claude-sonnet-4-6", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question based only on the following context:\n\n{context}"),
    ("human", "{question}"),
])

chain = prompt | llm
response = chain.invoke({
    "context": "Python was created by Guido van Rossum in 1991. It emphasizes code readability.",
    "question": "Who created Python and when?",
})
print(response.content)
```

### With LangGraph Agents

```python
from datetime import datetime, timezone

from langchain_brainiall import ChatBrainiall
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent


@tool
def calculate(expression: str) -> str:
    """Calculate a mathematical expression."""
    # WARNING: eval() on model-generated input is unsafe. Builtins are disabled
    # here for the demo, but use a real expression parser in production.
    return str(eval(expression, {"__builtins__": {}}, {}))


@tool
def get_current_time() -> str:
    """Get the current UTC time."""
    # datetime.utcnow() is deprecated since Python 3.12; use an aware datetime.
    return datetime.now(timezone.utc).isoformat()


llm = ChatBrainiall(model="claude-sonnet-4-6")
agent = create_react_agent(llm, [calculate, get_current_time])

result = agent.invoke({"messages": [("human", "What is 25 * 48 + 137?")]})
for msg in result["messages"]:
    print(f"{msg.type}: {msg.content}")
```

### LCEL Chains (LangChain Expression Language)

```python
from langchain_brainiall import ChatBrainiall
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

llm = ChatBrainiall(model="claude-sonnet-4-6")

chain = (
    ChatPromptTemplate.from_template("Translate '{text}' to {language}.")
    | llm
    | StrOutputParser()
)

result = chain.invoke({"text": "Hello, how are you?", "language": "Spanish"})
print(result)
```

### Batch Processing

```python
from langchain_brainiall import ChatBrainiall

llm = ChatBrainiall(model="claude-haiku-4-5")

# Process multiple inputs at once
questions = [
    "What is machine learning?",
    "What is deep learning?",
    "What is reinforcement learning?",
]

# Batch invoke (runs concurrently)
responses = llm.batch(questions)
for q, r in zip(questions, responses):
    print(f"Q: {q}")
    print(f"A: {r.content[:100]}...\n")
```

## Advanced Usage

### Conversation Memory with RunnableWithMessageHistory

```python
from langchain_brainiall import ChatBrainiall
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

llm = ChatBrainiall(model="claude-sonnet-4-6")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Be concise."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])
chain = prompt | llm

# In-memory store for session histories
store = {}


def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]


with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# First message
config = {"configurable": {"session_id": "user-123"}}
response = with_history.invoke({"input": "My name is Alice"}, config=config)
print(response.content)

# Second message -- remembers context
response = with_history.invoke({"input": "What's my name?"}, config=config)
print(response.content)  # "Your name is Alice"
```

### Streaming with Callbacks

```python
from langchain_brainiall import ChatBrainiall
from langchain_core.callbacks import StreamingStdOutCallbackHandler

llm = ChatBrainiall(
    model="claude-sonnet-4-6",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

# Tokens are printed to stdout as they arrive
response = llm.invoke("Write a short story about a robot learning to paint")
```

### Fallback Chains

Use a cheaper model first, and fall back to a more powerful one if the primary call raises an error:

```python
from langchain_brainiall import ChatBrainiall

# Primary: fast and cheap
primary = ChatBrainiall(model="nova-micro", max_tokens=512)

# Fallback: more capable
fallback = ChatBrainiall(model="claude-sonnet-4-6", max_tokens=2048)

# Creates a chain that tries primary first, then fallback
chain = primary.with_fallbacks([fallback])

response = chain.invoke("Explain the theory of relativity in detail")
print(response.content)
```

### Router Chain: Dynamic Model Selection

```python
from langchain_brainiall import ChatBrainiall
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda

# Define specialized models
models = {
    "code": ChatBrainiall(model="deepseek-v3", temperature=0),
    "creative": ChatBrainiall(model="claude-sonnet-4-6", temperature=0.9),
    "fast": ChatBrainiall(model="nova-micro", temperature=0.3),
    "reasoning": ChatBrainiall(model="deepseek-r1", temperature=0),
}

# Router that picks the best model
router_llm = ChatBrainiall(model="claude-haiku-4-5", temperature=0)
router_prompt = ChatPromptTemplate.from_template(
    "Classify this task into exactly one category: code, creative, fast, reasoning.\n"
    "Task: {input}\nCategory:"
)
router_chain = router_prompt | router_llm | StrOutputParser()


def route(info):
    # Fall back to the "fast" model if the router returns an unknown category
    category = info["category"].strip().lower()
    model = models.get(category, models["fast"])
    return model.invoke(info["input"])


# Full pipeline
chain = (
    {"input": lambda x: x, "category": router_chain}
    | RunnableLambda(route)
)

# Automatically routes to the right model; route() returns an AIMessage
print(chain.invoke("Write a Python function to merge two sorted lists").content)
print(chain.invoke("Write a poem about autumn leaves").content)
```

### Parallel Execution with RunnableParallel

```python
from langchain_brainiall import ChatBrainiall
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel

llm = ChatBrainiall(model="claude-haiku-4-5")

# Run multiple chains in parallel
parallel = RunnableParallel(
    summary=ChatPromptTemplate.from_template("Summarize: {text}") | llm | StrOutputParser(),
    sentiment=ChatPromptTemplate.from_template("What is the sentiment of: {text}") | llm | StrOutputParser(),
    keywords=ChatPromptTemplate.from_template("Extract 5 keywords from: {text}") | llm | StrOutputParser(),
)

result = parallel.invoke({"text": "LangChain makes it easy to build AI applications with composable chains."})
print(f"Summary: {result['summary']}")
print(f"Sentiment: {result['sentiment']}")
print(f"Keywords: {result['keywords']}")
```

### Output Parsing with Pydantic

```python
from pydantic import BaseModel, Field

from langchain_brainiall import ChatBrainiall
from langchain_core.prompts import ChatPromptTemplate


class Recipe(BaseModel):
    name: str = Field(description="Recipe name")
    ingredients: list[str] = Field(description="List of ingredients")
    steps: list[str] = Field(description="Cooking steps")
    prep_time_minutes: int = Field(description="Preparation time in minutes")
    difficulty: str = Field(description="easy, medium, or hard")


llm = ChatBrainiall(model="claude-sonnet-4-6")
structured = llm.with_structured_output(Recipe)

prompt = ChatPromptTemplate.from_template(
    "Create a recipe for {dish}. Use common ingredients."
)
chain = prompt | structured

recipe = chain.invoke({"dish": "pasta carbonara"})
print(f"{recipe.name} ({recipe.difficulty}, {recipe.prep_time_minutes} min)")
for i, step in enumerate(recipe.steps, 1):
    print(f"  {i}. {step}")
```

### Multi-Modal Input (Vision)

```python
from langchain_brainiall import ChatBrainiall
from langchain_core.messages import HumanMessage

llm = ChatBrainiall(model="claude-sonnet-4-6")

message = HumanMessage(
    content=[
        {"type": "text", "text": "What do you see in this image? Describe in detail."},
        {
            "type": "image_url",
            "image_url": {
                "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/320px-Camponotus_flavomarginatus_ant.jpg"
            },
        },
    ],
)

response = llm.invoke([message])
print(response.content)
```

### JSON Mode

```python
import json

from langchain_brainiall import ChatBrainiall

llm = ChatBrainiall(
    model="claude-sonnet-4-6",
    model_kwargs={"response_format": {"type": "json_object"}},
)

response = llm.invoke(
    "Extract structured data as JSON: John Smith, 35, Senior Engineer at Google in Mountain View"
)

data = json.loads(response.content)
print(json.dumps(data, indent=2))
```

## RAG Patterns

### Full RAG Pipeline with ChromaDB

```python
from langchain_brainiall import ChatBrainiall, BrainiallEmbeddings
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Initialize
embeddings = BrainiallEmbeddings(model="bge-m3")
llm = ChatBrainiall(model="claude-sonnet-4-6", temperature=0)

# Load documents
docs = [
    Document(page_content="Python was created in 1991 by Guido van Rossum.", metadata={"source": "wiki"}),
    Document(page_content="JavaScript was created in 1995 by Brendan Eich.", metadata={"source": "wiki"}),
    Document(page_content="Rust was first released in 2010 by Mozilla.", metadata={"source": "wiki"}),
    Document(page_content="Go was designed at Google and released in 2009.", metadata={"source": "wiki"}),
    Document(page_content="TypeScript was developed by Microsoft and released in 2012.", metadata={"source": "wiki"}),
]

# Create vector store
db = Chroma.from_documents(docs, embeddings)
retriever = db.as_retriever(search_kwargs={"k": 3})

# RAG prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based on the context below. If unsure, say so.\n\nContext:\n{context}"),
    ("human", "{question}"),
])


def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)


# RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Query
answer = rag_chain.invoke("Which programming languages were created by individuals vs companies?")
print(answer)
```

### RAG with Document Loaders and Text Splitting

```python
from langchain_brainiall import ChatBrainiall, BrainiallEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load and split documents
loader = TextLoader("my_document.txt")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents(docs)

# Create vector store
embeddings = BrainiallEmbeddings(model="bge-m3")
db = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")
retriever = db.as_retriever(search_type="mmr", search_kwargs={"k": 5, "fetch_k": 10})

# RAG chain with sources
llm = ChatBrainiall(model="claude-sonnet-4-6", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question based on the context. Cite sources.\n\nContext:\n{context}"),
    ("human", "{question}"),
])


def format_docs_with_sources(docs):
    return "\n\n".join(
        f"[Source: {doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
        for doc in docs
    )


rag_chain = (
    {"context": retriever | format_docs_with_sources, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

answer = rag_chain.invoke("What are the main topics covered in the document?")
print(answer)
```

### Conversational RAG with Memory

```python
from langchain_brainiall import ChatBrainiall, BrainiallEmbeddings
from langchain_chroma import Chroma
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

embeddings = BrainiallEmbeddings(model="bge-m3")
llm = ChatBrainiall(model="claude-sonnet-4-6", temperature=0)

# Assume db is already populated
db = Chroma(embedding_function=embeddings, persist_directory="./chroma_db")
retriever = db.as_retriever(search_kwargs={"k": 3})

# Contextualize question using chat history
contextualize_prompt = ChatPromptTemplate.from_messages([
    ("system", "Given the chat history and latest question, reformulate the question to be standalone."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])
contextualize_chain = contextualize_prompt | llm | StrOutputParser()

# Answer with context
answer_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based on context:\n\n{context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])


def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)


# Full conversational RAG
chat_history = []


def ask(question: str) -> str:
    # Contextualize if there's history
    if chat_history:
        standalone = contextualize_chain.invoke({
            "chat_history": chat_history,
            "input": question,
        })
    else:
        standalone = question

    # Retrieve and answer
    docs = retriever.invoke(standalone)
    context = format_docs(docs)
    answer = (answer_prompt | llm | StrOutputParser()).invoke({
        "context": context,
        "chat_history": chat_history,
        "input": question,
    })

    # Update history
    chat_history.append(HumanMessage(content=question))
    chat_history.append(AIMessage(content=answer))
    return answer


# Multi-turn conversation
print(ask("What is Python?"))
print(ask("Who created it?"))  # Understands "it" = Python
print(ask("When was that?"))  # Understands "that" = creation date
```

## BrainiallEmbeddings

Thin wrapper around `OpenAIEmbeddings` that pre-configures the Brainiall endpoint for embedding model access.

### Basic Usage

```python
from langchain_brainiall import BrainiallEmbeddings

embeddings = BrainiallEmbeddings(
    model="bge-m3",
    api_key="your-api-key",
)

# Embed a single query
vector = embeddings.embed_query("What is machine learning?")
print(f"Dimensions: {len(vector)}")

# Embed multiple documents
vectors = embeddings.embed_documents([
    "Machine learning is a subset of AI.",
    "Deep learning uses neural networks.",
    "NLP processes human language.",
])
print(f"Embedded {len(vectors)} documents, each with {len(vectors[0])} dimensions")
```

### With Vector Store (ChromaDB)

```python
from langchain_brainiall import BrainiallEmbeddings
from langchain_chroma import Chroma
from langchain_core.documents import Document

embeddings = BrainiallEmbeddings(model="bge-m3")

# Create vector store
docs = [
    Document(page_content="Python was created in 1991 by Guido van Rossum."),
    Document(page_content="JavaScript was created in 1995 by Brendan Eich."),
    Document(page_content="Rust was first released in 2010 by Mozilla."),
]
db = Chroma.from_documents(docs, embeddings)

# Query
results = db.similarity_search("Who created Rust?", k=1)
print(results[0].page_content)
```

### With FAISS Vector Store

```python
from langchain_brainiall import BrainiallEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

embeddings = BrainiallEmbeddings(model="titan-embed-v2")

docs = [
    Document(page_content="Neural networks are inspired by biological neurons."),
    Document(page_content="Gradient descent optimizes model parameters."),
    Document(page_content="Transformers use self-attention mechanisms."),
]

# Create FAISS index
db = FAISS.from_documents(docs, embeddings)

# Save and load (only deserialize index files you created yourself)
db.save_local("faiss_index")
loaded_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)

# Similarity search with scores
results = loaded_db.similarity_search_with_score("How do transformers work?", k=2)
for doc, score in results:
    print(f"Score: {score:.4f} - {doc.page_content}")
```

### Document Similarity Comparison

```python
import numpy as np

from langchain_brainiall import BrainiallEmbeddings

embeddings = BrainiallEmbeddings(model="bge-m3")

texts = [
    "Machine learning automates analytical model building",
    "Deep learning is a subset of machine learning",
    "The weather today is sunny and warm",
]
vectors = embeddings.embed_documents(texts)


# Cosine similarity
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


for i in range(len(texts)):
    for j in range(i + 1, len(texts)):
        sim = cosine_similarity(vectors[i], vectors[j])
        print(f"Similarity({i},{j}): {sim:.4f} - '{texts[i][:40]}' vs '{texts[j][:40]}'")
```

## Agent Patterns

### Multi-Tool Agent with LangGraph

```python
import json

from langchain_brainiall import ChatBrainiall
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent


@tool
def search_products(query: str, max_results: int = 5) -> str:
    """Search for products in the catalog."""
    # Simulated product search (the query is ignored in this demo)
    products = [
        {"name": "Wireless Mouse", "price": 29.99, "rating": 4.5},
        {"name": "Mechanical Keyboard", "price": 89.99, "rating": 4.8},
        {"name": "USB-C Hub", "price": 45.99, "rating": 4.2},
    ]
    return json.dumps(products[:max_results])


@tool
def calculate_discount(price: float, discount_percent: float) -> str:
    """Calculate the discounted price."""
    discounted = price * (1 - discount_percent / 100)
    return f"Original: ${price:.2f}, Discount: {discount_percent}%, Final: ${discounted:.2f}"


@tool
def check_inventory(product_name: str) -> str:
    """Check if a product is in stock."""
    # Simulated inventory check
    return f"{product_name} is in stock. 42 units available."


llm = ChatBrainiall(model="claude-sonnet-4-6")
agent = create_react_agent(llm, [search_products, calculate_discount, check_inventory])

result = agent.invoke({
    "messages": [("human", "Find keyboards under $100 and apply a 15% discount")]
})
for msg in result["messages"]:
    if msg.content:
        print(f"{msg.type}: {msg.content}")
```

### Supervisor Agent Pattern

```python
from langchain_brainiall import ChatBrainiall
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Specialized workers
researcher = ChatBrainiall(model="claude-sonnet-4-6", temperature=0)
writer = ChatBrainiall(model="claude-sonnet-4-6", temperature=0.7)
editor = ChatBrainiall(model="claude-haiku-4-5", temperature=0)

# Supervisor model. It is defined here but not invoked: this simplified example
# hard-codes the research -> write -> edit order instead of letting it decide.
supervisor = ChatBrainiall(model="claude-opus-4-6", temperature=0)


def research_and_write(topic: str) -> str:
    # Step 1: Research
    research_prompt = ChatPromptTemplate.from_template(
        "Research the topic '{topic}' and provide 5 key facts with sources."
    )
    research = (research_prompt | researcher | StrOutputParser()).invoke({"topic": topic})

    # Step 2: Write
    write_prompt = ChatPromptTemplate.from_template(
        "Write a 300-word article based on these facts:\n{facts}"
    )
    draft = (write_prompt | writer | StrOutputParser()).invoke({"facts": research})

    # Step 3: Edit
    edit_prompt = ChatPromptTemplate.from_template(
        "Edit this article for clarity, grammar, and flow. Return the improved version:\n{draft}"
    )
    final = (edit_prompt | editor | StrOutputParser()).invoke({"draft": draft})
    return final


article = research_and_write("the impact of quantum computing on cryptography")
print(article)
```

## Cost Optimization Patterns

### Model Tiering by Task Complexity

```python
from langchain_brainiall import ChatBrainiall

# Tier 1: Ultra-cheap for simple tasks ($0.035/$0.14 per MTok)
tier1 = ChatBrainiall(model="nova-micro", temperature=0)

# Tier 2: Balanced for moderate tasks ($1.00/$5.00 per MTok)
tier2 = ChatBrainiall(model="claude-haiku-4-5", temperature=0)

# Tier 3: Premium for complex tasks ($3.00/$15.00 per MTok)
tier3 = ChatBrainiall(model="claude-sonnet-4-6", temperature=0)

# Tier 4: Best quality for critical tasks ($5.00/$25.00 per MTok)
tier4 = ChatBrainiall(model="claude-opus-4-6", temperature=0)

# Use the right model for each task
classification = tier1.invoke("Is this positive or negative: 'I love this product'")
summary = tier2.invoke("Summarize this paragraph: ...")
analysis = tier3.invoke("Analyze the legal implications of this contract clause: ...")
strategy = tier4.invoke("Design a go-to-market strategy for a new SaaS product targeting enterprise...")
```

### Batch Processing for Cost Efficiency

```python
import asyncio

from langchain_brainiall import ChatBrainiall

llm = ChatBrainiall(model="claude-haiku-4-5")

# Placeholder inputs; replace with your own list of texts
texts_list = ["I love this product", "The delivery was late and the box was damaged"]

# Build one prompt per item
items = [f"Classify this text as positive/negative: '{text}'" for text in texts_list]

# Batch with concurrency control
responses = llm.batch(
    items,
    config={"max_concurrency": 10},  # Limit concurrent requests
)


# Or async batch for even better performance
async def process_async():
    responses = await llm.abatch(
        items,
        config={"max_concurrency": 20},
    )
    return responses


results = asyncio.run(process_async())
```

## LangServe Deployment

### Serve ChatBrainiall as a REST API

```python
# server.py
from fastapi import FastAPI
from langserve import add_routes

from langchain_brainiall import ChatBrainiall
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

app = FastAPI(title="Brainiall LangServe", version="1.0")

# Simple chat endpoint
llm = ChatBrainiall(model="claude-sonnet-4-6")
add_routes(app, llm, path="/chat")

# Custom chain endpoint
prompt = ChatPromptTemplate.from_template("Translate to {language}: {text}")
chain = prompt | llm | StrOutputParser()
add_routes(app, chain, path="/translate")

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
```

### Client Usage

```python
from langserve import RemoteRunnable

# Connect to the LangServe endpoint
chain = RemoteRunnable("http://localhost:8000/translate")

# Invoke remotely
result = chain.invoke({"text": "Hello, world!", "language": "French"})
print(result)

# Stream remotely
for chunk in chain.stream({"text": "Tell me a story", "language": "Spanish"}):
    print(chunk, end="", flush=True)
```

## Migration Guides

### From langchain-openai

```python
# Before (langchain-openai)
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-4o", api_key="sk-...")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", api_key="sk-...")

# After (langchain-brainiall) -- drop-in replacement
from langchain_brainiall import ChatBrainiall, BrainiallEmbeddings

llm = ChatBrainiall(model="claude-sonnet-4-6", api_key="br-...")
embeddings = BrainiallEmbeddings(model="bge-m3", api_key="br-...")

# All existing chains, agents, and tools work unchanged
# because ChatBrainiall extends ChatOpenAI
```

### From langchain-anthropic

```python
# Before (langchain-anthropic)
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-sonnet-4-6", anthropic_api_key="sk-ant-...")

# After (langchain-brainiall) -- same models, lower cost
from langchain_brainiall import ChatBrainiall

llm = ChatBrainiall(model="claude-sonnet-4-6", api_key="br-...")

# Benefit: Access to 113+ models with a single API key
# Plus: automatic response caching and failover
```

### From Multiple Providers to Single Gateway

```python
# Before: Managing multiple API keys and packages
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI

gpt = ChatOpenAI(model="gpt-4o", api_key="sk-openai-...")
claude = ChatAnthropic(model="claude-sonnet-4-6", anthropic_api_key="sk-ant-...")
# gemini = ChatGoogleGenerativeAI(model="gemini-pro", google_api_key="AIza...")

# After: One package, one API key, 113+ models
from langchain_brainiall import ChatBrainiall

claude = ChatBrainiall(model="claude-sonnet-4-6")  # Anthropic
deepseek = ChatBrainiall(model="deepseek-r1")      # DeepSeek
llama = ChatBrainiall(model="llama-3.3-70b")       # Meta
qwen = ChatBrainiall(model="qwen-3-235b")          # Qwen/Alibaba
mistral = ChatBrainiall(model="mistral-large-3")   # Mistral
nova = ChatBrainiall(model="nova-pro")             # Amazon

# All use the same BRAINIALL_API_KEY environment variable
```

## Available Chat Models

| Model | Provider | Context | Max Output | Input $/MTok | Output $/MTok |
|-------|----------|---------|------------|-------------|--------------|
| claude-opus-4-6 | Anthropic | 200K | 64K | $5.00 | $25.00 |
| claude-opus-4-6-1m | Anthropic | 1M | 64K | $5.00 | $25.00 |
| claude-opus-4-5 | Anthropic | 200K | 32K | $15.00 | $75.00 |
| claude-sonnet-4-6 | Anthropic | 200K | 64K | $3.00 | $15.00 |
| claude-sonnet-4-6-1m | Anthropic | 1M | 64K | $3.00 | $15.00 |
| claude-haiku-4-5 | Anthropic | 200K | 16K | $1.00 | $5.00 |
| claude-3-opus | Anthropic | 200K | 4K | $15.00 | $75.00 |
| deepseek-r1 | DeepSeek | 128K | 64K | $1.35 | $5.40 |
| deepseek-v3 | DeepSeek | 128K | 16K | $0.27 | $1.10 |
| llama-3.3-70b | Meta | 128K | 4K | $0.72 | $0.72 |
| llama-4-scout-17b | Meta | 1M | 16K | $0.17 | $0.17 |
| llama-4-maverick-17b | Meta | 1M | 16K | $0.20 | $0.60 |
| qwen-3-235b | Qwen | 128K | 16K | $0.80 | $2.40 |
| qwen-3-32b | Qwen | 128K | 16K | $0.35 | $0.35 |
| qwen-3-8b | Qwen | 128K | 16K | $0.045 | $0.18 |
| qwen-3-80b | Qwen | 128K | 16K | $0.40 | $1.20 |
| mistral-large-3 | Mistral | 128K | 16K | $2.00 | $6.00 |
| mistral-small-3 | Mistral | 128K | 16K | $0.10 | $0.30 |
| nova-pro | Amazon | 300K | 5K | $0.80 | $3.20 |
| nova-lite | Amazon | 300K | 5K | $0.06 | $0.24 |
| nova-micro | Amazon | 128K | 5K | $0.035 | $0.14 |
| minimax-m2 | MiniMax | 1M | 128K | $0.50 | $2.20 |
| nemotron-ultra-253b | NVIDIA | 128K | 16K | $0.72 | $0.72 |
| kimi-k2.5 | Moonshot | 128K | 16K | $0.60 | $2.40 |

## Available Embedding Models

| Model | Dimensions | Max Tokens | Price $/MTok |
|-------|-----------|------------|-------------|
| bge-m3 | 1024 | 8192 | $0.02 |
| bge-large-en-v1.5 | 1024 | 512 | $0.02 |
| cohere-embed-v3 | 1024 | 512 | $0.10 |
| titan-embed-v2 | 1024 | 8192 | $0.02 |

## Class Reference

### ChatBrainiall

```python
class ChatBrainiall(ChatOpenAI):
    """
    Chat model for the Brainiall LLM Gateway.

    Parameters:
        model (str): Model name. Default: "claude-sonnet-4-6"
        api_key (str): API key. Falls back to BRAINIALL_API_KEY env var.
        base_url (str): API base URL. Default: Brainiall gateway.
        temperature (float): Sampling temperature 0-2.
        max_tokens (int): Max tokens to generate.
        max_retries (int): Max retries on failure. Default: 2.
        timeout (float): Request timeout in seconds.
        streaming (bool): Enable streaming mode.
        model_kwargs (dict): Additional model parameters (e.g., response_format).

    Class methods:
        get_available_models() -> list[str]: List available model names.
        get_model_info(model: str) -> dict: Get context/output info for a model.

    Inherited from ChatOpenAI:
        invoke(input) -> AIMessage
        stream(input) -> Iterator[AIMessageChunk]
        batch(inputs) -> list[AIMessage]
        ainvoke(input) -> AIMessage
        astream(input) -> AsyncIterator[AIMessageChunk]
        abatch(inputs) -> list[AIMessage]
        bind_tools(tools) -> Runnable
        with_structured_output(schema) -> Runnable
        with_fallbacks(fallbacks) -> RunnableWithFallbacks
    """
```

### BrainiallEmbeddings

```python
class BrainiallEmbeddings(OpenAIEmbeddings):
    """
    Embeddings model for the Brainiall LLM Gateway.

    Parameters:
        model (str): Embedding model name. Default: "bge-m3"
        api_key (str): API key. Falls back to BRAINIALL_API_KEY env var.
        base_url (str): API base URL. Default: Brainiall gateway.

    Class methods:
        get_available_models() -> list[str]: List available embedding models.

    Inherited from OpenAIEmbeddings:
        embed_query(text: str) -> list[float]
        embed_documents(texts: list[str]) -> list[list[float]]
        aembed_query(text: str) -> list[float]
        aembed_documents(texts: list[str]) -> list[list[float]]
    """
```

## Error Handling

```python
import openai

from langchain_brainiall import ChatBrainiall
from langchain_core.exceptions import OutputParserException

llm = ChatBrainiall(model="claude-sonnet-4-6", max_retries=3)

try:
    response = llm.invoke("Hello")
    print(response.content)
except openai.AuthenticationError:
    print("Invalid API key. Set BRAINIALL_API_KEY env var.")
except openai.RateLimitError:
    print("Rate limit exceeded. Reduce concurrency or upgrade plan.")
# APITimeoutError subclasses APIConnectionError, so it must be caught first
except openai.APITimeoutError:
    print("Request timed out. Increase timeout or reduce max_tokens.")
except openai.APIConnectionError:
    print("Cannot connect to API. Check network and base URL.")
except OutputParserException as e:
    print(f"Failed to parse structured output: {e}")
```

## Links

- Website: https://brainiall.com
- Get API Key: https://brainiall.com
- PyPI: https://pypi.org/project/langchain-brainiall/
- LLM Gateway: https://github.com/fasuizu-br/brainiall-llm-gateway
- Speech AI: https://github.com/fasuizu-br/speech-ai-examples
- NLP API: https://github.com/fasuizu-br/brainiall-nlp-api
- Image API: https://github.com/fasuizu-br/brainiall-image-api
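
## Worked Example: Estimating Per-Request Cost

The draft-then-refine and tiering patterns in this document rest on simple per-token arithmetic. The sketch below makes that arithmetic explicit so the trade-off can be checked for a given workload. The prices are copied from the chat model table in this document; the `estimate_cost` helper and the token counts are hypothetical values chosen for illustration, not measurements.

```python
# Prices in USD per million tokens (input, output), copied from the chat model table.
PRICES = {
    "nova-micro": (0.035, 0.14),
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-6": (5.00, 25.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: a 2,000-token prompt yields a 1,000-token draft,
# then a 3,000-token refinement prompt yields a 1,000-token final answer.
draft_cheap = estimate_cost("nova-micro", 2_000, 1_000)
draft_premium = estimate_cost("claude-opus-4-6", 2_000, 1_000)
refine = estimate_cost("claude-opus-4-6", 3_000, 1_000)

tiered = draft_cheap + refine          # cheap draft, premium refinement
single_model = draft_premium + refine  # premium model for both steps

print(f"Tiered:       ${tiered:.5f}")
print(f"Single model: ${single_model:.5f}")
```

Under these assumed token counts, the refinement step dominates the tiered total, so the saving comes almost entirely from the drafting step. Whether that is worthwhile depends on how much of the workload can be drafted by the cheaper model.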