---
name: huggingface-js
description: Runs ML models in the browser and Node.js with Transformers.js and Hugging Face Inference API. Use when adding local inference, embeddings, or calling hosted models without GPU servers.
---

# Hugging Face JavaScript

Run ML models locally with Transformers.js or via the Inference API. Supports text generation, embeddings, image classification, speech recognition, and more.

## Transformers.js (Local Inference)

Run models directly in the browser or Node.js using ONNX Runtime.

```bash
npm install @huggingface/transformers
```

### Text Generation

```typescript
import { pipeline } from '@huggingface/transformers';

const generator = await pipeline('text-generation', 'Xenova/gpt2');

const result = await generator('The quick brown fox', {
  max_new_tokens: 50,
});

console.log(result[0].generated_text);
```

### Text Classification (Sentiment)

```typescript
import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline(
  'text-classification',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english'
);

const result = await classifier('I love this product!');
// [{ label: 'POSITIVE', score: 0.9998 }]
```

### Embeddings

```typescript
import { pipeline } from '@huggingface/transformers';

const embedder = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2'
);

const result = await embedder('Hello, world!', {
  pooling: 'mean',
  normalize: true,
});

const embedding = Array.from(result.data);
// [0.123, -0.456, ...] - 384 dimensions
```
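A common use for these vectors is measuring how similar two texts are. The sketch below is an illustration built on the pipeline above, not library API: `embed` and `cosineSimilarity` are hypothetical helpers.

```typescript
import { pipeline } from '@huggingface/transformers';

const embedder = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2'
);

// Hypothetical helper: embed a string into a plain number array
async function embed(text: string): Promise<number[]> {
  const result = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(result.data);
}

// Hypothetical helper: cosine similarity of two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const a = await embed('How do I reset my password?');
const b = await embed('I forgot my login credentials.');
console.log(cosineSimilarity(a, b)); // closer to 1 = more similar
```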
### Question Answering

```typescript
import { pipeline } from '@huggingface/transformers';

const qa = await pipeline(
  'question-answering',
  'Xenova/distilbert-base-cased-distilled-squad'
);

// The pipeline takes the question and context as positional arguments
const result = await qa(
  'What is the capital of France?',
  'France is a country in Europe. Paris is the capital of France.'
);

console.log(result);
// { answer: 'Paris', score: 0.98 }
```

### Translation

```typescript
import { pipeline } from '@huggingface/transformers';

const translator = await pipeline(
  'translation',
  'Xenova/nllb-200-distilled-600M'
);

const result = await translator('Hello, how are you?', {
  src_lang: 'eng_Latn',
  tgt_lang: 'fra_Latn',
});

console.log(result[0].translation_text);
```

### Speech Recognition (Whisper)

```typescript
import { pipeline } from '@huggingface/transformers';

const transcriber = await pipeline(
  'automatic-speech-recognition',
  'Xenova/whisper-tiny.en'
);

const result = await transcriber('./audio.mp3');
console.log(result.text);
```

### Image Classification

```typescript
import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline(
  'image-classification',
  'Xenova/vit-base-patch16-224'
);

const result = await classifier('https://example.com/cat.jpg');
// [{ label: 'tabby cat', score: 0.95 }, ...]
```

### Object Detection

```typescript
import { pipeline } from '@huggingface/transformers';

const detector = await pipeline(
  'object-detection',
  'Xenova/detr-resnet-50'
);

const result = await detector('https://example.com/image.jpg');
// [{ label: 'cat', score: 0.98, box: { xmin, ymin, xmax, ymax } }, ...]
```

### Zero-Shot Classification

```typescript
import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline(
  'zero-shot-classification',
  'Xenova/bart-large-mnli'
);

const result = await classifier(
  'This is a tutorial about machine learning',
  ['education', 'politics', 'sports']
);

console.log(result);
// { labels: ['education', ...], scores: [0.95, ...] }
```

## Hugging Face Inference API

Call hosted models without local computation.

```bash
npm install @huggingface/inference
```

### Setup

```typescript
import { HfInference } from '@huggingface/inference';

const hf = new HfInference(process.env.HF_ACCESS_TOKEN);
```

### Text Generation

```typescript
const result = await hf.textGeneration({
  model: 'meta-llama/Llama-2-7b-chat-hf',
  inputs: 'What is the meaning of life?',
  parameters: {
    max_new_tokens: 100,
    temperature: 0.7,
  },
});

console.log(result.generated_text);
```

### Streaming Text Generation

```typescript
const stream = hf.textGenerationStream({
  model: 'meta-llama/Llama-2-7b-chat-hf',
  inputs: 'Tell me a story',
  parameters: {
    max_new_tokens: 200,
  },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.token.text);
}
```

### Chat Completion

```typescript
const result = await hf.chatCompletion({
  model: 'meta-llama/Llama-2-7b-chat-hf',
  messages: [
    { role: 'user', content: 'Hello!' },
  ],
  max_tokens: 100,
});

console.log(result.choices[0].message.content);
```

### Embeddings

```typescript
const result = await hf.featureExtraction({
  model: 'sentence-transformers/all-MiniLM-L6-v2',
  inputs: 'Hello, world!',
});

console.log(result); // embedding vector
```

### Image Generation

```typescript
import fs from 'node:fs';

const result = await hf.textToImage({
  model: 'stabilityai/stable-diffusion-2',
  inputs: 'A futuristic city at sunset',
  parameters: {
    negative_prompt: 'blurry, low quality',
  },
});

// result is a Blob
const buffer = Buffer.from(await result.arrayBuffer());
fs.writeFileSync('output.png', buffer);
```

### Image Classification

```typescript
import fs from 'node:fs'; // fs.openAsBlob requires Node.js 19.8+

const result = await hf.imageClassification({
  model: 'google/vit-base-patch16-224',
  data: await fs.openAsBlob('cat.jpg'),
});

console.log(result);
// [{ label: 'tabby cat', score: 0.95 }, ...]
```

### Speech Recognition

```typescript
import fs from 'node:fs';

const result = await hf.automaticSpeechRecognition({
  model: 'openai/whisper-large-v3',
  data: await fs.openAsBlob('audio.mp3'),
});

console.log(result.text);
```
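### Summarization

Summarization appears in the task table below but has no hosted example. A minimal sketch using the client's `summarization` method with the `hf` client from Setup; `facebook/bart-large-cnn` is a common summarization model on the Hub, and the exact `parameters` accepted vary by model:

```typescript
const article = 'Hugging Face hosts thousands of pretrained models...'; // any long input text

const result = await hf.summarization({
  model: 'facebook/bart-large-cnn',
  inputs: article,
  parameters: { max_length: 100 },
});

console.log(result.summary_text);
```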
## Inference Endpoints

For dedicated hosted models.

```typescript
import { InferenceClient } from '@huggingface/inference';

const client = new InferenceClient(process.env.HF_ACCESS_TOKEN);
const endpoint = client.endpoint('https://your-endpoint.endpoints.huggingface.cloud');

const result = await endpoint.textGeneration({
  inputs: 'Hello, world!',
});
```

## Next.js Integration

```typescript
// app/api/generate/route.ts
import { HfInference } from '@huggingface/inference';
import { NextResponse } from 'next/server';

const hf = new HfInference(process.env.HF_ACCESS_TOKEN);

export async function POST(request: Request) {
  const { prompt } = await request.json();

  const result = await hf.textGeneration({
    model: 'meta-llama/Llama-2-7b-chat-hf',
    inputs: prompt,
    parameters: {
      max_new_tokens: 200,
    },
  });

  return NextResponse.json({ text: result.generated_text });
}
```

### Streaming Response

```typescript
// app/api/stream/route.ts
import { HfInference } from '@huggingface/inference';

const hf = new HfInference(process.env.HF_ACCESS_TOKEN);

export async function POST(request: Request) {
  const { prompt } = await request.json();

  const stream = hf.textGenerationStream({
    model: 'meta-llama/Llama-2-7b-chat-hf',
    inputs: prompt,
    parameters: { max_new_tokens: 200 },
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        controller.enqueue(encoder.encode(chunk.token.text));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: { 'Content-Type': 'text/plain' },
  });
}
```

## Browser Usage

Transformers.js works in the browser with WebGPU acceleration. Load the library straight from a CDN in a module script:

```html
<script type="module">
  import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';

  const classifier = await pipeline('text-classification');
  const result = await classifier('Transformers.js is great!');
  console.log(result);
</script>
```

### With WebGPU

```typescript
import { pipeline } from '@huggingface/transformers';

// Select the WebGPU backend in supported browsers
const classifier = await pipeline('text-classification', 'model-name', {
  device: 'webgpu',
});
```

## Configuration

```typescript
import { env } from '@huggingface/transformers';

// Cache settings
env.cacheDir = './models';
env.localModelPath = './local-models';

// Disable remote models (offline mode)
env.allowRemoteModels = false;

// Disable local models
env.allowLocalModels = false;
```

## Available Tasks

| Task | Pipeline | Example Model |
|------|----------|---------------|
| Text Classification | text-classification | distilbert-base-uncased-finetuned-sst-2-english |
| Text Generation | text-generation | gpt2, llama |
| Question Answering | question-answering | distilbert-base-cased-distilled-squad |
| Summarization | summarization | t5-small |
| Translation | translation | nllb-200-distilled-600M |
| Feature Extraction | feature-extraction | all-MiniLM-L6-v2 |
| Image Classification | image-classification | vit-base-patch16-224 |
| Object Detection | object-detection | detr-resnet-50 |
| Speech Recognition | automatic-speech-recognition | whisper-tiny |
| Zero-Shot Classification | zero-shot-classification | bart-large-mnli |

## Environment Variables

```bash
HF_ACCESS_TOKEN=hf_xxxxxxxx
```

## Best Practices

1. **Cache models** - Download once, reuse across runs
2. **Use WebGPU** - Faster inference in browsers
3. **Choose small models** - For client-side use
4. **Stream responses** - Better UX for generation
5. **Use the Inference API** - For models too large to run locally
6. **Consider endpoints** - For production workloads
7. **Quantized models** - Smaller, faster (look for ONNX weights); see the loading sketch below
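For point 7, recent Transformers.js releases accept a `dtype` option when building a pipeline. A minimal sketch, assuming the model repo ships quantized ONNX weights (the available dtypes vary by model and library version):

```typescript
import { pipeline } from '@huggingface/transformers';

// Load 8-bit quantized weights instead of full-precision ones
// (assumes a q8 ONNX variant is published for this model)
const classifier = await pipeline(
  'text-classification',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
  { dtype: 'q8' }
);

const result = await classifier('Quantized models load faster!');
console.log(result);
```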