---
name: lobe-tts
version: "1.0.0"
description: LobeTTS - High-quality TypeScript TTS/STT toolkit with EdgeSpeech, Microsoft, OpenAI engines, React hooks, audio visualization components, and both server and browser support
---

# LobeTTS Skill

**LobeTTS** (@lobehub/tts) is a high-quality TypeScript toolkit for text-to-speech (TTS) and speech-to-text (STT). It runs both on the server and in the browser, giving developers an open-source alternative to proprietary TTS solutions with output quality comparable to OpenAI's TTS service.

**Key Value Proposition**: Generate high-quality speech with minimal code (~15 lines), supporting multiple TTS engines (Edge, Microsoft, OpenAI) plus React hooks and audio visualization components for seamless frontend integration.

## When to Use This Skill

- Implementing text-to-speech in Node.js applications
- Adding speech synthesis to React/Next.js applications
- Building voice-enabled chatbots or assistants
- Creating audio players with visualization
- Implementing speech-to-text functionality
- Comparing or switching between TTS providers
- Building accessible applications with audio output

## When NOT to Use This Skill

- For native mobile TTS (use platform-specific APIs)
- For real-time voice streaming (use WebRTC solutions)
- For voice cloning or custom voice training
- For offline TTS (the Edge and Microsoft engines require an internet connection)

---

## Core Concepts

### Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                          @lobehub/tts                            │
│                      (TypeScript Library)                        │
└─────────────────────────────────────────────────────────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
           ▼                     ▼                     ▼
   ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
   │   TTS Core    │     │  React Hooks  │     │  Components   │
   ├───────────────┤     ├───────────────┤     ├───────────────┤
   │ EdgeSpeechTTS │     │ useEdgeSpeech │     │ AudioPlayer   │
   │ MicrosoftTTS  │     │ useMicrosoft  │     │ AudioVisualzr │
   │ OpenAITTS     │     │ useOpenAITTS  │     │               │
   │ OpenAISTT     │     │ useOpenAISTT  │     │               │
   │ SpeechSynth   │     │ useTTS        │     │               │
   └───────────────┘     └───────────────┘     └───────────────┘
           │                     │                     │
           ▼                     ▼                     ▼
   ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
   │    Server     │     │    Browser    │     │    Styling    │
   ├───────────────┤     ├───────────────┤     ├───────────────┤
   │ • Node.js     │     │ • React       │     │ • Waveforms   │
   │ • Edge/Vercel │     │ • Next.js     │     │ • Progress    │
   │ • File output │     │ • SPA         │     │ • Controls    │
   └───────────────┘     └───────────────┘     └───────────────┘
```

### Project Statistics

| Metric | Value |
|--------|-------|
| GitHub Stars | 695+ |
| Forks | 93+ |
| Contributors | 13+ |
| Releases | 100+ |
| Dependents | ~1,500 projects |
| Primary Language | TypeScript (98.8%) |
| License | MIT |

### Supported TTS/STT Engines

| Engine | Type | Provider | Quality | Cost |
|--------|------|----------|---------|------|
| **EdgeSpeechTTS** | TTS | Microsoft Edge | High | Free |
| **MicrosoftSpeechTTS** | TTS | Azure Cognitive Services | Very High | Paid |
| **OpenAITTS** | TTS | OpenAI | Premium | Paid |
| **OpenAISTT** | STT | OpenAI Whisper | Premium | Paid |
| **SpeechSynthesisTTS** | TTS | Browser Native | Variable | Free |

---

## Installation

### Package Installation

```bash
# pnpm (recommended)
pnpm add @lobehub/tts

# bun
bun add @lobehub/tts

# npm
npm install @lobehub/tts

# yarn
yarn add @lobehub/tts
```

**Important**: This is an ESM-only package.
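Because the package ships only ES modules, it cannot be pulled in with a plain `require()` from CommonJS code. A dynamic `import()` can serve as a bridge; the sketch below uses an illustrative helper name, and it only works if TypeScript is configured with an ESM-aware `module` setting (for example `node16`), since `"module": "commonjs"` would transpile `import()` back into `require()`.

```typescript
// esm-bridge.ts: illustrative helper for loading the ESM-only package
// from a CommonJS context. The function name is an assumption, not library API.
export async function loadEdgeSpeechTTS() {
  // Dynamic import() resolves the ESM package at runtime.
  const { EdgeSpeechTTS } = await import('@lobehub/tts');
  return new EdgeSpeechTTS({ locale: 'en-US' });
}
```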
### Next.js Configuration

Add to `next.config.js`:

```javascript
/** @type {import('next').NextConfig} */
const nextConfig = {
  transpilePackages: ['@lobehub/tts'],
};

module.exports = nextConfig;
```

### Node.js WebSocket Polyfill

For server-side usage, polyfill WebSocket:

```javascript
import WebSocket from 'ws';

global.WebSocket = WebSocket;
```

---

## Server-Side Usage

### EdgeSpeechTTS (Free, High Quality)

```typescript
import { EdgeSpeechTTS } from '@lobehub/tts';
import { Buffer } from 'buffer';
import fs from 'fs';
import path from 'path';

// Polyfill WebSocket for Node.js
import WebSocket from 'ws';
global.WebSocket = WebSocket;

// Initialize TTS engine
const tts = new EdgeSpeechTTS({ locale: 'en-US' });

// Create speech payload
const payload = {
  input: 'Hello! This is a speech demonstration using LobeTTS.',
  options: {
    voice: 'en-US-GuyNeural',
  },
};

// Generate speech
const response = await tts.create(payload);

// Save to file
const mp3Buffer = Buffer.from(await response.arrayBuffer());
const speechFile = path.resolve('./speech.mp3');
fs.writeFileSync(speechFile, mp3Buffer);

console.log(`Speech saved to: ${speechFile}`);
```

### MicrosoftSpeechTTS (Azure)

```typescript
import { MicrosoftSpeechTTS } from '@lobehub/tts';

const tts = new MicrosoftSpeechTTS({
  locale: 'en-US',
  subscriptionKey: process.env.AZURE_SPEECH_KEY,
  region: process.env.AZURE_SPEECH_REGION, // e.g., 'eastus'
});

const response = await tts.create({
  input: 'Premium quality speech from Azure.',
  options: {
    voice: 'en-US-JennyNeural',
    style: 'cheerful', // emotional style
    rate: '1.0',
    pitch: '0%',
  },
});
```

### OpenAI TTS

```typescript
import { OpenAITTS } from '@lobehub/tts';

const tts = new OpenAITTS({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await tts.create({
  input: 'OpenAI TTS provides natural-sounding speech.',
  options: {
    model: 'tts-1-hd', // 'tts-1' or 'tts-1-hd'
    voice: 'alloy', // alloy, echo, fable, onyx, nova, shimmer
    speed: 1.0, // 0.25 to 4.0
  },
});
```

### OpenAI STT (Speech-to-Text)

```typescript
import { OpenAISTT } from '@lobehub/tts';
import fs from 'fs';

const stt = new OpenAISTT({
  apiKey: process.env.OPENAI_API_KEY,
});

// From file
const audioBuffer = fs.readFileSync('./recording.mp3');

const result = await stt.create({
  file: audioBuffer,
  options: {
    model: 'whisper-1',
    language: 'en',
    response_format: 'json',
  },
});

console.log('Transcription:', result.text);
```

---

## API Routes (Next.js/Vercel)

### Edge Speech API Route

```typescript
// app/api/tts/edge/route.ts
import { EdgeSpeechTTS } from '@lobehub/tts';
import { NextRequest, NextResponse } from 'next/server';

export async function POST(req: NextRequest) {
  const { text, voice = 'en-US-GuyNeural' } = await req.json();

  const tts = new EdgeSpeechTTS({ locale: 'en-US' });

  const response = await tts.create({
    input: text,
    options: { voice },
  });

  const audioBuffer = await response.arrayBuffer();

  return new NextResponse(audioBuffer, {
    headers: {
      'Content-Type': 'audio/mpeg',
      'Content-Length': audioBuffer.byteLength.toString(),
    },
  });
}
```

### OpenAI TTS API Route

```typescript
// app/api/tts/openai/route.ts
import { OpenAITTS } from '@lobehub/tts';
import { NextRequest, NextResponse } from 'next/server';

export async function POST(req: NextRequest) {
  const { text, voice = 'alloy', model = 'tts-1' } = await req.json();

  const tts = new OpenAITTS({
    apiKey: process.env.OPENAI_API_KEY!,
  });

  const response = await tts.create({
    input: text,
    options: { voice, model },
  });

  const audioBuffer = await response.arrayBuffer();

  return new NextResponse(audioBuffer, {
    headers: {
      'Content-Type': 'audio/mpeg',
    },
  });
}
```
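Both routes return raw `audio/mpeg` bytes, so a browser client can consume them with nothing beyond standard web APIs. Below is a minimal sketch that calls the Edge Speech route defined above and plays the result; the helper name and error handling are illustrative.

```typescript
// speak.ts: client-side helper that requests speech from /api/tts/edge and plays it.
export async function speak(text: string, voice = 'en-US-GuyNeural'): Promise<void> {
  const res = await fetch('/api/tts/edge', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, voice }),
  });
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);

  // Wrap the MP3 bytes in an object URL and hand it to an <audio> element.
  const url = URL.createObjectURL(await res.blob());
  const audio = new Audio(url);
  audio.onended = () => URL.revokeObjectURL(url); // release the blob URL when playback ends
  await audio.play();
}
```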
---

## React Components

### AudioPlayer Component

```tsx
import { AudioPlayer, useAudioPlayer } from '@lobehub/tts/react';

interface TTSPlayerProps {
  audioUrl: string;
}

export default function TTSPlayer({ audioUrl }: TTSPlayerProps) {
  const { ref, isLoading, ...audio } = useAudioPlayer(audioUrl);

  return (
    // Prop names (audio, isLoading) are assumed from the library's documented
    // examples; verify against the installed version.
    <AudioPlayer audio={audio} isLoading={isLoading} style={{ width: '100%' }} />
  );
}
```

### AudioVisualizer Component

```tsx
import { AudioPlayer, AudioVisualizer, useAudioPlayer } from '@lobehub/tts/react';
import { Flexbox } from 'react-layout-kit';

interface AudioPlayerWithVisualizerProps {
  url: string;
}

export default function AudioPlayerWithVisualizer({ url }: AudioPlayerWithVisualizerProps) {
  const { ref, isLoading, ...audio } = useAudioPlayer(url);

  return (
    <Flexbox align={'center'} gap={8}>
      <AudioPlayer audio={audio} isLoading={isLoading} style={{ width: '100%' }} />
      {/* audioRef wires the visualizer to the same underlying <audio> element (assumed prop name) */}
      <AudioVisualizer audioRef={ref} isLoading={isLoading} />
    </Flexbox>
  );
}
```

### Complete TTS Component

```tsx
'use client';

import { useState } from 'react';
import { AudioPlayer, useAudioPlayer } from '@lobehub/tts/react';

export default function TextToSpeech() {
  const [text, setText] = useState('');
  const [audioUrl, setAudioUrl] = useState<string | null>(null);
  const [loading, setLoading] = useState(false);

  const handleSpeak = async () => {
    setLoading(true);
    try {
      const response = await fetch('/api/tts/edge', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ text }),
      });

      const blob = await response.blob();
      const url = URL.createObjectURL(blob);
      setAudioUrl(url);
    } finally {
      setLoading(false);
    }
  };

  const { ref, isLoading, ...audio } = useAudioPlayer(audioUrl || '');

  return (