# 🍱 Semantic Chunking API Documentation

Complete API reference for the semantic-chunking microservice.

## Table of Contents

- [Overview](#overview)
- [Base URL](#base-url)
- [Authentication](#authentication)
- [Endpoints](#endpoints)
  - [POST /api/chunkit](#post-apichunkit)
  - [POST /api/cramit](#post-apicramit)
  - [POST /api/sentenceit](#post-apisentenceit)
  - [GET /api/health](#get-apihealth)
  - [GET /api/version](#get-apiversion)
- [Request Format](#request-format)
- [Response Format](#response-format)
- [Error Handling](#error-handling)
- [Docker Deployment](#docker-deployment)
- [Production Setup](#production-setup)

---

## Overview

The Semantic Chunking API provides RESTful endpoints for semantically chunking text using ONNX embedding models. The API is designed to be deployed as a microservice in containerized environments.

**Key Features:**

- Three chunking methods: semantic, dense, and sentence-level
- Optional Bearer token authentication
- CORS enabled for cross-origin requests
- JSON request/response format
- Docker-ready with health checks

---

## Base URL

**Local Development:**

```
http://localhost:3001
```

**Docker (default):**

```
http://localhost:3001
```

**Production:**

Configure your reverse proxy (nginx, Traefik) to forward requests to the container on port 3001.

---

## Authentication

Authentication is **optional** and controlled via the `API_AUTH_TOKEN` environment variable.

### Enabling Authentication

Set the `API_AUTH_TOKEN` environment variable:

```bash
# Local
export API_AUTH_TOKEN=your-secret-token-here

# Docker
docker run -e API_AUTH_TOKEN=your-secret-token-here ...

# Docker Compose
environment:
  - API_AUTH_TOKEN=your-secret-token-here
```

### Using Authentication

When enabled, include the Bearer token in the Authorization header:

```bash
curl -X POST http://localhost:3001/api/chunkit \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-token-here" \
  -d '{"documents": [...]}'
```

### Endpoints Without Auth

The following endpoints are always accessible without authentication:

- `GET /` (landing page)
- `GET /api/health`
- `GET /api/version`

---

## Endpoints

### POST /api/chunkit

Semantically chunk text based on sentence similarity using cosine similarity scores.

**URL:** `/api/chunkit`
**Method:** `POST`
**Auth Required:** Optional (if `API_AUTH_TOKEN` is set)

**Request Body:**

```json
{
  "documents": [
    {
      "document_name": "example",
      "document_text": "Your long text here..."
    }
  ],
  "options": {
    "maxTokenSize": 500,
    "similarityThreshold": 0.5,
    "dynamicThresholdLowerBound": 0.4,
    "dynamicThresholdUpperBound": 0.8,
    "numSimilaritySentencesLookahead": 3,
    "combineChunks": true,
    "combineChunksSimilarityThreshold": 0.5,
    "onnxEmbeddingModel": "Xenova/all-MiniLM-L6-v2",
    "dtype": "q8",
    "device": "cpu",
    "returnEmbedding": false,
    "returnTokenLength": true,
    "chunkPrefix": null,
    "excludeChunkPrefixInResults": false,
    "maxMergesPerPass": 500,
    "maxUncappedPasses": 100,
    "maxMergesPerPassPercentage": 40,
    "uncappedCandidateMerges": 12
  }
}
```

**Success Response (200):**

```json
[
  {
    "document_id": 1234567890,
    "document_name": "example",
    "number_of_chunks": 5,
    "chunk_number": 1,
    "model_name": "Xenova/all-MiniLM-L6-v2",
    "dtype": "q8",
    "text": "First chunk of text...",
    "token_length": 245
  },
  ...
]
```

**Example with curl:**

```bash
curl -X POST http://localhost:3001/api/chunkit \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {
        "document_name": "test",
        "document_text": "This is a test document. It has multiple sentences. Each sentence will be analyzed for semantic similarity."
      }
    ],
    "options": {
      "maxTokenSize": 100,
      "similarityThreshold": 0.5
    }
  }'
```

**Merge Optimization Parameters:**

The following parameters control the multi-pass chunk merging algorithm:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `maxMergesPerPass` | 500 | Maximum number of chunk merges per optimization pass |
| `maxUncappedPasses` | 100 | Maximum optimization iterations |
| `maxMergesPerPassPercentage` | 40 | Percentage of merge candidates to process per pass |
| `uncappedCandidateMerges` | 12 | When candidate count is below this, all are merged |

**Note:** The `embedCallback` parameter is only available when using the library directly via JavaScript/TypeScript, not via the REST API. The REST API always uses the ONNX embedding model.

---

### POST /api/cramit

Pack sentences into dense chunks up to the maximum token size without considering semantic similarity.

**URL:** `/api/cramit`
**Method:** `POST`
**Auth Required:** Optional (if `API_AUTH_TOKEN` is set)

**Request Body:**

```json
{
  "documents": [
    {
      "document_name": "example",
      "document_text": "Your text here..."
    }
  ],
  "options": {
    "maxTokenSize": 500,
    "onnxEmbeddingModel": "Xenova/all-MiniLM-L6-v2",
    "dtype": "q8",
    "device": "cpu",
    "returnEmbedding": false,
    "returnTokenLength": true,
    "chunkPrefix": null,
    "excludeChunkPrefixInResults": false
  }
}
```

**Success Response (200):**

Same format as `/api/chunkit`.

**Example with curl:**

```bash
curl -X POST http://localhost:3001/api/cramit \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {
        "document_text": "Quick and dirty chunking. No semantic analysis. Just pack sentences together."
      }
    ],
    "options": {
      "maxTokenSize": 50
    }
  }'
```

**Note:** The `embedCallback` parameter is only available when using the library directly via JavaScript/TypeScript, not via the REST API.

---

### POST /api/sentenceit

Split text into individual sentences.


**URL:** `/api/sentenceit`
**Method:** `POST`
**Auth Required:** Optional (if `API_AUTH_TOKEN` is set)

**Request Body:**

```json
{
  "documents": [
    {
      "document_name": "example",
      "document_text": "Your text here..."
    }
  ],
  "options": {
    "onnxEmbeddingModel": "Xenova/all-MiniLM-L6-v2",
    "dtype": "q8",
    "device": "cpu",
    "returnEmbedding": false,
    "returnTokenLength": false,
    "chunkPrefix": null,
    "excludeChunkPrefixInResults": false
  }
}
```

**Success Response (200):**

```json
[
  {
    "document_id": 1234567890,
    "document_name": "example",
    "number_of_sentences": 10,
    "sentence_number": 1,
    "text": "First sentence."
  },
  ...
]
```

**Example with curl:**

```bash
curl -X POST http://localhost:3001/api/sentenceit \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {
        "document_text": "First sentence. Second sentence. Third sentence."
      }
    ],
    "options": {}
  }'
```

**Note:** The `embedCallback` parameter is only available when using the library directly via JavaScript/TypeScript, not via the REST API.

---

### GET /api/health

Health check endpoint for monitoring and load balancers.

**URL:** `/api/health`
**Method:** `GET`
**Auth Required:** No

**Success Response (200):**

```json
{
  "status": "ok",
  "version": "2.4.4",
  "timestamp": "2025-01-15T12:00:00.000Z"
}
```

**Example with curl:**

```bash
curl http://localhost:3001/api/health
```

---

### GET /api/version

Get API version information.


**URL:** `/api/version`
**Method:** `GET`
**Auth Required:** No

**Success Response (200):**

```json
{
  "version": "2.4.4",
  "package": "semantic-chunking"
}
```

**Example with curl:**

```bash
curl http://localhost:3001/api/version
```

---

## Request Format

All POST endpoints expect a JSON body with the following structure:

```json
{
  "documents": [
    {
      "document_name": "optional-name",
      "document_text": "required-text-content"
    }
  ],
  "options": {
    // Optional configuration parameters
  }
}
```

**Required Fields:**

- `documents`: Array of document objects
- `documents[].document_text`: The text content to process

**Optional Fields:**

- `documents[].document_name`: Name/identifier for the document
- `options`: Configuration object (see individual endpoints for available options)

---

## Response Format

### Success Responses

All successful requests return:

- **Status Code:** 200 OK
- **Content-Type:** application/json
- **Body:** Array of chunk/sentence objects

### Chunk Object Structure

```json
{
  "document_id": 1234567890,
  "document_name": "name",
  "number_of_chunks": 5,
  "chunk_number": 1,
  "model_name": "Xenova/all-MiniLM-L6-v2",
  "dtype": "q8",
  "text": "Chunk text...",
  "token_length": 123,           // if returnTokenLength: true
  "embedding": [0.1, 0.2, ...]   // if returnEmbedding: true
}
```

---

## Error Handling

### Error Response Format

```json
{
  "error": "Error Type",
  "message": "Detailed error message",
  "stack": "Stack trace (only in development mode)"
}
```

### Common Status Codes

| Code | Meaning | Cause |
|------|---------|-------|
| 400 | Bad Request | Invalid request body or missing required fields |
| 401 | Unauthorized | Missing or invalid Bearer token (when auth is enabled) |
| 500 | Internal Server Error | Processing error (e.g., model loading failed) |

### Example Error Responses

**400 Bad Request:**

```json
{
  "error": "Bad Request",
  "message": "Request body must include a \"documents\" array"
}
```

**401 Unauthorized:**

```json
{
  "error": "Unauthorized",
  "message": "Invalid or missing authorization token"
}
```

**500 Internal Server Error:**

```json
{
  "error": "Internal Server Error",
  "message": "Error loading model: model not found"
}
```

---

## Docker Deployment

### Quick Start

**API Only (default):**

```bash
docker-compose up -d
```

The API will be available at `http://localhost:3001`

**With Web UI:**

```bash
docker-compose --profile webui up -d
```

- API: `http://localhost:3001`
- Web UI: `http://localhost:3000`

### Docker Run

```bash
# Build the image
docker build -t semantic-chunking .

# Run API server (default)
docker run -d \
  -p 3001:3001 \
  -v ./models:/app/models \
  -e API_AUTH_TOKEN=your-token \
  --name semantic-chunking-api \
  semantic-chunking

# Run Web UI (override command)
docker run -d \
  -p 3000:3000 \
  -v ./models:/app/models \
  --name semantic-chunking-webui \
  semantic-chunking node webui/server.js
```

### Environment Variables

```bash
PORT=3001                    # Server port (default: 3001)
NODE_ENV=production          # Environment mode
API_AUTH_TOKEN=secret-token  # Optional Bearer token auth
```

### Volume Mounts

Mount the models directory to persist downloaded ONNX models:

```bash
-v ./models:/app/models
```

Models are downloaded automatically on first use and range from 23MB to 548MB depending on the model selected.

---

## Production Setup

### HTTPS / TLS Termination

The API server runs on HTTP only. For production, use a reverse proxy to handle HTTPS:

#### nginx Example

```nginx
upstream semantic_chunking {
    server localhost:3001;
}

server {
    listen 443 ssl http2;
    server_name api.yourdomain.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://semantic_chunking;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Increase timeout for large documents
        proxy_read_timeout 300s;
        proxy_connect_timeout 300s;
    }
}
```

#### Traefik Example (docker-compose.yml)

```yaml
services:
  semantic-chunking-api:
    # ... existing config ...
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.semantic-api.rule=Host(`api.yourdomain.com`)"
      - "traefik.http.routers.semantic-api.entrypoints=websecure"
      - "traefik.http.routers.semantic-api.tls.certresolver=letsencrypt"
      - "traefik.http.services.semantic-api.loadbalancer.server.port=3001"
```

### Security Best Practices

1. **Always use HTTPS in production** (via reverse proxy)
2. **Enable Bearer token authentication** by setting `API_AUTH_TOKEN`
3. **Use strong, randomly generated tokens** (min 32 characters)
4. **Rotate tokens regularly**
5. **Use environment variables or secrets management** (never hardcode tokens)
6. **Monitor logs for unauthorized access attempts**
7. **Keep Docker images updated** (`docker-compose pull && docker-compose up -d`)

### Monitoring

Use the health check endpoint for monitoring:

```bash
# Direct check
curl http://localhost:3001/api/health

# Docker health check (already configured in docker-compose.yml)
healthcheck:
  test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:3001/api/health"]
  interval: 30s
  timeout: 10s
  retries: 3
```

### Rate Limiting

Consider adding rate limiting at the reverse proxy level:

**nginx:**

```nginx
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

location / {
    limit_req zone=api_limit burst=20 nodelay;
    # ... proxy settings ...
}
```

**Traefik:**

```yaml
labels:
  - "traefik.http.middlewares.api-ratelimit.ratelimit.average=10"
  - "traefik.http.middlewares.api-ratelimit.ratelimit.burst=20"
  - "traefik.http.routers.semantic-api.middlewares=api-ratelimit"
```

---

## Support

- **GitHub:** [https://github.com/jparkerweb/semantic-chunking](https://github.com/jparkerweb/semantic-chunking)
- **Issues:** [https://github.com/jparkerweb/semantic-chunking/issues](https://github.com/jparkerweb/semantic-chunking/issues)
- **NPM:** [https://www.npmjs.com/package/semantic-chunking](https://www.npmjs.com/package/semantic-chunking)

---

**Maintained by [eQuill Labs](https://www.equilllabs.com)**
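
---

The request format, optional Bearer authentication, and error format described in this reference can be exercised from a small client. The following is a minimal Python sketch using only the standard library, not an official client. The helper names `build_request` and `post_documents` are hypothetical, the 300-second timeout is an assumption borrowed from the nginx example, and the base URL is the documented default `http://localhost:3001`.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:3001"  # documented default; adjust for your deployment


def build_request(endpoint, documents, options=None, token=None, base_url=BASE_URL):
    """Build a POST request for chunkit, cramit, or sentenceit.

    Hypothetical helper: not part of the semantic-chunking package.
    """
    payload = {"documents": documents}
    if options:
        payload["options"] = options
    headers = {"Content-Type": "application/json"}
    if token:  # only needed when the server was started with API_AUTH_TOKEN
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{base_url}/api/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def post_documents(endpoint, documents, options=None, token=None, timeout=300):
    """Send the request and return the parsed JSON array of chunks or sentences."""
    request = build_request(endpoint, documents, options, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Error bodies are documented as {"error": ..., "message": ...}
        try:
            detail = json.loads(err.read().decode("utf-8"))
        except ValueError:
            detail = {}
        if not isinstance(detail, dict):
            detail = {}
        raise RuntimeError(
            f"{err.code} {detail.get('error', err.reason)}: {detail.get('message', '')}"
        ) from err
```

With a server running, `post_documents("chunkit", [{"document_text": "..."}], {"maxTokenSize": 100})` would return the array of chunk objects described under Response Format. Keeping `build_request` separate from the network call makes the payload and headers easy to inspect without a live server.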