vocabulary:
  - term: parse
    definition: >
      Convert a document (PDF, Word, spreadsheet, image, etc.) into structured
      JSON output preserving layout, tables, figures, and text with high
      fidelity.
    related:
      - chunk
      - ocr
      - layout
  - term: extract
    definition: >
      Pull schema-defined fields from a document using a provided JSON Schema,
      returning structured key-value data matched against the document
      contents.
    related:
      - schema
      - field
  - term: split
    definition: >
      Divide a multi-document file into labeled sections or partitions based on
      classification rules, separating logically distinct documents within a
      single file.
    related:
      - partition
      - classify
  - term: classify
    definition: >
      Determine the document type from a set of user-defined categories,
      returning a category label and confidence scores for each candidate
      category.
    related:
      - category
      - confidence
  - term: edit
    definition: >
      Modify fillable forms and PDFs using natural language instructions,
      enabling programmatic form-filling and document editing without manual
      intervention.
    related:
      - widget
      - form
  - term: pipeline
    definition: >
      A reusable, composable sequence of Reducto processing steps (parse,
      extract, split, classify) configured once and applied to multiple
      documents at scale.
    related:
      - workflow
      - compose
  - term: chunk
    definition: >
      A contiguous segment of extracted document content with associated
      metadata such as page range, section headers, and visual/non-visual
      classification.
    related:
      - parse
      - content
  - term: job
    definition: >
      An asynchronous processing unit identified by a job_id, representing the
      execution of a parse, extract, split, classify, or edit operation that
      can be polled or receive a webhook callback upon completion.
    related:
      - async
      - webhook
  - term: upload
    definition: >
      Pre-upload a file to Reducto's storage to obtain a document URL for use
      in subsequent API calls, avoiding repeated file transfers.
    related:
      - document_url
      - storage
  - term: ocr
    definition: >
      Optical Character Recognition: the process of converting scanned or
      image-based document content into machine-readable text, used by
      Reducto's layout-aware parsing engine.
    related:
      - parse
      - scan
  - term: webhook
    definition: >
      An HTTP callback delivered by Reducto to a user-specified URL when an
      asynchronous job completes, carrying the job result payload.
    related:
      - job
      - async
  - term: document_url
    definition: >
      A publicly accessible URL pointing to the document to be processed, or a
      Reducto-hosted URL obtained via the /upload endpoint.
    related:
      - upload
      - parse
  - term: bounding_box
    definition: >
      Spatial coordinates (top, left, width, height) normalized to [0, 1]
      identifying the position of a text block, table, or figure on a page.
    related:
      - layout
      - parse_block
  - term: parse_block
    definition: >
      A single detected element within a parsed document such as a text
      paragraph, table, figure, or heading, with its content, type, bounding
      box, and page number.
    related:
      - chunk
      - bounding_box
  - term: settings
    definition: >
      Global configuration options for a Reducto operation, controlling OCR
      behavior, table formats, figure extraction, chunking strategy, and
      performance vs. quality tradeoffs.
    related:
      - parse
      - extract
  - term: mcp_server
    definition: >
      Model Context Protocol server provided by Reducto, enabling AI agents to
      call Reducto document processing endpoints directly within agent tool
      loops.
    related:
      - agent
      - integration
  - term: hybrid_vpc
    definition: >
      Reducto's deployment option allowing document processing to occur within
      a customer's private cloud environment while maintaining API
      compatibility, used in regulated industries (healthcare, finance, legal).
    related:
      - security
      - compliance