openapi: 3.0.3
info:
  title: Nanonets OCR API
  description: |
    Synchronous and asynchronous OCR prediction and training endpoints for
    Nanonets custom OCR models. Upload files by local path (multipart) or by
    public URL, retrieve predictions for a file, a page, or a batch within a
    time window, and train or retrain a model. Sync endpoints are optimized
    for files of 3 pages or fewer; async endpoints handle larger documents.
    Authentication uses HTTP Basic with the API key as the username and an
    empty password. All traffic must be HTTPS.
  version: 2.0.0
  contact:
    name: Nanonets
    url: https://nanonets.com
    email: support@nanonets.com
servers:
  - url: https://app.nanonets.com/api/v2
security:
  - BasicAuth: []
tags:
  - name: OCR Predict
    description: Predict on uploaded files or file URLs against a Nanonets OCR model.
  - name: OCR Retrieve
    description: Retrieve prediction results for a file, page, or batch.
  - name: OCR Train
    description: Upload training images and train or retrain an OCR model.
paths:
  /OCR/Model/{model_id}/LabelFile/:
    post:
      tags: [OCR Predict]
      summary: Prediction For Image File
      description: |
        Upload one or more files from the local filesystem to a Nanonets OCR
        model in sync mode. Optimized for files of 3 pages or fewer.
      operationId: ocrModelLabelFileByModelIdPost
      parameters:
        - $ref: '#/components/parameters/ModelId'
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              type: object
              required: [file]
              properties:
                file:
                  type: array
                  items:
                    type: string
                    format: binary
                request_metadata:
                  type: string
                  description: Free-form identifier echoed in the response.
      responses:
        '200':
          description: Prediction result.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/PredictionResponse'
  /OCR/Model/{model_id}/LabelFile/Async/:
    post:
      tags: [OCR Predict]
      summary: Async Prediction For Image File
      description: |
        Upload one or more files from the local filesystem to a Nanonets OCR
        model in async mode. Recommended for files larger than 3 pages.
      operationId: ocrModelLabelFileAsyncByModelIdPost
      parameters:
        - $ref: '#/components/parameters/ModelId'
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              type: object
              required: [file]
              properties:
                file:
                  type: array
                  items:
                    type: string
                    format: binary
                async:
                  type: boolean
                  default: true
                request_metadata:
                  type: string
      responses:
        '200':
          description: Async prediction accepted with `request_file_id` for polling.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/AsyncAcceptedResponse'
  /OCR/Model/{model_id}/LabelUrls/:
    post:
      tags: [OCR Predict]
      summary: Prediction For Image URL
      description: Send one or more publicly accessible URLs to a Nanonets OCR model in sync mode.
      operationId: ocrModelLabelUrlsByModelIdPost
      parameters:
        - $ref: '#/components/parameters/ModelId'
      requestBody:
        required: true
        content:
          application/x-www-form-urlencoded:
            schema:
              type: object
              required: [urls]
              properties:
                urls:
                  type: array
                  items:
                    type: string
                    format: uri
                request_metadata:
                  type: string
      responses:
        '200':
          description: Prediction result.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/PredictionResponse'
  /OCR/Model/{model_id}/LabelUrls/Async/:
    post:
      tags: [OCR Predict]
      summary: Async Prediction For Image URL
      description: Send one or more publicly accessible URLs to a Nanonets OCR model in async mode.
      operationId: ocrModelLabelUrlsAsyncByModelIdPost
      parameters:
        - $ref: '#/components/parameters/ModelId'
      requestBody:
        required: true
        content:
          application/x-www-form-urlencoded:
            schema:
              type: object
              required: [urls]
              properties:
                urls:
                  type: array
                  items:
                    type: string
                    format: uri
                async:
                  type: boolean
                  default: true
                request_metadata:
                  type: string
      responses:
        '200':
          description: Async prediction accepted.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/AsyncAcceptedResponse'
  /Inferences/Model/{model_id}/InferenceRequest/{request_file_id}:
    get:
      tags: [OCR Retrieve]
      summary: Get Prediction File By File ID
      description: |
        Retrieve prediction results for a single file by `model_id` and
        `request_file_id`. Includes the model prediction, modifications, and
        the final processed outcome plus signed URLs to download the source
        file.
      operationId: ocrModelGetPredictionFileByFileId
      parameters:
        - $ref: '#/components/parameters/ModelId'
        - name: request_file_id
          in: path
          required: true
          schema: { type: string }
      responses:
        '200':
          description: File-level prediction result.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ModeratedFileResponse'
  /Inferences/Model/{model_id}/InferenceRequest/{request_file_id}/page/{page_id}:
    get:
      tags: [OCR Retrieve]
      summary: Get Prediction File By Page ID
      description: Retrieve prediction results for a specific page by unique page id.
      operationId: ocrModelGetPredictionFileByPageId
      parameters:
        - $ref: '#/components/parameters/ModelId'
        - name: request_file_id
          in: path
          required: true
          schema: { type: string }
        - name: page_id
          in: path
          required: true
          schema: { type: string }
      responses:
        '200':
          description: Page-level prediction result.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ModeratedFileResponse'
  /Inferences/Model/{model_id}/InferenceRequest/:
    get:
      tags: [OCR Retrieve]
      summary: Get All Prediction Files
      description: |
        Get prediction results for all files uploaded to a model within a
        specified timeframe. Pages are bucketed into `moderated_images`
        (approved) and `unmoderated_images` (rejected or not yet approved).
      operationId: ocrModelListPredictionFiles
      parameters:
        - $ref: '#/components/parameters/ModelId'
        - name: start_day_interval
          in: query
          required: true
          schema: { type: integer }
          description: Number of days back from the current batch day (days since epoch).
        - name: current_batch_day
          in: query
          required: true
          schema: { type: integer }
          description: Most recent day-since-epoch boundary for the window.
      responses:
        '200':
          description: Time-windowed batch of predictions.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ModeratedFileResponse'
  /OCR/Model/{model_id}/UploadFile/:
    post:
      tags: [OCR Train]
      summary: Upload Training Images By File
      description: Upload locally-stored training images for a model.
      operationId: ocrModelUploadFileByModelIdPost
      parameters:
        - $ref: '#/components/parameters/ModelId'
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              type: object
              required: [file, data]
              properties:
                file:
                  type: array
                  items:
                    type: string
                    format: binary
                data:
                  type: string
                  description: JSON describing the annotation for each uploaded file.
      responses:
        '200':
          description: Upload acknowledged with totals per category.
  /OCR/Model/{model_id}/UploadUrls/:
    post:
      tags: [OCR Train]
      summary: Upload Training Images By URL
      description: Upload training images for a model via publicly accessible URLs.
      operationId: ocrModelUploadUrlsByModelIdPost
      parameters:
        - $ref: '#/components/parameters/ModelId'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [urls]
              properties:
                urls:
                  type: array
                  items: { type: string, format: uri }
                data:
                  type: string
      responses:
        '200':
          description: Upload acknowledged.
  /OCR/Model/{model_id}/Train/:
    post:
      tags: [OCR Train]
      summary: Train Model
      description: Train or retrain a Nanonets OCR model after training data has been uploaded.
      operationId: ocrModelTrainByModelIdPost
      parameters:
        - $ref: '#/components/parameters/ModelId'
      responses:
        '200':
          description: Training job queued or running.
components:
  securitySchemes:
    BasicAuth:
      type: http
      scheme: basic
      description: |
        HTTP Basic Auth where the API key is sent as the username and the
        password is left empty. Generate keys at
        https://app.nanonets.com/#keys.
  parameters:
    ModelId:
      name: model_id
      in: path
      required: true
      schema: { type: string }
      description: Unique identifier for the Nanonets model.
  schemas:
    PredictionResponse:
      type: object
      properties:
        message: { type: string, description: Overall success status. }
        result:
          type: array
          items: { $ref: '#/components/schemas/PredictionPage' }
        signed_urls:
          type: object
          description: Signed URLs to original and processed file artifacts.
    PredictionPage:
      type: object
      properties:
        message: { type: string }
        input: { type: string, description: Uploaded filename. }
        prediction:
          type: array
          items: { $ref: '#/components/schemas/Prediction' }
        page: { type: integer }
        request_file_id: { type: string }
        id: { type: string }
        request_metadata: { type: string }
        processing_type:
          type: string
          enum: ['', async]
        size:
          type: object
          properties:
            width: { type: number }
            height: { type: number }
    Prediction:
      type: object
      properties:
        id: { type: string }
        label: { type: string }
        xmin: { type: number }
        ymin: { type: number }
        xmax: { type: number }
        ymax: { type: number }
        score: { type: number, minimum: 0, maximum: 1 }
        ocr_text: { type: string }
        type:
          type: string
          enum: [field, table]
        status: { type: string }
        validation_status:
          type: string
          enum: [failed]
        validation_message: { type: string }
        page_no: { type: integer }
        label_id: { type: string }
        cells:
          type: array
          items: { $ref: '#/components/schemas/TableCell' }
    TableCell:
      type: object
      properties:
        id: { type: string }
        row: { type: integer }
        col: { type: integer }
        label: { type: string }
        xmin: { type: number }
        ymin: { type: number }
        xmax: { type: number }
        ymax: { type: number }
        score: { type: number }
        text: { type: string }
        verification_status: { type: string }
        status: { type: string }
        failed_validation: { type: string }
        label_id: { type: string }
    AsyncAcceptedResponse:
      type: object
      properties:
        message: { type: string }
        result:
          type: array
          items:
            type: object
            properties:
              message: { type: string }
              request_file_id: { type: string }
              filepath: { type: string }
              status: { type: string }
    ModeratedFileResponse:
      type: object
      properties:
        moderated_images_count: { type: integer }
        unmoderated_images_count: { type: integer }
        moderated_images:
          type: array
          items: { $ref: '#/components/schemas/ModeratedPage' }
        unmoderated_images:
          type: array
          items: { $ref: '#/components/schemas/ModeratedPage' }
        signed_urls: { type: object }
    ModeratedPage:
      type: object
      properties:
        model_id: { type: string }
        day_since_epoch: { type: integer }
        is_moderated: { type: boolean }
        id: { type: string }
        url: { type: string, format: uri }
        predicted_boxes:
          type: array
          items: { $ref: '#/components/schemas/Prediction' }
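
# Usage sketch (illustrative only, not part of the API contract): how the
# day-since-epoch query parameters of "Get All Prediction Files" might be
# derived. This ASSUMES days are counted in UTC from 1970-01-01, that is
# floor(unix_seconds / 86400); the spec does not state the time zone.
#
#   2024-01-01T00:00:00Z  ->  1704067200 s / 86400 s per day  =  19723
#
# Under that assumption, a request for the 7 days leading up to that boundary
# could look like the following (the model id and API key are placeholders):
#
#   GET https://app.nanonets.com/api/v2/Inferences/Model/{model_id}/InferenceRequest/?start_day_interval=7&current_batch_day=19723
#   Authorization: Basic base64("<API_KEY>:")
#
# The trailing colon in the Basic credential reflects the empty password
# described under components.securitySchemes.BasicAuth.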