# REST Service (v0.6.2) OpenMed `v0.6.2` hardens the FastAPI service introduced in `v0.6.1` with shared model reuse, explicit model unloading, and idle model cleanup: - `GET /health` - `GET /models/loaded` - `POST /models/unload` - `POST /analyze` - `POST /pii/extract` - `POST /pii/deidentify` This release adds stricter request validation, shared model/pipeline reuse, optional startup preload, model keep-alive controls, and a unified non-2xx error envelope. ## Run Locally Install the service dependencies: ```bash uv pip install -e ".[hf,service]" ``` Start the API server: ```bash uvicorn openmed.service.app:app --host 0.0.0.0 --port 8080 ``` Optional profile selection (defaults to `prod`): ```bash OPENMED_PROFILE=dev uvicorn openmed.service.app:app --host 0.0.0.0 --port 8080 ``` Optional shared model preload at startup: ```bash OPENMED_SERVICE_PRELOAD_MODELS=disease_detection_superclinical,OpenMed/OpenMed-PII-SuperClinical-Small-44M-v1 \ uvicorn openmed.service.app:app --host 0.0.0.0 --port 8080 ``` `OPENMED_SERVICE_PRELOAD_MODELS` is a comma-separated list of registry aliases or full Hugging Face ids. Empty entries are ignored and duplicates are removed. Optional default model keep-alive: ```bash OPENMED_SERVICE_KEEP_ALIVE=10m uvicorn openmed.service.app:app --host 0.0.0.0 --port 8080 ``` `OPENMED_SERVICE_KEEP_ALIVE` accepts seconds as a number or duration strings such as `30s`, `5m`, `1h30m`, or `1d`. Omit it for indefinite caching, use `0` for unload-after-request behavior, or use request-level `keep_alive` to override the default for one call. ## Reliability Changes - Requests now run against one shared service runtime per process, including a shared `OpenMedConfig` and shared `ModelLoader`. - Blocking inference is executed off the event loop and guarded by the active profile timeout (`prod=300s`, `test=60s`, etc.). - Loaded model pipelines can be released manually with `POST /models/unload`. - Inference requests accept `keep_alive` to schedule model unloading after the model becomes idle. - Non-2xx responses use one JSON envelope across validation, bad-request, timeout, and internal errors. - `/pii/deidentify` still accepts the legacy `shift_dates` boolean, but it is now a deprecated alias for `method="shift_dates"`. ## Endpoints ### `GET /health` Health response: ```json { "status": "ok", "service": "openmed-rest", "version": "0.6.2", "profile": "prod" } ``` ### `GET /models/loaded` Returns currently cached model resources and idle-unload status: ```json { "default_keep_alive_seconds": 600.0, "models": { "OpenMed/OpenMed-NER-DiseaseDetect-SuperClinical-434M": { "models": 0, "tokenizers": 0, "pipelines": 1, "active_requests": 0, "keep_alive_seconds_remaining": 287.4 } } } ``` ### `POST /models/unload` Unload one inactive model: ```json { "model_name": "disease_detection_superclinical" } ``` Unload all inactive models: ```json { "all": true } ``` If a model has active requests, the service leaves it loaded and reports the active request count. ### `POST /analyze` Request body: ```json { "text": "Patient started imatinib for CML.", "model_name": "disease_detection_superclinical", "confidence_threshold": 0.0, "group_entities": false, "aggregation_strategy": "simple", "keep_alive": "5m" } ``` Returns the same shape as OpenMed `analyze_text(..., output_format="dict")`. ### `POST /pii/extract` Request body: ```json { "text": "Paciente: Maria Garcia, DNI: 12345678Z", "lang": "es", "use_smart_merging": true, "keep_alive": "10m" } ``` Returns the same shape as `extract_pii(...).to_dict()`. ### `POST /pii/deidentify` Request body: ```json { "text": "Paciente: Maria Garcia, DNI: 12345678Z", "method": "mask", "lang": "es", "keep_mapping": true, "keep_alive": "10m" } ``` Date shifting: ```json { "text": "Paciente: Maria Garcia, fecha: 15/01/2020", "method": "shift_dates", "date_shift_days": 30, "lang": "es" } ``` The deprecated `shift_dates: true` boolean is still accepted as an alias for `method: "shift_dates"`. Returns `deidentify(...).to_dict()`. When `keep_mapping=true` and mapping data exists, a `mapping` field is included. ## Error Envelope All non-2xx responses use this shape: ```json { "error": { "code": "validation_error|bad_request|timeout|internal_error", "message": "human-readable summary", "details": null } } ``` Validation example: ```json { "error": { "code": "validation_error", "message": "Request validation failed", "details": [ { "field": "body.text", "message": "Text must not be blank", "type": "value_error" } ] } } ``` Timeout example: ```json { "error": { "code": "timeout", "message": "Request exceeded configured timeout of 300 seconds", "details": { "timeout_seconds": 300 } } } ``` ## Docker Build: ```bash docker build -t openmed:0.6.2 . ``` Run: ```bash docker run --rm -p 8080:8080 \ -e OPENMED_PROFILE=prod \ -e OPENMED_SERVICE_KEEP_ALIVE=10m \ -e OPENMED_SERVICE_PRELOAD_MODELS=disease_detection_superclinical \ openmed:0.6.2 ``` Smoke check: ```bash curl http://127.0.0.1:8080/health ```