---
name: train-model
description: Train or retrain a credit scoring model. Use when the user asks to train, retrain, fine-tune, or calibrate a model, or when new training data is available.
---

# Train model

Train or retrain an OpenCredit scoring model with full MLflow tracking and post-training validation.

## Workflow

### 1. Validate prerequisites

- Confirm training data exists (check `data/` or the feature store)
- Confirm a model config YAML exists in `configs/models/`
- Confirm MLflow is accessible (`uv run mlflow ui` or the docker service)

### 2. Run training

```bash
uv run python -m opencredit.models.train \
  --config configs/models/<model-name>.yaml \
  --experiment-name <experiment-name> \
  --tags market=<market> version=<version>
```

### 3. Evaluate

After training completes, immediately run evaluation:

```bash
uv run python -m opencredit.models.evaluate \
  --model-id <model-id> \
  --test-data data/test.parquet
```

Check that these metrics meet their thresholds (an illustrative computation sketch appears at the end of this file):

- AUC-ROC ≥ 0.72
- Gini ≥ 0.44
- KS statistic ≥ 0.30
- Calibration: Brier score ≤ 0.20

### 4. Bias audit (MANDATORY before promotion)

```bash
uv run python -m opencredit.compliance.bias_audit \
  --model-id <model-id> \
  --attributes gender age_group region
```

Fail criteria: a disparate impact ratio outside 0.8-1.25 on ANY group (see the sketch at the end of this file).

### 5. Generate model card

```bash
uv run python -m opencredit.compliance.docs_generator \
  --model-id <model-id> \
  --output docs/compliance/
```

### 6. Register in MLflow

Only if evaluation AND the bias audit both pass:

```bash
uv run python -m opencredit.models.register \
  --model-id <model-id> \
  --stage production
```

## Important

- NEVER skip the bias audit step, even for quick experiments.
- Log ALL hyperparameters — no magic numbers in training scripts (see the MLflow logging sketch below).
- If training on new market data, create a new experiment in MLflow; don't reuse existing ones.
- Save the SHAP background dataset alongside the model artifact.
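
## Illustrative sketches

The snippets below are minimal sketches, not the opencredit implementation; they only show how the checks above can be computed under stated assumptions.

This first sketch shows how the step-3 thresholds relate to each other (Gini is a linear rescaling of AUC; KS measures separation between the score distributions of goods and bads). It assumes `y_true` holds 0/1 default labels and `y_prob` holds predicted default probabilities; the actual `opencredit.models.evaluate` module may compute these differently.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import brier_score_loss, roc_auc_score


def scorecard_metrics(y_true: np.ndarray, y_prob: np.ndarray) -> dict:
    """Compute the four step-3 metrics from labels and predicted probabilities."""
    auc = roc_auc_score(y_true, y_prob)
    gini = 2 * auc - 1  # Gini coefficient is a linear rescaling of AUC-ROC
    # KS statistic: maximum distance between the score distributions of bads and goods
    ks = ks_2samp(y_prob[y_true == 1], y_prob[y_true == 0]).statistic
    brier = brier_score_loss(y_true, y_prob)  # calibration: lower is better
    return {"auc_roc": auc, "gini": gini, "ks": ks, "brier": brier}


def passes_thresholds(m: dict) -> bool:
    """True only if every step-3 threshold is met."""
    return (
        m["auc_roc"] >= 0.72
        and m["gini"] >= 0.44
        and m["ks"] >= 0.30
        and m["brier"] <= 0.20
    )
```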
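
The next sketch shows one common way to compute the disparate impact ratio checked in step 4: each group's approval rate divided by the overall approval rate. The reference convention, the DataFrame layout (a 0/1 `approved` column plus one column per protected attribute), and the helper names are assumptions, not guarantees about `opencredit.compliance.bias_audit`.

```python
import pandas as pd


def disparate_impact(df: pd.DataFrame, attribute: str, approved_col: str = "approved") -> pd.Series:
    """Per-group approval rate divided by the overall approval rate."""
    overall_rate = df[approved_col].mean()
    group_rates = df.groupby(attribute)[approved_col].mean()
    return group_rates / overall_rate


def audit_passes(df: pd.DataFrame, attributes: list[str]) -> bool:
    """Fail if ANY group's ratio falls outside the 0.8-1.25 band."""
    return all(
        disparate_impact(df, attr).between(0.8, 1.25).all()
        for attr in attributes
    )
```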
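
Finally, a sketch of two habits from the Important list: logging every hyperparameter to MLflow and saving the SHAP background dataset alongside the model artifact. The parameter values, run name, tags, and the `data/train.parquet` and `shap_background.parquet` paths are hypothetical, not opencredit conventions.

```python
import mlflow
import pandas as pd

# Hypothetical hyperparameters; in practice these come from the model config YAML.
params = {"learning_rate": 0.05, "max_depth": 4, "n_estimators": 400}
X_train = pd.read_parquet("data/train.parquet")  # assumed training-data path

with mlflow.start_run(run_name="example-retrain"):
    mlflow.log_params(params)  # every hyperparameter logged; no magic numbers in scripts
    mlflow.set_tags({"market": "DE", "version": "v3"})
    # Background sample reused later for SHAP explanations, stored with the model artifact.
    background = X_train.sample(n=min(500, len(X_train)), random_state=0)
    background.to_parquet("shap_background.parquet")
    mlflow.log_artifact("shap_background.parquet")
```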