---
name: datarobot-model-explainability
description: >
  Tools and guidance for model explainability, prediction explanations, feature impact analysis,
  SHAP values, SHAP distributions, anomaly assessment, and model diagnostics. Use when analyzing
  model explanations, feature impact, SHAP values, SHAP distributions, anomaly assessment, or
  diagnosing model behavior.
---

# DataRobot Model Explainability Skill

This skill covers SHAP insights, XEMP prediction explanations, anomaly explanations, and model diagnostics.

> **SDK version**: Use `datarobot>=3.6.0` for the full API set in this skill. `ShapMatrix`,
> `ShapImpact`, and `ShapPreview` are available in `datarobot>=3.4.0`; `ShapDistributions` was
> added in 3.6. Import from `datarobot.insights` (`from datarobot.insights import ShapMatrix, ...`)
> and pass `entity_id=model_id`. Do not use the legacy `datarobot.models.ShapMatrix`, which takes
> `project_id` / `dataset_id`.
> `ShapMatrix`, `ShapImpact`, `ShapPreview`, and `ShapDistributions` are the canonical SHAP API.
> The older `dr.PredictionExplanations` (XEMP-based) remains available but is the secondary path.

---

## Quick Start

| Goal | API to use | Prerequisites |
|------|-----------|---------------|
| SHAP values for all features, all rows | `ShapMatrix.create(entity_id=model_id)` | None - universal SHAP |
| Per-row top-feature explanations | `ShapPreview.create(entity_id=model_id)` | None |
| Aggregated feature importance via SHAP | `ShapImpact.create(entity_id=model_id)` | None |
| SHAP value distributions across features | `ShapDistributions.create(entity_id=model_id)` | None |
| SHAP for a filtered segment | `dr.DataSlice.create(...)` + `ShapMatrix.create(..., data_slice_id=...)` | Data slice definition |
| XEMP-based prediction explanations | `dr.PredictionExplanations.create(...)` | Feature Impact; PE initialization; dataset uploaded |
| Anomaly explanations (time series) | `AnomalyAssessmentRecord.compute(project_id, model_id, ...)` | Anomaly model |
| ROC / lift / confusion (insights) | `RocCurve.create(...)` / `LiftChart.create(...)` / `ConfusionMatrix.create(...)` | Validation data |
| ROC / lift / confusion (Model helpers) | `model.get_roc_curve()` / `model.get_lift_chart()` / `model.get_confusion_chart()` | Validation data |

**Universal SHAP is the preferred path** - no dataset pre-upload or Feature Impact step required.

## When to use this skill

Use this skill when you need to explain leaderboard model behavior, compute SHAP insights, use
XEMP prediction explanations, analyze anomaly explanations, or retrieve model diagnostics.

## Key capabilities

### 1. SHAP insights

- Compute `ShapMatrix`, `ShapPreview`, `ShapImpact`, and `ShapDistributions`
- Filter insights with `dr.DataSlice`

### 2. XEMP and anomaly explanations

- Use XEMP `dr.PredictionExplanations` when specifically required
- Retrieve time series anomaly assessment records and explanations

### 3. Diagnostics

- Retrieve ROC, lift, and confusion insights
- Use Model helpers for ROC, lift, confusion, and feature effects

## Setup

```python
import os
import datarobot as dr
from datarobot.insights import ShapMatrix, ShapImpact, ShapPreview, ShapDistributions

dr.Client(
    token=os.environ["DATAROBOT_API_TOKEN"],
    endpoint=os.environ.get("DATAROBOT_ENDPOINT", "https://app.datarobot.com/api/v2"),
)
```

---

## Core API: `datarobot.insights`

```python
import pandas as pd
from datarobot.insights import ShapMatrix, ShapImpact, ShapPreview, ShapDistributions

model_id = "YOUR_MODEL_ID"

matrix = ShapMatrix.create(entity_id=model_id)
df = pd.DataFrame(matrix.matrix, columns=matrix.columns)

impact = ShapImpact.create(entity_id=model_id)
preview = ShapPreview.create(entity_id=model_id)
distributions = ShapDistributions.create(entity_id=model_id)
```

Use `ShapMatrix` for full row-by-feature SHAP values, `ShapPreview` for compact top-driver rows,
`ShapImpact` for aggregated SHAP importance, and `ShapDistributions` for per-feature SHAP
distributions. Use `source="externalTestSet"` plus `external_dataset_id` for external datasets.
See `references/shap_api_reference.md` for parameters, exports, and limitations.
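
To illustrate how a full SHAP matrix relates to aggregated importance, the sketch below ranks features by mean absolute SHAP value using plain Python lists. It assumes `rows` holds one list of SHAP values per row, aligned with `columns` (for example, values taken from `matrix.matrix` and `matrix.columns`). The helper `rank_by_mean_abs_shap` is hypothetical and the numbers are toy data; `ShapImpact` is computed server-side and may normalize its values differently.

```python
def rank_by_mean_abs_shap(rows, columns):
    """Return (feature, mean |SHAP|) pairs, largest first."""
    if not rows:
        return []
    totals = [0.0] * len(columns)
    for row in rows:
        for i, value in enumerate(row):
            totals[i] += abs(value)
    means = [total / len(rows) for total in totals]
    return sorted(zip(columns, means), key=lambda pair: pair[1], reverse=True)


# Toy values for illustration only, not real model output.
columns = ["income", "age", "tenure"]
rows = [
    [0.20, -0.05, 0.01],
    [-0.30, 0.10, 0.02],
]

ranking = rank_by_mean_abs_shap(rows, columns)
for feature, mean_abs in ranking:
    print(f"{feature}: {mean_abs:.3f}")
```

Using the absolute value keeps positive and negative contributions from cancelling out, which is why `income` ranks first here even though its signed values nearly offset each other.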

---

## Secondary path: XEMP Prediction Explanations

Use `dr.PredictionExplanations` when XEMP explanations are specifically required (e.g., certain
regulatory contexts, or when SHAP is unavailable for the model type).

**Prerequisites** (all required before calling `.create()`):
1. Feature Impact must be computed: `model.request_feature_impact()` and wait
2. Prediction explanations initialized: `dr.PredictionExplanationsInitialization.create(...)` and wait
3. Scoring dataset uploaded to the project as a prediction dataset: `project.upload_dataset(...)`
4. Predictions computed for that dataset: `model.request_predictions(dataset.id)` and wait

```python
import datarobot as dr

project = dr.Project.get(project_id)
model = dr.Model.get(project=project_id, model_id=model_id)

# May raise JobAlreadyRequested if Feature Impact already exists for this model.
model.request_feature_impact().wait_for_completion()
dr.PredictionExplanationsInitialization.create(project_id, model_id).wait_for_completion()

# Upload the scoring data to the project and compute predictions on it.
dataset = project.upload_dataset("./data/scoring_data.csv")
model.request_predictions(dataset.id).wait_for_completion()

pe_job = dr.PredictionExplanations.create(
    project_id=project_id,
    model_id=model_id,
    dataset_id=dataset.id,
    max_explanations=5,      # top N features per row, up to 50
    threshold_high=0.5,      # only explain rows with prediction >= threshold
    threshold_low=0.1,       # only explain rows with prediction <= threshold
)

pe_obj = pe_job.get_result_when_complete()
```

Use `pe_obj.get_rows()`, `pe_obj.get_all_as_dataframe()`, or `pe_obj.download_to_csv(...)` to
retrieve results. For parameters, multiclass modes, and exposure-adjusted predictions, see
`references/xemp_pe_reference.md`.
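
The two thresholds select rows at the extremes of the prediction range. The sketch below mirrors that selection logic locally, assuming the semantics in the code comments above: a row is explained when its prediction is at or below `threshold_low` or at or above `threshold_high`, and every row is explained when neither is set. `rows_to_explain` is a hypothetical helper for reasoning about thresholds, not an SDK function.

```python
def rows_to_explain(predictions, threshold_low=None, threshold_high=None):
    """Return indices of rows that would receive explanations."""
    if threshold_low is None and threshold_high is None:
        return list(range(len(predictions)))
    selected = []
    for index, prediction in enumerate(predictions):
        low_hit = threshold_low is not None and prediction <= threshold_low
        high_hit = threshold_high is not None and prediction >= threshold_high
        if low_hit or high_hit:
            selected.append(index)
    return selected


# Toy predictions for illustration only.
predictions = [0.05, 0.30, 0.50, 0.90]
selected = rows_to_explain(predictions, threshold_low=0.1, threshold_high=0.5)
print(selected)  # [0, 2, 3]
```

With `threshold_low=0.1` and `threshold_high=0.5`, the row predicted at `0.30` falls between the thresholds and is skipped.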

## Data slices for filtered insights

Use `dr.DataSlice` when the user asks to explain model behavior for a segment, such as a
region, product line, target class, or high-risk cohort. Pass the resulting `data_slice_id` into
the `datarobot.insights` SHAP APIs.

```python
import datarobot as dr
from datarobot.insights import ShapMatrix

data_slice = dr.DataSlice.create(
    name="high_income_customers",
    filters=[{"operand": "income", "operator": ">", "values": [100000]}],  # values is a list
    project=project_id,
)

shap_matrix = ShapMatrix.create(
    entity_id=model_id,
    source="validation",
    data_slice_id=data_slice.id,
)
```
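
When a slice combines several conditions, it can help to build the filter dictionaries with a small helper. The sketch below assumes each filter uses the `operand` / `operator` / `values` keys shown above and that `values` is expected as a list. `make_filter` is a hypothetical helper, the `region` feature is invented for illustration, and the operator strings should be checked against the SDK reference before use.

```python
def make_filter(operand, operator, values):
    """Build one filter dict, wrapping a scalar value in a list."""
    if not isinstance(values, (list, tuple)):
        values = [values]
    return {"operand": operand, "operator": operator, "values": list(values)}


# Hypothetical features and operators, for illustration only.
filters = [
    make_filter("income", ">", 100000),
    make_filter("region", "eq", ["EMEA"]),
]
print(filters)
```

The resulting `filters` list can then be passed as the `filters` argument of `dr.DataSlice.create(...)`.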

---

## Anomaly assessment (time series models)

For time series anomaly detection models, use `AnomalyAssessmentRecord`.

```python
from datarobot.models.anomaly_assessment import AnomalyAssessmentRecord

record = AnomalyAssessmentRecord.compute(
    project_id=project_id,
    model_id=model_id,
    backtest=0,           # backtest index (int) or "holdout"
    source="validation",  # "training" or "validation" only
    series_id=None,       # required for multiseries projects
)

records = AnomalyAssessmentRecord.list(project_id=project_id, model_id=model_id)
latest = record.get_latest_explanations()

regions = record.get_predictions_preview().find_anomalous_regions()
explanations = record.get_explanations_data_in_regions(regions=regions)

ranged = record.get_explanations(
    start_date="2024-01-01T00:00:00.000000Z",
    end_date="2024-06-01T00:00:00.000000Z",
)
```
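
The `start_date` and `end_date` strings above use UTC with microsecond precision and a trailing `Z`. A minimal sketch for producing that format from `datetime` objects, assuming the API expects exactly the layout shown in the example; `to_api_timestamp` is a hypothetical helper name:

```python
from datetime import datetime, timezone


def to_api_timestamp(moment):
    """Format a timezone-aware datetime as UTC with microseconds and a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


start_date = to_api_timestamp(datetime(2024, 1, 1, tzinfo=timezone.utc))
end_date = to_api_timestamp(datetime(2024, 6, 1, tzinfo=timezone.utc))
print(start_date)  # 2024-01-01T00:00:00.000000Z
print(end_date)    # 2024-06-01T00:00:00.000000Z
```

Rejecting naive datetimes avoids silently interpreting local time as UTC.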

---

## Model diagnostics

Use the same `entity_id=model_id` pattern as SHAP insights. `FeatureEffects` / partial dependence
is still retrieved through Model helpers (not in `datarobot.insights`).

### Insights diagnostics (preferred — matches SHAP API)

```python
from datarobot.insights import RocCurve, LiftChart, ConfusionMatrix

roc = RocCurve.create(entity_id=model_id)
lift = LiftChart.create(entity_id=model_id)
confusion = ConfusionMatrix.create(entity_id=model_id)
```

### Model helpers (alternative)

```python
model = dr.Model.get(project=project_id, model_id=model_id)

roc = model.get_roc_curve(source="validation")
lift = model.get_lift_chart(source="validation")
confusion = model.get_confusion_chart(source="validation")

# Feature Impact (non-SHAP) and Feature Effects (partial dependence for top features)
fi = model.get_feature_impact()
feature_effects = model.get_feature_effect(source="validation")
```

---

## Interpreting SHAP values

- **Positive value**: feature pushes prediction higher than baseline
- **Negative value**: feature pushes prediction lower than baseline
- **Magnitude**: size of influence; larger absolute value = stronger effect
- **Sum**: all SHAP values for a row sum to `prediction - base_value` in the link-function space
- **`base_value`**: the model's mean prediction (the "no information" baseline)

Example: if `base_value = 0.35` and a row's prediction is `0.72`, the row's SHAP values sum to
`0.37` when `link_function = "identity"`. A feature with SHAP `+0.20` raised the prediction by
`0.20` in that same space, which is more than half of the total shift above baseline.
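
The additivity property can be checked numerically. A minimal sketch with toy numbers matching the example above, assuming an identity link; the feature names and the individual SHAP values are made up for illustration:

```python
import math

base_value = 0.35
prediction = 0.72

# Toy per-feature SHAP values for a single row (identity link).
shap_values = {"income": 0.20, "age": 0.12, "tenure": 0.05}

total_shift = sum(shap_values.values())
reconstructed = base_value + total_shift

# With an identity link, baseline plus contributions recovers the prediction.
additive = math.isclose(reconstructed, prediction, abs_tol=1e-9)
print(f"sum of SHAP values: {total_shift:.2f}")
print(f"reconstructed prediction: {reconstructed:.2f}")
print(f"additivity holds: {additive}")
```

If the reconstructed value does not match the prediction for real data, check `link_function` first: the sum only equals `prediction - base_value` in the link-function space.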

When `link_function = "logit"`, SHAP values are in log-odds space. Add feature contributions to
`base_value` in log-odds space, then use inverse-logit (`scipy.special.expit`) on the resulting
total to convert it to a probability. Do not apply `expit` to individual SHAP values as if they
were probability deltas.
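
The log-odds conversion can be sketched as follows. This example uses a small standard-library `expit` in place of `scipy.special.expit` so it runs without extra dependencies; the baseline and SHAP values are toy numbers chosen for illustration.

```python
import math


def expit(x):
    """Inverse logit: map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Map a probability to log-odds."""
    return math.log(p / (1.0 - p))


base_value = logit(0.35)          # baseline expressed in log-odds space
shap_values = [0.9, 0.5, -0.2]    # toy per-feature contributions in log-odds

# Correct: sum in log-odds space, then convert the total once.
total_log_odds = base_value + sum(shap_values)
probability = expit(total_log_odds)

# Incorrect: converting each SHAP value separately and adding the results.
wrong = 0.35 + sum(expit(value) for value in shap_values)

print(f"probability: {probability:.3f}")
print(f"naive per-feature conversion: {wrong:.3f}")
```

The naive result exceeds 1, which shows why individual SHAP values in log-odds space cannot be read as probability deltas.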

---

## Decision guide

```
Task: explain predictions
    |
    - Need all features + all rows?     -> ShapMatrix.create(entity_id=model_id)
    - Need top-N features per row?      -> ShapPreview.create(entity_id=model_id)
    - Need aggregated importance?       -> ShapImpact.create(entity_id=model_id)
    - Need feature SHAP distributions?  -> ShapDistributions.create(entity_id=model_id)
    - Need a segment/cohort only?       -> dr.DataSlice + data_slice_id
    - XEMP required (regulatory/type)?  -> dr.PredictionExplanations.create(...)
    - Time series / anomaly model?      -> AnomalyAssessmentRecord.compute(project_id, model_id, ...)
```

---

## Common errors

| Error | Cause | Fix |
|-------|-------|-----|
| `SHAP not available for this model` | Unsupported model type, or anomaly-detection model with >1000 features | Check model support; use XEMP PE if SHAP is unavailable |
| `Feature Impact not computed` | PredictionExplanations prerequisite missing | Run `model.request_feature_impact()` and wait |
| Missing `PredictionExplanationsInitialization` | PE not initialized | Call `PredictionExplanationsInitialization.create()` |
| `source='holdout'` fails | Holdout not unlocked | Unlock holdout in project settings first |
| Empty `previews` | No rows in partition | Check partition contains data |

---

## Reference files

- `references/shap_api_reference.md` - full parameter signatures for ShapMatrix, ShapImpact, ShapPreview, ShapDistributions
- `references/xemp_pe_reference.md` - PredictionExplanations and PredictionExplanationsInitialization parameter reference
- `scripts/compute_shap_matrix.py` - compute and export ShapMatrix to CSV or DataFrame

## Resources

- [datarobot.insights API reference](https://datarobot-public-api-client.readthedocs-hosted.com/en/latest-release/insights.html)
- [Prediction Explanations user guide](https://datarobot-public-api-client.readthedocs-hosted.com/en/latest-release/reference/modeling/insights/prediction_explanations.html)