# SMILE — Validation Metrics

The `smile.validation.metric` package provides scalar evaluation metrics for
**classification**, **probabilistic classification**, **regression**, and
**clustering** tasks. Every metric is a stateless, serializable object that
implements one of four functional interfaces. The `static of(...)` factory
methods let you compute a score in one line without instantiating a class.

---

## 1. Package Overview

### Functional interfaces

| Interface | Method signature | Who implements it |
|---|---|---|
| `ClassificationMetric` | `double score(int[] truth, int[] prediction)` | `Accuracy`, `Error`, `Precision`, `Recall`, `FScore`, `FDR`, `Fallout`, `Specificity`, `Sensitivity`, `MatthewsCorrelation` |
| `ProbabilisticClassificationMetric` | `double score(int[] truth, double[] probability)` | `AUC`, `LogLoss` |
| `RegressionMetric` | `double score(double[] truth, double[] prediction)` | `MSE`, `RMSE`, `RSS`, `MAD`, `R2` |
| `ClusteringMetric` | `double score(int[] truth, int[] cluster)` | `RandIndex`, `AdjustedRandIndex`, `MutualInformation`, `NormalizedMutualInformation`, `AdjustedMutualInformation` |

All four interfaces extend `java.util.function.ToDoubleBiFunction` so they can
be used directly as lambdas or method references wherever that type is expected.

### Common conventions

- **Label arrays are zero-indexed integers.** Binary metrics expect labels in
  `{0, 1}` (0 = negative, 1 = positive). Multi-class metrics accept any
  non-negative integer labels; `max(label) + 1` is used as the number of
  classes.
- **Arrays must have equal length.** All `of(truth, prediction)` methods throw
  `IllegalArgumentException` if sizes differ.
- **Higher is better** for most metrics; exceptions are `Error`, `MSE`, `RMSE`,
  `RSS`, `MAD`, `LogLoss`, and `CrossEntropy` where lower is better.
- **Singleton instances** (`Accuracy.instance`, `AUC.instance`, etc.) are
  provided for convenience when you need a reusable object reference.

---

## 2. Classification Metrics

Classification metrics compare an integer label array `truth` against a
predicted integer label array `prediction`.

### 2.1 Accuracy

```java
double acc = Accuracy.of(truth, prediction);
// or via instance
Accuracy accuracy = new Accuracy();
double acc = accuracy.score(truth, prediction);
```

**Formula:** `acc = (number of correct predictions) / n`

Accuracy is symmetric and works for any number of classes. It is the complement
of the error rate: `accuracy + errorRate == 1.0`.

```java
int[] truth      = {1, 0, 1, 0, 1, 0};
int[] prediction = {1, 0, 0, 0, 1, 1};
double acc = Accuracy.of(truth, prediction);   // (4 correct) / 6 ≈ 0.667
```

**Caveat:** Accuracy is misleading on imbalanced datasets. A classifier that
always predicts the majority class can achieve 99 % accuracy on a 99:1 dataset
while being completely useless for the minority class.

---

### 2.2 Error

```java
int errors = Error.of(truth, prediction);
```

Returns the **raw count** of mismatches (not a rate). Cast to `double` when
used via `ClassificationMetric.score()`.

```java
int n      = truth.length;
int errors = Error.of(truth, prediction);
double errorRate = (double) errors / n;
double accuracy  = Accuracy.of(truth, prediction);
// errorRate + accuracy == 1.0
```

---

### 2.3 Precision, Recall, F-score

These three metrics work in both **binary** and **multi-class** modes.

#### Binary mode

```java
// Both arrays must contain only 0 and 1.
double p = Precision.of(truth, prediction);
double r = Recall.of(truth, prediction);
double f1 = FScore.of(truth, prediction, 1.0, null);  // F₁ = harmonic mean of P and R
```

| Metric | Formula | Numerator | Denominator |
|---|---|---|---|
| Precision | TP / (TP + FP) | True positives | All predicted positives |
| Recall | TP / (TP + FN) | True positives | All actual positives |
| F₁ | 2PR / (P + R) | — | — |

When there are **no predicted positives** (Precision) or **no actual positives**
(Recall), the result is `NaN`. Handle this defensively:

```java
double p = Precision.of(truth, prediction);
if (Double.isNaN(p)) {
    // model made no positive predictions
}
```

#### Multi-class mode — `Averaging` strategy

Pass one of three `Averaging` enum values as the third argument:

| Strategy | Description |
|---|---|
| `Averaging.Macro` | Compute per-class metric, take unweighted mean. Treats all classes equally. |
| `Averaging.Micro` | Pool all TP/FP/FN globally. Equivalent to accuracy for Micro-Precision/Recall. |
| `Averaging.Weighted` | Compute per-class metric, weight by class support in `truth`. |

```java
import smile.validation.metric.Averaging;

double macroPrecision   = Precision.of(truth, pred, Averaging.Macro);
double microPrecision   = Precision.of(truth, pred, Averaging.Micro);
double weightedPrecision = Precision.of(truth, pred, Averaging.Weighted);

double macroRecall   = Recall.of(truth, pred, Averaging.Macro);
double macroF1       = FScore.of(truth, pred, 1.0, Averaging.Macro);
```

#### Generalized Fβ score

The `beta` parameter controls the trade-off between precision and recall:

```
Fβ = (1 + β²) · (P · R) / (β²·P + R)
```

- **β < 1**: weights precision more heavily (e.g., spam detection where false
  positives matter).
- **β = 1**: F₁, the harmonic mean of P and R (most common choice).
- **β > 1**: weights recall more heavily (e.g., disease screening where missing
  a case is costly).

```java
double f2 = FScore.of(truth, prediction, 2.0, null);  // binary, recall-weighted
FScore f05Instance = new FScore(0.5, Averaging.Macro); // reusable instance
double score = f05Instance.score(truth, prediction);
```

`beta` must be strictly positive; passing `0` or a negative value throws
`IllegalArgumentException("Non-positive beta: ...")`.

---

### 2.4 False Discovery Rate (FDR)

```java
double fdr = FDR.of(truth, prediction);
```

**Formula:** `FDR = FP / (TP + FP) = 1 − Precision`

Only applicable to binary labels `{0, 1}`. Returns `NaN` if no positive
predictions are made.

---

### 2.5 Fallout (False Positive Rate)

```java
double fpr = Fallout.of(truth, prediction);
```

**Formula:** `FPR = FP / (FP + TN) = FP / (number of true negatives)`

The *negatives* in this metric are samples where `truth[i] != 1` (i.e., any
non-positive label counts as negative, not just `truth == 0`).

Returns `NaN` if there are no negative samples in `truth`.

---

### 2.6 Specificity (True Negative Rate)

```java
double tnr = Specificity.of(truth, prediction);
```

**Formula:** `TNR = TN / (TN + FP) = TN / (number of samples where truth == 0)`

Specificity counts only samples where `truth[i] == 0` as negatives (stricter
than `Fallout`). Returns `NaN` if no negative samples exist.

`Specificity = 1 − Fallout` only when all non-positive labels are exactly `0`.

---

### 2.7 Sensitivity (True Positive Rate / Recall)

```java
double tpr = Sensitivity.of(truth, prediction);
```

**Formula:** `TPR = TP / (TP + FN)`

Binary only; identical to binary `Recall.of(truth, prediction)`. Returns `NaN`
if there are no positive samples.

---

### 2.8 Matthews Correlation Coefficient (MCC)

```java
double mcc = MatthewsCorrelation.of(truth, prediction);
```

**Formula:**

```
MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
```

MCC is widely considered the most informative single metric for binary
classification because it accounts for all four cells of the confusion matrix
and is robust to class imbalance.

- **MCC = +1**: perfect prediction.
- **MCC = 0**: no better than random.
- **MCC = −1**: total disagreement (inverted classifier).

The input labels must reduce to a **2×2** confusion matrix (exactly two
distinct classes). Returns `NaN` when the denominator is zero (e.g., all
predictions or all truths are the same class).

```java
int[] truth      = {1, 0, 1, 0, 1, 0, 1, 0};
int[] prediction = {1, 0, 1, 0, 0, 1, 1, 0};
double mcc = MatthewsCorrelation.of(truth, prediction);  // ≈ 0.5
```

---

### 2.9 Confusion Matrix

A `ConfusionMatrix` is not a scalar metric; it is a 2-D summary from which any
per-class breakdown can be derived.

```java
ConfusionMatrix cm = ConfusionMatrix.of(truth, prediction);
int[][] matrix = cm.matrix();
// matrix[t][p] = count of samples with true label t, predicted as p
System.out.println(cm);  // formatted table
```

The matrix dimension is `(max_label + 1) × (max_label + 1)` based on the
union of values in `truth` and `prediction`.

---

## 3. Probabilistic Classification Metrics

Probabilistic metrics require a **continuous score** (probability) in addition
to the integer ground truth.

### 3.1 AUC (Area Under the ROC Curve)

```java
double auc = AUC.of(truth, probability);
```

- `truth`: binary labels `{0, 1}`.
- `probability`: positive-class probability score (higher means more likely
  to be positive).

**Interpretation:** AUC equals the probability that a randomly chosen positive
sample is ranked higher than a randomly chosen negative sample.

| AUC value | Meaning |
|---|---|
| 1.0 | Perfect ranking — all positives rank above all negatives |
| 0.5 | Random classifier |
| 0.0 | Worst-case — all positives rank below all negatives |

**Algorithm:** Mann–Whitney U rank statistic with tie-averaging:

```
AUC = (sum_ranks_of_positives − pos*(pos+1)/2) / (pos * neg)
```

Ties in `probability` receive the average of their ranks.

```java
int[]    truth = {0, 0, 1, 1};
double[] prob  = {0.1, 0.4, 0.35, 0.8};
double auc = AUC.of(truth, prob);  // = 0.75
```

Returns `NaN` when `truth` contains only one class (no positive or no negative
samples).

---

### 3.2 Log Loss (Binary Cross-Entropy)

```java
double loss = LogLoss.of(truth, probability);
```

- `truth`: binary labels `{0, 1}`.
- `probability`: predicted probability for the positive class, in `(0, 1)`.

**Formula:**

```
LogLoss = −(1/n) Σᵢ [ truth[i]·log(pᵢ) + (1−truth[i])·log(1−pᵢ) ]
```

Computed in nats (natural logarithm). For `truth[i] == 0`, uses
`Math.log1p(−pᵢ)` for numerical accuracy at values near 0.

```java
int[]    truth = {0, 0, 1, 1, 0};
double[] prob  = {0.1, 0.4, 0.35, 0.8, 0.1};
double loss = LogLoss.of(truth, prob);  // ≈ 0.3989
```

- **Lower is better.** Perfect confidence yields 0; confident wrong predictions
  yield `+∞`.
- Probabilities must be in `(0, 1)`; values of exactly 0 or 1 at the wrong
  class produce infinite loss.
- Only binary labels `{0, 1}` are accepted; other values throw
  `IllegalArgumentException`.

---

### 3.3 Cross-Entropy (Multiclass Log Loss)

```java
double ce = CrossEntropy.of(truth, probability);
```

- `truth`: integer class index for each sample.
- `probability`: `double[n][k]` — row `i` contains the probability distribution
  over `k` classes for sample `i`.

**Formula:**

```
CE = −(1/n) Σᵢ log(probability[i][truth[i]])
```

`CrossEntropy` is an interface (not a class); call the `static of(...)` method
directly. It generalizes `LogLoss` to any number of classes; for `k = 2` the
values are identical up to the column selection convention.

```java
int[] truth = {0, 1, 2, 0};
double[][] prob = {
    {0.9, 0.05, 0.05},
    {0.05, 0.9, 0.05},
    {0.05, 0.05, 0.9},
    {0.9, 0.05, 0.05}
};
double ce = CrossEntropy.of(truth, prob);  // = -log(0.9) ≈ 0.1054
```

---

## 4. Regression Metrics

Regression metrics compare continuous truth and prediction arrays.

### 4.1 RSS — Residual Sum of Squares

```java
double rss = RSS.of(truth, prediction);
```

**Formula:** `RSS = Σ (yᵢ − ŷᵢ)²`

RSS is scale-dependent and grows with `n`. Use it when you need the raw
magnitude of the fit, not a normalized quantity.

---

### 4.2 MSE — Mean Squared Error

```java
double mse = MSE.of(truth, prediction);
```

**Formula:** `MSE = RSS / n = (1/n) Σ (yᵢ − ŷᵢ)²`

MSE penalizes large errors heavily (squaring effect) and is the optimization
objective for ordinary least squares. Scale is in squared units of `y`.

---

### 4.3 RMSE — Root Mean Squared Error

```java
double rmse = RMSE.of(truth, prediction);
```

**Formula:** `RMSE = √MSE`

Same units as `y`; directly interpretable as "typical error magnitude".
`RMSE ≥ MAD` always (by Jensen's inequality).

---

### 4.4 MAD — Mean Absolute Error

```java
double mae = MAD.of(truth, prediction);
```

**Formula:** `MAD = (1/n) Σ |yᵢ − ŷᵢ|`

Also called MAE in many frameworks. Less sensitive to outliers than MSE/RMSE
because it does not square the residuals. Both `MAD(truth, pred)` and
`MAD(pred, truth)` produce the same result.

---

### 4.5 R² — Coefficient of Determination

```java
double r2 = R2.of(truth, prediction);
```

**Formula:**

```
R² = 1 − RSS / TSS
TSS = Σ (yᵢ − ȳ)²   (total sum of squares)
```

| R² value | Interpretation |
|---|---|
| 1.0 | Perfect fit — predictions equal truth exactly |
| 0.0 | Model is no better than always predicting `mean(truth)` |
| < 0 | Model is worse than predicting the mean |

**Important:** When `truth` is constant (`TSS = 0`), R² is undefined and
returns `NaN` or infinity. Check before comparing:

```java
double r2 = R2.of(truth, prediction);
if (!Double.isFinite(r2)) {
    // constant truth — R² is not meaningful
}
```

---

### 4.6 Aggregated regression metrics

`smile.validation.RegressionMetrics` bundles all five into a single record:

```java
import smile.validation.RegressionMetrics;

RegressionMetrics m = RegressionMetrics.of(fitTime, scoreTime, truth, prediction);
System.out.println(m.RSS());
System.out.println(m.MSE());
System.out.println(m.RMSE());
System.out.println(m.MAD());
System.out.println(m.R2());
```

---

## 5. Clustering Metrics

Clustering metrics compare a **ground-truth labelling** against a proposed
**cluster assignment**. Labels in both arrays are **permutation-invariant**:
the metrics only care about which samples are grouped together, not which
integer label names each group.

All clustering metrics use a `ContingencyTable` internally, which remaps the
raw integer labels to contiguous indices automatically.

---

### 5.1 Rand Index

```java
double ri = RandIndex.of(truth, cluster);
```

The Rand index measures the fraction of pairs of samples that are either
**both in the same group** or **both in different groups** in both labellings.

**Formula:**

```
RI = (number of agreeing pairs) / C(n, 2)
   = (T − P/2 − Q/2 + C(n,2)) / C(n,2)
```

where:
- `T = Σ C(nᵢⱼ, 2)` — pairs that agree in both clusterings.
- `P = Σᵢ C(aᵢ, 2)` — pairs in the same ground-truth class.
- `Q = Σⱼ C(bⱼ, 2)` — pairs in the same predicted cluster.

Range: `[0, 1]`. A value of 1 means perfect agreement.

**Limitation:** The Rand index has a non-zero expected value for random
clusterings (especially when many clusters are used). Use **Adjusted Rand
Index** for chance-corrected evaluation.

---

### 5.2 Adjusted Rand Index (ARI)

```java
double ari = AdjustedRandIndex.of(truth, cluster);
```

Corrects the Rand index for the expected agreement under chance:

```
ARI = (RI − E[RI]) / (max(RI) − E[RI])
```

| ARI value | Interpretation |
|---|---|
| 1.0 | Perfect agreement |
| 0.0 | Agreement at the level of a random clustering |
| < 0 | Worse than random |

ARI is the standard clustering quality metric when ground-truth labels are
available.

```java
int[] clusters = {0, 0, 1, 1, 2, 2};
int[] alt      = {1, 1, 0, 0, 2, 2};  // same partition, different labels
double ari = AdjustedRandIndex.of(clusters, alt);  // = 1.0 (perfect)
```

---

### 5.3 Mutual Information (MI)

```java
double mi = MutualInformation.of(truth, cluster);
```

Measures the information shared between two labellings (in **nats**, natural
log):

```
I(X;Y) = Σᵢⱼ (nᵢⱼ/n) · log[ (nᵢⱼ/n) / ((aᵢ/n)(bⱼ/n)) ]
```

- **MI = H(truth)** when `truth == cluster` (perfect clustering).
- **MI = 0** when the two labellings are statistically independent.
- Non-negative by definition.

```java
int[] x = {0, 0, 0, 1, 1, 1};
MutualInformation.of(x, x);   // = ln(2) ≈ 0.6931 nats
MutualInformation.of(x, new int[]{0,1,0,1,0,1});  // = 0.0 (independent)
```

---

### 5.4 Normalized Mutual Information (NMI)

NMI scales MI to the interval `[0, 1]` by dividing by a normalization factor
derived from the marginal entropies. Five normalization methods are available:

| Constant | Formula | Notes                                                                        |
|---|---|------------------------------------------------------------------------------|
| `NormalizedMutualInformation.JOINT` | `I / H(X,Y)` | H(X,Y) = joint entropy                                                       |
| `NormalizedMutualInformation.MAX` | `I / max(H(X), H(Y))` | Bounded by the larger entropy                                                |
| `NormalizedMutualInformation.MIN` | `I / min(H(X), H(Y))` | Can reach 1 even for imperfect clustering if one labelling has lower entropy |
| `NormalizedMutualInformation.SUM` | `2I / (H(X) + H(Y))` | Symmetric F-measure-like                                                     |
| `NormalizedMutualInformation.SQRT` | `I / √(H(X)·H(Y))` | Geometric mean normalization                                                |

```java
double nmi = NormalizedMutualInformation.max(truth, cluster);
// or via instance:
double nmi = NormalizedMutualInformation.MAX.score(truth, cluster);
```

All variants equal **1.0** for a perfect clustering and **0.0** for
statistically independent labellings. The variants differ for intermediate cases.

**Note:** Due to floating-point arithmetic, values may be infinitesimally above
1.0 (e.g., 1.0000000000000002); treat values within `1 + 1e-10` as 1.0 in
downstream comparisons.

---

### 5.5 Adjusted Mutual Information (AMI)

AMI corrects MI for chance under a hypergeometric model (analogous to how ARI
corrects RI):

```
AMI = (I − E[MI]) / (norm − E[MI])
```

Four normalization methods are provided:

| Constant | Denominator |
|---|---|
| `AdjustedMutualInformation.MAX` | `max(H(X), H(Y)) − E[MI]` |
| `AdjustedMutualInformation.MIN` | `min(H(X), H(Y)) − E[MI]` |
| `AdjustedMutualInformation.SUM` | `0.5·(H(X) + H(Y)) − E[MI]` |
| `AdjustedMutualInformation.SQRT` | `√(H(X)·H(Y)) − E[MI]` |

```java
double ami = AdjustedMutualInformation.max(truth, cluster);
// or via instance:
double ami = AdjustedMutualInformation.MAX.score(truth, cluster);
```

| AMI value | Interpretation |
|---|---|
| 1.0 | Perfect agreement |
| 0.0 | No more information than chance |
| < 0 | Worse than chance |

**Warning:** The expected MI computation involves a double sum over the
hypergeometric support and can be slow for large numbers of clusters.

---

## 6. Choosing the Right Metric

### Classification

| Scenario | Recommended metric(s) |
|---|---|
| Balanced classes, overall correctness | `Accuracy` |
| Imbalanced classes | `F₁`, `MCC`, `AUC` |
| Precision–recall trade-off | `Precision`, `Recall`, `Fβ`, `FDR` |
| Confident probabilistic output | `LogLoss`, `AUC` |
| Multi-class, equal class importance | `Macro F₁` |
| Multi-class, class-proportional | `Weighted F₁` |
| Best single binary metric | `MCC` |

### Regression

| Scenario | Recommended metric(s) |
|---|---|
| General-purpose | `RMSE`, `R²` |
| Outlier-robust | `MAD` |
| Comparing across different scales | `R²` |
| Matching the loss function (OLS) | `RSS` or `MSE` |

### Clustering

| Scenario                                          | Recommended metric(s) |
|---------------------------------------------------|---|
| Ground truth available, absolute quality          | `ARI` |
| Information-theoretic comparison                  | `AMI (MAX)` |
| Pairwise agreement, no correction                 | `Rand Index` |
| Raw MI for downstream use                         | `MutualInformation` |
| Normalized to `[0, 1]` without chance correction | `NMI (MAX)` |

---

## 7. Usage Patterns

### 7.1 Quick one-liner evaluation

```java
import smile.validation.metric.*;

// Classification
double acc  = Accuracy.of(truth, pred);
double f1   = FScore.of(truth, pred, 1.0, null);
double mcc  = MatthewsCorrelation.of(truth, pred);
double auc  = AUC.of(truth, prob);
double loss = LogLoss.of(truth, prob);

// Regression
double rmse = RMSE.of(yTrue, yPred);
double r2   = R2.of(yTrue, yPred);
double mad  = MAD.of(yTrue, yPred);

// Clustering
double ari = AdjustedRandIndex.of(labels, clusters);
double ami = AdjustedMutualInformation.max(labels, clusters);
double nmi = NormalizedMutualInformation.max(labels, clusters);
```

### 7.2 Reusable metric instances

Pass metrics as `ClassificationMetric` / `RegressionMetric` parameters:

```java
ClassificationMetric metric = new FScore(2.0, Averaging.Macro);  // F₂, macro
double score = metric.score(truth, prediction);

// Use as lambda / method reference
ClassificationMetric simpleAcc = Accuracy::of;  // won't work; use instance
ClassificationMetric acc = Accuracy.instance;
```

Pre-built singleton instances for all metrics:

```java
Accuracy.instance
Error.instance
AUC.instance
LogLoss.instance
MSE.instance
RMSE.instance
RSS.instance
MAD.instance
R2.instance
MutualInformation.instance
RandIndex.instance
AdjustedRandIndex.instance
NormalizedMutualInformation.JOINT   // or MAX, MIN, SUM, SQRT
AdjustedMutualInformation.MAX       // or MIN, SUM, SQRT
```

### 7.3 Aggregated metrics via `ClassificationMetrics`

```java
import smile.validation.ClassificationMetrics;

ClassificationMetrics m = ClassificationMetrics.of(fitTime, scoreTime,
                                                    truth, prediction, prob);
System.out.println(m.accuracy());
System.out.println(m.f1());
System.out.println(m.mcc());
System.out.println(m.auc());
System.out.println(m.logloss());
```

### 7.4 Cross-validation evaluation

```java
import smile.validation.*;

var result = CrossValidation.classification(5, data, labels,
    (x, y) -> SVM.fit(x, y, kernel, C, tol),
    ClassificationMetrics::of);

System.out.println(result);
```

---

## 8. Numeric Examples

### Binary classification summary

```java
int[] truth = {1,1,1,1,1,0,0,0,0,0};
int[] pred  = {1,1,1,0,0,1,0,0,0,0};
// TP=3, FN=2, FP=1, TN=4

Accuracy.of(truth, pred)             // 7/10 = 0.7
Error.of(truth, pred)                // 3
Precision.of(truth, pred)            // 3/(3+1) = 0.75
Recall.of(truth, pred)               // 3/(3+2) = 0.60
FScore.of(truth, pred, 1.0, null)    // 2*0.75*0.60/(0.75+0.60) ≈ 0.667
FDR.of(truth, pred)                  // 1/4 = 0.25
Specificity.of(truth, pred)          // 4/(4+1) = 0.80
Sensitivity.of(truth, pred)          // same as Recall = 0.60
MatthewsCorrelation.of(truth, pred)  // ≈ 0.398
```

### AUC with tie-breaking

```java
int[]    truth = {0, 0, 1, 1};
double[] prob  = {0.1, 0.4, 0.35, 0.8};
// Sorted ascending by prob: labels=[0,1,0,1], ranks=[1,2,3,4]
// Sum of positive ranks = 2+4 = 6
// AUC = (6 - 2*3/2) / (2*2) = 3/4 = 0.75
AUC.of(truth, prob);  // 0.75
```

### R² interpretation

```java
double[] truth = {3.0, -0.5, 2.0, 7.0};
double[] pred  = {2.5,  0.0, 2.0, 8.0};
R2.of(truth, pred);   // ≈ 0.948 — excellent fit

double[] naive = {3.625, 3.625, 3.625, 3.625};  // always predict mean
R2.of(truth, naive);  // = 0.0 — no better than the mean
```

### Perfect clustering vs independent clustering

```java
int[] x = {0, 0, 0, 1, 1, 1};

// Perfect: truth == cluster
AdjustedRandIndex.of(x, x)                       // 1.0
NormalizedMutualInformation.max(x, x)             // 1.0
AdjustedMutualInformation.max(x, x)               // 1.0

// Independent: balanced 2×2 contingency → MI = 0
int[] y = {0, 1, 0, 1, 0, 1};
MutualInformation.of(x, y)                        // 0.0... actually non-zero here
// Use balanced n=4 for exact independence:
MutualInformation.of(new int[]{0,0,1,1}, new int[]{0,1,0,1})  // 0.0
```

---

## 9. Edge Cases and Pitfalls

**All predictions from one class (no positives predicted)**  
`Precision` and `FDR` return `NaN` when no sample is predicted positive.

**All ground truth from one class**  
`Recall`, `Sensitivity` return `NaN`; `AUC` returns `NaN`; `MCC` returns
`NaN`; `R2` returns `NaN`/infinity if all truth values are equal (TSS = 0).
Guard with `Double.isFinite(result)` before using.

**Only one class in truth for MCC**  
`MatthewsCorrelation` requires a 2×2 confusion matrix. If `truth` contains
only one distinct value, the resulting confusion matrix has only one non-zero
row/column and `MCC` returns `NaN`.

**AMI performance**  
`AdjustedMutualInformation` computes the expected MI via a double loop over
the hypergeometric support. For datasets with many small clusters (large `R`
and `C`), this is noticeably slow. Prefer `ARI` or `NMI` for exploratory work.

**NMI slightly above 1.0**  
Floating-point rounding can produce NMI values of `1.0 + ε`. When using NMI
in comparisons (e.g., storing the best score), clamp to `[0.0, 1.0]`:

```java
double nmi = Math.min(1.0, NormalizedMutualInformation.max(truth, cluster));
```

**Large label IDs in `ConfusionMatrix` / `Precision` / `Recall`**  
These metrics allocate arrays of size `max(label) + 1`. Labels like
`{0, 1000}` allocate a 1001-element array. Use remapped/contiguous labels
for efficiency.

**Probabilistic metrics require calibrated probabilities**  
`LogLoss`, `CrossEntropy`, and `AUC` use the raw score values directly.
`LogLoss` blows up (`+∞`) if a probability of exactly `0.0` or `1.0` is
submitted for the wrong class. Calibrate or clip probabilities before use:

```java
double p = Math.max(1e-15, Math.min(1 - 1e-15, rawProbability));
```

---

*SMILE — Copyright © 2010-2026 Haifeng Li. GNU GPL licensed.*