# SMILE — Independent Component Analysis (ICA) User Guide

## Table of Contents

1. [Overview](#overview)
2. [Quick Start](#quick-start)
3. [Mathematical Background](#mathematical-background)
   - [The ICA Model](#the-ica-model)
   - [Assumptions](#assumptions)
   - [Non-Gaussianity as Independence Proxy](#non-gaussianity-as-independence-proxy)
   - [The FastICA Algorithm](#the-fastica-algorithm)
   - [Data Whitening](#data-whitening)
4. [Contrast Functions](#contrast-functions)
   - [LogCosh (default)](#logcosh-default)
   - [Gaussian (Exp)](#gaussian-exp)
   - [Kurtosis](#kurtosis)
   - [Choosing a Contrast Function](#choosing-a-contrast-function)
   - [Custom Contrast Functions](#custom-contrast-functions)
5. [Hyperparameters](#hyperparameters)
   - [Options Record](#options-record)
   - [Persisting Options with Properties](#persisting-options-with-properties)
6. [Input Data Layout](#input-data-layout)
7. [Working with the Result](#working-with-the-result)
8. [Practical Guidance](#practical-guidance)
   - [Number of Components](#number-of-components)
   - [Convergence and Iteration Limit](#convergence-and-iteration-limit)
   - [Reproducibility and Seeding](#reproducibility-and-seeding)
   - [Sign and Ordering Ambiguity](#sign-and-ordering-ambiguity)
   - [Limitations](#limitations)
9. [Complete Examples](#complete-examples)
   - [Cocktail Party Problem](#cocktail-party-problem)
   - [Feature Extraction](#feature-extraction)
   - [Configuring via Properties](#configuring-via-properties)
10. [ICA vs PCA](#ica-vs-pca)
11. [References](#references)

---

## Overview

The `smile.ica` package provides **Independent Component Analysis (ICA)** via the
**FastICA** algorithm invented by Aapo Hyvärinen. ICA is a blind source-separation
technique that decomposes a set of mixed observed signals into a set of maximally
statistically independent components.

| Class / Record | Role |
|---|---|
| `ICA` | The fitted model; holds the unmixing matrix |
| `ICA.Options` | Hyperparameters: contrast function, iteration limit, tolerance |
| `LogCosh` | Default contrast function — general purpose |
| `Exp` | Gaussian contrast function — super-Gaussian / robust |
| `Kurtosis` | Kurtosis contrast function — simple, sensitive to outliers |

All contrast-function classes implement `smile.util.function.DifferentiableFunction`
and `java.io.Serializable`, so custom implementations can be serialized alongside
the model.

---

## Quick Start

```java
import smile.ica.*;
import smile.math.MathEx;

// data[i] is sample i, data[i][j] is the j-th mixed signal value.
// Rearrange to variables × samples before calling fit().
double[][] data = /* load your mixed-signal matrix (samples × variables) */;
double[][] X    = MathEx.transpose(data); // variables × samples

// Fit ICA — extract 2 independent components using default settings.
MathEx.setSeed(19650218);       // for reproducibility
ICA ica = ICA.fit(X, 2);

// Each row of components() is one unit-norm independent component vector.
double[][] components = ica.components();
System.out.printf("Component 0, sample 0: %.5f%n", components[0][0]);
System.out.printf("Component 1, sample 0: %.5f%n", components[1][0]);
```

---

## Mathematical Background

### The ICA Model

ICA assumes the observed signal vector **x** is a linear instantaneous mixture
of statistically independent source signals **s**:

```
x = A · s
```

where:
- **x** ∈ ℝᵐ is the vector of observed (mixed) signals at one time instant
- **s** ∈ ℝᵖ is the vector of unknown independent source signals
- **A** ∈ ℝ^{m×p} is the unknown mixing matrix

The goal is to estimate the **unmixing matrix W** such that:

```
ŝ = W · x
```

recovers estimates of the original sources **s** up to permutation and scaling.

### Assumptions

ICA requires the following assumptions to hold:

1. **Statistical independence** — the source signals `s₁, s₂, …, sₚ` are
   mutually statistically independent.
2. **Non-Gaussianity** — at most one source may be Gaussian (by the Central
   Limit Theorem, a mixture of independent sources becomes more Gaussian; ICA
   reverses this by maximizing non-Gaussianity).
3. **Linearity** — the mixing is instantaneous and linear (no time delays or
   convolution).
4. **Sufficient observations** — at least as many observed signals as source
   signals (`m ≥ p`).

### Non-Gaussianity as Independence Proxy

FastICA measures non-Gaussianity using **negentropy** approximations based on a
non-quadratic, non-linear contrast function G(u). Negentropy is always
non-negative and equals zero if and only if the variable is Gaussian.

The negentropy approximation is:

```
J(y) ≈ [E{G(y)} - E{G(ν)}]²
```

where ν is a standard Gaussian variable and the expectation is over samples.
Maximizing this over unit-norm directions w gives the most non-Gaussian
(most independent) projection.

### The FastICA Algorithm

FastICA uses a **fixed-point iteration** to find each unmixing vector **w**:

```
w ← (1/n) · X · g(Xᵀw) − mean(g′(Xᵀw)) · w
w ← w / ‖w‖
```

where:
- **X** ∈ ℝ^{n×m} is the whitened data matrix (n samples, m variables)
- `g = G′` is the first derivative of the contrast function G
- `g′ = G″` is the second derivative

After convergence of each component, **deflation orthogonalization** removes
its contribution so that subsequent components are orthogonal:

```
w ← w − Σₖ (wₖᵀ w) wₖ    for all previously found components wₖ
w ← w / ‖w‖
```

Convergence is declared when:

```
min(‖w − w_old‖, ‖w + w_old‖) < tol
```

The two-case test handles the sign ambiguity: a direction and its negative
represent the same component.

### Data Whitening

Before applying FastICA the data is **pre-whitened** (sphered):

1. **Center** — subtract the mean of each observed variable.
2. **Eigendecompose** — compute the eigendecomposition of the sample covariance
   matrix `Cₓ = XᵀX / n = E D Eᵀ`.
3. **Scale** — form the whitened data `Z = X E D^{-1/2}`.

After whitening, `E{ZᵀZ} = I`, so the covariance is the identity matrix.
Whitening reduces the ICA problem to finding an orthogonal rotation, which
simplifies the optimization considerably.

> **Numerical note:** If any eigenvalue of the covariance matrix is smaller
> than `1e-8`, an `IllegalArgumentException` is thrown — the data is nearly
> linearly dependent and ICA is not applicable without dimensionality reduction.

---

## Contrast Functions

A contrast function G must be:
- non-quadratic
- non-linear
- twice differentiable

The three built-in contrast functions cover the most common use cases.

### LogCosh (default)

```java
new LogCosh()   // or  new ICA.Options("LogCosh", 100)
```

| Property | Value |
|---|---|
| G(u) | `log(cosh(u))`  ≈ `\|u\| − log 2` for large \|u\| |
| G′(u) = g(u) | `tanh(u)` |
| G″(u) = g′(u) | `1 − tanh²(u)` |
| Implementation | Numerically stable: `\|u\| + log1p(exp(−2\|u\|)) − log 2` |
| Suitable for | **General purpose** — balances robustness and accuracy |
| Signal types | Sub-Gaussian and super-Gaussian sources |

`LogCosh` is the recommended default. It is smooth, bounded in derivative, and
avoids the numerical overflow that `log(cosh(x))` would produce for `|x| > 710`.

### Gaussian (Exp)

```java
new Exp()   // or  new ICA.Options("Gaussian", 100)
```

| Property | Value |
|---|---|
| G(u) | `−exp(−u²/2)` |
| G′(u) = g(u) | `u · exp(−u²/2)` |
| G″(u) = g′(u) | `(1 − u²) · exp(−u²/2)` |
| Suitable for | **Super-Gaussian sources**, robustness to outliers |
| Signal types | Sparse signals, impulsive noise |

The Gaussian contrast function down-weights extreme values, making it more
robust when the data contain outliers.

### Kurtosis

```java
new Kurtosis()   // or  new ICA.Options("Kurtosis", 100)
```

| Property | Value |
|---|---|
| G(u) | `u⁴ / 4` |
| G′(u) = g(u) | `u³` |
| G″(u) = g′(u) | `3u²` |
| Suitable for | Simple / educational use, clean data |
| Signal types | Any, but **sensitive to outliers** |

Kurtosis is a classical measure of non-Gaussianity. However, because G grows as
`u⁴`, it is highly sensitive to large values (outliers). Prefer `LogCosh` or
`Exp` for real data.

### Choosing a Contrast Function

| Scenario | Recommended |
|---|---|
| No prior knowledge | `LogCosh` |
| Super-Gaussian signals (e.g., speech, sparse) | `Exp` |
| Sub-Gaussian signals (e.g., uniform) | `LogCosh` |
| Clean data, no outliers, quick test | `Kurtosis` |
| Outlier-prone data | `Exp` |

### Custom Contrast Functions

Implement `DifferentiableFunction` and optionally `Serializable`:

```java
import smile.util.function.DifferentiableFunction;
import java.io.Serializable;

public class MyContrast implements DifferentiableFunction, Serializable {
    @java.io.Serial
    private static final long serialVersionUID = 1L;

    @Override
    public double f(double x) {
        // G(u) — the contrast function itself (not required by FastICA
        // but required by the DifferentiableFunction interface)
        return Math.log1p(x * x);
    }

    @Override
    public double g(double x) {
        // G′(u) — first derivative
        return 2.0 * x / (1.0 + x * x);
    }

    @Override
    public double g2(double x) {
        // G″(u) — second derivative
        double d = 1.0 + x * x;
        return 2.0 * (1.0 - x * x) / (d * d);
    }
}

// Use it
ICA.Options opts = new ICA.Options(new MyContrast(), 200, 1E-5);
ICA ica = ICA.fit(X, p, opts);
```

---

## Hyperparameters

### Options Record

```java
ICA.Options(DifferentiableFunction contrast, int maxIter, double tol)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `contrast` | `DifferentiableFunction` | `new LogCosh()` | Contrast function G |
| `maxIter` | `int` | `100` | Maximum fixed-point iterations per component |
| `tol` | `double` | `1e-4` | Convergence tolerance on ‖w − w_old‖ |

**Convenience constructors:**

```java
// Default tolerance 1e-4
new ICA.Options(new LogCosh(), 100)

// Custom tolerance
new ICA.Options(new LogCosh(), 200, 1E-6)

// By name
new ICA.Options("LogCosh",  100)
new ICA.Options("Gaussian", 100)
new ICA.Options("Kurtosis", 100)
```

### Persisting Options with Properties

`Options` can be serialized to/from `java.util.Properties`, which is convenient
for configuration files or command-line parameter passing.

```java
// Save
ICA.Options opts = new ICA.Options(new Exp(), 150, 1E-5);
Properties props = opts.toProperties();
// props contains:
//   smile.ica.contrast   = Gaussian
//   smile.ica.iterations = 150
//   smile.ica.tolerance  = 1.0E-5

// Restore
ICA.Options restored = ICA.Options.of(props);
```

Property keys:

| Key | Value |
|---|---|
| `smile.ica.contrast` | `"LogCosh"`, `"Gaussian"`, `"Kurtosis"`, or fully-qualified class name |
| `smile.ica.iterations` | integer string |
| `smile.ica.tolerance` | double string |

When a fully-qualified class name is stored, `Options.of()` instantiates it via
reflection using its no-argument constructor — so custom contrast classes must
have a public no-arg constructor.

---

## Input Data Layout

`ICA.fit()` expects data in **variables × samples** layout:

```
data[i][j]  →  value of the i-th observed signal (variable) at time j (sample)
```

- Rows: observed signals / channels (dimension = m)
- Columns: time steps / observations (dimension = n)

If your data is in the conventional **samples × variables** layout (each row is
one observation), transpose it first:

```java
double[][] samplesXvars = /* ... */;              // shape: n × m
double[][] X = MathEx.transpose(samplesXvars);    // shape: m × n
ICA ica = ICA.fit(X, p);
```

The number of components `p` must satisfy `1 ≤ p ≤ m` (number of signals).

---

## Working with the Result

`ICA` is a Java `record` with a single field:

```java
double[][] components = ica.components();
```

- `components.length` == `p` (number of independent components extracted)
- `components[i].length` == `n` (number of samples)
- Each row `components[i]` is a **unit-norm** vector in the whitened sample
  space representing the i-th independent component.
- Rows are **mutually orthogonal**: `components[i] · components[j] ≈ 0` for
  `i ≠ j`.

`ICA` implements `java.io.Serializable` (serialVersionUID = 2), so models can
be saved and loaded with standard Java object serialization or any compatible
framework.

---

## Practical Guidance

### Number of Components

Set `p` to the number of source signals you believe are present. In the absence
of domain knowledge:

- Start with `p = m` (full decomposition) to see all components, then keep
  only the most interpretable ones.
- Use domain knowledge or cross-validation to select a smaller `p`.
- `p` must not exceed `m` (the number of observed signals).

### Convergence and Iteration Limit

If a component does not converge within `maxIter` iterations a `WARN`-level
SLF4J message is emitted:

```
Component 2 did not converge in 100 iterations.
```

Suggested remedies:

1. **Increase `maxIter`** — try 200 or 500.
2. **Loosen `tol`** — e.g., `1e-3` if sub-sample precision is acceptable.
3. **Change contrast function** — `Exp` sometimes converges faster for
   super-Gaussian sources.
4. **Check data quality** — near-collinear signals or strong outliers can
   prevent convergence.

### Reproducibility and Seeding

FastICA initializes **w** with random Gaussian vectors. Results are
non-deterministic across runs. For reproducible output call:

```java
MathEx.setSeed(19650218);
ICA ica = ICA.fit(X, p);
```

### Sign and Ordering Ambiguity

ICA has two fundamental ambiguities that cannot be resolved algorithmically:

1. **Sign** — `w` and `−w` define the same independent subspace. The sign of
   each extracted component is arbitrary.
2. **Order** — there is no canonical ordering of the components. The order
   may vary between runs even with the same seed.

If consistent ordering matters, sort the components by a domain-specific
criterion (e.g., variance of the recovered source, frequency content, etc.)
after fitting.

### Limitations

| Limitation | Details |
|---|---|
| Linear mixing only | ICA does not handle nonlinear or convolutive mixtures |
| No temporal structure | FastICA ignores time-ordering; for time-series use SOBI or TDSEP |
| Gaussian sources | Cannot separate more than one Gaussian source |
| Square or over-determined systems | Requires `m ≥ p` (at least as many sensors as sources) |
| Outlier sensitivity (`Kurtosis`) | Use `LogCosh` or `Exp` for noisy real data |

---

## Complete Examples

### Cocktail Party Problem

The classic motivating example: separate two mixed speech-like signals.

```java
import smile.ica.*;
import smile.math.MathEx;

MathEx.setSeed(12345);

int T = 2000;

// Two non-Gaussian source signals
double[] s1 = new double[T];   // sawtooth
double[] s2 = new double[T];   // square wave
for (int t = 0; t < T; t++) {
    s1[t] = 2.0 * ((t % 50) / 50.0) - 1.0;
    s2[t] = Math.signum(Math.sin(2 * Math.PI * t / 30.0));
}

// Linear mixing: two microphones
double[][] mixed = new double[T][2];
for (int t = 0; t < T; t++) {
    mixed[t][0] = 0.7 * s1[t] + 0.3 * s2[t];   // microphone 1
    mixed[t][1] = 0.4 * s1[t] + 0.6 * s2[t];   // microphone 2
}

// Fit ICA (must transpose to variables × samples layout)
ICA ica = ICA.fit(MathEx.transpose(mixed), 2);

double[][] w = ica.components();
System.out.printf("‖w₀‖ = %.6f (should be 1.0)%n", MathEx.norm(w[0]));
System.out.printf("‖w₁‖ = %.6f (should be 1.0)%n", MathEx.norm(w[1]));
System.out.printf("w₀ · w₁ = %.6f (should be ≈ 0)%n", MathEx.dot(w[0], w[1]));
```

### Feature Extraction

ICA can be used for feature extraction as an alternative to PCA. Unlike PCA,
ICA components are statistically independent (not just uncorrelated), which
makes them more meaningful for non-Gaussian data such as natural images or
EEG signals.

```java
import smile.ica.*;
import smile.math.MathEx;

// Suppose eegData is channels × timepoints
double[][] eegData = loadEEG(); // shape: 64 × 10000

MathEx.setSeed(42);

// Extract 20 independent components from 64-channel EEG
ICA.Options opts = new ICA.Options(new LogCosh(), 300, 1E-5);
ICA ica = ICA.fit(eegData, 20, opts);

// The components matrix: 20 × 10000
double[][] icComponents = ica.components();

// Inspect convergence via SLF4J logging output.
// Each row icComponents[i] is a unit-norm vector in the whitened sample space.
```

### Configuring via Properties

Useful for externalizing configuration in applications or ML pipelines:

```java
import smile.ica.*;
import java.util.Properties;

// ---- at configuration time ----
Properties props = new Properties();
props.setProperty("smile.ica.contrast",   "Gaussian");
props.setProperty("smile.ica.iterations", "200");
props.setProperty("smile.ica.tolerance",  "1E-5");

// ---- at fit time ----
ICA.Options opts = ICA.Options.of(props);  // throws ReflectiveOperationException
ICA ica = ICA.fit(X, p, opts);

// ---- serialize the options for later ----
Properties savedProps = opts.toProperties();
// savedProps.getProperty("smile.ica.contrast") == "Gaussian"
```

---

## ICA vs PCA

Both ICA and PCA decompose a multivariate signal, but with different goals:

| Aspect | PCA | ICA |
|---|---|---|
| Criterion | Maximum variance | Maximum statistical independence |
| Output | Uncorrelated components | Independent components |
| Gaussian data | Optimal | Undefined (see Limitations) |
| Non-Gaussian data | Sub-optimal | Optimal |
| Ordering | Decreasing variance | Arbitrary |
| Sign | Consistent (largest projection positive) | Arbitrary |
| Preprocessing | No | Requires whitening (done automatically) |
| Typical use | Dimensionality reduction, compression | Blind source separation, artifact removal |

In practice, **PCA whitening is applied as a pre-processing step inside
FastICA** — the two methods are complementary rather than competing.

---

## References

1. Aapo Hyvärinen. *Fast and robust fixed-point algorithms for independent
   component analysis.* IEEE Transactions on Neural Networks, 10(3):626–634,
   1999.

2. Aapo Hyvärinen and Erkki Oja. *Independent component analysis: Algorithms
   and applications.* Neural Networks, 13(4–5):411–430, 2000.

3. Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. *Independent Component
   Analysis.* Wiley, 2001.

4. Pierre Comon. *Independent component analysis, a new concept?* Signal
   Processing, 36(3):287–314, 1994.


---

*SMILE — © 2010-2026 Haifeng Li. GNU GPL licensed.*