# SMILE — Time Series Analysis

The `smile.timeseries` package provides tools for modeling and forecasting
univariate time series. It covers four areas:

| Class / Interface | Purpose |
|---|---|
| `TimeSeries` | Static utilities: differencing, autocovariance, ACF, PACF |
| `BoxTest` | Portmanteau autocorrelation tests (Box–Pierce, Ljung–Box) |
| `AR` | Autoregressive model AR(*p*) — Yule–Walker and OLS fitting |
| `ARMA` | Autoregressive moving-average model ARMA(*p*, *q*) |

---

## 1. Foundations

### 1.1 Stationarity

All models in this package assume **weak stationarity**: the mean is constant
over time, and the autocovariance between `x_t` and `x_{t-k}` depends only on
the lag `k`, not on `t`. Many real-world series (trending prices, seasonal
data) violate this and must be pre-processed before fitting.

Two common transforms are:

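* **Differencing**: each lag-1 pass removes one polynomial degree of trend
  (one pass for a linear trend, two for a quadratic); a seasonal lag removes a
  repeating seasonal pattern.
* **Log transform**: stabilizes variance (e.g., asset prices).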

A log-return series `log(p_t) − log(p_{t-1})` is usually close to stationary
in the mean and is a natural starting point for financial data.
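
As a minimal sketch of this transform (plain Java, independent of SMILE; the class name `LogReturns` and the sample prices are made up for illustration), log returns can be computed directly from a price array:

```java
final class LogReturns {
    /** Returns log(p[t]) - log(p[t-1]) for t = 1..n-1; all prices must be positive. */
    static double[] logReturns(double[] price) {
        if (price.length < 2) {
            throw new IllegalArgumentException("need at least two prices");
        }
        double[] r = new double[price.length - 1];
        for (int t = 1; t < price.length; t++) {
            if (price[t] <= 0 || price[t - 1] <= 0) {
                throw new IllegalArgumentException("prices must be positive");
            }
            r[t - 1] = Math.log(price[t]) - Math.log(price[t - 1]);
        }
        return r;
    }

    public static void main(String[] args) {
        // Hypothetical prices growing 10% per period: every log return is log(1.1).
        for (double v : logReturns(new double[]{100, 110, 121})) {
            System.out.println(v);
        }
    }
}
```

Taking the lag-1 difference of the log prices with `TimeSeries.diff` (Section 2.1) gives the same series.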

### 1.2 Model taxonomy

```
AR(p)   : x_t = b + φ₁x_{t-1} + … + φₚx_{t-p} + ε_t
MA(q)   : x_t = b + ε_t + θ₁ε_{t-1} + … + θ_qε_{t-q}
ARMA(p,q): combination of AR and MA
```

`ε_t` is white noise with variance σ². The intercept `b` captures a non-zero
series mean.

---

## 2. `TimeSeries` — Utility Functions

`TimeSeries` is a Java interface with only `static` methods; you never
instantiate it.

### 2.1 Differencing

```java
// First difference (lag=1)
double[] ret = TimeSeries.diff(logPrice, 1);

// Seasonal difference (lag=12 for monthly data)
double[] season = TimeSeries.diff(monthly, 12);

// Multi-order differencing — returns every intermediate series
double[][] d2 = TimeSeries.diff(x, 1, 2);
// d2[0] = first-difference pass
// d2[1] = second-difference pass (difference of the difference)
```

**Signature overview**

```java
static double[]   diff(double[] x, int lag)
static double[][] diff(double[] x, int lag, int differences)
```

* `lag` must be ≥ 1.
* `differences` must be ≥ 1.
* `lag * differences` must be strictly less than `x.length`.

Each differencing pass shrinks the series by `lag` elements.

**Example — second difference of a geometric series `{1, 2, 4, 8, 16}`**

```java
// lag=1, differences=2
double[][] d = TimeSeries.diff(new double[]{1,2,4,8,16}, 1, 2);
// d[0] = {1, 2, 4, 8}   (x[i+1]-x[i])
// d[1] = {1, 2, 4}       (d[0][i+1]-d[0][i])
```
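
To make the indexing concrete, here is a plain-Java sketch of a single lagged differencing pass. It illustrates the definition `x[i+lag] - x[i]` and is not SMILE's source; `DiffSketch` is a hypothetical name.

```java
final class DiffSketch {
    /** Lagged difference: out[i] = x[i + lag] - x[i], so out is `lag` elements shorter. */
    static double[] diff(double[] x, int lag) {
        if (lag < 1 || lag >= x.length) {
            throw new IllegalArgumentException("invalid lag: " + lag);
        }
        double[] out = new double[x.length - lag];
        for (int i = 0; i < out.length; i++) {
            out[i] = x[i + lag] - x[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 4, 8, 16};
        double[] d1 = diff(x, 1);    // {1, 2, 4, 8}
        double[] d2 = diff(d1, 1);   // {1, 2, 4}
        System.out.println(d1.length + " " + d2.length);   // prints "4 3"
    }
}
```

Applying the pass twice reproduces the `d[0]` and `d[1]` rows of the example above, and each pass shortens the series by `lag`.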

### 2.2 Autocovariance

```java
static double cov(double[] x, int lag)
```

Returns the **unnormalized** sample autocovariance at the given lag:

```
cov(lag) = Σ_{i=lag}^{T-1} (x[i] - μ)(x[i-lag] - μ)
```

where `μ = mean(x)`. The sum is not divided by `T`, so the result is the raw
sum of cross-products. Divide by `T` yourself if you need the conventional
autocovariance estimate, or use the raw value to build your own ACF
normalization.

Negative lags are silently converted to their absolute value.
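
The formula can be checked by hand on a tiny series. The sketch below is a plain-Java transcription of the definition above (the class name `AutocovSketch` is hypothetical, and this is not SMILE's source):

```java
final class AutocovSketch {
    /** Unnormalized autocovariance: sum over i = lag..T-1 of (x[i]-mu)(x[i-lag]-mu). */
    static double cov(double[] x, int lag) {
        lag = Math.abs(lag);                 // negative lags are treated as |lag|
        double mu = 0.0;
        for (double v : x) mu += v;
        mu /= x.length;
        double sum = 0.0;
        for (int i = lag; i < x.length; i++) {
            sum += (x[i] - mu) * (x[i - lag] - mu);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};        // mean 3, deviations {-2, -1, 0, 1, 2}
        System.out.println(cov(x, 0));       // 10.0
        System.out.println(cov(x, 1));       // 4.0
        System.out.println(cov(x, 1) / cov(x, 0));   // 0.4
    }
}
```

The last line is the ratio `cov(1) / cov(0)`, which is how the autocorrelation in the next section is defined.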

### 2.3 Autocorrelation Function (ACF)

```java
static double acf(double[] x, int lag)
```

Returns the sample autocorrelation at the given lag:

```
acf(0) = 1
acf(k) = cov(k) / cov(0)
```

```java
// ACF of the log-return series at lag 1
double r1 = TimeSeries.acf(logPriceDiff, 1);

// Lag 0 is always 1.0
assertEquals(1.0, TimeSeries.acf(x, 0), 1e-10);
```

### 2.4 Partial Autocorrelation Function (PACF)

```java
static double pacf(double[] x, int lag)
```

The PACF at lag *k* measures the direct correlation between `x_t` and
`x_{t-k}` after removing the contributions of all intermediate lags. It is
computed via the Yule–Walker equations on the ACF vector.

```java
double phi = TimeSeries.pacf(logPriceDiff, 3);
```

`pacf(0) = 1` and `pacf(1) = acf(1)`.
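
One standard way to solve the Yule–Walker system for the PACF is the Durbin–Levinson recursion. The sketch below uses it in plain Java; SMILE may solve the same equations differently, and the class name `PacfSketch` is hypothetical.

```java
final class PacfSketch {
    /** Sample autocorrelation r_k = cov(k) / cov(0), using raw sums of cross-products. */
    static double acf(double[] x, int k) {
        double mu = 0.0;
        for (double v : x) mu += v;
        mu /= x.length;
        double c0 = 0.0, ck = 0.0;
        for (int i = 0; i < x.length; i++) {
            c0 += (x[i] - mu) * (x[i] - mu);
            if (i >= k) ck += (x[i] - mu) * (x[i - k] - mu);
        }
        return ck / c0;
    }

    /** Partial autocorrelation at `lag` via the Durbin-Levinson recursion. */
    static double pacf(double[] x, int lag) {
        if (lag < 0 || lag >= x.length) {
            throw new IllegalArgumentException("invalid lag: " + lag);
        }
        if (lag == 0) return 1.0;
        double[] r = new double[lag + 1];
        for (int k = 0; k <= lag; k++) r[k] = acf(x, k);

        double[] prev = new double[lag + 1];   // phi_{k-1,j}, index j is 1-based
        double[] cur = new double[lag + 1];    // phi_{k,j}
        prev[1] = r[1];
        double last = r[1];                    // pacf(1) = acf(1)
        for (int k = 2; k <= lag; k++) {
            double num = r[k], den = 1.0;
            for (int j = 1; j < k; j++) {
                num -= prev[j] * r[k - j];
                den -= prev[j] * r[j];
            }
            last = num / den;                  // phi_{k,k} is the PACF at lag k
            for (int j = 1; j < k; j++) cur[j] = prev[j] - last * prev[k - j];
            cur[k] = last;
            System.arraycopy(cur, 0, prev, 0, k + 1);
        }
        return last;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        for (int k = 0; k <= 2; k++) System.out.println(pacf(x, k));
    }
}
```

At lag 2 the recursion reduces to `(r2 - r1²) / (1 - r1²)`, which is a convenient hand check.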

**Rule of thumb for order selection**

| Pattern | Suggested model |
|---|---|
| ACF cuts off at lag *q*; PACF tails off | MA(*q*) |
| PACF cuts off at lag *p*; ACF tails off | AR(*p*) |
| Both tail off | ARMA(*p*, *q*) |

---

## 3. `BoxTest` — Portmanteau Tests

After fitting a model, you want to check whether the residuals look like white
noise. The Box–Pierce and Ljung–Box tests formally test the null hypothesis
that the first `lag` autocorrelations are jointly zero.

### 3.1 Box–Pierce

```java
BoxTest result = BoxTest.pierce(residuals, lag);
```

Statistic: `Q = n * Σ_{l=1}^{lag} r_l²`

### 3.2 Ljung–Box (preferred for small samples)

```java
BoxTest result = BoxTest.ljung(residuals, lag);
```

Statistic: `Q = n(n+2) * Σ_{l=1}^{lag} r_l² / (n-l)`

The Ljung–Box correction reduces the downward bias of the Box–Pierce statistic
in small samples and is the default choice in most software.
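
The two statistics differ only in how each squared autocorrelation is weighted. The plain-Java sketch below evaluates both from a given array of autocorrelations; `BoxStatSketch` and the sample values are hypothetical, and the chi-square p-value step is left out.

```java
final class BoxStatSketch {
    /** Box-Pierce: Q = n * sum_{l=1..m} r_l^2, where r[l-1] holds r_l. */
    static double boxPierce(int n, double[] r) {
        double sum = 0.0;
        for (double v : r) sum += v * v;
        return n * sum;
    }

    /** Ljung-Box: Q = n(n+2) * sum_{l=1..m} r_l^2 / (n - l). */
    static double ljungBox(int n, double[] r) {
        if (r.length >= n) {
            throw new IllegalArgumentException("number of lags must be below n");
        }
        double sum = 0.0;
        for (int l = 1; l <= r.length; l++) {
            sum += r[l - 1] * r[l - 1] / (n - l);
        }
        return n * (n + 2.0) * sum;
    }

    public static void main(String[] args) {
        double[] r = {0.4, -0.1};   // made-up autocorrelations at lags 1 and 2
        int n = 5;                  // made-up sample size
        System.out.println("Box-Pierce Q = " + boxPierce(n, r));
        System.out.println("Ljung-Box  Q = " + ljungBox(n, r));
    }
}
```

Because each Ljung–Box term carries the factor `(n+2)/(n-l) > 1`, its statistic is never smaller than the Box–Pierce one, and the gap shrinks as `n` grows.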

### 3.3 Reading results

```java
BoxTest box = BoxTest.ljung(residuals, 5);

box.type      // BoxTest.Type.Ljung_Box
box.df        // degrees of freedom = lag
box.q         // test statistic
box.pvalue    // chi-square p-value

System.out.println(box);
// Ljung-Box test
// Q* = 9.0415, df = 5, p-value = 0.1074
// ...
```

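A **large p-value** (e.g., > 0.05) means you fail to reject the white-noise
null: the test finds no evidence of autocorrelation left in the residuals.
That is a necessary sign of an adequate fit, not proof of one. A small p-value
indicates structure the model has not captured.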

### 3.4 Validation

```java
// lag must be in [1, x.length-1]
assertThrows(IllegalArgumentException.class, () -> BoxTest.ljung(x, 0));
assertThrows(IllegalArgumentException.class, () -> BoxTest.ljung(x, x.length));
```

---

## 4. `AR` — Autoregressive Model

The AR(*p*) model is:

```
x_t = b + φ₁x_{t-1} + φ₂x_{t-2} + … + φₚx_{t-p} + ε_t
```

### 4.1 Fitting methods

Two estimators are available, chosen by the factory method you call:

| Method | Factory | When to use |
|---|---|---|
| Yule–Walker | `AR.fit(x, p)` | Guaranteed stationary fit; fast; uses only ACF |
| OLS / Least Squares | `AR.ols(x, p)` | More accurate coefficients; may be non-stationary |

```java
AR ywModel  = AR.fit(logPriceDiff, 6);   // Yule-Walker
AR olsModel = AR.ols(logPriceDiff, 6);   // OLS with standard errors
AR olsFast  = AR.ols(logPriceDiff, 6, false); // OLS, skip SE computation
```

The `stderr` flag controls whether standard errors and t-tests are computed.
Passing `false` is faster when you only need point estimates.

### 4.2 Parameter access

```java
int    p    = model.p();            // AR order
double b    = model.intercept();    // intercept
double[] ar = model.ar();           // φ₁, φ₂, …, φₚ

double[] fit  = model.fittedValues();  // length = x.length - p
double[] res  = model.residuals();     // length = x.length - p
double rss    = model.RSS();           // residual sum of squares
double var    = model.variance();      // RSS / df
int df        = model.df();            // x.length - p
double r2     = model.R2();
double adjR2  = model.adjustedR2();
```

### 4.3 Significance tests (OLS only)

When `stderr = true` (the default for `AR.ols`), the model carries a `p × 4`
matrix returned by `ttest()`:

| Column | Content |
|---|---|
| 0 | Coefficient estimate |
| 1 | Standard error |
| 2 | t-statistic |
| 3 | p-value |

```java
double[][] tt = model.ttest();
for (int i = 0; i < model.p(); i++) {
    System.out.printf("φ_%d  estimate=%.4f  SE=%.4f  t=%.3f  p=%.4f%n",
        i+1, tt[i][0], tt[i][1], tt[i][2], tt[i][3]);
}
```

`ttest()` returns `null` when the model was fitted with `stderr = false` or
via the Yule–Walker method.

### 4.4 Forecasting

```java
// One-step-ahead forecast
double next = model.forecast();

// l-step-ahead forecast (iterated)
double[] horizon = model.forecast(3);
// horizon[0] = one step ahead
// horizon[1] = two steps ahead
// horizon[2] = three steps ahead
```

Multi-step forecasts are produced by iterating the AR recursion: each
predicted value is fed back as a lagged input for subsequent steps. For a
stationary fit, the forecast converges toward the long-run mean
`b / (1 − φ₁ − … − φₚ)` as the horizon grows.
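
The iteration can be sketched in a few lines of plain Java. This illustrates the recursion, not SMILE's internals; `ArForecastSketch` and the coefficient values are hypothetical.

```java
final class ArForecastSketch {
    /** Iterated forecasts for x_t = b + phi[0]*x_{t-1} + ... + phi[p-1]*x_{t-p}. */
    static double[] forecast(double b, double[] phi, double[] x, int steps) {
        int p = phi.length;
        if (steps < 1 || x.length < p) {
            throw new IllegalArgumentException("need steps >= 1 and at least p observations");
        }
        double[] buf = new double[p + steps];   // last p observations, then forecasts
        System.arraycopy(x, x.length - p, buf, 0, p);
        for (int h = 0; h < steps; h++) {
            double y = b;
            for (int j = 1; j <= p; j++) {
                y += phi[j - 1] * buf[p + h - j];   // earlier forecasts feed later ones
            }
            buf[p + h] = y;
        }
        double[] out = new double[steps];
        System.arraycopy(buf, p, out, 0, steps);
        return out;
    }

    /** Long-run mean b / (1 - sum(phi)); meaningful only for a stationary model. */
    static double longRunMean(double b, double[] phi) {
        double s = 0.0;
        for (double v : phi) s += v;
        return b / (1.0 - s);
    }

    public static void main(String[] args) {
        // Made-up AR(1): x_t = 1 + 0.5 x_{t-1}, last observation 4.
        double[] f = forecast(1.0, new double[]{0.5}, new double[]{4.0}, 3);
        for (double v : f) System.out.println(v);                     // 3.0, 2.5, 2.25
        System.out.println(longRunMean(1.0, new double[]{0.5}));      // 2.0
    }
}
```

Each step halves the distance to the long-run mean of 2 in this example, which is the convergence described above.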

```java
// Forecast horizon must be positive
assertThrows(IllegalArgumentException.class, () -> model.forecast(0));
```

### 4.5 Model summary

```java
System.out.println(model);
```

Prints residual quantiles, a coefficient table (with t-stats if available),
the residual variance, R², and adjusted R².

### 4.6 Full example — AR(6) on Bitcoin log returns

```java
var bitcoin = new BitcoinPrice();
double[] logPrice    = bitcoin.logPrice();
double[] logReturn   = TimeSeries.diff(logPrice, 1);  // make stationary

// Examine PACF to choose order
for (int k = 1; k <= 10; k++) {
    System.out.printf("PACF(%2d) = %+.4f%n", k, TimeSeries.pacf(logReturn, k));
}

// Fit AR(6) with OLS
AR model = AR.ols(logReturn, 6);
System.out.println(model);

// Forecast the next 5 observations (5 days for daily data)
double[] forecast = model.forecast(5);
System.out.println(Arrays.toString(forecast));

// Diagnostic: test residuals for autocorrelation
BoxTest lbTest = BoxTest.ljung(model.residuals(), 20);
System.out.println(lbTest);
```

---

## 5. `ARMA` — Autoregressive Moving-Average Model

The ARMA(*p*, *q*) model is:

```
x_t = b + φ₁x_{t-1} + … + φₚx_{t-p}
        + ε_t + θ₁ε_{t-1} + … + θ_qε_{t-q}
```

The MA terms allow the model to capture autocorrelation patterns that pure AR
models require a very high order to approximate.

### 5.1 Fitting — Hannan–Rissanen algorithm

```java
ARMA model = ARMA.fit(x, p, q);
```

The fitting procedure is a two-stage least-squares method (Hannan–Rissanen):

1. Fit a long AR(*m*) where `m = p + q + 20` to obtain proxy residuals.
2. Regress `x_t` on `p` AR lags and `q` MA lags (from the stage-1 residuals)
   plus an intercept using SVD-based least squares.

Standard errors for all `p + q` AR and MA coefficients are provided
automatically.

**Minimum series length:** the series must satisfy
`x.length > p + q + 20 + max(p, q)`. Even ARMA(1, 1) therefore needs at least
24 observations, and ARMA(6, 3) needs at least 36.
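
The length requirement can be turned into a small pre-check before calling `ARMA.fit`. The helper below is a hypothetical convenience (not part of SMILE) that simply evaluates the inequality stated above:

```java
final class ArmaLengthCheck {
    /** Smallest x.length that satisfies x.length > p + q + 20 + max(p, q). */
    static int minLength(int p, int q) {
        if (p < 0 || q < 0) {
            throw new IllegalArgumentException("orders must be non-negative");
        }
        return p + q + 20 + Math.max(p, q) + 1;
    }

    public static void main(String[] args) {
        System.out.println(minLength(1, 1));   // 24
        System.out.println(minLength(6, 3));   // 36
        System.out.println(minLength(5, 5));   // 36
    }
}
```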

### 5.2 Parameter access

```java
int p        = model.p();           // AR order
int q        = model.q();           // MA order
double b     = model.intercept();   // intercept
double[] ar  = model.ar();          // φ₁, …, φₚ
double[] ma  = model.ma();          // θ₁, …, θ_q

double[] fit  = model.fittedValues();
double[] res  = model.residuals();
double rss    = model.RSS();
double var    = model.variance();
int df        = model.df();         // n = x.length - (p+q+20) - max(p,q)
double r2     = model.R2();
double adjR2  = model.adjustedR2();
```

### 5.3 Significance tests

`ttest()` returns a `(p+q) × 4` matrix in the same layout as `AR`:
rows 0..p-1 are the AR coefficients; rows p..p+q-1 are the MA coefficients.

```java
double[][] tt = model.ttest();
for (int i = 0; i < model.p() + model.q(); i++) {
    String label = i < model.p()
        ? "AR[" + (i+1) + "]"
        : "MA[" + (i - model.p() + 1) + "]";
    System.out.printf("%-8s  est=%.4f  SE=%.4f  t=%.3f  p=%.4f%n",
        label, tt[i][0], tt[i][1], tt[i][2], tt[i][3]);
}
```

### 5.4 Forecasting

```java
// One-step-ahead (uses last p observed values + last q residuals)
double next = model.forecast();

// l-step-ahead
double[] h = model.forecast(5);
```

Future residuals beyond the training window are assumed to be zero (the
expected value of white noise), so for a stationary fit the forecast converges
toward the long-run mean as the horizon increases.
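
A plain-Java sketch of this rule follows. It assumes the observation and residual arrays are aligned so that their last elements are the most recent values; `ArmaForecastSketch` and the coefficients are hypothetical, and the code is not SMILE's implementation.

```java
final class ArmaForecastSketch {
    /**
     * Iterated forecasts for
     * x_t = b + sum_i phi[i-1]*x_{t-i} + e_t + sum_j theta[j-1]*e_{t-j},
     * with every future residual replaced by zero.
     */
    static double[] forecast(double b, double[] phi, double[] theta,
                             double[] x, double[] e, int steps) {
        int p = phi.length, q = theta.length;
        if (steps < 1 || x.length < p || e.length < q) {
            throw new IllegalArgumentException("need steps >= 1, p observations, q residuals");
        }
        double[] xs = new double[p + steps];   // last p observations, then forecasts
        double[] es = new double[q + steps];   // last q residuals, then zeros
        System.arraycopy(x, x.length - p, xs, 0, p);
        System.arraycopy(e, e.length - q, es, 0, q);
        for (int h = 0; h < steps; h++) {
            double y = b;
            for (int i = 1; i <= p; i++) y += phi[i - 1] * xs[p + h - i];
            for (int j = 1; j <= q; j++) y += theta[j - 1] * es[q + h - j];
            xs[p + h] = y;                     // es[q + h] stays 0.0: future noise is zero
        }
        double[] out = new double[steps];
        System.arraycopy(xs, p, out, 0, steps);
        return out;
    }

    public static void main(String[] args) {
        // Made-up ARMA(1,1): b = 0, phi = 0.5, theta = 0.4, last x = 2, last residual = 1.
        double[] f = forecast(0.0, new double[]{0.5}, new double[]{0.4},
                              new double[]{2.0}, new double[]{1.0}, 3);
        for (double v : f) System.out.println(v);   // about 1.4, 0.7, 0.35
    }
}
```

Only the first forecast uses the observed residual here; from the second step on, the MA term multiplies a zero and the AR recursion alone drives the path.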

```java
// First element of multi-step forecast equals the single-step forecast
assertEquals(model.forecast(), model.forecast(5)[0], 1e-12);

// Horizon must be positive
assertThrows(IllegalArgumentException.class, () -> model.forecast(0));
```

### 5.5 Full example — ARMA(6, 3) on Bitcoin log returns

```java
var bitcoin = new BitcoinPrice();
double[] logReturn = TimeSeries.diff(bitcoin.logPrice(), 1);

ARMA model = ARMA.fit(logReturn, 6, 3);
System.out.println(model);

// One-step forecast
double nextStep = model.forecast();

// Three-step forecast
double[] h3 = model.forecast(3);

// Diagnostic
BoxTest lb = BoxTest.ljung(model.residuals(), 20);
if (lb.pvalue < 0.05) {
    System.out.println("Residuals still autocorrelated — consider higher order");
}
```

---

## 6. Workflow: Fitting an ARMA Model

```
                       Raw series
                           │
                    Stationarity check
                    (visual ACF plot,
                     unit-root tests)
                           │
                  Non-stationary ──► Apply diff() / log()
                           │
                    Stationary series
                           │
               ┌──────────┴──────────┐
         ACF/PACF analysis         Information criteria
         (cut-off patterns)        (AIC, BIC — external)
               └──────────┬──────────┘
                           │
                   Choose p, q
                           │
                 ARMA.fit(x, p, q)  or  AR.ols(x, p)
                           │
                  Inspect coefficients
                  and t-statistics
                           │
                 BoxTest.ljung(residuals, 20)
                           │
            p-value ≤ 0.05 ──► Increase p or q, then refit
                           │
            p-value > 0.05 ──► ✓ Accept model
                           │
                    model.forecast(l)
```

### 6.1 Choosing between AR and ARMA

Prefer **AR** when:
* The PACF cuts off sharply at lag *p* and the ACF tails off.
* You want the simplest, fastest fit: Yule–Walker and OLS are each a single
  closed-form solve, whereas `ARMA.fit` runs two regression stages.
* You want Yule–Walker to guarantee a stationary fit.

Prefer **ARMA** when:
* Both ACF and PACF tail off (MA component is present).
* A parsimonious model is needed: an ARMA(1,1) can often match the fit of a
  much higher-order AR model with fewer parameters, which tends to help
  out-of-sample accuracy.

### 6.2 Choosing between Yule–Walker and OLS for AR

| Criterion | Yule–Walker | OLS |
|---|---|---|
| Stationarity of fitted model | Guaranteed | Not guaranteed |
| Coefficient accuracy | Consistent, but larger finite-sample bias (worst near a unit root) | Consistent; usually smaller finite-sample bias |
| Standard errors available | No | Yes |
| Speed | Comparable | Comparable |

In practice, **OLS is recommended** unless you explicitly need the stationarity
guarantee (e.g., when using the fitted model as a starting point for a
stationary process simulator).

---

## 7. Serialization

Both `AR` and `ARMA` implement `java.io.Serializable` (`serialVersionUID = 2L`).
You can persist and restore models with standard Java serialization or any
compatible framework.

```java
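import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Save (stream calls throw IOException; readObject also throws ClassNotFoundException)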
try (var out = new ObjectOutputStream(new FileOutputStream("ar6.ser"))) {
    out.writeObject(model);
}

// Load
AR loaded;
try (var in = new ObjectInputStream(new FileInputStream("ar6.ser"))) {
    loaded = (AR) in.readObject();
}
double nextForecast = loaded.forecast();
```

---

## 8. Quick-reference API

### `TimeSeries`

```java
static double[]   diff(double[] x, int lag)
static double[][] diff(double[] x, int lag, int differences)
static double     cov(double[] x, int lag)
static double     acf(double[] x, int lag)
static double     pacf(double[] x, int lag)
```

### `BoxTest`

```java
static BoxTest pierce(double[] x, int lag)  // Box-Pierce test
static BoxTest ljung(double[] x, int lag)   // Ljung-Box test

Type   type     // Box_Pierce or Ljung_Box
int    df       // degrees of freedom (= lag)
double q        // test statistic
double pvalue   // chi-square p-value
```

### `AR`

```java
static AR fit(double[] x, int p)                      // Yule-Walker
static AR ols(double[] x, int p)                      // OLS with SE
static AR ols(double[] x, int p, boolean stderr)      // OLS optional SE

int       p()
double    intercept()
double[]  ar()
double[]  fittedValues()
double[]  residuals()
double    RSS()
double    variance()
int       df()
double    R2()
double    adjustedR2()
double[][] ttest()      // null if stderr=false or Yule-Walker
double    forecast()
double[]  forecast(int l)
```

### `ARMA`

```java
static ARMA fit(double[] x, int p, int q)

int       p()
int       q()
double    intercept()
double[]  ar()
double[]  ma()
double[]  fittedValues()
double[]  residuals()
double    RSS()
double    variance()
int       df()
double    R2()
double    adjustedR2()
double[][] ttest()      // (p+q) × 4; rows 0..p-1 = AR, rows p..p+q-1 = MA
double    forecast()
double[]  forecast(int l)
```

---

## 9. Common Pitfalls

**Fitting on a non-stationary series**  
Applying `AR.ols` or `ARMA.fit` directly to a trending or seasonal series
produces meaningless coefficients. Always verify stationarity (e.g., with an
augmented Dickey–Fuller test) or apply `diff()` / log-transform first.

**Choosing `p` or `q` too large**  
Overfitting can produce near-unit-root AR coefficients, inflated MA terms, and
AR and MA factors that nearly cancel each other. Start small (order 1–3) and
increase only if the Ljung–Box test rejects white-noise residuals.

**Minimum series length for ARMA**  
`ARMA.fit(x, p, q)` requires `x.length > p + q + 20 + max(p, q)`. For
ARMA(5,5) the right-hand side is 35, so at least 36 observations are needed.
An `IllegalArgumentException` is thrown otherwise.

**`AR.ols` does not guarantee stationarity**  
If the fitted AR polynomial has roots near or outside the unit circle, forecasts
may diverge. Use `AR.fit` (Yule–Walker) or check the characteristic roots
externally when stationarity is essential.
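
One way to run that external check without a polynomial root finder is the step-down (inverse Durbin–Levinson) recursion: an AR model is stationary exactly when every implied partial autocorrelation has magnitude below 1. The sketch below is plain Java under that approach; `StationarityCheck` is a hypothetical helper, not a SMILE API, and it takes the coefficients in the order returned by `ar()`.

```java
final class StationarityCheck {
    /** True if x_t = phi[0]*x_{t-1} + ... + phi[p-1]*x_{t-p} + e_t is stationary. */
    static boolean isStationary(double[] phi) {
        double[] a = phi.clone();
        for (int k = a.length; k >= 1; k--) {
            double kappa = a[k - 1];            // partial autocorrelation of order k
            if (Math.abs(kappa) >= 1.0) return false;
            double[] next = new double[k - 1];
            for (int j = 1; j < k; j++) {
                // phi_{k-1,j} = (phi_{k,j} + kappa * phi_{k,k-j}) / (1 - kappa^2)
                next[j - 1] = (a[j - 1] + kappa * a[k - j - 1]) / (1.0 - kappa * kappa);
            }
            a = next;                           // coefficients of the order k-1 model
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isStationary(new double[]{0.5, 0.3}));   // true
        System.out.println(isStationary(new double[]{0.8, 0.3}));   // false
    }
}
```

For AR(2) the result agrees with the familiar conditions `φ₁ + φ₂ < 1`, `φ₂ − φ₁ < 1`, and `|φ₂| < 1`. Coefficients that pass but sit close to the boundary can still give slowly decaying, poorly behaved forecasts.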

**`forecast(int l)` and the MA memory horizon**  
For ARMA, residuals beyond the training window are treated as zero. The MA
terms therefore drop out of the recursion for horizons greater than *q*; from
there on the forecast follows the AR recursion alone and, for a stationary
fit, converges toward the long-run mean. This is expected behaviour, not a bug.

**`ttest()` covers AR coefficients only in `AR` (not the intercept)**  
The `ttest()` matrix for `AR` has `p` rows (one per AR lag). The intercept
`b` is available via `intercept()` but does not have an associated row in the
t-test table.

---

*SMILE — Copyright © 2010-2026 Haifeng Li. GNU GPL licensed.*