# PQC KV Cache Encryption

![PQC Native](https://img.shields.io/badge/PQC-Native-blue)
![ML-KEM-768](https://img.shields.io/badge/ML--KEM--768-FIPS%20203-green)
![AES-256-GCM](https://img.shields.io/badge/AES--256--GCM-NIST%20SP%20800--38D-teal)
![License](https://img.shields.io/badge/License-Apache%202.0-orange)
![Version](https://img.shields.io/badge/version-0.1.0-lightgrey)

**Per-tenant, quantum-safe encryption for the LLM KV cache.** Multi-tenant inference servers store gigabytes of KV cache in shared host/device RAM. A side-channel or a compromised co-tenant can lift another user's private conversation state directly out of that cache. This library wraps every KV cache entry in a fresh **AES-256-GCM** envelope whose key is derived per session via **ML-KEM-768**, enforces strict tenant isolation at the cryptographic boundary, rotates keys on a configurable policy, and ships with an append-only audit log for every encrypt / decrypt / rotate / isolation-violation event.

## The Problem

Long-context LLM inference keeps past token activations in the **KV cache** - a per-layer, per-position tensor store that can run to multiple GB. On a multi-tenant inference server (vLLM, TGI, or any production stack sharing a GPU across requests) that cache sits in plaintext process memory:

- **Side-channel reads.** A malicious co-tenant with timing or page-table-based primitives can read another tenant's cache pages.
- **Cross-request leakage.** A bug in cache eviction or session routing can hand one tenant's intermediate state to another.
- **Harvest-now-decrypt-later.** Even if host-level encryption is on, classical key exchange (ECDH) recorded today is broken by a future CRQC.
- **Regulated workloads.** Healthcare, finance, and legal inference pipelines have 7+ year retention requirements on conversation state; classical confidentiality alone no longer clears the audit bar.

## The Solution

- **ML-KEM-768** derives a fresh 32-byte symmetric key per `TenantSession`. In production the tenant presents a KEM public key and the inference server runs Encapsulate; here we delegate to [`quantumshield`](https://github.com/dyber-pqc/quantumshield).
- **AES-256-GCM** encrypts every `KVCacheEntry`. One nonce per entry, AAD binds `EntryMetadata` + `sequence_number` + `key_len` so tampering with layer/position/sequence surfaces as a `DecryptionError`.
- **`TenantIsolationManager`** holds a session per tenant and refuses cross-tenant decrypts even when asked explicitly; a misrouted ciphertext raises `TenantIsolationError` before AES touches the bytes.
- **`KeyRotationPolicy`** rotates the per-session key after N entries or T seconds, resetting the sequence counter.
- **`KVAuditLog`** is append-only and records `encrypt`, `decrypt`, `rotate`, and `isolation-violation` events.

## Installation

```bash
pip install pqc-kv-cache-encryption
```

Development:

```bash
pip install -e ".[dev]"
```

## Quick Start

```python
import os

from pqc_kv_cache import (
    CacheDecryptor,
    CacheEncryptor,
    EntryMetadata,
    KVCacheEntry,
    TenantIdentity,
    establish_tenant_session,
)

# 1. Establish a per-tenant session (ML-KEM-768 derived AES-256-GCM key).
tenant = TenantIdentity(tenant_id="tenant-alice", display_name="Alice Corp")
session = establish_tenant_session(tenant)

# 2. Wrap a KV cache entry in a signed envelope.
meta = EntryMetadata(
    tenant_id=tenant.tenant_id,
    session_id=session.session_id,
    layer_idx=0,
    position=12,
    token_id=2048,
)
entry = KVCacheEntry(
    metadata=meta,
    key_tensor_bytes=os.urandom(64),   # raw bytes of K vector
    value_tensor_bytes=os.urandom(64), # raw bytes of V vector
)
enc = CacheEncryptor(session).encrypt_entry(entry)

# 3. Decrypt with the same session. AES-GCM verifies AAD, tenant, replay.
decrypted = CacheDecryptor(session).decrypt_entry(enc)
assert decrypted.key_tensor_bytes == entry.key_tensor_bytes
```

Multi-tenant with strict isolation:

```python
from pqc_kv_cache import TenantIsolationManager, TenantIsolationError

mgr = TenantIsolationManager()
mgr.create_session(TenantIdentity(tenant_id="tenant-alice"))
mgr.create_session(TenantIdentity(tenant_id="tenant-bob"))

alice_enc = mgr.encrypt("tenant-alice", alice_entry)

# Bob can NEVER decrypt Alice's entry, even when using his own valid session.
try:
    mgr.decrypt("tenant-bob", alice_enc)
except TenantIsolationError:
    print("blocked at the isolation boundary")
```

## Architecture

```
+-----------------------------+              +-----------------------------+
|  Tenant Alice               |              |  Tenant Bob                 |
|  (client)                   |              |  (client)                   |
+--------------+--------------+              +--------------+--------------+
               |                                             |
               |  ML-KEM-768 handshake (per session)         |
               v                                             v
+---------------------------------------------------------------------------+
|                   Inference Server (multi-tenant)                         |
|                                                                           |
|  TenantIsolationManager                                                   |
|    +------------------------+        +------------------------+           |
|    | TenantSession (alice)  |        | TenantSession (bob)    |           |
|    |   symmetric_key (32B)  |        |   symmetric_key (32B)  |           |
|    |   next_sequence        |        |   next_sequence        |           |
|    |   entries_encrypted    |        |   entries_encrypted    |           |
|    +----------+-------------+        +----------+-------------+           |
|               |                                 |                         |
|               v                                 v                         |
|    CacheEncryptor / CacheDecryptor   CacheEncryptor / CacheDecryptor      |
|       AES-256-GCM + AAD                 AES-256-GCM + AAD                 |
|       + tenant-id enforcement           + tenant-id enforcement           |
|               |                                 |                         |
|               v                                 v                         |
|    +---------------------+        +---------------------+                 |
|    | EncryptedEntry      |        | EncryptedEntry      |                 |
|    |  (alice ciphertext) |        |  (bob ciphertext)   |                 |
|    +---------+-----------+        +---------+-----------+                 |
|              |                              |                             |
|              +-----------+------------------+                             |
|                          v                                                |
|             +---------------------------+                                 |
|             |  KV cache in GPU/host RAM |  (only ciphertext lives here)   |
|             +---------------------------+                                 |
|                                                                           |
|  KeyRotationPolicy  -- rotates session keys on entry count / age          |
|  KVAuditLog         -- encrypt / decrypt / rotate / isolation-violation   |
+---------------------------------------------------------------------------+
```

## Cryptography

| Primitive                  | Purpose                                                     | Algorithm     |
| -------------------------- | ----------------------------------------------------------- | ------------- |
| Per-session key            | Fresh 32-byte symmetric key per tenant session              | ML-KEM-768    |
| Per-entry encryption       | Confidentiality + integrity of K/V tensor bytes             | AES-256-GCM   |
| AAD binding                | `EntryMetadata` + `sequence_number` + `key_len` -> tag      | AES-GCM tag   |
| Session-key derivation     | SHA3-256 over KEM keypair bytes (production: Decapsulate)   | SHA3-256      |

Signing and KEM keys are delegated to [`quantumshield`](https://github.com/dyber-pqc/quantumshield), which prefers real `liboqs` ML-KEM / ML-DSA when available and falls back to a transitional backend otherwise.

## Threat Model

| Adversary capability                                            | Coverage                                                                      |
| --------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| Read KV cache pages for another tenant                          | All entries are AES-256-GCM encrypted; attacker sees only ciphertext.         |
| Replay a previously captured `EncryptedEntry`                   | `CacheDecryptor` tracks seen nonces and raises `NonceReplayError`.            |
| Tamper with `EntryMetadata` (layer_idx, position, tenant_id)    | AAD binding -> AES-GCM tag fails -> `DecryptionError`.                        |
| Submit another tenant's ciphertext through a valid session      | `TenantIsolationError` raised before AES touches bytes.                       |
| Long-lived session key exposure                                 | `KeyRotationPolicy` rotates on entry-count / age; sequence counter resets.    |
| Session outlives its TTL                                        | `SessionExpiredError` on every encrypt/decrypt after `expires_at`.            |
| Harvest-now-decrypt-later on the KEM handshake                  | ML-KEM-768 provides IND-CCA2 security under quantum adversaries.              |
| Orphaned tenant state after disconnect                          | `close_session()` drops the session and its key from memory.                  |

## Performance Considerations

This library is written in pure Python and is intended as the **cryptographic envelope** for multi-tenant LLM inference, not a hot-path encryption kernel. Production deployments wrap the same patterns in:

- A CUDA / ROCm kernel that operates on the K/V tensors in device memory.
- A driver-side AES-GCM engine (H100 confidential compute, AMD SEV-SNP).
- A batched nonce / sequence allocator to amortize session bookkeeping across a batch of requests.

The envelope formats (`EncryptedEntry`, AAD shape, `TenantSession` state machine) are deliberately portable so that the native kernel and the Python reference implementation produce interoperable ciphertexts.

## API Reference

### `TenantIdentity`
`tenant_id: str`, `display_name: str = ""` — frozen dataclass identifying a tenant.

### `establish_tenant_session(tenant, algorithm=KEMAlgorithm.ML_KEM_768, ttl_seconds=900) -> TenantSession`
Derive a fresh 32-byte symmetric key for `tenant` via ML-KEM-768 and return a `TenantSession`.

### `TenantSession`
Holds `symmetric_key`, `next_sequence`, `entries_encrypted`, `created_at`, `expires_at`. Methods: `is_valid()`, `check_valid()`, `consume_sequence()`, `rotate_key(new_key)`, `to_public_dict()`.

### `KVCacheEntry` / `EncryptedEntry` / `EntryMetadata`
`KVCacheEntry` holds `metadata`, `key_tensor_bytes`, `value_tensor_bytes`. `EncryptedEntry` holds `metadata`, `nonce` (hex), `ciphertext` (hex), `key_len`, `sequence_number`. `EntryMetadata` is frozen and carries `tenant_id`, `session_id`, `layer_idx`, `position`, `token_id`, `kv_role`.

### `CacheEncryptor(session)` / `CacheDecryptor(session)`
`encrypt_entry(KVCacheEntry) -> EncryptedEntry` and `decrypt_entry(EncryptedEntry) -> KVCacheEntry`. Both enforce tenant-id match. Decryptor tracks nonces for replay protection.

### `KeyRotationPolicy(max_entries=100_000, max_age_seconds=300)`
`should_rotate(session) -> (bool, RotationTrigger | None)` and `rotate(session) -> bytes` (new 32-byte key). `RotationTrigger` is `ENTRY_COUNT`, `TIME_ELAPSED`, or `MANUAL`.

### `TenantIsolationManager`
`create_session(tenant)`, `get_session(tenant_id)`, `encrypt(tenant_id, entry)`, `decrypt(tenant_id, enc)`, `close_session(tenant_id)`, `list_active_tenants()`.

### `KVAuditLog` / `KVAuditEntry`
`log_encrypt(...)`, `log_decrypt(...)`, `log_rotate(...)`, `log_isolation_violation(...)`, `entries(limit, tenant_id, operation)`, `export_json()`.

### Errors
All under `KVCacheError`: `TenantIsolationError`, `SessionExpiredError`, `DecryptionError`, `NonceReplayError`, `KeyRotationRequiredError`, `UnknownTenantError`.

## Why PQC Matters for the KV Cache

Inference logs and intermediate conversation state are retained for 7+ years in regulated industries:

- **Healthcare (HIPAA):** 6-year minimum retention on any PHI-bearing record, including the model context that reasoned over it.
- **Finance (SEC 17a-4, MiFID II):** 5-7 year retention on all communications with a client, including AI-assisted drafting.
- **Legal (privilege / e-discovery):** communications privilege only survives if the confidentiality chain is intact.

The same adversary who is recording your classical TLS session today - harvest-now-decrypt-later - is also recording the residual state of your inference servers. A PQC envelope around the KV cache is what keeps that state confidential past the arrival of a cryptographically relevant quantum computer.

## Examples

- `examples/basic_kv_encryption.py` - single tenant, encrypt/decrypt 3 entries, inspect audit log.
- `examples/multi_tenant_isolation.py` - Alice and Bob co-resident, cross-tenant decrypt is rejected.
- `examples/key_rotation.py` - `KeyRotationPolicy` with `max_entries=5`, observe rotation mid-stream.

## License

Apache License 2.0 - see `LICENSE`.