# CI Worker Threat Model

This document describes the security boundary for docstore's CI execution
environment and the mitigations in place to limit the blast radius of malicious
user code.

## Trust boundary

The `ci-worker` binary is trusted. Everything that runs after it hands execution
to BuildKit — user-defined build steps executing inside the Kata CLH
microVM — is treated as adversarial.

```
ci-worker binary (trusted)
  ├── claims job from ci-scheduler (K8s SA proof)
  ├── fetches ci.yaml via request_token
  ├── obtains presigned archive URL via request_token
  └── hands off to BuildKit ← trust boundary
        └── user build steps (untrusted)
              ├── reads request_token from /run/secrets/
              └── has host network namespace (--oci-worker-net=host)
```

## Credentials available inside the VM

| Credential | How obtained | Notes |
|---|---|---|
| `request_token` | BuildKit secret mount at `/run/secrets/docstore_oidc_request_token` | Readable by any build step |
| OIDC token URL | BuildKit secret mount at `/run/secrets/docstore_oidc_request_url` | Needed to exchange request_token for JWT |
| GCP metadata server | Plain HTTP to `169.254.169.254` | See mitigations below |
| Docker daemon | `tcp://localhost:2375`, unauthenticated; `DOCKER_HOST` is set | Gives full container control within the VM |
| Cluster-internal network | `--oci-worker-net=host` gives build containers the VM's network namespace | Can reach cluster services |

## What the request_token can do

The `request_token` is a short-lived opaque token bound to a single CI job. It
is accepted by endpoints on the docstore server and the ci-scheduler. All
docstore endpoints enforce that `job.Repo` matches the URL path repo:

| Server | Endpoint | Purpose |
|---|---|---|
| docstore | `POST /repos/{repo}/-/archive/presign` | Get presigned source archive URL |
| docstore | `POST /repos/{repo}/-/check/{name}/logs` | Upload check run log content |
| docstore | `GET /repos/{repo}/-/ci/config` | Fetch `.docstore/ci.yaml` for the job's branch/sequence |
| docstore | `POST /repos/{repo}/-/check` | Report check run status |
| ci-scheduler | `POST /jobs/{id}/heartbeat` | Keep job alive (cluster-internal only) |
| ci-scheduler | `POST /jobs/{id}/complete` | Report job completion (cluster-internal only) |

The ci-scheduler endpoints are only reachable from within the cluster
(`ci-scheduler.docstore-ci.svc.cluster.local`). Both validate the request_token
and enforce that the token's job ID matches the URL `{id}`.

The request_token can also be exchanged at the ci-oidc endpoint for a
short-lived OIDC JWT. The audience determines what the JWT can access:

- `aud=ci-registry` — authenticate to the BuildKit layer cache registry
- `aud=docstore` — authenticate to the docstore API (see below)

## OIDC JWT (aud=docstore) permissions

The OIDC JWT is validated by the docstore server. After validation, the request
is checked against an allowlist before reaching the inner API mux:

1. The URL path repo must match `jobID.Repo` — no cross-repo access.
2. The endpoint must be permitted by the job's declared permissions.

Default permissions (no `permissions:` block in ci.yaml): `checks: write` only,
which allows `POST /repos/{own-repo}/-/check`.

Elevated permissions can be declared in `.docstore/ci.yaml`:

```yaml
permissions:
  contents: write    # commit, branch, merge, rebase, purge
  proposals: write   # open proposals, post reviews/comments
  issues: write      # create/close/comment on issues
  releases: write    # create/delete releases
  ci: write          # trigger CI runs on own repo
```

**Permissions are evaluated at job dispatch time, not at request time.** For
proposal (PR) jobs, permissions are read from the *target branch* (base branch)
ci.yaml, not the source branch. A PR cannot grant itself elevated permissions —
they only take effect after the permission change is reviewed and merged. See
[ci.md](ci.md) for details.

The `on:` trigger filter (which controls whether CI runs at all for a given
proposal) is evaluated from the *source branch* ci.yaml. A PR author can
therefore suppress or expand which base branches trigger CI on their branch, but
this does not affect permissions, which remain base-branch-gated.

## GCP metadata server

The `entrypoint-worker.sh` adds an iptables rule to block outbound traffic to
`169.254.169.254` before buildkitd and dockerd start:

```sh
iptables -I OUTPUT -d 169.254.169.254 -j DROP
```

**This is not a hard security boundary.** Because build steps run with host
networking and the Docker daemon is unauthenticated, a sufficiently motivated
build step can remove this rule and reach the metadata server:

```sh
docker run --net=host --cap-add=NET_ADMIN alpine \
  sh -c "iptables -F OUTPUT && curl -H 'Metadata-Flavor: Google' \
  http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"
```

The real security boundary is the ci-runner GCP service account's IAM grants,
which are intentionally minimal (see below). The iptables rule is defense-in-depth.

## ci-runner GCP service account permissions

The `ci-runner@dlorenc-chainguard.iam.gserviceaccount.com` SA is bound to the
`ci-worker` Kubernetes service account via Workload Identity. Its grants are:

| Scope | Role | Rationale |
|---|---|---|
| Project | `roles/artifactregistry.reader` | Pull the ci-worker container image |

No other project-level roles. No bucket-level grants.

Notably absent and intentionally so:
- **No `roles/cloudsql.client`** — ci-worker talks to ci-scheduler over HTTP; it
  never connects to the database directly.
- **No GCS access** — log writes go through the docstore server's
  `request_token`-gated endpoint; ci-worker has no direct GCS dependency.

## ci-registry cache access

The BuildKit layer cache registry uses a separate SA
(`ci-registry@dlorenc-chainguard.iam.gserviceaccount.com`) with
`roles/storage.objectAdmin` on the cache bucket. Access is scoped at two levels:

1. **Org-level**: the OIDC JWT audience `ci-registry` is required.
2. **Repo-level**: `auth.go` enforces exact repo equality — a token for
   `acme/repo-a` can only push/pull `acme/repo-a:*` refs, not `acme/repo-b:*`.

## K8s service account token

The K8s SA token for the ci-worker pod is used to claim jobs from ci-scheduler
(k8sproof validation). The scheduler enforces one-claim-per-pod: once a pod has
claimed a job, its SA token cannot be used to claim another. A malicious build
step that steals the SA token and calls `/claim` will receive a rejection.

## What is NOT reachable

- Other tenants' `request_token`s or source archives — separate Kata VMs, no
  state sharing between jobs
- The OIDC JWT signing key — lives in GCP KMS, never touches the VM
- Cross-repo API operations — enforced at the OIDC JWT allowlist gate
- Other tenants' presigned archive URLs — `job.Repo == URL repo` enforced in
  the presign handler
- Cross-org ci-registry operations — enforced in `auth.go`
- Cloud SQL — ci-runner SA has no `cloudsql.client` grant
- Other tenants' build logs — ci-runner SA has no GCS grants; log access goes
  through the docstore server which enforces repo-level authorization

## Residual risks and future work

- **iptables bypass**: a privileged build step with Docker daemon access can
  remove the metadata server block. Mitigated by minimal SA permissions. Long-term
  fix: run buildkitd/dockerd as a separate less-privileged process, or use a
  network policy at the Kata VM level.
- **Cluster-internal network**: host networking gives build steps access to
  cluster services. The ci-scheduler and docstore server do not accept requests
  from arbitrary cluster workloads, but this is worth hardening with NetworkPolicy.
- **Cache poisoning within same org**: repo-level scoping in ci-registry
  prevents cross-repo cache poisoning. Cache integrity relies on BuildKit's
  content-addressable layer verification.