# Installation

This section walks through everything you need to get `cargo oxide run vecadd` working on your machine.

---

## Prerequisites

| Requirement      | Version             | Notes                                                      |
|------------------|---------------------|------------------------------------------------------------|
| **Linux**        | Ubuntu 24.04 tested | Other distros may work but are untested                    |
| **NVIDIA GPU**   | Ampere+ (sm_80+)    | Driver 545+ recommended                                    |
| **CUDA Toolkit** | 12.x+               | `nvcc` and `cuda.h` must be available                      |
| **LLVM**         | 21+                 | Must include the NVPTX backend                             |
| **Clang**        | 21+                 | `clang-21` -- needed by `bindgen` for host `cuda-bindings` |
| **Rust**         | Nightly (pinned)    | Pinned in `rust-toolchain.toml`                            |

:::{note}
cuda-oxide currently targets **Linux only**. Windows is not supported.
:::

---

## Dev Container

The repository includes a standard devcontainer setup in `.devcontainer/`. Using it is the quickest way to get a reproducible development environment with CUDA Toolkit 13.0, LLVM 21, Clang 21, and the pinned Rust nightly already installed.

The host does not need the CUDA Toolkit installed. It does need:

- an NVIDIA GPU
- an NVIDIA driver compatible with CUDA 13.0
- Docker with the NVIDIA Container Toolkit installed

With a devcontainer-aware editor, open the repository and choose "Reopen in Container" when prompted. The editor reads `.devcontainer/devcontainer.json`, builds the image, requests GPU access with `--gpus=all`, and opens the checkout inside the container.

For CLI-only usage, start the container with:

```bash
npx -y @devcontainers/cli up --workspace-folder .
```

Then run commands inside it with:

```bash
npx -y @devcontainers/cli exec --workspace-folder . cargo oxide doctor
npx -y @devcontainers/cli exec --workspace-folder . cargo oxide run vecadd
```

If the host driver is too old, GPU commands such as `nvidia-smi`, `cargo oxide doctor`, or `cargo oxide run vecadd` will fail inside the container. Update the host NVIDIA driver rather than installing a different CUDA Toolkit in the container.

If you use the devcontainer, you can skip the manual CUDA, LLVM, Clang, and Rust setup sections below.

---

## CUDA Toolkit

Install the CUDA Toolkit from the [NVIDIA CUDA Downloads](https://developer.nvidia.com/cuda-downloads) page, then make sure it is on your `PATH`:

```bash
export PATH="/usr/local/cuda/bin:$PATH"
```

Verify the install:

```bash
nvcc --version
```

:::{tip}
If you installed CUDA to a non-default location, set `CUDA_TOOLKIT_PATH` to its root directory (the one containing `include/cuda.h`). If unset, cuda-oxide defaults to `/usr/local/cuda`.
:::

---

## LLVM (21+)

cuda-oxide uses LLVM's NVPTX backend to lower LLVM IR to PTX. Install LLVM 21 or newer and make sure `llc-21` (or `llc-22`) is on your `PATH`:

```bash
# Ubuntu / Debian
sudo apt install llvm-21
```

If your distro packages do not provide `llvm-21`, use LLVM's apt helper:

```bash
sudo apt-get install -y lsb-release wget software-properties-common gnupg
wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh
sudo ./llvm.sh 21
```

Verify that the NVPTX target is present:

```bash
llc-21 --version | grep nvptx
```

You should see a line containing `nvptx64` in the registered targets. The pipeline auto-discovers `llc-22` and `llc-21` in that order; pin a specific binary with `CUDA_OXIDE_LLC=/usr/bin/llc-21` if needed.

:::{warning}
A stock LLVM build without the NVPTX backend will not work. The `llvm-21` Ubuntu package includes it by default, but if you build LLVM from source you must pass `-DLLVM_TARGETS_TO_BUILD="X86;NVPTX"` to CMake.
:::
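If you do build LLVM from source, a configure step along these lines enables the NVPTX backend. This is a minimal sketch run from the root of an `llvm-project` checkout, not the project's official build recipe; adjust the target list, enabled projects, and build type to your needs:

```bash
# Sketch: out-of-tree LLVM build with the NVPTX backend enabled.
# The part that matters for cuda-oxide is "NVPTX" in LLVM_TARGETS_TO_BUILD.
cmake -S llvm -B build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS="clang" \
  -DLLVM_TARGETS_TO_BUILD="X86;NVPTX"
cmake --build build
```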
:::{important}
**Why LLVM 21?** We emit TMA / tcgen05 / WGMMA intrinsics that `llc` from LLVM 20 and earlier can't handle. Simple kernels might still work with an older `llc` (set `CUDA_OXIDE_LLC=/path/to/llc-20`), but anything Hopper / Blackwell needs 21+.
:::

---

## Clang (host `cuda-bindings`)

The host `cuda-bindings` crate runs [`bindgen`](https://github.com/rust-lang/rust-bindgen), which loads `libclang.so` at runtime and needs clang's own resource-dir `stddef.h` -- a bare `libclang1-*` runtime is not enough. Three packages cover both halves:

```bash
sudo apt install libclang-21-dev libclang-cpp21-dev libclang-common-21-dev
```

`cargo oxide doctor` catches this up front; the symptom otherwise is `'stddef.h' file not found` during the host build.

:::{note}
**Fresh Ubuntu 24.04 / DGX-OS:** after installing LLVM 21 via `apt.llvm.org/llvm.sh` as shown above, the versioned `clang-21` / `clang++-21` binaries are present but the unversioned aliases `cargo oxide doctor` looks for are not. Add them with `update-alternatives`:

```bash
sudo update-alternatives --install /usr/bin/clang clang /usr/bin/clang-21 100
sudo update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-21 100
```

Validated on Asus GX10 / NVIDIA DGX Spark.
:::

---

## Rust toolchain

The workspace ships a `rust-toolchain.toml` that pins the exact nightly version and required components. When you first run any `cargo` command inside the repo, `rustup` will install the correct toolchain automatically.

If you need to install it manually:

```bash
rustup toolchain install nightly-2026-04-03
rustup component add rust-src rustc-dev rust-analyzer --toolchain nightly-2026-04-03
```

The two extra components are required by the codegen backend:

- `rust-src` -- source of the Rust standard library, needed for cross-compiling to the NVPTX target.
- `rustc-dev` -- compiler internals that the backend links against.

---

## cargo-oxide

`cargo-oxide` is the cargo subcommand that drives the entire build pipeline (`cargo oxide run`, `build`, `debug`, `pipeline`, etc.).

**Inside the cuda-oxide repo**, it works out of the box via a workspace alias -- no extra install step.

**For use outside the repo** (your own projects):

```bash
cargo install --git https://github.com/NVlabs/cuda-oxide.git cargo-oxide
```

On first run, `cargo-oxide` will automatically fetch and build the codegen backend. Subsequent runs reuse the cached build.

---

## Verifying your installation

Run the built-in diagnostics check:

```bash
cargo oxide doctor
```

`cargo oxide doctor` validates your Rust toolchain, CUDA toolkit (including libNVVM / nvJitLink / libdevice for kernels that use math intrinsics), LLVM installation, and codegen backend in one shot.

Then build and run an example end-to-end:

```bash
cargo oxide run vecadd
```

If everything is configured correctly, this compiles a Rust kernel to PTX, launches it on the GPU, and prints a success message.

:::{tip}
**Common issues:**

- `No working llc-21 or llc-22 found on PATH` -- install LLVM 21+ (`sudo apt install llvm-21`), add `/usr/lib/llvm-21/bin` to your `PATH`, or set `CUDA_OXIDE_LLC=/usr/bin/llc-21`.
- `'stddef.h' file not found` when building host `cuda-bindings` -- install clang dev headers: `sudo apt install clang-21` (or `libclang-common-21-dev`).
- `cuda.h not found` -- set `CUDA_TOOLKIT_PATH` to your CUDA install root, or ensure `/usr/local/cuda/include/cuda.h` exists.
- `rust-src component missing` -- run `rustup component add rust-src --toolchain nightly-2026-04-03`.
:::
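If `cargo oxide doctor` still reports missing tools after the fixes above, you can bypass auto-discovery and point it at specific installs before re-running it. The two environment variables below are the ones described earlier in this section; the paths shown are example defaults, so substitute your own locations:

```bash
# Example overrides (adjust paths to your system), then re-run the diagnostics.
export CUDA_TOOLKIT_PATH=/usr/local/cuda   # CUDA root containing include/cuda.h
export CUDA_OXIDE_LLC=/usr/bin/llc-21      # specific llc binary to use for PTX lowering
cargo oxide doctor
```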