---
name: cairo-optimization
description: Use as an explicit post-test optimization pass — fixing slow loops, expensive arithmetic, integer splitting or limb assembly, modular reduction, storage slot packing, or BoundedInt type bounds. Invoke after implementation and tests pass.
license: Apache-2.0
metadata: {"author":"feltroidprime","contributors":["starknet-agentic"],"version":"1.1.0","org":"keep-starknet-strange","upstream":"https://github.com/feltroidprime/cairo-skills/tree/main/skills/cairo-coding"}
keywords: [cairo, optimization, gas, bounded-int, storage-packing, arithmetic, starknet, smart-contracts, performance, garaga]
allowed-tools: [Bash, Read, Write, Glob, Grep, Task]
user-invocable: true
---

# Coding Cairo

Rules and patterns for writing efficient Cairo code. Sourced from audit findings and production profiling.

> **Attribution:** Originally authored by [feltroidprime](https://github.com/feltroidprime) ([cairo-skills](https://github.com/feltroidprime/cairo-skills)). Integrated with permission into starknet-agentic.

## When to Use

- Implementing arithmetic (modular, parity checks, quotient/remainder)
- Optimizing loops (slow iteration, repeated `.len()` calls, index-based access)
- Splitting or assembling integer limbs (u256 → u128, u32s → u128, felt252 → u96)
- Packing struct fields into storage slots
- Using `BoundedInt` for zero-overhead arithmetic with compile-time bounds
- Choosing integer types (u128 vs u256, BoundedInt vs native types)

**Not for:** Profiling/benchmarking (use benchmarking-cairo if available)

## Quick Reference — All Rules

| # | Rule | Instead of | Use |
|---|------|-----------|-----|
| 1 | Combined quotient+remainder | `x / m` + `x % m` | `DivRem::div_rem(x, m)` |
| 2 | Cheap loop conditions | `while i < n` | `while i != n` |
| 3 | Constant powers of 2 | `2_u32.pow(k)` | `match`-based lookup table |
| 4 | Pointer-based iteration | `*data.at(i)` in index loop | `pop_front` / `for` / `multi_pop_front` |
| 5 | Cache array length | `.len()` in loop condition | `let n = data.len();` before loop |
| 6 | Pointer-based slicing | Manual loop extraction | `span.slice(start, length)` |
| 7 | Cheap parity/halving | `index & 1`, `index / 2` | `DivRem::div_rem(index, 2)` |
| 8 | Smallest integer type | `u256` when range < 2^128 | `u128` (type encodes constraint) |
| 9 | Storage slot packing | One slot per field | `StorePacking` trait |
| 10 | BoundedInt for limbs | Bitwise ops / raw u128 math | `bounded_int::{div_rem, mul, add}` |
| 11 | Fast 2-input Poseidon | `poseidon_hash_span([x,y])` | `hades_permutation(x, y, 2)` |

## Always / Never Rules

### 1. Always use `DivRem::div_rem` — never separate `%` and `/`

Cairo computes quotient and remainder in a single operation. Using both `%` and `/` on the same value doubles the cost.

```cairo
// BAD
let q = x / m;
let r = x % m;

// GOOD
let (q, r) = DivRem::div_rem(x, m);
```

### 2. Never use `<` or `>` in while loop conditions — use `!=`

Equality checks are cheaper than comparisons in Cairo.

```cairo
// BAD
while i < n { ... i += 1; }

// GOOD
while i != n { ... i += 1; }
```

### 3. Never compute `2^k` with `pow()` — use a lookup table

`u32::pow()` is expensive. Use a `match` lookup for known ranges.

```cairo
// BAD
let p = 2_u32.pow(depth.into());

// GOOD — match-based lookup
fn pow2(n: u32) -> u32 {
    match n {
        0 => 1, 1 => 2, 2 => 4, 3 => 8, 4 => 16, 5 => 32,
        6 => 64, 7 => 128, 8 => 256, 9 => 512, 10 => 1024,
        // extend as needed
        _ => core::panic_with_felt252('pow2 out of range'),
    }
}
```

### 4. Always iterate arrays with `pop_front` / `for` / `multi_pop_front` — never index-loop

Index-based access (`array.at(i)`) is more expensive than pointer-based iteration.

```cairo
// BAD
let mut i = 0;
while i != data.len() {
    let val = *data.at(i);
    i += 1;
}

// GOOD — pop_front
while let Option::Some(val) = data.pop_front() { ... }

// GOOD — for loop (equivalent)
for val in data { ... }

// GOOD — batch iteration
while let Option::Some(chunk) = data.multi_pop_front::<4>() { ... }
```

### 5. Never call `.len()` inside a loop condition — cache it

`.len()` recomputes every iteration. Store it once.

```cairo
// BAD
while i != data.len() { ... i += 1; }

// GOOD
let n = data.len();
while i != n { ... i += 1; }
```

### 6. Always use `span.slice()` instead of manual loop extraction

`slice()` manipulates pointers directly — no element-by-element copying.

```cairo
// BAD
let mut result: Array = array![];
let mut i = 0;
while i != length {
    result.append(*data.at(start + i));
    i += 1;
}

// GOOD
let result = data.slice(start, length);
```

### 7. Always use `DivRem` for parity checks — never use bitwise ops

Bitwise AND is more expensive than `div_rem` in Cairo. Use `DivRem::div_rem(x, 2)` to get both the halved value and parity in one operation.

```cairo
// BAD
let is_odd = (index & 1) == 1;
index = index / 2;

// GOOD
let (q, r) = DivRem::div_rem(index, 2);
if r == 1 { /* odd branch */ }
index = q;
```

### 8. Always use the smallest integer type that fits the value range

`u128` instead of `u256` when the range is known. Adds clarity, prevents intermediate overflow.

```cairo
// BAD — u256 for a value known to be < 2^128
fn deposit(value: u256) { assert(value < MAX_U128, '...'); ... }

// GOOD — type encodes the constraint
fn deposit(value: u128) { ... }
```

### 9. Always use `StorePacking` to pack small fields into one storage slot

Multiple small fields (basis points, flags, bounded amounts) can share a single `felt252` slot.

```cairo
use starknet::storage_access::StorePacking;

const POW_2_128: felt252 = 0x100000000000000000000000000000000;

impl MyStorePacking of StorePacking<MyStruct, felt252> {
    fn pack(value: MyStruct) -> felt252 {
        value.amount.into() + value.fee_bps.into() * POW_2_128
    }
    fn unpack(value: felt252) -> MyStruct {
        let u256 { low, high } = value.into();
        MyStruct { amount: low, fee_bps: high.try_into().unwrap() }
    }
}
```

### 10. Always use BoundedInt for byte cutting, limb assembly, and type conversions

Never use bitwise ops (`&`, `|`, shifts) or raw `u128`/`u256` arithmetic for splitting or combining integer limbs. Use `bounded_int::div_rem` to extract parts and `bounded_int::mul` + `bounded_int::add` to assemble them. BoundedInt tracks bounds at compile time, eliminating overflow checks.

**Assembling limbs** (e.g., 4 x u32 → u128):

```cairo
// BAD — direct u128 arithmetic (28,340 gas)
fn u32s_to_u128(d0: u32, d1: u32, d2: u32, d3: u32) -> u128 {
    d0.into() + d1.into() * POW_2_32 + d2.into() * POW_2_64 + d3.into() * POW_2_96
}

// GOOD — BoundedInt (13,840 gas, 2x faster)
fn u32s_to_u128(d0: u32, d1: u32, d2: u32, d3: u32) -> u128 {
    let d0_bi: u32_bi = upcast(d0);
    let d1_bi: u32_bi = upcast(d1);
    let d2_bi: u32_bi = upcast(d2);
    let d3_bi: u32_bi = upcast(d3);
    let r: u128_bi = add(add(add(d0_bi, mul(d1_bi, POW_32_UI)), mul(d2_bi, POW_64_UI)), mul(d3_bi, POW_96_UI));
    upcast(r)
}
```

**Splitting values** (e.g., felt252 → two u96 limbs):

```cairo
// GOOD — div_rem to split, mul+add to reassemble
fn felt252_to_two_u96(value: felt252) -> (u96, u96) {
    match u128s_from_felt252(value) {
        U128sFromFelt252Result::Narrow(low) => {
            let (hi32, lo96) = bounded_int::div_rem(low, NZ_POW96_TYPED);
            (lo96, upcast(hi32))
        },
        U128sFromFelt252Result::Wide((high, low)) => {
            let (lo_hi32, lo96) = bounded_int::div_rem(low, NZ_POW96_TYPED);
            let hi64: BoundedInt<0, { POW64 - 1 }> = downcast(high).unwrap();
            (lo96, bounded_int::add(bounded_int::mul(hi64, POW32_TYPED), lo_hi32))
        },
    }
}
```

**Extracting bits** (e.g., building a 4-bit selector):

```cairo
// GOOD — div_rem by 2 extracts LSB, quotient is right-shifted value
let (qu1, bit0) = bounded_int::div_rem(u1, TWO_NZ); // bit0 in {0,1}
let (qu2, bit1) = bounded_int::div_rem(u2, TWO_NZ);
let selector = add(bit0, mul(bit1, TWO_UI)); // selector in {0..3}
```

See [garaga/selectors.cairo](https://github.com/keep-starknet-strange/garaga/blob/main/src/src/ec/selectors.cairo) and [cairo-perfs-snippets](https://github.com/feltroidprime/cairo-perfs-snippets) for production examples.

### 11. Always use `hades_permutation` for 2-input Poseidon hashes

`poseidon_hash_span` has overhead for span construction. For exactly two inputs, use the permutation directly.

```cairo
// BAD
let hash = poseidon_hash_span(array![x, y].span());

// GOOD
let (hash, _, _) = hades_permutation(x, y, 2);
```

## Code Quality

- **DRY:** Extract repeated validation into helper functions. If two functions validate-then-write the same struct, extract a shared `_set_config()`.
- **`scarb fmt`:** Run before every commit.
- **`.tool-versions`:** Pin Scarb (2.15.1) and Starknet Foundry (0.56.0) versions with ASDF for reproducible builds.
- **Keep dependencies updated:** Newer Scarb/Foundry versions include gas optimizations and compiler improvements.

---

## BoundedInt Optimization

`BoundedInt<MIN, MAX>` encodes value constraints in the type system, eliminating runtime overflow checks. Use the CLI tool (`bounded_int_calc.py` in this skill directory) to compute bounds — do NOT calculate manually.

### Critical Architecture Decision: Avoid Downcast

**The #1 optimization pitfall:** Converting between `u16`/`u32`/`u64` and `BoundedInt` at function boundaries.

#### The Problem

If your functions take `u16` and return `u16`, you must:
1. `downcast` input to `BoundedInt` (expensive — requires range check)
2. Do bounded arithmetic (cheap)
3. `upcast` result back to `u16` (cheap but wasteful)

The `downcast` operation adds a range check that **dominates the savings** from bounded arithmetic. In profiling:
- `downcast`: 161,280 steps (18.86%)
- `bounded_int_div_rem`: 204,288 steps (23.89%)
- Total bounded approach: worse than original!

#### The Solution: BoundedInt Throughout

**Use `BoundedInt` types as function inputs AND outputs.** This eliminates downcast entirely.

```cairo
// BAD: Converts at every call (downcast overhead kills performance)
pub fn add_mod(a: u16, b: u16) -> u16 {
    let a: Zq = downcast(a).expect('overflow'); // EXPENSIVE
    let b: Zq = downcast(b).expect('overflow'); // EXPENSIVE
    let sum: ZqSum = add(a, b);
    let (_q, rem) = bounded_int_div_rem(sum, nz_q);
    upcast(rem)
}

// GOOD: BoundedInt in, BoundedInt out (no downcast)
pub fn add_mod(a: Zq, b: Zq) -> Zq {
    let sum: ZqSum = add(a, b);
    let (_q, rem) = bounded_int_div_rem(sum, nz_q);
    rem
}
```

#### Refactoring Strategy

When optimizing existing code:
1. **Identify the hot path** — profile to find which functions use modular arithmetic heavily
2. **Change signatures** — update function inputs/outputs to use `BoundedInt` types
3. **Propagate types outward** — callers must also use `BoundedInt`
4. **Downcast only at boundaries** — convert from u16/u32 only at system entry points (e.g., deserialization)

#### Type Conversion Rules

| From | To | Operation | Cost |
|------|-----|-----------|------|
| `u16` | `BoundedInt<0, 65535>` | `upcast` | Free (superset) |
| `u16` | `BoundedInt<0, 12288>` | `downcast` | **Expensive** (range check) |
| `BoundedInt<0, 12288>` | `u16` | `upcast` | Free (subset) |
| `BoundedInt<A, B>` | `BoundedInt<C, D>` where [A,B] ⊆ [C,D] | `upcast` | Free |
| `BoundedInt<A, B>` | `BoundedInt<C, D>` where [A,B] ⊄ [C,D] | `downcast` | **Expensive** |

**Key insight:** `upcast` only works when target range is a **superset** of source range. You cannot upcast `u32` to `BoundedInt<0, 150994944>` because `u32` max (4294967295) > 150994944.

### Prerequisites

```toml
# Scarb.toml
[dependencies]
corelib_imports = "0.1.2"
```

```cairo
// CORRECT imports — copy exactly
use corelib_imports::bounded_int::{
    BoundedInt, upcast, downcast, bounded_int_div_rem,
    AddHelper, MulHelper, DivRemHelper, UnitInt,
};
use corelib_imports::bounded_int::bounded_int::{SubHelper, add, sub, mul};
```

### Copy-Paste Template

Working example for modular arithmetic mod 100:

```cairo
use corelib_imports::bounded_int::{
    BoundedInt, upcast, downcast, bounded_int_div_rem,
    AddHelper, MulHelper, DivRemHelper, UnitInt,
};
use corelib_imports::bounded_int::bounded_int::{SubHelper, add, sub, mul};

type Val = BoundedInt<0, 99>;          // [0, 99]
type ValSum = BoundedInt<0, 198>;      // [0, 198]
type ValConst = UnitInt<100>;          // singleton {100}

impl AddValImpl of AddHelper<Val, Val> {
    type Result = ValSum;
}

impl DivRemValImpl of DivRemHelper<ValSum, ValConst> {
    type DivT = BoundedInt<0, 1>;
    type RemT = Val;
}

fn add_mod_100(a: Val, b: Val) -> Val {
    let sum: ValSum = add(a, b);
    let nz_100: NonZero<ValConst> = 100;
    let (_q, rem) = bounded_int_div_rem(sum, nz_100);
    rem
}
```

### CLI Tool

Use `bounded_int_calc.py` in this skill directory. **Always use CLI — never calculate manually.**

```bash
# Addition: [a_lo, a_hi] + [b_lo, b_hi]
python3 bounded_int_calc.py add 0 12288 0 12288
# -> BoundedInt<0, 24576>

# Subtraction: [a_lo, a_hi] - [b_lo, b_hi]
python3 bounded_int_calc.py sub 0 12288 0 12288
# -> BoundedInt<-12288, 12288>

# Multiplication
python3 bounded_int_calc.py mul 0 12288 0 12288
# -> BoundedInt<0, 150994944>

# Division: quotient and remainder bounds
python3 bounded_int_calc.py div 0 24576 12289 12289
# -> DivT: BoundedInt<0, 1>, RemT: BoundedInt<0, 12288>

# Custom impl name
python3 bounded_int_calc.py mul 0 12288 0 12288 --name MulZqImpl
```

### BoundedInt Bounds Quick Reference

| Operation | Formula |
|-----------|---------|
| Add | `[a_lo + b_lo, a_hi + b_hi]` |
| Sub | `[a_lo - b_hi, a_hi - b_lo]` |
| Mul (unsigned) | `[a_lo * b_lo, a_hi * b_hi]` |
| Div quotient | `[a_lo / b_hi, a_hi / b_lo]` |
| Div remainder | `[0, b_hi - 1]` |

### Negative Dividends: SHIFT Pattern

`bounded_int_div_rem` doesn't support negative lower bounds. When a subtraction produces a negative-bounded result that needs reduction, add a multiple of the modulus first:

```cairo
// sub_mod: (a - b) mod Q via SHIFT
pub fn sub_mod(a: Zq, b: Zq) -> Zq {
    let a_plus_q: BoundedInt<12289, 24577> = add(a, Q_CONST); // shift by +Q
    let diff: BoundedInt<1, 24577> = sub(a_plus_q, b);        // now non-negative
    let (_q, rem) = bounded_int_div_rem(diff, nz_q());
    rem
}

// fused_sub_mul_mod: a - (b*c) mod Q via large SHIFT
// OFFSET = 12288 * Q = 151007232 (smallest multiple of Q >= max product)
pub fn fused_sub_mul_mod(a: Zq, b: Zq, c: Zq) -> Zq {
    let prod: ZqProd = mul(b, c);
    let a_offset: BoundedInt<151007232, 151019520> = add(a, OFFSET_CONST);
    let diff: BoundedInt<12288, 151019520> = sub(a_offset, prod);
    let (_q, rem) = bounded_int_div_rem(diff, nz_q());
    rem
}
```

Rule: SHIFT = `ceil(|min_possible_value| / modulus) * modulus`. Adding SHIFT preserves the result mod Q (since SHIFT ≡ 0 mod Q) while making all values non-negative.

### Common BoundedInt Mistakes

- **Downcast at every function call** — the biggest performance killer. Use `BoundedInt` types throughout, not just inside arithmetic functions.
- **Trying to upcast to a narrower type** — `upcast(val: u32)` to `BoundedInt<0, 150994944>` fails because u32 max > 150994944.
- **Wrong imports** — use exact imports from Prerequisites section above.
- **Wrong subtraction bounds** — it's `[a_lo - b_hi, a_hi - b_lo]`, NOT `[a_lo - b_lo, a_hi - b_hi]`.
- **Negative dividend in `bounded_int_div_rem`** — div_rem doesn't support negative lower bounds. Add a SHIFT (multiple of modulus) before reducing. See SHIFT pattern above.
- **Missing intermediate types** — always annotate: `let sum: ZqSum = add(a, b);`
- **Division quotient off-by-one** — integer division floors: `24576 / 12289 = 1`, not 2.
- **Using `UnitInt` vs `BoundedInt` for constants** — use `UnitInt<N>` for singleton constants like divisors.
- **Using `div_rem` vs `bounded_int_div_rem`** — the function is `bounded_int_div_rem`, not `div_rem`.
- **Bounds exceed u128::max** — BoundedInt bounds are hard-capped at 2^128. Larger values crash the Sierra specializer: 'Provided generic argument is unsupported.'