---
name: benchmark
description: Benchmark HBF performance using Apache Bench (ab). Measures static asset serving, versioned filesystem operations, and QuickJS runtime routes. Maintains historical data for performance tracking over time.
---

# HBF Benchmark Skill

This skill provides reproducible performance benchmarking for HBF using Apache Bench (ab). It tests three main categories of operations and maintains historical data for comparison over time.

## Benchmark Categories

### 1. Static Assets (latest_fs reads)
Tests serving static files from the embedded SQLite database via the `latest_files` view.

**Endpoints:**
- `/static/style.css` - Small CSS file
- `/static/favicon.ico` - Empty file

### 2. QuickJS Runtime Routes
Tests endpoints that require QuickJS execution and JSON serialization.

**Endpoints:**
- `/hello` - Simple JSON response
- `/user/42` - Parameterized route with path parsing
- `/echo` - Echo endpoint reflecting request details

## Benchmark Parameters

Default test parameters (configurable):
- **Requests:** 1,000 requests per endpoint
- **Concurrency:** 10 concurrent connections
- **Port:** 5309 (HBF default)

These defaults provide faster test runs while maintaining statistical validity.

## Historical Data Storage

Results are stored in `/workspaces/hbf/.benchmark/results.db` using SQLite:

```sql
CREATE TABLE IF NOT EXISTS benchmark_runs (
    run_id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    git_commit TEXT,
    git_branch TEXT,
    build_mode TEXT,
    notes TEXT
);

CREATE TABLE IF NOT EXISTS benchmark_results (
    result_id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id INTEGER NOT NULL,
    category TEXT NOT NULL,
    endpoint TEXT NOT NULL,
    requests INTEGER NOT NULL,
    concurrency INTEGER NOT NULL,
    time_taken REAL NOT NULL,
    requests_per_sec REAL NOT NULL,
    time_per_request REAL NOT NULL,
    transfer_rate REAL NOT NULL,
    failed_requests INTEGER NOT NULL,
    FOREIGN KEY (run_id) REFERENCES benchmark_runs(run_id)
);
```

## Usage

When you invoke this skill, it will:

1. **Build HBF binary** (optimized release mode by default)
2. **Start server** in background on port 5309
3. **Run benchmarks** for all endpoints across three categories
4. **Store results** in historical database
5. **Display summary** with comparison to previous runs
6. **Clean up** (stop server, save results)

## Commands

### Run Full Benchmark Suite

```bash
# Default: 1k requests, concurrency 10
./benchmark.sh

# Custom parameters
./benchmark.sh --requests 50000 --concurrency 50

# With build mode specification
./benchmark.sh --build-mode opt --requests 10000

# Add notes for this run
./benchmark.sh --notes "After QuickJS optimization"
```

### View Historical Results

```bash
# Show recent benchmark runs
./show_results.sh

# Compare specific runs
./show_results.sh --compare run1 run2

# Show trend for specific endpoint
./show_results.sh --trend /hello
```

## Benchmark Script Implementation

The main `benchmark.sh` script performs:

1. **Environment setup**
   - Check prerequisites (ab, sqlite3)
   - Initialize results database
   - Get git metadata (commit, branch)

2. **Build binary**
   - `bazel build //:hbf --compilation_mode=opt`
   - Use consistent build flags for reproducibility

3. **Start server**
   - Launch in background with `--port 5309`
   - Wait for health check

4. **Run benchmarks**
   - Category 1: Static assets
   - Category 2: QuickJS routes

5. **Parse results**
   - Extract key metrics from ab output
   - Store in SQLite database

6. **Generate report**
   - Summary table with all endpoints
   - Comparison with previous run (if available)
   - Highlight regressions/improvements

7. **Cleanup**
   - Kill background servers
   - Save database

## Output Format

The skill generates a markdown-formatted report:

```
# HBF Benchmark Results

**Run ID:** 42
**Timestamp:** 2025-01-15 10:30:00
**Commit:** abc1234
**Branch:** main
**Build Mode:** opt

## Summary

| Category | Endpoint | Req/sec | Avg Time (ms) | Failed |
|----------|----------|---------|---------------|--------|
| Static | /static/style.css | 45,230 | 0.22 | 0 |
| Runtime | /hello | 38,500 | 0.26 | 0 |
| Runtime | /user/42 | 37,800 | 0.26 | 0 |
| FS Write | PUT /__dev/api/file | 8,500 | 1.18 | 0 |

## Comparison with Previous Run

| Endpoint | Previous | Current | Change |
|----------|----------|---------|--------|
| /hello | 38,200 | 38,500 | +0.8% 📈 |
| /static/style.css | 44,800 | 45,230 | +1.0% 📈 |

## Recommendations

✅ All endpoints performing within expected ranges
⚠️  Consider investigating if any regressions > 5%
```

## Files

This skill includes:

- `SKILL.md` (this file) - Skill documentation
- `benchmark.sh` - Main benchmark runner script
- `show_results.sh` - Historical results viewer
- `lib/db.sh` - Database operations helper
- `lib/server.sh` - Server lifecycle management
- `lib/parser.sh` - Apache Bench output parser

## Prerequisites

- Apache Bench (`ab`) installed (included in apache2-utils)
- SQLite3 CLI (`sqlite3`)
- Bazel build system
- Git (for commit metadata)

## Reproducibility

For reproducible benchmarks:

1. **Consistent environment:** Run on same hardware/VM
2. **Isolated execution:** Close other applications
3. **Fixed parameters:** Use same request count and concurrency
4. **Build mode:** Use `--compilation_mode=opt` for release builds
5. **System state:** Run when system is idle (low load)

## Tips

- Run benchmarks multiple times and average results
- Use `--notes` to document changes between runs
- Compare similar build modes (opt vs opt, dbg vs dbg)
- Watch for failed requests (should always be 0)
- Monitor system resources during benchmark

## Example Workflow

```bash
# Initial baseline
./benchmark.sh --notes "Baseline before optimization"

# Make code changes...
# (edit versioned filesystem code)

# Run new benchmark
./benchmark.sh --notes "After index optimization"

# Compare results
./show_results.sh --compare $(sqlite3 .benchmark/results.db "SELECT run_id FROM benchmark_runs ORDER BY run_id DESC LIMIT 2")
```

## Performance Expectations

Typical performance on modern hardware (2020+ CPU):

- **Static assets:** 40,000-60,000 req/sec
- **QuickJS routes:** 30,000-50,000 req/sec
- **FS writes:** 5,000-10,000 req/sec (limited by SQLite WAL)
- **FS reads:** 35,000-50,000 req/sec

Lower performance may indicate:
- Debug build mode (use `--compilation_mode=opt`)
- High system load
- Disk I/O bottlenecks
- Memory pressure

## Limitations

- Tests only GET and PUT operations (no DELETE benchmarks)
- Single-threaded HBF server (one core utilized)
- Local benchmarking only (no network latency)
- SQLite WAL mode may affect write performance
- Apache Bench limitation: cannot test SSE endpoints

## Future Enhancements

Potential improvements:
- Add percentile latency measurements (p50, p95, p99)
- Test different concurrency levels automatically
- Add memory profiling integration
- Support custom pod benchmarking
- Add warmup phase before measurements
- Export results to CSV/JSON for external analysis