---
name: running-performance-tests
description: 'Execute load testing, stress testing, and performance benchmarking.

  Use when performing specialized testing.

  Trigger with phrases like "run load tests", "test performance", or "benchmark the
  system".

  '
allowed-tools: Read, Write, Edit, Grep, Glob, Bash(test:perf-*)
version: 1.0.0
author: Jeremy Longshore <jeremy@intentsolutions.io>
license: MIT
tags:
- testing
- performance
- performance-tests
compatibility: Designed for Claude Code, also compatible with Codex and OpenClaw
---
# Performance Test Suite

## Overview

Execute load testing, stress testing, and performance benchmarking to identify bottlenecks, establish baseline metrics, and verify SLA compliance. Supports k6 (recommended), Artillery, Apache JMeter, Locust (Python), and autocannon (Node.js).

## Prerequisites

- Performance testing tool installed (`k6`, `artillery`, `locust`, `jmeter`, or `autocannon`)
- Target application deployed in a production-like environment (not local dev)
- Baseline performance metrics or SLA targets (e.g., p95 < 200ms, 99.9% availability)
- Monitoring stack accessible (Grafana, CloudWatch, Datadog) for resource metrics during tests
- Test data sufficient to avoid cache-only responses

## Instructions

1. Define performance test scenarios based on production traffic patterns:
   - **Load test**: Simulate expected peak traffic (e.g., 500 concurrent users for 10 minutes).
   - **Stress test**: Ramp beyond expected capacity to find the breaking point.
   - **Spike test**: Sudden burst of traffic (0 to 1000 users in 10 seconds).
   - **Soak test**: Sustained moderate load for extended duration (1-4 hours) to detect memory leaks.
2. Create test scripts targeting critical endpoints:
   - Identify the top 5-10 most-hit API endpoints from production access logs.
   - Include both read (GET) and write (POST/PUT/DELETE) operations.
   - Simulate realistic user behavior with think time between requests.
   - Use parameterized data to avoid cache-only hits (randomize query parameters, user IDs).
3. Configure load profiles:
   - Define virtual user (VU) ramp-up stages (e.g., 10 VUs for 1 minute, then 50 VUs for 5 minutes).
   - Set test duration appropriate to the scenario (load: 10-15 min, soak: 1-4 hours).
   - Configure request timeouts matching production settings.
4. Execute the performance test:
   - Run from a machine with sufficient network bandwidth and CPU.
   - Avoid running from the same host as the application under test.
   - Monitor application metrics (CPU, memory, DB connections) during execution.
5. Analyze results against SLA thresholds:
   - p50, p90, p95, p99 response times.
   - Requests per second (throughput).
   - Error rate (target: < 0.1% for load test, higher tolerance for stress test).
   - Resource utilization (CPU < 80%, memory < 85% at peak load).
6. Identify and document bottlenecks:
   - Slow database queries (check slow query logs).
   - CPU-bound operations (profiling data).
   - Memory leaks (growing RSS over soak test).
   - Connection pool exhaustion (database or HTTP client).
7. Generate a performance report with visualizations and recommendations.

## Output

- Performance test scripts (k6 `.js`, Artillery `.yml`, or Locust `.py` files)
- Execution results with response time percentiles, throughput, and error rates
- Performance report comparing results against SLA thresholds
- Bottleneck analysis with specific recommendations
- CI integration configuration for automated performance regression detection

## Error Handling

| Error | Cause | Solution |
|-------|-------|---------|
| Connection reset by peer | Server or load balancer dropping connections under load | Check max connections settings; increase connection pool size; verify keep-alive configuration |
| Timeouts spike at certain VU count | Application thread pool or database connection pool exhausted | Profile connection usage; increase pool size; add connection queuing; optimize slow queries |
| Inconsistent results between runs | Cache warming, garbage collection pauses, or noisy neighbor effects | Run a warm-up phase before measurement; use dedicated test infrastructure; average across 3 runs |
| Load generator CPU maxed out | Single machine cannot generate sufficient load | Distribute load generation across multiple machines; use cloud-based load generation services |
| All requests return cached responses | Test data not sufficiently varied | Randomize request parameters; use unique IDs per request; disable CDN caching for test environment |

## Examples

**k6 load test script:**

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // Ramp up
    { duration: '5m', target: 50 },   // Sustained load
    { duration: '2m', target: 200 },  // Stress  # HTTP 200 OK
    { duration: '1m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<200', 'p(99)<500'],  # 500: HTTP 200 OK
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const res = http.get('https://api.test.com/products');
  check(res, {
    'status is 200': (r) => r.status === 200,  # HTTP 200 OK
    'response time OK': (r) => r.timings.duration < 300,  # 300: timeout: 5 minutes
  });
  sleep(1); // Think time
}
```

**Artillery test configuration:**

```yaml
config:
  target: "https://api.test.com"
  phases:
    - duration: 120
      arrivalRate: 10
      name: "Warm up"
    - duration: 300  # 300: timeout: 5 minutes
      arrivalRate: 50
      name: "Sustained load"
  ensure:
    p95: 200  # HTTP 200 OK
    maxErrorRate: 1
scenarios:
  - flow:
      - get:
          url: "/api/products"
      - think: 1
      - post:
          url: "/api/cart"
          json: { productId: "{{ $randomString() }}" }
```

## Resources

- k6 documentation: https://grafana.com/docs/k6/latest/
- Artillery: https://www.artillery.io/docs
- Locust (Python): https://docs.locust.io/
- Apache JMeter: https://jmeter.apache.org/
- autocannon (Node.js): https://github.com/mcollina/autocannon
- Performance testing best practices: