---
name: ab-test-setup
description: >
  This skill should be used when the user asks to "set up an A/B test", "calculate sample size",
  "design an experiment", "analyze A/B test results", "check statistical significance",
  "determine test duration", or "evaluate conversion rate experiments".
license: MIT + Commons Clause
metadata:
  version: 1.0.0
  author: borghei
  category: marketing
  domain: experimentation
  updated: 2026-04-02
  tags: [ab-testing, experimentation, statistics, sample-size, conversion-rate]
---
# A/B Test Setup Skill

## Overview

A production-ready A/B testing toolkit for calculating sample sizes, designing rigorous test plans, and analyzing results with statistical significance tests. It is designed for growth teams, product managers, and marketers who make data-driven decisions from controlled experiments.

## Quick Start

```bash
# Calculate required sample sizes for a test
python scripts/sample_size_calculator.py --baseline 0.05 --mde 0.10 --power 0.80

# Design a complete A/B test plan
python scripts/test_designer.py test_config.json

# Analyze A/B test results
python scripts/results_analyzer.py results.json
```

## Tools Overview

| Tool | Purpose | Input | Output |
|------|---------|-------|--------|
| `sample_size_calculator.py` | Sample size calculation | Baseline rate, MDE, power | Required sample size + test duration |
| `test_designer.py` | Test plan design | JSON test config | Complete test plan document |
| `results_analyzer.py` | Results analysis | JSON with test results | Statistical analysis + recommendation |

## Workflows

### Workflow 1: New A/B Test Setup

1. Define hypothesis and success metric
2. Run `sample_size_calculator.py` with the baseline conversion rate and the minimum detectable effect (MDE)
3. Create test configuration JSON (see Common Patterns)
4. Run `test_designer.py` to generate complete test plan
5. Share plan with stakeholders for alignment before launch
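
For intuition about step 2, the sketch below uses the standard normal-approximation formula for a two-sided two-proportion test. It is not necessarily what `sample_size_calculator.py` implements. The function names are hypothetical, and treating the MDE as a relative lift over the baseline is an assumption.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (two-sided two-proportion z-test)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # assumption: MDE is a relative lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # critical value for power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def estimated_duration_days(n_per_variant, daily_traffic, n_variants=2):
    """Days needed, assuming all daily traffic enters the test and is split evenly."""
    return math.ceil(n_per_variant * n_variants / daily_traffic)


n = sample_size_per_variant(baseline=0.05, relative_mde=0.10)
print(f"Visitors per variant: {n}")
print(f"Estimated days at 5000 visitors/day: {estimated_duration_days(n, 5000)}")
```

Because the required sample grows with the inverse square of the absolute difference, halving the MDE roughly quadruples the traffic needed. That trade-off is usually the main point to align on with stakeholders.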

### Workflow 2: Test Results Analysis

1. Collect test results into JSON format
2. Run `results_analyzer.py` to test for statistical significance
3. Review the confidence interval, p-value, and effect size
4. Check for segment-level effects if the overall result is inconclusive
5. Make a ship/no-ship decision based on the analysis
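
To make steps 2 and 3 concrete, here is a minimal sketch of a two-sided two-proportion z-test with a confidence interval for the difference in rates. It is an illustration under the usual large-sample assumptions, not a description of how `results_analyzer.py` works internally. The function name and the returned keys are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Compare variant B against variant A using a pooled two-proportion z-test."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error is used for the hypothesis test (H0: equal rates).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error is used for the confidence interval on the difference.
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_unpooled

    return {
        "diff": diff,
        "z": z,
        "p_value": p_value,
        "ci_low": diff - margin,
        "ci_high": diff + margin,
        "significant": p_value < alpha,
    }


# Counts taken from the Test Results JSON pattern in this document.
result = two_proportion_z_test(conv_a=563, n_a=12500, conv_b=625, n_b=12500)
print(result)
```

With those counts the test gives a z-statistic of roughly 1.84 and a p-value of about 0.065, so the result does not clear a 0.05 significance level and the confidence interval for the difference includes zero. Step 4 is meant for an inconclusive outcome like this one.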

### Workflow 3: Experimentation Program Review

1. Compile results from multiple past tests
2. Run `results_analyzer.py --batch` on all results
3. Review win rate, average effect size, and velocity
4. Identify patterns in winning vs losing tests
5. Optimize test pipeline based on learnings
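
The program-level metrics in step 3 can be sketched as below. The record shape (`relative_lift`, `p_value`) and the function name are assumptions for illustration and may not match the `--batch` output. The sample records are placeholders, not real test results.

```python
from statistics import mean


def summarize_program(tests, alpha=0.05):
    """Summarize win rate and average effect size across past tests.

    Each test is a dict with hypothetical keys 'relative_lift' and 'p_value'.
    """
    if not tests:
        raise ValueError("at least one test result is required")

    # A win is a significant positive lift; a loss is a significant negative one.
    wins = [t for t in tests if t["p_value"] < alpha and t["relative_lift"] > 0]
    losses = [t for t in tests if t["p_value"] < alpha and t["relative_lift"] < 0]

    return {
        "tests": len(tests),
        "win_rate": len(wins) / len(tests),
        "loss_rate": len(losses) / len(tests),
        "avg_relative_lift": mean(t["relative_lift"] for t in tests),
        "avg_winning_lift": mean(t["relative_lift"] for t in wins) if wins else None,
    }


# Placeholder records for illustration only.
past_tests = [
    {"relative_lift": 0.10, "p_value": 0.01},
    {"relative_lift": -0.05, "p_value": 0.02},
    {"relative_lift": 0.02, "p_value": 0.40},
    {"relative_lift": 0.08, "p_value": 0.03},
]
print(summarize_program(past_tests))
```

Velocity (tests completed per period) would need start and end dates on each record, which this sketch leaves out.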

## Reference Documentation

See `references/ab-testing-guide.md` for comprehensive methodology covering:
- Statistical foundations (z-tests, confidence intervals)
- Sample size theory and trade-offs
- Common experimentation pitfalls
- Multi-variant and sequential testing
- Bayesian vs frequentist approaches

## Common Patterns

### Pattern: Test Configuration JSON
```json
{
  "test_name": "Homepage CTA Button Color",
  "hypothesis": "Changing the CTA button from blue to green will increase click-through rate",
  "metric_primary": "cta_click_rate",
  "metric_secondary": ["signup_rate", "bounce_rate"],
  "baseline_rate": 0.045,
  "minimum_detectable_effect": 0.10,
  "significance_level": 0.05,
  "power": 0.80,
  "variants": [
    {"name": "control", "description": "Current blue CTA button"},
    {"name": "treatment", "description": "Green CTA button"}
  ],
  "daily_traffic": 5000,
  "allocation": {"control": 0.50, "treatment": 0.50}
}
```
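
Before passing a configuration like this to `test_designer.py`, it can help to sanity-check it. The sketch below is a hypothetical helper, not part of the toolkit. The checks are assumptions about what a consistent config looks like, based only on the fields shown above.

```python
import math


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_test_config(config):
    """Return a list of problems; an empty list means the config looks consistent."""
    problems = []

    # Rates and probabilities must lie strictly between 0 and 1.
    for key in ("baseline_rate", "significance_level", "power"):
        value = config.get(key)
        if not _is_number(value) or not 0 < value < 1:
            problems.append(f"{key} must be a number strictly between 0 and 1")

    # MDE and traffic must be positive numbers.
    for key in ("minimum_detectable_effect", "daily_traffic"):
        value = config.get(key)
        if not _is_number(value) or value <= 0:
            problems.append(f"{key} must be a positive number")

    # Every variant needs an allocation entry, and the shares must sum to 1.
    variant_names = {v.get("name") for v in config.get("variants", [])}
    allocation = config.get("allocation", {})
    if variant_names != set(allocation):
        problems.append("allocation keys must match the variant names")
    if not math.isclose(sum(allocation.values()), 1.0, abs_tol=1e-9):
        problems.append("allocation shares must sum to 1.0")

    return problems


example_config = {
    "baseline_rate": 0.045,
    "minimum_detectable_effect": 0.10,
    "significance_level": 0.05,
    "power": 0.80,
    "variants": [{"name": "control"}, {"name": "treatment"}],
    "daily_traffic": 5000,
    "allocation": {"control": 0.50, "treatment": 0.50},
}
print(validate_test_config(example_config))
```

Returning a list of problems instead of raising on the first one makes it easier to fix a config in a single pass.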

### Pattern: Test Results JSON
```json
{
  "test_name": "Homepage CTA Button Color",
  "variants": {
    "control": {"visitors": 12500, "conversions": 563},
    "treatment": {"visitors": 12500, "conversions": 625}
  },
  "metric": "cta_click_rate",
  "significance_level": 0.05
}
```
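
As a quick read of a results file in this shape, the sketch below parses the JSON and computes each variant's conversion rate and its relative lift over control. The function name is hypothetical, and it assumes the control variant is keyed as `control`.

```python
import json

results_json = """
{
  "test_name": "Homepage CTA Button Color",
  "variants": {
    "control": {"visitors": 12500, "conversions": 563},
    "treatment": {"visitors": 12500, "conversions": 625}
  },
  "metric": "cta_click_rate",
  "significance_level": 0.05
}
"""


def conversion_summary(results, control="control"):
    """Compute conversion rate and relative lift versus control for each variant."""
    variants = results["variants"]
    base = variants[control]
    base_rate = base["conversions"] / base["visitors"]

    summary = {}
    for name, counts in variants.items():
        rate = counts["conversions"] / counts["visitors"]
        summary[name] = {"rate": rate, "relative_lift": rate / base_rate - 1}
    return summary


summary = conversion_summary(json.loads(results_json))
for name, stats in summary.items():
    print(f"{name}: rate={stats['rate']:.4f}, lift={stats['relative_lift']:+.2%}")
```

For the sample data, control converts at about 4.50% and treatment at 5.00%, a relative lift of roughly 11%. Whether that lift is statistically significant is a separate question for the significance test.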

### Quick Reference: Common Effect Sizes

| Context | Small Effect | Medium Effect | Large Effect |
|---------|-------------|---------------|--------------|
| Conversion Rate | 2-5% relative | 5-15% relative | > 15% relative |
| Revenue per User | 1-3% | 3-8% | > 8% |
| Engagement Rate | 3-5% | 5-10% | > 10% |