---
name: test-property-based
description: |
  Property-based testing with Hypothesis for Python projects. Use when writing property tests,
  testing invariants, generating test cases, fuzz testing, roundtrip testing, or validating
  behavior across many inputs. Triggers on "property test", "hypothesis test", "generate test
  cases", "invariant testing", "edge case testing", "stateful testing", "roundtrip test".
  Works with Python (.py) test files, pytest, pytest-asyncio, and Pydantic models.
allowed-tools:
  - Bash
  - Read
  - Grep
  - Glob
  - Edit
  - Write
---

# Property-Based Testing with Hypothesis

## Quick Start

Property-based testing automatically generates hundreds of test cases to validate invariants:

```python
from hypothesis import given, strategies as st

# Instead of writing many example tests...
# def test_sort_1(): assert sorted([3,1,2]) == [1,2,3]
# def test_sort_2(): assert sorted([]) == []
# ... (20 more examples)

# Write ONE property test that covers ALL cases
@given(st.lists(st.integers()))
def test_sort_idempotent(lst):
    """Property: Sorting twice gives same result as once."""
    once_sorted = sorted(lst)
    twice_sorted = sorted(once_sorted)
    assert once_sorted == twice_sorted
```

**Hypothesis automatically generates 100+ test cases** including edge cases you'd never think of:
empty lists, single elements, duplicates, large lists, negative numbers, etc.

## Table of Contents

1. [When to Use This Skill](#when-to-use-this-skill)
2. [What This Skill Does](#what-this-skill-does)
3. [Core Concepts](#core-concepts)
   - [Strategies](#strategies)
   - [The @given Decorator](#the-given-decorator)
   - [Shrinking](#shrinking)
   - [Custom Strategies](#custom-strategies)
4. [Step-by-Step Workflow](#step-by-step-workflow)
5. [Common Property Patterns](#common-property-patterns)
6. [Integration with pytest](#integration-with-pytest)
7. [Async Property Testing](#async-property-testing)
8. [Pydantic Model Testing](#pydantic-model-testing)
9. [Configuration](#configuration)
10. [Supporting Files](#supporting-files)
11. [Expected Outcomes](#expected-outcomes)
12. [Requirements](#requirements)
13. [Red Flags to Avoid](#red-flags-to-avoid)

## When to Use This Skill

### Explicit Triggers
Use this skill when users mention:
- "property test"
- "hypothesis test"
- "generate test cases"
- "fuzz testing"
- "invariant testing"
- "roundtrip test"
- "stateful testing"
- "edge case testing"
- "test with random data"

### Implicit Triggers
Use when you observe:
- Manual writing of many similar example tests
- Testing parsing/serialization (perfect for roundtrip properties)
- Validating configuration classes (especially Pydantic models)
- Testing algorithms with mathematical properties
- Protocol message handling (IPC, API requests/responses)
- State machine behavior

### Debugging Triggers
Use when:
- Edge case bugs slip through example-based tests
- Need more comprehensive input coverage
- Test suite misses corner cases
- Validating refactored code behavior

## What This Skill Does

This skill guides you through:

1. **Installing Hypothesis** - Add to project dependencies
2. **Writing property tests** - Transform example tests into property-based tests
3. **Choosing strategies** - Select appropriate data generators
4. **Creating custom strategies** - Build domain-specific generators
5. **Async integration** - Combine with pytest-asyncio
6. **Pydantic integration** - Test Pydantic models automatically
7. **Configuration** - Set up profiles for dev/CI/thorough testing
8. **Stateful testing** - Test state machines and complex workflows

**Philosophy:** Instead of "here are 5 examples that should work", write "here's a property that should ALWAYS hold" and let Hypothesis find edge cases.

## Core Concepts

### Strategies

Strategies describe the type of data Hypothesis should generate:

```python
from hypothesis import strategies as st

# Basic types
st.integers()                           # All integers
st.integers(min_value=0, max_value=100) # Constrained range
st.floats(allow_nan=False)              # Floats without NaN
st.text()                               # Unicode strings
st.text(alphabet="abc", min_size=1)     # Limited alphabet
st.binary()                             # Bytes

# Collections
st.lists(st.integers())                 # Lists of integers
st.dictionaries(st.text(), st.integers()) # Dict[str, int]
st.sets(st.text(), min_size=1)          # Non-empty sets
st.tuples(st.text(), st.integers())     # (str, int) tuples

# Special
st.one_of(st.integers(), st.text())     # Union types
st.none()                               # None values
st.uuids()                              # UUID objects
st.datetimes()                          # datetime objects
```

**See [references/strategies-reference.md](references/strategies-reference.md) for complete strategy catalog.**

### The @given Decorator

The `@given` decorator runs your test function with generated data:

```python
from hypothesis import given, strategies as st

@given(st.integers(), st.integers())
def test_addition_commutative(a, b):
    """Addition should be commutative."""
    assert a + b == b + a

@given(st.lists(st.integers()))
def test_sort_preserves_length(lst):
    """Sorting preserves list length."""
    assert len(sorted(lst)) == len(lst)
```

**Default behavior:** Runs 100 examples (configurable via settings).

### Shrinking

When Hypothesis finds a failing test, it **automatically minimizes** the input:

```python
@given(st.lists(st.integers()))
def test_sum_positive(lst):
    assert sum(lst) >= 0  # Fails for negative numbers

# Hypothesis reports: lst=[-1]
# NOT lst=[-9999, -42, -1, -8888] (the random case it found)
```

**This is invaluable for debugging** - you get the minimal failing case, not a complex random one.

### Custom Strategies

For complex domain objects, build custom strategies with `@composite`:

```python
from hypothesis import strategies as st
from hypothesis.strategies import composite

@composite
def valid_emails(draw):
    """Generate valid email addresses."""
    username = draw(st.text(alphabet=st.characters(
        whitelist_categories=('Ll', 'Lu', 'Nd'),
        min_codepoint=ord('a')
    ), min_size=1, max_size=20))

    domain = draw(st.text(alphabet=st.characters(
        whitelist_categories=('Ll',),
        min_codepoint=ord('a')
    ), min_size=1, max_size=15))

    tld = draw(st.sampled_from(['com', 'org', 'net', 'io']))

    return f"{username}@{domain}.{tld}"

@given(valid_emails())
def test_email_parsing(email):
    """Test parsing of valid email addresses."""
    assert '@' in email
    assert '.' in email.split('@')[1]
```

**See [references/patterns-catalog.md](references/patterns-catalog.md) for more custom strategy patterns.**

## Step-by-Step Workflow

### Step 1: Install Hypothesis

```bash
# Add to project dependencies
uv add --dev hypothesis

# Verify installation
python -c "import hypothesis; print(hypothesis.__version__)"
```

### Step 2: Identify Properties to Test

Look for:
- **Invariants** - Things that should always be true
- **Roundtrips** - Serialize → Deserialize → Should equal original
- **Idempotency** - Operation twice = operation once
- **Commutativity** - Order doesn't matter
- **Consistency** - Related operations agree

**Example:** Testing a JSON serializer:
- Property: `parse(serialize(obj)) == obj` (roundtrip)
- Property: `serialize(obj)` returns valid JSON string
- Property: All serialized objects are parseable

### Step 3: Choose Strategies

Map your data types to Hypothesis strategies:

```python
# Simple types
int → st.integers()
str → st.text()
bool → st.booleans()

# Collections
List[int] → st.lists(st.integers())
Dict[str, int] → st.dictionaries(st.text(), st.integers())
Optional[str] → st.one_of(st.text(), st.none())

# Domain models (Pydantic)
MyModel → builds(MyModel)
```

### Step 4: Write Property Test

```python
from hypothesis import given, strategies as st

@given(st.dictionaries(st.text(), st.text()))
def test_json_roundtrip(data):
    """Property: All dicts should roundtrip through JSON."""
    import json
    serialized = json.dumps(data)
    parsed = json.loads(serialized)
    assert parsed == data
```

### Step 5: Run and Observe

```bash
# Run property test
pytest tests/test_properties.py -v

# Show statistics
pytest --hypothesis-show-statistics

# Reproduce specific failure
pytest --hypothesis-seed=12345
```

### Step 6: Refine if Needed

If test generates invalid inputs:
- Add constraints to strategy
- Use `assume()` to filter (sparingly)
- Create custom strategy with `@composite`

```python
from hypothesis import given, assume, strategies as st

@given(st.lists(st.integers()))
def test_with_filtering(lst):
    # AVOID: Too much filtering (slow)
    assume(len(lst) > 0)  # Better: st.lists(st.integers(), min_size=1)
    assume(all(x >= 0 for x in lst))  # Better: st.lists(st.integers(min_value=0))
    ...
```

## Common Property Patterns

### 1. Roundtrip Testing

**Pattern:** Serialize → Deserialize → Should equal original

```python
@given(builds(MyModel))
def test_model_json_roundtrip(model):
    """Property: Models roundtrip through JSON."""
    json_str = model.model_dump_json()
    restored = MyModel.model_validate_json(json_str)
    assert restored == model
```

### 2. Invariant Testing

**Pattern:** Some property should always hold

```python
@given(st.lists(st.integers()))
def test_sort_ordered(lst):
    """Property: Sorted list should be in ascending order."""
    sorted_lst = sorted(lst)
    for i in range(len(sorted_lst) - 1):
        assert sorted_lst[i] <= sorted_lst[i + 1]
```

### 3. Idempotency Testing

**Pattern:** Operation twice = operation once

```python
@given(st.text())
def test_normalize_idempotent(text):
    """Property: Normalizing twice gives same result."""
    once = normalize(text)
    twice = normalize(once)
    assert once == twice
```

### 4. Commutativity Testing

**Pattern:** Order doesn't matter

```python
@given(st.integers(), st.integers())
def test_addition_commutative(a, b):
    """Property: a + b == b + a."""
    assert a + b == b + a
```

### 5. Consistency Testing

**Pattern:** Different paths to same result should agree

```python
@given(st.lists(st.integers()))
def test_sum_consistency(lst):
    """Property: Manual sum equals built-in sum."""
    manual_sum = 0
    for x in lst:
        manual_sum += x
    assert manual_sum == sum(lst)
```

**See [references/patterns-catalog.md](references/patterns-catalog.md) for 15+ common patterns.**

## Integration with pytest

Hypothesis works seamlessly with pytest:

```python
import pytest
from hypothesis import given, strategies as st

# Combine with fixtures
@pytest.fixture
def temp_config(tmp_path):
    """Fixture providing temp configuration."""
    return Config(data_dir=tmp_path)

@given(st.text())
def test_with_fixture(temp_config, text):
    """Hypothesis + fixture: temp_config from fixture, text from Hypothesis."""
    result = process_with_config(temp_config, text)
    assert result is not None
```

**Important:** Fixtures are called **once per test function**, not once per Hypothesis example (100 runs).

### pytest Command-Line Options

```bash
# Show statistics about data generation
pytest --hypothesis-show-statistics

# Use a specific Hypothesis profile
pytest --hypothesis-profile=ci

# Set verbosity level
pytest --hypothesis-verbosity=debug

# Reproduce a specific failure
pytest --hypothesis-seed=12345
```

## Async Property Testing

### Basic Async Pattern

Hypothesis works with pytest-asyncio:

```python
import pytest
from hypothesis import given, strategies as st

@pytest.mark.asyncio
@given(st.text())
async def test_async_property(text):
    """Property test for async function."""
    result = await async_process(text)
    assert isinstance(result, str)
```

### Critical: Decorator Order

**MUST follow this order:**

```python
@pytest.mark.asyncio  # Innermost (closest to function)
@given(st.text())      # Outermost
async def test_async_property(text):
    pass
```

**If you get "Hypothesis doesn't know how to run async test functions", check decorator order.**

### Example: Testing Async IPC

```python
import pytest
from hypothesis import given, strategies as st

@pytest.mark.asyncio
@given(
    command=st.sampled_from(["execute", "status", "cancel"]),
    prompt=st.text(),
    correlation_id=st.uuids().map(str)
)
async def test_ipc_command_roundtrip(command, prompt, correlation_id):
    """Property: All IPC commands should roundtrip through serialization."""
    request = create_command_request(
        command=command,
        prompt=prompt,
        correlation_id=correlation_id
    )

    import json
    serialized = json.dumps(request)
    deserialized = json.loads(serialized)

    assert deserialized == request
    assert deserialized["command"] == command
```

## Pydantic Model Testing

Hypothesis **automatically supports Pydantic models**:

```python
from hypothesis import given
from hypothesis.strategies import builds
from pydantic import BaseModel, EmailStr, PositiveFloat

class PaymentModel(BaseModel):
    amount: PositiveFloat
    email: EmailStr
    description: str

# Hypothesis automatically respects Pydantic constraints!
@given(builds(PaymentModel))
def test_payment_validation(payment):
    """Hypothesis generates valid PaymentModel instances."""
    assert payment.amount > 0
    assert '@' in payment.email
    assert isinstance(payment.description, str)
```

### Overriding Specific Fields

```python
@given(builds(
    PaymentModel,
    amount=st.floats(min_value=100, max_value=1000),
    description=st.text(min_size=10, max_size=100)
))
def test_large_payments(payment):
    """Test with payments between $100-$1000."""
    assert 100 <= payment.amount <= 1000
    assert 10 <= len(payment.description) <= 100
```

### Testing Configuration Models

```python
from hypothesis import given, strategies as st
from hypothesis.strategies import builds
from my_project.config import AgentConfig

@given(builds(AgentConfig))
def test_agent_config_invariants(config):
    """Any valid AgentConfig should satisfy these invariants."""
    assert config.agent_id is not None
    assert config.system_prompt is not None
    assert len(config.agent_id) > 0
```

## Configuration

### Profile Setup (conftest.py)

Create profiles for different environments:

```python
# tests/conftest.py
from hypothesis import settings, HealthCheck

# Configure Hypothesis profiles
settings.register_profile(
    "ci",
    max_examples=200,
    deadline=1000  # milliseconds
)

settings.register_profile(
    "dev",
    max_examples=50,
    deadline=None
)

settings.register_profile(
    "thorough",
    max_examples=1000,
    deadline=None,
    suppress_health_check=[HealthCheck.too_slow]
)

# Activate based on environment
import os
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))
```

### Per-Test Settings

```python
from hypothesis import given, settings, strategies as st

@settings(max_examples=1000, deadline=None)
@given(st.integers())
def test_expensive_operation(n):
    """Run 1000 examples with no time limit."""
    result = very_slow_computation(n)
    assert result >= 0
```

### Configuration Options

| Option | Default | Description |
|--------|---------|-------------|
| `max_examples` | 100 | Number of test cases to generate |
| `deadline` | 200ms | Time limit per test case |
| `suppress_health_check` | [] | Disable specific warnings |
| `verbosity` | normal | Output verbosity (quiet, normal, verbose, debug) |
| `derandomize` | False | Use deterministic randomness |

## Supporting Files

### [references/strategies-reference.md](references/strategies-reference.md)
Complete catalog of built-in Hypothesis strategies with examples:
- Basic types (integers, floats, text, binary)
- Collections (lists, sets, dicts, tuples)
- Special types (UUIDs, datetimes, emails)
- Combinators (one_of, builds, recursive)
- Advanced patterns (composite, shared, data)

### [references/patterns-catalog.md](references/patterns-catalog.md)
Common property test patterns with examples:
- Roundtrip testing (serialization, encoding)
- Invariant testing (order, size, consistency)
- Idempotency testing (normalization, deduplication)
- Commutativity testing (operations, transformations)
- State machine testing (lifecycle, protocols)

### [templates/property-test-templates.md](templates/property-test-templates.md)
Copy-paste ready templates for:
- Basic property test
- Async property test
- Pydantic model property test
- Custom strategy
- State machine test
- conftest.py Hypothesis configuration

## Expected Outcomes

### Successful Property Test Creation

```
✓ Property Tests Added

Module: tests/unit/test_ipc_protocol_properties.py
Properties tested:
  - JSON roundtrip for command requests
  - Correlation ID preservation
  - Valid command types

Generated examples: 100 per property
Edge cases found: 0 (all tests passed)

Test results:
  ✓ All properties hold
  ✓ 300 examples generated (3 properties × 100 each)
  ✓ No shrinking needed (no failures)

Configuration:
  Profile: dev (50 examples/property)
  Deadline: None (development)

Time: 2.3 seconds
Confidence: High (comprehensive input coverage)
```

### Property Test Finding Bug

```
⚠️ Property Violation Found

Test: test_json_roundtrip
Property: All dicts should roundtrip through JSON
Falsifying example: data={'key': float('inf')}

Error: JSON cannot serialize infinity
Shrinking: Reduced from complex dict to minimal case

Root cause: Missing validation for special float values
Fix required: Add constraint to strategy or handle inf/nan

Next steps:
  1. Decide: Should code handle inf/nan or reject them?
  2. Update strategy: st.floats(allow_nan=False, allow_infinity=False)
  3. OR: Add validation in serializer
  4. Re-run property tests to verify fix
```

## Requirements

**Tools needed:**
- Bash (for running tests)
- Read (for examining test files)
- Grep (for finding test patterns)
- Glob (for file discovery)
- Edit/Write (for creating/modifying tests)

**Dependencies:**
- Python 3.8+
- pytest
- hypothesis (install with: `uv add --dev hypothesis`)
- pytest-asyncio (for async tests)

**Test Framework:**
- pytest with Hypothesis integration
- pytest-asyncio for async property tests

**Knowledge:**
- Basic understanding of property-based testing concepts
- Familiarity with pytest
- Understanding of type annotations (helpful for strategies)

## Red Flags to Avoid

### ❌ WRONG: Over-Constraining Strategies
```python
# BAD: Too specific, loses property testing benefits
@given(st.integers(min_value=42, max_value=42))
def test_specific_value(n):
    assert n == 42  # This is just an example test!
```

**✅ RIGHT: Test properties that hold for all inputs**
```python
@given(st.integers())
def test_absolute_value_non_negative(n):
    assert abs(n) >= 0
```

---

### ❌ WRONG: Filtering Too Much
```python
# BAD: Rejecting most generated examples
@given(st.integers())
def test_primes(n):
    assume(is_prime(n))  # Rejects 99% of inputs!
    # ... test code
```

**✅ RIGHT: Use a strategy that generates valid inputs**
```python
@composite
def primes(draw):
    return draw(st.sampled_from([2, 3, 5, 7, 11, 13, 17, 19, 23, 29]))

@given(primes())
def test_primes(n):
    # All inputs are primes
```

---

### ❌ WRONG: Testing Implementation, Not Properties
```python
# BAD: Duplicating implementation in test
@given(st.lists(st.integers()))
def test_sum_implementation(lst):
    result = sum(lst)
    # Bad: Reimplementing sum() in test
    expected = 0
    for item in lst:
        expected += item
    assert result == expected
```

**✅ RIGHT: Test properties**
```python
@given(st.lists(st.integers()))
def test_sum_commutative(lst):
    assert sum(lst) == sum(reversed(lst))

@given(st.lists(st.integers()))
def test_sum_with_zero(lst):
    assert sum(lst + [0]) == sum(lst)
```

---

### ❌ WRONG: Wrong Decorator Order for Async
```python
# BAD: Will fail with "Hypothesis doesn't know how to run async"
@given(st.text())
@pytest.mark.asyncio
async def test_async_property(text):
    pass
```

**✅ RIGHT: pytest.mark.asyncio innermost**
```python
@pytest.mark.asyncio
@given(st.text())
async def test_async_property(text):
    pass
```

---

### ❌ WRONG: Not Using Pydantic Integration
```python
# BAD: Manually constructing Pydantic models
@given(
    amount=st.floats(min_value=0.01),
    email=st.text(),  # Not valid emails!
)
def test_payment(amount, email):
    payment = PaymentModel(amount=amount, email=email)  # Will fail validation
```

**✅ RIGHT: Use builds() for Pydantic models**
```python
@given(builds(PaymentModel))
def test_payment(payment):
    # Hypothesis automatically generates valid instances
    assert payment.amount > 0
```

---

### ❌ WRONG: Mixing Hypothesis with pytest.mark.parametrize
```python
# BAD: Redundant - Hypothesis already does this
@pytest.mark.parametrize("n", [1, 2, 3, 4, 5])
@given(st.integers())
def test_redundant(n, generated_int):
    # Why both? Pick one approach!
    pass
```

**✅ RIGHT: Use Hypothesis for data generation**
```python
@given(st.integers(min_value=1, max_value=5))
def test_small_integers(n):
    assert 1 <= n <= 5
```

## Notes

**Start Small:**
1. Pick one simple function to test
2. Write one property test
3. Run it, observe results
4. Expand to more properties

**Think in Properties, Not Examples:**
- Instead of: "sort([3,1,2]) == [1,2,3]"
- Think: "sorted list should be ordered" (invariant)
- Or: "sorting twice == sorting once" (idempotency)
- Or: "sort preserves all elements" (conservation)

**Hypothesis Finds Edge Cases You Miss:**
- Empty collections
- Single elements
- Duplicates
- Very large/small numbers
- Unicode edge cases
- Boundary conditions

**When Property Tests Fail:**
1. Read the minimal failing example (shrinking gives you this)
2. Understand why the property doesn't hold
3. Decide: Is code wrong or property too strict?
4. Fix and re-run

**Further Reading:**
- Hypothesis documentation: https://hypothesis.readthedocs.io/
- Strategies reference: [references/strategies-reference.md](references/strategies-reference.md)
- Pattern catalog: [references/patterns-catalog.md](references/patterns-catalog.md)