---
name: testing-quality
description: pytest, data validation, Great Expectations, and quality assurance for data systems
sasmp_version: "1.3.0"
bonded_agent: 02-backend-developer
bond_type: SECONDARY_BOND
skill_version: "2.0.0"
last_updated: "2025-01"
complexity: intermediate
estimated_mastery_hours: 80
prerequisites: [python-programming]
unlocks: [cicd-pipelines, data-engineering]
---

# Testing & Data Quality

Production testing strategies with pytest, data validation, and quality frameworks.

## Quick Start

```python
import pytest
from unittest.mock import Mock, patch
import pandas as pd

# Fixtures for test data
@pytest.fixture
def sample_dataframe():
    return pd.DataFrame({
        "id": [1, 2, 3],
        "name": ["Alice", "Bob", "Charlie"],
        "amount": [100.0, 200.0, 300.0]
    })

@pytest.fixture
def mock_database():
    with patch("app.db.connection") as mock:
        mock.query.return_value = [{"id": 1, "value": 100}]
        yield mock

# Unit test with AAA pattern
class TestDataTransformer:

    def test_calculates_total_correctly(self, sample_dataframe):
        # Arrange
        transformer = DataTransformer()

        # Act
        result = transformer.calculate_total(sample_dataframe)

        # Assert
        assert result == 600.0

    def test_handles_empty_dataframe(self):
        # Arrange
        empty_df = pd.DataFrame()
        transformer = DataTransformer()

        # Act & Assert
        with pytest.raises(ValueError, match="Empty dataframe"):
            transformer.calculate_total(empty_df)

    @pytest.mark.parametrize("input_val,expected", [
        (100, 110),
        (0, 0),
        (-50, -55),
    ])
    def test_apply_tax(self, input_val, expected):
        result = apply_tax(input_val, rate=0.10)
        assert result == expected
```

## Core Concepts

### 1. Data Validation with Pydantic

```python
from pydantic import BaseModel, Field, field_validator
from datetime import datetime
from typing import Optional

class DataRecord(BaseModel):
    id: str = Field(..., min_length=1)
    amount: float = Field(..., ge=0)
    timestamp: datetime
    category: Optional[str] = None

    @field_validator("id")
    @classmethod
    def validate_id_format(cls, v):
        if not v.startswith("REC-"):
            raise ValueError("ID must start with 'REC-'")
        return v

    @field_validator("amount")
    @classmethod
    def round_amount(cls, v):
        return round(v, 2)

# Validation
def process_records(raw_data: list[dict]) -> list[DataRecord]:
    valid_records = []
    for item in raw_data:
        try:
            record = DataRecord(**item)
            valid_records.append(record)
        except ValidationError as e:
            logger.warning(f"Invalid record: {e}")
    return valid_records
```

### 2. Great Expectations

```python
import great_expectations as gx
from great_expectations.checkpoint import Checkpoint

# Initialize context
context = gx.get_context()

# Create expectations
validator = context.sources.pandas_default.read_csv("data/orders.csv")

# Column expectations
validator.expect_column_to_exist("order_id")
validator.expect_column_values_to_not_be_null("order_id")
validator.expect_column_values_to_be_unique("order_id")

# Value expectations
validator.expect_column_values_to_be_between("amount", min_value=0, max_value=10000)
validator.expect_column_values_to_be_in_set("status", ["pending", "completed", "cancelled"])

# Pattern matching
validator.expect_column_values_to_match_regex("email", r"^[\w\.-]+@[\w\.-]+\.\w+$")

# Run validation
results = validator.validate()

if not results.success:
    failed_expectations = [r for r in results.results if not r.success]
    raise DataQualityError(f"Validation failed: {failed_expectations}")
```

### 3. Integration Testing

```python
import pytest
from testcontainers.postgres import PostgresContainer
from sqlalchemy import create_engine

@pytest.fixture(scope="module")
def postgres_container():
    """Spin up real Postgres for integration tests."""
    with PostgresContainer("postgres:16-alpine") as postgres:
        yield postgres

@pytest.fixture
def db_engine(postgres_container):
    """Create engine with test database."""
    engine = create_engine(postgres_container.get_connection_url())

    # Setup schema
    with engine.connect() as conn:
        conn.execute(text("CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT)"))
        conn.commit()

    yield engine

    # Cleanup
    engine.dispose()

class TestDatabaseOperations:

    def test_insert_and_query(self, db_engine):
        # Arrange
        repo = UserRepository(db_engine)

        # Act
        repo.insert(User(name="Test User"))
        users = repo.get_all()

        # Assert
        assert len(users) == 1
        assert users[0].name == "Test User"

    def test_transaction_rollback(self, db_engine):
        repo = UserRepository(db_engine)

        with pytest.raises(IntegrityError):
            repo.insert(User(name=None))  # Violates constraint

        # Verify rollback
        assert repo.count() == 0
```

### 4. Mocking External Services

```python
from unittest.mock import Mock, patch, MagicMock
import responses

class TestAPIClient:

    @responses.activate
    def test_fetch_data_success(self):
        # Mock HTTP response
        responses.add(
            responses.GET,
            "https://api.example.com/data",
            json={"items": [{"id": 1}]},
            status=200
        )

        client = APIClient()
        result = client.fetch_data()

        assert len(result["items"]) == 1

    @responses.activate
    def test_handles_api_error(self):
        responses.add(
            responses.GET,
            "https://api.example.com/data",
            json={"error": "Server error"},
            status=500
        )

        client = APIClient()

        with pytest.raises(APIError):
            client.fetch_data()

    @patch("app.services.external_api")
    def test_with_mock_service(self, mock_api):
        mock_api.get_user.return_value = {"id": 1, "name": "Test"}

        result = process_user_data(user_id=1)

        mock_api.get_user.assert_called_once_with(1)
        assert result["name"] == "Test"
```

## Tools & Technologies

| Tool | Purpose | Version (2025) |
|------|---------|----------------|
| **pytest** | Testing framework | 8.0+ |
| **Great Expectations** | Data validation | 0.18+ |
| **Pydantic** | Data validation | 2.5+ |
| **pytest-cov** | Code coverage | 4.1+ |
| **testcontainers** | Integration testing | 3.7+ |
| **responses** | HTTP mocking | 0.25+ |
| **hypothesis** | Property-based testing | 6.98+ |

## Troubleshooting Guide

| Issue | Symptoms | Root Cause | Fix |
|-------|----------|------------|-----|
| **Flaky Tests** | Random failures | Shared state, timing | Isolate tests, use fixtures |
| **Slow Tests** | Long test runs | No mocking, real I/O | Mock external services |
| **Low Coverage** | Uncovered code | Missing edge cases | Add parametrized tests |
| **Test Data Issues** | Inconsistent results | Hardcoded data | Use factories/fixtures |

## Best Practices

```python
# ✅ DO: Use fixtures for setup
@pytest.fixture
def client():
    return TestClient(app)

# ✅ DO: Test edge cases
@pytest.mark.parametrize("input_data", [None, [], {}, ""])
def test_handles_empty_input(input_data):
    assert process(input_data) == default_result

# ✅ DO: Name tests descriptively
def test_user_creation_fails_with_invalid_email():
    ...

# ✅ DO: Use marks for slow tests
@pytest.mark.slow
def test_full_pipeline():
    ...

# ❌ DON'T: Test implementation details
# ❌ DON'T: Share state between tests
# ❌ DON'T: Skip error path testing
```

## Resources

- [pytest Documentation](https://docs.pytest.org/)
- [Great Expectations](https://greatexpectations.io/)
- [Testing Python Applications](https://testdriven.io/)

---

**Skill Certification Checklist:**
- [ ] Can write unit tests with pytest
- [ ] Can use fixtures and parametrization
- [ ] Can implement data validation
- [ ] Can write integration tests
- [ ] Can mock external dependencies