---
name: adding-new-metric
description: Guides systematic implementation of new sustainability metrics in OSS Sustain Guard using the plugin-based metric system. Use when adding metric functions to evaluate project health aspects like issue responsiveness, test coverage, or security response time.
---

# Add New Metric

This skill provides a systematic workflow for adding new sustainability metrics to the OSS Sustain Guard project using the **plugin-based metric system**.

## When to Use

- User wants to add a new metric to evaluate project health
- Implementing metrics from NEW_METRICS_IDEA.md
- Extending analysis capabilities with additional measurements
- Creating custom external metrics via plugins

## Critical Principles

1. **No Duplication**: Always check existing metrics to avoid measuring the same thing
2. **10-Point Scale**: ALL metrics use max_score=10 for consistency and transparency
3. **Integer Weights**: Metric importance is controlled via profile weights (integers ≥1)
4. **Project Philosophy**: Use "observation" language, not "risk" or "critical"
5. **CHAOSS Alignment**: Reference CHAOSS metrics when applicable
6. **Plugin Architecture**: Metrics are discovered via entry points and MetricSpec

## Implementation Workflow

### 1. Verify No Duplication

```bash
# Search for similar metrics in the metrics directory
ls oss_sustain_guard/metrics/
grep -rn "def check_" oss_sustain_guard/metrics/

# Check entry points in pyproject.toml
grep -A 30 '\[project.entry-points."oss_sustain_guard.metrics"\]' pyproject.toml
```

**Check**: Does any existing metric measure the same aspect?

### 2. Create Metric Module

Create a new file in `oss_sustain_guard/metrics/`:

```bash
touch oss_sustain_guard/metrics/my_metric.py
```

**Template**:

```python
"""My metric description."""

from typing import Any

from oss_sustain_guard.metrics.base import Metric, MetricContext, MetricSpec


def check_my_metric(repo_data: dict[str, Any]) -> Metric:
    """
    Evaluates [metric purpose].

    [Description of what this measures and why it matters.]

    Scoring:
    - [Condition]: X/10 ([Label])
    - [Condition]: X/10 ([Label])

    CHAOSS Aligned: [CHAOSS metric name] (if applicable)
    """
    max_score = 10  # ALWAYS use 10 for all metrics

    # Extract data from repo_data
    data = repo_data.get("fieldName", {})

    if not data:
        return Metric(
            "My Metric Name",
            score_on_no_data,  # Choose a neutral default score when data is unavailable
            max_score,
            "Note: [Reason for default score].",
            "None",
        )

    # Calculate metric
    # ...

    # Score logic with graduated thresholds (0-10 scale)
    if condition_excellent:
        score = 10  # Excellent
        risk = "None"
        message = f"Excellent: [Details]."
    elif condition_good:
        score = 8  # Good (80%)
        risk = "Low"
        message = f"Good: [Details]."
    elif condition_moderate:
        score = 5  # Moderate (50%)
        risk = "Medium"
        message = f"Moderate: [Details]."
    elif condition_needs_attention:
        score = 2  # Needs attention (20%)
        risk = "High"
        message = f"Observe: [Details]. Consider improving."
    else:
        score = 0  # Critical issue
        risk = "Critical"
        message = f"Note: [Details]. Immediate attention recommended."

    return Metric("My Metric Name", score, max_score, message, risk)


def _check(repo_data: dict[str, Any], _context: MetricContext) -> Metric:
    """Wrapper for metric spec."""
    return check_my_metric(repo_data)


def _on_error(error: Exception) -> Metric:
    """Error handler for metric spec."""
    return Metric(
        "My Metric Name",
        0,
        10,
        f"Note: Analysis incomplete - {error}",
        "Medium",
    )


# Export MetricSpec for automatic discovery
METRIC = MetricSpec(
    name="My Metric Name",
    checker=_check,
    on_error=_on_error,
)
```

**Key Decisions**:

- `max_score`: **ALWAYS 10** for all metrics (consistency)
- Score range: **0-10** (use integers or decimals)
- Importance: Controlled by **profile weights** (integers ≥1)
- Risk levels: "None", "Low", "Medium", "High", "Critical"
- Use supportive language: "Observe", "Consider", "Monitor" not "Failed", "Error"
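Before wiring the module into the registry, it can help to exercise the checker in isolation. A minimal sketch, assuming a mock payload; the `"fieldName"` key is a placeholder and should match the real GraphQL field your metric reads:

```python
from oss_sustain_guard.metrics.my_metric import check_my_metric

# Hypothetical mock payload; its shape must mirror the data your checker expects.
mock_repo_data = {"fieldName": {"value": 42}}

metric = check_my_metric(mock_repo_data)
print(f"{metric.score}/{metric.max_score}", metric.risk)
print(metric.message)
```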
### 3. Register Entry Point

Add to `pyproject.toml` under `[project.entry-points."oss_sustain_guard.metrics"]`:

```toml
[project.entry-points."oss_sustain_guard.metrics"]
# ... existing entries ...
my_metric = "oss_sustain_guard.metrics.my_metric:METRIC"
```

### 4. Add to Built-in Registry

Update `oss_sustain_guard/metrics/__init__.py`:

```python
_BUILTIN_MODULES = [
    # ... existing modules ...
    "oss_sustain_guard.metrics.my_metric",
]
```

**Why both entry points and built-in registry?**

- Entry points: Enable external plugins
- Built-in registry: Fallback for direct imports and faster loading

### 5. Update ANALYSIS_VERSION

**CRITICAL**: Before integrating your new metric, increment `ANALYSIS_VERSION` in `cli.py`.

```python
# In cli.py, update the version
ANALYSIS_VERSION = "1.2"  # Increment from previous version
```

**Why this is required:**

- New metrics change the total score calculation
- Old cached data won't include your new metric
- Without version increment, users get inconsistent scores (cache vs. real-time)
- Version mismatch automatically invalidates old cache entries

**Always increment when:**

- Adding/removing metrics
- Changing metric weights in profiles
- Modifying scoring thresholds
- Changing max_score values

### 6. Add Metric to Scoring Profiles

Update `SCORING_PROFILES` in `core.py` to include your new metric:

```python
SCORING_PROFILES = {
    "balanced": {
        "name": "Balanced",
        "description": "...",
        "weights": {
            # Existing metrics...
            "Contributor Redundancy": 3,
            "Security Signals": 2,
            # Add your new metric
            "My Metric Name": 2,  # Assign appropriate weight (1+)
            # ...
        },
    },
    # Update all 4 profiles...
}
```

**Weight Guidelines**:

- **Critical metrics**: 3-5 (bus factor, security)
- **Important metrics**: 2-3 (activity, responsiveness)
- **Supporting metrics**: 1-2 (documentation, governance)
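The aggregation itself lives in `core.py`. As a rough mental model only, the sketch below shows how 10-point scores and integer weights *could* roll up into the 0-100 total that the validation steps later check; `weighted_total` is a hypothetical helper, not the project's actual implementation:

```python
from oss_sustain_guard.metrics.base import Metric


def weighted_total(scored: list[tuple[Metric, int]]) -> float:
    """Sketch: normalize weighted 10-point scores to a 0-100 total."""
    total_weight = sum(weight for _, weight in scored)
    if total_weight == 0:
        return 0.0
    weighted = sum((m.score / m.max_score) * weight for m, weight in scored)
    # Example: metrics scoring 10/10 and 5/10 with weights 3 and 1 -> 87.5
    return round(weighted / total_weight * 100, 1)
```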
### 7. Test Implementation

```bash
# Create test file
touch tests/metrics/test_my_metric.py

# Write tests (see section below)

# Run tests
uv run pytest tests/metrics/test_my_metric.py -v

# Syntax check
python -m py_compile oss_sustain_guard/metrics/my_metric.py

# Run analysis on test project
uv run os4g check fastapi --insecure --no-cache -o detail

# Verify metric appears in output
# Check score is reasonable

# Run all tests
uv run pytest tests/ -x --tb=short

# Lint check
uv run ruff check oss_sustain_guard/metrics/my_metric.py
uv run ruff format oss_sustain_guard/metrics/my_metric.py
```

### 8. Write Comprehensive Tests

Create `tests/metrics/test_my_metric.py`:

```python
"""Tests for my_metric module."""

from oss_sustain_guard.metrics.my_metric import check_my_metric


def test_check_my_metric_excellent():
    """Test metric with excellent conditions."""
    mock_data = {"fieldName": {"value": 100}}
    result = check_my_metric(mock_data)
    assert result.score == 10
    assert result.max_score == 10
    assert result.risk == "None"
    assert "Excellent" in result.message


def test_check_my_metric_good():
    """Test metric with good conditions."""
    mock_data = {"fieldName": {"value": 80}}
    result = check_my_metric(mock_data)
    assert result.score == 8
    assert result.max_score == 10
    assert result.risk == "Low"


def test_check_my_metric_no_data():
    """Test metric with missing data."""
    mock_data = {}
    result = check_my_metric(mock_data)
    assert result.max_score == 10
    assert "Note:" in result.message
```

### 9. Update Documentation (if needed)

Consider updating:

- `docs/local/NEW_METRICS_IDEA.md` - Mark as implemented
- Metric count in README.md
- `docs/SCORING_PROFILES_GUIDE.md` - If significant new metric

## Plugin Architecture Details

### MetricSpec Structure

```python
class MetricSpec(NamedTuple):
    """Specification for a metric check."""

    name: str  # Metric display name
    checker: Callable[[dict[str, Any], MetricContext], Metric | None]  # Main logic
    on_error: Callable[[Exception], Metric] | None = None  # Error handler
    error_log: str | None = None  # Error log format
```

### MetricContext

Context provided to metric checkers:

```python
class MetricContext(NamedTuple):
    """Context provided to metric checks."""

    owner: str  # GitHub owner
    name: str  # Repository name
    repo_url: str  # Full GitHub URL
    platform: str | None  # Platform (e.g., "pypi", "npm")
    package_name: str | None  # Original package name
```

### Metric Discovery Flow

1. **Built-in loading**: `_load_builtin_metric_specs()` imports from `_BUILTIN_MODULES`
2. **Entry point loading**: `_load_entrypoint_metric_specs()` discovers via `importlib.metadata`
3. **Deduplication**: Built-in metrics take precedence over external metrics with the same name
4. **Integration**: `load_metric_specs()` returns the combined list to `core.py`
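A condensed sketch of that flow is shown below. Only the function and variable names come from the codebase; the internals are an assumption for illustration:

```python
from importlib import import_module
from importlib.metadata import entry_points

from oss_sustain_guard.metrics.base import MetricSpec

# Abbreviated; the real list lives in oss_sustain_guard/metrics/__init__.py.
_BUILTIN_MODULES = ["oss_sustain_guard.metrics.my_metric"]


def _load_builtin_metric_specs() -> list[MetricSpec]:
    """Import each built-in module and collect its exported METRIC spec."""
    return [import_module(module).METRIC for module in _BUILTIN_MODULES]


def load_metric_specs() -> list[MetricSpec]:
    """Built-in specs first, then external specs that don't collide by name."""
    specs = {spec.name: spec for spec in _load_builtin_metric_specs()}
    for entry in entry_points(group="oss_sustain_guard.metrics"):
        spec = entry.load()  # resolves e.g. "my_custom = my_custom_metric:METRIC"
        specs.setdefault(spec.name, spec)  # built-in names take precedence
    return list(specs.values())
```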
### External Plugin Example

For external plugins (separate packages):

**`my_custom_metric/pyproject.toml`:**

```toml
[project]
name = "my-custom-metric"
version = "0.1.0"
dependencies = ["oss-sustain-guard>=0.13.0"]

[project.entry-points."oss_sustain_guard.metrics"]
my_custom = "my_custom_metric:METRIC"
```

**`my_custom_metric/__init__.py`:**

```python
from oss_sustain_guard.metrics.base import Metric, MetricContext, MetricSpec


def check_custom(repo_data, context):
    return Metric("Custom Metric", 10, 10, "Custom logic", "None")


METRIC = MetricSpec(name="Custom Metric", checker=check_custom)
```

**Installation:**

```bash
pip install my-custom-metric
```

Metrics are automatically discovered and loaded!

## Common Patterns

### Time/Duration Calculations

```python
from datetime import datetime

# created_str / completed_str are ISO-8601 timestamps from the GraphQL data
created_at = datetime.fromisoformat(created_str.replace("Z", "+00:00"))
completed_at = datetime.fromisoformat(completed_str.replace("Z", "+00:00"))
duration_days = (completed_at - created_at).total_seconds() / 86400
```

### Ratio/Percentage Metrics

```python
ratio = (count_a / total) * 100

# Use graduated scoring
if ratio < 15:
    score = max_score  # Excellent
elif ratio < 30:
    score = max_score * 0.6  # Acceptable
```

### Median Calculations

```python
values.sort()
median = (
    values[len(values) // 2]
    if len(values) % 2 == 1
    else (values[len(values) // 2 - 1] + values[len(values) // 2]) / 2
)
```

### GraphQL Data Access

```python
# Common paths in repo_data
issues = repo_data.get("issues", {}).get("edges", [])
prs = repo_data.get("pullRequests", {}).get("edges", [])
commits = repo_data.get("defaultBranchRef", {}).get("target", {}).get("history", {})
funding = repo_data.get("fundingLinks", [])
```

## Score Budget Guidelines

Every metric is scored out of 10; relative importance is expressed through profile weights, not larger max scores.

| Importance | Profile Weight | Use Case |
|-----------|----------------|----------|
| Critical | 3-5 | Core sustainability (Bus Factor, Activity) |
| High | 2-3 | Important health signals (Funding, Retention) |
| Medium | 1-2 | Supporting metrics (CI, Community Health) |
| Low | 1 | Supplementary observations |

**Total Score**: The weighted total across ~20-25 metrics is reported on a 0-100 scale.

## Validation Checklist

- [ ] **ANALYSIS_VERSION incremented in cli.py**
- [ ] No duplicate measurement with existing metrics
- [ ] max_score is 10 and a weight is assigned in every scoring profile
- [ ] Uses supportive "observation" language
- [ ] Has graduated scoring (not binary)
- [ ] Handles missing data gracefully
- [ ] Error handling in integration
- [ ] Syntax check passes
- [ ] Real-world test shows metric in output
- [ ] Unit tests pass
- [ ] Lint checks pass

## Example: Stale Issue Ratio

For a complete, production-ready implementation example, see [examples/stale-issue-ratio.md](examples/stale-issue-ratio.md).

**Quick overview:**

- **Measures**: Percentage of issues not updated in 90+ days
- **Max Score**: 5 points
- **Scoring**: <15% stale (5pts), 15-30% (3pts), 30-50% (2pts), >50% (1pt)
- **Key patterns**: Time-based calculation, graduated scoring, graceful error handling
- **Real results**: fastapi (8.2% stale, 5/5), requests (23.4%, 3/5)
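To see how the common patterns above fit together, here is a condensed, hypothetical version of such a metric on the current 10-point scale. It is not the production code from examples/stale-issue-ratio.md (which uses a 5-point scale); the `updatedAt` field path and the thresholds here are assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone
from typing import Any

from oss_sustain_guard.metrics.base import Metric


def check_stale_issue_ratio(repo_data: dict[str, Any]) -> Metric:
    """Illustrative only: share of issues untouched for 90+ days."""
    max_score = 10
    issues = repo_data.get("issues", {}).get("edges", [])
    if not issues:
        return Metric("Stale Issue Ratio", 5, max_score, "Note: No issue data available.", "None")

    cutoff = datetime.now(timezone.utc) - timedelta(days=90)
    stale = sum(
        1
        for edge in issues
        if datetime.fromisoformat(edge["node"]["updatedAt"].replace("Z", "+00:00")) < cutoff
    )
    ratio = stale / len(issues) * 100

    if ratio < 15:
        return Metric("Stale Issue Ratio", 10, max_score, f"Excellent: {ratio:.1f}% stale issues.", "None")
    if ratio < 30:
        return Metric("Stale Issue Ratio", 6, max_score, f"Good: {ratio:.1f}% stale issues.", "Low")
    if ratio < 50:
        return Metric("Stale Issue Ratio", 4, max_score, f"Moderate: {ratio:.1f}% stale issues.", "Medium")
    return Metric("Stale Issue Ratio", 2, max_score, f"Observe: {ratio:.1f}% stale issues.", "High")
```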
## Score Validation with Real Projects

After implementing a new metric, validate scoring behavior with diverse real-world projects.

### Validation Script

Create `scripts/validate_scoring.py`:

```python
#!/usr/bin/env python3
"""
Score validation script for testing new metrics against diverse projects.

Usage: uv run python scripts/validate_scoring.py
"""

import json
import subprocess
from typing import Any

VALIDATION_PROJECTS = {
    "Famous/Mature": {
        "requests": "psf/requests",
        "react": "facebook/react",
        "kubernetes": "kubernetes/kubernetes",
        "django": "django/django",
        "fastapi": "fastapi/fastapi",
    },
    "Popular/Active": {
        "angular": "angular/angular",
        "numpy": "numpy/numpy",
        "pandas": "pandas-dev/pandas",
    },
    "Emerging/Small": {
        # Add smaller projects you want to test
    },
}


def analyze_project(owner: str, repo: str) -> dict[str, Any]:
    """Run analysis on a project and return results."""
    cmd = [
        "uv", "run", "os4g", "check",
        f"{owner}/{repo}",
        "--insecure", "--no-cache",
        "-o", "json",
    ]

    result = subprocess.run(cmd, capture_output=True, text=True)

    if result.returncode != 0:
        return {"error": result.stderr}

    # Parse JSON output
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError:
        return {"error": "Failed to parse JSON output"}


def main():
    print("=" * 80)
    print("OSS Sustain Guard - Score Validation Report")
    print("=" * 80)
    print()

    for category, projects in VALIDATION_PROJECTS.items():
        print(f"\n## {category}\n")
        print(f"{'Project':<25} {'Score':<10} {'Status':<15} {'Key Observations'}")
        print("-" * 80)

        for name, repo_path in projects.items():
            result = analyze_project(*repo_path.split("/"))

            if "error" in result:
                print(f"{name:<25} {'ERROR':<10} {result['error'][:40]}")
                continue

            score = result.get("total_score", 0)
            status = (
                "✓ Healthy" if score >= 80
                else "⚠ Monitor" if score >= 60
                else "⚡ Needs attention"
            )
            observations = result.get("key_observations", "N/A")[:40]

            print(f"{name:<25} {score:<10} {status:<15} {observations}")

    print("\n" + "=" * 80)
    print("\nValidation complete. Review scores for:")
    print("  - Famous projects should score 70-95")
    print("  - New metrics should show reasonable distribution")
    print("  - No project should score >100")


if __name__ == "__main__":
    main()
```

### Quick Validation Command

```bash
# Test specific famous projects
uv run os4g check requests react fastapi kubernetes --insecure --no-cache

# Compare before/after metric changes
uv run os4g check requests --insecure --no-cache -o detail > before.txt
# ... make changes ...
uv run os4g check requests --insecure --no-cache -o detail > after.txt
diff before.txt after.txt
```

### Expected Score Ranges

| Category | Expected Score | Examples |
|----------|----------------|----------|
| Famous/Mature | 75-95 | requests, kubernetes, react |
| Popular/Active | 65-85 | angular, numpy, pandas |
| Emerging/Small | 45-70 | New projects with activity |
| Problematic | 20-50 | Abandoned or struggling projects |

### Validation Checklist

After implementing a new metric:

- [ ] Test on 3-5 famous projects (requests, react, kubernetes, etc.)
- [ ] Verify scores remain within 0-100
- [ ] Check that famous projects score reasonably high (70+)
- [ ] Ensure the new metric contributes meaningfully to the total score
- [ ] Review that the metric differentiates well between projects
- [ ] Confirm no single metric dominates the total score

## Troubleshooting

**Score calculation issues**: Verify all metrics have max_score=10 and check profile weights

**Metric not appearing**: Confirm the entry point in `pyproject.toml` and the `_BUILTIN_MODULES` entry, then check integration in `_analyze_repository_data()`

**Tests fail**: Update expected metric names in test files

**Data not available**: Add proper null checks and default handling

**Scores too similar across projects**: Adjust scoring thresholds for better differentiation

**Famous project scores low**: Review metric logic and thresholds
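When a metric does not show up in the output, listing what the loader actually discovered usually narrows things down. Assuming `load_metric_specs()` is importable from `oss_sustain_guard.metrics` (as the discovery flow above suggests), a quick check looks like:

```python
# List every metric spec the loader can currently discover.
from oss_sustain_guard.metrics import load_metric_specs

for spec in load_metric_specs():
    print(spec.name)
```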