# SC4: Live Vulnerability Lookups via OSV.dev

**Author:** Nraghavan | **Date:** 2026-03-17 | **Status:** Implemented
**Component:** `static_patterns_supply_chain.py` (SC4 rule), `osv_client.py`

---

## 1. Background

The SC4 rule in skillspector's supply-chain analyzer flags dependencies with known CVEs. Previously this relied on two manually curated lists hardcoded in `static_patterns_supply_chain.py`:

- `_KNOWN_VULNERABLE_PACKAGES`: 15 Python (PyPI) entries
- `_KNOWN_VULNERABLE_NPM`: 9 npm entries

**Problems with the static approach:**

| Issue | Impact |
|-------|--------|
| **Staleness** | 24 entries vs. tens of thousands of published advisories. New CVEs are disclosed daily and the list was immediately out of date. |
| **Manual maintenance** | Every update required a code change, review, and release. No one owned the update cadence. |
| **Incomplete coverage** | High-profile packages only. A skill depending on a vulnerable transitive dependency not in the list would pass undetected. |
| **Version logic was fragile** | The custom `_version_lt()` comparator did simple numeric-tuple comparison and mishandled pre-release tags, date-based versions (e.g. `certifi 2022.12.07`), and epoch-prefixed versions. |

SC5 (abandoned packages) and SC6 (typosquatting / popular-package lists) are **not** affected: those sets change infrequently and remain static.

---

## 2. Solution: OSV.dev API

[OSV.dev](https://osv.dev) is Google's open, free vulnerability database. It aggregates advisories from PyPI (via the [PyPA Advisory Database](https://github.com/pypa/advisory-database)), the GitHub Advisory Database, NVD, and ecosystem-specific sources.
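To make the version-comparison fragility from the Background table concrete, here is a minimal sketch of a numeric-tuple comparator. It is a hypothetical reconstruction for illustration only: the names `naive_version_tuple` and `naive_version_lt` are assumptions, not the project's actual `_version_lt()`. It shows how pre-release tags and epochs are silently lost when only the leading digits of each component are compared.

```python
import re


def naive_version_tuple(version: str) -> tuple[int, ...]:
    """Keep only the leading digits of each dot-separated component.

    Hypothetical stand-in for a simple numeric-tuple parser; anything
    after the leading digits (rc tags, epoch markers) is discarded.
    """
    parts = []
    for part in version.split("."):
        match = re.match(r"\d+", part)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def naive_version_lt(a: str, b: str) -> bool:
    """Return True if version `a` sorts before `b` under tuple comparison."""
    return naive_version_tuple(a) < naive_version_tuple(b)


# Plain release versions compare as expected.
print(naive_version_lt("2.25.0", "2.31.0"))   # True

# The "rc1" tag is dropped, so the pre-release looks equal to the final release.
print(naive_version_lt("2.0.0rc1", "2.0.0"))  # False

# The epoch prefix "1!" is read as major version 1.
print(naive_version_lt("1!1.0", "2.0"))       # True
```

Under PEP 440, `2.0.0rc1` sorts before `2.0.0` and `1!1.0` sorts after `2.0`, so the last two results disagree with the ecosystem's own ordering. Delegating range matching to the advisory database avoids this class of error.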
### Why OSV.dev over alternatives

| Criteria | OSV.dev | PyPI JSON API | GitHub Advisory DB | pip-audit (lib) |
|----------|:-------:|:-------------:|:------------------:|:---------------:|
| Covers PyPI | Yes | Yes | Yes | Yes |
| Covers npm | Yes | No | Yes | No |
| Auth required | **No** | No | Yes (token) | No |
| Rate limits | **None** | Undocumented | Yes | N/A |
| Batch queries | **Yes** | No | Limited | No |
| New dependency needed | **No** (`httpx`) | No | No | Yes |
| Authoritative data | Yes (PyPI + GHSA + NVD) | PyPI only | GHSA + NVD | Delegates to PyPI/OSV |

### API shape (batch endpoint)

```
POST https://api.osv.dev/v1/querybatch

{
  "queries": [
    {"package": {"name": "jinja2", "ecosystem": "PyPI"}, "version": "2.4.1"},
    {"package": {"name": "requests", "ecosystem": "PyPI"}, "version": "2.25.0"},
    {"package": {"name": "lodash", "ecosystem": "npm"}, "version": "4.17.20"}
  ]
}
```

Response returns, per query, a list of matching vulnerability IDs (GHSA, PYSEC, CVE aliases). A follow-up `GET /v1/vulns/{id}` call retrieves severity, summary, and fix versions for the finding message.

OSV handles all version-range matching server-side using ecosystem-aware semver/PEP 440 logic, eliminating the fragile `_version_lt()` comparator.

---

## 3. Implementation

### 3.1 Architecture

```text
_analyze_dependencies(content, file_path)
├── _extract_packages_from_requirements() / _extract_packages_from_package_json()
│   (unchanged: returns list of (name, version, line_num))
│
├── SC4: _sc4_from_osv(packages, ecosystem)
│   osv_client.query_batch() → map vulns back to packages → emit SC4 findings
│   On empty results / failure → _sc4_from_fallback() using static list
│
├── SC5: _ABANDONED_PACKAGES lookup (unchanged)
└── SC6: _is_typosquat() against popular sets (unchanged)
```

### 3.2 Key files

| File | Purpose |
|------|---------|
| `src/skillspector/nodes/analyzers/osv_client.py` | OSV.dev batch API client: `query_batch()`, `VulnResult` dataclass, in-memory cache, severity parsing |
| `src/skillspector/nodes/analyzers/static_patterns_supply_chain.py` | Refactored SC4 with `_sc4_from_osv()` and `_sc4_from_fallback()` |
| `tests/unit/test_osv_client.py` | 13 tests covering severity parsing, batch queries, cache behavior, network failures |
| `tests/unit/test_patterns_new.py` | Updated SC4-SC6 tests with OSV mocking; 7 new SC4 test cases |

### 3.3 Design decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Sync vs. async** | Synchronous `httpx.Client` | The analyzer pipeline is sync. A single batch call completes in <500 ms for typical dependency files. |
| **Caching** | In-memory dict with 1-hour TTL, keyed on `(name, version, ecosystem)` | Prevents redundant API calls when multiple skills share dependencies. Skillspector runs are short-lived CLI invocations, so memory is not a concern. |
| **Graceful degradation** | On timeout/network error, fall back to the static list (`_FALLBACK_VULNERABLE_*`) | Ensures the tool works in air-gapped or offline environments. A warning is logged when falling back. |
| **Finding detail level** | Batch query returns vuln IDs; detail fetched via `GET /v1/vulns/{id}` for up to 10 vulns per package | Keeps latency low: the 10-vulnerability cap limits the number of sequential detail fetches per package. The implementation selects the first 10 vulnerability IDs as returned by the OSV batch API (`_fetch_vuln_details(vuln_ids[:10])`) with no severity-based sorting or prioritisation. When more than 10 IDs are returned, the OSV client logs a warning indicating the total count and that only the first 10 will be processed, so users are alerted to the truncation. Finding messages include OSV/GHSA/CVE IDs with summaries. |
| **Confidence mapping** | Map OSV severity → confidence: CRITICAL=0.9, HIGH=0.8, MEDIUM=0.7, LOW=0.6 | Replaces the per-entry hardcoded confidence values with a systematic mapping. |
| **Timeout** | 10 s connect + read | Generous for a single POST. If exceeded, fallback activates gracefully. |
| **Severity aggregation** | When multiple advisories affect one package, the worst severity is used for the finding | A single SC4 finding is emitted per package with a count and summary of all advisories. |

### 3.4 What stays static

- `_ABANDONED_PACKAGES` (SC5): no API exists for "abandoned" status.
- `_POPULAR_PYPI` / `_POPULAR_NPM` (SC6): stable lists for typosquatting heuristic.
- `_FALLBACK_VULNERABLE_PYPI` / `_FALLBACK_VULNERABLE_NPM`: renamed from original lists, kept as offline safety net.

### 3.5 What was removed

- Direct iteration over hardcoded CVE tuples in the SC4 hot path, replaced by `_sc4_from_osv()`.
- The `_version_lt()` comparator is retained only for fallback mode; OSV handles version comparison server-side in the primary path.

---

## 4. Risks & Mitigations

| Risk | Likelihood | Mitigation |
|------|-----------|------------|
| OSV.dev API downtime | Low (Google-hosted, high uptime) | Fallback to static list + warning log |
| Latency increase (~200-500 ms per scan) | Certain but minor | Batch queries minimize round-trips; caching eliminates repeat calls |
| False positives from OSV (disputed/withdrawn advisories) | Low | OSV filters withdrawn entries; a suppression list can be added if needed |
| Breaking API changes | Very low (versioned API, stable since 2021) | Pinned to `/v1/` endpoints |
| Air-gapped / firewalled environments | Medium | Static fallback ensures functionality; documented that live mode needs outbound HTTPS to `api.osv.dev` |

---

## 5. Validation

- **245 tests pass** across unit and analyzer test suites (0 regressions).
- **13 new OSV client tests** cover severity parsing, batch queries, cache hits, network failures, and npm ecosystem support.
- **7 new SC4 integration tests** verify OSV-driven findings, fallback behavior, multi-advisory aggregation, and severity mapping.
- **No new dependencies**: uses the existing `httpx>=0.28.0` dependency.
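
The confidence mapping and worst-severity aggregation described in Section 3.3 can be sketched as follows. This is an illustrative sketch only: `Advisory`, `summarize_package`, and the finding dictionary layout are hypothetical names, not the actual `VulnResult` dataclass or finding format in `osv_client.py`. Only the severity-to-confidence values come from the design table, and the advisory IDs are placeholders.

```python
from dataclasses import dataclass

# Values taken from the "Confidence mapping" row in Section 3.3.
SEVERITY_CONFIDENCE = {"CRITICAL": 0.9, "HIGH": 0.8, "MEDIUM": 0.7, "LOW": 0.6}

# Ordered from least to most severe so the list index can serve as a rank.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


@dataclass(frozen=True)
class Advisory:
    """Hypothetical stand-in for one advisory returned for a package."""
    vuln_id: str
    severity: str  # assumed to be one of SEVERITY_ORDER
    summary: str


def summarize_package(name: str, version: str, advisories: list[Advisory]) -> dict:
    """Collapse all advisories for one package into a single finding.

    The worst severity among the advisories determines both the reported
    severity and the confidence value.
    """
    if not advisories:
        raise ValueError("at least one advisory is required")
    worst = max(advisories, key=lambda a: SEVERITY_ORDER.index(a.severity))
    ids = ", ".join(a.vuln_id for a in advisories)
    return {
        "package": f"{name}=={version}",
        "severity": worst.severity,
        "confidence": SEVERITY_CONFIDENCE[worst.severity],
        "message": f"{len(advisories)} known advisories ({ids})",
    }


finding = summarize_package(
    "jinja2",
    "2.4.1",
    [
        Advisory("EXAMPLE-1", "MEDIUM", "placeholder summary"),
        Advisory("EXAMPLE-2", "CRITICAL", "placeholder summary"),
    ],
)
print(finding["severity"], finding["confidence"])  # CRITICAL 0.9
```

Ranking severities by list position keeps the ordering explicit and makes it a one-line change if the project later adds a level or decides how to treat advisories with no severity.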