---
name: "uspto-database"
description: "Access USPTO patent data via PatentsView REST API and Google Patents Public Data (BigQuery). Search by inventor, assignee, CPC, or keywords; download metadata and claims; analyze portfolios; track tech trends. For IP landscape analysis, competitor monitoring, prior art search, and tech forecasting in life sciences and biotech."
license: "CC0-1.0"
---

# uspto-database

## Overview

USPTO patent data is programmatically accessible through two main routes: the **PatentsView API** (REST, free, no key required for basic use) for structured queries by inventor, assignee, CPC classification, and keywords; and **Google Patents Public Data** (a BigQuery public dataset hosted by Google) for large-scale analytics across the full patent corpus. Both are openly licensed; check each provider's current terms, including any attribution requirement, before redistributing data. This skill covers Python-based access patterns for both, plus basic patent portfolio analytics.

## When to Use

- **Prior art search**: Finding existing patents relevant to a technology before filing or to assess freedom-to-operate.
- **Competitor IP landscape analysis**: Querying all patents from a specific assignee (company or institution) to map their technology portfolio.
- **CPC classification search**: Finding patents in a specific technology area using Cooperative Patent Classification codes (e.g., C12N for microorganisms, enzymes, and genetic engineering).
- **Inventor network analysis**: Identifying prolific inventors in a field and their institutional affiliations.
- **Technology trend tracking**: Counting patent filings by year and technology category to identify emerging areas.
- **Life sciences IP analysis**: Searching biotech-specific classifications (A61K for pharmaceuticals, C12N for genetic engineering, G16B for bioinformatics).
- For full-text patent PDF downloads, use the USPTO Bulk Data Storage System (BDSS) or Google Patents direct links.
- **Rate limits**: The PatentsView API allows 45 requests/minute without an API key; a free API key raises the limits (see Prerequisites).

## Prerequisites

- **Python packages**: `requests`, `pandas`, `matplotlib`
- **Optional**: `google-cloud-bigquery` for Google Patents Public Data queries
- **Data requirements**: No account needed for PatentsView basic queries; Google Cloud account required for BigQuery
- **Rate limits**: PatentsView — 45 requests/minute (unauthenticated), higher with free API key

```bash
pip install requests pandas matplotlib
pip install "google-cloud-bigquery[pandas]"  # optional: BigQuery access; the pandas extra supports to_dataframe()
```
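The 45 requests/minute limit above works out to roughly one request every 1.34 seconds. The following is a minimal sketch of a client-side throttle under that assumption only; the `Throttle` class and its parameters are hypothetical names introduced here, not part of `requests` or PatentsView.

```python
import time

class Throttle:
    """Space successive calls at least `60 / max_per_minute` seconds apart."""

    def __init__(self, max_per_minute: int = 45,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # scheduled time of the previous call, if any

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay

throttle = Throttle(max_per_minute=45)
# In a request loop, call throttle.wait() immediately before each requests.post(...).
```

Keeping the throttle separate from the query functions means the same instance can be shared by every helper in a script, so the combined request rate stays under the limit.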

## Quick Start

```python
import requests
import pandas as pd

# Search PatentsView API: patents assigned to "Genentech" in CPC subclass C12N
url = "https://api.patentsview.org/patents/query"
payload = {
    "q": {"_and": [
        {"_contains": {"assignee_organization": "Genentech"}},
        {"_begins": {"cpc_subgroup_id": "C12N"}},   # prefix match on the CPC code
    ]},
    "f": ["patent_number", "patent_title", "patent_date", "assignee_organization"],
    "o": {"per_page": 25},
}
resp = requests.post(url, json=payload)
resp.raise_for_status()                       # surface HTTP errors such as 429
data = resp.json()
df   = pd.DataFrame(data.get("patents") or [])
print(f"Found: {data.get('total_patent_count', 0)} patents")
print(df[["patent_number", "patent_title", "patent_date"]].head())
```

## Core API

### Query Type 1: Search by Assignee (Company / Institution)

Find all patents granted to a specific organization.

```python
import requests
import pandas as pd

def search_by_assignee(assignee_name: str, per_page: int = 100) -> pd.DataFrame:
    url     = "https://api.patentsview.org/patents/query"
    payload = {
        "q": {"_contains": {"assignee_organization": assignee_name}},
        "f": [
            "patent_number", "patent_title", "patent_date",
            "patent_abstract", "assignee_organization", "assignee_country",
        ],
        "o": {"per_page": per_page, "sort": [{"patent_date": "desc"}]},
    }
    resp = requests.post(url, json=payload)
    resp.raise_for_status()
    data = resp.json()
    df   = pd.DataFrame(data.get("patents", []))
    print(f"Assignee '{assignee_name}': {data.get('total_patent_count', 0)} total patents")
    return df

# Example: patents from Broad Institute
df_broad = search_by_assignee("Broad Institute")
print(df_broad[["patent_number", "patent_title", "patent_date"]].head(10))
```

```python
# Paginate through all results for large portfolios
def search_assignee_all_pages(assignee_name: str, page_size: int = 100) -> pd.DataFrame:
    url  = "https://api.patentsview.org/patents/query"
    all_patents = []
    page = 1
    while True:
        payload = {
            "q": {"_contains": {"assignee_organization": assignee_name}},
            "f": ["patent_number", "patent_title", "patent_date", "cpc_subgroup_id"],
            "o": {"per_page": page_size, "page": page},
        }
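        resp = requests.post(url, json=payload)
        resp.raise_for_status()               # fail fast on HTTP errors (e.g., 429)
        data = resp.json()
        patents = data.get("patents") or []   # also tolerates a null "patents" value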
        if not patents:
            break
        all_patents.extend(patents)
        total = data.get("total_patent_count", 0)
        if len(all_patents) >= total:
            break
        page += 1

    df = pd.DataFrame(all_patents)
    print(f"Retrieved {len(df)} patents for '{assignee_name}'")
    return df
```

### Query Type 2: Search by CPC Classification

CPC (Cooperative Patent Classification) codes organize patents by technology. Life sciences codes include C12N (microorganisms, enzymes, genetic engineering), A61K (pharmaceuticals), and G16B (bioinformatics).

```python
import requests
import pandas as pd

# Search by CPC main group C12N15 (mutation/genetic engineering) via prefix match
url = "https://api.patentsview.org/patents/query"
payload = {
    "q": {"_begins": {"cpc_subgroup_id": "C12N15"}},
    "f": [
        "patent_number", "patent_title", "patent_date",
        "assignee_organization", "cpc_subgroup_id",
    ],
    "o": {"per_page": 50, "sort": [{"patent_date": "desc"}]},
}
resp = requests.post(url, json=payload)
data = resp.json()
df   = pd.DataFrame(data["patents"])
print(f"C12N15 patents: {data['total_patent_count']}")
print(df[["patent_number", "patent_title", "assignee_organization"]].head(10))
```

```python
# Common life sciences CPC codes
CPC_LIFE_SCIENCES = {
    "C12N":    "Microorganisms / enzymes / compositions",
    "C12N15":  "Mutation / genetic engineering",
    "C12Q":    "Measuring / testing involving enzymes or microorganisms",
    "A61K":    "Preparations for medical use",
    "A61P":    "Therapeutic activity of chemical compounds",
    "G16B":    "Bioinformatics",
    "G16H":    "Healthcare informatics",
    "C07K":    "Peptides / proteins",
}
for code, desc in CPC_LIFE_SCIENCES.items():
    print(f"  {code:10s}: {desc}")
```
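The prefixes in the table above follow the CPC hierarchy: a one-letter section, a two-digit class, a one-letter subclass, then a main group and an optional subgroup after a slash. A small sketch of splitting a symbol into those levels can help when choosing a `_begins` prefix; `split_cpc` and `CPCParts` are hypothetical helpers written for this illustration.

```python
import re
from typing import NamedTuple, Optional

class CPCParts(NamedTuple):
    section: str               # e.g. "C"
    cpc_class: str             # e.g. "C12"
    subclass: str              # e.g. "C12N"
    main_group: Optional[str]  # e.g. "C12N15", or None if absent
    subgroup: Optional[str]    # e.g. "C12N15/113", or None if absent

# Sections are A-H plus Y; the group and subgroup parts are optional.
_CPC_RE = re.compile(r"^([A-HY])(\d{2})([A-Z])(?:\s*(\d{1,4})(?:/(\d{2,6}))?)?$")

def split_cpc(code: str) -> CPCParts:
    """Split a CPC symbol such as 'C12N15/113' into its hierarchy levels."""
    m = _CPC_RE.match(code.strip().upper())
    if m is None:
        raise ValueError(f"Not a recognizable CPC symbol: {code!r}")
    section, cls, sub, group, subgroup = m.groups()
    subclass = f"{section}{cls}{sub}"
    main_group = f"{subclass}{group}" if group else None
    full = f"{main_group}/{subgroup}" if subgroup else None
    return CPCParts(section, section + cls, subclass, main_group, full)

parts = split_cpc("C12N15/113")
print(parts.subclass, parts.main_group)
```

Validating a code this way before sending a query catches typos locally instead of through an empty API response.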

### Query Type 3: Full-Text Keyword Search

Search patent titles and abstracts for specific terms.

```python
import requests
import pandas as pd

def keyword_search(keyword: str, per_page: int = 50) -> pd.DataFrame:
    url = "https://api.patentsview.org/patents/query"
    payload = {
        "q": {"_or": [
            {"_text_any": {"patent_title":    keyword}},
            {"_text_any": {"patent_abstract": keyword}},
        ]},
        "f": [
            "patent_number", "patent_title", "patent_date",
            "patent_abstract", "assignee_organization",
        ],
        "o": {"per_page": per_page, "sort": [{"patent_date": "desc"}]},
    }
    resp = requests.post(url, json=payload)
    resp.raise_for_status()
    data = resp.json()
    df   = pd.DataFrame(data.get("patents", []))
    print(f"Keyword '{keyword}': {data.get('total_patent_count', 0)} patents found")
    return df

# Search for CRISPR-related patents
df_crispr = keyword_search("CRISPR")
print(df_crispr[["patent_number", "patent_title", "patent_date"]].head(10))
```
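`keyword_search` above passes a single term to `_text_any`. For multi-word searches, the PatentsView query language also documents `_text_all` (every word must appear) and `_text_phrase` (exact phrase). The sketch below only builds the `"q"` clause; `build_text_query` is a hypothetical helper, and the operator behavior should be confirmed against the API documentation before relying on it.

```python
def build_text_query(terms: str, mode: str = "any",
                     fields=("patent_title", "patent_abstract")) -> dict:
    """Build a full-text "q" clause over one or more text fields.

    mode: "any" (any word), "all" (all words), or "phrase" (exact phrase).
    """
    operators = {"any": "_text_any", "all": "_text_all", "phrase": "_text_phrase"}
    if mode not in operators:
        raise ValueError(f"mode must be one of {sorted(operators)}")
    op = operators[mode]
    clauses = [{op: {field: terms}} for field in fields]
    # A single field needs no "_or" wrapper.
    return clauses[0] if len(clauses) == 1 else {"_or": clauses}

q = build_text_query("guide RNA", mode="phrase")
print(q)
```

The returned dictionary can be used as the `"q"` value of a payload shaped like the one in `keyword_search`.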

### Query Type 4: Inventor Search

Find inventors by name, then retrieve an inventor's full patent history by `inventor_id`.

```python
import requests
import pandas as pd

# Search by inventor name
url = "https://api.patentsview.org/inventors/query"
payload = {
    "q": {"_and": [
        {"inventor_last_name":  "Doudna"},
        {"inventor_first_name": "Jennifer"},
    ]},
    "f": ["inventor_id", "inventor_first_name", "inventor_last_name",
          "inventor_city", "inventor_state", "inventor_country"],
    "o": {"per_page": 10},
}
resp = requests.post(url, json=payload)
data = resp.json()
print(f"Found {data.get('total_inventor_count', 0)} inventors matching 'Jennifer Doudna'")
for inv in data.get("inventors", []):
    print(f"  ID: {inv['inventor_id']}, Location: {inv.get('inventor_city')}, {inv.get('inventor_country')}")
```

```python
# Get all patents for a specific inventor by inventor_id
inventor_id = "fl:j_ln:doudna-1"   # illustrative value; use an inventor_id returned by the inventors query

url = "https://api.patentsview.org/patents/query"
payload = {
    "q": {"inventor_id": inventor_id},
    "f": ["patent_number", "patent_title", "patent_date", "assignee_organization"],
    "o": {"per_page": 100, "sort": [{"patent_date": "desc"}]},
}
resp  = requests.post(url, json=payload)
data  = resp.json()
df    = pd.DataFrame(data.get("patents", []))
print(f"Patents for inventor {inventor_id}: {data.get('total_patent_count', 0)}")
print(df.head(5))
```

### Query Type 5: Date Range and Combined Filters

Combine multiple filters for targeted searches.

```python
import requests
import pandas as pd

# Gene therapy patents (CPC A61K48) granted 2020-2024 to a US assignee; patent_date is the grant date
url = "https://api.patentsview.org/patents/query"
payload = {
    "q": {"_and": [
        {"_begins":    {"cpc_subgroup_id": "A61K48"}},
        {"_gte":       {"patent_date": "2020-01-01"}},
        {"_lte":       {"patent_date": "2024-12-31"}},
        {"_eq":        {"assignee_country": "US"}},
    ]},
    "f": [
        "patent_number", "patent_title", "patent_date",
        "assignee_organization", "patent_num_claims",
    ],
    "o": {"per_page": 100, "sort": [{"patent_date": "desc"}]},
}
resp = requests.post(url, json=payload)
data = resp.json()
df   = pd.DataFrame(data.get("patents", []))
print(f"Gene therapy patents 2020-2024 (US assignee): {data.get('total_patent_count', 0)}")
print(df[["patent_number", "patent_title", "patent_date", "assignee_organization"]].head(10))
```
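When the same combination of filters is reused with different values, it can help to assemble the `_and` clause programmatically. This is a minimal sketch using only the operators and field names from the example above; `build_patent_query` and its argument names are hypothetical.

```python
from typing import Optional

def build_patent_query(cpc_prefix: Optional[str] = None,
                       granted_from: Optional[str] = None,
                       granted_to: Optional[str] = None,
                       assignee_country: Optional[str] = None) -> dict:
    """Combine optional filters into a single PatentsView "q" clause."""
    clauses = []
    if cpc_prefix:
        clauses.append({"_begins": {"cpc_subgroup_id": cpc_prefix}})
    if granted_from:
        clauses.append({"_gte": {"patent_date": granted_from}})   # grant date, inclusive
    if granted_to:
        clauses.append({"_lte": {"patent_date": granted_to}})
    if assignee_country:
        clauses.append({"_eq": {"assignee_country": assignee_country}})
    if not clauses:
        raise ValueError("At least one filter is required")
    # A single filter needs no "_and" wrapper.
    return clauses[0] if len(clauses) == 1 else {"_and": clauses}

q = build_patent_query("A61K48", "2020-01-01", "2024-12-31", "US")
print(q)
```

Omitted arguments simply drop out of the query, so the same function covers both narrow and broad searches.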

### Query Type 6: Google Patents BigQuery

For large-scale corpus analytics, use the public Google Patents dataset in BigQuery.

```python
from google.cloud import bigquery

client = bigquery.Client(project="YOUR_GCP_PROJECT")

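# Count US CRISPR publications by filing year (Google Patents public data).
# filing_date is stored as an INTEGER in YYYYMMDD form, and assignee_harmonized
# is a repeated record, so it is unnested before counting distinct names.
query = """
SELECT
    DIV(p.filing_date, 10000)             AS filing_year,
    COUNT(DISTINCT p.publication_number)  AS patent_count,
    COUNT(DISTINCT a.name)                AS unique_assignees
FROM `patents-public-data.patents.publications` AS p
LEFT JOIN UNNEST(p.assignee_harmonized) AS a
WHERE
    (LOWER(p.title_localized[SAFE_OFFSET(0)].text) LIKE '%crispr%'
     OR LOWER(p.abstract_localized[SAFE_OFFSET(0)].text) LIKE '%crispr%')
    AND p.filing_date >= 20100101
    AND p.country_code = 'US'
GROUP BY filing_year
ORDER BY filing_year
"""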
df_bq = client.query(query).to_dataframe()
print(df_bq)
print(f"Peak year: {df_bq.loc[df_bq.patent_count.idxmax(), 'filing_year']} "
      f"({df_bq.patent_count.max()} patents)")
```
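Date columns in the public `publications` table, such as `filing_date`, are integers in `YYYYMMDD` form rather than `DATE` values. If raw dates are selected instead of aggregated years, they need converting on the Python side. The sketch below assumes that `0` or an impossible date marks a missing value; `parse_bq_date` is a hypothetical helper, not part of the BigQuery client.

```python
from datetime import date
from typing import Optional

def parse_bq_date(value: int) -> Optional[date]:
    """Convert a YYYYMMDD integer to a date; return None for missing or invalid values."""
    if not value:
        return None                      # 0 / None treated as "no date" (assumption)
    year, rest = divmod(int(value), 10000)
    month, day = divmod(rest, 100)
    try:
        return date(year, month, day)
    except ValueError:
        return None                      # e.g. month 13 or day 40

print(parse_bq_date(20150312))
```

Applied to a DataFrame column, this would look like `df["filing_date"].map(parse_bq_date)`.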

## Key Parameters

| Parameter | Module | Default | Range / Options | Effect |
|-----------|--------|---------|-----------------|--------|
| `per_page` | PatentsView `"o"` | `25` | `1`–`10000` | Results per API call |
| `page` | PatentsView `"o"` | `1` | `1`–max pages | Page number for pagination |
| `sort` | PatentsView `"o"` | API default | any field + `"asc"`/`"desc"` | Sort order of results |
| `"f"` fields | PatentsView | minimal | any valid field list | Fields returned in response (controls payload size) |
| `"_begins"` | query operator | — | field + prefix string | Prefix match (e.g., CPC code prefix) |
| `"_contains"` | query operator | — | field + substring | Substring search (case-insensitive) |
| `"_text_any"` | query operator | — | field + keywords | Full-text search on title/abstract fields |

## Best Practices

1. **Request only the fields you need**: The `"f"` (fields) parameter controls what is returned. Requesting `patent_abstract` for thousands of patents significantly increases payload size and latency.

2. **Always handle pagination for large result sets**: PatentsView caps `per_page` at 10,000, so any query with more matches than one page holds must be paginated via `page`. For very large result sets, split the query by date range or narrower CPC codes and paginate each slice.

3. **Cache API responses to disk**: PatentsView is rate-limited; if building a dataset iteratively, save responses to JSON/CSV after each API call.
   ```python
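   import json, pathlib
   import requests

   # `url` and `payload` are defined as in the examples above
   cache = pathlib.Path("cache")
   cache.mkdir(exist_ok=True)
   cache_file = cache / "genentech_patents.json"
   if not cache_file.exists():
       resp = requests.post(url, json=payload)
       resp.raise_for_status()             # avoid caching an error response
       cache_file.write_text(resp.text)
   data = json.loads(cache_file.read_text())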
   ```

4. **Use CPC codes for technology-specific searches, not just keywords**: Keywords miss synonyms and foreign-language patents; CPC codes are assigned by patent examiners and are more systematic.

5. **Validate assignee names**: Company names in patent records vary (e.g., "Genentech Inc.", "Genentech, Inc.", "GENENTECH INC"). Use `_contains` to match on a shared substring, then deduplicate the variants in pandas.
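The deduplication step in item 5 can be sketched as a normalization function that maps name variants to one key. The suffix list below is an illustrative assumption, not an exhaustive or official list, and `normalize_assignee` is a hypothetical helper.

```python
import re

# Common legal-form suffixes to strip (illustrative, extend as needed).
_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company",
             "ltd", "limited", "llc", "gmbh", "ag", "sa", "plc"}

def normalize_assignee(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal-form suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    # Keep at least one token so a name is never reduced to an empty string.
    while len(tokens) > 1 and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

variants = ["Genentech Inc.", "Genentech, Inc.", "GENENTECH INC"]
print({normalize_assignee(v) for v in variants})
```

With a DataFrame of results, `df["assignee_organization"].map(normalize_assignee)` gives a grouping key for `groupby` or `drop_duplicates`.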

## Common Workflows

### Workflow 1: Technology Landscape Analysis — Filing Trends by Year

**Goal**: Count patents granted in a CPC group by year (`patent_date` is the grant date) and plot the trend.

```python
import requests
import pandas as pd
import matplotlib.pyplot as plt
from collections import defaultdict

def count_patents_by_year(cpc_prefix: str, start_year: int = 2010) -> pd.DataFrame:
    url  = "https://api.patentsview.org/patents/query"
    counts = defaultdict(int)
    page   = 1
    while True:
        payload = {
            "q": {"_and": [
                {"_begins": {"cpc_subgroup_id": cpc_prefix}},
                {"_gte":    {"patent_date": f"{start_year}-01-01"}},
            ]},
            "f": ["patent_number", "patent_date"],
            "o": {"per_page": 10000, "page": page},
        }
        resp = requests.post(url, json=payload)
        resp.raise_for_status()
        data    = resp.json()                 # parse the response once
        patents = data.get("patents") or []
        if not patents:
            break
        for p in patents:
            year = p["patent_date"][:4]       # patent_date is the grant date
            counts[year] += 1
        total = data.get("total_patent_count", 0)
        if sum(counts.values()) >= total:
            break
        page += 1

    df = pd.DataFrame(sorted(counts.items()), columns=["year", "count"])
    return df

df_trend = count_patents_by_year("C12N15", start_year=2010)

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(df_trend["year"], df_trend["count"], color="steelblue", edgecolor="white")
ax.set_xlabel("Year")
ax.set_ylabel("Patents granted")
ax.set_title("US Patents: C12N15 (Genetic Engineering) by Year")
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig("cpc_trend.png", dpi=150)
print(f"Trend plotted: {df_trend['count'].sum()} total patents -> cpc_trend.png")
```
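For CPC prefixes with very large result sets, the date-range slicing recommended in Best Practices can replace the single open-ended query used in `count_patents_by_year`. The sketch below generates one pair of date clauses per calendar year; `yearly_slices` is a hypothetical helper, and each slice would still be sent and paginated as in the loop above.

```python
from datetime import date

def yearly_slices(start_year: int, end_year: int):
    """Yield (year, clauses) with inclusive grant-date bounds for each year."""
    for year in range(start_year, end_year + 1):
        first = date(year, 1, 1).isoformat()
        last = date(year, 12, 31).isoformat()
        yield year, [
            {"_gte": {"patent_date": first}},
            {"_lte": {"patent_date": last}},
        ]

slices = list(yearly_slices(2020, 2022))
for year, clauses in slices:
    # Combine with a CPC filter to form the "q" value for that year.
    q = {"_and": [{"_begins": {"cpc_subgroup_id": "C12N15"}}, *clauses]}
    print(year, q)
```

Because each slice covers a single year, `total_patent_count` from a `per_page` of 1 already gives the yearly count without downloading the records.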

### Workflow 2: Assignee Portfolio Comparison

**Goal**: Compare patent counts across multiple biotech companies in a target CPC class.

```python
import requests
import pandas as pd
import matplotlib.pyplot as plt

def count_patents_by_assignee(assignees: list, cpc_prefix: str) -> pd.DataFrame:
    url     = "https://api.patentsview.org/patents/query"
    records = []
    for assignee in assignees:
        payload = {
            "q": {"_and": [
                {"_contains": {"assignee_organization": assignee}},
                {"_begins":   {"cpc_subgroup_id": cpc_prefix}},
            ]},
            "f": ["patent_number"],
            "o": {"per_page": 1},  # only need total count
        }
        resp  = requests.post(url, json=payload)
        resp.raise_for_status()
        total = resp.json().get("total_patent_count", 0)
        records.append({"assignee": assignee, "patent_count": total})
        print(f"  {assignee}: {total} patents")

    df = pd.DataFrame(records).sort_values("patent_count", ascending=True)
    return df

companies = ["Genentech", "Amgen", "Regeneron", "AstraZeneca", "Novartis"]
df_comp   = count_patents_by_assignee(companies, cpc_prefix="A61K")

fig, ax = plt.subplots(figsize=(7, 4))
ax.barh(df_comp["assignee"], df_comp["patent_count"], color="salmon")
ax.set_xlabel("Patent count (A61K)")
ax.set_title("Pharmaceutical Patents by Assignee (CPC A61K)")
plt.tight_layout()
plt.savefig("assignee_comparison.png", dpi=150)
print("Comparison chart saved -> assignee_comparison.png")
```

## Expected Outputs

- `pd.DataFrame` with patent records (columns depend on requested `"f"` fields)
- `cpc_trend.png` — bar chart of patent counts by year
- `assignee_comparison.png` — horizontal bar chart comparing companies
- `total_patent_count` in the API response gives the total number of patents matching the query, independent of `per_page`

## Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| `HTTPError 429 Too Many Requests` | Exceeded 45 req/min rate limit | Add `time.sleep(1.5)` between requests; request a free API key |
| Empty `patents` list in response | Query too narrow or field name incorrect | Check field names in [PatentsView API docs](https://patentsview.org/apis/api-endpoints/patents); test query in the web UI first |
| Results miss known patents | Exact string matching on assignee name | Use `_contains` instead of `_eq`; check for name variants |
| `KeyError: patent_date` | Field not requested in `"f"` list | Add `"patent_date"` to the `"f"` array |
| BigQuery auth error | GCP credentials not configured | Run `gcloud auth application-default login` or set `GOOGLE_APPLICATION_CREDENTIALS` |
| CPC prefix returns no results | Invalid CPC code or typo | Verify code at [CPC classification browser](https://www.cooperativepatentclassification.org/cpcSchemeAndDefinitions/table.html) |
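
For the 429 row above, a fixed sleep can be complemented by retrying with exponential backoff. This is a minimal sketch, not a PatentsView-specific recipe: `RateLimited` and `with_backoff` are hypothetical names, and the caller is assumed to raise `RateLimited` when `resp.status_code == 429`.

```python
import time

class RateLimited(Exception):
    """Raised by the caller-supplied function when the server answers 429."""

def with_backoff(func, max_attempts: int = 5, base_delay: float = 1.5,
                 sleep=time.sleep):
    """Call func(); on RateLimited, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise                      # give up after the final attempt
            sleep(base_delay * (2 ** attempt))

# Example wiring (not executed here):
# def fetch():
#     resp = requests.post(url, json=payload)
#     if resp.status_code == 429:
#         raise RateLimited()
#     resp.raise_for_status()
#     return resp.json()
# data = with_backoff(fetch)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.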

## References

- [PatentsView API Documentation](https://patentsview.org/apis/api-endpoints/patents) — full field list and query operator reference
- [PatentsView API Explorer](https://patentsview.org/query) — browser-based query builder for testing
- [Google Patents Public Data (BigQuery)](https://cloud.google.com/blog/topics/public-datasets/google-patents-public-data-connecting-public-paid-and-private-patent-data) — schema documentation
- [CPC Classification Browser](https://www.cooperativepatentclassification.org/cpcSchemeAndDefinitions/table.html) — browse life sciences CPC codes
- [USPTO Bulk Data Storage System](https://bulkdata.uspto.gov/) — full-text patent XML downloads