---
name: "jaspar-database"
description: "JASPAR 2024 TF binding profiles via REST API and pyJASPAR. Retrieve PFMs/PWMs by TF name, JASPAR ID, species, or structural class. Scan DNA for TFBS; browse by taxon (human, mouse) or TF family (bHLH, zinc finger). Use for motif enrichment input, TFBS scanning, and regulatory sequence analysis. For ChIP-seq peak motif discovery use homer-motif-analysis; for regulatory variant scoring use regulomedb-database."
license: "CC-BY-4.0"
---

# JASPAR Database

## Overview

JASPAR is a curated, open-access database of transcription factor (TF) binding profiles represented as position frequency matrices (PFMs). The 2024 release contains 1,209 profiles in the CORE vertebrate collection, covering 783 TFs, built from experimental binding data such as SELEX, ChIP-seq, and PBM. Access is free through the JASPAR REST API at `https://jaspar.elixir.no/api/v1/` (no authentication required) and, offline, through the `pyJASPAR` Python library, which bundles JASPAR releases as local SQLite databases and returns Biopython motif objects.

## When to Use

- Looking up the PWM or PFM for a specific TF by name (e.g., CTCF, SP1, GATA1) to use as motif input for a scanning tool
- Retrieving all JASPAR profiles for a species (e.g., Homo sapiens, Mus musculus) to build a motif library for enrichment analysis
- Scanning a DNA promoter sequence for predicted TF binding sites using a known PWM
- Finding all TFs of a given structural class (bHLH, zinc finger, homeodomain) to build a TF family binding profile set
- Getting metadata for a JASPAR matrix: number of binding sites, information content, GC content, experiment type
- Downloading complete JASPAR collection sets (CORE, UNVALIDATED, CNE) in JASPAR or MEME format for batch analysis
- Use `homer-motif-analysis` instead when you need de novo motif discovery from ChIP-seq peaks; JASPAR is for retrieving known matrices
- For regulatory element annotations tied to a genomic region use `encode-database` or `regulomedb-database`

## Prerequisites

- **Python packages**: `requests`, `pandas`, `matplotlib`, `numpy`
- **Optional**: `pyJASPAR` (queries bundled JASPAR SQLite databases offline and returns Biopython motif objects; it does not call the REST API)
- **Data requirements**: TF gene symbols, JASPAR matrix IDs (e.g., `MA0139.1`), or DNA sequences (string or FASTA)
- **Environment**: internet connection; no API key required
- **Rate limits**: no official published limits; pause 0.3–0.5 s between batch requests (the examples below use `time.sleep(0.3)`)

```bash
pip install requests pandas matplotlib numpy
pip install pyJASPAR   # optional; pulls in biopython
```
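Because the batch examples below issue hundreds of requests, transient failures (timeouts, 429 or 5xx responses) become likely on long runs. A minimal sketch of a `requests` session that retries GETs with exponential backoff, assuming `urllib3` 1.26 or later (for the `allowed_methods` argument); `make_session` is a hypothetical helper, not part of any JASPAR client:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(total_retries=5, backoff=0.5):
    """Session that retries transient errors (429 and 5xx) with exponential backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,                      # delay grows exponentially between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # retry only these HTTP status codes
        allowed_methods=["GET"],                     # only retry idempotent GETs
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

session = make_session()
# Use session.get(...) wherever the examples call requests.get(...)
```

The manual `time.sleep` pauses still apply between successful calls; the retry policy only covers failed ones.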

## Quick Start

```python
import requests

JASPAR_API = "https://jaspar.elixir.no/api/v1"

# Search for CTCF profile in the CORE vertebrate collection
r = requests.get(f"{JASPAR_API}/matrix/", params={
    "search": "CTCF",
    "collection": "CORE",
    "tax_group": "vertebrates",
    "format": "json"
}, timeout=15)
r.raise_for_status()
results = r.json()
print(f"Profiles found: {results['count']}")
for m in results["results"][:3]:
    # List results are summary records; fetch /matrix/{id}/ for the PFM and metadata
    print(f"  {m['matrix_id']}  {m['name']}  collection={m.get('collection')}")
```

## Core API

### Query 1: Matrix Search

Search for TF profiles by TF name, species, collection, or taxonomic group. Returns a paginated list of matching profile records.

```python
import requests, time

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def jaspar_search(search=None, collection="CORE", tax_id=None, tax_group=None,
                  tf_class=None, tf_family=None, page_size=50):
    """Search JASPAR matrices. Returns list of result dicts."""
    params = {"format": "json", "page_size": page_size}
    if search:      params["search"]     = search
    if collection:  params["collection"] = collection
    if tax_id:      params["tax_id"]     = tax_id
    if tax_group:   params["tax_group"]  = tax_group
    if tf_class:    params["tf_class"]   = tf_class
    if tf_family:   params["tf_family"]  = tf_family

    all_results = []
    url = f"{JASPAR_API}/matrix/"
    while url:
        r = requests.get(url, params=params if url == f"{JASPAR_API}/matrix/" else None, timeout=15)
        r.raise_for_status()
        data = r.json()
        all_results.extend(data["results"])
        url = data.get("next")   # follow pagination
        time.sleep(0.3)
    return all_results

# Example: CORE vertebrate profiles matching the search term "GATA"
gata_profiles = jaspar_search(search="GATA", collection="CORE", tax_group="vertebrates")
print(f"GATA profiles: {len(gata_profiles)}")
for m in gata_profiles[:4]:
    # List results are summary records; class, family, and PFM come from the detail record (Query 2)
    print(f"  {m['matrix_id']}  {m['name']}")
```

### Query 2: Matrix Retrieval

Fetch the full profile record for a specific matrix ID, including the raw PFM counts, metadata, and TF annotations.

```python
import requests

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def get_matrix(matrix_id):
    """Return full matrix record for a JASPAR ID (e.g. 'MA0139.1')."""
    r = requests.get(f"{JASPAR_API}/matrix/{matrix_id}/", params={"format": "json"}, timeout=15)
    r.raise_for_status()
    return r.json()

m = get_matrix("MA0139.1")   # CTCF
print(f"ID: {m['matrix_id']}  Name: {m['name']}")
print(f"Collection: {m['collection']}  Type: {m.get('type', 'n/a')}")
print(f"Species: {[s['name'] for s in m.get('species', [])]}")
print(f"UniProt: {m.get('uniprot_ids', [])}")
# Class and family are lists (heterodimer profiles can carry two entries)
print(f"TF class: {m.get('class') or 'n/a'}  Family: {m.get('family') or 'n/a'}")

# PFM structure: dict mapping base -> list of counts, one count per position
pfm = m["pfm"]                                        # {"A": [...], "C": [...], "G": [...], "T": [...]}
n_positions = len(pfm["A"])
n_sites = int(round(sum(pfm[b][0] for b in "ACGT")))  # column sum of counts = number of aligned sites
print(f"\nPFM length: {n_positions} positions  Sites: {n_sites}")
print("Position 0:", {b: pfm[b][0] for b in "ACGT"})
```
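Once a PFM is in hand, a consensus string (the most frequent base at each position) is handy for labelling hits. A sketch assuming a base-keyed layout (`{"A": [...], "C": [...], "G": [...], "T": [...]}`), which is how the API's JSON stores the matrix; `pfm_consensus` is a hypothetical name, ties go to the first base in A, C, G, T order, and no IUPAC ambiguity codes are produced:

```python
import numpy as np

BASES = "ACGT"

def pfm_consensus(pfm):
    """Most frequent base at each position of a base-keyed PFM."""
    counts = np.array([pfm[b] for b in BASES], dtype=float)   # shape (4, L)
    # argmax returns the first maximum, so ties resolve in A, C, G, T order
    return "".join(BASES[i] for i in counts.argmax(axis=0))

toy = {"A": [10, 0, 2], "C": [0, 9, 2], "G": [0, 1, 2], "T": [0, 0, 4]}
print(pfm_consensus(toy))   # ACT
```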

### Query 3: PWM Computation from PFM

Convert a raw PFM (count matrix) to a position weight matrix (PWM) using log-odds scoring. The PWM is used for binding site scanning.

```python
import requests, numpy as np

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def pfm_to_pwm(pfm_dict, pseudocount=0.8, background=None):
    """
    Convert a JASPAR PFM dict to a PWM (log2 odds).
    pfm_dict: dict of base -> list of counts, e.g. {"A": [...], "C": [...], "G": [...], "T": [...]}
    Returns: numpy array shape (4, L), rows = [A, C, G, T]
    """
    if background is None:
        background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    bases = ["A", "C", "G", "T"]
    counts = np.array([pfm_dict[b] for b in bases], dtype=float)
    counts += pseudocount                                  # avoid log2(0) = -inf
    freqs = counts / counts.sum(axis=0, keepdims=True)    # column-normalize to probabilities
    bg = np.array([background[b] for b in bases])[:, None]
    pwm = np.log2(freqs / bg)
    return pwm   # shape (4, L)

r = requests.get(f"{JASPAR_API}/matrix/MA0139.1/", params={"format": "json"}, timeout=15)
r.raise_for_status()
pfm = r.json()["pfm"]
pwm = pfm_to_pwm(pfm)

print(f"PWM shape: {pwm.shape}  (4 bases × {pwm.shape[1]} positions)")
print(f"Min score per position: {pwm.min(axis=0)[:5].round(2)}")
print(f"Max score per position: {pwm.max(axis=0)[:5].round(2)}")
# PWM shape: (4, 19)  (4 bases × 19 positions)
```
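Scanning thresholds in this document are fractions of the range between the lowest and highest attainable PWM scores (a "relative score"). The attainable maximum is the sum of each column's best entry, not the sum of all positive entries, since only one base occupies each position. A minimal sketch of the conversion in both directions; `relative_score` and `threshold_from_relative` are hypothetical helper names:

```python
import numpy as np

def relative_score(score, pwm):
    """Map a raw log-odds score onto [0, 1] using the PWM's attainable range."""
    min_s = pwm.min(axis=0).sum()   # worst base at every position
    max_s = pwm.max(axis=0).sum()   # best base at every position
    return (score - min_s) / (max_s - min_s)

def threshold_from_relative(rel, pwm):
    """Inverse: the raw score that corresponds to a relative cutoff such as 0.80."""
    min_s = pwm.min(axis=0).sum()
    max_s = pwm.max(axis=0).sum()
    return min_s + rel * (max_s - min_s)

# Toy 2-position PWM (rows A, C, G, T) against a uniform background
pwm = np.log2(np.array([[0.7, 0.1], [0.1, 0.7], [0.1, 0.1], [0.1, 0.1]]) / 0.25)
print(round(relative_score(pwm.max(axis=0).sum(), pwm), 3))   # 1.0
print(round(threshold_from_relative(0.80, pwm), 3))
```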

### Query 4: Sequence Scanning

Scan a DNA sequence for TFBS matches by sliding the PWM across the sequence and computing log-odds scores at each position.

```python
import requests, numpy as np

JASPAR_API = "https://jaspar.elixir.no/api/v1"

BASE_IDX = {"A": 0, "C": 1, "G": 2, "T": 3}

def pfm_to_pwm(pfm_dict, pseudocount=0.8):
    """Base-keyed JASPAR PFM -> (4, L) log2-odds PWM against a uniform background."""
    counts = np.array([pfm_dict[b] for b in "ACGT"], dtype=float) + pseudocount
    freqs = counts / counts.sum(axis=0, keepdims=True)
    return np.log2(freqs / 0.25)

def scan_sequence(seq, pwm, threshold_pct=0.80):
    """
    Slide pwm over both strands of seq and return hits scoring at least
    threshold_pct of the way from the minimum to the maximum attainable score.
    Returns (list of (position, score, strand), max_score, threshold).
    """
    seq = seq.upper()
    L = pwm.shape[1]
    max_score = pwm.max(axis=0).sum()   # best base at every position
    min_score = pwm.min(axis=0).sum()   # worst base at every position
    threshold = min_score + threshold_pct * (max_score - min_score)
    rc_pwm = pwm[::-1, ::-1]            # reverse complement: A<->T, C<->G, positions reversed
    hits = []
    for i in range(len(seq) - L + 1):
        window = seq[i:i+L]
        if not set(window) <= set("ACGT"):   # skip N and other ambiguity codes
            continue
        for strand, mat in (("+", pwm), ("-", rc_pwm)):
            score = sum(mat[BASE_IDX[window[j]], j] for j in range(L))
            if score >= threshold:
                hits.append((i, round(float(score), 3), strand))
    return hits, max_score, threshold

# Fetch CTCF matrix and scan a short synthetic test sequence
r = requests.get(f"{JASPAR_API}/matrix/MA0139.1/", params={"format": "json"}, timeout=15)
r.raise_for_status()
pfm = r.json()["pfm"]
pwm = pfm_to_pwm(pfm)

# Synthetic 102 bp test sequence (positions are 0-based, forward-strand coordinates)
seq = ("GCAGGTTTAAGCTTCCTGGCATTTAAGCTTCCTGGCATTTCCCCAGGGGGCGGAGGCAGAG"
       "CCGCGAGCCGCGAGCCGCGAGCCGCGAGCCGCGAGTTTAAG")
hits, max_score, thresh = scan_sequence(seq, pwm, threshold_pct=0.80)
print(f"Max PWM score: {max_score:.2f}  |  Threshold (80%): {thresh:.2f}")
print(f"Hits: {len(hits)}")
for pos, score, strand in hits:
    print(f"  pos={pos}  score={score:.2f}  {strand}  seq={seq[pos:pos+pwm.shape[1]]}")
```
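The per-window Python loop is easy to read but slow on long sequences. A vectorized sketch that scores every window on both strands at once with NumPy's `sliding_window_view` (NumPy 1.20 or later). The minus strand uses `pwm[::-1, ::-1]`: with rows in A, C, G, T order, reversing the rows swaps A with T and C with G, and reversing the columns reverses the positions. `scan_both_strands` is a hypothetical helper; it takes a raw-score threshold and reports 0-based forward-strand start positions:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

BASE_IDX = {"A": 0, "C": 1, "G": 2, "T": 3}

def scan_both_strands(seq, pwm, threshold):
    """Return sorted (start, score, strand) for every window scoring >= threshold.

    pwm has shape (4, L) with rows A, C, G, T. Windows containing any
    non-ACGT character are skipped.
    """
    seq = seq.upper()
    L = pwm.shape[1]
    if len(seq) < L:
        return []
    idx = np.array([BASE_IDX.get(c, -1) for c in seq])   # -1 marks non-ACGT characters
    windows = sliding_window_view(idx, L)                 # shape (n_windows, L)
    valid = (windows >= 0).all(axis=1)
    safe = np.where(windows >= 0, windows, 0)            # placeholder index for masked windows
    cols = np.arange(L)
    fwd = pwm[safe, cols].sum(axis=1)                     # element [i, j] = pwm[base_ij, j]
    rev = pwm[::-1, ::-1][safe, cols].sum(axis=1)         # same windows scored on the minus strand
    hits = []
    for strand, scores in (("+", fwd), ("-", rev)):
        for i in np.nonzero(valid & (scores >= threshold))[0]:
            hits.append((int(i), float(scores[i]), strand))
    return sorted(hits)

# Toy PWM that rewards "ACG"; its reverse complement "CGT" scores on the minus strand
toy = np.full((4, 3), -2.0)
toy[0, 0] = toy[1, 1] = toy[2, 2] = 2.0
print(scan_both_strands("TTACGTT", toy, threshold=5.9))
```

To use the document's relative cutoff, pass `min_score + pct * (max_score - min_score)` as the threshold.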

### Query 5: Taxon Browser

List all JASPAR CORE profiles for a specific organism, identified by NCBI taxonomy ID.

```python
import requests, time, pandas as pd

JASPAR_API = "https://jaspar.elixir.no/api/v1"

# Common taxonomy IDs
TAX_IDS = {
    "Homo sapiens":    9606,
    "Mus musculus":    10090,
    "Rattus norvegicus": 10116,
    "Drosophila melanogaster": 7227,
    "Saccharomyces cerevisiae": 4932,
}

def get_species_profiles(tax_id, collection="CORE"):
    """Return all matrices for a species tax ID."""
    params = {"tax_id": tax_id, "collection": collection, "format": "json", "page_size": 100}
    results = []
    url = f"{JASPAR_API}/matrix/"
    while url:
        r = requests.get(url, params=params if url == f"{JASPAR_API}/matrix/" else None, timeout=15)
        r.raise_for_status()
        data = r.json()
        results.extend(data["results"])
        url = data.get("next")
        time.sleep(0.3)
    return results

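human_profiles = get_species_profiles(TAX_IDS["Homo sapiens"])
print(f"Human CORE profiles: {len(human_profiles)}")

# List results are summary records; class, family, and experiment type come from
# the detail endpoint (one request per matrix, so only the first 20 are fetched here)
rows = []
for m in human_profiles[:20]:
    r = requests.get(f"{JASPAR_API}/matrix/{m['matrix_id']}/", params={"format": "json"}, timeout=15)
    r.raise_for_status()
    d = r.json()
    rows.append({
        "matrix_id": d["matrix_id"],
        "name": d["name"],
        "tf_class": "; ".join(d.get("class") or []),
        "tf_family": "; ".join(d.get("family") or []),
        "type": d.get("type", ""),
    })
    time.sleep(0.3)
df = pd.DataFrame(rows)
print(df.head(5).to_string(index=False))
print(f"\nExperiment types (first 20 profiles):\n{df['type'].value_counts().to_string()}")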
```

### Query 6: TF Class and Family Browser

Find all profiles belonging to a specific TF structural class or family, useful for building class-specific motif libraries.

```python
import requests, time

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def get_class_profiles(tf_class=None, tf_family=None, collection="CORE", tax_group="vertebrates"):
    """Return all profiles for a TF structural class or family."""
    params = {"collection": collection, "tax_group": tax_group, "format": "json", "page_size": 100}
    if tf_class:  params["tf_class"]  = tf_class
    if tf_family: params["tf_family"] = tf_family
    results = []
    url = f"{JASPAR_API}/matrix/"
    while url:
        r = requests.get(url, params=params if url == f"{JASPAR_API}/matrix/" else None, timeout=15)
        r.raise_for_status()
        data = r.json()
        results.extend(data["results"])
        url = data.get("next")
        time.sleep(0.3)
    return results

# JASPAR (2018 onward) uses TFClass names, e.g. classes "C2H2 zinc finger factors",
# "Basic helix-loop-helix factors (bHLH)", "Homeo domain factors".
# The filter needs the exact string shown on a JASPAR matrix page.
bhlh = get_class_profiles(tf_class="Basic helix-loop-helix factors (bHLH)",
                          collection="CORE", tax_group="vertebrates")
print(f"bHLH vertebrate profiles: {len(bhlh)}")
for m in bhlh[:5]:
    print(f"  {m['matrix_id']:12s} {m['name']}")

## Key Concepts

### JASPAR Collections

JASPAR organizes profiles into curated collections:

| Collection | Description | Profile count |
|------------|-------------|---------------|
| `CORE` | Manually curated, non-redundant, high-quality profiles | ~1,200 (2024) |
| `CNE` | Profiles derived from conserved noncoding elements | ~100 |
| `UNVALIDATED` | Profiles not yet manually curated | ~1,000 |
| `PHYLOFACTS` | Profiles from phylogenetically constrained sites | ~100 |
| `POLII` | RNA polymerase II binding profiles | ~10 |

For most analyses use `CORE`. The CORE vertebrate collection is widely used as the reference motif library for enrichment analyses.

### Matrix ID Versioning

JASPAR matrix IDs have the format `MA{number}.{version}` (e.g., `MA0139.1`). A new version is released when the binding data is updated. When scripting, use the versioned ID for reproducibility. Searching by name (e.g., `CTCF`) returns all versions; select the highest-numbered version for the most up-to-date matrix.
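Picking the highest version needs a numeric comparison: compared as plain strings, a hypothetical version `.10` would sort before `.9`. A small sketch that keeps the latest version of each base ID from a list of matrix IDs (`latest_versions` is a hypothetical helper; the IDs in the example are made up):

```python
def latest_versions(matrix_ids):
    """Keep the highest version of each base ID, comparing versions numerically."""
    best = {}
    for mid in matrix_ids:
        base, _, version = mid.rpartition(".")   # "MA0139.1" -> ("MA0139", ".", "1")
        v = int(version)
        if base not in best or v > best[base]:
            best[base] = v
    return sorted(f"{base}.{v}" for base, v in best.items())

print(latest_versions(["MA9998.2", "MA9998.10", "MA9999.1"]))   # hypothetical IDs
# ['MA9998.10', 'MA9999.1']
```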

### Information Content

The information content (IC) at each position (in bits) measures binding site specificity:

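- IC = 2 - H, where H = -Σ_b p_b log2(p_b) is the Shannon entropy of the base frequencies at that position (the constant 2 assumes a uniform 0.25 background)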
- High IC (close to 2 bits) = near-invariant base (e.g., always A)
- Low IC (close to 0) = little positional preference
- Total IC = sum over all positions; high total IC = more specific binding

```python
import numpy as np
import requests

def information_content(pfm_dict, pseudocount=0.8):
    """Compute per-position IC and total IC for a base-keyed JASPAR PFM."""
    counts = np.array([pfm_dict[b] for b in "ACGT"], dtype=float) + pseudocount
    freqs = counts / counts.sum(axis=0, keepdims=True)
    entropy = -np.sum(freqs * np.log2(freqs), axis=0)   # pseudocount keeps every freq > 0
    ic_per_pos = 2 - entropy
    return ic_per_pos, ic_per_pos.sum()

r = requests.get("https://jaspar.elixir.no/api/v1/matrix/MA0139.1/",
                 params={"format": "json"}, timeout=15)
r.raise_for_status()
pfm = r.json()["pfm"]
ic_pos, total_ic = information_content(pfm)
print(f"Total IC: {total_ic:.2f} bits")
print(f"Max IC position: {ic_pos.argmax()}  ({ic_pos.max():.2f} bits)")
```
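The IC bullets above can be checked on toy columns: an invariant base gives 2 bits, a uniform column gives 0 bits, and two equally likely bases give 1 bit. A sketch for a single column under a uniform background (`column_ic` is a hypothetical helper that treats 0 · log2(0) as 0 instead of adding a pseudocount):

```python
import numpy as np

def column_ic(freqs):
    """IC in bits for one column of [A, C, G, T] frequencies, uniform background."""
    p = np.asarray(freqs, dtype=float)
    nz = p[p > 0]                          # 0 * log2(0) is taken as 0
    return 2.0 + float(np.sum(nz * np.log2(nz)))

print(column_ic([1, 0, 0, 0]))             # 2.0  invariant base
print(column_ic([0.25] * 4))               # 0.0  no preference
print(column_ic([0.5, 0.5, 0, 0]))         # 1.0  two equally likely bases
```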

## Common Workflows

### Workflow 1: Build a Human TF Motif Library and Export to MEME Format

**Goal**: Download all human CORE profiles and write them in MEME minimal format for use with FIMO, AME, or TOMTOM.

```python
import requests, time, numpy as np

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def get_all_profiles(tax_id=9606, collection="CORE"):
    params = {"tax_id": tax_id, "collection": collection, "format": "json", "page_size": 100}
    results, url = [], f"{JASPAR_API}/matrix/"
    while url:
        r = requests.get(url, params=params if url == f"{JASPAR_API}/matrix/" else None, timeout=15)
        r.raise_for_status()
        data = r.json()
        results.extend(data["results"])
        url = data.get("next")
        time.sleep(0.3)
    return results

def pfm_to_freq(pfm_dict, pseudocount=0.1):
    """Return (4, L) frequency matrix [A, C, G, T] from a base-keyed JASPAR PFM."""
    counts = np.array([pfm_dict[b] for b in "ACGT"], dtype=float) + pseudocount
    return counts / counts.sum(axis=0, keepdims=True)

profiles = get_all_profiles(tax_id=9606, collection="CORE")
print(f"Listed {len(profiles)} human CORE profiles")

meme_lines = [
    "MEME version 4",
    "ALPHABET= ACGT",
    "strands: + -",
    "Background letter frequencies",
    "A 0.25 C 0.25 G 0.25 T 0.25",
    "",
]
for m in profiles:
    # List results carry no PFM, so fetch each detail record (one request per matrix)
    r = requests.get(f"{JASPAR_API}/matrix/{m['matrix_id']}/", params={"format": "json"}, timeout=15)
    r.raise_for_status()
    pfm = r.json()["pfm"]
    freq = pfm_to_freq(pfm)
    L = freq.shape[1]
    nsites = int(round(sum(pfm[b][0] for b in "ACGT")))   # column sum of counts = aligned sites
    meme_lines.append(f"MOTIF {m['matrix_id']} {m['name']}")
    meme_lines.append(f"letter-probability matrix: alength= 4 w= {L} nsites= {nsites}")
    for j in range(L):
        row = " ".join(f"{freq[b, j]:.4f}" for b in range(4))
        meme_lines.append(f"  {row}")
    meme_lines.append("")
    time.sleep(0.3)

output_path = "jaspar_human_core_2024.meme"
with open(output_path, "w") as f:
    f.write("\n".join(meme_lines))
print(f"Written: {output_path}  ({len(profiles)} motifs)")
```
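Before handing the file to FIMO or TOMTOM, it can help to read it back and confirm every motif has `w` rows that each sum to about 1. A minimal sketch, not a full MEME parser: it assumes the `key= value` spacing written above (`w= 19`, not `w=19`), and `read_meme_matrices` is a hypothetical helper:

```python
def read_meme_matrices(text):
    """Parse MEME minimal-format text into {motif_id: list of [A, C, G, T] rows}."""
    motifs, current, remaining = {}, None, 0
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "MOTIF":
            current = parts[1]
            motifs[current] = []
        elif parts[0] == "letter-probability" and current is not None:
            # header looks like: letter-probability matrix: alength= 4 w= 19 nsites= 913
            remaining = int(parts[parts.index("w=") + 1])
        elif remaining > 0:
            motifs[current].append([float(x) for x in parts])
            remaining -= 1
    return motifs

sample = """MEME version 4
ALPHABET= ACGT
strands: + -
Background letter frequencies
A 0.25 C 0.25 G 0.25 T 0.25

MOTIF MA0000.1 TOY
letter-probability matrix: alength= 4 w= 2 nsites= 10
  0.7000 0.1000 0.1000 0.1000
  0.1000 0.1000 0.1000 0.7000
"""
parsed = read_meme_matrices(sample)
print({k: len(v) for k, v in parsed.items()})   # {'MA0000.1': 2}
```

Applied to the exported file, `all(abs(sum(r) - 1) < 1e-3 for rows in parsed.values() for r in rows)` is a quick sanity check.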

### Workflow 2: Promoter Scan for Multiple TFs and Visualize PWM Logos

**Goal**: Download PWMs for a set of TFs, scan a promoter sequence with each, and plot each motif as a stacked-bar information-content logo.

```python
import requests, numpy as np, time
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

JASPAR_API = "https://jaspar.elixir.no/api/v1"
BASE_IDX = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE_COLORS = {"A": "#2ca02c", "C": "#1f77b4", "G": "#ff7f0e", "T": "#d62728"}

def fetch_pwm(matrix_id, pseudocount=0.8):
    """Return (PWM, frequency matrix), both shape (4, L), for a JASPAR matrix ID."""
    r = requests.get(f"{JASPAR_API}/matrix/{matrix_id}/", params={"format": "json"}, timeout=15)
    r.raise_for_status()
    pfm = r.json()["pfm"]   # base-keyed: {"A": [...], "C": [...], "G": [...], "T": [...]}
    counts = np.array([pfm[b] for b in "ACGT"], dtype=float) + pseudocount
    freqs = counts / counts.sum(axis=0, keepdims=True)
    return np.log2(freqs / 0.25), freqs

def scan(seq, pwm, pct=0.80):
    """Forward-strand (pos, score) hits scoring >= pct of the attainable score range."""
    seq = seq.upper()
    L = pwm.shape[1]
    max_s = pwm.max(axis=0).sum()   # best base at every position
    min_s = pwm.min(axis=0).sum()   # worst base at every position
    thresh = min_s + pct * (max_s - min_s)
    hits = []
    for i in range(len(seq) - L + 1):
        window = seq[i:i+L]
        if not set(window) <= set("ACGT"):   # skip N and other ambiguity codes
            continue
        score = sum(pwm[BASE_IDX[window[j]], j] for j in range(L))
        if score >= thresh:
            hits.append((i, score))
    return hits

def plot_logo(freqs, title, outfile):
    """Bar-chart sequence logo from frequency matrix (4, L)."""
    L = freqs.shape[1]
    entropy = -np.sum(freqs * np.log2(freqs + 1e-12), axis=0)
    ic = 2 - entropy
    bases = ["A", "C", "G", "T"]
    fig, ax = plt.subplots(figsize=(max(6, L * 0.5), 2.5))
    bottom = np.zeros(L)
    for idx, base in enumerate(bases):
        heights = freqs[idx] * ic
        ax.bar(range(L), heights, bottom=bottom, color=BASE_COLORS[base], label=base, width=0.8)
        bottom += heights
    ax.set_xticks(range(L))
    ax.set_xticklabels([str(i+1) for i in range(L)], fontsize=7)
    ax.set_ylabel("Information Content (bits)")
    ax.set_title(title, fontsize=10)
    ax.legend(handles=[mpatches.Patch(color=BASE_COLORS[b], label=b) for b in bases],
              loc="upper right", fontsize=7, ncol=4)
    plt.tight_layout()
    plt.savefig(outfile, dpi=150, bbox_inches="tight")
    plt.close()
    print(f"Saved {outfile}")

# Illustrative GC-rich, promoter-like test sequence (111 bp); to scan the real TP53
# promoter, extract the region from a GRCh38 reference FASTA instead
promoter = ("GCAGAGGCGGAGGATTTGCCTTTTTTCGAGTTGGTGAGAGATCTGGGGCGGGGCAGGGCC"
            "CTGGAACGGCAGGACGGAGAGCAAGGCCGGGGAAGGGCGGGAGCGGGCGGG")

# Scan for SP1 (MA0079.4) and TP53 (MA0106.3) hits; MA0080 is SPI1, not SP1
for matrix_id, tf_name in [("MA0079.4", "SP1"), ("MA0106.3", "TP53")]:
    pwm, freqs = fetch_pwm(matrix_id)
    hits = scan(promoter, pwm, pct=0.75)
    print(f"{tf_name} ({matrix_id}): {len(hits)} hits at >=75% threshold")
    for pos, score in hits[:3]:
        print(f"   pos={pos}  score={score:.2f}  {promoter[pos:pos+pwm.shape[1]]}")
    plot_logo(freqs, f"{tf_name} ({matrix_id}) PWM logo", f"{tf_name}_logo.png")
    time.sleep(0.5)
```
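At permissive cutoffs such as 75%, one site often yields several overlapping windows. A greedy filter that keeps the best-scoring window and drops any hit overlapping one already kept is a common way to report one hit per site. A sketch that works on the `(pos, score)` tuples returned by `scan()`; `drop_overlapping` is a hypothetical helper:

```python
def drop_overlapping(hits, motif_len):
    """Keep the best-scoring hit, then drop any hit whose window overlaps a kept one.

    hits: list of (pos, score); windows overlap when their starts differ by < motif_len.
    """
    kept = []
    for pos, score in sorted(hits, key=lambda h: h[1], reverse=True):
        if all(abs(pos - k) >= motif_len for k, _ in kept):
            kept.append((pos, score))
    return sorted(kept)   # back to positional order

print(drop_overlapping([(0, 5.0), (2, 7.0), (10, 4.0)], motif_len=4))
# [(2, 7.0), (10, 4.0)]
```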

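### Workflow 3: Group TFs by Structural Family to Find Related Binding Profiles

**Goal**: Tabulate TF families across the CORE vertebrate collection, then list the TFs that share a family with a TF of interest. Same-family TFs have related DNA-binding domains and often similar motifs, so they are candidates for competing at or co-occupying the same sites; family membership alone does not establish co-binding.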

```python
import requests, time, pandas as pd
from collections import Counter

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def get_profiles_df(tax_group="vertebrates", collection="CORE"):
    params = {"tax_group": tax_group, "collection": collection, "format": "json", "page_size": 100}
    results, url = [], f"{JASPAR_API}/matrix/"
    while url:
        r = requests.get(url, params=params if url == f"{JASPAR_API}/matrix/" else None, timeout=15)
        r.raise_for_status()
        data = r.json()
        results.extend(data["results"])
        url = data.get("next")
        time.sleep(0.3)
    rows = []
    for m in results:
        # Class, family, and type live in the detail record: one request per matrix (several minutes)
        r = requests.get(f"{JASPAR_API}/matrix/{m['matrix_id']}/", params={"format": "json"}, timeout=15)
        r.raise_for_status()
        d = r.json()
        rows.append({
            "matrix_id": d["matrix_id"],
            "name": d["name"],
            "tf_class": "; ".join(d.get("class") or ["Unknown"]),
            "tf_family": "; ".join(d.get("family") or ["Unknown"]),
            "type": d.get("type", ""),
            "uniprot": ";".join(d.get("uniprot_ids") or []),
        })
        time.sleep(0.3)
    return pd.DataFrame(rows)

df = get_profiles_df(tax_group="vertebrates", collection="CORE")
print(f"Total CORE vertebrate profiles: {len(df)}")

# Top TF families
family_counts = df["tf_family"].value_counts()
print(f"\nTop 10 TF families:")
print(family_counts.head(10).to_string())

# TFs in the same structural family as CTCF (related DNA-binding domains)
ctcf_row = df[df["name"] == "CTCF"].iloc[0]
ctcf_family = ctcf_row["tf_family"]
co_family = df[df["tf_family"] == ctcf_family][["matrix_id", "name", "type"]]
print(f"\nTFs in the {ctcf_family} family (related binding domains, possibly similar motifs):")
print(co_family.to_string(index=False))
df.to_csv("jaspar_core_vertebrates.csv", index=False)
print(f"\nSaved jaspar_core_vertebrates.csv")
```
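Heterodimer profiles (for example `A::B` pairs) can list two families, so joining families into one string hides matches on either partner. A sketch that explodes family lists to one row per (matrix, family) before grouping. It assumes detail records store families as a list under `"family"`, as the code above reads them; `family_table` and `same_family` are hypothetical helpers, and the records in the example are made up:

```python
import pandas as pd

def family_table(records):
    """One row per (matrix, family) from detail records whose 'family' is a list."""
    rows = [
        {"matrix_id": r["matrix_id"], "name": r["name"], "tf_family": fam}
        for r in records
        for fam in (r.get("family") or ["Unknown"])
    ]
    return pd.DataFrame(rows, columns=["matrix_id", "name", "tf_family"])

def same_family(df, tf_name):
    """Other matrices sharing at least one family with tf_name."""
    fams = set(df.loc[df["name"] == tf_name, "tf_family"])
    hits = df[df["tf_family"].isin(fams) & (df["name"] != tf_name)]
    return hits.drop_duplicates("matrix_id")

records = [                                   # toy records, not real JASPAR entries
    {"matrix_id": "T1", "name": "A", "family": ["F1"]},
    {"matrix_id": "T2", "name": "B", "family": ["F1", "F2"]},
    {"matrix_id": "T3", "name": "C", "family": ["F3"]},
]
print(same_family(family_table(records), "A").to_string(index=False))
```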

## Key Parameters

| Parameter | Endpoint | Default | Range / Options | Effect |
|-----------|----------|---------|-----------------|--------|
| `search` | `/matrix/` | — | Any string (TF name, gene symbol) | Full-text search across name and aliases |
| `collection` | `/matrix/` | — | `CORE`, `UNVALIDATED`, `CNE`, `POLII`, `PHYLOFACTS` | Restricts to a JASPAR sub-collection |
| `tax_group` | `/matrix/` | — | `vertebrates`, `insects`, `plants`, `fungi`, `nematodes`, `urochordates` | Filter by broad taxonomic group |
| `tax_id` | `/matrix/` | — | NCBI taxonomy ID integer (e.g., `9606`) | Restrict to a single species |
| `tf_class` | `/matrix/` | — | TFClass names, e.g. `"C2H2 zinc finger factors"`, `"Basic helix-loop-helix factors (bHLH)"`, `"Homeo domain factors"` | Filter by TF structural class (exact string as shown on JASPAR matrix pages) |
| `tf_family` | `/matrix/` | — | TFClass family names as shown on JASPAR matrix pages | Filter by TF structural family |
| `page_size` | `/matrix/` | 10 | 1–100 | Results per page; use 100 for batch downloads |
| `pseudocount` | PFM→PWM (local) | 0.8 | 0.01–1.0 | Smooths zero-count positions; higher = less extreme PWM values |
| `threshold_pct` | scan (local) | 0.80 | 0.50–0.99 | Fraction of max score required to call a hit; lower = more permissive |

## Best Practices

1. **Pin matrix ID versions for reproducibility**: Use `MA0139.1` (not just `CTCF`) in scripts and manuscripts so results do not silently change across JASPAR releases.

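2. **Always add pseudocounts when computing PWMs**: Raw PFMs often contain zero counts for bases never observed at a position. A zero count gives log2(0) = -inf, so any sequence carrying that base can never score as a hit. Use a small fixed pseudocount such as `0.8` (as in these examples), or scale it with the number of sites, e.g. `sqrt(sites) / 4` per base, which is what Biopython's `motifs.jaspar.calculate_pseudocounts` gives with a uniform background.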

3. **Use the CORE collection for standard analyses**: `UNVALIDATED` profiles have not been manually curated and may contain lower-confidence motifs. Reserve `UNVALIDATED` for exploratory analyses.

4. **Respect pagination**: `/matrix/` responses are paginated (`page_size` sets the page length). Always follow the `next` URL in responses when building complete profile sets.

5. **For batch scanning, use FIMO (MEME suite) rather than manual sliding window**: The manual scanning above is educational. For production use, export matrices to MEME format (Workflow 1) and use `fimo --thresh 1e-4 motifs.meme sequence.fa`.
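FIMO's `--thresh` is a p-value, whereas the manual scans above use a fraction of the score range. One way to relate the two is the standard dynamic-programming computation of a PWM's score distribution under an i.i.d. background, with scores rounded to a grid of width `step`. This sketch illustrates the general technique, not FIMO's own code, and the result is approximate because of the rounding; `score_pvalue` is a hypothetical helper:

```python
import numpy as np
from collections import defaultdict

def score_pvalue(pwm, score, bg=(0.25, 0.25, 0.25, 0.25), step=0.01):
    """Approximate P(random L-mer scores >= score) under an i.i.d. background.

    pwm: (4, L) log-odds matrix, rows A, C, G, T. Scores are rounded to
    multiples of `step`; smaller steps are more accurate but slower.
    """
    ints = np.rint(np.asarray(pwm, dtype=float) / step).astype(int)
    dist = {0: 1.0}                                # rounded partial score -> probability
    for j in range(ints.shape[1]):
        nxt = defaultdict(float)
        for s, p in dist.items():
            for b in range(4):
                nxt[s + int(ints[b, j])] += p * bg[b]
        dist = nxt
    cutoff = int(np.rint(score / step))
    return sum(p for s, p in dist.items() if s >= cutoff)

toy = np.array([[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0], [-2.0, -2.0]])
print(score_pvalue(toy, 2.0))   # 0.0625: only "AA" reaches the maximum score
```

For a real matrix, `score_pvalue(pwm, threshold)` shows roughly how permissive a relative cutoff such as 0.80 is in FIMO's terms.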

## Common Recipes

### Recipe: Fetch All Versions of a TF's Matrix

When to use: Compare old and new profile versions of the same TF to check for changes.

```python
import requests

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def get_all_versions(tf_name, collection="CORE"):
    r = requests.get(f"{JASPAR_API}/matrix/", params={
        "search": tf_name, "collection": collection, "format": "json", "page_size": 50
    }, timeout=15)
    r.raise_for_status()
    return r.json()["results"]

versions = get_all_versions("SP1")
print(f"SP1-matching matrices in CORE: {len(versions)}")

def version_key(m):
    base, _, ver = m["matrix_id"].rpartition(".")
    return base, int(ver)          # numeric, so a version .10 would sort after .9

for m in sorted(versions, key=version_key):
    print(f"  {m['matrix_id']:12s}  {m['name']}")
```

### Recipe: Retrieve Matrix in JASPAR Flat-File Format

When to use: Download a matrix in the legacy JASPAR text format for tools that accept it directly.

```python
import requests

JASPAR_API = "https://jaspar.elixir.no/api/v1"

def get_jaspar_format(matrix_id):
    """Return matrix as JASPAR flat-file string."""
    r = requests.get(f"{JASPAR_API}/matrix/{matrix_id}/", params={"format": "jaspar"}, timeout=15)
    r.raise_for_status()
    return r.text

jaspar_str = get_jaspar_format("MA0139.1")
print(jaspar_str[:200])
# Layout: a ">MA0139.1<TAB>CTCF" header line, then one bracketed row per base:
# A  [ counts ... ]
# C  [ counts ... ]
# G  [ counts ... ]
# T  [ counts ... ]

with open("CTCF_MA0139.1.jaspar", "w") as f:
    f.write(jaspar_str)
print("Saved CTCF_MA0139.1.jaspar")
```
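If a downstream step needs the counts back as Python data, the flat file is simple to parse. A sketch for a single-matrix file with the layout above (header line, then one `A`/`C`/`G`/`T` row of bracketed counts); `parse_jaspar` is a hypothetical helper, and Biopython's `Bio.motifs.read(handle, "jaspar")` is an alternative when Biopython is installed:

```python
import re

def parse_jaspar(text):
    """Parse a single-matrix JASPAR flat file into (matrix_id, name, {base: counts})."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].lstrip(">").split(None, 1)          # ">MA0139.1\tCTCF" -> ["MA0139.1", "CTCF"]
    matrix_id = header[0]
    name = header[1].strip() if len(header) > 1 else ""
    counts = {}
    for ln in lines[1:]:
        m = re.match(r"^([ACGT])\s*\[(.*)\]$", ln)        # e.g. "A  [ 87 167 ... ]"
        if m:
            counts[m.group(1)] = [float(x) for x in m.group(2).split()]
    return matrix_id, name, counts

toy = ">MA0000.1\tTOY\nA  [ 3 0 ]\nC  [ 0 3 ]\nG  [ 0 0 ]\nT  [ 0 0 ]\n"
print(parse_jaspar(toy))
```

The returned dict has the same base-keyed shape as the API's JSON `pfm`, so it can feed the PWM helpers directly.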

### Recipe: pyJASPAR Quick Motif Retrieval

When to use: Retrieve a motif offline as a Biopython `Bio.motifs.jaspar.Motif` object when downstream tools expect that interface.

```python
# Requires: pip install pyjaspar
from pyjaspar import jaspardb

jdb = jaspardb(release="JASPAR2024")         # bundled SQLite copy of the release; no API calls
motif = jdb.fetch_motif_by_id("MA0139.1")    # CTCF
print(f"Name: {motif.name}")
print(f"Matrix ID: {motif.matrix_id}")
print(f"Length: {len(motif)}")
print(f"Consensus: {motif.consensus}")       # most frequent base per position
# Name: CTCF
# Matrix ID: MA0139.1
# Length: 19

# Fetch multiple motifs: species takes NCBI taxonomy IDs as strings, and
# tf_class must match JASPAR's TFClass name exactly
motifs = jdb.fetch_motifs(collection=["CORE"], species=["9606"],
                          tf_class=["Basic helix-loop-helix factors (bHLH)"])
print(f"Human bHLH motifs: {len(motifs)}")
for m in motifs[:3]:
    print(f"  {m.matrix_id}  {m.name}  len={len(m)}")
```

## Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| Empty `results` list from search | TF not in JASPAR, wrong collection, or wrong `tax_group` | Try `collection=None` to search all collections; check TF alias (e.g., NF-kB → RELA) |
| `404 Not Found` for matrix ID | Invalid or misspelled matrix ID | Verify ID format: `MA` + 4 digits + `.` + version (e.g., `MA0139.1`); search by name first |
| `KeyError: '0'` when indexing the PFM | The API's `pfm` is keyed by base (`{"A": [...], "C": [...], ...}`), not by position | Index as `pfm["A"][i]`, or build the matrix with `np.array([pfm[b] for b in "ACGT"])` |
| PWM scan produces no hits | Threshold too strict or sequence too short | Lower `threshold_pct` to 0.70; check sequence length vs motif length |
| `pyJASPAR` install fails | Requires Python ≥3.8 and Biopython, which builds C extensions | Use the `requests`-based API directly; pyJASPAR is optional |
| Pagination stops early | `next` field is `null` before expected total | Check `count` field in first response vs `len(results)` after loop |
| High IC positions show wrong base | PFM row order assumed incorrectly | JASPAR returns the PFM keyed by base (`A`, `C`, `G`, `T`); build matrices by key, never by an assumed row order |

## Related Skills

- `homer-motif-analysis` — de novo motif discovery from ChIP-seq or ATAC-seq peak sets; complements JASPAR known-motif library
- `regulomedb-database` — regulatory variant scoring using TF binding evidence overlapping JASPAR motifs
- `encode-database` — download TF ChIP-seq peak files that can be cross-referenced with JASPAR profiles
- `remap-database` — TF binding peak sets from ChIP-seq experiments for binding site validation
- `macs3-peak-calling` — produce ChIP-seq peak BED files for downstream JASPAR motif enrichment

## References

- [JASPAR REST API v1 documentation](https://jaspar.elixir.no/api/v1/) — interactive browser and endpoint reference
- [Castro-Mondragon et al., Nucleic Acids Research 2022](https://doi.org/10.1093/nar/gkab1113) — JASPAR 2022 flagship paper describing collection structure and validation
- [pyJASPAR GitHub repository](https://github.com/asntech/pyjaspar) — serverless Python interface to bundled JASPAR releases (SQLite) that returns Biopython motif objects
- [Fornes et al., Nucleic Acids Research 2020](https://doi.org/10.1093/nar/gkz1001) — JASPAR 2020 paper introducing the CORE 2020 collection