---
name: "remap-database"
description: "Query ReMap 2022 TF ChIP-seq peak database via REST API and BED downloads. Retrieve TF peaks overlapping a region (chr:start-end), peaks near a gene, TFs by species, peaks filtered by biotype (promoter, enhancer), and BED files for a TF-cell type pair. Use for TF co-occupancy, regulatory annotation, and TF binding atlases. Use jaspar-database for PWM motifs; encode-database for ENCODE tracks."
license: "CC-BY-4.0"
---

# ReMap Database

## Overview

ReMap 2022 is an integrative database of transcription factor (TF), cofactor, and chromatin regulator binding sites derived from uniformly reprocessed ChIP-seq experiments. The 2022 release catalogs 165 million non-redundant peaks from 8,113 ChIP-seq datasets covering 1,210 TFs across human (hg38/hg19), mouse (mm10), Drosophila, and Arabidopsis genomes. All peaks are called with a consistent pipeline from public GEO/ArrayExpress experiments. Access is via the ReMap 2022 REST API at `https://remap2022.univ-amu.fr/api/` and bulk BED file downloads; no authentication required.

## When to Use

- Finding all TFs with ChIP-seq peaks overlapping a genomic region of interest (e.g., a GWAS SNP locus or candidate enhancer)
- Retrieving TF peaks near a gene's transcription start site to map its proximal regulatory landscape
- Listing all TFs available in ReMap for human or mouse with their peak and dataset counts
- Filtering ChIP-seq peaks by regulatory biotype annotation (promoter, enhancer, exon, intron, intergenic) for a TF in a specific cell line
- Downloading a BED file of all binding peaks for a TF across all cell types for offline analysis
- Identifying co-binding TFs at a locus by querying all overlapping peaks and grouping by TF name
- Use `jaspar-database` instead when you need PWM/PFM sequence models of TF binding specificity rather than ChIP-seq peak locations
- For ENCODE-specific regulatory tracks and accessibility data use `encode-database`; ReMap aggregates TF binding peaks from many sources including ENCODE

## Prerequisites

- **Python packages**: `requests`, `pandas`, `matplotlib`
- **Data requirements**: genomic coordinates (GRCh38/hg38 or hg19), gene names, or TF names
- **Environment**: internet connection; no API key required
- **Rate limits**: no official published limits; use `time.sleep(0.5)` between batch requests to avoid server overload
- **Note**: The ReMap API is a research API; endpoint availability may vary. When an endpoint is unavailable, fall back to the BED downloads described under Key Concepts.

```bash
pip install requests pandas matplotlib
```
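
The pacing advice above can be combined with a simple retry. The following is a minimal sketch, not part of ReMap or `requests`: `call_with_retry` is a hypothetical helper that wraps any zero-argument callable and waits with exponential backoff between attempts. The demonstration uses a stand-in function instead of a real request so it runs offline.

```python
import time

def call_with_retry(fetch, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it succeeds, sleeping base_delay * 2**attempt between tries.

    `fetch` is any zero-argument callable, for example:
        lambda: requests.get(url, params=params, timeout=30).json()
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as exc:  # narrow to requests.RequestException in real use
            last_error = exc
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error

# Offline demonstration: a stand-in fetch that fails twice, then succeeds.
attempts = {"n": 0}

def flaky_fetch():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("simulated server hiccup")
    return [{"name": "CTCF:GSE30263.SRX028592:GM12878"}]

waits = []  # record the requested delays instead of actually sleeping
result = call_with_retry(flaky_fetch, sleep=waits.append)
print(f"Succeeded after {attempts['n']} attempts; delays requested: {waits}")
```

Passing `sleep` as a parameter keeps the helper testable; in normal use the default `time.sleep` applies the delays.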

## Quick Start

```python
import requests

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

# Query TF peaks overlapping a genomic region
r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
    "chr": "chr17",
    "start": 7_670_000,
    "end": 7_690_000,
    "assembly": "hg38"
}, timeout=30)
r.raise_for_status()
peaks = r.json()
print(f"Peaks overlapping TP53 locus: {len(peaks)}")
tfs = set(p.get("name", "").split(":")[0] for p in peaks)
print(f"Unique TFs: {len(tfs)}")
print(f"TF names (first 10): {sorted(tfs)[:10]}")
```

## Core API

### Query 1: Region Overlap

Find all TF ChIP-seq peaks overlapping a specified genomic window. Returns peak records including TF name, cell type, coordinates, and score.

```python
import requests, time, pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_region(chrom, start, end, assembly="hg38", timeout=30):
    """Return all ReMap peaks overlapping [chrom:start-end]."""
    r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
        "chr": chrom, "start": start, "end": end, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

# Query 20 kb window on chr17 around TP53
peaks = query_region("chr17", 7_670_000, 7_690_000, assembly="hg38")
print(f"Total peaks: {len(peaks)}")

# Parse name field: format is "TF:experiment_id:cell_type"
rows = []
for p in peaks:
    parts = p.get("name", "::").split(":")
    tf   = parts[0] if len(parts) > 0 else ""
    exp  = parts[1] if len(parts) > 1 else ""
    cell = parts[2] if len(parts) > 2 else ""
    rows.append({
        "chr": p.get("chr", p.get("chrom", "")),
        "start": p.get("start", 0),
        "end": p.get("end", 0),
        "tf_name": tf,
        "experiment_id": exp,
        "cell_type": cell,
        "score": p.get("score", 0),
    })

df = pd.DataFrame(rows)
print(f"\nUnique TFs: {df['tf_name'].nunique()}")
print(f"Top TFs by peak count:\n{df['tf_name'].value_counts().head(10).to_string()}")
```

```python
# Fallback: if API is unavailable, use a locally downloaded BED file
# Download from: https://remap2022.univ-amu.fr/download_page
# e.g., remap2022_all_macs2_hg38_v1_0.bed.gz

import pandas as pd

def query_region_from_bed(bed_file, chrom, start, end):
    """Filter a ReMap BED file for overlapping peaks."""
    cols = ["chr", "start", "end", "name", "score", "strand",
            "thick_start", "thick_end", "color"]
    df = pd.read_csv(bed_file, sep="\t", header=None, names=cols,
                     compression="infer")
    mask = (df["chr"] == chrom) & (df["end"] > start) & (df["start"] < end)
    return df[mask].reset_index(drop=True)

# Usage (requires downloaded BED):
# df = query_region_from_bed("remap2022_all_macs2_hg38_v1_0.bed.gz",
#                             "chr17", 7_670_000, 7_690_000)
```
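
Loading the full all-peaks BED file into memory at once may not be practical on a typical workstation. A possible alternative, sketched here under the assumption that the file follows the same nine-column layout as above, is to stream it in chunks with the `chunksize` option of `pandas.read_csv` and keep only the overlapping rows. `query_region_chunked` is a hypothetical helper name, and the demonstration writes a tiny made-up file rather than using real ReMap data.

```python
import gzip
import os
import tempfile

import pandas as pd

BED_COLS = ["chr", "start", "end", "name", "score", "strand",
            "thick_start", "thick_end", "color"]

def query_region_chunked(bed_file, chrom, start, end, chunksize=1_000_000):
    """Stream a BED file in chunks and return rows overlapping [start, end)."""
    hits = []
    with pd.read_csv(bed_file, sep="\t", header=None, names=BED_COLS,
                     compression="infer", chunksize=chunksize) as reader:
        for chunk in reader:
            mask = ((chunk["chr"] == chrom)
                    & (chunk["end"] > start)
                    & (chunk["start"] < end))
            if mask.any():
                hits.append(chunk[mask])
    if not hits:
        return pd.DataFrame(columns=BED_COLS)
    return pd.concat(hits, ignore_index=True)

# Offline demonstration on a tiny made-up BED file (toy records, not ReMap data).
toy_lines = [
    "chr17\t7675000\t7675300\tCTCF:EXP1:GM12878\t500\t.\t7675100\t7675101\t0,0,0",
    "chr17\t7700000\t7700200\tMYC:EXP2:K562\t300\t.\t7700100\t7700101\t0,0,0",
    "chr1\t100\t200\tSP1:EXP3:HepG2\t100\t.\t150\t151\t0,0,0",
]
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.bed.gz")
    with gzip.open(toy_path, "wt") as fh:
        fh.write("\n".join(toy_lines) + "\n")
    # chunksize=2 forces more than one chunk even for this small file
    demo = query_region_chunked(toy_path, "chr17", 7_670_000, 7_690_000, chunksize=2)

print(demo[["chr", "start", "end", "name"]])
```

Only the matching rows are held in memory, so peak memory depends on `chunksize` rather than on the file size.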

### Query 2: Gene-Centric Query

Retrieve all TF ChIP-seq peaks near a gene's TSS, providing a promoter-proximal regulatory landscape for the gene.

```python
import requests, time, pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_gene_peaks(gene_name, assembly="hg38", timeout=30):
    """Return all ReMap peaks near a gene TSS."""
    r = requests.get(f"{REMAP_API}/peaks/gene/", params={
        "gene": gene_name, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

peaks = query_gene_peaks("MYC", assembly="hg38")
print(f"Peaks near MYC TSS: {len(peaks)}")

rows = []
for p in peaks:
    parts = p.get("name", "::").split(":")
    rows.append({
        "tf_name": parts[0] if parts else "",
        "cell_type": parts[2] if len(parts) > 2 else "",
        "chr": p.get("chr", p.get("chrom", "")),
        "start": p.get("start", 0),
        "end": p.get("end", 0),
        "score": p.get("score", 0),
        "biotype": p.get("biotype", ""),
    })

df = pd.DataFrame(rows)
print(f"\nTFs near MYC TSS ({df['tf_name'].nunique()} unique):")
print(df["tf_name"].value_counts().head(10).to_string())
print(f"\nCell types represented: {df['cell_type'].nunique()}")
```

### Query 3: TF Browser

List all TFs available in ReMap for a given genome assembly, with peak and experiment counts.

```python
import requests, time, pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def list_tfs(assembly="hg38", timeout=30):
    """Return all TFs in ReMap for the given assembly with statistics."""
    r = requests.get(f"{REMAP_API}/tfbs/list/", params={"assembly": assembly}, timeout=timeout)
    r.raise_for_status()
    return r.json()

def get_database_stats(assembly="hg38", timeout=30):
    """Return overall database statistics for the assembly."""
    r = requests.get(f"{REMAP_API}/stats/", params={"assembly": assembly}, timeout=timeout)
    r.raise_for_status()
    return r.json()

# Database overview
try:
    stats = get_database_stats("hg38")
    print(f"ReMap 2022 hg38 statistics:")
    for k, v in stats.items():
        print(f"  {k}: {v}")
except Exception as e:
    print(f"Stats endpoint unavailable: {e}")
    print("ReMap 2022 hg38: 165M peaks, 1,210 TFs, 8,113 datasets (from publication)")

# TF list
try:
    tfs = list_tfs("hg38")
    df_tfs = pd.DataFrame(tfs)
    print(f"\nTFs available (hg38): {len(df_tfs)}")
    if "peak_count" in df_tfs.columns:
        top = df_tfs.nlargest(10, "peak_count")[["name", "peak_count", "dataset_count"]]
        print("Top 10 TFs by peak count:")
        print(top.to_string(index=False))
except Exception as e:
    print(f"TF list endpoint unavailable: {e}")
    print("Use TF name queries directly (Query 4) or download TF-specific BED files.")
```

### Query 4: TF-Specific Peak Query

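Retrieve all peaks for a named TF in a given assembly. The function below takes no cell-type argument; to restrict results to a cell type, filter the parsed `cell_type` column after the query.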

```python
import requests, time, pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_tf_peaks(tf_name, assembly="hg38", timeout=30):
    """Return all ChIP-seq peaks for a TF across all cell types."""
    r = requests.get(f"{REMAP_API}/tfbs/name/", params={
        "name": tf_name, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

peaks = query_tf_peaks("CTCF", assembly="hg38")
print(f"CTCF peaks (all cell types): {len(peaks)}")

# Parse and summarize
rows = []
for p in peaks:
    parts = p.get("name", "::").split(":")
    rows.append({
        "tf_name": parts[0] if parts else "",
        "cell_type": parts[2] if len(parts) > 2 else "",
        "chr":   p.get("chr",   p.get("chrom", "")),
        "start": p.get("start", 0),
        "end":   p.get("end",   0),
        "score": p.get("score", 0),
        "biotype": p.get("biotype", ""),
    })

df = pd.DataFrame(rows)
print(f"Cell types: {df['cell_type'].nunique()}")
print(f"Chromosomes: {df['chr'].nunique()}")
print(f"Peak width stats (bp):")
df["width"] = df["end"] - df["start"]
print(f"  Median: {df['width'].median():.0f}  Mean: {df['width'].mean():.0f}  "
      f"Min: {df['width'].min()}  Max: {df['width'].max()}")
```
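
One way to restrict TF peaks to particular cell types is to filter the parsed DataFrame client-side. This is a minimal sketch assuming a `cell_type` column like the one built above; `filter_by_cell_type` is a hypothetical helper, and the rows below are toy values used only for illustration.

```python
import pandas as pd

def filter_by_cell_type(df, cell_types):
    """Keep rows whose cell_type matches any entry in cell_types (case-insensitive)."""
    wanted = {c.lower() for c in cell_types}
    keep = df["cell_type"].str.lower().isin(wanted)
    return df[keep].reset_index(drop=True)

# Toy rows shaped like the parsed output of a TF query
toy = pd.DataFrame({
    "tf_name":   ["CTCF", "CTCF", "CTCF", "CTCF"],
    "cell_type": ["GM12878", "K562", "HepG2", "gm12878"],
    "start":     [100, 500, 900, 1300],
    "end":       [300, 700, 1100, 1500],
})

gm = filter_by_cell_type(toy, ["GM12878"])
print(gm)
```

The comparison is case-insensitive because cell-type labels from different experiments are not guaranteed to use consistent capitalization.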

### Query 5: Biotype Filter and Regulatory Annotation

Filter peaks by regulatory biotype annotation to identify binding at promoters, enhancers, or intergenic regions.

```python
import requests, pandas as pd, matplotlib.pyplot as plt

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def get_biotypes(assembly="hg38", timeout=30):
    """List all regulatory biotype categories available."""
    r = requests.get(f"{REMAP_API}/biotypes/", params={"assembly": assembly}, timeout=timeout)
    r.raise_for_status()
    return r.json()

def query_tf_by_biotype(tf_name, biotype, assembly="hg38", timeout=30):
    """Retrieve TF peaks filtered by regulatory biotype."""
    r = requests.get(f"{REMAP_API}/peaks/biotype/", params={
        "name": tf_name, "biotype": biotype, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

# List available biotypes
try:
    biotypes = get_biotypes("hg38")
    print(f"Available biotypes: {biotypes}")
except Exception:
    biotypes = ["promoter", "enhancer", "exon", "intron", "intergenic", "UTR"]
    print(f"Using known biotypes: {biotypes}")

# Query CTCF peaks and plot biotype distribution.
# query_tf_peaks is redefined here (same as Query 4) so this block runs on its own.
def query_tf_peaks(tf_name, assembly="hg38", timeout=30):
    r = requests.get(f"{REMAP_API}/tfbs/name/",
                     params={"name": tf_name, "assembly": assembly}, timeout=timeout)
    r.raise_for_status()
    return r.json()

def cell_type_of(peak):
    """Return the third component of the TF:experiment:cell_type name, or ""."""
    parts = peak.get("name", "").split(":")
    return parts[2] if len(parts) > 2 else ""

peaks = query_tf_peaks("CTCF", assembly="hg38")
rows = [{"biotype": p.get("biotype", "unknown"), "cell_type": cell_type_of(p)}
        for p in peaks]
# Explicit columns keep the lookups below valid even if no peaks are returned
df = pd.DataFrame(rows, columns=["biotype", "cell_type"])

biotype_counts = df["biotype"].value_counts()
biotype_counts = biotype_counts[biotype_counts > 0]
print(f"\nCTCF peak biotype distribution:")
print(biotype_counts.to_string())

# Stacked bar chart across top 5 cell types
top_cells = df["cell_type"].value_counts().head(5).index.tolist()
pivot = (df[df["cell_type"].isin(top_cells)]
         .groupby(["cell_type", "biotype"])
         .size()
         .unstack(fill_value=0))

fig, ax = plt.subplots(figsize=(9, 5))
pivot.plot(kind="bar", stacked=True, ax=ax, colormap="tab10", edgecolor="white")
ax.set_xlabel("Cell Type")
ax.set_ylabel("Peak Count")
ax.set_title("CTCF ChIP-seq Peak Biotype Distribution by Cell Type (ReMap 2022, hg38)")
ax.legend(title="Biotype", bbox_to_anchor=(1.01, 1), loc="upper left", fontsize=8)
plt.tight_layout()
plt.savefig("CTCF_biotype_distribution.png", dpi=150, bbox_inches="tight")
print("Saved CTCF_biotype_distribution.png")
```

## Key Concepts

### Peak Name Field Format

The `name` field in every ReMap peak record encodes three pieces of information as a colon-separated string:

```
TF_NAME:EXPERIMENT_ID:CELL_TYPE
```

For example: `CTCF:GSE30263.SRX028592:GM12878`

Always parse with `.split(":")` and guard against missing parts. Some records may have fewer than three components if metadata is incomplete.
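
The defensive parsing described above can be centralized in one helper. The following is a minimal sketch; `parse_peak_name` and `PeakName` are hypothetical names, not part of any ReMap client. It splits on at most two colons, on the assumption that any further colons belong to the cell-type label.

```python
from typing import NamedTuple

class PeakName(NamedTuple):
    tf_name: str
    experiment_id: str
    cell_type: str

def parse_peak_name(name, missing=""):
    """Split 'TF:EXPERIMENT:CELL' into three fields, padding absent parts."""
    parts = (name or "").split(":", 2)      # at most three components
    parts += [missing] * (3 - len(parts))   # pad when metadata is incomplete
    return PeakName(*(p if p else missing for p in parts))

full = parse_peak_name("CTCF:GSE30263.SRX028592:GM12878")
partial = parse_peak_name("CTCF", missing="unknown")
print(full)
print(partial)
```

Returning a named tuple lets callers write `parsed.cell_type` instead of indexing into a list of unknown length.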

### Assemblies

| Assembly code | Organism | Notes |
|---------------|----------|-------|
| `hg38` | Homo sapiens (GRCh38) | Primary human assembly in ReMap 2022 |
| `hg19` | Homo sapiens (GRCh37) | Legacy human assembly; fewer datasets |
| `mm10` | Mus musculus | Primary mouse assembly |
| `dm6` | Drosophila melanogaster | Smaller dataset collection |
| `tair10` | Arabidopsis thaliana | Plant TF dataset |

### BED File Download (API Fallback)

When the REST API is unavailable or for offline bulk analysis, ReMap provides pre-built BED files at `https://remap2022.univ-amu.fr/download_page`. Key files:

- `remap2022_all_macs2_hg38_v1_0.bed.gz` — all peaks, hg38 (large, ~5 GB)
- `remap2022_{TF}_macs2_hg38_v1_0.bed.gz` — per-TF peak files
- `remap2022_crm_macs2_hg38_v1_0.bed.gz` — cis-regulatory modules (merged peaks)

```python
import pandas as pd

def load_remap_bed(bed_path, chrom=None, start=None, end=None):
    """
    Load a ReMap BED file with optional region filter.
    Columns: chr, start, end, name (TF:exp:cell), score, strand,
             thick_start, thick_end, itemRgb
    """
    cols = ["chr", "start", "end", "name", "score", "strand",
            "thick_start", "thick_end", "itemRgb"]
    df = pd.read_csv(bed_path, sep="\t", header=None, names=cols,
                     compression="infer", low_memory=False)
    if chrom:
        df = df[df["chr"] == chrom]
    if start is not None and end is not None:
        df = df[(df["end"] > start) & (df["start"] < end)]
    # Parse name field
    parts = df["name"].str.split(":", expand=True)
    df["tf_name"]       = parts[0]
    df["experiment_id"] = parts[1] if 1 in parts.columns else ""
    df["cell_type"]     = parts[2] if 2 in parts.columns else ""
    return df.reset_index(drop=True)

# Usage example (offline):
# df = load_remap_bed("remap2022_CTCF_macs2_hg38_v1_0.bed.gz",
#                     chrom="chr17", start=7_670_000, end=7_690_000)
# print(df.head())
```

## Common Workflows

### Workflow 1: TF Co-occupancy Analysis at a Locus

**Goal**: Identify all TFs with ChIP-seq evidence at a genomic locus, rank them by peak count, and export the per-TF occupancy summary as a CSV file and a bar chart.

```python
import requests, time, pandas as pd, matplotlib.pyplot as plt

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_region(chrom, start, end, assembly="hg38", timeout=30):
    r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
        "chr": chrom, "start": start, "end": end, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

def parse_peaks(peaks):
    rows = []
    for p in peaks:
        parts = p.get("name", "::").split(":")
        rows.append({
            "tf_name":  parts[0] if len(parts) > 0 else "unknown",
            "cell_type": parts[2] if len(parts) > 2 else "unknown",
            "chr":   p.get("chr",   p.get("chrom", "")),
            "start": p.get("start", 0),
            "end":   p.get("end",   0),
            "score": p.get("score", 0),
        })
    return pd.DataFrame(rows)

# BRCA1 promoter region (GRCh38)
peaks = query_region("chr17", 43_044_000, 43_050_000, assembly="hg38")
df = parse_peaks(peaks)
print(f"Peaks at BRCA1 promoter: {len(df)}")

# TF occupancy summary
tf_summary = (df.groupby("tf_name")
                .agg(peak_count=("tf_name", "count"),
                     cell_types=("cell_type", "nunique"),
                     mean_score=("score", "mean"))
                .sort_values("peak_count", ascending=False))
print(f"\nTop TFs at BRCA1 promoter:")
print(tf_summary.head(15).to_string())
tf_summary.to_csv("BRCA1_promoter_TF_occupancy.csv")

# Horizontal bar chart
top = tf_summary.head(20)
fig, ax = plt.subplots(figsize=(8, 6))
ax.barh(top.index[::-1], top["peak_count"][::-1], color="#1f77b4", edgecolor="white")
ax.set_xlabel("Number of ChIP-seq Peaks")
ax.set_title("TF Co-occupancy at BRCA1 Promoter (ReMap 2022, hg38)")
plt.tight_layout()
plt.savefig("BRCA1_promoter_TF_cooccupancy.png", dpi=150, bbox_inches="tight")
print("Saved BRCA1_promoter_TF_cooccupancy.png")
```
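
A TF-by-TF co-occupancy matrix can be derived from the same parsed peaks. The sketch below uses one possible definition, chosen here for illustration: two TFs co-occupy the locus in a cell type if each has at least one peak there, and the matrix entry counts the cell types in which that holds. `cooccupancy_matrix` is a hypothetical helper and the input rows are toy values.

```python
import pandas as pd

def cooccupancy_matrix(df):
    """Return (TF x cell type presence, TF x TF shared-cell-type counts)."""
    # 1 if the TF has at least one peak at the locus in that cell type
    presence = (pd.crosstab(df["tf_name"], df["cell_type"]) > 0).astype(int)
    # Entry (i, j) counts cell types where both TF i and TF j are present
    co = presence @ presence.T
    return presence, co

# Toy parsed peaks for a single locus
toy = pd.DataFrame({
    "tf_name":   ["CTCF", "CTCF", "CTCF", "RAD21", "RAD21", "MYC"],
    "cell_type": ["GM12878", "GM12878", "K562", "GM12878", "K562", "K562"],
})

presence, co = cooccupancy_matrix(toy)
print(co)
```

The diagonal gives the number of cell types in which each TF is observed, which is a convenient normalizer if a fraction is preferred over a raw count.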

### Workflow 2: Gene Regulatory Profile — TSS-Proximal TF Binding Atlas

**Goal**: For a list of genes, retrieve their promoter-proximal TF binding profiles and compare the TF repertoires across genes.

```python
import requests, time, pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_gene_peaks(gene_name, assembly="hg38", timeout=30):
    try:
        r = requests.get(f"{REMAP_API}/peaks/gene/", params={
            "gene": gene_name, "assembly": assembly
        }, timeout=timeout)
        r.raise_for_status()
        return r.json()
    except Exception as e:
        print(f"  Warning: {gene_name} failed — {e}")
        return []

genes_of_interest = ["MYC", "TP53", "BRCA1", "EGFR", "CDK4"]
gene_tf_profiles = {}

for gene in genes_of_interest:
    peaks = query_gene_peaks(gene, assembly="hg38")
    if peaks:
        tfs = set()
        for p in peaks:
            tf = p.get("name", "").split(":")[0]
            if tf:  # skip records with a missing or empty name
                tfs.add(tf)
        gene_tf_profiles[gene] = tfs
        print(f"{gene}: {len(peaks)} peaks, {len(tfs)} unique TFs")
    time.sleep(0.5)

# Build binary TF presence matrix
all_tfs = sorted(set().union(*gene_tf_profiles.values()))
matrix = pd.DataFrame(
    {gene: [1 if tf in gene_tf_profiles.get(gene, set()) else 0 for tf in all_tfs]
     for gene in genes_of_interest},
    index=all_tfs
)
print(f"\nTF × Gene matrix: {matrix.shape}")
print(f"TFs shared by all genes: {(matrix.sum(axis=1) == len(genes_of_interest)).sum()}")
matrix.to_csv("gene_TF_binding_atlas.csv")
print("Saved gene_TF_binding_atlas.csv")
```
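
To compare TF repertoires quantitatively, a pairwise Jaccard index over the per-gene TF sets is one option. This sketch assumes a dict of sets shaped like `gene_tf_profiles`; the helper names and the toy TF sets are illustrative and not real ReMap results.

```python
from itertools import combinations

import pandas as pd

def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_jaccard(profiles):
    """Return a symmetric gene x gene DataFrame of Jaccard similarities."""
    genes = list(profiles)
    out = pd.DataFrame(1.0, index=genes, columns=genes)  # diagonal set to 1.0
    for g1, g2 in combinations(genes, 2):
        out.loc[g1, g2] = out.loc[g2, g1] = jaccard(profiles[g1], profiles[g2])
    return out

# Toy TF sets (illustrative only)
toy_profiles = {
    "MYC":  {"CTCF", "MAX", "EP300"},
    "TP53": {"CTCF", "EP300", "SP1"},
    "CDK4": {"E2F1"},
}

sim = pairwise_jaccard(toy_profiles)
print(sim.round(2))
```

Values near 1 indicate nearly identical TF sets; because the input is presence or absence only, the index ignores how many peaks each TF contributes.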

### Workflow 3: Download and Analyze TF Peak BED File

**Goal**: Download a TF-specific ReMap BED file and analyze its genomic distribution with pandas.

```python
import requests, gzip, io, pandas as pd, time

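# ReMap provides per-TF BED files. For large-scale offline analysis:
# The assembly is part of the path; the layout for assemblies other than hg38
# is assumed to follow the same pattern.
REMAP_DOWNLOAD_BASE = "https://remap2022.univ-amu.fr/storage/remap2022/{assembly}/MACS2"

def download_tf_bed(tf_name, assembly="hg38", save_path=None):
    """
    Attempt to download a TF-specific BED file from ReMap.
    Returns save_path if one is given, otherwise a DataFrame of the peaks.
    Returns None if the download fails; fall back to an API query in that case.
    """
    filename = f"remap2022_{tf_name}_macs2_{assembly}_v1_0.bed.gz"
    url = f"{REMAP_DOWNLOAD_BASE.format(assembly=assembly)}/{filename}"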
    print(f"Attempting download: {url}")
    r = requests.get(url, stream=True, timeout=60)
    if r.status_code == 200:
        if save_path:
            with open(save_path, "wb") as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
            print(f"Saved: {save_path}")
            return save_path
        else:
            # Read directly into DataFrame
            content = b"".join(r.iter_content(chunk_size=8192))
            cols = ["chr", "start", "end", "name", "score", "strand",
                    "thick_start", "thick_end", "itemRgb"]
            with gzip.open(io.BytesIO(content), "rt") as gz:
                df = pd.read_csv(gz, sep="\t", header=None, names=cols)
            return df
    else:
        print(f"Download returned {r.status_code}; use API query as fallback")
        return None

# Analyze a downloaded BED file
def analyze_remap_bed(df):
    """Compute summary statistics for a ReMap peak DataFrame."""
    parts = df["name"].str.split(":", expand=True)
    df = df.copy()
    df["tf_name"]   = parts[0]
    df["cell_type"] = parts[2] if 2 in parts.columns else "unknown"
    df["width"] = df["end"] - df["start"]

    print(f"Total peaks: {len(df):,}")
    print(f"Unique TFs: {df['tf_name'].nunique()}")
    print(f"Unique cell types: {df['cell_type'].nunique()}")
    print(f"\nPeak width (bp): median={df['width'].median():.0f}  "
          f"mean={df['width'].mean():.0f}  range=[{df['width'].min()}, {df['width'].max()}]")
    print(f"\nChromosome distribution:")
    chr_counts = df["chr"].value_counts().head(5)
    print(chr_counts.to_string())
    return df

# Example usage (requires BED download or substitute with API results).
# Without save_path the function returns a DataFrame (or None on failure):
# df_raw = download_tf_bed("CTCF")
# if df_raw is not None:
#     df_analyzed = analyze_remap_bed(df_raw)
```

## Key Parameters

| Parameter | Endpoint | Default | Range / Options | Effect |
|-----------|----------|---------|-----------------|--------|
| `chr` | `/peaks/overlap/` | — | `chr1`–`chr22`, `chrX`, `chrY`, `chrM` (human) | Chromosome for region query (include `chr` prefix) |
| `start` | `/peaks/overlap/` | — | Integer genomic coordinate | Region start (0-based) |
| `end` | `/peaks/overlap/` | — | Integer genomic coordinate | Region end (exclusive) |
| `assembly` | All endpoints | — | `hg38`, `hg19`, `mm10`, `dm6`, `tair10` | Genome assembly for coordinates and peak lookup |
| `gene` | `/peaks/gene/` | — | HGNC gene symbol (e.g., `TP53`, `MYC`) | Queries peaks near the gene's annotated TSS |
| `name` | `/tfbs/name/` | — | TF name as in ReMap (e.g., `CTCF`, `SP1`) | TF name is case-sensitive; match ReMap TF naming |
| `biotype` | `/peaks/biotype/` | — | `promoter`, `enhancer`, `exon`, `intron`, `intergenic`, `UTR` | Filters peaks by Ensembl regulatory biotype |
| `timeout` | All requests | 30 | Integer seconds | Increase to 60–120 for large gene/TF queries |

## Best Practices

1. **Parse the `name` field defensively**: The `TF:experiment:cell_type` format may have fewer than three components for some records. Always guard with `parts[n] if len(parts) > n else ""`.

2. **Use BED downloads for genome-wide analyses**: Querying large genomic regions or all peaks for a TF via the REST API can time out. For whole-genome or per-chromosome scans, download the per-TF or per-assembly BED files from the ReMap download page and filter locally with pandas or bedtools.

3. **Cross-reference with JASPAR for sequence evidence**: ReMap peaks show where TF binding was detected by ChIP-seq (positional evidence); JASPAR PWMs show what sequence the TF prefers (motif evidence). For robust regulatory annotation, require both: a ReMap peak in the region AND a JASPAR motif hit within the peak.

4. **Use `time.sleep(0.5)` in batch loops**: The ReMap API serves a research community; polite request pacing prevents throttling.

5. **Validate assembly coordinates**: ReMap 2022 hg38 peaks use 0-based half-open BED coordinates (`[start, end)`). When comparing with VCF or 1-based GFF coordinates, add 1 to `start`.
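
The coordinate conversion in practice 5 can be sketched as follows. The helper names are hypothetical; the arithmetic follows from BED being 0-based half-open and VCF/GFF positions being 1-based inclusive.

```python
def bed_to_one_based(start, end):
    """Convert a BED interval [start, end) to a 1-based inclusive interval."""
    return start + 1, end

def one_based_to_bed(start, end):
    """Convert a 1-based inclusive interval to BED [start, end)."""
    return start - 1, end

def position_in_peak(pos_1based, peak_start, peak_end):
    """True if a 1-based position (e.g., a VCF POS) lies inside a BED peak."""
    return peak_start < pos_1based <= peak_end

# A BED peak [99, 200) covers 1-based positions 100 through 200
print(bed_to_one_based(99, 200))
print(position_in_peak(100, 99, 200), position_in_peak(99, 99, 200))
```

Note that only `start` shifts; `end` is numerically the same in both conventions because the exclusive 0-based end equals the inclusive 1-based end.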

## Common Recipes

### Recipe: Find TFs Binding at a GWAS SNP

When to use: Prioritize functional candidates from a GWAS hit by identifying which TFs bind at the SNP location.

```python
import requests

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def tfs_at_snp(chrom, pos, window=500, assembly="hg38"):
    """Find TFs with ChIP-seq peaks overlapping a SNP position ± window bp."""
    r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
        "chr": chrom, "start": pos - window, "end": pos + window,
        "assembly": assembly
    }, timeout=30)
    r.raise_for_status()
    peaks = r.json()
    tfs = {}
    for p in peaks:
        parts = p.get("name", "::").split(":")
        tf = parts[0] or "unknown"  # split() never returns an empty list
        tfs[tf] = tfs.get(tf, 0) + 1
    return dict(sorted(tfs.items(), key=lambda x: -x[1]))

# Example: rs2736100 (TERT locus, chr5:1,286,401)
snp_tfs = tfs_at_snp("chr5", 1_286_401, window=500, assembly="hg38")
print(f"TFs at TERT GWAS SNP (±500 bp): {len(snp_tfs)}")
for tf, count in list(snp_tfs.items())[:10]:
    print(f"  {tf:<20s} {count:3d} peaks")
```

### Recipe: Compare TF Binding Profiles of Two Genes

When to use: Check whether two co-regulated genes share the same upstream TF binding landscape.

```python
import requests, time

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def get_gene_tfs(gene, assembly="hg38"):
    try:
        r = requests.get(f"{REMAP_API}/peaks/gene/", params={"gene": gene, "assembly": assembly}, timeout=30)
        r.raise_for_status()
        peaks = r.json()
        return set(p.get("name", "").split(":")[0] for p in peaks if p.get("name", ""))
    except Exception as e:
        print(f"Warning: {gene} → {e}")
        return set()

gene_a, gene_b = "MYC", "MYCN"
tfs_a = get_gene_tfs(gene_a)
time.sleep(0.5)
tfs_b = get_gene_tfs(gene_b)

shared = tfs_a & tfs_b
only_a = tfs_a - tfs_b
only_b = tfs_b - tfs_a

print(f"{gene_a} TFs: {len(tfs_a)}  |  {gene_b} TFs: {len(tfs_b)}")
print(f"Shared: {len(shared)}  |  {gene_a}-only: {len(only_a)}  |  {gene_b}-only: {len(only_b)}")
print(f"\nShared TFs (first 15): {sorted(shared)[:15]}")
print(f"\n{gene_a}-only (first 10): {sorted(only_a)[:10]}")
```

### Recipe: Export Region Peaks as BED

When to use: Export ReMap query results to BED format for downstream bedtools intersection or IGV visualization.

```python
import requests, pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def export_region_as_bed(chrom, start, end, outfile, assembly="hg38"):
    """Query ReMap region and save as 6-column BED file."""
    r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
        "chr": chrom, "start": start, "end": end, "assembly": assembly
    }, timeout=30)
    r.raise_for_status()
    peaks = r.json()
    rows = [{
        "chr":   p.get("chr",   p.get("chrom", "")),
        "start": p.get("start", 0),
        "end":   p.get("end",   0),
        "name":  p.get("name",  "."),
        "score": p.get("score", 0),
        "strand": p.get("strand", "."),
    } for p in peaks]
    df = pd.DataFrame(rows)
    df = df.sort_values(["chr", "start"])
    df.to_csv(outfile, sep="\t", header=False, index=False)
    print(f"Saved {len(df)} peaks to {outfile}")
    return df

export_region_as_bed("chr17", 7_670_000, 7_690_000, "TP53_locus_remap.bed")
```

## Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| `404 Not Found` from API | Endpoint path changed or unavailable | Check `https://remap2022.univ-amu.fr/api/` for current endpoint list; fall back to BED download |
| Empty JSON list `[]` from region query | No peaks in region, or assembly mismatch | Verify coordinates are on the correct assembly; try a wider window (±10 kb) |
| Gene query returns empty | Gene symbol not recognized by ReMap | Try Ensembl gene symbol; some aliases are not mapped — verify with HGNC |
| `requests.exceptions.Timeout` | Large region or slow server | Increase the timeout (e.g., `timeout=60`); for regions >1 Mb use BED file download instead |
| `name` field has only one component | Incomplete metadata in ReMap for that experiment | Guard with `parts[n] if len(parts) > n else "unknown"` |
| BED download 404 | Per-TF files use exact ReMap TF naming | Check TF name case and spelling at `https://remap2022.univ-amu.fr/download_page` |
| Duplicate peaks for same TF | Multiple experiments per TF in a cell type | Group by `tf_name` and count unique experiments; deduplicate peaks with bedtools merge |
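
For small result sets, the deduplication suggested in the last row can also be done in pandas instead of bedtools. This is a minimal sketch, assuming a parsed DataFrame with `chr`, `tf_name`, `start`, and `end` columns; `merge_overlapping` is a hypothetical helper that merges overlapping or directly adjacent peaks within each chromosome and TF, and the input rows are toy values.

```python
import pandas as pd

def merge_overlapping(df, group_cols=("chr", "tf_name")):
    """Merge overlapping or adjacent intervals within each group."""
    merged = []
    for key, grp in df.sort_values("start").groupby(list(group_cols)):
        cur_start = cur_end = None
        n = 0
        for start, end in zip(grp["start"], grp["end"]):
            if cur_end is not None and start <= cur_end:
                cur_end = max(cur_end, end)   # extend the current merged interval
                n += 1
            else:
                if cur_end is not None:
                    merged.append((*key, cur_start, cur_end, n))
                cur_start, cur_end, n = start, end, 1
        merged.append((*key, cur_start, cur_end, n))  # flush the last interval
    return pd.DataFrame(merged, columns=[*group_cols, "start", "end", "n_peaks"])

# Toy peaks: two overlapping CTCF peaks, one separate CTCF peak, one MYC peak
toy = pd.DataFrame({
    "chr":     ["chr17", "chr17", "chr17", "chr17"],
    "tf_name": ["CTCF", "CTCF", "CTCF", "MYC"],
    "start":   [100, 150, 300, 120],
    "end":     [200, 250, 400, 180],
})

merged = merge_overlapping(toy)
print(merged)
```

The `n_peaks` column records how many original peaks were collapsed into each merged interval, which preserves some of the multi-experiment support information.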

## Related Skills

- `jaspar-database` — TF binding motif matrices (PWMs/PFMs); use alongside ReMap peak evidence for sequence-level validation
- `encode-database` — ENCODE regulatory tracks including TF ChIP-seq, DNase-seq, and ATAC-seq; partially overlaps with ReMap
- `homer-motif-analysis` — de novo motif discovery in ChIP-seq peak sets from ReMap or MACS3
- `macs3-peak-calling` — call peaks from raw ChIP-seq BAM files; ReMap provides pre-called MACS2 peaks (see the `macs2` BED file names) produced by a comparable peak-calling approach
- `regulomedb-database` — regulatory variant scoring that integrates TF binding evidence similar to ReMap

## References

- [ReMap 2022 API documentation](https://remap2022.univ-amu.fr/api/) — REST API endpoint reference and interactive explorer
- [Hammal et al., Nucleic Acids Research 2022](https://doi.org/10.1093/nar/gkab996) — ReMap 2022 paper describing the 2022 release (165M peaks, 1,210 TFs)
- [ReMap portal and download page](https://remap2022.univ-amu.fr/) — web browser, download page for BED files and cis-regulatory modules
- [Chèneby et al., Nucleic Acids Research 2020](https://doi.org/10.1093/nar/gkz945) — ReMap 2020 paper describing the reprocessing pipeline and quality control methodology