{
"cells": [
{
"cell_type": "markdown",
"id": "039928a3",
"metadata": {},
"source": [
"# Scop3P\n",
"\n",
"A comprehensive database of human phosphosites within their full context. Scop3P integrates sequences (UniProtKB/Swiss-Prot), structures (PDB), and uniformly reprocessed phosphoproteomics data (PRIDE) to annotate all known human phosphosites. \n",
"\n",
"Scop3P, available at https://iomics.ugent.be/scop3p, is a unique resource for visualizing and analyzing phosphosites and for understanding phosphosite structure–function relationships.\n",
"\n",
"Please cite: https://doi.org/10.1021/acs.jproteome.0c00306\n"
]
},
{
"cell_type": "markdown",
"id": "0d2dbe26",
"metadata": {},
"source": [
"### Install Dependencies"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "6fd309ae",
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"!jupyter labextension install jupyterlab_3dmol\n",
"!jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"!pip install pandas matplotlib requests ipywidgets py3Dmol nglview b2bTools"
]
},
{
"cell_type": "markdown",
"id": "c5aadc48",
"metadata": {},
"source": [
"### Import required packages"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f63a76a8",
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"import requests, tempfile, json, sys\n",
"import pandas as pd\n",
"from b2bTools import SingleSeq, constants\n",
"import py3Dmol\n",
"import ipywidgets as widgets"
]
},
{
"cell_type": "markdown",
"id": "ffd05325",
"metadata": {},
"source": [
"### Fetch phosphopeptides from Scop3P and map them onto protein structures\n",
"> 1. Enter a UniProt accession (e.g. P07949) and click 'Load'.\n",
"> 2. Choose whether to list every peptide row ('All rows') or unique peptide spans ('Unique peptide spans', where rows sharing the same sequence, start, and end are grouped).\n",
"> 3. Map all peptides onto the AlphaFold structure with 'Map all (filtered)' to see the mass spectrometry coverage of your protein.\n",
"> 4. Alternatively, click one or more peptides in the peptide panel to see their structural mapping.\n",
"> 5. Hint:\n",
"> > Explore what the search function does!"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "441b530a",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d7d8a4d8a4dd447792f089b8da131d47",
"version_major": 2,
"version_minor": 0
},
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7e80691c8a4143ec846eaf404204898c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HTML(value='Scop3P → AlphaFold → NGLView peptide mapper<br>Enter accession → Load → (optional) Search →…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "1ef83ddc8e6e425fa92f7bbb947c4275",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(Text(value='', description='ACC_ID:', layout=Layout(width='260px')), Button(button_style='prima…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b98678d730f44e0e97920dc311a99035",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Text(value='', description='Search:', layout=Layout(width='750px'), placeholder='Filter: substring (SSFG), ran…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "852c323cd3ac43e3b79382458267624c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"SelectMultiple(description='Peptides:', layout=Layout(height='240px', width='980px'), options=(), value=())"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "6120007ad19b49998b8b0a04b1910ccc",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(Button(button_style='warning', description='Map all (filtered)', style=ButtonStyle()), Checkbox…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a00f5213c5ff4ac0850dae7dadd2e643",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"import requests\n",
"import urllib.request\n",
"from urllib.error import HTTPError, URLError\n",
"import re\n",
"\n",
"import ipywidgets as widgets\n",
"from IPython.display import display, clear_output\n",
"import nglview as nv\n",
"\n",
"# --- added (export only) ---\n",
"from pathlib import Path\n",
"import json\n",
"\n",
"\n",
"def scop3p_ngl_mapper_app(default_accession=\"\"):\n",
" # -------------------------\n",
" # 1) Scop3P API fetch\n",
" # -------------------------\n",
" def fetch_scop3p_peptides(accession: str) -> pd.DataFrame:\n",
" url = f\"https://iomics.ugent.be/scop3p/api/get-peptides-modifications?accession={accession}\"\n",
" r = requests.get(url, timeout=30)\n",
" r.raise_for_status()\n",
" data = r.json()\n",
"\n",
" df = pd.DataFrame(data.get(\"peptides\", []))\n",
" if df.empty:\n",
" return df\n",
"\n",
" for c in [\"peptideStart\", \"peptideEnd\", \"peptideModificationPosition\", \"uniprotPosition\"]:\n",
" if c in df.columns:\n",
" df[c] = pd.to_numeric(df[c], errors=\"coerce\").astype(\"Int64\")\n",
"\n",
" df[\"label\"] = df.apply(\n",
" lambda x: (\n",
" f'{x[\"peptideSequence\"]} ({int(x[\"peptideStart\"])}-{int(x[\"peptideEnd\"])}) '\n",
" f'@{x.get(\"modifiedResidue\",\"\")}{int(x[\"uniprotPosition\"])} score={x.get(\"score\",\"\")}'\n",
" ),\n",
" axis=1\n",
" )\n",
" return df\n",
"\n",
" # -------------------------\n",
" # 2) AlphaFold download (fallback v6 -> v4)\n",
" # -------------------------\n",
" def download_alphafold_pdb(accession: str, versions=(\"v6\", \"v4\")) -> str:\n",
" base = \"https://alphafold.ebi.ac.uk/files\"\n",
" last_err = None\n",
"\n",
" for v in versions:\n",
" pdb_name = f\"AF-{accession}-F1-model_{v}.pdb\"\n",
" url = f\"{base}/{pdb_name}\"\n",
" out = f\"{accession}.pdb\"\n",
" try:\n",
" urllib.request.urlretrieve(url, out)\n",
"\n",
" if Path(out).stat().st_size < 1000:\n",
" raise RuntimeError(f\"Downloaded file too small from {url}\")\n",
"\n",
" return out\n",
" except (HTTPError, URLError, RuntimeError) as e:\n",
" last_err = e\n",
"\n",
" raise RuntimeError(f\"Could not download AlphaFold PDB for {accession}. Last error: {last_err}\")\n",
"\n",
" # -------------------------\n",
" # 3) NGL helpers\n",
" # -------------------------\n",
" def positions_to_ranges(pos_list):\n",
" if not pos_list:\n",
" return []\n",
" pos_list = sorted(set(int(p) for p in pos_list))\n",
" ranges = []\n",
" s = pos_list[0]\n",
" prev = pos_list[0]\n",
" for x in pos_list[1:]:\n",
" if x == prev + 1:\n",
" prev = x\n",
" else:\n",
" ranges.append((s, prev))\n",
" s = x\n",
" prev = x\n",
" ranges.append((s, prev))\n",
" return ranges\n",
"\n",
" def add_cartoon_selection(view, ranges, color, name):\n",
" if not ranges:\n",
" return\n",
" selection = \" or \".join([f\"resi {a}-{b}\" for a, b in ranges])\n",
" view.add_representation(\"cartoon\", selection=selection, color=color, name=name)\n",
"\n",
" def add_positions(view, positions, color, name, repr_type=\"ball+stick\"):\n",
" if not positions:\n",
" return\n",
" selection = \" or \".join([f\"resi {int(p)}\" for p in sorted(set(int(p) for p in positions))])\n",
" view.add_representation(repr_type, selection=selection, color=color, name=name)\n",
"\n",
" # -------------------------\n",
" # 4) Filter logic\n",
" # -------------------------\n",
" def filter_peptides(df: pd.DataFrame, query: str) -> pd.DataFrame:\n",
" if df is None or df.empty:\n",
" return df\n",
" if not query:\n",
" return df\n",
"\n",
" q = query.strip()\n",
"\n",
" m = re.match(r\"^(\\d+)\\s*-\\s*(\\d+)$\", q)\n",
" if m:\n",
" a, b = int(m.group(1)), int(m.group(2))\n",
" return df[(df[\"peptideStart\"] <= b) & (df[\"peptideEnd\"] >= a)]\n",
"\n",
" m = re.match(r\"^>=\\s*(\\d+)$\", q)\n",
" if m:\n",
" p = int(m.group(1))\n",
" return df[df[\"peptideEnd\"] >= p]\n",
"\n",
" m = re.match(r\"^<=\\s*(\\d+)$\", q)\n",
" if m:\n",
" p = int(m.group(1))\n",
" return df[df[\"peptideStart\"] <= p]\n",
"\n",
" if q.isdigit():\n",
" p = int(q)\n",
" return df[(df[\"peptideStart\"] <= p) & (df[\"peptideEnd\"] >= p)]\n",
"\n",
" return df[df[\"peptideSequence\"].astype(str).str.contains(q, case=False, na=False)]\n",
"\n",
" # -------------------------\n",
" # 5) UI\n",
" # -------------------------\n",
" acc_input = widgets.Text(value=default_accession, description=\"ACC_ID:\", layout=widgets.Layout(width=\"260px\"))\n",
" load_btn = widgets.Button(description=\"Load\", button_style=\"primary\")\n",
"\n",
" mode = widgets.ToggleButtons(\n",
" options=[\"Unique peptide spans\", \"All rows\"],\n",
" value=\"Unique peptide spans\",\n",
" description=\"List:\"\n",
" )\n",
"\n",
" search_box = widgets.Text(\n",
" value=\"\",\n",
" placeholder=\"Filter: substring (SSFG), range (70-90), >=150, <=300, or single pos (154)\",\n",
" description=\"Search:\",\n",
" layout=widgets.Layout(width=\"750px\")\n",
" )\n",
"\n",
" peptide_multi = widgets.SelectMultiple(\n",
" description=\"Peptides:\",\n",
" options=[],\n",
" layout=widgets.Layout(width=\"980px\", height=\"240px\")\n",
" )\n",
"\n",
" show_mods_chk = widgets.Checkbox(value=True, description=\"Show modified sites (magenta)\")\n",
" show_mods_mode = widgets.ToggleButtons(\n",
" options=[\"Selected peptides only\", \"All protein mods\"],\n",
" value=\"Selected peptides only\",\n",
" description=\"Mods:\"\n",
" )\n",
"\n",
" map_all_btn = widgets.Button(description=\"Map all (filtered)\", button_style=\"warning\")\n",
"\n",
" # --- added (export only) ---\n",
" export_html_btn = widgets.Button(description=\"Export styled HTML\", button_style=\"info\")\n",
" # export_png_btn = widgets.Button(description=\"Export PNG (via HTML)\", button_style=\"info\")\n",
"\n",
" out = widgets.Output()\n",
"\n",
" display(widgets.HTML(\n",
" \"Scop3P → AlphaFold → NGLView peptide mapper<br>\"\n",
" \"Enter accession → Load → (optional) Search → select peptides (auto-renders).\"\n",
" ))\n",
" display(widgets.HBox([acc_input, load_btn, mode]))\n",
" display(search_box)\n",
" display(peptide_multi)\n",
" display(widgets.HBox([map_all_btn, show_mods_chk, show_mods_mode, export_html_btn])) #export_png_btn\n",
" display(out)\n",
"\n",
" # -------------------------\n",
" # 6) State\n",
" # -------------------------\n",
" STATE = {\n",
" \"df\": pd.DataFrame(),\n",
" \"df_filtered\": pd.DataFrame(),\n",
" \"pdb_path\": None,\n",
" \"acc_loaded\": None,\n",
" \"suspend_autorender\": False,\n",
" \"last_action\": None, # \"map_all\" or \"select\"\n",
"\n",
" # --- added (export only) ---\n",
" \"last_union_ranges\": [],\n",
" \"last_inter_pos\": [],\n",
" \"last_mod_pos\": [],\n",
" \"last_pdb_path\": None,\n",
" }\n",
"\n",
" def build_peptide_options(df: pd.DataFrame, mode_value: str):\n",
" if df is None or df.empty:\n",
" return []\n",
"\n",
" if mode_value == \"Unique peptide spans\":\n",
" g = (\n",
" df.groupby([\"peptideSequence\", \"peptideStart\", \"peptideEnd\"], as_index=False)\n",
" .agg(n_mod_sites=(\"uniprotPosition\", \"nunique\"),\n",
" max_score=(\"score\", \"max\"))\n",
" )\n",
" opts = []\n",
" for _, row in g.iterrows():\n",
" key = (row[\"peptideSequence\"], int(row[\"peptideStart\"]), int(row[\"peptideEnd\"]))\n",
" label = f'{key[0]} ({key[1]}-{key[2]}) | modSites={int(row[\"n_mod_sites\"])} maxScore={row[\"max_score\"]}'\n",
" opts.append((label, key))\n",
" return opts\n",
"\n",
" return [(r[\"label\"], int(idx)) for idx, r in df.iterrows()]\n",
"\n",
" def ensure_loaded_assets(acc: str):\n",
" if STATE[\"acc_loaded\"] != acc:\n",
" STATE[\"pdb_path\"] = None\n",
" STATE[\"acc_loaded\"] = acc\n",
"\n",
" if STATE[\"pdb_path\"] is None:\n",
" STATE[\"pdb_path\"] = download_alphafold_pdb(acc)\n",
"\n",
" return STATE[\"pdb_path\"]\n",
"\n",
" def refresh_filtered_and_options(keep_selection=True):\n",
" df = STATE[\"df\"]\n",
" df_filt = filter_peptides(df, search_box.value)\n",
" STATE[\"df_filtered\"] = df_filt\n",
"\n",
" old_sel = set(peptide_multi.value) if keep_selection else set()\n",
" peptide_multi.options = build_peptide_options(df_filt, mode.value)\n",
"\n",
" if keep_selection and old_sel:\n",
" valid_vals = [v for _, v in peptide_multi.options]\n",
" restored = [v for v in valid_vals if v in old_sel]\n",
" STATE[\"suspend_autorender\"] = True\n",
" try:\n",
" peptide_multi.value = tuple(restored)\n",
" finally:\n",
" STATE[\"suspend_autorender\"] = False\n",
"\n",
" def render_current_selection():\n",
" with out:\n",
" clear_output()\n",
"\n",
" acc = acc_input.value.strip()\n",
" df_all = STATE[\"df\"]\n",
" df_filt = STATE[\"df_filtered\"]\n",
"\n",
" if df_all is None or df_all.empty:\n",
" print(\"No data loaded. Click Load.\")\n",
" return\n",
"\n",
" selected = list(peptide_multi.value)\n",
" if not selected:\n",
" print(\"Select at least one peptide (or click 'Map all (filtered)').\")\n",
" return\n",
"\n",
" try:\n",
" pdb_path = ensure_loaded_assets(acc)\n",
" except Exception as e:\n",
" print(\"AlphaFold download error:\", e)\n",
" return\n",
"\n",
" spans = []\n",
" mod_positions = []\n",
"\n",
" if mode.value == \"Unique peptide spans\":\n",
" spans = [(int(s), int(e)) for _, s, e in selected]\n",
"\n",
" if show_mods_mode.value == \"Selected peptides only\":\n",
" for pepSeq, s, e in selected:\n",
" sub = df_all[\n",
" (df_all[\"peptideSequence\"] == pepSeq)\n",
" & (df_all[\"peptideStart\"] == int(s))\n",
" & (df_all[\"peptideEnd\"] == int(e))\n",
" ]\n",
" mod_positions.extend(sub[\"uniprotPosition\"].dropna().astype(int).tolist())\n",
" else:\n",
" mod_positions = df_all[\"uniprotPosition\"].dropna().astype(int).tolist()\n",
"\n",
" else:\n",
" sub = df_filt.loc[selected].copy()\n",
" spans = [(int(r[\"peptideStart\"]), int(r[\"peptideEnd\"])) for _, r in sub.iterrows()]\n",
"\n",
" if show_mods_mode.value == \"Selected peptides only\":\n",
" mod_positions = sub[\"uniprotPosition\"].dropna().astype(int).tolist()\n",
" else:\n",
" mod_positions = df_all[\"uniprotPosition\"].dropna().astype(int).tolist()\n",
"\n",
" # union/intersection\n",
" pos_lists = [list(range(a, b + 1)) for a, b in spans]\n",
" union_pos = sorted(set(p for L in pos_lists for p in L))\n",
" inter_pos = sorted(set(pos_lists[0]).intersection(*map(set, pos_lists[1:]))) if len(pos_lists) > 1 else []\n",
" union_ranges = positions_to_ranges(union_pos)\n",
"\n",
" # --- Big NGL panel ---\n",
" view = nv.NGLWidget()\n",
" view.add_component(pdb_path)\n",
" view.clear_representations()\n",
" view.add_cartoon(color=\"silver\")\n",
"\n",
" add_cartoon_selection(view, union_ranges, color=\"blue\", name=\"peptide_union\")\n",
"\n",
" if inter_pos:\n",
" add_positions(view, inter_pos, color=\"red\", name=\"peptide_intersection\", repr_type=\"ball+stick\")\n",
"\n",
" if show_mods_chk.value and mod_positions:\n",
" add_positions(view, mod_positions, color=\"magenta\", name=\"mods\", repr_type=\"ball+stick\")\n",
"\n",
" view.center()\n",
"\n",
" # Make structure panel bigger (tweak as you like)\n",
" view.layout = widgets.Layout(width=\"1100px\", height=\"700px\")\n",
"\n",
" display(view)\n",
"\n",
" # --- added (export only): store last render state ---\n",
" STATE[\"last_union_ranges\"] = union_ranges\n",
" STATE[\"last_inter_pos\"] = sorted(set(int(x) for x in inter_pos))\n",
" STATE[\"last_mod_pos\"] = sorted(set(int(x) for x in mod_positions))\n",
" STATE[\"last_pdb_path\"] = pdb_path\n",
"\n",
" # --- Summary printing tweaks ---\n",
" spans_sorted = sorted(spans, key=lambda x: (x[0], x[1]))\n",
" first_start = spans_sorted[0][0]\n",
" last_end = max(e for _, e in spans_sorted)\n",
"\n",
" print(f\"\\nACC_ID: {acc}\")\n",
" print(f\"AlphaFold model: {pdb_path}\")\n",
" print(f\"Selected peptide spans: {len(spans_sorted)}\")\n",
"\n",
" # When map-all clicked, show compact coverage (also useful generally)\n",
" if STATE[\"last_action\"] == \"map_all\":\n",
" print(f\"Coverage (first peptide start → last peptide end): {first_start} → {last_end}\")\n",
" else:\n",
" # For manual selection, still show compact coverage (less spammy)\n",
" print(f\"Coverage: {first_start} → {last_end}\")\n",
"\n",
" if inter_pos:\n",
" print(f\"Intersection (red): {len(inter_pos)} residues\")\n",
" else:\n",
" print(\"Intersection: none (single peptide or no shared residues)\")\n",
"\n",
" if show_mods_chk.value:\n",
" print(f\"Modified sites (magenta): {len(set(mod_positions))} unique positions\")\n",
"\n",
" # -------------------------\n",
" # --- added (export only): standalone styled HTML writer ---\n",
" # -------------------------\n",
" def _write_styled_ngl_html(acc, pdb_path, union_ranges, inter_pos, mod_pos, out_html_path, auto_download_png=False):\n",
" pdb_text = Path(pdb_path).read_text(errors=\"ignore\")\n",
"\n",
" payload = {\n",
" \"acc\": acc,\n",
" \"union_ranges\": union_ranges,\n",
" \"intersection\": inter_pos,\n",
" \"mods\": mod_pos\n",
" }\n",
"\n",
" # If auto_download_png=True, the HTML will immediately trigger a PNG download via stage.makeImage()\n",
" auto_png_js = \"\"\"\n",
" // Auto-download PNG snapshot\n",
" stage.makeImage({ factor: 2, antialias: true, trim: false }).then(function (blob) {\n",
" var a = document.createElement(\"a\");\n",
" a.href = URL.createObjectURL(blob);\n",
" a.download = payload.acc + \"_snapshot.png\";\n",
" document.body.appendChild(a);\n",
" a.click();\n",
" a.remove();\n",
" });\n",
" \"\"\" if auto_download_png else \"\"\n",
"\n",
" html = f\"\"\"\n",
"\n",
"
union ranges: {len(union_ranges)} | mods: {len(mod_pos)} | intersection: {len(inter_pos)}union ranges: {len(union_ranges)} | mods: {len(mods)} | intersection: {len(inter)}