{ "cells": [ { "cell_type": "markdown", "id": "779a5404", "metadata": {}, "source": [ "# Scop3P\n", "\n", "A comprehensive database of human phosphosites within their full context. Scop3P integrates sequences (UniProtKB/Swiss-Prot), structures (PDB), and uniformly reprocessed phosphoproteomics data (PRIDE) to annotate all known human phosphosites.\n", "\n", "Scop3P, available at https://iomics.ugent.be/scop3p, presents a unique resource for visualization and analysis of phosphosites and for understanding phosphosite structure–function relationships." ] }, { "cell_type": "markdown", "id": "c15bdc6c", "metadata": {}, "source": [ "### Install Dependencies" ] }, { "cell_type": "code", "execution_count": 1, "id": "2d8057d7", "metadata": {}, "outputs": [], "source": [ "%%capture\n", "!jupyter labextension install jupyterlab_3dmol\n", "!jupyter labextension install @jupyter-widgets/jupyterlab-manager\n", "!pip install pandas matplotlib py3Dmol b2btools==3.0.7b2" ] }, { "cell_type": "markdown", "id": "37c432c6", "metadata": {}, "source": [ "### Import required packages" ] }, { "cell_type": "code", "execution_count": 1, "id": "8a43806b", "metadata": {}, "outputs": [], "source": [ "%%capture\n", "import requests, tempfile, json, sys\n", "sys.path.append(\"./scripts\") ## get python scripts from this directory\n", "import pandas as pd\n", "from b2bTools import SingleSeq, constants\n", "import py3Dmol\n", "import ipywidgets as widgets" ] }, { "cell_type": "markdown", "id": "625fc3c6", "metadata": {}, "source": [ "### Upload your peptide (tab delimited .txt) file in the next cell" ] }, { "cell_type": "code", "execution_count": 7, "id": "ffd610c1", "metadata": {}, "outputs": [], "source": [ "wid=widgets.FileUpload(\n", "    accept='',  # Accepted file extension e.g. '.txt', '.pdf', 'image/*', 'image/*,.pdf'\n", "    multiple=False  # True to accept multiple files upload else False\n", ")\n", "\n", "display(wid)" ] }, { "cell_type": "code", "execution_count": 50, "id": "0734d529", "metadata": {}, "outputs": [], "source": [ "# wid.values\n", "# print (infile)\n", "# content=infile['content']\n", "# content=io.StringIO(content.decode('utf-8'))\n", "# df=pd.read_csv(content)" ] }, { "cell_type": "code", "execution_count": 2, "id": "8fe86496", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Enter Protein ID column number: 5\n", "Protein column name in provided file is: ACC_ID\n" ] } ], "source": [ "pepfile=pd.read_csv('peptidefile.txt',sep='\\t',usecols=[1,2,4,5,6,7])\n", "proteincol=input(\"Enter Protein ID column number: \")\n", "proteincolname=pepfile.columns[int(proteincol)-1]\n", "print (\"Protein column name in provided file is: \", proteincolname)" ] }, { "cell_type": "code", "execution_count": 3, "id": "72a02585", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of proteins in your file: 4\n" ] } ], "source": [ "pepgroup=pepfile.groupby(proteincolname)\n", "proteinlist=list(set(pepfile[proteincolname].tolist()))\n", "print (\"Total number of proteins in your file: \", len(proteinlist))\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "af423b6a", "metadata": {}, "outputs": [], "source": [ "def fetch_prot_sequence(accession):\n", "    # Download the FASTA record for a UniProtKB accession from the UniProt REST API\n", "    url = f\"https://rest.uniprot.org/uniprotkb/{accession}.fasta\"\n", "    response = requests.get(url, timeout=30)\n", "    if response.status_code == 200:\n", "        raw_fasta_sequence = response.content.decode(\"utf-8\")\n", "    else:\n", "        raw_fasta_sequence = \"\"\n", "\n", "    # The first line is the FASTA header; the remaining lines hold the sequence\n", "    lines = raw_fasta_sequence.split('\\n')\n", "    protein_id = str(lines[0])\n", "    amino_acids = \"\".join([str(l) for l in lines[1:]])\n", "\n", "    return protein_id, amino_acids" ] }, { "cell_type": "code", 
"execution_count": 5, "id": "455809c2", "metadata": {}, "outputs": [], "source": [ "# def UPseqfetch(protseq,accsn,pep):\n", "import re\n", "\n", "def add_element(tmpdict, key, value):\n", "    # Append value to the list stored under key, creating the list on first use\n", "    if key not in tmpdict:\n", "        tmpdict[key] = []\n", "    tmpdict[key].append(value)\n", "\n", "def pepMapper(protDF):\n", "    # protDF: the peptide table grouped by the protein ID column\n", "    dataDF={}\n", "    for idx, grp in protDF:\n", "        protid=grp[proteincolname].tolist()[0]\n", "        peplis=list(set(grp['Pep_seq'].tolist()))\n", "        id_,protseq=fetch_prot_sequence(protid)\n", "        for pep in peplis:\n", "            # re.escape keeps any non-letter characters in a peptide from being read as regex syntax\n", "            for match in re.finditer(re.escape(pep), protseq):\n", "                # 1-based inclusive coordinates: match.end() is exclusive, so it already equals the last residue number\n", "                newpos=[match.start()+1, match.end()]\n", "                add_element(dataDF,protid,[pep,newpos[0],newpos[1]])\n", "    return dataDF" ] }, { "cell_type": "code", "execution_count": 6, "id": "78d96e2a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Protein\t#Peptides\n", "P55884\t1\n", "Q9BXS5\t2\n", "Q9H0H5\t1\n", "Q9HC35\t1\n" ] } ], "source": [ "mapped_peptides=pepMapper(pepgroup)\n", "print (\"Protein\\t#Peptides\")\n", "print(\"\\n\".join(\"{}\\t{}\".format(key,len(value)) for key,value in mapped_peptides.items()))" ] }, { "cell_type": "code", "execution_count": 7, "id": "a86001fa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Please enter the protein Id for 3D visualization: Q9BXS5\n" ] } ], "source": [ "structureID=input(\"Please enter the protein Id for 3D visualization: \")\n", "peptides=mapped_peptides[structureID]" ] }, { "cell_type": "code", "execution_count": 8, "id": "d5da5396", "metadata": {}, "outputs": [], "source": [ "\n", "## Get AlphaFold model for the protein\n", "import urllib.request\n", "AFurl=\"https://alphafold.ebi.ac.uk/files/AF-\"\n", "modelurl = f'{AFurl}{structureID}-F1-model_v4.pdb'\n", "AFmodel = urllib.request.urlretrieve(modelurl,f'{structureID}.pdb')\n" ] }, { "cell_type": "code", "execution_count": null, "id": "77332a9e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", 
"execution_count": 11, "id": "b0cb9d79", "metadata": {}, "outputs": [], "source": [ "import itertools\n", "def display_3D(peptides):\n", "    view = py3Dmol.view()\n", "    # Load the AlphaFold model downloaded in the previous cell\n", "    with open(structureID+'.pdb', 'r') as pdbfile:\n", "        view.addModel(pdbfile.read(),'pdb')\n", "    view.setStyle({'cartoon': { 'color': 'silver' }})\n", "    view.addSurface(py3Dmol.VDW, {'opacity': 0.60, 'color': 'white' })\n", "    # One list of residue positions (1-based, inclusive) per mapped peptide\n", "    allpos=[list(range(pep[1],pep[2]+1)) for pep in peptides]\n", "\n", "    mergedPos = sorted(set(itertools.chain.from_iterable(allpos)))\n", "    commonPos = sorted(set.intersection(*map(set, allpos)))\n", "    # Residues covered by any peptide are blue; residues shared by all peptides are purple.\n", "    # Styling by selection leaves the rest of the cartoon silver and avoids one style call overwriting the other.\n", "    view.setStyle({'resi': mergedPos}, {'cartoon': {'color': 'blue'}})\n", "    if commonPos:\n", "        view.setStyle({'resi': commonPos}, {'cartoon': {'color': 'purple'}})\n", "    print (\"Common positions colored in Purple\")\n", "\n", "    view.zoomTo()\n", "    return view" ] }, { "cell_type": "code", "execution_count": 12, "id": "9580d505", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Common positions colored in Purple\n" ] }, { "data": { "application/3dmoljs_load.v0": "
\nYou appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:\n jupyter labextension install jupyterlab_3dmol\n
\n", "text/html": [ "
\n", "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "display_3D(peptides)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }