{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "(fcb-bridgedbnotebook)=\n", "# Using BridgeDb web services\n", "\n", "\n", "````{panels_fairplus}\n", ":identifier_text: FCB018\n", ":identifier_link: 'http://w3id.org/faircookbook/FCB018'\n", ":difficulty_level: 3\n", ":recipe_type: hands_on\n", ":reading_time_minutes: 30\n", ":intended_audience: data_manager, data_scientist \n", ":has_executable_code: yeah\n", ":recipe_name: Using BridgeDb web services\n", "```` \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook presents two identifier-mapping use cases for BridgeDb:\n", "* Mapping identifiers from one data source recognized by BridgeDb to another recognized data source ([see the list of data sources](https://github.com/bridgedb/BridgeDb/blob/2dba5780260421de311cb3064df79e16a396b887/org.bridgedb.bio/resources/org/bridgedb/bio/datasources.tsv)), for example from HGNC to Ensembl.\n", "* Mapping a local identifier to a BridgeDb data source, given a TSV file that already maps the local identifier to a different BridgeDb data source.\n", "\n", 
"[![](https://mermaid.ink/img/eyJjb2RlIjoiICAgIGdyYXBoIFRCIFxuICAgICAgc3ViZ3JhcGggU2NyaXB0XG5cbiAgICAgICAgICBzdWJncmFwaCBCcmlkZ2VEYlxuICAgICAgICAgICAgICBCW0FmZnldLS0-RVtHZW5lIE9udG9sb2d5XVxuICAgICAgICAgICAgICBCW0FmZnldLS0-RltFbnNlbWJsXVxuICAgICAgICAgICAgICBCW0FmZnldIC0tPiBIW1VuaVByb3RdXG4gICAgICAgICAgc3ViZ3JhcGggUHJlLW1hZGUgbWFwcGluZ1xuICAgICAgICAgICAgQVtMb2NhbCBpZGVudGlmaWVyIHNvdXJjZV0tLT5CW0hHTkNdXG4gICAgICAgICAgZW5kXG4gICAgICAgICAgZW5kXG4gICAgICAgIGVuZCIsIm1lcm1haWQiOnt9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ)](https://mermaid-js.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoiICAgIGdyYXBoIFRCIFxuICAgICAgc3ViZ3JhcGggU2NyaXB0XG5cbiAgICAgICAgICBzdWJncmFwaCBCcmlkZ2VEYlxuICAgICAgICAgICAgICBCW0FmZnldLS0-RVtHZW5lIE9udG9sb2d5XVxuICAgICAgICAgICAgICBCW0FmZnldLS0-RltFbnNlbWJsXVxuICAgICAgICAgICAgICBCW0FmZnldIC0tPiBIW1VuaVByb3RdXG4gICAgICAgICAgc3ViZ3JhcGggUHJlLW1hZGUgbWFwcGluZ1xuICAgICAgICAgICAgQVtMb2NhbCBpZGVudGlmaWVyIHNvdXJjZV0tLT5CW0hHTkNdXG4gICAgICAgICAgZW5kXG4gICAgICAgICAgZW5kXG4gICAgICAgIGVuZCIsIm1lcm1haWQiOnt9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Querying the WS\n", "\n", "To query the Webservice we define below the url and the patterns for a single request and a batch request. You can find the docs [here](https://bridgedb.github.io/swagger/). We will use Python's requests library." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "url = \"https://webservice.bridgedb.org/\"" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "single_request = url+\"{org}/xrefs/{source}/{identifier}\"" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "batch_request = url+\"{org}/xrefsBatch/{source}{}\"" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "import requests\n", "import pandas as pd" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Here we define a function that turns the web service response into a DataFrame with columns corresponding to:\n", "* The original identifier\n", "* The data source that the identifier is part of\n", "* The mapped identifier\n", "* The data source for the mapped identifier" ] },
{ "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def to_df(response, batch=False):\n", "    if batch:\n", "        records = []\n", "        # Parse the raw response first, then expand the comma-separated mappings column\n", "        for tup in to_df(response).itertuples():\n", "            if tup[3] is not None:\n", "                for mappings in tup[3].split(','):\n", "                    # Each mapping has the form '<system code>:<identifier>'\n", "                    target = mappings.split(':', 1)\n", "                    if len(target) > 1:\n", "                        records.append((tup[1], tup[2], target[1], target[0]))\n", "                    else:\n", "                        # No system-code prefix: keep the raw value in both columns\n", "                        records.append((tup[1], tup[2], target[0], target[0]))\n", "        return pd.DataFrame(records, columns = ['original', 'source', 'mapping', 'target'])\n", "    \n", "    # Non-batch: one row per response line, one column per tab-separated field\n", "    return pd.DataFrame([line.split('\\t') for line in response.text.split('\\n')])" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Here we define the organism and the data source from which we want to map." ] },
{ "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "source = \"H\"\n", "org = 'Homo sapiens'" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Case 1\n", "\n", "Here we first load the case 1 example data." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "       0\n", "0   A1BG\n", "1   A1CF\n", "2  A2MP1" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "case1 = pd.read_csv(\"data/case1-example.tsv\", header=None)\n", "case1" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Then we request the mappings for all identifiers in a single batch call." ] },
{ "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "response1 = requests.post(batch_request.format('', org=org, source=source), data = case1.to_csv(index=False, header=False))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We then use our `to_df` function to turn the response into a DataFrame." ] },
{ "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "     original  source          mapping  target\n", "0        A1BG    HGNC       uc002qsd.5      Uc\n", "1        A1BG    HGNC          8039748       X\n", "2        A1BG    HGNC       GO:0072562       T\n", "3        A1BG    HGNC       uc061drj.1      Uc\n", "4        A1BG    HGNC     ILMN_2055271      Il\n", "..        ...     ...              ...     ...\n", "109     A2MP1    HGNC         16761106       X\n", "110     A2MP1    HGNC         16761118       X\n", "111     A2MP1    HGNC  ENSG00000256069      En\n", "112     A2MP1    HGNC            A2MP1       H\n", "113     A2MP1    HGNC        NR_040112       Q\n", "\n", "[114 rows x 4 columns]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "case1_df = to_df(response1, batch=True)\n", "case1_df" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Case 2\n", "Here we first load the case 2 example data, a TSV file that pairs each local identifier with an HGNC identifier, and perform the same steps as before." ] },
{ "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "case2 = pd.read_csv('data/case2-example.tsv', sep='\\t', names=['local', 'source'])" ] },
{ "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "source_data = case2.source.to_csv(index=False, header=False)\n", "query = batch_request.format('', org=org, source=source)\n", "response2 = requests.post(query, data = source_data)" ] },
{ "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "     original  source          mapping  target\n", "0        A1BG    HGNC       uc002qsd.5      Uc\n", "1        A1BG    HGNC          8039748       X\n", "2        A1BG    HGNC       GO:0072562       T\n", "3        A1BG    HGNC       uc061drj.1      Uc\n", "4        A1BG    HGNC     ILMN_2055271      Il\n", "..        ...     ...              ...     ...\n", "109     A2MP1    HGNC         16761106       X\n", "110     A2MP1    HGNC         16761118       X\n", "111     A2MP1    HGNC  ENSG00000256069      En\n", "112     A2MP1    HGNC            A2MP1       H\n", "113     A2MP1    HGNC        NR_040112       Q\n", "\n", "[114 rows x 4 columns]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mappings = to_df(response2, batch=True)\n", "mappings" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "After obtaining the mappings, we join them with the TSV data on the HGNC identifier and obtain the desired mapping by selecting the columns `mapping` and `local`." ] },
{ "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "local_mapping = mappings.join(case2.set_index('source'), on='original')" ] },
{ "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "           mapping  local\n", "0  ENSG00000121410   aa11\n", "1  ENSG00000148584   bb34\n", "2  ENSG00000256069   eg93" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "local_mapping[['mapping', 'local']]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Using Script" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from bridgedb_script import get_mappings" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "   original  source          mapping  target  local\n", "0      A1BG    HGNC  ENSG00000121410      En   aa11\n", "1      A1CF    HGNC  ENSG00000148584      En   bb34\n", "2     A2MP1    HGNC  ENSG00000256069      En   eg93" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_mappings(\"data/case2-example.tsv\", \"Homo sapiens\", \"H\", case=2, target='En')" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "     original  source          mapping  target\n", "0        A1BG    HGNC       uc002qsd.5      Uc\n", "1        A1BG    HGNC          8039748       X\n", "2        A1BG    HGNC       GO:0072562       T\n", "3        A1BG    HGNC       uc061drj.1      Uc\n", "4        A1BG    HGNC     ILMN_2055271      Il\n", "..        ...     ...              ...     ...\n", "109     A2MP1    HGNC         16761106       X\n", "110     A2MP1    HGNC         16761118       X\n", "111     A2MP1    HGNC  ENSG00000256069      En\n", "112     A2MP1    HGNC            A2MP1       H\n", "113     A2MP1    HGNC        NR_040112       Q\n", "\n", "[114 rows x 4 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_mappings(\"data/case1-example.tsv\", \"Homo sapiens\", \"H\", case=1)" ] } ],
"metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }