{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook,\n", "we merge genome data that is stored per chromosome (VCF format) and save the result in matrix table format.\n", "\n", "We then exclude variants with low imputation quality (DR2 < 0.3).\n", "\n", "Finally, we use the filtered genome data to walk through the steps for computing a PRS.\n", "\n", "Two PRS models for cerebral infarction (ischemic stroke) are used as examples.\n", "\n", "- PGS002724: 1,213,574 variants in the model\n", "- PGS002725: 6,010,730 variants in the model\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Before running this notebook, please note the following\n", "\n", "- Before starting the Jupyter service for this notebook, reserve plenty of memory for the computation, for example with `export PYSPARK_SUBMIT_ARGS='--driver-memory 48g --executor-memory 48g pyspark-shell'` (this example allocates 48 GB). If you skip this, one of the cells will fail with an OutOfMemory error and no later cell will run correctly.\n", "- Hail runs on Spark behind the scenes, so interrupting a cell as you would in an ordinary Jupyter notebook does not stop the underlying computation. Running a new cell in that state can cause unexpected errors. Even if a cell seems to be taking a long time, we recommend waiting for it to finish rather than interrupting it.\n", "- This notebook is a modified version of the one at https://github.com/hacchy1983/prs-on-hail-public\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 Load Hail and the other required modules\n", "Run the code below. You can run it by clicking the Run button in the menu bar at the top of the page." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2022-11-23 12:14:48.075 WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Setting default log level to \"WARN\".\n", "To adjust logging level use sc.setLogLevel(newLevel). 
For SparkR, use setLogLevel(newLevel).\n", "Running on Apache Spark version 3.1.3\n", "SparkUI available at http://guaca021:4040\n", "Welcome to\n", " __ __ <>__\n", " / /_/ /__ __/ /\n", " / __ / _ `/ / /\n", " /_/ /_/\\_,_/_/_/ version 0.2.105-acd89e80c345\n", "LOGGING: writing to /lustre8/home/kozonishida-pg/prs-on-hail/hail-20221123-1214-0.2.105-acd89e80c345.log\n" ] }, { "data": { "text/html": [ "\n", "
\n", " \n", " Loading BokehJS ...\n", "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": "\n(function(root) {\n function now() {\n return new Date();\n }\n\n var force = true;\n\n if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n root._bokeh_onload_callbacks = [];\n root._bokeh_is_loading = undefined;\n }\n\n var JS_MIME_TYPE = 'application/javascript';\n var HTML_MIME_TYPE = 'text/html';\n var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n var CLASS_NAME = 'output_bokeh rendered_html';\n\n /**\n * Render data to the DOM node\n */\n function render(props, node) {\n var script = document.createElement(\"script\");\n node.appendChild(script);\n }\n\n /**\n * Handle when an output is cleared or removed\n */\n function handleClearOutput(event, handle) {\n var cell = handle.cell;\n\n var id = cell.output_area._bokeh_element_id;\n var server_id = cell.output_area._bokeh_server_id;\n // Clean up Bokeh references\n if (id != null && id in Bokeh.index) {\n Bokeh.index[id].model.document.clear();\n delete Bokeh.index[id];\n }\n\n if (server_id !== undefined) {\n // Clean up Bokeh references\n var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n cell.notebook.kernel.execute(cmd, {\n iopub: {\n output: function(msg) {\n var id = msg.content.text.trim();\n if (id in Bokeh.index) {\n Bokeh.index[id].model.document.clear();\n delete Bokeh.index[id];\n }\n }\n }\n });\n // Destroy server and session\n var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n cell.notebook.kernel.execute(cmd);\n }\n }\n\n /**\n * Handle when a new output is added\n */\n function handleAddOutput(event, handle) {\n var output_area = handle.output_area;\n var output = handle.output;\n\n // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n if ((output.output_type != \"display_data\") || 
(!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n return\n }\n\n var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n\n if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n // store reference to embed id on output_area\n output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n }\n if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n var bk_div = document.createElement(\"div\");\n bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n var script_attrs = bk_div.children[0].attributes;\n for (var i = 0; i < script_attrs.length; i++) {\n toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n }\n // store reference to server id on output_area\n output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n }\n }\n\n function register_renderer(events, OutputArea) {\n\n function append_mime(data, metadata, element) {\n // create a DOM node to render to\n var toinsert = this.create_output_subarea(\n metadata,\n CLASS_NAME,\n EXEC_MIME_TYPE\n );\n this.keyboard_manager.register_events(toinsert);\n // Render to node\n var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n render(props, toinsert[toinsert.length - 1]);\n element.append(toinsert);\n return toinsert\n }\n\n /* Handle when an output is cleared or removed */\n events.on('clear_output.CodeCell', handleClearOutput);\n events.on('delete.Cell', handleClearOutput);\n\n /* Handle when a new output is added */\n events.on('output_added.OutputArea', handleAddOutput);\n\n /**\n * Register the mime type and append_mime function with output_area\n */\n OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n /* Is output safe? 
*/\n safe: true,\n /* Index of renderer in `output_area.display_order` */\n index: 0\n });\n }\n\n // register the mime type if in Jupyter Notebook environment and previously unregistered\n if (root.Jupyter !== undefined) {\n var events = require('base/js/events');\n var OutputArea = require('notebook/js/outputarea').OutputArea;\n\n if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n register_renderer(events, OutputArea);\n }\n }\n\n \n if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n var NB_LOAD_WARNING = {'data': {'text/html':\n \"
\\n\"+\n \"

\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"

\\n\"+\n \"\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"\\n\"+\n \"
\"}};\n\n function display_loaded() {\n var el = document.getElementById(\"1001\");\n if (el != null) {\n el.textContent = \"BokehJS is loading...\";\n }\n if (root.Bokeh !== undefined) {\n if (el != null) {\n el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(display_loaded, 100)\n }\n }\n\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) {\n if (callback != null)\n callback();\n });\n } finally {\n delete root._bokeh_onload_callbacks\n }\n console.debug(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(css_urls, js_urls, callback) {\n if (css_urls == null) css_urls = [];\n if (js_urls == null) js_urls = [];\n\n root._bokeh_onload_callbacks.push(callback);\n if (root._bokeh_is_loading > 0) {\n console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls == null || js_urls.length === 0) {\n run_callbacks();\n return null;\n }\n console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n root._bokeh_is_loading = css_urls.length + js_urls.length;\n\n function on_load() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n run_callbacks()\n }\n }\n\n function on_error() {\n console.error(\"failed to load \" + url);\n }\n\n for (var i = 0; i < css_urls.length; i++) {\n var url = css_urls[i];\n const element = document.createElement(\"link\");\n element.onload = on_load;\n element.onerror = on_error;\n element.rel = \"stylesheet\";\n element.type = \"text/css\";\n element.href = url;\n console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n document.body.appendChild(element);\n }\n\n for (var i = 0; i < js_urls.length; i++) {\n var url = js_urls[i];\n var element = document.createElement('script');\n element.onload = 
on_load;\n element.onerror = on_error;\n element.async = false;\n element.src = url;\n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.head.appendChild(element);\n }\n };var element = document.getElementById(\"1001\");\n if (element == null) {\n console.error(\"Bokeh: ERROR: autoload.js configured with elementid '1001' but no matching script tag was found. \")\n return false;\n }\n\n function inject_raw_css(css) {\n const element = document.createElement(\"style\");\n element.appendChild(document.createTextNode(css));\n document.body.appendChild(element);\n }\n\n \n var js_urls = [\"https://cdn.pydata.org/bokeh/release/bokeh-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-widgets-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-tables-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-gl-1.4.0.min.js\"];\n var css_urls = [];\n \n\n var inline_js = [\n function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\n function(Bokeh) {\n \n \n }\n ];\n\n function run_inline_js() {\n \n if (root.Bokeh !== undefined || force === true) {\n \n for (var i = 0; i < inline_js.length; i++) {\n inline_js[i].call(root, root.Bokeh);\n }\n if (force === true) {\n display_loaded();\n }} else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n } else if (force !== true) {\n var cell = $(document.getElementById(\"1001\")).parents('.cell').data().cell;\n cell.output_area.append_execute_result(NB_LOAD_WARNING)\n }\n\n }\n\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n run_inline_js();\n } else {\n load_libs(css_urls, js_urls, function() {\n console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n}(window));", 
"application/vnd.bokehjs_load.v0+json": "" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import hail as hl\n", "hl.init()\n", "from hail.plot import show\n", "from pprint import pprint\n", "hl.plot.output_notebook()" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "chr1.beagle.log\t\t chr20.conform-gt.vcf.gz\r\n", "chr1.beagle.vcf.gz\t chr20.mt\r\n", "chr1.beagle.vcf.gz.tbi\t chr21.beagle.log\r\n", "chr1.conform-gt.log\t chr21.beagle.vcf.gz\r\n", "chr1.conform-gt.vcf.gz\t chr21.beagle.vcf.gz.tbi\r\n", "chr1.mt\t\t\t chr21.conform-gt.log\r\n", "chr10.beagle.log\t chr21.conform-gt.vcf.gz\r\n", "chr10.beagle.vcf.gz\t chr21.mt\r\n", "chr10.beagle.vcf.gz.tbi chr22.beagle.log\r\n", "chr10.conform-gt.log\t chr22.beagle.vcf.gz\r\n", "chr10.conform-gt.vcf.gz chr22.beagle.vcf.gz.tbi\r\n", "chr10.mt\t\t chr22.conform-gt.log\r\n", "chr11.beagle.log\t chr22.conform-gt.vcf.gz\r\n", "chr11.beagle.vcf.gz\t chr22.mt\r\n", "chr11.beagle.vcf.gz.tbi chr3.beagle.log\r\n", "chr11.conform-gt.log\t chr3.beagle.vcf.gz\r\n", "chr11.conform-gt.vcf.gz chr3.beagle.vcf.gz.tbi\r\n", "chr11.mt\t\t chr3.conform-gt.log\r\n", "chr12.beagle.log\t chr3.conform-gt.vcf.gz\r\n", "chr12.beagle.vcf.gz\t chr3.mt\r\n", "chr12.beagle.vcf.gz.tbi chr4.beagle.log\r\n", "chr12.conform-gt.log\t chr4.beagle.vcf.gz\r\n", "chr12.conform-gt.vcf.gz chr4.beagle.vcf.gz.tbi\r\n", "chr12.mt\t\t chr4.conform-gt.log\r\n", "chr13.beagle.log\t chr4.conform-gt.vcf.gz\r\n", "chr13.beagle.vcf.gz\t chr4.mt\r\n", "chr13.beagle.vcf.gz.tbi chr5.beagle.log\r\n", "chr13.conform-gt.log\t chr5.beagle.vcf.gz\r\n", "chr13.conform-gt.vcf.gz chr5.beagle.vcf.gz.tbi\r\n", "chr13.mt\t\t chr5.conform-gt.log\r\n", "chr14.beagle.log\t chr5.conform-gt.vcf.gz\r\n", "chr14.beagle.vcf.gz\t chr5.mt\r\n", "chr14.beagle.vcf.gz.tbi chr6.beagle.log\r\n", "chr14.conform-gt.log\t chr6.beagle.vcf.gz\r\n", "chr14.conform-gt.vcf.gz 
chr6.beagle.vcf.gz.tbi\r\n", "chr14.mt\t\t chr6.conform-gt.log\r\n", "chr15.beagle.log\t chr6.conform-gt.vcf.gz\r\n", "chr15.beagle.vcf.gz\t chr6.mt\r\n", "chr15.beagle.vcf.gz.tbi chr7.beagle.log\r\n", "chr15.conform-gt.log\t chr7.beagle.vcf.gz\r\n", "chr15.conform-gt.vcf.gz chr7.beagle.vcf.gz.tbi\r\n", "chr15.mt\t\t chr7.conform-gt.log\r\n", "chr16.beagle.log\t chr7.conform-gt.vcf.gz\r\n", "chr16.beagle.vcf.gz\t chr7.mt\r\n", "chr16.beagle.vcf.gz.tbi chr8.beagle.log\r\n", "chr16.conform-gt.log\t chr8.beagle.vcf.gz\r\n", "chr16.conform-gt.vcf.gz chr8.beagle.vcf.gz.tbi\r\n", "chr16.mt\t\t chr8.conform-gt.log\r\n", "chr17.beagle.log\t chr8.conform-gt.vcf.gz\r\n", "chr17.beagle.vcf.gz\t chr8.mt\r\n", "chr17.beagle.vcf.gz.tbi chr9.beagle.log\r\n", "chr17.conform-gt.log\t chr9.beagle.vcf.gz\r\n", "chr17.conform-gt.vcf.gz chr9.beagle.vcf.gz.tbi\r\n", "chr17.mt\t\t chr9.conform-gt.log\r\n", "chr18.beagle.log\t chr9.conform-gt.vcf.gz\r\n", "chr18.beagle.vcf.gz\t chr9.mt\r\n", "chr18.beagle.vcf.gz.tbi chrAll.filtered.matched_PGS002724.mt\r\n", "chr18.conform-gt.log\t chrAll.filtered.matched_PGS002725.mt\r\n", "chr18.conform-gt.vcf.gz chrAll.filtered.mt\r\n", "chr18.mt\t\t chrAll.mt\r\n", "chr19.beagle.log\t chrX_PAR1.beagle.log\r\n", "chr19.beagle.vcf.gz\t chrX_PAR1.beagle.vcf.gz\r\n", "chr19.beagle.vcf.gz.tbi chrX_PAR1.beagle.vcf.gz.tbi\r\n", "chr19.conform-gt.log\t chrX_PAR1.conform-gt.log\r\n", "chr19.conform-gt.vcf.gz chrX_PAR1.conform-gt.vcf.gz\r\n", "chr19.mt\t\t chrX_PAR2.beagle.log\r\n", "chr2.beagle.log\t\t chrX_PAR2.beagle.vcf.gz\r\n", "chr2.beagle.vcf.gz\t chrX_PAR2.beagle.vcf.gz.tbi\r\n", "chr2.beagle.vcf.gz.tbi\t chrX_PAR2.conform-gt.log\r\n", "chr2.conform-gt.log\t chrX_PAR2.conform-gt.vcf.gz\r\n", "chr2.conform-gt.vcf.gz\t chrX_nonPAR.beagle.log\r\n", "chr2.mt\t\t\t chrX_nonPAR.beagle.vcf.gz\r\n", "chr20.beagle.log\t chrX_nonPAR.beagle.vcf.gz.tbi\r\n", "chr20.beagle.vcf.gz\t chrX_nonPAR.conform-gt.log\r\n", "chr20.beagle.vcf.gz.tbi 
chrX_nonPAR.conform-gt.vcf.gz\r\n", "chr20.conform-gt.log\r\n" ] } ], "source": [ "!ls outputs/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 Convert the file format of the genome data\n", "\n", "The genome data for chromosomes 1 through 22 is stored in a separate file per chromosome.\n", "For example, the data for chromosome 1 is stored in `outputs/chr1.beagle.vcf.gz`.\n", "Likewise, the data for chromosome 2 is stored in `outputs/chr2.beagle.vcf.gz`, and the data for chromosome 22 in `outputs/chr22.beagle.vcf.gz`.\n", "\n", "The data is stored in VCF, a widely used file format.\n", "Here we convert it to matrix table format and save it.\n", "\n", "Run the command below.\n", "This cell takes a long time to run (in the recorded run below, roughly 12 to 18 minutes per chromosome and a little over 5 hours in total), but as explained in the cautions at the top of this notebook, we do not recommend interrupting it partway through." ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 outputs/chr1.beagle.vcf.gz outputs/chr1.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 00:12:03.759 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 00:12:36.902 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 00:26:13.044 Hail: INFO: wrote matrix table with 2428653 rows and 2318 columns in 7 partitions to outputs/chr1.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "2 outputs/chr2.beagle.vcf.gz outputs/chr2.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 00:26:13.449 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 00:26:45.883 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 00:41:24.089 Hail: INFO: wrote matrix table with 2627240 rows and 2318 columns in 7 partitions to outputs/chr2.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "3 outputs/chr3.beagle.vcf.gz outputs/chr3.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 00:41:24.562 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 00:41:50.082 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 00:55:55.758 Hail: 
INFO: wrote matrix table with 2186425 rows and 2318 columns in 6 partitions to outputs/chr3.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "4 outputs/chr4.beagle.vcf.gz outputs/chr4.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 00:55:56.162 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 00:56:25.917 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 01:08:23.150 Hail: INFO: wrote matrix table with 2212857 rows and 2318 columns in 7 partitions to outputs/chr4.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "5 outputs/chr5.beagle.vcf.gz outputs/chr5.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 01:08:23.583 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 01:08:50.201 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 01:21:52.499 Hail: INFO: wrote matrix table with 1986979 rows and 2318 columns in 6 partitions to outputs/chr5.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "6 outputs/chr6.beagle.vcf.gz outputs/chr6.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 01:21:52.956 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 01:22:18.976 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 01:35:31.727 Hail: INFO: wrote matrix table with 1964598 rows and 2318 columns in 6 partitions to outputs/chr6.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "7 outputs/chr7.beagle.vcf.gz outputs/chr7.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 01:35:32.141 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 01:35:58.592 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 01:50:01.298 Hail: INFO: wrote matrix table with 1801231 rows and 2318 columns in 5 partitions to outputs/chr7.mt\n" ] }, { "name": "stdout", "output_type": 
"stream", "text": [ "8 outputs/chr8.beagle.vcf.gz outputs/chr8.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 01:50:01.775 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 01:50:31.947 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 02:04:55.201 Hail: INFO: wrote matrix table with 1722793 rows and 2318 columns in 5 partitions to outputs/chr8.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "9 outputs/chr9.beagle.vcf.gz outputs/chr9.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 02:04:55.609 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 02:05:19.902 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 02:18:18.739 Hail: INFO: wrote matrix table with 1342561 rows and 2318 columns in 4 partitions to outputs/chr9.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "10 outputs/chr10.beagle.vcf.gz outputs/chr10.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 02:18:19.246 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 02:18:45.990 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 02:34:04.301 Hail: INFO: wrote matrix table with 1532460 rows and 2318 columns in 4 partitions to outputs/chr10.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "11 outputs/chr11.beagle.vcf.gz outputs/chr11.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 02:34:04.802 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 02:34:30.324 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 02:48:51.586 Hail: INFO: wrote matrix table with 1520309 rows and 2318 columns in 4 partitions to outputs/chr11.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "12 outputs/chr12.beagle.vcf.gz outputs/chr12.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ 
"2022-11-24 02:48:51.976 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 02:49:26.205 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 03:03:38.788 Hail: INFO: wrote matrix table with 1467858 rows and 2318 columns in 4 partitions to outputs/chr12.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "13 outputs/chr13.beagle.vcf.gz outputs/chr13.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 03:03:39.549 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 03:04:03.862 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 03:18:20.910 Hail: INFO: wrote matrix table with 1099285 rows and 2318 columns in 3 partitions to outputs/chr13.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "14 outputs/chr14.beagle.vcf.gz outputs/chr14.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 03:18:21.432 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 03:18:56.995 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 03:32:45.903 Hail: INFO: wrote matrix table with 1002655 rows and 2318 columns in 3 partitions to outputs/chr14.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "15 outputs/chr15.beagle.vcf.gz outputs/chr15.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 03:32:46.434 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 03:33:15.573 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 03:46:02.317 Hail: INFO: wrote matrix table with 911380 rows and 2318 columns in 3 partitions to outputs/chr15.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "16 outputs/chr16.beagle.vcf.gz outputs/chr16.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 03:46:02.789 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 03:46:30.289 Hail: INFO: Coerced sorted VCF 
- no additional import work to do\n", "2022-11-24 03:59:20.950 Hail: INFO: wrote matrix table with 988654 rows and 2318 columns in 3 partitions to outputs/chr16.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "17 outputs/chr17.beagle.vcf.gz outputs/chr17.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 03:59:21.566 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 03:59:40.999 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 04:11:07.839 Hail: INFO: wrote matrix table with 864311 rows and 2318 columns in 3 partitions to outputs/chr17.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "18 outputs/chr18.beagle.vcf.gz outputs/chr18.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 04:11:08.461 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 04:11:29.928 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 04:22:58.193 Hail: INFO: wrote matrix table with 864327 rows and 2318 columns in 3 partitions to outputs/chr18.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "19 outputs/chr19.beagle.vcf.gz outputs/chr19.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 04:22:58.737 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 04:23:31.187 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 04:37:33.986 Hail: INFO: wrote matrix table with 706126 rows and 2318 columns in 2 partitions to outputs/chr19.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "20 outputs/chr20.beagle.vcf.gz outputs/chr20.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 04:37:34.601 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 04:37:55.931 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 04:51:29.869 Hail: INFO: wrote matrix table with 679241 rows and 2318 columns in 
2 partitions to outputs/chr20.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "21 outputs/chr21.beagle.vcf.gz outputs/chr21.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 04:51:30.363 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 04:52:11.583 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 05:09:18.461 Hail: INFO: wrote matrix table with 427409 rows and 2318 columns in 1 partition to outputs/chr21.mt\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "22 outputs/chr22.beagle.vcf.gz outputs/chr22.mt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 05:09:18.993 Hail: INFO: scanning VCF for sortedness...\n", "2022-11-24 05:09:54.515 Hail: INFO: Coerced sorted VCF - no additional import work to do\n", "2022-11-24 05:26:20.144 Hail: INFO: wrote matrix table with 424147 rows and 2318 columns in 1 partition to outputs/chr22.mt\n" ] } ], "source": [ "# Convert the VCF of each autosome (chr1 to chr22) to a Hail matrix table.\n", "for chrom in range(1, 23):\n", "    infile = 'outputs/chr' + str(chrom) + '.beagle.vcf.gz'\n", "    outfile = 'outputs/chr' + str(chrom) + '.mt'\n", "    print(chrom, infile, outfile)\n", "    # force_bgz=True makes Hail treat the .gz file as block-gzipped (BGZF).\n", "    hl.import_vcf(infile, force_bgz=True).write(outfile, overwrite=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3 Merge the genome data\n", "Next, we merge the matrix tables for chromosomes 1 through 22 into a single matrix table.\n", "\n", "Run the command below." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 outputs/chr1.mt (2428653, 2318) (2428653, 2318)\n", "2 outputs/chr2.mt (2627240, 2318) (5055893, 2318)\n", "3 outputs/chr3.mt (2186425, 2318) (7242318, 2318)\n", "4 outputs/chr4.mt (2212857, 2318) (9455175, 2318)\n", "5 outputs/chr5.mt (1986979, 2318) (11442154, 2318)\n", "6 outputs/chr6.mt (1964598, 2318) (13406752, 2318)\n", "7 outputs/chr7.mt (1801231, 2318) (15207983, 2318)\n", "8 outputs/chr8.mt (1722793, 2318) (16930776, 
2318)\n", "9 outputs/chr9.mt (1342561, 2318) (18273337, 2318)\n", "10 outputs/chr10.mt (1532460, 2318) (19805797, 2318)\n", "11 outputs/chr11.mt (1520309, 2318) (21326106, 2318)\n", "12 outputs/chr12.mt (1467858, 2318) (22793964, 2318)\n", "13 outputs/chr13.mt (1099285, 2318) (23893249, 2318)\n", "14 outputs/chr14.mt (1002655, 2318) (24895904, 2318)\n", "15 outputs/chr15.mt (911380, 2318) (25807284, 2318)\n", "16 outputs/chr16.mt (988654, 2318) (26795938, 2318)\n", "17 outputs/chr17.mt (864311, 2318) (27660249, 2318)\n", "18 outputs/chr18.mt (864327, 2318) (28524576, 2318)\n", "19 outputs/chr19.mt (706126, 2318) (29230702, 2318)\n", "20 outputs/chr20.mt (679241, 2318) (29909943, 2318)\n", "21 outputs/chr21.mt (427409, 2318) (30337352, 2318)\n", "22 outputs/chr22.mt (424147, 2318) (30761499, 2318)\n" ] } ], "source": [ "# Start from chromosome 1, then append chromosomes 2 to 22 with union_rows.\n", "chrom = 1\n", "file = 'outputs/chr' + str(chrom) + '.mt'\n", "mt = hl.read_matrix_table(file)\n", "# Print (variants, samples) for this chromosome and for the running total.\n", "print(chrom, file, mt.count(), mt.count())\n", "for chrom in range(2, 23):\n", "    file = 'outputs/chr' + str(chrom) + '.mt'\n", "    tmpmt = hl.read_matrix_table(file)\n", "    mt = mt.union_rows(tmpmt)\n", "    print(chrom, file, tmpmt.count(), mt.count())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The genome data has now been merged.\n", "The cell below displays the size of the merged data as (number of variants, number of samples)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(30761499, 2318)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mt.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This means the following.\n", "\n", "- The number of study participants is 2,318\n", "- The number of variants is 30,761,499\n", "\n", "## Step 4 Save the merged genome data\n", "To shorten the run time of later steps, we save the merged genome data and then read it back in.\n", "\n", "The code below saves the merged genome data to `outputs/chrAll.mt`.\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-15 20:03:09.905 Hail: INFO: wrote matrix table with 30761499 
rows and 2318 columns in 89 partitions to outputs/chrAll.mt\n", " Total size: 18.99 GiB\n", " * Rows/entries: 18.99 GiB\n", " * Columns: 9.09 KiB\n", " * Globals: 11.00 B\n", " * Smallest partition: 282779 rows (181.07 MiB)\n", " * Largest partition: 424147 rows (311.72 MiB)\n" ] } ], "source": [ "mt.write('outputs/chrAll.mt', overwrite=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below reads `outputs/chrAll.mt` back in." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "mt = hl.read_matrix_table('outputs/chrAll.mt')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5 Add a variantID to the genome data\n", "The code below adds a `variantID` field, a string of the form `contig:position` (for example `1:10177`), to the variant (row) information of the genome data." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "mt = mt.annotate_rows(variantID=hl.str(mt.locus.contig) + \":\" + hl.str(mt.locus.position))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below displays the variant information after `variantID` has been added.\n", "`show(5)` means that only the first 5 variants are displayed." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\r\n", "[Stage 0:> (0 + 4) / 4]\r" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ], "text/plain": [ "+---------------+--------------------------------+---------------+-----------+\n", "| locus | alleles | rsid | qual |\n", "+---------------+--------------------------------+---------------+-----------+\n", "| locus<GRCh37> | array<str> | str | float64 |\n", "+---------------+--------------------------------+---------------+-----------+\n", "| 1:10177 | [\"A\",\"AC\"] | \"rs367896724\" | -1.00e+01 |\n", "| 1:10235 | [\"T\",\"TA\"] | \"rs540431307\" | -1.00e+01 |\n", "| 1:10352 | [\"T\",\"TA\"] | \"rs555500075\" | -1.00e+01 |\n", "| 1:10616 | [\"CCGCCGTTGCAAAGGCGCGCCG\",\"C\"] | \"rs376342519\" | -1.00e+01 |\n", "| 1:10642 | [\"G\",\"A\"] | \"rs558604819\" | -1.00e+01 |\n", "+---------------+--------------------------------+---------------+-----------+\n", "\n", "+----------+----------------+----------------+----------+-----------+\n", "| filters | info.AF | info.DR2 | info.IMP | variantID |\n", "+----------+----------------+----------------+----------+-----------+\n", "| set<str> | array<float64> | array<float64> | bool | str |\n", "+----------+----------------+----------------+----------+-----------+\n", "| {} | [4.46e-01] | [0.00e+00] | True | \"1:10177\" |\n", "| {} | [4.00e-04] | [0.00e+00] | True | \"1:10235\" |\n", "| {} | [4.73e-01] | [0.00e+00] | True | \"1:10352\" |\n", "| {} | [9.93e-01] | [0.00e+00] | True | \"1:10616\" |\n", "| {} | [3.70e-03] | [0.00e+00] | True | \"1:10642\" |\n", "+----------+----------------+----------------+----------+-----------+\n", "showing top 5 rows" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "mt.rows().show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see that the `variantID` column has been added.\n", "\n", "## Step 6 Filter the genome data by imputation quality\n", "\n", "`mt` contains multi-allelic sites. First, let's check what fraction of the variants they make up. `info.DR2` is an array, and the cells below treat a site with exactly one DR2 value as bi-allelic and a site with more than one as multi-allelic." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Stage 
2:========================================================>(88 + 1) / 89]\r" ] }, { "data": { "text/plain": [ "(30361461, 2318)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mt1 = mt.filter_rows(hl.len(mt.info.DR2) == 1)\n", "mt1.count()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Stage 3:========================================================>(88 + 1) / 89]\r" ] }, { "data": { "text/plain": [ "(400038, 2318)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mtnot1 = mt.filter_rows(hl.len(mt.info.DR2) > 1)\n", "mtnot1.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "About 1% of the variants are multi-allelic sites (400,038 sites, against 30,361,461 with a single DR2 value). That is somewhat too many to ignore, but\n", "to keep this tutorial easy to follow we exclude them and use `mt1` from here on.\n", "The code below plots the distribution of `imputation quality` (DR2) across variants." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Stage 5:=======================================================> (86 + 3) / 89]\r" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": "(function(root) {\n function embed_document(root) {\n \n var docs_json = {\"f46ca76d-ded3-445a-9323-97dcb0efeed2\":{\"roots\":{\"references\":[{\"attributes\":{\"background_fill_color\":{\"value\":\"#EEEEEE\"},\"below\":[{\"id\":\"1013\",\"type\":\"LinearAxis\"}],\"center\":[{\"id\":\"1017\",\"type\":\"Grid\"},{\"id\":\"1022\",\"type\":\"Grid\"},{\"id\":\"1047\",\"type\":\"Legend\"}],\"left\":[{\"id\":\"1018\",\"type\":\"LinearAxis\"}],\"renderers\":[{\"id\":\"1039\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"1003\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"1029\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"1005\",\"type\":\"Range1d\"},\"x_scale\":{\"id\":\"1009\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"1007\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"1011\",\"type\":\"LinearScale\"}},\"id\":\"1002\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"text\":\"Imputation Quality 
Histogram\"},\"id\":\"1003\",\"type\":\"Title\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"1046\",\"type\":\"BoxAnnotation\"},{\"attributes\":{},\"id\":\"1055\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"ticker\":{\"id\":\"1014\",\"type\":\"BasicTicker\"}},\"id\":\"1017\",\"type\":\"Grid\"},{\"attributes\":{\"axis_label\":\"Frequency\",\"formatter\":{\"id\":\"1043\",\"type\":\"BasicTickFormatter\"},\"ticker\":{\"id\":\"1019\",\"type\":\"BasicTicker\"}},\"id\":\"1018\",\"type\":\"LinearAxis\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"1023\",\"type\":\"PanTool\"},{\"id\":\"1024\",\"type\":\"WheelZoomTool\"},{\"id\":\"1025\",\"type\":\"BoxZoomTool\"},{\"id\":\"1026\",\"type\":\"SaveTool\"},{\"id\":\"1027\",\"type\":\"ResetTool\"},{\"id\":\"1028\",\"type\":\"HelpTool\"}]},\"id\":\"1029\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"1045\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"bottom\":{\"value\":0},\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"left\":{\"field\":\"left\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"right\":{\"field\":\"right\"},\"top\":{\"field\":\"top\"}},\"id\":\"1038\",\"type\":\"Quad\"},{\"attributes\":{\"callback\":null,\"data\":{\"left\":[0.0,0.02,0.04,0.06,0.08,0.1,0.12,0.14,0.16,0.18,0.2,0.22,0.24,0.26,0.28,0.3,0.32,0.34,0.36,0.38,0.4,0.42,0.44,0.46,0.48,0.5,0.52,0.54,0.56,0.58,0.6,0.62,0.64,0.66,0.68,0.7000000000000001,0.72,0.74,0.76,0.78,0.8,0.8200000000000001,0.84,0.86,0.88,0.9,0.92,0.9400000000000001,0.96,0.98],\"right\":[0.0
2,0.04,0.06,0.08,0.1,0.12,0.14,0.16,0.18,0.2,0.22,0.24,0.26,0.28,0.3,0.32,0.34,0.36,0.38,0.4,0.42,0.44,0.46,0.48,0.5,0.52,0.54,0.56,0.58,0.6,0.62,0.64,0.66,0.68,0.7000000000000001,0.72,0.74,0.76,0.78,0.8,0.8200000000000001,0.84,0.86,0.88,0.9,0.92,0.9400000000000001,0.96,0.98,1.0],\"top\":[302645,87138,39984,24774,17669,14078,11715,10189,8994,8473,7857,7520,7210,7129,7052,7193,7123,7341,7643,8115,8287,9142,9797,10472,11199,12510,13470,14678,25373,9480,20803,23810,26766,31082,35999,42087,51078,59763,71678,88037,110636,147021,183848,248242,351023,528217,1437629,808281,2622925,22748286]},\"selected\":{\"id\":\"1054\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"1055\",\"type\":\"UnionRenderers\"}},\"id\":\"1036\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"overlay\":{\"id\":\"1046\",\"type\":\"BoxAnnotation\"}},\"id\":\"1025\",\"type\":\"BoxZoomTool\"},{\"attributes\":{},\"id\":\"1024\",\"type\":\"WheelZoomTool\"},{\"attributes\":{},\"id\":\"1028\",\"type\":\"HelpTool\"},{\"attributes\":{},\"id\":\"1019\",\"type\":\"BasicTicker\"},{\"attributes\":{\"source\":{\"id\":\"1036\",\"type\":\"ColumnDataSource\"}},\"id\":\"1040\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"1023\",\"type\":\"PanTool\"},{\"attributes\":{\"data_source\":{\"id\":\"1036\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"1037\",\"type\":\"Quad\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"1038\",\"type\":\"Quad\"},\"selection_glyph\":null,\"view\":{\"id\":\"1040\",\"type\":\"CDSView\"}},\"id\":\"1039\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"label\":{\"value\":\"Imputation Quality 
(DR2)\"},\"renderers\":[{\"id\":\"1039\",\"type\":\"GlyphRenderer\"}]},\"id\":\"1048\",\"type\":\"LegendItem\"},{\"attributes\":{\"bottom\":{\"value\":0},\"fill_color\":{\"value\":\"#1f77b4\"},\"left\":{\"field\":\"left\"},\"right\":{\"field\":\"right\"},\"top\":{\"field\":\"top\"}},\"id\":\"1037\",\"type\":\"Quad\"},{\"attributes\":{},\"id\":\"1014\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"1054\",\"type\":\"Selection\"},{\"attributes\":{\"callback\":null,\"end\":1.05,\"start\":-0.05},\"id\":\"1005\",\"type\":\"Range1d\"},{\"attributes\":{\"dimension\":1,\"ticker\":{\"id\":\"1019\",\"type\":\"BasicTicker\"}},\"id\":\"1022\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"1011\",\"type\":\"LinearScale\"},{\"attributes\":{\"items\":[{\"id\":\"1048\",\"type\":\"LegendItem\"}]},\"id\":\"1047\",\"type\":\"Legend\"},{\"attributes\":{},\"id\":\"1027\",\"type\":\"ResetTool\"},{\"attributes\":{},\"id\":\"1026\",\"type\":\"SaveTool\"},{\"attributes\":{},\"id\":\"1009\",\"type\":\"LinearScale\"},{\"attributes\":{\"callback\":null},\"id\":\"1007\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"1043\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"axis_label\":\"Imputation Quality (DR2)\",\"formatter\":{\"id\":\"1045\",\"type\":\"BasicTickFormatter\"},\"ticker\":{\"id\":\"1014\",\"type\":\"BasicTicker\"}},\"id\":\"1013\",\"type\":\"LinearAxis\"}],\"root_ids\":[\"1002\"]},\"title\":\"Bokeh Application\",\"version\":\"1.4.0\"}};\n var render_items = [{\"docid\":\"f46ca76d-ded3-445a-9323-97dcb0efeed2\",\"roots\":{\"1002\":\"f5f34881-713a-4e1c-8112-f56b61ceacf2\"}}];\n root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n\n }\n if (root.Bokeh !== undefined) {\n embed_document(root);\n } else {\n var attempts = 0;\n var timer = setInterval(function(root) {\n if (root.Bokeh !== undefined) {\n clearInterval(timer);\n embed_document(root);\n } else {\n attempts++;\n if (attempts > 100) {\n clearInterval(timer);\n console.log(\"Bokeh: ERROR: Unable 
to run BokehJS code because BokehJS library is missing\");\n }\n }\n }, 10, root)\n }\n})(window);", "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "1002" } }, "output_type": "display_data" } ], "source": [ "p = hl.plot.histogram(mt1.info.DR2.first(), title='Imputation Quality Histogram', legend='Imputation Quality (DR2)')\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The histogram shows that a small fraction of the variants have low `imputation quality`.\n", "The code below excludes the variants with low `imputation quality` (DR2 < 0.3)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "mt1_filt = mt1.filter_rows(mt1.info.DR2.first()>=0.3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below plots the distribution after excluding the variants with low `imputation quality`." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Stage 15:======================================================> (87 + 2) / 89]\r" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": "(function(root) {\n function embed_document(root) {\n \n var docs_json = {\"e18fa5cf-fb1d-4729-bb34-544da530e316\":{\"roots\":{\"references\":[{\"attributes\":{\"background_fill_color\":{\"value\":\"#EEEEEE\"},\"below\":[{\"id\":\"1346\",\"type\":\"LinearAxis\"}],\"center\":[{\"id\":\"1350\",\"type\":\"Grid\"},{\"id\":\"1355\",\"type\":\"Grid\"},{\"id\":\"1380\",\"type\":\"Legend\"}],\"left\":[{\"id\":\"1351\",\"type\":\"LinearAxis\"}],\"renderers\":[{\"id\":\"1372\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"1336\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"1362\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"1338\",\"type\":\"Range1d\"},\"x_scale\":{\"id\":\"1342\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"1340\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"1344\",\"type\":\"LinearScale\"}},\"id\":\"1335\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{},\"id\":\"1357\",\"type\":\"WheelZoomTool\"},{\"attributes\":{\"dimension\":1,\"ticker\":{\"id\":\"1352\",\"type\":\"BasicTicker\"}},\"id\":\"1355\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"1352\",\"type\":\"BasicTicker\"},{\"attributes\":{\"callback\":null,\"end\":1.035,\"start\":0.265},\"id\":\"1338\",\"type\":\"Range1d\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"1379\",\"type\":\"BoxAnnotation\"},{\"attributes\":{},\"id\":\"1412\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"source\":{\"id\":\"1369\",\"type\":\"ColumnDataSource\"}},\"id\":\"1373\",\"type\":\"CDSView\"},{\"attributes\":{\"callback\":null,\"data\":{\"left\":[0.3,0.314,0.32799999999999996,0.341999999
99999997,0.356,0.37,0.384,0.39799999999999996,0.412,0.426,0.43999999999999995,0.45399999999999996,0.46799999999999997,0.482,0.496,0.51,0.524,0.538,0.552,0.566,0.58,0.594,0.6079999999999999,0.6219999999999999,0.6359999999999999,0.6499999999999999,0.6639999999999999,0.6779999999999999,0.692,0.706,0.72,0.734,0.748,0.762,0.776,0.7899999999999999,0.804,0.8179999999999998,0.8319999999999999,0.8459999999999999,0.8599999999999999,0.8739999999999999,0.8879999999999999,0.9019999999999999,0.9159999999999999,0.9299999999999999,0.944,0.958,0.972,0.986],\"right\":[0.314,0.32799999999999996,0.34199999999999997,0.356,0.37,0.384,0.39799999999999996,0.412,0.426,0.43999999999999995,0.45399999999999996,0.46799999999999997,0.482,0.496,0.51,0.524,0.538,0.552,0.566,0.58,0.594,0.6079999999999999,0.6219999999999999,0.6359999999999999,0.6499999999999999,0.6639999999999999,0.6779999999999999,0.692,0.706,0.72,0.734,0.748,0.762,0.776,0.7899999999999999,0.804,0.8179999999999998,0.8319999999999999,0.8459999999999999,0.8599999999999999,0.8739999999999999,0.8879999999999999,0.9019999999999999,0.9159999999999999,0.9299999999999999,0.944,0.958,0.972,0.986,1.0],\"top\":[7193,3450,7295,3719,3814,7774,4170,8287,4485,4657,9797,5149,10795,5727,6086,12973,6921,14678,7845,8379,18629,10138,22315,12160,12745,29179,15924,35999,20108,21979,51078,28798,64806,37837,41724,98332,58617,147021,86192,97656,248242,159438,431301,288501,363224,1074405,808281,2622925,2257643,20490643]},\"selected\":{\"id\":\"1411\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"1412\",\"type\":\"UnionRenderers\"}},\"id\":\"1369\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"1344\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"1359\",\"type\":\"SaveTool\"},{\"attributes\":{},\"id\":\"1347\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"1342\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"1361\",\"type\":\"HelpTool\"},{\"attributes\":{\"items\":[{\"id\":\"1381\",\"type\":\"LegendItem\"}]},\"id\"
:\"1380\",\"type\":\"Legend\"},{\"attributes\":{},\"id\":\"1356\",\"type\":\"PanTool\"},{\"attributes\":{\"axis_label\":\"Frequency\",\"formatter\":{\"id\":\"1376\",\"type\":\"BasicTickFormatter\"},\"ticker\":{\"id\":\"1352\",\"type\":\"BasicTicker\"}},\"id\":\"1351\",\"type\":\"LinearAxis\"},{\"attributes\":{\"callback\":null},\"id\":\"1340\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"1360\",\"type\":\"ResetTool\"},{\"attributes\":{},\"id\":\"1378\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"1356\",\"type\":\"PanTool\"},{\"id\":\"1357\",\"type\":\"WheelZoomTool\"},{\"id\":\"1358\",\"type\":\"BoxZoomTool\"},{\"id\":\"1359\",\"type\":\"SaveTool\"},{\"id\":\"1360\",\"type\":\"ResetTool\"},{\"id\":\"1361\",\"type\":\"HelpTool\"}]},\"id\":\"1362\",\"type\":\"Toolbar\"},{\"attributes\":{\"bottom\":{\"value\":0},\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"left\":{\"field\":\"left\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"right\":{\"field\":\"right\"},\"top\":{\"field\":\"top\"}},\"id\":\"1371\",\"type\":\"Quad\"},{\"attributes\":{},\"id\":\"1376\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"text\":\"Imputation Quality 
Histogram\"},\"id\":\"1336\",\"type\":\"Title\"},{\"attributes\":{\"ticker\":{\"id\":\"1347\",\"type\":\"BasicTicker\"}},\"id\":\"1350\",\"type\":\"Grid\"},{\"attributes\":{\"data_source\":{\"id\":\"1369\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"1370\",\"type\":\"Quad\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"1371\",\"type\":\"Quad\"},\"selection_glyph\":null,\"view\":{\"id\":\"1373\",\"type\":\"CDSView\"}},\"id\":\"1372\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"bottom\":{\"value\":0},\"fill_color\":{\"value\":\"#1f77b4\"},\"left\":{\"field\":\"left\"},\"right\":{\"field\":\"right\"},\"top\":{\"field\":\"top\"}},\"id\":\"1370\",\"type\":\"Quad\"},{\"attributes\":{\"axis_label\":\"Imputation Quality (DR2)\",\"formatter\":{\"id\":\"1378\",\"type\":\"BasicTickFormatter\"},\"ticker\":{\"id\":\"1347\",\"type\":\"BasicTicker\"}},\"id\":\"1346\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"1411\",\"type\":\"Selection\"},{\"attributes\":{\"overlay\":{\"id\":\"1379\",\"type\":\"BoxAnnotation\"}},\"id\":\"1358\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"label\":{\"value\":\"Imputation Quality (DR2)\"},\"renderers\":[{\"id\":\"1372\",\"type\":\"GlyphRenderer\"}]},\"id\":\"1381\",\"type\":\"LegendItem\"}],\"root_ids\":[\"1335\"]},\"title\":\"Bokeh Application\",\"version\":\"1.4.0\"}};\n var render_items = [{\"docid\":\"e18fa5cf-fb1d-4729-bb34-544da530e316\",\"roots\":{\"1335\":\"a3cf6ace-9ca3-4a0b-886b-3978bbb3968f\"}}];\n root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n\n }\n if (root.Bokeh !== undefined) {\n embed_document(root);\n } else {\n var attempts = 0;\n var timer = setInterval(function(root) {\n if (root.Bokeh !== undefined) {\n clearInterval(timer);\n embed_document(root);\n } else {\n attempts++;\n if (attempts > 100) {\n clearInterval(timer);\n console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n }\n }\n }, 10, root)\n }\n})(window);", 
"application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "1335" } }, "output_type": "display_data" } ], "source": [ "p = hl.plot.histogram(mt1_filt.info.DR2.first(), title='Imputation Quality Histogram', legend='Imputation Quality (DR2)')\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see that the variants with low `imputation quality` have been removed.\n", "\n", "The code below shows the number of variants remaining after excluding those with low `imputation quality`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Stage 10:=======================================================>(88 + 1) / 89]\r" ] }, { "data": { "text/plain": [ "(29799034, 2318)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mt1_filt.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output is (29799034, 2318).\n", "\n", "This means the following:\n", "\n", "- Number of study participants: 2318\n", "- Number of variants: 29799034 (562,427 fewer than the 30,361,461 in `mt1`)\n", "\n", "The code below plots the distribution of allele frequency after excluding the variants with low `imputation quality`." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Stage 12:======================================================> (87 + 2) / 89]\r" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": "(function(root) {\n function embed_document(root) {\n \n var docs_json = {\"bc394151-a0f4-42d2-819f-cc1cbef8b28a\":{\"roots\":{\"references\":[{\"attributes\":{\"background_fill_color\":{\"value\":\"#EEEEEE\"},\"below\":[{\"id\":\"1227\",\"type\":\"LinearAxis\"}],\"center\":[{\"id\":\"1231\",\"type\":\"Grid\"},{\"id\":\"1236\",\"type\":\"Grid\"},{\"id\":\"1261\",\"type\":\"Legend\"}],\"left\":[{\"id\":\"1232\",\"type\":\"LinearAxis\"}],\"renderers\":[{\"id\":\"1253\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"1217\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"1243\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"1219\",\"type\":\"Range1d\"},\"x_scale\":{\"id\":\"1223\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"1221\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"1225\",\"type\":\"LinearScale\"}},\"id\":\"1216\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"callback\":null},\"id\":\"1221\",\"type\":\"DataRange1d\"},{\"attributes\":{\"text\":\"AF 
Histogram\"},\"id\":\"1217\",\"type\":\"Title\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"1237\",\"type\":\"PanTool\"},{\"id\":\"1238\",\"type\":\"WheelZoomTool\"},{\"id\":\"1239\",\"type\":\"BoxZoomTool\"},{\"id\":\"1240\",\"type\":\"SaveTool\"},{\"id\":\"1241\",\"type\":\"ResetTool\"},{\"id\":\"1242\",\"type\":\"HelpTool\"}]},\"id\":\"1243\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"1242\",\"type\":\"HelpTool\"},{\"attributes\":{\"ticker\":{\"id\":\"1228\",\"type\":\"BasicTicker\"}},\"id\":\"1231\",\"type\":\"Grid\"},{\"attributes\":{\"bottom\":{\"value\":0},\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"left\":{\"field\":\"left\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"right\":{\"field\":\"right\"},\"top\":{\"field\":\"top\"}},\"id\":\"1252\",\"type\":\"Quad\"},{\"attributes\":{},\"id\":\"1238\",\"type\":\"WheelZoomTool\"},{\"attributes\":{},\"id\":\"1257\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"bottom\":{\"value\":0},\"fill_color\":{\"value\":\"#1f77b4\"},\"left\":{\"field\":\"left\"},\"right\":{\"field\":\"right\"},\"top\":{\"field\":\"top\"}},\"id\":\"1251\",\"type\":\"Quad\"},{\"attributes\":{},\"id\":\"1228\",\"type\":\"BasicTicker\"},{\"attributes\":{\"source\":{\"id\":\"1250\",\"type\":\"ColumnDataSource\"}},\"id\":\"1254\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"1241\",\"type\":\"ResetTool\"},{\"attributes\":{},\"id\":\"1240\",\"type\":\"SaveTool\"},{\"attributes\":{\"overlay\":{\"id\":\"1260\",\"type\":\"BoxAnnotation\"}},\"id\":\"1239\",\"type\":\"BoxZoomTool\"},{\"attributes\":{},\"id\":\"1233\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"1237\",\"type\":\"PanTool\"},{\"attributes\":{},\"id\":\"1284\",\"type\":\"Selection\"},{\"attributes\":{\"dimension\":1,\"ticker\":{\"id\":\"1233\",\"type\":\"BasicTicker\"}},\"id\":\"1236\",\"type\
":\"Grid\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"1260\",\"type\":\"BoxAnnotation\"},{\"attributes\":{\"data_source\":{\"id\":\"1250\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"1251\",\"type\":\"Quad\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"1252\",\"type\":\"Quad\"},\"selection_glyph\":null,\"view\":{\"id\":\"1254\",\"type\":\"CDSView\"}},\"id\":\"1253\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"callback\":null,\"data\":{\"left\":[0.0001,0.020096,0.040092,0.060088,0.080084,0.10008,0.120076,0.14007199999999997,0.160068,0.180064,0.20006,0.22005599999999997,0.240052,0.260048,0.28004399999999996,0.30004,0.320036,0.340032,0.360028,0.380024,0.40002,0.420016,0.44001199999999996,0.460008,0.480004,0.5,0.519996,0.539992,0.5599879999999999,0.5799839999999999,0.59998,0.619976,0.639972,0.659968,0.679964,0.69996,0.719956,0.7399519999999999,0.759948,0.779944,0.79994,0.819936,0.839932,0.859928,0.8799239999999999,0.8999199999999999,0.919916,0.939912,0.959908,0.979904],\"right\":[0.020096,0.040092,0.060088,0.080084,0.10008,0.120076,0.14007199999999997,0.160068,0.180064,0.20006,0.22005599999999997,0.240052,0.260048,0.28004399999999996,0.30004,0.320036,0.340032,0.360028,0.380024,0.40002,0.420016,0.44001199999999996,0.460008,0.480004,0.5,0.519996,0.539992,0.5599879999999999,0.5799839999999999,0.59998,0.619976,0.639972,0.659968,0.679964,0.69996,0.719956,0.7399519999999999,0.759948,0.779944,0.79994,0.819936,0.839932,0.859928,0.8799239999999999,0.8999199999999999,0.919916,0.939912,0.959908,0.979904,0.9999],\"top\":[19451551,1998917,931919,624032,495749,423101,367289,326247,295403,272867,250246,231280,218225,201908,1
91004,180445,173293,165175,155806,148756,141476,133405,130365,124383,118158,114720,109084,105588,102635,100241,95274,89428,88768,84366,83331,80505,76582,75163,71371,70826,68438,64924,64122,59869,58445,55240,56630,60339,81626,130519]},\"selected\":{\"id\":\"1284\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"1285\",\"type\":\"UnionRenderers\"}},\"id\":\"1250\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"1285\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"axis_label\":\"Frequency\",\"formatter\":{\"id\":\"1257\",\"type\":\"BasicTickFormatter\"},\"ticker\":{\"id\":\"1233\",\"type\":\"BasicTicker\"}},\"id\":\"1232\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"1259\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"axis_label\":\"AF\",\"formatter\":{\"id\":\"1259\",\"type\":\"BasicTickFormatter\"},\"ticker\":{\"id\":\"1228\",\"type\":\"BasicTicker\"}},\"id\":\"1227\",\"type\":\"LinearAxis\"},{\"attributes\":{\"label\":{\"value\":\"AF\"},\"renderers\":[{\"id\":\"1253\",\"type\":\"GlyphRenderer\"}]},\"id\":\"1262\",\"type\":\"LegendItem\"},{\"attributes\":{\"callback\":null,\"end\":1.04989,\"start\":-0.049890000000000004},\"id\":\"1219\",\"type\":\"Range1d\"},{\"attributes\":{\"items\":[{\"id\":\"1262\",\"type\":\"LegendItem\"}]},\"id\":\"1261\",\"type\":\"Legend\"},{\"attributes\":{},\"id\":\"1225\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"1223\",\"type\":\"LinearScale\"}],\"root_ids\":[\"1216\"]},\"title\":\"Bokeh Application\",\"version\":\"1.4.0\"}};\n var render_items = [{\"docid\":\"bc394151-a0f4-42d2-819f-cc1cbef8b28a\",\"roots\":{\"1216\":\"aa3ed18e-e9c9-4ab0-a113-78630b650da6\"}}];\n root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n\n }\n if (root.Bokeh !== undefined) {\n embed_document(root);\n } else {\n var attempts = 0;\n var timer = setInterval(function(root) {\n if (root.Bokeh !== undefined) {\n clearInterval(timer);\n embed_document(root);\n } else {\n attempts++;\n if (attempts > 100) 
{\n clearInterval(timer);\n console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n }\n }\n }, 10, root)\n }\n})(window);", "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "1216" } }, "output_type": "display_data" } ], "source": [ "p = hl.plot.histogram(mt1_filt.info.AF.first(), title='AF Histogram', legend='AF', bins=50)\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see that many variants are rare (AF < 1%); the lowest histogram bin, AF below about 2%, alone holds about 19.5 million of the 29.8 million variants." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 7 Save the filtered genomic data\n", "The code below saves the merged and filtered genomic data to `outputs/chrAll.filtered.mt`.\n", "Some of the cells up to this point take a long time to run.\n", "Saving here gives you a checkpoint, so you can resume later without spending that time again." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-17 18:33:43.273 Hail: INFO: wrote matrix table with 29799034 rows and 2318 columns in 89 partitions to outputs/chrAll.filtered.mt\n" ] } ], "source": [ "mt1_filt.write('outputs/chrAll.filtered.mt', overwrite=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 8 Load the genomic data\n", "The code below loads the genomic data saved in Step 7." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mt1_filt = hl.read_matrix_table('outputs/chrAll.filtered.mt')" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "合計 14G\r\n", "drwxr-xr-x 26 kozonishida-pg oo-nig-pg 12K 11月 17 14:14 .\r\n", "drwxr-xr-x 5 kozonishida-pg oo-nig-pg 4.0K 11月 17 16:56 ..\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 6.7K 11月 14 14:44 chr1.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 914M 11月 14 14:44 chr1.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 207K 11月 14 14:44 chr1.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 16M 11月 14 14:46 
chr1.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 88M 11月 14 14:46 chr1.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 15:18 chr1.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 4.5K 11月 14 14:44 chr10.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 574M 11月 14 14:45 chr10.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 121K 11月 14 14:44 chr10.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 10M 11月 14 14:46 chr10.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 59M 11月 14 14:46 chr10.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 17:08 chr10.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 4.5K 11月 14 14:44 chr11.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 554M 11月 14 14:45 chr11.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 121K 11月 14 14:44 chr11.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 9.7M 11月 14 14:46 chr11.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 57M 11月 14 14:46 chr11.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 17:20 chr11.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 4.6K 11月 14 14:44 chr12.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 546M 11月 14 14:45 chr12.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 121K 11月 14 14:44 chr12.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 9.4M 11月 14 14:46 chr12.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 55M 11月 14 14:46 chr12.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 17:32 chr12.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 3.8K 11月 14 14:44 chr13.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 405M 11月 14 14:45 chr13.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 88K 11月 14 14:44 chr13.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 6.9M 11月 14 14:46 chr13.conform-gt.log\r\n", 
"-rw-r--r-- 1 kozonishida-pg oo-nig-pg 41M 11月 14 14:46 chr13.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 17:43 chr13.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 3.9K 11月 14 14:44 chr14.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 387M 11月 14 14:45 chr14.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 82K 11月 14 14:44 chr14.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 6.4M 11月 14 14:46 chr14.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 38M 11月 14 14:46 chr14.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 17:54 chr14.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 3.9K 11月 14 14:44 chr15.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 347M 11月 14 14:45 chr15.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 76K 11月 14 14:44 chr15.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 6.0M 11月 14 14:46 chr15.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 36M 11月 14 14:46 chr15.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 18:04 chr15.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 3.8K 11月 14 14:44 chr16.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 374M 11月 14 14:45 chr16.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 74K 11月 14 14:44 chr16.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 6.5M 11月 14 14:46 chr16.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 40M 11月 14 14:46 chr16.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 18:14 chr16.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 3.9K 11月 14 14:44 chr17.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 327M 11月 14 14:45 chr17.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 72K 11月 14 14:44 chr17.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 5.7M 11月 14 14:46 chr17.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg 
oo-nig-pg 34M 11月 14 14:46 chr17.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 18:23 chr17.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 3.9K 11月 14 14:44 chr18.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 322M 11月 14 14:45 chr18.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 70K 11月 14 14:44 chr18.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 5.7M 11月 14 14:46 chr18.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 35M 11月 14 14:46 chr18.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 18:32 chr18.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 3.2K 11月 14 14:44 chr19.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 284M 11月 14 14:45 chr19.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 51K 11月 14 14:44 chr19.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 4.2M 11月 14 14:46 chr19.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 26M 11月 14 14:46 chr19.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 18:43 chr19.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 6.6K 11月 14 14:44 chr2.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 957M 11月 14 14:44 chr2.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 219K 11月 14 14:44 chr2.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 16M 11月 14 14:46 chr2.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 93M 11月 14 14:46 chr2.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 15:32 chr2.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 3.1K 11月 14 14:44 chr20.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 249M 11月 14 14:45 chr20.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 55K 11月 14 14:44 chr20.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 4.7M 11月 14 14:46 chr20.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 29M 11月 14 14:46 
chr20.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 18:54 chr20.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 2.4K 11月 14 14:44 chr21.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 174M 11月 14 14:45 chr21.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 33K 11月 14 14:44 chr21.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 2.7M 11月 14 14:46 chr21.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 17M 11月 14 14:46 chr21.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 19:06 chr21.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 2.4K 11月 14 14:44 chr22.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 185M 11月 14 14:45 chr22.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 33K 11月 14 14:44 chr22.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 2.9M 11月 14 14:46 chr22.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 19M 11月 14 14:46 chr22.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 19:19 chr22.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 5.2K 11月 14 14:44 chr3.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 793M 11月 14 14:44 chr3.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 180K 11月 14 14:44 chr3.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 14M 11月 14 14:46 chr3.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 79M 11月 14 14:46 chr3.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 15:44 chr3.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 5.2K 11月 14 14:44 chr4.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 837M 11月 14 14:44 chr4.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 174K 11月 14 14:44 chr4.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 13M 11月 14 14:46 chr4.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 74M 11月 14 14:46 chr4.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 
kozonishida-pg oo-nig-pg 4.0K 11月 14 15:55 chr4.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 5.3K 11月 14 14:44 chr5.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 708M 11月 14 14:44 chr5.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 163K 11月 14 14:44 chr5.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 12M 11月 14 14:46 chr5.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 68M 11月 14 14:46 chr5.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 16:09 chr5.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 5.2K 11月 14 14:44 chr6.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 752M 11月 14 14:45 chr6.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 155K 11月 14 14:44 chr6.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 13M 11月 14 14:46 chr6.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 73M 11月 14 14:46 chr6.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 16:20 chr6.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 4.6K 11月 14 14:44 chr7.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 684M 11月 14 14:45 chr7.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 144K 11月 14 14:44 chr7.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 11M 11月 14 14:46 chr7.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 63M 11月 14 14:46 chr7.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 16:32 chr7.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 4.5K 11月 14 14:44 chr8.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 619M 11月 14 14:45 chr8.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 132K 11月 14 14:44 chr8.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 11M 11月 14 14:46 chr8.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 61M 11月 14 14:46 chr8.conform-gt.vcf.gz\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "drwxr-xr-x 8 
kozonishida-pg oo-nig-pg 4.0K 11月 14 16:46 chr8.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 4.5K 11月 14 14:44 chr9.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 496M 11月 14 14:45 chr9.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 110K 11月 14 14:44 chr9.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 8.5M 11月 14 14:46 chr9.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 52M 11月 14 14:46 chr9.conform-gt.vcf.gz\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 14 16:56 chr9.mt\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 17 15:29 chrAll.filtered.mt\r\n", "drwxr-xr-x 8 kozonishida-pg oo-nig-pg 4.0K 11月 15 20:03 chrAll.mt\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 1.8K 11月 14 14:44 chrX_PAR1.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 43M 11月 14 14:45 chrX_PAR1.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 1.8K 11月 14 14:44 chrX_PAR1.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 123K 11月 14 14:46 chrX_PAR1.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 971K 11月 14 14:46 chrX_PAR1.conform-gt.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 1.8K 11月 14 14:44 chrX_PAR2.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 2.1M 11月 14 14:46 chrX_PAR2.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 697 11月 14 14:44 chrX_PAR2.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 11K 11月 14 14:46 chrX_PAR2.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 71K 11月 14 14:46 chrX_PAR2.conform-gt.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 4.7K 11月 14 14:44 chrX_nonPAR.beagle.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 479M 11月 14 14:46 chrX_nonPAR.beagle.vcf.gz\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 134K 11月 14 14:44 chrX_nonPAR.beagle.vcf.gz.tbi\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 4.3M 11月 14 14:46 chrX_nonPAR.conform-gt.log\r\n", "-rw-r--r-- 1 kozonishida-pg oo-nig-pg 27M 11月 14 14:46 
chrX_nonPAR.conform-gt.vcf.gz\r\n" ] } ], "source": [ "!ls -alh outputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 9-1 Load the PRS models (PGS002724, PGS002725)\n", "The cells below download the two scoring files from the PGS Catalog and load the data from `prs-models/PGS002724.txt` and `prs-models/PGS002725.txt`." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2022-11-17 14:37:13-- https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS002724/ScoringFiles/PGS002724.txt.gz\n", "ftp.ebi.ac.uk (ftp.ebi.ac.uk) をDNSに問いあわせています... 193.62.193.138\n", "ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.193.138|:443 に接続しています... 接続しました。\n", "HTTP による接続要求を送信しました、応答を待っています... 200 OK\n", "長さ: 16818717 (16M) [application/x-gzip]\n", "`PGS002724.txt.gz' に保存中\n", "\n", "PGS002724.txt.gz 100%[===================>] 16.04M 7.16MB/s in 2.2s \n", "\n", "2022-11-17 14:37:16 (7.16 MB/s) - `PGS002724.txt.gz' へ保存完了 [16818717/16818717]\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\n", "[Stage 59:===============> (24 + 8) / 89]\r" ] } ], "source": [ "!wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS002724/ScoringFiles/PGS002724.txt.gz" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2022-11-17 14:37:19-- https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS002725/ScoringFiles/PGS002725.txt.gz\n", "ftp.ebi.ac.uk (ftp.ebi.ac.uk) をDNSに問いあわせています... 193.62.193.138\n", "ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.193.138|:443 に接続しています... 接続しました。\n", "HTTP による接続要求を送信しました、応答を待っています... 
200 OK\n", "長さ: 81992802 (78M) [application/x-gzip]\n", "`PGS002725.txt.gz' に保存中\n", "\n", "PGS002725.txt.gz 99%[==================> ] 77.85M 1009KB/s eta 0s " ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\n", "[Stage 59:===============> (24 + 8) / 89]\r" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r\n", "PGS002725.txt.gz 99%[==================> ] 78.12M 1.02MB/s eta 0s \r\n", "PGS002725.txt.gz 100%[===================>] 78.19M 1.04MB/s in 58s \r\n", "\r\n", "2022-11-17 14:38:18 (1.34 MB/s) - `PGS002725.txt.gz' へ保存完了 [81992802/81992802]\r\n", "\r\n" ] } ], "source": [ "!wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS002725/ScoringFiles/PGS002725.txt.gz" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "!gunzip PGS002724.txt.gz" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "!gunzip PGS002725.txt.gz" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PGS002724.txt\r\n", "PGS002725.txt\r\n", "Tutorial_2022_11_15.ipynb\r\n", "hail-20221111-1605-0.2.105-acd89e80c345.log\r\n", "hail-20221114-1432-0.2.105-acd89e80c345.log\r\n", "outputs\r\n", "prs-models\r\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\n", "[Stage 59:===============> (25 + 8) / 89]\r" ] } ], "source": [ "!ls" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\r\n", "[Stage 59:===============> (24 + 8) / 89]\r" ] } ], "source": [ "!mkdir prs-models" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\r\n", "[Stage 59:================> (26 + 8) / 89]\r" ] } ], "source": [ "!mv PGS002724.txt prs-models/" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ 
"!mv PGS002725.txt prs-models/" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PGS002724.txt PGS002725.txt\r\n" ] } ], "source": [ "!ls prs-models" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "###PGS CATALOG SCORING FILE - see https://www.pgscatalog.org/downloads/#dl_ftp_scoring for additional information\r\n", "#format_version=2.0\r\n", "##POLYGENIC SCORE (PGS) INFORMATION\r\n", "#pgs_id=PGS002724\r\n", "#pgs_name=GIGASTROKE_iPGS_EUR\r\n", "#trait_reported=Ischemic stroke\r\n", "#trait_mapped=stroke|Ischemic stroke\r\n", "#trait_efo=EFO_0000712|HP_0002140\r\n", "#genome_build=GRCh37\r\n", "#variants_number=1213574\r\n", "#weight_type=NR\r\n", "##SOURCE INFORMATION\r\n", "#pgp_id=PGP000333\r\n", "#citation=Mishra A et al. Nature (2022). doi:10.1038/s41586-022-05165-3\r\n", "chr_name\tchr_position\teffect_allele\tother_allele\teffect_weight\r\n", "1\t752721\tG\tA\t50.2009138795063\r\n", "1\t754182\tG\tA\t141.073654032741\r\n", "1\t760912\tT\tC\t180.556536852976\r\n", "1\t768448\tA\tG\t-74.6438253333578\r\n", "1\t779322\tG\tA\t-137.02495892717\r\n", "1\t838555\tA\tC\t-128.635559661359\r\n", "1\t846808\tT\tC\t-103.154370468722\r\n", "1\t853954\tA\tC\t-128.44445344713\r\n", "1\t854250\tG\tA\t-67.2278476132285\r\n", "1\t861808\tG\tA\t327.215410265433\r\n", "1\t863124\tT\tG\t403.185761038646\r\n", "1\t864938\tA\tG\t-173.628855846674\r\n", "1\t870645\tC\tT\t-80.5551733833478\r\n", "1\t873558\tT\tG\t-261.898526036879\r\n", "1\t879317\tT\tC\t14.4910134312575\r\n" ] } ], "source": [ "!head prs-models/PGS002724.txt -n 30" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 12:56:23.980 Hail: INFO: Reading table to impute column types\n", "2022-11-23 12:56:31.052 Hail: INFO: Finished type imputation (0 + 1) 
/ 1]\n", " Loading field 'chr_name' as type int32 (imputed)\n", " Loading field 'chr_position' as type int32 (imputed)\n", " Loading field 'effect_allele' as type str (imputed)\n", " Loading field 'other_allele' as type str (imputed)\n", " Loading field 'effect_weight' as type float64 (imputed)\n" ] } ], "source": [ "model_PGS002724 = hl.import_table('prs-models/PGS002724.txt', impute=True, force=True, comment='#')" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 12:57:00.648 Hail: INFO: Reading table to impute column types\n", "2022-11-23 12:57:16.612 Hail: INFO: Finished type imputation (1 + 1) / 2]\n", " Loading field 'chr_name' as type int32 (imputed)\n", " Loading field 'chr_position' as type int32 (imputed)\n", " Loading field 'effect_allele' as type str (imputed)\n", " Loading field 'other_allele' as type str (imputed)\n", " Loading field 'effect_weight' as type float64 (imputed)\n" ] } ], "source": [ "model_PGS002725 = hl.import_table('prs-models/PGS002725.txt', impute=True, force=True, comment='#')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below displays the number of variants contained in each PRS model." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\r\n", "[Stage 23:> (0 + 1) / 1]\r" ] }, { "data": { "text/plain": [ "1213574" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_PGS002724.count()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\r\n", "[Stage 24:> (0 + 2) / 2]\r" ] }, { "data": { "text/plain": [ "6010730" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_PGS002725.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The counts are `1213574` and `6010730`.\n", "That is, PRS models PGS002724 and PGS002725 contain 1213574 and 6010730 variants, respectively.\n", "\n", "The code below displays the first 5 rows of each loaded PRS model." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
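<table><thead>
<tr><th>chr_name</th><th>chr_position</th><th>effect_allele</th><th>other_allele</th><th>effect_weight</th></tr>
<tr><td>int32</td><td>int32</td><td>str</td><td>str</td><td>float64</td></tr>
</thead><tbody>
<tr><td>1</td><td>752721</td><td>&quot;G&quot;</td><td>&quot;A&quot;</td><td>5.02e+01</td></tr>
<tr><td>1</td><td>754182</td><td>&quot;G&quot;</td><td>&quot;A&quot;</td><td>1.41e+02</td></tr>
<tr><td>1</td><td>760912</td><td>&quot;T&quot;</td><td>&quot;C&quot;</td><td>1.81e+02</td></tr>
<tr><td>1</td><td>768448</td><td>&quot;A&quot;</td><td>&quot;G&quot;</td><td>-7.46e+01</td></tr>
<tr><td>1</td><td>779322</td><td>&quot;G&quot;</td><td>&quot;A&quot;</td><td>-1.37e+02</td></tr>
</tbody></table>
<p>showing top 5 rows</p>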
\n" ], "text/plain": [ "+----------+--------------+---------------+--------------+---------------+\n", "| chr_name | chr_position | effect_allele | other_allele | effect_weight |\n", "+----------+--------------+---------------+--------------+---------------+\n", "| int32 | int32 | str | str | float64 |\n", "+----------+--------------+---------------+--------------+---------------+\n", "| 1 | 752721 | \"G\" | \"A\" | 5.02e+01 |\n", "| 1 | 754182 | \"G\" | \"A\" | 1.41e+02 |\n", "| 1 | 760912 | \"T\" | \"C\" | 1.81e+02 |\n", "| 1 | 768448 | \"A\" | \"G\" | -7.46e+01 |\n", "| 1 | 779322 | \"G\" | \"A\" | -1.37e+02 |\n", "+----------+--------------+---------------+--------------+---------------+\n", "showing top 5 rows" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model_PGS002724.show(5)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
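<table><thead>
<tr><th>chr_name</th><th>chr_position</th><th>effect_allele</th><th>other_allele</th><th>effect_weight</th></tr>
<tr><td>int32</td><td>int32</td><td>str</td><td>str</td><td>float64</td></tr>
</thead><tbody>
<tr><td>1</td><td>711310</td><td>&quot;A&quot;</td><td>&quot;G&quot;</td><td>-5.64e-07</td></tr>
<tr><td>1</td><td>731718</td><td>&quot;C&quot;</td><td>&quot;T&quot;</td><td>-3.64e-06</td></tr>
<tr><td>1</td><td>732032</td><td>&quot;C&quot;</td><td>&quot;A&quot;</td><td>-3.54e-06</td></tr>
<tr><td>1</td><td>734349</td><td>&quot;C&quot;</td><td>&quot;T&quot;</td><td>-3.11e-06</td></tr>
<tr><td>1</td><td>742990</td><td>&quot;T&quot;</td><td>&quot;C&quot;</td><td>-5.43e-06</td></tr>
</tbody></table>
<p>showing top 5 rows</p>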
\n" ], "text/plain": [ "+----------+--------------+---------------+--------------+---------------+\n", "| chr_name | chr_position | effect_allele | other_allele | effect_weight |\n", "+----------+--------------+---------------+--------------+---------------+\n", "| int32 | int32 | str | str | float64 |\n", "+----------+--------------+---------------+--------------+---------------+\n", "| 1 | 711310 | \"A\" | \"G\" | -5.64e-07 |\n", "| 1 | 731718 | \"C\" | \"T\" | -3.64e-06 |\n", "| 1 | 732032 | \"C\" | \"A\" | -3.54e-06 |\n", "| 1 | 734349 | \"C\" | \"T\" | -3.11e-06 |\n", "| 1 | 742990 | \"T\" | \"C\" | -5.43e-06 |\n", "+----------+--------------+---------------+--------------+---------------+\n", "showing top 5 rows" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model_PGS002725.show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below adds a `variantID` column of the form `chr_name:chr_position` to each loaded PRS model." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "model_PGS002724 = model_PGS002724.annotate(\n", " variantID = hl.str(model_PGS002724.chr_name) + \":\" + hl.str(model_PGS002724.chr_position) \n", ")" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "model_PGS002725 = model_PGS002725.annotate(\n", " variantID = hl.str(model_PGS002725.chr_name) + \":\" + hl.str(model_PGS002725.chr_position) \n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below displays the first 5 rows after `variantID` has been added." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
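<table><thead>
<tr><th>chr_name</th><th>chr_position</th><th>effect_allele</th><th>other_allele</th><th>effect_weight</th><th>variantID</th></tr>
<tr><td>int32</td><td>int32</td><td>str</td><td>str</td><td>float64</td><td>str</td></tr>
</thead><tbody>
<tr><td>1</td><td>752721</td><td>&quot;G&quot;</td><td>&quot;A&quot;</td><td>5.02e+01</td><td>&quot;1:752721&quot;</td></tr>
<tr><td>1</td><td>754182</td><td>&quot;G&quot;</td><td>&quot;A&quot;</td><td>1.41e+02</td><td>&quot;1:754182&quot;</td></tr>
<tr><td>1</td><td>760912</td><td>&quot;T&quot;</td><td>&quot;C&quot;</td><td>1.81e+02</td><td>&quot;1:760912&quot;</td></tr>
<tr><td>1</td><td>768448</td><td>&quot;A&quot;</td><td>&quot;G&quot;</td><td>-7.46e+01</td><td>&quot;1:768448&quot;</td></tr>
<tr><td>1</td><td>779322</td><td>&quot;G&quot;</td><td>&quot;A&quot;</td><td>-1.37e+02</td><td>&quot;1:779322&quot;</td></tr>
</tbody></table>
<p>showing top 5 rows</p>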
\n" ], "text/plain": [ "+----------+--------------+---------------+--------------+---------------+\n", "| chr_name | chr_position | effect_allele | other_allele | effect_weight |\n", "+----------+--------------+---------------+--------------+---------------+\n", "| int32 | int32 | str | str | float64 |\n", "+----------+--------------+---------------+--------------+---------------+\n", "| 1 | 752721 | \"G\" | \"A\" | 5.02e+01 |\n", "| 1 | 754182 | \"G\" | \"A\" | 1.41e+02 |\n", "| 1 | 760912 | \"T\" | \"C\" | 1.81e+02 |\n", "| 1 | 768448 | \"A\" | \"G\" | -7.46e+01 |\n", "| 1 | 779322 | \"G\" | \"A\" | -1.37e+02 |\n", "+----------+--------------+---------------+--------------+---------------+\n", "\n", "+------------+\n", "| variantID |\n", "+------------+\n", "| str |\n", "+------------+\n", "| \"1:752721\" |\n", "| \"1:754182\" |\n", "| \"1:760912\" |\n", "| \"1:768448\" |\n", "| \"1:779322\" |\n", "+------------+\n", "showing top 5 rows" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model_PGS002724.show(5)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
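<table><thead>
<tr><th>chr_name</th><th>chr_position</th><th>effect_allele</th><th>other_allele</th><th>effect_weight</th><th>variantID</th></tr>
<tr><td>int32</td><td>int32</td><td>str</td><td>str</td><td>float64</td><td>str</td></tr>
</thead><tbody>
<tr><td>1</td><td>711310</td><td>&quot;A&quot;</td><td>&quot;G&quot;</td><td>-5.64e-07</td><td>&quot;1:711310&quot;</td></tr>
<tr><td>1</td><td>731718</td><td>&quot;C&quot;</td><td>&quot;T&quot;</td><td>-3.64e-06</td><td>&quot;1:731718&quot;</td></tr>
<tr><td>1</td><td>732032</td><td>&quot;C&quot;</td><td>&quot;A&quot;</td><td>-3.54e-06</td><td>&quot;1:732032&quot;</td></tr>
<tr><td>1</td><td>734349</td><td>&quot;C&quot;</td><td>&quot;T&quot;</td><td>-3.11e-06</td><td>&quot;1:734349&quot;</td></tr>
<tr><td>1</td><td>742990</td><td>&quot;T&quot;</td><td>&quot;C&quot;</td><td>-5.43e-06</td><td>&quot;1:742990&quot;</td></tr>
</tbody></table>
<p>showing top 5 rows</p>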
\n" ], "text/plain": [ "+----------+--------------+---------------+--------------+---------------+\n", "| chr_name | chr_position | effect_allele | other_allele | effect_weight |\n", "+----------+--------------+---------------+--------------+---------------+\n", "| int32 | int32 | str | str | float64 |\n", "+----------+--------------+---------------+--------------+---------------+\n", "| 1 | 711310 | \"A\" | \"G\" | -5.64e-07 |\n", "| 1 | 731718 | \"C\" | \"T\" | -3.64e-06 |\n", "| 1 | 732032 | \"C\" | \"A\" | -3.54e-06 |\n", "| 1 | 734349 | \"C\" | \"T\" | -3.11e-06 |\n", "| 1 | 742990 | \"T\" | \"C\" | -5.43e-06 |\n", "+----------+--------------+---------------+--------------+---------------+\n", "\n", "+------------+\n", "| variantID |\n", "+------------+\n", "| str |\n", "+------------+\n", "| \"1:711310\" |\n", "| \"1:731718\" |\n", "| \"1:732032\" |\n", "| \"1:734349\" |\n", "| \"1:742990\" |\n", "+------------+\n", "showing top 5 rows" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model_PGS002725.show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `variantID` column has been added." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 9-2 Extract the variants shared by the genome data and the PRS models\n", "The code below keys each PRS model table by `variantID` so that its variant information can be looked up by that ID." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "model_PGS002724 = model_PGS002724.key_by('variantID')" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "model_PGS002725 = model_PGS002725.key_by('variantID')" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\r\n", "[Stage 33:===========================================> (3 + 1) / 4]\r" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
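<table><thead>
<tr><td colspan=5></td><th colspan=3>info</th><td></td></tr>
<tr><th>locus</th><th>alleles</th><th>rsid</th><th>qual</th><th>filters</th><th>AF</th><th>DR2</th><th>IMP</th><th>variantID</th></tr>
<tr><td>locus&lt;GRCh37&gt;</td><td>array&lt;str&gt;</td><td>str</td><td>float64</td><td>set&lt;str&gt;</td><td>array&lt;float64&gt;</td><td>array&lt;float64&gt;</td><td>bool</td><td>str</td></tr>
</thead><tbody>
<tr><td>1:534247</td><td>[&quot;C&quot;,&quot;T&quot;]</td><td>&quot;rs201475892&quot;</td><td>-1.00e+01</td><td>{}</td><td>[7.80e-03]</td><td>[1.00e+00]</td><td>False</td><td>&quot;1:534247&quot;</td></tr>
<tr><td>1:565286</td><td>[&quot;C&quot;,&quot;T&quot;]</td><td>&quot;rs1578391&quot;</td><td>-1.00e+01</td><td>{}</td><td>[9.93e-01]</td><td>[1.00e+00]</td><td>False</td><td>&quot;1:565286&quot;</td></tr>
<tr><td>1:674211</td><td>[&quot;C&quot;,&quot;T&quot;]</td><td>&quot;rs546906063&quot;</td><td>-1.00e+01</td><td>{}</td><td>[2.25e-02]</td><td>[4.40e-01]</td><td>True</td><td>&quot;1:674211&quot;</td></tr>
<tr><td>1:701299</td><td>[&quot;A&quot;,&quot;G&quot;]</td><td>&quot;rs553919012&quot;</td><td>-1.00e+01</td><td>{}</td><td>[2.57e-02]</td><td>[7.20e-01]</td><td>True</td><td>&quot;1:701299&quot;</td></tr>
<tr><td>1:701625</td><td>[&quot;T&quot;,&quot;C&quot;]</td><td>&quot;rs576411494&quot;</td><td>-1.00e+01</td><td>{}</td><td>[2.40e-03]</td><td>[5.60e-01]</td><td>True</td><td>&quot;1:701625&quot;</td></tr>
</tbody></table>
<p>showing top 5 rows</p>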
\n" ], "text/plain": [ "+---------------+------------+---------------+-----------+----------+\n", "| locus | alleles | rsid | qual | filters |\n", "+---------------+------------+---------------+-----------+----------+\n", "| locus<GRCh37> | array<str> | str | float64 | set<str> |\n", "+---------------+------------+---------------+-----------+----------+\n", "| 1:534247 | [\"C\",\"T\"] | \"rs201475892\" | -1.00e+01 | {} |\n", "| 1:565286 | [\"C\",\"T\"] | \"rs1578391\" | -1.00e+01 | {} |\n", "| 1:674211 | [\"C\",\"T\"] | \"rs546906063\" | -1.00e+01 | {} |\n", "| 1:701299 | [\"A\",\"G\"] | \"rs553919012\" | -1.00e+01 | {} |\n", "| 1:701625 | [\"T\",\"C\"] | \"rs576411494\" | -1.00e+01 | {} |\n", "+---------------+------------+---------------+-----------+----------+\n", "\n", "+----------------+----------------+----------+------------+\n", "| info.AF | info.DR2 | info.IMP | variantID |\n", "+----------------+----------------+----------+------------+\n", "| array<float64> | array<float64> | bool | str |\n", "+----------------+----------------+----------+------------+\n", "| [7.80e-03] | [1.00e+00] | False | \"1:534247\" |\n", "| [9.93e-01] | [1.00e+00] | False | \"1:565286\" |\n", "| [2.25e-02] | [4.40e-01] | True | \"1:674211\" |\n", "| [2.57e-02] | [7.20e-01] | True | \"1:701299\" |\n", "| [2.40e-03] | [5.60e-01] | True | \"1:701625\" |\n", "+----------------+----------------+----------+------------+\n", "showing top 5 rows" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "mt1_filt.rows().show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below extracts the variants shared by the genome data and each PRS model: it joins the model fields onto the rows of `mt1_filt` by `variantID`, then keeps only the rows where `effect_weight` is defined." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "mt_match_PGS002724 = mt1_filt.annotate_rows(**model_PGS002724[mt1_filt.variantID])\n", "mt_match_PGS002724 = mt_match_PGS002724.filter_rows(hl.is_defined(mt_match_PGS002724.effect_weight))" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ 
"mt_match_PGS002725 = mt1_filt.annotate_rows(**model_PGS002725[mt1_filt.variantID])\n", "mt_match_PGS002725 = mt_match_PGS002725.filter_rows(hl.is_defined(mt_match_PGS002725.effect_weight))" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 13:05:08.949 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 13:05:16.082 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 13:11:19.282 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "[Stage 47:> (0 + 1) / 1]\r" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
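<table><thead>
<tr><td colspan=5></td><th colspan=3>info</th><td colspan=6></td></tr>
<tr><th>locus</th><th>alleles</th><th>rsid</th><th>qual</th><th>filters</th><th>AF</th><th>DR2</th><th>IMP</th><th>variantID</th><th>chr_name</th><th>chr_position</th><th>effect_allele</th><th>other_allele</th><th>effect_weight</th></tr>
<tr><td>locus&lt;GRCh37&gt;</td><td>array&lt;str&gt;</td><td>str</td><td>float64</td><td>set&lt;str&gt;</td><td>array&lt;float64&gt;</td><td>array&lt;float64&gt;</td><td>bool</td><td>str</td><td>int32</td><td>int32</td><td>str</td><td>str</td><td>float64</td></tr>
</thead><tbody>
<tr><td>1:752721</td><td>[&quot;A&quot;,&quot;G&quot;]</td><td>&quot;rs3131972&quot;</td><td>-1.00e+01</td><td>{}</td><td>[6.73e-01]</td><td>[1.00e+00]</td><td>False</td><td>&quot;1:752721&quot;</td><td>1</td><td>752721</td><td>&quot;G&quot;</td><td>&quot;A&quot;</td><td>5.02e+01</td></tr>
<tr><td>1:754182</td><td>[&quot;A&quot;,&quot;G&quot;]</td><td>&quot;rs3131969&quot;</td><td>-1.00e+01</td><td>{}</td><td>[7.01e-01]</td><td>[9.90e-01]</td><td>True</td><td>&quot;1:754182&quot;</td><td>1</td><td>754182</td><td>&quot;G&quot;</td><td>&quot;A&quot;</td><td>1.41e+02</td></tr>
<tr><td>1:760912</td><td>[&quot;C&quot;,&quot;T&quot;]</td><td>&quot;rs1048488&quot;</td><td>-1.00e+01</td><td>{}</td><td>[7.49e-01]</td><td>[9.80e-01]</td><td>True</td><td>&quot;1:760912&quot;</td><td>1</td><td>760912</td><td>&quot;T&quot;</td><td>&quot;C&quot;</td><td>1.81e+02</td></tr>
<tr><td>1:768448</td><td>[&quot;G&quot;,&quot;A&quot;]</td><td>&quot;rs12562034&quot;</td><td>-1.00e+01</td><td>{}</td><td>[1.57e-01]</td><td>[1.00e+00]</td><td>False</td><td>&quot;1:768448&quot;</td><td>1</td><td>768448</td><td>&quot;A&quot;</td><td>&quot;G&quot;</td><td>-7.46e+01</td></tr>
<tr><td>1:779322</td><td>[&quot;A&quot;,&quot;G&quot;]</td><td>&quot;rs4040617&quot;</td><td>-1.00e+01</td><td>{}</td><td>[2.11e-01]</td><td>[1.00e+00]</td><td>False</td><td>&quot;1:779322&quot;</td><td>1</td><td>779322</td><td>&quot;G&quot;</td><td>&quot;A&quot;</td><td>-1.37e+02</td></tr>
</tbody></table>
<p>showing top 5 rows</p>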
\n" ], "text/plain": [ "+---------------+------------+--------------+-----------+----------+\n", "| locus | alleles | rsid | qual | filters |\n", "+---------------+------------+--------------+-----------+----------+\n", "| locus<GRCh37> | array<str> | str | float64 | set<str> |\n", "+---------------+------------+--------------+-----------+----------+\n", "| 1:752721 | [\"A\",\"G\"] | \"rs3131972\" | -1.00e+01 | {} |\n", "| 1:754182 | [\"A\",\"G\"] | \"rs3131969\" | -1.00e+01 | {} |\n", "| 1:760912 | [\"C\",\"T\"] | \"rs1048488\" | -1.00e+01 | {} |\n", "| 1:768448 | [\"G\",\"A\"] | \"rs12562034\" | -1.00e+01 | {} |\n", "| 1:779322 | [\"A\",\"G\"] | \"rs4040617\" | -1.00e+01 | {} |\n", "+---------------+------------+--------------+-----------+----------+\n", "\n", "+----------------+----------------+----------+------------+----------+\n", "| info.AF | info.DR2 | info.IMP | variantID | chr_name |\n", "+----------------+----------------+----------+------------+----------+\n", "| array<float64> | array<float64> | bool | str | int32 |\n", "+----------------+----------------+----------+------------+----------+\n", "| [6.73e-01] | [1.00e+00] | False | \"1:752721\" | 1 |\n", "| [7.01e-01] | [9.90e-01] | True | \"1:754182\" | 1 |\n", "| [7.49e-01] | [9.80e-01] | True | \"1:760912\" | 1 |\n", "| [1.57e-01] | [1.00e+00] | False | \"1:768448\" | 1 |\n", "| [2.11e-01] | [1.00e+00] | False | \"1:779322\" | 1 |\n", "+----------------+----------------+----------+------------+----------+\n", "\n", "+--------------+---------------+--------------+---------------+\n", "| chr_position | effect_allele | other_allele | effect_weight |\n", "+--------------+---------------+--------------+---------------+\n", "| int32 | str | str | float64 |\n", "+--------------+---------------+--------------+---------------+\n", "| 752721 | \"G\" | \"A\" | 5.02e+01 |\n", "| 754182 | \"G\" | \"A\" | 1.41e+02 |\n", "| 760912 | \"T\" | \"C\" | 1.81e+02 |\n", "| 768448 | \"A\" | \"G\" | -7.46e+01 |\n", "| 779322 | \"G\" | \"A\" | -1.37e+02 |\n", 
"+--------------+---------------+--------------+---------------+\n", "showing top 5 rows" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "mt_match_PGS002724.rows().show(5)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\r\n", "[Stage 48:> (0 + 1) / 1]\r" ] }, { "data": { "text/plain": [ "99864" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_PGS002724.filter(model_PGS002724.chr_name==1).count()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Stage 66:=============================> (1 + 1) / 2]\r" ] }, { "data": { "text/plain": [ "475080" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_PGS002725.filter(model_PGS002725.chr_name==1).count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To shorten the run time of the following steps, we save the extracted genome data here as a second checkpoint.\n", "\n", "The code below writes the extracted genome data to `outputs/chrAll.filtered.matched_PGS002724.mt` and `outputs/chrAll.filtered.matched_PGS002725.mt`.\n" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-21 16:25:48.517 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-21 16:25:53.704 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-21 16:34:47.212 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-21 16:51:37.284 Hail: INFO: wrote matrix table with 1211967 rows and 2318 columns in 89 partitions to outputs/chrAll.filtered.matched_PGS002724.mt\n" ] } ], "source": [ "mt_match_PGS002724.write('outputs/chrAll.filtered.matched_PGS002724.mt', overwrite=True)\n" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": 
"stream", "text": [ "2022-11-21 21:56:42.035 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-21 21:56:54.561 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-21 22:13:20.912 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-21 22:52:14.176 Hail: INFO: wrote matrix table with 5930286 rows and 2318 columns in 89 partitions to outputs/chrAll.filtered.matched_PGS002725.mt\n" ] } ], "source": [ "mt_match_PGS002725.write('outputs/chrAll.filtered.matched_PGS002725.mt', overwrite=True)\n" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 13:20:39.449 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 13:20:51.962 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 13:26:39.054 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "[Stage 57:=============================> (1 + 1) / 2]\r" ] }, { "data": { "text/plain": [ "1211916" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(dict(mt_match_PGS002724.aggregate_rows(hl.agg.counter(mt_match_PGS002724.variantID))))" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 14:05:53.594 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 14:06:12.039 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 14:14:15.707 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "[Stage 67:=============================> (1 + 1) / 2]\r" ] }, { "data": { "text/plain": [ "5929754" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(dict(mt_match_PGS002725.aggregate_rows(hl.agg.counter(mt_match_PGS002725.variantID))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 9-3 Compute and save the PRS\n", "The code below compares the alleles in the genome data with the alleles in the PRS model and records the orientation in `flip`: `True` when the effect allele is the reference allele (`alleles[0]`), `False` when it is the alternate allele (`alleles[1]`), and missing when the alleles match in neither orientation." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "flip_PGS002724 = hl.case().when( \n", " (mt_match_PGS002724.effect_allele == mt_match_PGS002724.alleles[0])\n", " & (mt_match_PGS002724.other_allele == mt_match_PGS002724.alleles[1]), True ).when( \n", " (mt_match_PGS002724.effect_allele == mt_match_PGS002724.alleles[1])\n", " & (mt_match_PGS002724.other_allele == mt_match_PGS002724.alleles[0]), False ).or_missing()" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "flip_PGS002725 = hl.case().when( \n", " (mt_match_PGS002725.effect_allele == mt_match_PGS002725.alleles[0]) \n", " & (mt_match_PGS002725.other_allele == mt_match_PGS002725.alleles[1]), True ).when( \n", " (mt_match_PGS002725.effect_allele == mt_match_PGS002725.alleles[1])\n", " & (mt_match_PGS002725.other_allele == mt_match_PGS002725.alleles[0]), False ).or_missing()" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "mt_match_PGS002724 = mt_match_PGS002724.annotate_rows(flip=flip_PGS002724)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "mt_match_PGS002725 = mt_match_PGS002725.annotate_rows(flip=flip_PGS002725)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each variant, the code below multiplies the participant's allele dosage (`mt_match_PGS002724.DS`) by the variant weight (`mt_match_PGS002724.effect_weight`) and sums the products over the variants shared by the genome data and the PRS model. When `flip` is `True`, the effect-allele dosage is taken as `2 - DS`, because `DS` counts the alternate allele. The resulting sum is the PRS." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "prs_PGS002724 = hl.agg.sum(hl.float64(mt_match_PGS002724.effect_weight) * \n", " hl.if_else( mt_match_PGS002724.flip, \n", " 2 - mt_match_PGS002724.DS.first(),\n", " mt_match_PGS002724.DS.first()))" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "mt_match_PGS002724 = 
mt_match_PGS002724.annotate_cols(prs=prs_PGS002724)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's do the same for PGS002725." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "prs_PGS002725 = hl.agg.sum(hl.float64(mt_match_PGS002725.effect_weight) * \n", " hl.if_else( mt_match_PGS002725.flip, \n", " 2 - mt_match_PGS002725.DS.first(),\n", " mt_match_PGS002725.DS.first()))" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "mt_match_PGS002725 = mt_match_PGS002725.annotate_cols(prs=prs_PGS002725)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below displays the PRS values." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 16:24:35.335 Hail: WARN: cols(): Resulting column table is sorted by 'col_key'.\n", " To preserve matrix table column order, first unkey columns with 'key_cols_by()'\n", "2022-11-23 16:29:55.700 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 16:30:04.670 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 16:36:22.299 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "[Stage 76:=======================================================>(88 + 1) / 89]\r" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ], "text/plain": [ "+------------+-----------+\n", "| s | prs |\n", "+------------+-----------+\n", "| str | float64 |\n", "+------------+-----------+\n", "| \"_HG00096\" | -2.35e+06 |\n", "| \"_HG00097\" | -2.39e+06 |\n", "| \"_HG00098\" | -2.27e+06 |\n", "| \"_HG00099\" | -2.39e+06 |\n", "| \"_HG00100\" | -2.25e+06 |\n", "+------------+-----------+\n", "showing top 5 rows" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "mt_match_PGS002724.cols().show(5)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 16:53:43.532 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 16:54:00.362 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 17:01:50.156 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "[Stage 87:=======================================================>(88 + 1) / 89]\r" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ], "text/plain": [ "+------------+-----------+\n", "| s | prs |\n", "+------------+-----------+\n", "| str | float64 |\n", "+------------+-----------+\n", "| \"_HG00096\" | -1.73e+00 |\n", "| \"_HG00097\" | -1.48e+00 |\n", "| \"_HG00098\" | -1.82e+00 |\n", "| \"_HG00099\" | -2.21e+00 |\n", "| \"_HG00100\" | -1.47e+00 |\n", "+------------+-----------+\n", "showing top 5 rows" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "mt_match_PGS002725.cols().show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below displays the distribution of the PRS." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 17:18:12.181 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 17:18:20.554 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 17:23:52.106 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 17:36:20.617 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 17:36:26.549 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 17:42:27.683 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "[Stage 109:======================================================>(88 + 1) / 89]\r" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": "(function(root) {\n function embed_document(root) {\n \n var docs_json = {\"b09b200d-9f59-4d77-86a5-21b106ff622d\":{\"roots\":{\"references\":[{\"attributes\":{\"background_fill_color\":{\"value\":\"#EEEEEE\"},\"below\":[{\"id\":\"1473\",\"type\":\"LinearAxis\"}],\"center\":[{\"id\":\"1477\",\"type\":\"Grid\"},{\"id\":\"1482\",\"type\":\"Grid\"},{\"id\":\"1507\",\"type\":\"Legend\"}],\"left\":[{\"id\":\"1478\",\"type\":\"LinearAxis\"}],\"renderers\":[{\"id\":\"1499\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"1463\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"1489\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"1465\",\"type\":\"Range1d\"},\"x_scale\":{\"id\":\"1469\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"1467\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"1471\",\"type\":\"LinearScale\"}},\"id\":\"1462\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{},\"id\":\"1503\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"items\":[{\"id\":\"1508\",\"type\":\"LegendItem\"}]},\"id\":\"1507\",\"type\":\"Legend\"},{\"attributes\":{\"source\":{\"id\":\"1496\",\"type\":\"ColumnDataSource\"}},\"id\":\"1500\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"1474\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"1505\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"axis_label\":\"PGS002724\",\"formatter\":{\"id\":\"1505\",\"type\":\"BasicTickFormatter\"},\"ticker\":{\"id\":\"1474\",\"type\":\"BasicTicker\"}},\"id\":\"1473\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"1546\",\"type\":\"Selection\"},{\"attributes\":{\"overlay\":{\"id\":\"1506\",\"type\":\"BoxAnnotation\"}},\"id\":\"1485\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"1483\",\"type\":\"PanTool\"},{\"id\":\"1484\",\"type\":\"Wheel
ZoomTool\"},{\"id\":\"1485\",\"type\":\"BoxZoomTool\"},{\"id\":\"1486\",\"type\":\"SaveTool\"},{\"id\":\"1487\",\"type\":\"ResetTool\"},{\"id\":\"1488\",\"type\":\"HelpTool\"}]},\"id\":\"1489\",\"type\":\"Toolbar\"},{\"attributes\":{\"ticker\":{\"id\":\"1474\",\"type\":\"BasicTicker\"}},\"id\":\"1477\",\"type\":\"Grid\"},{\"attributes\":{\"text\":\"PRS Histogram\"},\"id\":\"1463\",\"type\":\"Title\"},{\"attributes\":{},\"id\":\"1483\",\"type\":\"PanTool\"},{\"attributes\":{\"dimension\":1,\"ticker\":{\"id\":\"1479\",\"type\":\"BasicTicker\"}},\"id\":\"1482\",\"type\":\"Grid\"},{\"attributes\":{\"data_source\":{\"id\":\"1496\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"1497\",\"type\":\"Quad\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"1498\",\"type\":\"Quad\"},\"selection_glyph\":null,\"view\":{\"id\":\"1500\",\"type\":\"CDSView\"}},\"id\":\"1499\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"axis_label\":\"Frequency\",\"formatter\":{\"id\":\"1503\",\"type\":\"BasicTickFormatter\"},\"ticker\":{\"id\":\"1479\",\"type\":\"BasicTicker\"}},\"id\":\"1478\",\"type\":\"LinearAxis\"},{\"attributes\":{\"callback\":null},\"id\":\"1467\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"1486\",\"type\":\"SaveTool\"},{\"attributes\":{},\"id\":\"1488\",\"type\":\"HelpTool\"},{\"attributes\":{},\"id\":\"1479\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"1484\",\"type\":\"WheelZoomTool\"},{\"attributes\":{},\"id\":\"1469\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"1471\",\"type\":\"LinearScale\"},{\"attributes\":{\"label\":{\"value\":\"PGS002724\"},\"renderers\":[{\"id\":\"1499\",\"type\":\"GlyphRenderer\"}]},\"id\":\"1508\",\"type\":\"LegendItem\"},{\"attributes\":{\"bottom\":{\"value\":0},\"fill_color\":{\"value\":\"#1f77b4\"},\"left\":{\"field\":\"left\"},\"right\":{\"field\":\"right\"},\"top\":{\"field\":\"top\"}},\"id\":\"1497\",\"type\":\"Quad\"},{\"attributes\":{\"bottom\":{\"value\":0},\"fill_alpha\":{\"v
alue\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"left\":{\"field\":\"left\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"right\":{\"field\":\"right\"},\"top\":{\"field\":\"top\"}},\"id\":\"1498\",\"type\":\"Quad\"},{\"attributes\":{\"callback\":null,\"end\":247596.81219581296,\"start\":-3266276.4611581615},\"id\":\"1465\",\"type\":\"Range1d\"},{\"attributes\":{\"callback\":null,\"data\":{\"left\":[-3106554.948732981,-2946833.4363078005,-2787111.9238826195,-2627390.411457439,-2467668.8990322584,-2307947.386607078,-2148225.874181897,-1988504.3617567164,-1828782.8493315356,-1669061.3369063549,-1509339.8244811743,-1349618.3120559936,-1189896.7996308128,-1030175.2872056323,-870453.7747804518,-710732.2623552708,-551010.7499300903,-391289.23750490975,-231567.72507972876,-71846.21265454823],\"right\":[-2946833.4363078005,-2787111.9238826195,-2627390.411457439,-2467668.8990322584,-2307947.386607078,-2148225.874181897,-1988504.3617567164,-1828782.8493315356,-1669061.3369063549,-1509339.8244811743,-1349618.3120559936,-1189896.7996308128,-1030175.2872056323,-870453.7747804518,-710732.2623552708,-551010.7499300903,-391289.23750490975,-231567.72507972876,-71846.21265454823,87875.2997706323],\"top\":[3,11,34,100,134,166,144,154,133,97,79,117,174,237,255,237,142,67,28,6]},\"selected\":{\"id\":\"1546\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"1547\",\"type\":\"UnionRenderers\"}},\"id\":\"1496\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"1547\",\"type\":\"UnionRenderers\"},{\"attributes\":{},\"id\":\"1487\",\"type\":\"ResetTool\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"1506\",\"type\":\"BoxAnnotation
\"}],\"root_ids\":[\"1462\"]},\"title\":\"Bokeh Application\",\"version\":\"1.4.0\"}};\n var render_items = [{\"docid\":\"b09b200d-9f59-4d77-86a5-21b106ff622d\",\"roots\":{\"1462\":\"88bb8edf-d0f4-440f-b0cb-f609117a6a5f\"}}];\n root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n\n }\n if (root.Bokeh !== undefined) {\n embed_document(root);\n } else {\n var attempts = 0;\n var timer = setInterval(function(root) {\n if (root.Bokeh !== undefined) {\n clearInterval(timer);\n embed_document(root);\n } else {\n attempts++;\n if (attempts > 100) {\n clearInterval(timer);\n console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n }\n }\n }, 10, root)\n }\n})(window);", "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "1462" } }, "output_type": "display_data" } ], "source": [ "p = hl.plot.histogram(mt_match_PGS002724.prs, title=\"PRS Histogram\", legend=\"PGS002724\", bins=20)\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "下記のコードは、PRSの計算結果を chrAll.filtered.PGS002724.PRS.txt ファイルに保存します。" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 17:57:54.529 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 17:58:03.476 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 18:03:44.519 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 18:11:38.831 Hail: INFO: Coerced sorted dataset=======>(88 + 1) / 89]\n", "2022-11-23 18:11:40.349 Hail: INFO: merging 17 files totalling 47.5K...\n", "2022-11-23 18:11:40.442 Hail: INFO: while writing:\n", " chrAll.filtered.PGS002724.PRS.txt\n", " merge time: 91.440ms\n" ] } ], "source": [ "mt_match_PGS002724.cols().export('chrAll.filtered.PGS002724.PRS.txt')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PGS002725 
is processed in the same way below." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 18:34:36.295 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 18:34:57.684 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 18:43:45.743 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 18:59:19.136 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 18:59:31.370 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 19:08:03.664 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "[Stage 143:======================================================>(88 + 1) / 89]\r" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": "(function(root) {\n function embed_document(root) {\n \n var docs_json = {\"19d9d954-a6a6-440e-888f-c62666a04620\":{\"roots\":{\"references\":[{\"attributes\":{\"background_fill_color\":{\"value\":\"#EEEEEE\"},\"below\":[{\"id\":\"1608\",\"type\":\"LinearAxis\"}],\"center\":[{\"id\":\"1612\",\"type\":\"Grid\"},{\"id\":\"1617\",\"type\":\"Grid\"},{\"id\":\"1642\",\"type\":\"Legend\"}],\"left\":[{\"id\":\"1613\",\"type\":\"LinearAxis\"}],\"renderers\":[{\"id\":\"1634\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"1598\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"1624\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"1600\",\"type\":\"Range1d\"},\"x_scale\":{\"id\":\"1604\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"1602\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"1606\",\"type\":\"LinearScale\"}},\"id\":\"1597\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"source\":{\"id\":\"1631\",\"type\":\"ColumnDataSource\"}},\"id\":\"1635\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"1690\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"text\":\"PRS 
Histogram\"},\"id\":\"1598\",\"type\":\"Title\"},{\"attributes\":{},\"id\":\"1618\",\"type\":\"PanTool\"},{\"attributes\":{\"data_source\":{\"id\":\"1631\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"1632\",\"type\":\"Quad\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"1633\",\"type\":\"Quad\"},\"selection_glyph\":null,\"view\":{\"id\":\"1635\",\"type\":\"CDSView\"}},\"id\":\"1634\",\"type\":\"GlyphRenderer\"},{\"attributes\":{},\"id\":\"1609\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"1623\",\"type\":\"HelpTool\"},{\"attributes\":{},\"id\":\"1614\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"1619\",\"type\":\"WheelZoomTool\"},{\"attributes\":{},\"id\":\"1640\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"callback\":null},\"id\":\"1602\",\"type\":\"DataRange1d\"},{\"attributes\":{\"label\":{\"value\":\"PGS002725\"},\"renderers\":[{\"id\":\"1634\",\"type\":\"GlyphRenderer\"}]},\"id\":\"1643\",\"type\":\"LegendItem\"},{\"attributes\":{\"callback\":null,\"data\":{\"left\":[-3.0223283454621774,-2.8732396926687254,-2.724151039875273,-2.575062387081821,-2.4259737342883687,-2.276885081494916,-2.127796428701464,-1.9787077759080118,-1.8296191231145595,-1.6805304703211073,-1.531441817527655,-1.3823531647342029,-1.2332645119407506,-1.0841758591472983,-0.9350872063538462,-0.7859985535603937,-0.6369099007669417,-0.4878212479734896,-0.33873259518003707,-0.189643942386585],\"right\":[-2.8732396926687254,-2.724151039875273,-2.575062387081821,-2.4259737342883687,-2.276885081494916,-2.127796428701464,-1.9787077759080118,-1.8296191231145595,-1.6805304703211073,-1.531441817527655,-1.3823531647342029,-1.2332645119407506,-1.0841758591472983,-0.9350872063538462,-0.7859985535603937,-0.6369099007669417,-0.4878212479734896,-0.33873259518003707,-0.189643942386585,-0.04055528959313248],\"top\":[1,5,21,74,140,245,294,260,202,178,167,120,162,152,133,84,42,27,6,5]},\"selected\":{\"id\":\"1689\",\"type\":\"Selection\"},\"selection_
policy\":{\"id\":\"1690\",\"type\":\"UnionRenderers\"}},\"id\":\"1631\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"callback\":null,\"end\":0.10853336320031978,\"start\":-3.1714169982556295},\"id\":\"1600\",\"type\":\"Range1d\"},{\"attributes\":{},\"id\":\"1621\",\"type\":\"SaveTool\"},{\"attributes\":{\"items\":[{\"id\":\"1643\",\"type\":\"LegendItem\"}]},\"id\":\"1642\",\"type\":\"Legend\"},{\"attributes\":{},\"id\":\"1638\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"1618\",\"type\":\"PanTool\"},{\"id\":\"1619\",\"type\":\"WheelZoomTool\"},{\"id\":\"1620\",\"type\":\"BoxZoomTool\"},{\"id\":\"1621\",\"type\":\"SaveTool\"},{\"id\":\"1622\",\"type\":\"ResetTool\"},{\"id\":\"1623\",\"type\":\"HelpTool\"}]},\"id\":\"1624\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"1622\",\"type\":\"ResetTool\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"1641\",\"type\":\"BoxAnnotation\"},{\"attributes\":{\"axis_label\":\"Frequency\",\"formatter\":{\"id\":\"1638\",\"type\":\"BasicTickFormatter\"},\"ticker\":{\"id\":\"1614\",\"type\":\"BasicTicker\"}},\"id\":\"1613\",\"type\":\"LinearAxis\"},{\"attributes\":{\"axis_label\":\"PGS002725\",\"formatter\":{\"id\":\"1640\",\"type\":\"BasicTickFormatter\"},\"ticker\":{\"id\":\"1609\",\"type\":\"BasicTicker\"}},\"id\":\"1608\",\"type\":\"LinearAxis\"},{\"attributes\":{\"bottom\":{\"value\":0},\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"left\":{\"field\":\"left\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"
right\":{\"field\":\"right\"},\"top\":{\"field\":\"top\"}},\"id\":\"1633\",\"type\":\"Quad\"},{\"attributes\":{\"overlay\":{\"id\":\"1641\",\"type\":\"BoxAnnotation\"}},\"id\":\"1620\",\"type\":\"BoxZoomTool\"},{\"attributes\":{},\"id\":\"1689\",\"type\":\"Selection\"},{\"attributes\":{\"ticker\":{\"id\":\"1609\",\"type\":\"BasicTicker\"}},\"id\":\"1612\",\"type\":\"Grid\"},{\"attributes\":{\"bottom\":{\"value\":0},\"fill_color\":{\"value\":\"#1f77b4\"},\"left\":{\"field\":\"left\"},\"right\":{\"field\":\"right\"},\"top\":{\"field\":\"top\"}},\"id\":\"1632\",\"type\":\"Quad\"},{\"attributes\":{},\"id\":\"1604\",\"type\":\"LinearScale\"},{\"attributes\":{\"dimension\":1,\"ticker\":{\"id\":\"1614\",\"type\":\"BasicTicker\"}},\"id\":\"1617\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"1606\",\"type\":\"LinearScale\"}],\"root_ids\":[\"1597\"]},\"title\":\"Bokeh Application\",\"version\":\"1.4.0\"}};\n var render_items = [{\"docid\":\"19d9d954-a6a6-440e-888f-c62666a04620\",\"roots\":{\"1597\":\"388fdbbc-e4ba-42b5-8232-91c93ca5f8bb\"}}];\n root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n\n }\n if (root.Bokeh !== undefined) {\n embed_document(root);\n } else {\n var attempts = 0;\n var timer = setInterval(function(root) {\n if (root.Bokeh !== undefined) {\n clearInterval(timer);\n embed_document(root);\n } else {\n attempts++;\n if (attempts > 100) {\n clearInterval(timer);\n console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n }\n }\n }, 10, root)\n }\n})(window);", "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "1597" } }, "output_type": "display_data" } ], "source": [ "p = hl.plot.histogram(mt_match_PGS002725.prs, title=\"PRS Histogram\", legend=\"PGS002725\", bins=20)\n", "show(p)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 19:34:31.787 Hail: 
INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 19:34:49.417 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 19:43:15.359 Hail: INFO: Ordering unsorted dataset with network shuffle\n", "2022-11-23 19:54:13.539 Hail: INFO: Coerced sorted dataset========(89 + 0) / 89]\n", "2022-11-23 19:54:13.800 Hail: INFO: merging 17 files totalling 47.5K...\n", "2022-11-23 19:54:13.828 Hail: INFO: while writing:\n", " chrAll.filtered.PGS002725.PRS.txt\n", " merge time: 26.715ms\n" ] } ], "source": [ "mt_match_PGS002725.cols().export('chrAll.filtered.PGS002725.PRS.txt')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 10 Compare the PRS scores\n", "So far, we have computed PRS scores using two ischemic stroke PRS models.\n", "\n", "To conclude this tutorial, we compare those PRS scores.\n", "\n", "The code below loads the PRS scores computed with the PGS002724 model." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 20:19:37.892 Hail: INFO: Reading table to impute column types\n", "2022-11-23 20:19:38.596 Hail: INFO: Finished type imputation\n", " Loading field 's' as type str (imputed)\n", " Loading field 'prs' as type float64 (imputed)\n" ] } ], "source": [ "prs_PGS002724 = hl.import_table('chrAll.filtered.PGS002724.PRS.txt', impute=True, force=True)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 20:20:10.640 Hail: INFO: Reading table to impute column types\n", "2022-11-23 20:20:11.192 Hail: INFO: Finished type imputation\n", " Loading field 's' as type str (imputed)\n", " Loading field 'prs' as type float64 (imputed)\n" ] } ], "source": [ "prs_PGS002725 = hl.import_table('chrAll.filtered.PGS002725.PRS.txt', impute=True, force=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below keys the PRS scores by subject ID (field `s`) so that they can be looked up by subject." ] }, { "cell_type": "code", "execution_count": 48, 
"metadata": {}, "outputs": [], "source": [ "prs_PGS002724 = prs_PGS002724.key_by('s')" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "prs_PGS002725 = prs_PGS002725.key_by('s')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below joins the two sets of PRS scores on subject ID (field `s`) and merges them into a single table." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "prs_merge = prs_PGS002724.rename({'s':'subjectID', 'prs':'PGS002724'})" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "prs_merge = prs_merge.annotate(PGS002725 = prs_PGS002725[prs_merge.subjectID].prs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below displays the merged result." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 20:24:34.655 Hail: INFO: Coerced sorted dataset\n", "2022-11-23 20:24:35.058 Hail: INFO: Coerced sorted dataset\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ], "text/plain": [ "+------------+-----------+-----------+\n", "| subjectID | PGS002724 | PGS002725 |\n", "+------------+-----------+-----------+\n", "| str | float64 | float64 |\n", "+------------+-----------+-----------+\n", "| \"_HG00096\" | -2.35e+06 | -1.73e+00 |\n", "| \"_HG00097\" | -2.39e+06 | -1.48e+00 |\n", "| \"_HG00098\" | -2.27e+06 | -1.82e+00 |\n", "| \"_HG00099\" | -2.39e+06 | -2.21e+00 |\n", "| \"_HG00100\" | -2.25e+06 | -1.47e+00 |\n", "+------------+-----------+-----------+\n", "showing top 5 rows" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "prs_merge.show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below plots the `PGS002724` scores against the `PGS002725` scores." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 20:25:35.986 Hail: INFO: Coerced sorted dataset\n", "2022-11-23 20:25:36.310 Hail: INFO: Coerced sorted dataset\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": "(function(root) {\n function embed_document(root) {\n \n var docs_json = {\"f274ae56-0691-47c2-b06a-5054ee4d2863\":{\"roots\":{\"references\":[{\"attributes\":{\"below\":[{\"id\":\"1750\",\"type\":\"LinearAxis\"}],\"center\":[{\"id\":\"1754\",\"type\":\"Grid\"},{\"id\":\"1759\",\"type\":\"Grid\"}],\"left\":[{\"id\":\"1755\",\"type\":\"LinearAxis\"}],\"plot_height\":800,\"plot_width\":800,\"renderers\":[{\"id\":\"1779\",\"type\":\"GlyphRenderer\"}],\"title\":null,\"toolbar\":{\"id\":\"1766\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"1742\",\"type\":\"DataRange1d\"},\"x_scale\":{\"id\":\"1746\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"1744\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"1748\",\"type\":\"LinearScale\"}},\"id\":\"1740\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{},\"id\":\"1833\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"1746\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"1756\",\"type\":\"BasicTicker\"},{\"attributes\":{\"ticker\":{\"id\":\"1751\",\"type\":\"BasicTicker\"}},\"id\":\"1754\",\"type\":\"Grid\"},{\"attributes\":{\"callback\":null,\"tooltips\":[[\"x\",\"@x\"],[\"y\",\"@y\"]]},\"id\":\"1774\",\"type\":\"HoverTool\"},{\"attributes\":{},\"id\":\"1765\",\"type\":\"HelpTool\"},{\"attributes\":{\"callback\":null},\"id\":\"1742\",\"type\":\"DataRange1d\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"1778\",\"type\":\"Circle\"},{\"attributes\":{},\"id\":\"1835\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"axis_label\":\"PGS002725\",\"formatter\":{\"id\":\"1831\",\"type\":\"BasicTickFormatter\"},\"ticker\":{\"id\":\"1756\",\"type\":\"BasicTicker\"}},\"id\":\"1755\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"1831\",\"ty
pe\":\"BasicTickFormatter\"},{\"attributes\":{\"overlay\":{\"id\":\"1836\",\"type\":\"BoxAnnotation\"}},\"id\":\"1762\",\"type\":\"BoxZoomTool\"},{\"attributes\":{},\"id\":\"1763\",\"type\":\"SaveTool\"},{\"attributes\":{},\"id\":\"1748\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"1764\",\"type\":\"ResetTool\"},{\"attributes\":{\"fill_color\":{\"value\":\"#1f77b4\"},\"line_color\":{\"value\":\"#1f77b4\"},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"1777\",\"type\":\"Circle\"},{\"attributes\":{},\"id\":\"1834\",\"type\":\"Selection\"},{\"attributes\":{\"axis_label\":\"PGS002724\",\"formatter\":{\"id\":\"1833\",\"type\":\"BasicTickFormatter\"},\"ticker\":{\"id\":\"1751\",\"type\":\"BasicTicker\"}},\"id\":\"1750\",\"type\":\"LinearAxis\"},{\"attributes\":{\"data_source\":{\"id\":\"1775\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"1777\",\"type\":\"Circle\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"1778\",\"type\":\"Circle\"},\"selection_glyph\":null,\"view\":{\"id\":\"1780\",\"type\":\"CDSView\"}},\"id\":\"1779\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"dimension\":1,\"ticker\":{\"id\":\"1756\",\"type\":\"BasicTicker\"}},\"id\":\"1759\",\"type\":\"Grid\"},{\"attributes\":{\"callback\":null,\"data\":{\"index\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202
,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600,601,602,603,604,605,606,607,608,609,610,611,612,613,614,615,616,617,618,619,620,621,622,623,624,625,626,627,628,629,630,631,632,633,634,635,636,637,638,639,640,641,642,643,644,645,646,647,648,649,650,651,652,653,654,655,656,657,658,659,660,661,662,663,664,665,666,667,668,669,670,671,672,673,674,675,676,677,678,679,680,681,682,683,684,685,686,687,688,689,690,691,692,693,694,695,696,697,698,699,700,701,702
,703,704,705,706,707,708,709,710,711,712,713,714,715,716,717,718,719,720,721,722,723,724,725,726,727,728,729,730,731,732,733,734,735,736,737,738,739,740,741,742,743,744,745,746,747,748,749,750,751,752,753,754,755,756,757,758,759,760,761,762,763,764,765,766,767,768,769,770,771,772,773,774,775,776,777,778,779,780,781,782,783,784,785,786,787,788,789,790,791,792,793,794,795,796,797,798,799,800,801,802,803,804,805,806,807,808,809,810,811,812,813,814,815,816,817,818,819,820,821,822,823,824,825,826,827,828,829,830,831,832,833,834,835,836,837,838,839,840,841,842,843,844,845,846,847,848,849,850,851,852,853,854,855,856,857,858,859,860,861,862,863,864,865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880,881,882,883,884,885,886,887,888,889,890,891,892,893,894,895,896,897,898,899,900,901,902,903,904,905,906,907,908,909,910,911,912,913,914,915,916,917,918,919,920,921,922,923,924,925,926,927,928,929,930,931,932,933,934,935,936,937,938,939,940,941,942,943,944,945,946,947,948,949,950,951,952,953,954,955,956,957,958,959,960,961,962,963,964,965,966,967,968,969,970,971,972,973,974,975,976,977,978,979,980,981,982,983,984,985,986,987,988,989,990,991,992,993,994,995,996,997,998,999,1000,1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1011,1012,1013,1014,1015,1016,1017,1018,1019,1020,1021,1022,1023,1024,1025,1026,1027,1028,1029,1030,1031,1032,1033,1034,1035,1036,1037,1038,1039,1040,1041,1042,1043,1044,1045,1046,1047,1048,1049,1050,1051,1052,1053,1054,1055,1056,1057,1058,1059,1060,1061,1062,1063,1064,1065,1066,1067,1068,1069,1070,1071,1072,1073,1074,1075,1076,1077,1078,1079,1080,1081,1082,1083,1084,1085,1086,1087,1088,1089,1090,1091,1092,1093,1094,1095,1096,1097,1098,1099,1100,1101,1102,1103,1104,1105,1106,1107,1108,1109,1110,1111,1112,1113,1114,1115,1116,1117,1118,1119,1120,1121,1122,1123,1124,1125,1126,1127,1128,1129,1130,1131,1132,1133,1134,1135,1136,1137,1138,1139,1140,1141,1142,1143,1144,1145,1146,1147,1148,1149,1150,1151,1152,1153,1154,1155,1156,1157,1158,1159,1160,1161,1
162,1163,1164,1165,1166,1167,1168,1169,1170,1171,1172,1173,1174,1175,1176,1177,1178,1179,1180,1181,1182,1183,1184,1185,1186,1187,1188,1189,1190,1191,1192,1193,1194,1195,1196,1197,1198,1199,1200,1201,1202,1203,1204,1205,1206,1207,1208,1209,1210,1211,1212,1213,1214,1215,1216,1217,1218,1219,1220,1221,1222,1223,1224,1225,1226,1227,1228,1229,1230,1231,1232,1233,1234,1235,1236,1237,1238,1239,1240,1241,1242,1243,1244,1245,1246,1247,1248,1249,1250,1251,1252,1253,1254,1255,1256,1257,1258,1259,1260,1261,1262,1263,1264,1265,1266,1267,1268,1269,1270,1271,1272,1273,1274,1275,1276,1277,1278,1279,1280,1281,1282,1283,1284,1285,1286,1287,1288,1289,1290,1291,1292,1293,1294,1295,1296,1297,1298,1299,1300,1301,1302,1303,1304,1305,1306,1307,1308,1309,1310,1311,1312,1313,1314,1315,1316,1317,1318,1319,1320,1321,1322,1323,1324,1325,1326,1327,1328,1329,1330,1331,1332,1333,1334,1335,1336,1337,1338,1339,1340,1341,1342,1343,1344,1345,1346,1347,1348,1349,1350,1351,1352,1353,1354,1355,1356,1357,1358,1359,1360,1361,1362,1363,1364,1365,1366,1367,1368,1369,1370,1371,1372,1373,1374,1375,1376,1377,1378,1379,1380,1381,1382,1383,1384,1385,1386,1387,1388,1389,1390,1391,1392,1393,1394,1395,1396,1397,1398,1399,1400,1401,1402,1403,1404,1405,1406,1407,1408,1409,1410,1411,1412,1413,1414,1415,1416,1417,1418,1419,1420,1421,1422,1423,1424,1425,1426,1427,1428,1429,1430,1431,1432,1433,1434,1435,1436,1437,1438,1439,1440,1441,1442,1443,1444,1445,1446,1447,1448,1449,1450,1451,1452,1453,1454,1455,1456,1457,1458,1459,1460,1461,1462,1463,1464,1465,1466,1467,1468,1469,1470,1471,1472,1473,1474,1475,1476,1477,1478,1479,1480,1481,1482,1483,1484,1485,1486,1487,1488,1489,1490,1491,1492,1493,1494,1495,1496,1497,1498,1499,1500,1501,1502,1503,1504,1505,1506,1507,1508,1509,1510,1511,1512,1513,1514,1515,1516,1517,1518,1519,1520,1521,1522,1523,1524,1525,1526,1527,1528,1529,1530,1531,1532,1533,1534,1535,1536,1537,1538,1539,1540,1541,1542,1543,1544,1545,1546,1547,1548,1549,1550,1551,1552,1553,1554,1555,1556,1557,1558,1559,1560,1561,1
562,1563,1564,1565,1566,1567,1568,1569,1570,1571,1572,1573,1574,1575,1576,1577,1578,1579,1580,1581,1582,1583,1584,1585,1586,1587,1588,1589,1590,1591,1592,1593,1594,1595,1596,1597,1598,1599,1600,1601,1602,1603,1604,1605,1606,1607,1608,1609,1610,1611,1612,1613,1614,1615,1616,1617,1618,1619,1620,1621,1622,1623,1624,1625,1626,1627,1628,1629,1630,1631,1632,1633,1634,1635,1636,1637,1638,1639,1640,1641,1642,1643,1644,1645,1646,1647,1648,1649,1650,1651,1652,1653,1654,1655,1656,1657,1658,1659,1660,1661,1662,1663,1664,1665,1666,1667,1668,1669,1670,1671,1672,1673,1674,1675,1676,1677,1678,1679,1680,1681,1682,1683,1684,1685,1686,1687,1688,1689,1690,1691,1692,1693,1694,1695,1696,1697,1698,1699,1700,1701,1702,1703,1704,1705,1706,1707,1708,1709,1710,1711,1712,1713,1714,1715,1716,1717,1718,1719,1720,1721,1722,1723,1724,1725,1726,1727,1728,1729,1730,1731,1732,1733,1734,1735,1736,1737,1738,1739,1740,1741,1742,1743,1744,1745,1746,1747,1748,1749,1750,1751,1752,1753,1754,1755,1756,1757,1758,1759,1760,1761,1762,1763,1764,1765,1766,1767,1768,1769,1770,1771,1772,1773,1774,1775,1776,1777,1778,1779,1780,1781,1782,1783,1784,1785,1786,1787,1788,1789,1790,1791,1792,1793,1794,1795,1796,1797,1798,1799,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1811,1812,1813,1814,1815,1816,1817,1818,1819,1820,1821,1822,1823,1824,1825,1826,1827,1828,1829,1830,1831,1832,1833,1834,1835,1836,1837,1838,1839,1840,1841,1842,1843,1844,1845,1846,1847,1848,1849,1850,1851,1852,1853,1854,1855,1856,1857,1858,1859,1860,1861,1862,1863,1864,1865,1866,1867,1868,1869,1870,1871,1872,1873,1874,1875,1876,1877,1878,1879,1880,1881,1882,1883,1884,1885,1886,1887,1888,1889,1890,1891,1892,1893,1894,1895,1896,1897,1898,1899,1900,1901,1902,1903,1904,1905,1906,1907,1908,1909,1910,1911,1912,1913,1914,1915,1916,1917,1918,1919,1920,1921,1922,1923,1924,1925,1926,1927,1928,1929,1930,1931,1932,1933,1934,1935,1936,1937,1938,1939,1940,1941,1942,1943,1944,1945,1946,1947,1948,1949,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1
962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024,2025,2026,2027,2028,2029,2030,2031,2032,2033,2034,2035,2036,2037,2038,2039,2040,2041,2042,2043,2044,2045,2046,2047,2048,2049,2050,2051,2052,2053,2054,2055,2056,2057,2058,2059,2060,2061,2062,2063,2064,2065,2066,2067,2068,2069,2070,2071,2072,2073,2074,2075,2076,2077,2078,2079,2080,2081,2082,2083,2084,2085,2086,2087,2088,2089,2090,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100,2101,2102,2103,2104,2105,2106,2107,2108,2109,2110,2111,2112,2113,2114,2115,2116,2117,2118,2119,2120,2121,2122,2123,2124,2125,2126,2127,2128,2129,2130,2131,2132,2133,2134,2135,2136,2137,2138,2139,2140,2141,2142,2143,2144,2145,2146,2147,2148,2149,2150,2151,2152,2153,2154,2155,2156,2157,2158,2159,2160,2161,2162,2163,2164,2165,2166,2167,2168,2169,2170,2171,2172,2173,2174,2175,2176,2177,2178,2179,2180,2181,2182,2183,2184,2185,2186,2187,2188,2189,2190,2191,2192,2193,2194,2195,2196,2197,2198,2199,2200,2201,2202,2203,2204,2205,2206,2207,2208,2209,2210,2211,2212,2213,2214,2215,2216,2217,2218,2219,2220,2221,2222,2223,2224,2225,2226,2227,2228,2229,2230,2231,2232,2233,2234,2235,2236,2237,2238,2239,2240,2241,2242,2243,2244,2245,2246,2247,2248,2249,2250,2251,2252,2253,2254,2255,2256,2257,2258,2259,2260,2261,2262,2263,2264,2265,2266,2267,2268,2269,2270,2271,2272,2273,2274,2275,2276,2277,2278,2279],\"x\":{\"__ndarray__\":\"\",\"dtype\":\"float64\",\"shape\":[2280]},\"y\":{\"__ndarray__\":\"\",\"dtype\":\"float64\",\"shape\":[2280]}},\"selected\":{\"id\":\"1834\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"1835\",\"type\":\"UnionRenderers\"}},\"id\":\"1775\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"1751\",\"type\":\"BasicTicker\"},{\"attributes\":{\"bottom_units\":\"screen\
",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"1836\",\"type\":\"BoxAnnotation\"},{\"attributes\":{\"callback\":null},\"id\":\"1744\",\"type\":\"DataRange1d\"},{\"attributes\":{\"source\":{\"id\":\"1775\",\"type\":\"ColumnDataSource\"}},\"id\":\"1780\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"1761\",\"type\":\"WheelZoomTool\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"1760\",\"type\":\"PanTool\"},{\"id\":\"1761\",\"type\":\"WheelZoomTool\"},{\"id\":\"1762\",\"type\":\"BoxZoomTool\"},{\"id\":\"1763\",\"type\":\"SaveTool\"},{\"id\":\"1764\",\"type\":\"ResetTool\"},{\"id\":\"1765\",\"type\":\"HelpTool\"},{\"id\":\"1774\",\"type\":\"HoverTool\"}]},\"id\":\"1766\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"1760\",\"type\":\"PanTool\"}],\"root_ids\":[\"1740\"]},\"title\":\"Bokeh Application\",\"version\":\"1.4.0\"}};\n var render_items = [{\"docid\":\"f274ae56-0691-47c2-b06a-5054ee4d2863\",\"roots\":{\"1740\":\"90224fd8-77e2-4056-b47f-2bb9badf2327\"}}];\n root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n\n }\n if (root.Bokeh !== undefined) {\n embed_document(root);\n } else {\n var attempts = 0;\n var timer = setInterval(function(root) {\n if (root.Bokeh !== undefined) {\n clearInterval(timer);\n embed_document(root);\n } else {\n attempts++;\n if (attempts > 100) {\n clearInterval(timer);\n console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n }\n }\n }, 10, root)\n }\n})(window);", "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "1740" } }, 
"output_type": "display_data" } ], "source": [ "p = hl.plot.scatter(prs_merge.PGS002724, prs_merge.PGS002725, xlabel=\"PGS002724\", ylabel=\"PGS002725\")\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below computes the correlation coefficient between the two sets of PRS scores.\n", "\n", "The `to_pandas()` function converts a `hail`-specific data type into a data frame of the `pandas` library, which is widely used in `python`.\n", "This makes it possible to analyze the data with the many features of `python`, not only with `hail` functions." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-23 20:26:30.449 Hail: INFO: Coerced sorted dataset\n", "2022-11-23 20:26:30.729 Hail: INFO: Coerced sorted dataset\n" ] } ], "source": [ "prs_merge_pandas = prs_merge.to_pandas()" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " PGS002724 PGS002725\n", "PGS002724 1.000000 0.168966\n", "PGS002725 0.168966 1.000000\n" ] } ], "source": [ "print(prs_merge_pandas.corr())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These results show the following.\n", "\n", "- The correlation coefficient between the PGS002724 scores and the PGS002725 scores is 0.168966 (the Pearson coefficient, which is the default of `DataFrame.corr()`), a weak positive correlation.\n", "\n", "### This concludes the tutorial" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.10.6 ('base')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "vscode": { "interpreter": { "hash": "97abf26e187cb9ea731f0328cc399625c2b8e0cd6c7d611d150716b7a93d0fe8" } } }, "nbformat": 4, "nbformat_minor": 4 }