{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial 5: Data and package updates\n", "\n", "The datasets that `cptac` distributes are still being actively worked on by the teams that generated them, and we periodically improve the `cptac` package itself. As a result, we regularly release new versions of both the data and the package. This tutorial shows how to get those data and package updates.\n", "\n", "**Note: In this tutorial, we deliberately trigger the errors and warnings that `cptac` gives when your data or package is out of date, so you can see what they look like. The tutorial is not broken.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Updating the package\n", "\n", "Each time you import `cptac` into a Python environment, it automatically checks whether you have the most recent release of the package. If you don't, it prints a warning like this:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning: Your version of cptac (0.6.2) is out-of-date. Latest is 0.6.3. Please run 'pip install --upgrade cptac' to update it. (/home/caleb/anaconda3/envs/cptac-dev/lib/python3.7/site-packages/ipykernel_launcher.py, line 1)\n" ] } ], "source": [ "import cptac" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the warning directs, run `pip install --upgrade cptac` to get the latest version of the package. This ensures that you have the package's latest functionality and that you can access the latest versions of all the datasets." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Watching the repository for new releases\n", "\n", "Each time we release a new version of the package, we publish it on PyPI and also post a release page on GitHub. 
You can use GitHub's \"Watch\" feature to receive an email every time we publish a release. To set this up, log in to GitHub, browse to the [main page for our repository](https://github.com/PayneLab/cptac), click the \"Watch\" button in the upper right corner of the page, and select the \"Releases only\" option from the drop-down menu, as shown below.\n", "\n", "![How to watch releases](img/github_watch.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Accessing data updates\n", "\n", "We periodically release data updates for individual datasets. `cptac` checks for these updates whenever you load a dataset. If you don't specify a version when loading a dataset, and the newest version you have installed does not match the latest released version, `cptac` raises an exception. The error message explains how to download the new data version, or how to load the older version you already have installed.\n", "\n", "**Note: The error output below is rather long because Jupyter automatically prints the entire stack trace that accompanies an error. The informative error message is at the bottom. If you were using `cptac` from the command line or in a script, only that final error message would be printed.**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " \r" ] }, { "ename": "AmbiguousLatestError", "evalue": "You requested to load the gbm dataset. Latest version is 2.0, which is not installed locally. To download it, run \"cptac.download(dataset='gbm')\". You will then be able to load the latest version of the dataset. 
To skip this and instead load the older version that is already installed, call \"cptac.Gbm(version='1.0')\".", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAmbiguousLatestError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mgb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcptac\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mGbm\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/anaconda3/envs/cptac-dev/lib/python3.7/site-packages/cptac/gbm.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, version)\u001b[0m\n\u001b[1;32m 57\u001b[0m }\n\u001b[1;32m 58\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 59\u001b[0;31m \u001b[0msuper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__init__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcancer_type\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"gbm\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mversion\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mversion\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalid_versions\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mvalid_versions\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdata_files\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdata_files\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 60\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 61\u001b[0m \u001b[0;31m# Load the data into dataframes in the self._data dict\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/anaconda3/envs/cptac-dev/lib/python3.7/site-packages/cptac/dataset.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, cancer_type, version, valid_versions, data_files)\u001b[0m\n\u001b[1;32m 43\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 44\u001b[0m \u001b[0;31m# Validate the version\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 45\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_version\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mvalidate_version\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mversion\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_cancer_type\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0muse_context\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"init\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalid_versions\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mvalid_versions\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 46\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 47\u001b[0m \u001b[0;31m# Get the paths to the data files\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/anaconda3/envs/cptac-dev/lib/python3.7/site-packages/cptac/file_tools.py\u001b[0m in \u001b[0;36mvalidate_version\u001b[0;34m(version, dataset, use_context, valid_versions)\u001b[0m\n\u001b[1;32m 67\u001b[0m \u001b[0mreturn_version\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mindex_latest\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 68\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0muse_context\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"init\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 69\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mAmbiguousLatestError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"You requested to load the {dataset} dataset. Latest version is {index_latest}, which is not installed locally. To download it, run \\\"cptac.download(dataset='{dataset}')\\\". You will then be able to load the latest version of the dataset. 
To skip this and instead load the older version that is already installed, call \\\"cptac.{dataset.title()}(version='{latest_installed}')\\\".\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 70\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 71\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mInvalidParameterError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"{version} is an invalid version for the {dataset} dataset. Valid versions: {', '.join(index.keys())}\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAmbiguousLatestError\u001b[0m: You requested to load the gbm dataset. Latest version is 2.0, which is not installed locally. To download it, run \"cptac.download(dataset='gbm')\". You will then be able to load the latest version of the dataset. To skip this and instead load the older version that is already installed, call \"cptac.Gbm(version='1.0')\"." ] } ], "source": [ "gb = cptac.Gbm()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To download the new data version, run the `cptac.download` function as the error message directs. `cptac` will notify you that it is downloading new data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " \r" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cptac.download(dataset=\"gbm\", version=\"latest\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can then load the dataset, and `cptac` will automatically load the latest data version." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " \r" ] }, { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: The GBM dataset is under publication embargo until March 01, 2021. CPTAC is a community resource project and data are made available rapidly after generation for community research use. The embargo allows exploring and utilizing the data, but analysis may not be published until after the embargo date. Please see https://proteomics.cancer.gov/data-portal/about/data-use-agreement or enter cptac.embargo() to open the webpage for more details. (C:\\Users\\humbe\\miniconda3\\lib\\site-packages\\ipykernel_launcher.py, line 1)\n" ] }, { "data": { "text/plain": [ "'3.0'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gb = cptac.Gbm()\n", "gb.version()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Accessing old data versions after updates\n", "\n", "After you have updated a dataset, you can still access old versions of the data. This is helpful, for example, if you want to compare your analyses between data versions. To load an older version of the data, simply pass the desired version number to the `version` parameter when loading the dataset:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading gbm v1.0... \r" ] }, { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: Old gbm data version. Latest is 3.0. This is 1.0. (C:\\Users\\humbe\\miniconda3\\lib\\site-packages\\ipykernel_launcher.py, line 1)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " \r" ] }, { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: The GBM dataset is under publication embargo until March 01, 2021. 
CPTAC is a community resource project and data are made available rapidly after generation for community research use. The embargo allows exploring and utilizing the data, but analysis may not be published until after the embargo date. Please see https://proteomics.cancer.gov/data-portal/about/data-use-agreement or enter cptac.embargo() to open the webpage for more details. (C:\\Users\\humbe\\miniconda3\\lib\\site-packages\\ipykernel_launcher.py, line 1)\n" ] }, { "data": { "text/plain": [ "'1.0'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gb = cptac.Gbm(version=\"1.0\")\n", "gb.version()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 2 }