# Tutorial 5: Data and package updates

The datasets that `cptac` distributes are still being actively worked on by the teams that generated them, and we periodically improve the `cptac` package itself. We therefore regularly release new versions of both the data and the package. This tutorial explains how to access both kinds of updates.

**Note: In this tutorial, we intentionally cause `cptac` to generate the errors and warnings it gives when your data or package is out of date, so you can see what they look like. The tutorial is not broken.**

## Updating the package

Each time you import `cptac` into a Python environment, it automatically checks whether you have the most recent release of the package. If you don't, it will print a warning like this:

In [1]:
import cptac



As the warning directs, simply run `pip install --upgrade cptac` to get the latest version of the package. This will ensure that you have all the latest functionality of the package, and that you're able to access the latest versions of all the datasets.
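If you want to confirm which release is installed before or after upgrading, the Python standard library can report it. The following is a minimal sketch and not part of `cptac`: `installed_version` is a hypothetical helper name, and it relies only on `importlib.metadata` (available in Python 3.8 and later).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the installed cptac version string, or None if cptac is not installed.
print(installed_version("cptac"))
```

Running this before and after `pip install --upgrade cptac` lets you check that the upgrade took effect in the environment you are actually using.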

## Watching the repository for new releases

Each time there's a new version of the package, we release it on PyPI and also post a release page on GitHub. You can use GitHub's "Watch" feature to receive an email every time we do this. Simply log in to GitHub, browse to the [main page for our repository](https://github.com/PayneLab/cptac), click on the "Watch" button in the upper right corner of the page, and select the "Releases only" option from the drop-down box, as shown below. You will then get an email every time we release another version of the package.

![How to watch releases](img/github_watch.png)

## Accessing data updates

Data updates are released periodically for individual datasets. `cptac` checks for them automatically whenever you load a dataset. If you don't specify a version when loading, and the latest version you have installed doesn't match the latest released version, `cptac` raises an exception. The error message gives you instructions for downloading the new data version.

**Note: The error information below is rather long. This is because Jupyter Notebook automatically prints the entire stack trace that accompanies an error. The informative error message is at the bottom. If you were using `cptac` from the command line or in a script, only the informative error message at the bottom would be printed.**

In [2]:
gb = cptac.Gbm()

AmbiguousLatestError: You requested to load the gbm dataset. Latest version is 3.0, which is not installed locally. To download it, run "cptac.download(dataset='gbm')". You will then be able to load the latest version of the dataset. To skip this and instead load the older version that is already installed, call "cptac.Gbm(version='1.0')".

To download the new data version, run the `cptac.download` function as the error message directs. `cptac` will notify you that it is downloading new data.

In [2]:
cptac.download(dataset="gbm", version="latest")

True

You can then load the dataset, and `cptac` will automatically load the latest data version.

In [3]:
gb = cptac.Gbm()
gb.version()

'3.0'
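In a script, the load, download, and reload steps from this section can be combined so that an out-of-date dataset is refreshed automatically. The following is a minimal sketch under stated assumptions, not `cptac`'s own mechanism: `load_latest` is a hypothetical helper, and because this tutorial does not state where the `AmbiguousLatestError` class can be imported from, the sketch matches the exception by its class name.

```python
def load_latest(load, download):
    """Call load(); if it fails with AmbiguousLatestError, call download() and retry once."""
    try:
        return load()
    except Exception as err:
        # Assumption: match on the class name shown in the error message,
        # since the import path of the exception class is not given here.
        if type(err).__name__ != "AmbiguousLatestError":
            raise  # Any other error is not ours to handle.
        download()
        return load()


# Intended usage with cptac (not run here):
# import cptac
# gb = load_latest(cptac.Gbm, lambda: cptac.download(dataset="gbm"))
```

Passing the loader and downloader as callables keeps the helper independent of any particular dataset. Note that it silently moves you to new data, which may not be what you want in the middle of an analysis.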

## Accessing old data versions after updates

After you have updated a dataset, you can still access old versions of the data. This is helpful, for example, if you want to compare your analyses between data versions. To load an older version of the data, simply pass the desired version number to the `version` parameter when loading the dataset:

In [4]:
gb = cptac.Gbm(version="1.0")
gb.version()

Loading gbm v1.0...

'1.0'
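To compare analyses between data versions, it can help to load several versions side by side. The following is a minimal sketch: `load_versions` is a hypothetical helper, not a `cptac` function, and it assumes only that the loader accepts a `version` keyword argument, as `cptac.Gbm` does above.

```python
def load_versions(loader, versions):
    """Load one dataset object per requested version, keyed by version string."""
    return {v: loader(version=v) for v in versions}


# Intended usage with cptac (not run here); each version must already be installed locally:
# import cptac
# gbm_by_version = load_versions(cptac.Gbm, ["1.0", "3.0"])
# old_gb, new_gb = gbm_by_version["1.0"], gbm_by_version["3.0"]
```

You can then run the same analysis on each object and compare the results.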