{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Publication of packages\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Github\n", "\n", "When making a repo for your package, the repo name *is* the name of `pkg_name` from before. \n", "```python\n", "/pkg_name # repo-name is at this level\n", " /pkg_name \n", " __init__.py \n", " module1.py\n", " module2.py\n", " module3.py\n", " ...\n", " setup.py\n", " README.md\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installation in *Development Mode*\n", "\n", "Once your basic package architecture is built, you can install it locally using pip. Before installing, make sure you are in the directory immediately above your package. If my package is in `~/bebi103a/pkg_name/`, I would `cd ~/bei103a/`, then do the following on the command line:\n", "\n", "```bash\n", "pip install -e pkg_name\n", "```\n", "\n", "The `-e` flag is important, which tells pip that this is a local, editable package. Your package is now accessible on your machine whenever you run the Python interpreter! Note that the `-e` flag is present when installing your own local package that is *not* yet in PyPI. \n", "\n", "For most of your own packages, this setup is ideal. As your making changes, you can test them in Jupyter notebooks by importing the package in a fresh cell." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Live Editing \n", "\n", "As a small example of how you'll be editing your packages, I'll pull a toy package I made last night. We'll edit the version numbers locally and see how you the jupyter interface can streamline the process. 
You can follow along by cloning the repo [octopus](https://github.com/atisor73/octopus):\n", "\n", "```bash\n", "git clone https://github.com/atisor73/octopus\n", "```\n", "\n", "We'll first `cd` to the directory immediately above `octopus` and run\n", "\n", "```bash\n", "pip install -e octopus\n", "```\n", "\n", "Alternatively, you could `cd` into `octopus` itself and run\n", "\n", "```bash\n", "pip install -e .\n", "```\n", "\n", "Now you have a package! Let's import it into this notebook with `autoreload`, so that we don't have to keep restarting our kernel every time we make a change." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "\n", "import octopus as op" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'0.0.1'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "op.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cool.\n", "\n", "Let's open up the `__init__.py` file in `octopus/octopus/` and change the string in the `__version__` variable." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'0.0.2'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "op.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ":O Would ya look at that. No need to reinstall, reimport, or anything. The `autoreload` extension and the editable (`-e`) pip install allow dynamic editing. This is very helpful, and how I would recommend making changes. Once things look ok, you can commit and push." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In case everything went to hell and we want to get back to the original state, we can just discard our uncommitted changes:\n", "\n", "```bash\n", "git checkout .\n", "```\n", "\n", "Now let's check the version number again." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'0.0.1'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "op.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating Stable Releases\n", "\n", "As you improve your package, you may find that you want to create checkpoints that are stable and functional. While \"version control\" is technically handled by git every time you make a commit, sometimes it can be hard to traverse these commits to find the most representative version of your package that you want.\n", "\n", "You can instead create **releases** in GitHub that are stable versions of your code. If you know your code worked with a given release, you might check out that old release and use that for your analysis.\n", "\n", "Releases are easy, and you know how to do them since this is how you turn in your homework. I created a release of my initial package, with the tag `v0.0.1`. You should follow a consistent form of tagging and naming your releases to make them easy to follow. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Collaboration\n", "\n", "Sharing your code is quite simple: just have someone clone your repo and install it with:\n", "\n", "```bash\n", "pip install -e .\n", "```\n", "\n", "Alternatively, if you don’t expect them to be making changes to the package, a more static build can be accomplished with:\n", "\n", "```bash\n", "python setup.py install\n", "```\n", "\n", "from the root directory.\n", "\n", "But, let's assume we want people to collaborate on our package. There are many useful types of collaboration. 
Since these are all primarily accomplished through GitHub, more details on each of these can be found in [Recitation 2](https://bebi103a.github.io/recitations/02/git_and_github_tips.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Raising Issues\n", "\n", "Perhaps the most productive and simple-for-all-parties approach to collaborating on a package is to have people use it and raise issues on GitHub. In this case, you are the primary contributor to your package, but others can list bugs, enhancement ideas, or even code for enhancements without having to navigate your code themselves.\n", "\n", "Issues have been amazing for me, as I always have a record of what needs to be done and can complete them when I think up a fix and have time to implement it. It’s a much more robust method than someone emailing me and mentioning in passing that there’s an issue. I will, 100% of the time, forget. Seriously.\n", "\n", "### Forking\n", "Forking allows one to make a copy of the repository in their own GitHub, clone it to their machine, and then submit pull requests to the original repo when they have an enhancement. The pull request would be looked over by you (the owner of the repo) and then merged. The contributing user does not need to be an explicit collaborator on the repository.\n", "\n", "### Adding collaborators directly\n", "If you trust your labmates, or anyone using the package, you can add them as collaborators on the project. This is done in the Settings > Collaborators tab on GitHub. Collaborators then have many of the same editing privileges as you, making it easier for them to make and push changes.\n", "\n", "In addition to making and pushing changes directly to the repo (on the “main” branch), one might make new branches. Branching fills a similar function to forking, but without making a full copy of the repo for the contributing user. 
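\n", "\n", "As a sketch (the branch name `new-feature` here is just an example), a branch-and-merge cycle looks like:\n", "\n", "```bash\n", "git checkout -b new-feature    # create and switch to a new branch\n", "# ...edit files, git add, git commit...\n", "git checkout main              # switch back to the main branch\n", "git merge new-feature          # fold the finished work back in\n", "```\n", "\n", "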
Branches are very flexible: multiple branches can be made for different projects and folded back into the project when finished." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test-Driven Development\n", "\n", "We can make some testing modules in a new directory called `tests/`. I’ve added the module `test_octopus_functions.py` with the following contents:\n", "\n", "```python\n", "import numpy as np\n", "import octopus.ink\n", "\n", "def test_octopi():\n", "    assert np.isclose(octopus.ink.octopi(), 3.1415926)\n", "```" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m============================= test session starts ==============================\u001b[0m\n", "platform darwin -- Python 3.7.7, pytest-6.1.1, py-1.9.0, pluggy-0.13.1 -- /Users/rosita/anaconda3/bin/python\n", "cachedir: .pytest_cache\n", "rootdir: /Users/rosita/octopus\n", "plugins: dash-1.12.0\n", "collected 1 item \u001b[0m\n", "\n", "../../../octopus/tests/test_octopus_functions.py::test_octopi \u001b[32mPASSED\u001b[0m\u001b[32m [100%]\u001b[0m\n", "\n", "\u001b[32m============================== \u001b[32m\u001b[1m1 passed\u001b[0m\u001b[32m in 0.29s\u001b[0m\u001b[32m ===============================\u001b[0m\n" ] } ], "source": [ "!pytest -v ~/octopus" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great! Obviously with more involved packages, you'll want to separate test modules correspondingly and be more principled in the naming. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Publishing to PyPI\n", "\n", "You might want to add your package to PyPI so anyone can `pip install` it without having to clone the repo. If a package is not already installed on a user's local machine, *pip* will look through PyPI and retrieve the files from there. Thus the goal is to upload our package to PyPI! 
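\n", "\n", "Once it's there, anyone will be able to install your package the usual way (`pkg_name` below is a placeholder for whatever name you register):\n", "\n", "```bash\n", "pip install pkg_name\n", "```\n", "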
\n", "The way I learned to do this was actually through a [blog post](https://towardsdatascience.com/build-your-first-open-source-python-project-53471c9942a7) written by Jeff Hale. I've reproduced some of the relevant steps below.\n", "\n", "1. After installing **Twine** in an activated virtual environment, I installed my dependencies with:\n", "\n", "```bash\n", "pip install -r requirements.txt\n", "```\n", "\n", "\n", "2. Then to create your package files, run the following: \n", "\n", "```bash\n", "python setup.py sdist bdist_wheel\n", "```\n", "\n", "This will create many hidden folders like *.dist* and *.build*. Inside *.dist*, there is a *.whl* file that is your *wheel* file. The *.tar.gz* file is the source archive (this is kinda like the files you download from a tag release on Git). Generally, pip will install packages as *wheels* whenever it can. The process is faster, but if pip struggles with installing the wheel, it will fall back on the source archive.\n", "\n", "3. After you have these valuable files, you should create a **TestPyPI** account [here](https://test.pypi.org/account/register/), as well as a PyPI account. These are two separate accounts! \n", "\n", "4. Use Twine to securely publish your package to TestPyPI with the following command (no modifications are necessary).\n", "\n", "```bash\n", "twine upload --repository-url https://test.pypi.org/legacy/ dist/*\n", "```\n", "\n", "\n", "Enter your username and password.\n", "If there are any errors (sometimes you'll have typos in your text files and whatnot), make a new version number in `setup.py` and delete the old build artifacts `build`, `dist`, and `egg` folders. You can rebuild with steps 2. and 3. and re-upload with twine. Version numbers on TestPyPI are meaningless, you're the only one who will see these. Just don't forget to change it back to the original version once you do get everything to work. \n", "\n", "5. 
After uploading to TestPyPI, I would deactivate the current activated environment and start fresh in a new one. To see if you can successfully import your package, install it with: \n", "\n", "```bash\n", "pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple package_name\n", "```\n", "\n", "- The `--index-url` flag tells pip to download from TestPyPI instead of PyPI\n", "- The `--extra-index-url` flag lets pip fetch your package's dependencies from regular PyPI, since they won't be on TestPyPI\n", "\n", "\n", "If everything works as it should...\n", "\n", "6. Push to PyPI! Here's the code: (Make sure to change your version number back to the one you want.)\n", "\n", "```bash\n", "twine upload dist/*\n", "```\n", "\n", "\n", "7. You can now push to GitHub. Exclude (delete) any virtual environments. The `.gitignore` file will keep build artifacts from being indexed. Hurrah!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Final thoughts\n", "\n", "There's only so much you can learn about packages from reading about them. I highly encourage you to find utilities you're excited about, something you really truly see yourself using, and go through the process of turning it into a package. From then on, those functions can be utilized in your notebooks with a single import and will always be under git's robust version control. \n", "\n", "Of course, there is always more to discuss. Two things worth briefly mentioning are testing with [Travis CI](https://travis-ci.org/) and the [Cookiecutter](https://cookiecutter.readthedocs.io/en/latest/) package, which conveniently provides templates for setting up packages, though it includes many more features than you might need for a small, lab-specific package.\n", "\n", "Finally, as you might guess, `octopus` is not a real package that I will be using or maintaining. So, feel free to experiment on it (try forking, pull requests, etc.) and see what happens! The good news: you can't permanently ruin it, because (as you well know) it's under version control. How *convenient*." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }