![PyData_logo](./static/pydata-logo-madrid-2016.png)

# Embrace conda packages

## The build system we always needed, but never deserved

###### Juan Luis Cano Rodríguez
###### Madrid, 2016-04-08

## Outline

* Introduction
* Motivation: What brought us here?
* Our first conda package
* Some more tricks
* Working with other languages
* conda-forge: a community repository
* Limitations and future work
* Conclusions

## Who is this guy?



* _Almost_ **Aerospace Engineer**
* Quant Developer for BBVA at Indizen (yeah, lots of Python there!)
* Writer and furious tweeter at **Pybonacci**
* Chair ~~and BDFL~~ of **Python España**
* Co-creator and charismatic leader of **AeroPython** (\*not to be confused with Lorena Barba's course of the same name)
* _When time permits (rare) [writes some open source Python code](https://github.com/Juanlu001/)_

You know, I've been giving talks on Python and its scientific ecosystem for about three years now, and I always put that "Almost", in italics, before my background. You may reasonably wonder what the heck I've been doing all these years to keep introducing myself as an "almost" Aerospace Engineer, right? Well, I promise I'm taking the steps required to graduate no later than this autumn. In any case, this talk reflects one of the severe pains I've gone through while working on my final project.

## Motivation: What brought us here?

Let's begin with some questions:

* Who writes Python code here, either for a living or for fun?
* Who can write a `setup.py`... without copying a working one from the Internet?
* How many Linux users... can configure a Visual Studio project properly?
* How many of you are using Anaconda... because it was the only way to survive?

### _...or: "The sad state of scientific software"_

* [The scientific Python community was told to "fix the packaging problem themselves" in 2014](https://speakerdeck.com/teoliphant/building-the-pydata-community). For years, before Python(x,y), Canopy and Anaconda were born, Christoph Gohlke's packages were the only practical way to use Python on Windows

* One of the FAQ items of the Sage project: [_"Wouldn’t it be way better if Sage did not ship as a gigantic bundle?"_](http://doc.sagemath.org/html/en/faq/faq-general.html), [they started a SaaS to end the pain](http://sagemath.blogspot.com.es/2014/10/a-non-technical-overview-of.html)

* PETSc (solution of PDEs): they are forced to maintain their own forks because [upstream projects won't fix bugs, even with patches and reproducible tests](http://scisoftdays.org/pdf/2016_slides/brown.pdf)

* DOLFIN (part of the FEniCS project): it is extremely difficult to make it work outside Ubuntu, [pure Python alternatives are being developed](http://firedrakeproject.org/), and my fenics-recipes project already has at least 7 meaningful forks

## Some inconvenient truths: 

# Portability is hard (unless you stick to pure Python)

# Properly distributing software libraries is very hard

### Result:



## What horror have we created

> If you’re missing a library or program, and that library or program happens to be written in C, **you either need root to install it from your package manager, or you will descend into a lovecraftian nightmare of attempted local builds from which there is no escape**. You say you need lxml on shared hosting and they don’t have libxml2 installed? Well, fuck you.
>
> — Eevee, ["The sad state of web app deployment"](https://eev.ee/blog/2015/09/17/the-sad-state-of-web-app-deployment/)

## Are virtual machines and containers the solution?

> _"It's easy to build a VM if you automate the install process, and providing that install script for even one OS can demystify the install process for others; conversely, **just because you provide a VM doesn't mean that anyone other than you can install your software**"_
>
> — C. Titus Brown, ["Virtual machines considered harmful for reproducibility"](http://ivory.idyll.org/blog/vms-considered-harmful.html)

## Our first conda package

Let's install `conda-build`!

In [1]:
!conda install -y conda-build -q -n root

Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ........
Solving package specifications: .........

# All requested packages already installed.
# packages in environment at /home/juanlu/.miniconda3:
#
conda-build 1.20.0 py34_0 


conda packages are created from conda recipes. We can generate a bare recipe for a package hosted on PyPI using `conda skeleton pypi`.

In [3]:
!conda skeleton pypi pytest-benchmark > /dev/null

Using Anaconda Cloud api site https://api.anaconda.org


In [4]:
!ls pytest-benchmark

bld.bat build.sh meta.yaml


These are the minimum files for the recipe:

* `meta.yaml` contains all the metadata
* `build.sh` and `bld.bat` are the build scripts for Linux/OS X and Windows respectively
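The following Python sketch writes the same three files by hand, which makes the role of each one concrete. The helper name `write_minimal_recipe` is hypothetical. The bare-bones `meta.yaml` it produces is also an assumption: a usable recipe, like the one `conda skeleton` generated, also needs `source`, `requirements` and so on.

```python
from pathlib import Path


def write_minimal_recipe(recipe_dir, name, version):
    """Hypothetical helper: create a bare conda recipe (meta.yaml, build.sh, bld.bat)."""
    recipe = Path(recipe_dir)
    recipe.mkdir(parents=True, exist_ok=True)

    # Metadata only; source, requirements, test and about sections are omitted.
    meta = (
        "package:\n"
        f"  name: {name}\n"
        f'  version: "{version}"\n'
    )
    (recipe / "meta.yaml").write_text(meta)

    # Linux/OS X build script, same command as the generated one.
    (recipe / "build.sh").write_text("#!/bin/bash\n\n$PYTHON setup.py install\n")

    # Windows build script; propagate failures explicitly.
    (recipe / "bld.bat").write_text('"%PYTHON%" setup.py install\nif errorlevel 1 exit 1\n')

    # Return the file names so the caller can check what was created.
    return sorted(p.name for p in recipe.iterdir())
```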

### The `meta.yaml` file

It contains the metadata in YAML format.

* `package`, `source` and `build` specify the name, version and source of the package
* `requirements` specifies the build-time (`build`) and runtime (`run`) requirements
* `test` specifies imports, commands and scripts to test the package
* `about` adds some additional data for the package

In [26]:
!grep -v "#" pytest-benchmark/meta.yaml | head -n24

package:
 name: pytest-benchmark
 version: "3.0.0"

source:
 fn: pytest-benchmark-3.0.0.zip
 url: https://pypi.python.org/packages/source/p/pytest-benchmark/pytest-benchmark-3.0.0.zip
 md5: f8ab8e438f039366e3765168ad831b4c

build:
 preserve_egg_dir: True



requirements:
 build:
 - python
 - setuptools
 - pytest >=2.6

 run:
 - python
 - setuptools
 - pytest >=2.6


### The `build.sh` and `bld.bat` files

They specify how to build the package.

In [28]:
!cat pytest-benchmark/build.sh

#!/bin/bash

$PYTHON setup.py install

# Add more build steps here, if they are necessary.

# See
# http://docs.continuum.io/conda/build.html
# for a list of environment variables that are set during the build process.


In [30]:
!grep -v "::" pytest-benchmark/bld.bat

"%PYTHON%" setup.py install
if errorlevel 1 exit 1




### The build process

Adapted from http://conda.pydata.org/docs/building/recipe.html#conda-recipe-files-overview

1. Downloads the source
2. Applies patches (if any)
3. Installs build dependencies
4. Runs the build script
5. Packages the new files
6. Runs the tests against the newly created package
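
Step 5 deserves a closer look. conda-build decides which files belong to the package by comparing the contents of the build prefix before and after running the build script. Here is a simplified Python sketch of that idea. It uses a hypothetical `snapshot` helper and a fake "build" that just writes one file. The real tool also deals with links, metadata and path relocation.

```python
import os
import tempfile


def snapshot(prefix):
    """Return the set of file paths under prefix, relative to it."""
    files = set()
    for root, _dirs, names in os.walk(prefix):
        for name in names:
            files.add(os.path.relpath(os.path.join(root, name), prefix))
    return files


def new_files_after(prefix, build_step):
    """Run build_step(prefix) and return the files it added, sorted."""
    before = snapshot(prefix)
    build_step(prefix)
    after = snapshot(prefix)
    return sorted(after - before)


def fake_install(prefix):
    # Stand-in for `$PYTHON setup.py install`: drop one module into site-packages.
    target = os.path.join(prefix, "lib", "site-packages")
    os.makedirs(target, exist_ok=True)
    with open(os.path.join(target, "mymodule.py"), "w") as f:
        f.write("VALUE = 42\n")


with tempfile.TemporaryDirectory() as prefix:
    # A file that already existed in the environment is not part of the package.
    with open(os.path.join(prefix, "preexisting.txt"), "w") as f:
        f.write("old\n")
    print(new_files_after(prefix, fake_install))
```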

Seems legit!

In [32]:
!conda build pytest-benchmark --python 3.5 > /dev/null # It works!

Using Anaconda Cloud api site https://api.anaconda.org
+ /home/juanlu/.miniconda3/envs/_build/bin/python setup.py install


In [33]:
!ls ~/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2

/home/juanlu/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2




(From http://conda.pydata.org/docs/building/pkg-name-conv.html)
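
The file name follows the `name-version-build` convention. Here the build string `py35_0` encodes the Python version and the build number. Package names may contain hyphens, but version and build strings may not, so a minimal parser can simply split from the right. `parse_conda_filename` is a hypothetical helper, and it only handles the `.tar.bz2` format used here.

```python
def parse_conda_filename(filename):
    """Split 'name-version-build.tar.bz2' into (name, version, build)."""
    suffix = ".tar.bz2"
    if not filename.endswith(suffix):
        raise ValueError(f"not a conda package: {filename!r}")
    stem = filename[: -len(suffix)]
    # Names may contain hyphens, but version and build strings may not,
    # so split at most twice, starting from the right.
    name, version, build = stem.rsplit("-", 2)
    return name, version, build


print(parse_conda_filename("pytest-benchmark-3.0.0-py35_0.tar.bz2"))
# ('pytest-benchmark', '3.0.0', 'py35_0')
```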

In [35]:
!conda install pytest-benchmark --use-local --yes

Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ..........
Solving package specifications: .........

Package plan for installation in environment /home/juanlu/.miniconda3/envs/py35:

The following NEW packages will be INSTALLED:

 pytest-benchmark: 3.0.0-py35_0

Linking packages ...
[ COMPLETE ]|###################################################| 100%


### Build, test, upload, repeat

* Custom packages can be uploaded to Anaconda Cloud https://anaconda.org/
* This process can be automated through Anaconda Build http://docs.anaconda.org/build.html
* Later on we can use our custom **channels** to install non-official packages
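
A channel on Anaconda Cloud is, roughly, a set of per-platform directories, each holding a `repodata.json` index. conda downloads these indexes when it fetches package metadata. The sketch below shows how such index URLs can be built. It assumes the `https://conda.anaconda.org/<channel>/<subdir>/` layout and uses a hypothetical `repodata_urls` helper.

```python
def repodata_urls(channel, subdirs, base="https://conda.anaconda.org"):
    """Build the repodata.json URL of a channel for each platform subdirectory."""
    return [f"{base}/{channel}/{subdir}/repodata.json" for subdir in subdirs]


for url in repodata_urls("juanlu001", ["linux-64", "win-64"]):
    print(url)
```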



Let's upload the package first using `anaconda-client`:

In [38]:
!conda install anaconda-client --quiet --yes

Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ........
Solving package specifications: .........

# All requested packages already installed.
# packages in environment at /home/juanlu/.miniconda3/envs/py35:
#
anaconda-client 1.4.0 py35_0 


In [39]:
!anaconda upload ~/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2

Using Anaconda Cloud api site https://api.anaconda.org
detecting package type ...
conda
extracting package attributes for upload ...
done

Uploading file Juanlu001/pytest-benchmark/3.0.0/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 ... 
 uploaded 54 of 54Kb: 100.00% ETA: 0.0 minutes


Upload(s) Complete

Package located at:
https://anaconda.org/juanlu001/pytest-benchmark



And now, let's install it!

In [47]:
!conda remove pytest-benchmark --yes > /dev/null

Using Anaconda Cloud api site https://api.anaconda.org


In [48]:
!conda install pytest-benchmark --channel juanlu001 --yes

Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ..........
Solving package specifications: .........

Package plan for installation in environment /home/juanlu/.miniconda3/envs/py35:

The following NEW packages will be INSTALLED:

 pytest-benchmark: 3.0.0-py35_0

Linking packages ...
[ COMPLETE ]|###################################################| 100%


## Some more tricks

### Running the tests

Besides the `test` section of `meta.yaml`, you can run your tests with Python, Perl, shell or batch scripts placed in the recipe (`run_test.[py,pl,sh,bat]`)
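
A `run_test.py` is just a Python script executed in the test environment, and a failure shows up as a non-zero exit status. The following is a minimal sketch of an import smoke test. The `check_imports` and `main` functions are hypothetical, and `pytest_benchmark` is assumed to be the import name of the package built above.

```python
import importlib
import sys


def check_imports(module_names):
    """Try to import each module and return the names that failed."""
    failed = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            failed.append(name)
    return failed


def main(module_names):
    failed = check_imports(module_names)
    if failed:
        # sys.exit with a message exits with status 1, so the test fails.
        sys.exit("Could not import: " + ", ".join(failed))
    print("All imports OK")


# In the actual run_test.py, the last line would be:
# main(["pytest_benchmark"])
```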

### Convert pure Python packages to other platforms

For pure Python packages, `conda convert` can quickly turn a package built on one platform into packages for the others
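
The conversion only makes sense when the package contains no compiled code. Below is a rough heuristic for spotting that, written as a hypothetical `looks_pure_python` helper that scans the package's file list. The extensions it checks are an assumption, not the check `conda convert` itself performs.

```python
import os

# Extensions of compiled artifacts that tie a package to one platform.
BINARY_EXTENSIONS = {".so", ".pyd", ".dll", ".dylib"}


def looks_pure_python(paths):
    """Return True if none of the given file paths look like compiled binaries."""
    for path in paths:
        _root, ext = os.path.splitext(path)
        # Also catch versioned shared libraries such as libfoo.so.1
        if ext.lower() in BINARY_EXTENSIONS or ".so." in os.path.basename(path):
            return False
    return True


print(looks_pure_python(["lib/python3.5/site-packages/pytest_benchmark/plugin.py"]))  # True
print(looks_pure_python(["lib/python3.5/site-packages/ext/_speedups.cpython-35m-x86_64-linux-gnu.so"]))  # False
```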

In [3]:
!conda convert ~/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 --platform all | grep Converting

Converting /home/juanlu/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 from linux to osx-64
Converting /home/juanlu/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 from linux to linux-32
Converting /home/juanlu/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 from linux to linux-64
Converting /home/juanlu/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 from linux to win-32
Converting /home/juanlu/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 from linux to win-64


### Platform-specific metadata
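
conda-build supports platform-specific metadata through *selectors*. A comment such as `# [win]` at the end of a `meta.yaml` line keeps that line only when the bracketed expression is true for the target platform. Below is a toy Python sketch of the idea. The regular expression, the namespace variables and the `apply_selectors` name are simplifying assumptions; conda-build understands more variables and edge cases.

```python
import re

# A line ending in "# [expr]" carries a selector expression.
SELECTOR = re.compile(r"^(.*?)\s*#\s*\[(.+)\]\s*$")


def apply_selectors(text, namespace):
    """Keep lines whose selector is true in namespace; strip the selector comment."""
    kept = []
    for line in text.splitlines():
        match = SELECTOR.match(line)
        if match is None:
            kept.append(line)
        elif eval(match.group(2), {"__builtins__": {}}, namespace):
            kept.append(match.group(1))
    return "\n".join(kept)


recipe = """\
build:
  number: 0
  skip: true  # [win and py27]
requirements:
  build:
    - python
    - m2-patch  # [win]"""

linux_py35 = {"linux": True, "win": False, "osx": False, "py27": False, "py35": True}
print(apply_selectors(recipe, linux_py35))
```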

### Templating for `meta.yaml`

Metadata files support templating using Jinja2!
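
A common use is to define the version once, with `{% set version = "..." %}`, and reuse it as `{{ version }}` in both `package` and `source`. The sketch below is *not* Jinja2. It is a toy stdlib stand-in that imitates only those two constructs, to show what the rendered `meta.yaml` looks like. The `render` function is hypothetical.

```python
import re

SET_STMT = re.compile(r'^\{%\s*set\s+(\w+)\s*=\s*"([^"]*)"\s*%\}\s*$')
EXPR = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template):
    """Toy renderer: supports only {% set x = "..." %} lines and {{ x }} lookups."""
    variables = {}
    output = []
    for line in template.splitlines():
        match = SET_STMT.match(line)
        if match:
            # Remember the variable; the statement itself produces no output.
            variables[match.group(1)] = match.group(2)
            continue
        output.append(EXPR.sub(lambda m: variables[m.group(1)], line))
    return "\n".join(output)


template = """\
{% set version = "3.0.0" %}
package:
  name: pytest-benchmark
  version: "{{ version }}"
source:
  url: https://pypi.python.org/packages/source/p/pytest-benchmark/pytest-benchmark-{{ version }}.zip"""

print(render(template))
```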

## Working with other languages

### _or: conda as a cross-platform package manager_

* conda can be used to build software written in any language
* Just don't include `python` as a build or run dependency!
* It's already being used to distribute pure C and C++ libraries, R packages...

### Important caveat:

## The burden is on _you_

### _There be dragons_

* _conda-build does not solve cross-compiling_, so you will need to build compiled packages on each platform
* Regarding Linux, there are [a lot of sources of binary incompatibility](https://www.python.org/dev/peps/pep-0513/#key-causes-of-inter-linux-binary-incompatibility)
 - Building on a clean operating system is key
 - Using an old version of Linux (CentOS 5?) also helps, because many core system libraries have strict backwards compatibility policies
 - **Packages that assume everything is in root locations will fail to compile**
 - Sometimes careful editing of compiler flags, and even patching, is necessary

If the recipe builds on a fresh, headless, old Linux, the resulting package will most likely work on newer distributions too
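
One concrete reason old distributions help is glibc symbol versioning. A binary linked on a newer system may require symbols such as `memcpy@GLIBC_2.14`, which older glibc releases do not provide. Below is a minimal sketch of finding the newest glibc version a binary needs. It assumes the versioned symbol names have already been extracted, for example from the output of `objdump -T`. `max_glibc_required` is a hypothetical helper. PEP 513 applies the same kind of limit to manylinux1 wheels, whose glibc symbols must not be newer than 2.5.

```python
import re

GLIBC_TAG = re.compile(r"GLIBC_(\d+(?:\.\d+)*)")


def max_glibc_required(symbol_versions):
    """Return the highest GLIBC_x.y[.z] version as a tuple of ints, or None."""
    versions = []
    for entry in symbol_versions:
        match = GLIBC_TAG.search(entry)
        if match:
            # Compare numerically: as strings, "2.2.5" would sort after "2.14".
            versions.append(tuple(int(part) for part in match.group(1).split(".")))
    return max(versions) if versions else None


symbols = ["memcpy@GLIBC_2.14", "printf@GLIBC_2.2.5", "sin@GLIBC_2.2.5"]
print(max_glibc_required(symbols))  # (2, 14)
```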

## conda-forge: a community repository



> [**conda-forge**](https://github.com/conda-forge) is a github organization containing repositories of conda recipes. Thanks to some awesome continuous integration providers (AppVeyor, CircleCI and TravisCI), each repository, also known as a feedstock, automatically builds its own recipe in a clean and repeatable way on Windows, Linux and OSX.

Features:

* Automatic linting of recipes
* Continuous integration of recipes in Linux, OS X and Windows
* Automatic upload of packages
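
Linting is what keeps recipes consistent across many feedstocks. conda-forge's real linter checks many more rules. The toy sketch below illustrates only the idea. The rules and the `lint_recipe` function are hypothetical, and it works on a `meta.yaml` already loaded into a dictionary, as a YAML parser would produce.

```python
def lint_recipe(meta):
    """Return a list of human-readable problems found in a parsed meta.yaml."""
    problems = []
    for section in ("package", "source", "build", "requirements", "test", "about"):
        if section not in meta:
            problems.append(f"missing section: {section}")
    about = meta.get("about") or {}
    for key in ("home", "license", "summary"):
        if not about.get(key):
            problems.append(f"about/{key} is empty or missing")
    return problems


# Roughly the skeleton shown earlier, without test and about sections.
meta = {
    "package": {"name": "pytest-benchmark", "version": "3.0.0"},
    "source": {"fn": "pytest-benchmark-3.0.0.zip"},
    "build": {"preserve_egg_dir": True},
    "requirements": {"build": ["python", "setuptools", "pytest >=2.6"]},
}
for problem in lint_recipe(meta):
    print(problem)
```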

What I love:

* Having a blessed community channel (like Arch Linux AUR)
* Ensuring recipes run everywhere
* High quality standards!

## Limitations and future work

conda (2012?) and conda-build (2013) are very young projects and still have some pain points that ought to be addressed

* Support for [gcc](https://github.com/conda/conda-recipes/pull/279) and [libgfortran](https://github.com/ContinuumIO/anaconda-issues/issues/686) is not yet polished in Anaconda and there are still some portability issues

* [No way to include custom channels in a `meta.yaml`](https://github.com/conda/conda-build/issues/532), the only option is to keep a copy of all dependencies

* [Pinning NumPy versions in `meta.yaml` can be a mess](https://github.com/conda/conda-build/pull/650)

The state of Python packaging is improving upstream too!

* pip builds and caches wheels locally, so the problem of compiling NumPy over and over again was addressed a while ago
* Windows and OS X wheels are easy to build and widely available for many scientific packages
* [PEP 0513](https://www.python.org/dev/peps/pep-0513/) provides a way to **finally** upload Linux wheels to PyPI which are compatible with _many_ Linux distributions
* [PEP 0516](https://www.python.org/dev/peps/pep-0516/) proposes "a simple and standard sdist format **that isn't intertwined with distutils**"!!!1!
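
A wheel's platform compatibility is encoded in its file name. PEP 427 defines the `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl` convention, and PEP 513 adds the `manylinux1_x86_64` and `manylinux1_i686` platform tags. Below is a minimal sketch that extracts these tags from an illustrative file name. `wheel_tags` is a hypothetical helper, and it returns compressed tag sets such as `py2.py3` unexpanded.

```python
def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # the build tag is optional
        raise ValueError(f"unexpected wheel name: {filename!r}")
    # The last three dash-separated fields are always the compatibility tags.
    return tuple(parts[-3:])


print(wheel_tags("example_pkg-1.0-cp35-cp35m-manylinux1_x86_64.whl"))
# ('cp35', 'cp35m', 'manylinux1_x86_64')
```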

Still, some annoyances remain:

* [pip does not have a dependency solver](https://github.com/pypa/pip/issues/988)
* conda-build has a more streamlined process to build and test packages in an isolated way

## Conclusion





* This talk: https://github.com/AeroPython/embrace-conda-packages
* My GitHub: https://github.com/Juanlu001/
* Me on Twitter: @astrojuanlu, @Pybonacci, @PyConES, @AeroPython

### Approach me during the conference, interrupt me while I'm on a conversation, ask me questions, let's talk about your ideas and projects! 😊

# Thanks for your attention!