#  A modern approach to science
*with python*

<!-- logos -->
<table>
<tr>
<td colspan=3> <img src='images/scai-logo.png' width=500> </td>
</tr><tr>
<td> <img src='http://upload.wikimedia.org/wikipedia/commons/thumb/f/f8/Python_logo_and_wordmark.svg/2000px-Python_logo_and_wordmark.svg.png' width=400> </td>
<td> <img src='https://raw.githubusercontent.com/jupyter/nature-demo/master/images/jupyter-logo.png' width=400> </td>
<td> <img src='https://pbs.twimg.com/profile_images/562949148932964352/a5XS7EQh.png' width=400> </td>
</tr>
</table>

##Here we go 
...with live code from the beginning!

In [1]:
# Install an extension from my repo
%install_ext https://raw.githubusercontent.com/pdonorio/pdonorio.github.io/master/scripts/whoami.py

Installed whoami.py. To use it, type:
  %load_ext whoami


In [2]:
# Load the installed extension
%load_ext whoami

<small> wait, what is that?? </small>

A command about *myself*

In [3]:
# This is a new command added from the extension 'whoami'
%helloworld


	Who am i

twitter: @paolo.donorio
github: @pdonorio
email: p.donoriodemeo@cineca.it


42

### I will double check
* note to self: click on the cell below
* then press shift+enter

In [4]:
import time
print ("Today is " + time.strftime("%d/%m/%Y"))

Today is 15/06/2015


## Introduction

**Quick note**

*These presentations are based on the awesome work of J.R. Johansson* 

source: http://dml.riken.jp/~rob

*I also took inspiration from one of that project's forks*

source: http://nbviewer.ipython.org/gist/rpmuller/5920182

<small>(opensource rulez)</small>



**Testing the audience**

- Do you already know python?
- Which version of python are you using?
- What is the main reason you decided to use python? 
- Have you ever used `numpy`?
- Do you know what `ipython` is?

## A notebook and a Calculator

Many of the things I used to use a calculator for, I now use Python for:

In [5]:
2+2

4

In [6]:
(50-5*6)/4

5

There are some gotchas compared to using a normal calculator.

In [7]:
7/3

2

* Python integer division, like C or Fortran integer division, discards the fractional part and returns an integer (for negative results Python rounds down, while C and Fortran round toward zero).
    * At least it does in version 2.
    * In version 3, the `/` operator returns a floating point number.
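If integer division is what you actually want, the `//` operator asks for it explicitly and gives the same result in Python 2 and Python 3. A minimal sketch (the variable names are only for illustration):

```python
# Floor division: the result is rounded toward negative infinity
print(7 // 3)      # 2
print(-7 // 3)     # -3 (C would round toward zero and give -2)

# The remainder, and quotient plus remainder in one call
print(7 % 3)           # 1
print(divmod(7, 3))    # (2, 1)

quotient, remainder = divmod(7, 3)
```

Writing `//` whenever an integer result is intended keeps the code independent of which Python version runs it.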

In [8]:
# Preview of py3k feature in Python 2 by importing the module from the *future* 
from __future__ import division
7/3

2.3333333333333335

Alternatively, you can convert one of the integers to a floating point number, in which case the division operator returns a floating point number as well.

In [9]:
# One way
7/3.

2.3333333333333335

In [10]:
# Second way
7/float(3)

2.3333333333333335

##What did we see so far?

* integers
* floating point numbers
* import (of a python library)
* libraries are called **modules**

In [11]:
# An example of using a module
from math import sqrt
sqrt(81)

9.0

In [12]:
# Or you can simply import the math library itself
import math
math.sqrt(81)

9.0

You can define variables using the equals (=) sign:

In [13]:
width = 20
length = 30
area = length*width
area

600

If you try to access a variable that you haven't yet defined, you get an error:

In [14]:
volume

NameError: name 'volume' is not defined

and you need to define it:

In [None]:
depth = 10
volume = area*depth
volume

* You can name a variable *almost* anything you want
* It needs to start with an alphabetical character or "\_" 
* It can contain alphanumeric characters plus underscores ("\_")

Certain words, however, are **reserved** for the *language*:

    and, as, assert, break, class, continue, def, del, elif, else, except,
    exec, finally, for, from, global, if, import, in, is, lambda, not, or,
    pass, print, raise, return, try, while, with, yield

In [None]:
# Trying to define a variable using one of these will result in a syntax error:
return = 0
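The reserved words can also be checked from Python itself with the standard `keyword` module. A small sketch; the names tested in the loop are just examples, and the exact list depends on the Python version you run (for instance `print` and `exec` are reserved only in Python 2):

```python
import keyword

# Full list of reserved words for the interpreter running this cell
print(keyword.kwlist)

# Check individual names before using them as variables
for name in ["return", "volume", "lambda"]:
    print("%s -> %s" % (name, keyword.iskeyword(name)))
```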

a little step back:

## The role of computing in science

Science has traditionally been divided into 

* **experimental** and 
* **theoretical** disciplines.

During the last several decades ***computing*** has emerged

* Related to both experiments and theory.
* Often viewed as a new third branch of science. 

<img src="images/theory-experiment-computation.png" width="600">

*Nowadays a vast majority of both experimental and theoretical papers involve some **numerical calculations**, simulations or computer modeling*.

In experimental and theoretical sciences
* The methods used and the results are published
* All experimental data should be available upon request
* It is considered unscientific to withhold crucial details in a theoretical proof

In computational sciences 
* There are not yet any well established guidelines for how **source code** and **generated data** should be handled. 

*A number of editorials in high-profile journals have started 
to demand that authors provide the source code for simulation software*


## Requirements on scientific computing

With respect to numerical work:

**Replication**
    - An author of a scientific paper that involves numerical calculations should be able to rerun the simulations and replicate the results upon request. 
    - Other scientists should also be able to perform the same calculations and obtain the same results, given the information about the methods used in a publication.

**Reproducibility** 
    - The results obtained should be reproducible with an independent implementation of the method, or using a different method altogether. 
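For simulations that use random numbers, one practical ingredient of replication is fixing the seed of the random number generator. A minimal sketch, assuming a toy "simulation" (the function `simulate` and the seed values are hypothetical, for illustration only):

```python
import random

def simulate(seed, n=5):
    """Hypothetical toy simulation: draw n pseudo-random numbers."""
    rng = random.Random(seed)   # private generator, independent of global state
    return [rng.random() for _ in range(n)]

first_run = simulate(seed=2015)
second_run = simulate(seed=2015)
other_run = simulate(seed=42)

print(first_run == second_run)   # True: same seed, same numbers
print(first_run == other_run)    # False: different seed, different numbers
```

Recording the seed together with the code version makes it possible to rerun exactly the same calculation later.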

### To achieve these goals

* Keep the source code, and record the version that was used to produce the data and figures in published papers
* Record which versions of external software were used
    - Keep access to the environment that was used
* Be ready to give additional information about the methods used
* Ideally, code should be published online
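Part of this bookkeeping can be automated. The sketch below uses only the standard library to collect a few facts about the interpreter and the machine; the function name `environment_snapshot` and the chosen fields are assumptions made for illustration, not a fixed recipe:

```python
import json
import platform
import sys
import time

def environment_snapshot():
    """Collect basic facts about the interpreter and the machine."""
    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "executable": sys.executable,
        "platform": platform.platform(),
        "recorded_at_utc": time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime()),
    }

snapshot = environment_snapshot()
print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Saving such a snapshot next to the generated data gives a simple record of where the results came from.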

# Tools for managing source code
*this is extremely important for your future work*

Ensuring replicability and reproducibility of scientific simulations is a *complicated problem*

####Revision Control System (RCS) software
Good choices include
* git - http://git-scm.com
* mercurial - http://mercurial.selenic.com. Also known as `hg`
* subversion - http://subversion.apache.org. Also known as `svn`

####Online repositories for source code
Some good alternatives are
* Github - http://www.github.com
* Bitbucket - http://www.bitbucket.com
* Privately hosted repositories on the university's or department's servers

Repositories are also excellent for version controlling manuscripts, figures, thesis files, data files, lab logs, etc. 

Basically for any digital content that must be preserved and is frequently updated!

They are also excellent **collaboration tools**!

## What is Python?

[Python](http://www.python.org/) is: 

> a modern, general-purpose, object-oriented, high-level programming language.

###General characteristics of Python

* **clean and simple language:** 
    * Easy-to-read and intuitive code
    * easy-to-learn minimalistic syntax
    * maintainability scales well with size of projects

* **expressive language:** 
    * Fewer lines of code
    * fewer bugs
    * easier to maintain

###Technical details

* **dynamically typed:** 
    * No need to define the type of variables, function arguments or return types.
* **automatic memory management:** 
    * No need to explicitly allocate and deallocate memory for variables and data arrays
        * No memory leak bugs 
* **interpreted:** 
    * No need to compile the code
        * The Python interpreter reads and executes the python code directly
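A short illustration of dynamic typing (the names `value` and `double` are only examples): the same name can be rebound to objects of different types, and a function works with any argument that supports the operations it uses.

```python
value = 42
print(type(value))      # an integer type

value = "forty-two"     # the same name now refers to a string
print(type(value))

def double(x):
    # No declared argument or return type: works for anything supporting '*'
    return x * 2

print(double(21))       # 42
print(double("ab"))     # abab
print(double([1, 2]))   # [1, 2, 1, 2]
```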

###Advantages

* The main advantage is ease of programming
    - minimizing the **time required** to develop, debug and maintain the code
* Well designed language that encourages many good programming practices
    - Modular and object-oriented programming
    - Good system for packaging and re-use of code
    - This often results in more transparent, maintainable and bug-free code
* Self describing (*introspection*)
    - Documentation tightly integrated with the code
* A large standard library, and a large collection of add-on packages
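Introspection can be tried directly. The sketch below uses the standard `math` module plus a small hypothetical function `area` to show that documentation travels with the objects themselves:

```python
import math

# The docstring is attached to the function object
print(math.sqrt.__doc__)

# dir() lists the names a module or object exposes
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names[:5])

def area(length, width):
    """Return the area of a rectangle."""
    return length * width

print(area.__name__)   # area
print(area.__doc__)    # Return the area of a rectangle.
```

The built-in `help()` function presents the same information in a more readable form.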

###Disadvantages

* Since Python is an interpreted and dynamically typed programming language, the execution of Python code **can be slow** compared to compiled, statically typed programming languages
    - such as C and Fortran...
* Quite different in style from languages that you may already know a little
    - such as C and Fortran...
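The speed difference can be felt even without leaving Python. In the sketch below an explicit Python loop is compared with the built-in `sum`, which in the standard CPython interpreter is implemented in C. The function names and the problem size are arbitrary choices for illustration, and the timings will vary from machine to machine:

```python
import timeit

def python_loop_sum(n):
    """Add the integers 0..n-1 with an explicit Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total

def builtin_sum(n):
    """Same result, but the loop runs inside the built-in sum()."""
    return sum(range(n))

n = 100000
assert python_loop_sum(n) == builtin_sum(n)

loop_time = timeit.timeit(lambda: python_loop_sum(n), number=20)
builtin_time = timeit.timeit(lambda: builtin_sum(n), number=20)
print("explicit loop: %.4f s" % loop_time)
print("built-in sum:  %.4f s" % builtin_time)
```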

# Popularity

<table> <tr> <td>
<img 
src='http://static1.squarespace.com/static/51361f2fe4b0f24e710af7ae/t/52dc3638e4b0d99728f927ae/1390163522743/codeeval2014.jpg?format=750w'
width=500
>
</td><td>
<img 
src='http://static1.squarespace.com/static/51361f2fe4b0f24e710af7ae/54b5c35ee4b0b6572f6dac96/54b5c367e4b0226a8ffadefe/1421198184280/codeeval2015.001.jpg?format=750w'
width=500
>
</td></tr></table>

sources:
- http://blog.codeeval.com/codeevalblog/2014#.VXGkF5rtlBc=
- http://blog.codeeval.com/codeevalblog/2015#.VXGkE5rtlBc=

### What makes python suitable for scientific computing?

<img src="images/optimizing-what.png" width="800">

Python has a strong position in *scientific computing* 

- Large community of users
- easy to find help and documentation

Extensive ecosystem of *scientific libraries* and environments
- **numpy** http://numpy.scipy.org - Numerical Python
- **scipy** http://www.scipy.org -  Scientific Python
- **matplotlib** http://www.matplotlib.org - graphics library


* Great performance due to close integration with time-tested and highly optimized codes written in C and Fortran:
    * blas, atlas blas, lapack, arpack, Intel MKL, ...

* Good support for 
    * Parallel processing with processes and threads
    * Interprocess communication (MPI)
    * GPU computing (OpenCL and CUDA)

* Readily available and suitable for use on high-performance computing clusters. 
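As a small taste of the parallel processing support, here is a sketch using a pool of threads from the standard library. The function `square` and the pool size are hypothetical choices; note that in CPython threads mainly help with I/O-bound work, while CPU-bound work is usually spread over processes with `multiprocessing.Pool`, which offers the same `map` interface.

```python
from multiprocessing.pool import ThreadPool

def square(x):
    return x * x

pool = ThreadPool(4)          # four worker threads
try:
    # map() distributes the inputs over the workers and keeps the input order
    results = pool.map(square, range(10))
finally:
    pool.close()
    pool.join()

print(results)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```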

##No license costs!

No unnecessary use of research budget

### The scientific python software stack
Lots of '*goodies*'

<img src="images/scientific-python-stack.jpg" width="750">

# Python interpreter

The standard way to use the Python programming language is to use the Python interpreter to run python code

* The python interpreter is a program that reads and executes the python code in files passed to it as arguments
* At the command prompt, the command ``python`` is used to invoke the Python interpreter

For example, to run a file ``my-program.py`` that contains python code from the command prompt, use:

    $ python my-program.py

We can also start the interpreter by simply typing ``python`` at the command line, and interactively type python code into the interpreter. 

<img src="images/python-screenshot.jpg" width="700">
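For the script form, a file such as `my-program.py` could look like the sketch below. The contents are hypothetical and only meant to show the usual layout of a small script: functions at the top, and a guarded block that runs when the file is passed to the interpreter.

```python
# my-program.py (hypothetical contents, for illustration only)
import sys

def greeting(name):
    """Build the greeting for one name."""
    return "Hello, %s!" % name

def main(argv):
    """Greet every name given on the command line, or the world."""
    names = argv[1:] or ["world"]
    for name in names:
        print(greeting(name))

# This block runs when the file is executed, not when it is imported
if __name__ == "__main__":
    main(sys.argv)
```

With this layout, `python my-program.py Alice Bob` would greet both names, and the functions can still be imported from other code.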


### IPython

IPython is an interactive shell that addresses the limitations of the standard python interpreter

...it is a work-horse for scientific use of python! 

It provides an interactive prompt to the python interpreter with greatly improved user-friendliness.

<img src="images/ipython-screenshot.jpg" width="800">

Some of the many useful features of IPython include:

* Command history, which can be browsed with the up and down arrows on the keyboard.
* Tab auto-completion.
* In-line editing of code.
* Object introspection, and automatic extraction of documentation strings from python objects like classes and functions.
* Good interaction with the operating system shell.
* Support for multiple parallel back-end processes that can run on computing clusters or cloud services like Amazon EC2.


# IPython notebook

[IPython notebook](http://ipython.org/notebook.html) is an HTML-based notebook environment for Python

* Based on the IPython shell
* Provides a cell-based environment with great interactivity
* Calculations can be organized and documented in a structured way


<img src="images/ipython-notebook-screenshot.jpg" width="800">

Although they use a web browser as graphical interface, 

IPython notebooks are usually run **locally**

on the same computer that runs the browser. 


To start a new IPython notebook session, run the following command:

    $ ipython notebook

from a directory where you want the notebooks to be stored. 

<small>(This will open a new browser window with a running explorer of the current path)</small>

#Welcome, *you* 
##a <u>notebooker</u> scientist

`let's demo together`

Link: http://j.mp/ourpylab

<small> ... and yes, you will be able to do what I can do by the end of this day </small>

##The notebook magic

- explorer, create new, remove, rename

- move inside, run cell code, help

- the kernel, cell types 

- markdown and notes

- download ipynb, python, html

- install a library and use it

## Versions of Python

There are currently two versions of python: 

**Python 2** and **Python 3**. 


* Python 3 will eventually supersede Python 2
    + but **it is not backward-compatible** with Python 2
* A lot of existing python code and packages have been written for Python 2
    + and it is still the most widespread version
    
We will stick with Python 2 for now.

*Note*:

> Several versions of Python can be installed in parallel.


before getting serious:

## Python and module versions

For the reproducibility of an IPython notebook:

- We record the versions of all these different software packages
- If this is done properly it will be easy to reproduce the environment 


To encourage the practice of recording versions in notebooks:

> a simple IPython extension that produces a table with version numbers of selected software components

In [15]:
# you only need to do this once
%install_ext https://raw.githubusercontent.com/cineca-scai/lectures/milano/version_information/version_information.py

Installed version_information.py. To use it, type:
  %load_ext version_information


Now, to load the extension and produce the version table

In [16]:
# Execute this now

%load_ext version_information

%version_information numpy, scipy, pandas, matplotlib, seaborn, bokeh

Software,Version
Python,2.7.10 64bit [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
IPython,3.1.0
OS,Linux 4.0.3 boot2docker x86_64 with debian jessie sid
numpy,1.9.2
scipy,0.15.1
pandas,0.16.1
matplotlib,1.4.3
seaborn,0.5.1
bokeh,0.9.0
Mon Jun 15 14:10:14 2015 UTC,Mon Jun 15 14:10:14 2015 UTC
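Similar information can also be gathered by hand in plain Python. The sketch below is not how the extension works internally; the helper `collect_versions` is a hypothetical name, and it relies on the common (but not universal) convention that a module exposes a `__version__` attribute:

```python
import importlib

def collect_versions(module_names):
    """Map each module name to its __version__, or a note if unavailable."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
            continue
        # Not every module defines __version__, so fall back to a marker
        versions[name] = getattr(module, "__version__", "unknown")
    return versions

found = collect_versions(["numpy", "scipy", "pandas"])
for name, version in sorted(found.items()):
    print("%s,%s" % (name, version))
```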


<small>
This talk can be found in *static* form at the URL http://bit.ly/cineca_python
</small>


**Let's move to the next part :)**