{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data analysis in Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This course is aimed at the intermediate Python developer who wants to learn how to do useful data analysis tasks in Python.\n", "It will initially focus on the Python package [pandas](http://pandas.pydata.org/) but will also cover [matplotlib](http://matplotlib.org/), [NumPy](http://www.numpy.org/) and [SciPy](https://www.scipy.org/) to some degree.\n", "\n", "Data analysis is a huge topic and we couldn't possibly cover it all in one short course. The purpose of this workshop is to introduce some of the most useful tools and to demonstrate some of the most common problems that arise." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In previous courses, you've used the `python` command-line program to execute scripts and `ipython` to run code interactively. This course will use another tool called Jupyter (previously known as IPython Notebook) to run your Python code. It works like a standard IPython interactive session, but also lets you intersperse your code with blocks of explanatory text and embed output such as graphs directly in the page." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get started, launch Jupyter Notebook as shown by your instructor. Towards the top-right of that page there should be three buttons, the middle of which is labelled *New*. Clicking it opens a drop-down menu, from which you should select the *Python 3* option.\n", "\n", "This will open a new notebook file. Give it a name by clicking on the 'Untitled' at the top of the screen.\n", "\n", "Throughout this course you will likely want to start a new notebook for each section, so name them appropriately to make them easier to find later." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting started\n", "\n", "Once the notebook has launched, you will see a wide white box containing a grey box, with a blue `In [ ]:` to its left. The grey box is an input cell, similar to the one in the IPython command-line program. You type any Python code you want to run inside that box:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output appears below when the cell is run\n", "To run a cell, press Ctrl-Enter with the cursor inside or use the run button in the toolbar at the top\n" ] } ], "source": [ "# Python code can be written in 'Code' cells\n", "print('Output appears below when the cell is run')\n", "print('To run a cell, press Ctrl-Enter with the cursor inside or use the run button in the toolbar at the top')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In your notebook, type the following in the first cell and then run it. You should see the same output:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = 5\n", "b = 7\n", "a + b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cells in a notebook are linked together, so a variable defined in one cell is available in every cell you run afterwards. In the second cell you can therefore use the variables `a` and `b`:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-2\n" ] } ], "source": [ "c = a - b\n", "print(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some Python libraries have special integration with Jupyter notebooks and so can display their output directly into the page. 
For example, `pandas` will format tables of data nicely and `matplotlib` will embed graphs directly:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "|   | 0 | 1 | 2 |\n", "|---|---|---|---|\n", "| 0 | 1 | 2 | 3 |\n", "| 1 | 5 | 6 | 6 |\n", "