{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Intro to Jupyter and Python\n",
    "\n",
    "> Abstract: A brief introduction to using Jupyter and Python\n",
    "\n",
    "## Preface\n",
    "\n",
    "This resource is intended to be a whirlwind tour of using Python within Jupyter notebooks. Consider it a collection of pointers for further study and some small useful things you may not have known before.\n",
    "\n",
    "### Other resources\n",
    "\n",
    "**Beginner:** For more educative introductions, see:\n",
    "\n",
    "- https://swcarpentry.github.io/python-novice-gapminder/\n",
    "- https://www.w3schools.com/python/default.asp\n",
    "- https://inferentialthinking.com/chapters/03/programming-in-python.html\n",
    "\n",
    "If you already know other languages, you might find this useful to get up to speed with Python syntax: https://learnxinyminutes.com/docs/python\n",
    "\n",
    "**Intermediate:** For creating importable modules and packages, testing, documenting, logging, optimising...:\n",
    "\n",
    "- https://softwaresaved.github.io/az-intermediate-software-skills-course/ (alternatively available [here](https://carpentries-incubator.github.io/python-intermediate-development/index.html))\n",
    "- https://python-102.readthedocs.io/\n",
    "\n",
    "For a broader set of skills in data science:\n",
    "- [The Turing Way](https://the-turing-way.netlify.app/)\n",
    "- [Code Refinery](https://coderefinery.org/lessons/)\n",
    "\n",
    "For guides specifically for Earth sciences:\n",
    "- [Project Pythia Foundations](https://foundations.projectpythia.org/)\n",
    "\n",
    "## Jupyter basics\n",
    "\n",
    "A *Jupyter notebook* is a file (extension .ipynb) which can contain both (live) code and its outputs, as well as explanatory text. Content is organised into *cells*, where each cell may be either *Code* or [*Markdown*](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) (or others). A notebook can be rendered non-interactively in a web browser, converted to other formats (HTML, PDF ...), or can be run interactively through a range of software. The standard software is JupyterLab which can be interacted with in a web browser, while the server backend (where the code itself runs) can be run either locally on your own machine, or on a remote machine as is the case with the *Virtual Research Environment (VRE)*. A notebook must be connected to a *kernel* which provides the software executable used to run code within the notebook (see the top right \"Python 3\" in JupyterLab). When accessing the VRE, your Python code and data associated with a notebook are stored and running on a remote machine - you only view the inputs and outputs from the code such that data transfer over the internet to your web browser is minimal. The main advantage is that there is zero setup required - all you need is a web browser.\n",
    "\n",
    "When running interactively, notebook cells can be added, deleted, edited, and executed using buttons in JupyterLab and with keyboard shortcuts. Double click on this text to see the raw Markdown - re-render it by running the cell: use the \"play\" button at the top or use **Ctrl-Enter** (run) or **Shift-Enter** (run and advance to the next cell).\n",
    "\n",
    "A notebook switches between two modes: **command** and **edit**. Enter edit mode by double clicking inside a cell, or by pressing **Enter**. Enter command mode by pressing **Esc** and then the notebook can be navigated with the arrow keys.\n",
    "\n",
    "The following cell is set to \"Code\". Run the cell - you should see that the output of the last line of the cell is displayed as the output of the cell. This is a convenient way to check the state of variables rather than having to use `print()` as below. To suppress this behaviour, end the line with`;`\n",
    "\n",
    "Cells can be run in any order and the memory is persistent - see the incrementing counter `[1]`, `[2]` on the left - so be careful when you run (or re-run) cells out of order!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = 1\n",
    "b = 2\n",
    "a + b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(a, \",\", b)\n",
    "a, b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Markdown cells\n",
    "\n",
    "These cells provide a way to document and describe your code. These can include rich text, equations, embedded images and video, and HTML. Consult [a reference](https://www.markdownguide.org/) (or see `Help / Markdown Reference` in JupyterLab) for more details, but some things to get you started (again, double click this cell to see the markdown):\n",
    "\n",
    "Use `# Title`, `## Subtitle ...` to create headings to structure documents - these should be placed *at the start* of a new cell.\n",
    "\n",
    "Use `*..*`, `**..**`, for *italics* and **bold**.\n",
    "\n",
    "Use `- ...` at the beginning of successive lines to make an unordered list:\n",
    "\n",
    "- list item 1\n",
    "- list item 2\n",
    "\n",
    "Use `1. ...` to make an ordered list:\n",
    "\n",
    "1. list item 1\n",
    "2. list item 2\n",
    "\n",
    "Use `$...$` to insert mathematics using [Latex](https://en.wikibooks.org/wiki/LaTeX/Mathematics) to create them inline like $\\frac{dy}{dx}=\\sin{\\theta} + 5k$, or `$$...$$` to create them in a centred equation style like $$\\frac{dy}{dx}=\\sin{\\theta} + 5k$$\n",
    "\n",
    "Use `---` to insert a horizontal line to break up sections\n",
    "\n",
    "---\n",
    "\n",
    "Use `` `...` `` to insert raw text like code: `print()`, or,\n",
    "\n",
    "\\`\\`\\`python  \n",
    "\\# insert python code here  \n",
    "\\`\\`\\`\n",
    "<!-- Note: Using \\ to escape the special characters (and this is a hidden comment, same as in HTML) -->\n",
    "\n",
    "to render Python code with syntax highlighting like:\n",
    "\n",
    "```python\n",
    "for i in range(5):\n",
    "    print(i)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Jupyter tricks and shortcuts\n",
    "\n",
    "As well as keyboard shortcuts to help interact with notebooks, there are [many](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/) extra tricks that you will learn over time. Some of these are:\n",
    "\n",
    "**use ! to execute shell commands: (beware these will be system-specific!)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pwd      # print working directory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls -l    # list files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**use % to access [*IPython line magics*](https://ipython.readthedocs.io/en/stable/interactive/magics.html):**\n",
    "\n",
    "e.g. [%time](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-time) or [%timeit](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit) to test execution time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%timeit [x**2 for x in range(100)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[%pdb](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-pdb) to access the Python debugger"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**use ? to get help:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "?print"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Also try using **Tab** to auto-complete and **Shift-Tab** to see quick help on the object you are accessing."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Python basics\n",
    "\n",
    "Values are *assigned* to *variables* with `=`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = 5\n",
    "y = 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Operators* perform operations between two given variables - the outcomes are obvious for arithmetic situations such as in this table, but behaviour of operators can be defined for other object types (as with lists and strings as below).\n",
    "\n",
    "| .         | Addition | Subtraction | Multiplication | Division | Exponent |  Modulo | . | Equal    | Not equal | Less than | [and more...](https://www.w3schools.com/python/python_operators.asp) |\n",
    "|-----------|:--------:|:-----------:|:--------------:|:--------:|:--------:|:-------:|---|----------|-----------|-----------|----------------------------------------------------------------------|\n",
    "|  Example: |  `5 + 2` |   `5 - 2`   |     `5 * 2`    |  `5 / 2` | `5 ** 2` | `5 % 2` | . | `5 == 2` | `5 != 2`  | `5 < 2`   | ...                                                                  |\n",
    "| Output:   | 7        | 3           | 10             | 2.5      | 25       | 1       | . | `False`  | `True`    | `False`   | ...                                                                  |\n",
    "\n",
    "We can use conditional operators in an [\"if statement\"](https://www.w3schools.com/python/python_conditions.asp) to control program flow, as below. Indentation / whitespace at the beginning of the line is used to define scope in the code, where other languages might use brackets or an `endif`. You should be able to find the bug in this code for some values of x and y, and fix it by adding additional `elif` cases before the final `else`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x, y = 5, 2\n",
    "if x > 2:\n",
    "    print(x, \">\", y)\n",
    "else:\n",
    "    print(x, \"<\", y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Whitespace is important in Python, unlike other languages. Everything within the `if` block must be indented!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Python data types\n",
    "\n",
    "You should be familiar with some of the core Python data types: integer, floating point number, string, list, tuple, [dictionary](https://www.freecodecamp.org/news/python-dictionaries-detailed-visual-introduction/):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = 1\n",
    "b = 1.0\n",
    "c = \"one\"\n",
    "d = [a, b, c]\n",
    "e = (a, b, c)\n",
    "f = {\"a\": a, \"b\": b, \"c\": c, \"d\": d, \"e\": e}\n",
    "a, b, c, d, e, f"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `type()` to see what the type of an object is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(a), type(b), type(c), type(d), type(e), type(f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Jupyter provides tools to help coding. Create a new code cell here, type `f.` then press **Tab** - you will see a list completion options. Extract the keys from the dictionary with `f.keys()`. Here we have used `.` to access one of the *methods* the object we assigned to `f`. The parentheses are necessary as this is a function call - arguments could be passed here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f.keys  # is the function itself"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f.keys()  # actually runs the function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's use `?` to get help on the dictionary we created. This can be used to get a quick look into the documentation for any object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "This shows us that we can create such a dictionary using the `dict()` function - before we used the shortcut syntax with curly brackets, `{}`. Let's create a new dictionary using `dict()` instead. Click inside the parentheses in `dict(..)` and press **Shift-Tab** to access the quick help. Follow the example text near the end of the quick help to create a dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_dict = dict()\n",
    "my_dict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_dict = dict(a=\"hello!\", b=42)\n",
    "my_dict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should be able to extract values from the dictionary using the key names, in two different ways:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(my_dict[\"a\"])\n",
    "print(my_dict.get(\"b\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remember that dictionaries consist of (*key*, *value*) pairs. Use `my_dict[\"key_name\"]` or `my_dict.get(\"key_name\")` to extract a value, or if you want to supply a default value when they key hasn't been found:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_dict.get(\"c\", \"empty!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Python *objects*\n",
    "\n",
    "Everything in Python is an *object*. An object is a particular *instance* of a *class*, and has a set of associated behaviours that the class provides. These behaviours come in the form of *properties* (attributes that the object carries around, and typically involve no real computation time to access), and *methods* (functions that act on the object). These are accessed as `object.property` and `object.method()`.\n",
    "\n",
    "To demonstrate, here we create a complex number, then separately access the real and imaginary components of it which are stored as object properties. Finally we use `.conjugate()` to evaluate its complex conjugate - this returns a new complex number object that can be manipulated in the same way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "z = 1 + 2j\n",
    "print(type(z))\n",
    "print(z)\n",
    "print(z.real)\n",
    "print(z.imag)\n",
    "print(z.conjugate())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is the basis of *object-oriented programming* which is a popular programming paradigm that Python software takes advantage of."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Manipulating lists and strings\n",
    "\n",
    "Lists and strings are closely related objects - a string can be thought of as a list of characters. Python comes with an in-built toolbox to help work with these objects ([*string methods*](https://www.w3schools.com/python/python_ref_string.asp) and [*list methods*](https://www.w3schools.com/python/python_ref_list.asp)). Here we use the string methods `split()` and `join()` to split a phrase up then rejoin the words differently. Afterwards we show that the `replace()` method already provides this functionality, replacing instances of a chosen character(s) (in this case, space) with another."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s = \"some words\"\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "l = s.split()\n",
    "l"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\"; \".join(l)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s.replace(\" \", \"; \")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at some list manipulation now using our new list, `l`. We can use `append()` to append a new item to the end (this is done in-place, updating the state of `l`, in contrast to the string methods above which return new strings/lists), then sort the resulting three words alphabetically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "l"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "l.append(\"more\")\n",
    "l"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "l.sort()\n",
    "l"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can access items from a list using `[]` and the index number within the list, **starting from zero**. Using a colon, e.g. `a:b`, performs *list slicing* to return all values in the list from index `a` up to, but not including, `b`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "l[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "l[0:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to list methods, which are particular to list objects and so are accessed as `list.method()`, there are other more fundamental operations that are provided as [built-in functions](https://www.w3schools.com/python/python_ref_functions.asp) that you could try to apply to any object so are accessed as `function(list)`. Below we evaluate the length of the list, and of the string which is the first item of the list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "len(l), len(l[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another way to add items to a list is to concatenate two lists together, which could be done using `list1.extend(list2)` as opposed to `append()` above (which just adds one item to a list). Let's achieve the same thing by using the simpler `+` operator which for lists, simply joins them together:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "[10, 11, 12] + [5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But what if we wanted to add the number to each item in the list (i.e. we are trying to do some arithmetic)? In this case we would have to *loop through* each item in the list and add `5` to each one."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loops\n",
    "\n",
    "For-loops are constructed with `for a in b:`, followed by an indented block as with if-statements. `a` is a variable which will change on each iteration of the loop, and `b` is an *iterable* - in a simple case like below this can just be a list, something containing items that can be iterated through. The length of `b` therefore determines the number of iterations the loop will make. We could create an index counter to iterate through and access items in a list (more normally, this would be achieved with `for i in range(3)...`) ..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_list = [10, 11, 12]\n",
    "for i in [0, 1, 2]:\n",
    "    print(my_list[i])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "but it is more *Pythonic* to iterate directly through the list items themselves:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_list = [10, 11, 12]\n",
    "for i in my_list:\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To create a new list with each item incremented by 5, we first create an empty list, then append the new number on each iteration of the loop:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_list = [10, 11, 12]\n",
    "my_list2 = []\n",
    "for i in my_list:\n",
    "    my_list2.append(i + 5)\n",
    "my_list2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, this can be done in-place on the original list if we iterate through the index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_list = [10, 11, 12]\n",
    "for i in range(len(my_list)):\n",
    "    my_list[i] = my_list[i] + 5\n",
    "my_list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To be properly Pythonic, we should use a *list comprehension* which can implement looping behaviour while constructing a list directly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_list = [10, 11, 12]\n",
    "[i+5 for i in my_list]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Functions\n",
    "\n",
    "Functions can be *defined* using `def function_name(arguments): ...` followed by an indented block. Usually something should be *returned* by the function by putting `return ...` on the last line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def add_5(input_list):\n",
    "    output_list = [i+5 for i in input_list]\n",
    "    return output_list\n",
    "\n",
    "add_5([10, 11, 12])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Arguments (args) for the function can be provided separated by commas, followed by optional *keyword arguments* (kwargs) like `x=<default value>`. We should also document the function with a *docstring* which goes inside triple-quotes `\"\"\"` at the top of the function, and comments with `#`. Adding a docstring here makes the code *self-documenting* - we can now access this through the notebook help, which is revealed by Shift-Tab when typing `print_things(...)`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def print_things(a, b, c=None):\n",
    "    \"\"\"Print 'a; b; c' or 'a; b'\"\"\"\n",
    "    # A shortcut to do if-else logic\n",
    "    output = [a, b, c] if c else [a, b]\n",
    "    # Cast each of a, b, c to strings then join them\n",
    "    print(\"; \".join([str(i) for i in output]))\n",
    "\n",
    "print_things(1, 2, 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extending functionality by *importing* from other packages\n",
    "\n",
    "Obviously the above is an overly complex way to just add two sets of numbers together. This is because lists are not the appropriate object for this, and instead an *array* is neccessary. Array functionality can be *imported* from the *package*, [**numpy**](https://docs.scipy.org/doc/numpy/user/quickstart.html) (numerical Python). Array objects behave like vectors and matrices in mathematics, and can be higher dimensional - hence the `ndarray` name, n-dimensional array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = numpy.array([10, 11, 12])\n",
    "print(type(a))\n",
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a + 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is customary to shorten the name used to refer to the imported package - this is done with `import ... as ...` - here we import `numpy` and rename it as `np` so that we can refer to anything within numpy as `np.[...]`. Another option is to just import only the part we want with `from ... import ...`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from numpy import array\n",
    "\n",
    "l1 = np.array([1, 2, 3])\n",
    "l2 = array([1, 2, 3])\n",
    "\n",
    "np.array_equal(l1, l2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Arrays come with new properties and methods for mathematical operations, as well as many functions under the numpy *namespace* (what we can access with `np.[...]`). More advanced tools can be found in [**scipy**](https://docs.scipy.org/doc/scipy/reference/tutorial/general.html) (e.g. `from scipy import fft`) or elsewhere.\n",
    "\n",
    "The standard tool for making figures is [**matplotlib**](https://matplotlib.org/tutorials/index.html). Numpy and matplotlib should be quite familiar to those coming from Matlab - you just need to get used to working from the numpy and matplotlib namespaces, and to start counting from zero instead of one!\n",
    "\n",
    "The \"pyplot\" module of matplotlib can be used in a similar way to Matlab, with repeated calls to `plt.[...]` updating the state of the figure being created:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.arange(-2*np.pi, 2*np.pi, 0.1)\n",
    "y = np.sin(x)\n",
    "plt.plot(x, y)\n",
    "plt.xlabel(\"x\")\n",
    "plt.ylabel(\"y\")\n",
    "plt.title(\"y = sin(x)\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, for full control of figures, it is more flexible to use the object oriented approach as below. For more, refer to the [matplotlib tutorials](https://matplotlib.org/tutorials/index.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a \"figure\" object that contains everything\n",
    "fig = plt.figure()\n",
    "# Add an \"axes\" object as part of the figure\n",
    "ax = fig.add_subplot(111)\n",
    "# NB: Instead of the above, it is more convenient to use:\n",
    "#       fig, ax = plt.subplots(nrows=1, ncols=1)\n",
    "# Add things to the axes\n",
    "ax.plot(x, y)\n",
    "ax.set_xlabel(\"x\")\n",
    "ax.set_ylabel(\"y\")\n",
    "ax.set_title(\"y = sin(x)\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The demonstrations above show that there are quite a few lines of code necessary to do the typical data analysis flow of loading/generating data, performing some computation, and plotting the results, in addition to having to pass around variables (like `x` and `y` above). This code may very often be quite similar so you might imagine standardising this process. It is for this reason that [**pandas**](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html) has become popular, which provides the `DataFrame` object (inspired by the R language). This enables us to create a single tabular object to pass around (called `df` below) that contains our data (in this case numpy arrays), that has built-in plotting methods to rapidly inspect the data (creating matplotlib figures). Pandas accepts a wide range of input formats to load data, and contains computational tools that are particularly useful for time series, as well as connecting to [**Dask**](https://docs.dask.org/en/latest/why.html) for doing larger computations (larger than memory, multi-process, distributed etc.)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame({\"x\": np.arange(-2*np.pi, 2*np.pi, 0.1),\n",
    "                   \"y\": np.sin(np.arange(-2*np.pi, 2*np.pi, 0.1))})\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.plot(x=\"x\", y=\"y\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Moving beyond pandas, we come to [**xarray**](http://xarray.pydata.org/en/stable/), which extends the pandas concepts to n-dimensional data.\n",
    "\n",
    "Try the [Xarray Tutorial](https://xarray-contrib.github.io/xarray-tutorial/) to learn more."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}