{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Arithmetic Operations And Aggregations\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import the LArray library:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from larray import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Arithmetic operations\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = ndtest((3, 3))\n", "arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All the usual arithmetic operations can be applied to an array; each operation is applied to every element individually:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# addition\n", "arr + 10" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# multiplication\n", "arr * 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 'true' division\n", "arr / 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 'floor' division\n", "arr // 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "**Warning:** Python has two different division operators: \n", "\n", "- the 'true' division (/) always returns a float.\n", "- the 'floor' division (//) rounds the result down to the nearest whole number (the result is an integer when both operands are integers).\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# % means modulo (aka remainder of division)\n", "arr % 5" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ** means raising to the power\n", "arr ** 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More interestingly, binary operators as above also works between two arrays:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# load the 'demography_eurostat' dataset\n", "demo_eurostat = load_example_data('demography_eurostat')\n", "\n", "# extract the 'pop' array\n", "pop = demo_eurostat.pop\n", "pop" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aggregation_matrix = Array([[1, 0, 0], [0, 1, 1]], axes=(Axis('country=Belgium,France+Germany'), pop.country))\n", "aggregation_matrix" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# @ means matrix product\n", "aggregation_matrix @ pop['Male']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "**Note:** Be careful when mixing different data types (for example, combining an integer array with a float array yields a float array).\n", "You can use the [astype](../_generated/larray.Array.astype.rst#larray.Array.astype) method to change the data type of an array.\n", "
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aggregation_matrix = Array([[1, 0, 0], [0, 0.5, 0.5]], axes=(Axis('country=Belgium,France+Germany/2'), pop.country))\n", "aggregation_matrix" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aggregation_matrix @ pop['Male']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# force the resulting matrix to be an integer matrix\n", "(aggregation_matrix @ pop['Male']).astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Axis order does not matter much (except for output)\n", "\n", "You can perform operations between arrays whose axes are in a different order.\n", "The result has the same axis order as the left operand.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# extract the 'births' array\n", "births = demo_eurostat.births\n", "\n", "# let's change the order of axes of the 'births' array\n", "births_transposed = births.transpose()\n", "births_transposed" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# LArray does not care about axis order when performing \n", "# arithmetic operations between arrays\n", "pop + births_transposed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Axes must be compatible\n", "\n", "Arithmetic operations between two arrays only work when they have compatible axes (i.e. same labels)."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the 'pop' and 'births' arrays have compatible axes\n", "pop + births" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now, let's replace the country names with the country codes\n", "births_codes = births.set_labels('country', ['BE', 'FR', 'DE'])\n", "births_codes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# arithmetic operations between arrays \n", "# having incompatible axes raise an error\n", "try:\n", "    pop + births_codes\n", "except Exception as e:\n", "    print(type(e).__name__, e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "**Warning:** Operations between two arrays only work when they have compatible axes (i.e. same labels), but this behavior can be overridden via the [ignore_labels](../_generated/larray.Array.ignore_labels.rst#larray.Array.ignore_labels) method.\n", "In that case, only the position on the axis is used, not the labels.\n", "Use this method at your own risk.\n", "
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# use the .ignore_labels() method on axis 'country'\n", "# to avoid the incompatible axes error (risky)\n", "pop + births_codes.ignore_labels('country')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extra Or Missing Axes (Broadcasting)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The condition that axes must be compatible only applies to common axes. \n", "Arithmetic operations between two arrays can be performed even if the second array has extra or missing axes compared to the first one:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# let's define a 'multiplicator' vector with \n", "# one value defined for each gender\n", "multiplicator = Array([-1, 1], axes=pop.gender)\n", "multiplicator" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the multiplication below is propagated to the \n", "# 'country' and 'time' axes.\n", "# This behavior is called broadcasting\n", "pop * multiplicator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Boolean Operations\n", "\n", "Python comparison operators are: \n", "\n", "| Operator | Meaning |\n", "|-----------|-------------------------|\n", "|``==`` | equal | \n", "|``!=`` | not equal | \n", "|``>`` | greater than | \n", "|``>=`` | greater than or equal | \n", "|``<`` | less than | \n", "|``<=`` | less than or equal |\n", "\n", "Applying a comparison operator to an array returns a boolean array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# test which values are greater than 10 million\n", "pop > 10e6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparison operations can be combined using Python bitwise operators:\n", "\n", "| Operator | Meaning |\n", 
"|----------|------------------------------------- |\n", "| & | and |\n", "| \\| | or |\n", "| ~ | not |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# test which values are greater than 10 millions and less than 40 millions\n", "(pop > 10e6) & (pop < 40e6)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# test which values are less than 10 millions or greater than 40 millions\n", "(pop < 10e6) | (pop > 40e6)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# test which values are not less than 10 millions\n", "~(pop < 10e6)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned boolean array can then be used in selections and assignments:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pop_copy = pop.copy()\n", "\n", "# set all values greater than 40 millions to 40 millions\n", "pop_copy[pop_copy > 40e6] = 40e6\n", "pop_copy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Boolean operations can be made between arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# test where the two arrays have the same values\n", "pop == pop_copy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To test if all values between are equals, use the [equals](../_generated/larray.Array.equals.rst#larray.Array.equals) method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pop.equals(pop_copy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Aggregates\n", "\n", "The LArray library provides many aggregation functions. The list is given in the [Aggregation Functions](../api.rst#aggregation-functions) subsection of the [API Reference](../api.rst) page.\n", "\n", "Aggregation operations can be performed on axes or groups. 
Axes and groups can be mixed. \n", "\n", "The main rules are: \n", "\n", "- Axes are separated by commas ``,``\n", "- Groups belonging to the same axis are grouped inside parentheses ()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculate the sum along an axis:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pop.sum('gender')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or several axes (axes are separated by commas ``,``):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pop.sum('country', 'gender')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculate the sum along all axes except one by appending `_by` to the name of the aggregation function:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pop.sum_by('time')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculate the sum along groups (groups belonging to the same axis must be grouped inside parentheses ()):\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "even_years = pop.time[2014::2] >> 'even_years'\n", "odd_years = pop.time[2013::2] >> 'odd_years'\n", "\n", "pop.sum((odd_years, even_years))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mixing axes and groups in aggregations:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pop.sum('gender', (odd_years, even_years))" ] } ], "metadata": { "celltoolbar": "Edit Metadata", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "livereveal": { "autolaunch": false, "scroll": true } }, 
"nbformat": 4, "nbformat_minor": 2 }