{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true, "nbpresent": { "id": "04352eb2-a708-4e9d-be5e-140d1fb82590" }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Pandas Hands-on\n", "\n", "## Universitat Pompeu Fabra (UPF) - Barcelona\n", "\n", "### Massimo Quadrana" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "fae371fd-868d-45d5-9b11-a5a41bdac839" }, "slideshow": { "slide_type": "slide" } }, "source": [ "# About me: Massimo Quadrana\n", "\n", "- PhD student at Politecnico di Milano\n", "- Working on recommendation systems\n", "\n", "- https://github.com/mquad\n", "- [@mquad](https://twitter.com/mquad)\n", "\n", "\n", "Originally licensed under [CC BY 4.0 Creative Commons](http://creativecommons.org/licenses/by/4.0/)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "8317c7f2-0f11-4e8c-a58b-1f9f6dc7b126" }, "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "# Contents of this talk\n", "\n", "- Why do you need pandas?\n", "- Basic introduction to:\n", "    - Data structures and basic operations\n", "    - Indexing and selecting data\n", "    - The `groupby` operation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Material\n", "\n", "- All materials (notebook, data, link to nbviewer): https://github.com/mquad/pandas-tutorial\n", "- The complete tutorial is available at https://github.com/jorisvandenbossche/pandas-tutorial\n", "- You need `pandas` >= 0.15.2 (the easiest way to get it is to install Anaconda)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "5c11d3c6-daee-4080-b8f5-0d7cd5ade8f7" }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Why do you need pandas?" 
] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "a175be5b-cf10-487f-8146-3166eb5e60b5" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Why do you need pandas?\n", "\n", "When working with *tabular or structured data* (like an R data frame, a SQL table, an Excel spreadsheet, ...):\n", "\n", "- Import data\n", "- Clean up messy data\n", "- Explore data and gain insight into it\n", "- Process and prepare your data for analysis\n", "- Analyse your data (together with scikit-learn, statsmodels, __Keras__, ...)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "nbpresent": { "id": "431d8d39-3deb-4f1f-b6aa-142e5001436a" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "# Pandas: data analysis in Python\n", "\n", "For data-intensive work in Python, the [Pandas](http://pandas.pydata.org) library has become essential.\n", "\n", "What is ``pandas``?\n", "\n", "* Pandas can be thought of as NumPy arrays with labels for rows and columns, and better support for heterogeneous data types, but it's also much, much more than that.\n", "* Pandas can also be thought of as `R`'s `data.frame` in Python.\n", "* Powerful for handling missing data, working with time series, reading and writing your data, and reshaping, grouping and merging your data, ...\n", "\n", "Its documentation: http://pandas.pydata.org/pandas-docs/stable/" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "02d99173-0900-4675-b5e6-1ee49ea908a0" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Key features\n", "\n", "* Fast, easy and flexible input/output for many different data formats\n", "* Working with missing data (`.dropna()`, `pd.isnull()`)\n", "* Merging and joining (`concat`, `merge`, `join`)\n", "* Grouping: `groupby` functionality\n", "* Reshaping (`stack`, `pivot`) [ADVANCED]\n", "* Powerful time series manipulation (resampling, time zones, ...) [ADVANCED]\n", "* Easy plotting" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "nbpresent": { "id": "1a1c4fdb-a27c-4369-a0e8-a2a9b9ca6b61" }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Further reading\n", "\n", "- The documentation: http://pandas.pydata.org/pandas-docs/stable/\n", "- Wes McKinney's book \"Python for Data Analysis\"\n", "- Jake VanderPlas's Python Data Science Handbook: https://github.com/jakevdp/PythonDataScienceHandbook\n", "- Tom Augspurger's series on modern idiomatic pandas: https://tomaugspurger.github.io/modern-1.html\n", "- Lots of tutorials on the internet, e.g. http://github.com/jvns/pandas-cookbook, https://github.com/brandon-rhodes/pycon-pandas-tutorial/" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "nbpresent": { "id": "b8672524-da19-4038-9396-584415c29a33" } }, "outputs": [], "source": [] } ], 
"metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python [Root]", "language": "python", "name": "Python [Root]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" }, "nbpresent": { "slides": { "18d518b8-3fcf-40e8-b015-c03ae6d4f203": { "id": "18d518b8-3fcf-40e8-b015-c03ae6d4f203", "prev": "2901b664-7a34-47e2-b2df-4109e915930d", "regions": { "9e4bb7d7-eb82-4d26-9922-92ca8934b494": { "attrs": { "height": 0.8, "width": 0.8, "x": 0.1, "y": 0.1 }, "content": { "cell": "fae371fd-868d-45d5-9b11-a5a41bdac839", "part": "whole" }, "id": "9e4bb7d7-eb82-4d26-9922-92ca8934b494" } } }, "1b1d69a3-4c2f-4243-8baf-e035163abc55": { "id": "1b1d69a3-4c2f-4243-8baf-e035163abc55", "prev": "18d518b8-3fcf-40e8-b015-c03ae6d4f203", "regions": { "559338df-d665-4c96-9a3b-dd1ea12fee39": { "attrs": { "height": 0.8, "width": 0.8, "x": 0.1, "y": 0.1 }, "content": { "cell": 
"8317c7f2-0f11-4e8c-a58b-1f9f6dc7b126", "part": "whole" }, "id": "559338df-d665-4c96-9a3b-dd1ea12fee39" } } }, "239b9ea7-1483-44f8-9eee-ff4ca4a2c90a": { "id": "239b9ea7-1483-44f8-9eee-ff4ca4a2c90a", "prev": "a02063d5-d8c0-4690-a152-a50479080f88", "regions": { "fd8ca7e3-c5e0-41d6-9626-6c8b231d112e": { "attrs": { "height": 0.8, "width": 0.8, "x": 0.1, "y": 0.1 }, "content": { "cell": "a175be5b-cf10-487f-8146-3166eb5e60b5", "part": "whole" }, "id": "fd8ca7e3-c5e0-41d6-9626-6c8b231d112e" } } }, "2901b664-7a34-47e2-b2df-4109e915930d": { "id": "2901b664-7a34-47e2-b2df-4109e915930d", "prev": null, "regions": { "99a84244-deb7-4b3f-877f-43332b5686e1": { "attrs": { "height": 0.8, "width": 0.8, "x": 0.1, "y": 0.1 }, "content": { "cell": "04352eb2-a708-4e9d-be5e-140d1fb82590", "part": "whole" }, "id": "99a84244-deb7-4b3f-877f-43332b5686e1" } } }, "516ce62c-55fd-44f1-b182-8db658b45c05": { "id": "516ce62c-55fd-44f1-b182-8db658b45c05", "prev": "239b9ea7-1483-44f8-9eee-ff4ca4a2c90a", "regions": { "4647a68a-626d-42c7-be88-84a54b599ba1": { "attrs": { "height": 0.8, "width": 0.8, "x": 0.1, "y": 0.1 }, "content": { "cell": "431d8d39-3deb-4f1f-b6aa-142e5001436a", "part": "whole" }, "id": "4647a68a-626d-42c7-be88-84a54b599ba1" } } }, "97c8f96f-6f45-4d66-8064-6fc091070068": { "id": "97c8f96f-6f45-4d66-8064-6fc091070068", "prev": "e01c8c16-065a-40e9-9467-cbd241b2a133", "regions": { "8ccc2152-7dae-48b2-8ad7-3d89073c2dff": { "attrs": { "height": 0.8, "width": 0.8, "x": 0.1, "y": 0.1 }, "content": { "cell": "1a1c4fdb-a27c-4369-a0e8-a2a9b9ca6b61", "part": "whole" }, "id": "8ccc2152-7dae-48b2-8ad7-3d89073c2dff" } } }, "a02063d5-d8c0-4690-a152-a50479080f88": { "id": "a02063d5-d8c0-4690-a152-a50479080f88", "prev": "1b1d69a3-4c2f-4243-8baf-e035163abc55", "regions": { "7b7c9c8f-7b9f-4945-88f7-372403ae0ef2": { "attrs": { "height": 0.8, "width": 0.8, "x": 0.1, "y": 0.1 }, "content": { "cell": "5c11d3c6-daee-4080-b8f5-0d7cd5ade8f7", "part": "whole" }, "id": "7b7c9c8f-7b9f-4945-88f7-372403ae0ef2" } } 
}, "a5b57b6d-28c0-4cac-96d7-43b58c53b778": { "id": "a5b57b6d-28c0-4cac-96d7-43b58c53b778", "prev": "97c8f96f-6f45-4d66-8064-6fc091070068", "regions": { "9e2296ce-3fce-4713-a093-0da51aaf0db7": { "attrs": { "height": 0.8, "width": 0.8, "x": 0.1, "y": 0.1 }, "content": { "cell": "b8672524-da19-4038-9396-584415c29a33", "part": "whole" }, "id": "9e2296ce-3fce-4713-a093-0da51aaf0db7" } } }, "e01c8c16-065a-40e9-9467-cbd241b2a133": { "id": "e01c8c16-065a-40e9-9467-cbd241b2a133", "prev": "516ce62c-55fd-44f1-b182-8db658b45c05", "regions": { "35721f06-8e1d-41f7-a582-9a674da526d8": { "attrs": { "height": 0.8, "width": 0.8, "x": 0.1, "y": 0.1 }, "content": { "cell": "02d99173-0900-4675-b5e6-1ee49ea908a0", "part": "whole" }, "id": "35721f06-8e1d-41f7-a582-9a674da526d8" } } } }, "themes": {} } }, "nbformat": 4, "nbformat_minor": 0 }