{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Pandas basics" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "A pandas Series is similar to a NumPy array, but it supports a lot of extra functionality, such as Series.describe().\n", "
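
As a minimal sketch of that extra functionality (the values below are made up for illustration and are not the notebook's data), `describe()` returns the summary statistics as another Series:

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration
s = pd.Series([40, 12, 43, 56])

# describe() returns count, mean, std, min, quartiles and max as a Series
summary = s.describe()
print(summary)
```

Individual statistics can then be read by label, for example `summary['mean']`.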
\n", "\n",
"Basic acces is samilar to numpy arrary, it support access by index( s[5] ) or slicing ( s[5:10] ).
\n",
"It also support vectorise operation and looping like numpy array.
\n",
"Implemented in C so it works very fast.\n",
"
\n", "A Series is a hybrid of a list and a Python dictionary: it maps index labels (keys) to values.\n", "
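
To illustrate the list and dictionary sides, here is a minimal sketch using a hypothetical Series named `ages` built from a dict:

```python
import pandas as pd

# Building a Series from a dict: keys become the index, values the data
ages = pd.Series({'Ram': 40, 'Syam': 12})

# Dictionary-like: look up by key and test membership against the index
ram_age = ages['Ram']
has_syam = 'Syam' in ages

# List-like: ordered, with positional access through iloc
first = ages.iloc[0]

print(ram_age, has_syam, first)
```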
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ram 40\n", "Syam 12\n", "Rahul 43\n", "Ganesh 56\n", "dtype: int64\n" ] } ], "source": [ "sal=pd.Series([40,12,43,56],\n", " index=['Ram',\n", " 'Syam',\n", " \"Rahul\",\n", " \"Ganesh\"])\n", "print sal" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "40\n" ] } ], "source": [ "print sal[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using sal[position] is not prefered instead prefer to use sal.iloc[position]\n", "becouse Index has different meaning in series so it avoid confusion
" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "56\n" ] } ], "source": [ "print sal.iloc[3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "argmax() function return index of max value element
" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ganesh\n" ] } ], "source": [ "print sal.argmax()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "56\n", "56\n" ] } ], "source": [ "print sal.loc[\"Ganesh\"]\n", "print sal.max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding series with Differen index" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a 1\n", "b 2\n", "c 3\n", "d 4\n", "dtype: int64\n" ] } ], "source": [ "a=pd.Series([1,2,3,4],\n", " index=[\"a\",\"b\",\"c\",\"d\"])\n", "b=pd.Series([9,8,7,6],\n", " index=[\"c\",\"d\",\"e\",\"f\"])\n", "print a" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "c 9\n", "d 8\n", "e 7\n", "f 6\n", "dtype: int64\n" ] } ], "source": [ "print b" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a NaN\n", "b NaN\n", "c 12\n", "d 12\n", "e NaN\n", "f NaN\n", "dtype: float64\n" ] } ], "source": [ "print a+b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "C,D are common in both so added correctly rest are just assign a volue NaN (Not a number)
\n", "\n", "To add 5 to each element we can simply write series + 5, because the operation is vectorized, but let's also do it with a new technique: s.apply(function).\n", "
" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a 1\n", "b 2\n", "c 12\n", "d 12\n", "e 7\n", "f 6\n", "dtype: float64\n" ] } ], "source": [ "print res" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a 6\n", "b 7\n", "c 17\n", "d 17\n", "e 12\n", "f 11\n", "dtype: float64\n" ] } ], "source": [ "print res+5" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def add_5(x):\n", " return x+5" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a 6\n", "b 7\n", "c 17\n", "d 17\n", "e 12\n", "f 11\n", "dtype: float64\n" ] } ], "source": [ "print res.apply(add_5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "automaticaly plot index vs data plot\n", "
" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] }, { "data": { "text/plain": [ "