{ "cells": [ { "cell_type": "markdown", "id": "4a87b5ef", "metadata": {}, "source": [ "--- \n", " \n", "\n", "

Department of Data Science

\n", "

Course: Tools and Techniques for Data Science

\n", "\n", "---\n", "

Instructor: Muhammad Arif Butt, Ph.D.

" ] }, { "cell_type": "markdown", "id": "ab0dc25c", "metadata": {}, "source": [ "

Lecture 3.10 (Pandas-02)

" ] }, { "cell_type": "markdown", "id": "172aaa16", "metadata": {}, "source": [ "\"Open" ] }, { "cell_type": "markdown", "id": "19f82705", "metadata": {}, "source": [ "\n", "\n", "## _Overview of Pandas Series Data Structure.ipynb_" ] }, { "cell_type": "markdown", "id": "806db2fe", "metadata": {}, "source": [ "#### Read about Pandas Data: https://pandas.pydata.org/docs/user_guide" ] }, { "cell_type": "code", "execution_count": null, "id": "d50f638e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "9727124d", "metadata": {}, "source": [ "## Learning agenda of this notebook\n", "\n", "1. Overview of Python Pandas library and its data structures\n", "2. Creating a Series\n", " - From Python List\n", " - From NumPy Arrays\n", " - From Python Dictionary\n", " - From a scalar value\n", "3. Attributes of a Pandas Series\n", "4. Understanding Index in a Series and its usage\n", " - Identification\n", " - Selection/Filtering/Subsetting\n", " - Alignment" ] }, { "cell_type": "code", "execution_count": null, "id": "e251bdc6", "metadata": {}, "outputs": [], "source": [ "# To install this library in Jupyter notebook\n", "import sys\n", "!{sys.executable} -m pip install pandas --quiet" ] }, { "cell_type": "code", "execution_count": 2, "id": "dba905d0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('1.3.4',\n", " ['/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas'])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "pd.__version__ , pd.__path__" ] }, { "cell_type": "code", "execution_count": null, "id": "82b882b6", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "99b11957", "metadata": {}, "source": [ "\n", "\n", "## 1. 
Creating a Series\n", "> **A Series is a one-dimensional labeled array capable of holding a sequence of values of any data type (integers, floating-point numbers, strings, Python objects, etc.), which by default has integer labels starting from zero. You can think of a Pandas Series as a column in a spreadsheet or in a Pandas DataFrame object.**\n", "- To create a Series object, use the `pd.Series()` constructor\n", "\n", "**```pd.Series(data, index, dtype, name)```**\n", "- Where,\n", " - `data`: can be a Python list, Python dictionary, NumPy array, or a scalar value.\n", " - `index`: If you do not pass the index argument, it defaults to the integers 0 to n-1, like `np.arange(n)`. Indices must be hashable (for example numbers or strings) and have the same length as `data`. Non-unique index values are allowed. Index is used for three purposes:\n", " - Identification.\n", " - Selection.\n", " - Alignment.\n", " - `dtype`: Optionally, you can assign any valid NumPy data type to the series object (`np.sctypes`). If not specified, it is inferred from `data`.\n", " - `name`: Optionally, you can assign a name to a series, which becomes an attribute of the series object. It also becomes the column name if that series object is later used to create a DataFrame." ] }, { "cell_type": "code", "execution_count": null, "id": "05b1234a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "315fe124", "metadata": {}, "source": [ "### a. 
Creating a Series from Python List" ] }, { "cell_type": "code", "execution_count": 2, "id": "3fe671e5", "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 Arif\n", "1 Rauf\n", "2 Maaz\n", "3 \n", "4 Hadeed\n", "dtype: object\n", "\n" ] } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "list1 = ['Arif', 'Rauf', 'Maaz', '','Hadeed'] # note the empty string\n", "\n", "# When index is not provided, it creates an index for the data starting from zero and with a step size of one.\n", "s = pd.Series(data=list1)\n", "print(s)\n", "print(type(s))" ] }, { "cell_type": "markdown", "id": "c0e6abcf", "metadata": {}, "source": [ ">Observe that output is shown in two columns - the index is on the left and the data value is on the right. If we do not explicitly specify an index for the data values while creating a series, then by default indices range from 0 through N – 1. Here N is the number of data elements." ] }, { "cell_type": "code", "execution_count": null, "id": "f4bb226e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "ebb2f672", "metadata": {}, "source": [ "**You can explicitly specify the index for a Series object, which can be either int or string type, and must be of the same size as the values in the series. 
Otherwise, it will raise a ValueError**" ] }, { "cell_type": "code", "execution_count": 3, "id": "f00c9d4d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MS01 Arif\n", "MS02 Rauf\n", " Maaz\n", "MS02 Hadeed\n", "dtype: object\n", "\n" ] } ], "source": [ "list1 = ['Arif', 'Rauf', 'Maaz', 'Hadeed']\n", "indices = ['MS01', 'MS02', '', 'MS02'] # non-unique index values are allowed and you can have empty string as index\n", "\n", "s = pd.Series(data=list1, index=indices)\n", "print(s)\n", "print(type(s))" ] }, { "cell_type": "code", "execution_count": null, "id": "12587e41", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 5, "id": "5c07c297", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Arif'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s['MS01']" ] }, { "cell_type": "code", "execution_count": null, "id": "35f66636", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "72e6dd18", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "dce485cc", "metadata": {}, "source": [ ">Also note that non-unique indices are allowed" ] }, { "cell_type": "code", "execution_count": 1, "id": "7086feab", "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'pd' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m/var/folders/1t/g3ylw8h50cjdqmk5d6jh1qmm0000gn/T/ipykernel_29216/2678464800.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mindices\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m2.1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2.2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2.3\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;36m2.4\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0ms\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mSeries\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlist1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mindex\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mindices\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNameError\u001b[0m: name 'pd' is not defined" ] } ], "source": [ "list1 = ['Arif', 'Rauf', 'Maaz', 'Hadeed']\n", "indices = [2.1, 2.2, 2.3, 2.4] \n", "\n", "s = pd.Series(data=list1, index=indices)\n", "print(s)\n", "print(type(s))" ] }, { "cell_type": "code", "execution_count": null, "id": "6ac54543", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "d57856ac", "metadata": {}, "source": [ "**You can create a series with NaN values, using `np.nan`, which is IEEE 754 floating-point representation of Not a Number. 
NaN values can act as a placeholder for any missing numerical values in the array.**" ] }, { "cell_type": "code", "execution_count": 6, "id": "e420a746", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 1.0\n", "1 2.7\n", "2 NaN\n", "3 54.0\n", "dtype: float64\n", "\n" ] } ], "source": [ "list1 = [1, 2.7, np.nan, 54]\n", "s = pd.Series(data=list1)\n", "print(s)\n", "print(type(s))" ] }, { "cell_type": "markdown", "id": "04a86a0f", "metadata": {}, "source": [ ">Also note the `dtype` of the series object is inferred from the data as `float64`" ] }, { "cell_type": "code", "execution_count": null, "id": "80ff6a9c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "77c3bf63", "metadata": {}, "source": [ "**You can use the `dtype` argument to specify a datatype to the series object.**" ] }, { "cell_type": "code", "execution_count": 7, "id": "ca5026c7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 27\n", "1 33\n", "2 19\n", "dtype: uint8\n", "\n" ] } ], "source": [ "list1 = [27, 33, 19]\n", "s = pd.Series(data=list1, dtype=np.uint8)\n", "print(s)\n", "print(type(s))" ] }, { "cell_type": "code", "execution_count": null, "id": "9886a997", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "a1c2a099", "metadata": {}, "source": [ "**Optionally, you can assign a name to a series, which becomes attribute of the series object. 
Moreover, it becomes the column name, if that series object is used to create a dataframe later.**" ] }, { "cell_type": "code", "execution_count": 8, "id": "8b6eeb30", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MS01 Arif\n", "MS02 Rauf\n", "MS03 \n", "MS04 Hadeed\n", "Name: myseries1, dtype: object\n", "\n" ] } ], "source": [ "list1 = ['Arif', 'Rauf', '', 'Hadeed']\n", "indices = ['MS01', 'MS02', 'MS03', 'MS04']\n", "s = pd.Series(data=list1, index=indices, name='myseries1') \n", "print(s)\n", "print(type(s))" ] }, { "cell_type": "code", "execution_count": null, "id": "8a304a93", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "adefe54c", "metadata": {}, "source": [ "### b. Creating a Series from NumPy Array" ] }, { "cell_type": "code", "execution_count": null, "id": "46c41869", "metadata": {}, "outputs": [], "source": [ "s = pd.Series(data = np.arange(4))\n", "print(s)\n", "print(type(s))" ] }, { "cell_type": "code", "execution_count": null, "id": "ac84af6f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 9, "id": "32fb1954", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 22.3\n", "1 33.6\n", "2 98.0\n", "3 44.0\n", "dtype: float64\n", "\n" ] } ], "source": [ "arr1 = np.array([22.3,33.6, 98, 44])\n", "s = pd.Series(data=arr1, dtype='float64')\n", "print(s)\n", "print(type(s))" ] }, { "cell_type": "code", "execution_count": null, "id": "014a7b02", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "6a8917ca", "metadata": {}, "source": [ "### c. 
Creating a Series from Python Dictionary" ] }, { "cell_type": "code", "execution_count": 10, "id": "665af5d7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "name Arif\n", "gender Male\n", "Role Teacher\n", "subject Data Science\n", "dtype: object\n", "\n" ] } ], "source": [ "my_dict = {\n", " 'name':\"Arif\", \n", " 'gender':\"Male\", \n", " 'Role':\"Teacher\", \n", " 'subject':\"Data Science\"}\n", "s = pd.Series(data=my_dict)\n", "print(s)\n", "print(type(s))" ] }, { "cell_type": "markdown", "id": "7d1af826", "metadata": {}, "source": [ "**When you create a series from dictionary, it will automatically take the keys as index and the value as data**" ] }, { "cell_type": "code", "execution_count": null, "id": "a373d70a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "cd475971", "metadata": {}, "source": [ "### d. Creating a Series from Scalar value" ] }, { "cell_type": "code", "execution_count": 11, "id": "f08b327f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 25\n", "dtype: int64\n", "\n" ] } ], "source": [ "s = pd.Series(data=25)\n", "print(s)\n", "print(type(s))" ] }, { "cell_type": "code", "execution_count": null, "id": "7e39a8a8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "83ffbdd4", "metadata": {}, "source": [ "### e. Creating an Empty Series" ] }, { "cell_type": "code", "execution_count": 12, "id": "46319b2b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Series([], dtype: float64)\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/1t/g3ylw8h50cjdqmk5d6jh1qmm0000gn/T/ipykernel_20298/938514528.py:2: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. 
Specify a dtype explicitly to silence this warning.\n", " s=pd.Series()\n" ] } ], "source": [ "# Pass at least `dtype`, otherwise you get a DeprecationWarning (as in the output of this cell)\n", "s=pd.Series()\n", "print(s)\n", "print(type(s))" ] }, { "cell_type": "code", "execution_count": null, "id": "ea08592c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "bec8ddc7", "metadata": {}, "source": [ "## 3. Attributes of Pandas Series\n", "- We can access certain properties of a series, called attributes, by writing the attribute name after the series name using dot `.` notation" ] }, { "cell_type": "code", "execution_count": 24, "id": "e021e537", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Rauf\n", "1 NaN\n", "2 Maaz\n", "3 Hadeed\n", "4 Mujahid\n", "5 Mohid\n", "6 Jamil\n", "Name: myseries1, dtype: object" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_dict = {0:\"Rauf\", 1:np.nan, 2:\"Maaz\", 3:\"Hadeed\", 4:\"Mujahid\", 5:\"Mohid\", 6:\"Jamil\"}\n", "s = pd.Series(my_dict, name=\"myseries1\")\n", "s" ] }, { "cell_type": "code", "execution_count": 14, "id": "666018f4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'myseries1'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The `name` attribute of a series object returns the name of the series\n", "s.name" ] }, { "cell_type": "code", "execution_count": null, "id": "9db01ffb", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 15, "id": "a24ade38", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Int64Index([0, 1, 2, 3, 4, 5, 6], dtype='int64')" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The `index` attribute of a series object returns the Index object holding the labels and their datatype\n", "s.index" ] }, { "cell_type": "code", "execution_count": null, "id": "a2999a6c", "metadata": {}, "outputs": [], "source": [] }, { 
"cell_type": "code", "execution_count": 18, "id": "1f7d188a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['Rauf', '', 'Maaz', 'Hadeed', 'Mujahid', 'Mohid', 'Jamil'],\n", " dtype=object)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# `values` attribute of a series object return the list of values and its datatype\n", "s.values" ] }, { "cell_type": "code", "execution_count": null, "id": "8d3fb450", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 19, "id": "0c117bdd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('O')" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# `dtype` attribute of a series object return the type of underlying data\n", "s.dtype" ] }, { "cell_type": "code", "execution_count": null, "id": "14f93c3b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 20, "id": "2e044be9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(7,)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# `shape` attribute of a series object return a tuple of shape of underlying data\n", "s.shape" ] }, { "cell_type": "code", "execution_count": null, "id": "101670e3", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 21, "id": "be57c960", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "56" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# `nbytes` attribute of a series object return the number of bytes of underlying data (object data type take 8 bytes)\n", "s.nbytes" ] }, { "cell_type": "code", "execution_count": null, "id": "ee4ab797", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "0236e399", "metadata": {}, "outputs": [], "source": [ "# `size` attribute of a series 
object return number of elements in the underlying data\n", "s.size" ] }, { "cell_type": "code", "execution_count": null, "id": "b5c0a63b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 22, "id": "4d41072e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# `ndim` attribute of a series object return number of dimensions of underlying data\n", "s.ndim" ] }, { "cell_type": "code", "execution_count": null, "id": "287f42d9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 25, "id": "0072cb0f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# `hasnans` attribute of a series object return true if there are NaN values in the data\n", "s.hasnans" ] }, { "cell_type": "code", "execution_count": null, "id": "27db5cc4", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "94db271e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "1471a464", "metadata": {}, "source": [ "\n", "\n", "## 4. Understanding Index in a Series\n", "- Every series object has an index associated with every item. \n", "- The Pandas series object supports both integer-based (default) and label/string-based indexing and provides a host of methods for performing operations involving the index.\n", "

\n", " - When the index is unique, Pandas uses a hash table to map each key to its value, so a lookup takes O(1) time. \n", " - When the index is non-unique but sorted, Pandas uses binary search, which takes logarithmic time, O(log N).\n", " - When the index is non-unique and unsorted, a lookup takes linear time, O(N), because Pandas needs to check every key in the index.

\n", "- Index in series object is used for three purposes:\n", " - Identification\n", " - Selection/Filtering/Subsetting\n", " - Alignment
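
The properties that these lookups depend on can be inspected on the index itself. The following is a minimal sketch, assuming small made-up series (the names `unique_s`, `dup_sorted_s` and `dup_unsorted_s` are hypothetical and not part of the lecture data). It only checks `Index.is_unique` and `Index.is_monotonic_increasing`; it does not measure lookup time.

```python
import pandas as pd

# Hypothetical sample series, used only for illustration
unique_s = pd.Series(['Rauf', 'Arif', 'Maaz'], index=['num1', 'num2', 'num3'])
dup_sorted_s = pd.Series(['Rauf', 'Arif', 'Maaz'], index=['a', 'a', 'b'])
dup_unsorted_s = pd.Series(['Rauf', 'Arif', 'Maaz'], index=['b', 'a', 'b'])

# Report whether each index is unique and whether it is sorted (non-decreasing)
for label, ser in [('unique', unique_s),
                   ('duplicates, sorted', dup_sorted_s),
                   ('duplicates, unsorted', dup_unsorted_s)]:
    print(label, ser.index.is_unique, ser.index.is_monotonic_increasing)

# Identification: a unique label gives back a single value,
# while a repeated label gives back a Series with every match
first = unique_s.loc['num1']
both = dup_sorted_s.loc['a']
print(first)
print(both)
```

A repeated label is still a valid key, but the result of the lookup changes from a scalar to a Series, which is worth checking before relying on the returned type.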

" ] }, { "cell_type": "code", "execution_count": null, "id": "26d6e7a3", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "08a63cdf", "metadata": {}, "source": [ "### a. Changing Index of a Series Object\n", "- In the above examples, we have seen that\n", " - If we create a Series object from a dictionary, the keys of the dictionary become the index \n", " - If we create a Series object from a list or NumPy array, the index defaults to the integers 0, 1, 2, ...\n", " - Last but not least, we can assign indices of our own choice, which can be integers or strings\n", "- Let us see how we can change the indices of a series object after creation" ] }, { "cell_type": "code", "execution_count": 26, "id": "81221a73", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 Rauf\n", "1 Arif\n", "2 Maaz\n", "3 Hadeed\n", "4 Mujahid\n", "dtype: object\n", "RangeIndex(start=0, stop=5, step=1)\n" ] } ], "source": [ "list1 = ['Rauf', 'Arif', 'Maaz', 'Hadeed', 'Mujahid']\n", "s = pd.Series(data=list1)\n", "print(s)\n", "print(s.index)" ] }, { "cell_type": "markdown", "id": "6514f86e", "metadata": {}, "source": [ ">The `index` attribute of the series object shows that the index for this series runs from 0 to 4 (`stop=5` is exclusive) with a step value of 1" ] }, { "cell_type": "markdown", "id": "2ac638df", "metadata": {}, "source": [ "**Let us modify the index of this series object to some random integers by assigning a random array of integers to the `index` attribute of this series object**" ] }, { "cell_type": "code", "execution_count": 27, "id": "ca48f7cb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "113 Rauf\n", "152 Arif\n", "176 Maaz\n", "191 Hadeed\n", "179 Mujahid\n", "dtype: object\n", "Int64Index([113, 152, 176, 191, 179], dtype='int64')\n" ] } ], "source": [ "# `high` is exclusive, so the random labels fall in the range 100 to 199\n", "arr1 = np.random.randint(low = 100, high = 200, size = 5)\n", "\n", "s.index = arr1\n", "\n", "print(s)\n", "print(s.index)" ] }, { "cell_type": 
"code", "execution_count": 28, "id": "4154c8d6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.0 Rauf\n", "4.0 Arif\n", "2.0 Maaz\n", "6.3 Hadeed\n", "9.0 Mujahid\n", "dtype: object\n", "Float64Index([1.0, 4.0, 2.0, 6.3, 9.0], dtype='float64')\n" ] } ], "source": [ "s.index = [1,4,2,6.3,9]\n", "\n", "print(s)\n", "print(s.index)" ] }, { "cell_type": "code", "execution_count": null, "id": "ccd09941", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "c7252611", "metadata": {}, "source": [ "**Changing index of a series to a list of strings**" ] }, { "cell_type": "code", "execution_count": 29, "id": "3b0106b8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 Rauf\n", "1 Arif\n", "2 Maaz\n", "3 Hadeed\n", "4 Mujahid\n", "dtype: object\n", "RangeIndex(start=0, stop=5, step=1)\n" ] } ], "source": [ "list1 = ['Rauf', 'Arif', 'Maaz', 'Hadeed', 'Mujahid']\n", "s = pd.Series(data=list1)\n", "print(s)\n", "print(s.index)" ] }, { "cell_type": "code", "execution_count": 30, "id": "c8738768", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "num1 Rauf\n", "num2 Arif\n", "num3 Maaz\n", "num4 Hadeed\n", "num5 Mujahid\n", "dtype: object\n", "Index(['num1', 'num2', 'num3', 'num4', 'num5'], dtype='object')\n" ] } ], "source": [ "indices = ['num1', 'num2', 'num3', 'num4', 'num5']\n", "\n", "s.index = indices\n", "\n", "print(s)\n", "print(s.index)" ] }, { "cell_type": "code", "execution_count": null, "id": "872bac58", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "6ae33d5a", "metadata": {}, "source": [ "\n", "\n", "### b. First use of Index (Identification)\n", "- Since every data value of a series object has an associated index (integer or string). 
So we can use this index/label to identify or access data value(s)\n", "- There are three ways to access elements of a series:\n", " - Using `s[]` operator and specifying the index (integer/label)\n", " - Using `s.loc[]` method and specifying the index (integer/label)\n", " - Using `.iloc[]` method and specify the position (an integer value from 0 to length-1). It also support negative indexing, the last element can be accessed by an index of -1" ] }, { "cell_type": "markdown", "id": "2f62003c", "metadata": {}, "source": [ "**Identification using Integer Indices or by Position**" ] }, { "cell_type": "code", "execution_count": 31, "id": "d708f31c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5 Rauf\n", "10 Arif\n", "15 Maaz\n", "20 Hadeed\n", "25 Mujahid\n", "dtype: object" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list1 = ['Rauf', 'Arif', 'Maaz', 'Hadeed', 'Mujahid']\n", "indices = [5, 10, 15, 20, 25]\n", "s = pd.Series(data=list1, index=indices)\n", "s" ] }, { "cell_type": "code", "execution_count": 41, "id": "c9045a42", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Mujahid'" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Give index to subscript operator\n", "s[25]\n", "\n", "# Subscript operator do not work on position\n", "#s[0] # will raise an error because index 0 do not exist" ] }, { "cell_type": "code", "execution_count": null, "id": "5db907c0", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 43, "id": "aa0c38b5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hadeed'" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Give index to loc method\n", "s.loc[20]\n", "# loc method do not work on position\n", "#s.loc[0] # will raise an error because index 0 do not exist" ] }, { "cell_type": "code", "execution_count": 45, "id": "e21b7e48", 
"metadata": {}, "outputs": [], "source": [ "# iloc method is position based, so will flag an error if you pass an actual index\n", "#s.iloc[20] " ] }, { "cell_type": "code", "execution_count": 46, "id": "855fbda2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hadeed'" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The iloc method is passed position and not index\n", "s.iloc[3]\n" ] }, { "cell_type": "code", "execution_count": null, "id": "eb2a1bbf", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "50d136c4", "metadata": {}, "source": [ "**Fancy Indexing**" ] }, { "cell_type": "code", "execution_count": 47, "id": "a63e1a06", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "20 Hadeed\n", "5 Rauf\n", "dtype: object" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Can access multiple values by specifying a list of indices\n", "s[[20, 5]]" ] }, { "cell_type": "code", "execution_count": 48, "id": "bee25cf3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "20 Hadeed\n", "5 Rauf\n", "dtype: object" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Can access multiple values by specifying a list of indices\n", "s.loc[[20, 5]]" ] }, { "cell_type": "code", "execution_count": 49, "id": "712eb404", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "20 Hadeed\n", "5 Rauf\n", "dtype: object" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Can access multiple values by specifying list of positions\n", "s.iloc[[3, 0]]" ] }, { "cell_type": "markdown", "id": "2cdfc9bc", "metadata": {}, "source": [ "**Negative Indexing, work only for `iloc`**" ] }, { "cell_type": "code", "execution_count": null, "id": "49ed6d72", "metadata": {}, "outputs": [], "source": [ "#s[-1]\n", "#s.loc[-1]\n", "s.iloc[-1]" ] }, { "cell_type": "code", 
"execution_count": null, "id": "7b447220", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "f8025bfc", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "77ac3b00", "metadata": {}, "source": [ "**Identification using String Indices or by Position**" ] }, { "cell_type": "code", "execution_count": 50, "id": "d4794a46", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "num1 Rauf\n", "num2 Arif\n", "num3 Maaz\n", "num4 Hadeed\n", "num5 Mujahid\n", "dtype: object" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list1 = ['Rauf', 'Arif', 'Maaz', 'Hadeed', 'Mujahid']\n", "indices = ['num1', 'num2', 'num3', 'num4', 'num5']\n", "s = pd.Series(data=list1, index=indices)\n", "s" ] }, { "cell_type": "code", "execution_count": 53, "id": "7896749e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Rauf'" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Give index to subscript operator (which in this case is a string or label)\n", "s['num1']" ] }, { "cell_type": "code", "execution_count": 55, "id": "7a47d221", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Maaz'" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# for position as well\n", "s[2]" ] }, { "cell_type": "code", "execution_count": 56, "id": "6eb25351", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Rauf'" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Give index to loc method (which in this case is a string or label)\n", "s.loc['num1']" ] }, { "cell_type": "code", "execution_count": 1, "id": "ff783e46", "metadata": {}, "outputs": [], "source": [ "# Will not work on position the way [] worked previously\n", "#s.loc[0]" ] }, { "cell_type": "code", "execution_count": 57, "id": "33fa58b7", "metadata": {}, "outputs": [ 
{ "data": { "text/plain": [ "'Rauf'" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# iloc method is position based, so will flag an error if you pass it string indices\n", "#s.iloc['num1'] \n", "# however will work fine if you pass an integer specifying the position\n", "s.iloc[0]" ] }, { "cell_type": "code", "execution_count": null, "id": "fb5955b7", "metadata": {}, "outputs": [], "source": [ "s.iloc[-1]" ] }, { "cell_type": "code", "execution_count": null, "id": "73243d05", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "bcfca675", "metadata": {}, "source": [ "**Fancy Indexing**" ] }, { "cell_type": "code", "execution_count": null, "id": "f11d537e", "metadata": {}, "outputs": [], "source": [ "# Can access multiple values by specifying a list of indices (which in this case are strings or labels)\n", "s[['num3', 'num1']]" ] }, { "cell_type": "code", "execution_count": null, "id": "4f6464df", "metadata": {}, "outputs": [], "source": [ "# Can access multiple values by specifying a list of indices (which in this case are strings or labels)\n", "s.loc[['num3', 'num1']]" ] }, { "cell_type": "code", "execution_count": null, "id": "5219c075", "metadata": {}, "outputs": [], "source": [ "# iloc method is position based, so will flag an error if you pass it string indices\n", "#s.iloc['num3', 'num1'] \n", "# however will work fine if you pass an integer specifying the position\n", "s.iloc[[2,0]]" ] }, { "cell_type": "code", "execution_count": null, "id": "5fb3b05c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "c33193ff", "metadata": {}, "source": [ "\n", "\n", "### c. 
Second use of Index (Selection)\n", "- A series can be sliced using the `:` symbol, which returns a subset of the series object (values with their corresponding indices).\n", "- A slice takes three arguments, `[start:stop:step]`, and all of them are optional.\n", "\n", "- A slice can be applied to a Pandas Series object in three ways:\n", " - Using the `s[]` operator: integer slices are treated as positions, while label (string) slices are treated as index labels\n", " - Using the `s.loc[]` indexer and specifying index labels (integer/label)\n", " - Using the `s.iloc[]` indexer and specifying positions (integer values from 0 to length-1). It also supports negative indexing, so the last element is at position -1\n", "- Keep the following points in mind:\n", " - The `stop` argument is NOT inclusive for `s[]` with integer slices, while it is inclusive with string (label) slices.\n", " - The `stop` argument is inclusive for `s.loc[]` for both integer and label indices.\n", " - The `stop` argument is NOT inclusive for `s.iloc[]`, since it is position based.\n", " \n", ">**Note: Once you slice a Pandas series, you get a view of the original object, which is similar to a shallow copy. 
So if you modify an element in the original series object, the change will also be visible in the sliced series object, and vice versa (this holds for pandas versions without Copy-on-Write, such as the 1.3.4 used here).**" ] }, { "cell_type": "markdown", "id": "1e3ea7a4", "metadata": {}, "source": [ "**Selection/Filtering/Subsetting of Series object having Integer indices**" ] }, { "cell_type": "code", "execution_count": null, "id": "6ee5572d", "metadata": {}, "outputs": [], "source": [ "list1 = ['Rauf', 'Arif', 'Maaz', 'Hadeed', 'Mujahid']\n", "indices = [5, 10, 15, 20, 25]\n", "s = pd.Series(data=list1, index=indices)\n", "s" ] }, { "cell_type": "code", "execution_count": 67, "id": "98f62fcb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Series([], dtype: object)" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# With integer indices, `s[]` treats the slice as positions 5 to 14 and not as the labels 5 to 15.\n", "# The series has only five elements, so the result is empty\n", "s[5:15]" ] },
{ "cell_type": "code", "execution_count": 61, "id": "1da738e1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10 Arif\n", "15 Maaz\n", "20 Hadeed\n", "dtype: object" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The subscript operator treats the slice as positional indices and not as the actual indices\n", "# (if we have integer indices)\n", "# The `stop` argument is NOT inclusive for `s[]` for integer indices\n", "s[1:4]" ] }, { "cell_type": "code", "execution_count": 62, "id": "faacfdb9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5 Rauf\n", "10 Arif\n", "15 Maaz\n", "dtype: object" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The loc[] method treats the slice as actual indices and not as positional indices\n", "# The `stop` argument is inclusive for `s.loc[]` for both integer and label indices\n", "s.loc[5:15]" ] }, { "cell_type": "code", "execution_count": 63, "id": "b9516690", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10 Arif\n", "15 Maaz\n", "20 Hadeed\n", "dtype: object" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The iloc[] method considers the slice object as positional index and not as the actual indices\n", "# The `stop` argument is NOT inclusive for `s.iloc[]` being position based\n", 
"s.iloc[1:4]" ] }, { "cell_type": "code", "execution_count": null, "id": "04f57513", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6492eee8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "47eed183", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "e97851e4", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "633520b4", "metadata": {}, "source": [ "**Selection/Filtering/Subsetting of Series object having String Indices**" ] }, { "cell_type": "code", "execution_count": 64, "id": "c56129ab", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "num1 Rauf\n", "num2 Arif\n", "num3 Maaz\n", "num4 Hadeed\n", "num5 Mujahid\n", "dtype: object" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list1 = ['Rauf', 'Arif', 'Maaz', 'Hadeed', 'Mujahid']\n", "indices = ['num1', 'num2', 'num3', 'num4', 'num5']\n", "s = pd.Series(data=list1, index=indices)\n", "s" ] }, { "cell_type": "code", "execution_count": 65, "id": "2fc7acf6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "num1 Rauf\n", "num2 Arif\n", "dtype: object" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[0:2]" ] }, { "cell_type": "code", "execution_count": 66, "id": "6618a2d7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "num2 Arif\n", "num3 Maaz\n", "num4 Hadeed\n", "dtype: object" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The subscript operator considers the slice object as positional index and not as the actual indices\n", "# (if we have integer indices). 
However, it uses the actual labels when the slice is given as string indices\n", "# The `stop` argument is inclusive for `s[]` for string indices, while it is NOT inclusive for integer indices.\n", "s['num2':'num4']" ] }, { "cell_type": "code", "execution_count": null, "id": "2d059b44", "metadata": {}, "outputs": [], "source": [ "# The `stop` argument is inclusive for `s[]` for string indices, while it is NOT inclusive for integer indices.\n", "s[0:2]" ] }, { "cell_type": "code", "execution_count": null, "id": "66660f6c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "5efe05df", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "e310d26b", "metadata": {}, "outputs": [], "source": [ "# The loc[] method treats the slice as actual indices and not as positional indices\n", "# The `stop` argument is inclusive for `s.loc[]` for both integer and label indices\n", "s.loc['num2':'num4']" ] }, { "cell_type": "code", "execution_count": null, "id": "29bcd409", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "92613467", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "e9214d37", "metadata": {}, "outputs": [], "source": [ "# The iloc[] method treats the slice as positional indices and not as the actual indices\n", "# Being position based, it will flag an error if you pass it string indices\n", "#s.iloc['num2': 'num4'] \n", "# However, it works fine if you pass integer values (specifying positions) in the slice\n", "# Moreover, the `stop` position is NOT inclusive\n", "s.iloc[1:4]" ] }, { "cell_type": "code", "execution_count": null, "id": "a46059a2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "93f3eedc", "metadata": {}, "source": [ "**Understanding Step with Series object having String Indices**" ] }, { 
"cell_type": "code", "execution_count": null, "id": "e21a6ec1", "metadata": {}, "outputs": [], "source": [ "s" ] }, { "cell_type": "code", "execution_count": null, "id": "3ec7cf17", "metadata": {}, "outputs": [], "source": [ "# The step works fine with string indices as well\n", "s['num2':'num5':1]" ] }, { "cell_type": "code", "execution_count": null, "id": "c0e987e9", "metadata": {}, "outputs": [], "source": [ "s['num2':'num5':2]" ] }, { "cell_type": "code", "execution_count": null, "id": "0a54d42f", "metadata": {}, "outputs": [], "source": [ "s['num5':'num3':-1]" ] }, { "cell_type": "code", "execution_count": null, "id": "05f5be0c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "d4be2333", "metadata": {}, "source": [ "\n", "\n", "### d. Third use of Index (Alignment)\n", "- We can perform basic arithmetic operations like addition, subtraction, multiplication, division, etc., on two Series objects, to produce a new Series instance.\n", "- The operation is done on each corresponding pair of elements. This is done by matching the indices of the two series objects." 
] }, { "cell_type": "markdown", "id": "66b50cf4", "metadata": {}, "source": [ "**Example 1:** Adding two series objects with the same integer indices" ] }, { "cell_type": "code", "execution_count": 68, "id": "1896462a", "metadata": {}, "outputs": [], "source": [ "list1 = [1,3,5,7,9]\n", "list2 = [2,4,6,8,10]\n", "s1 = pd.Series(data=list1)\n", "s2 = pd.Series(data=list2)" ] }, { "cell_type": "code", "execution_count": 69, "id": "78574d22", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 1\n", "1 3\n", "2 5\n", "3 7\n", "4 9\n", "dtype: int64\n", "RangeIndex(start=0, stop=5, step=1)\n" ] } ], "source": [ "print(s1)\n", "print(s1.index)" ] }, { "cell_type": "code", "execution_count": 70, "id": "788d4ac4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 2\n", "1 4\n", "2 6\n", "3 8\n", "4 10\n", "dtype: int64\n", "RangeIndex(start=0, stop=5, step=1)\n" ] } ], "source": [ "print(s2)\n", "print(s2.index)" ] }, { "cell_type": "code", "execution_count": 71, "id": "753cef5e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 3\n", "1 7\n", "2 11\n", "3 15\n", "4 19\n", "dtype: int64\n", "RangeIndex(start=0, stop=5, step=1)\n" ] } ], "source": [ "s3 = s1 + s2\n", "print(s3)\n", "print(s3.index)" ] }, { "cell_type": "code", "execution_count": null, "id": "c948732c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "3e2db378", "metadata": {}, "source": [ "**Example 2:** Adding two series objects having different integer indices" ] }, { "cell_type": "code", "execution_count": null, "id": "3be3a214", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "f5f62753", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "7b011a58", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "cd75a7c3", 
"metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 72, "id": "66e62e56", "metadata": {}, "outputs": [], "source": [ "list1 = [6,9,7,5]\n", "index1 = [0,1,2,3]\n", "list2 = [8,6,2,1]\n", "index2 = [0,2,3,5]\n", "s1 = pd.Series(data=list1, index=index1);\n", "s2 = pd.Series(data=list2, index=index2);" ] }, { "cell_type": "code", "execution_count": 73, "id": "5919d832", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 6\n", "1 9\n", "2 7\n", "3 5\n", "dtype: int64\n", "Int64Index([0, 1, 2, 3], dtype='int64')\n" ] } ], "source": [ "print(s1)\n", "print(s1.index)" ] }, { "cell_type": "code", "execution_count": 74, "id": "ca5a5129", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 8\n", "2 6\n", "3 2\n", "5 1\n", "dtype: int64\n", "Int64Index([0, 2, 3, 5], dtype='int64')\n" ] } ], "source": [ "print(s2)\n", "print(s2.index)" ] }, { "cell_type": "code", "execution_count": 75, "id": "871fc971", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 14.0\n", "1 NaN\n", "2 13.0\n", "3 7.0\n", "5 NaN\n", "dtype: float64\n", "Int64Index([0, 1, 2, 3, 5], dtype='int64')\n" ] } ], "source": [ "s3 = s1 + s2\n", "print(s3)\n", "print(s3.index)" ] }, { "cell_type": "markdown", "id": "0dc94b75", "metadata": {}, "source": [ "**Problem:** While performing mathematical operations on series having mismatched indices, all missing values are filled in with NaN by default." ] }, { "cell_type": "code", "execution_count": null, "id": "defbb4cb", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "5890b468", "metadata": {}, "source": [ "**Solution:** To handle this problem, instead of using the operators (`+, -, *, /`), an explicit call to `s.add()`, `s.sub()`, `s.mul()` and `s.div()` is preferred. 
This allows us to replace a value that is missing from either series with a specific value (the `fill_value` argument) before the operation is performed, so as to have a concrete output in place of NaN." ] }, { "cell_type": "code", "execution_count": 76, "id": "2de7e8d0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 14.0\n", "1 9.0\n", "2 13.0\n", "3 7.0\n", "5 1.0\n", "dtype: float64" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1.add(s2, fill_value=0) # Compare it with above result" ] }, { "cell_type": "code", "execution_count": null, "id": "682e1f94", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "cf9c2aac", "metadata": {}, "source": [ "**Example 3:** Adding two series objects having different string indices" ] }, { "cell_type": "code", "execution_count": 3, "id": "8e7704fc", "metadata": {}, "outputs": [], "source": [ "list1 = [6,9,7,5, 2]\n", "labels1 = ['num1', 'num2', 'num3', 'num4', 'num5']\n", "\n", "list2 = [8,6,2,3,6]\n", "labels2 = ['num1', 'num2', 'num3', 'num8', 'num5']\n", "\n", "s1 = pd.Series(data=list1, index=labels1)\n", "s2 = pd.Series(data=list2, index=labels2)\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "c300583c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "num1 6\n", "num2 9\n", "num3 7\n", "num4 5\n", "num5 2\n", "dtype: int64\n", "Index(['num1', 'num2', 'num3', 'num4', 'num5'], dtype='object')\n" ] } ], "source": [ "print(s1)\n", "print(s1.index)" ] }, { "cell_type": "code", "execution_count": 5, "id": "c55c3331", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "num1 8\n", "num2 6\n", "num3 2\n", "num8 3\n", "num5 6\n", "dtype: int64\n", "Index(['num1', 'num2', 'num3', 'num8', 'num5'], dtype='object')\n" ] } ], "source": [ "print(s2)\n", "print(s2.index)" ] }, { "cell_type": "code", "execution_count": 6, "id": "ec478667", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "num1 
14.0\n", "num2 15.0\n", "num3 9.0\n", "num4 10.0\n", "num5 8.0\n", "num8 8.0\n", "dtype: float64\n", "Index(['num1', 'num2', 'num3', 'num4', 'num5', 'num8'], dtype='object')\n" ] } ], "source": [ "# Let us use the `add()` method\n", "# `num4` is missing from s2 and `num8` is missing from s1, so each is paired with the fill value 5\n", "#s1+s2\n", "s3 = s1.add(s2, fill_value=5)\n", "#s3 = s1.add(s2)\n", "print(s3)\n", "print(s3.index)" ] }, { "cell_type": "code", "execution_count": null, "id": "71fab575", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "bf470aef", "metadata": {}, "source": [ "**My dear students, please make time to practice the following topics related to Series:**\n", "- Boolean/Fancy Indexing and Slicing\n", "- Use of `reset_index()` method for completely resetting the index\n", "- Use of other manipulation methods like \n", " - `s.pop(index)` is passed an index; it returns the data item at that index and removes it from the series\n", " - `s.drop(indexes)` is passed one index or a list of indices and returns a new series with the items at those indices removed. The original series remains unchanged unless the `inplace=True` argument is passed\n", " - `s1.append(s2, ignore_index=False, verify_integrity=False)` is used to concatenate two series and return the concatenated series; the original series remain unchanged. Note that `append()` was deprecated in pandas 1.4 and removed in pandas 2.0, where `pd.concat([s1, s2])` is used instead\n", " - `s1.update(s2)` is used to modify the series `s1` in place using the values from the passed series\n", ">**We will discuss these while studying Pandas Dataframe object InshaAllah**" ] }, { "cell_type": "code", "execution_count": null, "id": "37405e5c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "c73f204f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "e50d5209", "metadata": {}, "source": [ "# Pandas Series vs NumPy 1-D Arrays\n", ">- In a series object we can define our own labeled index to access elements of an array. These can be numbers or strings. 
NumPy arrays are accessed by their integer position only.\n", ">- In a series object the elements can be indexed in descending order also. In NumPy arrays, the indexing starts with zero for the first element and the index is fixed.\n", ">- While performing arithmetic operations on series having misaligned indices, NaN or missing values may be generated. NumPy arrays have no index alignment: arithmetic is performed element by element on position, using broadcasting where the shapes allow it. While performing arithmetic on NumPy arrays with incompatible shapes, the operation fails.\n", ">- A Series requires more memory, because it stores an index alongside the values. A NumPy array occupies less memory.\n", " \n", " " ] }, { "cell_type": "code", "execution_count": null, "id": "a3e6fafb", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }