{ "cells": [ { "cell_type": "markdown", "id": "4a87b5ef", "metadata": {}, "source": [ "--- \n", " \n", "\n", "

Department of Data Science

\n", "

Course: Tools and Techniques for Data Science

\n", "\n", "---\n", "

Instructor: Muhammad Arif Butt, Ph.D.

" ] }, { "cell_type": "markdown", "id": "ab0dc25c", "metadata": {}, "source": [ "

Lecture 3.18 (Pandas-10)

" ] }, { "cell_type": "markdown", "id": "19f82705", "metadata": {}, "source": [ "## _Merging, Joining, Concatenating and Appending Dataframes_\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "68b37b21", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "e9dffd4a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "2a90b6a5", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "12db95e1", "metadata": {}, "source": [ "## Learning agenda of this notebook\n", "\n", "**PART-I: (Merging and Joining)**\n", "1. Merging DataFrames using `pd.merge()` method\n", " - Perform **Inner Join** (which is default)\n", " - Peform **Outer**/**Full Outer Join**\n", " - Perform **Left Outer Join**\n", " - Perform **Right Outer Join**

\n", "2. Additional Parameters to `pd.merge()` Method \n", " - Use of `indicator` parameter to indicate the df to which the value belong\n", " - Use of `suffixes` parameter to differentiate between common column labels\n", " - Use of `validate` parameter to check for duplicate keys\n", " \n", "**PART-II: (Concatenating and Appending)** \n", "\n", "3. Row wise Concatenation using `pd.concat()`\n", "\n", "4. Column wise Concatenation using `pd.concat()`\n", "\n", "5. Adding a Single Row/Column in a Dataframe using `pd.concat()`\n", "\n", "6. Appending Dataframes using `df.append()`" ] }, { "cell_type": "code", "execution_count": null, "id": "4272d642", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "932255a2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "33c142c4", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "b820b858", "metadata": {}, "source": [ "# Part-I (Merging and Joining)" ] }, { "cell_type": "markdown", "id": "b919365a", "metadata": {}, "source": [ "## 1. Merging DataFrames using `pd.merge()` Method\n", "Pandas `pd.merge()` is a versatile method to perform all standard database join operations between DataFrame or named Series objects.\n", "\n", "```\n", "pd.merge(left, right, how=\"inner\", indicator=False, on=None, suffixes=(\"_x\", \"_y\"), validate=None)\n", "```\n", "Where,\n", "- **`left`:** A DataFrame or named Series object.\n", "- **`right`:** Another DataFrame or named Series object.\n", "- **`how`:** specifies the type of join {`inner`, `outer`, `left`, `right`} (default is `inner`)\n", "- **`on`:** Column or index level names to join on. Must be found in both the left and right DataFrame and/or Series objects. \n", "- **`indicator`:** If set to True, adds a column to the output DataFrame called **`_merge`** with information on the source of each row {`left_only` means, this element is present only in left Dataframe, `right_only` means this is present only in right dataframe, `both` means they are present in both\n", "- **`suffixes`:** A tuple of string suffixes to apply to overlapping columns. Defaults to ('_x', '_y').\n", "- **`validate`:** If specified, checks for uniqueness of keys. This parameter can take following four values (default is None):\n", " - “one_to_one” or “1:1”: checks if merge keys are unique in both left and right datasets.\n", " - “one_to_many” or “1:m”: checks if merge keys are unique in left dataset.\n", " - “many_to_one” or “m:1”: checks if merge keys are unique in right dataset.\n", " - “many_to_many” or “m:m”: allowed, but does not result in checks." ] }, { "cell_type": "code", "execution_count": null, "id": "73dafdca", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "2f0c9c87", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "9cff27b2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "6946785c", "metadata": {}, "source": [ "### a. Inner Join:\n", "\n", "It is the most common type of join you’ll be working with. It returns a dataframe with only those rows that have common characteristics.\n", "An inner join requires each row in the two joined dataframes to have matching column values. This is similar to the intersection of two sets.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "573706e5", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 7, "id": "d880f298", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperature
0Lahore39
1Muree14
2Peshawer29
3Sialkot32
\n", "
" ], "text/plain": [ " city temperature\n", "0 Lahore 39\n", "1 Muree 14\n", "2 Peshawer 29\n", "3 Sialkot 32" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let us create a simple data frame\n", "import pandas as pd\n", "\n", "# This dataframe doesn't have entry for Lahore\n", "df_temp = pd.DataFrame({\n", " 'city': ['Lahore', 'Muree', 'Peshawer', 'Sialkot'],\n", " 'temperature' : [39, 14, 29, 32]\n", "})\n", "df_temp" ] }, { "cell_type": "code", "execution_count": 9, "id": "bea9cf5d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cityhumidity
0Karachi76
1Lahore95
2Peshawer72
3Lahore70
4Muree75
\n", "
" ], "text/plain": [ " city humidity\n", "0 Karachi 76\n", "1 Lahore 95\n", "2 Peshawer 72\n", "3 Lahore 70\n", "4 Muree 75" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "# This Dataframe has an extra entry for Multan\n", "df_hum = pd.DataFrame({\n", " 'city': [ 'Karachi', 'Lahore', 'Peshawer', 'Lahore', 'Muree'],\n", " 'humidity' : [76, 95, 72, 70, 75]\n", "})\n", "df_hum" ] }, { "cell_type": "code", "execution_count": null, "id": "5ed2fa36", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8120fb57", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3a4de6a9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "7e1325d0", "metadata": {}, "source": [ "**Note the column `city` on which we want to perform an inner join, in the two dataframes has only four cities in common. So the resulting dataframe will have only four rows that are common in both dataframes**" ] }, { "cell_type": "code", "execution_count": 3, "id": "934ce070", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Lahore3995
1Lahore3970
2Muree1475
3Peshawer2972
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Lahore 39 95\n", "1 Lahore 39 70\n", "2 Muree 14 75\n", "3 Peshawer 29 72" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d1 = pd.merge(df_temp, df_hum, how='inner')\n", "d1" ] }, { "cell_type": "code", "execution_count": 4, "id": "0c2e41af", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity_merge
0Lahore3995both
1Lahore3970both
2Muree1475both
3Peshawer2972both
\n", "
" ], "text/plain": [ " city temperature humidity _merge\n", "0 Lahore 39 95 both\n", "1 Lahore 39 70 both\n", "2 Muree 14 75 both\n", "3 Peshawer 29 72 both" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# merge will perform only for those cities that are common in both, which means it by-default performs inner-join\n", "d1 = pd.merge(df_temp, df_hum, on='city', how = 'inner', indicator=True)\n", "d1" ] }, { "cell_type": "code", "execution_count": 5, "id": "dbb58d7d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cityhumiditytemperature_merge
0Lahore9539both
1Lahore7039both
2Peshawer7229both
3Muree7514both
\n", "
" ], "text/plain": [ " city humidity temperature _merge\n", "0 Lahore 95 39 both\n", "1 Lahore 70 39 both\n", "2 Peshawer 72 29 both\n", "3 Muree 75 14 both" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Note only the sequence of o/p dataframe changes once we change the order in case of inner join\n", "d1 = pd.merge(df_hum, df_temp, on='city', how = 'inner', indicator=True)\n", "d1" ] }, { "cell_type": "code", "execution_count": null, "id": "247e167a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6ce3fa49", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "70c6ee02", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "d378c092", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "965cd00c", "metadata": {}, "source": [ "### b. Full Join:\n", "Also known as Full Outer Join, returns all those records which either have a match in the left or right dataframe. This is similar to the union of two sets.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "4ca0bcc3", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "42016dbd", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "b0033b72", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "ae1c49c3", "metadata": {}, "outputs": [], "source": [ "df_temp = pd.DataFrame({\n", " 'city': ['Lahore', 'Muree', 'Peshawer', 'Sialkot'],\n", " 'temperature' : [39, 14, 29, 32],\n", "})\n", "df_temp" ] }, { "cell_type": "code", "execution_count": null, "id": "56d45d38", "metadata": {}, "outputs": [], "source": [ "df_hum = pd.DataFrame({\n", " 'city': [ 'Karachi', 'Lahore', 'Peshawer', 'Lahore', 'Muree'],\n", " 'humidity' : [76, 95, 72, 70, 75],\n", "})\n", "df_hum" ] }, { "cell_type": "markdown", "id": "015bf262", "metadata": {}, "source": [ "**Note the column `city` on which we want to perform a full outer join, in the two dataframes has a union of seven cities. So the resulting dataframe will have seven rows**" ] }, { "cell_type": "code", "execution_count": null, "id": "ff6531e4", "metadata": {}, "outputs": [], "source": [ "d2 = pd.merge(df_temp, df_hum, on='city', how='outer', indicator=True)\n", "d2" ] }, { "cell_type": "code", "execution_count": null, "id": "a594b233", "metadata": {}, "outputs": [], "source": [ "# Note only the sequence of o/p dataframe changes once we change the order in case of inner join\n", "d3 = pd.merge(df_hum, df_temp, on='city', how='outer', indicator=True)\n", "d3" ] }, { "cell_type": "code", "execution_count": null, "id": "d4f35e06", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8e1d58bd", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "b47a7599", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "d2dfad95", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "117ae3fb", "metadata": {}, "source": [ "### c. Left Join\n", "Also known as Left outer join. It is simply performs an inner join plus all the non-matching rows of the left dataframe are taken as it is filled with NaN for columns of the right dataframe.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "37dfc333", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "2a77e4a1", "metadata": {}, "outputs": [], "source": [ "df_temp = pd.DataFrame({\n", " 'city': ['Lahore', 'Muree', 'Peshawer', 'Sialkot'],\n", " 'temperature' : [39, 14, 29, 32],\n", "})\n", "df_temp" ] }, { "cell_type": "code", "execution_count": null, "id": "a99c5976", "metadata": {}, "outputs": [], "source": [ "df_hum = pd.DataFrame({\n", " 'city': [ 'Karachi', 'Lahore', 'Peshawer', 'Lahore', 'Muree'],\n", " 'humidity' : [76, 95, 72, 70, 75],\n", "})\n", "df_hum" ] }, { "cell_type": "markdown", "id": "c1565198", "metadata": {}, "source": [ "**Note the column `city` on which we want to perform a left outer join, in the two dataframes has an intersection of four rows. Other than these four rows, record of city Sialkot from left dataframe will also be included in the resulting dataframe being a left outer join**" ] }, { "cell_type": "code", "execution_count": null, "id": "39e6bedf", "metadata": {}, "outputs": [], "source": [ "# In left outer join, it takes all the rows from left dataframe and only common rows from right dataframe\n", "d3 = pd.merge(df_temp, df_hum, on='city', how='left', indicator=True)\n", "d3" ] }, { "cell_type": "markdown", "id": "516c5e5a", "metadata": {}, "source": [ "**Left and right outer join also depend on the order of Dataframes that are passed to merge() function. Let us change the order and understand this**" ] }, { "cell_type": "code", "execution_count": null, "id": "a1050bad", "metadata": {}, "outputs": [], "source": [ "d4 = pd.merge(df_hum, df_temp, on='city', how='left', indicator=True)\n", "d4" ] }, { "cell_type": "code", "execution_count": null, "id": "4c26ba60", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8ccf530e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "01a0b8cc", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "1770311c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "e6d366a5", "metadata": {}, "source": [ "### d. Right Join\n", "Also known as Right outer join. It is simply performs an inner join plus all the non-matching rows of the right dataframe are taken as it is filled with NaN for columns of the left dataframe.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 2, "id": "dce2e9be", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperature
0Lahore39
1Muree14
2Peshawer29
3Sialkot32
\n", "
" ], "text/plain": [ " city temperature\n", "0 Lahore 39\n", "1 Muree 14\n", "2 Peshawer 29\n", "3 Sialkot 32" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df_temp = pd.DataFrame({\n", " 'city': ['Lahore', 'Muree', 'Peshawer', 'Sialkot'],\n", " 'temperature' : [39, 14, 29, 32],\n", "})\n", "df_temp" ] }, { "cell_type": "code", "execution_count": 3, "id": "e332bc9b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cityhumidity
0Karachi76
1Lahore95
2Peshawer72
3Lahore70
4Muree75
\n", "
" ], "text/plain": [ " city humidity\n", "0 Karachi 76\n", "1 Lahore 95\n", "2 Peshawer 72\n", "3 Lahore 70\n", "4 Muree 75" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_hum = pd.DataFrame({\n", " 'city': [ 'Karachi', 'Lahore', 'Peshawer', 'Lahore', 'Muree'],\n", " 'humidity' : [76, 95, 72, 70, 75],\n", "})\n", "df_hum" ] }, { "cell_type": "markdown", "id": "7f2fad33", "metadata": {}, "source": [ "**Note the column `city` on which we want to perform a right outer join, in the two dataframes has an intersection of four rows. Other than these four rows, record of city Karachi from right dataframe will also be included in the resulting dataframe being a right outer join**" ] }, { "cell_type": "code", "execution_count": 4, "id": "378b81da", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity_merge
0KarachiNaN76right_only
1Lahore39.095both
2Peshawer29.072both
3Lahore39.070both
4Muree14.075both
\n", "
" ], "text/plain": [ " city temperature humidity _merge\n", "0 Karachi NaN 76 right_only\n", "1 Lahore 39.0 95 both\n", "2 Peshawer 29.0 72 both\n", "3 Lahore 39.0 70 both\n", "4 Muree 14.0 75 both" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# In Right outer join, it takes all the rows from Right dataframe and only common rows from left dataframe\n", "df3 = pd.merge(df_temp, df_hum, on='city', how='right', indicator=True)\n", "df3" ] }, { "cell_type": "markdown", "id": "a86b2254", "metadata": {}, "source": [ "**Left and right outer join also depend on the order of Dataframes that are passed to merge() function. Let us change the order and understand this**" ] }, { "cell_type": "code", "execution_count": 5, "id": "ff470642", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cityhumiditytemperature_merge
0Lahore95.039both
1Lahore70.039both
2Muree75.014both
3Peshawer72.029both
4SialkotNaN32right_only
\n", "
" ], "text/plain": [ " city humidity temperature _merge\n", "0 Lahore 95.0 39 both\n", "1 Lahore 70.0 39 both\n", "2 Muree 75.0 14 both\n", "3 Peshawer 72.0 29 both\n", "4 Sialkot NaN 32 right_only" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df4 = pd.merge(df_hum, df_temp, on='city', how='right', indicator=True)\n", "df4" ] }, { "cell_type": "code", "execution_count": null, "id": "b1e9d89c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "02431004", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "824a1129", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "ad5a5458", "metadata": {}, "source": [ "## 2. Additional Parameters to `pd.merge()` Method" ] }, { "cell_type": "markdown", "id": "6ff418a6", "metadata": {}, "source": [ "#### Use of `suffixes` Parameter\n", "- When you merge dataframes having columns with same labels, other than the one on which you are joining ('city`)\n", "- The resulting dataframe will have appended suffixes (`_x`, `_y`) with column labels to differentiate b/w columns of both dataframes\n", "- For better understanding you can pass `suffixes`.....\n", "- Let us understand this by example" ] }, { "cell_type": "code", "execution_count": 6, "id": "e367b100", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Lahore3576
1Karachi3995
2Peshawer3372
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Lahore 35 76\n", "1 Karachi 39 95\n", "2 Peshawer 33 72" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.DataFrame({\n", " 'city': [ 'Lahore', 'Karachi', 'Peshawer'],\n", " 'temperature' : [35, 39, 33],\n", " 'humidity' : [76, 95, 72]\n", "})\n", "\n", "df1" ] }, { "cell_type": "code", "execution_count": 7, "id": "d5092f29", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Karachi4188
1Peshawer4499
2Islamabad4779
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Karachi 41 88\n", "1 Peshawer 44 99\n", "2 Islamabad 47 79" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.DataFrame({\n", " 'city': [ 'Karachi', 'Peshawer', 'Islamabad'],\n", " 'temperature' : [41, 44, 47],\n", " 'humidity' : [88, 99, 79]\n", "})\n", "df2" ] }, { "cell_type": "code", "execution_count": 8, "id": "b729d62b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperature_xhumidity_xtemperature_yhumidity_y
0Karachi39954188
1Peshawer33724499
\n", "
" ], "text/plain": [ " city temperature_x humidity_x temperature_y humidity_y\n", "0 Karachi 39 95 41 88\n", "1 Peshawer 33 72 44 99" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3 = pd.merge(df1, df2, on='city', how='inner')\n", "df3" ] }, { "cell_type": "markdown", "id": "5c2d02e8", "metadata": {}, "source": [ "- **Note that `merge` has automatically appended suffixes with column labels to differentiate b/w columns of both dataframes**\n", "- **You can use the `suffixes` parameter to `pd.merge()` method to specify the suffixes other than `_x` and `_y` to something more meaningful.**" ] }, { "cell_type": "code", "execution_count": 9, "id": "5861db25", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperature_lefthumidity_lefttemperature_righthumidity_right
0Karachi39954188
1Peshawer33724499
\n", "
" ], "text/plain": [ " city temperature_left humidity_left temperature_right \\\n", "0 Karachi 39 95 41 \n", "1 Peshawer 33 72 44 \n", "\n", " humidity_right \n", "0 88 \n", "1 99 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d3 = pd.merge(df1, df2, on='city', how='inner', suffixes=('_left','_right'))\n", "d3" ] }, { "cell_type": "code", "execution_count": null, "id": "6fd64dd8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "5ea2b5ad", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "dd5940c8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "30693470", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3cf67c74", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "40ee79f2", "metadata": {}, "source": [ "#### Use `validate` Parameter to Check for Duplicate Keys\n", "- We can use the `validate` parameter to the `pd.merge()` method to check for uniqueness of keys. This parameter can take following four values (default is None):\n", " - `one_to_one` or `1:1`: checks if merge keys are unique in both left and right datasets.\n", " - `one_to_many` or `1:m`: checks if merge keys are unique in left dataset.\n", " - `many_to_one` or `m:1`: checks if merge keys are unique in right dataset.\n", " - `many_to_many` or `m:m`: allowed, but does not result in checks." ] }, { "cell_type": "code", "execution_count": 39, "id": "40e9ff4a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperature
0Lahore35
1Karachi39
2Muree15
\n", "
" ], "text/plain": [ " city temperature\n", "0 Lahore 35\n", "1 Karachi 39\n", "2 Muree 15" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.DataFrame({\n", " 'city': [ 'Lahore', 'Karachi', 'Muree'],\n", " 'temperature' : [35, 39, 15],\n", "})\n", "df1" ] }, { "cell_type": "code", "execution_count": 40, "id": "53191111", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cityhumidity
0Lahore76
1Karachi95
2Islamabad72
3Lahore76
\n", "
" ], "text/plain": [ " city humidity\n", "0 Lahore 76\n", "1 Karachi 95\n", "2 Islamabad 72\n", "3 Lahore 76" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.DataFrame({\n", " 'city': [ 'Lahore', 'Karachi', 'Islamabad', 'Lahore'],\n", " 'humidity' : [76, 95, 72, 76],\n", "})\n", "df2" ] }, { "cell_type": "code", "execution_count": null, "id": "5dcfaf7a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "e6df3cec", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 42, "id": "d0bd6257", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cityhumidity
0LahoreNaN
1KarachiNaN
2MureeNaN
3LahoreNaN
4KarachiNaN
5IslamabadNaN
6LahoreNaN
7Lahore76.0
8Karachi95.0
9Islamabad72.0
10Lahore76.0
\n", "
" ], "text/plain": [ " city humidity\n", "0 Lahore NaN\n", "1 Karachi NaN\n", "2 Muree NaN\n", "3 Lahore NaN\n", "4 Karachi NaN\n", "5 Islamabad NaN\n", "6 Lahore NaN\n", "7 Lahore 76.0\n", "8 Karachi 95.0\n", "9 Islamabad 72.0\n", "10 Lahore 76.0" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.concat([df1, df2] , ignore_index=True)\n", "df1" ] }, { "cell_type": "code", "execution_count": 38, "id": "91661194", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cityhumidity
0LahoreNaN
1KarachiNaN
2MureeNaN
3Lahore76.0
4Karachi95.0
5Islamabad72.0
6Lahore76.0
7Lahore76.0
8Karachi95.0
9Islamabad72.0
10Lahore76.0
11Lahore76.0
12Karachi95.0
13Islamabad72.0
14Lahore76.0
15LahoreNaN
16KarachiNaN
17MureeNaN
18Lahore76.0
19Karachi95.0
20Islamabad72.0
21Lahore76.0
22Lahore76.0
23Karachi95.0
24Islamabad72.0
25Lahore76.0
26Lahore76.0
27Karachi95.0
28Islamabad72.0
29Lahore76.0
30Lahore76.0
31Karachi95.0
32Islamabad72.0
33Lahore76.0
34LahoreNaN
35KarachiNaN
36MureeNaN
37Lahore76.0
38Karachi95.0
39Islamabad72.0
40Lahore76.0
41Lahore76.0
42Karachi95.0
43Islamabad72.0
44Lahore76.0
45Lahore76.0
46Karachi95.0
47Islamabad72.0
48Lahore76.0
49Lahore76.0
50Karachi95.0
51Islamabad72.0
52Lahore76.0
\n", "
" ], "text/plain": [ " city humidity\n", "0 Lahore NaN\n", "1 Karachi NaN\n", "2 Muree NaN\n", "3 Lahore 76.0\n", "4 Karachi 95.0\n", "5 Islamabad 72.0\n", "6 Lahore 76.0\n", "7 Lahore 76.0\n", "8 Karachi 95.0\n", "9 Islamabad 72.0\n", "10 Lahore 76.0\n", "11 Lahore 76.0\n", "12 Karachi 95.0\n", "13 Islamabad 72.0\n", "14 Lahore 76.0\n", "15 Lahore NaN\n", "16 Karachi NaN\n", "17 Muree NaN\n", "18 Lahore 76.0\n", "19 Karachi 95.0\n", "20 Islamabad 72.0\n", "21 Lahore 76.0\n", "22 Lahore 76.0\n", "23 Karachi 95.0\n", "24 Islamabad 72.0\n", "25 Lahore 76.0\n", "26 Lahore 76.0\n", "27 Karachi 95.0\n", "28 Islamabad 72.0\n", "29 Lahore 76.0\n", "30 Lahore 76.0\n", "31 Karachi 95.0\n", "32 Islamabad 72.0\n", "33 Lahore 76.0\n", "34 Lahore NaN\n", "35 Karachi NaN\n", "36 Muree NaN\n", "37 Lahore 76.0\n", "38 Karachi 95.0\n", "39 Islamabad 72.0\n", "40 Lahore 76.0\n", "41 Lahore 76.0\n", "42 Karachi 95.0\n", "43 Islamabad 72.0\n", "44 Lahore 76.0\n", "45 Lahore 76.0\n", "46 Karachi 95.0\n", "47 Islamabad 72.0\n", "48 Lahore 76.0\n", "49 Lahore 76.0\n", "50 Karachi 95.0\n", "51 Islamabad 72.0\n", "52 Lahore 76.0" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.concat([df1, df2], join='outer', ignore_index=True )\n", "df2" ] }, { "cell_type": "code", "execution_count": null, "id": "bb6095db", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "30f82f5f", "metadata": {}, "source": [ ">**`one_to_one` or `1:1`: checks if merge keys are unique in both left and right dataframes, if not then throw exception**" ] }, { "cell_type": "code", "execution_count": null, "id": "548b7339", "metadata": {}, "outputs": [], "source": [ "df1, df2" ] }, { "cell_type": "code", "execution_count": 13, "id": "ae9bb499", "metadata": {}, "outputs": [], "source": [ "#pd.merge(df1, df2, on='city', how='outer', validate='one_to_one')" ] }, { "cell_type": "code", "execution_count": null, "id": "c3ff2dec", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "52715b67", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "7b4c42a0", "metadata": {}, "source": [ ">**`one_to_many` or `1:m`: checks if merge keys are unique in left dataframe, if not then throw exception**" ] }, { "cell_type": "code", "execution_count": null, "id": "033124a0", "metadata": {}, "outputs": [], "source": [ "df1, df2" ] }, { "cell_type": "code", "execution_count": 14, "id": "47fd4d90", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Lahore35.076.0
1Lahore35.076.0
2Karachi39.095.0
3Muree15.0NaN
4IslamabadNaN72.0
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Lahore 35.0 76.0\n", "1 Lahore 35.0 76.0\n", "2 Karachi 39.0 95.0\n", "3 Muree 15.0 NaN\n", "4 Islamabad NaN 72.0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(df1, df2, on='city', how='outer', validate='one_to_many')" ] }, { "cell_type": "code", "execution_count": null, "id": "2cfb94bb", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3839c38e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "c4cab63d", "metadata": {}, "source": [ ">**`many_to_one` or `m:1`: checks if merge keys are unique in right dataframe, if not then throw exception**" ] }, { "cell_type": "code", "execution_count": null, "id": "44ed9a0c", "metadata": {}, "outputs": [], "source": [ "df1, df2" ] }, { "cell_type": "code", "execution_count": null, "id": "588d2d3f", "metadata": {}, "outputs": [], "source": [ "#pd.merge(df1, df2, on='city', how='outer', validate='many_to_one')" ] }, { "cell_type": "code", "execution_count": null, "id": "34cef7f2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6e56a98b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "86e7f5c9", "metadata": {}, "source": [ ">**`many_to_many` or `m:m`: No checks are performed on keys uniqueness**" ] }, { "cell_type": "code", "execution_count": null, "id": "4d9f59fb", "metadata": {}, "outputs": [], "source": [ "df1, df2" ] }, { "cell_type": "code", "execution_count": null, "id": "181fd043", "metadata": {}, "outputs": [], "source": [ "pd.merge(df1, df2, on='city', how='outer', validate='many_to_many')" ] }, { "cell_type": "code", "execution_count": null, "id": "b77d3f96", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "096fb729", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "084a804d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "f9968408", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "ca0391c3", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "bcdeaa8f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "80d035c8", "metadata": {}, "source": [ "# Part-II (Concatenating and Appending)" ] }, { "cell_type": "code", "execution_count": null, "id": "b826751f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "e26ad476", "metadata": {}, "source": [ "## Concatenation of DataFrames (Row Wise + Column Wise)\n", "\n", "\n", "\n", "\n", "









\n", "\n", "\n", "- The `pd.concat()` method is used to concat pandas objects along a particular axis with optional set logic along the other axes. \n", "```\n", "pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, verify_integrity=False)\n", "```\n", "\n", "Where,\n", "- `objs`: a sequence or mapping of Series or DataFrame objects\n", "- `axis`: The axis to concatenate along. {0/’index’, 1/’columns’}, default 0\n", "- `join`{‘inner’, ‘outer’}, Default is `outer` for union. If `inner` that means intersection\n", "- `ignore_index`: If True, the resulting axis will be labeled 0, …, n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information. (default is False)\n", "- `keys`: sequence, default None (Construct hierarchical index using the passed keys as the outermost level.)\n", "- `verify_integrity` : boolean, default False. Check whether the new concatenated axis contains duplicates. This can be very expensive relative to the actual data concatenation.\n" ] }, { "cell_type": "markdown", "id": "feecd6b4", "metadata": {}, "source": [ "## 3. Row-Wise Concatenation\n", "" ] }, { "cell_type": "code", "execution_count": 26, "id": "1626679f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Lahore3576
1Karachi3995
2Peshawer3372
3Islamabad2981
4Muree1570
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Lahore 35 76\n", "1 Karachi 39 95\n", "2 Peshawer 33 72\n", "3 Islamabad 29 81\n", "4 Muree 15 70" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "Pak_Weather = pd.DataFrame({\n", " 'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],\n", " 'temperature' : [35, 39, 33, 29, 15],\n", " 'humidity' : [76, 95, 72, 81, 70],\n", "})\n", "Pak_Weather" ] }, { "cell_type": "code", "execution_count": 27, "id": "adb91c56", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Dubai4188
1Sharja4499
2Ajman4779
3Abu Dhabi4586
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Dubai 41 88\n", "1 Sharja 44 99\n", "2 Ajman 47 79\n", "3 Abu Dhabi 45 86" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "UAE_Weather = pd.DataFrame({\n", " 'city': [ 'Dubai', 'Sharja', 'Ajman', 'Abu Dhabi'],\n", " 'temperature' : [41, 44, 47, 45],\n", " 'humidity' : [88, 99, 79, 86],\n", "})\n", "UAE_Weather" ] }, { "cell_type": "code", "execution_count": 30, "id": "e90c0d88", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Lahore3576
1Karachi3995
2Peshawer3372
3Islamabad2981
4Muree1570
0Dubai4188
1Sharja4499
2Ajman4779
3Abu Dhabi4586
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Lahore 35 76\n", "1 Karachi 39 95\n", "2 Peshawer 33 72\n", "3 Islamabad 29 81\n", "4 Muree 15 70\n", "0 Dubai 41 88\n", "1 Sharja 44 99\n", "2 Ajman 47 79\n", "3 Abu Dhabi 45 86" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.concat([Pak_Weather, UAE_Weather], join='outer' )\n", "df1" ] }, { "cell_type": "markdown", "id": "3473f4e0", "metadata": {}, "source": [ "#### Concatenate Dataframes (row-wise)" ] }, { "cell_type": "code", "execution_count": 17, "id": "cf0a1462", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Lahore3576
1Karachi3995
2Peshawer3372
3Islamabad2981
4Muree1570
0Dubai4188
1Sharja4499
2Ajman4779
3Abu Dhabi4586
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Lahore 35 76\n", "1 Karachi 39 95\n", "2 Peshawer 33 72\n", "3 Islamabad 29 81\n", "4 Muree 15 70\n", "0 Dubai 41 88\n", "1 Sharja 44 99\n", "2 Ajman 47 79\n", "3 Abu Dhabi 45 86" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.concat([Pak_Weather, UAE_Weather])\n", "df1 = pd.concat([Pak_Weather, UAE_Weather], axis=0)\n", "df1" ] }, { "cell_type": "markdown", "id": "7e545d4f", "metadata": {}, "source": [ "- Notice the index is also concatenated as such\n", "- To handle this pass `ignore_index` parameter a value of `True`, so that the resulting axis is be labeled 0, …, n - 1. \n", "- Useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information.\n", "- Note the index values on the other axes (i.e., columns) have still respected in the join." ] }, { "cell_type": "code", "execution_count": 18, "id": "4550a39e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Lahore3576
1Karachi3995
2Peshawer3372
3Islamabad2981
4Muree1570
5Dubai4188
6Sharja4499
7Ajman4779
8Abu Dhabi4586
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Lahore 35 76\n", "1 Karachi 39 95\n", "2 Peshawer 33 72\n", "3 Islamabad 29 81\n", "4 Muree 15 70\n", "5 Dubai 41 88\n", "6 Sharja 44 99\n", "7 Ajman 47 79\n", "8 Abu Dhabi 45 86" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.concat([Pak_Weather,UAE_Weather], axis=0, ignore_index=True)\n", "df2" ] }, { "cell_type": "code", "execution_count": null, "id": "d4389cd0", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "45a45239", "metadata": {}, "source": [ "- Other than the numeric index, if you want to have an additional index for your sub groups, you can use the `keys` argument to `pd.concat()` method\n", "- It provides multi-indexing\n", "- Remember this will work only if the `ignore_index` argument is `False` which is the default" ] }, { "cell_type": "code", "execution_count": 19, "id": "9faa0c04", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
city0Lahore3576
1Karachi3995
2Peshawer3372
3Islamabad2981
4Muree1570
\n", "
" ], "text/plain": [ " city temperature humidity\n", "city 0 Lahore 35 76\n", " 1 Karachi 39 95\n", " 2 Peshawer 33 72\n", " 3 Islamabad 29 81\n", " 4 Muree 15 70" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3 = pd.concat([Pak_Weather, UAE_Weather], axis=0, keys=[\"city\",])\n", "df3" ] }, { "cell_type": "code", "execution_count": 20, "id": "c0424d4a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
Pak0Lahore3576
1Karachi3995
2Peshawer3372
3Islamabad2981
4Muree1570
UAE0Dubai4188
1Sharja4499
2Ajman4779
3Abu Dhabi4586
\n", "
" ], "text/plain": [ " city temperature humidity\n", "Pak 0 Lahore 35 76\n", " 1 Karachi 39 95\n", " 2 Peshawer 33 72\n", " 3 Islamabad 29 81\n", " 4 Muree 15 70\n", "UAE 0 Dubai 41 88\n", " 1 Sharja 44 99\n", " 2 Ajman 47 79\n", " 3 Abu Dhabi 45 86" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3 = pd.concat([Pak_Weather, UAE_Weather], axis=0, keys=[\"Pak\", \"UAE\"])\n", "df3" ] }, { "cell_type": "markdown", "id": "fc7f40f1", "metadata": {}, "source": [ "- The advantage of doing this is you can use `df.loc` to get a subset of your dataframe\n", "- So, after getting a big dataframe if you want to get the dataframe from which it was created keys arg is useful" ] }, { "cell_type": "code", "execution_count": 21, "id": "99a6d084", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Lahore3576
1Karachi3995
2Peshawer3372
3Islamabad2981
4Muree1570
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Lahore 35 76\n", "1 Karachi 39 95\n", "2 Peshawer 33 72\n", "3 Islamabad 29 81\n", "4 Muree 15 70" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3.loc['Pak', :]" ] }, { "cell_type": "code", "execution_count": 22, "id": "fb637708", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Dubai4188
1Sharja4499
2Ajman4779
3Abu Dhabi4586
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Dubai 41 88\n", "1 Sharja 44 99\n", "2 Ajman 47 79\n", "3 Abu Dhabi 45 86" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3.loc['UAE', :]" ] }, { "cell_type": "code", "execution_count": null, "id": "de96c379", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "98968442", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "ed1be4fb", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "6138a715", "metadata": {}, "source": [ "#### What will Happen if one of the Dataframe has an Additional Column\n", "- If you combine two Dataframe objects which do not have all the same columns, then the columns outside the intersection will be filled with NaN values." ] }, { "cell_type": "code", "execution_count": 23, "id": "d83fb7a5", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperature
0Lahore35
1Karachi39
2Peshawer33
3Islamabad29
4Muree15
\n", "
" ], "text/plain": [ " city temperature\n", "0 Lahore 35\n", "1 Karachi 39\n", "2 Peshawer 33\n", "3 Islamabad 29\n", "4 Muree 15" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Pak_Weather = pd.DataFrame({\n", " 'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],\n", " 'temperature' : [35, 39, 33, 29, 15],\n", " \n", "})\n", "Pak_Weather" ] }, { "cell_type": "code", "execution_count": 24, "id": "25490022", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Dubai4188
1Sharja4499
2Ajman4779
3Abu Dhabi4586
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Dubai 41 88\n", "1 Sharja 44 99\n", "2 Ajman 47 79\n", "3 Abu Dhabi 45 86" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "UAE_Weather = pd.DataFrame({\n", " 'city': [ 'Dubai', 'Sharja', 'Ajman', 'Abu Dhabi'],\n", " 'temperature' : [41, 44, 47, 45],\n", " 'humidity' : [88, 99, 79, 86],\n", "})\n", "UAE_Weather" ] }, { "cell_type": "code", "execution_count": 25, "id": "6a3a2da6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Lahore35NaN
1Karachi39NaN
2Peshawer33NaN
3Islamabad29NaN
4Muree15NaN
5Dubai4188.0
6Sharja4499.0
7Ajman4779.0
8Abu Dhabi4586.0
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Lahore 35 NaN\n", "1 Karachi 39 NaN\n", "2 Peshawer 33 NaN\n", "3 Islamabad 29 NaN\n", "4 Muree 15 NaN\n", "5 Dubai 41 88.0\n", "6 Sharja 44 99.0\n", "7 Ajman 47 79.0\n", "8 Abu Dhabi 45 86.0" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NaN will be placed where values are missing\n", "df = pd.concat([Pak_Weather,UAE_Weather], axis=0, ignore_index=True)\n", "df" ] }, { "cell_type": "code", "execution_count": null, "id": "3e303b79", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "85ff090a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "2756791b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "214dbb0b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "fa0954ac", "metadata": {}, "source": [ "## 4. Column Wise Concatenation\n", "- It is not advised to concatenate dataframes column wise. If you want to then you need to take care of some checks like:\n", " - the number of rows must be same in both dataframes, and\n", " - Indexes of both dataframes are sorted\n", "- If you are done with all the checks then you can simply use `axis=1` to do the job.\n", "\n", "" ] }, { "cell_type": "markdown", "id": "916d7dcf", "metadata": {}, "source": [ "### a. Creating a two Simple Dataframe" ] }, { "cell_type": "code", "execution_count": null, "id": "0f961d7e", "metadata": {}, "outputs": [], "source": [ "temp_df = pd.DataFrame({\n", " 'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],\n", " 'temperature' : [35, 39, 33, 29, 15],\n", "})\n", "temp_df" ] }, { "cell_type": "code", "execution_count": null, "id": "aa8344bc", "metadata": {}, "outputs": [], "source": [ "wind_df = pd.DataFrame({\n", " 'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],\n", " 'wind speed' : [9, 12, 7, 13, 18],\n", "})\n", "wind_df" ] }, { "cell_type": "markdown", "id": "4e2288f3", "metadata": {}, "source": [ "### b. Concatenate Dataframes (column-wise)" ] }, { "cell_type": "code", "execution_count": null, "id": "175448a6", "metadata": {}, "outputs": [], "source": [ "# We have to use the argument axis=1\n", "df = pd.concat([temp_df,wind_df], axis=1)\n", "df" ] }, { "cell_type": "code", "execution_count": null, "id": "1fd40a60", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "e46f0d9d", "metadata": {}, "source": [ "### c. What will happen if we have missing data in our dataframes" ] }, { "cell_type": "code", "execution_count": null, "id": "02ce2467", "metadata": {}, "outputs": [], "source": [ "# This dataframe do not have the temperature for Lahore\n", "temp_df = pd.DataFrame({\n", " 'city': [ 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],\n", " 'temperature' : [39, 33, 29, 15],\n", "})\n", "temp_df" ] }, { "cell_type": "code", "execution_count": null, "id": "78b39323", "metadata": {}, "outputs": [], "source": [ "#This dataframe do not have the windspeed of Islamabad\n", "wind_df = pd.DataFrame({\n", " 'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Muree'],\n", " 'wind speed' : [9, 12, 7, 18],\n", "})\n", "wind_df" ] }, { "cell_type": "code", "execution_count": null, "id": "0469a354", "metadata": { "scrolled": true }, "outputs": [], "source": [ "df1 = pd.concat([temp_df,wind_df], axis=1)\n", "df1" ] }, { "cell_type": "markdown", "id": "2ec7b9b6", "metadata": {}, "source": [ "**This doesnot look correct**\n", "- We have missing data in the resulting dataframe, i.e., it does not contain record for Lahore, which was there in the second dataframe but not in the first\n", "- Solution is while creating the dataframe you pass it the index\n", "- In Pandas, while creating a DataFrame, you can pass the index argument with appropriate related indices, which is a way to align rows from different dataframes" ] }, { "cell_type": "code", "execution_count": null, "id": "8700fe49", "metadata": {}, "outputs": [], "source": [ "# This dataframe do not have the temperature for Lahore\n", "temp_df = pd.DataFrame({\n", " 'city': [ 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],\n", " 'temperature' : [39, 33, 29, 15],\n", "},index=[0,1,2,3])\n", "temp_df" ] }, { "cell_type": "code", "execution_count": null, "id": "d3e1dc50", "metadata": {}, "outputs": [], "source": [ "#This dataframe do not have the windspeed of Islamabad\n", "# Note the indices in wind_df are related to indices of temp_df\n", "wind_df = pd.DataFrame({\n", " 'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Muree'],\n", " 'wind speed' : [9, 12, 7, 18],\n", "}, index=[4,0,1,3])\n", "wind_df" ] }, { "cell_type": "markdown", "id": "f25ca5b9", "metadata": {}, "source": [ "---\n", "#### Note the indexes in above two dataframes match. Now concatenation will be OK\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "5fcd61ae", "metadata": {}, "outputs": [], "source": [ "df = pd.concat([temp_df,wind_df], axis=1)\n", "df" ] }, { "cell_type": "markdown", "id": "6e3306da", "metadata": {}, "source": [ ">- Concatenating Dataframes along axis = 1 adds one Dataframe along the other. It is like a full outer join. Placing NaN for non-matching rows in the left as well as right Dataframes.\n", ">- By default, a concatenation results in a set union, where all data is preserved." ] }, { "cell_type": "code", "execution_count": null, "id": "f339166b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "ea8276dd", "metadata": {}, "source": [ "## 5. Adding a Single Row/Column in a Dataframe\n", "- Now let us see how we can concat a single row or a single column to a dataframe using the `pd.concat()` method." ] }, { "cell_type": "markdown", "id": "aa3e80c9", "metadata": {}, "source": [ "### a. Adding a Row in a Dataframe" ] }, { "cell_type": "code", "execution_count": null, "id": "969be5e0", "metadata": {}, "outputs": [], "source": [ "df1 = pd.DataFrame({\n", " 'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],\n", " 'temperature' : [35, 39, 33, 29, 15],\n", " 'humidity' : [76, 95, 72, 81, 70],\n", "})\n", "df1" ] }, { "cell_type": "code", "execution_count": null, "id": "e88f7016", "metadata": {}, "outputs": [], "source": [ "df2 = pd.DataFrame({\"city\": \"Multan\", \"temperature\": 45, \"humidity\": 75}, index=[5])\n", "df2" ] }, { "cell_type": "code", "execution_count": null, "id": "a4ca835d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "acbed946", "metadata": {}, "outputs": [], "source": [ "df3 = pd.concat([df1, df2], ignore_index=True, axis = 0)\n", "df3" ] }, { "cell_type": "markdown", "id": "eaee8c26", "metadata": {}, "source": [ ">**You can place the new row at your desired location using slicing operator, as shown below**" ] }, { "cell_type": "code", "execution_count": null, "id": "3578ad25", "metadata": {}, "outputs": [], "source": [ "df3 = pd.concat([df1[:2], df2, df1[2:]], ignore_index = True)\n", "df3" ] }, { "cell_type": "code", "execution_count": null, "id": "21fcac05", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "de0bb80d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "b58a3ec8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "a4b7432b", "metadata": {}, "source": [ "### b. Adding a Column in a Dataframe" ] }, { "cell_type": "code", "execution_count": null, "id": "06be80a5", "metadata": {}, "outputs": [], "source": [ "Pak_Weather = pd.DataFrame({\n", " 'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],\n", " 'temperature' : [35, 39, 33, 29, 15],\n", " 'humidity' : [76, 95, 72, 81, 70],\n", "})\n", "Pak_Weather" ] }, { "cell_type": "code", "execution_count": null, "id": "18946847", "metadata": {}, "outputs": [], "source": [ "s = pd.Series([\"Humid\", 'Dry', 'Rainy', 'Humid', 'Rainy'], name=\"event\")\n", "s" ] }, { "cell_type": "code", "execution_count": null, "id": "cdddbf58", "metadata": {}, "outputs": [], "source": [ "df = pd.concat([Pak_Weather, s], axis=1)\n", "df" ] }, { "cell_type": "code", "execution_count": null, "id": "a864c0a5", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "f99f919f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "ab483930", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "9af9231d", "metadata": {}, "source": [ "\n", "\n", "## 6. Appending DataFrames\n", "- The `df1.append(df2)` method is used to concat the second dataframe’s records at the end of first dataframe (along axis=0). Columns not present in the first DataFrame are added as new columns\n", "- The `df1.append(df2)` method considers the calling dataframe as main object and adds rows to that dataframe from the dataframes that are passed to the function as argument.\n", "- It returns a new dataframe object consisting of the rows of caller and the rows of `other`. The dataframe that called the `append()` method, remain unchanged.\n", "```\n", "df.append(other, ignore_index=False, verify_integrity=False, sort=False)\n", "```\n", "\n", " - `other`: DataFrame or Series/dict-like object, or list of these (The data to append.)\n", " - `ignore_index`: If True, the resulting axis will be labeled 0, 1, …, n - 1 (default is False)\n", " - `verify_integrity`: If True, raise ValueError on creating index with duplicates (default is False)\n", " - `sort`: Sort columns if the columns of `self` and `other` are not aligned (default is False)" ] }, { "cell_type": "markdown", "id": "be5cc5d3", "metadata": {}, "source": [ "### a. Append Two DataFrames" ] }, { "cell_type": "code", "execution_count": null, "id": "1023fe8f", "metadata": {}, "outputs": [], "source": [ "Pak_Weather = pd.DataFrame({\n", " 'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],\n", " 'temperature' : [35, 39, 33, 29, 15],\n", " 'humidity' : [76, 95, 72, 81, 70],\n", "})\n", "Pak_Weather" ] }, { "cell_type": "code", "execution_count": null, "id": "40ae4f44", "metadata": {}, "outputs": [], "source": [ "UAE_Weather = pd.DataFrame({\n", " 'city': [ 'Dubai', 'Sharja', 'Ajman', 'Abu Dhabi'],\n", " 'temperature' : [41, 44, 47, 45],\n", " 'humidity' : [88, 99, 79, 86],\n", "})\n", "UAE_Weather" ] }, { "cell_type": "code", "execution_count": null, "id": "2ba1f307", "metadata": {}, "outputs": [], "source": [ "# append Dataframe\n", "df2 = Pak_Weather.append(UAE_Weather)\n", "df2" ] }, { "cell_type": "markdown", "id": "bdbd5966", "metadata": {}, "source": [ "- Notice the index is also concatenated as such\n", "- To handle this pass `ignore_index` parameter a value of `True`, so that the resulting axis is be labeled 0, …, n - 1. \n", "- Useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information." ] }, { "cell_type": "code", "execution_count": null, "id": "e17da324", "metadata": {}, "outputs": [], "source": [ "# set the ignore_index to true\n", "df2 = Pak_Weather.append(UAE_Weather, ignore_index=True)\n", "df2" ] }, { "cell_type": "code", "execution_count": null, "id": "c30e92d2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "9fc7cb85", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a1126bfc", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "ac0b6a94", "metadata": {}, "source": [ "### b. Append a Row in DataFrame" ] }, { "cell_type": "code", "execution_count": 43, "id": "11741128", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
0Lahore3576
1Karachi3995
2Peshawer3372
3Islamabad2981
4Muree1570
\n", "
" ], "text/plain": [ " city temperature humidity\n", "0 Lahore 35 76\n", "1 Karachi 39 95\n", "2 Peshawer 33 72\n", "3 Islamabad 29 81\n", "4 Muree 15 70" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Pak_Weather = pd.DataFrame({\n", " 'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],\n", " 'temperature' : [35, 39, 33, 29, 15],\n", " 'humidity' : [76, 95, 72, 81, 70],\n", "})\n", "Pak_Weather" ] }, { "cell_type": "code", "execution_count": 44, "id": "b67aafdc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citytemperaturehumidity
5Multan4575
\n", "
" ], "text/plain": [ " city temperature humidity\n", "5 Multan 45 75" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Creating a row to be appended\n", "d1 = pd.DataFrame({\"city\": \"Multan\", \"temperature\": 45, \"humidity\": 75}, index=[5])\n", "d1" ] }, { "cell_type": "code", "execution_count": null, "id": "7731c2ca", "metadata": {}, "outputs": [], "source": [ "# Append this dataframe having single row to Pak_Weather dataframe\n", "df3 = Pak_Weather.append(d1)\n", "df3" ] }, { "cell_type": "code", "execution_count": null, "id": "89e23b8e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6f8a148f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "a463c667", "metadata": {}, "source": [ "**Columns of passed/other dataframe that are not in the caller are added as new columns**" ] }, { "cell_type": "code", "execution_count": null, "id": "61338b44", "metadata": {}, "outputs": [], "source": [ "d1 = pd.DataFrame({\"city\": \"Sialkot\", \"temperature\": 45, \"humidity\": 75, \"newcol\": 66}, index=[5])\n", "d1" ] }, { "cell_type": "code", "execution_count": null, "id": "77f1795a", "metadata": {}, "outputs": [], "source": [ "df3 = Pak_Weather.append(d1)\n", "df3" ] }, { "cell_type": "code", "execution_count": null, "id": "1bf283c7", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "39898602", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "7580b7e4", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }