{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# How do I merge DataFrames in pandas? ([video](https://www.youtube.com/watch?v=iYWKfUOtGaw&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=32))\n", "\n", "- [My pandas video series](https://www.dataschool.io/easier-data-analysis-with-pandas/)\n", "- [GitHub repository](https://github.com/justmarkham/pandas-videos)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of contents\n", "\n", "1. Selecting a Function\n", "2. Joining (Merging) DataFrames\n", "3. What if...?\n", "4. Four Types of Joins" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 1: Selecting a Function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Taken from [Merging DataFrames with pandas](https://www.datacamp.com/courses/merging-dataframes-with-pandas?tap_a=5644-dce66f&tap_s=280411-a25fc8) (DataCamp course):\n", "\n", "- `df1.append(df2)`: stacking vertically\n", "- `pd.concat([df1, df2])`:\n", " - stacking many horizontally or vertically\n", " - simple inner/outer joins on Indexes\n", "- `df1.join(df2)`: inner/outer/left/right joins on Indexes\n", "- `pd.merge(df1, df2)`: many joins on multiple columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 2: Joining (Merging) DataFrames" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the [MovieLens 100k data](http://grouplens.org/datasets/movielens/), let's create two DataFrames:\n", "\n", "- **movies**: shows information about movies, namely a unique **movie_id** and its **title**\n", "- **ratings**: shows the **rating** that a particular **user_id** gave to a particular **movie_id** at a particular **timestamp**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Movies" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movie_idtitle
01Toy Story (1995)
12GoldenEye (1995)
23Four Rooms (1995)
34Get Shorty (1995)
45Copycat (1995)
\n", "
" ], "text/plain": [ " movie_id title\n", "0 1 Toy Story (1995)\n", "1 2 GoldenEye (1995)\n", "2 3 Four Rooms (1995)\n", "3 4 Get Shorty (1995)\n", "4 5 Copycat (1995)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movie_cols = ['movie_id', 'title']\n", "movies = pd.read_table('data/u.item', sep='|', header=None, names=movie_cols, usecols=[0, 1])\n", "movies.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1682, 2)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.shape" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1682" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.movie_id.nunique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ratings" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user_idmovie_idratingtimestamp
01962423881250949
11863023891717742
2223771878887116
3244512880606923
41663461886397596
\n", "
" ], "text/plain": [ " user_id movie_id rating timestamp\n", "0 196 242 3 881250949\n", "1 186 302 3 891717742\n", "2 22 377 1 878887116\n", "3 244 51 2 880606923\n", "4 166 346 1 886397596" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rating_cols = ['user_id', 'movie_id', 'rating', 'timestamp']\n", "ratings = pd.read_table('data/u.data', sep='\\t', header=None, names=rating_cols)\n", "ratings.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(100000, 4)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ratings.shape" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1682" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ratings.movie_id.nunique()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user_idmovie_idratingtimestamp
2430814887736532
45428715875334088
95714814877019411
97128014891700426
13246613883601324
\n", "
" ], "text/plain": [ " user_id movie_id rating timestamp\n", "24 308 1 4 887736532\n", "454 287 1 5 875334088\n", "957 148 1 4 877019411\n", "971 280 1 4 891700426\n", "1324 66 1 3 883601324" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ratings.loc[ratings.movie_id == 1, :].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Merging Movies and Ratings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's pretend that you want to examine the ratings DataFrame, but you want to know the **title** of each movie rather than its **movie_id**. The best way to accomplish this objective is by \"joining\" (or \"merging\") the DataFrames using the Pandas `merge` function:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['movie_id', 'title'], dtype='object')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.columns" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['user_id', 'movie_id', 'rating', 'timestamp'], dtype='object')" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ratings.columns" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['movie_id', 'title', 'user_id', 'rating', 'timestamp'], dtype='object')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movie_ratings = pd.merge(movies, ratings)\n", "movie_ratings.columns" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movie_idtitleuser_idratingtimestamp
01Toy Story (1995)3084887736532
11Toy Story (1995)2875875334088
21Toy Story (1995)1484877019411
31Toy Story (1995)2804891700426
41Toy Story (1995)663883601324
\n", "
" ], "text/plain": [ " movie_id title user_id rating timestamp\n", "0 1 Toy Story (1995) 308 4 887736532\n", "1 1 Toy Story (1995) 287 5 875334088\n", "2 1 Toy Story (1995) 148 4 877019411\n", "3 1 Toy Story (1995) 280 4 891700426\n", "4 1 Toy Story (1995) 66 3 883601324" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movie_ratings.head()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(100000, 5)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movie_ratings.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's what just happened:\n", "\n", "- Pandas noticed that movies and ratings had one column in common, namely **movie_id**. This is the \"key\" on which the DataFrames will be joined.\n", "- The first **movie_id** in movies is 1. Thus, Pandas looked through every row in the ratings DataFrame, searching for a movie_id of 1. Every time it found such a row, it recorded the **user_id**, **rating**, and **timestamp** listed in that row. In this case, it found 452 matching rows.\n", "- The second **movie_id** in movies is 2. Again, Pandas did a search of ratings and found 131 matching rows.\n", "- This process was repeated for all of the remaining rows in movies.\n", "\n", "At the end of the process, the movie_ratings DataFrame is created, which contains the two columns from movies (**movie_id** and **title**) and the three other colums from ratings (**user_id**, **rating**, and **timestamp**).\n", "\n", "- **movie_id** 1 and its **title** are listed 452 times, next to the **user_id**, **rating**, and **timestamp** for each of the 452 matching ratings.\n", "- **movie_id** 2 and its **title** are listed 131 times, next to the **user_id**, **rating**, and **timestamp** for each of the 131 matching ratings.\n", "- And so on, for every movie in the dataset." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1682, 2)\n", "(100000, 4)\n", "(100000, 5)\n" ] } ], "source": [ "print(movies.shape)\n", "print(ratings.shape)\n", "print(movie_ratings.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice the shapes of the three DataFrames:\n", "\n", "- There are 1682 rows in the movies DataFrame.\n", "- There are 100000 rows in the ratings DataFrame.\n", "- The `merge` function resulted in a movie_ratings DataFrame with 100000 rows, because every row from ratings matched a row from movies.\n", "- The movie_ratings DataFrame has 5 columns, namely the 2 columns from movies, plus the 4 columns from ratings, minus the 1 column in common.\n", "\n", "By default, the `merge` function joins the DataFrames using all column names that are in common (**movie_id**, in this case). The [documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.merge.html) explains how you can override this behavior." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 3: What if...?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What if the columns you want to join on don't have the same name?" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['m_id', 'title'], dtype='object')" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.columns = ['m_id', 'title']\n", "movies.columns" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['user_id', 'movie_id', 'rating', 'timestamp'], dtype='object')" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ratings.columns" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
m_idtitleuser_idmovie_idratingtimestamp
01Toy Story (1995)30814887736532
11Toy Story (1995)28715875334088
21Toy Story (1995)14814877019411
31Toy Story (1995)28014891700426
41Toy Story (1995)6613883601324
\n", "
" ], "text/plain": [ " m_id title user_id movie_id rating timestamp\n", "0 1 Toy Story (1995) 308 1 4 887736532\n", "1 1 Toy Story (1995) 287 1 5 875334088\n", "2 1 Toy Story (1995) 148 1 4 877019411\n", "3 1 Toy Story (1995) 280 1 4 891700426\n", "4 1 Toy Story (1995) 66 1 3 883601324" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(movies, ratings, left_on='m_id', right_on='movie_id').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What if you want to join on one index?" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
title
m_id
1Toy Story (1995)
2GoldenEye (1995)
3Four Rooms (1995)
4Get Shorty (1995)
5Copycat (1995)
\n", "
" ], "text/plain": [ " title\n", "m_id \n", "1 Toy Story (1995)\n", "2 GoldenEye (1995)\n", "3 Four Rooms (1995)\n", "4 Get Shorty (1995)\n", "5 Copycat (1995)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies = movies.set_index('m_id')\n", "movies.head()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
titleuser_idmovie_idratingtimestamp
24Toy Story (1995)30814887736532
454Toy Story (1995)28715875334088
957Toy Story (1995)14814877019411
971Toy Story (1995)28014891700426
1324Toy Story (1995)6613883601324
\n", "
" ], "text/plain": [ " title user_id movie_id rating timestamp\n", "24 Toy Story (1995) 308 1 4 887736532\n", "454 Toy Story (1995) 287 1 5 875334088\n", "957 Toy Story (1995) 148 1 4 877019411\n", "971 Toy Story (1995) 280 1 4 891700426\n", "1324 Toy Story (1995) 66 1 3 883601324" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(movies, ratings, left_index=True, right_on='movie_id').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What if you want to join on two indexes?" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user_idratingtimestamp
movie_id
2421963881250949
3021863891717742
377221878887116
512442880606923
3461661886397596
\n", "
" ], "text/plain": [ " user_id rating timestamp\n", "movie_id \n", "242 196 3 881250949\n", "302 186 3 891717742\n", "377 22 1 878887116\n", "51 244 2 880606923\n", "346 166 1 886397596" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ratings = ratings.set_index('movie_id')\n", "ratings.head()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
titleuser_idratingtimestamp
1Toy Story (1995)3084887736532
1Toy Story (1995)2875875334088
1Toy Story (1995)1484877019411
1Toy Story (1995)2804891700426
1Toy Story (1995)663883601324
\n", "
" ], "text/plain": [ " title user_id rating timestamp\n", "1 Toy Story (1995) 308 4 887736532\n", "1 Toy Story (1995) 287 5 875334088\n", "1 Toy Story (1995) 148 4 877019411\n", "1 Toy Story (1995) 280 4 891700426\n", "1 Toy Story (1995) 66 3 883601324" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(movies, ratings, left_index=True, right_index=True).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 4: Four Types of Joins" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are actually four types of joins supported by the Pandas `merge` function. Here's how they are described by the documentation:\n", "\n", "- **inner:** use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys\n", "- **outer:** use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically\n", "- **left:** use only keys from left frame, similar to a SQL left outer join; preserve key order\n", "- **right:** use only keys from right frame, similar to a SQL right outer join; preserve key order\n", "\n", "The default is the \"inner join\", which was used when creating the movie_ratings DataFrame.\n", "\n", "It's easiest to understand the different types by looking at some simple examples:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example DataFrames A and B" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
colornum
0green1
1yellow2
2red3
\n", "
" ], "text/plain": [ " color num\n", "0 green 1\n", "1 yellow 2\n", "2 red 3" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = pd.DataFrame({'color': ['green', 'yellow', 'red'], 'num':[1, 2, 3]})\n", "A" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
colorsize
0greenS
1yellowM
2pinkL
\n", "
" ], "text/plain": [ " color size\n", "0 green S\n", "1 yellow M\n", "2 pink L" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B = pd.DataFrame({'color': ['green', 'yellow', 'pink'], 'size':['S', 'M', 'L']})\n", "B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inner join\n", "\n", "Only include observations found in both A and B:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
colornumsize
0green1S
1yellow2M
\n", "
" ], "text/plain": [ " color num size\n", "0 green 1 S\n", "1 yellow 2 M" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(A, B, how='inner')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Outer join\n", "\n", "Include observations found in either A or B:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
colornumsize
0green1.0S
1yellow2.0M
2red3.0NaN
3pinkNaNL
\n", "
" ], "text/plain": [ " color num size\n", "0 green 1.0 S\n", "1 yellow 2.0 M\n", "2 red 3.0 NaN\n", "3 pink NaN L" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(A, B, how='outer')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Left join\n", "\n", "Include all observations found in A:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
colornumsize
0green1S
1yellow2M
2red3NaN
\n", "
" ], "text/plain": [ " color num size\n", "0 green 1 S\n", "1 yellow 2 M\n", "2 red 3 NaN" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(A, B, how='left')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Right join\n", "\n", "Include all observations found in B:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
colornumsize
0green1.0S
1yellow2.0M
2pinkNaNL
\n", "
" ], "text/plain": [ " color num size\n", "0 green 1.0 S\n", "1 yellow 2.0 M\n", "2 pink NaN L" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(A, B, how='right')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 1 }