Pandas : How to Merge Dataframes using Dataframe.merge() in Python – Part 1
In this article we will discuss how to merge different Dataframes into a single Dataframe using Pandas Dataframe.merge() function. Merging is a big topic, so in this part we will focus on merging dataframes using common columns as Join Key and joining using Inner Join, Right Join, Left Join and Outer Join.
Dataframe.merge()
In Python’s Pandas Library Dataframe class provides a function to merge Dataframes i.e.
1 |
DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None) |
It accepts a hell lot of arguments. Let’s discuss some of them,
Imp Arguments :
- right : A dataframe or series to be merged with calling dataframe
- how : Merge type, values are : left, right, outer, inner. Default is ‘inner’. If both dataframes has some different columns, then based on this value, it will be decided which columns will be in the merged dataframe.
- on : Column name on which merge will be done. If not provided then merged on indexes.
- left_on : Specific column names in left dataframe, on which merge will be done.
- right_on : Specific column names in right dataframe, on which merge will be done.
- left_index : bool (default False)
- If True will choose index from left dataframe as join key.
- right_index : bool (default False)
- If True will choose index from right dataframe as join key.
- suffixes : tuple of (str, str), default (‘_x’, ‘_y’)
- Suffex to be applied on overlapping columns in left & right dataframes respectively.
Well these are a lot of arguments and things seems over engineered here. So, let’s discuss each details be small examples one by one.
First of all, let’s create two dataframes to be merged.
Dataframe 1:
This dataframe contains the details of the employees like, ID, name, city, experience & Age i.e.
1 2 3 4 5 6 7 8 9 10 11 12 |
# List of Tuples empoyees = [ (11, 'jack', 34, 'Sydney', 5) , (12, 'Riti', 31, 'Delhi' , 7) , (13, 'Aadi', 16, 'New York', 11) , (14, 'Mohit', 32,'Delhi' , 15) , (15, 'Veena', 33, 'Delhi' , 4) , (16, 'Shaunak', 35, 'Mumbai', 5 ), (17, 'Shaun', 35, 'Colombo', 11) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'h']) |
Contents of the first dataframe empDfObj created are,
1 2 3 4 5 6 7 8 |
ID Name Age City Experience a 11 jack 34 Sydney 5 b 12 Riti 31 Delhi 7 c 13 Aadi 16 New York 11 d 14 Mohit 32 Delhi 15 e 15 Veena 33 Delhi 4 f 16 Shaunak 35 Mumbai 5 h 17 Shaun 35 Colombo 11 |
Dataframe 2:
This dataframe contains the details of the employees like, ID, salary, bonus and experience i.e.
1 2 3 4 5 6 7 8 9 10 11 12 |
# List of Tuples salaries = [(11, 5, 70000, 1000) , (12, 7, 72200, 1100) , (13, 11, 84999, 1000) , (14, 15, 90000, 2000) , (15, 4, 61000, 1500) , (16, 5, 71000, 1000), (21, 10,81000, 2000) ] # Create a DataFrame object salaryDfObj = pd.DataFrame(salaries, columns=['ID', 'Experience' , 'Salary', 'Bonus'], index=['a', 'b', 'c', 'd', 'e', 'f', 'h']) |
Contents of the second dataframe created are,
1 2 3 4 5 6 7 8 |
ID Experience Salary Bonus a 11 5 70000 1000 b 12 7 72200 1100 c 13 11 84999 1000 d 14 15 90000 2000 e 15 4 61000 1500 f 16 5 71000 1000 h 21 10 81000 2000 |
Now let’s see different ways to merge these two dataframes,
Merge DataFrames on common columns (Default Inner Join)
In both the Dataframes we have 2 common column names i.e. ‘ID’ & ‘Experience’. If we directly call Dataframe.merge() on these two Dataframes, without any additional arguments, then it will merge the columns of the both the dataframes by considering common columns as Join Keys i.e. ‘ID’ & ‘Experience’ in our case. So, basically columns from both the dataframes will be merged for the rows in which values of ‘ID’ & ‘Experience’ are same i.e.
1 2 |
# Merge two Dataframes on common columns using default inner join mergedDf = empDfObj.merge(salaryDfObj) |
Merged Dataframe mergedDf contents are:
1 2 3 4 5 6 7 |
ID Name Age City Experience Salary Bonus 0 11 jack 34 Sydney 5 70000 1000 1 12 Riti 31 Delhi 7 72200 1100 2 13 Aadi 16 New York 11 84999 1000 3 14 Mohit 32 Delhi 15 90000 2000 4 15 Veena 33 Delhi 4 61000 1500 5 16 Shaunak 35 Mumbai 5 71000 1000 |
It merged the contents of the unique columns (salary & bonus) from dataframe 2 with the columns of dataframe 1 based on ‘ID’ & ‘Experience’ columns. Because if we don’t provide the column names on which we want to merge the two dataframes then it by defaults merge on columns with common names. Like, in our case it was ‘ID’ & ‘Experience’.
Also, we didn’t provided the ‘how’ argument in merge() call. Default value of ‘how’ is ‘inner’. It means dataframes are merged like INNER JOIN in Databases.
What is Inner Join ?
While Merging or Joining on columns (keys) in two Dataframes. Include only rows from Left & Right dataframes which have same values in key columns.
In above example key columns on which inner join happened were ‘ID’ & ‘Experience’ columns. So, during inner join only those rows are picked in merged dataframe for which values of ‘ID’ & ‘Experience’ columns are same in 2 dataframes. So basically by default Inner Join was done by using intersection of keys in both the dataframes.
Results will be same if we explicitly pass ‘how’ argument with value ‘inner’ i.e.
1 2 |
# Merge two Dataframes on common columns using inner join mergedDf = empDfObj.merge(salaryDfObj, how='inner') |
Merge Dataframes using Left Join
What is left join ?
While Merging or Joining on columns (keys) in two Dataframes. Include all rows from Left dataframe and add NaN for values which are
missing in right dataframe for those keys.
In above example if we will pass how argument with value ‘left’ then it will merge two dataframes using left join i.e.
1 2 |
# Merge two Dataframes on common columns using left join mergedDf = empDfObj.merge(salaryDfObj, how='left') |
Contents of the merged dataframe :
1 2 3 4 5 6 7 8 9 |
Contents of the Merged Dataframe : ID Name Age City Experience Salary Bonus 0 11 jack 34 Sydney 5 70000.0 1000.0 1 12 Riti 31 Delhi 7 72200.0 1100.0 2 13 Aadi 16 New York 11 84999.0 1000.0 3 14 Mohit 32 Delhi 15 90000.0 2000.0 4 15 Veena 33 Delhi 4 61000.0 1500.0 5 16 Shaunak 35 Mumbai 5 71000.0 1000.0 6 17 Shaun 35 Colombo 11 NaN NaN |
We can see that it picked all rows from left dataframe and there is no row with ‘ID’ 17 and ‘Experience’ 11 in right dataframe. Therefore for that row values of unique Columns from right dataframe (Salary and Bonus) are NaN in merged dataframe.
Merge DataFrames using Right Join
What is Right join ?
While Merging or Joining on columns (keys) in two Dataframes. Include all rows from Right dataframe and add NaN for values which are
missing in Left dataframe for those keys.
In above example if we will pass how argument with value ‘right’ then it will merge two dataframes using Right Join i.e.
1 2 |
# Merge two Dataframes on common columns using right join mergedDf = empDfObj.merge(salaryDfObj, how='right') |
Contents of the merged dataframe :
1 2 3 4 5 6 7 8 |
ID Name Age City Experience Salary Bonus 0 11 jack 34.0 Sydney 5 70000 1000 1 12 Riti 31.0 Delhi 7 72200 1100 2 13 Aadi 16.0 New York 11 84999 1000 3 14 Mohit 32.0 Delhi 15 90000 2000 4 15 Veena 33.0 Delhi 4 61000 1500 5 16 Shaunak 35.0 Mumbai 5 71000 1000 6 21 NaN NaN NaN 10 81000 2000 |
We can see that it picked all rows from right dataframe and there is no row with ID 21 and Experience 10 in left dataframe. Therefore for that row values of unique Columns from left dataframe (i.e. Name, Age, City) are NaN in merged dataframe.
Merge DataFrames using Outer Join
What is Outer join ?
While Merging or Joining on columns (keys) in two Dataframes. Include all rows from Right and Left dataframes and add NaN for values which are missing in either Left or Right dataframe for any key.
In above example if we will pass how argument with value ‘outer’ then it will merge two dataframes using Outer Join i.e.
1 2 |
# Merge two Dataframes on common columns using outer join mergedDf = empDfObj.merge(salaryDfObj, how='outer') |
Contents of the merged dataframe :
1 2 3 4 5 6 7 8 9 |
ID Name Age City Experience Salary Bonus 0 11 jack 34.0 Sydney 5 70000.0 1000.0 1 12 Riti 31.0 Delhi 7 72200.0 1100.0 2 13 Aadi 16.0 New York 11 84999.0 1000.0 3 14 Mohit 32.0 Delhi 15 90000.0 2000.0 4 15 Veena 33.0 Delhi 4 61000.0 1500.0 5 16 Shaunak 35.0 Mumbai 5 71000.0 1000.0 6 17 Shaun 35.0 Colombo 11 NaN NaN 7 21 NaN NaN NaN 10 81000.0 2000.0 |
We can see that it picked all rows from right & left dataframes and there is no row with,
- ID 21 and Experience 10 in left dataframe
- ID 17 and Experience 11 in right dataframe
Therefore for that row NaN is added for missing values in merged dataframe.
Complete example is as follows,
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 |
import pandas as pd def main(): print('*** Creating Dataframe 1 ***') # List of Tuples empoyees = [ (11, 'jack', 34, 'Sydney', 5) , (12, 'Riti', 31, 'Delhi' , 7) , (13, 'Aadi', 16, 'New York', 11) , (14, 'Mohit', 32,'Delhi' , 15) , (15, 'Veena', 33, 'Delhi' , 4) , (16, 'Shaunak', 35, 'Mumbai', 5 ), (17, 'Shaun', 35, 'Colombo', 11) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'h']) print("Dataframe 1 : ") print(empDfObj) print('*** Creating Dataframe 2 ***') # List of Tuples salaries = [(11, 5, 70000, 1000) , (12, 7, 72200, 1100) , (13, 11, 84999, 1000) , (14, 15, 90000, 2000) , (15, 4, 61000, 1500) , (16, 5, 71000, 1000), (21, 10,81000, 2000) ] # Create a DataFrame object salaryDfObj = pd.DataFrame(salaries, columns=['ID', 'Experience' , 'Salary', 'Bonus'], index=['a', 'b', 'c', 'd', 'e', 'f', 'h']) print("Dataframe 2 : ") print(salaryDfObj) print('**** Merge two Dataframes on Common Columns using Default Inner Join ****') # Merge two Dataframes on common columns using default inner join mergedDf = empDfObj.merge(salaryDfObj) print('Contents of the Merged Dataframe :') print(mergedDf) print('**** Merge two Dataframes using Left Join ****') # Merge two Dataframes on common columns using left join mergedDf = empDfObj.merge(salaryDfObj, how='left') print('Contents of the Merged Dataframe :') print(mergedDf) print('**** Merge two Dataframes using Right Join ****') # Merge two Dataframes on common columns using right join mergedDf = empDfObj.merge(salaryDfObj, how='right') print('Contents of the Merged Dataframe :') print(mergedDf) print('**** Merge two Dataframes using Outer Join ****') # Merge two Dataframes on common columns using outer join mergedDf = empDfObj.merge(salaryDfObj, how='outer') print('Contents of the Merged Dataframe :') print(mergedDf) if __name__ == '__main__': main() |
Output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
*** Creating Dataframe 1 *** Dataframe 1 : ID Name Age City Experience a 11 jack 34 Sydney 5 b 12 Riti 31 Delhi 7 c 13 Aadi 16 New York 11 d 14 Mohit 32 Delhi 15 e 15 Veena 33 Delhi 4 f 16 Shaunak 35 Mumbai 5 h 17 Shaun 35 Colombo 11 *** Creating Dataframe 2 *** Dataframe 2 : ID Experience Salary Bonus a 11 5 70000 1000 b 12 7 72200 1100 c 13 11 84999 1000 d 14 15 90000 2000 e 15 4 61000 1500 f 16 5 71000 1000 h 21 10 81000 2000 **** Merge two Dataframes on Common Columns using Default Inner Join **** Contents of the Merged Dataframe : ID Name Age City Experience Salary Bonus 0 11 jack 34 Sydney 5 70000 1000 1 12 Riti 31 Delhi 7 72200 1100 2 13 Aadi 16 New York 11 84999 1000 3 14 Mohit 32 Delhi 15 90000 2000 4 15 Veena 33 Delhi 4 61000 1500 5 16 Shaunak 35 Mumbai 5 71000 1000 **** Merge two Dataframes using Left Join **** Contents of the Merged Dataframe : ID Name Age City Experience Salary Bonus 0 11 jack 34 Sydney 5 70000.0 1000.0 1 12 Riti 31 Delhi 7 72200.0 1100.0 2 13 Aadi 16 New York 11 84999.0 1000.0 3 14 Mohit 32 Delhi 15 90000.0 2000.0 4 15 Veena 33 Delhi 4 61000.0 1500.0 5 16 Shaunak 35 Mumbai 5 71000.0 1000.0 6 17 Shaun 35 Colombo 11 NaN NaN **** Merge two Dataframes using Right Join **** Contents of the Merged Dataframe : ID Name Age City Experience Salary Bonus 0 11 jack 34.0 Sydney 5 70000 1000 1 12 Riti 31.0 Delhi 7 72200 1100 2 13 Aadi 16.0 New York 11 84999 1000 3 14 Mohit 32.0 Delhi 15 90000 2000 4 15 Veena 33.0 Delhi 4 61000 1500 5 16 Shaunak 35.0 Mumbai 5 71000 1000 6 21 NaN NaN NaN 10 81000 2000 **** Merge two Dataframes using Outer Join **** Contents of the Merged Dataframe : ID Name Age City Experience Salary Bonus 0 11 jack 34.0 Sydney 5 70000.0 1000.0 1 12 Riti 31.0 Delhi 7 72200.0 1100.0 2 13 Aadi 16.0 New York 11 84999.0 1000.0 3 14 Mohit 32.0 Delhi 15 90000.0 2000.0 4 15 Veena 33.0 Delhi 4 61000.0 1500.0 5 16 Shaunak 35.0 Mumbai 5 71000.0 1000.0 6 17 Shaun 35.0 Colombo 11 NaN NaN 7 21 NaN NaN NaN 10 81000.0 2000.0 |
Follow @thisPtr
Click Here to Subscribe for more Articles / Tutorials like this.
::...免责声明:
当前网页内容, 由 大妈 ZoomQuiet 使用工具: ScrapBook :: Firefox Extension 人工从互联网中收集并分享;
内容版权归原作者所有;
本人对内容的有效性/合法性不承担任何强制性责任.
若有不妥, 欢迎评注提醒:
或是邮件反馈可也:
askdama[AT]googlegroups.com
订阅 substack 体验古早写作:
点击注册~> 获得 100$ 体验券:
关注公众号, 持续获得相关各种嗯哼:
自怼圈/年度番新

关于 ~ DebugUself with DAMA ;-)
粤ICP备18025058号-1
公安备案号: 44049002000656 ...::