pandas: merged (inner join) data frame has more rows than the original ones

Question

Because you have duplicates of the merge column in both data sets, you’ll get k * m rows with that merge column value, where k is the number of rows with that value in data set 1 and m is the number of rows with that value in data set 2.

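Try drop_duplicates on each data frame before merging. By default it keeps the first row for each my_icon_number and discards the rest, so only use it if the extra rows really are redundant.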

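# keep one row per my_icon_number on each side
dfa = df_A.drop_duplicates(subset=['my_icon_number'])
dfb = df_B.drop_duplicates(subset=['my_icon_number'])

# each key now matches at most once, so no extra rows are created
new_df = pd.merge(dfa, dfb, how='inner', on='my_icon_number')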
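
If you would rather have pandas tell you when the keys are not unique, pd.merge accepts a validate argument. The sketch below uses small made-up frames (not the asker's data) to show that validate="one_to_one" raises a MergeError while duplicates are present and passes once they are dropped.

```python
import pandas as pd

# Hypothetical frames with a repeated key, for illustration only.
df_A = pd.DataFrame({"my_icon_number": [1, 2, 4, 4], "other_column1": range(4)})
df_B = pd.DataFrame({"my_icon_number": [4, 4, 5], "other_column2": range(3)})

# validate="one_to_one" makes merge raise if either side repeats a key.
try:
    pd.merge(df_A, df_B, how="inner", on="my_icon_number", validate="one_to_one")
    duplicates_detected = False
except pd.errors.MergeError:
    duplicates_detected = True

# After dropping duplicates each key appears once per side, so the check passes.
dfa = df_A.drop_duplicates(subset=["my_icon_number"])
dfb = df_B.drop_duplicates(subset=["my_icon_number"])
deduped = pd.merge(dfa, dfb, how="inner", on="my_icon_number", validate="one_to_one")

print(duplicates_detected, len(deduped))
```

Whether dropping rows is acceptable depends on what the repeated rows mean in your data. If they carry different information, aggregating them first may be a better fit than drop_duplicates.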

Example

In this example, the only value in common is 4, but it appears 3 times in each data set. That means the merge should produce 3 * 3 = 9 rows, one for every combination.

import pandas as pd

df_A = pd.DataFrame(dict(my_icon_number=[1, 2, 3, 4, 4, 4], other_column1=range(6)))
df_B = pd.DataFrame(dict(my_icon_number=[4, 4, 4, 5, 6, 7], other_column2=range(6)))

pd.merge(df_A, df_B, how='inner', on='my_icon_number')

   my_icon_number  other_column1  other_column2
0               4              3              0
1               4              3              1
2               4              3              2
3               4              4              0
4               4              4              1
5               4              4              2
6               4              5              0
7               4              5              1
8               4              5              2
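
You can predict that row count before merging. As a rough sketch, count how often each key occurs on each side with value_counts, multiply the counts per key, and sum over the keys present in both frames. The variable names here are my own.

```python
import pandas as pd

df_A = pd.DataFrame(dict(my_icon_number=[1, 2, 3, 4, 4, 4], other_column1=range(6)))
df_B = pd.DataFrame(dict(my_icon_number=[4, 4, 4, 5, 6, 7], other_column2=range(6)))

# Occurrences of each key on each side (k and m in the explanation above).
counts_a = df_A["my_icon_number"].value_counts()
counts_b = df_B["my_icon_number"].value_counts()

# Multiplication aligns on the key; keys missing from one side become NaN
# and are dropped, which mirrors what an inner join does.
expected_rows = int((counts_a * counts_b).dropna().sum())

merged = pd.merge(df_A, df_B, how="inner", on="my_icon_number")
print(expected_rows, len(merged))
```

If the predicted count is larger than either input, you know the join is many-to-many on at least one key.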
