Merge on single level of MultiIndex

Yes, since pandas 0.14.0, it is now possible to merge a singly-indexed DataFrame with a level of a multi-indexed DataFrame using .join.

df1.join(df2, how='inner') # how='outer' keeps all records from both data frames

The 0.14 pandas docs describes this as equivalent but more memory efficient and faster than:

merge(df1.reset_index(),
      df2.reset_index(),
      on=['index1'],
      how='inner'
     ).set_index(['index1','index2'])

The docs also mention that .join can not be used to merge two multiindexed DataFrames on a single level and from the GitHub tracker discussion for the previous issue, it seems like this might not of priority to implement:

so I merged in the single join, see #6363; along with some docs on
how to do a multi-multi join. That’s fairly complicated to actually
implement. and IMHO not worth the effort as it really doesn’t change
the memory usage/speed that much at all.

However, there is a GitHub conversation regarding this, where there has been some recent development https://github.com/pydata/pandas/issues/6360. It is also possible achieve this by resetting the indices as mentioned earlier and described in the docs as well.


Update for pandas >= 0.24.0

It is now possible to merge multiindexed data frames with each other. As per the release notes:

index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
                                        ('K1', 'X2')],
                                        names=['key', 'X'])

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']}, index=index_left)

index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
                                        ('K2', 'Y2'), ('K2', 'Y3')],
                                        names=['key', 'Y'])

right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']}, index=index_right)

left.join(right)

Out:

            A   B   C   D
key X  Y                 
K0  X0 Y0  A0  B0  C0  D0
    X1 Y0  A1  B1  C0  D0
K1  X2 Y1  A2  B2  C1  D1

[3 rows x 4 columns]

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)