Preserve Dataframe column data type after outer merge

This should really only be an issue with bool or int dtypes. float, object and datetime64[ns] can already hold NaN or NaT without changing the type.

Because of this, I’d recommend using the new nullable dtypes. You can use Int64 for your integer and 'boolean' for your Boolean columns. Both of these now support missing values with <NA>: pandas._libs.missing.NAType

import pandas as pd

df = pd.DataFrame({'a': [1]*6, 'b': [1, 2]*3, 'c': range(6)})
df2 = pd.DataFrame({'d': [1, 2], 'e': [True, False]})

df2['d'] = df2['d'].astype('Int64')
df2['e'] = df2['e'].astype('boolean')
df2.dtypes
#d      Int64
#e    boolean
#dtype: object

df.join(df2)
#   a  b  c     d      e
#0  1  1  0     1   True
#1  1  2  1     2  False
#2  1  1  2  <NA>   <NA>
#3  1  2  3  <NA>   <NA>
#4  1  1  4  <NA>   <NA>
#5  1  2  5  <NA>   <NA>

df.join(df2).dtypes
#a      int64
#b      int64
#c      int64
#d      Int64    <- dtype preserved
#e    boolean    <- dtype preserved

With Int64/Bool64 the fill value remains true to what you specify and the column is only upcast if you fill with a value incapable of fitting in the current dtype.

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)