Making Int64 the default integer dtype instead of standard int64 in pandas

Question

You could use a function like this:

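def nan_ints(df, convert_strings=False, subset=None):
    """Convert integer-like columns of df to the nullable Int64 dtype, in place."""
    types = ["int64", "float64"]
    if subset is None:
        subset = list(df)  # default: every column
    if convert_strings:
        types.append("object")  # also try string (object) columns
    for col in subset:
        if df[col].dtype in types:
            # Go through float first so numeric strings and NaN are handled;
            # errors="ignore" leaves the column as it was if a cast fails.
            df[col] = (
                df[col].astype(float, errors="ignore").astype("Int64", errors="ignore")
            )
    return df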

It iterates through each column and converts it to Int64 if it is an int column. A float column is converted to Int64 only if every value in it other than the NaNs can be represented as an integer. I’ve given you the option to convert string columns to Int64 as well with the convert_strings argument.
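
The float rule above can be seen on a single column. The following is only a minimal sketch: the helper name to_nullable_int is hypothetical, and it uses an explicit try/except in place of errors="ignore" so that the fallback is visible.

```python
import numpy as np
import pandas as pd


def to_nullable_int(series):
    """Return the series as Int64 if the cast is lossless, else unchanged.

    Hypothetical helper for illustration; not part of pandas.
    """
    try:
        # Going through float first lets numeric strings and NaN pass.
        return series.astype(float).astype("Int64")
    except (TypeError, ValueError):
        # Fractional values or non-numeric strings: keep the original column.
        return series


whole = pd.Series([1.0, 2.0, np.nan])    # whole numbers plus a missing value
fractional = pd.Series([1.1, 2.0, 3.0])  # 1.1 cannot be stored as an integer

print(to_nullable_int(whole).dtype)
print(to_nullable_int(fractional).dtype)
```

The first column should come back as Int64 with the NaN kept as a missing value, while the second stays float64 because 1.1 has no integer representation.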

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1.1, 2, 3, 1],
                    'b': [1, 2, 3, np.nan],
                    'c': ['1', '2', '3', np.nan],
                    'd': [3, 2, 1, np.nan]})

nan_ints(df1, convert_strings=True, subset=['b', 'c'])
df1.info()

This prints the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
a    4 non-null float64
b    3 non-null Int64
c    3 non-null Int64
d    3 non-null float64
dtypes: Int64(2), float64(2)
memory usage: 216.0 bytes

If you are going to use this on every DataFrame, you could add the function to a module and import it whenever you use pandas:
from my_module import nan_ints
Then just use it with something like:
nan_ints(pd.read_csv(path))
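
If the integer columns are known in advance, another option is to request Int64 while reading the file, through the dtype argument of read_csv. This is a minimal sketch on made-up in-memory data; the column names a and b are assumptions for illustration.

```python
import io

import pandas as pd

# Made-up CSV text: column "b" has a missing value in the second row.
csv_text = "a,b\n1,2\n3,\n"

# Ask for the nullable Int64 dtype only for the column that needs it.
df = pd.read_csv(io.StringIO(csv_text), dtype={"b": "Int64"})

print(df.dtypes)
```

Column b should be read as Int64 with the empty cell kept as a missing value, rather than being promoted to float64.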

Note: The nullable integer data type is new in pandas version 0.24.0.
Here is the documentation.
