pd.NA vs np.nan for pandas

As of now (the release of pandas 1.0.0), I would really recommend using it carefully.

First, it’s still an experimental feature:

Experimental: the behaviour of pd.NA can still change without warning.

Second, the behaviour differs from np.nan:

Compared to np.nan, pd.NA behaves differently in certain operations. In addition to arithmetic operations, pd.NA also propagates as “missing” or “unknown” in comparison operations.

Both quotes are from the release notes.
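To make the second quote concrete, here is a small sketch comparing how the two missing-value markers propagate through arithmetic, comparisons and logical operators. It only uses the scalars `pd.NA` and `np.nan`. The `&` and `|` results follow the three-valued (Kleene) logic that pandas documents for `pd.NA`.

```python
import numpy as np
import pandas as pd

# Arithmetic: both markers propagate.
print(np.nan + 1)  # nan
print(pd.NA + 1)   # <NA>

# Comparison: np.nan always compares unequal (False), pd.NA stays "unknown".
print(np.nan == 1)       # False
print(np.nan == np.nan)  # False
print(pd.NA == 1)        # <NA>
print(pd.NA == pd.NA)    # <NA>

# Logical operators with pd.NA use three-valued (Kleene) logic:
# the result is only known when the other operand alone decides it.
print(pd.NA | True)   # True
print(pd.NA & False)  # False
print(pd.NA & True)   # <NA>
```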

As an additional example, I was surprised by the interpolation behaviour:

Create a simple DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0, pd.NA, 2], "b": [0, np.nan, 2]})
df
#       a    b
# 0     0  0.0
# 1  <NA>  NaN
# 2     2  2.0

and try to interpolate:

df.interpolate()
#       a    b
# 0     0  0.0
# 1  <NA>  1.0
# 2     2  2.0

There are reasons for this (I am still discovering them). Anyway, I just want to highlight these differences: pd.NA is an experimental feature, and it behaves differently from np.nan in some cases.
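One likely reason (my reading, not something the release notes spell out) is the dtype: a plain Python list mixing integers and `pd.NA` is stored as an `object` column, while the `np.nan` column becomes `float64`, and interpolation does not fill `object` columns the way it fills float ones. The sketch below checks the dtypes and shows one possible workaround: mapping `pd.NA` back to `np.nan` and building a `float64` Series before interpolating. The name `a_float` is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0, pd.NA, 2], "b": [0, np.nan, 2]})

# Column "a" (ints mixed with pd.NA) is inferred as object dtype,
# column "b" (ints mixed with np.nan) as float64.
print(df.dtypes)

# One possible workaround: replace pd.NA with np.nan element by element,
# build a float64 Series and interpolate that instead.
a_float = pd.Series(
    [np.nan if value is pd.NA else value for value in df["a"]],
    index=df.index,
    dtype="float64",
)
print(a_float.interpolate().tolist())  # [0.0, 1.0, 2.0]
print(df["b"].interpolate().tolist())  # [0.0, 1.0, 2.0]
```

Note that this gives up `pd.NA` for that column, which is exactly the trade-off discussed here.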

I think it will be a very useful feature, but I would be really careful with statements like “It should be completely fine to use it instead of np.nan”. That might be true in most cases, but it can cause trouble when you are not aware of the differences.
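One example of such trouble: code that uses a comparison result in an `if` (or a conditional expression) runs silently with `np.nan`, where the comparison is simply `False`, but raises with `pd.NA`, because the truth value of `pd.NA` is ambiguous. The functions `describe` and `naive_describe` below are hypothetical helpers for illustration. `pd.isna` is a real pandas check that recognises both markers.

```python
import numpy as np
import pandas as pd


def describe(value):
    """Hypothetical helper: classify a value as missing, positive or non-positive."""
    # pd.isna recognises both np.nan and pd.NA, so check it first.
    if pd.isna(value):
        return "missing"
    return "positive" if value > 0 else "non-positive"


def naive_describe(value):
    """Same logic without the missing-value check."""
    return "positive" if value > 0 else "non-positive"


print(naive_describe(np.nan))  # non-positive (silently wrong, but no error)

try:
    naive_describe(pd.NA)
except TypeError as exc:
    # pd.NA > 0 is pd.NA, and converting pd.NA to bool raises TypeError.
    print("TypeError:", exc)

print(describe(np.nan), describe(pd.NA))  # missing missing
```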
