Python: Check if a DataFrame column contains strings

It has been four years since this question was asked, and I believe there is still no definitive answer.

I don’t think strings have ever been treated as first-class citizens in Pandas (even in versions >= 1.0.0). As an example:

import pandas as pd
import datetime

df = pd.DataFrame({
    'str': ['a', 'b', 'c', None],
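    # datetime.utcnow() is deprecated since Python 3.12; any datetime object works here
    'hete': [1, 2.0, datetime.datetime.now(datetime.timezone.utc), None]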
})

string_series = df['str']
print(string_series.dtype)
print(pd.api.types.is_string_dtype(string_series.dtype))

heterogenous_series = df['hete']
print(heterogenous_series.dtype)
print(pd.api.types.is_string_dtype(heterogenous_series.dtype))

prints

object
True
object
True

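So although hete does not contain a single string, is_string_dtype reports it as a string series: given a dtype, the function only sees object, and any object dtype passes the check.

After reading the documentation, I think the only way to make sure a series contains only strings is to check the values themselves: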

def is_string_series(s: pd.Series) -> bool:
    """Return True if the series holds only strings (None is allowed as a missing value)."""
    if isinstance(s.dtype, pd.StringDtype):
        # The series was explicitly created as a string series (Pandas >= 1.0.0)
        return True
    elif s.dtype == 'object':
        # Object series: check each value, treating None as a missing value
        return all((v is None) or isinstance(v, str) for v in s)
    else:
        return False


print(is_string_series(string_series))
print(is_string_series(heterogenous_series))

prints

True
False
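
As a possible alternative to the hand-written loop, pandas also ships pd.api.types.infer_dtype, which inspects the values rather than the dtype. The sketch below is only an assumption about how it could be used here, not part of the original check. The helper name looks_like_string_series is made up for illustration, the timestamp is fixed so the example is reproducible, and dtype=object is passed explicitly so both columns stay plain object columns as in the output above.

```python
import datetime

import pandas as pd


def looks_like_string_series(s: pd.Series) -> bool:
    # Hypothetical helper: infer_dtype looks at the values, not at s.dtype.
    # skipna=True ignores missing values (None/NaN) during inference.
    return pd.api.types.infer_dtype(s, skipna=True) == "string"


# dtype=object is an assumption made here to keep both columns as object columns.
df = pd.DataFrame({
    'str': ['a', 'b', 'c', None],
    'hete': [1, 2.0, datetime.datetime(2023, 4, 1), None],
}, dtype=object)

print(looks_like_string_series(df['str']))
print(looks_like_string_series(df['hete']))
```

This should report True for str and False for hete, matching is_string_series on this data. Note that infer_dtype returns other labels (for example for empty or all-missing series), so the two checks are not guaranteed to agree in every edge case.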

April 2023 Update

It seems like the recently released Pandas 2 behaves the same way (the test script above produces the same output with Python 3.11).
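
Since the question is about DataFrame columns, the per-series check can be applied column by column. This is a minimal sketch that restates is_string_series so the block runs on its own. The helper name string_columns, the extra num column and the fixed timestamp are assumptions made for illustration.

```python
import datetime

import pandas as pd


def is_string_series(s: pd.Series) -> bool:
    # Same logic as the check above, repeated to keep this block self-contained.
    if isinstance(s.dtype, pd.StringDtype):
        return True
    if s.dtype == 'object':
        return all((v is None) or isinstance(v, str) for v in s)
    return False


def string_columns(frame: pd.DataFrame) -> list:
    # Hypothetical helper: names of the columns that hold only strings (or None).
    # Assumes column names are unique, so frame[name] is always a Series.
    return [name for name in frame.columns if is_string_series(frame[name])]


df = pd.DataFrame({
    'str': ['a', 'b', 'c', None],
    'hete': [1, 2.0, datetime.datetime(2023, 4, 1), None],
    'num': [1, 2, 3, 4],
})

print(string_columns(df))
```

Only the str column is expected to be reported here, since hete mixes numbers with a datetime and num has an integer dtype.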
