Four years after this question was asked, I believe there is still no definitive answer.
I don't think strings have ever been treated as first-class citizens in Pandas (even in versions >= 1.0.0). As an example:
import pandas as pd
import datetime
df = pd.DataFrame({
    'str': ['a', 'b', 'c', None],
    'hete': [1, 2.0, datetime.datetime.utcnow(), None]
})
string_series = df['str']
print(string_series.dtype)
print(pd.api.types.is_string_dtype(string_series.dtype))
heterogenous_series = df['hete']
print(heterogenous_series.dtype)
print(pd.api.types.is_string_dtype(heterogenous_series.dtype))
prints
object
True
object
True
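So although hete does not contain any strings, it is reported as a string series. The check is given only the dtype, and object is also the dtype that holds ordinary Python strings, so the two columns are indistinguishable at that level.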
After reading the documentation, I think the only way to make sure a series contains only strings is:
def is_string_series(s: pd.Series) -> bool:
    if isinstance(s.dtype, pd.StringDtype):
        # The series was explicitly created as a string series (Pandas>=1.0.0)
        return True
    elif s.dtype == 'object':
        # Object series, check each value
        return all((v is None) or isinstance(v, str) for v in s)
    else:
        return False
print(is_string_series(string_series))
print(is_string_series(heterogenous_series))
prints
True
False
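One caveat with the function above: it only accepts None as a missing value, so an object series that holds NaN or pd.NA (common after reading a CSV) is rejected. Below is a minimal sketch of a more lenient variant, assuming you want every kind of missing value ignored. The name is_string_series_lenient is my own, and it relies on pd.api.types.infer_dtype, which inspects the values rather than the dtype:

```python
import datetime

import numpy as np
import pandas as pd


def is_string_series_lenient(s: pd.Series) -> bool:
    """Return True if every non-missing value in `s` is a str (hypothetical helper)."""
    if isinstance(s.dtype, pd.StringDtype):
        # Dedicated string dtype, nothing else can be stored in it
        return True
    if s.dtype != "object":
        # Numeric, datetime, categorical, etc.
        return False
    # infer_dtype looks at the actual values; skipna=True ignores None, NaN and pd.NA.
    # "empty" is what it reports when nothing but missing values is left.
    return pd.api.types.infer_dtype(s, skipna=True) in ("string", "empty")


with_nan = pd.Series(["a", None, np.nan], dtype=object)
mixed = pd.Series([1, 2.0, datetime.datetime(2020, 1, 1), None], dtype=object)
print(is_string_series_lenient(with_nan))
print(is_string_series_lenient(mixed))
```

Here the first series is accepted and the second is rejected. Whether an all-missing series should count as a string series is a judgment call; drop "empty" from the tuple if you would rather reject it.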
April 2023 Update
The recently released Pandas 2 appears to behave the same way: the test script above produces the same output under Python 3.11.