Note: As @ritchie46’s answer points out, this solution may be redundant as of pandas 0.25, which added the cache_dates argument to read_csv (default True).
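For comparison, here is a minimal sketch of the built-in caching the note refers to, assuming pandas 0.25 or later. The CSV data is made up for illustration, and it uses ISO dates so that parse_dates can infer the format without a format string. pandas decides internally whether caching is worthwhile, so on tiny inputs like this it may skip the cache. The parsed result is the same either way.

```python
import io

import pandas as pd

# Made-up CSV with heavily repeated ISO dates.
csv_text = "date,value\n" + "\n".join(
    f"{d},{i}" for i, d in enumerate(["2020-01-01", "2020-01-02"] * 5)
)

# read_csv: cache_dates=True (the default since 0.25) lets pandas parse
# repeated date strings once and reuse the result.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], cache_dates=True)

# to_datetime: cache=True applies the same unique-value caching
# to a Series you already have in memory.
raw = pd.Series(["20200101", "20200102"] * 5)
parsed = pd.to_datetime(raw, format="%Y%m%d", cache=True)

print(df.dtypes)
print(parsed.head(2))
```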
Try using this function for parsing dates:
import pandas as pd

def lookup(date_pd_series, format=None):
    """
    This is an extremely fast approach to datetime parsing.
    For large data, the same dates are often repeated. Rather than
    re-parse these, we store all unique dates, parse them, and
    use a lookup to convert all dates.
    """
    # Parse each distinct value only once, keyed by the original value.
    dates = {date: pd.to_datetime(date, format=format) for date in date_pd_series.unique()}
    # Map every row to its pre-parsed Timestamp.
    return date_pd_series.map(dates)
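A possible variant, not the author's code: instead of calling pd.to_datetime once per unique value inside a Python dict comprehension, parse the whole array of unique values in a single vectorized call and map through a Series. The name lookup_vectorized is hypothetical. Whether it is actually faster than lookup on your data is worth measuring rather than assuming.

```python
import pandas as pd


def lookup_vectorized(date_pd_series, format=None):
    """Hypothetical variant of lookup(): parse all unique values in one call."""
    uniques = date_pd_series.unique()
    # One vectorized to_datetime call over the distinct values only.
    parsed = pd.to_datetime(uniques, format=format)
    # Index the parsed values by their original strings...
    mapping = pd.Series(parsed, index=uniques)
    # ...and broadcast them back to every row via an index lookup.
    return date_pd_series.map(mapping)


s = pd.Series(["20200101", "20200102", "20200101"])
print(lookup_vectorized(s, format="%Y%m%d"))
```

The design is the same unique-then-map idea; only the parsing step of the unique values changes.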
Use it like:
df['date-column'] = lookup(df['date-column'], format="%Y%m%d")
Benchmarks:
$ python date-parse.py
to_datetime: 5799 ms
dateutil: 5162 ms
strptime: 1651 ms
manual: 242 ms
lookup: 32 ms
Source: https://github.com/sanand0/benchmarks/tree/master/date-parse
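To run a similar comparison on your own machine, a minimal timeit sketch follows. It is not the linked benchmark script. The test data (100,000 rows drawn from 365 distinct YYYYMMDD strings) is an assumption chosen to mimic "the same dates are often repeated". The script prints timings rather than asserting a ranking, because results depend on the pandas version, the data, and the hardware. It also times pd.to_datetime with cache=True, which is the built-in alternative mentioned in the note.

```python
import timeit

import numpy as np
import pandas as pd


def lookup(date_pd_series, format=None):
    # Same approach as the answer: parse each unique value once, then map.
    dates = {date: pd.to_datetime(date, format=format) for date in date_pd_series.unique()}
    return date_pd_series.map(dates)


# Assumed test data: 100,000 rows drawn from 365 distinct YYYYMMDD strings.
days = pd.date_range("2020-01-01", periods=365).strftime("%Y%m%d")
rng = np.random.default_rng(0)
s = pd.Series(rng.choice(days, size=100_000))

# Sanity check: both approaches must produce the same timestamps.
assert lookup(s, format="%Y%m%d").tolist() == pd.to_datetime(s, format="%Y%m%d").tolist()

candidates = [
    ("to_datetime (cache=False)", lambda: pd.to_datetime(s, format="%Y%m%d", cache=False)),
    ("to_datetime (cache=True)", lambda: pd.to_datetime(s, format="%Y%m%d", cache=True)),
    ("lookup", lambda: lookup(s, format="%Y%m%d")),
]
for name, fn in candidates:
    # Best of 3 single runs, reported in milliseconds.
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{name}: {best * 1000:.0f} ms")
```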