Reading a CSV with a timestamp column in pandas

Question

Use to_datetime and pass unit="s" to parse the values as Unix timestamps (seconds since the epoch). This is much faster than a custom date_parser:

In [7]:
pd.to_datetime(df.index, unit="s")

Out[7]:
DatetimeIndex(['2015-12-02 11:02:16.830000', '2015-12-02 11:02:17.430000',
               '2015-12-02 11:02:18.040000', '2015-12-02 11:02:18.650000',
               '2015-12-02 11:02:19.250000'],
              dtype="datetime64[ns]", name=0, freq=None)
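As a rough cross-check of what unit="s" means, the first sample value can be converted with the standard library alone. This is only an illustrative sketch: the names raw and stamp are made up for this example, and it assumes the value is a count of seconds since 1970-01-01 00:00:00 UTC.

```python
from datetime import datetime, timezone

# First sample value from the CSV data used in this answer.
raw = "1449054136.83"

# Interpret the value as seconds since the Unix epoch, in UTC.
stamp = datetime.fromtimestamp(float(raw), tz=timezone.utc)

# Print to whole seconds only, to sidestep float rounding in the fraction.
print(stamp.strftime("%Y-%m-%d %H:%M:%S"))  # 2015-12-02 11:02:16
```

Note that time.ctime, used in the date_parser timing, formats in the machine's local time zone, so its strings can differ from the result above by the local UTC offset.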

Timings:

In [9]:
%%timeit
import time
def date_parser(string_list):
    return [time.ctime(float(x)) for x in string_list]

df = pd.read_csv(io.StringIO(t), parse_dates=[0],  sep=';', 
                 date_parser=date_parser, 
                 index_col="DateTime", 
                 names=['DateTime', 'X'], header=None)
100 loops, best of 3: 4.07 ms per loop

and

In [12]:
%%timeit
t="""1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""
df = pd.read_csv(io.StringIO(t), header=None, sep=';', index_col=[0])
df.index = pd.to_datetime(df.index, unit="s")
100 loops, best of 3: 1.69 ms per loop

So using to_datetime is over 2x faster on this small dataset (1.69 ms vs. 4.07 ms per loop), and I expect it to scale much better than the date_parser approach, which calls a Python function for every row.
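For reference, here is a self-contained version of the faster approach that can be run outside IPython. It is a sketch under a few assumptions: the column names DateTime and X are borrowed from the date_parser snippet, csv_text is an illustrative variable name, and the timestamps are treated as UTC seconds with no time zone attached.

```python
import io

import pandas as pd

# Sample data from this answer: Unix timestamp; value
csv_text = """1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""

# Read the timestamp column as plain floats first (no date parsing yet).
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    sep=";",
    names=["DateTime", "X"],
    index_col="DateTime",
)

# Convert the whole index in one vectorized call.
df.index = pd.to_datetime(df.index, unit="s")

print(df)
```

Because the conversion happens once on the whole index rather than per row, the same two steps apply unchanged to a much larger file.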
