pytables – Tarik Billa

Install HDF5 and PyTables on Ubuntu

April 12, 2024 by Tarik

I found that installing libhdf5-serial-dev (sudo apt-get install libhdf5-serial-dev) did the trick.
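After the system HDF5 package is in place, a quick way to confirm that Python sees both PyTables and the HDF5 library is to import `tables` and print its version strings. This is a minimal sketch: `report_pytables()` is a hypothetical helper name, and it only reports what is installed rather than fixing anything.

```python
import importlib.util


def report_pytables():
    """Return PyTables and HDF5 version strings, or None if PyTables is missing."""
    # find_spec checks whether the module can be found without importing it.
    if importlib.util.find_spec("tables") is None:
        return None
    import tables

    return {
        "pytables": tables.__version__,
        # Version of the underlying HDF5 C library, as reported by PyTables.
        "hdf5": tables.hdf5_version,
    }


if __name__ == "__main__":
    info = report_pytables()
    if info is None:
        print("PyTables is not installed")
    else:
        print(f"PyTables {info['pytables']} using HDF5 {info['hdf5']}")
```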

Convert large CSV to HDF5

January 6, 2024 by Tarik

Use append=True in the call to to_hdf:

import numpy as np
import pandas as pd

filename = "/tmp/test.h5"

df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=['A', 'B'])
print(df)
#    A  B
# 0  0  1
# 1  2  3
# 2  4  5
# 3  6  7
# 4  8  9

# Save to HDF5
df.to_hdf(filename, 'data', mode="w", format="table")
del df
…
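The excerpt stops before the CSV part, so here is only a sketch of the same append=True idea applied to a large file: read the CSV in chunks with `pd.read_csv(..., chunksize=...)` and append each chunk to a single table-format node. The helper name `csv_to_hdf`, the default key `"data"` and the chunk size are illustrative assumptions, not taken from the post.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def csv_to_hdf(csv_path, h5_path, key="data", chunksize=100_000):
    """Stream a CSV into one HDF5 table without loading it all; return rows written."""
    rows = 0
    for i, chunk in enumerate(pd.read_csv(csv_path, chunksize=chunksize)):
        chunk.to_hdf(
            h5_path,
            key=key,
            mode="w" if i == 0 else "a",  # start a fresh file, then keep appending
            format="table",               # appending requires the table format
            append=True,
        )
        rows += len(chunk)
    return rows


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    csv_path = os.path.join(tmp, "big.csv")
    h5_path = os.path.join(tmp, "big.h5")
    src = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=["A", "B"])
    src.to_csv(csv_path, index=False)
    csv_to_hdf(csv_path, h5_path, chunksize=2)  # three chunks: 2 + 2 + 1 rows
    print(pd.read_hdf(h5_path, "data"))
```

Every chunk must have the same column dtypes as the first one. If a text column can get longer in later chunks, passing `min_itemsize` to `to_hdf` reserves enough width up front.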

Improve pandas (PyTables?) HDF5 table write performance

September 24, 2023 by Tarik

Here is a similar comparison I just did. It's about 1/3 of the data, 10M rows. The final size is about 1.3GB.

I define 3 timing functions:

Test the Fixed format (called Storer in 0.12). This writes in a PyTables Array format:

def f(df):
    store = pd.HDFStore('test.h5', 'w')
    store['df'] = df
    store.close()

Write in the Table …
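The excerpt shows only the first of the three timing functions. Below is a minimal, hypothetical harness that times the fixed format (the same `HDFStore` pattern as `f(df)`) against a `to_hdf(..., format="table")` write using `time.perf_counter`. The helper names, row count and file names are assumptions, and absolute timings depend on the machine and the PyTables/HDF5 build.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def write_fixed(df, path):
    # Fixed ("Storer") format: writes PyTables Array nodes; fast but not appendable.
    with pd.HDFStore(path, mode="w") as store:
        store["df"] = df


def write_table(df, path):
    # Table format: a PyTables Table; slower to write but queryable and appendable.
    df.to_hdf(path, key="df", mode="w", format="table")


def time_writers(df, writers, repeat=3):
    """Return the best wall-clock time in seconds for each named writer."""
    results = {}
    tmpdir = tempfile.mkdtemp()
    for name, writer in writers.items():
        path = os.path.join(tmpdir, f"{name}.h5")
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            writer(df, path)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


if __name__ == "__main__":
    n = 1_000_000  # much smaller than the 10M rows above, to keep the run short
    df = pd.DataFrame(np.random.randn(n, 5), columns=list("abcde"))
    writers = {"fixed": write_fixed, "table": write_table}
    for name, secs in time_writers(df, writers).items():
        print(f"{name:>6}: {secs:.3f} s")
```

Taking the best of a few repeats reduces noise from filesystem caching on the first write.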

Missing optional dependency ‘tables’. In pandas to_hdf

August 22, 2023 by Tarik

For conda users: conda install pytables
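pandas raises this ImportError only when `to_hdf`/`read_hdf` actually need PyTables. A small guard such as the hypothetical `save_hdf` below (an illustration, not a pandas API) checks for the `tables` module up front and turns the failure into an install hint. Outside conda, the PyPI package is named `tables` (`pip install tables`).

```python
import importlib.util

import pandas as pd


def save_hdf(df, path, key="data"):
    """Write df to HDF5, failing early with an install hint if PyTables is missing."""
    if importlib.util.find_spec("tables") is None:
        raise ImportError(
            "pandas needs PyTables for HDF5 I/O: "
            "run 'conda install pytables' or 'pip install tables'"
        )
    df.to_hdf(path, key=key, mode="w")
```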

HDF5 taking more space than CSV?

August 16, 2023 by Tarik

Copy of my answer from the issue: https://github.com/pydata/pandas/issues/3651

Your sample is really too small. HDF5 has a fair amount of overhead at really small sizes (even 300k entries is on the smaller side). The following is with no compression on either side. Floats are much more efficiently represented in binary than as text. …
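The binary-versus-text point can be checked with the standard library alone: a float64 always takes 8 bytes in binary, while its shortest round-tripping text form (what `repr` produces) is typically more than twice as long. This sketch uses seeded random floats in [0, 1) as an assumed stand-in for real data, and the helper `encoded_sizes` is hypothetical; it does not reproduce the comparison from the original answer, nor does it model HDF5's fixed per-file overhead.

```python
import random
import struct


def encoded_sizes(values):
    """Return (binary_bytes, text_bytes) for a list of floats."""
    # '<Nd' packs N little-endian IEEE 754 float64 values: exactly 8 bytes each.
    binary = len(struct.pack(f"<{len(values)}d", *values))
    # One value per line, as a single-column CSV would store it.
    text = len("\n".join(repr(v) for v in values).encode("ascii"))
    return binary, text


if __name__ == "__main__":
    rng = random.Random(0)
    vals = [rng.random() for _ in range(300_000)]
    b, t = encoded_sizes(vals)
    print(f"binary: {b:,} bytes  text: {t:,} bytes  ratio: {t / b:.2f}")
```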

Is there an analysis speed or memory usage advantage to using HDF5 for large array storage (instead of flat binary files)?

December 27, 2022 by Tarik

HDF5 Advantages: Organization, flexibility, interoperability

Some of the main advantages of HDF5 are its hierarchical structure (similar to folders/files), optional arbitrary metadata stored with each item, and its flexibility (e.g. compression). This organizational structure and metadata storage may sound trivial, but it’s very useful in practice. Another advantage of HDF is that the datasets can …
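To make the organizational point concrete, here is a minimal PyTables sketch that stores an array inside a group hierarchy, attaches metadata as attributes, and enables compression through a `Filters` object. With a flat binary file, the equivalent usually means a naming convention plus a sidecar file for the metadata. The path `/runs/run_001/signal`, the attribute names and the zlib settings are illustrative assumptions.

```python
import os
import tempfile

import numpy as np
import tables


def write_run(path, data, sample_rate_hz):
    """Store one run as /runs/run_001/signal with metadata and zlib compression."""
    filters = tables.Filters(complevel=5, complib="zlib")  # chunked and compressed
    with tables.open_file(path, mode="w") as h5:
        group = h5.create_group("/runs", "run_001", createparents=True)
        arr = h5.create_carray(group, "signal", obj=data, filters=filters)
        # Arbitrary metadata is stored alongside the dataset itself.
        arr.attrs.sample_rate_hz = sample_rate_hz
        arr.attrs.units = "volts"


def read_run(path):
    """Return (data, sample_rate_hz, units) for the run written by write_run."""
    with tables.open_file(path, mode="r") as h5:
        node = h5.get_node("/runs/run_001/signal")
        return node[:], node.attrs.sample_rate_hz, node.attrs.units


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "runs.h5")
    write_run(path, np.sin(np.linspace(0, 10, 10_000)), sample_rate_hz=1000.0)
    data, rate, units = read_run(path)
    print(data.shape, rate, units)
```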