Install HDF5 and PyTables on Ubuntu
I found that installing the libhdf5-serial-dev package did the trick:

```shell
sudo apt-get install libhdf5-serial-dev
```
Use append=True in the call to to_hdf:

```python
import numpy as np
import pandas as pd

filename = "/tmp/test.h5"
df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=["A", "B"])
print(df)
#    A  B
# 0  0  1
# 1  2  3
# 2  4  5
# 3  6  7
# 4  8  9

# Save to HDF5
df.to_hdf(filename, "data", mode="w", format="table")
del df
```

…
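The snippet above only shows the initial write; a minimal sketch of the append step itself might look like this (file name and second DataFrame are made up for illustration — appending requires format="table"):

```python
import numpy as np
import pandas as pd

filename = "/tmp/test_append.h5"  # hypothetical demo path

# Create the file with a first chunk; format="table" makes the key appendable.
df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=["A", "B"])
df.to_hdf(filename, key="data", mode="w", format="table")

# Append a second chunk to the same key instead of overwriting it.
df2 = pd.DataFrame(np.arange(10, 20).reshape((5, 2)), columns=["A", "B"])
df2.to_hdf(filename, key="data", format="table", append=True)

combined = pd.read_hdf(filename, "data")
print(len(combined))  # 10
```

Reading the key back returns both chunks as one DataFrame.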
This is actually one of the use-cases of HDF5. If you just want to be able to access all the datasets from a single file, and don't care how they're actually stored on disk, you can use external links. From the HDF5 website:

> External links allow a group to include objects in another HDF5 file …
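In h5py, an external link is created by assigning an h5py.ExternalLink object into a group; a minimal sketch (file names here are hypothetical):

```python
import h5py
import numpy as np

# Write a dataset into one file.
with h5py.File("/tmp/child.h5", "w") as f:
    f.create_dataset("voltage", data=np.arange(5))

# Store a link to it in another file; the data itself stays in child.h5.
with h5py.File("/tmp/parent.h5", "w") as f:
    f["child_voltage"] = h5py.ExternalLink("/tmp/child.h5", "/voltage")

# Accessing the link transparently resolves into the other file.
with h5py.File("/tmp/parent.h5", "r") as f:
    data = f["child_voltage"][:]
print(data)  # [0 1 2 3 4]
```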
HDF Java follows a layered approach:

- JHI5 – the low-level JNI wrappers: very flexible, but also quite tedious to use.
- Java HDF object package – a high-level interface based on JHI5.
- HDFView – a Java-based viewer application based on the Java HDF object package.

JHDF5 provides a high-level interface building on the JHI5 layer …
Per the FAQ, you can expand the dataset using dset.resize. For example:

```python
import os
import h5py
import numpy as np

path = "/tmp/out.h5"
if os.path.exists(path):
    os.remove(path)
with h5py.File(path, "a") as f:
    dset = f.create_dataset("voltage284", (10**5,), maxshape=(None,),
                            dtype="i8", chunks=(10**4,))
    dset[:] = np.random.random(dset.shape)
    print(dset.shape)
    # (100000,)
    for i in range(3):
        dset.resize(dset.shape[0] + 10**4, axis=0)
        dset[-10**4:] = np.random.random(10**4)
        print(dset.shape)
        # (110000,)
        # (120000,)
        # ...
```

…
This works for me:

```shell
$ brew install hdf5
$ export HDF5_DIR="$(brew --prefix hdf5)"
$ pip install --no-binary=h5py h5py
```
Here is a similar comparison I just did. It's about 1/3 of the data, 10M rows. The final size is about 1.3GB. I define 3 timing functions:

Test the Fixed format (called Storer in 0.12). This writes in a PyTables Array format:

```python
def f(df):
    store = pd.HDFStore("test.h5", "w")
    store["df"] = df
    store.close()
```

Write in the Table …
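The two write paths being compared can be sketched end to end as below; the DataFrame size and file paths are made up for illustration (the original comparison used ~10M rows):

```python
import numpy as np
import pandas as pd
from timeit import default_timer as timer

# Small stand-in DataFrame so the sketch runs quickly.
df = pd.DataFrame(np.random.randn(100_000, 2), columns=["x", "y"])

def write_fixed(df):
    # Fixed format ("Storer" in 0.12): fast write, not appendable or queryable.
    with pd.HDFStore("/tmp/bench_fixed.h5", "w") as store:
        store["df"] = df

def write_table(df):
    # Table format: appendable and queryable, but slower to write.
    with pd.HDFStore("/tmp/bench_table.h5", "w") as store:
        store.append("df", df)

for fn in (write_fixed, write_table):
    t0 = timer()
    fn(df)
    print(fn.__name__, round(timer() - t0, 3), "s")
```

Both files hold the same data; the trade-off is write speed versus on-disk queryability.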
The easiest thing is to use the .value attribute of the HDF5 dataset.

```python
>>> hf = h5py.File("/path/to/file", "r")
>>> data = hf.get("dataset_name").value  # `data` is now an ndarray.
```

You can also slice the dataset, which produces an actual ndarray with the requested data:

```python
>>> hf["dataset_name"][:10]  # produces ndarray as well
```

But keep in mind that …
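Note that the .value attribute was deprecated and removed in h5py 3.0; indexing with an empty tuple, `dset[()]`, is the modern equivalent. A minimal sketch (file path is hypothetical):

```python
import h5py
import numpy as np

path = "/tmp/value_demo.h5"  # hypothetical demo file
with h5py.File(path, "w") as f:
    f.create_dataset("dataset_name", data=np.arange(20))

with h5py.File(path, "r") as hf:
    data = hf["dataset_name"][()]   # full read; replacement for the removed .value
    head = hf["dataset_name"][:10]  # slicing returns an ndarray as well
print(data.shape, head.shape)  # (20,) (10,)
```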
For conda users:

```shell
conda install pytables
```
Copy of my answer from the issue: https://github.com/pydata/pandas/issues/3651

Your sample is really too small. HDF5 has a fair amount of overhead with really small sizes (even 300k entries is on the smaller side). The following is with no compression on either side. Floats are really more efficiently represented in binary (than as a text representation). …
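The binary-versus-text point is easy to see by writing the same float data both ways and comparing file sizes; a sketch with made-up paths and a modest row count:

```python
import os
import numpy as np
import pandas as pd

# Hypothetical demo paths; the gap widens as the data grows.
df = pd.DataFrame(np.random.randn(100_000, 2), columns=["a", "b"])
df.to_hdf("/tmp/size_demo.h5", key="df", mode="w")  # binary float storage
df.to_csv("/tmp/size_demo.csv", index=False)        # text representation

h5_size = os.path.getsize("/tmp/size_demo.h5")
csv_size = os.path.getsize("/tmp/size_demo.csv")
print(h5_size, csv_size)
```

Each float64 costs 8 bytes in HDF5, versus roughly 18–19 characters when printed at full precision in CSV.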