Combining HDF5 files

This is actually one of the use cases of HDF5. If you just want to be able to access all the datasets from a single file, and don't care how they're actually stored on disk, you can use external links. From the HDF5 website: "External links allow a group to include objects in another HDF5 file …"
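A minimal sketch of creating such an external link with h5py (the file and dataset names here are illustrative, not from the original answer):

```python
import h5py
import numpy as np

# A "child" file that holds the actual data.
with h5py.File("child.h5", "w") as f:
    f.create_dataset("data", data=np.arange(10))

# A "master" file that merely links to the child's dataset.
with h5py.File("master.h5", "w") as f:
    f["linked_data"] = h5py.ExternalLink("child.h5", "/data")

# Reading through the master file resolves the link transparently.
with h5py.File("master.h5", "r") as f:
    print(f["linked_data"][:])  # [0 1 2 3 4 5 6 7 8 9]
```

The master file stays small: it stores only the target filename and object path, and HDF5 opens the child file on demand when the link is accessed.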

Incremental writes to HDF5 with h5py

Per the FAQ, you can expand a dataset created with `maxshape` using `dset.resize`. For example:

```python
import os
import h5py
import numpy as np

path = "/tmp/out.h5"
if os.path.exists(path):
    os.remove(path)

with h5py.File(path, "a") as f:
    # maxshape=(None,) makes the first axis unlimited; resizable
    # datasets must be chunked.
    dset = f.create_dataset("voltage284", (10**5,), maxshape=(None,),
                            dtype="i8", chunks=(10**4,))
    dset[:] = np.random.random(dset.shape)
    print(dset.shape)  # (100000,)
    for i in range(3):
        dset.resize(dset.shape[0] + 10**4, axis=0)
        dset[-10**4:] = np.random.random(10**4)
        print(dset.shape)
        # (110000,)
        # (120000,)
        # …
```

How to store dictionary in HDF5 dataset

I found two ways to do this:

I) Transform the datetime object to a string and use it as the dataset name:

```python
h = h5py.File("myfile.hdf5")
for k, v in d.items():
    h.create_dataset(k.strftime("%Y-%m-%dT%H:%M:%SZ"),
                     data=np.array(v, dtype=np.int8))
```

The data can then be accessed by querying the key strings (dataset names). For example:

```python
for ds in h.keys():
    if "2012-04" in ds:
        print(h[ds][()])  # h[ds].value in h5py < 3.0
```

II) Transform the datetime object …

Read HDF5 file into numpy array

The easiest thing is to read the whole dataset with an empty-tuple index (this replaces the old `.value` attribute, which was deprecated and removed in h5py 3.0):

```python
>>> hf = h5py.File("/path/to/file", "r")
>>> data = hf.get("dataset_name")[()]  # `data` is now an ndarray
```

You can also slice the dataset, which produces an actual ndarray with the requested data:

```python
>>> hf["dataset_name"][:10]  # produces an ndarray as well
```

But keep in mind that …

How to list all datasets in h5py file?

You have to use the `keys` method. This gives you the names of the top-level datasets and groups as unicode strings (in Python 3, `keys()` returns a view, so wrap it in `list()` if you need a list). For example:

```python
dataset_names = list(hf.keys())
```

Another, GUI-based method is to use HDFView: https://support.hdfgroup.org/products/java/release/download.html
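Note that `keys()` only lists the top level of the file, not datasets nested inside groups. A short sketch using h5py's `visititems` to list all datasets recursively (the file and dataset names here are illustrative):

```python
import h5py
import numpy as np

# Build a small example file with a nested dataset.
with h5py.File("example.h5", "w") as f:
    f.create_dataset("top", data=np.zeros(3))
    f.create_group("group1").create_dataset("nested", data=np.ones(3))

# visititems walks the whole hierarchy, calling the function with the
# full path and object for every group and dataset it encounters.
with h5py.File("example.h5", "r") as f:
    names = []
    f.visititems(lambda name, obj: names.append(name)
                 if isinstance(obj, h5py.Dataset) else None)
    print(names)  # ['group1/nested', 'top']
```

The callback must return `None` to keep the traversal going; returning anything else stops it early, which can be used to search for a single object.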

Experience with using h5py to do analytical work on big data in Python?

We use Python in conjunction with h5py, numpy/scipy and boost::python to do data analysis. Our typical datasets have sizes of up to a few hundred GB.

HDF5 advantages:

- data can be inspected conveniently using the h5view application, h5py/ipython and the h5* command-line tools
- APIs are available for different platforms and languages
- structure data using groups …

How to overwrite array inside h5 file using h5py

You want to assign values to the existing dataset, not create a new one:

```python
f1 = h5py.File(file_name, "r+")  # open the file in read/write mode
data = f1["meas/frame1/data"]    # get the existing dataset
data[...] = X1                   # assign new values in place
f1.close()                       # close the file
```

To confirm the changes were properly made and saved:

```python
f1 = h5py.File(file_name, "r")
np.allclose(f1["meas/frame1/data"][()], X1)  # True
```