How to append data to one specific dataset in a hdf5 file with h5py

I have found a solution that seems to work!

Have a look at this: incremental writes to hdf5 with h5py!

In order to append data to a specific dataset it is necessary to first resize the specific dataset in the corresponding axis and subsequently append the new data at the end of the “old” nparray.

Thus, the solution looks like this:

with h5py.File('.\PreprocessedData.h5', 'a') as hf:
    hf["X_train"].resize((hf["X_train"].shape[0] + X_train_data.shape[0]), axis = 0)
    hf["X_train"][-X_train_data.shape[0]:] = X_train_data

    hf["X_test"].resize((hf["X_test"].shape[0] + X_test_data.shape[0]), axis = 0)
    hf["X_test"][-X_test_data.shape[0]:] = X_test_data

    hf["Y_train"].resize((hf["Y_train"].shape[0] + Y_train_data.shape[0]), axis = 0)
    hf["Y_train"][-Y_train_data.shape[0]:] = Y_train_data

    hf["Y_test"].resize((hf["Y_test"].shape[0] + Y_test_data.shape[0]), axis = 0)
    hf["Y_test"][-Y_test_data.shape[0]:] = Y_test_data

However, note that you should create the dataset with maxshape=(None,), for example

h5f.create_dataset('X_train', data=orig_data, compression="gzip", chunks=True, maxshape=(None,)) 

otherwise the dataset cannot be extended.

Leave a Comment

404 Not Found

Not Found

The requested URL was not found on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.