What is the advantage of saving `.npz` files instead of `.npy` in python, regarding speed, memory and look-up?

The answer has two parts.

I. NPY vs. NPZ

As we already read from the doc, the .npy format is:

the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk. … The format is designed to be as simple as possible while achieving its limited goals. (sources)

And .npz is only a

simple way to combine multiple arrays into a single file, one can use ZipFile to contain multiple “.npy” files. We recommend using the file extension “.npz” for these archives. (sources)

So, .npz is just a ZipFile containing multiple “.npy” files. And this ZipFile can be either compressed (by using np.savez_compressed) or uncompressed (by using np.savez).
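One way to check this yourself is to open an .npz file with Python's standard zipfile module. The following is a minimal sketch: the array names a and b and their contents are made up for illustration, and the archive is written to an in-memory buffer rather than to disk.

```python
import io
import zipfile

import numpy as np

# Two small example arrays (hypothetical data, for illustration only).
a = np.arange(10)
b = np.linspace(0.0, 1.0, 5)

# Write an uncompressed .npz archive into an in-memory buffer.
buf = io.BytesIO()
np.savez(buf, a=a, b=b)
buf.seek(0)

# Open the very same bytes with the standard zipfile module.
with zipfile.ZipFile(buf) as zf:
    member_names = sorted(zf.namelist())
    compress_types = {info.filename: info.compress_type for info in zf.infolist()}
    # Each member is itself a complete .npy file, so np.load can read it.
    with zf.open("a.npy") as member:
        a_roundtrip = np.load(io.BytesIO(member.read()))

print(member_names)
print(compress_types)
```

The members are named after the keyword arguments with ".npy" appended, and with np.savez each one is stored without compression (zipfile.ZIP_STORED).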

It’s similar to a tarball on Unix-like systems: a tarball can be just an uncompressed archive containing other files, or it can be a compressed archive when combined with a compression program (gzip, bzip2, etc.).

II. Different APIs for binary serialization

NumPy also provides several APIs to produce these binary files:

  • np.save -> Save an array to a binary file in NumPy .npy format
  • np.savez -> Save several arrays into a single file in uncompressed .npz format
  • np.savez_compressed -> Save several arrays into a single file in compressed .npz format
  • np.load -> Load arrays or pickled objects from .npy, .npz or pickled files
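As a rough illustration of how these calls fit together, here is a small sketch. The file names x.npy and xy.npz and the arrays x and y are placeholders chosen for the example. Regarding look-up: np.load on an .npz file returns a dictionary-like object, and an array is only read from the archive when you access its key.

```python
import os
import tempfile

import numpy as np

# Placeholder arrays, for illustration only.
x = np.arange(6).reshape(2, 3)
y = np.ones(4)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "x.npy")
    npz_path = os.path.join(tmp, "xy.npz")

    np.save(npy_path, x)          # one array  -> one .npy file
    np.savez(npz_path, x=x, y=y)  # two arrays -> one .npz archive

    # Loading a .npy file gives the array back directly.
    x_loaded = np.load(npy_path)

    # Loading a .npz file gives a dict-like object; each array is
    # read from the archive only when its key is accessed.
    with np.load(npz_path) as archive:
        keys = sorted(archive.files)
        y_loaded = archive["y"]

print(keys)
print(x_loaded.shape, y_loaded.shape)
```

So with .npz you can pull a single array out by name without reading the others, whereas a .npy file always holds exactly one array.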

If we skim the NumPy source code, we see that under the hood:

def _savez(file, args, kwds, compress, allow_pickle=True, pickle_kwargs=None):
    ...
    if compress:
        compression = zipfile.ZIP_DEFLATED
    else:
        compression = zipfile.ZIP_STORED
    ...


def savez(file, *args, **kwds):
    _savez(file, args, kwds, False)


def savez_compressed(file, *args, **kwds):
    _savez(file, args, kwds, True)

Now, back to the question:

  • If you use np.save or np.savez, no compression is applied on top of the .npy format. np.savez only bundles the arrays into a single archive file, for the convenience of managing multiple related arrays.
  • If you use np.savez_compressed, the file takes less space on disk, at the cost of extra CPU time for compression (i.e. saving is a bit slower, and loading has to decompress the data again).
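To get a feel for the disk-space trade-off, you can compare the archive sizes directly. This is only a sketch: the helper npz_size is a name invented for this example, and the all-zeros array is a deliberately favourable case. Data that is close to random will usually shrink far less.

```python
import io

import numpy as np

# An all-zeros array: highly compressible, chosen to make the effect obvious.
data = np.zeros((1000, 1000))


def npz_size(save_fn, **arrays):
    """Return the size in bytes of the archive written by save_fn.

    Hypothetical helper for this example; writes to an in-memory buffer.
    """
    buf = io.BytesIO()
    save_fn(buf, **arrays)
    return len(buf.getvalue())


stored = npz_size(np.savez, data=data)
deflated = npz_size(np.savez_compressed, data=data)

print("raw array bytes:    ", data.nbytes)
print("np.savez:           ", stored)
print("np.savez_compressed:", deflated)
```

The uncompressed archive is slightly larger than the raw array (the array bytes plus the .npy header and zip bookkeeping), while the compressed one is much smaller for this input. Wrapping the two calls with time.perf_counter() on your own data is an easy way to measure the speed cost.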
