tf.data.Dataset: how to get the dataset size (number of elements in an epoch)?
len(list(dataset)) works in eager mode, although that’s obviously not a good general solution.
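If you want the size without iterating the whole pipeline, a minimal sketch (assuming TF 2.x eager execution; the toy range dataset is just for illustration):

import tensorflow as tf

dataset = tf.data.Dataset.range(42)

# Eager-mode approach: materializes every element, so it is O(n).
print(len(list(dataset)))  # 42

# Reports the size without iterating when it is statically known;
# returns tf.data.experimental.UNKNOWN_CARDINALITY (e.g. after filter())
# or INFINITE_CARDINALITY (e.g. after repeat()) otherwise.
print(tf.data.experimental.cardinality(dataset).numpy())  # 42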
Assuming you have an all_dataset variable of type tf.data.Dataset:

test_dataset = all_dataset.take(1000)
train_dataset = all_dataset.skip(1000)

The test dataset now has the first 1000 elements, and the rest goes to training.
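A runnable sketch of that split on a toy range dataset (the 1000-element cutoff comes from the answer above; everything else here is illustrative). One caveat worth noting: if you shuffle before splitting, pass reshuffle_each_iteration=False to shuffle(), or the two splits will leak into each other across epochs.

import tensorflow as tf

all_dataset = tf.data.Dataset.range(5000)

test_dataset = all_dataset.take(1000)    # first 1000 elements
train_dataset = all_dataset.skip(1000)   # remaining 4000 elements

print(tf.data.experimental.cardinality(test_dataset).numpy())   # 1000
print(tf.data.experimental.cardinality(train_dataset).numpy())  # 4000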
from_tensors combines the input and returns a dataset with a single element:

>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensors(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[1, 2],
       [3, 4]], dtype=int32)>]

from_tensor_slices creates a dataset with a separate element for each row of the input tensor:

>>> ds = tf.data.Dataset.from_tensor_slices(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>,
 <tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>]
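A related sketch (not part of the answer above): from_tensor_slices also slices tuples and dicts along their first dimension, which is the usual way to pair feature rows with labels.

import tensorflow as tf

features = tf.constant([[1, 2], [3, 4]])
labels = tf.constant([0, 1])

# Each dataset element is one (feature_row, label) pair.
ds = tf.data.Dataset.from_tensor_slices((features, labels))
for x, y in ds:
    print(x.numpy(), y.numpy())
# [1 2] 0
# [3 4] 1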
TL;DR Despite their similar names, these arguments have quite different meanings. The buffer_size in Dataset.shuffle() can affect the randomness of your dataset, and hence the order in which elements are produced. The buffer_size in Dataset.prefetch() only affects the time it takes to produce the next element. The buffer_size argument in tf.data.Dataset.prefetch() and the output_buffer_size argument …
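A small sketch of the difference (the shuffled outputs vary run to run, since shuffling is random; tf.data.AUTOTUNE assumes TF 2.4+, earlier versions spell it tf.data.experimental.AUTOTUNE):

import tensorflow as tf

ds = tf.data.Dataset.range(10)

# A small shuffle buffer only mixes elements within a sliding window
# of 3, so the output is only locally shuffled.
print(list(ds.shuffle(buffer_size=3).as_numpy_iterator()))

# A buffer at least as large as the dataset gives a uniform shuffle.
print(list(ds.shuffle(buffer_size=10).as_numpy_iterator()))

# prefetch overlaps producing the next element(s) with consuming the
# current one; it affects throughput only, never element order.
ds = ds.prefetch(buffer_size=tf.data.AUTOTUNE)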