Difference between tf.data.Dataset.map() and tf.data.Dataset.apply()

The difference is that map executes its function on every element of the Dataset separately, whereas apply executes its function on the whole Dataset at once (such as group_by_window, given as an example in the documentation). The argument of apply is a function that takes a Dataset and returns a Dataset, whereas the argument … Read more
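A minimal sketch of the distinction (assuming TF 2.x; drop_odd is a hypothetical helper, not from the original answer):

import tensorflow as tf

ds = tf.data.Dataset.range(10)

# map: the function receives one element at a time
doubled = ds.map(lambda x: x * 2)

# apply: the function receives the whole Dataset and returns a new Dataset
def drop_odd(dataset):
    return dataset.filter(lambda x: x % 2 == 0)

evens = ds.apply(drop_odd)

print(list(doubled.as_numpy_iterator()))  # [0, 2, 4, ..., 18]
print(list(evens.as_numpy_iterator()))    # [0, 2, 4, 6, 8]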

How to get a string value out of a tf.Tensor whose dtype is string

You can use tf.py_func to wrap load_audio_file():

import tensorflow as tf

tf.enable_eager_execution()

def load_audio_file(file_path):
    # decode the bytes object into a Python str
    print("file_path: ", bytes.decode(file_path), type(bytes.decode(file_path)))
    return file_path

train_dataset = tf.data.Dataset.list_files('clean_4s_val/*.wav')
train_dataset = train_dataset.map(lambda x: tf.py_func(load_audio_file, [x], [tf.string]))

for one_element in train_dataset:
    print(one_element)

Output:

file_path:  clean_4s_val/1.wav <class 'str'>
(<tf.Tensor: id=32, shape=(), dtype=string, numpy=b'clean_4s_val/1.wav'>,)
file_path:  clean_4s_val/3.wav <class 'str'> … Read more
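For reference, a roughly equivalent TF 2.x sketch (an assumption, not part of the original answer) uses tf.py_function, since tf.py_func is deprecated and eager execution is on by default:

import tensorflow as tf

def load_audio_file(file_path):
    path_str = file_path.numpy().decode()  # EagerTensor bytes -> Python str
    print("file_path:", path_str, type(path_str))
    return file_path

train_dataset = tf.data.Dataset.list_files('clean_4s_val/*.wav')
train_dataset = train_dataset.map(
    lambda x: tf.py_function(load_audio_file, [x], tf.string))

for one_element in train_dataset:
    print(one_element)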

How to create only one copy of graph in tensorboard events file with custom tf.Estimator?

You need to use the TensorBoard tool to visualize the contents of your summary logs. The event file can also be read programmatically; the example from this link shows how to read events written to an event file. # This example supposes that the events file contains summaries … Read more
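A minimal sketch of reading summaries back out of an events file (assuming TF 2.x, where summary_iterator lives under tf.compat.v1; the log path is a hypothetical placeholder):

import tensorflow as tf

# Iterate over the Event protos stored in a (hypothetical) events file
for event in tf.compat.v1.train.summary_iterator('logdir/events.out.tfevents.example'):
    for value in event.summary.value:
        if value.HasField('simple_value'):
            print(event.step, value.tag, value.simple_value)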

Tensorflow tf.data AUTOTUNE

tf.data builds a performance model of the input pipeline and runs an optimization algorithm to find a good allocation of its CPU budget across all parameters specified as AUTOTUNE. While the input pipeline is running, tf.data tracks the time spent in each operation, so that these times can be fed into the optimization algorithm. The … Read more
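As a quick illustration (a minimal sketch assuming TF 2.x, where the constant is tf.data.AUTOTUNE; older releases expose it as tf.data.experimental.AUTOTUNE), you pass AUTOTUNE wherever a tunable parameter is accepted:

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

dataset = (tf.data.Dataset.range(1000)
           .map(lambda x: x * 2, num_parallel_calls=AUTOTUNE)  # parallelism chosen by tf.data
           .batch(32)
           .prefetch(AUTOTUNE))                                # prefetch buffer chosen by tf.data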

How do I split Tensorflow datasets?

You may use Dataset.take() and Dataset.skip():

train_size = int(0.7 * DATASET_SIZE)
val_size = int(0.15 * DATASET_SIZE)
test_size = int(0.15 * DATASET_SIZE)

full_dataset = tf.data.TFRecordDataset(FLAGS.input_file)
full_dataset = full_dataset.shuffle(buffer_size=DATASET_SIZE)  # shuffle() requires a buffer_size
train_dataset = full_dataset.take(train_size)
test_dataset = full_dataset.skip(train_size)
val_dataset = test_dataset.skip(test_size)
test_dataset = test_dataset.take(test_size)

For more generality, I gave an example using a 70/15/15 train/val/test split, but if you don't … Read more
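A self-contained sketch of the same pattern on a toy dataset (the sizes are placeholders; reshuffle_each_iteration=False is an added assumption that keeps the three splits disjoint across iterations):

import tensorflow as tf

DATASET_SIZE = 100
train_size = int(0.7 * DATASET_SIZE)
val_size = int(0.15 * DATASET_SIZE)

# Fix the shuffle order so take()/skip() carve out non-overlapping splits
full_dataset = tf.data.Dataset.range(DATASET_SIZE).shuffle(
    buffer_size=DATASET_SIZE, reshuffle_each_iteration=False)

train_dataset = full_dataset.take(train_size)
rest = full_dataset.skip(train_size)
val_dataset = rest.take(val_size)
test_dataset = rest.skip(val_size)

print(len(list(train_dataset)), len(list(val_dataset)), len(list(test_dataset)))  # 70 15 15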

Tensorflow : logits and labels must have the same first dimension

The problem is in your target shape and is related to choosing an appropriate loss function. You have two possibilities:

1. If you have 1D integer-encoded targets, you can use sparse_categorical_crossentropy as the loss function:

n_class = 3
n_features = 100
n_sample = 1000

X = np.random.randint(0, 10, (n_sample, n_features))
y = np.random.randint(0, n_class, … Read more
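A hedged end-to-end sketch of both options (the small Dense model and the random data here are assumptions for illustration, not the asker's architecture):

import numpy as np
import tensorflow as tf

n_class, n_features, n_sample = 3, 100, 1000
X = np.random.randint(0, 10, (n_sample, n_features)).astype("float32")
y = np.random.randint(0, n_class, n_sample)  # 1D integer labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(n_features,)),
    tf.keras.layers.Dense(n_class, activation="softmax"),
])

# Option 1: integer labels -> sparse_categorical_crossentropy
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=1, verbose=0)

# Option 2: one-hot labels -> categorical_crossentropy
y_onehot = tf.keras.utils.to_categorical(y, num_classes=n_class)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(X, y_onehot, epochs=1, verbose=0)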

tf.data with multiple inputs / outputs in Keras

I'm not using Keras, but I would go with tf.data.Dataset.from_generator(), for example:

def _input_fn():
    sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int64)
    sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.int64)
    sent1 = np.reshape(sent1, (8, 1, 1))
    sent2 = np.reshape(sent2, (8, 1, 1))
    labels = np.array([40, 30, 20, 10, … Read more
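If you are in Keras, a common alternative is to key the dataset elements by the model's input names; the sketch below uses placeholder data and a made-up two-input model (both assumptions, not from the original answer):

import numpy as np
import tensorflow as tf

# Placeholder data, assumed for illustration
sent1 = np.random.randint(0, 100, size=(8, 1)).astype(np.float32)
sent2 = np.random.randint(0, 100, size=(8, 1)).astype(np.float32)
labels = np.random.randint(0, 100, size=(8, 1)).astype(np.float32)

# A dataset of ({input_name: tensor, ...}, label) pairs maps onto a multi-input model by name
dataset = tf.data.Dataset.from_tensor_slices(
    ({"sent1": sent1, "sent2": sent2}, labels)).batch(4)

in1 = tf.keras.Input(shape=(1,), name="sent1")
in2 = tf.keras.Input(shape=(1,), name="sent2")
x = tf.keras.layers.Concatenate()([in1, in2])
out = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model(inputs=[in1, in2], outputs=out)
model.compile(optimizer="adam", loss="mse")
model.fit(dataset, epochs=1, verbose=0)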
