How to profile TensorFlow networks?

If you want to find out how much time was spent on each operation in TensorFlow, you can do this in TensorBoard using runtime statistics. You will need to do something like this (check the full example in the above-mentioned link):

```python
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess.run(<values_you_want_to_execute>, options=run_options, run_metadata=run_metadata)
your_writer.add_run_metadata(run_metadata, 'step%d' % i)
```

Better than … Read more

Could not load dynamic library ‘libcublas.so.10’; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory;

On Ubuntu 20.04, you can simply install NVIDIA's CUDA toolkit:

```shell
sudo apt-get update
sudo apt install nvidia-cuda-toolkit
```

There is also installation advice for Windows. The package is around 1 GB and it took a while to install… A few minutes later you need to export PATH variables so that it can be found: Find Shared Object … Read more

How does TensorFlow SparseCategoricalCrossentropy work?

SparseCategoricalCrossentropy and CategoricalCrossentropy both compute categorical cross-entropy. The only difference is in how the targets/labels should be encoded. When using SparseCategoricalCrossentropy, the targets are represented by the index of the category (starting from 0). Your outputs have shape 4×2, which means you have two categories. Therefore, the targets should be a 4-dimensional vector with … Read more
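To make the equivalence concrete, here is a minimal numpy sketch (not the Keras implementation itself) showing that the sparse and one-hot encodings produce the same loss values; the probability matrix and labels are made-up illustrative data:

```python
import numpy as np

def categorical_crossentropy(y_onehot, probs):
    # standard categorical cross-entropy: -sum(one_hot * log(p)) per sample
    return -np.sum(y_onehot * np.log(probs), axis=-1)

def sparse_categorical_crossentropy(y_idx, probs):
    # same loss, but labels are class indices instead of one-hot vectors
    return -np.log(probs[np.arange(len(y_idx)), y_idx])

probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.3, 0.7]])      # model outputs, shape 4x2 (two categories)
y_idx = np.array([0, 1, 0, 1])      # sparse labels: one class index per sample
y_onehot = np.eye(2)[y_idx]         # the equivalent one-hot labels

print(np.allclose(categorical_crossentropy(y_onehot, probs),
                  sparse_categorical_crossentropy(y_idx, probs)))  # True
```

The losses match element-for-element, which is why switching between the two Keras classes only requires changing the label encoding, not the model.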

How to understand masked multi-head attention in transformer

I had the very same question after reading the Transformer paper. I found no complete and detailed answer to the question on the Internet, so I'll try to explain my understanding of Masked Multi-Head Attention. The short answer is: we need masking to make the training parallel. And the parallelization is good as it … Read more
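The core mechanism can be sketched in a few lines of numpy: a causal (look-ahead) mask sets the attention scores for future positions to negative infinity before the softmax, so each token can only attend to itself and earlier tokens. This is a simplified illustration, not the multi-head implementation from the paper:

```python
import numpy as np

def causal_mask(seq_len):
    # lower-triangular mask: position i may only attend to positions <= i
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_softmax(scores, mask):
    # disallowed (future) positions get -inf, so softmax gives them weight 0
    scores = np.where(mask, scores, -np.inf)
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))        # raw attention scores for 4 tokens
weights = masked_softmax(scores, causal_mask(4))

print(np.allclose(weights.sum(axis=-1), 1.0))   # True: rows are distributions
print(np.allclose(np.triu(weights, k=1), 0.0))  # True: no attention to future
```

Because the mask removes any dependence on future tokens, all positions of the target sequence can be trained in a single parallel pass instead of one step at a time.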

Keras verbose training progress bar writing a new line on each batch issue

I've added built-in support for Keras in tqdm, so you could use it instead (pip install "tqdm>=4.41.0"):

```python
from tqdm.keras import TqdmCallback
...
model.fit(..., verbose=0, callbacks=[TqdmCallback(verbose=2)])
```

This turns off Keras' progress bar (verbose=0) and uses tqdm instead. For the callback, verbose=2 means separate progress bars for epochs and batches; 1 means clear batch bars when done; 0 means … Read more

Is there any way to get variable importance with Keras?

*Edited to include relevant code to implement permutation importance. I answered a similar question at Feature Importance Chart in neural network using Keras in Python. It does implement what Teque5 mentioned above, namely shuffling the variable among your sample, or permutation importance, using the ELI5 package.

```python
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor
import eli5
from eli5.sklearn …
```

… Read more
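The idea behind permutation importance can be shown without ELI5 at all. Below is a minimal numpy sketch: shuffle one feature column at a time and measure how much a scoring function drops. The scoring function here is a hypothetical fixed "model" for a toy dataset, not a trained Keras network:

```python
import numpy as np

def permutation_importance(score_fn, X, y, n_repeats=5, seed=0):
    # importance of feature j = average drop in score after shuffling column j
    rng = np.random.default_rng(seed)
    baseline = score_fn(X, y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])          # break the feature-target link
            drops.append(baseline - score_fn(Xp, y))
        importances[j] = np.mean(drops)
    return importances

# toy data: y depends only on the first of three features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0]

def r2_score(X, y):
    # R^2 of a fixed model that predicts 3 * (first feature)
    pred = 3.0 * X[:, 0]
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

imp = permutation_importance(r2_score, X, y)
print(imp.argmax())  # 0: only the informative feature hurts the score when shuffled
```

ELI5's PermutationImportance wraps this same loop around a fitted scikit-learn-compatible estimator, which is what the KerasClassifier/KerasRegressor wrappers provide.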

Flatten batch in tensorflow

You can do it easily with tf.reshape() without knowing the batch size.

```python
x = tf.placeholder(tf.float32, shape=[None, 9, 2])
shape = x.get_shape().as_list()  # a list: [None, 9, 2]
dim = numpy.prod(shape[1:])      # dim = prod(9, 2) = 18
x2 = tf.reshape(x, [-1, dim])    # -1 means "all"
```

The -1 in the last line means the whole column no … Read more
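The same trick works identically in plain numpy, which makes it easy to verify without a TensorFlow session; here is a minimal sketch with a made-up batch of zeros:

```python
import numpy as np

x = np.zeros((4, 9, 2), dtype=np.float32)  # a batch of 4 samples, each 9x2
dim = int(np.prod(x.shape[1:]))            # 9 * 2 = 18
x2 = x.reshape(-1, dim)                    # -1 lets numpy infer the batch size

print(x2.shape)  # (4, 18)
```

The -1 works the same way in tf.reshape: because the product of the remaining dimensions is fixed at 18, the first dimension is inferred from however many samples the batch actually contains.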

What does opt.apply_gradients() do in TensorFlow?

The update rule that the apply_gradients method actually applies depends on the specific optimizer. Take a look at the implementation of apply_gradients in the tf.train.Optimizer class here. It relies on the derived classes implementing the update rule in the methods _apply_dense and _apply_sparse. The update rule you are referring to is implemented by the GradientDescentOptimizer. … Read more
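For GradientDescentOptimizer, the rule boils down to theta <- theta - learning_rate * gradient. Here is a minimal numpy sketch of that update (the function name is illustrative, not TensorFlow's API), applied to minimize f(w) = w²:

```python
import numpy as np

def apply_gradients_sgd(params, grads, learning_rate=0.1):
    # the update GradientDescentOptimizer performs: theta <- theta - lr * grad
    return [p - learning_rate * g for p, g in zip(params, grads)]

# minimize f(w) = w^2 starting from w = 4; the gradient is 2w
w = np.array([4.0])
for _ in range(50):
    grad = 2.0 * w
    (w,) = apply_gradients_sgd([w], [grad])

print(abs(w[0]) < 1e-3)  # True: w has converged toward the minimum at 0
```

Other optimizers (Momentum, Adam, …) override _apply_dense/_apply_sparse with more elaborate rules, but apply_gradients is the common entry point that dispatches to them.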