Custom TensorFlow Keras optimizer

Update: TF 2.2 forced me to clean up all the implementations, so they can now serve as a reference for TF best practices. I’ve also added a section below on _get_hyper vs. _set_hyper.


I’ve implemented Keras AdamW in all major TF & Keras versions – I invite you to examine optimizers_v2.py. Several points:

  • You should inherit from OptimizerV2, which is indeed what you linked; it’s the current base class for tf.keras optimizers
  • You are correct in (1) – it’s a documentation mistake; the methods are private because they aren’t meant to be called by the user directly.
  • apply_gradients (or any other method) is only overridden if the default doesn’t accomplish what’s needed for a given optimizer; in your linked example, it’s just a one-line add-on to the original (see the sketch after this list)
  • “So, it seems that a _create_slots method must be defined in an optimizer subclass if that subclass does not override apply_gradients” – the two are unrelated; it’s coincidental.
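
To illustrate the “one-line add-on” idea in isolation – this is a hypothetical example, not the linked AdamW code, and it assumes the TF 2.x OptimizerV2-based tf.keras.optimizers.SGD discussed here – an override that does one extra piece of bookkeeping and then defers to the base class could look like this:

```python
import tensorflow as tf

class LoggedSGD(tf.keras.optimizers.SGD):
    """Hypothetical 'one-line add-on': record something, then delegate."""

    def apply_gradients(self, grads_and_vars, name=None, **kwargs):
        grads_and_vars = list(grads_and_vars)          # may be a generator; realize it once
        self.last_update_size = len(grads_and_vars)    # the (hypothetical) extra bookkeeping
        return super().apply_gradients(grads_and_vars, name=name, **kwargs)
```

Everything else – slot creation, the actual update math – still comes from the parent class; only the extra bookkeeping line is new.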

  • What is the difference between _resource_apply_dense and _resource_apply_sparse?

The latter deals with sparse updates – e.g. from an Embedding lookup, where gradients arrive as tf.IndexedSlices – and the former with everything else; example.
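
A minimal sketch (hypothetical optimizer, not the linked code) of how the two methods split the work: _resource_apply_dense receives an ordinary gradient tensor, while _resource_apply_sparse receives the gradient values together with the row indices they belong to:

```python
import tensorflow as tf

class PlainSGD(tf.keras.optimizers.Optimizer):
    """Hypothetical minimal optimizer showing the dense vs. sparse update paths."""

    def __init__(self, learning_rate=0.01, name="PlainSGD", **kwargs):
        super().__init__(name, **kwargs)
        self._set_hyper("learning_rate", learning_rate)

    def _resource_apply_dense(self, grad, var, apply_state=None):
        # Ordinary (dense) gradients -- most layers end up here.
        lr = self._get_hyper("learning_rate", var.dtype)
        return var.assign_sub(lr * grad)

    def _resource_apply_sparse(self, grad, var, indices, apply_state=None):
        # Sparse gradients (tf.IndexedSlices), e.g. from an Embedding lookup:
        # only the rows selected by `indices` are touched.
        lr = self._get_hyper("learning_rate", var.dtype)
        return var.scatter_sub(tf.IndexedSlices(lr * grad, indices))

    def get_config(self):
        config = super().get_config()
        config["learning_rate"] = self._serialize_hyperparameter("learning_rate")
        return config
```

Keras routes each variable to one of the two methods automatically, depending on whether its gradient arrives as a dense tensor or as tf.IndexedSlices.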

  • When should I use _create_slots()?

When defining additional per-weight tf.Variables that the optimizer itself updates during training; example: the weights’ first- and second-order moments (e.g. in Adam). It uses add_slot().
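
Here is a minimal sketch (again hypothetical, not the linked code) of _create_slots using add_slot() – one auxiliary variable per model weight, read back later with get_slot(); Adam would create two such slots per weight ("m" and "v"):

```python
import tensorflow as tf

class MomentumSketch(tf.keras.optimizers.Optimizer):
    """Hypothetical optimizer with a single 'momentum' slot per weight."""

    def __init__(self, learning_rate=0.01, momentum=0.9, name="MomentumSketch", **kwargs):
        super().__init__(name, **kwargs)
        self._set_hyper("learning_rate", learning_rate)
        self._set_hyper("momentum", momentum)

    def _create_slots(self, var_list):
        # Called once, before the first update: one slot variable per weight,
        # zero-initialized by default.
        for var in var_list:
            self.add_slot(var, "momentum")

    def _resource_apply_dense(self, grad, var, apply_state=None):
        lr = self._get_hyper("learning_rate", var.dtype)
        beta = self._get_hyper("momentum", var.dtype)
        m = self.get_slot(var, "momentum")
        m_t = m.assign(beta * m + grad)    # update the slot ...
        return var.assign_sub(lr * m_t)    # ... then the weight itself

    def _resource_apply_sparse(self, grad, var, indices, apply_state=None):
        raise NotImplementedError("sparse path omitted in this sketch")

    def get_config(self):
        config = super().get_config()
        config["learning_rate"] = self._serialize_hyperparameter("learning_rate")
        config["momentum"] = self._serialize_hyperparameter("momentum")
        return config
```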


_get_hyper vs. _set_hyper: they enable setting and getting Python literals (int, str, etc), callables, and tensors. They exist largely for convenience: anything set via _set_hyper can be retrieved via _get_hyper, avoiding repeating boilerplate code. I dedicated a Q&A to it here.
