(from stack overflow)
https://stackoverflow.com/questions/41918795/minimize-a-function-of-one-variable-in-tensorflow
Many of the other solutions use clipping to avoid an undefined
gradient. Depending on your problem, clipping introduces bias and may
not be acceptable in all cases. As the following code demonstrates, we
need only handle the point of discontinuity itself, not the region near it.
Specific Answer
def cross_entropy(x, y, axis=-1):
    safe_y = tf.where(tf.equal(x, 0.), tf.ones_like(y), y)
    return -tf.reduce_sum(x * tf.log(safe_y), axis)

def entropy(x, axis=-1):
    return cross_entropy(x, x, axis)
But did it work?
x = tf.constant([0.1, 0.2, 0., 0.7])
e = entropy(x)
# ==> 0.80181855
g = tf.gradients(e, x)[0]
# ==> array([ 1.30258512,  0.60943794,  0.        , -0.64332503], dtype=float32)
# Yay! No NaN.
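As a value-level sanity check, the same guard works in plain NumPy (a sketch for illustration; `cross_entropy_np` is a hypothetical name, not part of the answer's code). Feeding `log` a safe input of 1.0 wherever `x == 0` makes the `x * log(y)` term contribute exactly zero there:

```python
import numpy as np

def cross_entropy_np(x, y, axis=-1):
    # Where x == 0, the term x * log(y) should contribute 0, so we feed
    # log a safe value (1.0, whose log is exactly 0) instead of y.
    safe_y = np.where(x == 0.0, np.ones_like(y), y)
    return -np.sum(x * np.log(safe_y), axis=axis)

x = np.array([0.1, 0.2, 0.0, 0.7])
print(cross_entropy_np(x, x))  # ~0.8018, matching the TF result above
```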
(Note: deleted dup cross-post.)
General Recipe
Use an inner tf.where to ensure the function has no asymptote. That is, alter the input to the inf-generating function such that no inf can be created. Then use a second tf.where to always select the valid code path. That is, implement the mathematical condition as you would "normally", i.e., the "naive" implementation.
In Python code, the recipe is:
Instead of this:
tf.where(x_ok, f(x), safe_f(x))
Do this:
safe_x = tf.where(x_ok, x, safe_x)
tf.where(x_ok, f(safe_x), safe_f(x))
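The reason the inner where is needed: both branches of a where are evaluated before selection, so the "unused" branch can still create inf. This can be seen at the value level in plain NumPy (a sketch; TensorFlow's gradient behaves analogously):

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0])
x_ok = x != 0.0

# np.where (like tf.where) evaluates BOTH branches, so 1/x is still
# computed at x = 0 even though that element is never selected.
with np.errstate(divide="ignore"):
    unsafe_branch = 1.0 / x  # inf appears at x = 0

# Recipe: sanitize the input first, then apply f -- no inf is created.
safe_x = np.where(x_ok, x, np.ones_like(x))
result = np.where(x_ok, 1.0 / safe_x, np.zeros_like(x))

print(unsafe_branch)  # [-1. inf  1.]
print(result)         # [-1.  0.  1.]
```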
Example
Suppose you wish to compute:
f(x) = { 1/x,  x != 0
       { 0,    x = 0
A naive implementation results in NaNs in the gradient, i.e.,
def f(x):
    x_ok = tf.not_equal(x, 0.)
    f = lambda x: 1. / x
    safe_f = tf.zeros_like
    return tf.where(x_ok, f(x), safe_f(x))
Does it work?
x = tf.constant([-1., 0, 1])
tf.gradients(f(x), x)[0].eval()
# ==> array([ -1.,  nan,  -1.], dtype=float32)
# ...bah! We have a NaN at the asymptote despite not having
# an asymptote in the non-differentiated result.
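The NaN leaks through because the gradient of a where multiplies each branch's gradient by a 0/1 mask, and under IEEE 754 arithmetic 0 * inf is nan, not 0. A minimal arithmetic sketch of that masking step:

```python
import math

# d(1/x)/dx = -1/x**2 blows up to -inf at x = 0.
grad_unselected = float("-inf")

# The outer where masks out this branch with a factor of 0.0, but
# IEEE 754 defines 0 * inf as nan, so the mask cannot cancel the inf.
masked = 0.0 * grad_unselected
print(masked)  # nan
```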
The basic pattern for avoiding NaN gradients when using tf.where is to call tf.where twice. The innermost tf.where ensures that the result f(x) is always finite. The outermost tf.where ensures the correct result is chosen. For the running example, the trick plays out like this:
def safe_f(x):
    x_ok = tf.not_equal(x, 0.)
    f = lambda x: 1. / x
    safe_f = tf.zeros_like
    safe_x = tf.where(x_ok, x, tf.ones_like(x))
    return tf.where(x_ok, f(safe_x), safe_f(x))
But did it work?
x = tf.constant([-1., 0, 1])
tf.gradients(safe_f(x), x)[0].eval()
# ==> array([-1.,  0., -1.], dtype=float32)
# ...yay! double-where trick worked. Notice that the gradient
# is now a constant at the asymptote (as opposed to being NaN).