Question: I was going through the TensorFlow API docs. In the TensorFlow documentation, they use a keyword called logits. What is it? In a lot of methods in the API docs it is written like
tf.nn.softmax(logits, name=None)
If these logits are just Tensors, why keep a different name like logits?
Another thing is that there are two methods I could not differentiate:
tf.nn.softmax(logits, name=None)
tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)
What are the differences between them? The docs are not clear to me. I know what tf.nn.softmax does, but not the other. An example would be really helpful.
Short version:
Suppose you have two tensors, where y_hat contains computed scores for each class (for example, from y = W*x + b) and y_true contains one-hot encoded true labels.
y_hat = ... # Predicted label, e.g. y = tf.matmul(X, W) + b
y_true = ... # True label, one-hot encoded
If you interpret the scores in y_hat as unnormalized log probabilities, then they are logits.
Additionally, the total cross-entropy loss computed in this manner:
y_hat_softmax = tf.nn.softmax(y_hat)
total_loss = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), [1]))
is essentially equivalent to the total cross-entropy loss computed with the function softmax_cross_entropy_with_logits():
total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))
Long version:
In the output layer of your neural network, you will probably compute an array that contains the class scores for each of your training instances, such as from a computation y_hat = W*x + b. To serve as an example, below I've created a y_hat as a 2 x 3 array, where the rows correspond to the training instances and the columns correspond to classes. So here there are 2 training instances and 3 classes.
import tensorflow as tf
import numpy as np
sess = tf.Session()
# Create example y_hat.
y_hat = tf.convert_to_tensor(np.array([[0.5, 1.5, 0.1],[2.2, 1.3, 1.7]]))
sess.run(y_hat)
# array([[ 0.5, 1.5, 0.1],
# [ 2.2, 1.3, 1.7]])
Note that the values are not normalized (i.e. the rows don't add up to 1). In order to normalize them, we can apply the softmax function, which interprets the input as unnormalized log probabilities (aka logits) and outputs normalized linear probabilities.
y_hat_softmax = tf.nn.softmax(y_hat)
sess.run(y_hat_softmax)
# array([[ 0.227863 , 0.61939586, 0.15274114],
# [ 0.49674623, 0.20196195, 0.30129182]])
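As a sanity check (my addition, not part of the original answer), the first row can be reproduced by hand with plain NumPy, since softmax is just exp() followed by row-wise normalization:
# Hand-computed softmax for the first row of y_hat.
row = np.array([0.5, 1.5, 0.1])
np.exp(row) / np.sum(np.exp(row))
# array([ 0.227863  ,  0.61939586,  0.15274114])   (matches the first row above)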
It's important to fully understand what the softmax output is saying. Below I've shown a table that more clearly represents the output above. It can be seen that, for example, the probability of training instance 1 being "Class 2" is 0.619. The class probabilities for each training instance are normalized, so the sum of each row is 1.0.
                      Pr(Class 1)  Pr(Class 2)  Pr(Class 3)
                    ,--------------------------------------
Training instance 1 | 0.227863   | 0.61939586 | 0.15274114
Training instance 2 | 0.49674623 | 0.20196195 | 0.30129182
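A quick way to confirm that row-wise normalization (an added check, not in the original answer) is to sum the softmax output over the class dimension:
# Each row of the softmax output sums to 1.
sess.run(tf.reduce_sum(y_hat_softmax, 1))
# array([ 1.,  1.])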
So now we have class probabilities for each training instance, and we can take the argmax() of each row to generate a final classification. From the values above, training instance 1 is predicted to belong to "Class 2" and training instance 2 to "Class 1".
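To make those predicted classes explicit (a small addition on my part), tf.argmax over the class dimension returns the 0-based index of the winning class, so 1 corresponds to "Class 2" and 0 to "Class 1":
# Predicted class index (0-based) for each training instance.
sess.run(tf.argmax(y_hat_softmax, 1))
# array([1, 0])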
Are these classifications correct? We need to measure against the true labels from the training set. You will need a one-hot encoded y_true array, where again the rows are training instances and the columns are classes. Below I've created an example y_true one-hot array where the true label for training instance 1 is "Class 2" and the true label for training instance 2 is "Class 3".
y_true = tf.convert_to_tensor(np.array([[0.0, 1.0, 0.0],[0.0, 0.0, 1.0]]))
sess.run(y_true)
# array([[ 0., 1., 0.],
# [ 0., 0., 1.]])
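Similarly (another added check, not in the original answer), the true class indices can be read off y_true with tf.argmax, giving "Class 2" for instance 1 and "Class 3" for instance 2:
# True class index (0-based) for each training instance.
sess.run(tf.argmax(y_true, 1))
# array([1, 2])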
Is the probability distribution in y_hat_softmax close to the probability distribution in y_true? We can use cross-entropy loss to measure the error.
We can compute the cross-entropy loss on a row-wise basis and see the results. Below we can see that training instance 1 has a loss of 0.479, while training instance 2 has a higher loss of 1.200. This result makes sense because in our example above, y_hat_softmax showed that training instance 1's highest probability was for "Class 2", which matches training instance 1 in y_true; however, the prediction for training instance 2 showed its highest probability for "Class 1", which does not match the true class "Class 3".
loss_per_instance_1 = -tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1])
sess.run(loss_per_instance_1)
# array([ 0.4790107 , 1.19967598])
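As a hand check (my addition), because y_true is one-hot, each row's cross-entropy reduces to minus the log of the predicted probability assigned to the true class:
# Instance 1: true class "Class 2", predicted probability 0.61939586.
-np.log(0.61939586)   # ~0.4790107
# Instance 2: true class "Class 3", predicted probability 0.30129182.
-np.log(0.30129182)   # ~1.19967598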
What we really want is the total loss over all the training instances. So we can compute:
total_loss_1 = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1]))
sess.run(total_loss_1)
# 0.83934333897877944
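(One added remark: this total is just the mean of the two per-instance losses, (0.4790107 + 1.19967598) / 2 ≈ 0.8393, because tf.reduce_mean averages over the training instances.)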
Using softmax_cross_entropy_with_logits()
We can instead compute the total cross-entropy loss using the tf.nn.softmax_cross_entropy_with_logits() function, as shown below.
loss_per_instance_2 = tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true)
sess.run(loss_per_instance_2)
# array([ 0.4790107 , 1.19967598])
total_loss_2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))
sess.run(total_loss_2)
# 0.83934333897877922
Note that total_loss_1 and total_loss_2 produce essentially equivalent results, with some small differences in the very final digits. However, you might as well use the second approach: it takes one less line of code and accumulates less numerical error, because the softmax is done for you inside of softmax_cross_entropy_with_logits().
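A version note of my own, not from the original answer: the calls above use the old positional argument order from early TensorFlow 1.x. Later releases of the function require keyword arguments, so the call would instead look like the line below; check the API docs for the version you have installed.
# Keyword-argument form used by later TensorFlow 1.x releases (check your version).
loss_per_instance_2 = tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_hat)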
From Stack Overflow: https://stackoverflow.com/questions/34240703/what-is-logits-softmax-and-softmax-cross-entropy-with-logits?noredirect=1&lq=1