- Why do the K and V in the Transformer decoder's encoder-decoder attention come from the encoder's output?
In "encoder-decoder attention" layers, the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder. This allows every position in the decoder to attend over all positions in the input sequence. This mimics the typical encoder-decoder attention mechanisms in sequence-to-sequence models such as 38
Author: Mr.g
Link: https://www.zhihu.com/question/458687952/answer/1878623992
Source: Zhihu
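To make the quoted mechanism concrete, here is a minimal cross-attention sketch using PyTorch's nn.MultiheadAttention. It is not the quoted answer's code; the dimensions, sequence lengths, and variable names are illustrative assumptions. The key point it demonstrates is that the queries are taken from the previous decoder layer while the keys and values are both taken from the encoder output, so each target position can attend over every source position.

```python
# Minimal cross-attention sketch (illustrative shapes and names, not from the source).
import torch
import torch.nn as nn

d_model, num_heads = 512, 8
src_len, tgt_len, batch = 10, 7, 2

# Encoder output: one vector per source position (the "memory").
encoder_output = torch.randn(src_len, batch, d_model)
# Output of the previous decoder layer: one vector per target position.
decoder_hidden = torch.randn(tgt_len, batch, d_model)

cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=num_heads)

# Queries come from the decoder; keys and values come from the encoder output.
attn_output, attn_weights = cross_attn(
    query=decoder_hidden,   # (tgt_len, batch, d_model)
    key=encoder_output,     # (src_len, batch, d_model)
    value=encoder_output,   # (src_len, batch, d_model)
)

print(attn_output.shape)   # torch.Size([7, 2, 512])
print(attn_weights.shape)  # torch.Size([2, 7, 10]): each target position over all source positions
```

Note that the attention-weight matrix has one row per decoder position and one column per encoder position, which is exactly the "every decoder position attends over all input positions" behavior described in the quote.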