This paper introduces the DGN algorithm: on top of DQN it adds a graph network to fuse state information across agents, and it is applied in multi-agent environments. The relation kernel is self-attention.
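As a rough sketch of what a self-attention relation kernel does (the function name, random weight initialization, and masking scheme here are my own assumptions for illustration, not the paper's code), each agent attends only over its graph neighbors when fusing features:

```python
import numpy as np

def relation_kernel(features, adjacency, d_k=16, seed=0):
    """Hypothetical one-layer self-attention relation kernel.

    features:  (N, F) per-agent feature vectors
    adjacency: (N, N) 0/1 matrix; 1 means agent j is a neighbor of agent i
    Returns (fused, attn): (N, d_k) fused features and the (N, N)
    attention weights, computed only over each agent's neighbors.
    """
    rng = np.random.default_rng(seed)
    N, F = features.shape
    # Randomly initialized projections stand in for learned weights.
    W_q = rng.normal(size=(F, d_k))
    W_k = rng.normal(size=(F, d_k))
    W_v = rng.normal(size=(F, d_k))

    Q, K, V = features @ W_q, features @ W_k, features @ W_v
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise relation scores
    scores = np.where(adjacency > 0, scores, -1e9)   # mask out non-neighbors
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over neighbors
    return attn @ V, attn
```

Stacking such layers lets information propagate beyond one-hop neighbors, which is presumably why the graph structure matters for state fusion.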
A few points the paper makes:
- Because the relations between agents change quickly, the graph itself changes too fast, which hurts convergence; the paper therefore keeps the graph fixed across two consecutive timesteps.
- "unlike other methods with parameter-sharing, e.g., DQN, that sample experiences from individual agents, DGN samples experiences based on the graph of agents, not individual agents, and thus takes into consideration the interactions between agents." (I did not fully understand this part: how exactly are experiences sampled based on the graph?)
- Temporal Relation Regularization.
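On the graph-based sampling question above, one plausible reading (this is my guess, not something the paper confirms; the class and its fields are hypothetical) is that each replay entry stores the whole timestep as a unit, i.e. all agents' observations together with the adjacency matrix, so a sampled transition preserves who was near whom:

```python
import random
from collections import deque

class GraphReplayBuffer:
    """Hypothetical sketch: each transition stores the WHOLE agent graph
    (all observations plus the adjacency matrix), rather than one agent's
    individual (s, a, r, s') tuple, so sampled minibatches keep the
    interactions between agents intact."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, obs, adjacency, actions, rewards, next_obs, next_adjacency):
        # obs / next_obs: per-agent observation lists; adjacency: N x N matrix
        self.buffer.append(
            (obs, adjacency, actions, rewards, next_obs, next_adjacency))

    def sample(self, batch_size):
        # Each sampled element is one full-graph transition, so the learner
        # can rebuild every agent's neighborhood exactly as it was when acting.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

Under this reading, "sampling based on the graph" just means the unit of experience is the graph snapshot, which is what lets the Q-update see inter-agent interactions.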
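A minimal sketch of the temporal relation regularization idea, assuming the regularizer is a KL divergence between each agent's attention distributions at consecutive timesteps (the exact form and function name here are my assumptions):

```python
import numpy as np

def temporal_relation_penalty(attn_t, attn_next, eps=1e-8):
    """Hypothetical temporal relation regularizer: penalize the KL
    divergence between each agent's attention distribution at
    consecutive timesteps, encouraging the relation weights
    (who attends to whom) to change smoothly over time.

    attn_t, attn_next: (N, N) row-stochastic attention matrices.
    Returns the mean per-agent KL(attn_next || attn_t).
    """
    p = np.clip(attn_next, eps, 1.0)   # next-step attention rows
    q = np.clip(attn_t, eps, 1.0)      # current-step attention rows
    kl = np.sum(p * np.log(p / q), axis=1)  # one KL term per agent
    return kl.mean()
```

This term would be added to the usual TD loss, trading off Q-value accuracy against stability of the learned relations.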
Both this paper and "Deep Reinforcement Learning with Relational Inductive Biases" combine graph networks with reinforcement learning, and both mention the concept of relational reinforcement learning. Worth looking into when I get a chance.