Introduction to Coordination in Multi-Agent Reinforcement Learning

We live in a world full of interaction with others, involving both cooperation and competition. It is therefore attractive to apply reinforcement learning to multi-agent systems.

Multi-agent System

Framework

Because the editor has trouble with math formulas, I will give a picture showing the definition from the perspective of a Markov decision process.
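As a textual stand-in for the definition, here is one common way to write a multi-agent extension of the Markov decision process, usually called a Markov game or stochastic game. This is a sketch under my own notation: the symbols below are assumptions and may differ from the ones in the picture.

```latex
% A Markov (stochastic) game with n agents; all symbols are illustrative assumptions.
\[
  \mathcal{G} = \left\langle \mathcal{N},\ \mathcal{S},\ \{\mathcal{A}_i\}_{i \in \mathcal{N}},\ P,\ \{R_i\}_{i \in \mathcal{N}},\ \gamma \right\rangle
\]
\[
  \mathcal{N} = \{1, \dots, n\}, \qquad
  \mathcal{A} = \mathcal{A}_1 \times \dots \times \mathcal{A}_n, \qquad
  P : \mathcal{S} \times \mathcal{A} \to \Delta(\mathcal{S}), \qquad
  R_i : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to \mathbb{R}
\]
% Each agent i follows its own policy \pi_i and maximizes its own discounted return,
% where a_t = (a_{1,t}, \dots, a_{n,t}) is the joint action at time t.
\[
  J_i(\pi_1, \dots, \pi_n) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, R_i(s_t, a_t, s_{t+1}) \right]
\]
```

With a single agent (n = 1) this reduces to an ordinary Markov decision process. The key difference is that both the transition P and each reward R_i depend on the joint action of all agents.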

Multi-agent Reinforcement Learning

Advantages

Having multiple agents act in a system brings several advantages.

  1. <strong>Efficient Exploration</strong>. In single-agent reinforcement learning there is a trade-off between exploration and exploitation. When multiple agents explore together and communicate with each other, sampling efficiency can improve dramatically. For a recent research result, see [1].

  2. <strong>Robustness</strong>. It is not rare in practice that a machine suddenly breaks down and brings the whole system down with it. We therefore need spare machines to guard against unexpected failures, and a multi-agent system provides this redundancy: other agents can take over when one fails.

  3. <strong>Transfer and Lifelong Learning</strong>. Through teaching and imitation, new agents can learn much faster than by learning from scratch.

  4. <strong>Cooperation and Competition</strong>. Some tasks can only be accomplished through cooperation, such as playing soccer or team combat games. Through teamwork, agents can tackle complicated environments. In addition, when self-interests conflict, each agent has to think about how to achieve its best reward. Interesting phenomena here include the Nash equilibrium.
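To make the Nash equilibrium mentioned in item 4 concrete, here is a minimal sketch that enumerates the pure-strategy Nash equilibria of a two-agent matrix game. The payoff table is the textbook prisoner's dilemma, chosen by me for illustration; the function name `pure_nash_equilibria` is hypothetical, and the brute-force check only makes sense for small games.

```python
import itertools

# Illustrative payoffs (an assumption, not from the article):
# payoffs[(a1, a2)] = (reward of agent 1, reward of agent 2)
PRISONERS_DILEMMA = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}


def pure_nash_equilibria(payoffs):
    """Return all joint actions where no agent gains by deviating alone."""
    actions1 = sorted({a1 for a1, _ in payoffs})
    actions2 = sorted({a2 for _, a2 in payoffs})
    equilibria = []
    for a1, a2 in itertools.product(actions1, actions2):
        r1, r2 = payoffs[(a1, a2)]
        # Agent 1 cannot do better by switching while agent 2 keeps a2.
        agent1_ok = all(payoffs[(b1, a2)][0] <= r1 for b1 in actions1)
        # Agent 2 cannot do better by switching while agent 1 keeps a1.
        agent2_ok = all(payoffs[(a1, b2)][1] <= r2 for b2 in actions2)
        if agent1_ok and agent2_ok:
            equilibria.append((a1, a2))
    return equilibria


print(pure_nash_equilibria(PRISONERS_DILEMMA))  # [('defect', 'defect')]
```

Note that the only equilibrium here, mutual defection, gives both agents a lower reward than mutual cooperation would. This is exactly the kind of conflict of self-interest that makes competitive settings interesting.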

Problems

We have talked about many advantages of multi-agent reinforcement learning. Now, what are its disadvantages or problems?

  1. <strong>Huge State and Action Space</strong>. The joint discrete state and action space grows exponentially with the number of agents, and state abstraction and representation become harder as well.

  2. <strong>Partial Observation</strong>. The range a single agent can perceive is small compared with the whole system, so each agent observes the state only partially. Agents may need to communicate in order to agree on the complete state information. Going one step further, how to design the communication channel among agents is itself a problem. For recent research results, see [2] [3].

  3. <strong>Instability in Learning</strong>. Because the transition model is determined by all agents, the quality of the policy a single agent has learned is affected by the other agents' policies. Imagine an agent taking the same action in the same state again, only to find that the next state and reward are different: it gets confused and does not know what to learn. Under these conditions, learning may get stuck in oscillation.

  4. <strong>Coordination and Cooperation</strong>. In the following picture, agents need to coordinate to avoid an obstacle and keep formation. That means agent 1 needs to know which action agent 2 will choose in order to achieve the best payoff, and vice versa. Such a task cannot be completed by choosing individual actions regardless of the other agents' actions. It becomes even more complicated when agents need to coordinate over a series of actions.
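Problems 1, 3 and 4 can be illustrated with a few lines of code. The sketch below is a toy example of my own, not the setup from the picture: two agents each pass an obstacle on the left or on the right, and the team is rewarded only if both pick the same side. All names (`joint_action_count`, `team_reward`, `best_response`, `expected_reward`) are hypothetical.

```python
ACTIONS = ("left", "right")


def joint_action_count(num_agents, actions_per_agent):
    """Size of the joint action space: |A| ** n, exponential in n."""
    return actions_per_agent ** num_agents


def team_reward(a1, a2):
    """Shared reward: 1 if both agents pass the obstacle on the same side."""
    return 1.0 if a1 == a2 else 0.0


def best_response(other_action):
    """Agent 1's best action depends entirely on what agent 2 does."""
    return max(ACTIONS, key=lambda a: team_reward(a, other_action))


def expected_reward(my_action, other_policy):
    """Expected reward of one fixed action under the other agent's policy."""
    return sum(p * team_reward(my_action, a2) for a2, p in other_policy.items())


# Problem 1: five actions per agent, growing number of agents.
for n in (1, 2, 10):
    print(n, joint_action_count(n, 5))  # 1 5, then 2 25, then 10 9765625

# Problem 4: no individual action is good on its own.
print(best_response("left"), best_response("right"))  # left right

# Problem 3: the same action looks good or bad as the other agent learns.
early_policy = {"left": 0.9, "right": 0.1}
later_policy = {"left": 0.1, "right": 0.9}
print(expected_reward("left", early_policy))  # 0.9
print(expected_reward("left", later_policy))  # 0.1
```

From agent 1's point of view nothing about its own action changed between the last two lines, yet its expected reward dropped from 0.9 to 0.1 because agent 2's policy changed. This is the non-stationarity that can make independent learners oscillate.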

Coordination

References

[1] Maria Dimakopoulou and Benjamin Van Roy. Coordinated Exploration in Concurrent Reinforcement Learning. ICML 2018.

[2] Jakob N. Foerster, Yannis M. Assael, Nando de Freitas and Shimon Whiteson. Learning to Communicate with Deep Multi-Agent Reinforcement Learning. NIPS 2016.

[3] Sainbayar Sukhbaatar, Arthur Szlam and Rob Fergus. Learning Multiagent Communication with Backpropagation. NIPS 2016.
