Original author: Yvan Scher
Link: https://medium.com/@yvanscher/making-a-game-ai-with-deep-learning-963bb549b3d5
Making a small game AI with pysc2 and Q-Learning
To make an AI for a game, you need:
1 - AI logic, whether that is scripted behavior or a learning AI
2 - A way to turn your game world into something your AI can understand and act in
The goal of this article is to show you ways of building AI logic, from scripted behavior to methods that can learn almost any task. For anyone with programming skills, this should work as an introduction to building bots. We'll build an AI that can play a StarCraft II mini-game, using Python, numpy, and pysc2. Let's go!
Setting up pysc2
If you want to make a game AI, you need an interface/API for your game that an AI can use to see, play, and take actions in your game world. We'll use StarCraft II, specifically an environment called pysc2 released by DeepMind and Google. First we need to install StarCraft II (it's free); I'm on Linux, so:
cd ~
curl -O http://blzdistsc2-a.akamaihd.net/Linux/SC2.3.17.zip
unzip -q SC2.3.17.zip
Make sure you get version 3.17 of the game, since newer versions don't seem to work with some pysc2 functions (see run_configs/platforms). Unzipped, it takes up about 7GB of space, and the unzip password is "iagreetotheeula". I ran into some issues rendering the 3D view on Ubuntu.
If you're on a Mac, make sure to install the game in the default location (~) using the installer and create "Maps" and "Replays" folders in your home folder. Now let's set up pysc2 and the Python packages we need:
conda create -n pysc2 python=3 -y
pip install pysc2
pip install numpy
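As a quick optional sanity check, the following one-liner should print a confirmation if pysc2 and numpy import cleanly in the environment you just created:
python -c "import numpy; from pysc2.lib import actions; print('pysc2 and numpy OK')"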
Now we need the SC2 maps we'll use as a proving ground for our AI: grab the mini-games maps from the link in the pysc2 GitHub repo.
For some reason the mini-games zip file from the pysc2 GitHub wouldn't unzip for me on Linux, so I unzipped it on my Mac and then moved it over to my Linux machine. Put the mini_games folder into the Maps folder inside your StarCraft II install folder. The mini-game maps actually ship with pysc2, but who knows whether DeepMind will keep doing that. OK, now that we have all the software and the maps, let's write our first agent and check out the details of the pysc2 environment.
Implementing a random AI
The AI we're going to make will play the MoveToBeacon mini-game. Our AI will control a single marine (a small combat unit) and move it to a beacon.
I'll start with a simple AI that can interact with this environment. It just moves randomly around the map:
import numpy as np
from pysc2.agents import base_agent
from pysc2.lib import actions
from pysc2.lib import features
from pysc2.env import sc2_env, run_loop, available_actions_printer
from pysc2 import maps
from absl import flags

# define the features the AI can see
_AI_RELATIVE = features.SCREEN_FEATURES.player_relative.index
# define constants for actions
_NO_OP = actions.FUNCTIONS.no_op.id
# an attack-move order to a point; with nothing to attack it just moves there
_MOVE_SCREEN = actions.FUNCTIONS.Attack_screen.id
_SELECT_ARMY = actions.FUNCTIONS.select_army.id
# define constants about the AI's world
_BACKGROUND = 0
_AI_SELF = 1
_AI_ALLIES = 2
_AI_NEUTRAL = 3
_AI_HOSTILE = 4
# constants for action arguments
_SELECT_ALL = [0]
_NOT_QUEUED = [0]

def get_marine_location(ai_relative_view):
    '''get the indices where the world is equal to 1 (our own unit)'''
    return (ai_relative_view == _AI_SELF).nonzero()

def get_rand_location(ai_location):
    '''gets a random location on the 64x64 screen'''
    return [np.random.randint(0, 64), np.random.randint(0, 64)]

class Agent1(base_agent.BaseAgent):
    """An agent for doing a simple movement from one point to another."""
    def step(self, obs):
        '''step function gets called automatically by the pysc2 environment'''
        # call the parent class to have pysc2 setup rewards/etc for us
        super(Agent1, self).step(obs)
        # if we can move our army (we have something selected)
        if _MOVE_SCREEN in obs.observation['available_actions']:
            # get what the AI can see about the world
            ai_view = obs.observation['screen'][_AI_RELATIVE]
            # get the location of our marine in this world
            marine_x, marine_y = get_marine_location(ai_view)
            # if our marine is not on the screen do nothing.
            # this happens if we scroll away and look at a different
            # part of the world
            if not marine_x.any():
                return actions.FunctionCall(_NO_OP, [])
            target = get_rand_location([marine_x, marine_y])
            return actions.FunctionCall(_MOVE_SCREEN, [_NOT_QUEUED, target])
        # if we can't move, we haven't selected our army, so select our army
        else:
            return actions.FunctionCall(_SELECT_ARMY, [_SELECT_ALL])
Here is a video of the code above running; the AI moves randomly around the map. Run it with:
python -m pysc2.bin.agent --map MoveToBeacon --agent agent1.Agent1
The feature layers look like this:
Nothing crazy is going on here. You can see the main view with a marine (green) and the beacon (blue-grey). The marine just moves around randomly, as we told it to. On the right of the screen are all the different views our bot can see: the unit types on screen, a terrain height map, and so on. For more code/explanation of this part, check out the notebook.
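If you're curious exactly which feature layers the agent can observe, a quick way (using the same features module the agents above import) is to print each screen layer's index and name:
from pysc2.lib import features

# list every screen feature layer the agent can observe,
# e.g. player_relative, unit_type, selected, height_map
for feature in features.SCREEN_FEATURES:
    print(feature.index, feature.name)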
Another example of the feature layers, with more readable text:
Implementing a scripted AI
Now we want to do something better than random. In the MoveToBeacon mini-game the goal is to move to the beacon, so we'll write a scripted bot that does exactly that:
import numpy as np
from pysc2.agents import base_agent
from pysc2.lib import actions
from pysc2.lib import features
from pysc2.env import sc2_env, run_loop, available_actions_printer
from pysc2 import maps
from absl import flags

_AI_RELATIVE = features.SCREEN_FEATURES.player_relative.index
_NO_OP = actions.FUNCTIONS.no_op.id
_MOVE_SCREEN = actions.FUNCTIONS.Attack_screen.id
_SELECT_ARMY = actions.FUNCTIONS.select_army.id
_BACKGROUND = 0
_AI_SELF = 1
_AI_ALLIES = 2
_AI_NEUTRAL = 3
_AI_HOSTILE = 4
_SELECT_ALL = [0]
_NOT_QUEUED = [0]

def get_beacon_location(ai_relative_view):
    '''returns the location indices of the beacon on the map'''
    return (ai_relative_view == _AI_NEUTRAL).nonzero()

class Agent2(base_agent.BaseAgent):
    """An agent for doing a simple movement from one point to another."""
    def step(self, obs):
        '''Step function gets called automatically by the pysc2 environment'''
        super(Agent2, self).step(obs)
        if _MOVE_SCREEN in obs.observation['available_actions']:
            ai_view = obs.observation['screen'][_AI_RELATIVE]
            # get the beacon coordinates
            beacon_xs, beacon_ys = get_beacon_location(ai_view)
            if not beacon_ys.any():
                return actions.FunctionCall(_NO_OP, [])
            # get the middle of the beacon and move there
            # (nonzero() returns row, col indices; the screen target wants x, y, hence the swap)
            target = [beacon_ys.mean(), beacon_xs.mean()]
            return actions.FunctionCall(_MOVE_SCREEN, [_NOT_QUEUED, target])
        else:
            return actions.FunctionCall(_SELECT_ARMY, [_SELECT_ALL])
Here is a video of this bot in action; run it with:
python -m pysc2.bin.agent --map MoveToBeacon --agent agent2.Agent2 --save_replay True
You can then watch the replay in the StarCraft II game client:
As you can see, the scripted AI plays the beacon mini-game and moves the marine to the beacon. This scripted bot averages about 25 reward per episode over a run of 105 episodes. The reward reflects how many times our bot can reach the beacon before the mini-game timer runs out (120 seconds). Any AI we develop should be at least as good as this scripted bot, so a training run should average a score of 25. Next we'll use reinforcement learning to implement an actual AI, one that learns how to play.
Implementing a Q-Learning AI
This approach is a variant of a method called Q-Learning, which tries to learn a value called "quality" for every state of the game world, assigning higher quality to states that lead to more reward. We create a table (called a Q-table) with all possible states of the game world on the y-axis and all possible actions on the x-axis. The quality values are stored in this table and tell us which action to take in any possible state. Here's an example of a Q-table:
So when our AI has the marine selected but the marine is not on the beacon, state = (1, 0), it learns that moving onto the beacon (the action at index 3) has the highest value compared to the other actions in that state. When the marine is not selected and not on the beacon, state = (0, 0), it learns that selecting the marine (the action at index 1) has the highest value. And when the marine is selected and standing on the beacon, state = (1, 1), doing nothing is what has value.
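To make this concrete, here is a tiny Q-table sketch for the simplified states above; the numbers are made up for illustration, not values the agent actually learned:
import numpy as np

# rows: states (marine_selected, marine_on_beacon); columns: the 6 actions described below
states = [(0, 0), (1, 0), (1, 1)]
q_table = np.array([
    [0.0, 0.9, 0.0, 0.0, 0.0, 0.0],  # not selected, not on beacon -> select the army (index 1)
    [0.0, 0.0, 0.0, 0.9, 0.1, 0.2],  # selected, not on beacon -> move to the beacon (index 3)
    [0.5, 0.0, 0.0, 0.0, 0.0, 0.0],  # selected, on beacon -> do nothing (index 0)
])
state = (1, 0)
print(np.argmax(q_table[states.index(state)]))  # prints 3, the "move to beacon" action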
When we update the Q-table in the update_qtable function, we follow the standard Q-learning update (with learning rate α, the code's lr, and discount γ, the code's reward_decay):
Q(s, a) ← Q(s, a) + α * (r + γ * max_a' Q(s', a') - Q(s, a))
It basically says: compare our current estimate of the value of taking an action with the reward we actually got plus the discounted value of the best next action, then nudge the Q-value by a fraction of that difference so the error shrinks a little. Our AI takes in the state information and outputs an action to take. I've simplified the world state and the actions so that the Q-table is easier to learn and the code stays concise. Instead of hard-coding the logic to always move to the beacon, we give our agent a choice. We gave it 6 things it can do:
_NO_OP - do nothing.
_SELECT_ARMY - select the marine.
_SELECT_POINT - deselect the marine.
_MOVE_SCREEN - move to the beacon.
_MOVE_RAND - move to a random point that is not the beacon.
_MOVE_MIDDLE - move to a point in the middle of the map.
Here is the code for our Q-table agent, set up to use the pretrained table:
import math
import numpy as np
from pysc2.agents import base_agent
from pysc2.lib import actions
from pysc2.lib import features
from pysc2.env import sc2_env, run_loop, available_actions_printer
from pysc2 import maps
from absl import flags

_AI_RELATIVE = features.SCREEN_FEATURES.player_relative.index
_AI_SELECTED = features.SCREEN_FEATURES.selected.index
_NO_OP = actions.FUNCTIONS.no_op.id
_MOVE_SCREEN = actions.FUNCTIONS.Attack_screen.id
_SELECT_ARMY = actions.FUNCTIONS.select_army.id
_SELECT_POINT = actions.FUNCTIONS.select_point.id
# ids for the two custom "actions" that are not pysc2 functions
_MOVE_RAND = 1000
_MOVE_MIDDLE = 2000
_BACKGROUND = 0
_AI_SELF = 1
_AI_ALLIES = 2
_AI_NEUTRAL = 3
_AI_HOSTILE = 4
_SELECT_ALL = [0]
_NOT_QUEUED = [0]

# epsilon-greedy exploration schedule
EPS_START = 0.9
EPS_END = 0.025
EPS_DECAY = 2500

# global step counter used for epsilon decay while training
steps = 0

possible_actions = [
    _NO_OP,
    _SELECT_ARMY,
    _SELECT_POINT,
    _MOVE_SCREEN,
    _MOVE_RAND,
    _MOVE_MIDDLE
]

def get_eps_threshold(steps_done):
    return EPS_END + (EPS_START - EPS_END) * math.exp(-1. * steps_done / EPS_DECAY)

def get_state(obs):
    # the simplified state: (is the marine selected?, is the marine on the beacon?)
    ai_view = obs.observation['screen'][_AI_RELATIVE]
    beaconxs, beaconys = (ai_view == _AI_NEUTRAL).nonzero()
    marinexs, marineys = (ai_view == _AI_SELF).nonzero()
    marinex, mariney = marinexs.mean(), marineys.mean()
    marine_on_beacon = np.min(beaconxs) <= marinex <= np.max(beaconxs) and np.min(beaconys) <= mariney <= np.max(beaconys)
    ai_selected = obs.observation['screen'][_AI_SELECTED]
    marine_selected = int((ai_selected == 1).any())
    return (marine_selected, int(marine_on_beacon)), [beaconxs, beaconys]

class QTable(object):
    def __init__(self, actions, lr=0.01, reward_decay=0.9, load_qt=None, load_st=None):
        self.lr = lr
        self.actions = actions
        self.reward_decay = reward_decay
        self.states_list = set()
        self.load_qt = load_qt
        if load_st:
            temp = self.load_states(load_st)
            self.states_list = set([tuple(temp[i]) for i in range(len(temp))])
        if load_qt:
            self.q_table = self.load_qtable(load_qt)
        else:
            self.q_table = np.zeros((0, len(possible_actions)))  # create an empty Q table

    def get_action(self, state):
        # explore with probability epsilon, otherwise pick the best known action
        if not self.load_qt and np.random.rand() < get_eps_threshold(steps):
            return np.random.randint(0, len(self.actions))
        else:
            if state not in self.states_list:
                self.add_state(state)
            idx = list(self.states_list).index(state)
            q_values = self.q_table[idx]
            return int(np.argmax(q_values))

    def add_state(self, state):
        self.q_table = np.vstack([self.q_table, np.zeros((1, len(possible_actions)))])
        self.states_list.add(state)

    def update_qtable(self, state, next_state, action, reward):
        if state not in self.states_list:
            self.add_state(state)
        if next_state not in self.states_list:
            self.add_state(next_state)
        state_idx = list(self.states_list).index(state)
        next_state_idx = list(self.states_list).index(next_state)
        # Q-learning update: nudge Q(s,a) toward reward + gamma * max_a' Q(s',a')
        q_state = self.q_table[state_idx, action]
        q_next_state = self.q_table[next_state_idx].max()
        q_targets = reward + (self.reward_decay * q_next_state)
        loss = q_targets - q_state
        self.q_table[state_idx, action] += self.lr * loss
        return loss

    def get_size(self):
        print(self.q_table.shape)

    def save_qtable(self, filepath):
        np.save(filepath, self.q_table)

    def load_qtable(self, filepath):
        return np.load(filepath)

    def save_states(self, filepath):
        temp = np.array(list(self.states_list))
        np.save(filepath, temp)

    def load_states(self, filepath):
        return np.load(filepath)

class Agent3(base_agent.BaseAgent):
    def __init__(self, load_qt=None, load_st=None):
        super(Agent3, self).__init__()
        # load the pretrained table and its state index
        self.qtable = QTable(possible_actions, load_qt='agent3_qtable.npy', load_st='agent3_states.npy')

    def step(self, obs):
        '''Step function gets called automatically by the pysc2 environment'''
        super(Agent3, self).step(obs)
        state, beacon_pos = get_state(obs)
        action = self.qtable.get_action(state)
        func = actions.FunctionCall(_NO_OP, [])
        if possible_actions[action] == _NO_OP:
            func = actions.FunctionCall(_NO_OP, [])
        elif state[0] and possible_actions[action] == _MOVE_SCREEN:
            # move to the middle of the beacon
            beacon_x, beacon_y = beacon_pos[0].mean(), beacon_pos[1].mean()
            func = actions.FunctionCall(_MOVE_SCREEN, [_NOT_QUEUED, [beacon_y, beacon_x]])
        elif possible_actions[action] == _SELECT_ARMY:
            func = actions.FunctionCall(_SELECT_ARMY, [_SELECT_ALL])
        elif state[0] and possible_actions[action] == _SELECT_POINT:
            # deselect the marine by clicking a random background point
            ai_view = obs.observation['screen'][_AI_RELATIVE]
            backgroundxs, backgroundys = (ai_view == _BACKGROUND).nonzero()
            point = np.random.randint(0, len(backgroundxs))
            backgroundx, backgroundy = backgroundxs[point], backgroundys[point]
            func = actions.FunctionCall(_SELECT_POINT, [_NOT_QUEUED, [backgroundy, backgroundx]])
        elif state[0] and possible_actions[action] == _MOVE_RAND:
            # move to a random point that is not the beacon
            beacon_x, beacon_y = beacon_pos[0].max(), beacon_pos[1].max()
            movex, movey = np.random.randint(beacon_x, 64), np.random.randint(beacon_y, 64)
            func = actions.FunctionCall(_MOVE_SCREEN, [_NOT_QUEUED, [movey, movex]])
        elif state[0] and possible_actions[action] == _MOVE_MIDDLE:
            func = actions.FunctionCall(_MOVE_SCREEN, [_NOT_QUEUED, [32, 32]])
        return func
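One detail worth noting in the code above is the epsilon-greedy exploration schedule: while training (when no pretrained table is loaded), get_action picks a random action with probability get_eps_threshold(steps), which decays from EPS_START toward EPS_END as the step count grows. A quick illustration of how fast that probability falls off:
import math

EPS_START, EPS_END, EPS_DECAY = 0.9, 0.025, 2500

def get_eps_threshold(steps_done):
    return EPS_END + (EPS_START - EPS_END) * math.exp(-1. * steps_done / EPS_DECAY)

# exploration probability at a few step counts (approximate values)
for s in (0, 1000, 2500, 5000, 10000):
    print(s, round(get_eps_threshold(s), 3))
# roughly: 0.9, 0.612, 0.347, 0.143, 0.041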
Download the two files (agent3_qtable.npy, agent3_states.npy) and run this agent with the pretrained table:
python -m pysc2.bin.agent --map MoveToBeacon --agent agent3.Agent3
This AI can match the 25 reward per episode that our scripted AI gets over a training run. It tries lots of different moves around the map and notices that in states where the marine's location and the beacon's location overlap, it gets a reward; it then tries, in every state, to take the actions that lead to that outcome and so maximize its reward. Here is a video of the AI playing early in training:
And here is a video once it has learned that moving to the beacon yields a reward:
You can also train your own AI using the notebook code.
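If you do train your own table, remember to save it afterwards so Agent3 can reload it from the same file names (a small sketch assuming the QTable class above; agent here is your trained agent instance):
# after a training run, persist the learned Q-table and its state index
agent.qtable.save_qtable('agent3_qtable.npy')
agent.qtable.save_states('agent3_states.npy')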
Conclusion and future plans
In this post I wanted to show you three ways of programming AI behavior: random, scripted, and a Q-Learning AI.
As Martin Anward of Paradox Development Studio put it: "machine learning for complex games is mostly science fiction at this point." I agree with some of what Martin says, but I still think machine learning has potential in games. Part of it comes down to how good AI is already made with weighted lists; a neural network is the same kind of thing as a weighted list, except that it is learned. The hardest part, and here Martin is right, is that complex interactions are difficult for a computer to piece together and reason about.
Here is our Q-Learning beacon AI again, this time playing at normal speed:
在這篇文章中林艘,我堅(jiān)持使用一個(gè)迷你游戲盖奈,因?yàn)槲蚁胍恍┮子趯?shí)驗(yàn)和編程的東西。pysc2足夠復(fù)雜狐援,可以工作钢坦。我們還沒有訓(xùn)練我們的Q學(xué)習(xí)代理來識(shí)別并移動(dòng)到beacon(我們只是將其作為一個(gè)選項(xiàng))究孕。這是可能的,但超出了本文介紹的范圍爹凹。在未來厨诸,我們將做一個(gè)類似DQN的Deepmind的論文,也許可以解決這個(gè)更復(fù)雜的任務(wù)禾酱。
Finally, please subscribe to Generation Machine. The code and the notebook are here.