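Taxi driver pickup game: introduction
In this game, the yellow block is the taxi, a "|" is a wall, the blue letter marks where the passenger is picked up, and the purple letter marks where the passenger is dropped off. The taxi turns green while it is carrying a passenger. As the taxi driver, your job is to find the fastest route to pick up and drop off the passenger.
Passenger picked up
Passenger dropped off, done!
Approach:
The pickup and drop-off locations are chosen at random, so a fixed, hand-coded route cannot cover every case. Reinforcement learning can instead learn the best route for each situation.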
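For reference, the standard tabular Q-learning update is sketched below. The symbols (learning rate α, discount factor γ) are the usual textbook notation rather than names from this post. The taxi code that follows uses the special case α = 1 with γ = 0.75, which simply overwrites the table entry each time.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
\qquad\text{and with } \alpha = 1:\qquad
Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')
```

Overwriting with α = 1 is reasonable here because the taxi environment is deterministic: the same state and action always lead to the same reward and next state.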
Code:
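import gym
import numpy as np

# Written against the classic gym API: reset() returns the observation
# and step() returns four values.
# 'Taxi-v2' is the id in older gym releases; newer ones register 'Taxi-v3'.
env = gym.make('Taxi-v2')
# One row per discrete state, one column per action.
Q = np.zeros((env.observation_space.n, env.action_space.n))

def trainQ():
    for _ in range(10000):
        observation = env.reset()
        while True:
            # Explore with uniformly random actions.
            action = env.action_space.sample()
            observation_, reward, done, info = env.step(action)
            # Overwrite-style Q-learning update (learning rate 1, discount 0.75).
            Q[observation, action] = reward + 0.75 * Q[observation_].max()
            observation = observation_
            if done:
                break
    return Q

def findway():
    observation = env.reset()
    rewards = 0
    while True:
        # Always take the greedy action from the learned table.
        action = Q[observation].argmax()
        observation_, reward, done, info = env.step(action)
        print(observation_, reward, done, info)
        rewards += reward
        observation = observation_
        env.render()
        if done:
            print(rewards)
            break

Q = trainQ()
findway()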
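To see the same training idea without installing gym, here is a minimal sketch on a made-up environment. The corridor, its rewards, and the names `step`, `train`, `N_STATES`, and `GOAL` are all assumptions for illustration and are not part of the taxi game. It uses the same random exploration and the same overwrite-style update as the code above.

```python
import numpy as np

# Hypothetical toy environment (not part of gym): a corridor of 5 cells.
# The agent starts in cell 0; entering cell 4 ends the episode with reward +20.
# Every other move costs -1.
N_STATES, N_ACTIONS = 5, 2          # actions: 0 = left, 1 = right
GOAL = N_STATES - 1

def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == GOAL
    reward = 20 if done else -1
    return next_state, reward, done

def train(episodes=200, gamma=0.75, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = int(rng.integers(N_ACTIONS))   # purely random exploration
            next_state, reward, done = step(state, action)
            # Overwrite-style update (learning rate 1), as in the taxi code.
            Q[state, action] = reward + gamma * Q[next_state].max()
            state = next_state
    return Q

Q = train()
# Greedy action for each non-terminal cell; 1 means "move right".
greedy_policy = Q[:GOAL].argmax(axis=1)
print(greedy_policy)
```

After training, the greedy policy moves right in every cell, which is the shortest route to the goal. The taxi table is learned the same way, only with many more states and six actions.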
Test:
The passenger is here
Passenger picked up
Driving
Driving
Driving
Passenger dropped off, job done!
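Inverted pendulum (CartPole) game: introduction
Approach:
The inverted pendulum game is harder than the taxi game because its state is continuous, so there are infinitely many possible states, while tabular Q-learning needs a finite number of states to store what it has learned.
Solution: split the continuous state into a finite set of discrete states.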
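As a small sketch of what this discretization looks like, the snippet below bins a single pole angle and then packs four bin indices into one table row. The `bins` helper matches the one used in the code that follows; `encode` and the sample values are illustrative assumptions.

```python
import numpy as np

def bins(clip_min, clip_max, num):
    # num equal-width intervals need num - 1 interior edges.
    return np.linspace(clip_min, clip_max, num + 1)[1:-1]

# Interior edges for the pole angle: -0.25, 0.0, 0.25
angle_edges = bins(-0.5, 0.5, 4)

# np.digitize returns the index of the interval each value falls into.
# Values outside [-0.5, 0.5] land in the first or last interval.
angle_bins = [int(np.digitize(a, angle_edges)) for a in (-0.9, -0.1, 0.1, 0.4)]
print(angle_bins)   # [0, 1, 2, 3]

def encode(digits, base=4):
    # Hypothetical helper: treat the four bin indices as base-4 digits,
    # giving a single integer in the range 0..255.
    return sum(d * base ** i for i, d in enumerate(digits))

print(encode([1, 2, 1, 2]))   # 1 + 2*4 + 1*16 + 2*64 = 153
print(encode([3, 3, 3, 3]))   # 255, the last row of a 256-row table
```

With four variables and four intervals each there are 4^4 = 256 discrete states, which is why the Q-table in the code has 256 rows.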
Code:
import gym
import numpy as np

env = gym.make('CartPole-v0')
epsilon = 0.01
# 4 variables x 4 intervals each = 4**4 = 256 discrete states, 2 actions.
q_table = np.zeros((256, 2))

def bins(clip_min, clip_max, num):
    # Interior edges that split [clip_min, clip_max] into num intervals.
    return np.linspace(clip_min, clip_max, num + 1)[1:-1]

def digitize_state(observation):
    cart_pos, cart_v, pole_angle, pole_v = observation
    digitized = [np.digitize(cart_pos, bins=bins(-2.4, 2.4, 4)),
                 np.digitize(cart_v, bins=bins(-3.0, 3.0, 4)),
                 np.digitize(pole_angle, bins=bins(-0.5, 0.5, 4)),
                 np.digitize(pole_v, bins=bins(-2.0, 2.0, 4))]
    # Pack the four bin indices into one integer in 0..255 (base 4).
    return sum([x * (4 ** i) for i, x in enumerate(digitized)])

#------------ Observation phase --------------#
# Act randomly and fill the table with a SARSA-style update (learning rate 1).
for _ in range(1000):
    observation = env.reset()
    s = digitize_state(observation)
    while True:
        action = env.action_space.sample()
        observation_, reward, done, info = env.step(action)
        if done:
            reward = -200   # heavy penalty when the episode ends
        s_ = digitize_state(observation_)
        action_ = env.action_space.sample()
        q_table[s, action] = reward + 0.85 * q_table[s_, action_]
        s, action = s_, action_
        if done:
            break
print('Observation phase finished')

#------------ Greedy-policy phase --------------#
for episode in range(1000):
    observation = env.reset()
    s = digitize_state(observation)
    while True:
        # Here epsilon is the probability of acting greedily;
        # it grows from 0 towards 1 as training progresses.
        epsilon = episode / 1000
        action = q_table[s].argmax() if np.random.random() < epsilon else env.action_space.sample()
        observation_, reward, done, info = env.step(action)
        if done:
            reward = -200
        s_ = digitize_state(observation_)
        action_ = q_table[s_].argmax() if np.random.random() < epsilon else env.action_space.sample()
        q_table[s, action] = reward + 0.85 * q_table[s_, action_]
        s, action = s_, action_
        if done:
            break
print('Greedy-policy phase finished')

#------------ Validation phase --------------#
scores = []
for _ in range(100):
    score = 0
    observation = env.reset()
    s = digitize_state(observation)
    while True:
        action = q_table[s].argmax()
        observation_, reward, done, info = env.step(action)
        score += reward
        s = digitize_state(observation_)
        #env.render()
        if done:
            scores.append(score)
            break
# Reports the best score over the 100 validation episodes.
print('Validation phase finished\nValidation score: %s' % np.max(scores))
Test:
Observation phase finished
Greedy-policy phase finished
Validation phase finished
Validation score: 200.0
Inverted pendulum
Notes:
The discretization function was originally:
def digitize_state(observation, bin=5):
    # Rows of [low, high] taken straight from the environment's declared bounds.
    # Note: the declared velocity bounds are extremely large.
    high_low = np.vstack([env.observation_space.high, env.observation_space.low]).T[:, ::-1]
    bins = np.vstack([np.linspace(*i, bin) for i in high_low])
    state = [np.digitize(state, bin).tolist() for state, bin in zip(observation, bins)]
    state = sum([value * 2 ** index for index, value in enumerate(state)])
    return state
It did not work well, so I replaced it with a better one found online:
def bins(clip_min, clip_max, num):
    return np.linspace(clip_min, clip_max, num + 1)[1:-1]

def digitize_state(observation):
    cart_pos, cart_v, pole_angle, pole_v = observation
    digitized = [np.digitize(cart_pos, bins=bins(-2.4, 2.4, 4)),
                 np.digitize(cart_v, bins=bins(-3.0, 3.0, 4)),
                 np.digitize(pole_angle, bins=bins(-0.5, 0.5, 4)),
                 np.digitize(pole_v, bins=bins(-2.0, 2.0, 4))]
    return sum([x * (4 ** i) for i, x in enumerate(digitized)])
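The lesson is that doing reinforcement learning well still requires a rough understanding of what each state variable means. If you feed everything to the agent without regard to its meaning, the agent will not perform well.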
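One likely reason for the difference can be sketched without gym. The snippet assumes the declared velocity bound is on the order of the float32 maximum, which is what older gym releases report for CartPole; `huge` is a stand-in for that bound, and the sample velocities are made up.

```python
import numpy as np

# Assumption: stand-in for the velocity bound from env.observation_space.high.
huge = float(np.finfo(np.float32).max)

# Bin edges spread over the declared bounds versus a hand-picked range.
naive_edges = np.linspace(-huge, huge, 5)
clipped_edges = np.linspace(-3.0, 3.0, 5)[1:-1]

# Made-up velocities in the range a cart actually reaches.
velocities = np.array([-2.0, -0.5, 0.5, 2.0])

naive_bins = np.digitize(velocities, naive_edges)
clipped_bins = np.digitize(velocities, clipped_edges)

print(naive_bins)     # realistic values collapse into at most two bins
print(clipped_bins)   # [0 1 2 3]: each value gets its own bin
```

With edges spread over the declared bounds, the agent can tell at most whether the velocity is negative or positive. Choosing the range from what the variable actually does in practice gives the table states that are worth distinguishing.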