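1. Components
OpenAI Gym consists of two parts:
- The gym open-source library: a collection of test problems. When you test a reinforcement learning algorithm, the test problem is the environment. For example, when a robot plays a game, the environment is the game screen. These environments share a common interface, which lets users write general-purpose algorithms.
- The OpenAI Gym service: a site (for example, for the game CartPole-v0: https://gym.openai.com/envs/CartPole-v0) and an API that let users compare their test results.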
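2. Interface
The core interface of gym is Env, which serves as the unified environment interface. Env contains the following core functions:
- reset(self): resets the state of the environment and returns an observation.
- step(self, action): the physics engine. Advances the environment by one time step and returns observation, reward, done, info.
- render(self, mode='human', close=False): the graphics engine. Redraws one frame of the environment. The default mode is usually human-friendly, for example popping up a window.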
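The reset/step contract described above can be sketched without installing gym. In the example below, CoinFlipEnv and run_episode are hypothetical names invented for illustration; they are not part of gym and only mimic the return values listed in the text (an observation from reset, and observation, reward, done, info from step).

```python
import random


class CoinFlipEnv:
    """Hypothetical stand-in environment (not part of gym).

    The observation is the current coin face (0 or 1). The agent is
    rewarded with 1.0 when its action matches that face.
    """

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = 0

    def reset(self):
        # Reset the state and return the initial observation.
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        # Score the action against the current state, then advance one step.
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = self.rng.randint(0, 1)
        done = self.steps >= self.episode_length
        info = {"steps": self.steps}
        return self.coin, reward, done, info


def run_episode(env, policy):
    """Run one episode: reset once, then call step until done is True."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    info = {}
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward, info


# A policy that simply echoes the observation always matches the coin.
total, info = run_episode(CoinFlipEnv(), policy=lambda obs: obs)
print(total, info)
```

Because the loop only relies on reset and step, the same run_episode function should work with any object that follows this contract, which is the point of having a shared interface.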
3. Registering your own simulator
- The goal is to register your own environment in the registry. Suppose you have defined your environment in the following structure:
myenv/
    __init__.py
    myenv.py
- myenv.py contains the class for our own environment. In __init__.py, enter the following code:
from gym.envs.registration import register
register(
    id='MyEnv-v0',
    # The first "myenv" is the folder name, the second "myenv" is the file name,
    # and "MyEnv" is the name of the class inside that file.
    entry_point='myenv.myenv:MyEnv',
)
- To use our own environment:
import gym
import myenv  # Remember to import your own environment; this step is easy to forget
env = gym.make('MyEnv-v0')
- Put the myenv directory on PYTHONPATH, or start python from its parent directory.
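To make the 'package.module:ClassName' form of entry_point more concrete, here is a rough sketch of how such a string can be resolved and how a registry keyed by id might work. This is not gym's actual implementation: load_entry_point, registry, register and make below are hypothetical helpers written for illustration, and a standard-library class stands in for an environment class so the example runs anywhere.

```python
import importlib

# Hypothetical minimal registry: maps an environment id to its entry point string.
registry = {}


def load_entry_point(entry_point):
    """Resolve 'package.module:ClassName' to the class object."""
    module_name, _, attr_name = entry_point.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def register(env_id, entry_point):
    """Record the entry point under the given id (assumed to be unique)."""
    if env_id in registry:
        raise ValueError("id already registered: " + env_id)
    registry[env_id] = entry_point


def make(env_id, **kwargs):
    """Look up the id, import the class lazily, and instantiate it."""
    cls = load_entry_point(registry[env_id])
    return cls(**kwargs)


# collections.Counter is only a stand-in for an environment class here.
register(env_id="Counter-v0", entry_point="collections:Counter")
counter = make("Counter-v0")
print(type(counter).__name__)
```

In this sketch nothing is registered until register is called, which mirrors why the import of myenv matters in the usage above: the register call in __init__.py runs as a side effect of importing the package.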
Directory structure:
myenv/
    __init__.py
    my_hotter_colder.py
-------------------
__init__.py file:
-------------------
from gym.envs.registration import register
register(
    id='MyHotterColder-v0',
    entry_point='myenv.my_hotter_colder:MyHotterColder',
)
-------------------
my_hotter_colder.py file:
-------------------
import gym
from gym import spaces
from gym.utils import seeding
import numpy as np
class MyHotterColder(gym.Env):
    """Hotter Colder

    The goal of hotter colder is to guess closer to a randomly selected number.

    After each step the agent receives an observation of:
    0 - No guess yet submitted (only after reset)
    1 - Guess is lower than the target
    2 - Guess is equal to the target
    3 - Guess is higher than the target

    The reward is calculated as:
    ((min(action, self.number) + self.bounds) / (max(action, self.number) + self.bounds)) ** 2

    Ideally an agent will be able to recognise the 'scent' of a higher reward and
    increase the rate at which it guesses in that direction until the reward reaches
    its maximum.
    """

    def __init__(self):
        self.range = 1000  # +/- value the randomly selected number can be between
        self.bounds = 2000  # Action space bounds
        self.action_space = spaces.Box(low=np.array([-self.bounds]), high=np.array([self.bounds]))
        self.observation_space = spaces.Discrete(4)
        self.number = 0
        self.guess_count = 0
        self.guess_max = 200  # Maximum number of guesses per episode
        self.observation = 0
        self.seed()
        self.reset()

    def seed(self, seed=None):
        # Create the environment's own random number generator
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def step(self, action):
        assert self.action_space.contains(action)
        # Encode whether the guess is below, equal to, or above the target
        if action < self.number:
            self.observation = 1
        elif action == self.number:
            self.observation = 2
        elif action > self.number:
            self.observation = 3
        # The reward approaches 1 as the guess gets closer to the target
        reward = ((min(action, self.number) + self.bounds) / (max(action, self.number) + self.bounds)) ** 2
        self.guess_count += 1
        done = self.guess_count >= self.guess_max
        return self.observation, reward[0], done, {"number": self.number, "guesses": self.guess_count}

    def reset(self):
        # Draw a new target number and clear the per-episode state
        self.number = self.np_random.uniform(-self.range, self.range)
        self.guess_count = 0
        self.observation = 0
        return self.observation
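To see how the observation codes and the reward behave, the following dependency-free sketch re-implements only the comparison and reward arithmetic from step, and drives it with a simple bisection strategy. The helper names observe, reward and bisect_guesses are hypothetical and not part of the environment; the target number 500 is an arbitrary value chosen for illustration, and scalar floats are used in place of the one-element arrays the real environment receives.

```python
def observe(guess, number):
    """Return the observation code: 1 = too low, 2 = equal, 3 = too high."""
    if guess < number:
        return 1
    if guess == number:
        return 2
    return 3


def reward(guess, number, bounds=2000):
    """Same formula as in step: the squared ratio of the shifted values."""
    return ((min(guess, number) + bounds) / (max(guess, number) + bounds)) ** 2


def bisect_guesses(number, low=-2000.0, high=2000.0, max_guesses=200):
    """Halve the search interval using the observation code after each guess."""
    history = []
    for _ in range(max_guesses):
        guess = (low + high) / 2
        history.append((guess, reward(guess, number)))
        code = observe(guess, number)
        if code == 1:
            low = guess   # guess was too low, search the upper half
        elif code == 3:
            high = guess  # guess was too high, search the lower half
        else:
            break         # exact hit
    return history


history = bisect_guesses(number=500.0)
for guess, r in history:
    print(guess, round(r, 4))
```

For this target the guesses are 0, 1000 and 500, and the reward reaches exactly 1.0 on the final guess. Note that the reward is not symmetric around the target, so an agent relying on the reward alone would see different values for guesses that are equally far away on either side.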