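```python
print('start')
array = [1, 2, 3]

def search(s):
    print("s is %s" % s)
    if s not in array:
        # Base case: s is not in array, so stop recursing and return early
        v = s + 10
        print("returning early, v: %s" % v)
        return -v
    # Recursive case: look at the next value, then negate what comes back
    next_s = s + 1
    v = search(next_s)
    print('reached the end, v: %s s: %s' % (v, s))
    return -v

for x in range(5):
    value = search(x)
    print("v is %s" % value)
print('end')
```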
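The prints above show the calls going down and the values coming back up. To make what the recursion computes explicit, here is a sketch of my own (not from the original code; `search_recursive` and `search_iterative` are hypothetical names). It puts the same logic without prints next to an iterative version. The iterative version walks forward while the value is in `array`, builds the base-case value, and then flips its sign once for every level the recursion would have to unwind:

```python
array = [1, 2, 3]

def search_recursive(s):
    # Same logic as search() above, minus the prints
    if s not in array:
        return -(s + 10)
    return -search_recursive(s + 1)

def search_iterative(s):
    # Count how many consecutive values starting at s are in array
    depth = 0
    while s in array:
        s += 1
        depth += 1
    # Base case: the first value not in array yields -(s + 10)
    result = -(s + 10)
    # Each enclosing call negates what it receives, once per level
    if depth % 2 == 1:
        result = -result
    return result

for x in range(5):
    print(x, search_recursive(x), search_iterative(x))
```

Both columns agree. Starting from 1, the recursion goes three levels deep (1, 2, 3) before hitting 4, so the sign of -14 is flipped three times.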
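```python
import gym

env = gym.make('CartPole-v0')
for i_episode in range(20):
    # Pre-0.26 gym API: reset() returns only the observation
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        # Random action: 0 pushes the cart left, 1 pushes it right
        action = env.action_space.sample()
        # Pre-0.26 gym API: step() returns 4 values
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
env.close()
```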
```text
Observation:
    Type: Box(4)
    Num   Observation            Min     Max
    0     Cart Position          -4.8    4.8
    1     Cart Velocity          -Inf    Inf
    2     Pole Angle             -24°    24°
    3     Pole Velocity At Tip   -Inf    Inf
```
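The `observation` printed in the loop is just these four numbers, in this order. Below is a small sketch that gives them names. `CartPoleObs` is a hypothetical helper, not part of gym, and it assumes the order in the table above. The sketch also converts the ±24° bound to radians, because the angle the environment actually reports is in radians:

```python
import math
from typing import NamedTuple, Sequence

class CartPoleObs(NamedTuple):
    # Field order follows the Num column of the Observation table
    cart_position: float
    cart_velocity: float
    pole_angle: float            # in radians
    pole_velocity_at_tip: float

    @classmethod
    def from_sequence(cls, values: Sequence[float]) -> "CartPoleObs":
        # Accepts a list or a 1-D array of length 4
        if len(values) != 4:
            raise ValueError("expected 4 values, got %d" % len(values))
        return cls(*(float(v) for v in values))

# The table states the pole angle bound in degrees
POLE_ANGLE_LIMIT_RAD = math.radians(24)

obs = CartPoleObs.from_sequence([0.02, -0.15, 0.03, 0.27])
print(obs.pole_angle, round(POLE_ANGLE_LIMIT_RAD, 3))
```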
And then I wondered: what are the termination conditions in the first place?
Looking at the source code, those turned out to be written there too.
```text
Episode Termination:
    Pole Angle is more than ±12°
    Cart Position is more than ±2.4 (center of the cart reaches the edge of the display)
    Episode length is greater than 200
```
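The three rules translate almost directly into code. This is a sketch only: `is_episode_over` is a hypothetical helper, not gym's implementation. It takes the pole angle in radians, as the environment reports it. The length cap is applied separately from the pole and cart checks. Here I assume it ends the episode as soon as 200 steps have been taken, so an episode never runs longer than 200 steps:

```python
import math

# Thresholds taken from the "Episode Termination" text above
X_THRESHOLD = 2.4                    # cart position limit
THETA_THRESHOLD = math.radians(12)   # pole angle limit, converted to radians
MAX_STEPS = 200                      # episode length cap

def is_episode_over(cart_position, pole_angle, steps):
    """Hypothetical helper: True if any termination rule applies.

    pole_angle is in radians; steps is the number of steps taken so far.
    """
    pole_fell = abs(pole_angle) > THETA_THRESHOLD
    cart_left_screen = abs(cart_position) > X_THRESHOLD
    # Assumption: the cap ends the episode once MAX_STEPS steps are taken
    too_long = steps >= MAX_STEPS
    return pole_fell or cart_left_screen or too_long

print(is_episode_over(0.0, math.radians(13), 10))
```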
```text
Solved Requirements:
    Considered solved when the average reward is greater than or equal to 195.0 over 100 consecutive trials.
```
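In CartPole every step the pole stays up earns a reward of 1, so an episode's total reward equals its length. The "solved" criterion can therefore be checked from a list of per-episode totals. Below is a sketch, where `is_solved` is a hypothetical helper. It reads "100 consecutive trials" as any window of 100 back-to-back episodes and keeps a sliding sum over that window:

```python
from typing import Sequence

def is_solved(episode_rewards: Sequence[float],
              window: int = 100,
              threshold: float = 195.0) -> bool:
    """Hypothetical helper: True if some run of `window` consecutive
    episodes has an average total reward >= threshold."""
    if len(episode_rewards) < window:
        return False  # not enough episodes yet
    running = sum(episode_rewards[:window])
    if running / window >= threshold:
        return True
    # Slide the window forward one episode at a time
    for i in range(window, len(episode_rewards)):
        running += episode_rewards[i] - episode_rewards[i - window]
        if running / window >= threshold:
            return True
    return False

print(is_solved([200] * 100), is_solved([194] * 100))
```

The sliding sum avoids re-adding 100 numbers for every new episode. Checking only the most recent 100 episodes would be an equally reasonable reading of the requirement.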