Understanding how OpenAI gym works
![](https://cdn.prod.website-files.com/6706802514ffa549d0bf0b8a/675d87862fc54945bb4f701a_0_DAzAlmItIpU2KMDw.webp)
Photo by Ryan Quintal on Unsplash
Over the past few years, OpenAI has grabbed the attention and awe of many IT professionals. It has a large community working toward human-like intelligence through reinforcement learning, and it gives that community a gym to train in.
The gym is a laboratory for engineers who want to train diverse reinforcement learning agents. It offers varied simulated environments, from classic games to robotics. In this article, we’ll use one such environment to train an RL agent from scratch.
Prerequisites
Before moving to code, we need to learn some important terms.
Learning in RL follows a simple loop between the agent and the environment: the agent observes its current state from the environment and, based on that state, performs an action. The environment responds with a reward and the agent’s new state. Over an episode, the agent tries to maximize the total reward it collects.
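To make this loop concrete, here is a minimal sketch of the interaction cycle using the classic gym API (reset() returning an observation, step() returning observation, reward, done, info), with a purely random agent standing in for the decision-maker:
import gym

env = gym.make('Breakout-ram-v0')
state = env.reset()                      # initial state from the environment
total_reward = 0
done = False
while not done:
    action = env.action_space.sample()   # a random action, no learning yet
    state, reward, done, info = env.step(action)  # new state and reward come back
    total_reward += reward               # the agent tries to maximize this over the episode
env.close()
print('Episode reward:', total_reward)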
The gym’s environments provide all of these pieces (states, actions, and rewards). The agent itself is built from one of many available algorithms, chosen with the kind of behaviour we want to learn in mind, for example DQN or DDPG.
In our example, we’ll be using the DQN algorithm, which works well with most OpenAI gym environments. Its main limitation is that it does not support continuous action spaces.
We’ll be training our model on Breakout-ram-v0. There are hundreds of such games and environments to choose from in OpenAI gym.
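As a quick check (a hedged sketch; the registry API shown assumes a classic gym release), you can list the registered environments and confirm that Breakout-ram-v0 has a discrete action space, which is what DQN expects:
import gym
from gym import envs

# List a few registered environment ids
specs = list(envs.registry.all())
print(len(specs), 'environments available')
print([spec.id for spec in specs[:5]])

# Inspect the spaces of the environment we will use
env = gym.make('Breakout-ram-v0')
print(env.observation_space)   # the 128-byte Atari RAM state
print(env.action_space)        # Discrete(n): a fixed set of actions, suitable for DQN
env.close()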
You can install the gym and keras-rl Python libraries with pip using the following command:
pip install gym keras-rl
First, we’ll start with importing libraries.
import numpy as np
import gym

from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory
We need to feed a deep neural network (DNN) to the DQNAgent so that our otherwise random decision-maker has a brain. The DNN will be built with the Keras library.
ENV_NAME = 'Breakout-ram-v0'

# Get the environment
env = gym.make(ENV_NAME)
np.random.seed(33)
env.seed(33)

# Extract the number of actions available in Breakout, i.e. left and right
nb_actions = env.action_space.n
For the RL agent itself, the keras-rl library is used. EpsGreedyQPolicy is imported as the agent’s exploration policy, and SequentialMemory acts as the replay buffer that stores past state-action transitions so the agent can learn from them later.
The snippet above loads the environment into gym, sets the random seeds so runs are reproducible, and extracts the number of actions available in the environment.
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
print(model.summary())
This creates the DNN model to pass into DQNAgent. A bigger network can approximate the value function more accurately, but before increasing its size, keep memory in mind: a bigger network needs more memory and compute to store and update its weights.
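For illustration, here is a hedged sketch of a slightly larger variant of the same network; the extra layers may learn a better value function at the cost of more memory and compute (the layer sizes here are arbitrary, not tuned values):
# A larger (and more memory-hungry) alternative to the model above
bigger_model = Sequential()
bigger_model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
bigger_model.add(Dense(64))
bigger_model.add(Activation('relu'))
bigger_model.add(Dense(64))
bigger_model.add(Activation('relu'))
bigger_model.add(Dense(nb_actions))
bigger_model.add(Activation('linear'))
print(bigger_model.summary())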
# Using Epsilon Greedy Policy
policy = EpsGreedyQPolicy()

# Using Sequential memory with limit of 50000
memory = SequentialMemory(limit=50000, window_length=1)

# Initializing DQNAgent
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               nb_steps_warmup=10, target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

# Fit data to the model we initialized.
dqn.fit(env, nb_steps=30000, visualize=True, verbose=2)
We first use the epsilon-greedy policy to give the agent its rule for balancing exploration and exploitation. SequentialMemory is used as the replay buffer with a limit of 50,000 transitions. Then DQNAgent is initialized with the DNN model, the policy, the memory, and a few other parameters.
Finally, fit() runs the agent against the environment and starts training. Here we also visualize the game while training, which helps with understanding but slows the learning process and consumes extra memory.
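After a training run, you will usually want to evaluate the agent and keep its weights. A minimal sketch using keras-rl’s test() and save_weights() (the file name here is just an example):
# Evaluate the trained agent for a few episodes
dqn.test(env, nb_episodes=5, visualize=True)

# Save the learned weights so training does not have to start from scratch next time
dqn.save_weights('dqn_{}_weights.h5f'.format(ENV_NAME), overwrite=True)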
env.close()
Once training is done, this line closes the environment and the visualization window, freeing the memory the environment and the training loop were holding.
In this article, we’ve briefly covered value-based reinforcement learning with DQN. Algorithms of this type cannot handle continuous action spaces; instead, DDPG is used for environments with a continuous action space. I’ll be covering the DDPG algorithm in a separate article.
The gym has various continuous environments to train a model on; the MuJoCo and Robotics suites contain such environments.
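To see what a continuous action space looks like without installing MuJoCo, here is a small illustrative check (Pendulum-v0 is just an example environment, not one used above); it exposes a Box action space, which is what DDPG-style algorithms handle:
import gym

env = gym.make('Pendulum-v0')
print(env.action_space)                          # Box(...): continuous torque values, not a discrete set
print(env.action_space.low, env.action_space.high)  # the valid range of each action dimension
env.close()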
In conclusion, OpenAI gym is very useful for beginning and intermediate reinforcement learning developers. Moreover, researchers can use the gym to test multiple models and find the best-performing one.
Additionally, gym is an open-source library, which makes it easier for everyone to keep up with developments in RL and learn at the same time.
Also read: A Beginner's Guide to Understanding Deep Q-Learning.