Understanding how OpenAI gym works
![](https://cdn.prod.website-files.com/6706802514ffa549d0bf0b8a/675d87862fc54945bb4f701a_0_DAzAlmItIpU2KMDw.webp)
Photo by Ryan Quintal on Unsplash
Over the past few years, OpenAI has grabbed the attention and awe of many IT professionals. It has a large community working toward human-like intelligence through reinforcement learning, and it gives that community a gym to train in.
The gym is a laboratory for engineers who want to train diverse reinforcement learning agents. It offers varied simulated environments, from classic games to robotics. In this article, we’ll use one such environment to train an RL agent from scratch.
Prerequisites
Before moving to code, we need to learn some important terms.
Learning in RL follows a simple loop between the agent and the environment: the agent observes its current state from the environment and, based on that state, performs an action. The environment responds with a reward and the agent’s new state. Over an episode, the agent tries to maximize the total reward it collects.
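To make this loop concrete, here is a minimal sketch of the interaction cycle using the classic gym API (reset() returning an observation, step() returning observation, reward, done, info), with a purely random agent standing in for the decision-maker:
import gym

env = gym.make('Breakout-ram-v0')
state = env.reset()                      # initial state from the environment
total_reward = 0
done = False
while not done:
    action = env.action_space.sample()   # a random action, no learning yet
    state, reward, done, info = env.step(action)  # new state and reward come back
    total_reward += reward               # the agent tries to maximize this over the episode
env.close()
print('Episode reward:', total_reward)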
The gym’s environments provide all of these pieces (states, actions, and rewards). The agent itself is built from one of many available algorithms, chosen with the kind of behaviour we want to learn in mind, for example DQN or DDPG.
In our example, we’ll be using the DQN algorithm, which works well with most OpenAI gym environments. Its main limitation is that it does not support continuous action spaces.
We’ll be training our model on Breakout-ram-v0. There are hundreds of such games and environments to choose from in OpenAI gym.
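As a quick check (a hedged sketch; the registry API shown assumes a classic gym release), you can list the registered environments and confirm that Breakout-ram-v0 has a discrete action space, which is what DQN expects:
import gym
from gym import envs

# List a few registered environment ids
specs = list(envs.registry.all())
print(len(specs), 'environments available')
print([spec.id for spec in specs[:5]])

# Inspect the spaces of the environment we will use
env = gym.make('Breakout-ram-v0')
print(env.observation_space)   # the 128-byte Atari RAM state
print(env.action_space)        # Discrete(n): a fixed set of actions, suitable for DQN
env.close()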
You can install the gym and keras-rl Python libraries with pip using the following command:
pip install gym keras-rl
First, we’ll start with importing libraries.
import numpy as np
import gym

from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory
We need to feed a deep neural network (DNN) to the DQNAgent so that our otherwise random decision-maker has a brain. The DNN will be built with the Keras library.
ENV_NAME = 'Breakout-ram-v0'

# Get the environment
env = gym.make(ENV_NAME)
np.random.seed(33)
env.seed(33)

# Extract the number of actions available in Breakout, i.e. left and right
nb_actions = env.action_space.n
For the RL agent itself, the keras-rl library is used. EpsGreedyQPolicy is imported as the agent’s exploration policy, and SequentialMemory acts as the replay buffer that stores past state-action transitions so the agent can learn from them later.
The snippet above loads the environment into gym, sets the random seeds so runs are reproducible, and extracts the number of actions available in the environment.
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
print(model.summary())
This creates the DNN model to pass into DQNAgent. A bigger network can approximate the value function more accurately, but before increasing its size, keep memory in mind: a bigger network needs more memory and compute to store and update its weights.
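For illustration, here is a hedged sketch of a slightly larger variant of the same network; the extra layers may learn a better value function at the cost of more memory and compute (the layer sizes here are arbitrary, not tuned values):
# A larger (and more memory-hungry) alternative to the model above
bigger_model = Sequential()
bigger_model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
bigger_model.add(Dense(64))
bigger_model.add(Activation('relu'))
bigger_model.add(Dense(64))
bigger_model.add(Activation('relu'))
bigger_model.add(Dense(nb_actions))
bigger_model.add(Activation('linear'))
print(bigger_model.summary())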
# Using Epsilon Greedy Policy
policy = EpsGreedyQPolicy()

# Using Sequential memory with limit of 50000
memory = SequentialMemory(limit=50000, window_length=1)

# Initializing DQNAgent
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               nb_steps_warmup=10, target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

# Fit data to the model we initialized.
dqn.fit(env, nb_steps=30000, visualize=True, verbose=2)
We first use the epsilon-greedy policy to give the agent its rule for balancing exploration and exploitation. SequentialMemory is used as the replay buffer with a limit of 50,000 transitions. Then DQNAgent is initialized with the DNN model, the policy, the memory, and a few other parameters.
Finally, fit() runs the agent against the environment and starts training. Here we also visualize the game while training, which helps with understanding but slows the learning process and consumes extra memory.
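After a training run, you will usually want to evaluate the agent and keep its weights. A minimal sketch using keras-rl’s test() and save_weights() (the file name here is just an example):
# Evaluate the trained agent for a few episodes
dqn.test(env, nb_episodes=5, visualize=True)

# Save the learned weights so training does not have to start from scratch next time
dqn.save_weights('dqn_{}_weights.h5f'.format(ENV_NAME), overwrite=True)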
env.close()
Once training is done, this line closes the environment and the visualization window, freeing the memory the environment and the training loop were holding.
In this article, we’ve briefly covered value-based reinforcement learning with DQN. Algorithms of this type cannot handle continuous action spaces; instead, DDPG is used for environments with a continuous action space. I’ll be covering the DDPG algorithm in a separate article.
The gym has various continuous environments to train a model on; the MuJoCo and Robotics suites contain such environments.
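To see what a continuous action space looks like without installing MuJoCo, here is a small illustrative check (Pendulum-v0 is just an example environment, not one used above); it exposes a Box action space, which is what DDPG-style algorithms handle:
import gym

env = gym.make('Pendulum-v0')
print(env.action_space)                          # Box(...): continuous torque values, not a discrete set
print(env.action_space.low, env.action_space.high)  # the valid range of each action dimension
env.close()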
In conclusion, OpenAI gym is very useful for beginning and intermediate reinforcement learning developers. Moreover, researchers can use the gym to test multiple models and find the best-performing one.
Additionally, gym is an open-source library, which makes it easier for everyone to keep up with developments in RL and learn at the same time.
Also read: A Beginner's Guide to Understanding Deep Q-Learning.