The agent receives information about its current state from the environment. Based on that state, it performs an action, which in turn returns a reward along with the agent's new state. Over the course of an episode, the agent tries to maximize its cumulative reward.
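As a purely illustrative sketch of this loop, here is how it looks in gym with a random agent; CartPole-v1 is just an example environment here, not the one we train on later:

```python
import gym

# Minimal agent-environment loop with a random "agent".
# CartPole-v1 is only an illustrative environment choice.
env = gym.make('CartPole-v1')
state = env.reset()
done = False
total_reward = 0
while not done:
    action = env.action_space.sample()             # the agent picks an action
    state, reward, done, info = env.step(action)   # the environment returns a reward and the new state
    total_reward += reward                         # the agent tries to maximize this over the episode
env.close()
```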
The gym provides all of these pieces through its environments. The agent itself can be built with any of several available algorithms, keeping in mind the kind of behavior we want to develop. Some of those algorithms are listed below:
- DQN (Deep Q-learning)
- DDPG (Deep Deterministic Policy Gradient)
- SARSA (State-Action-Reward-State-Action)
- NAF (Normalized Advantage Function)
In our example, we'll be using the DQN algorithm, which works well with most OpenAI Gym environments. The limitation of DQN is that it does not support continuous action spaces.
Into the code
We’ll be training our model on Breakout-ram-v0. There are hundreds of such games and environments to choose from in OpenAI gym.
You can install the gym and keras-rl Python libraries from pip using the following command:
pip install gym keras-rl
First, we’ll start with importing libraries.
We need to feed a deep neural network (DNN) to the DQNAgent to act as the brain behind the otherwise random decision-maker. The DNN will be created using the Keras library.
For the RL agent itself, the keras-rl library is used. We import EpsGreedyQPolicy as the agent's exploration policy, and SequentialMemory to store past experiences (state, action, reward, next state) as a replay buffer that the agent samples from, much like a cheat sheet of state-action outcomes.
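Assuming the packages installed above, the imports would look roughly like this (module paths are those of keras-rl used with the standalone Keras package):

```python
import numpy as np
import gym

# Keras is used to build the deep neural network
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam

# keras-rl provides the DQN agent, the exploration policy and the replay memory
from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory
```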
Now, start by loading the environment from gym and setting a random seed so runs are reproducible. Then extract the number of available actions from the environment.
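A sketch of that step (the seed value 123 is an arbitrary choice, and depending on your gym version Breakout-ram-v0 may also require the Atari extras, e.g. pip install gym[atari]):

```python
ENV_NAME = 'Breakout-ram-v0'

# Load the environment and seed it for reproducibility
env = gym.make(ENV_NAME)
np.random.seed(123)
env.seed(123)

# Number of discrete actions available in this environment
nb_actions = env.action_space.n
```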
Let's create a DNN model to pass into the DQNAgent. A bigger DNN can yield a more accurate model, but before increasing its size, the memory cost should be taken into consideration: a bigger DNN requires more memory to store and more compute to evaluate.
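One possible small network (the layer sizes here are illustrative choices, not values prescribed by the article); the leading (1,) in the input shape matches the window_length of the replay memory set up below:

```python
# Simple fully connected network mapping RAM observations to Q-values
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(nb_actions, activation='linear'))  # one Q-value per action
print(model.summary())
```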
We first use the epsilon-greedy policy to give the agent its exploration rule. SequentialMemory is used with a limit of 50,000 experiences. Then we initialize the DQNAgent by providing the DNN model and the other components as parameters.
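Putting those pieces together might look like this (the warm-up steps, target-model update rate and learning rate are common defaults, not values fixed by the article):

```python
policy = EpsGreedyQPolicy()                               # explore with probability epsilon
memory = SequentialMemory(limit=50000, window_length=1)   # replay buffer of past experiences

dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               nb_steps_warmup=10, target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
```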
Finally, fit the agent on data from the environment to start training. Here we visualize the environment during training to better understand what the agent is doing, but this slows the learning process and consumes extra memory.
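For example (the number of steps is an arbitrary choice; increase it for serious training):

```python
# Train the agent; visualize=True renders the game window while learning
dqn.fit(env, nb_steps=5000, visualize=True, verbose=2)
```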
When you're done, close the environment and its visualization window with the line below. This frees the RAM used by the environment while the model trained.
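With the standard gym API this is simply:

```python
env.close()
```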
What’s next?
In this article, we've briefly covered value-based reinforcement learning with DQN. These algorithms are not capable of handling continuous action spaces; instead, DDPG is used for environments with a continuous action space. I'll be covering the DDPG algorithm in a separate article.
The gym has various continuous environments for training a model; the MuJoCo and Robotics suites contain such environments.
Conclusion
In conclusion, OpenAI Gym is very useful for beginning and intermediate reinforcement learning developers. Moreover, researchers can use the gym to test multiple models and find the best-performing one.
Additionally, Gym is an open-source library, which makes it easier for everyone to stay up to date with developments in RL and learn at the same time.