Reinforcement learning is a broad field with a huge range of use cases, and real-world environments rarely resemble one another. An agent should therefore be able to get the task done even under worst-case conditions.

A greedy approach is usually enough for basic RL problems such as simple games, but as the environment grows more complex, choosing an algorithm becomes harder. A purely greedy approach fails for continuous real-world tasks like self-driving: a greedy model cannot weigh the value of a person's life against the urgency of a driver.

So, RL algorithms are divided into two families: model-based and model-free. This split is often confused with the off-policy and on-policy distinction, but the two are separate axes: model-based versus model-free describes whether the algorithm learns a model of the environment, while off-policy versus on-policy describes which policy's actions are used in the update.

A model-based algorithm learns an explicit model of the environment, that is, its transition and reward dynamics, and uses that model to plan: it simulates the outcomes of candidate actions, chooses the one with the highest predicted return, and in this way tries to maximize the total reward over each episode.

Its primary job is to understand the environment: it builds a state-action model that predicts the reward and next state for every state-action pair, and this learned model is then used to evaluate actions before they are taken. Dyna-Q is a classic example of a model-based algorithm.
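The model-based idea can be sketched as a tabular agent in the style of Dyna-Q: it records an empirical model of each transition it experiences, then replays simulated transitions from that model to refine its Q-table. This is a minimal illustrative sketch, not a full implementation; all names are made up for the example.

```python
import random
from collections import defaultdict

# Simplified Dyna-Q-style sketch: alongside learning from real experience,
# the agent stores an empirical model of the environment (reward and next
# state for each state-action pair) and replays simulated transitions from
# that model to update its Q-table ("planning").

alpha, gamma = 0.1, 0.9
Q = defaultdict(float)   # Q[(state, action)] -> value estimate
model = {}               # model[(state, action)] -> (reward, next_state)

def q_update(s, a, r, s2, actions):
    # one-step update toward the best estimated value of the next state
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def learn_step(s, a, r, s2, actions, planning_steps=5):
    q_update(s, a, r, s2, actions)      # learn from the real transition
    model[(s, a)] = (r, s2)             # update the learned model
    for _ in range(planning_steps):     # plan: replay simulated experience
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        q_update(ps, pa, pr, ps2, actions)
```

The planning loop is what makes this model-based: most of the Q-table updates come from the learned model rather than from fresh interaction with the environment.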

Off-policy learning is a separate property. In off-policy methods, the policy used to generate behavior, called the behavior policy, may be unrelated to the policy that is evaluated and improved, called the target policy. DQN is an example of a model-free, off-policy algorithm: it acts with an exploratory behavior policy while its update bootstraps from the greedy target policy.
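The behavior/target split shows up clearly in tabular Q-learning: actions are chosen by an epsilon-greedy behavior policy, while the update target uses the greedy policy. A minimal sketch, with an illustrative made-up environment:

```python
import random
from collections import defaultdict

# Off-policy Q-learning sketch: the behavior policy explores
# (epsilon-greedy), while the update target follows the greedy policy.

alpha, gamma, epsilon = 0.5, 0.9, 0.2
actions = ['left', 'right']
Q = defaultdict(float)

def behavior_policy(s):
    # the policy that generates behavior: explores with probability epsilon
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def q_learning_update(s, a, r, s2):
    # the target policy is greedy: bootstrap from max over next actions
    greedy_value = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * greedy_value - Q[(s, a)])
```

Because the update uses the greedy maximum regardless of which action the behavior policy actually took, the algorithm can learn the greedy policy's values from exploratory (or even entirely unrelated) experience.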

Suppose you are learning to swim in a swimming pool. You learn by failing and gaining experience from those failures, so your internal model of swimming is trained on the conditions of that pool. If you are then told to swim in flowing water, it becomes a far more challenging task, because your model no longer matches the environment.

A model-free algorithm does not try to understand the whole environment; instead it learns a value function or policy directly from experience. SARSA, for example, updates its Q-table using the next state S' and the action A' actually chosen by the current policy. The policy can also be learned directly, as in actor-critic methods; DDPG is a model-free algorithm based on the actor-critic approach.

Here, the main difference is that a model-based algorithm tries to get familiar with its environment, while a model-free algorithm directly optimizes its value estimates or policy, for example via policy gradients. If the environment changes completely, the model-based algorithm's learned model becomes stale, so a model-free algorithm that keeps learning from fresh experience has a higher chance of success.

There are a few rough heuristics for telling a model-based algorithm from a model-free one:

- If the reward of an action is predicted before the action is taken, the algorithm is model-based.
- If the algorithm's accuracy drops sharply when the environment changes, it is likely model-based.

In the real world, we rarely have a fixed environment, so most such use cases, like self-driving cars and robots that interact with the physical world, are tackled with model-free methods. Games with known rules are a notable exception: AlphaGo exploits a perfect model of the rules of Go through tree search, making it partly model-based.

In conclusion, for fixed-environment problems like factory robots, a model-based algorithm can be more suitable, while for AI that interacts with the changing real world, a model-free algorithm is usually the better fit.