a) What is Reinforcement Learning?
Reinforcement learning (RL) studies how an agent should act in an environment to maximize the reward it receives. It is a machine learning method in which the agent learns, through trial and error, which actions to take in the environment to achieve better results. It is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning algorithms learn to attain a complex objective, or to maximize a particular quantity (the cumulative reward), over many steps. The major differences between supervised learning and reinforcement learning are summarized in the table below:
| Supervised learning | Reinforcement learning |
|---|---|
| Learns from labeled training data: each input comes with the correct output label. | There are no output labels; the agent learns from its interactions with the environment and the rewards it receives. |
| Solves classification and regression problems. | Solves reward-based problems. |
| Relies on external supervision. | No external supervision. |
| Example: spam detection. | Example: robot learning. |
b) Important terms in Reinforcement Learning
Agent: The entity that performs actions in the environment to gain rewards.
For example, a drone navigating to make a delivery.
State: The situation in which the agent currently finds itself, including its configuration relative to other elements of the environment, such as obstacles or enemies.
Reward: Feedback from the environment that measures the success or failure of the agent’s action in a given state.
Action: A move the agent can make in a given state. The set of all possible actions available to the agent is called the action space.
Environment: The world with which the agent interacts; it responds to each action with a new state and a reward.
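To make these terms concrete, here is a minimal sketch of the agent-environment loop in Python. The corridor environment, its reward values (+1 for reaching the goal, -0.1 for every other step), and the names `CorridorEnv`, `step`, and `run_episode` are hypothetical choices for illustration. The agent here simply picks actions at random; no learning happens yet.

```python
import random
from dataclasses import dataclass


@dataclass
class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells 0..length-1.

    The agent starts in cell 0; reaching the last cell ends the episode.
    """
    length: int = 5
    state: int = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> tuple[int, float, bool]:
        """Apply an action (-1 = left, +1 = right); return (next_state, reward, done)."""
        # Clamp so the agent cannot walk off either end of the corridor.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        # Reward: +1 for reaching the goal, a small penalty for every other step.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


ACTIONS = [-1, +1]  # the action space: move left or move right


def run_episode(env: CorridorEnv, rng: random.Random, max_steps: int = 1000) -> tuple[float, int]:
    """Let a random agent interact with the environment; return (total_reward, steps)."""
    state = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        action = rng.choice(ACTIONS)             # the agent chooses an action in its current state
        state, reward, done = env.step(action)   # the environment responds with a new state and a reward
        total_reward += reward
        steps += 1
    return total_reward, steps


if __name__ == "__main__":
    total, steps = run_episode(CorridorEnv(), random.Random(0))
    print(f"reached the goal in {steps} steps, total reward {total:.1f}")
```

A random agent eventually stumbles onto the goal, but it wastes many steps; the learning methods described later are about using the reward signal to do better.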
c) Types of Reinforcement Learning
Positive reinforcement learning:
Positive reinforcement occurs when an event that follows a behavior increases the strength and frequency of that behavior; in other words, the reward has a positive effect on the actions the agent takes. It maximizes performance and sustains change over a long period of time, but too much reinforcement can lead to an overload of states, which can diminish the results.
Negative reinforcement learning:
Negative reinforcement strengthens a behavior because that behavior stops or avoids a negative condition. This method helps define a minimum standard of performance, but it has a drawback: it provides only enough to meet that minimum behavior.
d) Learning models of Reinforcement
Two of the most widely used models in reinforcement learning are:
1) Markov Decision Process:
Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where results are partly random and partly under the control of a decision-maker.
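An MDP is usually specified by the following components: a set of states S, a set of actions A, transition probabilities P(s′ | s, a), a reward function R, and a discount factor γ. Solving the MDP means finding a policy π, a mapping from states to actions, that maximizes the expected cumulative (discounted) reward.
This mathematical approach for mapping a solution in reinforcement learning is what is termed a Markov Decision Process.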
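One standard way to solve a small MDP whose transition probabilities and rewards are fully known is value iteration, which repeatedly applies the Bellman optimality update V(s) ← max over a of Σ P(s′ | s, a) [R + γ V(s′)] until the values stop changing. The sketch below assumes a hypothetical 3-state MDP in which "right" succeeds only 80% of the time (so results are partly random and partly under the agent's control), state 2 is terminal, and γ = 0.9; the dictionary layout and function names are illustrative, not a library API.

```python
# Hypothetical 3-state MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    0: {"left": [(1.0, 0, 0.0)],
        "right": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)],
        "right": [(0.8, 2, 1.0), (0.2, 1, 0.0)]},
    2: {"left": [(1.0, 2, 0.0)],   # state 2 is terminal: absorbing, no further reward
        "right": [(1.0, 2, 0.0)]},
}


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then continuing with values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, theta=1e-8):
    """Apply the Bellman optimality update until the largest change is below theta."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < theta:
            break
    # The policy pi: in each state, pick the action that is greedy w.r.t. the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration(P)
    print(V)
    print(policy)
```

For this example the fixed point can be checked by hand: in state 1, V(1) = 0.8 · 1 + 0.2 · 0.9 · V(1), so V(1) = 0.8 / 0.82 ≈ 0.976, and the resulting policy chooses "right" in states 0 and 1.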
2) Q-Learning:
Q-learning is a value-based approach: it learns how valuable each action is in each state, and this information tells the agent which action to take.
The Q-table is a lookup table that stores, for every state-action pair, the maximum expected future reward. This table guides the agent to the best action in each state.
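The Q-table and how it is filled in can be sketched in a few lines of Python. This is a minimal tabular Q-learning example on a hypothetical 5-cell corridor (think of a drone moving toward a delivery point in cell 4); the environment, the reward values, and the hyperparameters (learning rate α = 0.5, discount γ = 0.9, exploration rate ε = 0.1) are illustrative assumptions. After each step, the table entry is updated with the standard Q-learning rule Q(s, a) ← Q(s, a) + α [r + γ · max over a′ of Q(s′, a′) − Q(s, a)].

```python
import random

N_STATES = 5         # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = [-1, +1]   # index 0 = move left, index 1 = move right


def step(state, action_idx):
    """Deterministic toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else -0.1), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # The Q-table: one row per state, one column per action, initialized to 0.
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: sometimes explore at random, otherwise take the best-known action
            # (ties broken at random).
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(Q[state])
                a = rng.choice([i for i, q in enumerate(Q[state]) if q == best])
            next_state, reward, done = step(state, a)
            # Target: reward plus the discounted best value of the next state
            # (no bootstrapping from the terminal goal state).
            target = reward if done else reward + gamma * max(Q[next_state])
            Q[state][a] += alpha * (target - Q[state][a])
            state = next_state
    return Q


Q = q_learning()
# Read the greedy action for every non-goal state straight from the Q-table.
policy = ["left" if row[0] > row[1] else "right" for row in Q[:-1]]

if __name__ == "__main__":
    for s, row in enumerate(Q):
        print(s, [round(q, 3) for q in row])
    print(policy)
```

Once trained, the table should favor "right" in every non-goal state. Unlike value iteration, Q-learning never needs the transition probabilities: it learns only from the (state, action, reward, next state) samples the agent experiences.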
e) Applications and Challenges of Reinforcement Learning:
Applications:
Industrial automation using robotics
Bots in gaming
Aircraft and automobile control
Challenges:
Realistic environments can have partial observability.
Too much reinforcement can lead to diminishing returns.
Hyperparameters, such as the learning rate and discount factor, can affect the speed of learning.