a) What is Reinforcement Learning?


Reinforcement learning (RL) is about how an agent interacts with an environment so as to maximize the reward it collects. It is a machine learning method for deciding which actions to take in an environment to get better results over time. It is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning. The goal in reinforcement learning is to attain a complex objective, or to maximize a specific quantity, over many steps rather than in a single decision. The major differences between supervised learning and reinforcement learning are shown in the table below:

Supervised learning | Reinforcement learning
We have output labels, and the model learns from those labels | We don’t have output labels; the agent learns from the interaction it has with the environment
It solves classification and regression problems | It solves reward-based problems
It works with external supervision | No external supervision
E.g. spam detection | E.g. robot learning

b) Important terms in Reinforcement Learning

Agent: The entity that performs actions in the environment to gain rewards. For example, a drone navigating to make a delivery.

State: The situation in which the agent currently finds itself, i.e. its configuration in relation to other things in the environment such as obstacles or enemies.

Reward: A metric for the success or failure of an agent’s action in a given state.

Action: A move the agent can make in a given state. The set of all possible actions is called the action space.

Environment: The world with which the agent interacts and which responds to the agent’s actions.
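The terms above fit together in a simple interaction loop. The following is a minimal sketch, not a reference implementation: the CorridorEnv class, its five positions, the reward of 1.0 at the goal, and the two example policies are all hypothetical and chosen only to make the loop easy to follow.

```python
import random


class CorridorEnv:
    """Hypothetical environment: positions 0..4, with the goal at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Put the agent back at the start and return the initial state.
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return (total reward, number of steps taken)."""
    state = env.reset()
    total_reward = 0.0
    for step in range(max_steps):
        action = policy(state)                  # the agent picks an action
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward
        if done:
            return total_reward, step + 1
    return total_reward, max_steps


def random_policy(state):
    # An agent that ignores the state and acts at random.
    return random.choice([0, 1])


def always_right(state):
    # An agent that always moves toward the goal.
    return 1
```

Calling run_episode(CorridorEnv(), always_right) walks straight to the goal, while random_policy usually needs more steps. In both cases the agent only ever sees states and rewards, never labels.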

c) Types of Reinforcement Learning

Positive reinforcement learning:

In this type of learning, an event that follows a behavior has a positive impact on the actions taken by the agent, making that behavior more likely. It maximizes performance and sustains the change for a long period of time, but too much reinforcement can lead to an overload of states, which diminishes the results.

Negative reinforcement learning:

In this type of learning, a behavior is strengthened because a negative condition is stopped or avoided. This method defines a minimum standard of performance, but its drawback is that it provides only enough to meet that minimum behavior.

d) Learning models of Reinforcement Learning

Two of the most popular learning models used in reinforcement learning are:

1) Markov Decision Process:

A Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where results are partly random and partly under the control of a decision-maker.

The following parameters are used to get a solution:

Actions - A

States - S

Reward - R

Policy - π

Value - V

In short, a Markov Decision Process is the mathematical framework used to formulate a reinforcement learning problem and map out its solution.
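To make the roles of A, S, R, the policy and V concrete, here is a minimal sketch of value iteration, one standard way to solve a small MDP. The three-state MDP, the action names "stay" and "go", the transition probabilities and the discount factor GAMMA are all made up for illustration and do not come from any particular problem.

```python
GAMMA = 0.9  # assumed discount factor for future rewards

# P[state][action] -> list of (probability, next_state, reward).
# Hypothetical MDP: "go" advances with probability 0.8, state 2 is terminal.
P = {
    0: {"stay": [(1.0, 0, 0.0)],
        "go": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)],
        "go": [(0.8, 2, 1.0), (0.2, 1, 0.0)]},
    2: {},  # terminal state: no actions available
}


def q_value(V, s, a):
    # Expected return of taking action a in state s, then following V.
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-8):
    V = {s: 0.0 for s in P}  # value estimate for every state
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:
                continue  # terminal states keep a value of 0
            best = max(q_value(V, s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # The policy picks, in each state, the action with the highest value.
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P if P[s]}
    return V, policy


V, policy = value_iteration()
```

Here the policy maps each non-terminal state to its best action, and V holds the expected discounted reward from each state under that policy.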

2) Q-Learning:

It is a value-based approach that supplies information about which action the agent should take.

A Q-table is a lookup table in which we store the maximum expected future reward for each action at each state. This table guides us to the best action at each state.
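As a rough illustration of how a Q-table is filled in, the sketch below runs tabular Q-learning with an epsilon-greedy agent on a hypothetical five-position corridor. The environment, the reward of 1.0 at the goal, and the hyperparameters (alpha, gamma, epsilon, number of episodes) are assumptions picked for the example, not recommended settings.

```python
import random

N_STATES = 5      # positions 0..4; position 4 is the goal (terminal)
ACTIONS = [0, 1]  # 0 = left, 1 = right


def step(state, action):
    # Hypothetical environment dynamics: move one cell, clamped at the walls.
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q_row, rng):
    # Pick the action with the highest Q-value, breaking ties at random.
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q_table = [[0.0, 0.0] for _ in range(N_STATES)]  # one row per state
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy: explore sometimes, otherwise exploit the table.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q_table[state], rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max Q(s', .)
            target = reward if done else reward + gamma * max(q_table[next_state])
            q_table[state][action] += alpha * (target - q_table[state][action])
            state = next_state
            if done:
                break
    return q_table


q_table = train()
# Best action per non-terminal state, read straight off the table.
best_actions = [row.index(max(row)) for row in q_table[:-1]]
```

After training, reading the largest entry in each row of q_table gives the action the agent should take in that state, which is how the table guides the agent.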

e) Applications and Challenges of Reinforcement Learning:

1) Applications:

Industrial automation using robotics

Bots in gaming

Aircraft and automobile control

Online recommendation

Bid optimization

2) Challenges:

Realistic environments can have partial observability.

Too much reinforcement can lead to diminishing returns.

Parameter choices, such as the learning rate, can affect the speed of learning.