Understanding Deep Reinforcement Learning (DRL) : Part XI of XI

In the first part of the article, we reviewed the field of Deep Reinforcement Learning (DRL) and started exploring some fundamental terminology in the second and third parts. In the fourth part, we explored the temporal credit assignment problem, and the fifth part focused on the exploration vs. exploitation trade-off. In the sixth part, we explored the Markov decision process (MDP) and its role in reinforcement learning.

We started exploring a detailed example in the seventh part that we continued in the eighth part. In the ninth part, we explored the role of reward functions. In the tenth part, we explored the role of time horizons in MDPs and the concept of discounting. In this final part, we will conclude our discussion of Deep Reinforcement Learning.

MDP Extensions

As one might expect, the real-world problems that we want to solve by leveraging DRL rarely fit neatly within the mould of a conventional MDP. To accommodate this, many extensions to the MDP framework have been proposed. We will review some of them here, keeping in mind that this is not an exhaustive list.

Scenario: Agent is unable to fully observe the environment state.

Extension: Partially Observable Markov Decision Process (POMDP), in which the agent maintains a belief (a probability distribution over states) instead of observing the state directly, as sketched after this list

Scenario: Very large MDPs whose state space has exploitable structure

Extension: Factored Markov Decision Process (FMDP), which can represent the transition and reward functions more compactly

Scenario: One of the key elements, such as actions, time, or states (or a combination of these), is continuous

Extension: Continuous Markov Decision Process (CMDP)

Scenario: Both probabilistic and relational knowledge are involved

Extension: Relational Markov Decision Process (RMDP)

Scenario: Abstract actions that can take multiple time steps to complete are involved

Extension: Semi-Markov Decision Process (SMDP)

Scenario: Multiple agents are involved in the same environment

Extension: Multi-Agent Markov Decision Process (MMDP)

Scenario: Multiple agents need to collaborate and maximize a common reward

Extension: Decentralized Markov Decision Process (Dec-MDP)
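To make the POMDP case above a little more concrete, here is a minimal sketch (not part of the original series) of the core idea: because the agent cannot see the true state, it keeps a belief over states and updates it with a Bayes filter after each observation. The two-state setup, the "listen" action, and the toy transition and observation matrices are hypothetical, chosen only to illustrate the update rule.

```python
import numpy as np

n_states = 2                      # e.g. hidden state is "left" or "right"

# T[a][s, s']: probability of moving from state s to s' under action a
T = {"listen": np.eye(n_states)}  # listening does not change the state

# O[a][s', o]: probability of observing o after action a lands in state s'
O = {"listen": np.array([[0.85, 0.15],   # in state 0, mostly observe 0
                         [0.15, 0.85]])} # in state 1, mostly observe 1

def belief_update(belief, action, obs):
    """Bayes filter: predict with T, weight by observation likelihood, renormalize."""
    predicted = belief @ T[action]            # prediction step
    updated = predicted * O[action][:, obs]   # correction step
    return updated / updated.sum()            # normalize to a distribution

b = np.array([0.5, 0.5])                      # start fully uncertain
b = belief_update(b, "listen", obs=0)
print(b)                                      # belief shifts toward state 0
```

The same pattern underlies practical POMDP solvers: the policy is defined over beliefs rather than over (unobservable) states.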

Conclusion

In the previous ten parts of this article series, we explored the components of any reinforcement learning problem and how they relate to and interact with one another. We also introduced the Markov Decision Process (MDP), what the process entails, and how it works. Then, through a simple example, we explored how sequential decision-making problems can be represented as MDPs.

In tandem with deep learning, reinforcement learning is, in my opinion, the key to automating the decision-making opportunities that exist in planning functions across the enterprise. Not only can these algorithms bring a level of productivity and accuracy that humans alone cannot, but they can also find the optimal solution.
