In an MDP, the observable quantities are the action set, A, the state set, S, the transition model, T, and the reward set, R. This is not the case for a partially observable MDP, also known as a POMDP. In a POMDP, there is an MDP inside that is not directly observable to the agent, which must make its decisions from whatever observations it receives.
In a POMDP, there is an observation set, Z, containing the different observable states, and an observation function, O, which takes a state s and an observation z as inputs and outputs the probability of seeing observation z in state s.
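To make the observation function concrete, here is a minimal Python sketch. The two-state "tiger" setup, the state and observation names, and the 0.85 listening accuracy are all illustrative assumptions, not something defined in the text:

```python
import random

# A tiny two-state POMDP sketch; the "tiger" setup, state names, and the
# 0.85 listening accuracy are illustrative assumptions.
S = ["tiger-left", "tiger-right"]          # hidden states
A = ["listen", "open-left", "open-right"]  # actions
Z = ["hear-left", "hear-right"]            # observations

def O(s, z):
    """Observation function: probability of observing z while in state s."""
    correct = (s == "tiger-left" and z == "hear-left") or \
              (s == "tiger-right" and z == "hear-right")
    return 0.85 if correct else 0.15

def sample_observation(s):
    """Draw one observation z with probability O(s, z)."""
    r, acc = random.random(), 0.0
    for z in Z:
        acc += O(s, z)
        if r < acc:
            return z
    return Z[-1]

print(sample_observation("tiger-left"))  # 'hear-left' about 85% of the time
```

Because the agent only sees a noisy observation rather than the state itself, it must reason over probabilities of states rather than states directly.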
POMDPs are basically a generalization of MDPs:
- MDP: {S, A, T, R}
- POMDP: {S, A, Z, T, R, O}
where S, A, T, and R are the same. Therefore, for a POMDP to be a true MDP, the following condition must hold: Z = S and O(s, z) = 1 if z = s (and 0 otherwise), that is, the agent fully observes all states.
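As a quick illustration of that condition, here is a hedged sketch; the helper name is_fully_observable and the toy states are assumptions for illustration only:

```python
def is_fully_observable(S, Z, O):
    """Check the reduction condition: Z equals S and O(s, z) = 1 exactly
    when z == s, i.e. every observation reveals the true state."""
    if set(Z) != set(S):
        return False
    return all(O(s, z) == (1.0 if z == s else 0.0) for s in S for z in Z)

# Identity observation function: the agent observes the state itself.
identity_O = lambda s, z: 1.0 if z == s else 0.0

print(is_fully_observable(["s0", "s1"], ["s0", "s1"], identity_O))  # True
```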
POMDPs are hugely intractable...