Now that the availability of each location in L = {l1, l2, l3, l4, l5, l6} is stored in a vector, the locations can be sorted from most available to least available. From there, the reward matrix for the Markov decision process (MDP) described in the first chapter can be built.
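The sorting step can be sketched in a few lines of Python. This is a minimal illustration, not the program used in this book: the availability values and the helper name `rank_locations` are assumptions chosen for the example, and a higher value is assumed to mean a more available location.

```python
# Hypothetical availability values, one per location (assumed for illustration only).
locations = ["l1", "l2", "l3", "l4", "l5", "l6"]
availability = [0.2, 0.9, 0.5, 0.7, 0.1, 0.4]


def rank_locations(names, values):
    """Return (name, value) pairs sorted from most to least available."""
    if len(names) != len(values):
        raise ValueError("names and values must have the same length")
    # Pair each location with its value, then sort on the value, highest first.
    return sorted(zip(names, values), key=lambda pair: pair[1], reverse=True)


ranked = rank_locations(locations, availability)
for name, value in ranked:
    print(name, value)
```

The resulting order is what would feed the reward matrix: the most available locations come first and can be given the highest rewards.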
Logistic activation functions and classifiers
Overall architecture
At this point, the overall architecture contains two main components:
- Chapter 1: Become an Adaptive Thinker: A reinforcement learning program based on the action-value Q function, using a reward matrix that is yet to be calculated. The reward matrix was given in the first chapter, but in real life you'll often have to build it from scratch, which could take weeks.
- Chapter...