Logistic activation functions and classifiers
Now that the value of each location of L = {l1, l2, l3, l4, l5, l6} contains its availability in a vector, the locations can be sorted from the most available to the least available location. From there, the reward matrix, R, for the MDP process described in Chapter 1, Getting Started with Next-Generation Artifcial Intelligence through Reinforcement Learning, can be built.
Overall architecture
At this point, the overall architecture contains two main components:
- Chapter 1: A reinforcement learning program based on the value-action Q function using a reward matrix that will be finalized in this chapter. The reward matrix was provided in the first chapter as an experiment, but in the implementation phase, you'll often have to build it from scratch. It sometimes takes weeks to produce a good reward matrix.
- Chapter 2: Designing a set of 6×1 neurons that represents the flow of products at a...