Answer to question 1: Do not confuse the Q in Q-learning with the Q-function discussed in the previous parts. "Q-function" is simply the name of the function that takes a state and an action and returns the value of that state-action pair. Many RL methods use a Q-function, but that does not make them Q-learning algorithms.
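The distinction can be sketched with a tiny tabular example. This is a minimal illustration assuming integer-indexed states and actions; the class `TabularQ` and its methods are hypothetical and are not part of DL4J or RL4J. Both tables below hold the same kind of Q-function, Q(s, a). Only `qLearningUpdate` is Q-learning (its target uses the best next action); `sarsaUpdate` is SARSA, a different method that is swapped in here because it learns a Q-function too, but bootstraps from the action actually taken next.

```java
import java.util.Arrays;

// Hypothetical tabular Q-function: Q(s, a) -> estimated value of taking action a in state s.
class TabularQ {
    private final double[][] q;

    TabularQ(int numStates, int numActions) {
        q = new double[numStates][numActions]; // every Q(s, a) starts at 0.0
    }

    // The Q-function itself: accepts a state and an action, returns the value of that pair.
    double value(int state, int action) {
        return q[state][action];
    }

    // Highest Q-value reachable from a state (used by the Q-learning target).
    double maxValue(int state) {
        return Arrays.stream(q[state]).max().getAsDouble();
    }

    // Q-learning: target = r + gamma * max_a' Q(s', a').
    void qLearningUpdate(int s, int a, double r, int sNext, double alpha, double gamma) {
        double target = r + gamma * maxValue(sNext);
        q[s][a] += alpha * (target - q[s][a]);
    }

    // SARSA: target = r + gamma * Q(s', a'), where a' is the action actually taken next.
    void sarsaUpdate(int s, int a, double r, int sNext, int aNext, double alpha, double gamma) {
        double target = r + gamma * q[sNext][aNext];
        q[s][a] += alpha * (target - q[s][a]);
    }

    public static void main(String[] args) {
        TabularQ viaQLearning = new TabularQ(2, 2);
        TabularQ viaSarsa = new TabularQ(2, 2);

        // Give state 1 a known value: Q(1, 0) = 10 in both tables (gamma = 0, alpha = 1).
        viaQLearning.qLearningUpdate(1, 0, 10.0, 0, 1.0, 0.0);
        viaSarsa.sarsaUpdate(1, 0, 10.0, 0, 0, 1.0, 0.0);

        // Same transition 0 -> 1; SARSA's next action is the exploratory action 1.
        viaQLearning.qLearningUpdate(0, 0, 0.0, 1, 0.5, 0.9);
        viaSarsa.sarsaUpdate(0, 0, 0.0, 1, 1, 0.5, 0.9);

        System.out.println(viaQLearning.value(0, 0)); // prints 4.5
        System.out.println(viaSarsa.value(0, 0));     // prints 0.0
    }
}
```

Both objects expose the same Q-function through `value(state, action)`; the difference lies only in how each method updates it, which is why having a Q-function does not by itself make an algorithm Q-learning.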
Answer to question 2: No worries; you can perform the training on a CPU backend too. In that case, remove the CUDA and cuDNN dependency entries from the pom.xml file and replace them with their CPU counterparts. The properties would be:
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <java.version>1.8</java.version>
    <nd4j.version>1.0.0-alpha</nd4j.version>
    <dl4j.version>1.0.0-alpha</dl4j.version>
    <datavec.version...