So far, we have extracted elements from text, added metadata, and created term clusters to discover latent topics. We will now identify latent features using a deep learning model known as a restricted Boltzmann machine (RBM). As you may recall, we discovered latent topics in the text by looking for term co-occurrence within a given window size. Here, we return to a neural network approach. An RBM can be thought of as half of a typical feedforward neural network: instead of passing data through hidden layers to an output layer, it passes data from the visible (input) layer to a single hidden layer, and the activations of that hidden layer are the output. The result is similar to factor analysis or principal component analysis, in that the hidden units act as a smaller set of latent features that summarize the input. We will begin by trying to identify each of the 20 newsgroups in the dataset, and throughout the rest of this chapter, we will modify the model to improve its performance.
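To make the idea of stopping at the hidden layer concrete, the following is a minimal sketch in plain Python, not the model built in this chapter. It assumes a binary term-presence vector as input and uses small random, untrained weights; the names `hidden_probabilities`, `weights`, `hidden_bias`, and `doc` are hypothetical and chosen only for illustration. Training the weights (for example, with contrastive divergence) is deliberately left out.

```python
import math
import random


def sigmoid(x):
    """Logistic function used for RBM unit activations."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probabilities(visible, weights, hidden_bias):
    """Return P(h_j = 1 | v) for each hidden unit j.

    visible     : list of 0/1 values, one per input term
    weights     : weights[i][j] connects visible unit i to hidden unit j
    hidden_bias : one bias per hidden unit
    """
    return [
        sigmoid(bias + sum(v * weights[i][j] for i, v in enumerate(visible)))
        for j, bias in enumerate(hidden_bias)
    ]


# Toy setup (assumed sizes): 6 terms in the vocabulary, 2 latent features.
rng = random.Random(0)
n_visible, n_hidden = 6, 2
weights = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
hidden_bias = [0.0] * n_hidden

# A hypothetical document: 1 means the term is present, 0 means it is absent.
doc = [1, 0, 1, 1, 0, 0]

# The hidden activations are the model's output: there is no further layer.
features = hidden_probabilities(doc, weights, hidden_bias)
print(features)
```

Each value in `features` is a probability between 0 and 1, one per hidden unit. Once the weights have been trained, these values play a role loosely comparable to factor scores or principal component scores for the document.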
To get started with building...