In this chapter, you will learn how to create document summaries. We will begin by removing the parts of documents that should not be considered and tokenizing the remaining text. Next, we will apply word embeddings and cluster the results. These clusters will then be used to produce document summaries. We will also learn how to use restricted Boltzmann machines (RBMs) as building blocks for deep belief networks for topic modeling. We will start by coding the RBM and implementing Gibbs sampling, contrastive divergence, and the free energy function for the algorithm. We will conclude by stacking multiple RBMs to create a deep belief network.
This chapter covers the following topics:
- Formatting data using tokenization
- Cleaning text to remove noise
- Applying word embeddings to increase usable data
- Clustering data into topic groups
- Summarizing...