In this subsection, we present a semi-automated technique for TM using Spark. Leaving all other options at their defaults, we train LDA on the dataset downloaded from GitHub at https://github.com/minghui/Twitter-LDA/tree/master/data/Data4Model/test. However, we will use better-known text datasets in the model reuse and deployment phase later in this chapter.
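To make the idea of training LDA with default options concrete, here is a minimal sketch using Spark MLlib's RDD-based `LDA` class. It is not the `LDAforTM` class used in this chapter. The toy corpus, the object name `LdaDefaultsSketch`, the helper `termCounts`, and the choice of `k = 2` are all assumptions made purely for illustration.

```scala
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object LdaDefaultsSketch {
  // Hypothetical toy corpus standing in for the tokenized tweets.
  val docs: Seq[Seq[String]] = Seq(
    Seq("spark", "scala", "cluster", "spark"),
    Seq("topic", "model", "lda", "topic"),
    Seq("scala", "spark", "lda", "model")
  )

  // Vocabulary: every distinct term, sorted, so a term's index is its position.
  val vocab: Array[String] = docs.flatten.distinct.sorted.toArray
  val termIndex: Map[String, Int] = vocab.zipWithIndex.toMap

  // Turn one tokenized document into (termIndex, count) pairs, ordered by index.
  def termCounts(tokens: Seq[String]): Seq[(Int, Double)] =
    tokens.groupBy(termIndex)
      .map { case (idx, ts) => (idx, ts.size.toDouble) }
      .toSeq
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LdaDefaultsSketch")
      .master("local[*]")
      .getOrCreate()

    // MLlib's LDA expects an RDD of (documentId, termCountVector).
    val vectors: Seq[(Long, Vector)] = docs.zipWithIndex.map { case (tokens, id) =>
      (id.toLong, Vectors.sparse(vocab.length, termCounts(tokens)))
    }
    val corpus: RDD[(Long, Vector)] = spark.sparkContext.parallelize(vectors)

    // Only the number of topics is set (an assumed value); the optimizer,
    // priors, and iteration count are left at their defaults.
    val model = new LDA().setK(2).run(corpus)

    // Print each topic's top terms together with their weights.
    model.describeTopics(maxTermsPerTopic = 3).zipWithIndex.foreach {
      case ((terms, weights), topic) =>
        println(s"Topic $topic")
        terms.zip(weights).foreach { case (t, w) =>
          println(f"  ${vocab(t)}%-10s $w%.4f")
        }
    }

    spark.stop()
  }
}
```

On a corpus this small the resulting topics carry no real meaning; the sketch only shows the shape of the input that `LDA.run` expects and how `describeTopics` pairs term indices with weights.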
Topic modeling with Spark MLlib and Stanford NLP
Implementation
The following steps walk through TM, from reading the data to printing the topics along with their term weights. Here's a short workflow of the TM pipeline:
object topicmodelingwithLDA {
  def main(args: Array[String]): Unit = {
    val lda = new LDAforTM()
    // actual computations are done here
val...