For this mini deployment, let's use a real-life dataset: PubMed. A sample dataset containing PubMed terms can be downloaded from: https://nlp.stanford.edu/software/tmt/tmt-0.4/examples/pubmed-oa-subset.csv. The link points to a dataset in CSV format, although the file has a strange name: 4UK1UkTX.csv.
More specifically, the dataset contains abstracts of biological articles along with their publication year and serial number. A glimpse is given in the following figure:
Figure 6: A snapshot of the sample dataset
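To take a quick look at the file yourself, something like the following sketch may help. It assumes the CSV has been downloaded to a hypothetical local path, data/4UK1UkTX.csv, and that it has no header row (an assumption, so Spark assigns the default column names _c0, _c1, and so on). The object and method names are illustrative only:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object InspectPubMedSample {
  // Reads the CSV without assuming a header row (assumption: the file has none).
  // Spark then names the columns _c0, _c1, ... and treats every value as a string.
  def loadSample(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "false")
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("InspectPubMedSample")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical location of the downloaded file; adjust to your setup.
    val df = loadSample(spark, "data/4UK1UkTX.csv")

    df.printSchema()            // column names and types as Spark sees them
    df.show(5, truncate = true) // first five rows, long abstracts shortened

    spark.stop()
  }
}
```

If the file turns out to have a header row, setting the header option to true would be the natural adjustment.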
The trained LDA model has already been saved for future use with the following call:
params.ldaModel.save(spark.sparkContext, "model/LDATrainedModel")
The trained model will be saved to the model/LDATrainedModel directory given above. The directory will include data and metadata about the model and the training...
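As a minimal sketch, the saved model could later be restored along the following lines. This assumes the model was trained with Spark MLlib's EM optimizer and was therefore saved as a DistributedLDAModel; a model trained with the online optimizer would instead need LocalLDAModel.load. The object and method names are illustrative only:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.DistributedLDAModel
import org.apache.spark.sql.SparkSession

object LoadSavedLDAModel {
  // Restores a model from the directory written by ldaModel.save(sc, path).
  // Assumption: the saved model is a DistributedLDAModel (EM optimizer).
  def loadModel(sc: SparkContext, path: String): DistributedLDAModel =
    DistributedLDAModel.load(sc, path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadSavedLDAModel")
      .master("local[*]")
      .getOrCreate()

    val model = loadModel(spark.sparkContext, "model/LDATrainedModel")
    println(s"Topics: ${model.k}, vocabulary size: ${model.vocabSize}")

    // describeTopics returns, for each topic, the top term indices and their weights.
    model.describeTopics(maxTermsPerTopic = 5).zipWithIndex.foreach {
      case ((terms, weights), topic) =>
        val summary = terms.zip(weights)
          .map { case (term, weight) => f"${term}:${weight}%.4f" }
          .mkString(", ")
        println(s"Topic $topic: $summary")
    }

    spark.stop()
  }
}
```

Note that the topics are described by term indices, so the vocabulary built during training is still needed to map them back to words.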