StopWordsRemover is a Transformer that takes a String array of words and returns a String array with all the defined stop words removed. Stop words are very common English words such as I, you, my, and, and or, which carry little meaning on their own. You can override or extend the set of stop words to suit your use case. Without this cleansing step, subsequent algorithms may be biased by these frequent but uninformative words.
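The following plain-Scala sketch (no Spark required) shows roughly what the transformer does to each row, including extending and overriding the stop-word set. The object name `StopWordsSketch`, the function `removeStopWords`, and the small `sampleStopWords` list are hypothetical illustrations. They are not Spark's API, and the list is not Spark's built-in English list. Comparisons ignore case by default, which mirrors the transformer's default `caseSensitive = false`.

```scala
// Hypothetical plain-Scala sketch of per-row stop-word filtering (not Spark's implementation).
object StopWordsSketch {

  // Hypothetical sample list; Spark ships a much larger default list per language.
  val sampleStopWords: Set[String] = Set("i", "you", "my", "and", "or", "the", "a")

  // Drops every token found in stopWords. When caseSensitive is false,
  // both the tokens and the stop words are lower-cased before comparison.
  def removeStopWords(
      words: Seq[String],
      stopWords: Set[String],
      caseSensitive: Boolean = false): Seq[String] = {
    if (caseSensitive) words.filterNot(stopWords.contains)
    else {
      val lowered = stopWords.map(_.toLowerCase)
      words.filterNot(w => lowered.contains(w.toLowerCase))
    }
  }

  def main(args: Array[String]): Unit = {
    val tokens = Seq("I", "love", "Spark", "and", "you", "love", "Scala")

    // Sample list: "I", "and", "you" are removed regardless of case.
    println(removeStopWords(tokens, sampleStopWords))
    // prints List(love, Spark, love, Scala)

    // Extending the list with a domain-specific word.
    println(removeStopWords(tokens, sampleStopWords + "love"))
    // prints List(Spark, Scala)

    // Overriding the list entirely.
    println(removeStopWords(tokens, Set("spark", "scala")))
    // prints List(I, love, and, you, love)
  }
}
```

In Spark itself, a common way to extend the built-in list is to combine `StopWordsRemover.loadDefaultStopWords("english")` with your own words and pass the result to `setStopWords`. `setCaseSensitive` controls whether case is ignored.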
To use StopWordsRemover, you need to import the following class:
import org.apache.spark.ml.feature.StopWordsRemover
First, initialize a StopWordsRemover, specifying the input column and the output column. Here, we choose the words column created by the Tokenizer as input and generate an output column that holds the words remaining after stop-word removal:
scala> val remover = new StopWordsRemover(...