Word2Vec correlates words with words, whereas Doc2Vec (also known as paragraph vectors) correlates labels with words. In this recipe, we will use Doc2Vec to perform document classification. Documents are labeled by directory: each subdirectory under the documents' root directory represents one label, and the files inside it are the documents that carry that label. For example, all finance-related documents should be placed under the finance subdirectory.
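To make the directory convention concrete, the following is a minimal sketch that uses only the JDK and does not involve Doc2Vec itself. It assumes a root folder whose immediate subdirectories are the labels; the class name LabelLayoutSketch, the helper labelsByFile, and the sample file names are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public class LabelLayoutSketch {

    // Hypothetical helper: maps each file (path relative to root) to its label,
    // where the label is the name of the top-level subdirectory that contains it.
    static Map<String, String> labelsByFile(Path root) throws IOException {
        Map<String, String> labels = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (!Files.isRegularFile(p)) {
                    continue; // directories themselves are not documents
                }
                Path rel = root.relativize(p);
                if (rel.getNameCount() < 2) {
                    continue; // a file directly under root has no label
                }
                // Normalize separators so keys look the same on every platform.
                labels.put(rel.toString().replace('\\', '/'), rel.getName(0).toString());
            }
        }
        return labels;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway layout: root/finance/f01.txt and root/health/h01.txt.
        Path root = Files.createTempDirectory("label");
        Files.createDirectories(root.resolve("finance"));
        Files.createDirectories(root.resolve("health"));
        Files.createFile(root.resolve("finance").resolve("f01.txt"));
        Files.createFile(root.resolve("health").resolve("h01.txt"));

        labelsByFile(root).forEach((file, label) -> System.out.println(file + " -> " + label));
    }
}
```

Running main lists each sample file next to the subdirectory name that serves as its label, which is the same layout the recipe expects under the documents' root.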
Using Doc2Vec for document classification
How to do it...
- Extract and load the data using FileLabelAwareIterator:
// Each subdirectory of the "label" resource folder is treated as one document label.
LabelAwareIterator labelAwareIterator = new FileLabelAwareIterator.Builder()
        .addSourceFolder(new ClassPathResource("label").getFile())
        .build();
- Create...