Let's continue with the IMDb data and put into practice the ideas from the previous sections. In this section, we will use a few familiar packages, like tidytext, plyr and dplyr, as well as the excellent text2vec by Dmitriy Selivanov, which was released in 2017, and the well-known caret package by Max Kuhn.
Sentiment analysis from movie reviews
Data preprocessing
We need to prepare our data for the algorithm.
First, we load the packages we will need (plyr is loaded before dplyr so that dplyr's functions take precedence when names clash):
library(plyr)
library(dplyr)
library(text2vec)
library(tidytext)
library(caret)
We will use the IMDb data as before:
imdb <- read.csv("./data/labeledTrainData.tsv", encoding = "utf-8", quote = "", sep = "\t", stringsAsFactors = FALSE)...
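To see what the quote = "" and sep = "\t" arguments do, here is a small self-contained sketch that runs on a made-up, two-row sample instead of the real file. The column names id, sentiment and review are an assumption about the layout of labeledTrainData.tsv. The names sample_tsv and reviews are hypothetical and used only for illustration. The sample is passed through the text argument of read.csv, so no file on disk is needed.

```r
# Made-up two-row sample in an assumed tab-separated layout
# (columns id, sentiment, review); this is NOT real IMDb data.
sample_tsv <- paste(
  "id\tsentiment\treview",
  "r1\t1\tA \"must-see\" film, really.",
  "r2\t0\tI could not sit through it.",
  sep = "\n"
)

# sep = "\t": fields are split on tabs, so commas inside a review are harmless.
# quote = "": quoting is switched off, so every double quote in the input
# is kept as an ordinary character in the parsed values.
reviews <- read.csv(text = sample_tsv, sep = "\t", quote = "",
                    stringsAsFactors = FALSE)

# Inspect the column types and the first review text
str(reviews)
reviews$review[1]
```

Because quoting is disabled, any quotation marks that wrap fields in the real file would also end up inside the values. Depending on the file, they may need to be stripped in a later cleaning step.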