As always, we will start by setting up our data. In this case, the data consists of the messages received by our fictional company, The Cake Factory. These are in the client_messages.RDS file that we created in Chapter 4, Simulating Sales Data and Working with Databases. The data contains 300 observations of 8 variables: SALE_ID, DATE, STARS, SUMMARY, MESSAGE, LAT, LNG, and MULT_PURCHASES. In this chapter, we will work with the MESSAGE and MULT_PURCHASES variables.
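As a rough sketch of how an RDS file is read back into R, the following round trip saves and reloads a tiny stand-in data frame. The two rows, the toy_messages and toy_path names, and the assumption that MULT_PURCHASES is a logical column are ours, for illustration only; with the real data, you would pass the location of your client_messages.RDS file to readRDS() instead:

```r
# Toy stand-in for the real data (values are made up for illustration)
toy_messages <- data.frame(
    MESSAGE = c("Great cake!", "The order arrived late."),
    MULT_PURCHASES = c(TRUE, FALSE),  # assumed to be logical
    stringsAsFactors = FALSE
)

# Save the object to a temporary file, then read it back
toy_path <- tempfile(fileext = ".RDS")
saveRDS(toy_messages, toy_path)
loaded <- readRDS(toy_path)

# Inspect the dimensions and the two columns used in this chapter
dim(loaded)
str(loaded[, c("MESSAGE", "MULT_PURCHASES")])
```

Calling dim() on the real data is a quick way to confirm that it has the 300 observations and 8 variables mentioned above.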
We will set the seed so that our results are reproducible. Keep in mind that the seed should be set before every function call that involves randomization. We show it just once here to save space and avoid repeating ourselves, but remember to do the same whenever you need reproducible results:
set.seed(12345)
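To see why the seed needs to be set before each randomized call, consider the following minimal sketch. The sample() calls and the variable names are ours, chosen only to illustrate the point:

```r
# Setting the same seed right before each call reproduces the same draw
set.seed(12345)
first_draw <- sample(1:100, 5)

set.seed(12345)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)  # TRUE

# Without resetting the seed, the generator continues from where it
# stopped, so this draw will generally differ from the first one
third_draw <- sample(1:100, 5)
identical(first_draw, third_draw)
```

The first two draws match because the generator starts from the same state both times, while the third draw continues from wherever the second one left off.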
Next, we need to make sure that we don't have any missing data in...