3. Topic Modeling and Theme Extraction
Activity 3.01: Performing Topic Modeling on a Set of Documents with Unknown Topics
Solution:
- For this activity, we are going to use 1,000 movie review files. Navigate to the following link (or to the local directory where you downloaded the GitHub files) to obtain the text data files that contain the movie review comments: https://packt.live/3gISDZL. Downloading the whole GitHub repository is much easier than downloading 1,000 files by hand.
- Navigate to the S3 dashboard at https://s3.console.aws.amazon.com/s3/home.
- Click the bucket that you created earlier (in my case, it is "aws-ml-input-for-topic-modeling-20200301").
- Click Create folder.
- Type movie_review_files and click Save.
Note
For this step, you may either follow along with the exercise and type in the code in...
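If you would rather script the folder creation and upload than click through the console, the steps above can be sketched in Python as follows. This is a minimal sketch, not the book's own code: it assumes that boto3 is installed, that your AWS credentials are already configured, and that the review files end in .txt. The helper names plan_uploads and upload_reviews are hypothetical. Note that S3 has no real folders; uploading objects whose keys start with movie_review_files/ makes that folder appear in the console.

```python
from pathlib import Path


def plan_uploads(local_dir, prefix="movie_review_files", suffix=".txt"):
    """Pair each local review file with the S3 key it should be stored under.

    The ".txt" suffix is an assumption about how the review files are named.
    """
    files = sorted(
        p for p in Path(local_dir).iterdir()
        if p.is_file() and p.suffix == suffix
    )
    return [(str(p), f"{prefix}/{p.name}") for p in files]


def upload_reviews(local_dir, bucket, prefix="movie_review_files", client=None):
    """Upload every planned file and return how many were uploaded."""
    if client is None:
        # Imported lazily so that plan_uploads works without boto3 installed.
        import boto3
        client = boto3.client("s3")
    plan = plan_uploads(local_dir, prefix)
    for path, key in plan:
        # upload_file(Filename, Bucket, Key) is the standard boto3 S3 client call.
        client.upload_file(path, bucket, key)
    return len(plan)


# Example call (the local path is a placeholder; use your own bucket name):
# upload_reviews("path/to/local/reviews", "aws-ml-input-for-topic-modeling-20200301")
```

Calling plan_uploads on its own first is a cheap way to check which files would be sent, and under which keys, before anything is written to the bucket.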