Perform the following steps to train the NTM model:
- Read the processed ABC News Headlines dataset from the output folder on the designated S3 bucket, as follows:
abcnews_df = pd.read_csv(os.path.join('s3://', s3_output_bucket, f.key))
We use the read_csv() function from the pandas library to read the processed news headlines into a DataFrame. The DataFrame has 110,365 rows, one per headline, and 200 columns, one per word.
- Split the dataset into three parts (train, validation, and test), as follows:
vol_train = int(0.8 * abcnews_csr.shape[0])
train_data = abcnews_csr[:vol_train, :]
test_data = abcnews_csr[vol_train:, :]
vol_test = test_data.shape[0]
val_data = test_data[:vol_test//2, :]
test_data = test_data[vol_test//2:, :]
In the preceding code block, we take 80% of the data for training, 10% for validation, and the remaining...
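As a quick sanity check on the split logic, the following is a minimal sketch that applies the same slicing to a small synthetic matrix. The toy_df and toy_csr names, the split_train_val_test() helper, and the conversion of a DataFrame to a SciPy CSR matrix are assumptions made for illustration; they are not taken from the original notebook:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def split_train_val_test(matrix, train_frac=0.8):
    """Split rows, in order, into train, validation, and test sets.

    The first train_frac of the rows go to training; the remaining rows
    are halved into validation and test (hypothetical helper).
    """
    vol_train = int(train_frac * matrix.shape[0])
    train_data = matrix[:vol_train, :]
    holdout = matrix[vol_train:, :]
    vol_holdout = holdout.shape[0]
    val_data = holdout[:vol_holdout // 2, :]
    test_data = holdout[vol_holdout // 2:, :]
    return train_data, val_data, test_data


# Synthetic stand-in for the processed headlines:
# 1,000 rows (headlines) by 200 columns (word counts).
rng = np.random.default_rng(0)
toy_df = pd.DataFrame(rng.integers(0, 3, size=(1000, 200)))

# Assumption: the DataFrame is converted to a CSR sparse matrix before splitting.
toy_csr = csr_matrix(toy_df.values, dtype=np.float32)

train_data, val_data, test_data = split_train_val_test(toy_csr)
print(train_data.shape, val_data.shape, test_data.shape)
# (800, 200) (100, 200) (100, 200)
```

Note that the slicing is positional, so the rows are not shuffled before the split. Applied to 110,365 headlines, the same arithmetic yields 88,292 training rows, 11,036 validation rows, and 11,037 test rows.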