Training on Amazon's EMR infrastructure
We are going to use Amazon's Elastic MapReduce (EMR) infrastructure to run our parsing and model-building jobs.
In order to do that, we first need to create a bucket in Amazon's storage cloud. To do this, open the Amazon S3 console in your web browser by going to http://console.aws.amazon.com/s3 and click on Create Bucket. Remember the name of the bucket, as we will need it later.
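If you would rather create the bucket from a script than through the console, the following is a minimal sketch using the boto3 library. Note that boto3 is an assumption here (the steps above use only the web console), and the bucket name and region shown are hypothetical placeholders. It also assumes your AWS credentials are already configured on your computer.

```python
def bucket_request(bucket_name, region="us-east-1"):
    """Build the keyword arguments for boto3's create_bucket call.

    S3 treats us-east-1 as the default region, so a location
    constraint is only sent for other regions.
    """
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket_name, region="us-east-1"):
    """Create the bucket and return its name so we can reuse it later."""
    import boto3  # assumption: boto3 is installed and credentials are set up

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**bucket_request(bucket_name, region))
    return bucket_name


if __name__ == "__main__":
    # Hypothetical name: bucket names must be unique across all of S3
    print(create_bucket("my-blogs-training-bucket"))
```

Returning the bucket name from create_bucket is just a convenience, as we need to refer to the same name in later steps.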
Right-click on the new bucket and select Properties. Then change the permissions to grant everyone full access. This is poor security practice in general, so I recommend tightening the access permissions once you have completed this chapter. Amazon's more advanced permission settings let you give your script access while still preventing third parties from viewing your data.
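As one possible way to tighten access afterwards, here is a rough sketch of a bucket policy that allows a single AWS identity to list the bucket and to read and write its objects. This is an assumption about how you might lock things down, not a setting this chapter relies on. The bucket name and the principal ARN are hypothetical, and jobs running on EMR may need further permissions beyond what is shown.

```python
import json


def training_bucket_policy(bucket_name, principal_arn):
    """Return a policy document granting one principal access to the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Sid": "ListTrainingBucket",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Reading and writing apply to the objects inside the bucket
                "Sid": "ReadWriteTrainingObjects",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


def apply_policy(bucket_name, principal_arn):
    """Attach the policy to the bucket (assumes boto3 and valid credentials)."""
    import boto3

    policy = training_bucket_policy(bucket_name, principal_arn)
    boto3.client("s3").put_bucket_policy(
        Bucket=bucket_name, Policy=json.dumps(policy)
    )


if __name__ == "__main__":
    # Hypothetical account ID and user name, for illustration only
    arn = "arn:aws:iam::123456789012:user/blog-trainer"
    print(json.dumps(training_bucket_policy("my-blogs-training-bucket", arn), indent=2))
```

Running the file on its own only prints the policy, so you can inspect it before deciding whether to apply it.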
Left-click the bucket to open it and click on Create Folder. Name the folder blogs_train. We are going to upload our training data to this folder for processing on the cloud.
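It helps to know that S3 has no real directories: a folder such as blogs_train is simply a key prefix ending in a slash, and files "inside" it are objects whose keys start with that prefix. The sketch below illustrates this layout. It assumes the boto3 library, which the console steps above do not require, and the file name used is a hypothetical example.

```python
import os


def folder_marker_key(folder="blogs_train"):
    """Key of the empty object that represents the folder itself."""
    return folder.rstrip("/") + "/"


def training_key(local_path, folder="blogs_train"):
    """Key under which a local file would be stored inside the folder."""
    return folder_marker_key(folder) + os.path.basename(local_path)


def create_folder(bucket_name, folder="blogs_train"):
    """Create the folder marker (assumes boto3 and valid credentials)."""
    import boto3

    boto3.client("s3").put_object(
        Bucket=bucket_name, Key=folder_marker_key(folder)
    )


if __name__ == "__main__":
    print(folder_marker_key())
    # Hypothetical local file, for illustration only
    print(training_key("/tmp/data/example.xml"))
```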
On your computer...