To create features in a big data environment, we will use PySpark to write the data preprocessing logic. This logic will live in the abcheadlines_processing.py Python file. Before we review the logic, we need to walk through some prerequisites.
Creating features using AWS Glue and SparkML
Walking through the prerequisites
- Provide the SageMaker execution role with access to the AWS Glue service, as follows:
- Obtain the SageMaker execution role by calling the get_execution_role() function from the SageMaker Python SDK
- On the IAM Dashboard, click on Roles in the left navigation pane and search for this role. Click on the target role to open its Summary page, then click on the Trust Relationships tab to add AWS Glue as an additional trusted entity
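The console steps above amount to editing the role's trust (assume-role) policy so that AWS Glue, in addition to SageMaker, can assume the execution role. As a minimal sketch, the policy document those steps produce looks like the following; the helper function name is an illustrative assumption, and in a notebook you would obtain the role ARN itself with sagemaker.get_execution_role():

```python
import json

def build_trust_policy(services):
    """Return an IAM assume-role policy document trusting the given AWS services.

    This mirrors what the Trust Relationships editor produces in the console.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": list(services)},
                "Action": "sts:AssumeRole",
            }
        ],
    }

# Trust both SageMaker (already present on the execution role) and AWS Glue,
# so a Glue job can assume the same role.
policy = build_trust_policy(["sagemaker.amazonaws.com", "glue.amazonaws.com"])
print(json.dumps(policy, indent=2))
```

This JSON is what you would paste into the policy editor on the Trust Relationships tab (or apply programmatically, for example via the IAM update-assume-role-policy API).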