Loading images on to PySpark dataframes
We are now ready to begin importing our images into our notebook for classification.
Getting ready
This section uses several libraries and their dependencies. Install the following packages with pip from a terminal in Ubuntu Desktop:
pip install tensorflow==1.4.1
pip install keras==2.1.5
pip install sparkdl
pip install tensorframes
pip install kafka
pip install py4j
pip install tensorflowonspark
pip install jieba
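Before moving on, it can help to confirm that the packages listed above can be imported from the notebook's Python environment and that the two pinned versions match. The following is a minimal sketch, not part of the original recipe: the `REQUIRED` dictionary and the `check_packages` helper are hypothetical names introduced here, and the sketch assumes each package's import name is the same as its pip name and that the pinned packages expose a `__version__` attribute.

```python
import importlib

# Versions pinned by the pip commands above; None means no version was pinned.
# Assumption: each import name matches the pip package name.
REQUIRED = {
    "tensorflow": "1.4.1",
    "keras": "2.1.5",
    "sparkdl": None,
    "tensorframes": None,
    "kafka": None,
    "py4j": None,
    "tensorflowonspark": None,
    "jieba": None,
}


def check_packages(required, import_module=importlib.import_module):
    """Return {module name: problem} for modules that fail to import or
    whose __version__ differs from the pinned version."""
    problems = {}
    for name, wanted in required.items():
        try:
            module = import_module(name)
        except Exception as exc:  # broad on purpose: any import-time failure counts
            problems[name] = "import failed: {}".format(exc)
            continue
        found = getattr(module, "__version__", None)
        if wanted is not None and found != wanted:
            problems[name] = "found version {}, expected {}".format(found, wanted)
    return problems


if __name__ == "__main__":
    # Print one line per problem; no output means every check passed.
    for name, problem in sorted(check_packages(REQUIRED).items()):
        print("{}: {}".format(name, problem))
```

Run as a script or in a notebook cell, this prints nothing when every package imports and the pinned versions match. Importing the modules rather than querying pip is a deliberate choice here, since it checks the same environment the notebook will actually use.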
How to do it...
The following steps will demonstrate how to decode images into a Spark dataframe:
- Initiate a spark session, using the following script:
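from pyspark.sql import SparkSession

# Build (or reuse) a local Spark session for the image classification work
spark = SparkSession.builder \
    .master("local") \
    .appName("ImageClassification") \
    .config("spark.executor.memory", "6gb") \
    .getOrCreate()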
- Import the PySpark SQL functions module and the sparkdl library, which are used to create dataframes, using the following script:
import pyspark.sql.functions as f
import sparkdl as dl
- Execute the following script to create...