Visualizing data on HDFS - parameterizing inputs
Once we start the service, we can point our browser to http://localhost:8080 (adjust the port if you have changed the default configuration) to view the Zeppelin UI. Zeppelin organizes its contents as notes and paragraphs. A note is simply a list of all the paragraphs on a single web page.
Using data from HDFS simply means that we point to the HDFS location instead of the local file system location. Before we consume the file from HDFS, let's quickly check the Spark version that Zeppelin uses. This can be achieved by issuing sc.version in a paragraph. The sc variable is an implicit variable representing the SparkContext inside Zeppelin, which simply means that we need not programmatically create a SparkContext within Zeppelin:
sc.version
res0: String = 1.6.0
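Before converting anything to a DataFrame, it can help to confirm that the file is reachable on HDFS at all. The paragraph below is a minimal sketch of that check, not a prescribed step: the NameNode host, port, and directory are hypothetical and should be replaced with your own, and the local path is shown only for contrast. It assumes it is run inside a Zeppelin Spark paragraph, where sc is already defined.

```scala
// Run inside a Zeppelin Spark paragraph; sc is supplied by Zeppelin.
// Both paths are hypothetical placeholders, not values from this chapter.
val localPath = "file:///tmp/profiles.json"                 // local file system (for contrast only)
val hdfsPath  = "hdfs://localhost:9000/data/profiles.json"  // assumed NameNode host, port, and directory

// Only the URI scheme changes between the two locations; the call is the same.
// Print the first three lines as a quick check that the HDFS file can be read.
sc.textFile(hdfsPath).take(3).foreach(println)
```

If the paragraph prints a few lines of JSON, the HDFS location is readable and the same path can be used in the steps that follow.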
Let's load the sample file profiles.json, convert it into a DataFrame, and print the schema and the first 20 rows (show) for verification. Finally, let's register the DataFrame as a...