Lifting the hood
In the last section of this chapter, we will discuss, very briefly, how Spark works internally. For a more detailed discussion, see the References section at the end of the chapter.
When you open a Spark context, either explicitly or by launching the Spark shell, Spark starts a web UI with details of how the current task and past tasks have executed. Let's see this in action for the example mutual information program we wrote in the last section. To prevent the context from shutting down when the program completes, you can insert a call to readLine as the last line of the main method (after the call to takeOrdered). This expects input from the user, and will therefore pause program execution until you press enter.
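As a minimal sketch of this pause, the program below runs a small stand-in computation and then waits for input before stopping the context. The object name PauseForUI, the helper waitForEnter, and the toy computation are hypothetical placeholders for the mutual information program; the sketch also assumes Spark is on the classpath and runs in local mode. It calls scala.io.StdIn.readLine, which behaves like the bare readLine (deprecated since Scala 2.11).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.io.StdIn

object PauseForUI {
  // Block until the user presses enter, keeping the context (and its UI) alive.
  def waitForEnter(): Unit = {
    StdIn.readLine("Press enter to stop the Spark context: ")
    ()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("PauseForUI").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for the real program: takeOrdered is an action, so it triggers a job.
      val smallest = sc.parallelize(1 to 1000).map(x => x * x % 97).takeOrdered(5)
      println(smallest.mkString(", "))

      // Last step of main: pause here so the web UI can be inspected.
      waitForEnter()
    } finally {
      sc.stop()
    }
  }
}
```

Stopping the context in a finally block is a design choice of this sketch, not something the original program is known to do; it simply makes sure the UI port is released once you press enter.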
To access the UI, point your browser to 127.0.0.1:4040. If you have other instances of the Spark shell running, the port may be 4041, or 4042, and so on.
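If you would rather not guess the port, you can request one explicitly through the spark.ui.port configuration property and ask the context which address it actually bound to. The following is a sketch under stated assumptions: it relies on SparkContext.uiWebUrl, which is available from Spark 2.0 onwards, and the object name UiPortSketch, the helper buildConf, and the port 4050 are arbitrary choices made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object UiPortSketch {
  // Build a configuration that requests a specific port for the web UI.
  def buildConf(port: Int): SparkConf =
    new SparkConf()
      .setAppName("UiPortSketch")
      .setMaster("local[*]") // assumption: local mode
      .set("spark.ui.port", port.toString)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf(4050))
    try {
      // uiWebUrl is an Option[String]; it is None when the UI is disabled.
      // If the requested port is taken, Spark tries successive ports, so the
      // printed address may differ from the one requested.
      println(sc.uiWebUrl.getOrElse("web UI is disabled"))
    } finally {
      sc.stop()
    }
  }
}
```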
The first page of the UI tells us that our application contains three jobs. A job occurs as the result of an action...