Example on Structured Streaming
In this example, we will apply what we have learned about Structured Streaming in the previous sections. We will simulate an incoming stream by using one of the example datasets, which consists of small JSON files; in a real scenario, files like these could be the incoming data that we want to process. We will use these files to compute metrics such as counts and windowed counts on a stream of timestamped actions. Let's take a look at the contents of the structured-streaming example dataset, as follows:
%fs ls /databricks-datasets/structured-streaming/events/
You will find that there are about 50 JSON files in the directory. You can see some of these in the following screenshot:
We can see what one of these JSON files contains by using the %fs head command, as follows:
%fs head /databricks-datasets...
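Before working with the stream itself, it can help to see what a count and a windowed count mean on a handful of records. The following is a minimal plain-Python sketch, not the Structured Streaming API. It assumes that each record is a JSON object with a time field (in epoch seconds) and an action field; these field names, the sample records, and the one-hour window width are illustrative assumptions rather than values taken from the dataset.

```python
import json
from collections import Counter

# Hypothetical sample records, one JSON object per line. The field names
# "time" (epoch seconds) and "action" are assumptions made for illustration.
SAMPLE_LINES = [
    '{"time": 0, "action": "Open"}',
    '{"time": 10, "action": "Close"}',
    '{"time": 3599, "action": "Open"}',
    '{"time": 3600, "action": "Open"}',
    '{"time": 7300, "action": "Close"}',
]

# Assumed window width: one hour, expressed in seconds.
WINDOW_SECONDS = 3600


def window_start(epoch_seconds, width=WINDOW_SECONDS):
    """Return the start of the fixed-width window that contains the timestamp."""
    return epoch_seconds - (epoch_seconds % width)


def count_actions(lines):
    """Count how many times each action occurs across all records."""
    return Counter(json.loads(line)["action"] for line in lines)


def windowed_counts(lines, width=WINDOW_SECONDS):
    """Count actions per (window start, action) pair using non-overlapping windows."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        counts[(window_start(event["time"], width), event["action"])] += 1
    return counts


if __name__ == "__main__":
    print(dict(count_actions(SAMPLE_LINES)))
    for (start, action), n in sorted(windowed_counts(SAMPLE_LINES).items()):
        print(start, action, n)
```

The plain count groups records by action only, whereas the windowed count groups them by both the window that contains the timestamp and the action. A streaming engine maintains the same kind of grouping incrementally as new files arrive, instead of reading all records at once as this sketch does.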