Summary
Every data science problem is approached in a series of steps, and data preparation is one of the first. The standard Java API provides many tools for this task, and a number of libraries make it considerably easier.
In this chapter, we discussed many of them, including extensions to the Java API such as Google Guava. We also covered ways to read data from different sources such as text, HTML, and databases, and finally we looked at the DataFrame, a useful structure for manipulating tabular data.
In the next chapter, we will take a closer look at the data that we extracted in this chapter and perform Exploratory Data Analysis.