Accessing data
By now, we have already spent a lot of time describing how to read and write data. But there is much more to it: data often comes in different formats, such as CSV, HTML, or JSON, or it can be stored in a database. Knowing how to access and process this data is important for Data Science, and now we will describe in detail how to do it for the most common data formats and sources.
Text data and CSV
We have already discussed reading text data in great detail. It can be done, for example, using the Files helper class from the NIO API or IOUtils from Commons IO.
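As a quick reminder, here is a minimal sketch of reading a text file line by line with the JDK's java.nio.file.Files class. It uses only the standard library, not Commons IO. The class name ReadTextExample and the temporary sample file are hypothetical and exist only for this illustration. Files.readAllLines loads the whole file into memory. Files.lines reads it lazily as a stream, which suits larger files, and must be closed after use.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReadTextExample {

    // Reads the whole file into memory as a list of lines (fine for small files).
    static List<String> readAll(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    // Streams the file lazily and keeps only non-empty lines;
    // try-with-resources closes the underlying file handle.
    static List<String> readNonEmpty(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.isEmpty())
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file created only for this demo.
        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, List.of("first line", "", "third line"), StandardCharsets.UTF_8);

        System.out.println(readAll(tmp));      // [first line, , third line]
        System.out.println(readNonEmpty(tmp)); // [first line, third line]

        Files.delete(tmp);
    }
}
```

For small files, readAllLines is the simplest choice. When a file may not fit in memory, the streaming variant lets you filter and transform lines without holding them all at once.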
CSV (Comma-Separated Values) is a common way to organize tabular data in plain text files. While it is possible to parse CSV files by hand, corner cases such as quoted fields that contain commas or line breaks make this cumbersome. Luckily, there are good libraries for this purpose, and one of them is Apache Commons CSV:
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-csv</artifactId>
  ...