By now, we have already spent a lot of time describing how to read and write data. But there is much more to it: data often comes in different formats, such as CSV, HTML, or JSON, or it can be stored in a database. Knowing how to access and process this data is important for Data Science, so we will now describe in detail how to do it for the most common data formats and sources.
Accessing data
Text data and CSV
We have already discussed reading text data in great detail. It can be done, for example, with the Files helper class from the NIO API or with IOUtils from Commons IO.
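As a quick reminder, here is a minimal sketch of both styles of reading with the Files class: loading the whole file at once and streaming it line by line. It uses only the JDK (no Commons IO) and assumes Java 9 or later because of List.of. The class name ReadTextExample and the helper methods readAll and countNonEmpty are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper class illustrating two ways of reading text with java.nio.file.Files
public class ReadTextExample {

    // Reads the whole file into memory, one String per line (line terminators are stripped)
    public static List<String> readAll(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    // Streams the file lazily, which suits files too large to hold in memory;
    // try-with-resources closes the underlying file handle when the stream is done
    public static long countNonEmpty(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.isEmpty()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary file so the example is self-contained
        Path tmp = Files.createTempFile("example", ".txt");
        Files.write(tmp, List.of("first line", "", "third line"), StandardCharsets.UTF_8);

        System.out.println(readAll(tmp).size());   // 3
        System.out.println(countNonEmpty(tmp));    // 2

        Files.delete(tmp);
    }
}
```

Files.readAllLines is the simplest option for small files, while Files.lines returns a lazy Stream that must be closed, hence the try-with-resources block.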
CSV (Comma-Separated Values) is a common way to organize tabular data in plain text files. While it is possible to parse CSV files by hand, there are some corner cases that make it a bit...