Chapter 1. Obtaining and Cleaning Data
In this chapter, we will cover the following recipes:
- Retrieving all file names from hierarchical directories using Java
- Retrieving all file names from hierarchical directories using Apache Commons IO
- Reading contents from text files all at once using Java 8
- Reading contents from text files all at once using Apache Commons IO
- Extracting PDF text using Apache Tika
- Cleaning ASCII text files using regular expressions
- Parsing Comma-Separated Value files using Univocity
- Parsing Tab-Separated Value files using Univocity
- Parsing XML files using JDOM
- Writing JSON files using JSON.simple
- Reading JSON files using JSON.simple
- Extracting web data from a URL using JSoup
- Extracting web data from a website using Selenium WebDriver
- Reading table data from a MySQL database