Chapter 3. Data Exploration Using RethinkDB
Data exploration is the process of analyzing and restructuring structured or unstructured data, and it is usually done before the actual data analysis begins. Operations such as removing duplicate records and finding blank or whitespace-only values belong to the data exploration stage.
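To make the two operations above concrete, here is a minimal sketch in plain Python over in-memory records. It does not use RethinkDB or its query language. The helper names `find_blank_fields` and `drop_duplicates`, and the sample rows, are hypothetical and exist only for illustration.

```python
from typing import Any

Record = dict[str, Any]


def find_blank_fields(records: list[Record]) -> list[tuple[int, str]]:
    """Return (row index, field name) for every string value that is empty or whitespace-only."""
    hits = []
    for i, rec in enumerate(records):
        for key, value in rec.items():
            # Only string values can be "whitespace data"; other types are skipped.
            if isinstance(value, str) and value.strip() == "":
                hits.append((i, key))
    return hits


def drop_duplicates(records: list[Record], key_fields: list[str]) -> list[Record]:
    """Keep the first record for each combination of key_fields and drop later repeats."""
    seen = set()
    unique = []
    for rec in records:
        # Build a hashable key from the chosen fields (missing fields become None).
        key = tuple(rec.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical sample rows, for illustration only.
rows = [
    {"id": 1, "name": "Alice", "city": "New York"},
    {"id": 2, "name": "   ", "city": "Boston"},
    {"id": 1, "name": "Alice", "city": "New York"},
]

print(find_blank_fields(rows))                             # [(1, 'name')]
print(len(drop_duplicates(rows, ["id", "name", "city"])))  # 2
```

Which fields define a "duplicate" is a judgment call for each dataset, which is why the sketch takes `key_fields` as a parameter instead of comparing whole records.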
Data exploration can serve as a preliminary step before heavy operations such as running batch jobs. Those jobs are expensive to compute, so discovering irrelevant data only at that stage would be costly.
Data exploration is useful in many scenarios. Suppose you have a large dataset on the DNA diversity of people living in New York, or terabytes of Mars temperature records from NASA. Such data is very likely to contain errors. So, instead of loading terabytes of raw data directly into a program written in R, we can first clean it to reduce those errors, which should also help produce results faster.
Concepts...