Common Input Files
Let's look at the types of input files commonly used to exchange data between systems and the ways to convert them into various big data file formats. This section also provides the programming skills required to transform these input files for a big data environment.
CSV – Comma-Separated Values
A CSV file is a plain-text file that stores tabular data, with the values in each row separated by commas. CSV is a row-based storage format in which each row is terminated by a new line. CSV files are frequently used to exchange tabular data.
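As a minimal sketch of this row-and-comma layout, the following Python snippet parses a small CSV text with the standard library's csv module. The sample data and the variable names (raw, header, records) are made up for illustration, and io.StringIO stands in for an open file. The second record shows that a value containing a comma must be wrapped in double quotes so it is not split into two fields.

```python
import csv
import io

# Hypothetical sample data: one header line, then one record per line.
raw = 'id,name,city\n1,Asha,Pune\n2,Ben,"Austin, TX"\n'

# csv.reader splits each line into a list of string fields.
rows = list(csv.reader(io.StringIO(raw)))

header, records = rows[0], rows[1:]
print(header)   # column names taken from the first row
print(records)  # remaining rows, one list per record
```

To read from disk instead, the same reader can be given a file opened with open(path, newline=""), which is the mode the csv module documentation recommends.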
The first row of a CSV file, the header row, typically holds the schema detail: the column names for the data, but not the data types. CSV files cannot represent relational data, which means that a column shared by multiple files carries no relationship or hierarchy. Foreign keys can be stored in columns of one or more files, but the CSV format itself does not express the linkage between these files.
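The two points above can be sketched in Python, assuming two hypothetical files, customers and orders, that share a customer_id column. The header supplies only the column names, so every value arrives as a string and must be converted explicitly, and the link between the two files has to be rebuilt in code because nothing in the format records it.

```python
import csv
import io

# Hypothetical contents of two related CSV files.
customers_csv = "customer_id,name\n1,Asha\n2,Ben\n"
orders_csv = "order_id,customer_id,amount\n100,1,25.50\n101,2,10.00\n102,1,4.25\n"

# DictReader uses the header row as keys; every value is still a str.
customers = list(csv.DictReader(io.StringIO(customers_csv)))
orders = list(csv.DictReader(io.StringIO(orders_csv)))

# The shared column is only a convention, so build the lookup by hand.
name_by_id = {c["customer_id"]: c["name"] for c in customers}

# Join orders to customers and total the amounts, casting str to float.
totals = {}
for order in orders:
    name = name_by_id[order["customer_id"]]
    totals[name] = totals.get(name, 0.0) + float(order["amount"])

print(totals)
```

If an order referred to a customer_id missing from the customers file, the lookup would raise a KeyError; the CSV format offers no way to enforce that such references are valid.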
The following figure...