Looking at pipelines for real-world data-processing scenarios
This section explores several real-world ETL scenarios and the Logstash pipelines that can be used to implement them.
The full datasets and the pipeline configuration files can be found in the code repository for the book.
Loading data from CSV files into Elasticsearch
Comma-separated values (CSV) files are a commonly used format that a wide range of source systems and tools can generate. We will explore how a dataset containing taxi trip details from the city of Chicago can be parsed and loaded into Elasticsearch for analysis.
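To make the idea of parsing CSV concrete, the following minimal Python sketch shows how a header row can supply field names so that each data row becomes one document-like record. It is not the Logstash pipeline used in this chapter, and the sample text, its column names, and the rows_as_documents helper are hypothetical and not taken from the Chicago dataset.

```python
import csv
import io

# Hypothetical sample: these column names and values are invented for
# illustration and do not come from the Chicago taxi dataset.
SAMPLE = """trip_id,trip_seconds,fare
a1,600,12.50
b2,900,18.75
"""


def rows_as_documents(text):
    """Return one dict per data row, keyed by the names in the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


docs = rows_as_documents(SAMPLE)
for doc in docs:
    print(doc)
```

In this sketch every value is read as a string, so numeric fields would still need converting to a suitable type before they could be aggregated.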
Navigate to Chapter7/processing-csv-files in the code repository and explore the chicago-taxi-data.csv file. The first row contains header information, indicating what information each column contains. The following screenshot is an extract of some of the key fields in the file: