Logstash is basically used for data pipelining, through which we can take input from different sources and output to different data sources. Using Logstash, we can clean the data through filter options and mutate the input data before sending it to the output source. Logstash has different adapters to handle different applications, such as for MySQL or any other relational database connection. We have a JDBC input plugin through which we can connect to MySQL server, run queries, and take the table data as the input in Logstash. For Elasticsearch, there is a connector in Logstash that gives us the option to seamlessly transfer data from Logstash to Elasticsearch.
To run Logstash, we need to install Logstash and edit the configuration file logstash.conf, which consists of an input, output, and filter sections. We need to tell Logstash where it should get the input from through the input block, what it should do with the input through the filter block, and where it should send the output through the output block. In the following example, I am reading an Apache Access Log and sending the output to Elasticsearch:
input {
file {
path => "/var/log/apache2/access.log"
}
}
filter {
grok {
match => { message => "%{COMBINEDAPACHELOG}" }
}
}
output {
elasticsearch {
hosts => "http://127.0.0.1:9200"
index => "logs_apache"
document_type => "logs"
}
}
The input block is showing a file key that is set to /var/log/apache2/access.log. This means that we are getting the file input and path of the file, /var/log/apache2/access.log, which is Apache's log file. The filter block is showing the grok filter, which converts unstructured data into structured data by parsing it.
There are different patterns that we can apply for the Logstash filter. Here, we are parsing the Apache logs, but we can filter different things, such as email, IP addresses, and dates.