How to aggregate logs with ELK and Beats?

11/9/2021

This is the second post in my series on log aggregation with the ELK stack. In the first one, I explained how to configure ELK. Today, I’m going to tell you how you can use specific tools to collect data for it.

Once configured, ELK gives us a set of tools to collect logs. The problem is we don’t have the logs yet. They’re still scattered across different locations, and we need to gather them in order to take full advantage of ELK’s powers. In this article, I will explain how to do it by using two tools: filebeat and metricbeat.

Filebeat and Metricbeat

Filebeat is one of several tools in the Beats family. As the name suggests, it can be used, among other things, for automatic log data collection. You could also send logs straight to Elasticsearch or Logstash, but this approach has downsides. One is that you risk losing logs during a network failure, or when you update an ELK component and for some reason cannot use a blue-green deployment. Filebeat, by contrast, reads log files from disk and keeps track of how far it has read, so it can pick up where it left off once the connection is back.

Filebeat has enormous capabilities. Apart from the modules included in the basic version, you can also use it with custom-made ones, based on libbeat.

Metricbeat, on the other hand, is used for collecting various types of metrics. At the beginning of this post, I wrote about collecting and aggregating logs. Now I am going to discuss metrics in an analogous way. They provide information on system status that can help decide between vertical and horizontal scaling.

Configuration

We are going to use Ansible once again, but this time all the necessary tools will be installed on a virtual machine. For testing, we will collect system and Docker logs. Collecting logs directly from containers is one way to feed Elasticsearch with data. Another is to log to a file from within a container, mount that directory as a volume, and configure filebeat to read the files in the volume.
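The second approach can be sketched as a filebeat input that reads files from the mounted directory. This is only an illustration: /var/log/myapp is a hypothetical host directory, assumed to be mounted into the container as a volume, and it is not part of the linked repository.

```yaml
# Sketch only: /var/log/myapp is a hypothetical host directory that a
# container mounts as a volume and writes its log files into.
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log
```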

Let’s start by configuring filebeat’s role. All the code resides in this repository. I have already discussed Ansible role syntax in the previous article, so I am going to skip that here for clarity’s sake.

- name: install filebeat
  apt:
    deb: https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-{{ filebeat.version }}-amd64.deb

- name: deploy filebeat configuration
  copy:
    src: filebeat.yml
    dest: "{{ filebeat.conf_dir }}/filebeat.yml"

- name: restart filebeat
  systemd:
    daemon_reload: yes
    name: filebeat
    state: restarted

Let’s break down what happens above: we install filebeat from the package at elastic.co, deploy its configuration, and finally restart the service so that the changes are applied.
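The tasks reference two variables, filebeat.version and filebeat.conf_dir. Their real values live in the repository; the sketch below only shows how they might be defined, for example in the role's defaults/main.yml. Both values are assumptions: 7.12.1 matches the filebeat-7.12.1 index that shows up in Kibana later on, and /etc/filebeat is where the Debian package normally keeps its configuration.

```yaml
# Hypothetical role defaults; check the repository for the real values.
filebeat:
  version: 7.12.1           # assumed, matches the filebeat-7.12.1 index seen in Kibana
  conf_dir: /etc/filebeat   # usual config directory of the .deb package
```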

filebeat.inputs:
  - type: log
    paths:
      - /var/log/syslog
  - type: docker
    combine_partial: true
    containers:
      path: /var/lib/docker/containers
      stream: stdout
      ids:
        - '*'
output.logstash:
  hosts: ['127.0.0.1:5044']

Now, on to configuring the inputs. Here, we are interested in two input types: log and docker. For the former, we only provide the path to the system log. For the latter, we provide the container path, the container IDs ('*' matches all of them), and the stream to read (here only stdout); combine_partial tells filebeat to join long lines that Docker has split into several entries. Finally, the output section sends everything to Logstash on port 5044. If you’d like to know more, please refer to the official documentation.
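A side note for newer Beats releases: the Filebeat 7.x documentation marks the docker input as deprecated in favor of the container input. If you prefer the newer input, a roughly equivalent configuration might look like the sketch below. It assumes Docker keeps its logs in the default location, and it is not taken from the repository.

```yaml
# Sketch only: the same container logs read through the "container" input.
filebeat.inputs:
  - type: container
    stream: stdout
    paths:
      - '/var/lib/docker/containers/*/*.log'
```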

The role for metricbeat looks nearly identical:

- name: install metricbeat
  apt:
    deb: https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-{{ metricbeat.version }}-amd64.deb

- name: deploy metricbeat configuration
  copy:
    src: metricbeat.yml
    dest: "{{ metricbeat.conf_dir }}/metricbeat.yml"

- name: restart metricbeat
  systemd:
    daemon_reload: yes
    name: metricbeat
    state: restarted

But the configuration is a bit different:

metricbeat.modules:
  - module: docker  # [1]
    metricsets:
      - 'container'
      - 'cpu'
      - 'diskio'
      - 'event'
      - 'healthcheck'
      - 'info'
      - 'memory'
      - 'network'
    hosts: [ "unix:///var/run/docker.sock" ]  # [2]
    period: 10s  # [3]
    enabled: true  # [4]

output.logstash:
  hosts: ['127.0.0.1:5044']

Here we are configuring one module: docker[1]. We connect to the Docker socket[2], set the data collection interval to 10 seconds[3], and of course, enable the module[4]. Finally, just like with filebeat, we set the output to Logstash.

Configuring Kibana

Data is now flowing to Elastic, but we still need to configure Kibana in order to see it. To that end, let’s open our browser and head to our virtual machine’s IP address. If you are using Vagrant (from the linked repository) with default settings, the address is http://172.168.0.2:5601/.

If you are running Kibana for the first time, it will display a welcome screen.

Go ahead and click “Explore on my own.” The welcome screen will disappear. Now, open the side menu (the button in the top left corner), scroll down and select “Stack management.”

On the next screen, select “Index Patterns.” Elasticsearch is already collecting data, but Kibana needs an index pattern to know which indices to show.

After clicking “Index Patterns,” we are moved to another screen. Here, select “Create index pattern.”

If you have configured metricbeat and filebeat correctly, you will see a list of available indexes. Now, define an index pattern in the search bar, specifically a pattern that will match a single source. For example, filebeat* will include the index filebeat-7.12.1. Now let’s click Next step.

In the next form, open the dropdown menu and select the field that Kibana should use as the primary timestamp. Because an index may contain more than one timestamp field, we need to pick the primary one explicitly. For filebeat and metricbeat, the field is simply @timestamp, but other sources may use a different name.

Now confirm the operation by clicking Create index pattern, and you should see a screen with all searchable fields.

For metricbeat, the configuration looks exactly the same. If you have done everything right, we can now look at our data. Open the side menu again (top left corner), select Discover in the Analytics section, and you should be able to view the collected data.

At this point, we have configured ELK and enabled the collection of syslog entries, container logs, and Docker metrics. I encourage you to experiment with Kibana and create dashboards from the collected data. Filebeat and metricbeat can collect data from many other sources as well. We can, for instance, poll HTTP endpoints at custom intervals. With a little bit of effort, we can create a dashboard with information on weather, currency rates, or even stock prices.
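As an illustration of the HTTP case, metricbeat's http module can poll a JSON endpoint on a schedule. The sketch below is not from the repository: the host, path, and namespace are placeholders that you would swap for a real API, and the interval is an arbitrary choice.

```yaml
# Sketch only: poll a JSON endpoint every five minutes.
metricbeat.modules:
  - module: http
    metricsets: ['json']
    period: 5m                           # custom polling interval
    hosts: ['https://api.example.com']   # placeholder host
    path: '/v1/weather/current'          # placeholder path
    namespace: 'weather'                 # name used to group the response fields
    method: 'GET'

output.logstash:
  hosts: ['127.0.0.1:5044']
```

Once such documents reach Elasticsearch, they can be charted in Kibana in the same way as the Docker metrics.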

Summary

An alternative solution could be Graylog, which also utilizes Elasticsearch, but Kibana and Logstash enjoy continued popularity. Collecting data from various, often distributed, sources is immensely helpful in IT systems administration and beyond. However, the information in this tutorial only skims the surface of what both ELK and the Beats family can do.
