By Aykut Bulgu February 21, 2024

Integrating Apache Kafka with InfluxDB

We are in the cloud-native era. Applications or services count is increasing day by day as the application structures transform into microservices or serverless. The data that is generated by these applications or services keep growing and processing these data in a real-time manner becomes more important.

This processing can either be a real-time aggregation or a calculation whose output is a measurement or a metric. When it comes to metrics or measurements, they have to be monitored because in the cloud-native era, you have to be fail-fast. This means the sooner you get the right measurements or metric outputs, the sooner you can react and solve the issues or do the relevant changes in the system. This aligns with the idea that is stated in the book Accelerate: The Science of Lean Software and DevOps, which is initially created as a state of DevOps report, saying “Changes both rapidly and reliably is correlated to organizational success.”

A change in a system, can be captured to be observed in many ways. The most popular one, especially in a cloud-native environment, is to use events. You might already heard about the even-driven systems, which became the de facto standard for creating loosely-coupled distributed systems. You can gather your application metrics, measurements or logs in an event-driven way (Event Sourcing, CDC, etc. ) and send them to an messaging backbone to be consumed by another resource like a database or an observability tool.

In this case, durability and performance is important and the traditional message brokers don’t have these by their nature.

Apache Kafka, on the contrary, is a durable, high-performance messaging system, which is also considered as a distributed stream processing platform. Kafka can be used for many use-cases such as messaging, data integration, log aggregation -and most importantly for this article- metrics.

When it comes to metrics, having only a message backbone or a broker is not enough. You have the real-time system but systems like Apache Kafka, even if they are durable enough to keep your data forever, are not designed to run queries for metrics or monitoring. However, systems such as InfluxDB, can do both for you.

InfluxDB is a database, but a time-series database. Leaving the detailed definition for a time-series database for later in this tutorial, a time-series database keeps the data in a time-stamped way, which is very suitable for metrics or events storage.

InfluxDB provides storage and time series data retrieval for the monitoring, application metrics, Internet of Things (IoT) sensor data and real-time analytics. It can be integrated with Apache Kafka to send or receive metric or event data for both processing and monitoring. You will read about these in more detail in the following parts of this tutorial.

In this tutorial, you will:

Run InfluxDB in a containerized way by using Docker.
Configure a bucket in InfluxDB.
Run an Apache Kafka cluster in Docker containers by using Strimzi container images.
Install Telegraf, which is a server agent for collecting and reporting metrics from many sources such as Kafka.
Configure Telegraf to integrate with InfluxDB by using its output plugin.
Configure Telegraf to integrate with Apache Kafka by using its input plugin.
Run Telegraf by using the configuration created.

Click >>this link<< to read the related tutorial from the InfluxDB Blog.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Integrating Apache Kafka with InfluxDB

Don't miss my next article!

Related Posts

How to Secure Your Apache Kafka Instance with SASL-SSL Authentication and Encryption

Testing Cloud Native Kafka Applications with Testcontainers

A day in a coffee shop: Building event-driven systems with structure

Change Data Capture with CockroachDB and Strimzi

Strimzi Kafka CLI Version Update (Strimzi 0.26.1 – 0.28.0)

Installing Strimzi CLI with Homebrew

A Cheat Sheet for Strimzi Kafka CLI

Creating a Kafka Connect Cluster on Strimzi by using Strimzi CLI

Aykut Bulgu