How to send data from Apache Kafka to New Relic

Introduction

In today's fast-paced digital landscape, Apache Kafka's real-time data streaming is pivotal for businesses aiming to harness instantaneous insights. New Relic, meanwhile, enhances observability with its intuitive dashboards, allowing teams to monitor and fine-tune their systems effectively. Combining the strengths of Kafka and New Relic can transform the way organizations approach data, ensuring timely decisions and optimized user experiences. This guide gives data engineers a step-by-step process for integrating these two tools, unlocking a range of benefits and use cases for improved operational efficiency.

Before we dive into the integration specifics, let’s understand more about the tools we are going to integrate and what we can achieve by integrating these tools.

Understanding Apache Kafka and New Relic

What is Apache Kafka?

Apache Kafka, at its essence, is an open source distributed event streaming platform. What does this mean? Picture a high-capacity conveyor belt, continuously transporting information from multiple sources to multiple destinations. That's Kafka in the realm of data. Designed by the team at LinkedIn and later open-sourced, Kafka has since taken the world of real-time data processing by storm.

Let's understand the core functionalities provided by Kafka.

The core functionalities of Apache Kafka include:

  • Publish and Subscribe: Data is published to topics by producers and consumed from those topics by consumers, keeping data sources and recipients decoupled.
  • Data Storage: Beyond real-time transmission, Kafka retains data for a configurable retention period, allowing repeated access by multiple applications.
  • Stream Processing: With Kafka Streams, data can be processed and transformed in real-time during transit.
  • Fault Tolerance and Scalability: Kafka's design ensures resilience and high availability. It can expand by adding more nodes, meeting growing data demands.
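
For example, the publish/subscribe model is easy to try with the console tools that ship with recent Kafka distributions. The sketch below assumes a single local broker on localhost:9092 and a throwaway topic named demo-topic; the producer and the consumer never talk to each other directly, only to the topic.

SH
# terminal 1: type messages; each line is published to the topic
kafka-console-producer.sh --topic demo-topic --bootstrap-server localhost:9092

# terminal 2: an independent consumer reads the same topic from the beginning
kafka-console-consumer.sh --topic demo-topic --bootstrap-server localhost:9092 --from-beginning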

Now that you understand the key features, let's look at what Kafka can be used for.

Key use cases for Kafka include:

  • Real-Time Analytics: Enables immediate insights from current data streams.
  • Event Sourcing: Captures changes systematically, ensuring efficient system recovery post-failures.
  • Log Aggregation: Centralizes logs from varied sources, promoting uniform access.
  • Stream Processing: Powers real-time data manipulation for various applications.
  • Data System Integration: Connects effortlessly with databases, CRMs, and cloud platforms, reinforcing its centrality in data architectures.

What is New Relic?

Launched in 2008, New Relic is an observability platform, popularly used for application performance monitoring (APM). At its core, New Relic is designed to provide unparalleled software observability, granting teams an in-depth, real-time perspective into their applications' functioning.

Its suite of features is vast and varied: from application performance insights to real-user monitoring, infrastructure health checks, and beyond, all presented in intuitive dashboards. The platform is particularly renowned for its capabilities in diagnosing performance bottlenecks, ensuring backend optimization, and providing actionable insights to maintain top-notch user experiences.

Businesses spanning various sectors and sizes have tapped into New Relic's offerings. Whether it's to monitor the intricacies of microservices, capture and aggregate telemetry data, or simply to enhance system responsiveness, New Relic has proven time and again its indispensability in modern software development and IT operations.

Why send data from Apache Kafka to New Relic?

Apache Kafka excels as a centralized hub for real-time data streaming. Pairing Kafka with New Relic is generally done for one of two reasons:

1. Monitoring the Kafka service: Streaming Kafka service health metrics (such as throughput and latency) to New Relic offers a granular understanding of Kafka's operational dynamics. Such insights enable prompt fixes, ensuring that businesses relying on real-time data experience uninterrupted and optimized data streams.

2. Sending monitoring data collected in Kafka from other sources: Many businesses relay varied data, from application logs to performance metrics, into Kafka. Forwarding this data, already flowing into Kafka topics, to New Relic offers a twofold advantage. First, it removes the need for redundant instrumentation, thereby minimizing overhead. Second, there are scenarios where instrumenting specific data directly into New Relic is challenging, if not impossible. By routing this data via Kafka, businesses can overcome these limitations, ensuring that New Relic receives a comprehensive dataset to work with.

In essence, the synergy of Kafka and New Relic ensures an end-to-end observability spectrum. Whether it's Kafka's inherent metrics or the diverse data streams it manages, New Relic's integration offers a bird's-eye view, enabling organizations to preempt challenges and maximize data-driven insights.

Sending Data from Apache Kafka to New Relic using Custom API Integration

Integrating Apache Kafka with New Relic gives you observability into your Kafka service, or into other sources that stream log, performance, or other monitoring data to Kafka topics.

There are different ways of integrating Kafka with New Relic, such as:

  • Kafka Connect with New Relic Connectors: Kafka Connect is a tool provided by Kafka to connect Kafka with various systems, including New Relic. The New Relic team has developed a Kafka Sink connector to send data from Kafka to New Relic, letting you build unified observability from the data already flowing into your Kafka topics from various sources.
  • New Relic's On-Host Integration: On-host integrations work in tandem with New Relic's infrastructure agent. These integrations collect data from supported services and use the infrastructure agent to send that data to New Relic. Configuring the Kafka on-host integration allows you to report metrics and configuration data from your Kafka service. This also works with instances that run via Kubernetes, Docker, or Amazon ECS.
  • Custom API Integration: Build a custom solution where Kafka consumers read data and forward it to New Relic using New Relic's APIs. This approach offers more flexibility but requires more development effort (see the sketch after this list).
  • Indirect integrations: If you already have a monitoring service set up for your Kafka, you can simply integrate that service with New Relic. For example, if you're using Prometheus to monitor metrics for your Kafka cluster, you can use the Prometheus Remote Write integration to connect that data to New Relic and instantly get features such as dashboards and alerts.
  • Middleware Solutions: There are third-party middleware tools that can act as intermediaries, handling the data transformation and flow between Kafka and New Relic.
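
To make the custom API option more concrete, here is a rough sketch of the idea: a consumer reads a record from a Kafka topic and forwards it to New Relic over HTTPS. This assumes New Relic's Event API (US endpoint) accepting a JSON array of events that each carry an eventType field; the account ID, API key, and topic name are placeholders you would replace with your own values.

SH
# illustrative only: read one JSON record from a topic and forward it to the New Relic Event API
EVENT=$(kafka-console-consumer.sh --topic my-topic --bootstrap-server localhost:9092 --max-messages 1)
curl -X POST "https://insights-collector.newrelic.com/v1/accounts/<account-id>/events" \
  -H "Content-Type: application/json" \
  -H "Api-Key: <new-relic-license-key>" \
  -d "[$EVENT]"

A production version of this would be a long-running consumer with batching, retries, and error handling, which is exactly the work the Kafka Connect approach below takes off your hands.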

We are choosing the Kafka Connect integration approach in this tutorial to cover the use cases where you want to use the data already in Kafka topics for observability. The Kafka Connect framework is included in Apache Kafka and facilitates building connectors that stream data between Apache Kafka and other systems.

Let's look at the steps to integrate New Relic with Kafka using the Kafka Connect New Relic connector.

1. Prerequisites

Before diving into the integration process, ensure you have the following prerequisites in place. Setting up these tools correctly is essential for the successful integration of Apache Kafka with New Relic:

Configuring New Relic

If you don't have a New Relic account, you can sign up for a free account, which provides access to the full set of New Relic features, including API access. You will also need a New Relic API key that allows data ingest (such as a license key); keep it handy, as the connector configuration later in this guide requires it.

Configuring Apache Kafka

Ensure that your Kafka cluster is running and healthy. This includes the Kafka brokers, ZooKeeper instances, and any other necessary infrastructure.

Also, you should have a pre-defined topic from which data will be fetched; data pushed to this topic is what will be sent to New Relic. Make sure the integration has its own consumer group so that offsets are properly managed; with Kafka Connect, a consumer group is created and managed for the connector automatically.

Read the Kafka documentation to understand the various concepts involved. From here on, this guide assumes that you have started the Kafka broker.
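
With the broker running, you can create the topic this guide will read from. The command below is a sketch that assumes a single local broker on localhost:9092 and the topic name my-topic used later in the connector configuration; adjust partitions and replication for your cluster.

SH
# create the topic the connector will read from
kafka-topics.sh --create --topic my-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1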

2. Installing the New Relic Connector for Kafka Connect

You can install the Kafka Connect New Relic connector by downloading the connector package directly and copying its contents to `<kafka-home>/connect-plugins`.

Alternatively, if you have the Confluent Hub CLI, you can use it to install the New Relic connector by running the following command:

SH
confluent-hub install confluentinc/newrelic-kafka-connector:<version>

Replace `<version>` with the desired or latest connector version.
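
Whichever installation method you use, note that Kafka Connect typically discovers connector plugins from the directories listed in the worker's plugin.path setting, so make sure the directory you installed the connector into is included there. A minimal sketch of the relevant line in the worker configuration:

SH
# connect-standalone.properties (worker configuration): include the connector's install directory
plugin.path=<kafka-home>/connect-plugins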

Alternatively, if you do not wish to use this connector, you can create your own custom Sink connector by following this guide.

3. Configuring the New Relic Connector

You should configure your connector with one of the following classes depending on the type of telemetry you are sending:

  • `com.newrelic.telemetry.events.EventsSinkConnector`
  • `com.newrelic.telemetry.logs.LogsSinkConnector`
  • `com.newrelic.telemetry.metrics.MetricsSinkConnector`

All of the connectors expect either structured data with a schema (usually provided by the Avro, Protobuf, or JSON with Schema converters), or a Java Map (usually provided by the schemaless JSON converter).

You'll need a configuration file for the connector. Create a file named `newrelic-events-sink-connector.properties` with the following content:

SH
name=newrelic-events-sink-connector
# switch to com.newrelic.telemetry.logs.LogsSinkConnector or com.newrelic.telemetry.metrics.MetricsSinkConnector
connector.class=com.newrelic.telemetry.events.EventsSinkConnector
# configure this based on your workload
tasks.max=1
topics=my-topic
api.key=<api-key>
# messages are stored in schemaless json on the topic
# you could use Avro, Protobuf, etc here as well
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
# declare the transformations
transforms=inserttimestamp,eventtype,flatten
# Insert the timestamp from the Kafka record
transforms.inserttimestamp.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.inserttimestamp.timestamp.field=timestamp
# we know all events on this topic represent a purchase, so set 'eventType' to 'purchaseEvent'
transforms.eventtype.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.eventtype.static.field=eventType
transforms.eventtype.static.value=purchaseEvent
# flatten all nested json fields, using . as a delimiter
transforms.flatten.type=org.apache.kafka.connect.transforms.Flatten$Value
transforms.flatten.delimiter=.
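
With the schemaless JSON converters configured above, each record value on the topic should be a JSON object; nested fields are flattened by the flatten transform before being sent to New Relic. A quick way to push a test event into the topic is the console producer. This is a sketch that assumes a local broker on localhost:9092, and the field names are just an example of a purchase event:

SH
# produce one schemaless JSON purchase event to the topic the connector reads from
echo '{"product": "t-shirt", "quantity": 2, "price": {"amount": 19.99, "currency": "USD"}}' | \
  kafka-console-producer.sh --topic my-topic --bootstrap-server localhost:9092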

Read the documentation for this connector to learn more about each of these configuration options.

4. Starting the Connector

Run Kafka Connect with the New Relic sink connector configuration:

SH
connect-standalone.sh /path/to/connect-standalone.properties /path/to/newrelic-events-sink-connector.properties

5. Monitoring the Data Flow

Once the connector is running, data produced to the specified Kafka topic should start flowing into New Relic. You can monitor the logs of the Kafka Connect worker for any errors or issues.
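
For a quick programmatic health check, the Kafka Connect REST API (port 8083 by default) reports the state of the connector and its tasks, for example:

SH
# check the status of the New Relic sink connector and its tasks
curl -s http://localhost:8083/connectors/newrelic-events-sink-connector/status

Once events arrive in New Relic, you can query them by the event type set in the transform above, for example with the NRQL query `SELECT * FROM purchaseEvent SINCE 30 minutes ago`.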

Note that this is just a simple example. For production systems, you should test thoroughly and mind the New Relic API request limits.

By following this guide, you can seamlessly funnel real-time Kafka data into New Relic.

Always test in a non-production environment first to ensure data integrity and proper integration.

Conclusion

In our discussion about linking Apache Kafka and New Relic, we highlighted the many benefits of bringing these tools together. Not only can you keep an eye on how Kafka itself is performing, but you can also use the rich data flowing through Kafka for observability. We provided a step-by-step guide on how to make this connection using the Kafka Connect framework, specifically the New Relic Sink Connector, so businesses can send data from Kafka topics straight to New Relic. We also covered other possible integration approaches and pointed to resources for them. In today's data-driven world, straightforward, actionable instructions like these help companies make timely decisions and improve their operations.

Don't want to go through the pain of direct integration? RudderStack's Apache Kafka Source integration makes it easy to send data from Apache Kafka to New Relic.