Kstreamplify

Swiftly build and enhance your Kafka Streams applications.

Kstreamplify adds extra features to Kafka Streams, simplifying development so you can write applications with minimal effort and stay focused on business implementation.

Overview

Wondering what makes Kstreamplify stand out? Here are some of the key features that make it a must-have for Kafka Streams:

  • 🚀 Bootstrapping: Automatically handles the startup, configuration, and initialization of Kafka Streams so you can focus on business logic instead of setup.

  • 📝 Avro Serializer and Deserializer: Provides common Avro serializers and deserializers out of the box.

  • ⛑️ Error Handling: Catches and routes errors to a dead-letter queue (DLQ) topic.

  • ☸️ Kubernetes: Built-in readiness and liveness probes for Kubernetes deployments.

  • 🤿 Interactive Queries: Easily access and interact with Kafka Streams state stores.

  • 🫧 Deduplication: Remove duplicate events from your stream.

  • 🧪 Testing: Automatically sets up the Topology Test Driver so you can start writing tests right away.

Getting Started

Kstreamplify simplifies bootstrapping Kafka Streams applications by handling startup, configuration, and initialization for you.

Spring Boot

For Spring Boot applications, add the following dependency:

<dependency>
    <groupId>com.michelin</groupId>
    <artifactId>kstreamplify-spring-boot</artifactId>
    <version>${kstreamplify.version}</version>
</dependency>

Then, create a KafkaStreamsStarter bean and override the KafkaStreamsStarter#topology() method:

@Component
public class MyKafkaStreams extends KafkaStreamsStarter {
    @Override
    public void topology(StreamsBuilder streamsBuilder) {
        // Define your topology here
    }

    @Override
    public String dlqTopic() {
        return "dlq_topic";
    }
}

Define all your Kafka Streams properties directly in the application.yml file, under the kafka.properties key:

kafka:
  properties:
    application.id: 'myKafkaStreams'
    bootstrap.servers: 'localhost:9092'
    schema.registry.url: 'http://localhost:8081'

You're now ready to start your Kstreamplify Spring Boot application.
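
Kstreamplify hooks into the regular Spring Boot lifecycle, so no extra wiring is needed in the entry point. A minimal sketch, assuming a standard Spring Boot setup (the class name is illustrative):

@SpringBootApplication
public class MyApplication {
    public static void main(String[] args) {
        SpringApplication.run(MyApplication.class, args);
    }
}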

Java

For simple Java applications, add the following dependency:

<dependency>
    <groupId>com.michelin</groupId>
    <artifactId>kstreamplify-core</artifactId>
    <version>${kstreamplify.version}</version>
</dependency>

Then, create a class that extends KafkaStreamsStarter and override the KafkaStreamsStarter#topology() method:

public class MyKafkaStreams extends KafkaStreamsStarter {
    @Override
    public void topology(StreamsBuilder streamsBuilder) {
        // Define your topology here
    }

    @Override
    public String dlqTopic() {
        return "dlq_topic";
    }
}

From your main method, create a KafkaStreamsInitializer instance and initialize it with your KafkaStreamsStarter child class:

public class MainKstreamplify {

    public static void main(String[] args) {
        KafkaStreamsInitializer myKafkaStreamsInitializer = new KafkaStreamsInitializer();
        myKafkaStreamsInitializer.init(new MyKafkaStreams());
    }
}

Define all your Kafka Streams properties in an application.yml file, under the kafka.properties key:

kafka:
  properties:
    application.id: 'myKafkaStreams'
    bootstrap.servers: 'localhost:9092'
    schema.registry.url: 'http://localhost:8081'
server:
  port: 8080

You're now ready to start your Kstreamplify Java application.

A few important notes:

  • A server.port is required to enable the web services.
  • The core dependency does not include a logger, so be sure to add one to your project (see the example below).
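
For instance, here is a minimal sketch adding Logback as the logging backend, assuming the project logs through SLF4J (logback-classic is only one option among others):

<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>${logback.version}</version>
</dependency>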

Unit Test

Kstreamplify simplifies the use of the Topology Test Driver for testing Kafka Streams applications.

For both Java and Spring Boot applications, add the following dependency:

<dependency>
    <groupId>com.michelin</groupId>
    <artifactId>kstreamplify-core-test</artifactId>
    <version>${kstreamplify.version}</version>
    <scope>test</scope>
</dependency>

Create a test class that extends KafkaStreamsStarterTest, and override the getKafkaStreamsStarter() method to provide your KafkaStreamsStarter implementation.

public class MyKafkaStreamsTest extends KafkaStreamsStarterTest {
    private TestInputTopic<String, KafkaUser> inputTopic;
    private TestOutputTopic<String, KafkaUser> outputTopic;
    private TestOutputTopic<String, KafkaError> dlqTopic;

    // Sample record piped into the topology (assumes an Avro-generated KafkaUser class)
    private final KafkaUser user = KafkaUser.newBuilder()
        .setFirstName("First name")
        .setLastName("Last name")
        .build();

    @Override
    protected KafkaStreamsStarter getKafkaStreamsStarter() {
        return new MyKafkaStreams();
    }

    @BeforeEach
    void setUp() {
        inputTopic = testDriver.createInputTopic("input_topic", new StringSerializer(),
            SerdesUtils.<KafkaUser>getValueSerdes().serializer());

        outputTopic = testDriver.createOutputTopic("output_topic", new StringDeserializer(),
            SerdesUtils.<KafkaUser>getValueSerdes().deserializer());

        dlqTopic = testDriver.createOutputTopic("dlq_topic", new StringDeserializer(),
            SerdesUtils.<KafkaError>getValueSerdes().deserializer());
    }

    @Test
    void shouldUpperCase() {
        inputTopic.pipeInput("1", user);
        List<KeyValue<String, KafkaUser>> results = outputTopic.readKeyValuesToList();
        assertEquals("FIRST NAME", results.get(0).value.getFirstName());
        assertEquals("LAST NAME", results.get(0).value.getLastName());
    }

    @Test
    void shouldFailAndRouteToDlqTopic() {
        inputTopic.pipeInput("1", user);
        List<KeyValue<String, KafkaError>> errors = dlqTopic.readKeyValuesToList();
        assertEquals("1", errors.get(0).key);
        assertEquals("Something bad happened...", errors.get(0).value.getContextMessage());
        assertEquals(0, errors.get(0).value.getOffset());
    }
}

Override Properties

Kstreamplify uses default properties for the tests. You can provide additional properties or override the default ones by overriding the getSpecificProperties() method:

public class MyKafkaStreamsTest extends KafkaStreamsStarterTest {
    @Override
    protected Map<String, String> getSpecificProperties() {
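        // STATE_DIR_CONFIG is statically imported from org.apache.kafka.streams.StreamsConfig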
        return Map.of(
            STATE_DIR_CONFIG, "/tmp/kafka-streams"
        );
    }
}

Avro Serializer and Deserializer

When working with Avro schemas, you can use the SerdesUtils class to easily serialize or deserialize records:

SerdesUtils.<MyAvroValue>getValueSerdes()

or

SerdesUtils.<MyAvroValue>getKeySerdes()

Here’s an example of how to use these methods in your topology:

@Component
public class MyKafkaStreams extends KafkaStreamsStarter {
    @Override
    public void topology(StreamsBuilder streamsBuilder) {
        streamsBuilder
            .stream("input_topic", Consumed.with(Serdes.String(), SerdesUtils.<KafkaUser>getValueSerdes()))
            .to("output_topic", Produced.with(Serdes.String(), SerdesUtils.<KafkaUser>getValueSerdes()));
    }
}

Error Handling

Kstreamplify makes it easy to handle errors and route them to a dead-letter queue (DLQ) topic.

Set up DLQ Topic

Override the dlqTopic() method and return the name of your DLQ topic:

@Component
public class MyKafkaStreams extends KafkaStreamsStarter {
    @Override
    public void topology(StreamsBuilder streamsBuilder) {
        // Define your topology here
    }

    @Override
    public String dlqTopic() {
        return "dlq_topic";
    }
}

Handling Processing Errors

To catch processing errors and route them to the DLQ, use the ProcessingResult class.

DSL

@Component
public class MyKafkaStreams extends KafkaStreamsStarter {
    @Override
    public void topology(StreamsBuilder streamsBuilder) {
        KStream<String, KafkaUser> stream = streamsBuilder
            .stream("input_topic", Consumed.with(Serdes.String(), SerdesUtils.getValueSerdes()));

        TopologyErrorHandler
            .catchErrors(stream.mapValues(MyKafkaStreams::toUpperCase))
            .to("output_topic", Produced.with(Serdes.String(), SerdesUtils.getValueSerdes()));
    }

    @Override
    public String dlqTopic() {
        return "dlq_topic";
    }

    private static ProcessingResult<KafkaUser, KafkaUser> toUpperCase(KafkaUser value) {
        try {
            value.setLastName(value.getLastName().toUpperCase());
            return ProcessingResult.success(value);
        } catch (Exception e) {
            return ProcessingResult.fail(e, value, "Something went wrong...");
        }
    }
}

The mapValues operation returns a ProcessingResult<V, V2>, where:

  • The first type parameter (V) represents the transformed value upon success.
  • The second type (V2) represents the original value if an error occurs.

To mark a result as successful:

ProcessingResult.success(value);

To mark it as failed:

ProcessingResult.fail(e, value, "Something went wrong...");

Use TopologyErrorHandler#catchErrors() to catch and route failed records to the DLQ topic. A healthy stream is returned and can be further processed as needed.

Processor API

@Component
public class MyKafkaStreams extends KafkaStreamsStarter {
    @Override
    public void topology(StreamsBuilder streamsBuilder) {
        TopologyErrorHandler.catchErrors(
            streamsBuilder.stream("input_topic", Consumed.with(Serdes.String(), Serdes.String()))
                .process(CustomProcessor::new)
            )
            .to("output_topic", Produced.with(Serdes.String(), Serdes.String()));
    }

    @Override
    public String dlqTopic() {
        return "dlq_topic";
    }

    public static class CustomProcessor extends ContextualProcessor<String, String, String, ProcessingResult<String, String>> {
        @Override
        public void process(Record<String, String> record) {
            try {
              context().forward(ProcessingResult.wrapRecordSuccess(record.withValue(record.value().toUpperCase())));
            } catch (Exception e) {
              context().forward(ProcessingResult.wrapRecordFailure(e, record.withValue(record.value()), "Something went wrong..."));
            }
        }
    }
}

The process operation forwards a ProcessingResult<V, V2>, where:

  • The first type parameter (V) represents the transformed value upon success.
  • The second type (V2) represents the original value if an error occurs.

To mark a result as successful:

ProcessingResult.wrapRecordSuccess(record);

To mark it as failed:

ProcessingResult.wrapRecordFailure(e, record, "Something went wrong...");

Use TopologyErrorHandler#catchErrors() to catch and route failed records to the DLQ topic. A healthy stream is returned and can be further processed as needed.

Production and Deserialization Errors

Kstreamplify also provides handlers to manage production and deserialization errors by forwarding them to the DLQ.

Add the following properties to your application.yml file:

kafka:
  properties:
    default.deserialization.exception.handler: 'com.michelin.kstreamplify.error.DlqDeserializationExceptionHandler'
    default.production.exception.handler: 'com.michelin.kstreamplify.error.DlqProductionExceptionHandler'

Avro Schema

The DLQ topic must have an associated Avro schema registered in the Schema Registry. You can find the schema here.

Uncaught Exception Handler

By default, uncaught exceptions will shut down the Kafka Streams client.

To customize this behavior, override the KafkaStreamsStarter#uncaughtExceptionHandler() method:

@Override
public StreamsUncaughtExceptionHandler uncaughtExceptionHandler() {
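    // Other possible responses are REPLACE_THREAD and SHUTDOWN_CLIENT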
    return throwable -> StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.SHUTDOWN_APPLICATION;
}

Web Services

Kstreamplify exposes web services on top of your Kafka Streams application.

Topology

The /topology endpoint returns the Kafka Streams topology description by default. You can customize the path by setting the following property:

topology:
  path: 'custom-topology'

Interactive Queries

A set of endpoints is available to query the state stores of your Kafka Streams application. These endpoints leverage interactive queries and handle state stores across different Kafka Streams instances by providing an RPC layer.

The following state store types are supported:

  • Key-Value store
  • Timestamped Key-Value store
  • Window store
  • Timestamped Window store

Note that only state stores with String keys are supported.

Kubernetes

Readiness and liveness probes are exposed for Kubernetes deployment, reflecting the Kafka Streams state. These are available at /ready and /liveness by default. You can customize the paths by setting the following properties:

kubernetes:
  liveness:
    path: 'custom-liveness'
  readiness:
    path: 'custom-readiness'

TopicWithSerde API

Kstreamplify provides an API called TopicWithSerde that unifies all consumption and production points, simplifying the management of topics owned by different teams across multiple environments.

Declaration

You can declare your consumption and production points in a separate class. This requires a topic name, a key SerDe, and a value SerDe.

public static TopicWithSerde<String, KafkaUser> inputTopic() {
    return new TopicWithSerde<>(
        "input_topic",
        Serdes.String(),
        SerdesUtils.getValueSerdes()
    );
}

public static TopicWithSerde<String, KafkaUser> outputTopic() {
    return new TopicWithSerde<>(
        "output_topic",
        Serdes.String(),
        SerdesUtils.getValueSerdes()
    );
}

Use it in your topology:

@Slf4j
@Component
public class MyKafkaStreams extends KafkaStreamsStarter {
    @Override
    public void topology(StreamsBuilder streamsBuilder) {
        KStream<String, KafkaUser> stream = inputTopic().stream(streamsBuilder);
        outputTopic().produce(stream);
    }
}

Prefix

The TopicWithSerde API is designed to handle topics owned by different teams across various environments without changing the topology. It uses prefixes to differentiate teams and topic ownership.

In your application.yml file, declare the prefixes in a key: value format:

kafka:
  properties:
    prefix:
      self: 'staging.team1.'
      team2: 'staging.team2.'
      team3: 'staging.team3.'

Then, include the prefix when declaring your TopicWithSerde:

public static TopicWithSerde<String, KafkaUser> inputTopic() {
    return new TopicWithSerde<>(
        "input_topic",
        "team1",
        Serdes.String(),
        SerdesUtils.getValueSerdes()
    );
}

The topic staging.team1.input_topic will be consumed when running the application with the staging application.yml file.

By default, if no prefix is specified, self is used.

Remapping

Kstreamplify encourages the use of fixed topic names in the topology, using the prefix feature to manage namespacing for virtual clusters and permissions. However, there are situations where you might want to reuse the same topology with different input or output topics.

In the application.yml file, you can declare dynamic remappings in a key: value format:

kafka:
  properties:
    topic:
      remap:
        oldTopicName: newTopicName
        foo: bar

The topic oldTopicName in the topology will be mapped to newTopicName.

This feature works with both input and output topics.

Unit Test

When testing, you can use the TopicWithSerde API to create test topics with the same name as those in your topology.

TestInputTopic<String, KafkaUser> inputTopic = createInputTestTopic(inputTopic());
TestOutputTopic<String, KafkaUser> outputTopic = createOutputTestTopic(outputTopic());

Interactive Queries

Kstreamplify aims to simplify the use of interactive queries in Kafka Streams applications.

Configuration

The value of the application.server property is resolved from the following sources, in order of priority:

  1. The environment variable whose name is set by the application.server.var.name property:

kafka:
  properties:
    application.server.var.name: 'MY_APPLICATION_SERVER'

  2. If that property is not defined, the APPLICATION_SERVER environment variable.
  3. If neither is set, the default localhost:<serverPort>.

Services

You can leverage the interactive query services provided by the web services layer to access and query the state stores of your Kafka Streams application:

@Component
public class MyService {
    @Autowired
    KeyValueStoreService keyValueStoreService;

    @Autowired
    TimestampedKeyValueStoreService timestampedKeyValueStoreService;
  
    @Autowired
    WindowStoreService windowStoreService;

    @Autowired
    TimestampedWindowStoreService timestampedWindowStoreService;
}
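
As a hedged illustration, such a service can back your own REST layer. The getByKey method name and the return type below are assumptions made for this sketch; check the Kstreamplify javadoc for the exact signatures:

@RestController
public class UserController {
    @Autowired
    KeyValueStoreService keyValueStoreService;

    @GetMapping("/user/{id}")
    public Object getUserById(@PathVariable String id) {
        // Hypothetical call: looks up the given key in the "user_store" state store,
        // regardless of which Kafka Streams instance currently hosts it
        return keyValueStoreService.getByKey("user_store", id);
    }
}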

Web Services

The web services layer provides a set of endpoints that allow you to query the state stores of your Kafka Streams application. You can find more details in the Interactive Queries Web Services section.

Hooks

Kstreamplify provides the flexibility to execute custom code through hooks at various stages of your Kafka Streams application lifecycle.

On Start

The On Start hook allows you to execute custom code before the Kafka Streams instance starts.

@Component
public class MyKafkaStreams extends KafkaStreamsStarter {
    @Override
    public void onStart(KafkaStreams kafkaStreams) {
        // Execute code before starting the Kafka Streams instance
    }
}

Deduplication

Kstreamplify provides an easy way to deduplicate streams through the DeduplicationUtils class. You can deduplicate based on various criteria and within a specified time frame.

All deduplication methods return a KStream<String, ProcessingResult<V, V2>>, so you can catch failed records and route them to the DLQ with TopologyErrorHandler#catchErrors().

Note: Only streams with String keys and Avro values are supported.
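
Because the deduplication utilities return a ProcessingResult stream, they chain naturally with TopologyErrorHandler#catchErrors() described above. A minimal sketch combining the two (topic names and the KafkaUser Avro record are placeholders):

@Component
public class MyKafkaStreams extends KafkaStreamsStarter {
    @Override
    public void topology(StreamsBuilder streamsBuilder) {
        KStream<String, KafkaUser> myStream = streamsBuilder
            .stream("input_topic", Consumed.with(Serdes.String(), SerdesUtils.<KafkaUser>getValueSerdes()));

        // deduplicateKeys returns a KStream<String, ProcessingResult<KafkaUser, KafkaUser>>;
        // catchErrors routes failed records to the DLQ and returns the healthy stream
        TopologyErrorHandler
            .catchErrors(DeduplicationUtils.deduplicateKeys(streamsBuilder, myStream, Duration.ofDays(60)))
            .to("output_topic", Produced.with(Serdes.String(), SerdesUtils.<KafkaUser>getValueSerdes()));
    }

    @Override
    public String dlqTopic() {
        return "dlq_topic";
    }
}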

By Key

@Component
public class MyKafkaStreams extends KafkaStreamsStarter {
    @Override
    public void topology(StreamsBuilder streamsBuilder) {
        KStream<String, KafkaUser> myStream = streamsBuilder
            .stream("input_topic");

        DeduplicationUtils
            .deduplicateKeys(streamsBuilder, myStream, Duration.ofDays(60))
            .to("output_topic");
    }
}

By Key and Value

@Component
public class MyKafkaStreams extends KafkaStreamsStarter {
    @Override
    public void topology(StreamsBuilder streamsBuilder) {
        KStream<String, KafkaUser> myStream = streamsBuilder
            .stream("input_topic");

        DeduplicationUtils
            .deduplicateKeyValues(streamsBuilder, myStream, Duration.ofDays(60))
            .to("output_topic");
    }
}

By Predicate

@Component
public class MyKafkaStreams extends KafkaStreamsStarter {
    @Override
    public void topology(StreamsBuilder streamsBuilder) {
        KStream<String, KafkaUser> myStream = streamsBuilder
            .stream("input_topic");

        DeduplicationUtils
            .deduplicateWithPredicate(streamsBuilder, myStream, Duration.ofDays(60),
                value -> value.getFirstName() + "#" + value.getLastName())
            .to("output_topic");
    }
}

In the predicate approach, the provided predicate is used as the key in the window store. The stream will be deduplicated based on the values derived from the predicate.

OpenTelemetry

Kstreamplify simplifies the integration of OpenTelemetry into Spring Boot Kafka Streams applications. It binds all Kafka Streams metrics to the Spring Boot registry, making monitoring and observability easier.

To run your application with the OpenTelemetry Java agent, include the following JVM options:

-javaagent:/opentelemetry-javaagent.jar -Dotel.traces.exporter=otlp -Dotel.logs.exporter=otlp -Dotel.metrics.exporter=otlp

Custom Tags for Metrics

You can also add custom tags to the OpenTelemetry metrics to help organize and filter them in observability tools such as Grafana. Use the following JVM options to specify custom tags:

-Dotel.resource.attributes=environment=production,service.namespace=myNamespace,service.name=myKafkaStreams,category=orders

These tags will be included in the metrics, and you'll be able to see them in your logs during application startup, helping to track and filter metrics based on attributes like environment, service name, and category.

Swagger

The Kstreamplify Spring Boot module integrates with Springdoc to automatically generate API documentation for your Kafka Streams application.

By default:

  • The Swagger UI is available at http://host:port/swagger-ui/index.html.
  • The OpenAPI documentation can be accessed at http://host:port/v3/api-docs.

Both the Swagger UI and the OpenAPI description can be customized using the Springdoc properties.
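
For example, both paths can be relocated through the standard Springdoc properties (the values shown are illustrative):

springdoc:
  api-docs:
    path: '/api-docs'
  swagger-ui:
    path: '/swagger-ui-custom.html'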

Motivation

Developing applications with Kafka Streams can be challenging, with developers often facing various questions and obstacles. Key considerations include efficiently bootstrapping Kafka Streams applications, handling unexpected business logic issues, integrating Kubernetes probes, and more.

To assist developers in overcoming these challenges, we have built Kstreamplify. Our goal is to provide a comprehensive solution that simplifies the development process and addresses the common pain points encountered when working with Kafka Streams. By offering easy-to-use utilities, error handling mechanisms, testing support, and integration with modern tools like Kubernetes and OpenTelemetry, Kstreamplify aims to streamline Kafka Streams application development.

Contribution

We welcome contributions from the community to help make Kstreamplify even better! If you'd like to contribute, please take a moment to review our contribution guide. There, you'll find our guidelines and best practices for contributing code, reporting issues, or suggesting new features.

We appreciate your support in making Kstreamplify a powerful and user-friendly tool for everyone working with Kafka Streams.