Why does Apache Kafka Streams use RocksDB, and how is it possible to change it?

Question

RocksDB is used for several (internal) reasons (as you mentioned already, for example, its performance). Conceptually, Kafka Streams does not need RocksDB; it is used as an internal key-value cache, and any other store offering similar functionality would work, too.

Comment from @miguno below (rephrased):

One important advantage of RocksDB over pure in-memory key-value stores is its ability to write to disk. Thus, Kafka Streams can support state that is larger than the available main memory.

Comment from @miguno above:

FYI: "RocksDB is not written in JVM compatible language, so it needs careful handling of the deployment, as it needs extra shared library (OS dependent)." As a user of Kafka Streams you don’t need to install anything.

With the Kafka Streams DSL, as of the 0.10.2 release (KAFKA-3825), it is possible to plug in custom state stores and thus use a different key-value store.
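As a rough illustration of swapping the store in the DSL, here is a minimal sketch that backs a count with the built-in in-memory store instead of the default RocksDB store. It assumes a newer Kafka Streams release (2.1 or later, where Materialized and Grouped exist); the API in 0.10.2 looked different. The class name, topic name, and store name are hypothetical and chosen only for this example.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.Stores;

public class InMemoryCountTopology {

    // Hypothetical names, used only for this sketch.
    static final String INPUT_TOPIC = "words";
    static final String STORE_NAME = "counts-store";

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream(INPUT_TOPIC, Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               // Passing an explicit store supplier replaces the default
               // RocksDB-backed store with an in-memory key-value store.
               .count(Materialized.<String, Long>as(Stores.inMemoryKeyValueStore(STORE_NAME))
                                  .withKeySerde(Serdes.String())
                                  .withValueSerde(Serdes.Long()));

        return builder.build();
    }

    public static void main(String[] args) {
        // Prints the topology, including the store attached to the aggregation.
        System.out.println(build().describe());
    }
}
```

Replacing Stores.inMemoryKeyValueStore with Stores.persistentKeyValueStore would give a RocksDB-backed store again. A fully custom store would instead be passed in through your own KeyValueBytesStoreSupplier implementation.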

With the Kafka Streams Processor API, you can implement your own store via the StateStore interface and connect it to a processor node in your topology.
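The wiring between a store and a processor node might look roughly like the sketch below. To keep it short, it uses the built-in in-memory store rather than a hand-written StateStore implementation; a custom store would be registered the same way, through your own StoreBuilder. It assumes Kafka Streams 3.x (the org.apache.kafka.streams.processor.api package), and all class, node, topic, and store names are hypothetical.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.ProcessorSupplier;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class CountingProcessorTopology {

    // Hypothetical store name, used only for this sketch.
    static final String STORE_NAME = "counts-store";

    // Counts how often each key has been seen, keeping the counts in the store.
    static class CountingProcessor implements Processor<String, String, String, Long> {
        private ProcessorContext<String, Long> context;
        private KeyValueStore<String, Long> store;

        @Override
        public void init(ProcessorContext<String, Long> context) {
            this.context = context;
            // Look up the store that was connected to this processor node.
            this.store = context.getStateStore(STORE_NAME);
        }

        @Override
        public void process(Record<String, String> record) {
            Long previous = store.get(record.key());
            Long updated = (previous == null ? 0L : previous) + 1L;
            store.put(record.key(), updated);
            context.forward(record.withValue(updated));
        }
    }

    public static Topology build() {
        // In-memory store here; a custom StateStore would come from your own StoreBuilder.
        StoreBuilder<KeyValueStore<String, Long>> storeBuilder =
            Stores.keyValueStoreBuilder(
                Stores.inMemoryKeyValueStore(STORE_NAME), Serdes.String(), Serdes.Long());

        ProcessorSupplier<String, String, String, Long> supplier = CountingProcessor::new;

        Topology topology = new Topology();
        topology.addSource("source",
            Serdes.String().deserializer(), Serdes.String().deserializer(), "words");
        topology.addProcessor("counter", supplier, "source");
        // Connect the store to the "counter" processor node.
        topology.addStateStore(storeBuilder, "counter");
        topology.addSink("sink", "word-counts",
            Serdes.String().serializer(), Serdes.Long().serializer(), "counter");
        return topology;
    }
}
```

The call to addStateStore is what connects the store to the processor node: only processors named there can retrieve the store via getStateStore.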
