I played a bit with both lately, and here is what I gathered.
Neutral:
- I was going to give Kafka the win on community, documentation, etc. But I wasn’t able to easily find answers to the questions I had about Kafka; some of what I found was old and confusing (targeting the legacy API). Pulsar’s documentation, on the other hand, is good enough, the developers are very responsive on Slack (hello @Matteo Merli 🙂), and the underlying pieces (ZooKeeper, BookKeeper) have decent documentation as well, should you want to dive into the internals.
- Kafka aims for high throughput, Pulsar for low latency. Both provide settings to tune that trade-off.
- Both are production-ready and battle-tested at several companies.
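
To make the throughput/latency point above a bit more concrete, here is a minimal sketch of the kind of producer settings involved on the Kafka side. The keys (`linger.ms`, `batch.size`, `compression.type`) are standard Kafka producer settings; the values and the class and method names are my own assumptions for illustration, not recommendations.

```java
import java.util.Properties;

// Illustrative only: the keys are standard Kafka producer settings,
// but the values are assumptions picked to show the trade-off.
class ProducerTuning {

    // Favor throughput: wait a little so more records fit in each batch.
    static Properties throughputOriented() {
        Properties p = new Properties();
        p.setProperty("linger.ms", "20");          // wait up to 20 ms for a batch to fill
        p.setProperty("batch.size", "131072");     // allow up to 128 KiB per batch
        p.setProperty("compression.type", "lz4");  // compress whole batches
        return p;
    }

    // Favor latency: send each record as soon as possible.
    static Properties latencyOriented() {
        Properties p = new Properties();
        p.setProperty("linger.ms", "0");           // do not wait for a batch to fill
        p.setProperty("compression.type", "none"); // skip compression work
        return p;
    }

    public static void main(String[] args) {
        System.out.println("throughput: " + throughputOriented());
        System.out.println("latency:    " + latencyOriented());
    }
}
```

Either `Properties` object would be handed to a Kafka producer together with the usual connection and serializer settings. Pulsar's Java producer builder has comparable batching controls (for example `enableBatching` and `batchingMaxPublishDelay`).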
Pro Pulsar:
- From my experience, the API is easier to use. In Kafka, the broker is dumb and the consumers do the job of structuring communications as they see fit. The price is that the Kafka user has to understand how to make the pieces fit together. I guess the intended benefit is increased flexibility, but since Pulsar was able to replicate Kafka’s consumer API (and with fairly little code), I count that as a pro for Pulsar.
- You can do things that are not easily done (or are maybe impossible) in Kafka: multi-tenancy (security, isolation…), resource management (topic throttling, quotas), geo-replication.
- It has some features that Kafka currently lacks, like seeking to a particular MessageId
- Pulsar scales to millions of topics, while Kafka is limited by the way it structures data in ZooKeeper.
- Easier deployment. A standalone Pulsar will start its own local ZooKeeper, and I personally found the configuration easier to understand.
- Written in Java, versus a mix of legacy Scala and Java code. Also, I found the codebase well organised and much easier to follow, in part because it relies on ZooKeeper and BookKeeper, which are external projects with their own documentation, community, developers, etc. (Please note, those are also Apache Software Foundation projects and also come from Yahoo, so they work well together.)
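
As a small illustration of the multi-tenancy point above: in Pulsar the tenant and namespace are part of the topic name itself, in the form `persistent://tenant/namespace/topic`. The sketch below only builds such names with plain string handling; the helper and the tenant, namespace, and topic names (`acme`, `billing`, ...) are made up for the example.

```java
// Hypothetical helper: only the "persistent://tenant/namespace/topic"
// naming scheme comes from Pulsar, everything else is illustrative.
class TopicNames {

    // Builds a fully qualified persistent topic name.
    static String persistentTopic(String tenant, String namespace, String topic) {
        return "persistent://" + tenant + "/" + namespace + "/" + topic;
    }

    public static void main(String[] args) {
        // Two namespaces under the same tenant, each with its own topics.
        System.out.println(persistentTopic("acme", "billing", "invoices"));
        System.out.println(persistentTopic("acme", "analytics", "clicks"));
    }
}
```

The first call prints `persistent://acme/billing/invoices`. As far as I understand it, things like quotas and replication are configured as policies on the namespace, which is why the hierarchy matters for isolation.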
Pro Kafka:
- Kafka has things built on top of it, like Kafka Streams (I never used it, so I can’t say whether Pulsar has an equivalent).
Also read:
- https://news.ycombinator.com/item?id=12453080
- https://news.ycombinator.com/item?id=15601222
- https://streaml.io/blog/why-apache-pulsar/
- https://kafka.apache.org/uses