apache-storm – Tarik Billa

Where do Apache Samza and Apache Storm differ in their use cases?

December 21, 2023 by Tarik

Well, I’ve been investigating these systems for a few months, and I don’t think they differ profoundly in their use cases. I think it’s best to compare them along these lines instead: Age: Storm is the older project, and the original one in this space, so it’s generally more mature and battle-tested. Samza is a …

difference between exactly-once and at-least-once guarantees

September 16, 2023 by Tarik

The definitions below are quoted from the Akka documentation: at-most-once delivery means that for each message handed to the mechanism, that message is delivered zero or one times; in more casual terms it means that messages may be lost. at-least-once delivery means that for each message handed to the mechanism potentially multiple attempts are made at delivering …
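To make the two quoted definitions concrete, here is a minimal sketch in Python. It is not Akka or Storm code: `ScriptedChannel`, `send_at_most_once`, and `send_at_least_once` are hypothetical names invented for this illustration, and the channel follows a fixed script so the outcome is reproducible. It assumes the sender only learns about success through an acknowledgement, which is why a lost acknowledgement produces a duplicate.

```python
from typing import Callable, List, Tuple


class ScriptedChannel:
    """Hypothetical channel whose behaviour per attempt is scripted in advance.

    Each outcome is (arrived, acked): whether the message reached the
    receiver, and whether the acknowledgement made it back to the sender.
    """

    def __init__(self, outcomes: List[Tuple[bool, bool]]) -> None:
        self.outcomes = list(outcomes)
        self.received: List[str] = []

    def deliver(self, message: str) -> bool:
        arrived, acked = self.outcomes.pop(0)
        if arrived:
            self.received.append(message)
        # The sender only sees an ack if the message arrived AND the ack survived.
        return arrived and acked


def send_at_most_once(message: str, deliver: Callable[[str], bool]) -> None:
    """One attempt, no retry: the message is delivered zero or one times."""
    deliver(message)


def send_at_least_once(
    message: str, deliver: Callable[[str], bool], max_attempts: int = 5
) -> int:
    """Retry until acknowledged; returns the number of attempts made."""
    for attempt in range(1, max_attempts + 1):
        if deliver(message):
            return attempt
    raise RuntimeError("no acknowledgement within max_attempts")


# At-most-once: the single attempt is dropped, so the message is lost.
lossy = ScriptedChannel([(False, False)])
send_at_most_once("m1", lossy.deliver)

# At-least-once: the first attempt arrives but its ack is lost, so the
# sender retries and the receiver sees the same message twice.
flaky_ack = ScriptedChannel([(True, False), (True, True)])
attempts = send_at_least_once("m1", flaky_ack.deliver)
```

In this sketch `lossy.received` ends up empty, while `flaky_ack.received` holds two copies of the message. That duplicate is what a receiver has to tolerate or deduplicate under at-least-once delivery.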

Storm vs. Trident: When not to use Trident?

August 14, 2023 by Tarik

To answer your question: when shouldn’t you use Trident? Whenever you can afford not to. Trident adds complexity to a Storm topology, lowers performance and generates state. Ask yourself the question: do you need the “exactly once” processing semantics of Trident, or can you live with the “at least once” processing semantics of Storm? For …

Testing Storm Bolts and Spouts

May 18, 2023 by Tarik

Since version 0.8.1, Storm’s unit testing facilities have been exposed via Java: http://storm-project.net/2012/09/06/storm081-released.html For an example of how to use this API, have a look here: https://github.com/xumingming/storm-lib/blob/master/src/jvm/storm/TestingApiDemo.java

What is the “task” in Storm parallelism

May 3, 2023 by Tarik

Disclaimer: I wrote the article you referenced in your question above. However, I’m a bit confused by the concept of “task”. Is a task a running instance of the component (spout or bolt)? An executor having multiple tasks actually is saying the same component is executed multiple times by the executor, am I correct …

How can I serialize a numpy array while preserving matrix dimensions?

April 16, 2023 by Tarik

pickle.dumps or numpy.save encode all the information needed to reconstruct an arbitrary NumPy array, even in the presence of endianness issues, non-contiguous arrays, or weird structured dtypes. Endianness issues are probably the most important; you don’t want array([1]) to suddenly become array([16777216]) because you loaded your array on a big-endian machine. pickle is probably the …

Apache Kafka vs Apache Storm

December 27, 2022 by Tarik

You use Apache Kafka as a distributed and robust queue that can handle high-volume data and enables you to pass messages from one endpoint to another. Storm is not a queue. It is a system with distributed real-time processing abilities, meaning you can execute all kinds of manipulations on real-time data …

What is/are the main difference(s) between Flink and Storm?

November 17, 2022 by Tarik

Disclaimer: I’m an Apache Flink committer and PMC member and only familiar with Storm’s high-level design, not its internals. Apache Flink is a framework for unified stream and batch processing. Flink’s runtime natively supports both domains thanks to pipelined data transfers between parallel tasks, which include pipelined shuffles. Records are immediately shipped from producing tasks …