Where do Apache Samza and Apache Storm differ in their use cases?

Well, I’ve been investigating these systems for a few months, and I don’t think they differ profoundly in their use cases. I think it’s more useful to compare them along these lines instead. Age: Storm is the older project and the original one in this space, so it’s generally more mature and battle-tested. Samza is a …

Difference between exactly-once and at-least-once guarantees

The definitions below are quoted from the Akka documentation: at-most-once delivery means that for each message handed to the mechanism, that message is delivered zero or one times; in more casual terms it means that messages may be lost. at-least-once delivery means that for each message handed to the mechanism potentially multiple attempts are made at delivering …
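
To make the difference concrete, here is a minimal Python sketch that simulates a lossy link with scripted failures. `FlakyChannel`, `at_most_once`, `at_least_once` and `exactly_once_effect` are hypothetical names for illustration, not Akka or Storm APIs. The sketch assumes every message is uniquely identifiable (here by its own string value), which is what lets the receiver deduplicate; combining at-least-once delivery with deduplication is one common way to get an exactly-once processing effect.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Attempt = Tuple[str, int]  # (message, attempt number)


@dataclass
class FlakyChannel:
    """Hypothetical lossy link whose failures are scripted per (message, attempt)."""
    drop_message: Set[Attempt] = field(default_factory=set)  # the send itself is lost
    drop_ack: Set[Attempt] = field(default_factory=set)      # delivered, but the ack is lost
    received: List[str] = field(default_factory=list)        # what the receiver saw

    def send(self, msg: str, attempt: int) -> bool:
        """Return True only if the sender gets an acknowledgement for this attempt."""
        if (msg, attempt) in self.drop_message:
            return False
        self.received.append(msg)
        return (msg, attempt) not in self.drop_ack


def at_most_once(channel: FlakyChannel, messages: List[str]) -> List[str]:
    # A single attempt per message and no retries: a lost message stays lost.
    for msg in messages:
        channel.send(msg, attempt=0)
    return channel.received


def at_least_once(channel: FlakyChannel, messages: List[str], max_attempts: int = 5) -> List[str]:
    # Retry until acknowledged (bounded here only for the demo):
    # a lost ack makes the sender resend a message that was already delivered.
    for msg in messages:
        for attempt in range(max_attempts):
            if channel.send(msg, attempt):
                break
    return channel.received


def exactly_once_effect(channel: FlakyChannel, messages: List[str]) -> List[str]:
    # At-least-once delivery plus receiver-side deduplication by message ID.
    seen: Set[str] = set()
    processed: List[str] = []
    for msg in at_least_once(channel, messages):
        if msg not in seen:
            seen.add(msg)
            processed.append(msg)
    return processed


failures = dict(drop_message={("b", 0)}, drop_ack={("c", 0)})
msgs = ["a", "b", "c"]
print(at_most_once(FlakyChannel(**failures), msgs))         # ['a', 'c']
print(at_least_once(FlakyChannel(**failures), msgs))        # ['a', 'b', 'c', 'c']
print(exactly_once_effect(FlakyChannel(**failures), msgs))  # ['a', 'b', 'c']
```

With the first send of `b` lost and the acknowledgement for `c` lost, at-most-once drops `b`, at-least-once delivers `c` twice, and deduplication on the receiver side leaves each message processed once.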

Storm vs. Trident: When not to use Trident?

To answer your question: when shouldn’t you use Trident? Whenever you can afford not to. Trident adds complexity to a Storm topology, lowers performance and generates state. Ask yourself the question: do you need the “exactly once” processing semantics of Trident, or can you live with the “at least once” processing semantics of Storm? For …
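
The extra state Trident carries can be illustrated with a word count. The sketch below is a simplified, hypothetical model of the idea behind transactional state: store the last applied transaction ID next to each value and skip a batch whose ID was already applied. The class names are made up and this is not Trident's API; it assumes batches arrive in order and that a replayed batch keeps the same transaction ID and the same contents.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


class NaiveCountState:
    """At-least-once style: every delivery of a batch is applied, so a replay double-counts."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def apply_batch(self, txid: int, words: List[str]) -> None:
        self.counts.update(words)  # txid is ignored

    def value(self, word: str) -> int:
        return self.counts[word]


class TxidCountState:
    """Exactly-once style: keep (count, last applied txid) per word and skip replays."""

    def __init__(self) -> None:
        self.counts: Dict[str, Tuple[int, Optional[int]]] = {}

    def apply_batch(self, txid: int, words: List[str]) -> None:
        for word, n in Counter(words).items():
            count, last_txid = self.counts.get(word, (0, None))
            if last_txid == txid:
                continue  # this batch was already applied to this word
            self.counts[word] = (count + n, txid)

    def value(self, word: str) -> int:
        return self.counts.get(word, (0, None))[0]


# Batch 2 is replayed, e.g. because an acknowledgement was lost.
batches = [(1, ["storm", "trident", "storm"]), (2, ["storm"]), (2, ["storm"])]
naive, guarded = NaiveCountState(), TxidCountState()
for txid, words in batches:
    naive.apply_batch(txid, words)
    guarded.apply_batch(txid, words)

print(naive.value("storm"), guarded.value("storm"))  # 4 3
```

Replaying batch 2 leaves the naive counter at 4 for `storm` (double-counted) while the txid-guarded state stays at 3; the price is the per-key transaction ID that has to be stored and compared on every update.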

What is the “task” in Storm parallelism?

Disclaimer: I wrote the article you referenced in your question above. However, I’m a bit confused by the concept of “task”. Is a task a running instance of the component (spout or bolt)? An executor having multiple tasks actually means the same component is executed multiple times by the executor, am I correct …
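
In Storm's Java API, the parallelism hint passed to `setSpout`/`setBolt` sets the number of executors (threads) for a component, and `setNumTasks` sets its number of tasks. The Python sketch below models only that relationship: tasks are split across executors, and each executor runs its tasks one after another on a single thread. `assign_tasks` and `run_executor` are hypothetical helpers; the contiguous split and the round-robin dispatch are simplifying assumptions (in Storm the stream grouping decides which task receives a tuple), not Storm's actual scheduler.

```python
from typing import List, Tuple


def assign_tasks(num_tasks: int, num_executors: int) -> List[List[int]]:
    """Split task IDs 1..num_tasks into contiguous, near-equal ranges, one per executor."""
    if not 1 <= num_executors <= num_tasks:
        raise ValueError("need 1 <= num_executors <= num_tasks")
    base, extra = divmod(num_tasks, num_executors)
    ranges, start = [], 1
    for i in range(num_executors):
        size = base + (1 if i < extra else 0)  # the first `extra` executors get one more task
        ranges.append(list(range(start, start + size)))
        start += size
    return ranges


def run_executor(task_ids: List[int], tuples: List[str]) -> List[Tuple[int, str]]:
    """One executor is one thread: it hands each tuple to one of its tasks, one at a time."""
    processed = []
    for i, tup in enumerate(tuples):
        task_id = task_ids[i % len(task_ids)]  # round-robin stands in for the stream grouping
        processed.append((task_id, tup))       # tasks of this executor never run concurrently
    return processed


print(assign_tasks(4, 2))                     # [[1, 2], [3, 4]]
print(run_executor([1, 2], ["a", "b", "c"]))  # [(1, 'a'), (2, 'b'), (1, 'c')]
```

For a component with 4 tasks and 2 executors, each executor thread hosts two task instances of the same component and invokes them serially.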

How can I serialize a numpy array while preserving matrix dimensions?

pickle.dumps or numpy.save encode all the information needed to reconstruct an arbitrary NumPy array, even in the presence of endianness issues, non-contiguous arrays, or weird structured dtypes. Endianness issues are probably the most important; you don’t want array([1]) to suddenly become array([16777216]) because you loaded your array on a big-endian machine. pickle is probably the …

What is/are the main difference(s) between Flink and Storm?

Disclaimer: I’m an Apache Flink committer and PMC member and only familiar with Storm’s high-level design, not its internals. Apache Flink is a framework for unified stream and batch processing. Flink’s runtime natively supports both domains due to pipelined data transfers between parallel tasks, which include pipelined shuffles. Records are immediately shipped from producing tasks …
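
The difference between pipelined and blocking data exchange can be pictured with Python generators. This is a single-process analogy with hypothetical function names, not Flink code: in Flink the records travel between parallel tasks over the network. In the pipelined version each record reaches the consumer as soon as it is produced; in the blocking version the whole intermediate result is materialized before the consumer starts.

```python
from typing import Iterable, Iterator, List


def produce(n: int, log: List[str]) -> Iterator[int]:
    """Upstream task: emits records 0..n-1 one at a time."""
    for i in range(n):
        log.append(f"produce {i}")
        yield i


def consume(records: Iterable[int], log: List[str]) -> None:
    """Downstream task: processes records as it receives them."""
    for r in records:
        log.append(f"consume {r}")


def pipelined(n: int) -> List[str]:
    log: List[str] = []
    consume(produce(n, log), log)  # each record is handed over as soon as it exists
    return log


def blocking(n: int) -> List[str]:
    log: List[str] = []
    materialized = list(produce(n, log))  # the full intermediate result is built first
    consume(materialized, log)
    return log


print(pipelined(2))  # ['produce 0', 'consume 0', 'produce 1', 'consume 1']
print(blocking(2))   # ['produce 0', 'produce 1', 'consume 0', 'consume 1']
```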