I cannot answer all of it, but I will try to address some parts:
Is there some reason that the “most asynchronous” behavior—i.e., don’t
consume the collection before returning, and don’t wait for each
future to complete before moving on to the next—isn’t represented
here?
If you have dependent calculations and a limited number of threads, you can run into deadlocks. For example, suppose two futures depend on a third one (all three are in the list of futures) and there are only two threads. The first two futures can then occupy both threads while they wait, so the third one never gets executed. (Of course, if your pool size is one, i.e. you execute one calculation after the other, you can get into similar situations.)
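The starvation scenario described above can be reproduced with a small sketch. It assumes the standard library's `scala.concurrent.Future` running on a fixed-size Java thread pool; the names `PoolStarvation` and `dependentsFinish` are made up for illustration. To keep the demo from hanging forever, the blocked futures give up after a timeout, so what would be a real deadlock shows up here as a failed result.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

object PoolStarvation {
  /** Hypothetical helper: true if both dependent futures complete successfully. */
  def dependentsFinish(poolSize: Int): Boolean = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val third   = Promise[Int]()          // result of the "third" calculation
      val started = new CountDownLatch(2)   // signals that both dependents are running

      // Two futures that block a pool thread while waiting for the third one.
      // The 2 second limit is only there so that the demo terminates.
      def dependent(): Future[Int] = Future {
        started.countDown()
        Await.result(third.future, 2.seconds) + 1
      }
      val a = dependent()
      val b = dependent()

      // Submit the third calculation only after the dependents hold their threads.
      started.await(1, TimeUnit.SECONDS)
      Future { third.success(42) }

      Try(Await.result(Future.sequence(List(a, b)), 5.seconds)).isSuccess
    } finally pool.shutdownNow()
  }

  def main(args: Array[String]): Unit = {
    println(s"pool of 2: ${dependentsFinish(2)}") // third task is stuck in the queue
    println(s"pool of 3: ${dependentsFinish(3)}") // one free thread runs the third task
  }
}
```

With two threads the third task sits in the queue until the dependents time out; with three threads there is a free thread for it, which is the "one thread per future" situation.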
To solve this, you need one thread per future, without any limit. This works for small lists of futures, but not for big ones. So if you run everything in parallel, small examples will always run and bigger ones will deadlock. (Example: the developer's tests run fine, production deadlocks.)
Is there a “correct” behavior for this operation on lists and streams?
I think it is impossible with futures. If you know more about the dependencies, or if you know for sure that the calculations will not block, a more concurrent solution might be possible. But executing lists of futures looks “broken by design” to me. The best solution seems to be one that already fails with a deadlock on small examples (i.e. executing one future after the other).
Scalaz futures with lists: do wait for each future to complete.
I think Scalaz uses for comprehensions internally for traversal. With for comprehensions, it is not guaranteed that the calculations are independent. So I guess that Scalaz is doing the right thing here with for comprehensions: doing one calculation after the other. In the case of futures, this will always work, given that you have unlimited threads in your operating system.
So in other words: you are just seeing an artifact of how for comprehensions (must) work.
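As a rough illustration of that artifact, here is a sketch using the standard library's `scala.concurrent.Future` (an assumption on my side; Scalaz's own types and its traversal code may differ in detail). The names `SequentialFor`, `step` and `record` are made up. Because a for comprehension desugars to `flatMap`, the second future is only created inside the callback of the first one, so it cannot start before the first has finished.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequentialFor {
  /** Runs two futures in a for comprehension and returns the order of events. */
  def run(): List[String] = {
    val log = ListBuffer.empty[String]
    def record(event: String): Unit = log.synchronized { log += event }

    // A small calculation that logs when it starts and when it ends.
    def step(name: String, value: Int): Future[Int] = Future {
      record(s"start $name")
      Thread.sleep(50)
      record(s"end $name")
      value
    }

    val sum = for {
      a <- step("a", 1)
      b <- step("b", 2) // step("b", 2) is not even called until a is available
    } yield a + b

    Await.result(sum, 5.seconds)
    log.synchronized(log.toList)
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(", ")) // start a, end a, start b, end b
}
```

The second calculation could in principle use `a`, so the comprehension has no way of knowing that the two are independent, and they run one after the other.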
I hope this makes some sense.