Select
iterates once over the entire set and, for each item, performs a conditional branch (checking for validity) and a +
operation.
Where+Select
creates an iterator that skips invalid elements (doesn’t yield
them), performing a +
only on the valid items.
So, the cost for a Select
is:
t(s) = n * ( cost(check valid) + cost(+) )
And for Where+Select
:
t(ws) = n * ( cost(check valid) + p(valid) * (cost(yield) + cost(+)) )
Where:
p(valid)
is the probability that an item in the list is valid.cost(check valid)
is the cost of the branch that checks for validitycost(yield)
is the cost of constructing the new state of thewhere
iterator, which is more complex than the simple iterator that theSelect
version uses.
As you can see, for a given n
, the Select
version is a constant, whereas the Where+Select
version is a linear equation with p(valid)
as a variable. The actual values of the costs determine the intersection point of the two lines, and since cost(yield)
can be different from cost(+)
, they don’t necessarily intersect at p(valid)
=0.5.