Why are Where and Select outperforming just Select?

Select iterates once over the entire set and, for each item, performs a conditional branch (checking for validity) and a + operation.

Where+Select creates an iterator that skips invalid elements (doesn’t yield them), performing a + only on the valid items.

So, the cost for a Select is:

t(s) = n * ( cost(check valid) + cost(+) )

And for Where+Select:

t(ws) = n * ( cost(check valid) + p(valid) * (cost(yield) + cost(+)) )

Where:

  • p(valid) is the probability that an item in the list is valid.
  • cost(check valid) is the cost of the branch that checks for validity
  • cost(yield) is the cost of constructing the new state of the where iterator, which is more complex than the simple iterator that the Select version uses.

As you can see, for a given n, the Select version is a constant, whereas the Where+Select version is a linear equation with p(valid) as a variable. The actual values of the costs determine the intersection point of the two lines, and since cost(yield) can be different from cost(+), they don’t necessarily intersect at p(valid)=0.5.

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)