First of all, I’m happy that what I wrote works in clang (I see your update implements 3 of the 4 steps I mention). I have to give credit to Nick Athanasiou for this technique; he discussed it with me long before I wrote this.
I mention this now because I was told he has released a library (in the Boost Library Incubator) that implements this technique; you can find the related documentation here. It seems the initial idea (the one we both use here), which allowed code like this:
(Op<Max>(args) + ...); // Op is a function producing the custom fold type
was left out in favor of lazy evaluation and stateful operators (or perhaps it just hasn't been included yet; I can't know for sure).