A shorter answer: yes, CLUSTER BY guarantees global ordering, provided you’re willing to join the multiple output files yourself.
The longer version:
ORDER BY x: guarantees global ordering, but does this by pushing all data through just one reducer. This is basically unacceptable for large datasets. You end up with one sorted file as output.
SORT BY x: orders data at each of N reducers, but each reducer can receive overlapping ranges of data. You end up with N or more sorted files with overlapping ranges.
DISTRIBUTE BY x: ensures each of N reducers gets non-overlapping ranges of x, but doesn’t sort the output of each reducer. You end up with N or more unsorted files with non-overlapping ranges.
CLUSTER BY x: ensures each of N reducers gets non-overlapping ranges, then sorts by those ranges at the reducers. This gives you global ordering, and is the same as doing (DISTRIBUTE BY x and SORT BY x). You end up with N or more sorted files with non-overlapping ranges.
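To make the four cases concrete, here is a small Python sketch that models the reducers as plain lists. It is not Hive code, and every function name in it is made up for illustration. It assumes the model described above, where DISTRIBUTE BY hands each reducer a contiguous, non-overlapping range of x. Worth checking for your Hive version: if rows are distributed by a hash of x rather than by range, each reducer still gets a disjoint set of keys, but the ranges interleave and simply concatenating the files no longer yields a global order.

```python
# Toy model: each "reducer" is a list, each returned list is one output file.
# All names here are hypothetical; this only illustrates the semantics above.

def range_partition(rows, n):
    """ASSUMPTION: split keys into n contiguous, non-overlapping ranges."""
    lo, hi = min(rows), max(rows)
    width = (hi - lo) / n or 1  # avoid a zero width when all keys are equal
    parts = [[] for _ in range(n)]
    for r in rows:
        idx = min(int((r - lo) / width), n - 1)  # clamp the max key into the last range
        parts[idx].append(r)
    return parts

def order_by(rows):
    """One reducer sees everything: a single, globally sorted file."""
    return [sorted(rows)]

def sort_by(rows, n):
    """Rows land on reducers arbitrarily (round-robin here); each file is sorted."""
    return [sorted(rows[i::n]) for i in range(n)]

def distribute_by(rows, n):
    """Non-overlapping ranges per reducer, but no sorting within a file."""
    return range_partition(rows, n)

def cluster_by(rows, n):
    """DISTRIBUTE BY followed by SORT BY: sorted files, non-overlapping ranges."""
    return [sorted(part) for part in distribute_by(rows, n)]

def concat(files):
    """Join the output files in reducer order."""
    return [r for f in files for r in f]

rows = [7, 2, 9, 4, 1, 8, 3, 6, 5, 0]
print(sort_by(rows, 2))        # [[1, 3, 5, 7, 9], [0, 2, 4, 6, 8]]
print(distribute_by(rows, 2))  # [[2, 4, 1, 3, 0], [7, 9, 8, 6, 5]]
print(cluster_by(rows, 2))     # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

Under that range assumption, concat(cluster_by(rows, 2)) matches the single file from order_by(rows), while concat(sort_by(rows, 2)) does not, which is the "join the files yourself" step from the short answer.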
Make sense? So CLUSTER BY is basically the more scalable version of ORDER BY.