There’s a few potential solutions:
- External merge sort – O(n log n)
You basically sort the numbers on the first pass, then find the median on the second. - Order statistics distributed selection algorithm – O(n)
Simplify the problem to the original problem of finding the kth number in an unsorted array. - Counting sort histogram O(n)
You have to assume some properties about the range of the numbers – can the range fit in the memory? - If anything is known about the distribution of the numbers other
algorithms can be produced.
For more details and implementation see:
http://www.fusu.us/2013/07/median-in-large-set-across-1000-servers.html