Sorting algorithms for data of known statistical distribution?

Question

If the data you are sorting has a known distribution, I would use a Bucket Sort algorithm. You could add some extra logic to it so that you calculated the size and/or positions of the various buckets based upon properties of the distribution (ex: for Gaussian, you might have a bucket every (sigma/k) away from the mean, where sigma is the standard deviation of the distribution).

By having a known distribution and modifying the standard Bucket Sort algorithm in this way, you would probably get the Histogram Sort algorithm or something close to it. Of course, your algorithm would be computationally faster than the the Histogram Sort algorithm because there would probably not be a need to do the first pass (described in the link) since you already know the distribution.

Edit: given your new criteria of your question, (though my previous answer concerning Histogram Sort links to the respectable NIST and contains performance information), here is a peer review journal article from the International Conference on Parallel Processing:

Adaptive Data Partition for Sorting Using Probability Distribution

The authors claim this algorithm has better performance (up to 30% better) than the popular Quick-Sort Algorithm.

Leave a Comment Cancel reply