The thing that stands out initially is that your code is highly polymorphic. You should specialize your floating point type uniformly to Double, so GHC (and LLVM) have a chance of applying more aggressive optimizations.
Note, for those trying to reproduce, this code imports:
import qualified Data.Vector as V
import Data.Bits
import Data.Time.Clock
import System.Random
import System.Random.Shuffle
type Permutation = V.Vector Int
Ok. There’s lots of things you can try to improve this code.
Improvements
Data representation
- Specialize to a concrete floating point type, instead of polymorphic floating point functions
- Replace tuple
(a,a,a)with unboxed tripleT !Double !Double !Double - Switch from
Data.ArraytoData.Array.UnboxedforPermutations - Replace use of boxed array of triples with multidimensional unboxed array from
repapackage
Compiler flags
- Compile with
-O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=native(or equivalent with-fllvm) - Increase spec constr threshold —
-fspec-constr-count=16
More efficient library functions
- Use mersenne-random instead of StdGen to generate randoms
- Replace
modwithrem - Replace
V.!indexing with unchecked indexingVU.unsafeIndex(after moving toData.Vector.Unboxed
Runtime settings
- Increase the default allocation area:
-A20Mor-H
Also, check your algorithm is identical to the C# one, and you’re using the same data structures.