Here’s a possible modified version of your code, using delayed arrays:
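{-# LANGUAGE TypeOperators #-}

import Data.Array.Repa as R

-- Row-wise inclusive prefix sums: result ! (Z :. j :. i) is the sum of
-- arr ! (Z :. j :. k) for k in [0 .. i].
mulM :: (Num a, Source r a) => Array r DIM2 a -> Array D DIM2 a
mulM arr = R.traverse arr id mulM'  -- qualified: Prelude also exports traverse (GHC 7.10+)
  where
    -- The element lookup function is unused; each entry re-sums a row prefix.
    mulM' _ (row :. i) =
      sumAllS $ extract (Z :. 0) (Z :. (i + 1)) $ slice arr (row :. All)

ext :: DIM2
ext = Z :. (1000000 :: Int) :. (10 :: Int)

array :: Array D DIM2 Int
array = fromFunction ext (\(Z :. j :. i) -> j + i)

main :: IO ()
main = do
  -- fromFunction already yields a delayed array, so no extra delay is needed.
  let result = computeUnboxedS (mulM array)
  -- Force the result; otherwise laziness would skip the computation entirely.
  result `deepSeqArray` putStrLn "done"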
Please note that this is a simplified example. Each output element re-sums a prefix of its row, so the work per row is quadratic in the row length rather than linear. To capture the linear-work algorithm you want for your specific matrix M, you will likely need to adapt it further, for instance by carrying a running total along each row. Repa's documentation describes the available combinators and how best to express such computations.
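For checking results on small inputs, it can help to have a plain-list reference. The sketch below assumes that M is the lower-triangular matrix of ones, which is what the slice/extract/sumAllS code above implies (row-wise inclusive prefix sums). The names prefixSumsQuadratic, prefixSumsLinear, and mulMRows are hypothetical helpers, not part of Repa. It uses only base and is not a Repa implementation.

```haskell
module Main where

-- Reference definition: entry i is the sum of entries 0..i.
-- This mirrors the per-element re-summing above and does quadratic work per row.
prefixSumsQuadratic :: Num a => [a] -> [a]
prefixSumsQuadratic xs = [ sum (take (i + 1) xs) | i <- [0 .. length xs - 1] ]

-- Linear-work version: scanl1 carries a running total along the row.
prefixSumsLinear :: Num a => [a] -> [a]
prefixSumsLinear = scanl1 (+)

-- Hypothetical helper: apply the linear version to every row of a matrix
-- represented as a list of rows.
mulMRows :: Num a => [[a]] -> [[a]]
mulMRows = map prefixSumsLinear

main :: IO ()
main = do
  -- Same element formula as the Repa example (j + i), but with only 3 rows.
  let rows = [ [ j + i | i <- [0 .. 9] ] | j <- [0 .. 2] ] :: [[Int]]
  -- Both formulations should agree on every row.
  print (mulMRows rows == map prefixSumsQuadratic rows)  -- prints True
  print (mulMRows [[1, 2, 3, 4 :: Int]])                 -- prints [[1,3,6,10]]
```

Comparing the two functions on a few small rows is a cheap way to validate any faster Repa formulation before running it on the full 1000000 x 10 array.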