The Python version compiles down to a loop in C, while the Clojure version builds a new intermediate sequence for each call to map in this code. The performance difference you see most likely comes from this difference in data structures.
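To make the intermediate-sequence cost concrete, here is a minimal sketch. It is not the code from the question: the function names and the arithmetic are hypothetical, and the transducer form is just one way to fuse the steps into a single pass.

```clojure
;; Hypothetical example: square each element, add one, then sum.

;; Each call to map builds its own intermediate lazy sequence,
;; so this pipeline allocates two of them before reduce runs.
(defn sum-seq [coll]
  (reduce + (map inc (map #(* % %) coll))))

;; The same computation as a transducer: both steps are fused into
;; one pass over coll and no intermediate sequence is created.
(defn sum-xf [coll]
  (transduce (comp (map #(* % %)) (map inc)) + coll))
```

Both functions return the same result for the same input; only the amount of intermediate allocation differs.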
To do better than this, you could try a library like Incanter, or write your own version as explained in this SO question. See also this one, Neanderthal, or ND4J. If you really want to stay with sequences to keep lazy evaluation and so on, you may get a real boost by looking into transients for the internal matrix calculations.
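As a rough illustration of the transient idea, here is a sketch that assumes matrices are stored as vectors of row vectors. The names add-vectors and add-matrices are made up for this example, and element-wise addition stands in for whatever the real inner calculation is.

```clojure
;; Hypothetical helper: element-wise sum of two equal-length vectors.
;; The result is accumulated in a transient vector, which is mutated
;; in place with conj! and frozen once with persistent! at the end.
(defn add-vectors [a b]
  (let [n (count a)]
    (loop [i 0
           acc (transient [])]
      (if (< i n)
        (recur (inc i) (conj! acc (+ (nth a i) (nth b i))))
        (persistent! acc)))))

;; Matrices as vectors of row vectors; mapv returns a vector, not a lazy seq.
(defn add-matrices [m1 m2]
  (mapv add-vectors m1 m2))

(add-matrices [[1 2] [3 4]] [[10 20] [30 40]])
;; => [[11 22] [33 44]]
```

The transient never escapes add-vectors, so callers still see ordinary immutable vectors.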
EDIT: I forgot to add the first step in tuning Clojure: turn on “warn on reflection” (set *warn-on-reflection* to true).
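A small sketch of what that looks like at the REPL or at the top of a source file. The two functions are hypothetical and exist only to show an interop call that triggers the warning and one that does not.

```clojure
;; Ask the compiler to report every interop call it cannot resolve
;; at compile time and therefore has to dispatch via reflection.
(set! *warn-on-reflection* true)

;; No type hint: the compiler does not know the type of s, so it
;; prints a reflection warning for .length when this form is compiled.
(defn str-len-slow [s]
  (.length s))

;; With a ^String hint the call is resolved at compile time
;; and no warning is printed.
(defn str-len-fast [^String s]
  (.length s))
```

Both versions return the same value; the hinted one simply avoids the reflective lookup on every call, which matters in tight numeric loops.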