Python double free error for huge datasets

Question

After discussing the same issue on the NumPy GitHub page (https://github.com/numpy/numpy/issues/2995), it was brought to my attention that NumPy/SciPy does not support such a large number of non-zeros in the resulting sparse matrix.

Basically, W is a sparse matrix, and Q (or np.log(Q)-1) is a dense matrix. When multiplying a dense matrix with a sparse one, the resulting product will also be represented in sparse matrix form (which makes a lot of sense). However, since I have no zero rows in my W matrix, the resulting product W*(np.log(Q)-1) will have roughly 4.4 billion non-zeros (2.2 million multiplied by 2000). That is more than 2^31 (about 2.1 billion), and it exceeds the maximum number of elements in a sparse matrix in current versions of SciPy.
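To make the arithmetic concrete, here is a quick check of the estimate against the largest signed 32-bit integer. This is only a sketch: the sizes are the rounded figures quoted above, and it assumes the limit comes from 32-bit index arrays, which is my reading of the discussion rather than something verified here.

```python
import numpy as np

# Approximate sizes taken from the text above.
n_rows = 2_200_000   # rows of W, none of them empty
n_cols = 2000        # columns of the product

# If every row of the product is fully populated, nnz is rows * cols.
nnz_estimate = n_rows * n_cols

# Assumption: the sparse index arrays are signed 32-bit integers.
int32_max = np.iinfo(np.int32).max

print(nnz_estimate)              # 4400000000
print(int32_max)                 # 2147483647
print(nnz_estimate > int32_max)  # True
```

The estimate is about twice what a signed 32-bit index can address.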

At this stage, I’m not sure how else to get this to work, barring a re-implementation in another language. Perhaps it can still be done in Python, but it might be better to just write up a C++ and Eigen implementation.
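One possibility for staying in Python would be to process the product in row blocks, so the full result is never held in memory at once. The sketch below is untested on the real data and rests on several assumptions: that only a reduction of the product is needed (a plain sum is used here purely as a placeholder), that W can be stored in CSR form, and that each block fits in memory as a dense array. The function name blockwise_product_sum and the block_rows parameter are made up for illustration.

```python
import numpy as np
import scipy.sparse as sp

def blockwise_product_sum(W, D, block_rows=1000):
    """Sum all entries of W @ D without building the full product.

    W is a sparse matrix, D a dense 2-D array. The sum is a placeholder
    for whatever per-block reduction the real computation needs.
    """
    W = sp.csr_matrix(W)  # CSR supports efficient row slicing
    total = 0.0
    for start in range(0, W.shape[0], block_rows):
        # Sparse block times dense array gives a dense block of the product.
        block = W[start:start + block_rows] @ D
        total += float(np.asarray(block).sum())
    return total

# Small made-up example to compare against the direct computation.
rng = np.random.default_rng(0)
dense_W = rng.random((50, 8))
dense_W[dense_W < 0.7] = 0.0          # make it mostly zeros
W_small = sp.csr_matrix(dense_W)
D_small = rng.random((8, 5))

blocked = blockwise_product_sum(W_small, D_small, block_rows=7)
direct = float((dense_W @ D_small).sum())
print(np.isclose(blocked, direct))
```

Whether this helps depends on what is done with the product afterwards. If the full matrix really is needed, each block would have to be written to disk or to a memory-mapped array instead.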

A special thanks to pv. for helping out on this to pinpoint the exact issue, and thanks to everyone else for the brainstorming!
