matrix-multiplication – Page 2

Why is matrix multiplication faster with numpy than with ctypes in Python?

March 21, 2023 by Tarik

NumPy uses a highly optimized, carefully tuned BLAS routine for matrix multiplication (see also: ATLAS). The specific function in this case is GEMM (for general matrix multiplication). You can look up the original by searching for dgemm.f (it’s in Netlib). The optimization, by the way, goes beyond compiler optimizations. Above, Philip mentioned Coppersmith–Winograd. If I remember correctly, … Read more
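As a rough illustration of the two paths being compared, the sketch below multiplies the same matrices with a hand-written triple loop and with NumPy's `@` operator. The helper name `naive_matmul` and the matrix sizes are hypothetical, and whether `@` actually reaches an optimized BLAS depends on how NumPy was built.

```python
import numpy as np

def naive_matmul(a, b):
    """Textbook triple loop: no blocking, no vectorization, no BLAS."""
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i, p] * b[p, j]
            out[i, j] = s
    return out

rng = np.random.default_rng(0)
a = rng.random((20, 30))
b = rng.random((30, 10))

c_naive = naive_matmul(a, b)  # pure-Python loop
c_numpy = a @ b               # delegated to NumPy's matrix-multiply routine
```

Both paths should agree up to floating-point rounding; the difference shows up in run time as the matrices grow.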

Faster way to initialize arrays via empty matrix multiplication? (Matlab)

March 13, 2023 by Tarik

This is strange: I am seeing f run faster and g run slower than what you are seeing, but both of them are identical for me. Perhaps a different version of MATLAB?

    >> g = @() zeros(1000, 0) * zeros(0, 1000);
    >> f = @() zeros(1000)
    f =
        @()zeros(1000)
    >> timeit(f)
    ans =
       8.5019e-04
    … Read more
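For readers without MATLAB, the same empty-product construction can be tried in NumPy. This is an analogue chosen for illustration, not something from the original thread, and it makes no claim about which variant is faster.

```python
import numpy as np

n = 1000

# Product of an n-by-0 and a 0-by-n array: the inner dimension is empty,
# so every entry of the n-by-n result is an empty sum, i.e. zero.
g = np.zeros((n, 0)) @ np.zeros((0, n))

# Direct allocation of an n-by-n array of zeros.
f = np.zeros((n, n))
```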

2-D convolution as a matrix-matrix multiplication [closed]

March 10, 2023 by Tarik

Yes, it is possible, and you should also use a doubly block circulant matrix (which is a special case of a Toeplitz matrix). I will give you an example with a small kernel and input, but it is possible to construct a Toeplitz matrix for any kernel. So you have a 2d input x … Read more
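A minimal sketch of the idea, assuming a "full" 2-D convolution and row-major flattening of the input. It fills the matrix entry by entry instead of assembling the Toeplitz blocks explicitly, and the names `conv2d_matrix` and `conv2d_direct` as well as the sample values are hypothetical.

```python
import numpy as np

def conv2d_matrix(kernel, in_shape):
    """Return T such that T @ x.ravel() is the full 2-D convolution of x with kernel."""
    kh, kw = kernel.shape
    h, w = in_shape
    out_h, out_w = h + kh - 1, w + kw - 1
    T = np.zeros((out_h * out_w, h * w))
    for i in range(out_h):
        for j in range(out_w):
            for p in range(h):
                for q in range(w):
                    di, dj = i - p, j - q
                    # Output (i, j) receives x[p, q] * kernel[i - p, j - q].
                    if 0 <= di < kh and 0 <= dj < kw:
                        T[i * out_w + j, p * w + q] = kernel[di, dj]
    return T

def conv2d_direct(x, kernel):
    """Reference full convolution: scatter a scaled kernel copy per input pixel."""
    h, w = x.shape
    kh, kw = kernel.shape
    y = np.zeros((h + kh - 1, w + kw - 1))
    for p in range(h):
        for q in range(w):
            y[p:p + kh, q:q + kw] += x[p, q] * kernel
    return y

x = np.arange(1.0, 7.0).reshape(2, 3)        # hypothetical 2x3 input
k = np.array([[10.0, 20.0], [30.0, 40.0]])   # hypothetical 2x2 kernel

T = conv2d_matrix(k, x.shape)
y_matrix = (T @ x.ravel()).reshape(3, 4)     # convolution as one matrix-vector product
y_direct = conv2d_direct(x, k)
```

Each row of `T` holds shifted copies of the kernel rows, which is where the block-Toeplitz structure comes from.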

Vectorized way of calculating row-wise dot product two matrices with Scipy

March 2, 2023 by Tarik

A straightforward way to do that is:

    import numpy as np
    a = np.array([[1,2,3],[3,4,5]])
    b = np.array([[1,2,3],[1,2,3]])
    np.sum(a*b, axis=1)

which avoids the Python loop and is faster in cases like:

    def npsumdot(x, y):
        return np.sum(x*y, axis=1)

    def loopdot(x, y):
        result = np.empty((x.shape[0]))
        for i in range(x.shape[0]):
            result[i] = np.dot(x[i], y[i])
        return result

    timeit npsumdot(np.random.rand(500000,50),np.random.rand(500000,50))
    # 1 loops, best of 3: … Read more
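The same row-wise dot product can also be written with `np.einsum`, which skips the intermediate `a*b` array. This variant is an addition for comparison, not part of the original answer, and no timing is claimed for it.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5]])
b = np.array([[1, 2, 3], [1, 2, 3]])

# Multiply element-wise, then sum along each row.
row_dots_sum = np.sum(a * b, axis=1)

# "ij,ij->i": multiply matching entries and sum over the column index j.
row_dots_einsum = np.einsum("ij,ij->i", a, b)
```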

CUDA determining threads per block, blocks per grid

March 2, 2023 by Tarik

In general, you want to size your blocks/grid to match your data and simultaneously maximize occupancy, that is, how many threads are active at one time. The major factors influencing occupancy are shared memory usage, register usage, and thread block size. A CUDA-enabled GPU has its processing capability split up into SMs (streaming multiprocessors), … Read more
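The "match your data" half of that advice usually reduces to a ceiling division on the host side. The sketch below shows only that arithmetic, in Python; the helper name `blocks_per_grid` and the choice of 256 threads per block are assumptions for illustration, and it does not model occupancy.

```python
def blocks_per_grid(n_elements, threads_per_block):
    """Smallest number of blocks whose threads cover every element (ceiling division)."""
    if n_elements <= 0 or threads_per_block <= 0:
        raise ValueError("both arguments must be positive")
    return (n_elements + threads_per_block - 1) // threads_per_block

n = 1_000_000    # hypothetical problem size
threads = 256    # hypothetical block size
blocks = blocks_per_grid(n, threads)
```

Because the last block is usually only partly used, a kernel launched this way still needs a bounds check on the global thread index.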

Matrix multiplication: Small difference in matrix size, large difference in timings

February 3, 2023 by Tarik

Here’s my wild guess: cache. It could be that you can fit 2 rows of 2000 doubles into the cache, which is slightly less than the 32 KB L1 cache (while leaving room for other necessary things). But when you bump it up to 2048, it uses the entire cache (and you spill some because you need … Read more
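The arithmetic behind that guess can be checked directly, assuming 8-byte doubles and a 32 KB L1 data cache (the cache size is the figure from the excerpt, not a measured value; the helper name `rows_bytes` is hypothetical).

```python
DOUBLE_BYTES = 8        # size of one double-precision value
L1_BYTES = 32 * 1024    # assumed L1 data cache size: 32 KB = 32768 bytes

def rows_bytes(n_cols, n_rows=2):
    """Bytes needed to hold n_rows rows of n_cols doubles."""
    return n_rows * n_cols * DOUBLE_BYTES

bytes_2000 = rows_bytes(2000)   # 32000 bytes: just under the cache size
bytes_2048 = rows_bytes(2048)   # 32768 bytes: exactly the cache size
```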

How do I multiply matrices in PyTorch?

January 28, 2023 by Tarik

Use torch.mm:

    torch.mm(a, b)

torch.dot() behaves differently to np.dot(). There’s been some discussion about what would be desirable here. Specifically, torch.dot() treats both a and b as 1D vectors (irrespective of their original shape) and computes their inner product. The error is thrown because this behaviour makes your a a vector of length 6 and … Read more

Why is there huge performance hit in 2048×2048 versus 2047×2047 array multiplication?

December 15, 2022 by Tarik

This probably has to do with conflicts in your L2 cache. Cache misses on matice1 are not the problem because they are accessed sequentially. However, for matice2, if a full column fits in L2 (i.e. when you access matice2[0, 0], matice2[1, 0], matice2[2, 0] … etc, nothing gets evicted) then there is no problem with cache … Read more

How does multiplication differ for NumPy Matrix vs Array classes?

November 29, 2022 by Tarik

The main reason to avoid using the matrix class is that a) it’s inherently 2-dimensional, and b) there’s additional overhead compared to a “normal” numpy array. If all you’re doing is linear algebra, then by all means, feel free to use the matrix class… Personally I find it more trouble than it’s worth, though. For … Read more

How to get element-wise matrix multiplication (Hadamard product) in numpy?

November 27, 2022 by Tarik

For elementwise multiplication of matrix objects, you can use numpy.multiply:

    import numpy as np
    a = np.array([[1,2],[3,4]])
    b = np.array([[5,6],[7,8]])
    np.multiply(a,b)

Result

    array([[ 5, 12],
           [21, 32]])

However, you should really use array instead of matrix. matrix objects have all sorts of horrible incompatibilities with regular ndarrays. With ndarrays, you can just use * for … Read more
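To make the distinction concrete for plain ndarrays, the sketch below puts the element-wise (Hadamard) product next to the matrix product on the same inputs. The variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

hadamard = a * b               # element-wise product of two ndarrays
hadamard_fn = np.multiply(a, b)  # the same thing as a function call
matprod = a @ b                # matrix product, for contrast
```

Here `hadamard` is `[[5, 12], [21, 32]]`, while `matprod` is `[[19, 22], [43, 50]]`.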