cython – Page 3 – Tarik Billa

Running Cython in Windows x64 – fatal error C1083: Cannot open include file: ‘basetsd.h’: No such file or directory

June 12, 2023 by Tarik

In case anyone is currently (2017) facing the same error with the Visual C++ 2015 tools, run the setup again and also select the Windows 8.1 / 10 SDK, depending on your OS. This will fix the basetsd.h error. If it is still not working, try launching the build tools from C:\Program Files (x86)\Microsoft Visual C++ Build Tools. Another alternative would …

Cython: cimport and import numpy as (both) np

June 9, 2023 by Tarik

cimport my_module gives access to C functions, attributes, or even sub-modules under my_module; import my_module gives access to Python functions, attributes, or sub-modules under my_module. In your case, cimport numpy as np gives you access to the NumPy C API, where you can declare array buffers, variable types, and so on. And import numpy …

Cython Speed Boost vs. Usability [closed]

June 6, 2023 by Tarik

The other answers have already explained that you were only compiling the Cython code, not executing it. However, I thought you might want to know how much faster Cython can make your code. When I compiled your code with distutils (though I ran the function from a different module), I got …
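A minimal sketch of the point above, using only the standard library: compiling or importing a module only defines its functions, so a timing has to actually call them. py_sum_of_squares is a hypothetical stand-in for the asker's function; a compiled Cython module (say, a hypothetical my_module) would be imported and timed the same way.

```python
import timeit


def py_sum_of_squares(n):
    """Hypothetical pure-Python stand-in for the function being compiled."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Defining (or compiling) the function does no work; timing must call it.
# timeit.repeat runs the call `number` times per trial, for `repeat` trials.
times = timeit.repeat(lambda: py_sum_of_squares(10_000), number=100, repeat=5)

# Take the fastest trial (least disturbed by other processes), per call.
best = min(times) / 100
print(f"best per call: {best * 1e6:.1f} µs")
```

To compare against a compiled version, swap the lambda for one that calls the Cython function; keeping the same number and repeat values makes the two results directly comparable.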

Numpy vs Cython speed

June 6, 2023 by Tarik

With a slight modification, version 3 becomes twice as fast:

```cython
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def process2(np.ndarray[DTYPE_t, ndim=2] array):
    cdef unsigned int rows = array.shape[0]
    cdef unsigned int cols = array.shape[1]
    cdef unsigned int row, col, row2
    cdef np.ndarray[DTYPE_t, ndim=2] out = np.empty((rows, cols))
    for row in range(rows):
        for row2 in range(rows):
            for col in range(cols):
                out[row, col] …
```

Why is np.dot so much faster than np.sum?

May 28, 2023 by Tarik

numpy.dot delegates to a BLAS vector-vector multiply here, while numpy.sum uses a pairwise summation routine that switches to an 8x-unrolled summation loop at a block size of 128 elements. I don’t know which BLAS library your NumPy is using, but a good BLAS will generally take advantage of SIMD operations, while numpy.sum doesn’t do …
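To make the pairwise scheme described above concrete, here is a simplified pure-Python sketch. It is not NumPy's implementation: it sums each block of at most 128 elements with a plain loop instead of an 8x-unrolled one, and the rule of rounding the split point down to a multiple of 8 is an assumption modeled on NumPy's C routine. It only illustrates the recursion structure, not the speed.

```python
import math
import random

BLOCK = 128  # block size from the answer; ranges this small are summed directly


def pairwise_sum(xs, lo=0, hi=None):
    """Simplified pairwise summation of xs[lo:hi].

    Ranges of at most BLOCK elements are summed with a plain loop (NumPy's
    C routine uses an 8x-unrolled loop here). Larger ranges are split in
    half, with the split point rounded down to a multiple of 8 (an assumption
    modeled on NumPy), and the two halves are summed recursively.
    """
    if hi is None:
        hi = len(xs)
    n = hi - lo
    if n <= BLOCK:
        total = 0.0
        for i in range(lo, hi):
            total += xs[i]
        return total
    half = n // 2
    half -= half % 8  # n > 128 here, so half stays at least 64
    return pairwise_sum(xs, lo, lo + half) + pairwise_sum(xs, lo + half, hi)


random.seed(0)
data = [random.random() for _ in range(100_000)]
exact = math.fsum(data)  # correctly rounded reference sum

naive = 0.0
for x in data:
    naive += x

# Absolute error of each method against the correctly rounded sum.
print(abs(pairwise_sum(data) - exact), abs(naive - exact))
```

Pairwise summation keeps the rounding error growth roughly logarithmic in the length rather than linear, which is why numpy.sum uses it even though a straight BLAS dot product can be faster.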

Simple wrapping of C code with cython

May 27, 2023 by Tarik

Here’s a tiny but complete example, using Cython, of passing NumPy arrays to an external C function, logically fc(int N, double* a, double* b, double* z), which computes z = a + b. (This is surely well-known to those who know it well. Comments are welcome. Last change: 23 Feb 2011, for Cython 0.14.) …
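The contract of fc described above is that the caller allocates all three buffers and the C function fills z in place. A pure-Python reference with the same contract can be handy for testing a wrapper against. This is a hypothetical sketch, not the post's code: array("d") stands in for contiguous double buffers, and the length check is something only the Python version can do, since the C function just receives raw pointers.

```python
from array import array


def fc(n, a, b, z):
    """Hypothetical pure-Python reference for the C function fc(N, a, b, z).

    Mirrors the C contract: the caller allocates all three buffers, and the
    function writes z[i] = a[i] + b[i] for i in range(n) into z in place.
    """
    # C cannot check buffer sizes through bare pointers; fail loudly here.
    if min(len(a), len(b), len(z)) < n:
        raise ValueError("buffers shorter than n")
    for i in range(n):
        z[i] = a[i] + b[i]


a = array("d", [1.0, 2.0, 3.0])
b = array("d", [10.0, 20.0, 30.0])
z = array("d", [0.0] * 3)  # output buffer, preallocated like double* z
fc(len(a), a, b, z)
print(list(z))  # [11.0, 22.0, 33.0]
```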

Cython: (Why / When) Is it preferable to use Py_ssize_t for indexing?

May 17, 2023 by Tarik

Py_ssize_t is signed. See PEP 353, where it says “A new type Py_ssize_t is introduced, which has the same size as the compiler’s size_t type, but is signed. It will be a typedef for ssize_t where available.” You should use Py_ssize_t for indexing. I didn’t find a definitive statement of this in the Cython docs, …
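One concrete reason signedness matters for indexing is what happens when an index steps below zero. This sketch uses ctypes.c_ssize_t and ctypes.c_size_t from the standard library as stand-ins for Py_ssize_t and size_t (PEP 353 says they have the same width); it does not involve Cython itself. ctypes integer types do no overflow checking, so they wrap exactly as the C types would.

```python
import ctypes
import sys

# Py_ssize_t has the same width as size_t (PEP 353); ctypes exposes both.
SIZE_BITS = 8 * ctypes.sizeof(ctypes.c_size_t)
assert ctypes.sizeof(ctypes.c_ssize_t) == ctypes.sizeof(ctypes.c_size_t)


def as_size_t(value):
    """Interpret a Python int as an unsigned size_t (wraps modulo 2**bits)."""
    return ctypes.c_size_t(value).value


def as_ssize_t(value):
    """Interpret a Python int as a signed, ssize_t-width integer."""
    return ctypes.c_ssize_t(value).value


# Decrementing index 0: the unsigned value wraps to a huge number, while the
# signed value stays -1, so a test like "i >= 0" still terminates a loop.
unsigned_result = as_size_t(0 - 1)
signed_result = as_ssize_t(0 - 1)

print(unsigned_result == 2**SIZE_BITS - 1)  # True
print(signed_result)  # -1

# sys.maxsize is the largest positive Py_ssize_t value.
print(sys.maxsize == 2**(SIZE_BITS - 1) - 1)  # True
```

With an unsigned loop variable, a countdown such as range(n - 1, -1, -1) has no valid representation for its stopping value, which is one practical argument for the signed type.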

Are there advantages to use the Python/C interface instead of Cython?

May 15, 2023 by Tarik

The current “top answer” sounds a bit too much like FUD to my ears. For one, it is not immediately obvious that the average developer would write faster code in C than what NumPy+Cython gives you anyway. Quite the contrary: the time it takes to even get the necessary C code to work correctly in …

Make distutils look for numpy header files in the correct place

May 11, 2023 by Tarik

Use numpy.get_include():

```python
# Note: distutils was removed from the standard library in Python 3.12;
# on newer versions, import setup and Extension from setuptools instead.
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy as np  # <-- New line

ext_modules = [Extension("hello", ["hello.pyx"],
                         include_dirs=[np.get_include()])]  # <-- New argument

setup(
    name="Hello world app",
    cmdclass={'build_ext': build_ext},
    ext_modules=ext_modules,
)
```

Improve Pandas Merge performance

May 7, 2023 by Tarik

set_index on the merge column does indeed speed this up. Below is a slightly more realistic version of julien-marrec’s answer.

```python
import pandas as pd
import numpy as np

myids = np.random.choice(np.arange(10000000), size=1000000, replace=False)
df1 = pd.DataFrame(myids, columns=['A'])
df1['B'] = np.random.randint(0, 1000, (1000000))
df2 = pd.DataFrame(np.random.permutation(myids), columns=['A2'])
df2['B2'] = np.random.randint(0, 1000, (1000000))
```

```
%%timeit
x = df1.merge(df2, how='left', left_on='A', right_on='A2')
#1 loop, best of …
```
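A small sketch of the two approaches being compared above, scaled down so it runs instantly. It only checks that the column-on-column merge and the set_index-then-join variant produce the same rows; it makes no claim about timings, which depend on data size and pandas version. The sizes, the seed, and the use of np.random.default_rng are assumptions, not the answer's setup.

```python
import numpy as np
import pandas as pd

# Much smaller than the answer's 1,000,000 rows, so this runs instantly.
n = 1_000
rng = np.random.default_rng(0)
myids = rng.choice(np.arange(10 * n), size=n, replace=False)

df1 = pd.DataFrame({"A": myids, "B": rng.integers(0, 1000, n)})
df2 = pd.DataFrame({"A2": rng.permutation(myids), "B2": rng.integers(0, 1000, n)})

# 1) Column-on-column merge, as in the answer's timing.
merged = df1.merge(df2, how="left", left_on="A", right_on="A2")

# 2) The same join with the key moved into the index on both sides first.
joined = df1.set_index("A").join(df2.set_index("A2"), how="left")

# Bring both results to the same shape (A, B, B2), sorted by key, to compare.
merged_cmp = merged[["A", "B", "B2"]].sort_values("A").reset_index(drop=True)
joined_cmp = (
    joined.rename_axis("A").reset_index().sort_values("A").reset_index(drop=True)
)

print(merged_cmp.equals(joined_cmp))
```

In IPython or Jupyter, prefixing each of the two join lines with %timeit compares them directly. If the indexed frames are reused across many merges, calling set_index once up front is where the saving comes from.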