Why does Python copy NumPy arrays where the length of the dimensions are the same?

Question

In [1]: a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]), 
   ...:      np.array([0.0, 0.2, 0.4, 0.6, 0.8]), 
   ...:      np.array([0.0, 0.2, 0.4, 0.6, 0.8])]                               
In [2]:                                                                         
In [2]: a                                                                       
Out[2]: 
[array([0. , 0.2, 0.4, 0.6, 0.8]),
 array([0. , 0.2, 0.4, 0.6, 0.8]),
 array([0. , 0.2, 0.4, 0.6, 0.8])]

a is a list of arrays. b is a 2d array.

In [3]: b = np.array(a)                                                         
In [4]: b                                                                       
Out[4]: 
array([[0. , 0.2, 0.4, 0.6, 0.8],
       [0. , 0.2, 0.4, 0.6, 0.8],
       [0. , 0.2, 0.4, 0.6, 0.8]])
In [5]: b[0] += 1                                                               
In [6]: b                                                                       
Out[6]: 
array([[1. , 1.2, 1.4, 1.6, 1.8],
       [0. , 0.2, 0.4, 0.6, 0.8],
       [0. , 0.2, 0.4, 0.6, 0.8]])

b gets values from a but does not contain any of the a objects. The underlying data structure of this b is very different from a, the list. If that isn’t clear, you may want to review the numpy basics (which talk about shape, strides, and data buffers).

In the second case, b is an object array, containing the same objects as a:

In [8]: b = np.array(a)                                                         
In [9]: b                                                                       
Out[9]: 
array([array([0. , 0.2, 0.4, 0.6, 0.8]), array([0. , 0.2, 0.4, 0.6, 0.8]),
       array([0. , 0.2, 0.4, 0.6])], dtype=object)

This b behaves a lot like the a – both contain arrays.

The construction of this object array is quite different from the 2d numeric array. I think of the numeric array as the default, or normal, numpy behavior, while the object array is a ‘concession’, giving us a useful tool, but one which does not have the calculation power of the multidimensional array.

It is easy to make an object array by mistake – some say too easy. It can be harder to make one reliably by design. FOr example with the original a, we have to do:

In [17]: b = np.empty(3, object)                                                
In [18]: b[:] = a[:]                                                            
In [19]: b                                                                      
Out[19]: 
array([array([0. , 0.2, 0.4, 0.6, 0.8]), array([0. , 0.2, 0.4, 0.6, 0.8]),
       array([0. , 0.2, 0.4, 0.6, 0.8])], dtype=object)

or even for i in range(3): b[i] = a[i]

Leave a Comment Cancel reply