NumPy (Part I)#

What you will learn in this lesson:

  • Modules and Packages

  • What is NumPy

  • Introduction to NumPy arrays

  • Creating NumPy arrays

  • Data Types in NumPy

Brief Introduction to Modules and Packages#

The first external Python package we’ll be learning is NumPy. External packages (those not built into Python) typically need to be (1) installed and (2) imported before use. But before diving into NumPy, let’s take a moment to understand what packages are and how they are used.

In Python, code is usually organized into packages and modules.

  • Technically, a Python package is a directory that contains zero or more Python modules.

  • A module is simply a Python file (with an extension .py) that contains functions, classes, and other definitions.

For example, a typical package structure might look like this:

── package_name
    ├── __init__.py
    ├── module1.py
    └── module2.py

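A regular package includes a special file named __init__.py. This file signals to Python that the folder should be treated as a package, allowing you to import and use the code within it. (Since Python 3.3, a folder without __init__.py can also be imported as a "namespace package", but most packages you will encounter include the file.) Additionally, __init__.py can contain initialization code that runs when the package is first imported.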
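To make the package/module distinction concrete, here is a minimal sketch that inspects json from the standard library. json is used only as a convenient example because it ships with Python; it is not part of this lesson's material.

```python
import json            # importing the package runs json/__init__.py
import json.decoder    # a module (decoder.py) inside the json package

# A package exposes a __path__ attribute (the directory it lives in);
# a plain module does not.
is_package = hasattr(json, "__path__")
decoder_is_package = hasattr(json.decoder, "__path__")

print(is_package)             # True: json is a package (a directory)
print(decoder_is_package)     # False: json.decoder is a single .py file
print(json.decoder.__name__)  # json.decoder
```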

Installing#

To install a Python package on your system, it is common to use the package management tool called pip. To install a package using pip in your Jupyter notebook, you can run the following command:

!pip install [package_name]

Alternatively, if you are using a command-line terminal, you would run the same command without the exclamation mark:

pip install [package_name]

For example, to install NumPy directly from our Jupyter notebook, we would run:

!pip install numpy

However, NumPy is such a common package that it comes pre-installed here.

Importing#

Once a package is installed on your system, it must be imported into any Python code that uses it with the import statement.

For example, if we wanted to import NumPy, we would run:

import numpy

Import Aliases#

Python also allows import statements to declare an alias for referencing the package. It is a common practice among NumPy users to import it as np like this:

import numpy as np
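As a quick check of what the alias does, the sketch below imports NumPy under both names and confirms that they point to the same module. The version string printed at the end depends on your environment.

```python
import numpy
import numpy as np

# Both names refer to the same module object; the alias only saves typing.
same_module = np is numpy

print(same_module)     # True
print(np.__name__)     # numpy
print(np.__version__)  # installed version; varies by environment
```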

What is NumPy#

These are the main features of NumPy:

  • A new data structure

NumPy introduces a new data structure to Python: the n-dimensional array. This data structure comes with a collection of functions and methods specifically designed to work with numerical data.

The n-dimensional array is optimized for numerical methods, which are algorithmic approximations used to solve problems in mathematical analysis.

  • New Functions

NumPy also provides a new way to apply functions to data through vectorized functions. These functions allow you to perform operations on entire arrays at once, eliminating the need for loops and comprehensions.

In addition, NumPy offers a comprehensive library of linear algebra functions to further take advantage of its data structure.

  • New Data Types

NumPy introduces several new data types tailored for efficient numerical computation.

  • Python for Science

Finally, because numerical methods are so critical across many fields of science, NumPy serves as the foundation of Python’s scientific stack. This stack includes libraries like SciPy, Matplotlib, Scikit-Learn, and Pandas—all of which assume some familiarity with NumPy.
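To illustrate the vectorized functions mentioned above, here is a minimal sketch that computes square roots twice: once with an explicit Python loop and once with np.sqrt applied to a whole array. The sample values are arbitrary.

```python
import numpy as np

values = [1.0, 4.0, 9.0, 16.0]

# Pure-Python approach: an explicit loop over the elements
roots_loop = []
for v in values:
    roots_loop.append(v ** 0.5)

# Vectorized approach: np.sqrt is applied to the whole array in one call
arr = np.array(values)
roots_vec = np.sqrt(arr)

print(roots_loop)  # [1.0, 2.0, 3.0, 4.0]
print(roots_vec)   # [1. 2. 3. 4.]
```

Both approaches give the same numbers, but the vectorized call needs no loop in our own code.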

The ndarray object#

The ndarray is a multidimensional array object in NumPy.

Before diving into the details, let’s get a quick glimpse of its functionality. We will generate some fake data using NumPy’s built-in random number generator, numpy.random.standard_normal():

# generate a single random number
number = np.random.standard_normal()
print(number, type(number))
1.5106196181630014 <class 'float'>
# generate some data - what do the arguments specify?
data = np.random.standard_normal((2, 3))
print(data, type(data))
[[ 0.50357981  1.60435582  0.06693117]
 [-0.76137145  1.07623844 -0.00815216]] <class 'numpy.ndarray'>
# Multiplication by a number
data * 10
array([[ 5.03579813, 16.04355815,  0.66931166],
       [-7.61371454, 10.76238442, -0.08152162]])
# Adding an array to itself gives the same result as multiplying it by 2
print(data + data)

print(data * 2)
[[ 1.00715963  3.20871163  0.13386233]
 [-1.52274291  2.15247688 -0.01630432]]
[[ 1.00715963  3.20871163  0.13386233]
 [-1.52274291  2.15247688 -0.01630432]]

As the output above shows, and as we have emphasized throughout the course, everything in Python is an object, and the ndarray is no exception. Consequently, ndarray objects have attributes and methods, just like other Python objects.

# This is an attribute example, which stores the shape of the array
data.shape
(2, 3)
# This is a method example, which allows you to reshape your array
print(data.reshape(3,2))
print(data.reshape(3,2).shape)
[[ 0.50357981  1.60435582]
 [ 0.06693117 -0.76137145]
 [ 1.07623844 -0.00815216]]
(3, 2)

Note

The term “dimension” can be ambiguous:

  • It may refer to real-world dimensions, such as space and time.

  • Or, it may refer to the dimensions of a data structure, independent of its real-world meaning.

In NumPy, “dimensions” refer to the structure of the data itself, although these dimensions can be used to represent real-world entities, as physicists often do.

The dimensions of a data structure are sometimes called axes.

For example, three-dimensional space can be represented either as three columns in a two-dimensional table or as three axes in a data cube.
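The two representations in that example can be sketched as follows. The sizes (100 points, a 4 x 5 x 6 grid) are arbitrary choices for illustration, and np.zeros is used only to allocate arrays of the right shape.

```python
import numpy as np

# 100 points in 3D space stored as a table:
# one row per point, one column each for x, y and z.
points = np.zeros((100, 3))
print(points.ndim)   # 2  (the array itself has two axes)
print(points.shape)  # (100, 3)

# A data cube sampled on a 4 x 5 x 6 grid:
# one array axis per spatial direction.
cube = np.zeros((4, 5, 6))
print(cube.ndim)     # 3
print(cube.shape)    # (4, 5, 6)
```

In the first case the spatial dimensions live in the columns of a 2D array; in the second, they are the axes of the array itself.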

Creating ndarrays#

  • np.array(): This is the most basic and common way. It takes an object (normally a list) and converts it into an array.

data1 = [6, 7.5, 8, 0, 1] # create a list
arr1 = np.array(data1) # turn list into a numpy array
arr1
array([6. , 7.5, 8. , 0. , 1. ])
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
  • np.zeros(): It allows you to create an array with all zeros

np.zeros(5)
array([0., 0., 0., 0., 0.])

You can create an array of zeros with any shape that you like. How do we do this? Let’s have a look at the documentation of this function:

help(np.zeros)
Help on built-in function zeros in module numpy:

zeros(...)
    zeros(shape, dtype=float, order='C', *, like=None)
    
    Return a new array of given shape and type, filled with zeros.
    
    Parameters
    ----------
    shape : int or tuple of ints
        Shape of the new array, e.g., ``(2, 3)`` or ``2``.
    dtype : data-type, optional
        The desired data-type for the array, e.g., `numpy.int8`.  Default is
        `numpy.float64`.
    order : {'C', 'F'}, optional, default: 'C'
        Whether to store multi-dimensional data in row-major
        (C-style) or column-major (Fortran-style) order in
        memory.
    like : array_like, optional
        Reference object to allow the creation of arrays which are not
        NumPy arrays. If an array-like passed in as ``like`` supports
        the ``__array_function__`` protocol, the result will be defined
        by it. In this case, it ensures the creation of an array object
        compatible with that passed in via this argument.
    
        .. versionadded:: 1.20.0
    
    Returns
    -------
    out : ndarray
        Array of zeros with the given shape, dtype, and order.
    
    See Also
    --------
    zeros_like : Return an array of zeros with shape and type of input.
    empty : Return a new uninitialized array.
    ones : Return a new array setting values to one.
    full : Return a new array of given shape filled with value.
    
    Examples
    --------
    >>> np.zeros(5)
    array([ 0.,  0.,  0.,  0.,  0.])
    
    >>> np.zeros((5,), dtype=int)
    array([0, 0, 0, 0, 0])
    
    >>> np.zeros((2, 1))
    array([[ 0.],
           [ 0.]])
    
    >>> s = (2,2)
    >>> np.zeros(s)
    array([[ 0.,  0.],
           [ 0.,  0.]])
    
    >>> np.zeros((2,), dtype=[('x', 'i4'), ('y', 'i4')]) # custom dtype
    array([(0, 0), (0, 0)],
          dtype=[('x', '<i4'), ('y', '<i4')])
np.zeros((5,2))
array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])
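The help text above also documents the dtype parameter and the related function zeros_like. A brief sketch of both, using arbitrary example shapes:

```python
import numpy as np

# Integer zeros instead of the default float64
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros)

# np.zeros_like copies the shape and dtype of an existing array
template = np.array([[1.5, 2.5], [3.5, 4.5]])
like_zeros = np.zeros_like(template)
print(like_zeros.shape, like_zeros.dtype)  # (2, 2) float64
```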
  • np.ones(): This function creates arrays populated with all ones.

np.ones(5)
array([1., 1., 1., 1., 1.])
np.ones((5,2))
array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])
  • np.empty(): It creates an array without initializing its contents. The initial values are arbitrary (whatever happened to be in that region of memory), so they should not be relied upon.

np.empty((5,2))
array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])
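Because the contents of an np.empty array are arbitrary, a typical pattern is to allocate the array and then overwrite every element before reading it. A minimal sketch, with the size and fill values chosen only for illustration:

```python
import numpy as np

# Allocate without initializing; the contents are arbitrary until overwritten.
buffer = np.empty(4)

# Fill every element before reading the array.
for i in range(buffer.size):
    buffer[i] = i ** 2

print(buffer)  # [0. 1. 4. 9.]
```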
  • numpy.random.standard_normal(): It creates an array of random numbers following a standard normal distribution (Gaussian distribution with mean equal to zero, and standard deviation equal to 1)

help(np.random.standard_normal)
Help on built-in function standard_normal:

standard_normal(...) method of numpy.random.mtrand.RandomState instance
    standard_normal(size=None)
    
    Draw samples from a standard Normal distribution (mean=0, stdev=1).
    
    .. note::
        New code should use the
        `~numpy.random.Generator.standard_normal`
        method of a `~numpy.random.Generator` instance instead;
        please see the :ref:`random-quick-start`.
    
    Parameters
    ----------
    size : int or tuple of ints, optional
        Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn.  Default is None, in which case a
        single value is returned.
    
    Returns
    -------
    out : float or ndarray
        A floating-point array of shape ``size`` of drawn samples, or a
        single sample if ``size`` was not specified.
    
    See Also
    --------
    normal :
        Equivalent function with additional ``loc`` and ``scale`` arguments
        for setting the mean and standard deviation.
    random.Generator.standard_normal: which should be used for new code.
    
    Notes
    -----
    For random samples from the normal distribution with mean ``mu`` and
    standard deviation ``sigma``, use one of::
    
        mu + sigma * np.random.standard_normal(size=...)
        np.random.normal(mu, sigma, size=...)
    
    Examples
    --------
    >>> np.random.standard_normal()
    2.1923875335537315 #random
    
    >>> s = np.random.standard_normal(8000)
    >>> s
    array([ 0.6888893 ,  0.78096262, -0.89086505, ...,  0.49876311,  # random
           -0.38672696, -0.4685006 ])                                # random
    >>> s.shape
    (8000,)
    >>> s = np.random.standard_normal(size=(3, 4, 2))
    >>> s.shape
    (3, 4, 2)
    
    Two-by-four array of samples from the normal distribution with
    mean 3 and standard deviation 2.5:
    
    >>> 3 + 2.5 * np.random.standard_normal(size=(2, 4))
    array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],   # random
           [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random
np.random.standard_normal((5,2))
array([[ 0.5256863 ,  0.73393186],
       [-0.42945253, -0.21709978],
       [-0.5252806 , -0.60517355],
       [-2.27295385,  1.0068564 ],
       [ 0.73981497,  0.22700545]])
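The note in the help text above recommends the Generator interface for new code. A minimal sketch of that approach, where the seed value 42 is an arbitrary choice made only so the results are reproducible:

```python
import numpy as np

# Create a seeded generator (the seed 42 is arbitrary).
rng = np.random.default_rng(42)
sample = rng.standard_normal((5, 2))
print(sample.shape)  # (5, 2)

# A second generator with the same seed reproduces the same numbers.
rng_again = np.random.default_rng(42)
sample_again = rng_again.standard_normal((5, 2))
print(np.array_equal(sample, sample_again))  # True
```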
  • np.arange(): Similar to range, it allows you to create an array of evenly spaced values (most often sequential integers). As with range, the stop value is not included.

np.arange(1,10,2)
array([1, 3, 5, 7, 9])
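A few more calls help show how the start, stop and step arguments of np.arange behave. The specific numbers are arbitrary examples:

```python
import numpy as np

only_stop = np.arange(5)            # start defaults to 0
start_stop = np.arange(2, 6)        # the stop value (6) is excluded
with_step = np.arange(0, 10, 3)     # step of 3
float_step = np.arange(0, 1, 0.25)  # non-integer steps also work

print(only_stop)   # [0 1 2 3 4]
print(start_stop)  # [2 3 4 5]
print(with_step)   # [0 3 6 9]
print(float_step)  # values 0, 0.25, 0.5 and 0.75
```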
  • np.linspace(): This function allows you to create an array with a specified number of values that are evenly spaced within a given interval.

# Here we are requesting 25 numbers linearly spaced between 1 and 10
np.linspace(1,10,25)
array([ 1.   ,  1.375,  1.75 ,  2.125,  2.5  ,  2.875,  3.25 ,  3.625,
        4.   ,  4.375,  4.75 ,  5.125,  5.5  ,  5.875,  6.25 ,  6.625,
        7.   ,  7.375,  7.75 ,  8.125,  8.5  ,  8.875,  9.25 ,  9.625,
       10.   ])
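To see where the spacing of 0.375 in the output above comes from, np.linspace can return the step it used through its retstep argument. The second call, with endpoint=False, is an extra illustration using arbitrary values:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive values
values, step = np.linspace(1, 10, 25, retstep=True)
print(step)                   # 0.375, i.e. (10 - 1) / (25 - 1)
print(values[0], values[-1])  # 1.0 10.0  (both endpoints are included)

# endpoint=False leaves the stop value out
open_values = np.linspace(0, 1, 4, endpoint=False)
print(open_values)            # values 0, 0.25, 0.5 and 0.75
```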

Try it yourself: Rerun the previous example by using the function np.logspace instead. What would you say is the difference here?

Very common attributes and methods of NumPy array objects#

Below is a short selection of the attributes and methods available for ndarray objects. Please refer to array-attributes and array-methods for a comprehensive and detailed list.

  • ndim: It gives the number of dimensions or axes.

arr1.ndim
1
arr2.ndim
2
  • shape: It gives the shape of the ndarray, that is, its size along each axis.

arr2.shape
(2, 4)
  • dtype: It retrieves the data type of the elements in the ndarray (see later).

arr2.dtype
dtype('int64')
  • reshape: It reshapes an array to a given shape.

arr2.reshape(1,8)
array([[1, 2, 3, 4, 5, 6, 7, 8]])
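Two details of reshape are worth a short sketch: a dimension given as -1 is inferred by NumPy, and the total number of elements has to stay the same. The array below repeats the values of arr2 so that the example is self-contained.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# -1 asks NumPy to infer that dimension: 8 elements / 4 rows = 2 columns
tall = arr.reshape(4, -1)
print(tall.shape)  # (4, 2)

# The total number of elements must stay the same, otherwise reshape fails.
reshape_failed = False
try:
    arr.reshape(3, 3)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```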
  • transpose: It transposes the array.

arr2.transpose()
array([[1, 5],
       [2, 6],
       [3, 7],
       [4, 8]])
# Or as an attribute
arr2.T
array([[1, 5],
       [2, 6],
       [3, 7],
       [4, 8]])
  • flatten: It returns a copy of the array collapsed into one dimension.

arr2.flatten()
array([1, 2, 3, 4, 5, 6, 7, 8])
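Since flatten returns a copy, changing the flattened array leaves the original untouched. A minimal sketch, again repeating the values of arr2 so that it runs on its own:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

flat = arr.flatten()  # flatten returns a new, independent copy
flat[0] = 99

print(flat[0])    # 99
print(arr[0, 0])  # 1  (the original array is unchanged)
```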

Data Types#

NumPy introduces its own data types, optimized for efficient storage and processing. The most commonly used data types in NumPy are:

  • np.int16

  • np.int32

  • np.int64

  • np.float32

  • np.float64

  • np.float128 (platform-dependent; not available on every system)

  • np.bool_

  • np.str_

  • np.bytes_

  • np.object_

We can control the data type of a NumPy array at the time of creation by using the dtype argument.

Let’s see how this works with some of the data we used earlier.

By default, if all elements passed to np.array are integers, NumPy assigns the platform's default integer type, which is np.int64 on most systems.

# arr2 contained all integer elements
print(arr2.dtype)
int64
# Nevertheless, we can specify the data type that we want in the definition of the arrays
arr2_int32 = np.array(data2, dtype=np.int32)
print(arr2_int32)
print(arr2_int32.dtype)

arr2_float64 = np.array(data2, dtype=np.float64)
print(arr2_float64)
print(arr2_float64.dtype)
[[1 2 3 4]
 [5 6 7 8]]
int32
[[1. 2. 3. 4.]
 [5. 6. 7. 8.]]
float64
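One practical consequence of choosing a dtype is memory use, which can be inspected through the itemsize and nbytes attributes. A short sketch using the same values as data2:

```python
import numpy as np

data = [[1, 2, 3, 4], [5, 6, 7, 8]]
as_int32 = np.array(data, dtype=np.int32)
as_float64 = np.array(data, dtype=np.float64)

# itemsize is the number of bytes per element;
# nbytes is itemsize multiplied by the number of elements.
print(as_int32.itemsize, as_int32.nbytes)      # 4 32
print(as_float64.itemsize, as_float64.nbytes)  # 8 64
```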

In contrast, if at least one element is a floating-point number, NumPy will treat the entire array as a float type, defaulting to np.float64.

# Similar to data2, except that the last element is written as a float (8.0)
data3 = [[1, 2, 3, 4], [5, 6, 7, 8.0]]
arr3 = np.array(data3)
arr3
print(arr3)
print(arr3.dtype)
[[1. 2. 3. 4.]
 [5. 6. 7. 8.]]
float64
# Here using byte strings (np.bytes_; the older alias np.string_ was removed in NumPy 2.0)
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)
numeric_strings
array([b'1.25', b'-9.6', b'42'], dtype='|S4')
# Here using booleans
boolean_array = np.array([True, False, False], dtype=np.bool_)
boolean_array
array([ True, False, False])

You can also convert to different data types after your array has been created using the method astype.

numeric_strings.astype(np.float64)
array([ 1.25, -9.6 , 42.  ])
boolean_array.astype(np.int64)
array([1, 0, 0])
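One behaviour of astype that is easy to miss: converting floats to integers drops the fractional part instead of rounding. A minimal sketch with arbitrary sample values:

```python
import numpy as np

floats = np.array([1.7, -2.9, 3.2])

# Converting floats to integers drops the fractional part (no rounding).
ints = floats.astype(np.int64)
print(ints)          # [ 1 -2  3]

# astype returns a new array; the original keeps its dtype.
print(floats.dtype)  # float64
```

If rounding is what you need, one option is to call np.round on the array before converting.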

Practice exercises#

Exercise 29

1- Convert the following variable, sudoku_game, into a NumPy array called sudoku_array.
2- Print the class type() of sudoku_array to check that your code has worked properly.

# sudoku_game is a Python list containing a sudoku game
sudoku_game = [[0, 0, 4, 3, 0, 0, 2, 0, 9],
               [0, 0, 5, 0, 0, 9, 0, 0, 1],
               [0, 7, 0, 0, 6, 0, 0, 4, 3],
               [0, 0, 6, 0, 0, 2, 0, 8, 7],
               [1, 9, 0, 0, 0, 7, 4, 0, 0],
               [0, 5, 0, 0, 8, 3, 0, 0, 0],
               [6, 0, 0, 0, 0, 0, 1, 0, 5],
               [0, 0, 3, 5, 0, 8, 6, 9, 0],
               [0, 4, 2, 9, 1, 0, 3, 0, 0]]
# Your answers from here

Exercise 30

You’ve just created a sudoku_game two-dimensional NumPy array. Perhaps you have hundreds of sudoku game arrays, and you’d like to save the solution for this one, sudoku_solution, as part of the same array as its corresponding game in order to organize your sudoku data better. You could accomplish this by stacking the two 2D arrays on top of each other to create a 3D array.

1- Create a 3D array called game_and_solution by stacking the two 2D arrays, created from sudoku_game and sudoku_solution, on top of one another; in the final array, sudoku_game should appear before sudoku_solution. Print game_and_solution.

2- Flatten sudoku_game so that it is a 1D array, and save it as flattened_game. Print the .shape of flattened_game.

3- Reshape the flattened_game back to its original shape of nine rows and nine columns; save the new array as reshaped_game.

# sudoku_game is a Python list containing a sudoku game

sudoku_game = [[0, 0, 4, 3, 0, 0, 2, 0, 9],
               [0, 0, 5, 0, 0, 9, 0, 0, 1],
               [0, 7, 0, 0, 6, 0, 0, 4, 3],
               [0, 0, 6, 0, 0, 2, 0, 8, 7],
               [1, 9, 0, 0, 0, 7, 4, 0, 0],
               [0, 5, 0, 0, 8, 3, 0, 0, 0],
               [6, 0, 0, 0, 0, 0, 1, 0, 5],
               [0, 0, 3, 5, 0, 8, 6, 9, 0],
               [0, 4, 2, 9, 1, 0, 3, 0, 0]]

sudoku_solution = [[8, 6, 4, 3, 7, 1, 2, 5, 9],
                   [3, 2, 5, 8, 4, 9, 7, 6, 1],
                   [9, 7, 1, 2, 6, 5, 8, 4, 3],
                   [4, 3, 6, 1, 9, 2, 5, 8, 7],
                   [1, 9, 8, 6, 5, 7, 4, 3, 2],
                   [2, 5, 7, 4, 8, 3, 9, 1, 6],
                   [6, 8, 9, 7, 3, 4, 1, 2, 5],
                   [7, 1, 3, 5, 2, 8, 6, 9, 4],
                   [5, 4, 2, 9, 1, 6, 3, 7, 8]]
# Your answers from here

Exercise 31

1- Create another 3D array called new_game_and_solution with a different 2D game and 2D solution pair: new_sudoku_game and new_sudoku_solution. new_sudoku_game should appear before new_sudoku_solution.

2- Create a 4D array called games_and_solutions by making an array out of the two 3D arrays: game_and_solution and new_game_and_solution, in that order.

3- Print the shape of games_and_solutions.

new_sudoku_game = [[0, 0, 4, 3, 0, 0, 0, 0, 0],
                   [8, 9, 0, 2, 0, 0, 6, 7, 0],
                   [7, 0, 0, 9, 0, 0, 0, 5, 0],
                   [5, 0, 0, 0, 0, 8, 1, 4, 0],
                   [0, 7, 0, 0, 3, 2, 0, 6, 0],
                   [6, 0, 0, 0, 0, 1, 3, 0, 8],
                   [0, 0, 1, 7, 5, 0, 9, 0, 0],
                   [0, 0, 5, 0, 4, 0, 0, 1, 2],
                   [9, 8, 0, 0, 0, 6, 0, 0, 5]]

new_sudoku_solution = [[2, 5, 4, 3, 6, 7, 8, 9, 1],
                       [8, 9, 3, 2, 1, 5, 6, 7, 4],
                       [7, 1, 6, 9, 8, 4, 2, 5, 3],
                       [5, 3, 2, 6, 9, 8, 1, 4, 7],
                       [1, 7, 8, 4, 3, 2, 5, 6, 9],
                       [6, 4, 9, 5, 7, 1, 3, 2, 8],
                       [4, 2, 1, 7, 5, 3, 9, 8, 6],
                       [3, 6, 5, 8, 4, 9, 7, 1, 2],
                       [9, 8, 7, 1, 2, 6, 4, 3, 5]]
# Your answers from here

Exercise 32

1- Create and print an array filled with zeros called zero_array, which has two rows and four columns.

2- Create and print an array, called random_array, of random floats following a standard normal distribution, which has three rows and six columns.

3- Create a 1D array called one_to_ten which holds all integers from one to ten (inclusive).

# Your answers from here