25+ Numpy functions you must know as a beginner


Numpy is easily the most popular library for handling numerical data in Python.

It uses a multi-dimensional array structure to support the manipulation of a large amount of data at once. Its built-in functions then allow you to access and operate on these arrays to bring the data to the required final state.

Numpy provides functions to carry out a wide variety of tasks belonging to different numerical domains. This includes array creation, statistics, vector and matrix routines, linear algebra, and regular mathematical operations.

As a beginner to Numpy, though, you don’t have to learn all of them at once. If you go for the more common Numpy functions first, it’ll be enough to serve most of your numerical data needs.

So, in this post, we will introduce you to essential Numpy functions you should know as a beginner to handle its common use cases.


Array creation

Array

The array function creates a new n-dimensional array from an array-like object (e.g., Python list, Pandas series) passed as the first argument.

import numpy as np

positions = [[1, 2], [3, 4]]
price = [23, 45, 10.2, 50]

positions_array = np.array(positions)
price_array = np.array(price)

print(positions_array)
# [[1 2]
#  [3 4]]

print(price_array)
# [23.  45.  10.2 50. ]

print(positions_array.dtype) # int64 (the default integer type is platform-dependent)
print(price_array.dtype) # float64

Numpy determines the dimension of the returned array based on this object’s dimensions.

You can either allow Numpy to derive the data type from the passed object or specify its value using a second argument.

positions_float_array = np.array(positions, dtype=np.float32)

print(positions_float_array)
# [[1. 2.]
#  [3. 4.]]

print(positions_float_array.dtype) # float32

Zeros

Zeros is another array creation function. It returns a Numpy array of the given shape, filled with zeros.

zeros_1d_array = np.zeros(5)
zeros_2d_array = np.zeros((3, 4))
zeros_int_array = np.zeros((4,), dtype=np.int64)

print(zeros_1d_array)
# [0. 0. 0. 0. 0.]

print(zeros_2d_array)
# [[0. 0. 0. 0.]
#  [0. 0. 0. 0.]
#  [0. 0. 0. 0.]]

print(zeros_int_array)
# [0 0 0 0]

While the default data type of the returned array is set to float64, you can explicitly assign it another type using the dtype argument.

Ones

This function is quite similar to the zeros function. But it fills the output array with ones instead of zeros.

ones_1d_array = np.ones(5)
ones_2d_array = np.ones((3, 4))
ones_int_array = np.ones((4,), dtype=np.int64)

print(ones_1d_array)
# [1. 1. 1. 1. 1.]

print(ones_2d_array)
# [[1. 1. 1. 1.]
#  [1. 1. 1. 1.]
#  [1. 1. 1. 1.]]

print(ones_int_array)
# [1 1 1 1]

Arange

Arange is a function that returns a new array containing evenly-spaced numbers within a given range. It accepts three main arguments.

  • start: The beginning number of the interval.
  • stop: The ending number of the interval.
  • step: The spacing between numbers added to the array.

Of these three, start and step are optional arguments.

If you call arange with only a stop value, the function uses numbers in the half-open interval [0, stop) with a step of 1 to generate the array. The stop value itself is not included.

np.arange(6)
# array([0, 1, 2, 3, 4, 5])

If you also pass the optional start value, it uses the interval [start, stop) with a step of 1 to generate the array. Here, the first argument becomes start while the second is assigned to stop.

np.arange(4, 10)
# array([4, 5, 6, 7, 8, 9])

If you pass all three arguments, arange creates an array using step-spaced numbers in the interval [start, stop).

np.arange(10, 100, step=10)
# array([10, 20, 30, 40, 50, 60, 70, 80, 90])

np.arange(10, 20, step=15)
# array([10])

While you can pass a non-integer step value to this function, it can lead to unexpected results due to the limitations of floating-point arithmetic. In such cases, it’s best to rely on the linspace function discussed next instead of arange.

However, there’s no issue with using non-integer start and stop values as long as the step remains integer. Arange also accepts an optional dtype argument that lets you set the array data type.
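Here is a minimal sketch of the floating-point pitfall. The values 1, 1.3, and 0.1 are our own illustrative choice; the point is that the length arange returns with a fractional step depends on rounding, while linspace takes the sample count directly.

```python
import numpy as np

# Illustrative values: we would expect 1.0, 1.1, 1.2 (stop excluded).
risky = np.arange(1, 1.3, 0.1)

# Rounding error in (stop - start) / step can add an extra element,
# so the length is not guaranteed to be 3.
print(len(risky))

# linspace takes the number of samples directly, so the length is fixed.
safe = np.linspace(1, 1.2, num=3)
print(len(safe))  # 3
```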

Linspace

Unlike arange, you can use linspace more reliably to fill an array with evenly-spaced, non-integer numbers. It accepts two mandatory arguments, start and stop, along with another optional num value to create such an array.

Linspace generates num evenly spaced samples within the [start, stop] interval using these inputs. If no num is passed, it uses the default value of 50.

np.linspace(1, 2, num=11)
# array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. ])

np.linspace(5, 15, num=21)
# array([ 5. ,  5.5,  6. ,  6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5, 10. , 10.5, 11. , 11.5, 12. , 12.5, 13. , 13.5, 14. , 14.5, 15. ])

To exclude the stop value from the created array, you should set the optional endpoint argument, which is assigned True by default, to False.

np.linspace(1, 2, num=10, endpoint=False)
# array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9])

np.linspace(5, 15, num=20, endpoint=False)
# array([ 5. ,  5.5,  6. ,  6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5, 10. , 10.5, 11. , 11.5, 12. , 12.5, 13. , 13.5, 14. , 14.5])
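The spacing between samples follows from start, stop, and num. The sketch below checks this with linspace's optional retstep argument, which returns the step alongside the array; the variable names are ours.

```python
import numpy as np

start, stop, num = 5, 15, 21

# With the endpoint included, there are (num - 1) gaps between samples.
arr, step = np.linspace(start, stop, num=num, retstep=True)
print(step)                        # 0.5
print((stop - start) / (num - 1))  # 0.5

# With the endpoint excluded, the interval is divided into num gaps.
arr_open, step_open = np.linspace(start, stop, num=20, endpoint=False, retstep=True)
print(step_open)  # 0.5
```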

Eye

Eye allows you to create a Numpy array similar to an identity matrix. In other words, this returns a 2-D array with a diagonal filled with ones and zeros everywhere else.

The function accepts the mandatory N argument as the number of rows in the array. You can also pass an optional M argument to specify the number of columns. If no M is given, its value defaults to N.

np.eye(3)
# array([[1., 0., 0.],
#        [0., 1., 0.],
#        [0., 0., 1.]])

np.eye(2, M=3)
# array([[1., 0., 0.],
#        [0., 1., 0.]])
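As a quick check of the identity-matrix behavior, multiplying a square matrix by np.eye of matching size should leave it unchanged. The sketch also shows the optional k argument, which shifts the diagonal.

```python
import numpy as np

m = np.arange(9).reshape(3, 3)

# Multiplying by the identity matrix leaves m unchanged.
print(np.array_equal(m @ np.eye(3), m))  # True

# k shifts the diagonal: positive values move it above the main diagonal.
print(np.eye(3, k=1))
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
```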

Array manipulation

Reshape

The reshape function allows you to modify the shape of an array. For example, if you have an array with a (3, 4) shape, you can change it to (6, 2) without affecting the existing data.

arr1 = np.ones((3, 4))
arr1.shape # (3, 4)

arr2 = arr1.reshape((6, 2))
arr2.shape # (6, 2)

arr3 = np.arange(10).reshape((5, 2))
arr3.shape # (5, 2)

When passing the new shape, you must carefully ensure that it is compatible with the array’s original dimensions. For example, you can’t reshape a (5, 2) array into a (3, 3) one.

By default, reshape reads the elements in row-major order (the last index changes fastest) and places them into the new shape in that same order.

arr4 = np.arange(6).reshape((2, 3))
# array([[0, 1, 2],
#        [3, 4, 5]])

arr5 = arr4.reshape((3, 2))
# array([[0, 1],
#        [2, 3],
#        [4, 5]])
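The sketch below illustrates two points about reshape: an incompatible shape raises a ValueError, and elements keep their row-major order in the new shape. It also uses -1 for one dimension, which asks Numpy to infer that dimension from the array size.

```python
import numpy as np

arr = np.arange(10).reshape((5, 2))

# 10 elements cannot fill a (3, 3) array, so reshape raises ValueError.
try:
    arr.reshape((3, 3))
except ValueError as err:
    print("reshape failed:", err)

# -1 lets Numpy infer the remaining dimension: 10 / 5 = 2 rows.
inferred = arr.reshape((-1, 5))
print(inferred.shape)  # (2, 5)
print(inferred)
# [[0 1 2 3 4]
#  [5 6 7 8 9]]
```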

Transpose

When used on a 2-D array, this function outputs the array’s matrix transpose. On higher dimensional arrays, you can expect it to return a permutation of the axes. Transposing a 1-D array doesn’t lead to any modification.

arr1 = np.arange(6).reshape((2, 3))
# array([[0, 1, 2],
#        [3, 4, 5]])

arr2 = arr1.transpose()
# array([[0, 3],
#        [1, 4],
#        [2, 5]])
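To see the behavior on 1-D and higher dimensional arrays, here is a short sketch that only inspects shapes. The array names are ours.

```python
import numpy as np

# 1-D arrays are returned unchanged.
v = np.arange(3)
print(v.transpose().shape)  # (3,)

# With no arguments, the axes of an n-D array are reversed.
t = np.ones((2, 3, 4))
print(t.transpose().shape)  # (4, 3, 2)

# An explicit permutation of the axes can also be given.
print(t.transpose((1, 0, 2)).shape)  # (3, 2, 4)
```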

Flatten

The flatten function converts an array of any dimensions into a 1-D array. It returns the flattened array as a new copy and leaves the original unchanged, so assign the result to a variable if you want to keep it.

arr = np.eye(3)
arr.flatten()
# array([1., 0., 0., 0., 1., 0., 0., 0., 1.])
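A small check of flatten's copy semantics: it returns a new 1-D array, and the original array keeps both its shape and its values.

```python
import numpy as np

arr = np.eye(3)
flat = arr.flatten()

print(arr.shape)   # (3, 3), the original keeps its shape
print(flat.shape)  # (9,)

# flatten returns a copy, so editing it does not change arr.
flat[0] = 99
print(arr[0, 0])  # 1.0
```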

Squeeze

The squeeze function removes axes with a length of one from an array. It helps you remove unnecessary dimensions from data to simplify upcoming processes and analytics.

You can remove all dimensions of length one by simply passing the array to the squeeze function like this:

arr = np.array([[[[[1, 2]], [[3, 4]], [[5, 6]]]]])
arr.shape # (1, 1, 3, 1, 2)

arr = np.squeeze(arr)
print(arr.shape) # (3, 2)
print(arr) # [[1 2]
           #  [3 4]
           #  [5 6]]

The squeeze function also accepts an optional argument, axis, that lets you specify a subset of axes with a length of one to remove. However, if any of the passed axes have a length greater than one, it throws an error.

arr = np.array([[[[[1, 2]], [[3, 4]], [[5, 6]]]]])
arr.shape # (1, 1, 3, 1, 2)

arr1 = np.squeeze(arr, axis=(0, 1))
arr1.shape # (3, 1, 2)

arr2 = np.squeeze(arr, axis=3)
arr2.shape # (1, 1, 3, 2)

arr3 = np.squeeze(arr, axis=2) # ValueError

Concatenate

The Numpy concatenate function combines two or more arrays along an existing axis.

With 2-D arrays, for example, you can join two of them together along the horizontal axis using the function like this:

a = np.arange(1, 7).reshape(3, 2)
b = np.arange(10, 40, 10).reshape(3, 1)

c = np.concatenate((a, b), axis=1)
print(c.shape) # (3, 3)
print(c) # [[ 1  2 10]
         #  [ 3  4 20]
         #  [ 5  6 30]]

If the concatenation should happen along the vertical axis, set the axis index to 0 instead.

a = np.arange(1, 7).reshape(3, 2)
b = np.arange(10, 50, 10).reshape(2, 2)

c = np.concatenate((a, b), axis=0)
print(c.shape) # (5, 2)
print(c) # [[ 1  2]
         #  [ 3  4]
         #  [ 5  6]
         #  [10 20]
         #  [30 40]]

One important thing to note in these concatenation operations is that the arrays used must have compatible shapes. This means they should have equal lengths along all axes except the one passed as the “axis” argument.
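To make the shape rule concrete, here is a small sketch with two arrays that can be joined along axis 0 but not along axis 1. The arrays are illustrative.

```python
import numpy as np

a = np.ones((3, 2))
b = np.ones((2, 2))

# Along axis 0 only the column counts must match: (3, 2) + (2, 2) -> (5, 2).
print(np.concatenate((a, b), axis=0).shape)  # (5, 2)

# Along axis 1 the row counts must match, and 3 != 2, so this fails.
try:
    np.concatenate((a, b), axis=1)
except ValueError as err:
    print("concatenate failed:", err)
```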

Alternatively, Numpy also offers two dedicated functions to handle array concatenations (or stacking) along horizontal and vertical axes.

  • np.vstack - Behaves in a similar way to concatenate with axis=0
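  • np.hstack - Behaves in a similar way to concatenate with axis=1

p, q = np.ones((3, 2)), np.zeros((3, 1)) # same row count
r, s = np.ones((3, 2)), np.zeros((2, 2)) # same column count
h = np.hstack((p, q)) # shape (3, 3)
v = np.vstack((r, s)) # shape (5, 2)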

Split

The split function divides a single array into multiple subarrays.

If you want to generate N equal-sized arrays, pass N as the second argument when calling the function. It splits along axis 0 by default, but you can change this by passing the desired axis index as the “axis” argument.

a = np.arange(12).reshape(4, 3)

c, d = np.split(a, 2)
print(c) # [[0 1 2]
         #  [3 4 5]]

print(d) # [[ 6  7  8]
         #  [ 9 10 11]]

arr_list = np.split(a, 3, axis=1)
len(arr_list) # 3

If an equal split using the given N is not possible, the function throws an error.
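The following sketch triggers that error and then shows np.array_split, a related Numpy function that tolerates uneven divisions. Whether array_split suits your case is a judgment call; it is included here only as one possible alternative.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# 3 columns cannot be divided into 2 equal parts.
try:
    np.split(a, 2, axis=1)
except ValueError as err:
    print("split failed:", err)

# np.array_split accepts uneven divisions instead of raising.
parts = np.array_split(a, 2, axis=1)
print([p.shape for p in parts])  # [(4, 2), (4, 1)]
```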

You can also pass an array of split points as the second argument to create subarrays of different sizes.

a = np.arange(12).reshape(4, 3)

c, d = np.split(a, [1], axis=1)
print(c) # [[0]
         #  [3]
         #  [6]
         #  [9]]
print(d) # [[ 1  2]
         #  [ 4  5]
         #  [ 7  8]
         #  [10 11]]

arr_list = np.split(a, [1, 2], axis=0)
len(arr_list) # 3

Numpy also provides two dedicated functions to conduct vertical and horizontal splits more easily.

  • np.hsplit - Behaves similarly to split with axis = 1
  • np.vsplit - Behaves similarly to split with axis = 0
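A short sketch of both helpers on the same (4, 3) array used above. The variable names are ours.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# vsplit divides along axis 0 (rows).
top, bottom = np.vsplit(a, 2)
print(top.shape, bottom.shape)  # (2, 3) (2, 3)

# hsplit divides along axis 1 (columns); here at column index 1.
left, right = np.hsplit(a, [1])
print(left.shape, right.shape)  # (4, 1) (4, 2)
```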

Search operations

Where

Given two arrays, x and y, you can use the where function to pick elements from one or the other based on a condition. Where the condition is satisfied, it uses the respective element from x; otherwise, it uses the element from y.

Condition:

  • True - Return x
  • False - Return y

The returned array, therefore, contains elements from x where the condition is True and elements from y where the condition becomes False.

If you pass a scalar in place of either x or y, Numpy broadcasts it to match the shape of the condition and the other array. If x and y are both arrays, they need to have the same shape or be broadcastable to a shared shape compatible with the condition.

x = np.arange(12).reshape(4, 3)
y = np.arange(20, 80, 5).reshape(4, 3)

z = np.where(x % 3 == 0, x, y)
# [[ 0, 25, 30],
#  [ 3, 40, 45],
#  [ 6, 55, 60],
#  [ 9, 70, 75]]

z = np.where(x > 5, x, 0)
# [[ 0,  0,  0],
#  [ 0,  0,  0],
#  [ 6,  7,  8],
#  [ 9, 10, 11]]

z = np.where(x % 2 == 0, 1, 0)
# [[1, 0, 1],
#  [0, 1, 0],
#  [1, 0, 1],
#  [0, 1, 0]]

An example of np.where broadcasting arrays of different but compatible shapes:

x = np.arange(1, 13).reshape(4, 3)
y = np.arange(20, 35, 5)

z = np.where(y % x == 0, x, y)
# [[ 1, 25,  3],
#  [ 4,  5,  6],
#  [20, 25, 30],
#  [10, 25, 30]]

Argmax

Argmax returns an array containing indexes of the maximum values along an axis. If no axis is passed, it flattens the array first before finding the index of the maximum.

a = np.array([[1, 45, 23], [12, 9, 21], [32, 45, 45]])

np.argmax(a) # 1

np.argmax(a, axis=0) # [2, 0, 2]

np.argmax(a, axis=1) # [1, 2, 1]

If the maximum occurs more than once, the function returns only the index of its first occurrence.
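When no axis is passed, the result is an index into the flattened array. One way to turn it back into row and column coordinates is np.unravel_index, sketched below as a possible approach.

```python
import numpy as np

a = np.array([[1, 45, 23], [12, 9, 21], [32, 45, 45]])

flat_index = np.argmax(a)  # index into the flattened array
row, col = np.unravel_index(flat_index, a.shape)

print(flat_index)   # 1
print(row, col)     # 0 1
print(a[row, col])  # 45
```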

Argmin

Argmin works similarly to argmax to find the indices of minimum values along an axis.

a = np.array([[9, 45, 23], [12, 9, 21], [32, 45, 45]])

np.argmin(a) # 0

np.argmin(a, axis=0) # [0, 1, 1]

np.argmin(a, axis=1) # [0, 1, 0]

Statistical operations

Mean

This function calculates the mean of the values along a given axis or axes. If no axis is passed, it calculates the mean of the flattened array.

a = np.array([[9, 45, 23], [12, 9, 21], [32, 45, 45]])

np.mean(a) # 26.7778

np.mean(a, axis=0) # [17.6667, 33., 29.6667]

np.mean(a, axis=1) # [25.6667, 14., 40.6667]

np.mean(a, axis=(0, 1)) # 26.7778

Median

The function returns the medians along a given axis or axes. If no axis is passed, the array is flattened first before getting the overall median.

a = np.array([[9, 45, 23], [12, 9, 21], [32, 45, 45]])

np.median(a) # 23.0

np.median(a, axis=0) # [12., 45., 23.]

np.median(a, axis=1) # [23., 12., 45.]

np.median(a, axis=(0, 1)) # 23.0

Std

It computes the standard deviation along the given axis or axes. The flattened array is used if no axis is specified.

a = np.array([[9, 45, 23], [12, 9, 21], [32, 45, 45]])

np.std(a) # 14.611850424895067

np.std(a, axis=0) # [10.20892855, 16.97056275, 10.87300429]

np.std(a, axis=1) # [14.81740718,  5.09901951,  6.12825877]

np.std(a, axis=(0, 1)) # 14.611850424895067
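As a sanity check on what std computes, the sketch below rebuilds the default (population) standard deviation by hand and compares it with np.std. It also shows the optional ddof argument, which changes the divisor from N to N - ddof.

```python
import numpy as np

a = np.array([[9, 45, 23], [12, 9, 21], [32, 45, 45]])

# Population standard deviation: square root of the mean squared deviation.
manual = np.sqrt(np.mean((a - np.mean(a)) ** 2))
print(np.isclose(manual, np.std(a)))  # True

# ddof=1 divides by (N - 1) instead of N (sample standard deviation).
print(np.std(a, ddof=1) > np.std(a))  # True
```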

Histogram

This function allows you to easily calculate the histogram of the dataset. It returns two arrays, one containing the values of the histogram and the other showing the bin edges.

a = np.array([[9, 45, 23], [12, 9, 21], [32, 45, 45]])

hist, edges = np.histogram(a)

print(hist)
# [3 0 0 2 0 0 1 0 0 3]

print(edges)
# [9.  12.6 16.2 19.8 23.4 27.  30.6 34.2 37.8 41.4 45.]

By default, it divides the range of the data into 10 equal-width bins for this operation. To set the number of bins yourself, use the optional bins argument as the following example shows.

hist, edges = np.histogram(a, bins=5)

print(hist)
# [3 2 0 1 3]

print(edges)
# [ 9.  16.2 23.4 30.6 37.8 45. ]

You can also explicitly set the bin edges by passing a sequence of monotonically increasing numbers to this argument.

hist, edges = np.histogram(a, bins=[9, 18, 27, 36, 45])

print(hist)
# [3 2 1 3]

print(edges)
# [9 18 27 36 45]

Mathematical operations

Sin, cos, & tan

Numpy provides functions to calculate different trigonometric values of an array, element-wise. You can see examples of sin, cos, and tan functions here.

a = np.arange(0, 30, 6)

np.sin(a)
# [ 0., -0.2794155 , -0.53657292, -0.75098725, -0.90557836]

np.cos(a)
# [1., 0.96017029, 0.84385396, 0.66031671, 0.42417901]

np.tan(a)
# [ 0., -0.29100619, -0.63585993, -1.13731371, -2.1348967]

Numpy considers each value in the array as an angle in radians and calculates its relevant sin, cos, or tan value using the respective function.
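If your angles are in degrees, convert them first. A minimal sketch using np.deg2rad, with illustrative angles and the result rounded for display:

```python
import numpy as np

degrees = np.array([0, 30, 90])

# Convert to radians first; the trigonometric functions expect radians.
radians = np.deg2rad(degrees)

print(np.around(np.sin(radians), 3))
# [0.  0.5 1. ]
```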

Log

This function computes the natural logarithm of the given array element-wise.

a = np.arange(5, 10)

np.log(a)
# [1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458]

Numpy also provides log10 and log2 functions to retrieve the base 10 and base 2 logarithms of the array, respectively.
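A brief sketch of both variants on inputs whose logarithms are easy to verify by hand. The comparison uses np.allclose rather than exact equality, since floating-point results may differ in the last digits.

```python
import numpy as np

values = np.array([1, 10, 100])
print(np.allclose(np.log10(values), [0, 1, 2]))  # True

powers = np.array([1, 2, 8])
print(np.allclose(np.log2(powers), [0, 1, 3]))   # True
```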

Around

The around function rounds each element of a Numpy array to a given number of decimals. It accepts the decimal count as the second argument, which defaults to 0.

a = np.linspace(10, 20, 9)
# [10.  , 11.25, 12.5 , 13.75, 15.  , 16.25, 17.5 , 18.75, 20.  ]

np.around(a)
# [10., 11., 12., 14., 15., 16., 18., 19., 20.]

np.around(a, decimals=1)
# [10. , 11.2, 12.5, 13.8, 15. , 16.2, 17.5, 18.8, 20. ]
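Note that 12.5 was rounded down to 12 while 17.5 was rounded up to 18. Numpy rounds values that sit exactly halfway between two candidates to the nearest even value. A short sketch with illustrative inputs:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

# Halfway values go to the nearest even number, not always upward.
print(np.around(halves))  # [0. 2. 2. 4.]
```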

Dot

This function computes the dot product of two arrays. The passed arrays must have compatible sizes to carry out this operation.

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 10], [15, 20]])

np.dot(a, b) #[[ 35,  50],
             # [ 75, 110]]
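For 1-D arrays, dot reduces to the familiar inner product of two vectors. The sketch below shows that case, along with the error raised when the lengths do not match. The vectors are illustrative.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# For 1-D arrays, dot is the inner product: 1*4 + 2*5 + 3*6.
print(np.dot(u, v))  # 32

# Mismatched lengths are not compatible and raise ValueError.
try:
    np.dot(u, np.array([1, 2]))
except ValueError as err:
    print("dot failed:", err)
```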

Matmul

This function performs matrix multiplication on two given arrays. The passed arrays must respect the shape requirements of this operation: the column count of the first array must equal the row count of the second one.

Shapes: (n, k) x (k, m) -> (n, m)

a = np.arange(6).reshape(3, 2)
b = np.arange(4).reshape(2, 2)

np.matmul(a, b)
# [[ 2,  3],
#  [ 6, 11],
#  [10, 19]]

If either array has more than two dimensions, Numpy treats it as a stack of matrices (the last two axes form each matrix) and uses broadcasting to bring both to compatible shapes.

a = np.arange(24).reshape(2, 3, 4)
b = np.arange(12).reshape(4, 3)

np.matmul(a, b)
# [[[ 42,  48,  54],
#   [114, 136, 158],
#   [186, 224, 262]],

#  [[258, 312, 366],
#   [330, 400, 470],
#   [402, 488, 574]]]
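The sketch below checks the stacked result slice by slice. It also uses Python's @ operator, which is equivalent to np.matmul for Numpy arrays.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
b = np.arange(12).reshape(4, 3)

# The @ operator is shorthand for np.matmul.
result = a @ b
print(np.array_equal(result, np.matmul(a, b)))  # True

# b is broadcast across the stack: (2, 3, 4) @ (4, 3) -> (2, 3, 3).
print(result.shape)  # (2, 3, 3)

# Each slice of the stack is an ordinary 2-D matrix product.
print(np.array_equal(result[1], a[1] @ b))  # True
```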

Bonus

Sort

You can sort an array along a specified axis using this function. If no axis is passed, it sorts along the last axis by default. To sort the flattened array instead, set the axis to None.

a = np.array([[23, 12, 43], [32, 13, 54]])

b = np.sort(a) #[[12, 23, 43],
               # [13, 32, 54]]

c = np.sort(a, axis=0) #[[23, 12, 43],
                       # [32, 13, 54]]

d = np.sort(a, axis=None) #[12, 13, 23, 32, 43, 54]

Unique

This function allows you to retrieve the unique elements in an array. It finds and sorts all the unique elements and returns them as a new array.

a = np.array([[1, 2, 1, 3], [2, 4, 3, 0]])

unique_items = np.unique(a) # [0, 1, 2, 3, 4]

If you want to retrieve the number of times each element appears, pass True to the optional return_counts argument.

unique_items, counts = np.unique(a, return_counts=True)

print(unique_items) # [0 1 2 3 4]
print(counts) # [1 2 2 2 1]

You can find unique subarrays along a given axis by passing an axis argument. The following example shows how to retrieve unique rows and columns from a 2-D array with this method.

# get unique rows
a = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6], [7, 8, 9]])
np.unique(a, axis=0) # [[1, 2, 3],
                     #  [4, 5, 6],
                     #  [7, 8, 9]]

# get unique columns
b = np.array([[1, 2, 1, 4], [2, 2, 2, 5], [3, 4, 3, 7]])
np.unique(b, axis=1) # [[1, 2, 4],
                     #  [2, 2, 5],
                     #  [3, 4, 7]]

Wrapping up

In this article, we introduced you to some common Numpy functions that we often come across in data science. We hope this knowledge will help you be more confident when preparing and analyzing a large dataset.

Do let us know in the comments if we’ve missed any of your favorites in this list.