NumPy (short for Numerical Python) is a library for working with multi-dimensional arrays and matrices. It was created in 2005 by Travis Oliphant. It has since received numerous contributions from the community, which helped it grow into one of the most widely used tools in data science.
NumPy lets you manipulate huge arrays very efficiently. This is thanks to a well-implemented low-level layer (written mostly in C) that does most of the heavy lifting, while a Python layer on top exposes these computational capabilities through friendly syntax.
This article series is centered around NumPy and has a much more tutorial-like feel. Code along and write the examples yourself; I believe that's by far the easiest way to learn.
ndarray is the central data structure in NumPy. Think of it as an array with a lot of extra functionality and much better performance. It supports a wide variety of vectorized operations, which let you perform calculations over whole arrays with concise syntax and good performance.
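To give a rough idea of what a vectorized operation looks like, here is a small sketch that doubles every element of a list, first with a plain Python list comprehension and then with a single NumPy expression. The variable names are only illustrative; arithmetic on arrays gets proper coverage later in the series.

```python
import numpy as np

values = [1, 2, 3, 4]

# Plain Python: visit each element explicitly (here with a list comprehension)
doubled_list = [v * 2 for v in values]

# NumPy: one vectorized expression applied to the whole array at once
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)           # [2, 4, 6, 8]
print(doubled_arr.tolist())   # [2, 4, 6, 8]
```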
In this article, we will learn about some of the most common ways of creating ndarrays. Good, let's get started!
Oh, but before we get started, let's first import NumPy. The standard alias for NumPy is np, and it is usually imported this way:
import numpy as np
You can create ndarrays from sequence-like objects such as lists or tuples. In the following example, we create a standard Python list and then convert it into an ndarray using the np.array function:
primes = [2, 3, 5, 7, 11, 13, 17]
arr = np.array(primes)
arr
array([ 2, 3, 5, 7, 11, 13, 17])
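Tuples work the same way as lists here. A minimal sketch (the variable names are just for illustration) that builds the same array from a tuple and checks it against the list version:

```python
import numpy as np

# A tuple works just like a list as input to np.array
primes_tuple = (2, 3, 5, 7, 11, 13, 17)
arr_from_tuple = np.array(primes_tuple)

# The same values given as a list produce an identical array
arr_from_list = np.array([2, 3, 5, 7, 11, 13, 17])

print(arr_from_tuple.tolist())                        # [2, 3, 5, 7, 11, 13, 17]
print(np.array_equal(arr_from_tuple, arr_from_list))  # True
```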
Two attributes of an ndarray are particularly worth knowing:
- shape: the size of the array along each dimension. For example, an ndarray with shape (3, 5) is a matrix with 3 rows and 5 columns.
- dtype: the data type of the entries in the array. It is inferred at creation time, but it can also be specified explicitly during creation or changed afterwards by casting the array to another type.
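Both attributes can be read directly from an array. A minimal sketch; note that the integer dtype you see depends on your platform and NumPy version, so the dtype comment is indicative rather than guaranteed:

```python
import numpy as np

primes = [2, 3, 5, 7, 11, 13, 17]
arr = np.array(primes)
print(arr.shape)  # (7,)
print(arr.dtype)  # usually int64; may be int32 on Windows with NumPy before 2.0

# A nested list with 3 inner lists of 5 values each gives shape (3, 5)
matrix = np.array([[1, 2, 3, 4, 5],
                   [6, 7, 8, 9, 10],
                   [11, 12, 13, 14, 15]])
print(matrix.shape)  # (3, 5)
```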
Our primes array has shape (7,), meaning seven entries in a single dimension, and its dtype is int64 (the default integer type on most 64-bit platforms). Let's see what happens when we create an array from a Python list containing decimal numbers.
numbers = [1.7, 4.8, 5.5, 52.7, 11.3, 14.63]
arr = np.array(numbers)
arr
array([ 1.7 , 4.8 , 5.5 , 52.7 , 11.3 , 14.63])
Now the array's dtype is float64. NumPy does a great job of inferring the data type of our array, but you can also specify it at creation time with the dtype argument. For example, we can ask NumPy to treat the values as integers, discarding the decimal part:
numbers = [1.7, 4.8, 5.5, 52.7, 11.3, 14.63]
arr = np.array(numbers, dtype=np.int64)  # NumPy also supports, among many others, int32
arr
array([ 1, 4, 5, 52, 11, 14])
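The dtype also determines how much memory each entry takes. As a hedged sketch (the variable names are illustrative), the snippet below builds the same values with a few different dtypes and prints each array's itemsize, the number of bytes per element:

```python
import numpy as np

numbers = [1.7, 4.8, 5.5, 52.7, 11.3, 14.63]

# Same input values, different storage types
as_int32 = np.array(numbers, dtype=np.int32)      # decimal part discarded
as_float32 = np.array(numbers, dtype=np.float32)  # single precision
as_float64 = np.array(numbers)                    # inferred as float64

# itemsize is the number of bytes used to store each element
print(as_int32.dtype, as_int32.itemsize)      # int32 4
print(as_float32.dtype, as_float32.itemsize)  # float32 4
print(as_float64.dtype, as_float64.itemsize)  # float64 8
```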
You can also cast an array to another type after it has been created. For this, use the astype method, which returns a new array of the requested type:
floatarr = np.array([1.1, 2.3, 3.5, 4, 7])
intarr = floatarr.astype(np.int64)
intarr
array([1, 2, 3, 4, 7])
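Casting a float array to an integer type truncates toward zero rather than rounding, and astype leaves the original array untouched. The sketch below (with made-up values) shows both, plus one way to round first with np.round. Keep in mind that np.round rounds halves to the nearest even number, so 2.5 becomes 2.

```python
import numpy as np

floatarr = np.array([1.1, 2.5, -3.7, 4.9])

# astype returns a NEW array; the decimal part is dropped (truncation toward zero)
truncated = floatarr.astype(np.int64)

# To round to the nearest integer instead, round first and then cast
rounded = np.round(floatarr).astype(np.int64)

print(floatarr.dtype)      # float64 (the original is unchanged)
print(truncated.tolist())  # [1, 2, -3, 4]
print(rounded.tolist())    # [1, 2, -4, 5]  (2.5 rounds to the even number 2)
```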
Another important detail to remember is that lists with other lists nested inside them lead to the creation of multi-dimensional arrays. For example, the following list produces a 3x3 ndarray:
matrixlist = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr = np.array(matrixlist)
arr
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
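The shape attribute confirms the 3x3 result, and deeper nesting adds more dimensions. The sketch below also tries inner lists of different lengths; that part is version-dependent (NumPy 1.24 and later raise a ValueError, while older releases built an object array with a deprecation warning), so the code handles both outcomes.

```python
import numpy as np

matrixlist = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr = np.array(matrixlist)
print(arr.shape)  # (3, 3)

# Three levels of nesting give a three-dimensional array
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(cube.shape)  # (2, 2, 2)

# "Ragged" inner lists of different lengths cannot form a regular grid
try:
    ragged = np.array([[1, 2, 3], [4, 5]])
    print("Created an array with dtype", ragged.dtype)  # older NumPy: object
except ValueError as err:
    print("ValueError:", err)  # NumPy 1.24+
```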
OK, that's enough about creation from sequence-like objects for now. Let's look at some other built-in array creation utilities.
Creating arrays full of zeros
np.zeros is a function that takes a shape (an integer for a one-dimensional array, or a tuple of integers) and creates an array of that shape with every single entry set to 0. By default, the entries are float64.
# This creates a one-dimensional array of zeros with 8 entries
zarr = np.zeros(8)
zarr
array([0., 0., 0., 0., 0., 0., 0., 0.])
# This creates a 5x5 matrix full of zeros
zarr = np.zeros((5, 5))
zarr
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
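Because np.zeros produces float64 entries by default (note the "0." in the outputs above), you may want to pass a dtype, just as with np.array. A small sketch with illustrative variable names:

```python
import numpy as np

# Pass dtype to get integer zeros instead of the default float64
int_zeros = np.zeros((2, 3), dtype=np.int64)
print(int_zeros.dtype)  # int64
print(int_zeros.shape)  # (2, 3)

# Without dtype, the entries are float64
float_zeros = np.zeros(4)
print(float_zeros.dtype)  # float64
```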
Creating ranges of values
np.arange is a very handy function that creates an array of evenly spaced values within a range. Like Python's built-in range, the end value is excluded.
# This creates an array with the integers from 0 to 14
rarr = np.arange(15)
rarr
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
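np.arange also accepts start, stop and step arguments, much like range. A short sketch (the variable names are just for illustration). With float steps that are not exactly representable in binary, such as 0.1, rounding can make the number of elements surprising, so integer steps are the safer choice.

```python
import numpy as np

# arange(stop): from 0 up to, but not including, stop
up_to_five = np.arange(5)
# arange(start, stop, step): stop is excluded, just like range()
every_third = np.arange(2, 20, 3)
# A negative step counts down
countdown = np.arange(10, 0, -2)
# 0.25 is exactly representable in binary, so this float step is safe
quarters = np.arange(0, 1, 0.25)

print(up_to_five.tolist())   # [0, 1, 2, 3, 4]
print(every_third.tolist())  # [2, 5, 8, 11, 14, 17]
print(countdown.tolist())    # [10, 8, 6, 4, 2]
print(quarters.tolist())     # [0.0, 0.25, 0.5, 0.75]
```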
This is a good place to introduce the reshape method. It takes an ndarray and returns an array with the same data arranged in the requested dimensions; the total number of elements must stay the same. Combined with arange, it produces an 8x8 matrix with the numbers from 0 to 63:
rarr = np.arange(64).reshape(8, 8)
rarr
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])
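A couple of reshape details are worth knowing, shown in the sketch below with illustrative names: one dimension can be given as -1 and NumPy infers it from the others, and asking for a shape with a different total number of elements raises a ValueError.

```python
import numpy as np

rarr = np.arange(12)

# The total number of elements must stay the same: 12 = 3 * 4
grid = rarr.reshape(3, 4)
print(grid.shape)  # (3, 4)

# One dimension can be -1; NumPy infers it: 12 / 2 = 6
inferred = rarr.reshape(2, -1)
print(inferred.shape)  # (2, 6)

# 5 * 5 = 25 != 12, so this shape is rejected
try:
    rarr.reshape(5, 5)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True

# reshape does not change the shape of the original array
print(rarr.shape)  # (12,)
```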
Creating arrays with random values
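np.random.randn is an extremely useful function. It creates an array filled with random samples from the standard normal distribution (mean 0, standard deviation 1). Unlike np.zeros, it does not take a shape tuple: you pass each dimension as a separate integer argument. Since the values are random, your output will differ from the one shown here.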
randomarr = np.random.randn(5)
randomarr
array([-1.5745314 , -1.00901794, -1.15070662, 1.69227946, 0.30028233])
randomarr = np.random.randn(3, 3)
randomarr
array([[-0.96191126,  1.4272083 ,  0.38075726],
       [-0.09240098,  0.80662898, -1.04838281],
       [-0.58463188,  1.02508633, -0.0234994 ]])
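If you need random numbers you can reproduce, for example when sharing results, NumPy's newer Generator API is worth knowing. The sketch below goes slightly beyond the original text: it creates a Generator with np.random.default_rng and an arbitrary seed of 42. Its standard_normal method takes a shape tuple, like np.zeros, and two generators built with the same seed produce the same values.

```python
import numpy as np

# randn takes the dimensions as separate integers...
a = np.random.randn(2, 3)

# ...while Generator.standard_normal takes a shape tuple
rng = np.random.default_rng(seed=42)
b = rng.standard_normal((2, 3))
print(a.shape, b.shape)  # (2, 3) (2, 3)

# A new Generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
b_again = rng_again.standard_normal((2, 3))
print(np.array_equal(b, b_again))  # True
```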
This covers most basic scenarios
It doesn't seem like much, but we have already covered enough to handle the most common array creation scenarios on our own. Other techniques can come in handy in specific situations, but you are now more than ready to start your NumPy journey!
In the next article we will begin manipulating data by learning how to perform arithmetic operations with ndarrays.
Thanks for reading!
What to do next
- Share this article with friends and colleagues. Thank you for helping me reach people who might find this information useful.
- You can find the source code for this series in this repo.
- This article is based on Python for Data Analysis. This and other very helpful books can be found in the recommended reading list.
- Send me an email with questions, comments or suggestions (the address is on the About Me page).