NumPy: A Hidden Gem in Python’s Sea of Data Structures

Python comes with some built-in methods for performing mathematical operations. However, these are not typically how mathematics is done in Python. Most programmers prefer using NumPy, as it offers greater simplicity, specificity, and performance. In fact, NumPy is so effective that you can focus on the math itself rather than the coding behind it. In this article, I’ll first introduce Python’s built-in data structures, and then show you the magic and power of NumPy.

NumPy stands out as one of the most essential libraries in Python for mathematical and numerical operations. While Python has built-in methods for handling math, NumPy provides a more efficient and elegant way to perform these tasks.

One of its key advantages is simplicity—it allows you to express complex mathematical operations in clean and readable code, closely resembling standard mathematical notation. This reduces the cognitive load of coding and lets you focus more on solving mathematical problems rather than thinking about implementation details.

Another major strength is specificity. NumPy is purpose-built for numerical computing, which means it comes with a wide range of functions tailored for linear algebra, statistical analysis, matrix manipulation, and more. This makes it far more suited for tasks involving vectors, matrices, and large datasets than generic Python structures like lists or tuples.

Finally, performance is a critical factor. NumPy arrays are more memory-efficient and significantly faster than Python lists due to their implementation in C and use of contiguous memory. This makes NumPy ideal for data science, machine learning, and scientific computing where speed and efficiency matter.

Together, these advantages make NumPy the go-to library for doing real mathematics in Python.

Python data structures

We’ll start by exploring Python’s built-in data structures that are frequently used in mathematical computations. Recognizing their limitations will help you fully appreciate the strengths of NumPy—something that’s easy to take for granted. Take your time with this section, but if you’re already confident with Python fundamentals, feel free to move ahead.

Python Lists

Python lists are one of the most commonly used data structures, especially for beginners. They are ordered, mutable collections of items, which means you can easily add, remove, or modify elements. This flexibility makes them ideal for representing numerical sequences, vectors, and even matrices.

You can create a list using square brackets or the list() function. Once created, you can append elements to the end, insert them at specific positions, or remove them. The list automatically grows or shrinks depending on the operations performed, making it a powerful tool for various types of data manipulation.

The following example demonstrates how to create a list, add items, and access values by their index:

# Initialize a Python list
python_list = [1, 2, 3, 4, 5]

# Modifying an element
python_list[1] = 10

# Append elements to the list
python_list.append(6)
python_list.insert(2,12)

# Remove elements from the list
python_list.remove(3) # searches for 3 and removes it from the list
python_list.pop(2) # Returns the item at index 2 and removes it from the list

In addition to using literal syntax, Python also provides the list() function for more flexible construction. For example:

vector = list(range(10))

You can also use list comprehension to create more complex structures, such as:

vector = [[i + (j * 4) for i in range(1, 5)] for j in range(4)]

This is the outcome:

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

These examples showcase the flexibility and expressive strength of Python lists. While this section serves as a brief introduction, there is much more to uncover. For now, let’s focus on how Python lists can be applied to mathematical problems—particularly in the context of linear algebra, which is the central theme of this article.

As we saw earlier, it’s possible to create a 4x4 matrix using a nested list comprehension. While this approach might seem a bit awkward at first, experienced Python developers are quite comfortable with it.

Now, let’s consider a different scenario — element-wise multiplication of two vectors. In linear algebra, it’s common to multiply corresponding elements of two vectors: the element at index 0 of vector A with the element at index 0 of vector B, index 1 with index 1, and so on. Python doesn’t have a built-in function for this because lists are general-purpose data structures. We may use them as vectors or matrices, but fundamentally, they’re just flexible containers for storing data.

A = [1 ,2 ,3]
B = [4 ,5 ,6]

C = [0] * len(A)  # a list of zeros with the same length as A
for i,a in enumerate(A):
    C[i] = a * B[i]

This code first creates a new list C with the same length as A (or B) and fills it with zeros. Then we use the enumerate() function to loop over the values in A and multiply them by the corresponding values in vector B.

The code above is rather lengthy for a simple multiplication of vector elements. Here is a more Pythonic version, which is shorter and easier to read:

A = [1 ,2 ,3]
B = [4 ,5 ,6]

C = [a * b for a , b in zip(A ,B)]

While this approach works for now, mathematical operations extend well beyond simple element-wise multiplication. Writing a separate function for every possible operation isn't practical. Although such exercises are valuable for learning, a dedicated library is essential for simplifying real-world computations. With that in mind, let’s turn our attention to another Python data structure—sets—which can help address additional mathematical requirements.
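To see how quickly this grows, here is a minimal sketch of two more linear algebra operations written with plain lists. The helper names dot and matvec are hypothetical, chosen only for illustration; they are not part of Python.

```python
# Illustrative helpers (hypothetical names) built on plain Python lists.

def dot(u, v):
    """Dot product of two equal-length lists."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Multiply a matrix (a list of rows) by a vector (a list)."""
    return [dot(row, v) for row in M]


A = [1, 2, 3]
B = [4, 5, 6]
M = [[1, 0, 0],
     [0, 2, 0],
     [0, 0, 3]]

print(dot(A, B))     # 1*4 + 2*5 + 3*6 = 32
print(matvec(M, B))  # [4, 10, 18]
```

Each new operation needs its own helper, its own length checks, and its own tests, which is exactly the work a dedicated library takes off your hands.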

Python Sets

Sets are collections of unique elements, and they are not inherently ordered. While you could technically store arrays, vectors, or matrices in a set, it would not be practical for most numerical computing tasks. Sets are more commonly used for tasks where uniqueness and set operations (such as union, intersection, etc.) are required.

# Create a set of numbers
my_set = {1, 2, 3, 4, 5}

# Add a new element
my_set.add(6)

# Try adding a duplicate (it will be ignored)
my_set.add(3)

# Remove an element
my_set.remove(2)

# Check if a value exists in the set
is_three_present = 3 in my_set

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
c = a | b # {1, 2, 3, 4, 5, 6}
d = a & b # {3, 4}

Python sets, like lists, are mutable, dynamically sized, and easy to work with. However, the key distinction is that sets are unordered collections. As a result, you cannot access elements by index, insert at specific positions, or append elements in a predictable order. Once an element is added, its position within the set cannot be determined or controlled.

Here are some common set operations:

Adding elements with .add(): Unlike lists, sets use the add() method to include new elements. Since sets are unordered, the element is placed arbitrarily.
Removing elements with .remove() or .pop(): The remove() method deletes a specified element, similar to lists. However, pop() removes and returns an arbitrary element, as indexing is not supported.
Checking for membership: Use the in keyword to verify whether a particular element exists in the set.
Performing set operations: Sets support a variety of mathematical operations such as union, intersection, difference, and symmetric difference—making them ideal for logic-based and mathematical tasks.
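The earlier snippet covered union and intersection. As a small sketch, the remaining operations from the list above look like this; discard() is included as the non-raising counterpart of remove().

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

only_in_a = a - b        # difference: {1, 2}
in_exactly_one = a ^ b   # symmetric difference: {1, 2, 5, 6}

# discard() is like remove(), but does not raise KeyError
# when the element is missing.
a.discard(99)            # no error, the set is unchanged
try:
    a.remove(99)
except KeyError:
    print("99 is not in the set")

print(only_in_a, in_exactly_one)
print({1, 2} <= a)       # subset test: True
```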

Python Dictionaries

Dictionaries are collections of key-value pairs, where each key is associated with a value. While dictionaries could be used to store arrays, vectors, or matrices by using keys as identifiers, this approach is not common or efficient for numerical computing. Dictionaries are more commonly used for tasks where efficient lookup by key is required.

# Create a dictionary
grades = {
    "Alice": 85,
    "Bob": 90,
    "Charlie": 78
}

# Accessing a value
print("Bob's score:", grades["Bob"])  # Output: 90

# Adding a new key-value pair
grades["Diana"] = 92
print("Added Diana:", grades)

# Updating an existing value
grades["Alice"] = 88
print("Updated Alice's score:", grades)

# Removing a key-value pair
del grades["Charlie"]
print("Removed Charlie:", grades)

# Using .get() to avoid KeyError
print("Eve's score:", grades.get("Eve", "Not found"))  # Output: Not found

# Loop through keys and values
print("\nAll students and their scores:")
for student, score in grades.items():
    print(f"{student}: {score}")
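To make the remark about storing matrices in dictionaries concrete, here is a minimal sketch that keeps only the non-zero entries of a matrix, keyed by (row, column). The layout and the variable names are assumptions made for illustration, not a standard format.

```python
# A sparse 3x3 matrix stored as {(row, col): value}; missing keys mean 0.
sparse = {(0, 0): 2.0, (1, 2): 5.0, (2, 1): -1.0}
v = [1.0, 2.0, 3.0]

# Matrix-vector product computed from the non-zero entries only.
result = [0.0, 0.0, 0.0]
for (row, col), value in sparse.items():
    result[row] += value * v[col]

print(result)  # [2.0, 15.0, -2.0]
```

It works, but every operation has to be written by hand around the dictionary, which is why this approach is rarely used for numerical work.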

Python Tuples

Tuples are ordered collections of items, much like lists, and they can store heterogeneous data types. Unlike lists, however, tuples are immutable: once a tuple is created, its elements cannot be changed, added, or removed. They are essentially a read-only version of lists. While tuples could technically be used to store arrays, vectors, or matrices, they are not well suited for this purpose because of their immutability and their lack of specialized operations for numerical computations. Tuples are more commonly used where an ordered, immutable collection is needed.

# Create a tuple
my_tuple = ("apple", "banana", "cherry")

# Accessing elements (like lists)
print("First item:", my_tuple[0])  # Output: apple

# Length of tuple
print("Length:", len(my_tuple))    # Output: 3

# Tuples can be mixed types
mixed_tuple = (1, "hello", 3.14)
print("Mixed types:", mixed_tuple)

# Tuples are immutable - this will cause an error:
# my_tuple[1] = "blueberry"  # ❌ Uncommenting this line will raise a TypeError

# Tuples can be nested
nested_tuple = (1, (2, 3), "a")
print("Nested tuple:", nested_tuple[1])  # Output: (2, 3)

# Tuple unpacking
name, age, country = ("Alice", 25, "UK")
print(f"{name} is {age} years old from {country}")

Python arrays

In Python, both arrays and lists are data structures used for storing collections of items. While they share some similarities, they also have distinct characteristics that make them suitable for different purposes. Python arrays are a more specialized data structure available through the array module in the standard library. While similar to lists, Python arrays contain elements of a single data type. This makes arrays more memory-efficient and suitable for storing large sequences of homogeneous data.

# ✅ Python List: Heterogeneous (can hold mixed types)
my_list = [1, "two", 3.0, True]
print("Python List:", my_list)

# ✅ Python Array: Homogeneous (must hold same type)
import array
my_array = array.array("i", [1, 2, 3, 4])  # "i" = integer type
print("Python Array:", my_array)

In summary, the array.array type offers key advantages over lists for numeric data. It enforces homogeneity, ensuring all elements are of the same type, which can help reduce bugs. It is more memory-efficient because it stores values as raw bytes rather than as references to Python objects, which makes it a good fit for large numeric sequences. Additionally, it supports binary I/O, which is useful in low-level or performance-critical applications. However, array.array has notable limitations: it supports fewer features than lists, works only with primitive data types (like integers and floats), offers no vectorized arithmetic, and is rarely used in practice, as most developers prefer NumPy for advanced numerical tasks.
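The points about raw-byte storage, binary I/O, and enforced homogeneity can be checked directly. This is a small sketch using only the standard library; the type code "d" (C double) is chosen for illustration.

```python
import array

numbers = array.array("d", [1.0, 2.5, 4.0])  # "d" = C double

# Each element occupies a fixed number of bytes.
print(numbers.itemsize)        # typically 8 for "d"
print(len(numbers.tobytes()))  # itemsize * number of elements

# Round-trip through raw bytes (the basis of binary I/O).
restored = array.array("d")
restored.frombytes(numbers.tobytes())
print(restored == numbers)     # True

# Homogeneity is enforced: a string is rejected.
try:
    numbers.append("five")
except TypeError as exc:
    print("rejected:", exc)
```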

NumPy

Now it's time to transition to our main topic: NumPy. You may wonder why we’ve spent so much time on foundational concepts. The reason is to highlight just how essential and powerful NumPy truly is. In my view, Python would not be nearly as effective for mathematics or machine learning without it. To illustrate this clearly, let’s start with a compelling example. If the example convinces you of NumPy’s unique capabilities, we’ll then return to explore its features in more detail. Do you recall the earlier code we used to create a matrix?

vector = [[i + (j * 4) for i in range(1, 5)] for j in range(4)]

With NumPy, the same result can be achieved far more elegantly with a single line:

np.arange(1, 17).reshape(4, 4)

Do you recall how we previously implemented element-wise multiplication of vectors? With NumPy, the same operation becomes remarkably simple:

C = np.array(A) * np.array(B)  # A and B are the earlier lists; result: [ 4 10 18]

Moreover, NumPy arrays can also be used to perform set operations. This means a single variable can function both as a vector and as a set, offering even greater flexibility. Quite powerful, isn’t it?

A = np.array([1,2,3,4])
B = np.array([3,4,5,6])
np.intersect1d(A, B)
np.union1d(A,B)

These simple examples illustrate how NumPy streamlines complex operations and enhances code readability.

How to Use NumPy

Now that you’ve seen just how powerful and effective NumPy can be in mathematical applications, I hope you're eager to explore it further. I’m here to guide you through the process. Let’s begin with the fundamentals—so far, you’ve only encountered a few quick examples, which may have left you with several questions. In this section, we’ll dive deeper to understand how NumPy operates behind the scenes, revealing how it combines simplicity with remarkable computational power.

To begin using NumPy, you’ll first need to install the module. This can be done easily with the following command:

pip install numpy

pip is Python’s standard package manager, and I’ll assume you’re already familiar with its basic usage. So let’s move on to more practical matters. Once NumPy is installed, you can import it into your Python script using:

import numpy as np

This np alias is widely adopted and recommended, as it aligns with the common convention used throughout the Python community. From this point forward, you'll access NumPy’s features using the np prefix.

Creating Arrays

The most common and straightforward way to create a NumPy array is as follows:

A = np.array([1,2,3,4])

While this syntax appears simple, a great deal happens under the hood. When you run this line of code, NumPy performs several internal operations—such as argument dispatch, data type (dtype) resolution, C-level memory allocation, and structure initialization.

I genuinely believe that learners benefit more when we provide transparency about these underlying processes rather than oversimplifying the material. By understanding what’s truly happening behind the scenes, they gain deeper insights and fewer misconceptions. It's far better to introduce complexity early on than to deal with confusion later.

Argument Dispatch

In NumPy, when you call np.array(...), it checks what kind of arguments you've passed (e.g., a list, a tuple, a NumPy array, etc.) and decides how to handle it internally. This decision-making process is called argument dispatch.

Think of it like a waiter taking your order: based on what you say, they go to the right kitchen station. This flexibility is what makes the np.array() function so powerful: it allows you to create an array by passing a list, a tuple, or even another NumPy array. However, if you pass a set, the function will still create an array, but the resulting structure will not behave as you might expect. In the following example, A, B, and C are equivalent, while D is different:

import numpy as np

A = np.array([1, 2, 3])
B = np.array((1, 2, 3))
C = np.array(B)
D = np.array({1, 2, 3})

print(A)       # → [1 2 3]
print(B)       # → [1 2 3]
print(C)       # → [1 2 3]
print(D)       # → {1, 2, 3}

print(len(A))  # → 3
print(len(D))  # → TypeError: len() of unsized object

When you call np.array([1, 2, 3]), np.array((1, 2, 3)), or even np.array on an existing one-dimensional ndarray, NumPy recognizes each as a sequence of three values, walks through them in order, and builds a new one-dimensional array of length 3 containing [1, 2, 3]. Asking for len(...) on any of these returns 3 because they have exactly one axis whose size is 3.

By contrast, passing in a Python set (e.g. {1, 2, 3}) doesn’t produce a list of items to unpack; instead NumPy treats the entire set as a single object to store. The result is a zero-dimensional array, sometimes called a “scalar array,” which has no axes (its shape is the empty tuple ()), holds exactly one element (the set itself), and therefore has no meaningful “length” (len(D) raises a TypeError). This is how NumPy’s array constructor differentiates between iterables it knows how to unpack (lists, tuples, ndarrays) and arbitrary objects—it either unpacks into an N-D numeric array or boxes the object as a 0-D array.
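The difference is easy to inspect. The sketch below looks at the zero-dimensional result and then shows one way to get a regular array from a set, by converting it to a sorted list first (the sorting is a choice made here so the element order is predictable).

```python
import numpy as np

s = {3, 1, 2}

boxed = np.array(s)              # 0-D array holding the set itself
print(boxed.shape, boxed.dtype)  # () object
print(boxed.item() == s)         # True: the stored element is the set

# Convert to a sequence first to get a regular 1-D array.
# sorted() also fixes the element order, which a set does not guarantee.
unpacked = np.array(sorted(s))
print(unpacked.shape)            # (3,)
print(unpacked)                  # [1 2 3]
```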

You may be wondering why I’m bringing up sets in this context—after all, wouldn’t that be confusing for those just starting with NumPy? It’s a valid concern, but I believe it’s essential for users to learn this now. NumPy’s argument handling is intentionally flexible, allowing you to pass in a wide variety of data structures. However, this flexibility can be a double-edged sword.

While NumPy will often accept whatever you pass to it, that doesn't mean it will behave as you expect. If you're unaware of these subtleties, you may encounter confusing bugs or unexpected results. That’s why I’m including examples like this—to highlight that there’s no hidden magic. You, as the developer, remain fully responsible for your inputs.

The real benefit is that NumPy supports a broad range of input formats, enabling powerful and adaptable code. But it’s crucial to understand how it interprets your data, so you can use it to its full potential. This insight is based on my own experience—trust me, learning it early will make your journey with NumPy much smoother.

Data Type

If you give np.array() a list like [1, 2.5, 3], NumPy decides all values should be converted to float64—because mixing integers and floats defaults to float. That decision process is called dtype resolution.

You can control it like this:

np.array([1, 2, 3], dtype=np.float32)

This points to another key characteristic of NumPy arrays: they are strongly typed and homogeneous, so every element in the array has the same data type. This detail is often overlooked, especially in beginner tutorials where np.array is introduced without the dtype parameter and the topic is deferred to a more advanced stage.

However, postponing this explanation can make it harder for learners to revise their mental model later. It’s important to clarify from the beginning: when you omit the dtype argument, you’re not avoiding typing; NumPy simply infers it for you. Internally, NumPy evaluates the input and selects a default data type that can represent all the values: the platform’s default integer (usually int64) for Python integers, and float64 as soon as a float is present. It does not shrink to the smallest possible type, so [1, 2, 3] becomes int64 rather than int8. Understanding this early on makes it much easier to work confidently and correctly with NumPy arrays in more complex scenarios.
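A quick way to see what NumPy inferred is to inspect the dtype attribute. The sketch below compares a few inputs and then overrides the inference explicitly; the variable names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])
with_complex = np.array([1, 2.5, 3 + 0j])

print(ints.dtype)          # platform default integer, commonly int64
print(mixed.dtype)         # float64
print(with_complex.dtype)  # complex128

# An explicit dtype overrides the inference.
small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype, small.itemsize)  # int8 1

# astype() converts an existing array and returns a new one.
as_float = ints.astype(np.float32)
print(as_float.dtype)      # float32
```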

As for dtype, there is something else that I would like to mention here. It is called element-wise conversion. If you write:

np.array(['1', '2', '3'], dtype=int)

NumPy will convert each of the strings '1', '2', '3' into an integer using element-wise conversion. This happens automatically, unless you pass something that can’t be converted: np.array(['a', 'b'], dtype=int) raises a ValueError.
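Both cases can be tried side by side. This is a minimal sketch; the failed flag is only there to make the outcome of the second call visible.

```python
import numpy as np

converted = np.array(["1", "2", "3"], dtype=int)
print(converted)        # [1 2 3]
print(converted.sum())  # 6

failed = False
try:
    np.array(["a", "b"], dtype=int)
except ValueError as exc:
    failed = True
    print("conversion failed:", exc)
```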

C-Level Memory Allocation

Unlike Python lists, which store references, NumPy arrays store raw data in contiguous memory, similar to how things work in the C language. This makes operations super fast.

You don’t directly control this, but the shape and dtype influence how memory is allocated.
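The memory layout can be observed from Python. The following sketch compares the size of an array's data buffer with a rough estimate for an equivalent list; the list estimate uses sys.getsizeof and is specific to CPython, so treat the exact numbers as indicative only.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# The array's data buffer is exactly n * itemsize bytes.
print(np_arr.itemsize)   # 8 bytes per int64
print(np_arr.nbytes)     # 8000

# A freshly created array is stored contiguously in C (row-major) order.
print(np_arr.flags["C_CONTIGUOUS"])  # True

# The list stores pointers; each int object needs additional memory.
list_total = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(list_total > np_arr.nbytes)    # True on CPython
```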

Structure Initialization

After memory is allocated, NumPy prepares the array object: how many dimensions it has, the size of each dimension, and how to move through it in memory (strides). This is called structure initialization. More technically, it is the phase where NumPy finalizes the internal setup of the array after creating and converting the data. This includes:

Setting the number of dimensions (ndim)
Determining the shape (size of each dimension)
Computing the strides (how many bytes to step in memory to move between elements)
Assigning the data pointer (location in memory where the array starts)

It's like preparing a building blueprint after getting all the raw materials ready.

Example: Structure Initialization in Action

import numpy as np

a = np.array([[10, 20, 30],[40, 50, 60]])

Let’s inspect what NumPy sets up under the hood:

print("Array:\n", a)
print("Shape:", a.shape)
print("Dimensions:", a.ndim)
print("Strides:", a.strides)
print("Data type:", a.dtype)

Output:

Array:
 [[10 20 30]
 [40 50 60]]
Shape: (2, 3)
Dimensions: 2
Strides: (24, 8)
Data type: int64

Shape (2, 3): 2 rows and 3 columns.
Dimensions 2: It’s a 2D array.
Strides (24, 8): To move to the next row, you jump 24 bytes; to move to the next column, jump 8 bytes.
Why 24 and 8? Because each int64 takes 8 bytes:
3 columns × 8 bytes = 24 bytes to go to the next row.
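The stride arithmetic above can be verified by hand. In this sketch the dtype is set to np.int64 explicitly so the byte counts match the explanation on every platform; the raw-buffer read is only for demonstration and not something you would normally do.

```python
import numpy as np

a = np.array([[10, 20, 30], [40, 50, 60]], dtype=np.int64)
row_stride, col_stride = a.strides   # (24, 8) for int64

# Byte offset of element a[i, j] from the start of the buffer.
i, j = 1, 2
offset = i * row_stride + j * col_stride
print(offset)                        # 1*24 + 2*8 = 40

# Reading the raw buffer at that offset yields the same value as a[1, 2].
raw = a.tobytes()
value = np.frombuffer(raw, dtype=np.int64, count=1, offset=offset)[0]
print(value, a[i, j])                # 60 60

# Transposing swaps the strides without moving any data.
print(a.T.strides)                   # (8, 24)
```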

Let’s look at the following example to better understand why internal details of NumPy arrays matter:

import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
b = a.reshape(3, 4)  # b is a view onto the same data as a
b[0, 0] = 100
print("a = ", a)
print("b = ", b)

# Output
# a =  [100   2   3   4   5   6   7   8   9  10  11  12]
# b =  [[100   2   3   4]
#  [  5   6   7   8]
#  [  9  10  11  12]]

This code creates a one-dimensional NumPy array containing twelve elements. Using the reshape method, we transform it into a 3×4 matrix. Importantly, this transformation alters the array's shape, number of dimensions, and memory strides—but not its memory allocation. This design allows NumPy to operate with high efficiency by avoiding costly memory reallocation. The reshaping process updates only the array's metadata, while the underlying data remains unchanged.

However, this efficiency comes with implications. In this example, modifying an element in b also affects the same element in a, because both arrays reference the same memory block. Understanding this behavior is critical: while reshape offers performance benefits, it also introduces shared references, which can lead to unintended side effects if not handled carefully.
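If you are unsure whether two arrays share memory, NumPy can tell you. The sketch below uses np.shares_memory and the base attribute, and contrasts a reshaped view with an explicit copy; the variable names are illustrative.

```python
import numpy as np

a = np.arange(12)
view = a.reshape(3, 4)                # shares memory with a
independent = a.reshape(3, 4).copy()  # owns its own data

print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False
print(view.base is a)                    # True: view was derived from a

view[0, 0] = 100
print(a[0])               # 100, the change is visible through a
print(independent[0, 0])  # 0, the copy is unaffected
```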

This example illustrates that a solid grasp of how NumPy manages memory and data structures is not just helpful—it’s essential for writing correct and efficient code.

At this point, it's clear that a solid understanding of NumPy is essential. You’ve likely realized that truly learning NumPy requires more than just surface-level exposure—it involves grasping key foundations and technical details that are often glossed over in books, tutorials, and classrooms.

Remember, while NumPy is a powerful and versatile tool, a lack of understanding can lead to subtle and frustrating bugs. That said, I won’t reiterate this warning further, as I want to keep the content engaging and forward-moving.

Thank you for your attention and patience so far. Let’s now shift our focus to what you're likely most eager to see: how NumPy empowers mathematical computation.

Alternatives to create arrays

NumPy offers a variety of intuitive and flexible methods for creating arrays. Depending on your use case, you can choose from the following:

np.zeros()

np.zeros() creates an array filled with zeros.

np.zeros((2, 3)) # 2x3 array of zeros

The output would be:

[[0. 0. 0.]
 [0. 0. 0.]]

You reach for np.zeros() whenever you need a fresh block of memory already filled with a known “empty” value—zero—so you can go on to fill or update its entries without worrying about leftover junk. It’s what you use to set up blank canvases (whether that’s a time-series array you’ll populate step by step, a two-dimensional grid for carrying out numerical updates, or a mask of the same shape as your data), to pre-allocate space so your code runs efficiently, and to make it crystal-clear that everything starts off at zero before your algorithm writes in the real values.
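As a small sketch of the pre-allocation pattern, the following fills a zero-initialized array step by step. The velocity and time step values are made up for illustration.

```python
import numpy as np

steps = 5
position = np.zeros(steps)   # float64 zeros, one slot per time step
velocity = 2.0               # assumed constant speed, illustration only
dt = 0.5

# Fill the pre-allocated array in place, one step at a time.
for k in range(1, steps):
    position[k] = position[k - 1] + velocity * dt

print(position)  # [0. 1. 2. 3. 4.]
```

The array never changes size during the loop; only its contents are overwritten.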

np.ones()

np.ones() creates an array filled with ones.

np.ones((3, 2)) # 3x2 array of ones

The output of the code is:

[[1. 1.]
 [1. 1.]
 [1. 1.]]

The np.ones() function is useful when you need an array pre-filled with the value 1. Typical cases are multiplicative quantities, such as scale factors or weights that start at 1 so that they leave values unchanged until they are updated, or a column of ones added to a data matrix to represent the intercept (bias) term of a linear model. As with np.zeros(), allocating the full array up front means you do not have to grow or resize it later.

np.arange()

np.arange() Generates values within a given range at specified intervals.

range_arr = np.arange(0, 10, 2)  # [0, 2, 4, 6, 8]

You’ll reach for np.arange() whenever you need a simple, evenly spaced sequence of numbers packaged as an array. Under the hood it works much like Python’s built-in range(), except that it returns a NumPy array and lets you choose non-integer steps (e.g. 0.5) as well as a floating-point start and stop. This makes it ideal for building things like:

Index or time vectors: when you want to label each element of a calculation or simulation by its step number or time stamp, an arange lets you say “0 to N–1” or “0.0 s to 10.0 s in 0.1 s increments” in one clean call.
Coordinate grids: pairing two arange calls with a mesh-grid operation gives you the x- and y-coordinates of a 2D plane, ready for plotting or numerical solution of partial differential equations.
Parameter sweeps: if you need to try a function at every value from 1 to 100 in steps of 2, or sample a range of temperatures from –20 to +50 °C in increments of 0.25 °C, arange constructs that array for you in one line.

For example, suppose you want to compute the displacement under constant acceleration, starting from rest,

$$s = \frac{1}{2} a t^2$$

for times from 0 to 10 seconds in 0.1 s steps. With NumPy you’d write:

import numpy as np
t = np.arange(0, 10.1, 0.1) # [0.0, 0.1, 0.2, …, 10.0]
a = 9.81 # gravitational accel. 
s = 0.5 * a * t**2 # vectorized: computes 0.5*a*t[i]**2 for each t[i]

Here t is a 1-D array of evenly spaced time points, and t**2 squares each entry, so s becomes another array of the same shape holding each corresponding displacement. This one-liner replaces what would otherwise be a Python loop accumulating results one by one—and because it all lives in contiguous C memory, it runs much faster.

As arange() allocates its output as a contiguous NumPy buffer, you can immediately do fast, vectorized arithmetic on the result—no Python loops, no list-to-array conversion needed. Just be mindful that using non-integer steps can introduce floating-point rounding quirks, so for evenly spaced floating-point grids over a fixed interval you may sometimes prefer np.linspace() instead.

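np.linspace()

np.linspace() produces a given number of evenly spaced values between two endpoints. It is similar to np.arange(), but you specify how many points you want rather than the step size, and the stop value is included by default: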

np.linspace(0, 1, 5) # [0. , 0.25, 0.5 , 0.75, 1. ]
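To compare the two functions on the same grid, here is a sketch that builds the 0 to 10 second time axis both ways. The variable names are illustrative.

```python
import numpy as np

# arange: you choose the step; the number of points follows from it.
t_arange = np.arange(0, 10.1, 0.1)

# linspace: you choose the number of points; both endpoints are included.
t_linspace = np.linspace(0, 10, 101)

print(len(t_arange), len(t_linspace))     # 101 101
print(t_linspace[0], t_linspace[-1])      # 0.0 10.0
print(np.allclose(t_arange, t_linspace))  # True
```

With linspace there is no need to pad the stop value (10.1 above) to make sure the last point is included.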

np.random.rand()

np.random.rand() fills an array of the given shape with random values sampled from a uniform distribution over [0, 1).

np.random.rand(3, 2) # 3x2 array of random values

np.eye()

np.eye() generates an identity matrix. An identity matrix is a square grid with 1’s on its main diagonal and 0’s elsewhere. In multiplication it behaves like “1”: any matrix times the identity (on either side) returns the original matrix.

np.eye(3) # 3x3 identity matrix

The outcome of this code is:

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

np.full()

You’ve already encountered np.ones() and np.zeros(), which create arrays filled with ones and zeros, respectively. The np.full() function serves as a more general and flexible alternative—it allows you to create arrays filled with any constant value of your choice.

np.full((2, 3), 5) # 2x3 array filled with the value 5

np.reshape()

Reshapes an existing array without changing its data.

original = np.array([1, 2, 3, 4, 5, 6])
reshaped = original.reshape(2, 3) # 2x3 matrix

As mentioned earlier, both the reshaped and original arrays point to the same memory location—they are simply different views of the same underlying data. To avoid unintended side effects, you can use the copy() method to create an independent copy in memory. Alternatively, you may reassign the reshaped array to the same variable, clearly indicating that only the shape has changed and maintaining clarity in your code’s intent.

np.copy()

Creates an independent copy of an array. This function ensures that any changes to the copied array do not affect the original one, making it ideal for safe data manipulation.

original = np.array([1, 2, 3])
copy = np.copy(original)

Each of these methods contributes to NumPy’s strength as a numerical computing library, giving you the tools to construct and control arrays for a wide range of computational tasks.

Indexing, Slicing, and Broadcasting

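NumPy slicing is more powerful than list slicing: it works across several dimensions at once and also accepts boolean masks. Broadcasting, in turn, lets NumPy apply an operation between arrays of different shapes. In the simplest case, a scalar is applied to every element:

a = np.array([1, 2, 3])
a + 10  # Broadcasting: array([11, 12, 13])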
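A slightly larger sketch of multi-dimensional slicing and broadcasting follows. The array m and the row vector are made-up illustration data.

```python
import numpy as np

m = np.arange(1, 13).reshape(3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

print(m[:, 1])      # second column: [ 2  6 10]
print(m[0:2, 2:4])  # top-right 2x2 block: rows [3 4] and [7 8]
print(m[m > 10])    # boolean mask: [11 12]

# Broadcasting: the 1-D row is stretched across all three rows of m.
row = np.array([10, 20, 30, 40])
shifted = m + row
print(shifted[0])   # [11 22 33 44]
print(shifted[2])   # [19 30 41 52]
```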

Vectorized Operations

Vectorized operations let you avoid explicit Python loops. Operations such as mean, sum, and element-wise multiplication stay fast even on large datasets:

data = np.random.rand(1000000)
mean = np.mean(data)
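To show what the vectorized call replaces, this sketch computes the same mean with an explicit loop and checks that both agree. The array size is kept smaller here so the loop finishes quickly; the names are illustrative.

```python
import numpy as np

values = np.random.rand(100_000)

# Explicit Python loop.
total = 0.0
for x in values:
    total += x
loop_mean = total / len(values)

# Vectorized equivalents: no Python-level loop.
vec_mean = values.mean()
vec_sum = values.sum()
squared = values * values  # element-wise multiplication

print(np.isclose(loop_mean, vec_mean))  # True
```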

Multidimensional Arrays

NumPy arrays can have two, three, or more dimensions. For 2D arrays, keep in mind that * multiplies element-wise, while matrix multiplication uses dot(), matmul(), or the @ operator:

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = A @ B  # matrix product
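The contrast between element-wise and matrix multiplication is easiest to see side by side. The sketch below also stacks the two matrices into a 3D array to show that @ is then applied to each matrix in the stack.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B  # rows [ 5 12] and [21 32]
matrix = A @ B       # rows [19 22] and [43 50]

# For 2-D arrays, these three spellings agree.
print(np.array_equal(matrix, np.matmul(A, B)))  # True
print(np.array_equal(matrix, np.dot(A, B)))     # True

# A 3-D array: a stack of two 2x2 matrices.
stack = np.stack([A, B])
print(stack.shape)     # (2, 2, 2)
print((stack @ A)[1])  # B @ A: rows [23 34] and [31 46]
```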