Gain insights into utilizing NumPy for data manipulation and analytics. Learn about implementing concepts in both Python and NumPy through coding challenges and quizzes.


If you're looking to grow your career in machine learning or data science, adding a powerful library to your skill set is a good place to start. Python has become one of the most widely used tools in the industry for serious data analytics, and NumPy is probably its most widely used library for numerical data. With NumPy, you can manipulate data stored in multi-dimensional arrays and matrices (think linear algebra).

Join us as we venture into the vast world of NumPy in this comprehensive course. Each lesson dives into the implementation of concepts first in pure Python and then in NumPy, exploring how NumPy vectorization compares to traditional Python code written in a procedural or object-oriented style.

Practice and test yourself along the way with in-browser coding challenges, quizzes, and more.

This course is intended for users who are already familiar with intermediate-level Python.

From Python to Numpy

The [NumPy documentation](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html) defines the
ndarray class very clearly:

  >*An instance of class **ndarray** consists of a contiguous one-dimensional segment
  of computer memory (owned by the array, or by some other object), combined
  with an indexing scheme that maps N integers into the location of an item in
  the block.*

Said differently, an array is mostly a contiguous block of memory whose parts
can be accessed using an indexing scheme. This indexing scheme is in turn
defined by a [shape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html#numpy.ndarray.shape)
and a [data type](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html), and this is
precisely what is needed when you define a new array:


import numpy as np
Z = np.arange(9).reshape(3,3).astype(np.int16)
print(Z)
print(Z.itemsize)  # size of one element in bytes (2 for int16)
print(Z.shape)     # length of each dimension: (rows, columns)
print(Z.ndim)      # number of dimensions (2 here, since the array is 2D)

Here, we know that itemsize is 2 bytes (`int16`), the shape is (3,3) and
the number of dimensions is 2.
> To calculate the number of dimensions, we can also use `len(Z.shape)`.
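
The idea of one memory block combined with an indexing scheme can be illustrated with a small sketch. It assumes that `reshape` on a contiguous array returns a view rather than a copy (which is the usual case), and the variable names `Z` and `V` are chosen only for illustration:

```python
import numpy as np

Z = np.arange(9).astype(np.int16)  # one block of 9 items (18 bytes)
V = Z.reshape(3, 3)                # same block, different indexing scheme

print(np.shares_memory(Z, V))      # both arrays point at the same memory
print(Z.nbytes, V.nbytes)          # same number of bytes in both cases

V[1, 1] = 42                       # writing through the 2D view...
print(Z[4])                        # ...changes item 4 of the 1D array
```

Only the shape (and therefore the indexing scheme) differs between `Z` and `V`; the underlying data is not duplicated.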

Furthermore, we can deduce the
[strides](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.strides.html#numpy.ndarray.strides) of the array that define the number of bytes to step in each dimension when traversing the array.

import numpy as np
Z = np.arange(9).reshape(3,3).astype(np.int16)
# row stride = items per row * item size; column stride = item size
strides = Z.shape[1]*Z.itemsize, Z.itemsize
print("Computed strides (np.int16):", strides)
print("Z.strides (np.int16):", Z.strides)
Z = np.arange(9).reshape(3,3).astype(np.int32)
strides = Z.shape[1]*Z.itemsize, Z.itemsize  # same formula with 4-byte items
print("Computed strides (np.int32):", strides)
print("Z.strides (np.int32):", Z.strides)

In the `int16` case, we have to skip 2 bytes (1 value) to move to the next column, but 6 bytes (3 values) to get to the same position in the next row. As such, the **strides** for the `int16` array `Z` are **(6, 2)**. For the `int32` version, each item takes 4 bytes, so the strides become **(12, 4)**.
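
The same reasoning extends to any number of dimensions. The sketch below uses a hypothetical helper, `c_contiguous_strides`, which is not part of NumPy and assumes the array is stored in C (row-major) order, as is the case for the arrays created above:

```python
import numpy as np

def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-ordered array (illustrative helper, not a NumPy API)."""
    strides = []
    step = itemsize               # the last axis moves one item at a time
    for dim in reversed(shape):
        strides.append(step)
        step *= dim               # earlier axes skip a full block of later axes
    return tuple(reversed(strides))

Z16 = np.arange(9).reshape(3, 3).astype(np.int16)
Z32 = np.arange(9).reshape(3, 3).astype(np.int32)
print(c_contiguous_strides(Z16.shape, Z16.itemsize), Z16.strides)
print(c_contiguous_strides(Z32.shape, Z32.itemsize), Z32.strides)
```

For arrays that are not C-contiguous (for example, a transposed view), `Z.strides` will differ from what this helper returns.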

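With all this information, we know how to access a specific item (designated by
an index tuple) and, more precisely, how to compute its start and end offsets.
The start offset is the sum, over all dimensions, of the index multiplied by the
corresponding stride; the end offset is the start offset plus the item size.
For example, in the `int16` array above, item (1, 1) starts at byte
`1*6 + 1*2 = 8` and ends at byte 10.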
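
The offset computation can be sketched as follows. The index `(1, 1)` is chosen only for illustration, and the check against `Z.tobytes()` assumes the array is C-contiguous, as it is here:

```python
import numpy as np

Z = np.arange(9).reshape(3, 3).astype(np.int16)
index = (1, 1)  # hypothetical index chosen for illustration

# Start offset: sum over each dimension of stride * index
offset_start = 0
for i in range(Z.ndim):
    offset_start += Z.strides[i] * index[i]
# End offset: one item past the start
offset_end = offset_start + Z.itemsize

# Check the result against the raw bytes of the array
raw = Z.tobytes()
item = np.frombuffer(raw[offset_start:offset_end], dtype=Z.dtype)[0]
print(offset_start, offset_end)  # 8 10
print(item == Z[index])          # True
```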

This lesson explains the memory layout using NumPy.

Introduction

Anatomy of an Array

Code Vectorization

Problem Vectorization

Custom Vectorization

Beyond NumPy

Conclusion

Memory layout