SciPy tutorial for beginners

Maham Amjad
Nov 19, 2024
9 min read

What is SciPy?#

SciPy is an open-source library in Python used for scientific computing. (Open source means that the source code is available for use or modification as users see fit.) It depends on NumPy, since SciPy uses NumPy arrays to handle numerical computations efficiently. Though NumPy has many mathematical functions, SciPy builds on them and adds more specialized routines, such as optimization, integration, and interpolation.

It is a good alternative to MATLAB and to the GNU Scientific Library for C/C++.

Interesting fact: SciPy stands for Scientific Python. Travis Oliphant, the creator of NumPy, is one of its original authors. It is pronounced “Sigh-Pie”.

How to import SciPy#

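SciPy depends on NumPy, so install both. Use the command for your platform below. Note that package names that embed a Python version (such as py35 for Python 3.5) need to match the Python version installed on your machine.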

  • For macOS:

    sudo port install py35-scipy py35-numpy
    
  • For Windows:

    python3 -m pip install --user numpy scipy
    
  • For Linux:

    sudo apt-get install python3-scipy python3-numpy
    

SciPy in scientific computing#

SciPy has multiple packages that cover different scientific domains, which are listed below:

  • Core scientific computing

  • Mathematical functions and computation

  • Constants and utilities

  • Data handling

  • Signal and image processing

  • Statistics and data analysis

  • Spatial data structures and algorithms

These packages must be imported explicitly before being used in the code.

import scipy
from scipy import <module name>
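
As a quick sanity check, here is a minimal sketch (assuming SciPy and NumPy are already installed) that imports one subpackage explicitly and prints the installed versions. The choice of the constants subpackage is only for illustration.

```python
import numpy as np
import scipy
from scipy import constants  # Import one subpackage explicitly

# Report the installed versions
print("NumPy version:", np.__version__)
print("SciPy version:", scipy.__version__)

# The subpackage is now available under its own name
print(constants.pi)
```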

Note that this blog discusses basic functionalities that are easy for beginners to understand. The relevant subdomains are shown in the figure below.

Subdomains where SciPy is applicable

Feel free to jump to any section you’re interested in.

Physical and mathematical constants#

Constants and units are the building blocks of scientific measurement. Constants define the fundamental behavior of the universe, e.g., the speed of light. Knowing their values helps scientists make predictions. Similarly, units such as meters and kilometers help scientists standardize their measurements, which allows researchers to replicate experiments and verify findings.

Here’s an analogy: Constants are the ingredients, while units are the measuring cups. To bake a good cake, you need to follow the recipe precisely.

The scipy.constants subpackage contains multiple constants. You can access the value of any supported constant as below.

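# Import the required package(s)
import scipy
from scipy import constants

print(constants.pi) # Print the value of π
print(constants.g) # Print the standard acceleration of gravity (in m/s^2)
  • Lines 2–3: We import the required libraries.

  • Lines 5–6: We print the value of two constants, π and g. Note that constants.g is the standard acceleration of gravity (9.80665 m/s²), while the Newtonian gravitational constant is available as constants.G.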

You can also use the units of physical quantities under different unit systems.

# Import the required package(s)
import scipy
from scipy import constants

# SI prefixes specified in the power of 10
print(constants.mega)
print(constants.kilo)

# Time units specified in seconds
print(constants.minute)
print(constants.day)
  • Lines 2–3: We import the required libraries.

  • Lines 6–7: We print the values of metric prefixes specified in powers of 10.

  • Lines 10–11: We print the values of time units specified in seconds.
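
Because these prefixes and units are plain numbers, they can be used directly as conversion factors. The sketch below is illustrative only: the quantities being converted (90 minutes, 5 kilometers, 100 °C) are made-up values, not part of the original example.

```python
from scipy import constants

# Convert 90 minutes to seconds (constants.minute is one minute in seconds)
duration_s = 90 * constants.minute
print(duration_s)

# Convert 5 kilometers to meters using the SI prefix
distance_m = 5 * constants.kilo
print(distance_m)

# Convert a temperature between scales
boiling_k = constants.convert_temperature(100, 'Celsius', 'Kelvin')
print(boiling_k)
```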

To view the complete list of supported constants and units, try the dir() function as follows:

# Import the required package(s)
import scipy
from scipy import constants
print(dir(constants))
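
The output of dir() is long and includes private names. As a rough sketch, you could filter it and also look up a constant in the physical_constants dictionary, which stores each entry as a (value, unit, uncertainty) tuple. The choice of 'electron mass' here is just an example key.

```python
from scipy import constants

# Keep only the public names (those without a leading underscore)
public_names = [name for name in dir(constants) if not name.startswith('_')]
print(len(public_names))

# physical_constants maps a name to a (value, unit, uncertainty) tuple
value, unit, uncertainty = constants.physical_constants['electron mass']
print(value, unit, uncertainty)

# find() returns the physical_constants keys that contain a substring
print(constants.find('electron mass'))
```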

Linear algebra#

Linear algebra is an adapter that connects mathematics and science to solve real-world problems. Many problems boil down to manipulating linear systems of equations. You can present almost any data in the form of matrices or vectors.

The scipy.linalg subpackage provides advanced linear algebra routines and matrix decompositions.

Solving a system of linear equations#

# Import the required package(s)
import numpy as np
import scipy
from scipy import linalg

# Write equations in the form of matrices
A = np.array([[1, 2], [4, 3]])
B = np.array([1, 2])
print(linalg.solve(A, B)) # Solve the equations for 2 unknown variables
  • Lines 2–4: We import the required libraries.

  • Lines 7–8: We store a system of linear equations (equations of degree 1), x + 2y = 1 and 4x + 3y = 2, as a coefficient matrix A and a right-hand-side vector B using the numpy.array functionality.

  • Line 9: The linalg.solve() function accepts the two arrays and returns an array with the values of the unknown variables, which are 0.2 and 0.4 in this case.
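
One way to convince yourself that the result is correct is to substitute it back into the equations. The following is a minimal sketch of that check; the variable names x and residual are ours.

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 2], [4, 3]])
B = np.array([1, 2])

x = linalg.solve(A, B)

# Substitute the solution back in: A @ x should reproduce B
residual = A @ x - B
print(x)
print(np.allclose(A @ x, B))
```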

Computing the determinant of a matrix#

Calculating the determinant is one of the prime operations done on a matrix. We can compute the determinant as follows.

# Import the required package(s)
import numpy as np
import scipy
from scipy import linalg

A = np.array([[1, 2], [4, 3]])
print(linalg.det(A)) # Compute the determinant of matrix
  • Lines 2–4: We import the required libraries.

  • Line 7: The linalg.det() function takes the square matrix (created on line 6) and returns its determinant.
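
For a 2×2 matrix, the determinant is easy to check by hand, which makes it a useful test of the function. The sketch below compares the hand calculation with linalg.det() and also shows a singular matrix (S is a hypothetical example added for illustration).

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 2], [4, 3]])

# For a 2x2 matrix [[a, b], [c, d]], the determinant is a*d - b*c
a, b = A[0]
c, d = A[1]
det_by_hand = a * d - b * c   # 1*3 - 2*4 = -5
det_scipy = linalg.det(A)
print(det_by_hand, det_scipy)

# A matrix whose rows are multiples of each other is singular (determinant 0)
S = np.array([[1, 2], [2, 4]])
print(np.isclose(linalg.det(S), 0.0))
```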

Computing the inverse of a matrix#

Computing the inverse of a matrix on a piece of paper is a lengthy process with multiple steps. But with the scipy.linalg subpackage, we can get the result in one step as follows.

# Import the required package(s)
import numpy as np
import scipy
from scipy import linalg

A = np.array([[1, 2], [4, 3]])
print(linalg.inv(A)) # Compute the inverse of a matrix

  • Lines 2–4: We import the required libraries.

  • Line 7: The linalg.inv() function takes the matrix (created on line 6) and returns its inverse.
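
A quick way to verify an inverse is to multiply it by the original matrix, which should give the identity matrix. The sketch below also reuses the vector B from the earlier linear system to show that the inverse leads to the same solution as linalg.solve(). For solving systems, calling linalg.solve() directly is generally preferred over forming the inverse.

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 2], [4, 3]])
B = np.array([1, 2])

A_inv = linalg.inv(A)

# A matrix times its inverse should give the identity matrix
print(np.allclose(A @ A_inv, np.eye(2)))

# Multiplying the inverse by B gives the same solution as linalg.solve()
x_from_inverse = A_inv @ B
x_from_solve = linalg.solve(A, B)
print(np.allclose(x_from_inverse, x_from_solve))
```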

Optimization#

Optimizing means finding the best solution to a problem according to certain criteria. In real-world scenarios, we want to minimize costs and maximize profits. For instance, an engineer might optimize a design to minimize material usage while maintaining strength.

Imagine a function as a landscape with hills and valleys. Optimizing the function is like finding the highest peak (maximum) or the lowest valley (minimum) in that landscape. This point (whether maximum or minimum) represents the optimal solution to your problem.

With the scipy.optimize subpackage, you can minimize an objective function. To maximize a function, you can minimize its negative.

# Import the required package(s)
from scipy import optimize

# Objective function
def func(x):
    return (x*x)+x+2

result = optimize.minimize(func, 0) # Optimize the objective function
print(result.x)
print(result.fun)
  • Line 2: We import the required libraries.

  • Lines 5–6: We define an objective quadratic function.

  • Line 8: The optimize.minimize() function takes:

    • The function whose minimum value we’re seeking

    • An initial guess, which is 0 in this case

The output shows that for x equals -0.5, the minimum value of the function is 1.75.
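
The minimize() function only searches for a minimum. A common way to find a maximum is to minimize the negated function and flip the sign of the result. The sketch below illustrates this with a hypothetical function, profit(), whose peak is at x = 1 with a value of 3.

```python
from scipy import optimize

# Hypothetical function to maximize: a downward-opening parabola with its peak at x = 1
def profit(x):
    return -(x - 1) ** 2 + 3

# minimize() only minimizes, so we minimize the negated function instead
def negative_profit(x):
    return -profit(x)

result = optimize.minimize(negative_profit, 0)
x_best = result.x[0]
max_value = -result.fun   # Flip the sign back to recover the maximum
print(x_best, max_value)
```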

Integration#

In calculus, integration means finding the area under a curve. In scientific computing, integration can be used to compute a function’s total accumulated value over an interval. Imagine velocity as a function of time. Integration of that function gives you the total distance traveled over that time.

Here’s an analogy: Imagine rain falling at a certain rate throughout the day. Integration helps you calculate the total amount of rainwater collected (accumulation) over that entire day.

With scipy.integrate, you can perform single integration as follows.

# Import the required package(s)
from scipy import integrate

def integrand(x):
    return x**2

integral, error = integrate.quad(integrand, 0, 1) # Integrate the function
print(integral)
print(error)
  • Line 2: We import the required libraries.

  • Lines 4–5: We define the integrand, i.e., the mathematical function of one variable that we want to integrate.

  • Line 7: The integrate.quad() function is used to perform definite integration, which refers to the area under a curve between two fixed limits. Here, 0 and 1 are the lower and upper limits of the integration interval. The function returns the value of the integral along with an estimate of the absolute error.

For integrating functions of two or three variables, use dblquad() or tplquad(), respectively.
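
As a rough illustration of dblquad(), the sketch below integrates x*y over a rectangle. The integrand and limits are made up for this example. By hand, the result is (1/2) × 2 = 1. Note that dblquad() expects the integrand to take the inner variable first.

```python
from scipy import integrate

# Single integral: the area under x**2 from 0 to 1 is exactly 1/3
single, single_err = integrate.quad(lambda x: x**2, 0, 1)
print(single)

# Double integral of x*y over 0 <= x <= 1 and 0 <= y <= 2.
# dblquad() expects the integrand as f(y, x), with the inner variable first,
# and the inner limits as functions of the outer variable.
double, double_err = integrate.dblquad(
    lambda y, x: x * y,   # Integrand
    0, 1,                 # Limits for x (outer integral)
    lambda x: 0,          # Lower limit for y (inner integral)
    lambda x: 2,          # Upper limit for y (inner integral)
)
print(double)
```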

Special functions#

SciPy’s special package provides several utility functions that complement the core NumPy operations, such as computing factorial, combinations, and permutations. Look at the code below.

# Import the required package(s)
import scipy
from scipy import special

# Compute factorial of a positive integer
print(special.factorial(5))

n = 10.4
x = 5

# Calculate the number of combinations
print(special.comb(int(n), int(x)))

# Calculate the number of permutations
print(special.perm(int(n), int(x)))
  • Lines 2–3: We import the required libraries.

  • Line 6: The special.factorial() function takes a positive integer as an argument and returns its factorial.

  • Line 12: The special.comb() function takes two arguments (say n and x) and calculates the number of combinations by choosing x elements from a set of n.

  • Line 15: The special.perm() function takes two arguments (say n and x) and calculates the total number of arrangements that can be performed with n elements taken x at a time.
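
To see what these functions compute, the sketch below compares them against the textbook formulas, n! / (x! (n − x)!) for combinations and n! / (n − x)! for permutations, using Python's built-in math.factorial. The exact=True argument asks SciPy for integer results rather than floats.

```python
from math import factorial
from scipy import special

n, x = 10, 5

# Combinations: n! / (x! * (n - x)!)
comb_by_formula = factorial(n) // (factorial(x) * factorial(n - x))
# Permutations: n! / (n - x)!
perm_by_formula = factorial(n) // factorial(n - x)

# exact=True asks SciPy for exact integer results instead of floats
comb_scipy = special.comb(n, x, exact=True)
perm_scipy = special.perm(n, x, exact=True)

print(comb_by_formula, comb_scipy)
print(perm_by_formula, perm_scipy)
```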

We can also apply trigonometric operations and basic mathematical functionalities. Look at the code below.

# Import the required package(s)
import scipy
from scipy import special

print(special.exp10(3)) # Raise 10 to the power of the given number

print(special.exp2(3)) # Raise 2 to the power of the given number

print(special.cbrt(8.5)) # Calculate the cube root of a number

print(special.sindg(90)) # Calculate the sine of an angle provided in degrees

print(special.cosdg(45)) # Calculate the cosine of an angle provided in degrees
  • Lines 2–3: We import the required libraries.

  • Line 5: The special.exp10() function raises 10 to the power of the number passed as an argument (here, 10^3) and returns the result.

  • Line 7: The special.exp2() function raises 2 to the power of the number passed as an argument (here, 2^3) and returns the result.

  • Line 9: The special.cbrt() function calculates the cube root of the number passed as an argument and returns the result.

  • Line 11: The special.sindg() function calculates the sine of an angle provided in degrees as an argument and returns the result.

  • Line 13: The special.cosdg() function calculates the cosine of an angle provided in degrees as an argument and returns the result.
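
The sketch below uses inputs whose results are easy to verify by hand, and it also shows that these functions work element-wise on NumPy arrays. The specific inputs are chosen only for illustration.

```python
import numpy as np
from scipy import special

print(special.exp10(3))    # 10**3
print(special.exp2(3))     # 2**3
print(special.cbrt(8))     # Cube root of 8
print(special.sindg(90))   # sin(90 degrees)

# The same functions also work element-wise on NumPy arrays
angles = np.array([0, 60, 90])
print(special.cosdg(angles))
```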

For comprehensive statistical functionalities, visit the dedicated scipy.stats subpackage in the official documentation.

Interpolation#

Interpolation means bridging the gap between known data points by providing estimates for unknown values in between. Imagine tracking stock prices over time. By interpolating, we can estimate potential price movements between recorded points.

With the scipy.interpolate subpackage, you can do 1D (linear) interpolation as follows.

# Import the required package(s)
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

# Define the known data points
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([0, 1, 4, 9, 16, 25])

# Create the interpolation function
f_linear = interpolate.interp1d(x, y) # Linear interpolation

# Estimate the values
x_new = np.linspace(0, 5, 50) # Defining x values
y_linear = f_linear(x_new) # Estimate y values using the interpolation function

# Plot the results
plt.plot(x, y, 'o', label='Data points')
plt.plot(x_new, y_linear, '-', label='Linear interpolation')
plt.legend()
plt.xlabel('x')
plt.ylabel('y')
plt.title('Interpolation')
plt.savefig('output/graph.png')
  • Lines 2–4: We import the required libraries.

  • Lines 7–8: We define two arrays for known data points. For example, for x = 0, y = 0 and for x = 3, y = 9. In other words, the relation between these two variables is as follows: y = x^2.

  • Line 11: The interpolate.interp1d() function takes both arrays and returns a function that estimates y values based on the provided data points in x and y.

  • Line 14: We generate 50 new values along the x-axis (x_new) between 0 and 5 (inclusive). These new values represent where we want to estimate the y-axis values.

  • Line 15: We apply the linear interpolation function, f_linear, to the new values in x_new. This estimates the corresponding y values using linear interpolation between the original data points.

  • Lines 18–24: We plot the interpolation through the pyplot package.
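
To get a feel for the accuracy of the estimate, the sketch below evaluates the interpolation at a single point, x = 2.5, and compares it with the true value of x^2. The cubic variant (kind='cubic') is added here for comparison and is not part of the example above.

```python
import numpy as np
from scipy import interpolate

x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([0, 1, 4, 9, 16, 25])   # y = x^2 at the known points

f_linear = interpolate.interp1d(x, y)                # Straight lines between points
f_cubic = interpolate.interp1d(x, y, kind='cubic')   # Cubic spline through the points

x_query = 2.5
true_value = x_query ** 2                    # 6.25
linear_estimate = float(f_linear(x_query))   # Midpoint of 4 and 9
cubic_estimate = float(f_cubic(x_query))

print(true_value, linear_estimate, cubic_estimate)
```

The linear estimate is the midpoint of the neighboring data points (4 and 9), so it overshoots the curve slightly. A smoother interpolation follows the curve more closely.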

Clustering#

Clustering means dividing the population (or data points) into groups such that the data points in one group are more similar to each other than to the data points in other groups. A group is also known as a cluster.

Imagine a retail company trying to understand its customers better to tailor marketing strategies and improve sales. One way is to segment customers based on purchasing behavior and demographics. In simple words, clustering helps businesses make data-driven decisions.

One of the most commonly used techniques in SciPy is hierarchical clustering, an unsupervised learning method that builds clusters by measuring the dissimilarities between data points. Go through the example below.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

np.random.seed(42)
data = np.random.rand(10, 2)

distance_matrix = pdist(data, 'euclidean')
Z = linkage(distance_matrix, 'ward')

threshold = 0.4
clusters = fcluster(Z, threshold, criterion='distance')

# Print cluster labels
print("Cluster labels:", clusters)

# Plot the clustered data
plt.figure(figsize=(10, 7))
plt.scatter(data[:, 0], data[:, 1], c=clusters, cmap='prism')
plt.title('Data points and their cluster assignments')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.savefig('output/graph.png')

  • Lines 1–4: We import the required modules.

  • Lines 6–7: On line 6, we set a seed for the random number generator. 42 is an arbitrary number; any integer can be used. The key point is that the same seed always produces the same sequence of random numbers, which makes the results reproducible and is useful for debugging. Line 7 generates a 2D array of random numbers. The rand() function generates random numbers uniformly over the interval [0, 1). The arguments specify the array shape, i.e., 10 rows and 2 columns.

  • Line 9: We calculate the distances between each pair of data points in the dataset using the pdist function. The Euclidean distance is one of the most common distance metrics, representing the straight-line distance between two points.

  • Line 10: The linkage function performs hierarchical clustering on the distance matrix using Ward’s method. The output Z is a linkage matrix that contains information about which clusters were merged and at what distance. This information can be used to decide on the final number of clusters.

  • Lines 12–13: The threshold defines the maximum distance at which clusters are still merged. Merges that happen at distances greater than this threshold are not applied, so the clusters involved remain separate. The fcluster() function assigns cluster labels to each observation based on the linkage matrix Z and the specified criterion, i.e., distance threshold.

  • Line 16: We print the cluster labels for each data point, showing which cluster each point belongs to.

  • Lines 19–24: We plot the clustering through the pyplot package.
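
Instead of a distance threshold, fcluster() can also be asked for a maximum number of clusters through criterion='maxclust'. The sketch below is only an illustration under our own assumptions: the data is synthetic (two well-separated groups generated with NumPy's default_rng), and linkage() is given the raw observations rather than a precomputed distance matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: two well-separated groups of five 2D points each
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=0.1, size=(5, 2))
group_b = rng.normal(loc=5.0, scale=0.1, size=(5, 2))
data = np.vstack([group_a, group_b])

# linkage() also accepts the raw observations directly
Z = linkage(data, 'ward')

# criterion='maxclust' asks for at most 2 clusters instead of a distance cut-off
labels = fcluster(Z, t=2, criterion='maxclust')
print("Cluster labels:", labels)
```

Because the two groups are far apart, the first five points should share one label and the last five the other.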

If you want to learn more about SciPy, check the official documentation.


  
