SciPy is an open-source Python library for scientific and technical computing. It is a good alternative to MATLAB and the GNU Scientific Library in C/C++.
Interesting fact: SciPy stands for Scientific Python. It was created by Travis Oliphant, who also created NumPy. It is pronounced as “Sigh-Pie”.
To use SciPy, remember to install NumPy first. To install, use the following commands.
For macOS:
sudo port install py35-scipy py35-numpy
For Windows:
python3 -m pip install --user numpy scipy
For Linux:
sudo apt-get install python3-scipy python3-numpy
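After installing, a quick check like the one below can confirm that both libraries import correctly. This is a minimal sketch: the variable names are only illustrative, and the version numbers printed depend on your environment.

```python
# Confirm that NumPy and SciPy are importable and report their versions
import numpy as np
import scipy

numpy_version = np.__version__
scipy_version = scipy.__version__

print("NumPy version:", numpy_version)
print("SciPy version:", scipy_version)
```

If either import fails, the corresponding package is not installed for the Python interpreter you are running.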
SciPy has multiple packages that cover different scientific domains, which are listed below:
Core scientific computing
Mathematical functions and computation
Constants and utilities
Data handling
Signal and image processing
Statistics and data analysis
Spatial data structures and algorithms
These packages must be imported explicitly before being used in the code.
import scipy
from scipy import <module name>
Note that this blog discusses basic functionalities that are easy for beginners to understand. For this purpose, the relevant subdomains are listed in the figure below.
Feel free to jump to any section you’re interested in.
Constants and units are the building blocks of scientific measurement. Constants define the fundamental behavior of the universe, e.g., the speed of light. Knowing their values helps scientists make predictions. Similarly, units help scientists standardize their measurements, e.g., meters and kilometers. This standardization allows researchers to replicate experiments and verify findings.
Here’s an analogy: Constants are the ingredients, while units are the measuring cups. To bake a good cake, you need to follow the recipe precisely.
The scipy.constants subpackage contains multiple constants. You can access the value of any supported constant as below.
# Import the required package(s)
import scipy
from scipy import constants

print(constants.pi) # Print the value of π
print(constants.g) # Print the standard acceleration of gravity
Lines 2–3: We import the required libraries.
Lines 5–6: We print the value of two constants, π and g (the standard acceleration of gravity). The Newtonian gravitational constant is available separately as constants.G.
You can also use the units of physical quantities under different unit systems.
# Import the required package(s)
import scipy
from scipy import constants

# SI prefixes specified in the power of 10
print(constants.mega)
print(constants.kilo)

# Time units specified in seconds
print(constants.minute)
print(constants.day)
Lines 2–3: We import the required libraries.
Lines 6–7: We print the values of metric prefixes specified in powers of 10.
Lines 10–11: We print the values of time units specified in seconds.
To view the complete list of supported constants and units, try the dir() function as follows:
# Import the required package(s)
import scipy
from scipy import constants

print(dir(constants))
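Because the prefixes and time units are plain numbers, they can be used directly as conversion factors. The sketch below assumes a made-up measurement (5 km covered in 90 minutes) purely for illustration; the variable names are not part of SciPy.

```python
from scipy import constants

# Hypothetical measurements chosen only for illustration
distance_km = 5
duration_min = 90

distance_m = distance_km * constants.kilo      # kilometers -> meters
duration_s = duration_min * constants.minute   # minutes -> seconds
average_speed = distance_m / duration_s        # meters per second

print(distance_m)
print(duration_s)
print(average_speed)
```

Multiplying by a prefix or unit converts a value into the base SI unit, which keeps later calculations consistent.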
Linear algebra is an adapter that connects mathematics and science to solve real-world problems. Many problems boil down to manipulating systems of linear equations. You can present almost any data in the form of matrices or vectors.
The scipy.linalg subpackage provides advanced linear algebra routines and matrix decompositions. For example, you can solve a system of linear equations as follows.
# Import the required package(s)
import numpy as np
import scipy
from scipy import linalg

# Write equations in the form of matrices
A = np.array([[1,2], [4,3]])
B = np.array([1, 2])
print(linalg.solve(A, B)) # Solve the equations for 2 unknown variables
Lines 2–4: We import the required libraries.
Lines 7–8: We store the system of equations x + 2y = 1 and 4x + 3y = 2 as a coefficient matrix A and a right-hand-side vector B using the numpy.array functionality.
Line 9: The linalg.solve() function accepts the matrices and returns an array of the unknown variables, which are 0.2 and 0.4 in this case.
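One way to sanity-check a solution is to substitute it back into the equations. The following sketch reuses the same A and B and checks that A multiplied by the solution reproduces B; the names solution and residual are illustrative.

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 2], [4, 3]])
B = np.array([1, 2])

solution = linalg.solve(A, B)

# Substitute the solution back: A @ solution should reproduce B
residual = A @ solution - B
print(np.allclose(A @ solution, B))
```

A residual close to zero indicates that the computed values satisfy both equations up to floating-point precision.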
Calculating the determinant is one of the prime operations done on a matrix. We can compute the determinant as follows.
# Import the required package(s)
import numpy as np
import scipy
from scipy import linalg

A = np.array([[1,2], [4,3]])
print(linalg.det(A)) # Compute the determinant of matrix
Lines 2–4: We import the required libraries.
Line 7: The linalg.det() function takes the square matrix (created on line 6) and returns its determinant.
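For a 2×2 matrix with rows (a, b) and (c, d), the determinant is ad − bc, so the result can be cross-checked by hand. The sketch below compares that formula with linalg.det() for the same matrix; the names manual_det and scipy_det are illustrative.

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 2], [4, 3]])

# Unpack the four entries of the 2x2 matrix
(a, b), (c, d) = A
manual_det = a * d - b * c      # 1*3 - 2*4

scipy_det = linalg.det(A)
print(manual_det)
print(np.isclose(scipy_det, manual_det))
```

A nonzero determinant also tells us that the matrix has an inverse.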
Computing the inverse of a matrix on a piece of paper is a lengthy process with multiple steps. But with the scipy.linalg subpackage, we can get the result in one step as follows.
# Import the required package(s)
import numpy as np
import scipy
from scipy import linalg

A = np.array([[1,2], [4,3]])
print(linalg.inv(A)) # Compute the inverse of a matrix
Lines 2–4: We import the required libraries.
Line 7: The linalg.inv() function takes the matrix (created on line 6) and returns its inverse.
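A matrix multiplied by its inverse should give the identity matrix, which offers a simple check on the result. This is a minimal sketch using the same matrix; identity_check is an illustrative name.

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 2], [4, 3]])
A_inv = linalg.inv(A)

# A @ A_inv should equal the 2x2 identity matrix (up to rounding)
identity_check = A @ A_inv
print(np.allclose(identity_check, np.eye(2)))
```

The comparison uses np.allclose() rather than exact equality because floating-point arithmetic can leave tiny rounding errors.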
Optimizing means finding the best solution to a problem according to certain criteria. In real-world scenarios, we want to minimize costs and maximize profits. For instance, an engineer might optimize a design to minimize material usage while maintaining strength.
Imagine a function as a landscape with hills and valleys. Optimizing the function is like finding the highest peak (maximum) or the lowest valley (minimum) in that landscape. This point (whether maximum or minimum) represents the optimal solution to your problem.
With the scipy.optimize subpackage, you can minimize or maximize the objective function.
# Import the required package(s)
from scipy import optimize

# Objective function
def func(x):
    return (x*x)+x+2

result = optimize.minimize(func, 0) # Optimize the objective function
print(result.x)
print(result.fun)
Line 2: We import the required libraries.
Lines 5–6: We define an objective quadratic function.
Line 8: The optimize.minimize() function takes:
The function whose minimum value we’re seeking
An initial guess, which is 0 in this case
The output shows that for x equal to -0.5, the minimum value of the function is 1.75.
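The example above only minimizes. A common way to maximize a function with the same tool is to minimize its negative. The sketch below assumes a hypothetical profit() function that peaks at x = 3; the function and variable names are illustrative and not part of SciPy.

```python
from scipy import optimize

# Hypothetical function to maximize; its peak is at x = 3 with value 10
def profit(x):
    return -(x - 3) ** 2 + 10

# minimize() passes x as a 1-element array, so we read x[0]
def negative_profit(x):
    return -profit(x[0])

result = optimize.minimize(negative_profit, 0)

best_x = result.x[0]
best_value = -result.fun   # Flip the sign back to get the maximum

print(best_x)
print(best_value)
```

The minimizer of the negated function is the maximizer of the original one, so only the sign of the reported value needs to be flipped.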
In calculus, integration means finding the area under a curve. In scientific computing, integration can be used to compute a function’s total accumulated value over an interval. Imagine velocity as a function of time. Integration of that function gives you the total distance traveled over that time.
Here’s an analogy: Imagine rain falling at a certain rate throughout the day. Integration helps you calculate the total amount of rainwater collected (accumulation) over that entire day.
With scipy.integrate, you can perform single integration as follows.
# Import the required package(s)
from scipy import integrate

def integrand(x):
    return x**2

integral, error = integrate.quad(integrand, 0, 1) # Integrate the function
print(integral)
print(error)
Line 2: We import the required libraries.
Lines 4–5: We declare the integrand() function, which represents the mathematical function of one variable we want to integrate.
Line 7: The integrate.quad() function is used to perform the integration of the function between the limits 0 and 1. It returns the estimated value of the integral and an estimate of the absolute error.
For integrating functions of two or three variables, use dblquad() or tplquad().
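As a rough illustration of double integration, the sketch below integrates the hypothetical function f(x, y) = x·y over x from 0 to 1 and y from 0 to 2, whose exact value is 1. Note that dblquad() expects the integrand to take y as its first argument and x as its second, and the inner limits are passed here as functions of x.

```python
from scipy import integrate

# Integrand for dblquad: the inner variable (y) comes first
def integrand_2d(y, x):
    return x * y

# Outer limits: x from 0 to 1; inner limits: y from 0 to 2
integral_2d, error_2d = integrate.dblquad(
    integrand_2d, 0, 1, lambda x: 0, lambda x: 2
)

print(integral_2d)
print(error_2d)
```

As with quad(), the second returned value is an estimate of the absolute error.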
SciPy’s special package provides several utility functions that complement the core NumPy operations, such as computing factorial, combinations, and permutations. Look at the code below.
# Import the required package(s)
import scipy
from scipy import special

# Compute factorial of a positive integer
print(special.factorial(5))

n = 10.4
x = 5

# Calculate the number of combinations
print(special.comb(int(n), int(x)))

# Calculate the number of permutations
print(special.perm(int(n), int(x)))
Lines 2–3: We import the required libraries.
Line 6: The special.factorial() function takes a positive integer as an argument and returns its factorial.
Line 12: The special.comb() function takes two arguments (say n and x) and calculates the number of combinations by choosing x elements from a set of n.
Line 15: The special.perm() function takes two arguments (say n and x) and calculates the total number of arrangements that can be performed with n elements taken x at a time.
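The two counts are linked to factorials: combinations are n! / (k! (n − k)!) and permutations are n! / (n − k)!. The sketch below checks this for n = 10 and k = 5 using the exact=True option, which returns integers instead of floats; the manual_ names are illustrative.

```python
import math
from scipy import special

n, k = 10, 5

# exact=True returns Python integers rather than floating-point values
combinations = special.comb(n, k, exact=True)
permutations = special.perm(n, k, exact=True)

# The same counts from the factorial formulas
manual_combinations = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
manual_permutations = math.factorial(n) // math.factorial(n - k)

print(combinations, manual_combinations)
print(permutations, manual_permutations)
```

Permutations outnumber combinations by a factor of k!, because each selection of k elements can be arranged in k! different orders.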
We can also apply trigonometric operations and basic mathematical functions. Look at the code below.
# Import the required package(s)
import scipy
from scipy import special

print(special.exp10(3)) # Compute 10 raised to the power of 3

print(special.exp2(3)) # Compute 2 raised to the power of 3

print(special.cbrt(8.5)) # Calculate the cube root of a number

print(special.sindg(90)) # Calculate sine of an angle provided in degrees

print(special.cosdg(45)) # Calculate cosine of an angle provided in degrees
Lines 2–3: We import the required libraries.
Line 5: The special.exp10() function raises 10 to the power of the number passed as an argument and returns the result.
Line 7: The special.exp2() function raises 2 to the power of the number passed as an argument and returns the result.
Line 9: The special.cbrt() function calculates the cube root of a number passed as an argument and returns the result.
Line 11: The special.sindg() function calculates the sine of an angle provided (in degrees) as an argument and returns a scalar value.
Line 13: The special.cosdg() function calculates the cosine of an angle provided (in degrees) as an argument and returns a scalar value.
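To see what these helpers compute, it can help to compare them with equivalent plain Python and NumPy expressions. The sketch below is only a cross-check with illustrative input values and variable names.

```python
import numpy as np
from scipy import special

power_of_ten = special.exp10(3)   # 10 ** 3
power_of_two = special.exp2(3)    # 2 ** 3
cube_root = special.cbrt(27)      # 27 ** (1/3)
sine_30 = special.sindg(30)       # sine of 30 degrees

# Compare each result with an equivalent expression
print(np.isclose(power_of_ten, 10.0 ** 3))
print(np.isclose(power_of_two, 2.0 ** 3))
print(np.isclose(cube_root, 27 ** (1 / 3)))
print(np.isclose(sine_30, np.sin(np.deg2rad(30))))
```

The degree-based functions save the manual conversion to radians that NumPy's trigonometric functions require.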
For comprehensive statistical functionalities, visit the dedicated scipy.stats subpackage in the official documentation.
Interpolation means bridging the gap between known data points by providing estimates for unknown values in between. Imagine tracking stock prices over time. By interpolating, we can estimate potential price movements between recorded points.
With the scipy.interpolate subpackage, you can do 1D (linear) interpolation as follows.
# Import the required package(s)
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

# Define the known data points
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([0, 1, 4, 9, 16, 25])

# Create the interpolation function
f_linear = interpolate.interp1d(x, y) # Linear interpolation

# Estimate the values
x_new = np.linspace(0, 5, 50) # Defining x values
y_linear = f_linear(x_new) # Estimate y values using the interpolation function

# Plot the results
plt.plot(x, y, 'o', label='Data points')
plt.plot(x_new, y_linear, '-', label='Linear interpolation')
plt.legend()
plt.xlabel('x')
plt.ylabel('y')
plt.title('Interpolation')
plt.savefig('output/graph.png')
Lines 2–4: We import the required libraries.
Lines 7–8: We define two arrays for known data points. For example, for x = 2, the known value of y is 4.
Line 11: The interpolate.interp1d() function takes both arrays and returns a function that estimates y values based on the provided data points in x and y.
Line 14: We generate 50 new values along the x-axis (x_new) between 0 and 5 (inclusive). These new values represent where we want to estimate the y-axis values.
Line 15: We apply the linear interpolation function, f_linear, to the new values in x_new. This estimates the corresponding y values using linear interpolation between the original data points.
Lines 18–24: We plot the interpolation through the pyplot package.
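Linear interpolation connects neighboring points with straight lines, so its estimates can drift from a curved trend. The sketch below evaluates a single point, x = 2.5, and compares the linear estimate with a cubic one obtained through the kind='cubic' option of interp1d(). It assumes the data follow y = x², which holds for the known points above; the variable names are illustrative.

```python
import numpy as np
from scipy import interpolate

x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([0, 1, 4, 9, 16, 25])   # y = x**2 at the known points

f_linear = interpolate.interp1d(x, y)                 # Straight lines between points
f_cubic = interpolate.interp1d(x, y, kind='cubic')    # Cubic spline through the points

x_query = 2.5
true_value = x_query ** 2

linear_estimate = float(f_linear(x_query))
cubic_estimate = float(f_cubic(x_query))

# Print each estimate with its absolute error
print(linear_estimate, abs(linear_estimate - true_value))
print(cubic_estimate, abs(cubic_estimate - true_value))
```

The linear estimate is the midpoint between 4 and 9, while the cubic estimate follows the curvature of the data more closely.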
Clustering means dividing the population (or data points) into groups such that the data points in one group are more similar to each other than to those in other groups. A group is also known as a cluster.
Imagine a retail company trying to understand its customers better to tailor marketing strategies and improve sales. One way is to segment customers based on purchasing behavior and demographics. In simple words, clustering helps businesses make data-driven decisions.
One of the most commonly used techniques in scipy is hierarchical clustering.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

np.random.seed(42)
data = np.random.rand(10, 2)

distance_matrix = pdist(data, 'euclidean')
Z = linkage(distance_matrix, 'ward')

threshold = 0.4
clusters = fcluster(Z, threshold, criterion='distance')

# Print cluster labels
print("Cluster labels:", clusters)

# Plot the clustered data
plt.figure(figsize=(10, 7))
plt.scatter(data[:, 0], data[:, 1], c=clusters, cmap='prism')
plt.title('Data points and their cluster assignments')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.savefig('output/graph.png')
Lines 1–4: We import the required modules.
Lines 6–7: On line 6, we set a seed for the random number generator. 42 is an arbitrary number; any integer can be used. The key point is that the same seed always produces the same sequence of random numbers. This is useful for debugging. Line 7 generates a 2D array of random numbers. The rand() function generates random numbers over the interval [0, 1). The arguments specify the array shape, i.e., 10 rows and 2 columns.
Line 9: We calculate the distances between each pair of data points in the dataset using the pdist function. The Euclidean distance is one of the most common distance metrics, representing the straight-line distance between two points.
Line 10: The linkage function performs hierarchical clustering on the distance matrix using Ward’s method. The output Z is a linkage matrix that contains information about which clusters were merged and at what distance. This information can be used to decide on the final number of clusters.
Lines 12–13: The threshold defines the maximum distance between clusters that will be merged. Clusters formed by merging nodes at distances greater than this threshold will be treated as separate clusters. The fcluster() function assigns cluster labels to each observation based on the linkage matrix Z and the specified criterion, i.e., distance threshold.
Line 16: We print the cluster labels for each data point, showing which cluster each point belongs to.
Lines 19–24: We plot the clustering through the pyplot package.
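Instead of a distance threshold, fcluster() can also be asked for an upper limit on the number of clusters through criterion='maxclust'. The sketch below reuses the same random data and assumes a hypothetical target of at most 3 clusters; max_clusters and labels are illustrative names.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

np.random.seed(42)
data = np.random.rand(10, 2)

# Build the linkage matrix with Ward's method, as before
Z = linkage(pdist(data, 'euclidean'), 'ward')

# Hypothetical target: form no more than 3 clusters
max_clusters = 3
labels = fcluster(Z, max_clusters, criterion='maxclust')

print("Cluster labels:", labels)
print("Number of clusters:", len(np.unique(labels)))
```

This form is convenient when the number of groups is known in advance, such as a fixed number of customer segments.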
If you want to learn more about SciPy, check the official documentation.