Statistical Features - Basics
Basic Concepts
The first step in analyzing data is to get familiar with it. Our good old NumPy provides a lot of methods that can help us do this easily. We are going to look at some of these methods in this lesson. Along the way, we are going to understand the meaning of important statistical terms as we encounter them.
The most basic yet powerful terms that you could come across are the mean, mode, median, standard deviation, and correlation coefficient. Let’s understand these with an example dataset and using NumPy.
Say we have a dataset consisting of students’ exam scores and the time they invested in studying for the exam. What can we learn about this data using statistics?
Run the code in the widget below and try to understand what’s happening before reading the description that follows.
import numpy as np

# The dataset
learning_hours = [1, 2, 6, 4, 10]
scores = [3, 4, 6, 5, 6]

# Applying some stats methods to understand the data:
print("Mean learning time: ", np.mean(learning_hours))
print("Mean score: ", np.mean(scores))
print("Median learning time: ", np.median(learning_hours))
print("Standard deviation: ", np.std(learning_hours))
print("Correlation between learning hours and scores:", np.corrcoef(learning_hours, scores))
Mean
The mean is the average of a dataset: the sum of the elements divided by the number of elements. As the name suggests, np.mean() returns the arithmetic mean of the dataset. ...
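To make the definition concrete, here is a minimal sketch that computes the mean of the learning hours by hand and compares it with np.mean(). The variable names manual_mean and numpy_mean are only illustrative and are not part of the lesson's widget.

```python
import numpy as np

# Same learning-hours data as in the lesson's example
learning_hours = [1, 2, 6, 4, 10]

# Mean by hand: sum of the elements divided by the number of elements
manual_mean = sum(learning_hours) / len(learning_hours)  # 23 / 5

# The same quantity computed by NumPy
numpy_mean = np.mean(learning_hours)

print("Manual mean:", manual_mean)  # 4.6
print("NumPy mean: ", numpy_mean)   # 4.6
```

Both approaches give 4.6 hours, so np.mean() is simply a convenient shortcut for the sum-and-divide calculation.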