Gain insights into data science with easy-to-follow, hands-on explanations. Explore essential concepts quickly and efficiently, even without prior statistics knowledge, for a career boost.

Master the skills that can get you a $100K+ salary even if you bunked your statistics classes. 

No need to waste hours browsing from one article to the next, piecing together the information you need to grasp important topics. No need to get overwhelmed by information overload. Find easy-to-follow, hands-on, and fun explanations of all the essential topics in one place so you can quickly and efficiently learn what you need to thrive as a data scientist.

"Is this course right for me?" Read on to decide for yourself!

- "I want to understand this data science concept. Let me Google it." Then, after hours of surfing, reading random articles, and invoking the heavens, you are more confused than before.
- "Data science is the sexiest and highest-paying job of the 21st century. I want to become a data scientist too."
- "I have a basic knowledge of Python, willingness to learn, and commitment to become a great data scientist."

Is that you? If yes, you are at the right place.

Grokking Data Science

# Basic Concepts
The first step in analyzing data is to get familiar with it. Our good old NumPy provides a lot of methods that can help us do this easily. We are going to look at some of these methods in this lesson. Along the way, we are going to understand the meaning of important statistical terms as we encounter them. 

The most basic yet powerful terms that you could come across are the **mean**, **mode**, **median**, **standard deviation**, and **correlation coefficient**. Let's understand these with an example dataset and using NumPy.

Say we have a dataset consisting of students' exam scores and the time they invested in studying for the exam. What can we learn about this data using statistics?

Run the code below and try to understand what's happening before reading the description that follows.



```python
import numpy as np

# The dataset
learning_hours = [1, 2, 6, 4, 10]
scores = [3, 4, 6, 5, 6]

# Applying some stats methods to understand the data:
print("Mean learning time: ", np.mean(learning_hours))
print("Mean score: ", np.mean(scores))
print("Median learning time: ", np.median(learning_hours))
print("Standard deviation: ", np.std(learning_hours))
print("Correlation between learning hours and scores:", np.corrcoef(learning_hours, scores))
```
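Two things in the snippet above are easy to miss: `np.corrcoef()` returns a 2×2 correlation matrix rather than a single number, and the mode is not printed at all, because NumPy has no dedicated mode function. Here is a minimal sketch of how one might get both values, assuming the same two lists. Counting with `np.unique` is just one possible way to find the mode, not something this lesson prescribes.

```python
import numpy as np

learning_hours = [1, 2, 6, 4, 10]
scores = [3, 4, 6, 5, 6]

# np.corrcoef returns a 2x2 matrix; the off-diagonal entries hold the
# correlation between the two inputs (the diagonal entries are 1).
corr_matrix = np.corrcoef(learning_hours, scores)
r = corr_matrix[0, 1]
print("Correlation coefficient:", round(r, 2))  # roughly 0.89

# Mode: the most frequent value. np.unique returns the sorted unique
# values together with how often each one occurs.
values, counts = np.unique(scores, return_counts=True)
mode_score = values[np.argmax(counts)]
print("Mode of scores:", mode_score)  # 6
```

If several values tie for the highest count, `np.argmax` picks the smallest of them, since `np.unique` returns its values in sorted order.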

## Mean
The mean value is the **average of a data set**: the sum of the elements divided by the number of elements. As the name says, `np.mean()` returns the arithmetic mean of the dataset.
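To make the definition concrete, here is a small sketch that computes the mean by hand and compares it with `np.mean()`. The variable names `manual_mean` and `numpy_mean` are only illustrative.

```python
import numpy as np

learning_hours = [1, 2, 6, 4, 10]

# Sum of the elements divided by the number of elements
manual_mean = sum(learning_hours) / len(learning_hours)
numpy_mean = np.mean(learning_hours)

print("Manual mean:", manual_mean)  # 4.6
print("NumPy mean: ", numpy_mean)
```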

## Median
The median is the **middle element** of the set of numbers. If the length of the array is odd, `np.median()` gives us the middle value of a sorted copy of the array. If the length of the array is even, we get the average of the two middle numbers.
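A quick sketch of both cases. The odd-length list is the lesson's `learning_hours`; the even-length list adds a sixth value (8) that is purely hypothetical and not part of the original dataset.

```python
import numpy as np

# Odd length: sorted copy is [1, 2, 4, 6, 10], so the middle value is 4
learning_hours = [1, 2, 6, 4, 10]
odd_median = np.median(learning_hours)
print("Median (odd length): ", odd_median)  # 4.0

# Even length (hypothetical extra value 8): sorted copy is
# [1, 2, 4, 6, 8, 10], so the median is the average of 4 and 6
hours_even = [1, 2, 6, 4, 10, 8]
even_median = np.median(hours_even)
print("Median (even length):", even_median)  # 5.0
```

Note that `np.median()` does not modify the input; it works on a sorted copy.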

## Standard Deviation
Standard deviation is a **measure of how much the data is spread out**, and is returned by the `np.std()` method. More specifically, it shows how far our data is spread out around the mean. Standard deviation can answer questions such as *"Are all the scores close to the average?"* or *"Are lots of scores way above or way below the average score?"* It gives us a *standard* way of knowing what is normal and what is unusually high or low.

In mathematical terms, standard deviation is the square root of the **variance**. So now you ask, *"What is variance?"* Variance is defined as the average of the squared differences from the mean. Let me break this down for you.

To calculate the variance manually we would follow these steps:

1. Compute the mean (the simple average of the numbers).
2. Then, for each number, subtract the mean and square the result, i.e., compute the squared difference.
3. Then compute the average of those squared differences.
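Written as formulas, the three steps look roughly like this. The symbols are introduced here only for illustration: $x_1, \dots, x_N$ are the data values, $\mu$ is their mean, $\sigma^2$ is the variance, and $\sigma$ is the standard deviation.

$$
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2,
\qquad
\sigma = \sqrt{\sigma^2}
$$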

Let's calculate the standard deviation for learning hours manually. First let's get the mean value:
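A minimal sketch of that manual calculation, following the three steps listed earlier. The variable names are illustrative, and the result is compared against `np.std()`, which by default also divides by the number of elements (the population standard deviation).

```python
import numpy as np

learning_hours = [1, 2, 6, 4, 10]

# Step 1: the mean
mean = sum(learning_hours) / len(learning_hours)
print("Mean:", mean)  # 4.6

# Step 2: squared difference from the mean for each value
squared_diffs = [(x - mean) ** 2 for x in learning_hours]

# Step 3: the variance is the average of the squared differences
variance = sum(squared_diffs) / len(squared_diffs)
print("Variance:", round(variance, 2))  # about 10.24

# The standard deviation is the square root of the variance
std_dev = variance ** 0.5
print("Standard deviation:", round(std_dev, 2))  # about 3.2
print("np.std agrees:", np.isclose(std_dev, np.std(learning_hours)))
```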


