Gain insights into performing predictive data analysis using Python. Learn about tools like NumPy, Pandas, Matplotlib, and Seaborn, and apply these techniques to real-world financial and advertising projects.

In this course, you will learn how to perform predictive data analysis using Python. It is aimed at those who want to start a career as a data analyst. The main goal of this course is to show you how to use statistics to draw useful insights from data that can help predict future behavior or patterns.

Beyond that, you’ll learn the tools of the trade that data scientists use every day, including NumPy, Pandas, Matplotlib, and Seaborn. You’ll learn not only how to extract meaningful insights from data but also how to create clear visualizations that you can use in reports.

Each lesson uses datasets from real-world scenarios to get you accustomed to handling different types of data. At the end of the course, you will work on two real-world projects that demonstrate how data analysis techniques are used in the financial and advertising sectors to generate revenue.

Predictive Data Analysis with Python

# Introduction to statistical features

Features that provide numerical information about the given data are known as statistical features. They help to extensively explore the nature and properties of data. The following are some features that will be discussed here:

* **Mean/Median**

* **Standard deviation (STD)**

* **Quantiles**

* **Skewness**

These properties provide information that helps in examining the data, drawing inferences from it, and making predictions. They can only be computed for the quantitative (numerical) parts of the data.

# Mean/Median

* __Mean__: This is the average of the dataset, computed by dividing the sum of the values by the number of values.

* __Median__: This is the middle value of the dataset. The data needs to be sorted first to get this measure. When the dataset has an even number of values, the median is the average of the two middle values.

In statistics, the median is often preferred over the mean when a dataset contains outliers, because exceptionally small or large values can pull the mean away from the typical value. The median is far less affected by such values, so it gives a more robust estimate of the center of the dataset.
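The effect of an outlier on the two measures can be illustrated with a small sketch. The numbers below are made up for illustration, and the example uses Python's standard `statistics` module so that it runs without extra packages; NumPy's `numpy.mean` and `numpy.median` compute the same measures.

```python
import statistics

# Hypothetical values (for illustration only)
values = [30, 32, 35, 38, 40]

mean_before = statistics.mean(values)      # (30 + 32 + 35 + 38 + 40) / 5 = 35
median_before = statistics.median(values)  # middle of the sorted values = 35

# Add one exceptionally large value (an outlier)
with_outlier = values + [500]

mean_after = statistics.mean(with_outlier)      # 675 / 6 = 112.5
median_after = statistics.median(with_outlier)  # average of 35 and 38 = 36.5

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

A single outlier moves the mean from 35 to 112.5, while the median only shifts from 35 to 36.5.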


# Standard deviation (STD)

STD stands for standard deviation. This measure informs us how far the values of a dataset are dispersed from their mean value.

A low __STD__ value means that the data points are close to the mean, and a high __STD__ value means that the data points are widely spread out and lie far from the mean. The square of the __STD__ is the variance of the data.
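As a minimal sketch of these ideas, the example below compares two made-up datasets that share the same mean but have different spreads. It uses the population formulas from Python's standard `statistics` module (`pstdev` and `pvariance`); this choice is an assumption, since the text does not say whether the population or the sample version is meant.

```python
import statistics

# Two hypothetical datasets with the same mean (5) but different spread
tight = [4, 5, 5, 5, 5, 5, 5, 6]
spread = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: how far values lie from the mean on average
std_tight = statistics.pstdev(tight)    # 0.5
std_spread = statistics.pstdev(spread)  # 2.0

# Population variance: the square of the standard deviation
var_tight = statistics.pvariance(tight)    # 0.25
var_spread = statistics.pvariance(spread)  # 4

print("tight: ", std_tight, var_tight)
print("spread:", std_spread, var_spread)
```

Note that libraries differ in their defaults: `numpy.std` computes the population standard deviation (`ddof=0`), while the pandas `std()` method computes the sample standard deviation (`ddof=1`), so the two can return slightly different numbers for the same data.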


# Quantiles

A quantile is a statistical measure that divides the data into equal parts. __The most common type of quantile is the quartile__, which divides the data into __four__ equal parts.

Three lines are drawn through the sorted data to make this division. Each of these lines falls on a specific value in the dataset, as explained below.

* The value that the first line hits is called the ***1st quartile*** and is denoted by __Q1__. __25%__ of the data lies below this point, and __75%__ lies above it. This value is the middle value between the smallest value of the dataset and the median, that is, the median of the lower half of the data.

* The value that the second line hits is called the ***2nd quartile*** and is denoted by __Q2__. **50%** of the data lies below this point, and **50%** lies above it. This value is the median of the dataset.

* The value that the third line hits is called the ***3rd quartile*** and is denoted by __Q3__. **75%** of the data lies below this point, and **25%** lies above it. This value is the middle value between the median and the largest value of the dataset, that is, the median of the upper half of the data.

The following table summarizes this information:


| Symbol | Name | Definition |
| --- | --- | --- |
| Q1 | First quartile | Splits off the lowest 25% of the data from the highest 75% |
| Q2 | Second quartile | Splits the dataset in half |
| Q3 | Third quartile | Splits off the highest 25% of the data from the lowest 75% |
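The three quartiles can be computed with a short sketch like the one below. The data is made up for illustration, and the example uses `statistics.quantiles` from the standard library (Python 3.8 or later) with `method="inclusive"`. That method is an assumption here: different quartile conventions exist and can give slightly different values on small datasets.

```python
import statistics

# Hypothetical data; quantiles() sorts the values internally
data = [7, 1, 9, 3, 5, 2, 8, 4, 6]

# n=4 asks for the three cut points that divide the data into four parts
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print("Q1:", q1)  # 3.0
print("Q2:", q2)  # 5.0, the same as the median
print("Q3:", q3)  # 7.0
```

For this dataset, Q2 matches `statistics.median(data)`, and Q1 and Q3 sit halfway between the median and the smallest and largest values, respectively. With NumPy, `numpy.percentile(data, [25, 50, 75])` is a comparable call.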


In this lesson, various statistical features are discussed.
