An overview of predictive modeling

Much of your time as a data scientist is likely to be spent wrangling data: figuring out how to get it, examining it, making sure it’s correct and complete, and joining it with other types of data. The pandas is a widely used tool for data analysis in Python, and it can facilitate the data exploration process for you, as we will see in this chapter. However, one of the key goals of this course is to start you on your journey to becoming a machine learning data scientist, for which you will need to master the art and science of predictive modeling. This means using a mathematical model, or idealized mathematical formulation, to learn relationships within the data, in the hope of making ...

Introduction

Data Exploration and Cleaning

(Challenge) Exploring Remaining Financial Features in Dataset

Introduction to scikit-learn and Model Evaluation

Fake News Detection Using Scikit-learn

(Challenge) Logistic Regression and Precision-Recall Curve

Details of Logistic Regression and Feature Extraction

(Challenge) Logistic Regression Model and Coefficients

The Bias-Variance Trade-Off

(Challenge) Cross-Validation and Feature Engineering

Decision Trees and Random Forests

(Challenge) Cross-Validation Grid Search with Random Forest

Gradient Boosting, XGBoost, and SHAP Values

(Challenge) XGBoost and SHAP Explanation for Case Study Data

Predict Frog Toxicity with Python and XGBoost

Test Set Analysis, Financial Insights, and Delivery to the Client

(Challenge) Deriving Financial Insights

Appendix

Different Types of Data Science Problems

An overview of predictive modeling