scikit-learn

Learn about scikit-learn and how it simplifies training and prediction.

While pandas will save you a lot of time loading, examining, and cleaning data, the machine learning algorithms that will enable you to do predictive modeling are located in other packages.

Importance of scikit-learn in predictive modeling

Scikit-learn is a foundational machine learning package for Python that contains many useful algorithms and has also influenced the design and syntax of other machine learning libraries in Python. For this reason, we focus on scikit-learn to develop skills in the practice of predictive modeling. While it’s impossible for any one package to offer everything, scikit-learn comes pretty close in terms of accommodating a wide range of classic approaches for classification, regression, and unsupervised learning. However, it does not offer much functionality for some more recent advancements, such as deep learning.

Here are a few other related packages you should be aware of:

SciPy:

  • Most of the packages we’ve used so far, such as NumPy and pandas, are actually part of the SciPy ecosystem.

  • SciPy offers lightweight functions for classic methods such as linear regression and linear programming.
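To make the SciPy point concrete, here is a minimal sketch of both methods mentioned above, using `scipy.stats.linregress` for a one-variable linear regression and `scipy.optimize.linprog` for a small linear program. The data and the linear program are made up purely for illustration.

```python
import numpy as np
from scipy import optimize, stats

# Illustrative data only: y is exactly 2x + 1, so the fit should recover that line
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Simple linear regression of y on x
fit = stats.linregress(x, y)
slope, intercept = fit.slope, fit.intercept

# Small linear program (hypothetical): maximize a + 2b subject to a + b <= 4, a >= 0, b >= 0.
# linprog minimizes, so the objective coefficients are negated.
lp = optimize.linprog(
    c=[-1.0, -2.0],
    A_ub=[[1.0, 1.0]],
    b_ub=[4.0],
    bounds=[(0, None), (0, None)],
)
best_value = -lp.fun  # undo the sign flip to get the maximized value
```

These functions take arrays in and return a result object directly, with no model object to configure, which is what makes them lightweight.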

StatsModels:

  • More oriented toward statistics, and may feel more familiar to users coming from R.

  • Provides p-values and confidence intervals for regression coefficients.

  • Supports time series models such as ARIMA.

XGBoost and LightGBM:

  • Packages that offer gradient-boosted decision tree models.

TensorFlow, Keras, and PyTorch:

  • Packages that offer deep learning capabilities.

There are many other Python packages that may come in handy, but this gives you an idea of what’s out there.

Simplifying training and prediction with scikit-learn

Scikit-learn offers a wealth of different models for various tasks, but, conveniently, the syntax for using them is consistent. In this lesson, we will illustrate model syntax using a logistic regression model. Logistic regression, despite its name, is actually a classification model. This is one of the simplest, and therefore most important, classification models. In the ...
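As a minimal sketch of that consistent syntax (not necessarily the example this lesson goes on to use), the pattern is to create a model object, call `fit` on the training data, and then call `predict` on new data. The tiny dataset below is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data (illustrative): one feature, binary labels that switch from 0 to 1
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Step 1: instantiate the model (default hyperparameters here)
model = LogisticRegression()

# Step 2: train the model on features and labels
model.fit(X_train, y_train)

# Step 3: predict class labels, or class probabilities, for new samples
X_new = np.array([[0.5], [4.5]])
predictions = model.predict(X_new)
probabilities = model.predict_proba(X_new)  # one column per class
```

Swapping in a different scikit-learn classifier generally means changing only the import and the line that instantiates the model; the `fit` and `predict` calls stay the same.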
