scikit-learn
Learn about scikit-learn and how it simplifies training and prediction.
While pandas will save you a lot of time loading, examining, and cleaning data, the machine learning algorithms that will enable you to do predictive modeling are located in other packages.
Importance of scikit-learn in predictive modeling
Scikit-learn is a foundational machine learning package for Python that contains many useful algorithms and has also influenced the design and syntax of other machine learning libraries in Python. For this reason, we focus on scikit-learn to develop skills in the practice of predictive modeling. While it’s impossible for any one package to offer everything, scikit-learn comes pretty close in terms of accommodating a wide range of classic approaches for classification, regression, and unsupervised learning. However, it does not offer much functionality for some more recent advancements, such as deep learning.
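Much of what makes scikit-learn simplify training and prediction is its uniform estimator interface. You construct a model with its hyperparameters, call `.fit()` on training data, then call `.predict()` (or `.score()`) on new data. The minimal sketch below shows that same pattern for classification, regression, and unsupervised learning. It assumes synthetic data from `make_classification`, `make_regression`, and `make_blobs`, and a particular choice of models. These are illustrative only and do not come from the original lesson.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Classification: fit on a training split, predict on a held-out test split
X_c, y_c = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X_c, y_c, test_size=0.25, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # fit() returns the estimator
class_preds = clf.predict(X_test)
print("Classification accuracy:", clf.score(X_test, y_test))

# Regression: same construct / fit / score pattern, different estimator
X_r, y_r = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
reg = LinearRegression().fit(X_r, y_r)
print("Regression R^2 (training data):", reg.score(X_r, y_r))

# Unsupervised learning: fit() receives features only, no labels
X_u, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_u)
cluster_labels = km.predict(X_u)
print("Cluster sizes:", np.bincount(cluster_labels))
```

Because every estimator exposes the same methods, you can swap `LogisticRegression` for another classifier by changing a single line.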
Here are a few other related packages you should be aware of:
SciPy:
- Most of the packages we’ve used so far, such as NumPy and pandas, are actually part of the SciPy ecosystem.
- SciPy offers lightweight functions for classic methods such as linear regression and linear programming.
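The sketch below illustrates two of those lightweight functions: `scipy.stats.linregress` for simple linear regression and `scipy.optimize.linprog` for linear programming. The simulated data and the small LP problem are hypothetical examples chosen for illustration. Note that `linprog` minimizes its objective, so a maximization problem is written by negating the coefficients.

```python
import numpy as np
from scipy import stats
from scipy.optimize import linprog

# Simple linear regression on simulated data (true slope 2, intercept 1)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
fit = stats.linregress(x, y)
print(f"slope={fit.slope:.3f}, intercept={fit.intercept:.3f}, r={fit.rvalue:.3f}")

# Linear programming (hypothetical problem):
#   maximize 3*a + 2*b  subject to  a + b <= 4,  a + 3*b <= 6,  a, b >= 0
# linprog minimizes c @ x, so we pass the negated objective coefficients.
c = [-3, -2]
A_ub = [[1, 1], [1, 3]]
b_ub = [4, 6]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("optimal (a, b):", res.x, "max objective:", -res.fun)
```

For this problem the optimum lies at the vertex a = 4, b = 0, which gives an objective value of 12.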
StatsModels:
- More oriented toward statistics and possibly more comfortable for users familiar with R.
- Provides p-values and confidence intervals for regression coefficients.
- Supports time series models such as ARIMA.
XGBoost and LightGBM:
- Offer a suite of state-of-the-art ensemble models that often outperform random forests. We will learn about XGBoost in Gradient Boosting, SHAP Values, and Dealing with Missing Data.
TensorFlow, Keras, and PyTorch:
- Packages that offer deep learning capabilities.