Equivalence to the t-test for two classes and cautions

When we use an F-test to look at the difference in means between just two groups, as we’ve done for the binary classification problem of the case study, the test we are performing actually reduces to what’s called a t-test. A t-test compares the means of two groups of samples to see whether the difference in those means is statistically significant. With two groups, the F statistic is the square of the t statistic from the equal-variance t-test, and the two tests give the same p-value. Unlike the t-test, the F-test extends to three or more groups and so is also useful for multiclass classification.
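To see this equivalence numerically, here is a minimal sketch using SciPy on synthetic data. The two groups, their sizes, and their means are made up for illustration and are not the case study data; the t-test is assumed to be the equal-variance (pooled) version, which is the one that matches the one-way F-test.

```python
import numpy as np
from scipy import stats

# Synthetic feature values for the two classes (illustrative assumption,
# not the case study data)
rng = np.random.default_rng(seed=0)
group_0 = rng.normal(loc=0.0, scale=1.0, size=200)
group_1 = rng.normal(loc=0.3, scale=1.0, size=150)

# One-way ANOVA F-test on the two groups
f_stat, f_pvalue = stats.f_oneway(group_0, group_1)

# Two-sample t-test with pooled variance (two-sided by default)
t_stat, t_pvalue = stats.ttest_ind(group_0, group_1, equal_var=True)

# With two groups, F equals t squared and the p-values coincide
print(f"F = {f_stat:.6f}, t^2 = {t_stat**2:.6f}")
print(f"F-test p = {f_pvalue:.6g}, t-test p = {t_pvalue:.6g}")
```

If `equal_var=False` were used instead (Welch's t-test), the statistics would generally no longer match, since the F-test pools the variance across groups.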

While the F-test served our purposes here for univariate feature selection, there are a few cautions to keep in mind. Going back to the concept of formal statistical assumptions, for the F-test these include that the data are normally distributed. We have not checked this. Also, in comparing the same response variable, y, to many potential features from the matrix, X, we have performed what ...
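As for the normality assumption mentioned above, one possible way to check it is a Shapiro-Wilk test on each feature. The following is a rough sketch on synthetic data, not a check we ran on the case study; the feature names and distributions are hypothetical. In practice the check would be applied to a feature's values within each class, and a histogram or Q-Q plot is a useful companion, since with large samples the test flags even small departures from normality.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for two features (hypothetical, for illustration only)
rng = np.random.default_rng(seed=0)
features = {
    "roughly_normal": rng.normal(loc=50.0, scale=10.0, size=500),
    "right_skewed": rng.exponential(scale=10.0, size=500),
}

# Shapiro-Wilk tests the null hypothesis that a sample is normally distributed;
# a small p-value is evidence against normality
results = {}
for name, values in features.items():
    statistic, pvalue = stats.shapiro(values)
    results[name] = (statistic, pvalue)
    print(f"{name}: W = {statistic:.4f}, p = {pvalue:.4g}")
```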
