Exercise: F-test and Univariate Feature Selection
Learn how to perform univariate feature selection using the F-test.
Univariate feature selection using F-test
In this exercise, we’ll use the F-test to examine the relationship between each feature and the response variable. We’ll use this method for what is called univariate feature selection: testing the features one at a time against the response variable to see which ones have predictive power. Perform the following steps to complete the exercise:
1. The first step in performing the ANOVA F-test is to separate the features and response into NumPy arrays, using the features_response list we created earlier along with integer-location indexing (.iloc) in pandas:
X = df[features_response].iloc[:, :-1].values  # all columns except the last: features
y = df[features_response].iloc[:, -1].values   # last column only: response
print(X.shape, y.shape)
The output should show the shapes of the features and response:
(26664, 17) (26664,)
There are 17 features, and both the features and response arrays have the same number of samples as expected.
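The slicing above relies on the response being the last name in features_response, because .iloc selects columns by position rather than by name. Here is a self-contained sketch of the same pattern on a small hypothetical DataFrame (the column names and values are made up for illustration and are not the case-study columns):

```python
import pandas as pd

# Hypothetical data: two features, one unused column, and a binary response
df = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [10, 20, 30, 40],
    "extra_col": ["x", "y", "z", "w"],  # not listed below, so never selected
    "response": [0, 1, 0, 1],
})

# The response must be the LAST entry for the positional slicing to work
features_response = ["feature_a", "feature_b", "response"]

subset = df[features_response]
X = subset.iloc[:, :-1].values  # every column except the last -> features
y = subset.iloc[:, -1].values   # only the last column -> response
print(X.shape, y.shape)
```

If features_response were reordered, a different column would silently end up in y, which is why checking the shapes, as this step does, is a cheap sanity check.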
2. Import the f_classif function and feed in the features and response:
from sklearn.feature_selection import f_classif
[f_stat, f_p_value] = f_classif(X, y)
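For intuition about what the f_classif call computes, here is a sketch on synthetic data; the data, the random seed, and the helper anova_f are hypothetical and not part of the case study. For each feature, the one-way ANOVA F-statistic is the between-class mean square divided by the within-class mean square, so a large value means the class means differ a lot relative to the spread inside each class. The sketch recomputes this by hand, checks it against f_classif, and then uses scikit-learn's SelectKBest (not used in this exercise) as one possible way to keep the top-scoring features.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

def anova_f(x, y):
    """Hypothetical helper: one-way ANOVA F-statistic for one feature x grouped by label y."""
    classes = np.unique(y)
    n, k = x.size, classes.size
    grand_mean = x.mean()
    # Between-group sum of squares: spread of the class means around the overall mean
    ss_between = sum((y == c).sum() * (x[y == c].mean() - grand_mean) ** 2 for c in classes)
    # Within-group sum of squares: spread of samples around their own class mean
    ss_within = sum(((x[y == c] - x[y == c].mean()) ** 2).sum() for c in classes)
    # Mean squares use k - 1 and n - k degrees of freedom
    return (ss_between / (k - 1)) / (ss_within / (n - k))

rng = np.random.default_rng(0)
n_samples = 200
y = rng.integers(0, 2, size=n_samples)      # synthetic binary response
X = np.column_stack([
    2.0 * y + rng.normal(size=n_samples),   # feature 0: mean shifts with the class
    rng.normal(size=n_samples),             # feature 1: pure noise
    rng.normal(size=n_samples),             # feature 2: pure noise
])

# f_classif tests each column separately: one F-statistic and one p-value per feature
[f_stat, f_p_value] = f_classif(X, y)

# The hand-rolled version should agree feature by feature
manual_f = np.array([anova_f(X[:, j], y) for j in range(X.shape[1])])
print(np.allclose(f_stat, manual_f))

for j, (f, p) in enumerate(zip(f_stat, f_p_value)):
    print(f"feature {j}: F = {f:.2f}, p = {p:.3g}")

# One way to act on the scores: keep the k features with the largest F-statistics
selector = SelectKBest(score_func=f_classif, k=1).fit(X, y)
print(selector.get_support(indices=True))
```

Feature 0 should receive by far the largest F-statistic and a very small p-value, while the two noise features score low. Ranking features one at a time like this is the core of univariate selection, though it cannot detect features that are only useful in combination with others.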
There are two outputs from f_classif: the ...