Feature selection is the process of determining which features affect the target variable and should be included in the model. A feature can be as simple as a column in the dataset; it describes the entries in the dataset. Not every feature present in the dataset affects the target variable, and the presence of irrelevant features can adversely affect the model.
The most common method of feature selection in data science is manual filtering; this approach is mostly applied to numeric features.
As the name suggests, the irrelevant features that do not affect the target variable are filtered out.
Irrelevant features are determined through a correlation matrix.
The correlations can be displayed in a heatmap.
A value close to +1 implies a strong positive correlation, a value close to -1 implies a strong negative correlation, and a value close to 0 implies a very weak correlation.
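A minimal sketch of what these values look like in practice (the toy data and column names below are assumptions for illustration):

```python
import pandas as pd

# Toy data: "x1" rises linearly with "y", "x2" falls linearly with "y",
# and "x3" has no real relationship with "y".
toy = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5],
    "x2": [10, 8, 6, 4, 2],
    "x3": [3, 1, 4, 1, 5],
    "y":  [2, 4, 6, 8, 10],
})
# Correlation of every column with the target column "y"
print(toy.corr()["y"])
```

Here `x1` scores +1.0, `x2` scores -1.0, and `x3` lands near 0, matching the interpretation above.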
The correlation heatmap can be plotted as shown below:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Pairwise correlation matrix of the numeric columns
cor = df.corr()
sns.heatmap(cor, annot=True)  # annot=True writes each value in its cell
plt.show()
A threshold has to be decided for the correlation value; if the absolute value of the correlation between a feature and the target is less than that threshold, the feature is filtered out.
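This filtering step can be sketched as a small helper; the function name, the column names, and the 0.5 threshold below are illustrative assumptions, not fixed rules:

```python
import pandas as pd

def filter_by_correlation(df: pd.DataFrame, target: str, threshold: float = 0.5):
    # Absolute correlation of every feature with the target column
    cor_with_target = df.corr()[target].drop(target).abs()
    # Keep only the features at or above the chosen threshold
    return cor_with_target[cor_with_target >= threshold].index.tolist()

# Hypothetical data: "noise" is unrelated to the target "price"
data = pd.DataFrame({
    "size":  [50, 60, 70, 80, 90],
    "noise": [3, 1, 4, 1, 5],
    "price": [100, 120, 140, 160, 180],
})
print(filter_by_correlation(data, "price", threshold=0.5))
```

Lowering the threshold keeps more features; raising it keeps only the strongest ones.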
You should also keep an eye on features that are highly correlated with each other, since only one feature from such a pair should stay in the selected set.
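One way to prune such pairs is a greedy pass that keeps a single representative per highly correlated pair; the function name and the 0.9 cutoff are assumptions for this sketch:

```python
import pandas as pd

def drop_correlated_pairs(df: pd.DataFrame, features, cutoff: float = 0.9):
    # Absolute pairwise correlations among the candidate features
    cor = df[features].corr().abs()
    kept = []
    for col in features:
        # Keep a feature only if it is not strongly correlated
        # with any feature already kept
        if all(cor.loc[col, k] < cutoff for k in kept):
            kept.append(col)
    return kept

# Hypothetical data: "b" carries the same information as "a"
data = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 8, 1, 9],
})
print(drop_correlated_pairs(data, ["a", "b", "c"]))
```

Because `b` is perfectly correlated with `a`, only one of the two survives, while the weakly related `c` is kept.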
The obtained features can now be used to build the model.