What is content-based filtering?

Method

This method revolves entirely around comparing user interests to product features. The products whose features overlap most with the user's interests are the ones recommended.

Given the significance of product features in this system, it is important to discuss how the user’s favorite features are decided.

Two methods can be used here, possibly in combination. First, users can be given a list of features and asked to choose the ones they identify with most. Second, the algorithm can keep track of the products the user has chosen before and add the features of those products to the user's data.
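The two ways of collecting user interests can be combined in a few lines. The following is only a minimal sketch: the function name `build_user_profile`, the feature names, and the rule of adding 1 per occurrence are all illustrative assumptions, not something the method prescribes.

```python
from collections import Counter


def build_user_profile(selected_features, chosen_products, product_features):
    """Combine explicit feature picks with features of previously chosen products.

    Hypothetical helper: each occurrence of a feature adds 1 to its weight.
    """
    profile = Counter()
    # Explicit signal: features the user picked from a list.
    for feature in selected_features:
        profile[feature] += 1
    # Implicit signal: features of every product the user chose before.
    for product in chosen_products:
        for feature in product_features.get(product, []):
            profile[feature] += 1
    return dict(profile)


# Made-up catalogue and history, for illustration only.
product_features = {
    "Product A": ["action", "sci-fi"],
    "Product B": ["comedy"],
}
profile = build_user_profile(
    selected_features=["action"],
    chosen_products=["Product A", "Product B"],
    product_features=product_features,
)
print(profile)
```

Here "action" ends up with a higher weight than the other features because it appears both in the explicit selection and in the history.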

Product features can be assigned in a similar way: the product's developers can identify them directly, or users can be asked which features they believe best describe each product.

Once a numerical value, either a binary 1 or 0 or a weight on some other scale, has been assigned to each product feature and user interest, a measure of similarity between products and user interests is needed. A very basic choice is the dot product, $\sum_{i=1}^d p_i u_i$, where $d$ is the number of features, $p_i$ is the product's value for feature $i$, and $u_i$ is the user's interest value for the same feature.
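The dot product formula translates directly into code. This is a minimal sketch; the function name `dot_product` and the two example vectors are assumptions made for illustration.

```python
def dot_product(product, user):
    """Return sum(p_i * u_i) over all feature columns i."""
    if len(product) != len(user):
        raise ValueError("product and user vectors must have the same length")
    return sum(p * u for p, u in zip(product, user))


# Hypothetical vectors with d = 3 features.
product = [1, 0, 2]  # product feature values p_i
user = [2, 1, 1]     # user interest values u_i

score = dot_product(product, user)  # 1*2 + 0*1 + 2*1
print(score)
```

A feature contributes to the score only when both the product and the user have a nonzero value for it, which is why the second column adds nothing here.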

In the table given above, the user's interest in Product 1 can be estimated as $2 \times 1 + 1 \times 1 + 1 \times 2 = 5$. Similarly, the interest in Product 2 is $1 \times 4 = 4$, and the interest in Product 3 is $2 \times 3 + 1 \times 1 = 7$. Hence, Product 3 is the algorithm's top recommendation for the user.
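The scoring-and-ranking step can be sketched as follows. The item names and feature values below are made up for illustration and are not the values from the article's table; `recommend` is a hypothetical helper name.

```python
def recommend(user, products, top_n=1):
    """Score every product by dot product with the user vector and rank them."""
    scores = {
        name: sum(p * u for p, u in zip(features, user))
        for name, features in products.items()
    }
    # Highest score first; the first entry is the top recommendation.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical data: one user vector and three product vectors.
user = [1, 0, 2]
products = {
    "Item X": [1, 1, 0],
    "Item Y": [0, 1, 3],
    "Item Z": [2, 0, 1],
}

ranked = recommend(user, products, top_n=3)
print(ranked)
```

With these values the scores are 1, 6, and 4, so "Item Y" would be recommended first.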

Pros and Cons

This model scales easily because it needs relatively little data. Moreover, unlike other models, it does not need to compare against other users' data, so it can offer niche results specific to the current user.

However, this model requires a fair amount of domain knowledge from the people who assign features to products, so its accuracy depends largely on that knowledge being correct. Moreover, content-based filtering relies heavily on previously known user interests, so it cannot expand beyond the interests it already knows about.
