Dense features

We will start by discussing the dense features.

User-author features

These features are based on the logged-in user and the Tweet’s author. They capture the social relationship between the user and the author of the Tweet, which is an extremely important factor in ranking the author’s Tweets. For example, if a Tweet is authored by a close friend, a family member, or someone the user is highly influenced by, there is a high chance that the user will want to interact with the Tweet.

How can you capture this relationship in your signals, given that users are not going to specify it explicitly? The following are a few features that capture it effectively.

User-author historical interactions

When judging the relevance of a Tweet for a user, the relationship between the user and the Tweet’s author plays an important role. If the user has actively engaged with a followee in the past, it is highly likely that they will be more interested in seeing a post by that person in their feed.


A few features based on this concept are:

author_liked_posts_3months

This considers the percentage of an author’s Tweets that the user liked in the last three months. For example, if the author created twelve posts in the last three months and the user liked six of them, the feature’s value will be:

$\frac{6}{12} = 0.5$ or $50\%$

This feature shows a more recent trend in the relationship between the user and the author.

author_liked_posts_count_1year

This considers the number of an author’s Tweets that the user interacted with in the last year. This feature shows a longer-term trend in the relationship between the user and the author.
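
As a rough sketch of how these two features might be computed, assume a hypothetical in-memory list of the author’s Tweets and a set of Tweet IDs the user liked. The `Tweet` class, the function signatures, and the 90-day and 365-day windows are illustrative assumptions, and a like is treated as the interaction, as the feature names suggest.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Tweet:
    """Hypothetical minimal Tweet record used only for this sketch."""
    tweet_id: int
    created_at: datetime


def author_liked_posts_3months(author_tweets, liked_ids, now):
    """Fraction of the author's Tweets from the last 90 days that the user liked."""
    cutoff = now - timedelta(days=90)  # assumption: "three months" means 90 days
    recent = [t for t in author_tweets if t.created_at >= cutoff]
    if not recent:
        return 0.0  # assumption: no recent Tweets means no signal
    liked = sum(1 for t in recent if t.tweet_id in liked_ids)
    return liked / len(recent)


def author_liked_posts_count_1year(author_tweets, liked_ids, now):
    """Number of the author's Tweets from the last 365 days that the user liked."""
    cutoff = now - timedelta(days=365)
    return sum(
        1 for t in author_tweets
        if t.created_at >= cutoff and t.tweet_id in liked_ids
    )


now = datetime(2024, 1, 1)
# Twelve Tweets inside the three-month window, one per week.
tweets = [Tweet(i, now - timedelta(days=7 * i)) for i in range(12)]
# Two older Tweets: one inside the one-year window, one outside it.
tweets += [Tweet(100, now - timedelta(days=200)),
           Tweet(101, now - timedelta(days=400))]
# The user liked six of the recent Tweets and both older ones.
liked_ids = {0, 1, 2, 3, 4, 5, 100, 101}

recent_ratio = author_liked_posts_3months(tweets, liked_ids, now)
yearly_count = author_liked_posts_count_1year(tweets, liked_ids, now)
```

With this made-up data, `recent_ratio` reproduces the six-out-of-twelve example above, while `yearly_count` also picks up the older liked Tweet that still falls inside the one-year window.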

📝 Ideally, we should normalize the above features by the total number of Tweets that the user interacted with during these periods. This enables the model to see the real picture by cancelling out the effect of a user’s general interaction habits. For instance, let’s say user A generally tends to interact (e.g., like or comment) more, while user B does not. Now, both user A and user B have a hundred interactions on user C’s posts. User B’s interactions are more significant since they generally interact less. On the other hand, user A’s interactions are mostly a result of their tendency to interact more.
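
The normalization in the note can be illustrated with a minimal sketch. The function name `normalized_interaction` and the total interaction counts for users A and B are invented for illustration; only the hundred interactions on user C’s posts come from the example above.

```python
def normalized_interaction(interactions_with_author, total_user_interactions):
    """Share of a user's total interactions that went to one author."""
    if total_user_interactions == 0:
        return 0.0  # assumption: a user with no interactions gives no signal
    return interactions_with_author / total_user_interactions


# Both users interacted with 100 of user C's posts.
# The totals below are hypothetical: A interacts a lot in general, B rarely.
user_a_score = normalized_interaction(100, 1000)
user_b_score = normalized_interaction(100, 200)
```

Under these assumed totals, user B’s normalized value is five times user A’s, which matches the intuition that B’s interactions with C carry more weight.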

User-author similarity

Another immensely important feature set for predicting user engagement measures how similar the logged-in user and the Tweet’s author are. A few ways to compute such features include:

common_followees

This is a simple feature that can show the similarity between the user and the author. For a user-author pair, ...
