Counting Unique Tweet Visitors
Learn how to count inserted unique visitors using PostgreSQL.
We'll cover the following
As an example use case for the HyperLogLog data type, we’re going to count unique visitors to our tweets using the application we introduced in “Data Manipulation and Concurrency Control.”
The two main operations around an hll
data type consist of the following:
- Build a hash from an input value, such as an IP address.
- Update the already known
hll
value with the hash.
The main idea behind hll
is to keep a single hll
value per granularity, here per Tweet message and per day. This means that each time we have a new visit on a tweet, we want to update our hll
set to count that visitor.
Update heavy scenarios and concurrency
Concurrency is a deal breaker for UPDATE-heavy scenarios where the same row is solicited over and over again. So we’re going to work in two steps again here, first doing an INSERT per visit and then arranging a background process to transform those visits into an UPDATE to the single hll
aggregate per tweet and date.
Here’s the visitor table where we can insert every single visit:
Get hands-on with 1300+ tech skills courses.