Lossy Unique Count with hll
Learn how to compute precise counts.
We'll cover the following...
We can rewrite the query using our hll
data type now, even though, at this stage, it’s not going to be very useful because we still have the full logs of every visit, and we can afford to compute precise counts.
Computing precise counts
Nonetheless, our goal is to dispose of the daily entries that we anticipate will be just too large a dataset. So, the hll
-based query looks like this:
Press + to interact
select messageid,datetime::date as date,# hll_add_agg(hll_hash_text(ipaddr::text)) as hllfrom tweet.visitorwhere messageid = 3group by grouping sets((messageid),(messageid, date))order by messageid, date nulls firstlimit 10;
In this query, we use several new functions and operators related to the hll
data type:
-
The
#
operator takes a single argument: it’s a unary operator, like factorial (written ...