Scheduling Estimates Computations
Understand how to schedule estimates computations in PostgreSQL.
To calculate unique visitor estimates from the heavy insert table, we need a background process that runs from time to time.
The easiest way to do that here would be to create a new API endpoint on our back-end server and have a cron-like utility hit that endpoint on our chosen schedule. In case of emergency, though, it's nice to be able to run the updating process interactively. A solution that keeps both the back-end API integration and the interactive approach available is to package our SQL query as a stored procedure.
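If we prefer to keep the scheduling inside PostgreSQL rather than in an external cron job, an extension such as pg_cron can register the call to the function defined just below. This is only a sketch of that alternative: neither pg_cron nor the five-minute schedule is part of this chapter's setup.

  -- Sketch only: schedule the batch update every five minutes with pg_cron,
  -- assuming the extension is installed and configured on the server.
  select cron.schedule('update-unique-visitors',
                       '*/5 * * * *',
                       $$ select tweet.update_unique_visitors(); $$);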
SQL function
While stored procedures aren’t covered here, it’s easy enough to write a SQL function around the statement we already have:
begin;

create function tweet.update_unique_visitors
 (
   in  batch_size bigint default 1000,
   out messageid  bigint,
   out date       date,
   out uniques    bigint
 )
 returns setof record
 language SQL
as $$
  with new_visitors as
  (
     delete from tweet.visitor
      where id = any (select id
                        from tweet.visitor
                    order by datetime, messageid
                         for update
                        skip locked
                       limit update_unique_visitors.batch_size)
  returning messageid,
            cast(datetime as date) as date,
            hll_hash_text(ipaddr::text) as visitors
  ),
  new_visitor_groups as
  (
    select messageid, date, hll_add_agg(visitors) as visitors
      from new_visitors
  group by messageid, date
  )
  insert into tweet.uniques(messageid, date, visitors)
       select messageid, date, visitors
         from new_visitor_groups
  on conflict (messageid, date)
    do update
          set visitors = hll_union(uniques.visitors, excluded.visitors)
        where uniques.messageid = excluded.messageid
          and uniques.date = excluded.date
  returning messageid, date, cast(# visitors as bigint) as uniques;
$$;

commit;
And here’s an interactive session where we use the newly defined function to update our unique visitors hll table. ...
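A minimal form of that call, shown here without its output since the result depends on the rows currently queued in tweet.visitor, could look like this:

  -- Process one batch of queued visitor rows; batch_size defaults to 1000.
  select messageid, date, uniques
    from tweet.update_unique_visitors(1000);

Because each call deletes at most batch_size rows from tweet.visitor, repeating the call until it returns no rows drains whatever was queued at the time.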