...

/

A Sampling of Countries

A Sampling of Countries

Look at a sample of countries from the data for experimentation.

The GeoNames dataset of more than 11 million rows is not practical to include in the course’s material, where you have a database dump or Docker image to play with. We instead take a random sample of 1 percent of the table’s content, and here’s how the magic is done:

Press + to interact
begin;
create schema if not exists sample;
drop table if exists sample.geonames;
create table sample.geonames
as select /*
* We restrict the “export” to some columns only, so as to
* further reduce the size of the exported file available to
* download with the book.
*/
geonameid,
name,
longitude,
latitude,
feature_class,
feature_code,
country_code,
admin1_code,
admin2_code,
population,
elevation,
timezone
/*
* We only keep 1% of the 11 millions rows here.
*/
from raw.geonames TABLESAMPLE bernoulli(1);
\copy sample.geonames to '/usercode/allCountries.sample.copy'
commit;

In this script, we use the tablesample feature of PostgreSQL to only keep a random selection of 1 percent of the rows in the table. The tablesample accepts several methods, and you can see the PostgreSQL documentation entitled Writing A ...