A Sampling of Countries
Look at a sample of countries from the data for experimentation.
We'll cover the following...
The GeoNames dataset of more than 11 million rows is not practical to include in the course’s material, where you have a database dump or Docker image to play with. We instead take a random sample of 1 percent of the table’s content, and here’s how the magic is done:
Press + to interact
begin;create schema if not exists sample;drop table if exists sample.geonames;create table sample.geonamesas select /** We restrict the “export” to some columns only, so as to* further reduce the size of the exported file available to* download with the book.*/geonameid,name,longitude,latitude,feature_class,feature_code,country_code,admin1_code,admin2_code,population,elevation,timezone/** We only keep 1% of the 11 millions rows here.*/from raw.geonames TABLESAMPLE bernoulli(1);\copy sample.geonames to '/usercode/allCountries.sample.copy'commit;
In this script, we use the tablesample feature of PostgreSQL to only keep a random selection of 1 percent of the rows in the table. The tablesample accepts several methods, and you can see the PostgreSQL documentation entitled Writing A ...