Introduction to Data Manipulation and Concurrency Control
Get an introduction to data manipulation and concurrency control.
Tweets dataset
We used a dataset of 200,000 USA-geolocated tweets with a very simple data model. The data model is a direct port of the Excel sheet format, which makes loading straightforward: we used the \copy command from the psql tool.
begin;

create table tweet
 (
   id         bigint primary key,
   date       date,
   hour       time,
   uname      text,
   nickname   text,
   bio        text,
   message    text,
   favs       bigint,
   rts        bigint,
   latitude   double precision,
   longitude  double precision,
   country    text,
   place      text,
   picture    text,
   followers  bigint,
   following  bigint,
   listed     bigint,
   lang       text,
   url        text
 );

\copy tweet from '/usercode/tweets.csv' with csv header delimiter ';'

commit;
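Once the script has run, a quick sanity check can confirm that the rows arrived. The following is a minimal sketch, assuming the tweet table was created and loaded into the current database as shown; the columns picked for the preview are an arbitrary choice for illustration.

```sql
-- Count the loaded rows; the dataset is described as holding 200,000 tweets.
select count(*)
  from tweet;

-- Preview a few rows to check that the ';'-delimited columns
-- landed in the expected places.
select id, date, hour, uname, lang
  from tweet
 order by id
 limit 5;
```

Both queries only read from the table, so they can be run in the same psql session right after the load.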
Database model and normalization
The tweets.sql database model is all wrong per the normal forms introduced earlier:
- There’s neither a unique constraint nor a primary key, so there is nothing preventing the insertion of duplicate entries, violating 1NF.
- Some non-key attributes are not dependent on the key because we mix data from the Twitter account posting the message and the message itself, violating ...