Introduction to Data Manipulation and Concurrency Control

Get an introduction to data manipulation and concurrency control.

Tweets dataset

We used a dataset of 200,000 USA-geolocated Tweets with a very simple data model. The data model is a direct port of the Excel sheet format, which makes loading straightforward: we used the \copy command from the psql tool.

begin;

create table tweet
(
    id         bigint primary key,
    date       date,
    hour       time,
    uname      text,
    nickname   text,
    bio        text,
    message    text,
    favs       bigint,
    rts        bigint,
    latitude   double precision,
    longitude  double precision,
    country    text,
    place      text,
    picture    text,
    followers  bigint,
    following  bigint,
    listed     bigint,
    lang       text,
    url        text
);

\copy tweet from '/usercode/tweets.csv' with csv header delimiter ';'

commit;
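
Once the transaction commits, a couple of quick queries can confirm that the data arrived. This is only a minimal sketch, assuming the tweet table was created and loaded exactly as in the script above. The figure of about 200,000 rows comes from the dataset description, so the actual count may differ.

```sql
-- Row count: the dataset is described as holding about 200,000 tweets.
select count(*) from tweet;

-- The five accounts with the most messages, using the uname column
-- from the table definition above.
select uname, count(*) as messages
  from tweet
 group by uname
 order by messages desc
 limit 5;
```

Both statements are read-only, so they can be run in the same psql session without affecting the loaded data.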

Database model and normalization

The tweets.sql database model is all wrong per the normal forms introduced earlier:

  • There’s neither a unique constraint nor a primary key, so there is nothing preventing the insertion of duplicate entries, violating 1NF.

  • Some non-key attributes are not dependent on the key because we mix data from the Twitter account posting the message and the message itself, violating ...