Antipattern: Clone Tables or Columns
Let's see the effects of cloning tables or columns for making minor changes.
In the television series
Where does one put them? Who’s responsible for them? How long would it take to pick up every tribble? Eventually, Captain Kirk discovers that his ship and crew can’t function, and he has to order his crew to make it their top priority to remove the tribbles.
We know from experience that querying a table with a few rows is quicker than querying a table with many rows, all other things being equal. This leads to a common fallacy: we must make every table contain fewer rows, no matter what we do. This leads to two forms of antipattern:
-
Splitting a single long table into multiple smaller tables and naming each table based on the distinct data values of one of the table’s attributes.
-
Splitting a single column into multiple columns, using column names based on distinct values as another attribute.
But we cannot get something for nothing; to meet the goal of having fewer rows in every table, we have to either create tables that have too many columns or else create a greater number of tables. In both cases, we find that the number of tables or columns keep growing as new data values make us create new schema objects.
Spawning tables
To split data into separate tables, we need to define a policy about which rows belong in which tables. For example, we could split them up by the year in the date_reported
column:
Get hands-on with 1400+ tech skills courses.