Finalizing the Design of Data Model
Let's take a look at interactions between tables to make changes to the data model.
We'll cover the following
Interactions between tables
Let’s do the last pass and see if the interactions between the tables look good. In particular, we’re looking for any implicit assumptions that should either be eliminated or made explicit. We also want to avoid duplication of data where possible. That sort of analysis here might reveal a few insights.
Hard-coded fields
First, we see that three languages are hard-coded into the data schema in the user model with the num_en
, num_fr
, and num_es
fields. Any sort of hard coding carries with it some implicit assumptions that changes will be rare. Conceptually, these counts are part of a one-to-many relationship. That is, a single user could have games in zero, one, or many languages. There are a few options we have:
-
The full-strength solution for this is to make a separate table that tracks each language as a separate row and then to link between the tables using id fields. This is a valid solution, but for only three distinct subfields that will possibly never be changed, it’s a lot of schema details to manage.
-
Another option is to leave it as is and explicitly document what would need to be updated if the set of languages is changed.
-
We’ll choose the third option: record the counts in a single JSON field.
JSON will allow us to have a field that itself has fields, like a dictionary in Python or an object in JavaScript. While we’re at it, we can also count wins, losses, and canceled games in either the same or a different field. It’s cleaner to keep them separate. With the language field, we could enforce that it can only take a valid value. A similar constraint for JSON types is more difficult to put in place. The new user table is as follows.
Field Name | Field Description |
---|---|
user_id |
A unique UUID for the user |
user_name |
The user’s self-reported name |
num_games |
Number of games started |
outcomes |
JSON object counting wins, losses, cancels |
by_lang |
JSON object counting games in each language |
first_time |
When the first game was started |
total_time |
Total time between starts and finishes |
avg_game_time |
Average time per game |
Duplication
The second issue in the tables is that the game has information about usage and the secret word, which comes from the usage table, but it doesn’t point to the usage table. This could be a useful way to organize it if we expected that the usage table might change after the game starts, but we want the game to remain unchanged. In this case, we expect the records in the usage table to never change, except possibly to have new records added. This is an implicit assumption that we should document. Also, we should reorganize the schema to take advantage of this change. In particular, the game should not duplicate the information contained in the usage table but just point to it with its ID. The new game table drops the language, the secret word, and the usage fields, but adds the usage ID field.
Field Name | Field Description |
---|---|
game_id |
A unique UUID for each game |
usage_id |
The ID of the usage in that table |
guessed |
Letters guessed so far by the user |
reveal_word |
Blanks and revealed letters so far |
bad_guesses |
Number of incorrect guesses so far |
start_time |
When the game was started |
The relationship between the game and user
A third issue is that, given a game, there is no easy way to determine which user is playing that game. This can be a problem if the API indicates the game and must then update the user’s game statistics. Of course, it is possible to search through the user table to find the user playing that game. A better option is to add a user ID field to each game to indicate who is playing it. As a result, the final version of the game table will include a player field.
Field Name | Field Description |
---|---|
game_id |
A unique UUID for each game |
player |
The ID of the user playing the game |
usage_id |
The ID of the usage in that table |
guessed |
Letters guessed so far by the user |
reveal_word |
Blanks and revealed letters so far |
bad_guesses |
Number of incorrect guesses so far |
start_time |
When the game was started |
Once the game points to the user, we need to wonder if the user should still point to the game. In our API, we will always access things using the game ID rather than the user ID, so there’s no need to get the active game for a user. Should we include it for completeness, or because we may need it later? Opinions vary on this, but we say no. It would be another item that needs to be maintained. We can put it in later, if needed.
Get hands-on with 1400+ tech skills courses.