Finalizing the Design of Data Model

Let's take a look at interactions between tables to make changes to the data model.

Interactions between tables

Let’s do the last pass and see if the interactions between the tables look good. In particular, we’re looking for any implicit assumptions that should either be eliminated or made explicit. We also want to avoid duplication of data where possible. That sort of analysis here might reveal a few insights.

Hard-coded fields

First, we see that three languages are hard-coded into the data schema in the user model with the num_en, num_fr, and num_es fields. Any sort of hard coding carries with it some implicit assumptions that changes will be rare. Conceptually, these counts are part of a one-to-many relationship. That is, a single user could have games in zero, one, or many languages. There are a few options we have:

  • The full-strength solution for this is to make a separate table that tracks each language as a separate row and then to link between the tables using id fields. This is a valid solution, but for only three distinct subfields that will possibly never be changed, it’s a lot of schema details to manage.

  • Another option is to leave it as is and explicitly document what would need to be updated if the set of languages is changed.

  • We’ll choose the third option: record the counts in a single JSON field.

JSON will allow us to have a field that itself has fields, like a dictionary in Python or an object in JavaScript. While we’re at it, we can also count wins, losses, and canceled games in either the same or a different field. It’s cleaner to keep them separate. With the language field, we could enforce that it can only take a valid value. A similar constraint for JSON types is more difficult to put in place. The new user table is as follows.

Field Name Field Description
user_id A unique UUID for the user
user_name The user’s self-reported name
num_games Number of games started
outcomes JSON object counting wins, losses, cancels
by_lang JSON object counting games in each language
first_time When the first game was started
total_time Total time between starts and finishes
avg_game_time Average time per game

Duplication

The second issue in the tables is that the game has information about usage and the secret word, which comes from the usage table, but it doesn’t point to the usage table. This could be a useful way to organize it if we expected that the usage table might change after the game starts, but we want the game to remain unchanged. In this case, we expect the records in the usage table to never change, except possibly to have new records added. This is an implicit assumption that we should document. Also, we should reorganize the schema to take advantage of this change. In particular, the game should not duplicate the information contained in the usage table but just point to it with its ID. The new game table drops the language, the secret word, and the usage fields, but adds the usage ID field.

Field Name Field Description
game_id A unique UUID for each game
usage_id The ID of the usage in that table
guessed Letters guessed so far by the user
reveal_word Blanks and revealed letters so far
bad_guesses Number of incorrect guesses so far
start_time When the game was started

The relationship between the game and user

A third issue is that, given a game, there is no easy way to determine which user is playing that game. This can be a problem if the API indicates the game and must then update the user’s game statistics. Of course, it is possible to search through the user table to find the user playing that game. A better option is to add a user ID field to each game to indicate who is playing it. As a result, the final version of the game table will include a player field.

Field Name Field Description
game_id A unique UUID for each game
player The ID of the user playing the game
usage_id The ID of the usage in that table
guessed Letters guessed so far by the user
reveal_word Blanks and revealed letters so far
bad_guesses Number of incorrect guesses so far
start_time When the game was started

Once the game points to the user, we need to wonder if the user should still point to the game. In our API, we will always access things using the game ID rather than the user ID, so there’s no need to get the active game for a user. Should we include it for completeness, or because we may need it later? Opinions vary on this, but we say no. It would be another item that needs to be maintained. We can put it in later, if needed.

Get hands-on with 1300+ tech skills courses.