Data modeling is the process of defining how data is stored and how the different entities in our data relate to one another.
A data model represents those relationships, often visually, along with how the data is organized and grouped.
MongoDB provides the following two ways to model our data: the embedded document model and the normalized (reference-based) data model.
Consider a scenario where you want to save data about blog posts and their comments.
Let’s take a look at how we can model this data using the methods mentioned above.
One way to model blog posts and their respective comments is to embed the child document in the parent document. In our case, the blog post is the parent document and the comment is the child document.
The embedded document model is also known as the “denormalized” data model or schema.
Below is an example of a blog post document that has multiple comment documents embedded in it in the form of an array:
{
_id: <ObjectId123>,
title: "Data Modelling in MongoDB",
body: "some long text...",
comments: [
{
_id: <ObjectId111>,
comment: "some text...",
author: "mike@email.com"
},
{
_id: <ObjectId222>,
comment: "some text...",
author: "jake@email.com"
}
]
}
Embedding documents generally leads to better performance because a blog post and its comments can be read or updated in a single database operation.
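As a rough illustration of the single-operation point, the sketch below builds the filter and update documents that would append a comment to a post's embedded array. The helper `buildAddCommentUpdate`, the collection name `posts`, and the string ids are assumptions made for this example; `$push`, `updateOne`, and `findOne` are standard MongoDB operations, shown here only in comments because no live connection is assumed.

```javascript
// Hypothetical helper (not from the original text): builds the arguments for
// a single update that appends a comment to a post's embedded array.
function buildAddCommentUpdate(postId, comment) {
  return {
    filter: { _id: postId },                  // selects the parent blog post
    update: { $push: { comments: comment } }, // appends to the embedded array
  };
}

const op = buildAddCommentUpdate("post-123", {
  _id: "comment-333", // placeholder string id; real documents would use ObjectIds
  comment: "some text...",
  author: "anna@email.com",
});

// With a live connection (assumed collection name "posts"), one call writes it:
//   db.posts.updateOne(op.filter, op.update)
// and one call reads the post together with all of its comments:
//   db.posts.findOne(op.filter)
console.log(JSON.stringify(op, null, 2));
```

Because the comments live inside the post document, neither the read nor the write needs to touch a second collection.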
The embedded data model has the following disadvantages:
Use the embedded data model when:
Entities have a “contains” or “has a” relationship between them.
Entities have a one-to-many relationship between them.
The second approach describes relationships between documents using references.
This is also known as the “normalized” data model or schema.
Below is an example of how we can model our blog posts and comments using this data model:
// blog post
{
_id: <ObjectId123>,
title: "Data Modelling in MongoDB",
body: "some long text..."
}
// comments
{
_id: <ObjectId111>,
comment: "some text...",
author: "mike@email.com",
postId: <ObjectId123> // reference to the blog post
},
{
_id: <ObjectId222>,
comment: "some text...",
author: "jake@email.com",
postId: <ObjectId123> // reference to the blog post
}
In the normalized data model, instead of embedding comment documents in the blog post document, we add the comment documents in a separate collection. In this collection, each comment is a separate document.
In addition to its own data, each comment document also contains a reference to its parent blog post: it stores the _id of the parent blog post document (the postId field in the example above).
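To make the relationship between the two shapes concrete, here is a minimal in-memory sketch that splits an embedded post into one post document and a list of comment documents that reference it. The helper name `normalizePost` and the string ids are assumptions for illustration; it does not talk to a database.

```javascript
// Hypothetical helper: splits an embedded post into one post document and a
// list of comment documents that each reference the post through postId.
function normalizePost(embeddedPost) {
  // Pull the comments array out; everything else stays on the post.
  const { comments = [], ...post } = embeddedPost;
  // Each comment keeps its own fields and gains a reference to its parent.
  const commentDocs = comments.map((c) => ({ ...c, postId: post._id }));
  return { post, comments: commentDocs };
}

const embedded = {
  _id: "post-123", // placeholder string id; a real document would use an ObjectId
  title: "Data Modelling in MongoDB",
  body: "some long text...",
  comments: [
    { _id: "comment-111", comment: "some text...", author: "mike@email.com" },
    { _id: "comment-222", comment: "some text...", author: "jake@email.com" },
  ],
};

const normalized = normalizePost(embedded);
console.log(normalized.post);     // post fields only, no comments array
console.log(normalized.comments); // each comment now carries postId
```

In a real deployment the two results would be stored in separate collections, for example one for posts and one for comments.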
A normalized data model has the following advantages:
On the other hand, because related data lives in separate documents, reading all of it requires either multiple database operations or a join across collections.
Writing related data that spans multiple documents also requires multiple database operations.
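The two read strategies mentioned above can be sketched as plain query documents. The helper names and the collection names `posts` and `comments` are assumptions for this example; `$match` and `$lookup` are standard aggregation stages, and the actual calls appear only in comments because no live connection is assumed.

```javascript
// Option 1: two separate reads. These are the filter documents for each query.
function buildTwoStepRead(postId) {
  return {
    postFilter: { _id: postId },        // db.posts.findOne(postFilter)
    commentsFilter: { postId: postId }, // db.comments.find(commentsFilter)
  };
}

// Option 2: one aggregation that joins comments onto the post with $lookup.
function buildJoinPipeline(postId) {
  return [
    { $match: { _id: postId } }, // pick the blog post
    {
      $lookup: {
        from: "comments",        // assumed name of the comments collection
        localField: "_id",       // field on the post document
        foreignField: "postId",  // reference stored on each comment
        as: "comments",          // name of the output array field
      },
    },
  ];
}

const reads = buildTwoStepRead("post-123");
const pipeline = buildJoinPipeline("post-123");
// With a live connection: db.posts.aggregate(pipeline)
console.log(JSON.stringify(reads));
console.log(JSON.stringify(pipeline));
```

Either way, the application or the server does extra work compared with reading a single embedded document, which is the cost being described here.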
Use the normalized data model when:
You need to model hierarchical data sets.
You need to represent many-to-many relationships.
The read performance gained from embedding documents does not outweigh the cost of the data duplication that embedding would cause.