Data Model of Megastore

Learn about the data model of Megastore.

This lesson discusses the trade-offs between traditional databases and NoSQL and describes the resulting developer-facing features in Megastore’s development.

API design

In traditional databases, to facilitate user operations, normalized relational schemas depend on joins at query time. This is not the proper approach for Megastore applications because of the following reasons:

  • Relatively stable performance benefits high-volume interactive workloads much more than an expressive query language, for example, SQL provides many types of joins.
  • It is beneficial to shift work from read time to write time since there are more reads than writes in the applications where Megastore is to be used. Traditional database queries are translated into an execution plan, and then this plan is optimized when a query is submitted.
  • In key-value systems like Bigtable, storing and accessing hierarchical data is simple. Therefore, we might not need extensive query optimization.

Considering these factors, we have created a data model and schema language that gives users granular control over where their data is stored.

Data model

Megastore proposes a data model that sits somewhere between an RDBMS’s abstract tuples and NoSQL’s actual row-column storage. The data model is defined in a schema and strongly typed, similar to RDBMS. Every schema consists of a collection of tables, each of which has a collection of entities that in turn have a collection of attributes. Property values are named and typed. Google’s Protocol Buffers, strings, and various numeric representations are all acceptable types. We can define them as optional, required, or repeated (A single property can have a number of values.) The table’s entities all have the same set of permitted attributes. The entity’s primary key is assembled by a series of properties, and the primary keys must be distinctive inside the table. Let’s see an example of the database structure for a basic picture album app in the code below:

Press + to interact
CREATE SCHEMA PhotoApp;
CREATE TABLE User {
required int64 user_id;
required string name;
} PRIMARY KEY(user_id), ENTITY GROUP ROOT;
CREATE TABLE Photo {
required int64 user_id;
required int32 photo_id;
required int64 time;
required string full_url;
optional string thumbnail_url;
repeated string tag;
} PRIMARY KEY(user_id, photo_id), IN TABLE User,
ENTITY GROUP KEY(user_id) REFERENCES User;
CREATE LOCAL INDEX PhotosByTime
ON Photo(user_id, time);
CREATE GLOBAL INDEX PhotosByTag
ON Photo(tag) STORING (thumbnail_url);

There are two distinct kinds of tables in Megastore entity groups: root tables and child tables. Each child table should indeed specify a single distinct foreign key referring to a root table, as shown by the ENTITY GROUP KEY annotation at ...