How to Select Primary Keys
Learn about different primary key formats and reasons to use or not use them.
Let’s look at the data we store in the databases. Every record needs a primary key, a unique identifier used to distinguish this field from others. There are multiple ways to define primary keys. Let’s look at them.
Using meaningful data as a primary key
This looks like a great method at first. We use names to identify people in real life. Why wouldn’t we use user names or emails as identifiers in our systems?
This method would work fine until we have two users with the same name or until someone changes their name. The primary key is often used in other tables and external references. It should never change. The only way to guarantee that is to use a value that’s only an identifier.
There is one exception. We might store application parameters in a key-value database. In this case, the record key is a parameter name.
Using incremental IDs as a primary key
One common way to set primary keys is to use the incremental IDs provided by the database. In this case, we define the ID field as an AUTO_INCREMENT
and don’t provide a value for this key when creating records. The database assigns incrementing integer values for this field. The first saved record will receive an ID of 1.
The second will be an ID of 2
and so on.
Pros
- Such IDs are short and human readable. Users like them.
- We can use ID order to sort and filter. The ID order matches the creation order, and by comparing two IDs, we can tell which record was created later.
Cons
-
The ID is created in the database at the moment of record creation, so the new record ID is unknown until the record is saved.
-
Incremental IDs don’t work well with distributed systems. Imagine we want to run two databases, create different records there, and then merge the data. Many records will have conflicting IDs in this case.
-
There’s a security concern. Incremental IDs are easy to iterate. Imagine hackers notice the application URL
/api/users/1234
. They will iterate over all integers, generate URLs, and parse information about all the users in the database.
Conclusion
Incremental IDs are still popular and are OK to use in small-to-medium applications. It’s better to use BIGINT
as a field type. The default type is INT
, and the maximum value it can hold is 2,147,483,647. Maybe we won’t have that many records, but if we do, altering the ID type on such a big table might be a headache.
Using UUIDs as a primary key
UUID stands for universally unique identifier. It’s a 16-byte long ID that can contain a timestamp, node address, namespace, or be completely random. Usually, it’s represented as a 36-char string, like this:
Get hands-on with 1400+ tech skills courses.