Common Issues with Databases
Learn about the different types of issues that arise whendealing with databases.
We'll cover the following
If we can’t trust our data, we can’t trust our program. Issues with data correctness can go unnoticed for a long time, popping up unexpectedly to pollute the user experience, raise doubt about the web application, or crash a server process. This is especially true as the data gets more complicated and as the points of contact with the data increase. The best practice for developing a new application is to enforce data integrity constraints from the beginning and re-run total data checks periodically. Three broad categories to watch for are:
- Type violations
- Invalid semantics
- Broken cross-references
Type violations
The data types on which databases are built prevent the coarsest kinds of type violations, such as storing a number where a string is expected. In general, it is important to think about which fields must be populated and which can be blank, also referred to as NULL in SQL. In addition to missing values, it is possible to have repeated records, with the exception of the primary key. Duplicate records are fairly easy to detect, whereas near-duplicates are more challenging. With the freedom afforded by more general types, like JSON content, the type checking is lost. It’s important to think through what kinds of data will be acceptable. It’s generally easier to validate and correct the data as it goes in, rather than deal with every type of data abnormality that can affect every function that uses the database.
Invalid semantics
Invalid semantics are the most common types of error… Once a field is restricted to a SQL type, there are often many ways in which it can still be wrong. If we take a prosaic field that seems self-evident but examine it closely, we’re likely to find a bewildering number of unusual special cases requiring custom treatment. These special cases are often called edge cases. Take United States state codes, like “CA” for California and “TN” for Tennessee. All of the codes are two letters, so your type is likely to be a String(2)
in sqlalchemy
. But many strings are not valid, like “ZZ”, which someone might put in if they don’t know the code, for example. Lowercase letters should be avoided altogether. We can decide that restricting to just the 50 state codes would be the best, except now, residents of Washington, D.C., will be unable to enter their state. The purpose of this example is to point out how even simple questions can get complicated when aiming for data perfection.
Get hands-on with 1200+ tech skills courses.