What is a database?

Did you know that Charles Bachman developed one of the first database management systems, the Integrated Data Store (IDS), at General Electric in the early 1960s? It revolutionized data management.

Key takeaways:

  • A database is an organized and systematic collection of data stored in a computer system.

  • There are different types of databases designed for specific purposes, including SQL, NoSQL, time-series, in-memory, NewSQL and vector databases.

  • The main components of a database are its schema, data model, DBMS, and data storage layer.

  • Basic functions of a database are data management, data storage, data manipulation, concurrency control, data security, data integrity and transaction management.

  • Businesses that use databases include banks, insurance companies, healthcare centers, manufacturers, law firms, social networks, pharmaceutical companies, bioinformatics companies, e-commerce stores, and many more.

A database is an organized and systematic collection of data stored and managed electronically in a computer system. It allows users to perform different operations on data, such as storing, retrieving, updating, and deleting it.

Note: Data can be any information, such as text, images, numbers, media files, and so on.

Types of databases

There are many types of databases because different applications and systems have different needs in terms of data, performance, scalability, and flexibility. Some of the common database types are as follows:

  • Relational databases or SQL databases

  • NoSQL databases

  • Time series databases

  • In-memory databases

  • NewSQL databases

  • Vector databases

Let’s discuss each of these types:

SQL databases

SQL databases, also referred to as relational databases, organize data into tables with rows and columns. Each row in a table represents a record, and each column represents a field or attribute. SQL databases follow a strict schema and use a specific language called Structured Query Language (SQL) to perform operations on the data. These databases are well suited to structured data and guarantee the ACID (Atomicity, Consistency, Isolation, and Durability) properties. Some SQL databases are:

  • MySQL

  • PostgreSQL

  • Oracle

An example of an SQL database where data is stored in the form of tables
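To make the table model concrete, here is a minimal sketch using Python’s built-in sqlite3 module. SQLite is our choice for a self-contained example (it is not one of the databases listed above), and the customers and orders tables and their rows are hypothetical. The final query joins the two tables, the kind of complex query SQL databases support.

```python
import sqlite3

# In-memory SQLite database standing in for any relational DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each table has a fixed schema: every row is a record, every column a field.
conn.execute("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    )""")

conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 [(1, 25.0), (1, 40.0), (2, 15.5)])
conn.commit()

# A join combines related rows from both tables; SQL states *what* to fetch.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

for name, total in rows:
    print(name, total)
# Alice 65.0
# Bob 15.5
```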

NoSQL databases

NoSQL (not only SQL) databases are designed to handle unstructured or semi-structured data. They provide more flexibility than relational databases because they don’t require a rigid schema, and they can be scaled horizontally across multiple servers. Common types of NoSQL databases are:

  • Key-value stores

  • Document databases

  • Graph databases

  • Columnar databases

An example of key-value data stored in a NoSQL database
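A key-value store maps each unique key to a value and looks values up only by key. The toy Python sketch below illustrates the idea, assuming a single process and no persistence; the KeyValueStore class and keys such as "user:1001" are hypothetical. Because the store treats values as opaque, two records can carry different fields, which is the flexible schema described above.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy key-value store: each value is stored and fetched by a unique key.

    Values are opaque to the store, so records do not need to share a schema.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value          # insert, or overwrite an existing key

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)       # None when the key is absent

    def delete(self, key: str) -> bool:
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user:1001", {"name": "Alice", "email": "alice@example.com"})
store.put("user:1002", {"name": "Bob", "roles": ["admin"]})  # different fields, no fixed schema
store.put("session:abc123", "user:1001")                     # values can be plain strings too

print(store.get("user:1002"))          # {'name': 'Bob', 'roles': ['admin']}
store.delete("session:abc123")
print(store.get("session:abc123"))     # None
```

A real key-value database adds persistence, replication, and distribution across servers on top of this basic get/put/delete interface.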

Advantages and Disadvantages of SQL and NoSQL Databases

SQL databases

Advantages:
  • Strong consistency and ACID compliance ensure data integrity
  • Support for complex queries and joins
  • A mature ecosystem with many tools and frameworks

Disadvantages:
  • Limited horizontal scalability
  • A rigid schema can make it difficult to adapt to changing requirements
  • Possible performance bottlenecks with very large datasets

NoSQL databases

Advantages:
  • High scalability for distributed systems
  • A flexible schema makes it easier to adapt to changing data structures
  • Designed to handle large volumes of unstructured or semi-structured data

Disadvantages:
  • Under eventual consistency, different nodes may temporarily return different values
  • Many lack complex querying features such as joins
  • Query design depends on the specific NoSQL database type
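The point that different nodes may return different values is easiest to see with two replicas of the same data. The Python sketch below is a toy model that assumes last-writer-wins conflict resolution by timestamp, which is only one of several strategies NoSQL systems use; the Replica class and sync function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Replica:
    """One node holding its own copy of the data: key -> (timestamp, value)."""
    name: str
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        self.data[key] = (ts, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a: Replica, b: Replica) -> None:
    """Anti-entropy step: both replicas keep the newest version of every key.

    Last-writer-wins: the higher timestamp wins (ties fall back to comparing
    values, only to keep the result deterministic).
    """
    for key in set(a.data) | set(b.data):
        versions = [v for v in (a.data.get(key), b.data.get(key)) if v is not None]
        newest = max(versions)
        a.data[key] = newest
        b.data[key] = newest


node_a, node_b = Replica("A"), Replica("B")

node_a.write("profile:42", "v1", ts=1)
sync(node_a, node_b)                    # both replicas now hold v1

node_a.write("profile:42", "v2", ts=2)  # replication to B has not happened yet
fresh = node_a.read("profile:42")       # "v2"
stale = node_b.read("profile:42")       # "v1": another node gives a different answer

sync(node_a, node_b)                    # replicas exchange updates and converge
converged = node_b.read("profile:42")   # "v2"
print(fresh, stale, converged)          # v2 v1 v2
```

Between the second write and the second sync, a client reading from node B sees stale data; once the replicas synchronize, every node returns the same value, which is what "eventual" consistency means.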

Time series databases

Many modern systems, such as IoT devices, monitoring systems, and stock markets, produce time-stamped data. Such data calls for specialized databases, known as time series databases, that are optimized for storing, retrieving, and querying data indexed by time. Common use cases include IoT applications, financial market data, and environmental and weather data. Examples of time series databases include:

  • Amazon Timestream

  • InfluxDB

  • TimescaleDB

  • Prometheus

An example of time series data showing the Bitcoin (BTC) price over a 7-day period
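Time series databases typically provide aggregation and downsampling, for example turning raw readings into hourly averages. The Python sketch below shows that idea on hypothetical temperature readings taken every 20 minutes; the downsample function is illustrative only, since a real time series database runs such aggregations inside its engine over time-indexed storage.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

EPOCH = datetime(1970, 1, 1)


def downsample(points: List[Tuple[datetime, float]],
               bucket: timedelta) -> Dict[datetime, float]:
    """Average every reading that falls into the same fixed-size time bucket."""
    sums: Dict[datetime, float] = defaultdict(float)
    counts: Dict[datetime, int] = defaultdict(int)
    for ts, value in points:
        # Floor the timestamp to the start of its bucket, e.g. 00:40 -> 00:00.
        start = EPOCH + ((ts - EPOCH) // bucket) * bucket
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Hypothetical sensor readings every 20 minutes: 20.0, 21.0, ..., 25.0
t0 = datetime(2024, 1, 1, 0, 0)
readings = [(t0 + timedelta(minutes=20 * i), 20.0 + i) for i in range(6)]

hourly = downsample(readings, timedelta(hours=1))
for start, avg in hourly.items():
    print(start.isoformat(), avg)
# 2024-01-01T00:00:00 21.0
# 2024-01-01T01:00:00 24.0
```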

In-memory databases

As the name suggests, in-memory databases store data directly in the system’s RAM instead of on disk, which enables rapid access to data. They are designed for applications where speed is critical, such as real-time applications, caching, and high-frequency trading. Because data is stored in memory, in-memory databases provide much lower latency than disk-based databases, though they typically trade off durability for performance. To mitigate data loss, in-memory databases are often used in tandem with persistent databases. Other use cases include session management and online/live gaming applications. Following are some popular in-memory databases:

  • Redis

  • Memcached

In-memory databases are very fast because they store data in RAM, but this comes with trade-offs related to data persistence and recovery: RAM is volatile, so its contents are lost if the system crashes or restarts. To prevent data loss, these databases often take periodic snapshots and keep transaction logs. These methods protect data but add complexity and overhead, so it’s essential to balance speed against reliable data protection.
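The snapshot-plus-log technique mentioned above can be sketched in Python. This is a simplified, single-process illustration, not any product’s actual format: the PersistentMemoryStore class, the JSON encoding, and the file names snapshot.json and writes.log are hypothetical. Reads and writes are served from a dictionary in RAM, every write is also appended to a log, and a snapshot saves the whole dataset so the log can be emptied.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class PersistentMemoryStore:
    """Serves reads and writes from RAM; survives restarts via a snapshot
    file plus an append-only log of writes made since that snapshot."""

    def __init__(self, directory: str) -> None:
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "writes.log")
        self.data: Dict[str, str] = {}
        self._recover()

    def _recover(self) -> None:
        # 1. Load the most recent full snapshot, if one exists.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        # 2. Replay writes that were logged after that snapshot.
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    entry = json.loads(line)
                    self.data[entry["key"]] = entry["value"]

    def set(self, key: str, value: str) -> None:
        # Append to the log before updating RAM, so an acknowledged write can be replayed.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # forces a disk write: this is the durability overhead
        self.data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def snapshot(self) -> None:
        # Save the whole dataset, then empty the log, whose writes the snapshot now covers.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.snapshot_path)  # swap in the new snapshot in one rename
        open(self.log_path, "w").close()


with tempfile.TemporaryDirectory() as directory:
    store = PersistentMemoryStore(directory)
    store.set("session:1", "alice")
    store.snapshot()                  # session:1 is now in snapshot.json
    store.set("session:2", "bob")     # session:2 exists only in the log so far

    # Simulate a crash and restart: a new instance rebuilds RAM from disk.
    restarted = PersistentMemoryStore(directory)
    recovered = (restarted.get("session:1"), restarted.get("session:2"))

print(recovered)  # ('alice', 'bob')
```

Calling os.fsync on every write gives the strongest durability in this sketch but slows each write; flushing the log in batches would be faster but could lose the most recent writes after a crash, which is exactly the speed-versus-protection balance described above.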

NewSQL databases

NewSQL databases combine the best of relational and NoSQL databases: the ACID properties of SQL databases and the scalability and distributed nature of NoSQL databases. They are designed for high-throughput transactional workloads that require both consistency and scalability, which makes them well suited to modern web-scale applications such as online retail platforms, real-time analytics, and healthcare systems. Some common NewSQL databases are:

  • Google Spanner

  • CockroachDB

  • VoltDB

Vector databases

Vector databases store and manage large-scale, high-dimensional vector data, often generated by machine learning models, including large language models (LLMs). These databases are designed to speed up similarity search over large datasets, such as finding similar text, images, videos, and audio through their vector representations. They are primarily used in LLM-based applications and recommendation systems, where large volumes of data must be searched with high performance and scalability. Some well-known vector databases and vector search libraries include:

  • ChromaDB

  • Pinecone

  • Milvus

  • Weaviate

  • ScaNN

The process of converting objects (documents) into vectors and storing them in a vector database
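At its core, similarity search ranks stored vectors by how close they are to a query vector. The Python sketch below performs an exact, brute-force search with cosine similarity; the three-dimensional vectors and document IDs are made up for illustration, whereas real embeddings typically have hundreds or thousands of dimensions. Vector databases avoid scoring every stored vector by using approximate nearest neighbor (ANN) indexes, which this sketch does not implement.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Exact nearest-neighbor search: score every stored vector, keep the best k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional "embeddings" standing in for real model output.
index = {
    "doc_cats":   [0.9, 0.1, 0.0],
    "doc_dogs":   [0.8, 0.3, 0.1],
    "doc_stocks": [0.0, 0.2, 0.95],
}
query = [0.85, 0.2, 0.05]  # e.g. the embedding of a question about pets

results = top_k(query, index, k=2)
print([doc_id for doc_id, _ in results])  # ['doc_cats', 'doc_dogs']
```

Brute-force search costs one similarity computation per stored vector for every query, which is why it stops scaling at millions of vectors and why vector databases rely on ANN indexes instead.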

Key Features and Use Cases of Each Database Type

Relational databases (SQL)

Key features:
  • Structured data organized into tables with predefined schemas
  • Strong ACID properties
  • Support for complex queries with SQL
  • Data integrity and relationships enforced through foreign keys

Use cases:
  • Financial systems
  • Enterprise applications
  • E-commerce platforms

Examples:
  • MySQL
  • PostgreSQL
  • Oracle
  • Microsoft SQL Server

NoSQL databases

Key features:
  • Schema-less or flexible schema
  • Handles unstructured, semi-structured, or structured data
  • Scales horizontally with a distributed architecture

Use cases:
  • Social networks
  • Real-time analytics
  • Content management systems

Examples:
  • MongoDB (document)
  • Cassandra (column-family)
  • Redis (key-value)
  • Neo4j (graph)

Time series databases

Key features:
  • Optimized for time-stamped or time-ordered data
  • Efficient storage and querying of time series data
  • Built-in functions for aggregation, downsampling, and analysis

Use cases:
  • IoT sensor data
  • Monitoring and observability
  • Financial and stock trading data

Examples:
  • InfluxDB
  • TimescaleDB
  • Prometheus

In-memory databases

Key features:
  • Stores data entirely in memory (RAM) for low-latency access
  • High-performance read/write operations
  • Ideal for caching and real-time applications
  • Often provides persistence options via disk snapshots

Use cases:
  • Caching layers
  • Real-time gaming leaderboards
  • Session management

Examples:
  • Redis
  • Memcached

NewSQL databases

Key features:
  • Combines the ACID properties of traditional SQL databases with the scalability of NoSQL
  • Distributed architecture for horizontal scaling
  • Supports complex SQL queries
  • Ensures consistency across distributed nodes

Use cases:
  • High-scale web applications
  • Real-time analytics
  • Large-scale transactional systems

Examples:
  • Google Spanner
  • CockroachDB
  • VoltDB

Vector databases

Key features:
  • Optimized for storing and querying high-dimensional vector data
  • Commonly used for similarity search in AI/ML applications
  • Efficient indexing techniques such as approximate nearest neighbor (ANN) search

Use cases:
  • Machine learning and AI
  • Recommender systems
  • Semantic search over large datasets

Examples:
  • ChromaDB
  • Pinecone
  • Milvus

Note: To learn more about large-scale data (big data), you can look at the difference between big data and data warehousing.

The main components of a database

A database has several components, each responsible for different operations. Some of the key components are:

  • Database schema or data model: The database schema and data model define the architecture of how data is organized within the database. Together, they include the tables, columns, data types, relationships (such as one-to-many or many-to-many), and constraints (like primary keys and foreign keys). A relational database schema may include a set of interrelated tables, while a NoSQL database may have a flexible schema.

  • Database management system (DBMS): The DBMS is the software layer that interacts with the database and manages all operations on the data, including indexing and transaction management. It provides an interface to interact with the database in a secure, efficient, and reliable manner. The DBMS ensures data consistency, manages concurrent access, and enforces security via access controls. It also includes query processors that optimize SQL or NoSQL queries for fast, efficient data retrieval.

  • Data storage layer: Data storage refers to how and where the actual data is physically stored, whether on disk, SSD, or memory. For example, traditional databases store data on disk using storage engines that efficiently write and read data using techniques like B-trees or hashing. In contrast, in-memory databases, like Redis, store data in RAM to ensure high-speed access. The storage layer also includes data compression, caching, and replication to optimize space usage and access times.
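To see how a schema’s constraints protect data, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in DBMS. The authors and books tables are hypothetical and model a one-to-many relationship; each rejected insert violates a different constraint. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and the exact error messages vary between SQLite versions, so the sketch only checks that each insert is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled

# Schema: one author can have many books (a one-to-many relationship).
conn.executescript("""
    CREATE TABLE authors (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE books (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        year      INTEGER CHECK (year > 0),
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
""")

conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO books (title, year, author_id) VALUES ('Notes', 1843, 1)")


def try_insert(sql: str) -> str:
    """Run one statement and report whether the schema accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected ({err})"


results = [
    try_insert("INSERT INTO books (title, year, author_id) VALUES ('Orphan', 2000, 99)"),  # no author 99
    try_insert("INSERT INTO books (title, year, author_id) VALUES (NULL, 2000, 1)"),       # NOT NULL
    try_insert("INSERT INTO books (title, year, author_id) VALUES ('Bad year', -5, 1)"),   # CHECK
    try_insert("INSERT INTO authors (id, name) VALUES (2, 'Ada')"),                        # UNIQUE
]
for outcome in results:
    print(outcome)
```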

Key features of a database

Some of the key features of a database are:

  • Data management and storage: A database allows us to manage and store data physically, whether on disk, SSD, or memory.

  • Data manipulation: The database allows us to perform the CRUD (Create, Read, Update, and Delete) operations on the data.

  • Concurrency control: The database coordinates simultaneous access by multiple users so that their concurrent operations do not corrupt the data or interfere with one another.

  • Data security: A database protects data from unauthorized access and breaches. This involves authentication, encryption, and access control mechanisms.

  • Transaction management: SQL databases follow the ACID properties to ensure that the operations in a transaction are executed as a single unit (either all of them take effect or none do), which maintains data integrity.
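Transaction management is easiest to see with a money transfer, where crediting one account and debiting another must succeed or fail together (atomicity). The sketch below uses Python’s sqlite3 module; the accounts table and the transfer function are hypothetical. A CHECK constraint makes an overdraft fail, and the rollback also undoes the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Credit dst and debit src inside one transaction: both happen or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the debit broke the CHECK; the earlier credit was rolled back too


def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


ok = transfer("alice", "bob", 30)
print(ok, balances())       # True {'alice': 70, 'bob': 80}

failed = transfer("bob", "alice", 500)  # would leave bob with a negative balance
print(failed, balances())   # False {'alice': 70, 'bob': 80}
```

The credit is applied first on purpose: when the debit fails, alice briefly holds 570 inside the transaction, and the rollback shows that this partial update never becomes visible.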

Note: Many NoSQL databases do not fully apply the ACID properties because they are designed for scalability, flexibility, and high availability. Instead, they usually follow BASE (Basically Available, Soft state, Eventual consistency) principles.

Database use cases

Databases are used in almost every field. They can store media files, images, songs, genomic and biological data, texts, and numbers. Social media companies, for example, run their own database systems to manage their data. The following are some of the businesses that use databases today:

  • Banking
  • Insurance companies
  • Healthcare centers
  • Manufacturers
  • Law firms
  • Social networks
  • Pharmaceutical companies
  • Bioinformatics companies
  • E-commerce stores

Note: You might also be interested in the top 20 database interview questions.

Frequently asked questions



Why should I use a database instead of storing data in files?

Databases offer advantages such as faster search and retrieval, concurrent access, data integrity, and better security. Plain files lack these safeguards: they are more prone to data corruption (for example, when several users write to the same file at once) and are difficult to scale when handling large volumes of information or many concurrent users.


What is a database, and what are its types?

A database is an organized collection of data that can be easily accessed, managed, and updated using a database management system (DBMS). It allows for efficient querying, storage, and manipulation of large amounts of data. There are many types of databases, including relational databases (SQL databases), NoSQL databases, time series databases, in-memory databases, NewSQL databases, and vector databases.


What is DBMS?

A Database Management System (DBMS) is software that interacts with the database to perform tasks like storing, retrieving, updating, and managing the data. It ensures data consistency, security, and efficient access.


How do databases ensure data integrity?

Databases ensure data integrity through rules such as constraints, ACID (Atomicity, Consistency, Isolation, Durability) properties in SQL databases, and conflict resolution and eventual consistency mechanisms in NoSQL databases.


What are the differences between SQL and NoSQL databases?

SQL databases are structured and use a rigid schema, ensuring strict ACID compliance for transactions, which makes them ideal for applications that require data integrity, such as banking systems. They typically scale vertically, by adding more CPU, memory, or storage to a single server. In contrast, NoSQL databases offer flexible schemas for unstructured or semi-structured data and prioritize horizontal scalability by adding more servers. NoSQL databases may relax ACID properties to improve performance and scalability, which makes them a good fit for big data and real-time applications, such as social media and IoT.


How do I choose the right database for an application?

To choose the right database for an application, consider the following factors:

  • Structure of the data: For structured data and complex queries, choose SQL. For unstructured or semi-structured data, consider NoSQL.
  • Scalability requirements: If you need to scale horizontally across many servers, a NoSQL database is usually the better fit; SQL databases typically scale vertically.
  • Consistency requirements: If your application demands strong consistency, choose SQL; if eventual consistency is acceptable, NoSQL is a good option.
  • Performance: Analyze your performance needs, such as read/write speed and transaction volume. For some workloads, in-memory or NoSQL databases offer better performance than SQL databases.
  • Use case: Match the database to the application’s specific requirements, such as real-time analytics, logging, or relational data processing.

What sequence of topics should I follow if I have to learn about databases from scratch?

To learn databases from scratch, follow this sequence of topics:

  1. Introduction to databases
  2. Database models
  3. SQL basics
  4. Database design
  5. Advanced SQL
  6. Transactions and ACID properties
  7. NoSQL databases
  8. Database management systems (DBMS)
  9. Data security and backup
  10. Scalability and performance tuning
  11. Real-world applications and case studies
  12. Emerging trends

Copyright ©2024 Educative, Inc. All rights reserved