Performing Queries Efficiently in Cassandra
Look into how Cassandra tries to perform queries efficiently.
In Cassandra, a query that does not use the primary key is guaranteed to be inefficient, because it requires a full table scan that queries every node in the cluster.
Methods to perform queries efficiently
Two alternatives can be used to solve the above problem:
- Secondary indexes
- Materialized views
Secondary indexes
A secondary index can be defined on some columns of a table. Each node then indexes its local portion of the table by the specified columns. A query on these columns still has to contact every node in the cluster, but each node can retrieve the matching rows through its index instead of scanning all of its data.
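The behavior described above can be illustrated with a toy model. This is a minimal sketch, not Cassandra's implementation: the `Node` and `Cluster` classes, the modulo-based placement, and the `city` column are all assumptions made for illustration.

```python
from collections import defaultdict


class Node:
    """One toy cluster node: a slice of the table plus a local index."""

    def __init__(self):
        self.rows = {}                 # primary key -> row
        self.index = defaultdict(set)  # indexed column value -> primary keys

    def insert(self, key, row, indexed_column):
        self.rows[key] = row
        # The local index is updated together with the local write.
        self.index[row[indexed_column]].add(key)

    def lookup(self, value):
        # Index lookup: no scan over self.rows is needed.
        return [self.rows[k] for k in sorted(self.index.get(value, ()))]


class Cluster:
    def __init__(self, node_count, indexed_column):
        self.nodes = [Node() for _ in range(node_count)]
        self.indexed_column = indexed_column

    def insert(self, key, row):
        # Hypothetical placement by primary key only (stand-in for partitioning).
        node = self.nodes[key % len(self.nodes)]
        node.insert(key, row, self.indexed_column)

    def query_by_indexed_column(self, value):
        # Rows are not placed by the indexed column, so every node is asked.
        results = []
        contacted = 0
        for node in self.nodes:
            contacted += 1
            results.extend(node.lookup(value))
        return results, contacted


cluster = Cluster(node_count=3, indexed_column="city")
cluster.insert(1, {"id": 1, "city": "Oslo"})
cluster.insert(2, {"id": 2, "city": "Rome"})
cluster.insert(3, {"id": 3, "city": "Oslo"})

rows, contacted = cluster.query_by_indexed_column("Oslo")
```

In this sketch, `contacted` equals the number of nodes regardless of how many rows match, while each node answers from its index rather than from a scan.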
Materialized views
A materialized view can be defined as a query on an existing table with a newly defined partition key. The materialized view is maintained as a separate table, and any changes to the original table are eventually propagated to it. The two approaches are subject to the following trade-offs.
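To make the materialized view idea concrete before turning to the comparison, here is a minimal toy sketch. It assumes a hypothetical `Table` class with a simplified, deterministic placement function, and it updates the view immediately, whereas the text above describes the real propagation as eventual. It also handles inserts only, not updates or deletes.

```python
class Table:
    """Toy table whose rows are spread over nodes by a chosen partition key."""

    def __init__(self, node_count, partition_key):
        self.partition_key = partition_key
        # One dict per node: partition key value -> list of rows.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, value):
        # Hypothetical stand-in for real placement; deterministic for the demo.
        return sum(str(value).encode()) % len(self.nodes)

    def write(self, row):
        value = row[self.partition_key]
        node = self.nodes[self._node_for(value)]
        node.setdefault(value, []).append(row)

    def read(self, value):
        # Only the single node that owns this partition is contacted.
        node_id = self._node_for(value)
        return self.nodes[node_id].get(value, []), [node_id]


base = Table(node_count=3, partition_key="user_id")
# A second table with a different partition key plays the role of the view.
users_by_city = Table(node_count=3, partition_key="city")


def write_user(row):
    base.write(row)
    # Simplification: applied right away here, not asynchronously.
    users_by_city.write(row)


write_user({"user_id": 1, "city": "Oslo"})
write_user({"user_id": 2, "city": "Rome"})
write_user({"user_id": 3, "city": "Oslo"})

rows, contacted_nodes = users_by_city.read("Oslo")
```

Because the view is partitioned by `city`, the read for one city goes to exactly one node in this model, at the cost of storing every row a second time.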
Trade-offs with secondary indexes and materialized views
- Secondary indexes are more suitable for low cardinality columns, while materialized views are suitable for high cardinality columns, as they are stored as regular tables that are partitioned by the new key.
- Materialized views are more efficient during read operations than secondary indexes because only the nodes that contain the corresponding partition are queried.
- Secondary indexes are updated locally on each node together with the base table write, so they stay consistent with that node's data, while materialized views are updated asynchronously and are only eventually consistent with the original table.
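The consistency difference between the two approaches can be sketched with a toy, single-node model. The class and method names below are hypothetical, and the explicit `propagate()` call stands in for whatever background mechanism applies view updates. The sketch is insert-only and ignores replication.

```python
from collections import deque


class TableWithIndexAndView:
    """Toy store: a local index updated on write, a view updated later."""

    def __init__(self):
        self.base = {}          # user_id -> city
        self.index = {}         # city -> set of user_ids (local secondary index)
        self.view = {}          # city -> set of user_ids (materialized view)
        self.pending = deque()  # view updates that have not been applied yet

    def write(self, user_id, city):
        self.base[user_id] = city
        # The index entry is written in the same local step as the base row.
        self.index.setdefault(city, set()).add(user_id)
        # The view update is only queued, so it becomes visible later.
        self.pending.append((user_id, city))

    def propagate(self):
        # Apply all queued view updates.
        while self.pending:
            user_id, city = self.pending.popleft()
            self.view.setdefault(city, set()).add(user_id)

    def read_index(self, city):
        return sorted(self.index.get(city, set()))

    def read_view(self, city):
        return sorted(self.view.get(city, set()))


store = TableWithIndexAndView()
store.write(1, "Oslo")

index_hit = store.read_index("Oslo")   # sees the new row immediately
stale_view = store.read_view("Oslo")   # view has not caught up yet
store.propagate()
fresh_view = store.read_view("Oslo")   # view now reflects the write
```

A read of the view between the write and the propagation step misses the new row, which is the window that eventual consistency allows.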