CQRS: Schema Design, Synchronization and Consistency Considerations

This blog post is an extension of my previous article on Event-Driven Architecture (EDA) with Event Sourcing. In this post, I delve deeper into the CQRS (Command Query Responsibility Segregation) pattern, exploring its components and flow, along with considerations around schema design, command and query model synchronization, and consistency implications.

What is Command Query Responsibility Segregation?

CQS (Command-Query Separation): The CQS principle states that every method should either be a command that performs an action or a query that returns data, but not both. Commands are methods that change the state of an object, while queries are methods that return data without modifying anything.
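To make the distinction concrete, here is a minimal sketch of CQS in Python, using a hypothetical BankAccount class (the class and method names are purely illustrative):

```python
class BankAccount:
    """Illustrative CQS example: every method is either a command or a query."""

    def __init__(self, balance: float = 0.0):
        self._balance = balance

    # Command: changes state, returns nothing.
    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._balance += amount

    # Query: returns data, changes nothing.
    def balance(self) -> float:
        return self._balance
```

A method like `withdraw_and_return_balance` would violate CQS, since it both mutates state and reports data.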

CQRS (Command Query Responsibility Segregation): CQRS is an architectural design pattern based on the CQS principle that takes it a step further by separating the data access patterns of an application. In essence, CQRS is characterized by two distinct models.

The command model, or write model, is responsible for handling commands that modify the state of the system (inserts, updates, and deletes). The read model, or query model, is responsible for handling queries that retrieve data (select queries).


Core Features of CQRS

The CQRS (Command Query Responsibility Segregation) pattern emphasizes Separation of Concerns by splitting the system’s responsibilities into a command model and a query model. This separation provides two significant benefits: scalability and the ability to use specialized models.

Separation of Concerns

CQRS inherently separates the write model from the read model, with each model optimized for its specific purpose. This makes the architecture cleaner and easier to maintain.

Scalability: The command and query sides can be scaled independently based on their workload requirements. This makes CQRS useful for applications with a high read-to-write ratio, or vice versa.

For example, a read-heavy application can scale the query side without impacting the command side.

Specialized Models: CQRS makes it possible to use different data structures tailored to the specific needs of the command and query sides. This flexibility allows each side to perform efficiently without being constrained by the requirements of the other.

The key to fully leveraging the benefits of CQRS lies in making strategic choices regarding database selection, schema design, synchronization mechanisms, and addressing consistency implications.


Command Model with Normalized Schemas

Since the write model modifies the state of data, it is prudent to design its schema in a normalized fashion, which enforces data integrity, avoids data duplication, and eases updates. This inherently makes it easier to manage complex relationships and business rules. Normalization structures data into smaller, related tables, ensuring that each piece of information is stored only once and can be consistently updated or modified.

Database Selection – Write Heavy

Event Streaming Systems – Kafka and Kinesis

Kafka and Kinesis can handle large amounts of event data efficiently. They store events as immutable, replayable streams, making them well suited to high write loads.

NoSQL Databases – Cassandra and DynamoDB

Cassandra and DynamoDB are built for high-speed, write-heavy operations. They are scalable and fault-tolerant, delivering reliable performance under heavy load.

Relational Databases – Oracle, PostgreSQL and MySQL

Oracle, PostgreSQL and MySQL are great for storing commands with transactional consistency.1 They work well when there is a need for reliable data validation.

Query Model with Denormalized Schemas

The read model focuses on query performance and scalability. Schemas are typically denormalized to optimize query performance, often organizing data in a way that directly matches the expected query patterns. How many times have we come across queries suffering from increased latency? I have seen it many times, and it usually calls for revisiting the architecture, so to speak.

Not having a denormalized view on the query side can significantly impact the performance, complexity, and scalability of data retrieval and computation, especially when dealing with very large datasets.

Here are a few issues with retrieving large datasets without denormalization:

  • Increased Join Complexity – Joins on large datasets (like millions of rows) are computationally expensive and can lead to slow query execution.
  • High Latency – Real-time or near-real-time queries suffer from increased latency, as they need to traverse multiple relationships and aggregate data on-the-fly.
  • Reduced Query Optimization – Query optimizers in RDBMS systems are effective, but excessive joins and aggregations can still bottleneck performance.
  • Read bottlenecks – As data volume grows, performing joins across normalized tables can cause read bottlenecks.
  • Duplication of Effort – Queries with similar computation requirements (e.g., total loans per customer, repayment history or some common aggregations) repeat the same logic.
  • Difficult Real-Time Analytics – Real-time reporting becomes difficult, as normalized queries take longer to execute.
  • Elevated hardware costs – Scaling hardware to compensate for query complexity increases operational costs.
  • Concurrency issues – Simultaneous large queries on normalized tables can lead to execution contention and degraded performance.

Database Selection – Read Heavy

Any read-heavy database that supports a significantly higher volume of read operations (queries, data retrieval) compared to write operations (inserts, updates, deletes) should be a good fit, as long as it meets the business requirements.

Some examples are Cassandra (columnar database), Elasticsearch (search and analytics), Redis (in-memory distributed data store), and MongoDB (document database). In the category of read replicas, SQL databases such as MySQL, PostgreSQL, and others can use read replicas to offload read queries from the primary database.


Example – Schema Design in Loan Management System

Command Model Schema (Normalized)

In a loan management system, the Command Model is designed to handle updates and maintain data consistency by adhering to strict business rules. This model uses a normalized schema, where tables are structured with foreign key constraints and other integrity constraints to maintain data accuracy and support complex business relationships.

customer_id | name     | email              | phone        | address
1001        | John Doe | john.doe@email.com | 555-555-1212 | 123 Main Street, Anytown, CA
1002        | Jane Doe | jane.doe@email.com | 555-555-1112 | Somewhere 123 Street, Anytown, CA
Customer table
loan_id | customer_id | loan_type | principal_amt | interest_rate | loan_start_date | loan_end_date
2001    | 1001        | personal  | 10000.00      | 5.0           | 2024-01-01      | 2026-01-01
2002    | 1002        | home      | 200000.00     | 4.5           | 2024-02-18      | 2034-02-15
Loan table
transaction_id | loan_id | customer_id | amount    | transaction_type | transaction_date
4001           | 2001    | 1001        | 500.00    | Repayment        | 2024-02-01
4002           | 2002    | 1002        | 200000.00 | Disbursement     | 2024-02-18
Transaction table
repayment_id | loan_id | amount  | payment_date | payment_status
3001         | 2001    | 500.00  | 2024-02-01   | Paid
3002         | 2002    | 1000.00 | 2024-03-29   | Due
Repayment table
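As a sketch, the normalized command-model tables above could be declared with foreign key constraints like this (using SQLite purely for illustration; column types and sample rows follow the tables shown):

```python
import sqlite3

# Illustrative normalized command-model schema with foreign key constraints.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this per connection
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL,
    phone       TEXT,
    address     TEXT
);
CREATE TABLE loan (
    loan_id         INTEGER PRIMARY KEY,
    customer_id     INTEGER NOT NULL REFERENCES customer(customer_id),
    loan_type       TEXT NOT NULL,
    principal_amt   REAL NOT NULL,
    interest_rate   REAL NOT NULL,
    loan_start_date TEXT,
    loan_end_date   TEXT
);
CREATE TABLE repayment (
    repayment_id   INTEGER PRIMARY KEY,
    loan_id        INTEGER NOT NULL REFERENCES loan(loan_id),
    amount         REAL NOT NULL,
    payment_date   TEXT,
    payment_status TEXT
);
""")

# Each fact is stored exactly once; the FK constraint rejects a repayment
# for a loan that does not exist.
conn.execute("INSERT INTO customer VALUES (1001, 'John Doe', "
             "'john.doe@email.com', '555-555-1212', '123 Main Street, Anytown, CA')")
conn.execute("INSERT INTO loan VALUES (2001, 1001, 'personal', 10000.00, 5.0, "
             "'2024-01-01', '2026-01-01')")
conn.execute("INSERT INTO repayment VALUES (3001, 2001, 500.00, '2024-02-01', 'Paid')")
conn.commit()
```

With this schema, an insert into `repayment` referencing a non-existent `loan_id` fails with an integrity error, which is exactly the kind of rule enforcement the command side is responsible for.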

Query Model Schema (Denormalized)

Pre-computing and storing common aggregations and joins is a key strategy for optimizing the query side in the CQRS pattern. There are several ways to do this; the most common approach is to create denormalized views or materialized data structures: denormalized tables optimized for fast reads.

Extending the above example, here is the query model for a loan summary view.

loan_summary_view

customer_id | customer_name | loan_id | loan_type | principal_amt | interest_rate | balance_due
1001        | John Doe      | 2001    | personal  | 10000.00      | 5.0           | 9500.00
1002        | Jane Doe      | 2002    | home      | 200000.00     | 4.5           | 199000.00
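One way to materialize such a summary view is to precompute the join and aggregation ahead of read time. A rough Python sketch, using hypothetical in-memory rows and a deliberately simplified balance calculation (a real system would derive the balance from the full transaction history):

```python
# Hypothetical normalized rows from the command model.
customers = {1001: "John Doe", 1002: "Jane Doe"}
loans = [
    {"loan_id": 2001, "customer_id": 1001, "loan_type": "personal",
     "principal_amt": 10000.00, "interest_rate": 5.0},
    {"loan_id": 2002, "customer_id": 1002, "loan_type": "home",
     "principal_amt": 200000.00, "interest_rate": 4.5},
]
repayments = [
    {"loan_id": 2001, "amount": 500.00, "payment_status": "Paid"},
    {"loan_id": 2002, "amount": 1000.00, "payment_status": "Paid"},
]

# Aggregate completed repayments once, instead of on every query.
paid: dict[int, float] = {}
for r in repayments:
    if r["payment_status"] == "Paid":
        paid[r["loan_id"]] = paid.get(r["loan_id"], 0.0) + r["amount"]

# Precompute the denormalized loan_summary_view so reads avoid joins entirely.
loan_summary_view = {
    loan["loan_id"]: {
        "customer_name": customers[loan["customer_id"]],
        "loan_type": loan["loan_type"],
        "principal_amt": loan["principal_amt"],
        "interest_rate": loan["interest_rate"],
        "balance_due": loan["principal_amt"] - paid.get(loan["loan_id"], 0.0),
    }
    for loan in loans
}
```

A read for a loan summary is now a single key lookup, with the join and aggregation cost paid once at write/refresh time instead of on every query.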

Integrating the Write and Read Models, a.k.a. Synchronization

One approach to achieving synchronization between the write and read models is the event sourcing technique. This is essentially the same basic strategy used to sync a primary database with its read replicas in an RDBMS. We can extend this idea further by leveraging database-level events.

When a command is executed, it produces a change in the application’s state that is recorded in some kind of log: WAL2 (PostgreSQL, SQLite), binary logs (MySQL, MariaDB), or redo logs (Oracle), depending on the database used. CDC (Change Data Capture)3 events are derived from these logs, and the read model can be updated by replaying them.

Tools like Debezium4, Oracle GoldenGate, and Amazon DMS generate CDC (Change Data Capture) events by monitoring database logs (WAL, binlogs, redo logs, transaction logs, etc.), extracting changes (INSERT, UPDATE, DELETE), and producing structured events (serialized into formats like JSON, Avro, or Protobuf) that are published to message queues, event streams, or files for downstream systems to consume. CDC events typically include the event type, the table/schema, the data (before and after the update), and metadata such as a timestamp.

Example: a raw log entry translated into a logical CDC event in JSON format (a Debezium-generated CDC event).

{
  "operation": "update",
  "schema": "auto_loan",
  "table": "refinance",
  "before": { "id": 1, "status": "pending" },
  "after": { "id": 1, "status": "approved" },
  "timestamp": 1675467890000
}
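A downstream projector consuming such events might apply them to the read store roughly like this. This is a sketch: the field names follow the JSON above, and an in-memory dict stands in for a real read-model store:

```python
import json

# In-memory stand-in for the read-model store, keyed by (schema, table, id).
read_store = {("auto_loan", "refinance", 1): {"id": 1, "status": "pending"}}

def apply_cdc_event(raw: str) -> None:
    """Apply one Debezium-style CDC event (JSON) to the read store."""
    event = json.loads(raw)
    # Deletes carry the row image in "before"; inserts/updates in "after".
    row = event["after"] if event.get("after") else event["before"]
    key = (event["schema"], event["table"], row["id"])
    if event["operation"] in ("insert", "update"):
        read_store[key] = event["after"]   # upsert the new row image
    elif event["operation"] == "delete":
        read_store.pop(key, None)          # remove the row

cdc_event = """{
  "operation": "update",
  "schema": "auto_loan",
  "table": "refinance",
  "before": { "id": 1, "status": "pending" },
  "after":  { "id": 1, "status": "approved" },
  "timestamp": 1675467890000
}"""
apply_cdc_event(cdc_event)
```

After applying the event, the read store holds the "after" image of the row, which is how the query side converges on the command side's state.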

Consistency Implications

The greatest challenge in the CQRS pattern arises from the fact that the command model and query model can momentarily be out of sync. There are some implications to consider when it comes to consistency.

One common approach, as indicated in the section above, is event sourcing, where changes to the system are captured as a series of immutable events. However, when using event sourcing, there may be a delay between when a command is executed and when the corresponding event is available for the read model to consume. This delay can lead to temporary inconsistencies between the write model and the read model.

To minimize the impact of these inconsistencies, we can use strategies like eventual consistency or caching.

  • Eventual consistency means that the read model will eventually be updated to reflect the changes made by the write model, even if there is a delay.
  • Caching can also help by storing frequently accessed data in memory, which can reduce the likelihood of inconsistencies.
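The caching idea above can be sketched as a simple read-through cache (an in-memory dict standing in for something like Redis; the key format and data are hypothetical):

```python
# Read-through cache sketch: serve hot reads from memory, fall back to the
# read model on a miss. TTLs and invalidation are omitted for brevity.
read_model = {"loan:2001": {"balance_due": 9500.00}}
cache: dict[str, dict] = {}

def get_loan_summary(key: str) -> dict:
    if key in cache:           # cache hit: no trip to the read store
        return cache[key]
    value = read_model[key]    # cache miss: load from the read model
    cache[key] = value         # populate the cache for subsequent reads
    return value
```

In practice, cache invalidation needs to be tied into the same event flow that updates the read model, otherwise the cache just adds another layer that can go stale.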

To make synchronization happen as soon as possible, we can use techniques like message queues or event-driven architecture. Message queues decouple the write model and read model, so the write model can continue processing without waiting for the read model to catch up. Event-driven architecture (EDA) takes this a step further by allowing the write model to publish events that the read model subscribes to, keeping the read model up to date.

Write Model -> Event Bus -> Read Model Projector (or app) -> Read Model Store
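The flow above can be sketched with a minimal in-memory event bus. The event name `LoanRepaid`, the projector, and the stores are hypothetical; in a real system the bus would be Kafka, a message queue, or similar:

```python
from collections import defaultdict

# Minimal in-memory event bus: the write model publishes, projectors subscribe.
subscribers: dict[str, list] = defaultdict(list)

def subscribe(event_type: str, handler) -> None:
    subscribers[event_type].append(handler)

def publish(event_type: str, payload: dict) -> None:
    for handler in subscribers[event_type]:
        handler(payload)

# Read-model store, updated only by its projector, never by the write model.
loan_summary = {2001: {"balance_due": 10000.00}}

def project_repayment(event: dict) -> None:
    loan_summary[event["loan_id"]]["balance_due"] -= event["amount"]

subscribe("LoanRepaid", project_repayment)

# The write model records the repayment in its own store (omitted here),
# then publishes the event; the projector keeps the read model in step.
publish("LoanRepaid", {"loan_id": 2001, "amount": 500.00})
```

The key design point is the one-way dependency: the write model only publishes; it has no knowledge of which read models consume its events, which is what lets each side scale and evolve independently.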

When to Rethink Using CQRS?

In line with the YAGNI (You Aren’t Gonna Need It) principle, a key practice from Extreme Programming5, implementing CQRS (Command Query Responsibility Segregation) may be unnecessary if the system does not have specific requirements for separate read and write optimizations, scalability challenges, or complex projections. Introducing CQRS in such cases could create operational overhead and an unnecessary maintenance burden.

To make the decision, I would weigh the redesign costs against the expected rate at which the application will grow in terms of data, to determine whether CQRS is needed as part of the initial setup.

That’s a wrap! Hope you found it helpful, and as always, I learned a few things along the way too.

  1. transactional consistency : Sequence of operations performed within a transaction adheres to the ACID properties (Atomicity, Consistency, Isolation, Durability) ↩︎
  2. WAL : Write Ahead Logs ↩︎
  3. CDC : CDC (Change Data Capture) events are higher-level abstractions of changes that occur in a database; they are derived from database-level (low-level) logs ↩︎
  4. Debezium : For more info visit https://debezium.io/documentation/reference/stable/index.html ↩︎
  5. Extreme Programming : An agile software development methodology that values flexibility, teamwork, and customer satisfaction. ↩︎