Statement-based replication

Relational databases such as MySQL or Oracle are queried with a language such as SQL, which consists of statements that read, write, or modify records in the database. One way to maintain a replication log on the master node is simply to record every write statement, e.g. INSERT, UPDATE, or DELETE, that it receives from clients. Statements that only read need not be recorded, since they don’t mutate the database. The logged SQL statements are then sent to the followers, which execute them against their own replica of the data to get in sync with the master.
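The flow described above can be sketched in a few lines. This is a minimal illustration, not how MySQL or Oracle implement replication: the Master and Follower classes, the WRITE_PREFIXES tuple, and the use of in-memory SQLite databases are all assumptions made for the example. CREATE is also logged here so that the follower ends up with the same table.

```python
import sqlite3

# Hypothetical rule for what counts as a write: judged by the leading keyword.
WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "CREATE")


class Master:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.log = []  # replication log: (sql, params) in arrival order

    def execute(self, sql, params=()):
        rows = self.db.execute(sql, params).fetchall()
        # Reads are not logged because they do not mutate the database.
        if sql.lstrip().upper().startswith(WRITE_PREFIXES):
            self.log.append((sql, params))
        return rows


class Follower:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.applied = 0  # number of log entries already replayed

    def catch_up(self, log):
        # Replay, in order, only the statements not yet applied.
        for sql, params in log[self.applied:]:
            self.db.execute(sql, params)
        self.applied = len(log)


master, follower = Master(), Follower()
master.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
master.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
master.execute("SELECT * FROM users")  # a read, so it is not logged
follower.catch_up(master.log)
```

After catch_up, the follower's users table holds the same row as the master's, and the log contains only the two write statements.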

Problems

Statement-based replication may sound simple and effective, but it comes with its own set of problems. Some of these are:

SQL statements can contain non-deterministic functions such as NOW(), which returns the current time, or RAND(), which returns a random number. These functions are likely to evaluate to different values on different nodes. This issue can be overcome by having the master replace the call to the non-deterministic function with a fixed value and ...
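As a rough sketch of the substitution idea, the master could rewrite a statement so that each non-deterministic call becomes a literal value chosen once on the master. The function name pin_nondeterministic and its parameters are hypothetical, and the regular-expression rewrite is deliberately naive: it would also touch text inside string literals, which a real implementation would have to avoid by parsing the statement.

```python
import random
import re
from datetime import datetime, timezone


def pin_nondeterministic(sql, now=None, rng=random):
    """Replace NOW() and RAND() calls with literals fixed on the master."""
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("'%Y-%m-%d %H:%M:%S'")
    # Every NOW() in the statement gets the same timestamp literal.
    sql = re.sub(r"\bNOW\(\)", timestamp, sql, flags=re.IGNORECASE)
    # Each RAND() call gets its own value drawn on the master.
    sql = re.sub(r"\bRAND\(\)", lambda m: repr(rng.random()), sql,
                 flags=re.IGNORECASE)
    return sql


fixed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
pinned = pin_nondeterministic(
    "INSERT INTO events (created_at, sample) VALUES (NOW(), RAND())",
    now=fixed,
    rng=random.Random(0),
)
```

Because the rewritten statement contains only literals, every node that executes it stores the same values.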
