We've been talking about replication log and change stream, especially in the previous article. This is the mechanism the leader uses to communicate changes to the followers. As you can imagine, there are different implementations with their advantages and disadvantages, and it's important to understand the way changes flow from the leader to the followers in your system.
We will take a look at 3 implementations of replication log:
- Statement-based replication
- WAL shipping
- Logical-based replication
So, let's get started with the first one.
This is the simplest implementation.
The leader keeps a log with every request that has the potential to mutate data. Writes, updates, deletions, and other such operations become entries in the log. The leader periodically sends this log to each follower. When a follower receives the log, it executes every statement in order.
The advantage of this method is that it's extremely simple. The downside is that in order to work, two important restrictions must be placed in your system:
Non-deterministic behavior is forbidden: Non-deterministic operations/statements like the generation of a random number, writing the current time or triggering secondary actions are all forbidden. The problem with this type of operation is that they will end up generating different values in every replica, thus breaking your system's data consistency. If your system needs to perform this type of operation, it won't be able to support statement-based replication.
Concurrency can't be supported: Statements that use data already in the DB must be executed in order. Operations of the type
UPDATE X WHERE Ycan have different results if you change the order in which the statements in the log are executed. This forbids you from executing statements concurrently. This could be a problem if you want to execute concurrent transactions to improve performance.
These two restrictions seem trivial, but there are many production edge-cases that fall into one of these two categories. Because they tend to be commonplace, this approach is rarely used for replication. However, if your DB implementation requires all your statements to be completely deterministic then statement-based replication can be safely used in production.
There are commercial systems that enforce determinism and use this approach. An example is VoltDB and its enforced deterministic procedures.
A write-ahead log (also known as WAL or redo log) is an additional on-disk data structure that some databases include. This is an append-only file that includes every modification on the DB's files. Every change in data is written in the log before being applied. Databases whose storage engine uses B-Trees for representing data on the disk include a write-ahead log. Other storage-engine mechanisms, like log-structured storage, write all changes to a log (which is compacted in the background periodically) as their way of representing data. The important thing to know is that DBs usually have an append-only log that can be used to restore data.
WAL replication works by sending this file to every follower after the leader writes the changes to its own disk. The followers then use the log to restore their own copies of the data.
This approach has (you guessed it) an important drawback: the log describes the data at a very low-level (details at the byte level) and is tightly coupled to the storage engine. Databases can change their storage formats from one version to the next, which is why this approach might not work if your system is running different DB versions.
Logical-based replication is an approach that supports different log formats for replication and storage engine. For a relational DB, logical logs are a sequence of records that describe the operations performed at row-level. Every entry in the log describes the operations at a higher level of abstraction, some examples are:
- Inserts are represented as a row containing the new values of all columns of the record.
- Deletes contain all the information needed to uniquely identify the deleted row.
- Updates contain all the information needed to uniquely identify the updated row and all the columns with the updated values.
- Transactions that modify multiple rows will result in an entry in the log for every modified row. A special indicator entry states that the transaction was committed.
The main advantage of this approach is the decoupling from the storage engine. This lets you keep backward compatibility and deploy multiple machines with different storage engines. If you need to perform software updates you can first upgrade the followers, perform failover and update the leader and finally re-join the old leader as a follower. This enables you to perform software updates with no downtime.
An extra upside is that these logs are easier to understand and parse. There are many tools that let you perform analysis on data collected from logical logs.
It's not magical anymore
Now you understand how data flows from leaders to followers.
This is one of those concepts that at first seem like magic. What exactly is the leader sending to every follower to notify them about the changes in data? is it a series of SQL statements or is it something else?
Understanding how replication works is about learning about these little details. What in the beginning looks like a complete mistery is usually just a collection of very clever mechanisms and protocols that work together to support complex systems.
Now that we understand dataflow between leaders and followers it's time to learn about the dangers of replication lag.
What to do next:
- Read the next article in this series here
- Share this article with friends and colleagues. Thank you for helping me reach people who might find this information useful.
- This series is based on Designing Data-Intensive Applications, make sure to take a look at chapters 5 and 6 for a more in-depth explanation of replication and sharding. This and other very helpful books can be found in the recommended reading list.
- Send me an email with questions, comments or suggestions (it's in the About Me page)