DataBase Replication 101

Amit Raj
Published in Dev Genius
4 min read · Jun 26, 2022

This blog is part of a series in which we discuss 101 concepts from ground zero for an audience with limited starting knowledge. This article belongs to the Intermediate-Level Series, since it involves understanding the concept of Database Replication, which is primarily used to copy application data to multiple nodes spread across data centres, availability zones, or cloud regions, so that the data stays available when a single node fails.

Some of the earlier blogs in the 101 Series are as follows:

DataBase Sharding 101
Caching Strategy 101
CORS 101
Circuit Breaker 101
Priority Queues 101
Async Communication 101

Database Design 101

What is DataBase Replication?

Database Replication is the process of maintaining copies of application data on more than one database server. Replication can happen in real time, i.e., with each data manipulation (DML) query on the primary server, or on a schedule at regular intervals during the day.

Also, depending on the Recovery Point Objective (RPO) of the business domain, that is, how much recent data it can afford to lose in an outage, a design choice between synchronous and asynchronous replication is made for production. The primary objectives of data replication are to reduce downtime, improve data availability and, in some cases, reduce application latency by using read replicas for read-only operations.
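To make the read replica idea concrete, here is a minimal sketch of a router that sends writes to the primary and spreads reads across replicas. The class name `ReplicatedRouter`, the server names, and the rule "a statement starting with SELECT is read-only" are assumptions made for illustration; real drivers and proxies use more careful rules (for example, `SELECT ... FOR UPDATE` must go to the primary).

```python
import itertools


class ReplicatedRouter:
    """Hypothetical router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin iterator over the replicas; None when there are no replicas
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        # Simplifying assumption: only statements starting with SELECT are read-only
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReplicatedRouter("primary-db", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM orders"))                            # replica-1
print(router.route("UPDATE orders SET status = 'paid' WHERE id = 7"))  # primary-db
```

With asynchronous replication, a read served by a replica may be slightly stale, so reads that must see the latest write are usually still sent to the primary.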

Types of Data Replication

Different cloud vendors offer different techniques for copying data between database instances. However, the techniques can be distinguished by the point at which a database transaction is marked as complete and an acknowledgment is sent back to the client, which gives the following types:

Asynchronous Replication

In asynchronous replication, as soon as the data is updated on the primary server, the client receives an acknowledgment. The replication to the slave happens in the background and the delay is measured using a replication lag metric.

Because of this delay, if the primary fails, the writes made during the replication lag window are missing on the slave server. Hence, this method is not preferred for applications with Zero Data Loss requirements.
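The behaviour described above can be sketched with a small in-memory simulation. This is not how any particular database implements it: the class `AsyncPrimary` and its methods are hypothetical, and the lag is counted in pending changes rather than in seconds to keep the example short.

```python
from collections import deque


class AsyncPrimary:
    """Toy primary that acknowledges writes before the replica has them."""

    def __init__(self):
        self.data = {}
        self.replica = {}       # stands in for the slave's copy of the data
        self.pending = deque()  # changes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return "ack"  # acknowledged immediately, replication happens later

    def replicate_one(self):
        """Background step: apply the oldest pending change on the replica."""
        if self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value

    def lag(self):
        """Replication lag, measured here as the number of unshipped changes."""
        return len(self.pending)


primary = AsyncPrimary()
for i in range(3):
    primary.write(f"order:{i}", "created")
primary.replicate_one()
print(primary.lag())  # 2
```

If the primary were lost at this point, the two changes still in the queue would never reach the replica, which is the data loss window the replication lag metric describes.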

Synchronous Replication

In synchronous replication, the database transaction is marked as completed only once its DML changes have been both propagated to and applied on the slave, i.e., the slave is consistent with the master.

Since propagating the changes to the slave and applying them there are additional steps in the transaction, this method is slower, especially for cross-region writes. Hence, this method isn’t preferred for applications with strict latency requirements.
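As a rough illustration of the synchronous case, the sketch below only acknowledges a write after the replica has applied it. The names `Replica`, `SyncPrimary` and `ReplicaUnavailable` are hypothetical, and rejecting the write when the replica is down is an assumption of this sketch; real systems may instead block the commit until the replica responds.

```python
class ReplicaUnavailable(Exception):
    """Raised when the replica cannot confirm a change."""


class Replica:
    def __init__(self, up=True):
        self.data = {}
        self.up = up

    def apply(self, key, value):
        if not self.up:
            raise ReplicaUnavailable("replica did not confirm the change")
        self.data[key] = value


class SyncPrimary:
    def __init__(self, replica):
        self.data = {}
        self.replica = replica

    def write(self, key, value):
        # The client waits for this call, so replica latency adds to write latency
        self.replica.apply(key, value)
        # Commit locally and acknowledge only after the replica has confirmed
        self.data[key] = value
        return "ack"


replica = Replica()
primary = SyncPrimary(replica)
primary.write("order:1", "created")
print(replica.data == primary.data)  # True
```

Because `write` does not return until `apply` has finished, every acknowledged write exists on both servers, at the cost of the extra round trip on each transaction.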

Semi-synchronous Replication

Semi-synchronous replication waits for the changes to be propagated to (received by) the secondary server before acknowledging the client; however, the changes may actually be applied on the slave after the acknowledgment has been sent.
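The difference between "received" and "applied" can be shown with a small sketch. The classes `SemiSyncReplica` and `SemiSyncPrimary`, and the list used as a stand-in for the replica's log of received changes, are assumptions made for illustration only.

```python
class SemiSyncReplica:
    def __init__(self):
        self.received_log = []  # changes received but not necessarily applied
        self.data = {}
        self._applied = 0       # number of log entries already applied

    def receive(self, change):
        """Durably record the change; this is what the primary waits for."""
        self.received_log.append(change)

    def apply_pending(self):
        """Later, in the background: apply everything received so far."""
        for key, value in self.received_log[self._applied:]:
            self.data[key] = value
        self._applied = len(self.received_log)


class SemiSyncPrimary:
    def __init__(self, replica):
        self.data = {}
        self.replica = replica

    def write(self, key, value):
        self.data[key] = value
        self.replica.receive((key, value))  # wait for receipt, not for apply
        return "ack"


replica = SemiSyncReplica()
primary = SemiSyncPrimary(replica)
primary.write("order:1", "created")
print(replica.data)   # {} (received but not yet applied)
replica.apply_pending()
print(replica.data)   # {'order:1': 'created'}
```

At the moment of the acknowledgment the change is already on the replica's side, so it survives a primary failure, but a read on the replica may not see it until `apply_pending` has run.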

Multi Master vs Single Master

Replication strategies can also be classified by the number of primary (read/write) nodes in the database cluster, and by whether replication is unidirectional or bidirectional.

Single Master

In a single-master setup, data is written only to the primary server and replicated to the slaves using one of several replication strategies (log-based, snapshot-based, key-based, etc.).

Because all writes go through a single node, single-master setups avoid write conflicts: every write is ordered on the primary (reads from asynchronous replicas can still be stale by the replication lag). If the primary fails, this setup needs an additional failover step (manual or automated) before it can accept write queries from the client again.
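The failover step can be sketched as choosing which slave to promote. The `Node` fields and the rule "promote the healthy replica that has replicated the furthest" are assumptions of this sketch; production failover tooling also has to fence the old primary and repoint clients, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    log_position: int   # how far this node has replicated (hypothetical counter)
    healthy: bool = True


def fail_over(primary, replicas):
    """Return the node that should accept writes."""
    if primary.healthy:
        return primary  # nothing to do
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica available for promotion")
    # Promote the most up-to-date replica to minimise lost writes
    return max(candidates, key=lambda r: r.log_position)


old_primary = Node("db-1", log_position=120, healthy=False)
replicas = [Node("db-2", log_position=118), Node("db-3", log_position=120)]
print(fail_over(old_primary, replicas).name)  # db-3
```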

Multi Master

In a multi-master setup, the client can write data to any of the primary servers. Replication works bidirectionally, so that the data becomes eventually consistent once conflicts between updates to the same entities have been resolved.

If one of the master servers goes offline, the remaining primaries can handle the write load from the client, which makes a database failover step unnecessary in this setup.
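One common way to resolve such conflicts is "last write wins", sketched below. This is only one possible strategy, chosen here for illustration; the record layout `(timestamp, node_id, value)` and the function name are assumptions, and last write wins silently discards the older of two concurrent updates.

```python
def merge_last_write_wins(a, b):
    """Merge two masters' states; each maps key -> (timestamp, node_id, value)."""
    merged = dict(a)
    for key, version in b.items():
        # Compare (timestamp, node_id); node_id breaks ties deterministically
        if key not in merged or version[:2] > merged[key][:2]:
            merged[key] = version
    return merged


master_a = {
    "user:1": (100, "A", "alice@old.example"),
    "user:2": (105, "A", "bob@example"),
}
master_b = {
    "user:1": (110, "B", "alice@new.example"),
}

merged = merge_last_write_wins(master_a, master_b)
print(merged["user:1"][2])  # alice@new.example
```

Because the comparison is the same whichever master performs the merge, both sides end up with identical data, which is what makes the setup eventually consistent.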

Summary

Replication helps fulfil the High Availability and Disaster Recovery needs of critical business domains, so that they keep functioning during outages. Combined with a traffic routing strategy for the compute layer of the overall architecture, it addresses both Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets. Most PaaS databases, such as PostgreSQL, MySQL, Azure SQL, Azure Cosmos DB, Aurora, etc., have automated replication setup handy, which keeps the operational effort minimal for end customers.

We will look to cover a deep dive into one of the log-based, snapshot-based or key-based replication strategies, with a database example, in an expert-level blog of the series.

For feedback, please drop a message to amit[dot]894[at]gmail[dot]com or reach out to any of the links at https://about.me/amit_raj.
