Thursday, October 30, 2008

Replication essentials

Replication is the copy and distribution of data and database objects from one database to another with the goal of maintaining synchronization between the two databases.

The origin database is call publisher and the destination database the subscriber. Each database has an agent (SQL Server agent) responsible of capturing the change from publisher and applying the change onto the subscriber.

In a distribution implementation you will have to choose between a pull subscription and a push subscription. In a push subscription, publisher agent carries the load of moving data over to the subscriber, in a pull subscription the subscriber agent actually is in charge of reading data from the publisher.

Sql server allows implementation of 3 methods of replications:

Snapshot replication

In a snapshot replication format, a copy of the publisher is send to the subscriber from time to time; the exact picture of data or database object will be copy in bulk over to the subscriber.

This method is recommended only if the destination database functionalities requirements include a reasonable degree of latency, the database receiving data will be in delay synchronization with the origin database.

Snapshot replication is the most commonly implemented method of replication; it is a good fit for low bandwidth network or low activity network.

Transactional replication

The term transactional here refers to the fact that SQL Server captures and applies changes using DML operations.

All committed changes are called transaction; the difference with snapshot replication here is the fact that changes are send to the subscriber as they occur.
It is possible to setup the publisher to cache changes and send them at a certain frequency or transmit all changes in real time. During the time interval transaction are stored in log files before transmission to the subscriber.

The initial step in a transactional replication is a snapshot replication where a copy of the publisher is send to the subscriber.

Transactional replication required a high bandwidth and reliable network connection; in case of delay in transmission, transactions are cached in transaction log before transmission. Depending of the size of data to replicate, transaction log can grow quickly.

Transaction replication is recommended for data ware house and reporting databases where less transaction occurs and it is recommended to disable replication before massive load.

Merge replication

Merge replication is designed for systems where servers must have the same data at defined intervals. Think of this as a both ways differential backup where changes from all databases must be mutually applied; data are move both ways.

This method of replication obviously present risk of conflicts and data inconsistencies; it is the most difficult to implement and manage. Implementation includes creation of additional systems table and required adding new column to identify server to replicated tables.

Replication agent is here call merge agent and is equally responsible for carrying the replication workload.

Update for SQL Server 2008

In SQL Server 2008, you can also synchronize databases by using Microsoft Sync Framework and Sync Services for ADO.NET

No comments: