Introduction to SQL Server Replication
Replication is a SQL Server feature that synchronizes data from a database (known as a publisher) to one or more databases (known as subscribers) on the same or different SQL Server instances.
Consider the following diagram:
The preceding diagram depicts a typical implementation of replication. A replication has a number of components that work together to synchronize data between databases.
Let's look at these components in detail:
Publisher: A publisher is a database that facilitates the data for replication.
Publication: A publication is a set of objects and data to replicate. A publisher (database) can have one or more publications. For example, a database has two schemas,
finance
andsales
. There's one publication that has objects and data for thefinance
schema and another publication that has objects and data for thesales
schema.Articles: Articles are the database objects that are to be replicated such as tables and stored procedures. A publication can include one or more selected database objects and data.
Distributor: A distributor is a database (distribution database) that stores the data to be replicated from one or more publishers. The distribution database can be on the same instance as the publisher (which happens in most cases) or can be on a different SQL Server instance. Created as part of the replication database, it also stores the replication metadata such as publisher and subscriber details.
A better understanding of distribution databases is crucial in troubleshooting replication.
Subscriber: A subscriber is a database that subscribes to one or more publications from the one or more publishers in order to get the replicated data. A subscriber can also update the publisher data in case of merge or peer-to-peer transactional replication. A subscriber database can be on the same SQL Server instance as the publisher or on a different SQL Server instance.
Subscription: Subscription is the opposite of publication. The subscriber connects to the publisher by creating a subscription for the given publication.
There are two types of subscriptions, push and pull subscriptions. In the case of a push subscription, the distributor updates the subscriber as and when data is received (distribution agent is at distributor). In a pull subscription, the subscriber asks the distributor for any new data changes, as scheduled (distribution agent is at the subscriber).
If we now look at the preceding diagram, the publisher database has two publications, one for finance
and one for the sales
schema. The replication agent gets the changes from the publisher and inserts them into the distribution database.
The distribution agent then applies the changes to the relevant subscribers. There are two subscribers: one has a subscription to the finance
publication and another subscribes to the sales
publication.
Replication Agents
Replication agents are the standalone executables that are responsible for replicating the data from a publisher to a subscriber. In this section, we will cover replication agents in brief, and we will look at them in detail later in this book.
Snapshot Agent
The snapshot agent creates the selected articles and copies all of the data from the publisher to the subscriber whenever executed. An important thing to note here is that the subsequent execution of the agent doesn't copy the differential data; rather, each run clears out the existing schema and data at the subscriber and copies the schema and data from the publisher.
The snapshot agent is run at the distributor and is used via snapshot replication. It is also used in transactional and merge replication to initialize the subscriber with the initial data.
Log Reader Agent
The log reader agent scans the transaction log for the transactions marked for replication and inserts them into the distribution database. It is used only in transactional replication and provides continuous replication from the publisher to the subscriber.
Each publication has its own log reader agent; that is, if there are two different databases with transactional replication, there will be two log reader agents, one for each database.
The log reader agent runs at the distributor.
Distribution Agent
As the name suggests, the distribution agent distributes (applies) the data that's inserted into the distribution database by the log reader agent to the subscribers.
The distribution agent runs at the subscriber if it's a pull subscription and at the distributor if it's a push subscription.
Note
There's also a queue reader agent that's used in bidirectional transactional replication. Bidirectional transactional replication is now obsolete.
Merge Agent
Used in merge replication, the merge agent applies the initial snapshot to the subscriber (generated by the snapshot agent) and then replicates the changes from the publisher to the subscriber and from the subscriber to the publisher as and when they occur, or when the subscriber is online and available for replication.
There is one merge agent for one merge subscription.