Content Server Manual / Version 2201

3.9.7 Analyzing the Replicator State

The replicator is a component of the Replication Live Server. It can be started and stopped while the server itself continues running. It consists of a controller process and a number of stages that process events from the Master Live Server. Events are always processed in the order in which they were created.

The replicator depends on the availability of two servers, two databases, and a network connection. The replication fails to make progress if any of these components fails. This section gives some hints for analyzing the replicator state if you suspect that events from the Master Live Server are not properly replayed on the Replication Live Server.

Checking the Server States

Using

cm runlevel -u <user> -p <password>
cm systeminfo -u admin

you can verify whether both servers are up and in runlevel online. The system information also tells you whether the initial replication of the Replication Live Server has been completed successfully.

An alternative way to check the replication service status is by querying the actuator HTTP endpoint as follows:

 curl -X GET http://localhost:42081/actuator/replicator
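
The same request can be issued from a script, for example as part of a monitoring job. The following is a minimal sketch using only the Python standard library. The function name fetch_replicator_status and the opener parameter are illustrative assumptions, and since this section does not describe the response format, the body is returned as plain text without interpretation.

```python
import urllib.error
import urllib.request


def fetch_replicator_status(url="http://localhost:42081/actuator/replicator",
                            timeout=5.0, opener=urllib.request.urlopen):
    """Return (http_status, body_text); http_status is None if unreachable."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status code.
        return exc.code, exc.read().decode("utf-8", "replace")
    except OSError as exc:
        # No HTTP answer at all: connection refused, timeout, DNS failure.
        # urllib.error.URLError is a subclass of OSError.
        return None, str(exc)
```

Calling fetch_replicator_status() with no arguments queries the URL shown above; a status of None suggests that the server process or the network path should be checked first.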

Checking the Replicator Configuration

Look at the IOR URL of the Master Live Server in the property replicator.publicationIorUrl and at the property replicator.enable, which starts and stops the replicator. In case of an error that is not automatically healed for an extended period, it can make sense to set replicator.autoRestart=false to ensure that the error condition can be analyzed without continuous restart attempts of the replicator.
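
To review these three settings at a glance, a small script can extract them from the properties text. This is a sketch under simplifying assumptions: it handles only plain key=value lines and ignores line continuations and escapes that the full properties format allows. The function name read_replicator_settings is hypothetical.

```python
def read_replicator_settings(properties_text):
    """Extract the replicator settings named in this section from properties text."""
    wanted = ("replicator.publicationIorUrl", "replicator.enable",
              "replicator.autoRestart")
    found = {}
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without a key=value separator.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() in wanted:
            found[key.strip()] = value.strip()
    return found
```

A setting that is missing from the returned dictionary is not set explicitly in the text that was passed in.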

Checking Replicator Startup Messages

If in doubt whether and when the replicator was started, check the log for messages of the form

[CURRENT_DATE] Info: cap.server.replicator:
Action(name="StartAction", completed=false): running
[CURRENT_DATE] Info: cap.server.replicator:
Replicator: pipeline created
...
[CURRENT_DATE] Info: cap.server.replicator:
Replicator: connected
[CURRENT_DATE] Info: cap.server.replicator:
Action(name="StartAction", completed=true): completed

where CURRENT_DATE is the date when the replicator was started, typically shortly after the server start. If these messages are repeated over and over again, the connection to the Master Live Server might be broken, especially if only the pipeline created message is printed, but the replicator never claims to be connected.
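
The pattern described above can be checked mechanically. The following sketch counts the startup messages in a sequence of log lines and flags the two suspicious cases: repeated start attempts, and pipelines that were created although the replicator never reported a connection. It assumes the message texts appear exactly as shown in this section; the function name summarize_startup and the result keys are illustrative.

```python
def summarize_startup(log_lines):
    """Count replicator startup messages and flag suspicious patterns."""
    summary = {"start_attempts": 0, "pipelines_created": 0,
               "connected": 0, "starts_completed": 0}
    for line in log_lines:
        if 'Action(name="StartAction", completed=false): running' in line:
            summary["start_attempts"] += 1
        elif "Replicator: pipeline created" in line:
            summary["pipelines_created"] += 1
        elif "Replicator: connected" in line:
            summary["connected"] += 1
        elif 'Action(name="StartAction", completed=true): completed' in line:
            summary["starts_completed"] += 1
    # More than one start attempt means the startup sequence was repeated.
    summary["repeated_starts"] = summary["start_attempts"] > 1
    # A pipeline without any "connected" message hints at a broken connection.
    summary["never_connected"] = (summary["pipelines_created"] > 0
                                  and summary["connected"] == 0)
    return summary
```

For example, summarize_startup(open(path)) can be applied to a log file, where path is whatever location your log configuration uses.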

Checking Replicator Status Messages

After a successful start, the replicator writes frequent status messages to its log file on the log facility cap.server.replicator at log level info. A healthy idle replicator looks like this:

[CURRENT_DATE] Info: cap.server.replicator: EventStatistics: 
Replicator(enabled=true, initialized=true, state="running", 
pipelineUp=true, connectionUp=true, logEvents=false,
autoRestart=true, checkStream=true, checkTimeout=300)

[CURRENT_DATE] Info: cap.server.replicator: EventStatistics: 
IncomingCounter(lastSequenceNumber=SEQ_NO, 
lastStampedNumber=SEQ_NO, count=EVENT_COUNT, 
lastEventArrived=LAST_EVENT_DATE, 
startedAt=REPLICATOR_START_DATE)

[CURRENT_DATE] Info: cap.server.replicator: EventStatistics: 
CompletedCounter(lastSequenceNumber=SEQ_NO, 
lastStampedNumber=SEQ_NO, count=EVENT_COUNT, 
lastEventArrived=LAST_EVENT_DATE, 
startedAt=REPLICATOR_START_DATE)

where REPLICATOR_START_DATE is a date very close to the start of the Replication Live Server or to the last start of the replicator, if the replicator has been restarted since the server start. LAST_EVENT_DATE should be a date very close to the last successful publication or the last replicator start, whichever comes later. SEQ_NO should be the sequence number of the last event received, or 0 if no content has been published since the replicator start.

If messages of this type do not appear once per minute, check the log configuration and use the commands mentioned above to check the server state. Make sure the replicator is enabled in the file replicator.properties.
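
For automated checks, the status entries can be turned into dictionaries. The sketch below assumes the layout shown in the sample above: an entry name followed by comma-separated key=value pairs in parentheses, possibly wrapped over several lines. Real date values may contain commas or parentheses, in which case the patterns need to be adapted. All names in the code are illustrative.

```python
import re

# Entry name, then everything up to the first closing parenthesis.
ENTRY = re.compile(r"(Replicator|IncomingCounter|CompletedCounter)\((.*?)\)",
                   re.DOTALL)
# key=value, where the value is a quoted string or runs to the next comma.
FIELD = re.compile(r'(\w+)=("[^"]*"|[^,]*)')


def convert(raw):
    """Convert a raw field value to bool, int, or str."""
    raw = raw.strip()
    if raw in ("true", "false"):
        return raw == "true"
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]
    try:
        return int(raw)
    except ValueError:
        return raw


def parse_event_statistics(log_text):
    """Map each entry name to its fields, keeping the last occurrence."""
    entries = {}
    for name, body in ENTRY.findall(log_text):
        entries[name] = {key: convert(raw) for key, raw in FIELD.findall(body)}
    return entries
```

Because later entries overwrite earlier ones, passing the tail of the log yields the most recent status of each of the three entries.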

The three entries in the log have the following meaning:

The first line tells you to what extent the replicator is up. It also provides some basic configuration information:

enabled: true, if the replicator is enabled;
initialized: true, if the initial replication ever completed successfully during a previous or the current run of the server; if the current run performed the initialization, the replicator must also have caught up with the continuous event stream of the Master Live Server;
state: running, if the replicator pipeline is up; not started, if the replicator was never started during the current run of the server; stopped, if the replicator pipeline has been completely stopped; failed, in the rare case that the replicator pipeline controller itself died;
pipelineUp: true, if the replication pipeline is ready to process events; this does not imply that events are actually being retrieved or processed, just that the infrastructure is available;
connectionUp: true, if the connection to the Master Live Server has been established successfully;
logEvents: true, if individual events are logged as they propagate through the replicator pipeline;
autoRestart: true, if the replicator restarts automatically when the event stream from the Master Live Server is broken;
checkStream: true, if the replicator checks regularly whether the event stream from the Master Live Server is still intact;
checkTimeout: the interval between two checks of the event stream from the Master Live Server.
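
The field descriptions above translate into a simple health check. The following sketch assumes the fields of the first entry are already available as a dictionary with boolean and string values; the function name replicator_problems and the wording of the findings are illustrative only.

```python
def replicator_problems(status):
    """Return human-readable findings for a dict of Replicator(...) fields."""
    problems = []
    if not status.get("enabled"):
        problems.append("replicator is disabled (check replicator.enable)")
    state = status.get("state")
    if state != "running":
        problems.append(f"state is {state!r}, expected 'running'")
    if not status.get("pipelineUp"):
        problems.append("pipeline is not ready to process events")
    if not status.get("connectionUp"):
        problems.append("no connection to the Master Live Server")
    if not status.get("initialized"):
        problems.append("initial replication has not completed yet")
    return problems
```

An empty list means that none of these fields points to a problem. It does not prove that events are actually flowing, which is what the two counter entries are for.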

The second line reports on the incoming events from the Master Live Server. When a publication is performed, the reported values should quickly rise to the last sequence number reported at the Master Live Server by cm events.

lastSequenceNumber: the sequence number of the last Master Live Server event that arrived at the replicator; this value stays 0 while the initial replication is performed and, after a restart, until the first event is received;
lastStampedNumber: the sequence number of the last stamp event from the Master Live Server; such an event indicates the end of a publication;
count: the total number of events that arrived since the replicator was started;
lastEventArrived: the date of the last arrival of an event from the Master Live Server;
startedAt: the start date of the replicator.

The third line reports the complete processing of events by the replicator pipeline. The reported properties are identical to the properties reported by the incoming event counter. Normally, the values reported here should lag only slightly behind the properties for the incoming events or should match them exactly. However, there are legitimate reasons for differences:

During the initial replication, the incoming events may already come from the live event stream, showing a large positive number, while the completed events are still drawn from the initial synthetic replay of the Master Live Server repository, showing 0 as the sequence number.
During times of very high load or after a long downtime, the incoming events might be ahead of the processed events for an extended period, until the replicator has caught up with the live event stream.

Generally, you should not worry about the health of the replicator as long as the property count of the processed events is continually rising, because that indicates that events are still being processed.
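
This rule of thumb can be expressed as a comparison of two successive samples. The sketch below assumes that the counter entries are available as dictionaries with integer count and lastSequenceNumber fields and a startedAt value. Because count is reset when the replicator starts, samples with different startedAt values are treated as not comparable. The function name and result keys are illustrative.

```python
def replication_progress(previous_completed, current_completed, current_incoming):
    """Compare two CompletedCounter samples and the latest IncomingCounter."""
    # A changed start date means the replicator restarted between the samples.
    restarted = previous_completed["startedAt"] != current_completed["startedAt"]
    if restarted:
        processed = None
    else:
        processed = current_completed["count"] - previous_completed["count"]
    lag = (current_incoming["lastSequenceNumber"]
           - current_completed["lastSequenceNumber"])
    return {
        "restarted": restarted,
        "processed_since_last_sample": processed,
        "sequence_lag": lag,
        "making_progress": None if restarted else processed > 0,
        "caught_up": lag == 0,
    }
```

A result with making_progress false and caught_up true simply describes an idle replicator; the combination that deserves attention is a persistent lag without progress.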

Interpreting Special Messages

The replicator writes additional log messages on special occasions. The most frequent ones are discussed here.

replicator still X events behind, will not yet go online: The replication was started and has just replicated another complete publication, but the live event stream is still far ahead, so it is not safe to switch the replicator online. Use cm runlevel to force a switch.
initial replication complete, will not go online: A freshly installed replicator has finished its initial replication. Use the opportunity to change the various default passwords before switching the server into online mode using cm runlevel.
possibly disconnected from event stream: The replicator suspects that the Master Live Server is no longer feeding events. Depending on the configuration, the replicator may restart itself.
resource to be replicated already destroyed or version to be replicated already destroyed: While processing an event for a resource or a version, that object is no longer readable. It is assumed that a subsequent destroy event has caused this situation. This message only indicates a temporary inconsistency in the repository that will be healed automatically.
cannot initialize repository as the repository not empty: Typically indicates that a previous initial replication did not complete. The replicator cannot recover from that failure. Drop the database schema and retry, possibly in times of lower load or with more memory allocated to the server process. If the initial replication fails repeatedly, create a new Replication Live Server from a backup of the Master Live Server as previously discussed.
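
When scanning long logs, the messages above can be matched and annotated automatically. This sketch assumes the message fragments appear in the log exactly as quoted in this section; the hints are abbreviated restatements of the explanations above, and the names HINTS and explain_special_messages are hypothetical.

```python
# Message fragment -> short hint, condensed from the explanations above.
HINTS = {
    "events behind, will not yet go online":
        "live event stream still ahead; cm runlevel can force a switch",
    "initial replication complete, will not go online":
        "change default passwords, then switch online with cm runlevel",
    "possibly disconnected from event stream":
        "Master Live Server may have stopped feeding events",
    "to be replicated already destroyed":
        "temporary inconsistency, healed automatically",
    "cannot initialize repository":
        "previous initial replication incomplete; drop the schema and retry",
}


def explain_special_messages(log_lines):
    """Return (fragment, hint) pairs for every known message found."""
    findings = []
    for line in log_lines:
        for fragment, hint in HINTS.items():
            if fragment in line:
                findings.append((fragment, hint))
    return findings
```

Matching on fragments rather than complete messages keeps the variable event count in the first message from preventing a match.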
