Mirror Outage Procedures
Due to planned maintenance or to unplanned problems, the InterSystems IRIS® data platform instance on one or both of the failover members in a mirror may become unavailable. When a failover member’s InterSystems IRIS instance is unavailable, its ISCAgent may continue to be available (if the host system is still operating), or may also be unavailable (as when the host system is down). This section provides procedures for dealing with a variety of planned and unplanned outage scenarios involving instance outages or total outages of one or both failover members.
As noted in Automatic Failover Mechanics, there are two requirements for safe and successful failover from the primary failover member to the backup failover member:
-
Confirmation that the primary instance is actually down, and not isolated by a temporary network problem.
-
Confirmation that the backup has the most recent journal data from the primary, either because it was active when the primary failed (see Mirror Synchronization) or because it has been manually caught up (see Unplanned Outage of Primary Without Automatic Failover).
In reading and using this material, you may want to refer to Automatic Failover Rules to review the rules governing automatic failover.
For information about using the Mirror Monitor to determine whether a backup failover member is active or a DR async is caught up, see Mirror Member Journal Transfer and Dejournaling Status and Monitoring Mirrors.
This topic covers the following procedures:
-
Planned outage procedures
-
Unplanned outage procedures
-
Disaster recovery procedures
Planned Outage Procedures
To perform planned maintenance, you may need to temporarily shut down the InterSystems IRIS instance on one of the failover members, or the entire system hosting it. Situations in which you might do this include the following:
In this section, the term graceful shutdown refers to the use of the iris stop command. For information about the iris command, see Controlling InterSystems IRIS Instances.
In addition to the iris stop command, the SYS.MirrorOpens in a new tab API and the ^MIRROR routine can be used to manually trigger failover.
For information on shutting down the primary without triggering automatic failover, see Avoiding Unwanted Failover During Maintenance of Failover Members.
When there is no backup failover member available due to planned or unplanned failover member outage, you can promote a DR async member to failover member if desired, protecting you from interruptions to database access and potential data loss should a primary failure occur. See Temporary Replacement of a Failover Member with a Promoted DR Async for information about temporarily promoting a DR async member to failover member.
Maintenance of Backup Failover Member
When you need to take down the backup failover member InterSystems IRIS instance, you can perform a graceful shutdown on the backup instance. This has no effect on the functioning of the primary. When the backup instance is restarted it automatically rejoins the mirror as backup.
However, if the primary’s InterSystems IRIS instance is restarted (for whatever reason) while the backup’s host is shut down and the backup’s ISCAgent therefore cannot be contacted, the primary cannot become primary after the restart, because it has no way of determining whether it was the most recent primary. When you need to shut down the backup’s host system, you can eliminate this risk using the following procedure:
-
On the backup, demote the backup to DR async as described in Demoting the Backup to DR Async.
-
Shut down the former backup instance and its host system, complete the maintenance work, and restart the member as a DR async.
-
Promote the former backup from DR async to failover member, as described in Promoting a DR Async Member to Failover Member, to restore it to its original role.
If the primary is restarted after the backup has been demoted, it automatically becomes (remains) primary.
If you do not demote the backup before shutting it down, and find you do need to restart the primary InterSystems IRIS instance while the backup’s agent is unavailable, follow the procedures in Unplanned Outage of Both Failover Members.
Maintenance of Primary Failover Member
When you need to take down the primary failover member InterSystems IRIS instance or host system, you can gracefully fail over to the backup first. When the backup is active (see Mirror Synchronization), perform a graceful shutdown on the primary InterSystems IRIS instance. Automatic failover is triggered, allowing the backup to take over as primary.
When maintenance is complete, restart the former primary InterSystems IRIS instance or host system. When the InterSystems IRIS instance restarts, it automatically joins the mirror as backup. If you want to return the former primary to its original role, you can repeat the procedure—perform a graceful shutdown on the backup InterSystems IRIS instance to trigger failover, then restart it.
Avoiding Unwanted Failover During Maintenance of Failover Members
You may want to gracefully shut down the primary failover member without the backup member taking over as primary, for example when the primary will be down for only a very short time, or prevent the backup from taking over in the event of a primary failure. You can do this in any of three ways:
-
Demote the backup failover member as described in Maintenance of Backup Failover Member.
-
Gracefully shut down the primary InterSystems IRIS instance using the command iris stop /nofailover; the /nofailover argument is used as a precaution to avoid triggering failover.
-
Set no failover by clicking Set No Failover at the top of the Mirror Monitor page on either the primary or the backup. When no failover is set, the button says Clear No Failover and the Status Monitor options of the Mirror Status menu of the ^MIRROR routine indicate that this is the case. (See Monitoring Mirrors for more information about the Status Monitor option.)
Click Clear No Failover on either failover member to clear the no failover state and enable failover. The no failover state is automatically cleared when the primary is restarted.
Upgrade of InterSystems IRIS Instances in a Mirror
To upgrade InterSystems IRIS across a mirror, see the procedures described in Minimum Downtime Upgrade with Mirroring.
Unplanned Outage Procedures
When a failover member unexpectedly fails, the appropriate procedures depend on which InterSystems IRIS instance has failed, the failover mode the mirror was in (see Automatic Failover Mechanics Detailed), the status of the other failover member instance, the availability of both failover member’s ISCAgents, and the mirror’s settings.
-
Unplanned Outage of Primary Failover Member With Automatic Failover
-
Unplanned Outage of Primary Failover Member When Automatic Failover Does Not Occur
In reading and using this section, you may want to review Mirror Response to Various Outage Scenarios, which discusses the details of the backup’s behavior when the primary becomes unavailable.
Unplanned Outage of Backup Failover Member
When the backup failover member’s InterSystems IRIS instance or its host system fails, the primary continues to operate normally, although some applications may experience a brief pause (see Effect of Backup Outage for details).
When an unplanned outage of the backup occurs, correct the conditions that caused the failure and then restart the backup InterSystems IRIS instance or host system. When the backup InterSystems IRIS instance restarts, it automatically joins the mirror as backup.
If the backup fails in agent controlled mode (see Automatic Failover Rules) and the backup’s ISCAgent cannot be contacted, the primary’s InterSystems IRIS instance cannot become primary after being restarted, because it has no way of determining whether it was the most recent primary. Therefore, if you need for any reason to restart the primary InterSystems IRIS instance while the backup host system is down, you must use the procedure described in Maintenance of Backup Failover Member to do so.
Unplanned Outage of Primary Failover Member With Automatic Failover
As described in Automatic Failover Rules, when the primary InterSystems IRIS instance becomes unavailable, the backup can automatically take over as primary when
-
The backup is active and
-
receives a communication from the primary requesting that it take over.
-
receives information from the arbiter that it has also lost contact with the primary.
-
if the arbiter is unavailable or no arbiter is configured, contacts the primary’s ISCAgent to confirm that the primary instance is down or hung.
-
-
The backup is not active but can contact the primary’s ISCAgent to confirm that the primary instance is down or hung and obtain the primary’s most recent journal data from the ISCAgent.
See Automatic Failover in Response to Primary Outage Scenarios for a detailed discussion of the situations in which automatic failover can take place.
When the backup has automatically taken over following an unplanned primary outage, correct the conditions that caused the outage, then restart the former primary InterSystems IRIS instance or host system. When the InterSystems IRIS instance restarts, it automatically joins the mirror as backup. If you want to return the former primary to its original role, perform a graceful shutdown on the backup InterSystems IRIS instance to trigger failover, then restart it, as described in Maintenance of Primary Failover Member.
Unplanned Outage of Primary Failover Member When Automatic Failover Does Not Occur
As described in Automatic Failover Rules, the backup InterSystems IRIS instance cannot automatically take over from an unresponsive primary instance when the primary’s host system, including its ISCAgent, is unavailable, and any of the following is true:
-
The backup was not active.
-
The backup is prevented from taking over by an error.
-
The backup cannot verify that the primary is down, either because no arbiter is configured or because it lost contact with the arbiter before or at the same time as it lost contact with the primary InterSystems IRIS instance and its ISCAgent.
Under this scenario, there are three possible situations, each of which is listed with possible solutions in the following:
-
The primary host system has failed but can be restarted. You can do either of the following:
-
Restart the primary host system without restarting the primary InterSystems IRIS instance. When the primary’s ISCAgent becomes available, the backup obtains the most recent journal data from it if necessary and becomes primary.
-
Restart the primary host system including the primary InterSystems IRIS instance. The failover members negotiate until one becomes primary, with the other becoming backup.
-
-
The primary host system has failed and cannot be restarted. You can manually force the backup to take over. The procedures for this vary depending on whether or not the backup was active when it lost its connection the primary; there is some risk of data loss, as described in the following sections.
-
The primary host system is running but is network isolated from the arbiter as well as the backup; see Unplanned Isolation of Primary Failover Member for procedures.
Manually Forcing a Failover Member to Become Primary
When a failover member cannot become primary you can force it to do so, but there is a risk of data loss if you do this in any situation in which the last primary could have more recent journal data than the member you are forcing. The following procedures describe how to determine and manage that risk. If you force a member to become the primary when you cannot confirm that it has the most recent journal data, the other mirror members may be unable to rejoin the mirror and therefore need to be rebuilt (as described in Rebuilding a Mirror Member).
Before proceeding, confirm that the primary is down and will remain down during this procedure. If you cannot confirm that, it is best to abort this procedure in order to avoid the risk that the original primary becomes available again, resulting in both members simultaneously acting as primary. If you are uncertain whether this procedure is appropriate, contact the InterSystems Worldwide Response Center (WRC)Opens in a new tab for assistance.
Determining Whether the Backup Was Active Before Manually Failing Over
Assume two failover members called InterSystems IRIS A and InterSystems IRIS B. If the ^MIRROR routine confirms that the backup (InterSystems IRIS B) was active at the time contact with the primary (InterSystems IRIS A) was lost, and therefore has the most recent journal data from InterSystems IRIS A, you can manually fail over using a single procedure. When the connection was lost due to the primary failure, this poses no risk of data loss. However, when multiple failures occur, it is possible that an active backup does not have all of the latest journal data from the primary because the primary continued operating for some period after the connection was lost.
Determine whether the backup was active using this procedure:
-
Confirm that both the InterSystems IRIS instance and the ISCAgent on InterSystems IRIS A are actually down (and ensure that they stays down during the entire manual failover procedure).
-
On InterSystems IRIS B, run the ^MIRROR routine (see Using the ^MIRROR Routine) in the %SYS namespace in Terminal.
-
Select Mirror Management from the main menu to display the following submenu:
1) Add mirrored database(s) 2) Remove mirrored database(s) 3) Activate or Catchup mirrored database(s) 4) Change No Failover State 5) Try to make this the primary 6) Connect to Mirror 7) Stop mirroring on this member 8) Modify Database Size Field(s) 9) Force this node to become the primary 10) Promote Async DR member to Failover member 11) Demote Backup member to Async DR member 12) Mark an inactive database as caught up 13) Manage mirror dejournaling on async member (disabled) 14) Pause dejournaling for database(s)
-
Select the Force this node to become the primary option. If the backup was active at the time contact was lost, a message like the following is displayed:
This member was the active backup the last time it was connected and journal data up to '05/02/2024 16:57:35' in file /intersystems/iris/mgr/journal/MIRROR-MIR-20240502.004 If the primary has not done any work since that time,this member can take over without having to rebuild the mirror when the primary reconnects. Otherwise, the consequences of forcing this member to become the primary is that some operations may be lost and the other mirror member may need to be rebuilt from a backup. Do you want to continue? <No>
If you have access to the primary’s journal files, you can confirm that the cited file is the most recent before proceeding.
If the backup was not active at the time contact with the primary was lost, a message like the following is displayed:
Warning: If this member does not have all of the journal data which has been generated in the mirror, the consequence of forcing it to become the primary is that some operations may be lost and the other mirror member may need to be rebuilt from a backup of this member before it can rejoin as a backup. This member has journal data up to '05/02/2024 16:57:35' in file /intersystems/iris/mgr/journal/MIRROR-MIR-20240502.004 Do you want to continue? <No>
Manual Failover To An Active Backup
If the Force this node to become the primary option of the ^MIRROR routine confirms that the backup was active when it lost its connection to the primary, complete the manual failover procedure as follows:
-
Enter y at the Do you want to continue? prompt to continue with the procedure. The Force this node to become the primary option waits 60 seconds for the mirror member to become the primary. If the operation does not successfully complete within 60 seconds, ^MIRROR reports that the operation may not have succeeded and instructs you to check the messages log to determine whether the operation failed or is still in progress.
-
Once the ^MIRROR routine confirms that the backup has become primary, restart InterSystems IRIS A when you can do so. InterSystems IRIS A joins the mirror as backup when the InterSystems IRIS instance restarts.
Manual Failover When the Backup Is Not Active
Even when the ^MIRROR routine does not confirm that the backup (InterSystems IRIS B) was active at the time it lost its connection with the primary (InterSystems IRIS A), you can still continue the manual failover process using the following procedure, but there is some risk of data loss if you do. This risk can be minimized by copying the most recent mirror journal files from InterSystems IRIS A, if you have access to them, to InterSystems IRIS B before manual failover, as described in this procedure.
-
If you have access to the primary’s mirror journal files, copy the most recent files to InterSystems IRIS B, beginning with the latest journal file on InterSystems IRIS B and including any later files from InterSystems IRIS A. For example, if MIRROR-MIRRORA-20180220.001 is the latest file on InterSystems IRIS B, copy MIRROR-MIRRORA-20180220.001 and any later files from InterSystems IRIS A. Check the files’ permissions and ownership and change them if necessary to match existing journal files.
-
If you accept the risk of data loss, confirm that you want to continue by entering y at the prompt; the backup becomes primary. The Force this node to become the primary option waits 60 seconds for the mirror member to become the primary. If the operation does not successfully complete within 60 seconds, ^MIRROR reports that the operation may not have succeeded and instructs you to check the messages log to determine whether the operation failed or is still in progress.
-
Once the ^MIRROR routine confirms that the backup has become primary, restart InterSystems IRIS A when you can do so.
-
If InterSystems IRIS A joins the mirror as backup when the InterSystems IRIS instance restarts, no further steps are required. Any journal data that was on the failed member but not on the current primary has been discarded.
-
If InterSystems IRIS A cannot join the mirror when the InterSystems IRIS instance restarts, as indicated by the messages log message referring to inconsistent data described in Rebuilding a Mirror Member, the most recent database changes on InterSystems IRIS A are later than the most recent journal data present on InterSystems IRIS B when it was forced to become the primary. To resolve this, rebuild InterSystems IRIS A as described in that section.
-
Unplanned Isolation of Primary Failover Member
As described in Automatic Failover Mechanics, when the primary simultaneously loses contact with both the backup and the arbiter, it goes into an indefinite trouble state and can no longer operate as primary. Typically, when this occurs, the backup takes over and becomes primary. When the primary’s connection to the backup is restored, the backup forces the primary down; alternatively, you can force the primary down yourself before restoring the connection.
However, if a network event (or series of network events) causes the failover members and arbiter to all lose contact with each other simultaneously (or nearly simultaneously), there can be no primary because the backup cannot take over and the primary is no longer operating as primary. This situation is shown as the final scenario in the table Mirror Responses to Lost Connections in Arbiter Mode in the section Automatic Failover Mechanics Detailed. A similar situation can occur when the primary becomes isolated and the backup cannot take over because of an error.
When these circumstances occur, you have the following options:
-
Restore the connection between the failover members; when the former primary is contacted by the former backup, the members negotiate and one becomes primary, the other backup.
-
Without restoring the connection, if you can open a Terminal window on the primary, do so and run the ^MIRROR routine (see Using the ^MIRROR Routine) on the primary. The routine confirms that the primary instance is in an indefinite trouble state, and gives you two options:
-
If you confirm that the other failover member is down (possibly because you shut it down), that it never became primary, and that it did not create a mirror journal file later than the latest one on the primary, you can force the member to resume operation as primary. Once it has done so, and you restore the connection between the primary and the backup, the backup resumes operation as backup.
-
If you cannot confirm these conditions, you can shut the primary down. You can then manually fail over to the backup using one of the procedures described in Unplanned Outage of Primary Failover Member When Automatic Failover Does Not Occur.
-
-
If you cannot open a Terminal window on the primary, but can confirm that the other failover member is down, that it never became primary, and that it did not create a mirror journal file later than the latest one on the primary, you can restart the primary InterSystems IRIS instance and force it to become primary using the Force this node to become the primary option of the ^MIRROR routine. Alternatively, if you cannot confirm these conditions, you can ensure that the primary InterSystems IRIS instance is down and will stay down, then manually fail over to the backup using one of the procedures described in Unplanned Outage of Primary Failover Member When Automatic Failover Does Not Occur.
If you force the primary to resume operation as primary without confirming the listed conditions, you run the risk of data loss or both failover members simultaneously acting as primary. If you are uncertain whether this procedure is appropriate, contact the InterSystems Worldwide Response Center (WRC)Opens in a new tab for assistance.
Unplanned Outage of Both Failover Members
When both failover members unexpectedly fail, due the same event or different events, the appropriate procedures depends on whether you can restart either or both of the failover members within the limits of your availability requirements. The longer the mirror can be out of operation, the more options you are likely to have.
-
If you can restart both agents and at least one InterSystems IRIS instance, the failover members will negotiate with each other and automatically select which of them is to act as primary, returning the mirror to operation with no risk of data loss.
-
If you know with certainty which of the failover members was the last primary and you can restart it, it will not automatically become primary if it cannot communicate with the other failover member’s InterSystems IRIS instance or agent (because they are down), but you can manually force it to become primary, with no risk of data loss, using the Force this node to become the primary option of the ^MIRROR routine (as described in Unplanned Outage of Primary Failover Member Without Automatic Failover).
-
If you can restart only one of the failover members but don’t know whether it was last primary, you can use the Force this node to become the primary option of the ^MIRROR routine to manually force it to become primary with some risk of data loss.
Caution:If you force a backup that was not active to become the primary, some global update operations may be lost, and the other mirror members may need to be rebuilt (as described in Rebuilding a Mirror Member). If you are uncertain whether this procedure is appropriate, contact the InterSystems Worldwide Response Center (WRC)Opens in a new tab for assistance.
-
If you cannot restart either of the failover members, proceed to Disaster Recovery Procedures.
Disaster Recovery Procedures
As described in Async Mirror Members, a disaster recovery (DR) async member maintains read-only copies of the mirrored databases, making it possible for the DR async to be promoted to failover member should the need arise. The procedure for promoting a DR async is described in Promoting a DR Async Member to Failover Member. This section discusses three scenarios in which you can use DR async promotion:
In the procedures in this section, InterSystems IRIS A is the original primary failover member, InterSystems IRIS B is the original backup, and InterSystems IRIS C is the DR async to be promoted.
Manual Failover to a Promoted DR Async During a Disaster
When the mirror is left without a functioning failover member, you can manually fail over to a promoted DR async. The following procedures covers scenarios under which this is an option:
-
DR Promotion and Manual Failover with No Additional Journal Data
-
DR Promotion and Manual Failover with Journal Data from Primary’s ISCAgent
-
DR Promotion and Manual Failover with Journal Data from Journal Files
If you cannot confirm that the primary failover member InterSystems IRIS instance is really down, and there is a possibility that the instance will become available, do not manually fail over to another mirror member. If you do manually fail over and the original primary becomes available, both failover members will be simultaneously acting as primary.
When the primary InterSystems IRIS instance is in an indefinite trouble state due to isolation from both the backup and the arbiter in arbiter controlled mode, as described in Automatic Failover Mechanics Detailed, you cannot promote a DR async to failover member.
DR Promotion and Manual Failover with No Additional Journal Data
In a true disaster recovery scenario, in which the host systems of both failover members are down and their journal files are inaccessible, you can promote the DR async member to primary without obtaining the most recent journal data from the former primary. This is likely to result in some data loss. If the host systems of the failover members are accessible, use one of the procedures in DR Promotion and Manual Failover with Journal Data from Primary’s ISCAgent or DR Promotion and Manual Failover with Journal Data from Journal Files instead, as these allow the promoted DR async to obtain the most recent journal data before becoming primary, minimizing the risk of data loss.
Once you have promoted a DR async that is not participating in the mirror VIP to primary, you must make any needed changes to redirect users and applications to the new primary (see Redirecting Application Connections Following Failover or Disaster Recovery) before completing the procedures provided in this section.
A promoted DR async does not attempt to become primary unless all mirrored databases marked Mount Required at Startup (see Edit a Local Database’s Properties) are mounted, activated, and caught up, and therefore ready for use on becoming primary.
Promoting a DR async to primary without the most recent journal data from the former primary is likely to result in the loss of some global update operations, and the other mirror members may need to be rebuilt (as described in Rebuilding a Mirror Member). If you are uncertain whether this procedure is appropriate, contact the InterSystems Worldwide Response Center (WRC)Opens in a new tab for assistance.
To promote a DR async (InterSystems IRIS C) to primary without obtaining the most recent journal data, do the following.
-
Promote InterSystems IRIS C to failover member without choosing a failover partner. InterSystems IRIS C becomes the primary without any additional journal data.
-
When the host systems of the former failover members (InterSystems IRIS A and InterSystems IRIS B) become operational, at earliest opportunity and before restarting InterSystems IRIS, set ValidatedMember=0 in the [MirrorMember] section of the Configuration Parameter File for the InterSystems IRIS instance on each member (see [MirrorMember]). This instructs the InterSystems IRIS instance to obtain its new role in the mirror from the promoted DR async, rather than reconnecting in its previous role. The promotion instructions note that this change is required.
Caution:Failure to set ValidatedMember=0 may result in two mirror members simultaneously acting as primary.
-
Restart InterSystems IRIS on each former failover member.
-
If the member joins the mirror as DR async when InterSystems IRIS restarts, no further steps are required. Any journal data that was on the failed member but not on the current primary has been discarded.
-
If the member cannot join the mirror when InterSystems IRIS restarts, as indicated by the messages log message referring to inconsistent data described in Rebuilding a Mirror Member, the most recent database changes on the member are later than the most recent journal data present on InterSystems IRIS C when it became primary. To resolve this, rebuild InterSystems IRIS A as described in that section.
-
-
After InterSystems IRIS A and InterSystems IRIS B have rejoined the mirror, you can use the procedures described in Temporary Replacement of a Failover Member with a Promoted DR Async to return all of the members to their former roles. If either InterSystems IRIS A or InterSystems IRIS B restarted as backup, start with a graceful shutdown of InterSystems IRIS C when the backup is active to fail over to the backup; if InterSystems IRIS A and InterSystems IRIS B both restarted as DR async, promote one of them to backup and then perform the graceful shutdown on InterSystems IRIS C. Promote the other former failover member to backup, then restart InterSystems IRIS C as DR async.
DR Promotion and Manual Failover with Journal Data from Primary’s ISCAgent
If the host system of InterSystems IRIS A is running, but the InterSystems IRIS instance is not and cannot be restarted, you can use the following procedure to update the promoted InterSystems IRIS C with the most recent journal data from InterSystems IRIS A after promotion through InterSystems IRIS A’s ISCAgent.
-
Promote InterSystems IRIS C, choosing the InterSystems IRIS A as failover partner. InterSystems IRIS C is promoted to failover member, obtains the most recent journal data from InterSystems IRIS A’s agent, and becomes primary.
-
Restart the InterSystems IRIS instance on InterSystems IRIS A, which rejoins the mirror as backup.
-
After InterSystems IRIS A has rejoined the mirror and become active, you can use the procedures described in Temporary Replacement of a Failover Member with a Promoted DR Async to return all of the members to their former roles, starting with a graceful shutdown of InterSystems IRIS C, followed by setting ValidatedMember=0 in the [MirrorMember] section of the Configuration Parameter File for InterSystems IRIS B (see [MirrorMember]), restarting InterSystems IRIS B as DR async, promoting InterSystems IRIS B to backup, and restarting InterSystems IRIS C as DR async.
If InterSystems IRIS A’s host system is down, but InterSystems IRIS B’s host system is up although its InterSystems IRIS instance is not running, run the ^MIRROR routine on InterSystems IRIS B as described in Manual Failover To An Active Backup to determine whether InterSystems IRIS B was an active backup at the time of failure. If so, use the preceding procedure but select InterSystems IRIS B as failover partner during promotion, allowing InterSystems IRIS C to obtain the most recent journal data from InterSystems IRIS B’s ISCAgent.
DR Promotion and Manual Failover with Journal Data from Journal Files
If the host systems of both InterSystems IRIS A and InterSystems IRIS B are down but you have access to InterSystems IRIS A’s journal files, or InterSystems IRIS B’s journal files and messages log are available, you can update InterSystems IRIS C with the most recent journal data from the primary before promotion, using the following procedure.
-
Update InterSystems IRIS C with the most recent journal files from InterSystems IRIS A or InterSystems IRIS B as follows:
-
If InterSystems IRIS A’s journal files are available, copy the most recent mirror journal files from InterSystems IRIS A to InterSystems IRIS C, beginning with the latest journal file on InterSystems IRIS C and including any later files from InterSystems IRIS A. For example, if MIRROR-MIRRORA-20180220.001 is the latest file on InterSystems IRIS C, copy MIRROR-MIRRORA-20180220.001 and any later files from InterSystems IRIS A.
-
If InterSystems IRIS A’s journal files are not available but InterSystems IRIS B’s journal files and messages log are available:
-
Confirm that InterSystems IRIS B was very likely caught up, as follows:
-
Confirm that InterSystems IRIS B disconnected from InterSystems IRIS A at the same time as InterSystems IRIS A and its agent became unavailable. You can check the time that InterSystems IRIS B disconnected by searching for a message similar to the following in its messages.log file (see Monitoring InterSystems IRIS Using the Management Portal):
MirrorClient: Primary AckDaemon failed to answer status request
-
Confirm that InterSystems IRIS B was an active backup at the time it disconnected by searching for a message similar to the following in its messages.log file:
Failed to contact agent on former primary, can't take over
Caution:A message like the following in the messages.log file indicates that InterSystems IRIS B was not active when it disconnected:
nonactive Backup is down
Forcing a promoted DR async to become the primary when you cannot confirm that it was caught up may result in its becoming primary without all the journal data that has been generated by the mirror. As a result, some global update operations may be lost and the other mirror members may need to be rebuilt from a backup. If you are uncertain whether this procedure is appropriate, contact the InterSystems Worldwide Response Center (WRC)Opens in a new tab for assistance.
-
-
If you can confirm that InterSystems IRIS B was active, copy the most recent mirror journal files from InterSystems IRIS B to InterSystems IRIS C, beginning with the latest journal file on InterSystems IRIS C and including any later files from InterSystems IRIS B. For example, if MIRROR-MIRRORA-20180220.001 is the latest file on InterSystems IRIS C, copy MIRROR-MIRRORA-20180220.001 and any later files from InterSystems IRIS C. Check the files’ permissions and ownership and change them if necessary to match existing journal files.
-
-
-
Promote InterSystems IRIS C to failover member without choosing a failover partner. InterSystems IRIS C becomes the primary.
-
When the problems with InterSystems IRIS A and InterSystems IRIS B have been fixed, at earliest opportunity and before restarting InterSystems IRIS, set ValidatedMember=0 in the [MirrorMember] section of the Configuration Parameter File for the InterSystems IRIS instance on each member (see [MirrorMember]). The promotion instructions note that this change is required. Once you have done this, restart InterSystems IRIS on each member, beginning with InterSystems IRIS A (the member that was most recently the primary).
-
If the member joins the mirror as backup or DR async when InterSystems IRIS restarts, no further steps are required. Any journal data that was on the failed member but not on the current primary has been discarded.
-
If the member cannot join the mirror when the InterSystems IRIS instance restarts, as indicated by the messages log message referring to inconsistent data described in Rebuilding a Mirror Member, the most recent database changes on the member are later than the most recent journal data present on InterSystems IRIS C when it became the primary. To resolve this, rebuild the member as described in that section.
-
-
In most cases, the DR async system is not a suitable permanent host for the primary failover member. After InterSystems IRIS A and InterSystems IRIS B have rejoined the mirror, use the procedures described in Temporary Replacement of a Failover Member with a Promoted DR Async to return all of the members to their former roles. If either InterSystems IRIS A or InterSystems IRIS B restarted as backup, start with a graceful shutdown of InterSystems IRIS C when the backup is active to fail over to the backup; if InterSystems IRIS A or InterSystems IRIS B both restarted as DR async, promote one of them to backup and then perform the graceful shutdown on InterSystems IRIS C. Promote the other former failover member to backup, then restart InterSystems IRIS C as DR async.
Planned Failover to a Promoted DR Async
If you have included one or more DR asyncs in a mirror to provide disaster recovery capability, it is a good idea to regularly test this capability through a planned failover to each DR async. To perform this test, or when you want to fail over to a DR async for any other reason (such as a planned power outage in the data center containing the failover members), use the following procedure:
-
Promote InterSystems IRIS C to failover member; because InterSystems IRIS A is available, you are not asked to choose a failover partner. InterSystems IRIS C becomes backup and InterSystems IRIS B (if it exists) is demoted to DR async.
Note:If the mirror contains only one failover member to start with, the procedure is the same; you are not asked to choose a failover partner, and InterSystems IRIS C becomes backup, so that the mirror now has two failover members.
-
When InterSystems IRIS C becomes active (see Backup Status and Automatic Failover), perform a graceful shutdown on InterSystems IRIS A. Automatic failover is triggered, allowing InterSystems IRIS C to take over as primary.
-
After any testing you might want to perform on InterSystems IRIS C, restart InterSystems IRIS A, which automatically joins the mirror as backup.
Alternatively, if you want to restart the primary to keep it synchronized without it automatically becoming backup, since in a real disaster it is not likely to be available, you can demote it to DR async (through its ISCAgent) before restarting it, and then later promote it to failover member when you are ready. For information on doing this, see Demoting the Backup to DR Async.
-
When InterSystems IRIS A becomes active as backup, perform a graceful shutdown on InterSystems IRIS C to fail over to InterSystems IRIS A.
-
Promote InterSystems IRIS B (if it exists) to failover member; it becomes backup.
-
Restart the InterSystems IRIS instance on InterSystems IRIS C, which automatically joins the mirror in its original role as DR async.
A DR async that does not have network access to the mirror private addresses of the failover members, as described in Sample Mirroring Architecture and Network Configurations, can be promoted only to function as primary, and this should be done only when no other failover member is in operation. When this is the case, therefore, the preceding procedure is not appropriate. Instead, follow this procedure:
-
Perform a graceful shutdown on InterSystems IRIS B, if it exists, so that only InterSystems IRIS A is functioning as failover member (primary).
-
When InterSystems IRIS C is caught up (see Mirror Member Journal Transfer and Dejournaling Status), perform a graceful shutdown on InterSystems IRIS A.
-
Promote InterSystems IRIS C to primary, as described in DR Promotion and Manual Failover with Journal Data from Primary’s ISCAgent. The new primary contacts former primary’s ISCAgent to confirm that it has the most recent journal data during this procedure.
-
After any testing you might want to perform on InterSystems IRIS C, shut it down.
-
Restart InterSystems IRIS A; it automatically becomes primary.
-
Restart InterSystems IRIS B (if it exists); due to InterSystems IRIS C’s promotion, it joins as DR async.
-
Promote InterSystems IRIS B to backup.
-
Restart InterSystems IRIS C, which automatically joins the mirror in its original role as DR async.
In both of the procedures in this section, if InterSystems IRIS B does not exist, that is, the mirror consists of primary and asyncs only, InterSystems IRIS C when restarted becomes backup. Demote it to DR async as described in Maintenance of Backup Failover Member.
Temporary Replacement of a Failover Member with a Promoted DR Async
Some of the procedures described in Planned Outage Procedures and Unplanned Outage Procedures involve temporary operation of the mirror with only one failover member. While it is not necessary to maintain a running backup failover member at all times, it does protect you from interruptions to database access and potential data loss should a primary failure occur. For this reason, when only the primary is available due to planned or unplanned failover member outage, you can consider temporarily promoting a DR async member to backup failover member. Before doing so, however, consider the following:
-
If the DR async is in a separate data center at significant distance from the failover members, there may be substantial network latency between them. When a DR member is promoted and becomes an active failover member, this round-trip latency becomes part of the synchronous data replication between the primary and the backup (see Mirror Synchronization) and can negatively affect the performance of applications accessing the mirror (see Network Latency Considerations).
-
If the DR async does not have network access to the mirror private addresses of the failover members, as described in Sample Mirroring Architecture and Network Configurations, it cannot be used in these procedures, as it can be promoted only to function as primary, and this should be done only when no failover member is in operation.
-
If the mirror uses a VIP for automatic redirection of users and applications (see Redirecting Application Connections Following Failover or Disaster Recovery) and the DR async cannot acquire the mirror VIP because it is on a different subnet, these procedures typically should not be used.
Before using this option, review the discussion of failover partner selection and the requirement to set ValidatedMember=0 on former failover members whose agent cannot be contacted at the time of promotion in Promoting a DR Async Member to Failover Member.
If you need to perform planned maintenance on InterSystems IRIS B, the current backup failover member (see Maintenance of Backup Failover Member), you can do the following:
-
Promote InterSystems IRIS C, a DR async that is caught up (see Mirror Member Journal Transfer and Dejournaling Status). InterSystems IRIS C automatically becomes backup, and InterSystems IRIS B is demoted to DR async.
-
Shut down InterSystems IRIS B’s InterSystems IRIS instance or host system and complete the planned maintenance.
-
Restart InterSystems IRIS B, which joins the mirror as DR async.
-
When InterSystems IRIS B is caught up, promote it to failover member, returning it to its original role as backup. InterSystems IRIS C is automatically demoted to DR async, its original role.
If you need to perform planned maintenance on InterSystems IRIS A, the current primary failover member (see Maintenance of Primary Failover Member), you can do the following:
-
When InterSystems IRIS B is active (see Mirror Synchronization), perform a graceful shutdown on InterSystems IRIS A. Automatic failover is triggered, allowing InterSystems IRIS B to take over as primary.
-
Promote InterSystems IRIS C, a DR async that is caught up. InterSystems IRIS C automatically becomes backup.
-
Complete the planned maintenance on InterSystems IRIS A, shutting down and restarting the host system if required.
-
Restart the InterSystems IRIS instance on InterSystems IRIS A, which joins the mirror as DR async.
-
When InterSystems IRIS A is caught up, promote it to failover member; it becomes backup, and InterSystems IRIS C is automatically demoted, returning it to its original role.
-
When InterSystems IRIS A becomes active, perform a graceful shutdown on InterSystems IRIS B. Automatic failover is triggered, returning InterSystems IRIS A to its original role.
-
Restart the InterSystems IRIS instance on InterSystems IRIS B, which joins the mirror in its original role.
If you have had an unplanned outage of InterSystems IRIS B, or automatically or manually failed over to InterSystems IRIS B due to an unplanned outage of InterSystems IRIS A (see Unplanned Outage Procedures), you can do the following:
-
Promote InterSystems IRIS C, a DR async that is caught up. InterSystems IRIS C automatically becomes backup.
-
Restart the failed failover member. If the failed member’s ISCAgent could not be contacted when the DR async was promoted, you must at earliest opportunity and before restarting InterSystems IRIS set ValidatedMember=0 in the [MirrorMember] section of the Configuration Parameter File for the InterSystems IRIS instance (see [MirrorMember]). The promotion instructions note that this change is required. When you restart the former failover member’s InterSystems IRIS instance, it joins the mirror as DR async.
-
When the restarted failover member is caught up, promote it to failover member; it becomes backup, and InterSystems IRIS C is automatically demoted to DR async, its original role.
-
If you want the failover members to exchange their current roles, when the backup becomes active perform a graceful shutdown on the current primary, triggering automatic failover. Restart the other failover member; it joins the mirror as backup.