Vision: iSeries Availability Runbook Template

Overview

For the most current version of this template, look on the Vision web site at http://www.mimix.com/partners/services/templates/index.asp.

This template is deliberately overwritten: it contains more text than any one Runbook needs, so that you spend as little time as possible doing original writing. As we gain experience using this template, we will incorporate feedback from Vision Installation Professionals like you on how it can be improved. If you have questions about how to use this template, contact the Vision Solutions Project Manager associated with the account.

Instructions

  1. Variables — This document uses variable fields to minimize the time needed to create the document. The variables are located on page 3, which is the cover page of the customer's document. The variable box is hidden. To locate the box, move your mouse pointer below the red header line and click; this selects the box. To see the variables, enlarge the box: move your mouse to the small square at the bottom center of the outline, then click and hold the mouse button as you drag the line down. Once you have finished changing the variables, reduce the box back to its original size so it will not show during printing.
  2. Write the report — Author the report based on your discussions with the customer and the data you collect using this template. Delete all material that doesn't apply, respond appropriately to Highlighted Directions, edit the XXX and **Phrase** items, and add new material unique to the customer's environment.
  3. Keep it simple — Resist the temptation to use additional colors or typefaces. The template has been worked out using common Windows fonts to present an orderly, professional appearance.
  4. Be Consistent — Whatever spelling, grammar, style, and formatting rules you use, use them consistently.
  5. Spell out hyperlinks — Since the user is likely to have to rely on a printed version of the final documents, spell out hyperlinks and email addresses. An embedded hyperlink will not be of any use to someone with just a paper copy of the report.
  6. Remove Highlighted Directions, and check that you have either appropriately edited XXX and **Phrase** items or returned the text formatting to "Auto" by selecting the text, then choosing Format > Font > Color > Auto (or by using the Text Color button on the toolbar).
  7. Review the report — run Spell Check. Check for grammar and completeness.
  8. Clean up the report — Remove all Highlighted Directions, non-essential blank lines, extra rows and columns in tables, and this instruction page.
  9. Update the Table of Contents — Place the cursor in the table of contents field and update the table (press F9, select Update Field, and check the Update entire table choice, then click OK).
  10. If time permits, have someone unfamiliar with the project give it a "cold read." Spell check can miss some big mistakes, and a quick read by a fresh pair of eyes (even someone who is not technically proficient with the iSeries or MIMIX) is always good.
  11. Update records — Both Vision consultants and Business Partner consultants should send a copy of the report to the Vision Solutions Project Manager responsible for the customer’s territory.
  12. Deliver the Runbook to the customer.

Runbook Template Revision History

Document Template Revision Date (on cover page) / Summary of Revisions
June 16, 2008 / ·  Updated hyperlinks
Aug 1, 2007 / ·  Revised for V5

Note: This page and the page(s) preceding it are not to be included in the final Runbook.

Customer X

iSeries Managed Availability Runbook

Prepared by: <Your Name>

Solutions Consultant

Vision Solutions

Owner: <Customer Contact Name>
Customer X

Created: <date>

Last revision: <date>

MIMIX HA Runbook template_08102007 Availability Runbook page 1 of 38


<Refresh the table of contents upon completion of the document>

Table of Contents

Summary of Revisions 2

Purpose and Audience 5

Ownership 5

Maintaining this procedure 5

Revision Changes 6

Server Switching 7

Concepts and Strategy 7

Switch Cycle 7

Graphical full switch cycle overview 8

Switch Overview 9

Planned Switch Overview 9

Unplanned Switch Overview 10

Switch readiness validation 11

Goal 11

Switch readiness validation tasks 11

MSFname-SWITCH from Sys1 to Sys2 13

Procedure SWITCHOVER-SWITCH – Switch to Backup 13

SWITCHOVER-SWITCH Pre-Switch Tasks 14

SWITCHOVER-SWITCH Planned Switch Tasks 15

SWITCHOVER-SWITCH Post-Switch Tasks 16

Procedure SYNCHRONIZE-SWITCH – Resynchronize 17

SYNCHRONIZE-SWITCH Pre-Synchronization Tasks 17

SYNCHRONIZE-SWITCH Synchronization Tasks 18

SYNCHRONIZE-SWITCH Post-Synchronization Tasks 19

Procedure FAILOVER-SWITCH – Fail over to Backup 19

FAILOVER-SWITCH Pre-Switch Tasks 20

FAILOVER-SWITCH Unplanned Switch Tasks 20

FAILOVER-SWITCH Post-Switch Tasks 22

MSFname-RETURN from Sys2 to Sys1 23

Procedure SWITCHOVER-RETURN – Switch to Backup 23

SWITCHOVER-RETURN Pre-Switch Tasks 24

SWITCHOVER-RETURN Planned Switch Tasks 25

SWITCHOVER-RETURN Post-Switch Tasks 26

Procedure SYNCHRONIZE-RETURN – Resynchronize 27

SYNCHRONIZE-RETURN Pre-Synchronization Tasks 27

SYNCHRONIZE-RETURN Synchronization Tasks 28

SYNCHRONIZE-RETURN Post-Synchronization Tasks 29

Procedure FAILOVER-RETURN – Fail over to Backup 29

FAILOVER Pre-Switch Tasks 30

FAILOVER-RETURN Unplanned Switch Tasks 30

FAILOVER-RETURN Post-Switch Tasks 32

Appendix A: Runbook Hyperlinks 33

Appendix B: The Runbook Data Capture Tool 34

Using and Updating the Links Used in this Document 34

Updating The Links 35

Method 1: Automatically 35

Method 2: Manually 36

About

Purpose and Audience

This Runbook describes the operational actions needed to switch the host production role from the Sys1 system to the Sys2 system, and the actions needed to return the host production role from the Sys2 system to the Sys1 system. The intention of the document is to guide the MIMIX administrator through the switch process.

Ownership

The owner of this document named on the cover page is responsible for maintaining the procedures and schedules presented to comply with your availability goals and objectives. This document must be revised when changes, ranging from a simple fix update to major software or hardware changes, occur in your managed availability environment.

Maintaining this procedure

Whenever the system setup changes, both the Runbook and this switch procedure may need to be updated, because many changes in your managed availability environment can affect the effectiveness of your solution. Some of the more common changes are:

§  New Availability Solution Administrator.

§  New operating system technology (e.g., remote journaling, new protocols), which will impact performance and the configuration of MIMIX and automation code.

§  Network changes or additions, such as new hardware or communication components, which can impact the switching of users to a remote system.

§  Introduction of a new application on the systems that needs to be included in the managed availability environment.

§  Introduction of new database features such as triggers, null fields or referential integrity constraints.

§  Additions or changes to application change management, which can result in files on hold and failed requests.

§  Other: ______

When changes are needed to this Runbook, contact the document owner (listed on the cover) and notify them of discrepancies and enhancements.

Revision Changes

Indicate the date and type of changes made to this document.

Date: <date>

Document creation

Server Switching

Concepts and Strategy

[Define terms (production, backup, planned, unplanned) and describe the strategy (IP impersonation, use of A/B switches, etc.). The purpose here is to describe how switching is done, while the individual step details are left to the procedure descriptions below.

If MIMIX Monitor is used, explain how (what is monitored?).

What IP addresses or SNA LUs are switched?

Is the switch automatic or semi-automatic?

Is the MIMIX Switch Framework or clustering used to effect the switch?

Will the MIMIX Availability Manager Interface be available when users have been switched to the backup server?

The following text may be used as an intro or modified if applicable:

Server switching consists of moving users from one server to another in a controlled way. At Customer X, server switching means moving users from the Sys1 server to the Sys2 server, and, when appropriate, moving the users back again to the Sys1 server.

The criteria for performing a switch will be different for a planned switch than for an unplanned switch.

A planned switch is done at a time when it is generally convenient and when the readiness to switch can be carefully assessed.

An unplanned switch is done when a failure of the current production system has been detected. In this case, the readiness to switch is difficult to assess. However, that readiness can be assumed with some confidence if a regimen of auditing, monitoring, and testing has been followed.

NOTE: A switch is unplanned if the original production system is no longer accessible from the backup system. If the original production system is reachable, it is a planned switch, even if it was not scheduled or intended.
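The rule in the NOTE can be written down as a small decision function. The following is only an illustrative sketch: the function names are hypothetical and are not part of MIMIX, and how reachability is actually determined (for example, by monitoring) is left to the site and is simply passed in as a flag here.

```python
def classify_switch(production_reachable_from_backup: bool) -> str:
    """Return "planned" or "unplanned" according to the NOTE above.

    Only reachability matters; whether the switch was scheduled
    or intended does not change the classification.
    """
    return "planned" if production_reachable_from_backup else "unplanned"


def first_phase_procedure(production_reachable_from_backup: bool,
                          direction: str) -> str:
    """Pick the first-phase procedure name used in this Runbook.

    direction is "SWITCH" (Sys1 to Sys2) or "RETURN" (Sys2 to Sys1).
    """
    if direction not in ("SWITCH", "RETURN"):
        raise ValueError("direction must be 'SWITCH' or 'RETURN'")
    kind = classify_switch(production_reachable_from_backup)
    prefix = "SWITCHOVER" if kind == "planned" else "FAILOVER"
    return f"{prefix}-{direction}"
```

For example, an unscheduled switch in which Sys1 can still be reached from Sys2 is classified as planned, so under this sketch it would map to SWITCHOVER-SWITCH rather than FAILOVER-SWITCH.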

Switch Cycle

Switching systems, when done properly, is a complete cycle: it requires not only switching users to a second system but also returning the users safely to the original system when appropriate.

The full cycle involves a -SWITCH and a -RETURN, each of which has the same two phases. The first phase is called Switchover/Failover: if planned, it is referred to as a switchover; if unplanned, it is considered a failover. The second phase is called Resynchronize. After the -SWITCH, system Sys1 plays the role of backup to system Sys2. This allows the two phases, Switchover/Failover and Resynchronize, to be repeated to return production to Sys1.

Moving the current production from Sys1 to Sys2 is the -SWITCH. Returning production from Sys2 to Sys1, when you are ready, is the -RETURN.
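The cycle described above can be modeled as two applications of the same pair of phases. This is a minimal sketch for illustration only; the `Roles` type and the function names are assumptions made for this example and do not correspond to any MIMIX object or command.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Roles:
    production: str
    backup: str
    replicating: bool  # True once the backup is receiving changes again


def switch_phase(roles: Roles) -> Roles:
    """Phase 1 (switchover or failover): the backup takes the production role."""
    return Roles(production=roles.backup, backup=roles.production,
                 replicating=False)


def resynchronize_phase(roles: Roles) -> Roles:
    """Phase 2 (resynchronize): replication starts toward the new backup."""
    return Roles(roles.production, roles.backup, replicating=True)


def half_cycle(roles: Roles) -> Roles:
    """One -SWITCH or one -RETURN: the same two phases in the same order."""
    return resynchronize_phase(switch_phase(roles))


start = Roles(production="Sys1", backup="Sys2", replicating=True)
after_switch = half_cycle(start)         # the -SWITCH: Sys2 is production
after_return = half_cycle(after_switch)  # the -RETURN: Sys1 is production again
```

Because the -SWITCH and the -RETURN use the same two phases, applying `half_cycle` twice brings the roles back to where they started, which is the sense in which the cycle is complete.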

Graphical full switch cycle overview

Switch Overview

Procedure MSFname-SWITCH switches the production role from the Sys1 system to the Sys2 system. For a planned switch to Sys2, use procedure SWITCHOVER-SWITCH on page 13; for an unplanned switch, use procedure FAILOVER-SWITCH on page 19.

Procedure MSFname-RETURN switches the production role from the Sys2 system to the Sys1 system. For a planned switch to Sys1, use procedure SWITCHOVER-RETURN on page 23; for an unplanned switch, use procedure FAILOVER-RETURN on page 29.

Planned Switch Overview

The planned switch scenario includes two major steps, both of which are begun by interactively issuing a command on the system serving as the backup.

For moving production from Sys1 to Sys2:

§  Switching production from the production system Sys1 to the backup system Sys2. This step, called “switch to backup”, carefully disengages the production system from the network, connects the backup system to the network, and makes this the new production system.
See Procedure SWITCHOVER-SWITCH below.

§  Starting MIMIX replication from the new production system back to the old production system. Effectively, this means the old production system now becomes a backup system. This step does not affect any users or connections. This step is called “catch-up” or “resync” because it allows the new backup system to catch up on all the changes that have been taking place on the new production system since Step 1 was performed.
See Procedure SYNCHRONIZE-SWITCH below.

For moving production from Sys2 to Sys1:

§  Switching production from Sys2 (the original backup system, now holding the production role) to Sys1 (the original production system, now serving as the backup). This step, called “switch to backup”, carefully disengages the current production system from the network, connects the current backup system to the network, and makes this the new production system.
See Procedure SWITCHOVER-RETURN below.

§  Starting MIMIX replication from the new production system back to the old production system. Effectively, this means you have switched full circle and are back to the initial roles of Production and Backup for the systems. This step does not affect any users or connections. This step is called “catch-up” or “resync” because it allows the new backup system to catch up on all the changes that have been taking place on the new production system since Step 1 was performed.
See Procedure SYNCHRONIZE-RETURN below.

Unplanned Switch Overview

The Unplanned scenario includes 3 major steps, all of which are begun by interactively issuing a command on the Sys2 system.

For failing over production from Sys1 to Sys2:

§  Failing over production to the backup system Sys2. This step, called “fail-over to backup”, quickly establishes this system as the new production system. The original production system Sys1 cannot be reached, so its Production Role cannot be removed at this stage.
See Procedure FAILOVER-SWITCH below.

§  Repairing and preparing the old production system Sys1 to no longer hold Production Role. This means taking down connections to the network.

§  Starting MIMIX replication from the new production system back to the old production system. Effectively, this means the old production system now becomes a backup system. This step does not affect any user or ATM connections. This step is called “catch-up” or “resync” because it allows the backup system to catch up on all the changes that have been taking place on the new production system since Step 1 was performed.
See Procedure SYNCHRONIZE-SWITCH below.

For failing over production from Sys2 to Sys1:

§  Failing over production to the backup system Sys1. This step, called “fail-over to backup”, quickly establishes this system as the new production system. The original production system Sys2 cannot be reached, so its Production Role cannot be removed at this stage.
See Procedure FAILOVER-RETURN below.

§  Repairing and preparing the old production system Sys2 to no longer hold Production Role. This means taking down connections to the network.

§  Starting MIMIX replication from the new production system back to the old production system. Effectively, this means the old production system now becomes a backup system. This step does not affect any user or ATM connections. This step is called “catch-up” or “resync” because it allows the backup system to catch up on all the changes that have been taking place on the new production system since Step 1 was performed.
See Procedure SYNCHRONIZE-RETURN below.
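The planned and unplanned overviews can be summarized as ordered step lists. The sketch below is illustrative only: `switch_plan` is a hypothetical helper, the step descriptions are paraphrased from this section, and the procedure names are the ones this Runbook uses. The repair step in the unplanned scenario has no named procedure in this section, so it is represented with `None`.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, Optional[str]]  # (description, procedure name or None)


def switch_plan(direction: str, planned: bool) -> List[Step]:
    """Return the ordered major steps for one -SWITCH or -RETURN.

    direction is "SWITCH" (Sys1 to Sys2) or "RETURN" (Sys2 to Sys1).
    """
    if direction not in ("SWITCH", "RETURN"):
        raise ValueError("direction must be 'SWITCH' or 'RETURN'")
    if direction == "SWITCH":
        old_prod, new_prod = "Sys1", "Sys2"
    else:
        old_prod, new_prod = "Sys2", "Sys1"

    # The resynchronize step closes both the planned and unplanned scenarios.
    resync: Step = (
        f"Start MIMIX replication from {new_prod} back to {old_prod}",
        f"SYNCHRONIZE-{direction}",
    )
    if planned:
        return [
            (f"Switch production from {old_prod} to {new_prod}",
             f"SWITCHOVER-{direction}"),
            resync,
        ]
    return [
        (f"Fail over production to {new_prod}", f"FAILOVER-{direction}"),
        (f"Repair {old_prod} and prepare it to give up the production role",
         None),
        resync,
    ]
```

Calling `switch_plan("SWITCH", planned=True)` yields the two planned steps, while `switch_plan("RETURN", planned=False)` yields the three unplanned steps, with the repair of the unreachable system placed between the failover and the resynchronization.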

Switch readiness validation

Note

These steps are not part of any switch. If you are executing an unplanned switch, please proceed: