Template


L&I - Systems Management

Problem/Incident Communications Plan

Version 1.5

Prepared for

Commonwealth of Pennsylvania

Department of Labor and Industry

January 2011

Reviewed By:
Name | Team/Role | Reviewer Comments | Date Reviewed
Myrna Barnes | Chief, Customer Relations Division | | 1/21/2011
Anita Steinmeier | Chief, Business Center of Excellence (CoE) Division | | 1/21/2011
Karen Fausnacht | Chief, Project Mgmt Division | Questions addressed. No comments. | 1/7/2011
Stephen Yurich | Chief, Security Division | | 1/21/2011
Jacki Hagmayer | Chief, Engineering and Research Division | | 1/21/2011
Ed Bowlen | Chief, Standards Development & Compliance Division | | 1/21/2011
Joseph Sheridan | Chief, Data Mgmt & Database Operations Division | | 1/21/2011
Bryan Reed | Chief, Compensation & Insurance Division | | 1/21/2011
Mary Lynn Kowalski | Chief, Unemployment Compensation Division | | 1/21/2011
John Shontz | Chief, Vocational Rehabilitation – Safety & Labor Mgmt Relations Division | | 1/21/2011
Philip Day | Chief, Workforce Development Division | | 1/21/2011
John Auchey | Chief, Server Farm Operations Division | | 1/21/2011
Kevin Paul | Chief, Infrastructure Division | | 1/21/2011
William Glatz | Chief, Network Support Services Division | | 1/21/2011
Martin Thomas | Chief, Mainframe Operations Division | | 1/21/2011
Approved By:
Name | Team/Role | Sign-off Date
Michele Vogelsong | Director, Bureau of Enterprise Services | 3/9/2011
John Malinoski | Director, Bureau of Infrastructure and Operations | 2/22/2011
Phil Day | Acting Director, Bureau of Enterprise Architecture | 2/22/2011
David Andrews | Director, Bureau of Business Application Development | 3/8/2011

Table of Contents

1.0 Purpose

1.1 Purpose Definitions and Examples

2.0 Within the First Fifteen Minutes

3.0 The Thirty Minute Interval

4.0 Three Hour Status Update Intervals

5.0 Conference Bridge Options

6.0 Appendix A – Related Documents

Department of Labor and Industry – Office of Information Technology
Systems Management Plan – Problem/Incident Management Communications Plan

T:\All (Common area for all OIT Staff)\Enterprise Systems Management Documents (ITIL)


1.0 Purpose

The purpose of this document is to establish a template for effective chains of communication in the event of a large-scale problem or an outage. The chains of communication should include external customers, internal staff, and project and Program/Business areas, as well as all Office of Information Technology (OIT) management and staff. This document also sets the timeline for communicating at the fifteen- and thirty-minute marks and at three-hour status update intervals. Each message should be written in easily understood terms and state the current status, what is being done, the anticipated resolution time if known, and the workaround if one is in place. This document is not intended to be all-inclusive; rather, it is a base document. Each Business Application and OIT Support group is to create its own version. These individual versions will add communication plans specific to that Business Application or support group, to be used in conjunction with the plan outlined in this document.

1.1 Purpose Definitions and Examples

A large-scale problem or outage is defined as one or more applications or services that become inoperable and cause a major impact on the availability or function of systems. Examples of affected systems include, but are not limited to:

  • Wide area network links or network links that affect a large number of users
  • Enterprise applications
  • Public facing applications
  • Business applications
  • Enterprise servers that service a large number of users
  • Enterprise shared applications
  • Mainframe applications
  • Voice services affecting a large number of users or multiple sites
  • Desktop services that affect a large number of users or sites
  • Facility issues that affect a large number of users or multiple sites

In some cases, depending on the system(s) affected, it may be necessary to use alternative methods of communication, such as phone chains or a message posted on the LINKS Help Desk phone system. Examples of such incidents include, but are not limited to:

  • A data circuit goes offline, affecting a large part of the state and preventing the use of email
  • Connectivity to the email servers goes down

2.0 Within the First Fifteen Minutes

  • As documented in the Problem/Incident Management process, the program area or OIT staff calls the LINKS Help Desk to create a Remedy Help Desk ticket. Please note: LINKS may ask for a screenshot of the error message to be emailed to the Help Desk as an attachment so that it can be included in the Remedy Help Desk ticket.
  • Notify internal members of the appropriate technical team of the problem or situation.
  • Division Chief or designee will notify the CIO’s Office, OIT Bureau Directors, OIT Division Chiefs, and Help Desk Manager.
  • If the problem appears to be part of an outage or large-scale problem, please refer to the L&I-Systems Management Plan – Problem and Incident Management.doc, Section 4.9.3.1 Diagnose the Problem/Incident.
  • Please refer to the DLI OIT First Responders and Escalation Contact List for current administrator contact information.
  • If necessary, notify the Program/Business areas and others as applicable. If the problem affects the enterprise, the Customer Relations Division (CRD) will send out a Department-wide notification. When requesting that CRD send an email, draft it using the Email Guidelines document.
  • The CRD Help Desk Manager or Help Desk Coordinator will notify the LINKS Help Desk Lead Agent or Manager of the situation and have a message placed on the LINKS telephone system alerting callers to the problem.
  • If the problem is resolved before the thirty-minute communication interval, a follow-up email describing the problem and the resolution must be sent.

3.0 The Thirty Minute Interval (from incident)

  • Division Chief or designee will update the CIO’s Office, OIT Bureau Directors, OIT Division Chiefs, and Help Desk Manager.
  • Project teams should post a Sorry Page (application down page) for affected web-based applications in the Production environment. If the Sorry Page is generated automatically, teams should verify that it has been posted.
  • The Problem/Incident Coordinator will initiate a conference call if the problem/incident or outage affects the Production environment or Production system. The Problem/Incident Conference Number is listed in Section 5.
  • Please refer to the L&I- Systems Management Plan - Problem/Incident Management.doc, Section 4.9.3.2 Escalate the Problem/Incident, for direction on communications.
  • If necessary, notify the Program/Business areas and others as applicable. If the problem affects the enterprise, the Customer Relations Division (CRD) will send out a Department-wide notification. When requesting that CRD send an email, draft it using the Email Guidelines document. As new information becomes available, periodic updates will be necessary (see Section 4.0 Three Hour Status Update Intervals).
  • Notify the Help Desk Manager or the Help Desk Coordinator of any status update or resolution. The LINKS Help Desk telephone system message will then be modified to alert customers to the status and ETA, if available.
  • If the problem is resolved before the three-hour communication interval, a follow-up email describing the problem and the resolution must be sent.
  • Notify the Help Desk Manager or Help Desk Coordinator of the resolution. The LINKS telephone system message will be returned to normal status.
  • An incident report must be completed detailing the problem, actions taken, and the resolution.

4.0 Three Hour Status Update Intervals

  • At three-hour intervals, OIT and the Program/Business areas must be kept updated on the progress of the work being done to correct the problem. Each message should be written in easily understood terms and state the current status, what is being done, the anticipated resolution time if known, and the workaround if one is in place.
  • The Problem/Incident Coordinator will initiate a conference call if the problem/incident or outage affects the Production environment or Production system. The Problem/Incident Conference Number is listed in Section 5.
  • Notify the Help Desk Manager or the Help Desk Coordinator of the status update. The LINKS Help Desk telephone system message will then be modified to alert customers to the status and ETA, if available.
  • For example, status messages might be sent at 9 AM, 12 PM and 3 PM.
  • If the problem is resolved before the next three-hour communication interval, a follow-up email describing the problem and the resolution must be sent.
  • Notify the Help Desk Manager or Help Desk Coordinator of the resolution. The LINKS telephone system message will be returned to normal status. An incident report must be completed detailing the problem, actions taken, and the resolution.
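The timing rules in Sections 2.0 through 4.0 can be summarized as a small schedule calculation. The sketch below is illustrative only and is not part of the official process: the function name communication_schedule is hypothetical, and it assumes that the three-hour status updates are counted from the start of the incident, which this plan does not specify. A support group that anchors its updates differently (for example, at fixed times such as 9 AM, 12 PM and 3 PM) would need to adjust that assumption.

```python
from datetime import datetime, timedelta

# Checkpoints taken from Sections 2.0-4.0 of this plan.
INITIAL_WINDOW = timedelta(minutes=15)      # initial notifications
THIRTY_MINUTE_MARK = timedelta(minutes=30)  # first management update
# ASSUMPTION: status updates repeat every three hours from the incident start.
UPDATE_INTERVAL = timedelta(hours=3)


def communication_schedule(incident_start: datetime,
                           resolved_at: datetime) -> list[tuple[str, datetime]]:
    """Hypothetical helper: list each communication that is due, in time order."""
    # Initial notifications are made within the first fifteen minutes.
    due = [("Initial notifications", incident_start + INITIAL_WINDOW)]

    # The thirty-minute update is needed only if the incident is still open.
    if incident_start + THIRTY_MINUTE_MARK < resolved_at:
        due.append(("Thirty minute update", incident_start + THIRTY_MINUTE_MARK))

    # Three-hour status updates continue until the incident is resolved.
    next_update = incident_start + UPDATE_INTERVAL
    while next_update < resolved_at:
        due.append(("Three hour status update", next_update))
        next_update += UPDATE_INTERVAL

    # Every resolution requires a follow-up email describing the problem and fix.
    due.append(("Resolution follow-up email", resolved_at))
    return sorted(due, key=lambda item: item[1])


if __name__ == "__main__":
    start = datetime(2011, 1, 21, 9, 0)
    for label, when in communication_schedule(start, datetime(2011, 1, 21, 16, 0)):
        print(f"{when:%I:%M %p}  {label}")
```

For an incident that starts at 9:00 AM and is resolved at 4:00 PM, this sketch lists communications at 9:15 AM, 9:30 AM, 12:00 PM and 3:00 PM, followed by the resolution email at 4:00 PM.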

Figure 1 illustrates the proposed L&I Problem/Incident Management Communications process flow.

Figure 1: Problem/Incident Communications Process

5.0 Conference Bridge Options

  • To request a UCMS Project Conference Bridge, contact the appropriate UCMS/IBM personnel.
  • To request a CWDS Project Conference Bridge, contact the appropriate CWDS personnel.
  • For all others, to request a WebEx conference session, please refer to the WebEx Teleconference Host Instructions Meetings document.
  • Problem/Incident Conference Number and access codes:

Primary Call-in Number: 1-866-699-3239

Backup Call-in Number: 1-408-792-6300 (use this number only if the Primary number does not work or is unavailable)

  • Host Access Code – 20748166
  • Attendee Access Code – 20738745

6.0 Appendix A – Related Documents

DLI OIT First Responders and Escalation Contact List

  • T:\All (Common area for all OIT Staff)\OIT ContactList\DLI OIT First Responders and Escalation Contact List.xls

Systems Management Plan – Problem and Incident Management Document

  • T:\All (Common area for all OIT Staff)\Enterprise Systems Management Documents (ITIL)\Problem-Incident Management\Systems Management Plan - Problem and Incident Management.doc

Server Farm Dependency Diagrams (CWDS, UCMS included)

  • Server Dependencies Documentation

WebEx Teleconference Host Instructions Meetings Form

  • WebEx Teleconference Host Instructions
