4 OPERATIONS
4.1 Introduction
4.2 Benefits, costs and problems
4.3 The roles, responsibilities and interfaces
4.4 The management processes involved
4.5 The processes and deliverables of Operations
4.6 The techniques, tools and technologies
Annex 4A ICT operational roles
Annex 4B International Standards Organisation Management and Systems Management Functional Areas (SMFAs)
Annex 4C Additional Operations techniques
Annex 4D Backups and storage
4.1 Introduction
An Operations process is required to ensure a stable and secure foundation on which to provide ICT services. The Operations process has a strong technology focus with an emphasis on ‘monitor and control’. It is observing, serving and operational in nature, ensuring the stability of the ICT infrastructure.
Definition – Operations process
The Operations process comprises all activities and measures necessary to enable and/or maintain the intended use of ICT services and infrastructure in order to meet Service Level Agreements and business targets.
Although Operations underpins and facilitates all other ICTIM processes and IT Service Management (ITSM) processes in their ever-changing needs and wants, it is conservative and inclined to preserve the status quo. Operations is perceived to be in the ‘back office’ of most ICT organisations and the activities and roles are often undervalued with a low profile. However, effective Operations people, processes and products are critical to the provision of quality ICT services. It is impossible to provide resilient, highly available services without good operational procedures. It is common for Operations to be a continuous process, often 24 hours a day, seven days a week, 365 days a year.
In order to deliver the results required by the business, service and performance measurement must operate top-down. The emphasis must be on overall service and business-related performance, not on individual component performance.
Example
A nationwide retail organisation measured operational performance against targets based on data centre responses and availability, which were excellent and exceeded their SLA targets. However, the business complained that the quality of their ICT services was poor and unacceptable. The business perception was based upon the quality of service delivered to the desktop NOT the service delivered by the data centre. Unfortunately, there were no operational targets measured or reported on for either the desktop or network infrastructure, although the targets within the SLAs implied that there were.
Key messages
1. It does not matter if components of the infrastructure are exceeding their operational targets – it is the quality of the overall service delivered to the users and the business that is critical.
2. The emphasis on measurement and operational targets should be from the service and business perspective.
3. All operational targets should be agreed and documented within SLAs and OLAs, and should be measurable, reported and reviewed.
If the overall end-to-end service is not performing to the levels agreed within the SLAs and OLAs, no amount of demonstration that the infrastructure components are fine will satisfy the users, customers or the business. It is essential, therefore, that Operations personnel are aware of the overall service targets and the contents of the SLAs, and indeed of the appropriate OLAs, and that they constantly strive to ensure that ICT services and all elements of the infrastructure meet or exceed their targets.
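The gap between component targets and end-to-end service can be illustrated with a short calculation. The sketch below is not taken from the guidance itself: it assumes a service that needs every component to be working (components in series) and assumes failures are independent, so that end-to-end availability is the product of the component availabilities. All names and figures are hypothetical.

```python
from math import prod


def end_to_end_availability(component_availabilities):
    """Availability of a service that needs every component (series model).

    Assumes independent failures; real infrastructures may behave differently.
    """
    return prod(component_availabilities.values())


# Hypothetical figures, for illustration only.
components = {"data_centre": 0.999, "network": 0.98, "desktop": 0.97}
sla_target = 0.99

service = end_to_end_availability(components)
print(f"data centre meets target: {components['data_centre'] >= sla_target}")
print(f"end-to-end availability: {service:.4f}")
print(f"end-to-end meets target: {service >= sla_target}")
```

Under these assumptions the data centre exceeds its target while the service as experienced at the desktop does not, which is the situation described in the retail example.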
In today’s high-tech world, questions often arise, such as:
· ‘Do we really need people operating these computers for us?’
· ‘Surely someone has come up with a way to automate that!’
· ‘Do we need people for that particular task?’
The answer to all these questions is one of balance. The selective and appropriate use of automated processes is vital. However, it is also essential that people maintain control of these automated processes. The reasons are many and varied, and often poorly understood, so perhaps an analogy might be helpful.
The car, which most of us use on a regular basis, has over a century of development behind it. In some respects, its history is remarkably similar to that of ICT. When first introduced its possibilities were recognised by only a handful of visionaries – today it is pervasive in every sense. Its evolution is an amalgamation of incremental improvements and adaptations, mostly small steps, but some of them big leaps forward:
· its usage is as diverse as ICT’s – from simple point-to-point travel to highly technical Formula-1 racing
· its limitations are also similar: when something goes wrong it can only correct itself up to a point – it cannot heal itself when damaged
· it cannot perform any work on its own – there is always a conscious decision about what kind of work needs to be performed and there is always human oversight and ‘control’ when carrying out a particular task.
The human oversight and ‘control’ of ICT is the subject of this chapter. It is as necessary in ICTIM as it is in using a car, and it will be necessary for the foreseeable future. It is essential that this human oversight and control takes an end-to-end view covering all aspects of the services and does not simply focus on individual components and elements of the infrastructure.
The quality of service delivered to the business will be dependent upon the quality, availability, reliability and performance of the poorest component within the overall service. Often the hard work and dedication of many ICT staff can be destroyed by the neglect of areas of infrastructure or of other personnel.
‘The chain is only as strong as the weakest link.’
4.1.1 Basic concepts
The basic concept of Operations is that of the ‘monitor-control loop’. This is shown schematically in Figure 4.1.
Figure 4.1 – The monitor control loop.
This diagram simplifies Operations Management. The reality of any ICT infrastructure, from the smallest family-owned and operated business to the largest global enterprise, is more complex and is made up of many such loops. Nevertheless, the monitor and control loop is fundamental to all Operations.
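As a rough illustration, a single pass of a monitor-control loop might be sketched as follows. This is one possible reading of the loop (measure, compare against a norm, apply a control action when outside the norm), not a reproduction of Figure 4.1. The resource, the threshold and the corrective action are all hypothetical.

```python
def monitor_control_step(measure, norm, control):
    """One pass of the loop: measure, compare with the norm, control if needed."""
    value = measure()
    low, high = norm
    within_norm = low <= value <= high
    if not within_norm:
        control(value)
    return value, within_norm


# Hypothetical managed resource: disk usage as a percentage.
state = {"disk_used_pct": 93.0}


def read_disk_usage():
    return state["disk_used_pct"]


def purge_temp_files(observed):
    # Stand-in for a real corrective action; frees an assumed 20 points.
    state["disk_used_pct"] = observed - 20.0


norm = (0.0, 85.0)  # assumed acceptable range
history = [
    monitor_control_step(read_disk_usage, norm, purge_temp_files)
    for _ in range(2)
]
print(history)
```

The first pass finds the value outside the norm and triggers the control action. The second pass confirms that the resource is back within the norm. A real infrastructure would run many such loops, at different levels and with people overseeing the automated actions.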
Two additional concepts that are fundamental to operational management and the management of ICT systems in general are those of managed objects (MOs) and management domains.
Managed objects
Definition – Managed object
A managed object (MO) is the Open Systems Interconnection (OSI) management view of a resource that is subject to management. An MO is the representation of a technical infrastructure resource as seen by (and for the purposes of) management. An MO is defined in terms of the attributes it possesses, operations that may be performed upon it, notifications that it may issue and its relationships with other MOs.
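The four parts of this definition (attributes, operations, notifications and relationships) can be pictured as a simple data structure. The sketch below is illustrative only. The class, field names and example print queue are assumptions made for this example, not part of the OSI management standards or of any particular management tool.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedObject:
    name: str
    attributes: dict = field(default_factory=dict)  # e.g. current status
    operations: dict = field(default_factory=dict)  # operation name -> callable
    related: list = field(default_factory=list)     # names of related MOs
    listeners: list = field(default_factory=list)   # receivers of notifications

    def perform(self, operation):
        """Perform a named management operation on this MO."""
        self.operations[operation](self)

    def notify(self, message):
        """Issue a notification to every registered listener."""
        for listener in self.listeners:
            listener(self.name, message)


def stop(mo):
    # Hypothetical operation: take the resource off-line and say so.
    mo.attributes["status"] = "off-line"
    mo.notify("status changed to off-line")


received = []
queue = ManagedObject(
    name="print-queue-1",
    attributes={"status": "running"},
    operations={"stop": stop},
    related=["print-server-1"],
    listeners=[lambda name, msg: received.append((name, msg))],
)
queue.perform("stop")
print(queue.attributes, received)
```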
Management domains
Definition – Management domain
A management domain is a set of MOs to which a common Systems Management policy applies. A management domain possesses at least the following properties:
· a unique name
· identification of a collection of MOs which are members of the domain
· identification of the inter-domain relationships applicable to the domain’s relationship with other domains (rules, practices, procedures).
These concepts are illustrated in Figure 4.2.
Figure 4.2 – Management domains and managed objects (MOs).
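A management domain, as defined above, can be sketched as a named collection of MO identifiers together with its policy and its relationships to other domains. The structure below is a minimal illustration under assumed names. It also assumes that one MO may belong to more than one domain, which the definition above does not state either way.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementDomain:
    name: str                                          # unique name
    members: set = field(default_factory=set)          # identifiers of member MOs
    relationships: dict = field(default_factory=dict)  # other domain -> rule
    policy: str = ""                                   # common management policy


def domains_for(mo_name, domains):
    """Names of all domains that list the given MO as a member."""
    return sorted(d.name for d in domains if mo_name in d.members)


def names_are_unique(domains):
    names = [d.name for d in domains]
    return len(names) == len(set(names))


# Hypothetical domains and members.
servers = ManagementDomain(
    "servers", {"ftp-1", "db-1"},
    {"security": "escalate access alerts"}, "patch monthly",
)
security = ManagementDomain("security", {"ftp-1", "firewall-1"}, {}, "log all access")

all_domains = [servers, security]
print(domains_for("ftp-1", all_domains), names_are_unique(all_domains))
```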
International and national standards relating to the management of ICT infrastructures (ISO, CCITT and ITU) refer to these concepts and the management areas that need to be addressed in order to manage an operational environment. These areas are referred to as Systems Management Functional Areas (SMFAs) and are explained in detail in Annex 4B, together with an explanation of how they map across to the ICTIM and ITSM processes.
With the introduction of the Configuration Item (CI) in the first edition of ITIL Service Management, a generic term for the physical elements of the IT Infrastructure became available. Generally speaking, the major difference is that a CI would be a static item and an MO would be a dynamic item. These differences are illustrated in Figure 4.3.
Figure 4.3 – Managed objects.
A CI is defined as follows:
‘A CI is a component of an infrastructure – or an item, such as a Request for Change, associated with an infrastructure component – that is (or is to be) under the control of Configuration Management.’
In Operations, many of the basic elements of the ICT infrastructure are dynamic. For example, a file download service, which is provided by an FTP server, can either be accessible or not. This is because the status of the FTP server is constantly changing, second by second. The status of the FTP server can be any one of:
· running
· closed
· shutting down
· initialising
· dead
· off-line.
A CI cannot capture this kind of second-by-second change. The MO view of an FTP server would be live, with constantly changing (dynamic) status and data, whereas Configuration Management is concerned with managing a database of CIs holding relatively static data under the control of the Change Management process. The same distinction applies to a network resource such as a switched virtual circuit (SVC). The script defining the FTP parameters could be a CI, as could the script defining the parameters of the SVC, or they could both be included in the build script for the operating system. However, it is more likely that either the physical network link or the range of SVCs would be controlled as a CI rather than the SVC itself. The SVC would appear on some management systems as an MO. Alternatively, the FTP server could be defined as a CI with a status of ‘live’ within the Configuration Management Database (CMDB); its actual, detailed or instantaneous operating status could then be determined by interrogating a network or Systems Management tool, as could the current status of the SVC.
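The split between a relatively static CI record and a dynamic MO status can be sketched as two separate lookups. In the example below the CMDB is a plain dictionary and `query_management_tool` is a hypothetical stand-in for a real network or Systems Management tool. Neither reflects any actual product interface. The status values are those listed above for the FTP server.

```python
from enum import Enum


class FtpStatus(Enum):
    RUNNING = "running"
    CLOSED = "closed"
    SHUTTING_DOWN = "shutting down"
    INITIALISING = "initialising"
    DEAD = "dead"
    OFF_LINE = "off-line"


# Relatively static record, changed only under Change Management control.
cmdb = {"ftp-server-01": {"type": "FTP server", "status": "live"}}


def query_management_tool(ci_name):
    """Hypothetical stand-in: a real tool would poll the server itself."""
    return FtpStatus.INITIALISING


def describe(ci_name):
    ci = cmdb[ci_name]                     # CI view: static
    live = query_management_tool(ci_name)  # MO view: instantaneous
    return f"{ci_name}: CMDB status '{ci['status']}', operating status '{live.value}'"


print(describe("ftp-server-01"))
```

The CMDB entry stays ‘live’ throughout, while the operating status returned by the management tool may differ from one query to the next.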
In Operations, there is a need for a dynamic construct defining any object within the ICT infrastructure and reflecting the control that Operations can exert over that object. This construct is the MO. First introduced by ISO in the late 1980s, it has become the object of management in network and Systems Management products and tools. In everyday speech people refer to objects such as servers, print queues, files, etc., but when they perform management operations on those objects they perform them on MOs. These concepts of MOs and management domains are expanded in Annex 4B.
4.1.2 The goals
The main goals of the ICT Operations process are:
· to operate, manage and maintain an end-to-end ICT infrastructure that facilitates the delivery of the ICT services to the business, and that meets all its agreed requirements and targets
· to ensure that the ICT infrastructure is reliable, robust, secure, consistent and facilitates the efficient and effective business processes of the organisation.
4.1.3 The scope
The operational processes within an ICT organisation should include the management and control of all the operational components within the ICT infrastructure. This covers not only the control of each individual component but also the interaction between these components and their role in the provision of a quality ICT service. Operations has a critical role to play in helping the other ICT and Service Management processes to achieve their individual process objectives. An effective Operations unit within an ICT organisation can be the difference between the overall success and failure of the ICT services. One of the major roles that Operations plays is within the event and incident handling processes. This role is indicated in Figure 4.4.
Figure 4.4 – The operational incident workflow.
The Operations section often acts as the 24x7 ‘eyes and ears’ of the ICT organisation, providing a round-the-clock presence and early warning system for all other areas of ICT and ITSM. This enables the early detection, correction and prevention of service failure. With effective processes, automated wherever possible, Operations can greatly reduce the business disruption caused by issues within the ICT services. In order to ensure an efficient and effective operational incident workflow, the Operations and Incident Management processes are often co-located and integrated into an ‘Operations bridge’. The Operations bridge provides the combined functionality of a Service Desk and an Operational Control Centre for the initial control and escalation of all ICT issues.
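A simplified picture of how an Operations bridge might route incoming messages is sketched below. It does not reproduce Figure 4.4. The severity ordering, the rule that alarms are always escalated, and the example messages are assumptions made for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ordering of the message types, lowest to highest.
    EVENT = 1
    WARNING = 2
    ALERT = 3
    ALARM = 4


def handle(message, severity, auto_fixes, incidents):
    """Log routine events, try automation next, otherwise escalate."""
    if severity <= Severity.EVENT:
        return "logged"
    fix = auto_fixes.get(message)
    if fix is not None and severity < Severity.ALARM:
        fix()
        return "auto-corrected"
    incidents.append((severity.name, message))  # hand over to Incident Management
    return "escalated"


incidents = []
actions = []
auto_fixes = {"print queue stalled": lambda: actions.append("restart print queue")}

results = [
    handle("backup completed", Severity.EVENT, auto_fixes, incidents),
    handle("print queue stalled", Severity.WARNING, auto_fixes, incidents),
    handle("database unreachable", Severity.ALARM, auto_fixes, incidents),
]
print(results, incidents, actions)
```

Keeping the escalation path as the default, with automation as an explicit exception, reflects the point made earlier that people should retain control of automated processes.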
Inputs
· the current ICT infrastructure
· OLAs that are negotiated and produced by SLM. These agree and document the operational targets and requirements for the ICT infrastructure. It is crucial that the targets within the OLAs are consistent with and supportive of the SLA targets, and are relevant to the ICT infrastructure and its method of operation
· Underpinning Contracts (UCs), which should also be consistent with and support the SLA targets
· operational processes and procedures
· strategies, plans, policies, standards and architectures.
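One way to check that OLA targets support the SLA targets is to compare them service by service. The sketch below assumes a single kind of target (maximum hours to restore a service) and treats an OLA as supportive when its limit is no looser than the SLA limit. The teams, services and figures are hypothetical.

```python
def ola_gaps(sla_hours, olas):
    """List (team, service, reason) where an OLA fails to support the SLA."""
    gaps = []
    for team, targets in sorted(olas.items()):
        for service, sla_limit in sorted(sla_hours.items()):
            ola_limit = targets.get(service)
            if ola_limit is None:
                gaps.append((team, service, "no OLA target"))
            elif ola_limit > sla_limit:
                gaps.append((team, service, "OLA looser than SLA"))
    return gaps


# Hypothetical maximum restore times, in hours.
sla_hours = {"email": 4.0}
olas = {
    "network": {"email": 2.0},
    "desktop": {"email": 6.0},
    "servers": {},
}

gaps = ola_gaps(sla_hours, olas)
for gap in gaps:
    print(gap)
```

A missing target is reported as a gap in the same way as a target that is too loose, since an SLA that implies operational targets which nobody measures was the root of the problem in the retail example earlier in this chapter.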
Processes
· event, warning, alert and alarm processing and management:
    · progression and resolution of all event, warning, alert and alarm messages
    · liaison with Incident and Problem Management
    · liaison with Availability, Security and Capacity Management
· end-to-end management of the operational ICT infrastructure:
    · performance and configuration tuning of the operational infrastructure in conjunction with Capacity Management and Change Management
    · configuration and reconfiguration of MOs
    · system tuning and performance
· workload scheduling:
    · batch processing schedule management and maintenance
    · output scheduling and print management
· housekeeping and maintenance:
    · backup and restore
    · ICT infrastructure configuration maintenance
    · database administration
    · documentation maintenance
    · availability, resilience and recovery testing
    · health checking of the infrastructure
    · log and journal housekeeping
· storage management:
    · file and file systems maintenance
    · database management and administration