Final Paper
Continuous Performance and Capacity Management
Abstract
The Continuous Performance and Capacity Management model helps eliminate inefficiencies in application and infrastructure management. It provides performance and capacity insights to proactively address performance, stability and scalability issues, and it identifies opportunities for total cost of ownership (TCO) savings. Businesses can use this model to optimize the cost of product lines. Applied early in the product development life cycle, this refined approach helps build a well-engineered system and reduces IT management costs, including infrastructure, maintenance and support costs.
In summary, this paper provides a holistic view of how a continuous performance and capacity assessment model can be implemented to improve applications' performance and scalability.
Some key aspects covered in this paper include:
- Background on Performance and Stability concerns in traditional legacy multi-tier systems
- The Approach: a Continuous Performance and Capacity model
- Methodology
- Success Story
Background and significance:
Recent market changes have pushed organizations to make their systems robust and future-proof in order to stay competitive. Common issues include sudden increases in load during peak periods, which often hit the performance limits of the legacy infrastructure and create backlogs for downstream systems; unavailability of system resources due to increased workload; and memory constraints caused by concentrating too much work in a single process.
The impact includes reputational cost: poor performance or instability in customer-facing systems can seriously damage reputation and lead to loss of business. To overcome these challenges and make applications robust enough to handle such situations, a refined, proactive monitoring approach is required.
The Approach
Our approach to resolving the crisis followed the principle 'Treat the cause, not the symptoms'. As this was a Save our System (SoS) engagement, we first had to understand the entire landscape, the legacy performance issues and the technology stack before providing a long-term solution. In a fast-moving agile world, everyone wants to see short-term results before committing to long-term projects. The customer was impressed with our three-pronged Performance Engineering approach because it would provide intermediate deliverables for each phase:
1. Reactive Performance Engineering
2. Assessment Phase
3. Continuous Performance & Capacity Management Solution
1. Reactive Performance Engineering
Gaining customer confidence is the first step in any successful engagement. This phase meant identifying the key performance issues and addressing the low-hanging fruit to showcase significant performance gains.
2. Assessment Phase
Building on the success of the initial Reactive Performance Engineering phase, we started our assessment enthusiastically. As the most vital phase of the entire engagement, it required gathering expert opinions from all stakeholders. Full-day workshops with infrastructure managers, architects and production support engineers helped us understand the background behind the historical monitoring and performance issues. Technical challenges are only one part of the story, so meetings with business stakeholders helped us understand their expectations as well.
The assessment revealed the need for a custom Continuous Performance and Capacity Management solution to overcome inefficiencies in the client's application and infrastructure management.
3. Continuous Performance & Capacity Management Solution
“Sometimes when you're given hurdles, it makes you more creative in the end” - Judy Greer
The final phase was also a very important one: the demanding task of delivering a Continuous Performance and Capacity Management solution tailored to our client's requirements. Pressure often brings out the best, and a four-step cyclical monitoring solution was designed.
Design: This solution followed a typical Shift Right approach.
The four-step cyclical solution, namely Monitor, Analyse, Plan and Define, helped our customer get the right level of information at the right time, enabling them to make the right decisions.
- Monitor: track usage
- Analyse: identify issues
- Plan: forecast workload
- Define: define the system needed to support the workload
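One pass through the Monitor, Analyse, Plan and Define cycle can be sketched as below. This is an illustration only, written in Python rather than the C# .NET used for the actual tool; the Sample fields, the 80% CPU threshold, the linear-trend forecast and the per-node throughput are all assumptions, not details of the production solution.

```python
"""One iteration of the Monitor -> Analyse -> Plan -> Define cycle.

Illustrative sketch only: the Sample fields, the 80% CPU threshold, the
linear-trend forecast and the per-node throughput are assumptions.
"""
import math
from dataclasses import dataclass


@dataclass
class Sample:
    day: int          # day index of the observation
    peak_tps: float   # peak transactions per second observed that day
    cpu_util: float   # peak CPU utilisation (0.0 to 1.0) on that day


def monitor(samples):
    """Monitor: track usage; here we simply order the collected samples by day."""
    return sorted(samples, key=lambda s: s.day)


def analyse(samples, cpu_threshold=0.8):
    """Analyse: identify days on which CPU utilisation exceeded the threshold."""
    return [s.day for s in samples if s.cpu_util > cpu_threshold]


def plan(samples, horizon_days):
    """Plan: forecast peak TPS horizon_days ahead with a least-squares linear trend."""
    xs = [s.day for s in samples]
    ys = [s.peak_tps for s in samples]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope * (xs[-1] + horizon_days) + intercept


def define(forecast_tps, tps_per_node, max_util=0.7):
    """Define: nodes needed so each runs at or below max_util at the forecast load."""
    return math.ceil(forecast_tps / (tps_per_node * max_util))
```

With five days of history growing by 10 TPS per day, `plan(history, 5)` forecasts 190 TPS, and `define(190.0, 50)` sizes that at six hypothetical 50-TPS nodes so that each stays under 70% utilisation. A real implementation would likely use richer forecasting, but the cycle structure is the same.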
Build & Implement: Unlike traditional non-functional testing projects, we had to develop the tool rather than use a ready-made one. The Application Performance Management (APM) and analytics solution was developed in C# .NET, with a SQL Server database as the back end and Splunk for reporting and alerting.
Methodology
Continuous Performance and Capacity Monitoring Solution
This solution has three main components:
- Loader
- Engine
- Publisher
Loader - Fetches data from the various application data sources, cleanses it and loads it, aggregated by different dimensions, into reporting database tables.
Engine - Fetches data from the reporting tables, Splunk and other sources and computes various metrics from it. It builds useful charts, pivots and tables and produces user-friendly reports.
Publisher - Publishes the finished reports to technical and business stakeholders by email and uploads them to SharePoint for future reference.
This tool takes around 10-15 minutes to build the entire report.
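The Loader, Engine and Publisher flow described above can be sketched as follows. This is a hypothetical illustration, not the actual C# .NET tool: an in-memory SQLite database stands in for the SQL Server reporting tables, raw data is assumed to arrive as (hour, service, response_ms) tuples, the 250 ms SLA is invented, and the report is built as an email message but not sent (SharePoint upload is omitted).

```python
"""Sketch of the Loader -> Engine -> Publisher flow.

Assumptions: SQLite stands in for the SQL Server reporting tables, raw rows
are (hour, service, response_ms) tuples, and the email is built, not sent.
"""
import sqlite3
from email.message import EmailMessage


def load(conn, raw_rows):
    """Loader: cleanse raw rows and store them aggregated by hour and service."""
    conn.execute("CREATE TABLE IF NOT EXISTS hourly_stats ("
                 "hour INTEGER, service TEXT, requests INTEGER, avg_ms REAL, max_ms REAL)")
    # Cleansing step: drop missing or negative response times.
    clean = [r for r in raw_rows if r[2] is not None and r[2] >= 0]
    conn.execute("CREATE TEMP TABLE raw_samples (hour INTEGER, service TEXT, response_ms REAL)")
    conn.executemany("INSERT INTO raw_samples VALUES (?, ?, ?)", clean)
    conn.execute("INSERT INTO hourly_stats "
                 "SELECT hour, service, COUNT(*), AVG(response_ms), MAX(response_ms) "
                 "FROM raw_samples GROUP BY hour, service")
    conn.execute("DROP TABLE raw_samples")
    conn.commit()


def engine(conn, sla_ms):
    """Engine: roll hourly rows up per service and flag SLA breaches."""
    rows = conn.execute("SELECT service, SUM(requests), "
                        "SUM(avg_ms * requests) / SUM(requests), MAX(max_ms) "
                        "FROM hourly_stats GROUP BY service ORDER BY service").fetchall()
    return [{"service": s, "requests": n, "avg_ms": round(avg, 1), "max_ms": mx,
             "sla_breach": mx > sla_ms} for s, n, avg, mx in rows]


def publish(metrics, recipients):
    """Publisher: render a plain-text report into an email message."""
    msg = EmailMessage()
    msg["Subject"] = "Daily performance and capacity report"
    msg["To"] = ", ".join(recipients)
    header = f"{'service':<10} {'reqs':>6} {'avg_ms':>8} {'max_ms':>8} status"
    lines = [f"{m['service']:<10} {m['requests']:>6} {m['avg_ms']:>8} {m['max_ms']:>8} "
             f"{'BREACH' if m['sla_breach'] else 'ok'}" for m in metrics]
    msg.set_content("\n".join([header] + lines))
    return msg
```

Keeping the three stages as separate functions mirrors the component split in the paper: each stage can be rerun or replaced independently, for example swapping the SQLite stand-in for a real reporting database.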
Development and Maintenance: This solution was designed primarily around metrics that help identify performance or stability bottlenecks quickly. Also, various business interventions have helped in bringing the business.
Working Model: This console application is hosted on a dedicated reporting server and runs at a scheduled time to generate and publish the daily report for each application. The APM solution also automatically emails a status report describing whether each run passed or failed. If a task fails, the next run resumes from the pending task instead of starting over.
Key details captured in the report can be seen in the Appendix.
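The resume-from-pending-task behaviour can be sketched as a checkpointed task runner. This is a minimal Python illustration under stated assumptions, not the tool's actual C# logic: tasks are assumed to run in a fixed order, progress is assumed to be checkpointed to a JSON file keyed by report date, and the pass/fail status is returned as text rather than emailed.

```python
"""Sketch of a scheduled report run that resumes from the pending task.

Assumptions: tasks run in a fixed order, progress is checkpointed to a JSON
file keyed by report date, and the status text would be emailed by the caller.
"""
import json
from pathlib import Path


def run_report(report_date, tasks, checkpoint: Path):
    """Run (name, callable) tasks in order, skipping those already done for report_date.

    Returns (passed, status_text). Only the current report date is kept in the
    checkpoint file; entries for older dates are discarded on write.
    """
    state = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    done = state.get(report_date, [])
    for name, task in tasks:
        if name in done:
            continue  # completed in an earlier run, so resume after it
        try:
            task()
        except Exception as exc:
            # Record progress so far; the next scheduled run resumes at this task.
            checkpoint.write_text(json.dumps({report_date: done}))
            return False, f"{report_date} FAILED at '{name}': {exc}"
        done.append(name)
        checkpoint.write_text(json.dumps({report_date: done}))
    return True, f"{report_date} PASSED ({len(done)} tasks)"
```

Because the checkpoint is written after every successful task, a rerun for the same report date skips the work that already completed (for example, a long data load) and only repeats the task that failed.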
Splunk Monitoring & Reporting Solution
Various monitoring dashboards, alerts and scheduled reports have helped the team take pre-emptive action on performance, throughput or stability issues in the application or infrastructure, and alert support and IT teams quickly so they can act on them.
Refer to the Splunk Reporting Dashboards and Alerts topics in the Appendix.
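The logic behind such a pre-emptive alert can be sketched as a consecutive-breach rule. In Splunk itself this would typically be configured as a scheduled search with a trigger condition; the Python class below only illustrates the rule, and the metric (for example a queue depth), the threshold and the number of intervals are assumptions. Requiring several consecutive breaches avoids alerting support on a single transient spike.

```python
"""Sketch of a pre-emptive alert rule: fire only when a metric stays above
its threshold for several consecutive intervals, and clear once it recovers.

The metric, threshold and interval count are illustrative assumptions.
"""
from collections import deque


class ConsecutiveBreachAlert:
    def __init__(self, threshold, intervals):
        self.threshold = threshold
        self.recent = deque(maxlen=intervals)  # breach flags for the last N intervals
        self.active = False                    # True while an alert is open

    def observe(self, value):
        """Feed one interval's value; return 'ALERT', 'CLEAR' or None."""
        breach = value > self.threshold
        self.recent.append(breach)
        if self.active and not breach:
            self.active = False
            return "CLEAR"
        if not self.active and len(self.recent) == self.recent.maxlen and all(self.recent):
            self.active = True
            return "ALERT"
        return None
```

For example, with a threshold of 1000 and three intervals, the series 900, 1200, 1300, 800, 1100, 1200, 1500, 1600, 700 raises an alert only at 1500 (the third breach in a row) and clears it at 700.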
Success Story
The Continuous Performance and Capacity Management solution has helped in:
- Proactively monitoring and publishing automated daily and intraday performance and capacity reports to IT and business stakeholders.
- Reducing turnaround time in analysis and optimization by 40%.
- Reducing effort in capacity planning and system sizing requests by 50%.
- Optimizing infrastructure, resulting in TCO reduction.
Best In Class Delivery
- Easily adaptable for use in other applications
- Free development and maintenance
- Greatest benefit when the solution is introduced during pre-production
The proven results and cost benefits of the continuous model led the client to bring in non-functional testing (NFT) services early in the life cycle for new engagements. Discovering performance and capacity issues early in the product life cycle helped reduce development, testing and support costs.
References & Appendix
Author Biography
Saju Abraham Thomas is a Performance & Capacity Engineer with more than 10 years of experience in the industry. He has been actively involved in consulting, performance testing and engineering, capacity planning, resilience engineering and building performance and capacity monitoring solutions. He has extensive experience in complex Trading and Risk platforms with applications running on Grid infrastructure (a private-cloud-like infrastructure).