Operations Governance: The New Management Application

Prepared by:
John P. Morency, Managing Director
Transitional Data Services, Inc.
92 South Street
Hopkinton MA 01748
(774)-759-9025 (Voice)
(978)-808-4112 (Mobile)
email@example.com (e-mail)

IT Governance Management is not entirely new. IT Governance - the art of aligning IT deliverables with business objectives - has been an important organizational priority for at least the past five years. However, up to this point, products that claim to support IT governance have focused primarily on initiative portfolio management, project tracking and investment decision support. In general, these offerings have been far more oriented toward project management and investment governance than toward operations governance, i.e. ensuring that day-to-day IT operations align effectively with what the business needs to do day by day as well as with what the business has to do.

This is the aspect of IT governance management that is new. In this paper, we'll discuss the application of governance management to support what many businesses have to do - citing regulatory compliance deficiency remediation as one example - as well as what they need to do - effectively address one or more of the business priorities shown in Figure 1. In addition, we'll discuss how more mainstream availability and performance management technology may be a far better fit for the day-to-day support of operations governance.

Applying Governance Management to Compliance Remediation

One key IT governance application is ensuring organizational compliance with governmental or industry regulations. Currently, the most prominent compliance requirement is Sarbanes-Oxley (SoX) 404, which becomes mandatory for every American public company (as well as foreign subsidiaries) by the middle of 2006. After all the required controls are defined, the next phase of a SoX assurance process is typically control remediation.
Remediation is applied only to those required controls that have been determined to be deficient, either because they previously were not in place or because they lacked required management functionality. Examples of IT control remediation activities include:
- Updating the development project approval process definition in order to ensure that appropriate levels of business unit approval occur prior to beginning new projects;
- Formalizing information pre-requisite requirements specific to production turnover;
- Restricting the extent to which end users can install software on their desktop PCs;
- Adding additional storage to support required financial and/or customer data record retention;
- Formalizing service level management and reporting;
- Documenting and institutionalizing Change Control request, review and approval processes;
- Requiring explicit finance group approval for any application or system change that could impact the integrity of corporate financial data.

If product investments are required to support remediation, the most effective ones are generally those that provide far more than a control deficiency band-aid. They also address as many of the governance priorities listed in Figure 1 as possible, in addition to the specific deficiency remediation, through a single investment. In general, this is the approach that makes it far more likely that the benefits resulting from the remediation exercise will far outweigh the associated product and implementation cost.

Introduction

As a result of both the operations and competitive changes that are occurring, application management products, regardless of whether they are hardware- or software-based, will increasingly be evaluated based upon their ability to make IT more project-focused and support-automated. However, achieving these two objectives must come in addition to, rather than instead of, maintaining day-to-day service level quality.
Given the rapidly increasing complexity of today's IT infrastructure, addressing this latter requirement is a formidable undertaking. This is especially true given the increasing number and complexity of infrastructure components required to support end-to-end service delivery. The result has been an increasingly strong need for management systems that can coordinate and orchestrate component management in order to facilitate a more efficient and effective support organization.

The challenge, however, does not end there. Also to be addressed is the task of ensuring timely support for key business objectives. The specific objectives identified by IT as being the most pressing over the coming year are listed in Figure 1.

Figure 1 - Key IT Priorities (source: Information Week)
Key Business Initiatives: Which business priorities will your IT organization implement or support in 2005? (Bar chart, axis scale 0 to 100.)
- Update security procedures, tools or services
- Simplify or optimize business processes
- Improve customer service
- Boost worker productivity across the company
- Retain skilled staff members
- Establish processes that support real-time business information
- Organize and utilize customer data
- Report financial data more accurately
- Use IT for compliance with government regulations
- Reduce the cost of IT operations
- Make business continuity plans or improve disaster preparedness
- Improve usability of corporate applications
- Increase collaboration with customers
Note: Multiple responses allowed
Data: InformationWeek Research Outlook/Priorities

Addressing these challenges still requires the underlying presence of quality configuration, availability and performance management. However, the extent to which management products can more directly address one or more key business priorities is increasingly determining their likelihood of implementation. It's not that the need for the core management functionality has gone away. Rather, the bar has been raised. The new opportunity is the targeted application of technology to support IT Operations Governance management.

Governance Management Case Study

In order to illustrate this principle, we present an example of an actual TDS client for whom operations governance management was long overdue. The key challenge for this client (a financial services company) was the consistent delivery of high availability and responsiveness for a new set of financial portfolio management applications that had been initially developed for independent agents in the US, but were also being extended to agents in both Europe and the Far East.

The operations management software portfolio (made up of nearly 25 individual management products) available to support this partner Extranet was quite extensive. Supported functionality included network, system and application monitoring (both inside and outside the firewall) as well as inventory, configuration, report and SLA management support across several different vendors.

However, despite this rich set of utilities, their collective value was often limited at best when problems affecting extranet service quality arose, especially during nights and weekends. During those times, support coverage was limited and primarily provided by lower cost, lower skilled operations staff.
These were the individuals who formed the front line of customer support when overseas agents were in the middle of their normal workdays.

The Problem

A very typical problem scenario involved the generation of Application Unreachable alerts from geographically distributed, Internet-based synthetic transaction generation agents. Alert support had been configured to forward occurrence notification to a centralized management server. Once an alert notification was posted to the management server, a trouble ticket was automatically generated. Trouble ticket creation notification often occurred in tandem with the arrival of problem calls to the Operations Center from disgruntled overseas agents.

Once a trouble ticket was opened, the remainder of the resolution task became manual. The simple fact was that not one of the existing management products was capable by itself of both isolating the cause(s) of the problem and providing actionable information that an operations support person could readily act upon. Lacking this capability, operations staff were solely dependent upon direct support from the IT staff. And if the right IT staff member could not be contacted when outages occurred, problem isolation and resolution ceased until the support staff returned, either on the following morning or at the end of a weekend. The inevitable result was trouble tickets that were open for several hours (and sometimes days), unhappy agents and a negative impact on the company's brand image in an extremely strategic market.

The Impact

Our analysis of trouble ticket processing times specific to the detected response time and availability problems showed that these occurrences accounted for only 25% of the opened tickets by volume. However, they required nearly 98% of the total ticket resolution time because of the long periods during which tickets were left open by operations until a higher-level support or development resource could be contacted.
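The volume-versus-time comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Ticket` record, the `category` labels and the sample data are hypothetical and are not taken from the client's trouble ticket system. The sample values were chosen so that the proportions resemble those reported above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ticket:
    category: str      # hypothetical label, e.g. "availability" or "other"
    opened: datetime
    closed: datetime

    @property
    def open_time(self) -> timedelta:
        return self.closed - self.opened


def category_shares(tickets, category):
    """Return (share of ticket count, share of total open time) for one category."""
    if not tickets:
        raise ValueError("no tickets to analyze")
    selected = [t for t in tickets if t.category == category]
    total_time = sum((t.open_time for t in tickets), timedelta())
    selected_time = sum((t.open_time for t in selected), timedelta())
    volume_share = len(selected) / len(tickets)
    # Dividing one timedelta by another yields a plain float ratio.
    time_share = selected_time / total_time if total_time else 0.0
    return volume_share, time_share


# Invented sample data: one long-running availability ticket, three short ones.
base = datetime(2005, 3, 5, 1, 0)
SAMPLE_TICKETS = [
    Ticket("availability", base, base + timedelta(hours=49)),
    Ticket("other", base, base + timedelta(minutes=20)),
    Ticket("other", base, base + timedelta(minutes=20)),
    Ticket("other", base, base + timedelta(minutes=20)),
]

volume_share, time_share = category_shares(SAMPLE_TICKETS, "availability")
print(f"{volume_share:.0%} of tickets, {time_share:.0%} of total open time")
```

Comparing the two shares side by side is what exposes the imbalance: a category that is a minority of tickets by count can still dominate the total time that tickets remain open.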
The Causes

When we analyzed solution entries for closed tickets, we found that application outages were a frequent result of incorrectly applied application, server or network infrastructure changes. Unfortunately, the operations staff had no easy way to correlate the occurrence of the Application Unreachable outages with the application of faulty changes. Clearly, there was a Change Management process problem. In addition, however, inadequate root cause determination and resolution management support, as well as inadequate operations support staff skill sets, were also key contributing factors.

Given the scope of the problem, the solution required a combination of improved management process, root cause determination automation and support skill sets. The product-independent elements were directly related to improving governance. In addition, it was clear that the right supporting management products could also address a number of the business priorities listed in Figure 1, in addition to the service management-specific issues. These include:
1. Boosting IT productivity and effectiveness by dramatically reducing the resolution time per ticket;
2. Improving the quality of customer service by becoming more responsive when outages did occur;
3. Reducing the cost of IT operations by being able to deliver higher quality customer service without necessarily having to hire more expensive talent.

This was clearly a governance management problem that required two important solution elements. The first is the selective application of industry best practices to improve clearly broken support processes. The second is the upgrading of existing management products or the implementation of new products in order to support best practice processes and deliverables.
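As a rough illustration of the correlation the operations staff lacked, the sketch below lists recent changes that touched components an alerting application depends on. It is not a description of any product's method. The `Change` and `Alert` records, the dependency map, the 24-hour lookback window and all sample identifiers are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Change:
    change_id: str     # hypothetical change request identifier
    component: str     # server, network device or application that was changed
    applied: datetime


@dataclass(frozen=True)
class Alert:
    application: str
    raised: datetime


def candidate_changes(alert, changes, depends_on, lookback=timedelta(hours=24)):
    """Return changes applied to components the alerting application depends on,
    within `lookback` before the alert, most recent first."""
    components = depends_on.get(alert.application, set())
    window_start = alert.raised - lookback
    hits = [
        c for c in changes
        if c.component in components and window_start <= c.applied <= alert.raised
    ]
    return sorted(hits, key=lambda c: c.applied, reverse=True)


# Invented sample data for illustration only.
DEPENDS_ON = {"portfolio-app": {"web-01", "core-router"}}
CHANGES = [
    Change("CHG-1", "web-01", datetime(2005, 3, 4, 22, 0)),
    Change("CHG-2", "mail-01", datetime(2005, 3, 4, 23, 0)),
    Change("CHG-3", "core-router", datetime(2005, 3, 1, 9, 0)),
]
ALERT = Alert("portfolio-app", datetime(2005, 3, 5, 2, 30))

for change in candidate_changes(ALERT, CHANGES, DEPENDS_ON):
    print(change.change_id, change.component, change.applied)
```

A time-window match like this only narrows the list of suspects for an operator to review. It does not establish root cause, and it depends on change records and dependency data being kept current.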
The next two sections present specific best practice and product approaches that TDS has found to be very well suited to addressing the service improvement requirements of our case study client as well as operations that have similar needs.

Best Practice Option: COBIT

Few argue the value of IT best practices. The question that is asked far more frequently concerns which of the many best practice standards yield the most effective results. TDS' consistent experience is that the most effective set of best practices for improving data center operations is defined in the Control Objectives for Information and related Technology (COBIT).

The COBIT standard, first released in 1996 by the Information Systems Audit and Control Association (ISACA), defines a control implementation and measurement framework that was specifically designed with IT efficiency, predictability, accountability and auditability in mind. Given these objectives, a frequent application of COBIT is the definition of IT operations controls that help facilitate business compliance with both governmental and industry-specific regulations.

This is precisely the reason why the IT Governance Institute (http://www.itgi.org/) utilized the COBIT definitions as the foundation for SoX IT controls in the definitive document IT Control Objectives for Sarbanes-Oxley (downloadable from the ITGI website). The result is that COBIT offers users a two-for-one proposition: support for SoX compliance as well as service delivery improvement in other IT support categories.

Today's reality, however, is that most operations are far less concerned about the implementation of best practices than they are with the implementation of the "good enough" practices that best fit their operations style, conformance, service improvement, and business alignment objectives.
These include the process-specific Critical Success Factors (CSFs) that can be used to define continuous improvement targets as well as assessment metrics (Key Performance Indicators (KPIs) and Key Goal Indicators (KGIs)) that can be used to quantitatively measure progress towards one or more service improvement goals.

Management Service Maturity

In addition to process-specific sets of measurable success metrics, COBIT also defines additional criteria that are highly relevant to benchmarking the value and relevance of alternative technologies, products and vendors relative to one or more process improvement initiatives. These are the management process-specific Service Maturity Models (SMMs).

The use of maturity models to both assess and improve process performance was pioneered by the Software Engineering Institute (SEI) at Carnegie-Mellon University (CMU). The CMU approach (called the Capability Maturity Model (CMM)) measures the performance of a software engineering organization in the delivery of high quality, error-free code. The capability level of a specific organization is measured on a scale from zero to five. Each numerical measurement is associated with specifically defined competencies in the automated design, construction, testing and delivery of high quality software. A score of five goes to the most process-mature organizations and a score of zero is given to those that are the least capable.

Implementing Good Enough Practices

Determining the set of "good enough" practices that is most appropriate for a specific operation is especially well suited to the way in which COBIT is defined. One key strength is that its implementation is not an all-or-nothing proposition. Instead, it defines a very modular set of control and measurement options that users can deploy as needed in order to best fit the needs of their specific operation.
No process is dependent upon the implementation or success metrics of any other, providing users with a very flexible set of process implementation and improvement measurement options.

This modular set of processes and success metrics differentiates COBIT from other best practice definitions (e.g. ITIL). Not only are management practices defined in 34 separate categories (including Application Availability and Performance Management), but the standards also include specific criteria and metrics that can readily be used to support a continuous improvement initiative for service delivery.

Improvement criteria, success factors and support metrics are all process-specific. For example, the process-specific Critical Success Factors (CSFs) and success metrics (i.e. the Key Goal Indicators (KGIs) and Key Performance Indicators (KPIs)) for Availability (COBIT Process DS10) and Performance (COBIT Process DS3) management are shown in Figures 2 and 3 respectively.

Figure 2 - COBIT Availability Management - Critical Success Factors and Supporting Metrics

CRITICAL SUCCESS FACTORS (CSFs)
- There is a clear integration of problem management with availability and change management.
- Accessibility to configuration data, as well as the ability to keep track of problems for each configuration component, is provided.
- An accurate means of communicating problem incidents, symptoms, diagnosis and solutions to the proper support personnel is in place.
- Accurate means exist to communicate to users and IT the exceptional events and symptoms that need to be reported to problem management.
- Training is provided to support personnel in problem solution techniques.
- Up-to-date roles and responsibilities charts are available to support incident management.
- There is vendor involvement during problem investigation and resolution.
- Post-facto analysis of problem handling procedures is applied.

KEY GOAL INDICATORS (KGIs)
- A measured reduction of the impact of problems and incidents on IT resources
- A measured reduction in the elapsed time from initial symptom report to problem solution
- A measured reduction in unresolved problems and incidents
- A measured increase in the number of problems avoided through pre-emptive fixes
- Reduction in lag between identification and escalation of high-risk problems and incidents

KEY PERFORMANCE INDICATORS (KPIs)
- Elapsed time from initial symptom recognition to entry in the problem management system
- Elapsed time between problem recording and resolution or escalation
- Elapsed time between evaluation and application of vendor patches
- Percent of reported problems
- Frequency of coordination meetings
- Frequency of component problem analysis reporting
- Reduced number of problems

Figure 3 - COBIT Performance Management - Critical Success Factors and Supporting Metrics

CRITICAL SUCCESS FACTORS (CSFs)
- The performance and capacity implications of IT service requirements for all critical business processes are clearly understood.
- Performance requirements are included in all IT development and maintenance projects.
- Capacity and performance issues are dealt with at all appropriate stages in the system acquisition and deployment methodology.
- The technology infrastructure is regularly reviewed to take advantage of cost/performance ratios and enable the acquisition of resources providing maximum performance capability at the lowest price.
- Skills and tools are available to analyze current and forecasted capacity.
- Current and projected capacity and usage information is made available to users and IT management in an understandable and usable form.

KEY GOAL INDICATORS (KGIs)
- Number of end-business processes suffering interruptions or outages caused by inadequate IT capacity and performance.
- Number of critical business processes not covered by a defined service availability plan.
- Percent of critical IT resources with adequate capacity and performance capability, taking account of peak loads.

KEY PERFORMANCE INDICATORS (KPIs)
- Number of down-time incidents caused by insufficient capacity or processing performance.
- Percent of capacity remaining at the normal and peak loads.
- Time taken to resolve capacity problems.
- Percent of unplanned upgrades compared with the total number of upgrades.
- Frequency of capacity adjustments to meet changing demands.

The broad application and network entity support provided by the Common Data Model (CDM) in nGenius Performance Manager (PM) adds substantial value to root cause analysis and outage avoidance automation. This functionality is especially relevant to addressing the SoX Change Management remediation requirements discussed earlier as well as to implementing a sustained service improvement program that is generally independent of the levels of vendor, technology and product complexity within the underlying delivery infrastructure.

In addition, the utility of nGenius Performance Manager is evidenced in the support that it can provide for continuous service improvement initiatives measured by the KPIs and KGIs presented in Figures 2 and 3. Figures 5 and 6 show examples of the specific KPI and KGI metrics that can be supported by nGenius Performance Manager application availability and performance management functionality, along with a brief description of the specific support that would best serve an associated improvement initiative.

More recently, maturity models are being used to benchmark organizations in competency areas other than software development.
Measuring service delivery maturity and business continuity readiness are two of the most recent applications that are more directly related to improving IT operations quality. COBIT is somewhat unique in its support for management process-specific maturity models, whose rankings range from Non-Existent (meaning that the process is not supported at all) to fully Optimized (the management process is fully automated and well aligned with business objectives).

In the remainder of this paper, we'll discuss how the NetScout nGenius Performance Manager can be applied to supporting COBIT-based best practice implementation as well as improving IT operations governance management.

Applying NetScout nGenius to Improve Operations Governance

At the beginning of this paper, we briefly discussed the perceived commoditization of application availability and performance management products. However, it should be emphasized that this is only a perception and not necessarily a given. The values shown in each of the three dimensions below (Figure 4) constitute a more useful perspective on management product capability that is complementary to the application availability and performance management Optimized criteria presented earlier.

In fact, if one considers this three-dimensional representation of management product capability relative to breadth (the range of managed entities), scope (the range of private and public network support) and depth (the degree of supported management automation), one quickly concludes that no single vendor is equally strong in all three dimensions, which leaves considerable opportunity for future differentiation.
Despite the fact that no single vendor's product family is fully functional across all three axes, we believe that NetScout is one vendor that comes fairly close.

A key foundation for satisfying both the Value Graph and the Optimized criteria, and one that is unique to NetScout, is the breadth and scope of supported management data through the Common Data Model (CDM) in nGenius Performance Manager (PM).

Figure 4 - Operations Management Value Graph (2005 Transitional Data Services)
Three-axis diagram with the following axis labels:
- Management Breadth: Infrastructure, User Experience, Web Services
- Management Scope: Intranet, Extranet Outside-In, Internet Inside-Out
- Management Depth: Monitoring & Reporting, Root Cause Analysis (Manual), Root Cause Analysis (Automated), Outage Avoidance (Preventative)

Figure 5 - nGenius Performance Manager KPI & KGI Metric Support - Availability Management

| COBIT KGI or KPI | Supporting nGenius PM Functionality |
| A measured reduction of the impact of problems and incidents on IT resources | Tracking application and end-user misuse of the infrastructure (e.g. streaming video, interactive gaming) in order to reduce the number of perceived service outages |
| Elapsed time from initial symptom recognition to entry in the problem management system | Threshold alarming on application utilization or on response time/availability, minimizing the time lag from symptom occurrence to symptom recognition |
| Elapsed time between problem recording and resolution or escalation | Settable time-over-threshold alarming on segments for evidence of users consuming excessive bandwidth or capacity |
| A measured reduction in unresolved problems and incidents | Behavior visibility of all applications, shrink-wrapped or custom, facilitating strategic decision making on whether increased bandwidth may be the right choice, or some other approach such as reconfiguring the network with QoS or moving a business process to a non-peak time of the day, which ultimately reduces the probability of unresolved service degradations, bottlenecks and outages |
| A measured increase in the number of problems avoided through pre-emptive fixes | nGenius NewsPapers that contain days-to-capacity-threshold information, used to order additional capacity for circuits or infrastructure upgrades in order to avoid unplanned outages or service bottlenecks |

Currently, KGI and KPI metric values would have to be tracked manually as part of any initiative implementation. Over time, however, it's likely that the tracking of these and similar metrics could be more directly supported by increasingly sophisticated analytics, thereby reducing the manual effort required to implement the improvements even further.

Summary

TDS' implementation and management experience with both SoX 404 assurance projects and clients whose needs are similar to the one discussed in our case study has convinced us that supporting effective Operations Governance requires a different functional scope from what has been addressed by more traditional operations management products, which were purely focused on IT infrastructure management, or even by the new generation of governance management products, which are very project-focused.

Given the list of key IT priorities referenced earlier, along with additional requirements for continuous operations improvement and increased service delivery efficiency, it's clear that support automation with underlying measurement analytics can and must play a more significant role in taking IT in the direction of becoming a more predictable, measurable and accountable service delivery organization.
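To make the manual KPI and KGI tracking mentioned in this section more concrete, the sketch below computes one COBIT KPI (elapsed time between problem recording and resolution) for two reporting periods and expresses the change as a KGI-style measured reduction. The function names, the record layout and the sample timestamps are all hypothetical. In practice the inputs would come from whatever trouble ticket system is in use.

```python
from datetime import datetime
from statistics import mean


def mean_hours_to_resolution(records):
    """Mean elapsed hours between problem recording and resolution.

    `records` is an iterable of (recorded, resolved) datetime pairs.
    """
    durations = [
        (resolved - recorded).total_seconds() / 3600
        for recorded, resolved in records
    ]
    if not durations:
        raise ValueError("no resolved problems in this period")
    return mean(durations)


def relative_reduction(baseline, current):
    """Fractional reduction of `current` against `baseline` (positive = improvement)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline


# Invented sample data: two resolved problems per reporting period.
BASELINE_PERIOD = [
    (datetime(2005, 1, 3, 8, 0), datetime(2005, 1, 3, 18, 0)),    # 10 hours
    (datetime(2005, 1, 10, 8, 0), datetime(2005, 1, 11, 14, 0)),  # 30 hours
]
CURRENT_PERIOD = [
    (datetime(2005, 2, 7, 8, 0), datetime(2005, 2, 7, 12, 0)),    # 4 hours
    (datetime(2005, 2, 14, 8, 0), datetime(2005, 2, 14, 14, 0)),  # 6 hours
]

baseline_kpi = mean_hours_to_resolution(BASELINE_PERIOD)
current_kpi = mean_hours_to_resolution(CURRENT_PERIOD)
print(f"KPI: {baseline_kpi:.1f} h -> {current_kpi:.1f} h")
print(f"KGI (measured reduction): {relative_reduction(baseline_kpi, current_kpi):.0%}")
```

Keeping the KPI calculation separate from the period-over-period comparison means the same comparison can be reused for other elapsed-time KPIs listed in Figures 2 and 3.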
In general, this functionality can be supported much more easily as a natural extension of the current generation of network application availability and performance management products, of which NetScout's nGenius Performance Manager is one prominent example.

For the reasons presented, and because they constitute a very viable solution for our case study client, TDS has determined that the NetScout nGenius products, supplemented by the automated analytics gained through the recent acquisition of Quantiva, provide a very sound foundation for effective operations governance management. Consistent with what some users have experienced with SoX 404 remediation, the realized operations benefit in proportion to the implementation cost may also be quite favorable.

Figure 6 - nGenius Performance Manager KPI & KGI Metric Support - Performance Management

| COBIT KGI or KPI | Supporting nGenius PM Functionality |
| Number of business processes suffering interruptions or outages caused by inadequate IT capacity and performance | Required capacity measured for all applications, both shrink-wrapped and custom, to determine if infrastructure capacity needs to be adjusted and to validate why it needs to be adjusted |
| Number of critical business processes not covered by a defined service availability plan | Monitoring of applications with associated QoS levels to evaluate if traffic priority services are being delivered in the proper class |
| Number of down-time incidents caused by insufficient capacity or processing performance | nGenius Newspapers provides capacity planning reports that show actual capacity against standard baselines and 90% baselines |
| Percent of critical IT resources with adequate capacity and performance capability, taking account of peak loads | Measures all applications, both business and non-business, to determine if capacity needs to be adjusted and to validate why it needs to be adjusted |
| Percent of critical IT resources with adequate capacity and performance capability, taking account of peak loads | Response time analysis with alarming capability to address degradation and unavailability of strategic business servers/applications |

NetScout Systems, Inc.
Corporate Headquarters
310 Littleton Road
Westford, MA 01886 USA
Telephone (978) 614-4000
Fax (978) 614-4004
Web: www.netscout.com

Europe
Regus House
268 Bath Road
Slough, Berkshire, SL1 4DX UK
Phone: +44 1753 725561
Fax: +44 1753 725562

Asia/Pacific
Room 105, 17F/B, No. 167
Tun Hua N. Road
Taipei, Taiwan
Telephone +886 2 2717 1999
Fax +886 2 2547 7010

The nGenius Solution comprises nGenius Performance Manager, nGenius Probes and, for specialized situations, additional appliances including nGenius Flow Collector and nGenius Flow Recorder.

nGenius Performance Manager is a software application that analyzes the information collected by nGenius Probes as well as other network devices, and delivers the features and functions of multiple performance management disciplines in a single product.

nGenius Probes are hardware monitoring devices that are the industry's most advanced sources for identifying, collecting and analyzing application-level traffic data across the enterprise.

nGenius Flow Collectors are dedicated hardware devices optimized for collecting application conversation data via NetFlow records produced by leading network infrastructure devices.

nGenius Flow Recorder is an appliance that couples storage for large packet trace captures with graphics-based data mining software. It continuously records all traffic and produces a network audit trail for post-event forensics requiring full packet payload details.

© 2005 NetScout Systems, Inc. All rights reserved. NetScout and the NetScout logo, nGenius and Quantiva are registered trademarks of NetScout Systems, Inc. The CDM logo, MasterCare and the MasterCare logo are trademarks of NetScout Systems, Inc. Other brands, product names and trademarks are property of their respective owners. NetScout reserves the right, at its sole discretion, to make changes at any time in its technical information and specifications, and service and support programs.

CC-0207-05 Rev A005-11-22