For most organisations, IT downtime is not something they can afford. Even so, incidents cannot always be prevented. How much harm an incident causes depends largely on how adequately the IT organisation responds. Organisations that regard IT as critical to daily operations therefore frequently set up monitoring. But what kind of monitoring do you need, and where does the break-even point lie between its costs and benefits? We are often asked this question during the sales process. This blog therefore offers a number of tips for carrying out your own cost-benefit analysis.
IT organisations make great use of all kinds of monitoring. In particular, there is no lack of technical (component) monitoring, application monitoring and infrastructure monitoring. However, this does not give a full picture of what an end user is experiencing. In many cases, IT organisations monitor only individual components, so ‘all the lights show green’ even while tickets reporting a malfunction are coming in. When that happens, something is going wrong somewhere in the chain.
THE COSTS
Once a malfunction occurs, the clock is ticking and it all comes down to speed. How quickly is the incident’s root cause found (Mean Time To Identify, MTTI)? And how quickly is the incident then resolved (Mean Time To Repair, MTTR)? In practice, we see that many organisations try to rule out components one by one in a ‘silo-driven’ approach. The more complex the environment, the longer that takes. In the meantime, costs accumulate and other consequences emerge (harm to image, loss of revenue, etc.).
To get a cost-benefit analysis of monitoring, it is therefore important to chart the current MTTI and MTTR (linked to staffing costs). It is also important to get a clear view of the indirect impact of malfunctions. For example, how much worker productivity was lost due to the malfunction? Did it impact the customer journey or the organisation’s image or revenue?
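As a rough illustration of the charting exercise described above, the cost of a single incident can be sketched in a few lines of Python. Everything here is an assumption made for the example: the field names, the hourly rates, and the simplification that the incident lasts MTTI plus MTTR hours and that costs grow linearly with that duration. Replace the figures with your own.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    mtti_hours: float          # time needed to identify the root cause
    mttr_hours: float          # time needed to repair once the cause is known
    responders: int            # staff tied up in the escalation
    affected_users: int        # end users hindered by the malfunction
    productivity_loss: float   # share of affected users' output lost (0..1)
    lost_revenue: float = 0.0  # revenue loss attributed to the incident


def incident_cost(inc, responder_rate, user_rate):
    """Return (direct, indirect) cost of one incident.

    Assumes costs scale linearly with the total duration (MTTI + MTTR).
    """
    duration = inc.mtti_hours + inc.mttr_hours
    direct = duration * inc.responders * responder_rate
    indirect = (duration * inc.affected_users * inc.productivity_loss * user_rate
                + inc.lost_revenue)
    return direct, indirect


# Hypothetical P1 incident: 3 hours to identify, 1 hour to repair.
example = Incident(mtti_hours=3.0, mttr_hours=1.0, responders=5,
                   affected_users=200, productivity_loss=0.5,
                   lost_revenue=10_000.0)
direct, indirect = incident_cost(example, responder_rate=80.0, user_rate=40.0)
print(f"direct: {direct:.0f}, indirect: {indirect:.0f}")
```

In this made-up case the indirect impact is far larger than the staffing cost of the escalation itself, which is why it deserves a place in the analysis.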
THE BENEFITS
To limit the impact of malfunctions, you need insight and overview. Not just insight into an individual component, but also understanding and a wider view of the entire chain. This is achieved with chain monitoring. Some advantages of chain monitoring are set out below:
- The end user is the main concern
End users perform hundreds of actions daily within a business process that is supported by one or more applications. All these combined actions (e.g. logging in, searching, opening a file and editing data) have to be available with an acceptable level of performance. With chain monitoring, you show clearly whether it is possible to carry out these actions, and what the performance is. Regardless of whether an individual component is ‘on green’, the end user’s experience determines the quality.
- Rapid domain identification
The MTTI can be greatly reduced with the aid of chain monitoring. When a malfunction occurs, you can see at a glance where in the chain the problem is being caused. Does it lie with an action relating to the database? Or is there a log-in failure? This eliminates the need to systematically investigate each individual component, thereby saving a lot of time.
- Clear responsibility
A faster MTTI almost automatically leads to a faster MTTR. When the malfunction’s domain is clear, the responsible supplier or department can take action immediately. By linking chain monitoring to the management organisation or service desk, you can ensure that notifications are passed immediately to the right party. This can also save a great deal of time.
- Rapid prioritisation
A malfunction’s priority can be determined better when all IT components are linked to the service or operational process that they support. In that way, a malfunction in the customer ordering process can be given higher priority than one in a component of the office automation environment. It is therefore advisable not only to plot out the technical aspects of the chain, but also to relate them to the business process. This leads to more efficient deployment of resources.
- From reactive to proactive
Setting intelligent alerts allows you to be proactive. Is performance getting worse? Then you can take action immediately, even before an incident arises. Is a component unavailable? Then you don’t need to wait until an end user complains; the alarm bells sound immediately, and you can start identifying the cause. The facility for trend analysis supports this.
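To make the idea of chain monitoring more concrete, the sketch below runs a series of end-user actions in order, times each one, and reports which domain a failure or slowdown belongs to. It is only an illustration, not a description of any particular monitoring product: the step names, domain labels and response-time limits are hypothetical, and the actions are placeholders for real scripted checks such as logging in or running a search.

```python
import time
from typing import Callable, NamedTuple


class StepResult(NamedTuple):
    name: str       # end-user action, e.g. "log in"
    domain: str     # part of the chain responsible for this action
    ok: bool        # did the action complete without an error?
    seconds: float  # measured response time


def run_chain(steps: list[tuple[str, str, Callable[[], None]]]) -> list[StepResult]:
    """Execute the end-user actions in order, stopping at the first failure."""
    results = []
    for name, domain, action in steps:
        start = time.perf_counter()
        try:
            action()
            ok = True
        except Exception:
            ok = False
        results.append(StepResult(name, domain, ok, time.perf_counter() - start))
        if not ok:
            break  # later actions depend on this one, so stop here
    return results


def first_failure(results):
    """Return the failed step (and thus the domain to alert), or None."""
    return next((r for r in results if not r.ok), None)


def degraded(results, limits):
    """Return steps that succeeded but exceeded their response-time limit."""
    return [r for r in results
            if r.ok and r.seconds > limits.get(r.name, float("inf"))]


def _database_down():
    # Placeholder that simulates a failing search action.
    raise RuntimeError("simulated database failure")


# Hypothetical chain: the callables stand in for real scripted checks.
chain = [
    ("log in", "identity", lambda: None),
    ("search", "database", _database_down),
    ("open file", "storage", lambda: None),
]
failure = first_failure(run_chain(chain))
if failure is not None:
    print(f"'{failure.name}' failed, notify the {failure.domain} team")
```

The `degraded` helper covers the proactive case: a step that still works but has become slower than its limit can raise an alert before any end user reports an incident.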
From the perspective of the cost-benefit analysis, the most important argument is that lead times of P1 malfunctions can be shortened by 50%. This is mainly attributable to the rapid domain identification and the ability to take a proactive approach. Experience shows that an overview of the impact of P1 incidents is sufficient for a solid business case. Sometimes the case rests on escalations lasting hours, with high personnel costs; sometimes it rests on the efficiency of a process (as in a hospital). You should therefore start by charting the current costs, not forgetting the indirect impact.
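A simple break-even calculation might look like the sketch below. It assumes that the cost of a P1 incident falls in proportion to its lead time, so a 50% shorter lead time halves the cost; that is a simplification, and all amounts and function names are hypothetical.

```python
def annual_saving(p1_per_year, avg_cost_per_p1, lead_time_reduction=0.5):
    """Estimated yearly saving, assuming cost is proportional to lead time."""
    return p1_per_year * avg_cost_per_p1 * lead_time_reduction


def payback_months(one_off_cost, annual_monitoring_cost, saving_per_year):
    """Months needed to recover the one-off cost, or None if never recovered."""
    net_per_year = saving_per_year - annual_monitoring_cost
    if net_per_year <= 0:
        return None  # monitoring costs more per year than it saves
    return one_off_cost / net_per_year * 12


# Hypothetical figures: six P1 incidents a year at 20,000 each.
saving = annual_saving(p1_per_year=6, avg_cost_per_p1=20_000)
months = payback_months(one_off_cost=12_000,
                        annual_monitoring_cost=36_000,
                        saving_per_year=saving)
print(f"annual saving: {saving:.0f}, payback in months: {months}")
```

If `payback_months` returns `None`, the P1 costs charted so far do not justify the monitoring on their own, which is a signal to check whether the indirect impact has been fully counted.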