It sounds like music to every IT manager’s ears: a cloud infrastructure, automated in such a way that administrators no longer need to worry about it. After all, a NoOps environment can restore itself, grow and scale automatically, in line with what’s required of the infrastructure. It sounds like the ideal situation, but is it? While NoOps is a noble ambition, in practice it remains an illusion.
It’s fair to say that an IT environment requiring minimal flexibility would thrive under a NoOps approach. The reality, however, is that cloud environments are built to respond flexibly to supply and demand. The IT world changes almost daily, which increases both the focus on online (sales) channels and the pressure on them. New technologies are emerging, customer patterns are changing, and business demands are evolving. Organizations must therefore continuously check whether their cloud environment and the automation within it are still accurate and up to date. While NoOps offers a start, it’s by no means the end point. Organizations still need to verify that their environment corresponds to the original design and that previously defined key performance metrics remain relevant.
Here’s an example
The following example illustrates why a fully automated cloud environment does not always deliver the desired results. Imagine an organization has determined that the latency of an application should be stable at an average of 200ms per request, regardless of the number of users. The number of servers needed to ensure stable latency depends on the number of users and is automatically scaled up or down. When latency is stable, it can be assumed that the scaling of the servers is working properly. However, it could be the case that in a given month only one server is running, as opposed to an average of ten in the previous months. That points to a significant decrease in the number of application users, so it’s quite possible that an issue has arisen somewhere else. Because only latency (a predefined key performance metric) is being monitored, however, the considerable decline in user numbers is overlooked.
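To make the blind spot concrete, here is a minimal Python sketch of the scenario. The function names, the 20ms tolerance and the 50% drop threshold are assumptions chosen for illustration; only the 200ms target and the drop from about ten servers to one come from the example itself.

```python
from statistics import mean

LATENCY_TARGET_MS = 200  # the target from the example


def latency_kpi_ok(latencies_ms, target_ms=LATENCY_TARGET_MS, tolerance_ms=20):
    """The predefined KPI: average latency stays close to the target.

    tolerance_ms is a hypothetical margin, not part of the example.
    """
    return abs(mean(latencies_ms) - target_ms) <= tolerance_ms


def deviates_from_baseline(current, history, max_drop=0.5):
    """Flag a value that fell more than max_drop below its historical average.

    max_drop=0.5 (a 50% drop) is an assumed threshold for illustration.
    """
    baseline = mean(history)
    return current < baseline * (1 - max_drop)


# Made-up sample data that mirrors the scenario.
latencies_this_month = [195, 204, 199, 202]   # averages exactly 200ms
servers_previous_months = [10, 11, 9, 10]     # about ten on average
servers_this_month = 1

print(latency_kpi_ok(latencies_this_month))                                  # True
print(deviates_from_baseline(servers_this_month, servers_previous_months))   # True
```

The KPI check passes, so an environment that watches latency alone reports nothing. Only the second check, on a metric nobody declared "key", reveals that something has changed.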
The above example shows that while organizations can use NoOps as a guideline in automating their IT operation, it should never be the end goal. When NoOps is achieved, the work has just begun. It’s crucial to continuously look at what can be improved. When maintenance is due, it is important to ensure that a problem does not keep recurring but is solved automatically from then on. In this way, an organization creates a better, more powerful environment. It does, however, remain an environment that’s never ‘finished’.
Manual tasks remain
By performing maintenance and continuing to fix the causes of errors, unplanned operations can be kept to a minimum. The time an organization saves on regular cloud environment maintenance can also be reinvested in improving a platform or application. Returning to the example above: had the organization invested such time gains in optimizing the environment, it would have noted that all key metrics were green, but it would also have spotted the downward trend in the graphs for users and number of servers. In this case, manually checking all metrics each month, and looking beyond key performance metrics, could have avoided problems.
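Part of such a monthly review can be supported by a small script. The Python sketch below fits a straight line through each metric’s monthly values and lists the metrics that are trending downward. The metric names, the sample figures and the threshold of a 5% decline per month are all hypothetical, and a real review would still need a person to judge what a flagged trend means.

```python
def relative_slope(values):
    """Least-squares slope per month, expressed as a fraction of the series mean."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    if y_mean == 0:
        return 0.0  # avoid dividing by zero for an all-zero series
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return (num / den) / y_mean


def declining_metrics(monthly_metrics, threshold=-0.05):
    """Return the names of metrics falling by at least 5% of their mean per month.

    The threshold is an assumption; series shorter than three months are skipped.
    """
    return sorted(
        name
        for name, values in monthly_metrics.items()
        if len(values) >= 3 and relative_slope(values) <= threshold
    )


# Made-up monthly figures: latency is stable, users and servers are not.
metrics = {
    "latency_ms": [201, 199, 200, 200, 198, 200],
    "active_users": [5000, 4800, 4100, 3000, 1800, 600],
    "servers": [10, 10, 9, 6, 3, 1],
}

print(declining_metrics(metrics))  # ['active_users', 'servers']
```

Latency, the only key performance metric, is not flagged, while the two metrics outside the original design are. The review has to cover every metric the environment produces, not just the ones chosen up front.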
When an issue occurs within a NoOps environment, it’s important to see how it can be solved automatically the next time it arises. This saves valuable management time. However, many organizations don’t use the resulting time gains efficiently. As a result, they miss an opportunity to optimize the cloud environment and adapt it to changing circumstances.
A pitfall to avoid
An important pitfall with NoOps is that organizations look only at the health of the IT environment. They therefore lose sight of other metrics that are relevant from a business perspective. For example, while a website’s visitor numbers could be stable, turnover could be in decline. It’s therefore important for organizations to look beyond IT metrics. Business objectives and the functioning of other systems within the chain must also be monitored. Optimizing the cloud environment is not just about an application’s performance; the end-user experience also deserves adequate attention.
Striving for NoOps is a worthy ambition, but in practice it turns out that an environment is never truly finished. With this in mind, the most important task is to implement continuous improvements within the automated environment, based on changing circumstances. The question remains: can we still consider NoOps a reality? No, not in the strict sense of the word anyway. It appears to be just an illusion.