Don’t Panic About ITOps


When you’ve finished patching, take a moment to evaluate your IT processes and contemplate how incorporating AI & Machine Learning could enhance your ability to handle future emergencies.

So, what are you doing about Meltdown and Spectre?

If you’ve been out of the loop (or are still recovering from a good New Year’s Eve celebration), here’s a quick recap – feel free to skip the following paragraphs if you’re already familiar with these issues.

In early January, major vulnerabilities affecting popular processor architectures were revealed. Operating system developers had actually been notified well in advance, and patches were quickly made available for all major systems and browsers. Yes, even web browsers – unfortunately, one of the attack vectors is through JavaScript executing in a user’s browser.

At a high level, all three vulnerabilities are related to speculative execution. Spectre is actually the name for two separate issues (CVE-2017-5753 and CVE-2017-5715), while Meltdown is known as CVE-2017-5754. The vulnerabilities are so closely related that they were independently discovered by as many as four different groups.

Under normal circumstances, speculative execution means that the CPU “guesses” which instructions will come next, for example which way a branch will go, and executes them before it knows whether they are actually needed. If the guess is correct, the results are already available, and the system feels more responsive. If the guess is wrong, the CPU discards the results and carries on along the correct path, so no harm appears to be done.

The problem with this approach, and the source of these vulnerabilities, is keeping the different running processes from accessing each other’s data in memory, especially sensitive user data like passwords and credit card numbers. Various mechanisms are supposed to keep each process’s data separate, and above all to wall off the central kernel. However, discarded speculative work still leaves traces behind, such as data pulled into the CPU cache. Through various techniques, mostly involving precise timing, it appears possible to work backwards from those traces and retrieve what should be private data, even from within a web browser.
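To make the timing trick more concrete, here is a toy simulation in Python, loosely modelled on Spectre variant 1 (the bounds-check bypass). It is a sketch under heavy simplifying assumptions, not an exploit: the “cache” is a Python set, “timing” is a lookup against two made-up latencies, and the ToyCPU class and its methods are hypothetical. What it shows is that the bounds check rejects the out-of-range read, yet the speculative load has already warmed one cache line, and probing all 256 lines reveals which one.

```python
# Toy model of a cache-timing side channel, loosely based on Spectre variant 1.
# Nothing here touches real hardware: the "cache" is a set and "timing" is a lookup.

CACHE_HIT_NS = 10    # assumed latency of a cached probe line (made-up value)
CACHE_MISS_NS = 100  # assumed latency of an uncached probe line (made-up value)


class ToyCPU:
    """Hypothetical CPU whose speculative loads are never undone in the cache."""

    def __init__(self, public: bytes, secret: bytes):
        # The secret sits right after the public array in one flat memory.
        self.memory = list(public) + list(secret)
        self.public_len = len(public)
        self.cache = set()  # which of the 256 probe lines are currently cached

    def flush(self) -> None:
        self.cache.clear()

    def victim(self, index: int):
        """Bounds-checked read, as the programmer wrote it."""
        # Modelling a mispredicted branch: the load and the dependent probe
        # access happen before the bounds check resolves...
        value = self.memory[index]
        self.cache.add(value)
        # ...then the check resolves. Out-of-range results are discarded,
        # but the probe line touched above stays in the cache.
        if index < self.public_len:
            return value
        return None

    def probe_time(self, line: int) -> int:
        return CACHE_HIT_NS if line in self.cache else CACHE_MISS_NS


def leak_byte(cpu: ToyCPU, index: int) -> int:
    cpu.flush()
    cpu.victim(index)  # architecturally this returns None for the secret
    timings = [cpu.probe_time(line) for line in range(256)]
    # The fastest probe line is the one the speculative load touched.
    return min(range(256), key=lambda line: timings[line])


def leak(cpu: ToyCPU, start: int, length: int) -> bytes:
    return bytes(leak_byte(cpu, start + i) for i in range(length))


cpu = ToyCPU(b"public", b"hunter2")
print(cpu.victim(6))    # None: the bounds check rejects the read
print(leak(cpu, 6, 7))  # b'hunter2': recovered from cache timing alone
```

Real attacks also have to train the branch predictor and measure genuine nanosecond differences amid system noise; the toy skips all of that to isolate the core idea.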

If you want a more detailed analogy, Ben Thompson published a great one at Stratechery.


The Spectre Of IT Operations Overload

Okay, so that’s where we stand: install your OS and browser vendors’ patches, and keep an eye on this issue for your next major hardware refresh. But apart from the usual headache of distributing patches and dealing with the dependencies, what does this have to do with daily IT operations?

Here’s the problem: nowadays, security vulnerabilities are not just CVEs discussed on dedicated mailing lists by a small number of experts. They have become media celebrities with catchy names. Before Meltdown and Spectre, there was Rowhammer, GHOST, Shellshock, Sandstorm, and of course, Heartbleed, the first vulnerability to really break into the mainstream.

These previously obscure infosec issues are now reported in mainstream news, not just in the tech press. While this visibility may help more people patch their personal systems and avoid being affected, the downside for IT operations is that, for the next year or so (or until the next major bug), every little thing that goes wrong may be blamed on the bug itself or its patch or workaround.

This is especially true for Meltdown and Spectre, as the fixes for these vulnerabilities will reduce or even eliminate the performance gains from speculative execution. It’s still unclear how significant that impact will be, as it varies widely between use cases, but some users are reporting a doubling of CPU utilization.
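Rather than blaming every slowdown on the patch, one way to put a number on the impact is to compare patched hosts with a control group of not-yet-patched hosts over the same period. The sketch below assumes you can export CPU utilization samples per host group; the function names and the sample figures are hypothetical, chosen only to mirror the “doubling” reports.

```python
from statistics import mean


def ratio(before, after):
    """Mean utilization after a change divided by mean utilization before it."""
    if not before or not after:
        raise ValueError("need samples from both windows")
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("baseline utilization is zero")
    return mean(after) / baseline


def patch_effect(patched_before, patched_after, control_before, control_after):
    """Change on patched hosts relative to unpatched control hosts.

    1.0 means the patched hosts changed no more than the controls did;
    2.0 means their utilization doubled relative to the controls.
    """
    return ratio(patched_before, patched_after) / ratio(control_before, control_after)


# Hypothetical CPU utilization samples (percent).
patched_before = [20, 22, 18, 20]
patched_after = [38, 42, 40, 40]

# Controls held steady: the extra load looks like the patch.
print(patch_effect(patched_before, patched_after, [20, 20, 20, 20], [21, 19, 20, 20]))  # 2.0

# Controls doubled too: more traffic, not the patch.
print(patch_effect(patched_before, patched_after, [20, 20, 20, 20], [40, 40, 40, 40]))  # 1.0
```

Dividing by the control group’s ratio is what separates the patch from everything else that changed that week: if both groups doubled, the extra load probably came from traffic rather than from the fix.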


This distraction will worsen the already challenging signal-to-noise ratio that IT operations professionals face. It’s difficult enough to determine which alerts are genuine and how they relate to each other, without being sidetracked by the suspicion that this family of issues or one of its patches is causing part of the problem. And all of this adds to the effort and stress involved in distributing a critical patch everywhere in a timely manner.

There Is No Quick Fix For IT Operations

Now, I don’t want this to sound like an opportunistic post that capitalizes on every major breach or disclosure. Nothing could have protected you from this one, unless you’re really into retro-computing. As many people jokingly pointed out on Twitter, VAX systems, PDPs, and the like are unaffected. Also, there’s no complete fix yet, and the best advice is simply to keep up with your patches, which you should be doing anyway.


More generally, it should be clear by now that this is not an isolated occurrence. There’s always another patch to roll out, another release to deploy, another change to make. IT operations is no longer a back-office process that can be meticulously planned out, but an ongoing real-time activity. And that means it needs to be approached fundamentally differently.

The old approaches that relied on exhaustive planning and documentation are no longer valid. Everything moves too fast for that to work. Instead of manual processes, phone bridges, and low event-to-alert ratios, IT operations in 2018 requires automation at all levels, streamlined collaboration, and a small number of relevant, actionable alerts automatically filtered from the massive data streams generated by modern infrastructure.
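As a minimal illustration of filtering a flood of events down to a few alerts, the sketch below folds raw events that share a host and check into one alert for as long as they keep arriving within a time window. The Event and Alert fields and the five-minute default window are assumptions for illustration only; production tools correlate on far more signals than this.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    timestamp: float  # seconds since some epoch
    host: str
    check: str        # e.g. "cpu", "disk"
    message: str


@dataclass
class Alert:
    host: str
    check: str
    first_seen: float
    last_seen: float
    events: list = field(default_factory=list)


def correlate(events, window=300.0):
    """Fold raw events into alerts.

    Events with the same (host, check) join the currently open alert for
    that key if they arrive within `window` seconds of its last event;
    otherwise they open a new alert.
    """
    open_alerts = {}
    alerts = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.host, ev.check)
        alert = open_alerts.get(key)
        if alert is None or ev.timestamp - alert.last_seen > window:
            alert = Alert(ev.host, ev.check, ev.timestamp, ev.timestamp)
            open_alerts[key] = alert
            alerts.append(alert)
        alert.last_seen = ev.timestamp
        alert.events.append(ev)
    return alerts


# 100 repeats of the same CPU warning plus one unrelated disk warning.
raw = [Event(t, "web-1", "cpu", "CPU > 90%") for t in range(0, 1000, 10)]
raw.append(Event(50, "db-1", "disk", "disk 95% full"))

for alert in correlate(raw):
    print(alert.host, alert.check, len(alert.events))
# web-1 cpu 100
# db-1 disk 1
```

Here 101 raw events collapse into two alerts, one per underlying problem, which is the kind of event-to-alert ratio the paragraph above is asking for.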

AI & Machine Learning techniques are the only way to remove enough friction from IT operations to be able to respond swiftly to the next Meltdown or Spectre – or any sudden project idea from marketing, new sales campaign, or change of heart from the corner office. The emerging field of AIOps is all about embedding the latest algorithmic techniques into IT operations, along with streamlined collaboration among all the different specialist roles that need to be informed or involved.
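AIOps products use a range of techniques, but the basic idea of learning what “normal” looks like from recent data can be shown with a simple rolling z-score detector. To be clear, this is plain statistics standing in for the machine-learning models mentioned above, not any particular AIOps algorithm; the window size and threshold are arbitrary assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def anomalies(samples, window=30, threshold=3.0):
    """Yield (index, value) for samples that sit more than `threshold`
    standard deviations from the mean of the preceding `window` samples."""
    history = deque(maxlen=window)
    for i, value in enumerate(samples):
        if len(history) == window:
            recent = list(history)
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma == 0:
                # A perfectly flat baseline: any change at all is unusual.
                unusual = value != mu
            else:
                unusual = abs(value - mu) / sigma > threshold
            if unusual:
                yield i, value
        history.append(value)


# Hypothetical CPU readings: a noisy but steady baseline, then one spike.
readings = [40, 42, 38, 41, 39] * 6 + [40, 95, 41]
print(list(anomalies(readings)))  # [(31, 95)]
```

Tuning the window and threshold trades missed anomalies against false alarms; the normal wobble between 38 and 42 stays quiet, and only the spike reaches a human.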

When you’ve finished this round of patches, take a moment to evaluate your current IT operations process and consider how each firefighting episode is affecting it. It may be time to enhance your existing specialized systems with an AI-driven overlay that can give you the breathing space needed to handle new situations without everything turning into an emergency.
