The critical cybersecurity lesson from the Microsoft outage

Published 19 July 2024 in Technology • 6 min read

The IT debacle is a big reminder that cybersecurity is not just about defending against malicious attacks but also about ensuring system availability.  

In the digital era, most organizations view cybersecurity primarily through the lens of defending against malicious threat actors – and there are indeed plenty of them. However, the major global IT outage, caused by a security update from CrowdStrike that affected Microsoft’s Windows operating system, has laid bare a second, often overlooked aspect of cybersecurity: availability.

“[Crowdstrike] CEO ‘deeply sorry’ for global chaos caused by Microsoft update and warns fix may take time to work”, read the headline in The Guardian newspaper. The widespread disruption was caused by a botched software refresh made by cybersecurity firm CrowdStrike, affecting potentially millions of Windows devices. This led to grounded flights, cancelled doctors’ appointments, and shutdowns of some businesses’ payment systems – underscoring the critical importance of system availability.

A robust cybersecurity strategy rests on three fundamental pillars: confidentiality, integrity, and availability of systems. Confidentiality ensures that sensitive information is accessible only to those authorized to view it. Integrity guarantees that data remains accurate and unaltered except by authorized parties. Availability ensures that systems and data are accessible when needed.

While most companies have become acutely aware of the risks to confidentiality and integrity, driven by the increasing number and sophistication of high-profile hacks, availability often receives much less attention. The Microsoft outage has highlighted the critical importance of this often-neglected pillar. Given today’s dependence on cloud service providers, where availability is a shared responsibility, having a plan B and even a plan C is more important than ever.

The impact of the outage

The outage serves as a stark reminder of the consequences when system availability is compromised. Today’s disruption affected various sectors, from airports in Sydney and Berlin to airlines like Delta, United and American Airlines, as well as major UK financial services like Barclays and Halifax.

According to aviation data provider FlightAware, nearly 4,000 flights in the US alone were either delayed or cancelled. In the UK, about 3,700 doctors’ practices may have been affected. “Biggest IT fail ever,” wrote Tesla’s chief executive Elon Musk on social media.

The scale of this outage, which impacted both PCs and servers, also underscores our dependency on integrated cloud systems and the necessity of robust incident response plans.

This incident should prompt organizations to reevaluate their cybersecurity strategies, ensuring they are not solely focused on defending against attacks from rogue nation states or thrill-seekers – but also prepared to maintain system availability before, during and after an incident.

Dependency on major IT vendors

Indeed, a second critical lesson from this event is the risk of depending heavily on a single IT vendor. Microsoft counts some 1.4 billion monthly active devices running Windows. This highlights the potential vulnerabilities in relying too heavily on dominant players in the tech industry.

Larger vendors offer better pricing due to economies of scale, and their extensive digital footprint allows them to gain valuable experience, improving their service offerings. This creates a virtuous cycle where larger vendors become more attractive due to their efficiency and expertise. However, this reliance also means that any disruption in these major vendors can have far-reaching impacts.

Trusting established giants like Microsoft, one of the creators of the digital ecosystem, makes sense for many businesses. However, this trust should be balanced with an understanding of the inherent risks. Diversification of vendors, while beneficial in reducing risk, must be managed carefully to avoid the complexity and inefficiency of dealing with too many different systems.

Contingency planning and crisis communication

The outage also raises important questions about contingency planning and crisis communication. As Microsoft and CrowdStrike scrambled to resolve the issue, their shares fell by 0.8% and 13% respectively by lunchtime in New York, and the impact on operations persisted, with some companies including Dutch carrier KLM resorting to manual systems or advising passengers of delays. Organizations must have robust plans in place to manage such crises, including clear communication strategies.

In a crisis, it’s crucial to communicate frequently and transparently with clients. Organizations should share what is happening, the steps being taken to resolve the issue, and how they are supporting their customers/partners. Even if the outage is due to a vendor, like in Microsoft’s case, the organization remains responsible for its users’ experience and must avoid appearing helpless or overly dependent on the vendor.

This happened before with the SolarWinds hack, a big cyber-attack that came to light in late 2020. On that occasion, attackers inserted malicious code into the company’s Orion software updates. This tainted software was then distributed to SolarWinds customers over the ensuing months, compromising thousands of organizations, from Fortune 500 companies to US government agencies.

Clearly, risks need to be managed even with trusted software vendors, as SolarWinds was, and no matter how strong your cybersecurity strategy is, there will always be some risk.

The role of testing and update protocols

Another critical aspect highlighted by this debacle is the importance of controlled environments for testing updates. The Microsoft outage should have been preventable with adequate testing. Typically, vendors have test environments to catch issues before releasing updates. The fact that this incident occurred suggests either gaps in these environments or unforeseen issues.

Balancing the need for rapid updates, especially in security, against the necessity for thorough testing is a delicate act. While it’s essential to patch and upgrade systems swiftly to mitigate vulnerabilities, this pressure can sometimes clash with the need for comprehensive testing. To avoid issues like the CrowdStrike incident, some experts recommend staged deployment, in which an update reaches a small group of machines first and spreads further only once those machines prove healthy, rather than blindly trusting automatic updates.
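
To make the idea of staged deployment concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how CrowdStrike, Microsoft, or any particular tool deploys updates. The names `staged_rollout`, `apply_update`, `is_healthy`, and `max_failure_rate`, as well as the host names, are all hypothetical. The update goes to one group ("ring") of machines at a time, and the rollout halts as soon as a ring reports too many unhealthy machines, so later rings are never touched.

```python
from typing import Callable, Dict, List, Sequence


def staged_rollout(
    rings: Sequence[Sequence[str]],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> Dict[str, object]:
    """Apply an update ring by ring; halt when a ring exceeds the failure threshold."""
    updated: List[str] = []
    for index, ring in enumerate(rings):
        failed: List[str] = []
        for host in ring:
            apply_update(host)          # hypothetical hook: push the update to one host
            updated.append(host)
            if not is_healthy(host):    # hypothetical hook: post-update health check
                failed.append(host)
        # Stop before the next (larger) ring if this ring shows too many failures.
        if ring and len(failed) / len(ring) > max_failure_rate:
            return {
                "completed": False,
                "halted_at_ring": index,
                "updated": updated,
                "failed": failed,
            }
    return {"completed": True, "halted_at_ring": None, "updated": updated, "failed": []}


# Illustrative run: a canary ring, an internal ring, then production (all made up).
rings = [["canary-1"], ["office-1", "office-2"], ["prod-1", "prod-2", "prod-3"]]
touched: List[str] = []

result = staged_rollout(
    rings,
    apply_update=touched.append,                    # stand-in: just record the host
    is_healthy=lambda host: host != "office-2",     # pretend one office machine breaks
)
print(result)
```

In this made-up run the faulty update is caught in the second ring, so the three production machines never receive it. A real process would also need rollback and a waiting period between rings, which this sketch leaves out.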

But overall, this incident is a wake-up call for the cyber and wider tech industry. The Microsoft outage is a critical reminder that cybersecurity is not just about defending against malicious attacks but also about ensuring system availability.

Organizations should adopt a holistic approach to cybersecurity, incorporating robust incident response plans, diversified risk management strategies, and effective crisis communication protocols. By doing so, they can better protect themselves against the multifaceted threats of the digital age and maintain trust with their clients and stakeholders.

As the digital ecosystem evolves, so too must our approaches to cybersecurity. In light of this event, it is time to seriously reconsider your cybersecurity strategy.

Five questions to ask yourself:

  1. Have you considered availability as well as integrity and confidentiality?
  2. Do you have contingency plans in place for incidents like this and have you tested them?
  3. Do you have a communication strategy in place for incidents like this one, where the solution is not within your control?
  4. Do you test updates before releasing them? As we continue increasing the complexity of our IT systems, proper testing before deployment becomes increasingly critical. Instead of blindly implementing the updates and patches from your vendors, check whether you can test them before releasing.
  5. Are you confronting the issue head-on? You may not be directly responsible for the damage incidents like this can cause, but you are still accountable to your customers. Do not throw your hands up in response but actively engage with your partners and customers in an effort to normalize processes. Often such proactivity wins more than simple engagement. We can only expect such incidents to increase in frequency and impact, so how we deal with them will separate trustworthy organizations from others.

Authors

Öykü Işık

Professor of Digital Strategy and Cybersecurity at IMD

Öykü Işık is Professor of Digital Strategy and Cybersecurity at IMD, where she leads the Cybersecurity Risk and Strategy program and co-directs the Generative AI for Business Sprint. She is an expert on digital resilience and the ways in which disruptive technologies challenge our society and organizations. Named on the Thinkers50 Radar 2022 list of up-and-coming global thought leaders, she helps businesses to tackle cybersecurity, data privacy, and digital ethics challenges, and enables CEOs and other executives to understand these issues.
