By Christian Kelly, CTO, Xantrion Inc.
As Xantrion’s CTO, I’ve seen my fair share of IT incidents and system outages over the years. But the recent CrowdStrike incident stands out as a stark reminder of the risks inherent in our interconnected digital world. This event, which some are calling “the largest outage in the history of information technology,” offers crucial lessons for businesses of all sizes.
While Xantrion customers were unaffected, we view this situation as an opportunity to refine our approaches and strengthen our clients’ resilience against potential future disruptions. Let’s explore what happened — and what we can learn from it.
The Incident
CrowdStrike, a leading American cybersecurity company, inadvertently distributed a faulty update to its security software that caused widespread system failures. This wasn’t a cybersecurity breach; it was a severe availability incident. Approximately 8.5 million systems running Microsoft Windows crashed and were unable to restart properly. The impact was felt globally, affecting industries ranging from airlines and banks to hospitals and government services. Estimates put the financial damage above $10 billion, and more than a week later, the fallout continues.
Key Lessons for Small and Mid-Sized Businesses
The CrowdStrike incident highlights areas where businesses can strengthen their operational resilience and IT management practices. Specific lessons learned include:
No One is Immune
Insight: The CrowdStrike incident is a good reminder that even industry leaders can make mistakes. For small and mid-sized businesses, this underscores the critical need for robust disaster recovery and business continuity plans. It’s not a question of if a major disruption will occur but when.
Action: At Xantrion, we participate in tabletop exercises with clients, simulating various outage scenarios, including an “all systems down” event. These exercises help identify vulnerabilities and ensure your team (and ours) knows how to respond effectively.
The Cloud is Not a Panacea
Insight: Many cloud-native services were impacted during the CrowdStrike incident, reminding us that cloud solutions — while robust — are not immune to large-scale disruptions.
Action: Review your cloud strategy. Your Xantrion consultant can help ensure you have contingency plans for critical cloud-based services and recommend hybrid approaches where appropriate.
Understanding Your Risk Tolerance
Insight: Not every business system needs 100% availability. Understanding the actual impact of potential outages on your operations and prioritizing your investments is crucial.
Action: To understand your risk tolerance, Xantrion can conduct a thorough business impact analysis with you. Together, we’ll identify critical systems and processes and determine acceptable downtime for each. This will help ensure that you allocate resources more effectively.
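To make the idea concrete, here is a minimal sketch of how the results of a business impact analysis might be recorded and ranked. It is an illustration only, not Xantrion’s actual process: the `SystemImpact` fields, the `recovery_order` helper, and every value in `inventory` are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class SystemImpact:
    name: str
    acceptable_downtime_hours: float  # agreed with the business owner
    hourly_cost_of_outage: float      # rough estimate, in dollars


def recovery_order(systems):
    """Sort by least tolerable downtime first, then by highest hourly cost."""
    return sorted(
        systems,
        key=lambda s: (s.acceptable_downtime_hours, -s.hourly_cost_of_outage),
    )


# Illustrative entries only; real values come out of the analysis itself.
inventory = [
    SystemImpact("file archive", 72, 50),
    SystemImpact("email", 4, 400),
    SystemImpact("order processing", 1, 2500),
    SystemImpact("phone system", 4, 900),
]

for system in recovery_order(inventory):
    print(f"{system.name}: restore within {system.acceptable_downtime_hours}h")
```

Ranking by acceptable downtime first keeps the focus on what the business can tolerate; the hourly cost only breaks ties between systems with the same tolerance.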
Communication is Key
Insight: During the CrowdStrike incident, many organizations found that while their computers were down, employees could still communicate via mobile devices. This highlights the importance of having diverse communication channels.
Action: Your Xantrion consultant can help you develop and regularly update an emergency communication plan that doesn’t rely solely on company-owned devices or networks.
Vendor Due Diligence is Essential — But Not Foolproof
Insight: The CrowdStrike incident demonstrates that even thoroughly vetted vendors can experience catastrophic failures. While due diligence is crucial, it does not guarantee protection against all risks.
Action: Xantrion can be a partner as you evaluate vendors. We can review and update your vendor assessment processes, looking beyond technical capabilities to include each vendor’s incident response plans and communication protocols.
Balancing Security and Business Continuity
Insight: It can be tempting to react hastily after a high-profile incident like this. Resist the urge to make panic-induced decisions; instead, take a step back and examine your operational and business continuity goals to ensure they’re in balance with your security needs.
Action: Xantrion regularly reviews its update and patch management policies to ensure we’re following industry best practices. We also perform staggered rollouts for critical updates to limit potential damage from faulty patches.
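As a rough illustration of what a staggered rollout can look like, the sketch below assigns each device to a rollout ring and releases the next ring only after the current one has installed the update cleanly. It is a simplified example under stated assumptions, not a description of Xantrion’s tooling: the ring names and sizes in `RINGS`, the `ring_for` and `may_advance` helpers, and the failure threshold are all hypothetical.

```python
import hashlib

# Hypothetical rings: (name, cumulative share of the fleet reached so far).
RINGS = [("canary", 0.05), ("early", 0.25), ("broad", 1.00)]


def ring_for(hostname: str) -> str:
    """Deterministically place a device in a ring from a hash of its name."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    for name, cumulative_share in RINGS:
        if fraction < cumulative_share:
            return name
    return RINGS[-1][0]


def may_advance(installed: int, failed: int,
                max_failure_rate: float = 0.01,
                min_sample: int = 20) -> bool:
    """Release the next ring only after enough healthy installs in this one."""
    if installed < min_sample:
        return False  # too few results to judge the update yet
    return failed / installed <= max_failure_rate
```

Hashing the hostname keeps each device in the same ring from one update to the next, and the minimum sample size prevents a handful of early successes from releasing a faulty patch to everyone.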
Resilience in an Unpredictable World
The CrowdStrike incident serves as a wake-up call for businesses of all sizes. It highlights the need for a holistic approach to IT management that goes beyond just preventing cybersecurity attacks to include robust resilience and recovery capabilities for all types of system failures. After all, when it comes to IT operations, the goal isn’t perfect uptime — it’s building systems and processes that can withstand and quickly recover from unexpected disruptions.
Remember, operational resilience isn’t just about technology — it’s about people, processes, and continuous improvement. By learning from incidents like this and implementing thoughtful strategies, small and mid-sized businesses can significantly enhance their overall IT resilience.
At Xantrion, we’re committed to helping you navigate any IT challenges you encounter. Contact us today to learn more about our IT management services.