When Software Updates Go Wrong: TPRM Strategies for Resilience
November 25th, 2024 • Loren Johnson • Reading Time: 5 minutes
The following is a discussion between Loren Johnson, Sr. Director of Product Marketing at Aravo, and Eric Hensley, Aravo Chief Technology Officer, shortly following the now notorious CrowdStrike incident. In this conversation, they explore how software updates can go bad and what kind of blows your third-party environment ends up taking when that happens.
Loren: In this market, we discuss assessing risks and where risks actually live. There have been recent stories of major corporations losing access to computer systems due to a software release gone bad at their third party. It seems like corporations have not considered this kind of risk associated with a single area of their business partnership in their deployments.
What did those companies miss and how should they have assessed their risks with their third parties a little differently?
Eric: Good question. Third-party risk management (TPRM) is coming up a lot these days. As the CTO at Aravo, I see a lot of risk assessments from the point of view of our customers assessing us, but also, of course, we provide risk assessment as part of our product suite.
One of the big problems we have consistently seen over the years is that areas of risk are managed in silos. This has been particularly insidious because what happened at CrowdStrike felt like an IT due diligence risk, and there’s a big focus on IT risk because people are constantly being phished and losing data. You see it all the time. And this recent CrowdStrike incident, while it didn’t have anything to do with phishing, still felt like an IT risk to people when it wasn’t an IT issue. The risk silos and resulting gaps create situations where risks like this get overlooked.
And so, I think that people missed it because they had traditional silos of IT risk rather than looking at their risk holistically and bringing, for example, software development lifecycle people into the risk management conversation. What we really had was a problem of software development management, quality management, and release management, which is not actually an IT risk from the point of view of IT due diligence and is not something that you commonly see within it.
Avoiding Single Points of Failure and Concentration Risk
Loren: If you develop software yourself, you’re going to see how software is deployed. Sometimes, there are bumps in the road; you do internal risk management. You should include some accountability for software development processes and delivery that can fail. Ultimately, you don’t want that to be a significant and unassessed point of failure.
Maybe this was where a single question should have been asked, or some kind of assessment of the environment done, prior to accepting software updates. This would help make sure you don’t have so much concentration risk with a software-providing third party. Ideally, your risk analysis should lead you to insulate yourself from your third-party software providers, or put something in place that protects you from a single point of failure when an update fails, for example because of a corrupted file.
Eric: You absolutely should. This is difficult, though, because almost every company has internal IT. Of course, we are outsourcing more and more functions of our business for reasons of efficiency and, in doing so, taking on more third-party risks. But everybody has IT people, and they focus on their specific realm.
And because of that, everybody feels like they can and do adequately assess relevant due diligence risks. Also, a lot of people have IT due diligence risk systems that cover only a single risk domain. Not every company is a software company. Not every company has software engineering people who understand deployment models or associated risks.
And so, it’s less common that your standard internal risk folks will be even thinking about software development lifecycle risks as a legitimate or applicable risk. What does it mean to deploy software? What does it mean to have concentration risk where I have a software vendor who has pieces of their software in my infrastructure?
And so once again, this is one of these areas where it’s important to stop thinking about risks in silos. You have this huge concentration risk, which is hard enough to figure out, combined with typical siloed IT due diligence, risk management, etc. Your vision and scope have to expand to where your risks actually are.
Looking at Cyber and TPRM Holistically
Loren: It’s interesting that when we look at TPRM, the companies that do it well make it holistic across the business. And it’s not necessarily a certain industry. You might think manufacturing, for example, has a large supply chain, and they have to be very aware of risks along the supply chain, delivery assurances, and even ESG.
It’s rare to see the whole company aware of this kind of holistic risk management, and this is where the frontline people really need to report it and say, “These are things you should consider when you’re doing TPRM.”
So, how do you empower IT people and software teams to manage and communicate these real risks? Look at what happened recently, with these companies brought down by a single point of error: the CrowdStrike update.
Getting the Right People Involved
Eric: If you’re managing your program holistically, it’s about bringing the right stakeholders together in your business into this process and not thinking in silos. Any company that has any familiarity with the development and deployment of software has software risk people in it. They’re usually software QA folks. This is what they do all day.
When I get an IT due diligence assessment, it’s still very rare that I see questions like, “When you deploy software, what are your risk controls around deployment?” “What does your quality process look like?” “How can you roll your software back effectively?” Very rarely do we see that, and those are important questions for everyone to be asking, especially now.
Aravo assessments have those questions embedded, because we’re a software company, and it occurs to us. But the reason those questions tend to be missing at other organizations is that the right stakeholders just haven’t been brought into this process.
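As a minimal sketch of what embedding those questions could look like, the snippet below checks whether an assessment covers the three software delivery topics Eric mentions: deployment controls, quality process, and rollback. The topic labels, the `Question` type, and the sample assessment are hypothetical and are not taken from Aravo’s product.

```python
from dataclasses import dataclass

# Hypothetical topic labels, derived from the three questions quoted above.
SDLC_TOPICS = {"deployment_controls", "quality_process", "rollback"}


@dataclass(frozen=True)
class Question:
    text: str
    topic: str  # illustrative tag, not a standard taxonomy


def missing_sdlc_topics(questions):
    """Return the SDLC topics that no question in the assessment covers."""
    covered = {q.topic for q in questions}
    return SDLC_TOPICS - covered


# A sample assessment that asks about access control and rollback only.
assessment = [
    Question("Do you enforce multi-factor authentication?", "access_control"),
    Question("How can you roll your software back effectively?", "rollback"),
]

gaps = sorted(missing_sdlc_topics(assessment))
print(gaps)
```

A check like this only shows that a topic is asked about; judging the quality of a third party’s answers still needs someone who understands software delivery.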
Loren: And in terms of resilience, it’s important to ask what needs to be done to get back up to speed and get going again. In the case of CrowdStrike, there were some companies that were able to get back up within a few hours and some that weren’t, and it became a big public embarrassment with finger pointing and passing the buck. There were some failures that should have and could have been avoided. There are best practice resiliency strategies you can put into place to best position your organization to quickly recover should these kinds of risk events occur.
Resiliency Strategies to Avoid Software Deployment Failures
Eric: This is a real scenario that a lot of companies could face. You probably have third-party software companies somewhere in your supply chain.
The first thing to identify is if you have software people in your company. Can you bring them into your risk program? Just have a meeting and ask them, what are the risks of building and deploying software that I should be asking our third parties about? And include those concerns when you assess the relative risk of your third-party software providers.
If you don’t have software folks in your organization, do you have risk intelligence providers? Are you using risk intelligence providers to augment the information that you have about your third party? Go and ask them the same questions you would ask an internal stakeholder, particularly if you have an IT risk intelligence provider. Questions to ask them include:
How much assessment of software development lifecycle risk are you doing?
Do you have products along these lines?
Are you asking some key questions?
Is this area of risk even covered?
And the answers you get are going to be mixed, because IT due diligence and software development risk are not quite the same thing even though they seem the same.
Finally, if you have a program provider who runs your program for you or helps you with software tools, just ask them what they have regarding software development and lifecycle risks and insulate yourself.
As a business that works with third parties, make sure you understand your software development risk. What happens if something goes down or if the SLAs aren’t met? Do you have a backup plan? If it’s a risk possibility, include it in your assessments and evaluations of third-party software provider engagements.
Don’t be in a situation like the airlines whose outages affected thousands of customers and left them unable to fly because of a single point of system failure. You need to know what actually could happen and be able to score that risk, and, notably, know how to reduce that risk and recover from it if it occurs.
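One way to make "score that risk" concrete is sketched below. It is an illustrative scheme only: the 1 to 5 scales, the doubling for a single point of failure, and the halving for a tested recovery plan are assumptions made for this example, not an Aravo or industry-standard formula.

```python
from dataclasses import dataclass

# Assumed multipliers, chosen only to illustrate the idea.
SINGLE_POINT_OF_FAILURE_FACTOR = 2.0
TESTED_RECOVERY_PLAN_FACTOR = 0.5


@dataclass
class VendorRisk:
    name: str
    likelihood: int  # 1 (rare) to 5 (frequent), assumed scale
    impact: int      # 1 (minor) to 5 (severe), assumed scale
    single_point_of_failure: bool
    has_tested_recovery_plan: bool


def risk_score(vendor):
    """Likelihood times impact, adjusted for concentration and recovery."""
    score = float(vendor.likelihood * vendor.impact)
    if vendor.single_point_of_failure:
        score *= SINGLE_POINT_OF_FAILURE_FACTOR
    if vendor.has_tested_recovery_plan:
        score *= TESTED_RECOVERY_PLAN_FACTOR
    return score


# Hypothetical vendor whose software runs across your infrastructure.
unprepared = VendorRisk("Example Vendor", 2, 5, True, False)
prepared = VendorRisk("Example Vendor", 2, 5, True, True)

print(risk_score(unprepared))  # no backup plan in place
print(risk_score(prepared))    # same exposure, with a tested recovery plan
```

The point of the adjustment terms is that the same vendor can carry a very different score depending on whether you depend on it alone and whether you have rehearsed recovery.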
Loren Johnson leads Aravo’s product marketing function, covering how Aravo builds, markets, and sells its market-leading third-party risk management solution. Driven by a passion for innovation and solving business challenges, Loren brings an international business perspective and desire to deliver measurable customer success. Loren is a long-term TPRM advocate with an MBA in International Management from Thunderbird, and more than 30 years working in the technology sector. With eight years in the GRC market, Loren brings enthusiasm and an informed perspective to his work with Aravo.
Senior Director, Product Marketing