Cloud outages raise questions on how to architect for resilience

There is no such thing as a fail-proof system, says Avi Shillo, and global public clouds are no different. Outages have happened and will happen again. The answer for resilience is taking a cloud-agnostic approach

It’s no secret that the cloud can be a fickle beast. Outages are all too common, and when they happen, they can cause massive disruptions for businesses. So how can you ensure that your business is safe in the cloud?

When a public cloud region goes down, everyone notices.

Since 2017, when a typo took down Amazon S3 and much of the internet with it, we have all been aware of just how fragile the cloud really is. The notorious S3 outage was so bad that Amazon couldn’t update its own status dashboard to warn the world.

While this event shook the world, smaller-scale outages are happening all the time. The past year has been no different, with a slew of outages affecting cloud vendors including Amazon Web Services, Microsoft Azure, and Google Cloud.

These incidents brought it home for many IT teams, reminding them that a single misconfiguration, a snowstorm, or an errant typo could take down yet another service and potentially their entire business.

In some cases, a single outage in a cloud vendor may be devastating for an organization, depending on its architecture and deployment choices.

With the growing number of cloud vendor outages, more businesses are concerned about what to do when a vendor service goes down. But given the wide range of applications that are generally provisioned on public clouds, finding a way to reduce the risk of failure is proving difficult.

There is no such thing as fail-proof systems. Human beings make mistakes, and sometimes unexpected events and disasters derail even the most well-thought-out strategies.

Outages are an unfortunate but inevitable aspect of cloud computing. All the cloud vendors have seen outages, some more than others, and outages will keep happening. It is a part of life.

What we’re seeing with these outages is a refocus on how organizations deploy and architect their applications. The awareness that outages are inevitable is creating a healthy tension in the market because it forces people to start thinking about how they build their software. It’s encouraging people to act more responsibly and consider resilience as a priority concern.

Welcome to the cloud-agnostic era

One avenue companies are exploring to maintain data resilience is making their applications cloud-agnostic: avoiding dependency on any single cloud vendor, and gaining the freedom to shift workloads seamlessly between cloud regions and vendors in the event of a disaster or outage.

Opting for cloud-agnostic architectures can provide enterprises with the peace of mind that their data is safe, whatever happens to one of the vendors they are working with.
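The core of a cloud-agnostic design is an abstraction layer that hides vendor-specific APIs behind a common interface, so data can be replicated across vendors and reads can fail over when one of them is down. The sketch below is illustrative only (not from the article): the in-memory backends stand in for real vendor SDKs such as boto3 or google-cloud-storage, and the class names are hypothetical.

```python
class ObjectStore:
    """Common interface every cloud backend must implement."""
    def put(self, key: str, data: bytes) -> None:
        raise NotImplementedError
    def get(self, key: str) -> bytes:
        raise NotImplementedError

class InMemoryStore(ObjectStore):
    """Stand-in for a real vendor SDK; can be flagged as 'down' to simulate an outage."""
    def __init__(self, name: str):
        self.name = name
        self._objects = {}
        self.available = True
    def put(self, key, data):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        self._objects[key] = data
    def get(self, key):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        return self._objects[key]

class MultiCloudStore(ObjectStore):
    """Writes to every backend; reads fall through to the first healthy one."""
    def __init__(self, backends):
        self.backends = backends
    def put(self, key, data):
        for backend in self.backends:
            try:
                backend.put(key, data)
            except ConnectionError:
                pass  # tolerate one vendor being down; the others hold the data
    def get(self, key):
        for backend in self.backends:
            try:
                return backend.get(key)
            except ConnectionError:
                continue  # fail over to the next vendor
        raise ConnectionError("all backends unreachable")
```

Simulating an outage shows the failover: write to two backends, mark one unavailable, and the read still succeeds from the survivor. Real systems must also deal with consistency, latency, and egress costs, which is exactly why the article calls this expensive to do well.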

While adopting a cloud-agnostic architecture sounds great in theory, it does come at a cost: the solution is neither cheap nor easy to implement. You need highly skilled IT pros and a lot of time to do it right.

It’s a very difficult proposition for a company to take a complex application that’s been around for years and retrofit it so that it can run across different clouds. The complexity and costs required can be prohibitive for many organizations. The expertise required to do this is a challenge.

Rather than having everyone scramble to build their own tooling as we go through this process of adopting multicloud for mission-critical workloads, it would be very useful if multicloud infrastructure were available as a service. Organizations need the ability to boost their resilience and adopt cloud-agnostic architectures; the next challenge is making multicloud so simple that it is available to all.

The author

Avi Shillo is CEO of Statehub.

