Amazon Web Services crash: who owns the internet?

Image: public domain

In the early hours of 20 October, a simple error disrupted the operations of a data centre hub for Amazon Web Services (AWS) in Northern Virginia. The ripple effect over the next few hours caused global outages to services as diverse as banking and real-time trading, tools for office management, social media, and even advanced manufacturing. 

In all, 2,000 large companies were impacted, including a number of British banks, such as Lloyds, Halifax, and the Bank of Scotland; government software such as the US Federal Web Services and the British tax service, HMRC (His Majesty’s Revenue and Customs); as well as a plethora of heavily used apps, websites, and services, such as Amazon itself.

Europe’s car industry was also impacted by the outage. Advanced manufacturing utilises automation that depends on these web services. For example, Volkswagen’s Digital Production Platform is built on AWS and connects 43 facilities across three continents. It uses real-time analysis and machine learning to inspect quality and optimise energy usage. BMW and Mercedes-Benz are other examples.

Whilst the factory floor was not immediately shut down by these outages, the effects in terms of slips in quality control, maintenance alerts, and supply chain coordination remain to be seen. The impact will likely have been dampened, as the outage occurred outside of peak production hours.

CyberCube, a risk analytics firm, estimates that, should all affected parties decide to file claims against AWS, the total loss will be around $581 million, although the final figure may well be higher, as medium and long-term impacts become apparent. 

Fragility

The cause of the outage wasn’t a cyber attack, but an error introduced by a ‘latent defect’ in Amazon’s automated DNS management programme. (DNS, the Domain Name System, is how the internet translates a website’s name into the numerical address of the server that hosts it.)
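To make the role of DNS concrete, here is a minimal sketch of a name lookup in Python. It is a toy illustration, not Amazon’s system: the table `RECORDS`, the hostnames, the addresses, and the functions `resolve` and `connect` are all hypothetical. It shows how a missing record can make a perfectly healthy server unreachable.

```python
# Toy DNS table: hostname -> IP address. All names and addresses are
# made up for illustration (the addresses come from documentation ranges).
RECORDS = {
    "bank.example": "192.0.2.10",
    "shop.example": "198.51.100.7",
}


def resolve(hostname, records):
    """Return the IP address for hostname, or raise LookupError if no record exists."""
    try:
        return records[hostname]
    except KeyError:
        raise LookupError(f"no DNS record for {hostname}") from None


def connect(hostname, records):
    """Pretend to open a connection; returns a status string."""
    try:
        ip = resolve(hostname, records)
    except LookupError as err:
        return f"unreachable ({err})"
    return f"connected to {ip}"


if __name__ == "__main__":
    print(connect("bank.example", RECORDS))
    # A faulty automated update wipes the record. The server itself is fine,
    # but nobody can find it any more.
    broken = {k: v for k, v in RECORDS.items() if k != "bank.example"}
    print(connect("bank.example", broken))
```

In this sketch the second call fails even though nothing has happened to the server behind `bank.example`; only the record pointing to it has gone.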

AWS provides ‘on-demand cloud computing’. Instead of companies spending money to maintain their own servers – which takes space, a high level of investment in technology, and continual maintenance – companies have increasingly turned to renting portions of massive data centres on a pay-as-you-go basis. Gigantic monopolies like Amazon – which are large enough to muster up the necessary investment, and to take advantage of economies of scale – can offer this service at a much cheaper rate than individual smaller companies all creating their own systems.

Image: The Amazon Web Services (AWS) office at CityCentre Five, 825 Town and Country Lane, Houston, Texas / Tony Webster, Wikimedia Commons

AWS makes up about 30 percent of the global cloud infrastructure, giving Amazon a dominant position in the market. Combined with Microsoft and Google, which have a market share of 25 percent and 11 percent respectively, these three control around 66 percent of the entire global cloud infrastructure. In reality though, even many services that don’t directly run on AWS will have components that do. 

This is not the first time that an error in computing systems has had a ripple effect on this scale. Last year, the American cybersecurity company CrowdStrike issued a faulty update for its systems, which crashed around 8.5 million computer systems worldwide. It is estimated that this caused at least $10 billion in damages, and shut down hospitals, airports, banks, government services, and stock exchanges.

DNS errors are incredibly common; the issue was exacerbated by the fact that AWS effectively acts as the backbone for a third of the internet. Many have commented on the fragility of the market demonstrated by this outage: an error in automated code run on servers on the East Coast of the USA can shut down banks and government departments from Canada to across the Atlantic.

Amidst all the warnings of ‘fragility’, this outage has highlighted the massive potential power of the working class.

That something as mundane as an error in an automated script can cause such a rippling effect internationally poses the question: what if the workers – of which Amazon employs over 1.5 million worldwide – decided to shut this down themselves, to assert their own interests? 

Particularly in an industry that is dominated by just three monopolies, the workers hold an immense amount of potential power, reaching across oceans and implicating banks, governments, and whole industries.

Short-sighted

But this fragility is not a ‘necessary evil’ of large-scale cloud infrastructure. Most of the companies that suffered do have failsafes in place for such outages. The problem lies in the fact that many of these backups were themselves reliant on the very same AWS infrastructure.

Industry experts have described this as a lack of “visibility”. In other words, many companies aren’t aware of which of their various online services, backup plans, and monitoring tools are dependent on AWS infrastructure. That is, until there is an outage and the entire network goes down. For example, a company may go to a third party as a failsafe service provider, which in turn is reliant on AWS or another major monopoly for its own infrastructure.
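The “visibility” problem can be pictured as a map of who depends on whom. The following is a minimal sketch in Python, under invented assumptions: the service names and the `DEPENDS_ON` map are hypothetical, and a real company’s dependencies would be far larger and harder to collect. It follows each chain of dependencies to the end, so that reliance on a provider through a third party shows up too.

```python
# Hypothetical dependency map: each service lists what it directly relies on.
DEPENDS_ON = {
    "bank-app": ["payments-api", "backup-host"],
    "payments-api": ["aws"],
    "backup-host": ["status-monitor"],
    "status-monitor": ["aws"],
    "intranet": ["own-servers"],
}


def all_dependencies(service, graph):
    """Return every direct and indirect dependency of service."""
    seen = set()
    stack = list(graph.get(service, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def exposed_to(provider, graph):
    """List services that depend on provider, directly or through third parties."""
    return sorted(s for s in graph if provider in all_dependencies(s, graph))


if __name__ == "__main__":
    print(exposed_to("aws", DEPENDS_ON))
```

Here `backup-host` never mentions AWS directly, yet it is exposed through `status-monitor`. That is exactly the kind of hidden reliance that only becomes visible when the chain is followed all the way down.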

This lack of “visibility” is another way of referring to a lack of planning. The fact that Amazon is a monopoly player in the market was responsible for the widespread nature of the outage. But it is the general anarchy of the market, with web and cloud service providers working blind as to which services depend on which others, that enormously exacerbated this chaos.

Even so, there is a lot that can be done to mitigate these shocks, such as greater contingency planning, and proactively simulating system shut-downs to test for weaknesses (known as chaos engineering). Studies have shown, for example, that those companies that have invested in certain AI-assisted tools built to monitor these interdependencies have managed to reduce the impact of network outages by around 43 percent.
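The idea behind simulating shut-downs can be sketched in a few lines of Python. This is a simplified model rather than a real chaos engineering tool, which would inject failures into live systems: the services, hosts, and the `HOSTS` and `RUNS_ON` tables are all hypothetical. Each experiment ‘switches off’ one provider and reports which services go down with it.

```python
# Hypothetical setup: each service has a primary host and a backup, and each
# host ultimately runs on some infrastructure provider.
HOSTS = {
    "bank-app": ["primary-cloud", "failover-vendor"],
    "tax-portal": ["primary-cloud", "own-datacentre"],
}
RUNS_ON = {
    "primary-cloud": "aws",
    "failover-vendor": "aws",  # the backup quietly shares the same provider
    "own-datacentre": "in-house",
}


def survives(service, failed_provider):
    """A service survives if at least one of its hosts is not on the failed provider."""
    return any(RUNS_ON[host] != failed_provider for host in HOSTS[service])


def run_experiment(failed_provider):
    """Simulate one provider going down; return the services that go down with it."""
    return sorted(s for s in HOSTS if not survives(s, failed_provider))


if __name__ == "__main__":
    for provider in sorted(set(RUNS_ON.values())):
        print(provider, "->", run_experiment(provider))
```

In this made-up example, `bank-app` has a backup on paper but still fails when `aws` is switched off, because its failover vendor runs on the same provider. Running the experiment in advance exposes the weakness before a real outage does.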

Image: Jérémy Günther Heinz Jähnick, Wikimedia Commons

And yet much of this is simply not pursued, not due to oversight, but due to a focus on cutting costs and maximising returns. This isn’t a ‘baked in’ issue with large-scale cloud infrastructure, but with the profit motive and the anarchy of the market. 

The origin of the outage was a specific AWS data centre in Northern Virginia, a part of the so-called Data Centre Alley, the largest cluster of data centres in the world. That an error at Amazon’s main data centre shut down the whole network demonstrates that no plans or systems were in place for such a scenario.

The fact that a problem in one region led to a global shutdown has placed the focus on so-called ‘regional redundancy’ – that is, creating a certain amount of redundancy in different regions so that if one goes down, the whole lot doesn't go down. The problem is, building in redundancy costs money, which means higher service costs, which cuts into competitiveness and profit.
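In code, regional redundancy amounts to having somewhere else to send a request when the first choice fails. The following Python sketch assumes a hypothetical list of regions and a made-up health check; the region names are illustrative and `call_region` stands in for a real network call. It is not a description of how AWS or its customers actually route traffic.

```python
# Hypothetical regions, for illustration only.
REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-1"]


class RegionDown(Exception):
    """Raised when a region cannot serve the request."""


def call_region(region, healthy):
    """Stand-in for a real request: succeeds only if the region is healthy."""
    if region not in healthy:
        raise RegionDown(region)
    return f"served from {region}"


def serve(healthy, regions=REGIONS):
    """Try each region in order and return the first successful response."""
    for region in regions:
        try:
            return call_region(region, healthy)
        except RegionDown:
            continue
    raise RegionDown("all regions unavailable")


if __name__ == "__main__":
    # The first region is down, so the request falls through to the second.
    print(serve({"eu-west-1", "ap-southeast-1"}))
```

With only one region in the list, the same outage would leave the request with nowhere to go. The extra regions are the redundancy, and they are also the extra cost.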

Just as ‘just in time’ production in the pursuit of tighter profit margins has led to cuts to redundancy in supply chains globally, thus making those supply chains vulnerable to shocks, so the same capitalist logic has been applied to cloud services.

And just as shocks to supply chains have highlighted strategic dependence in a world riven by inter-imperialist rivalry, so too has the AWS outage.

Europe humiliated

Besides the economic impact, this will have been a humiliating reminder to the European countries of their dependence on American capitalism in this field.

Some in government, alongside tech experts, have suggested that the solution is for Britain and other European powers to build their own cloud infrastructure, and compete with the Americans. But this is an already established market, dominated by monopolies such as Amazon, Google, and Microsoft. Of the ten largest data centres on the planet, seven are in the United States, two are in China, and the only one in Europe is owned by Google.

Compared to the 66 percent market share of the three US giants, Europe collectively has a market share of just 15 percent. This is a gap that is widening further. Just four months ago, British Prime Minister Keir Starmer was boasting of a £40 billion investment commitment from Amazon. Last year, Microsoft announced a €3.2 billion investment in data centres in Germany alone, which is more than the €2 billion that the entire German cloud industry itself invests annually.

Clearly, monopolies aren’t going anywhere. In fact, it is due to the very logic of capitalism that monopolies emerge, as competition gives way to higher degrees of concentration in each industry. As explained earlier, a company like Amazon – through high levels of internal planning and coordination – can operate far more efficiently than a small company with fewer resources at its disposal. This will inevitably give it an edge over newcomers.

Monopolies contain the seed of potential economic planning. But the profit motive and the coexistence of monopolies with the anarchy of the market are a perfect cocktail for potentially catastrophic outages. This is just one more argument for expropriating these capitalists and reorganising the economy into a socialist plan!
