Update: (Mar. 2, 2017 12:10 PM PST) According to a lengthy note posted by Amazon Web Services today, the disruption was caused by an incorrect input to a removal command for the S3 system. In other words, a simple “typo” in a command caused the issue. The mistake removed a larger set of S3 servers than intended, and AWS had to go through a long restart process to bring them back online.
What happens when a service that houses a large number of websites’ cloud computing duties goes down? Well, it takes big chunks of the internet with it.
That’s what happened Tuesday afternoon, when Amazon Web Services (AWS) experienced issues that brought down or partially affected thousands of sites and apps.
AWS, the largest cloud services platform in the world, stated that it suffered “high error rates with S3 in US-EAST-1,” causing a number of its services to go dark for approximately four hours.
Thousands of websites and companies, big and small, rely on AWS cloud and storage platforms to provide the infrastructure and server backbone needed for web deployment without having to build it themselves. Sites and apps also rely on AWS for backend processes like payment transactions and security logins.
The Amazon S3 storage service hosts around 148,000 websites, mainly in the U.S. The S3 issues affected multiple sites across the country, particularly the ones located on the East Coast.
As we reported yesterday, some of the websites affected were:
Airbnb, Down Detector, Freshdesk, Pinterest, SendGrid, Snapchat’s Bitmoji, Time, Buffer, Business Insider, Chef, Citrix, CNBC, Codecademy, Coursera, Cracked, Docker, Expedia, Expensify, Giphy, Heroku, Home Chef, iFixit, IFTTT, isitdownrightnow.com, Lonely Planet, Mailchimp, Medium, Microsoft’s HockeyApp, News Corp, Quora, Razer, Slack, Sprout Social, Travis CI, Trello, Twilio, Unbounce, the U.S. Securities and Exchange Commission (SEC), and Zendesk.
The outage started at about 9:45 a.m. PST and, according to the AWS Service Health Dashboard, the S3 service fully recovered at 1:49 p.m. PST.
Amazon still has not specified the reason or divulged any details concerning the cause of the S3 storage system issue, but it stated in a later AWS tweet that it believes it understands the root cause.
For S3, we believe we understand root cause and are working hard at repairing. Future updates across all services will be on dashboard.
— Amazon Web Services (@awscloud) February 28, 2017
Although Tuesday’s AWS issue may have caused only slight inconveniences for companies (including us) and their web operations, it illustrates how putting a large portion of the web in a few “baskets,” so to speak, can have disastrous consequences.
We have seen wide disruptions caused by problems at large web service providers, such as last year’s distributed denial-of-service attack on Dyn’s DNS servers and, most recently, the Cloudflare “Cloudbleed” bug, both of which affected thousands of sites and millions of people globally.
With this increasing reliance on a handful of web hosting platforms, we can’t help but think that the cyberspace equivalent of “The Big One” may be lurking just around the corner.