Last Christmas Eve, many Netflix users planned to bundle up with their loved ones and watch a bunch of streaming movies. Unfortunately, plans went awry.
Like millions of companies, Netflix depends upon Amazon for key pieces of its infrastructure. Specifically, Reed Hastings’s streaming service runs on Amazon Web Services. Because of an AWS service outage, presumably millions of Netflix customers couldn’t dial up their favorite movies and TV shows. (Some people actually had to talk to each other during the holidays. Perish the thought!) Kidding aside, the incident demonstrated a number of things, and this post looks at them.
Lessons from the AWS Outage
First, the notion of acceptable downtime has been with us for a long time. In my consulting career, I’ve seen different people and organizations cling to wildly different definitions of the term. While opinions vary, everyone agrees that downtime needs to be minimized for many reasons, not the least of which is security. From a 2009 Microsoft article:
Excessive downtime can result in increased exposure to malware, which can lead to many business losses, including the loss of sales, loss of customer goodwill, loss of productivity, loss of competitiveness, missed contractual obligations, and increased costs resulting from the need to make up these losses.
Today, there is no longer such a thing as acceptable downtime. What used to be considered acceptable is now excessive. Fifteen years ago, consumers, employees, partners, and suppliers were not constantly connected. Remember the quaint old days in which we used to get work done at, you know, work? I do. Marissa Mayer’s recent decision to ban remote work wouldn’t have caused such a kerfuffle in 1998. Relatively few of us worked remotely, especially in comparison to today. For years now, entire companies like WordPress.com and a few mentioned in The New Small have employed entirely distributed workforces. That is, employees don’t come to work. Ever.
Second, the AWS outage and subsequent Netflix backlash (yes, the topic was trending on Twitter) illustrate how reliant we are upon cloud services–whether we know it or not. Joe Consumer didn’t blame AWS or Jeff Bezos. To him, Netflix was down. End of story.
Finally, the AWS outage was the exception that proved the rule. Put differently, AWS sports up-time well north of 99 percent. When related services like Netflix stop working, it only underscores the fact that they are working the vast majority of the time.
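For a rough sense of what an up-time percentage means in practice, here is a minimal Python sketch that converts it into hours of permitted downtime per year. The function name and the sample percentages are my own illustrations, not figures published by AWS or Netflix, and the math assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime that a given up-time percentage permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


# Illustrative thresholds only; these are not quoted service levels.
for pct in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% up-time allows about {hours:.2f} hours of downtime per year")
```

Even 99 percent leaves room for roughly 87.6 hours (more than three and a half days) of downtime a year, which is why how far north of 99 a service sits matters so much.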
Are there legitimate concerns with the public cloud? You betcha. By the same token, though, the importance of business continuity is impossible to overstate. Ours is a truly global economy now. App downloads, Likes, website hits, and product orders take place 24/7. Against this backdrop, organizations need to constantly minimize their downtime.
What is your company doing to maximize up-time?