Following on from the traffic generated by the recent Amazon outage, Alex Iskold posted this fantastic treatise over on ReadWriteWeb.
Alex raises some excellent points that will be anathema to his enterprise readers. He says:
SLA vs. Common Sense
But maybe last week’s failure is not about clouds but about SLAs (Service Level Agreements)? If the SLA says that you will be up 99.99% of the time, then how can you go down for 3 hours? But here’s the truth about SLAs. Whatever they say, they still don’t mean that the service is not going to go down. You can’t prevent power grid outages and you cannot prevent cloud outages. You can take all the precautions and backups, but still you cannot be completely certain that failure would not occur. First order catastrophes happen.
So the problem is that we should not be looking at the SLA, but instead we need to consider common sense. It is not a single failure of the system that is indicative of the performance. It is the frequency of failures that we should look for. If AWS goes down once a year each year for 3 hours, then it is nothing short of cloud computing paradise. If this happens every quarter, it’s alarming, every month – unacceptable. The point is, as Albert Wenger explained, we need to think about this stochastically.
An interesting point, especially so given a recent exchange I’ve had with someone well connected with CIOs at large organisations. His perspective is very much one of risk mitigation through contractual arrangements. So it’s a case of screw down the commitments, get the vendor to buy some mega insurance policies and monitor, monitor, monitor.
Alex’s perspective is much more in tune with my own. It accepts that outages occur, but it looks at those outages in pragmatic terms. It also regards the customer/vendor relationship as one of partnership, where both sides realise the risks and importance of the service and plan/design accordingly.
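To put rough numbers on the frequency argument in the quoted passage, here is a minimal sketch in Python. It assumes a 365-day year and that every outage lasts 3 hours, as in Alex's example; the function names are hypothetical and chosen purely for illustration, and the 99.99% figure is the one from the quote, not a statement about any vendor's actual SLA.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored (assumption)


def availability(outages_per_year, hours_per_outage=3.0):
    """Fraction of the year the service is up, given outage frequency."""
    downtime_hours = outages_per_year * hours_per_outage
    return 1.0 - downtime_hours / HOURS_PER_YEAR


def allowed_downtime_minutes(sla):
    """Minutes of downtime per year that an uptime target permits."""
    return (1.0 - sla) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    # Downtime budget implied by the 99.99% figure used in the quote.
    print(f"99.99% allows about {allowed_downtime_minutes(0.9999):.1f} minutes a year")

    # The three frequencies from the quote: yearly, quarterly, monthly.
    for label, count in [("once a year", 1), ("every quarter", 4), ("every month", 12)]:
        print(f"3 hours {label}: {availability(count):.3%} uptime")
```

On these assumptions a 99.99% target leaves under an hour of downtime a year, so even a single 3-hour outage misses it on paper. That is much of the point: the number in the contract tells you less than how often the outages actually come around.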
What we’re seeing here goes way beyond an insource/outsource discussion. This is, I believe, the tip of the culture shift iceberg. We’re seeing a contrast between the cloud-based mentality (all care, all concern, but shared outcome and responsibility) and that of the enterprise (all contract, all monitoring, all blame-laying).
Once the dust settled after the Amazon outage, it was interesting to see people’s comments, which tended to be along the lines of “well, Amazon had a problem, but I’m sure they’ve learnt from it and developed ways to ensure it doesn’t happen again”. I wonder what the tone would have been if the effects had been felt primarily by large organisations.
So can enterprises adapt to the new paradigm? I believe they can, but it’ll take a new generation of CIOs/CTOs who understand the partnership model much better than their predecessors.