Following on from the traffic generated through the recent Amazon outage, Alex Iskold posted this fantastic treatise over on Read/Write web.
Alex raises some excellent points that will be anathema to his enterprise readers. He says:
SLA vs. Common Sense
But maybe last week’s failure is not about clouds but about SLAs (Service Level Agreements)? If the SLA says that you will be up 99.99% of the time, then how can you go down for 3 hours? But here’s the truth about SLAs. Whatever they say, they still don’t mean that the service is not going to go down. You can’t prevent power grid outages and you cannot prevent cloud outages. You can take all the precautions and backups, but still you cannot be completely certain that failure would not occur. First order catastrophes happen.
So the problem is that we should not be looking at the SLA, but instead we need to consider common sense. It is not a single failure of the system that is indicative of the performance. It is the frequency of failures that we should look for. If AWS goes down once a year each year for 3 hours, then it is nothing short of cloud computing paradise. If this happens every quarter, it’s alarming, every month – unacceptable. The point is, as Albert Wegner explained, we need to think about this stochastically.
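To put rough numbers on the quote above, here is a minimal back-of-the-envelope sketch of the arithmetic. The function names are my own, the 99.99% figure is the hypothetical one from the quote rather than any vendor's actual SLA, and the sketch measures availability over a plain 365-day year, which real SLAs may well not do.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)


def allowed_downtime_minutes(availability_pct, period_hours=HOURS_PER_YEAR):
    """Minutes of downtime permitted by a given availability percentage."""
    return period_hours * 60 * (1 - availability_pct / 100)


def achieved_availability_pct(outage_hours, period_hours=HOURS_PER_YEAR):
    """Availability percentage actually delivered, given total outage hours."""
    return 100 * (1 - outage_hours / period_hours)


if __name__ == "__main__":
    # Downtime budget implied by a "four nines" promise over one year.
    print(f"99.99% allows {allowed_downtime_minutes(99.99):.2f} minutes per year")

    # The three scenarios from the quote: a 3-hour outage once a year,
    # once a quarter, and once a month.
    for label, outages_per_year in [("yearly", 1), ("quarterly", 4), ("monthly", 12)]:
        total_hours = 3 * outages_per_year
        pct = achieved_availability_pct(total_hours)
        print(f"3-hour outage {label}: {pct:.3f}% available")
```

On these assumptions, 99.99% leaves a budget of a little under an hour a year, so a single 3-hour outage already breaks it. Yet one such outage a year still works out at better than 99.9% availability, which is the sense in which frequency tells you more than any single failure.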
An interesting point, especially so given a recent exchange I’ve had with someone well connected with CIOs at large organisations. His perspective is very much one of risk mitigation through contractual arrangements. So it’s a case of screw down the commitments, get the vendor to buy some mega insurance policies and monitor, monitor, monitor.
Alex’s perspective is much more in tune with my own. It accepts that outages occur, but it looks at those outages in pragmatic terms. It also regards the customer/vendor relationship as one of partnership, where both sides realise the risks and importance of the service and plan/design accordingly.
What we’re seeing here goes way beyond an insource/outsource discussion – this is, I believe, the tip of the culture shift iceberg. We’re seeing a contrast between the cloud-based mentality (one of all care, all concern but shared outcome and responsibility) and that of Enterprise (all contract, all monitoring, all blame-laying).
Once the dust settled post Amazon outage, it was interesting to see people’s comments which tended to be along the lines of “well Amazon had a problem but I’m sure they’ve learnt from it and developed ways to ensure it doesn’t happen again”. I wonder what the tone would have been if the effects were primarily felt by large organisations.
So can enterprise adapt to the new paradigm? I believe they can, but it’ll take a new generation of CIOs/CTOs who understand the partnership model much better than their predecessors.
I think that “new generation” of CIOs already exists. It’s more a case of who’s doing it and who’s not.
I also agree with what you said on a new partnership model, but this will also need vendors to shift from their traditional SLA confines as well.
In order for CIOs to pursue new partnership models, vendors need to be willing to listen and change their service processes as well.
The risk mitigation process can sometimes be on the part of the vendor as well, loading themselves up with insurance to protect themselves, and then on-charging that as part of the agreement.
When something does go wrong in those situations, and the insurance backs out, vendors may find their clients less accepting of the circumstances due to the fact that they had paid a premium for that guarantee of service, or for compensation in the event of a breakdown in service.
Current SLAs mostly do have a tolerance level for disruptions. I’d imagine that it’d be part of the performance measures. However, the degree of tolerance will probably vary depending on the service.
But again, as you’ve said, Ben, the relationships between vendors and their clients have developed far beyond a simple transactional relationship to one which requires long term partnerships. I think it is fair to say that most people (both vendor and client side) understand this and understand that sometimes such relationships can be complex, but still productive and beneficial to both.