News came a couple of days ago that Amazon Web Services is adding AWS Glacier to its product catalog. Glacier is designed for one thing only – long-term archival of data that isn’t routinely accessed. It’s not speed optimized; what it is, is price optimized. From AWS’s blurb:

Amazon Glacier is an extremely low-cost storage service that provides secure and durable storage for data archiving and backup. In order to keep costs low, Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable. With Amazon Glacier, customers can reliably store large or small amounts of data for as little as $0.01 per gigabyte per month, a significant savings compared to on-premises solutions.

Currently much archiving occurs on expensive kit, primarily because the price of cloud storage has been too high for this type of data. By slashing the price considerably, AWS makes it viable to move archiving to the cloud.

MyPOV

There are two ways of looking at this: from a customer perspective and from the perspective of AWS itself. For customers this is a compelling offering, since it gets the cost of archival storage down to ridiculously low levels. While it is true that customers still need to think about the practicalities of getting all that data into Glacier (sending disks by FedEx, anyone?), once the data is there the pricing makes it viable to archive the ever-increasing amounts of data an organization holds cheaply and easily, and to not have to think about the infrastructure behind that ever again. There go the decades-old approaches of archiving on tape. As Lydia Leong opined:

The more interesting angle, however, is that of AWS. Being able to move significant volumes of data onto its own service helps to increase yet further the economies of scale that Amazon enjoys, allowing it to yet again compete on a cost basis with other vendors across its other services. It already enjoys some of the largest scale benefits of any cloud vendor – this drives those unit costs further south.

Of course Glacier is, as yet, unproven. AWS needs to show that it is resilient, and enterprises need to understand that Glacier is no greyhound: data stored on Glacier takes between three and five hours to access. Add to that the fact that retrieval is only free for 5% of total storage per month, after which users are charged a retrieval fee, and it becomes obvious that TCO calculations for switching to Glacier are going to be fairly complex indeed.
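To make the shape of that calculation concrete, here is a minimal sketch of a monthly bill under a deliberately simplified model. The $0.01 per gigabyte storage price and the 5% free retrieval allowance come from the figures above; the flat per-gigabyte retrieval fee (`retrieval_fee_per_gb`) is a hypothetical parameter of my own, since the actual fee schedule is not covered here and may well be structured differently. Data transfer charges are ignored entirely.

```python
def monthly_glacier_cost(stored_gb, retrieved_gb, retrieval_fee_per_gb,
                         storage_price_per_gb=0.01,
                         free_retrieval_fraction=0.05):
    """Estimate one month's bill under a simplified pricing model.

    retrieval_fee_per_gb is a hypothetical flat rate, not a published
    AWS price; the real retrieval fee may be calculated differently.
    """
    if stored_gb < 0 or retrieved_gb < 0:
        raise ValueError("sizes must be non-negative")

    # Storage is billed on everything held in the archive.
    storage_cost = stored_gb * storage_price_per_gb

    # Retrieval is free up to a fraction of total storage per month.
    free_gb = stored_gb * free_retrieval_fraction
    billable_gb = max(0.0, retrieved_gb - free_gb)
    retrieval_cost = billable_gb * retrieval_fee_per_gb

    return storage_cost + retrieval_cost
```

With 10,000 GB stored, the free allowance is 500 GB a month, so retrieving 400 GB leaves only the $100 storage charge. Retrieving 1,500 GB leaves 1,000 GB billable; at an assumed $0.10 per gigabyte that adds another $100. Even this toy model shows how quickly retrieval patterns, rather than storage volume, can come to dominate the bill.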

More interesting also is the indication from AWS that over time it will allow the seamless movement of data between S3 and Glacier storage based on data lifecycle policies. That could prove a nicely compelling little workflow aid to organizations.
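AWS has not said how such lifecycle policies will be expressed, so the following is only a guess at the simplest useful rule: archive anything that has not been touched for a set number of days. The function name, the tier labels and the 90-day default are all hypothetical and not part of any AWS API.

```python
from datetime import date


def choose_tier(last_accessed, today, archive_after_days=90):
    """Pick a storage tier from the age of the data.

    Hypothetical age-based rule: the threshold and the tier labels
    are illustrative assumptions, not AWS settings.
    """
    age_days = (today - last_accessed).days
    return "GLACIER" if age_days >= archive_after_days else "S3"
```

A real policy would presumably also need to account for the retrieval delay and fees before moving data back the other way.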

Ben Kepes

Ben Kepes is a technology evangelist, an investor, a commentator and a business adviser. Ben covers the convergence of technology, mobile, ubiquity and agility, all enabled by the Cloud. His areas of interest extend to enterprise software, software integration, financial/accounting software, platforms and infrastructure as well as articulating technology simply for everyday users.

1 Comment
  • The cost calculation will prove complex. If the data is not already in an Amazon zone, network I/O metering charges apply. “Data transferred between Amazon EC2 and Amazon Glacier across all other Regions will be charged at Internet Data Transfer rates on both sides of the transfer.”

    What eDiscovery options are available on top of Glacier? Enterprise organizations often require eDiscovery. Or must all information matching a specific time period or metadata first be transferred out of Glacier?

    Seems like Glacier provides little metadata, or even timestamps. The client is responsible for maintaining archive metadata and linking that metadata to retrieval systems: “Amazon Glacier does not support any additional metadata for the archives. The archive ID is an opaque sequence of characters from which you cannot infer any meaning about the archive. So you might maintain metadata about the archives on the client-side.”
