2012-08-21

Earlier today Amazon Web Services announced Glacier, a low-cost, cloud-hosted, cold
storage solution. Cold storage is a class of storage that is discussed infrequently,
and yet it is by far the largest storage class of them all. Ironically, the storage we
usually talk about, and the storage I’ve worked on for most of my life, is the high-IOPS
storage supporting mission-critical databases. These systems today are best hosted on
NAND flash, and I’ve been talking recently about two AWS solutions to address this
storage class:

 

I/O Performance (no longer) Sucks in the Cloud

EBS Provisioned IOPS & Storage Optimized EC2 Instance Types

 

Cold storage is different. It’s the only product I’ve ever worked on where the customer
requirements are single dimensional. With most products, the solution space is complex
and, even when some customers like a competing product better for some applications,
your product may still win in others. Cold storage is pure and unidimensional. There is
really only one metric of interest: cost per capacity. It’s an undifferentiated requirement
that the data be secure and very highly durable. These are essentially table stakes
in that no solution is worth considering if it’s not rock solid on durability and
security. But the only dimension of differentiation is price/GB.

 

Cold storage is unusual because the
focus needs to be singular. How can we deliver the best price per capacity now and
continue to reduce it over time? The focus on price over performance, price over latency,
price over bandwidth actually made the problem more interesting. With most products
and services, it’s usually possible to be the best on at least some dimensions even
if not on all. On cold storage, to be successful, the price per capacity target needs
to be hit.  On Glacier, the entire project was focused on delivering $0.01/GB/Month
with high redundancy and security and to be on a technology base where the price can
keep coming down over time. Cold storage is elegant in its simplicity and, although
the margins will be slim, the volume of cold storage data in the world is stupendous.
It’s a very large market segment. All storage in all tiers backs up to the cold storage
tier, so it’s provably bigger than all the rest. Audit logs end up in cold storage, as
do web logs, security logs, seldom-accessed compliance data, and all the other data I
jokingly refer to as Write Only Storage. It turns out that most files in active
storage tiers are actually never accessed (Measurement and Analysis of Large Scale
Network File System Workloads).
In cold storage, this trend is even more extreme where reading a storage object is
the exception. But, the objects absolutely have to be there when needed. Backups aren’t
needed often and compliance logs are infrequently accessed but, when they are needed,
they need to be there, they absolutely have to be readable, and they must have been
stored securely.

 

But when cold objects are called for,
they don’t need to be there instantly. The cold storage tier customer requirement
for latency ranges from minutes, to hours, and in some cases even days. Customers
are willing to give up access speed to get very low cost. Database backups that may be
needed quickly don’t get pushed down to cold storage until they are unlikely to be
accessed. But, once pushed, it’s very inexpensive to store them indefinitely.
Tape has long been the media of choice for very cold workloads and tape remains an
excellent choice at scale. What’s unfortunate is that the scale point where tape
starts to win has been going up over the years. High-scale tape robots are incredibly
large and expensive. The good news is that very high-scale storage customers like the
Large Hadron Collider (LHC) are very well served by tape. But, over the years, the volume
economics of tape have been moving up scale and fewer and fewer customers are cost
effectively served by tape.

 

In the 80s, I had a tape storage backup system for my Usenet server and other home
computers. At the time, I used tape personally and any small company
could afford tape. But this scale point where tape makes economic sense has been moving
up.  Small companies are really better off using disk since they don’t have the
scale to hit the volume economics of tape. The same has happened at mid-sized companies.
Tape usage continues to grow but more and more of the market ends up on disk.

 

What’s wrong with the bulk
of the market using disk for cold storage? The problem with disk storage systems is
they are optimized for performance and they are expensive to purchase, to administer,
and even to power. Disk storage systems don’t currently target cold storage workloads
with the necessary fanatical focus on cost per capacity. What’s broken is that customers
end up not keeping data they need to keep or paying too much to keep it because the
conventional solution to cold storage isn’t available at small and even medium scales.

 

Cold storage is a natural cloud solution
in that the cloud can provide the volume economics and allow even small-scale users
to have access to low-cost, off-site, multi-datacenter, cold storage at a cost previously
only possible at very high scale.  Implementing cold storage centrally in the
cloud makes excellent economic sense in that all customers can gain from the volume
economics of the aggregate usage. Amazon Glacier now offers cloud storage where each
object is stored redundantly in multiple, independent data centers at $0.01/GB/Month.
I love the direction and velocity with which our industry continues to move.
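
To make the price point concrete, here is a minimal sketch of the arithmetic at $0.01/GB/Month. The function names are hypothetical, a TB is taken as 1,000 GB, and only the storage charge is modeled; nothing beyond the price quoted in this post is assumed.

```python
# Illustrative sketch only: the function names are hypothetical, and the only
# input taken from the post is the $0.01/GB/Month storage price.
PRICE_PER_GB_MONTH = 0.01  # USD per GB per month


def monthly_storage_cost(gigabytes, price_per_gb_month=PRICE_PER_GB_MONTH):
    """Storage charge in USD for holding `gigabytes` for one month."""
    if gigabytes < 0:
        raise ValueError("capacity must be non-negative")
    return gigabytes * price_per_gb_month


def retention_cost(gigabytes, months, price_per_gb_month=PRICE_PER_GB_MONTH):
    """Storage charge in USD for holding a fixed capacity for `months` months."""
    if months < 0:
        raise ValueError("months must be non-negative")
    return monthly_storage_cost(gigabytes, price_per_gb_month) * months


if __name__ == "__main__":
    # 1 TB (taken as 1,000 GB) for one month, then 100 TB kept for a year.
    print(f"1 TB for a month:  ${monthly_storage_cost(1_000):,.2f}")
    print(f"100 TB for a year: ${retention_cost(100_000, 12):,.2f}")
```

At this price, 1 TB works out to about $10 a month and 100 TB kept for a full year to about $12,000, which is why the only dimension worth arguing about is price/GB.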

 

More on Glacier:

 

· Detail Page: http://aws.amazon.com/glacier

· Frequently Asked Questions: http://aws.amazon.com/glacier/faqs

· Console access: https://console.aws.amazon.com/glacier

· Developers Guide: http://docs.amazonwebservices.com/amazonglacier/latest/dev/introduction.html

· Getting Started Video: http://www.youtube.com/watch?v=TKz3-PoSL2U&feature=youtu.be

 

By the way, if Glacier has caught your
interest and you are an engineer or engineering leader with an interest in massive
scale distributed storage systems, we have big plans for Glacier and are hiring. Send
your resume to glacier-jobs@amazon.com.

 

                                                                --jrh

 

James Hamilton

e: jrh@mvdirona.com 

w: http://www.mvdirona.com 

b: http://blog.mvdirona.com / http://perspectives.mvdirona.com



From Perspectives.
