Here Comes The Petabyte Event Horizon

A step change is on the way for data storage, and in particular for networked storage. Technologies that have served us well for a decade or more are about to run out of steam, and the point-of-no-return looks to be when you start counting in petabytes rather than terabytes.

If there is one thing that we’ve learned about data storage, it’s that it just keeps on growing. Acquiring and managing extra capacity is therefore a major headache for anyone involved in storage management, and any number of technologies and products have been developed to help with these tasks, from de-duplication and other data reduction techniques to sophisticated storage management software.

Things have evolved massively on the hardware side too. All that storage capacity must be both protected and served up at a decent speed, so storage subsystems have moved first to the likes of RAID (redundant array of inexpensive/independent disks), with its ability to stripe and duplicate data across disk arrays, then to more advanced drive arrays with multiple parity drives, and now to even more sophisticated scale-out systems that use clustering to get past the inherent limitations of a single box.

The problem is that all this is starting to get out of hand – at least, for standard technologies. Not only are there limits to how many hard disks you can have in a RAID group, but as those disks get bigger – and we are into multiple terabytes now – it becomes harder and harder to rebuild your no-longer-redundant RAID group in a sensible time after a disk failure. And if it takes too long to rebuild, the risk of a second drive failure causing catastrophic data loss is simply too great to stomach.
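To put rough numbers on the rebuild problem, here is a back-of-envelope sketch in Python. The drive size, sustained write rate and annual failure rate used below are illustrative assumptions, not vendor figures, and the model assumes drives fail independently at a constant rate, which real drive populations do not strictly do.

```python
import math


def rebuild_hours(capacity_tb: float, write_mb_per_s: float) -> float:
    """Hours needed to rewrite one whole drive at a sustained rate.

    Uses decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes).
    """
    capacity_bytes = capacity_tb * 1e12
    seconds = capacity_bytes / (write_mb_per_s * 1e6)
    return seconds / 3600


def p_another_failure(surviving_drives: int, afr: float, hours: float) -> float:
    """Chance that at least one surviving drive fails within `hours`.

    Assumption: independent failures at a constant rate derived from the
    annual failure rate `afr` (a fraction, e.g. 0.03 for 3%).
    """
    rate_per_hour = -math.log(1 - afr) / (365 * 24)
    return 1 - math.exp(-rate_per_hour * surviving_drives * hours)


if __name__ == "__main__":
    # Assumed example: a 4 TB drive rebuilt at a sustained 100 MB/s,
    # in a group with 11 surviving drives and a 3% annual failure rate.
    window = rebuild_hours(4, 100)
    risk = p_another_failure(11, 0.03, window)
    print(f"rebuild window: {window:.1f} h, risk of another failure: {risk:.4%}")
```

Under these assumptions a 4 TB drive takes roughly 11 hours to rebuild, and the window grows in direct proportion to drive capacity unless write throughput grows with it. In practice rebuilds compete with production I/O, so real windows tend to be longer than this idealized figure.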

A popular response is de-clustered RAID, where the RAID group is distributed across dozens or even hundreds of drives. Triple parity schemes (i.e. three parity disks per array) or triple mirroring could also improve resilience during rebuild, though at the cost of additional redundant drives. Another idea is to allocate hot-spare stripes on every drive, specifically to be used for rebuilds. Writing to these spare stripes in parallel yields a much faster rebuild than writing to a single new drive; the hot-spare data is then written back to the replacement drive in due course, restoring redundancy.
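The trade-off between rebuild speed and redundancy cost can be sketched in a few lines of Python. This is an idealized model, not a description of any particular product: it assumes the lost data is spread evenly across the surviving drives' spare stripes, that writes are the only bottleneck, and that all function and parameter names are hypothetical.

```python
def single_drive_rebuild_hours(capacity_tb: float, write_mb_per_s: float) -> float:
    """Hours to rebuild onto one replacement drive (decimal units)."""
    return capacity_tb * 1e12 / (write_mb_per_s * 1e6) / 3600


def distributed_spare_rebuild_hours(
    capacity_tb: float,
    write_mb_per_s: float,
    surviving_drives: int,
    rebuild_share: float = 1.0,
) -> float:
    """Hours to rebuild into spare stripes spread over many drives.

    Each surviving drive writes an equal slice of the lost data in parallel.
    `rebuild_share` is the assumed fraction of each drive's write bandwidth
    given over to the rebuild (1.0 means all of it).
    """
    slice_tb = capacity_tb / surviving_drives
    return slice_tb * 1e12 / (write_mb_per_s * 1e6 * rebuild_share) / 3600


def usable_fraction(data_drives: int, redundant_drives: int) -> float:
    """Share of raw capacity left for data after redundancy is set aside."""
    return data_drives / (data_drives + redundant_drives)


if __name__ == "__main__":
    # Assumed example values: 4 TB drives, 100 MB/s, 100 surviving drives.
    print(f"single spare:      {single_drive_rebuild_hours(4, 100):.2f} h")
    print(f"distributed spare: {distributed_spare_rebuild_hours(4, 100, 100):.2f} h")
    # Triple parity over 8 data drives versus triple mirroring (1 copy + 2 mirrors).
    print(f"triple parity usable:    {usable_fraction(8, 3):.0%}")
    print(f"triple mirroring usable: {usable_fraction(1, 2):.0%}")
```

In this model the rebuild time falls in proportion to the number of drives sharing the work, which is the appeal of distributed spare stripes. The capacity figures show why triple mirroring is the costlier route to the same failure tolerance: only a third of the raw capacity holds unique data.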

However, while approaches such as these can extend the useful life of today’s technologies, they are not a long-term fix. For that, something genuinely different is needed, according to experts such as Dr Joseph Reger, CTO at storage developer Fujitsu. “Around 1PB, things happen that change the rules of the game, so you need a new approach for tens and hundreds of PB,” he says.

That is the motivation behind developments such as Ceph, an open source technology for distributed file, block and object storage now being developed by Red Hat – Fujitsu has adopted Ceph for its latest Eternus CD10000 clustered storage systems. Other storage developers have also come up with clustered file systems or distributed file systems in order to achieve massive scale. Just a few of the many examples are Panasas’ PanFS, IBM’s GPFS, and indeed Google’s GFS plus its clones and workalikes such as the Hadoop DFS.

Complicating matters further is the cloud-powered drive towards object storage and RESTful (representational state transfer) interfaces, rather than block or file storage. The kind of object storage systems used in the cloud can deal with petabytes or even exabytes of data pretty well, but have other limitations. Most notably, the majority of current software cannot open and edit a cloud-stored object in place: it has to download the object, work on it, and then upload it again. That makes life awkward for enterprise applications designed to use file-based storage, especially if the files involved are particularly large.
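The difference between the two access models can be illustrated with a toy example in Python. `ToyObjectStore` is a hypothetical in-memory stand-in that offers only whole-object get and put, which is an assumption made for illustration and not the API of any real cloud service; the file version uses ordinary seek-and-write.

```python
class ToyObjectStore:
    """Hypothetical object store: whole-object get and put only."""

    def __init__(self) -> None:
        self._objects = {}
        self.bytes_transferred = 0  # counts traffic in both directions

    def put(self, key: str, data: bytes) -> None:
        self.bytes_transferred += len(data)
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        self.bytes_transferred += len(data)
        return data


def patch_object(store: ToyObjectStore, key: str, offset: int, new_bytes: bytes) -> None:
    """Change a few bytes of an object: download it all, edit, upload it all."""
    data = bytearray(store.get(key))
    data[offset:offset + len(new_bytes)] = new_bytes
    store.put(key, bytes(data))


def patch_file(path: str, offset: int, new_bytes: bytes) -> int:
    """Change a few bytes of a file in place; returns the bytes written."""
    with open(path, "r+b") as f:
        f.seek(offset)
        return f.write(new_bytes)


if __name__ == "__main__":
    store = ToyObjectStore()
    store.put("report", bytes(1_000_000))  # a 1 MB object of zero bytes
    store.bytes_transferred = 0            # ignore the initial upload
    patch_object(store, "report", 10, b"abc")
    print("bytes moved to change 3 bytes:", store.bytes_transferred)
```

In this sketch a three-byte change to a 1 MB object moves the whole object twice, once down and once up, whereas the file version writes only the three bytes. The gap widens with object size, which is why large files are the awkward case.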

Here too there are interesting solutions, such as Zadara, which aims to fix the problem by using Amazon Direct Connect – this is the option to connect equipment in a nearby co-lo straight into an Amazon data center over a low-latency local link. In effect, it implements a software-defined storage approach to provide file-based access to Amazon S3 storage for today’s enterprise applications, perhaps while we develop RESTful replacements.

Essentially then, once we reach the petabyte event horizon – and as I mentioned, this is an order of magnitude “counting in PB” thing, not a hard limit – we need to re-examine our storage plans. As well as asking how much longer RAID is sustainable, we need to recognize that our other data protection schemes may also be reaching their practical limits in terms of the time and space required and the complexity involved. If we haven’t jumped already, the petabyte event horizon could be the tipping point that pushes us all to new storage architectures.

Image credit: WikiMedia Commons / CC-BY