The world’s data stores are growing at an alarming rate. Market researchers at IDC have estimated it at 40 percent per year, which will take us from 4.4 zettabytes in 2013 to 44 zettabytes in 2020 – a zettabyte is 10^21 bytes. That’s a thousand exabytes, or a million petabytes.
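As a rough sanity check on those figures, the compound growth arithmetic can be sketched in a few lines of Python. The only inputs are the numbers quoted above. The function names `project` and `implied_growth` are illustrative and do not come from IDC, and the sketch assumes a constant annual rate over the whole period.

```python
# Figures quoted from the IDC estimate in the text.
START_YEAR = 2013
END_YEAR = 2020
START_ZB = 4.4        # zettabytes in 2013
END_ZB = 44.0         # zettabytes forecast for 2020
ANNUAL_GROWTH = 0.40  # 40 percent per year


def project(start_zb, annual_growth, years):
    """Compound the starting volume forward at a fixed annual growth rate."""
    return start_zb * (1 + annual_growth) ** years


def implied_growth(start_zb, end_zb, years):
    """Constant annual rate needed to get from start_zb to end_zb."""
    return (end_zb / start_zb) ** (1 / years) - 1


years = END_YEAR - START_YEAR
projected_2020 = project(START_ZB, ANNUAL_GROWTH, years)
implied_rate = implied_growth(START_ZB, END_ZB, years)

print(f"40% a year for {years} years: {projected_2020:.1f} ZB")
print(f"Rate implied by 4.4 -> 44 ZB: {implied_rate:.1%}")
```

Under that assumption, 40 percent a year for seven years gives about 46 zettabytes, and a tenfold increase over seven years implies a rate of roughly 39 percent, so the two quoted figures are consistent with each other once rounding is allowed for.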
The costs for business are immense. Not only do you need to buy more storage capacity, but the data must also be protected, replicated, and so on. On top of that, it must be managed in terms of the applicable regulations and laws, which may, for example, include deleting personal data when demanded or when it is no longer relevant — and that means deleting it from all the backups and archives too. No wonder that the cost of managing storage now eclipses the cost of buying that storage hardware.
Where is all this data coming from? Some of it — and more in the future — is coming from the Internet of Things, all those Internet-enabled devices which can now feed back reports and health-checks to headquarters. Some is unnecessary duplication, with multiple copies of the same files held all over the place, often thanks to email attachments.
However, a lot of it is down to the increasingly common desire to keep everything, or as close to that as possible, on the assumption that storage is cheap. It is, although as I mentioned, storage management is not. It is therefore safer, especially in regulated industries, to keep everything. The hype around Big Data could also be helping, because the message is that you simply don’t know which data will and won’t be useful for business intelligence and analysis in the future.
Certainly, in a survey commissioned by HGST (the company founded as a merger of IBM and Hitachi’s hard disk businesses, and now owned by Western Digital), 87 percent of European data centre decision makers agreed with the statement that data has value if it can be stored, accessed, and analyzed. Yet 50 percent also said they were not storing all available data, and 75 percent wanted better analytical and storage tools.
“We think that 22 percent of the data that’s not stored has value, and we think that will jump to 37 percent,” says HGST’s senior VP and general manager Mike Gustafson. That still leaves around 70 percent as a potential waste of space and the challenge, as ever, is how to figure out which is ‘digital gold’ and which is ‘digital exhaust’. That’s going to be equally true of the data that IS stored, of course, although you have to hope that the valuable proportion here is greater than 22 percent!
So the aim for HGST and its storage cohorts is to help us store more and more, hence its announcement of a 10TB hard drive aimed at applications such as ‘active archive’, where you keep data readily accessible but on low-power, capacity-optimized drives. To help with this, the drive is filled with low-density helium to reduce the turbulence between the spinning disk and the flying read/write head, and therefore reduce both vibration and power consumption.
If you want to keep the data available but can stand a somewhat longer access time in return for lower media costs, there is always tape. The LTO Consortium recently announced an extended roadmap, taking it from today’s sixth generation capacity of 6TB of compressed data per cartridge to a tenth generation with 120TB per cartridge, although on past form we might not see Gen-10 drives until the early 2020s.
The conclusion has to be that yes, we store too much data, but that in the modern world we can’t avoid doing so. The key thing is to be aware that it is happening and that it will get worse, with all the implications that brings for storage management, archiving, and WAN bandwidth. The likes of HGST see this as an opportunity to move up the value chain, of course, adding management software and service layers to their drives and systems.
As someone wise once said, “80 percent of the Internet is garbage. The challenge is working out which 80 percent.” It’s the same with our archived data. The added problem, though, is that yesterday’s garbage — say, ephemera such as thermostat and airflow readings from your office’s environmental systems — could easily be tomorrow’s Big Data research material for a project to design the next generation of smart building technology.