There was a thread going round on Facebook a while back discussing the “fact” that FedEx had more bandwidth for dealing with big data than the internet itself had. OK — it sounds strange, but if all of FedEx’s vehicles were stuffed to the max with disks full of data, it could move more data in one day than the internet can manage. On top of this, because storage densities are increasing faster than the capabilities of the internet, it didn’t look like FedEx was going to lose this (theoretical) crown any time soon.
All very interesting — apart from the latency, of course. Even if FedEx can do same-day delivery of the disks from London, UK to Christchurch, New Zealand or from San Francisco, USA to Johannesburg, South Africa, the latency of the data delivery (24 hours, or around 86 million milliseconds) does tend to mess up the chances of using FedEx for transactional work.
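To put rough numbers on the bandwidth-versus-latency trade-off, here is a minimal sketch. The 1,000 TB payload is purely a hypothetical figure chosen for illustration (it is not from the Facebook thread), and the function names are my own.

```python
def shipment_throughput_gbps(payload_terabytes: float, transit_hours: float) -> float:
    """Effective throughput of a physical shipment, in gigabits per second."""
    bits = payload_terabytes * 1e12 * 8   # decimal terabytes -> bits
    seconds = transit_hours * 3600
    return bits / seconds / 1e9


def transit_latency_ms(transit_hours: float) -> float:
    """Delivery time expressed as latency in milliseconds."""
    return transit_hours * 3600 * 1000


if __name__ == "__main__":
    # Hypothetical payload: 1,000 TB of disks on a 24-hour delivery.
    print(f"{shipment_throughput_gbps(1000, 24):.1f} Gbit/s")  # 92.6 Gbit/s
    print(f"{transit_latency_ms(24):,.0f} ms")                 # 86,400,000 ms
```

The throughput looks impressive; the latency is what rules the van out for anything transactional.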
But there is a point to all this (honest, there is).
Many organizations are looking at moving data to the cloud — either for functional reasons (that’s where the app is going to be) or for information availability reasons (it’s where the archive/backup is going to be).
If the data volume is relatively small, then no problem; a simple movement of the data across the internet will generally be OK. But what happens when your data volume is tens of terabytes? A few petabytes? Do you set off a backup and hope that, by the time it has finished transferring, the next synchronization of changed data isn’t so big that you end up in a continuous spiral of trying to get the data up to date?
No. Although data speeds are improving on the internet, they are not keeping up with a business’s capability to breed data like a virus. This is where the logistics companies come in.
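The “continuous spiral” can be sketched with a little arithmetic: an internet transfer only ever catches up if the link moves more data per day than the business creates. This is a rough sketch that assumes a fully utilized link with no protocol overhead; the function name and the example volumes and link speeds are hypothetical.

```python
import math


def catch_up_days(backlog_tb: float, link_mbps: float, daily_change_tb: float) -> float:
    """Days until an internet transfer catches up with a changing data set.

    Returns math.inf when the link cannot outpace the daily change rate.
    """
    # Link capacity in decimal terabytes per day (assumes 100% utilization).
    link_tb_per_day = link_mbps * 1e6 / 8 * 86_400 / 1e12
    net_tb_per_day = link_tb_per_day - daily_change_tb
    if net_tb_per_day <= 0:
        return math.inf  # the spiral: the backlog never shrinks
    return backlog_tb / net_tb_per_day


if __name__ == "__main__":
    # Hypothetical: 50 TB to move, 2 TB of new or changed data per day.
    print(catch_up_days(50, 1000, 2))  # 1 Gbit/s link: a bit under 6 days
    print(catch_up_days(50, 100, 2))   # 100 Mbit/s link: inf, never catches up
```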
The alternative is a “data pig”. This is a pure storage unit: a large block of disks put together to create a vault big enough for the temporary storage of large volumes of data. The pig is dropped off at the source data center site by a logistics company, and the high-speed LAN is used to transfer the data over onto the pig as if it were a standard image backup to a site storage system. I would, of course, recommend that the data be encrypted on the pig; you don’t want all the company’s intellectual property being stolen during transportation. As soon as the data is on the pig, our dear logistics friends rush the pig over to the target external facility, where the data is then retrieved onto the main storage systems there, again at LAN speeds. Even from one side of the globe to another, this should take less than 48 hours from start to finish.
Then, all that is needed is to synchronize the two days’ worth or less of changes between the two systems, which can be done via the internet in a short period of time.
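How short is “a short period of time”? A back-of-the-envelope sketch, assuming a hypothetical change rate and link speed, and ignoring any further changes made while the delta itself is being sent:

```python
def delta_sync_hours(daily_change_tb: float, transit_days: float, link_mbps: float) -> float:
    """Hours needed to send the changes that built up while the pig was in transit."""
    delta_bits = daily_change_tb * transit_days * 1e12 * 8  # decimal TB -> bits
    return delta_bits / (link_mbps * 1e6) / 3600


if __name__ == "__main__":
    # Hypothetical: 0.5 TB of changes per day, 2 days in transit, 1 Gbit/s link.
    print(f"{delta_sync_hours(0.5, 2, 1000):.1f} hours")  # 2.2 hours
```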
I find it nice to see that old-world approaches can still make new-world technology work at times.
Image credit: LordFerguson (flickr)