The FedEx Van

Big Data — a Job For a Logistics Company?

There was a thread going round on Facebook a while back discussing the “fact” that FedEx had more bandwidth for dealing with big data than the internet itself.  It sounds strange, but if all of FedEx’s vehicles were stuffed to the max with disks full of data, the company could move more data in one day than the internet can manage.  On top of this, because storage densities are increasing faster than the capacity of the internet, it didn’t look like FedEx was going to lose this (theoretical) crown any time soon.

All very interesting — apart from the latency, of course.  Even if FedEx can do same-day delivery of the disks from London, UK to Christchurch, New Zealand or from San Francisco, USA to Johannesburg, South Africa, the latency of the data delivery (24 hours, or around 86 million milliseconds) does tend to mess up the chances of using FedEx for transactional work.

But there is a point to all this (honest, there is).

Many organizations are looking at moving data to the cloud — either for functional reasons (that’s where the app is going to be) or for information availability reasons (it’s where the archive/backup is going to be).

If the data volume is relatively small, then no problem; a simple movement of the data across the internet will generally be OK.  But what happens when your data volume is tens of terabytes?  A few petabytes?  Do you set off a backup and hope that, by the time it has finished transferring, the next synchronization of changed data isn’t so big that you end up in a continuous spiral of trying to get the data up to date?
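To put rough numbers on that question, here is a minimal sketch in Python.  The 1 Gbit/s link, the 80% efficiency factor and the 2 TB/day change rate are purely illustrative assumptions, not measurements from any real site, and the function names are made up for this example.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Days needed to push data_tb terabytes over a link of link_mbps megabits/s.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable (protocol overhead, contention and so on).
    """
    bits = data_tb * 1e12 * 8                     # decimal terabytes to bits
    usable_bps = link_mbps * 1e6 * efficiency     # usable bits per second
    return bits / usable_bps / SECONDS_PER_DAY


def backlog_after_transfer_tb(data_tb: float, daily_change_tb: float,
                              link_mbps: float, efficiency: float = 0.8) -> float:
    """Changed data (TB) that piles up while the initial copy is still running."""
    return transfer_days(data_tb, link_mbps, efficiency) * daily_change_tb


if __name__ == "__main__":
    # Assumed scenario: 1 Gbit/s link, 2 TB of changes per day.
    for size_tb in (50, 2000):   # "tens of terabytes" and "a few petabytes"
        days = transfer_days(size_tb, 1000)
        backlog = backlog_after_transfer_tb(size_tb, 2, 1000)
        print(f"{size_tb} TB: {days:.1f} days, {backlog:.1f} TB of changes waiting")
```

Under these assumptions, 50 TB takes a little under six days, while 2 PB takes well over seven months, by which time the backlog of changes is itself hundreds of terabytes.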

No.  Although data speeds on the internet are improving, they are not keeping up with a business’s capability to breed data like a virus.  This is where the logistics companies come in.

The alternative is a “data pig”.  This is a pure storage unit: a large block of disks put together to create a vault big enough for the temporary storage of large volumes of data.  The pig is dropped off at the source data center site by a logistics company, and the high-speed LAN is used to transfer the data onto the pig as if it were a standard image backup to a site storage system.  I would, of course, recommend that the data is encrypted on the pig; you don’t want all the company’s intellectual property being stolen during transportation.  As soon as the data is on the pig, our dear logistics friends rush the pig over to the target external facility, where the data is then retrieved onto the main storage systems there, again at LAN speeds.  Even from one side of the globe to the other, this should take less than 48 hours from start to finish.
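The pig's timeline is just two LAN copies plus the courier leg, which can be sketched as below.  The 10 Gbit/s LAN, the 70% efficiency factor and the 24-hour transit time are assumptions chosen for illustration, and the function names are hypothetical; whether a given job fits inside 48 hours depends heavily on the volume and on the LAN speed at each end.

```python
def copy_hours(data_tb: float, lan_gbps: float, efficiency: float = 0.7) -> float:
    """Hours to copy data_tb terabytes over a LAN of lan_gbps gigabits/s."""
    bits = data_tb * 1e12 * 8
    usable_bps = lan_gbps * 1e9 * efficiency   # assumed usable share of the LAN
    return bits / usable_bps / 3600


def pig_hours(data_tb: float, lan_gbps: float, transit_hours: float) -> float:
    """Load at the source + courier transit + unload at the target."""
    return 2 * copy_hours(data_tb, lan_gbps) + transit_hours


def effective_mbps(data_tb: float, total_hours: float) -> float:
    """Average throughput of the whole trip, in megabits per second."""
    return data_tb * 1e12 * 8 / (total_hours * 3600) / 1e6


if __name__ == "__main__":
    # Assumed scenario: 20 TB, 10 Gbit/s LAN at both ends, 24 h door to door.
    total = pig_hours(20, 10, 24)
    print(f"{total:.1f} hours end to end")
    print(f"{effective_mbps(20, total):.0f} Mbit/s average throughput")
```

With these figures the trip takes roughly 37 hours, an average of about 1.2 Gbit/s sustained for the whole journey, courier leg included.  The same model shows the limit: at 50 TB on the same LAN, the two copies alone push the total past 48 hours.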

Then, all that is needed is to synchronize the two days’ worth (or less) of changes between the two systems, which can be done via the internet in a short period of time.
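That catch-up only converges if the link can move data faster than the business changes it, because new changes keep arriving while the backlog is being sent.  A minimal sketch of the idea follows; the 1 TB/day change rate and 8 TB/day of usable link capacity are assumed figures, and `catch_up_hours` is a hypothetical helper, not part of any product.

```python
def catch_up_hours(days_behind: float, daily_change_tb: float,
                   link_tb_per_day: float, done_tb: float = 0.001,
                   max_rounds: int = 100) -> float:
    """Hours until the target is within done_tb terabytes of the source.

    Each round sends the current backlog; the changes made during that
    round become the backlog for the next one.
    """
    if daily_change_tb >= link_tb_per_day:
        raise ValueError("link cannot keep up with the change rate")

    backlog_tb = days_behind * daily_change_tb
    total_days = 0.0
    for _ in range(max_rounds):
        if backlog_tb <= done_tb:
            break
        round_days = backlog_tb / link_tb_per_day
        total_days += round_days
        backlog_tb = round_days * daily_change_tb   # changes made meanwhile
    return total_days * 24


if __name__ == "__main__":
    # Assumed scenario: pig took 2 days, 1 TB/day of changes, 8 TB/day link.
    print(f"{catch_up_hours(2, 1, 8):.1f} hours to catch up")
```

In this scenario the two systems are back in line in just under seven hours.  If the change rate equals or exceeds the link capacity, the function raises an error, which is the continuous spiral described earlier.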

I find it nice to see that old-world approaches can still make new-world technology work at times.

Image credit:  LordFerguson (flickr) – CC-BY-SA

  • bostergaard

    Clive, you do of course risk the ‘pig in a poke’ – the data you put into the pig may not 1:1 correspond with the data that gets transferred and stored in the remote DC, and because the pig is a temporary storage facility, you’ll never know – which may not be in accordance with your GRC requirements. If you need to store petabytes of data, you probably also have the means to keep your own storage ‘pigsty’. There might certainly be a competitive point here for a cloud operator – offering to pick up the petabytes of data that a customer company wants to shift to set up a hybrid cloud environment.

  • Bostergaard – a problem that comes with the majority of backup systems.  If GRC means that a verified copy is made, then there are ways of ensuring that this is the case – CRC checksums, electronic signatures, metadata tagging with encrypted tags showing an evidential trail, etc.  At the moment, I would agree that data that is under the auspices of a solid GRC requirement is more than likely to be held internally – but I also believe that this will change over the medium term.  For those who just have “a lot of data” and want to get away from the eternal chase of working out what type of internal storage to buy, when, and how much, a data pig is the best way of moving to an external, cloud-based solution – provided that the post-synchronisation is carried out in an effective manner to bring the two data stores back into line before the internal one is switched off.
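    One way to check that what went into the pig is what came out at the far end is to build a checksum manifest on each side and compare the two.  The sketch below uses SHA-256 from Python's standard library rather than a CRC, and the function names and directory layout are assumptions made for this example; it shows the verification idea only and is not a full evidential trail.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1024 * 1024) -> str:
    """SHA-256 hex digest of one file, read in chunks to keep memory flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def mismatches(source_manifest: dict, target_manifest: dict) -> list:
    """Files that are missing on either side or whose digests differ."""
    names = set(source_manifest) | set(target_manifest)
    return sorted(
        name for name in names
        if source_manifest.get(name) != target_manifest.get(name)
    )
```

    The source site would run `build_manifest` before the pig leaves and send the result separately (over the internet, since it is small); the target site runs it again after unloading, and an empty list from `mismatches` means the two copies agree.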