AWS re:Invent — Big Data and HPC in the Cloud

Right before Silver Peak released its own solution for Amazon Web Services on Dec 3rd, I had the chance to go to Las Vegas to attend AWS re:Invent — the first global partner and customer conference for AWS users.

During the three-day event there were many new product launches, customer success story talks, technical deep-dives, and AWS best-practice sessions. If there was one message AWS wanted everyone to take away from the conference, it was that AWS is technically and financially mature enough for you to migrate your data center to, whether you are in the public, financial, education, technology, or any other sector. During the first-day keynote, AWS focused on customer success stories and on how customer feedback drives new services and enhancements to existing ones. With NASA, Netflix, Nasdaq, and SAP sharing their stories of using AWS, the keynote addressed usability, scalability, and security for a wide range of use cases, such as big data, web hosting, application hosting, and content distribution. A similar format was used for the breakout sessions: the first half of a session was led by an AWS technologist who introduced a specific AWS service or feature, and in the second half, one or more customers demonstrated how using that service benefited their business.

One AWS use case that really stood out for me was big data and high performance computing (HPC). AWS showcased the cloud, with its cheap and abundant resources, as an ideal platform for hosting big data and HPC workloads: public clouds let an enterprise scale easily and replace large CAPEX with smaller OPEX. AWS has many interesting features and services around big data and HPC:

  • Computation: Netflix attested to AWS’s computation capability. Until 2008 it relied on its on-premises data center to transcode its digital catalog, and because those servers could not scale to the required levels, it missed an online service release date. Within a year Netflix moved to AWS, and it has not missed a deadline since, with effectively unlimited computation power from EC2 for transcoding its global library.
  • Storage: The cloud needs to store the great volumes of big data and provide the high IOPS that HPC demands. AWS expanded beyond its home-grown storage technologies, such as S3, Storage Gateway, and Redshift, with NetApp’s private storage for AWS, announced at the conference. NetApp’s private storage works with EC2 and Elastic MapReduce, enhancing their performance and reliability.
  • Data Processing: AWS offers DynamoDB, Elastic MapReduce, and RDS to address how big data should be stored, accessed, retrieved, and used. These services save enterprise IT from developing and operating these complex systems itself. Yelp, Earth Networks, and New York Times Digital were the customers that shared their experiences and success stories with these AWS technologies.
  • Data transport: For the cloud to host big data and HPC, the data first needs to make its way to the cloud, in most cases crossing lossy, high-latency, and expensive WAN links. AWS has worked hard to reduce this data transfer latency and the associated costs: it offers private line connections to its data centers and has removed the data transfer fee for inbound data. Another important technology in this area is WAN optimization. Traditionally it has been deployed as a physical appliance in data centers and remote offices, accelerating traffic across the WAN. Now, with big data and HPC moving into AWS and other public clouds, WAN optimization has to deliver high capacity (e.g. hundreds of Mbps to Gbps) in virtual appliance form.
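As a rough illustration of the programming model behind Elastic MapReduce, here is a minimal, local word-count sketch in Python. The function names and input data are hypothetical; a real job would run the map and reduce phases across an EMR cluster, typically reading input from and writing results to S3.

```python
from collections import defaultdict

def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce step: sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Hypothetical input; on a cluster these lines would be sharded
# across many mappers, and the pairs shuffled to many reducers.
lines = ["big data in the cloud", "big data needs the cloud"]
print(reduce_phase(map_phase(lines)))
```

The appeal of the model is that both phases are embarrassingly parallel, which is exactly what lets a service like Elastic MapReduce scale a job out over cheap, abundant instances.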

Amazon has gained a unique position in the public cloud arena with its innovations, rich service offerings, and low-margin, high-volume business model. It will be interesting to watch how AWS’s competitors catch up and challenge that position; it looks like a difficult task for at least the next 1-2 years. For all of us, this competition will push public cloud computing forward and transform IT. It’s just a matter of time now.
