Is Big Data a Big Waste of Money?

It seems like we’re drowning in data, and the cures, including throwing more storage at it and the relatively new concept of ‘big data’, can look worse than the problem: a case of the operation being a success but the patient dying. Yet despite the considerable challenges, the opportunities and payouts from big data can be equally considerable when it is used appropriately.

First, let’s consider some numbers. Data is doubling every two years; over the next decade, enterprises will manage 50x more data and files will grow 75x. Meanwhile, enterprise storage system expenditures will grow less than 4% per year for the next few years. And budget constraints are the biggest big-data challenge.
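To get a feel for how quickly that gap opens up, the compounding can be worked through directly. This is only a sketch: it assumes the two-year doubling and a full 4% annual budget increase both hold steady for ten years, which the figures above do not promise, and the function name is illustrative.

```python
def compound_growth(rate_per_period: float, periods: float) -> float:
    """Return the multiple reached after compounding a growth rate over some periods."""
    return (1.0 + rate_per_period) ** periods

YEARS = 10

# Doubling every two years is 100% growth per two-year period.
data_multiple = compound_growth(1.0, YEARS / 2)

# Assumption: storage spending grows by the full 4% every single year.
budget_multiple = compound_growth(0.04, YEARS)

print(f"Data after {YEARS} years:   {data_multiple:.0f}x")
print(f"Budget after {YEARS} years: {budget_multiple:.2f}x")
```

Under these assumptions, data grows 32x over the decade while the storage budget grows to roughly 1.5x its starting size.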

So what is big data? As one wag put it, big data is all about acquiring, analyzing and interpreting ridiculously huge data sets. The top data drivers include financial transactions, email, imaging data, Web logs and Internet text and documents. One source says big data starts around 30 terabytes (i.e., the equivalent of digitizing 10-15% of the Library of Congress), with others saying it’s much larger, ranging from petabytes (1000 TB) and exabytes (1000 PB) to zettabytes (1000 EB) and yottabytes (1000 ZB). They weren’t kidding about the ridiculous amounts of data.

These massive amounts of data can’t be handled by ‘normal’ processing capabilities. Handling them typically means buying expensive new platforms, servers, and storage, and either training existing staff or hiring new staff who can take advantage of big data. A shortage of talent will be another big big-data problem, according to a McKinsey Global Institute report last year. It says that by 2018 the US could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.

If big data is complex, expensive and requires a lot of people and skills that aren’t available, then why even think about it? Simple: big data can create big value. But like all the big-data predecessors – i.e., databases, data warehousing, data mining, data analytics and business intelligence – you need to know what you’re looking for, why you’re looking for it, what it’s worth to you, and how you will take advantage of it BEFORE you start. Otherwise, big data will just be a big waste of money.

 

Replicating Big Data with EMC Isilon and Silver Peak

Let’s say you were one of the lucky people who won last week’s Mega Millions jackpot (unlike this person, who has some proving to do).

Now let’s assume that the $656 million is paid out in gold ingots. (That would be about 1200 of these, or only 43 of these bad boys…)

Can you imagine how difficult it would be to store all that loot? And how would you move it around to buy/sell items, pay bills, etc.?

This is the same problem many enterprises face with Big Data. The amount of data being produced in certain industries like science, Internet retail, and finance is astronomical, making it exceptionally hard to store, access, and share in a timely manner.

While Silver Peak cannot solve all the challenges surrounding Big Data, we are trying to do our fair share. More specifically, we want to make it easier to move big data sets over wide area networks (WANs), and ensure that remote users can access this data easily and cost-effectively.

Why Big Data Is Cheaper Than Gold Bars

This involves working with our key partners, like EMC, who are instrumental in solving other Big Data challenges, like storage and backup. One of EMC’s foremost platforms for handling Big Data is Isilon, which is why we are pleased to announce that we have just completed qualification with this platform. (Isilon is now part of a long list of Silver Peak-qualified EMC products, which include RecoverPoint, SRDF, Celerra, VPLEX, Atmos, Data Domain and others.)

In this latest round of testing, Silver Peak and EMC benchmarked performance gains of over 90x for SyncIQ replication over the WAN. That includes up to 99% bandwidth reduction in several environments.

Think about that for a second…

Normally, only 10 Mbps of replication traffic can be sent over a 155 Mbps WAN with 80 ms latency and 0.1% loss. With Silver Peak, replication throughput increases to 975 Mbps over the same WAN.
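Why would a 155 Mbps link carry so little unoptimized traffic? Two textbook limits on a single TCP stream give a rough idea: the window-per-round-trip ceiling, and the Mathis et al. approximation for throughput under packet loss. The sketch below is a back-of-the-envelope estimate, not a model of the actual test. It assumes the 80 ms figure is round-trip time, a 1460-byte segment size, and a 64 KB window without window scaling, none of which come from the benchmark itself.

```python
import math

def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Ceiling when at most one window of data can be in flight per round trip."""
    return window_bytes * 8 / rtt_s

def loss_limited_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. approximation: rate ~ (MSS / RTT) * sqrt(3/2) / sqrt(loss)."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)

LINK_BPS = 155e6   # WAN link speed from the example above
RTT_S = 0.080      # assumption: the 80 ms latency is round-trip time
LOSS = 0.001       # 0.1% packet loss
MSS = 1460         # assumption: typical Ethernet-sized TCP segment
WINDOW = 65535     # assumption: 64 KB window, no window scaling

window_cap = window_limited_bps(WINDOW, RTT_S)
loss_cap = loss_limited_bps(MSS, RTT_S, LOSS)
single_flow = min(LINK_BPS, window_cap, loss_cap)

print(f"Window-limited ceiling: {window_cap / 1e6:.2f} Mbps")
print(f"Loss-limited ceiling:   {loss_cap / 1e6:.2f} Mbps")
print(f"Single-flow estimate:   {single_flow / 1e6:.2f} Mbps")

# Ratios implied by the quoted figures.
speedup = 975 / 10              # optimized vs. unoptimized throughput
wire_fraction = 155e6 / 975e6   # most that can physically cross the link
print(f"Speedup: {speedup:.1f}x, bytes on the wire at most {wire_fraction:.0%} of the original")
```

Both ceilings land in the single-digit Mbps range, the same order of magnitude as the 10 Mbps baseline, so latency and loss rather than link speed are the bottleneck. The last line also shows that pushing 975 Mbps of replication traffic through a 155 Mbps link requires removing at least roughly 84% of the bytes before they reach the wire.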

With numbers like that, you can improve RPO, reduce operational headaches, and save money on WAN bandwidth. Talk about winning the jackpot!

The Next Generation Cloud and Your Network (Video)

The wide area network, or WAN, is essential to next-generation cloud computing. This video featuring Forrester Research analyst Vanessa Alvarez looks at what is required for next-generation cloud implementations and the critical role the wide area network plays in the success of cloud deployments.

Forrester’s Vanessa Alvarez highlights the importance of automation, flexibility, and agility when enterprise organizations rely on the cloud, particularly given the large amounts of data traversing the WAN.

Automation, self-service, and manageability are needed to design the new cloud architecture, and virtual WAN optimization will play a critical role.

The cloud brings server, storage, networking, and computing together. When deploying cloud, organizations need to look at server virtualization and virtual WAN optimization as a solution.

Is “Big Iron” Needed to Move “Big Data”?

While there have been plenty of articles and buzz of late about the power of big data and even how WAN optimization will benefit big data, there’s been surprisingly little about what exactly constitutes a “Big Data” WAN optimizer. Yes, extreme scalability and resiliency are critical, but perhaps equally important is the use of off-the-shelf hardware.

However, we get ahead of ourselves. While there’s little consensus on how big big data is, most industry observers seem to agree that these databases are radically larger than what’s currently used within the organization, extending into the terabytes, and even petabytes, of information. Moving this volume of data across the WAN poses significant challenges because of both the bandwidth of the wire and the latency between the sites, explains Everett Dolgner, Silver Peak’s director of storage and replication product management.

Scaling to the necessary wire speeds is obviously going to be a challenge for many WAN optimizers. As heretical as it might sound, though, even with those performance challenges, proprietary hardware architectures have no role in the data center-class WAN optimizer. Yes, I know, there’s a perception that specialized hardware is all but synonymous with superior performance, but that’s a carryover from the early days of our industry. The processing power and I/O performance of today’s servers, coupled with well-designed software, are more than sufficient to power the data center. And since these are standard PC architectures, organizations continue to ride the PC’s price-performance curve.

Silver Peak’s NX-10K, for example, is the highest-capacity WAN optimizer in the industry: 2.5 Gbps of encrypted, optimized WAN traffic with support for an industry-leading 256,000 IP flows. With so many flows, the 10K is able to consolidate thousands of network clients or, in more exotic cases, big data updates from tens of thousands of sensors and actuators in the field. Yet open up the NX-10K or any of our appliances and you won’t find custom ASICs (Application-Specific Integrated Circuits), FPGAs (Field-Programmable Gate Arrays), or any other purpose-built silicon from Silver Peak. Just standard server hardware tuned for WAN optimization.

Besides cost savings, insisting on standard server hardware sets the stage for the next step in data center-class WAN optimization: virtualization. The same code base that runs on the commercial servers can be made to run on standard hypervisors. Our Virtual Acceleration Open Architecture (VXOA), which drives the NX-10K, for example, is also the basis of our VRX-8, the highest-capacity virtualized WAN optimizer on the market.

Virtual appliances offer other benefits beyond the lower cost of commercial iron. If your organization is like most, virtualization is probably a strategic direction in the data center. That likely means processing cycles for running a virtual appliance are plentiful, while rack space for deploying another physical appliance is hard to come by. Virtual appliances can also be moved easily between networks and sites through software. Not so with physical appliances.

Custom hardware might have played a valuable role in the early days of networking, but not today. The power of today’s commercial server iron and the adoption of virtualized appliances make big iron a bad choice for big data.

Big Data in Motion

One of the most talked-about new buzzwords in the past year is “big data.” A recent Forbes article points out that big data is not just quantity, but also includes multiple types of data. Having a lot of data sitting around doesn’t really accomplish anything; the real key to big data is being able to analyze large diverse data sets and act on the results. While WAN optimization can’t help you analyze the data, it can help you move the data to the right place as quickly as possible and with the lowest bandwidth cost.

Most of the focus on big data from storage companies concerns how to store, protect and guarantee availability. Analyzing the data is quite a bit more difficult, usually requiring clusters of servers. With large clusters commonly used in enterprise deployments, on-demand cloud computing is typically mentioned in the same breath as big data.

However, there is one problem with analyzing big data in the cloud: moving the data to the cloud.

Journey to the Cloud

Moving big data into the cloud means crossing two big hurdles: location and bandwidth. First, the farther away the cloud data center is from your site, the more latency you have to deal with and the longer your data transfer will take. Second, bandwidth matters as well, since insufficient bandwidth means the data transfer will take an excessive amount of time. Add the transfer time to the analysis time and it is possible that the resulting big data analysis will be stale by the time everything is finished.
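A quick calculation shows how easily transfer time can dominate. The sketch below is a minimal estimate that assumes a perfectly steady effective throughput and decimal units (1 TB = 10^12 bytes); the 30 TB data set and the three throughput values are illustrative inputs, and the function name is hypothetical.

```python
def transfer_time_s(size_tb: float, throughput_mbps: float) -> float:
    """Seconds needed to move size_tb terabytes at a steady throughput_mbps."""
    bits = size_tb * 1e12 * 8               # decimal terabytes to bits
    return bits / (throughput_mbps * 1e6)   # megabits per second to bits per second

SECONDS_PER_DAY = 86_400
DATASET_TB = 30   # illustrative data set size

for mbps in (10, 155, 975):   # illustrative effective throughputs
    days = transfer_time_s(DATASET_TB, mbps) / SECONDS_PER_DAY
    print(f"{mbps:>4} Mbps: {days:6.1f} days")
```

With these inputs, the same 30 TB takes about 278 days at 10 Mbps, about 18 days at 155 Mbps, and under 3 days at 975 Mbps, which is the difference between an analysis that is still relevant and one that is not.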

As Edd Dumbill from O’Reilly mentioned in a recent article, this brings us to the third hurdle in moving big data to the cloud: velocity. With an exponentially increasing amount of data coming into an organization and skyrocketing analysis requirements (think “streaming data”), incoming data must be analyzed as quickly as possible.

If you are using cloud computing to analyze your big data, and you happen to be located in the same city as the cloud data center, and you have unlimited bandwidth, you are ready to go (and probably aren’t reading this post).

If, however, like most people, you are dealing with latency and limited bandwidth, WAN optimization can help. All of the features that help businesses move and access data across a WAN also apply to big data movement into the cloud.

Protecting Big Data

Replicating big data over a WAN has the same problems as moving data into a cloud data center. Meeting any replication requirement can be difficult and, as the size of the data grows, so does the complexity. Silver Peak has a long history of optimizing replication over WAN connections, with virtual appliances that scale to 1 gigabit-per-second (Gbps) in WAN capacity and physical appliances that scale to multi-Gbps. Replicating big data is no different. If big data replication is a requirement, Silver Peak can help.

The Last Bit

Whether you are trying to move big data into the cloud or replicate it across a WAN, Silver Peak has a solution that can help. Network Acceleration can overcome the impact of latency, Network Integrity can correct lost and out-of-order packets, and Network Memory can reduce the amount of bandwidth required to move big data. Big data CAN be moved to the cloud, and Silver Peak can make it a reality.
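For readers wondering how the bandwidth needed for repetitive data can shrink so much, deduplication is one general technique: both ends remember chunks they have already exchanged, and the sender transmits a short fingerprint instead of a chunk the other side already holds. The sketch below is a generic illustration of that idea only, not a description of how Network Memory is implemented. The fixed 4 KB chunk size, the use of SHA-256, and all function names are assumptions made for the example.

```python
import hashlib

CHUNK = 4096  # hypothetical fixed chunk size in bytes

def encode(data: bytes, seen: set[bytes]) -> list[tuple[str, bytes]]:
    """Sender side: emit raw chunks the first time, 32-byte references afterwards."""
    messages = []
    for start in range(0, len(data), CHUNK):
        chunk = data[start:start + CHUNK]
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:
            messages.append(("ref", digest))   # receiver already holds this chunk
        else:
            seen.add(digest)
            messages.append(("raw", chunk))    # first sighting: send the bytes
    return messages

def decode(messages: list[tuple[str, bytes]], store: dict[bytes, bytes]) -> bytes:
    """Receiver side: remember raw chunks and resolve references from the store."""
    parts = []
    for kind, payload in messages:
        if kind == "raw":
            store[hashlib.sha256(payload).digest()] = payload
            parts.append(payload)
        else:
            parts.append(store[payload])
    return b"".join(parts)

def wire_bytes(messages: list[tuple[str, bytes]]) -> int:
    """Payload bytes that would cross the WAN (message framing ignored)."""
    return sum(len(payload) for _, payload in messages)

# Replicate the same 16 KB data set twice between one sender and one receiver.
sender_seen: set[bytes] = set()
receiver_store: dict[bytes, bytes] = {}
dataset = b"".join(i.to_bytes(4, "big") for i in range(4096))

first_pass = encode(dataset, sender_seen)
second_pass = encode(dataset, sender_seen)
restored_first = decode(first_pass, receiver_store)
restored_second = decode(second_pass, receiver_store)

print(f"First pass:  {wire_bytes(first_pass)} bytes on the wire")
print(f"Second pass: {wire_bytes(second_pass)} bytes on the wire")
```

In this toy run the first replication sends all 16,384 bytes, while the second sends only four 32-byte references. Real traffic is rarely this repetitive, and production systems have to handle details such as variable chunk boundaries and cache eviction that this sketch leaves out.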