Data deduplication is one of the most significant technologies to impact the storage community in recent years. By identifying redundant data segments and storing only a single instance of information, this technology dramatically reduces storage space and allows more data to be protected over time. While data deduplication is growing in popularity, some are still not quite sure how best to use it.
In a contributed article over at TechRepublic, Rick Vanover does a credible job of describing what data deduplication is and the three different approaches that can be taken to achieve it. But he then wonders openly about which approach is best. He also quotes the Twitter-based personality StorageZombies, who says of data deduplication, “Deduplication is not overrated, but it is a classic case where your mileage will vary. For archive storage, definitely consider deduplication. Compression may be better for archive storage, however.”
Dedupe for Network Optimization
We believe that data deduplication has a number of uses, but one of the most groundbreaking in recent years has been its introduction to the network as a means to enhance WAN Optimization (WANop). More specifically, it has become an important tool for optimizing application performance across the WAN. By eliminating the transfer of repetitive IP traffic, deduplication significantly improves WAN utilization and accelerates data transfers between geographically dispersed locations. This reduces bandwidth costs and helps to overcome many obstacles when communicating across a WAN.
Because WAN deduplication works on all IP traffic, it plays a key role in a variety of IT initiatives, including server centralization, virtualization, and application delivery. In addition, it is essential to improving the performance and reliability of data replication, backup, and recovery across the WAN. In this respect, despite what StorageZombies has to say, WAN deduplication is actually a nice complement to storage deduplication, resulting in even higher cost savings and better Recovery Point Objectives and Recovery Time Objectives (RPOs/RTOs) across the enterprise.
Up to the Challenge
Deduplication overcomes WAN challenges that often plague common business continuity processes, including backup, replication, and disaster recovery. More specifically, this technology delivers the following benefits:
- Improves data transfer times: by delivering repetitive information from local data stores (as opposed to resending it across the WAN), WAN transfers are handled at LAN-like speeds. More advanced solutions perform data reduction on both TCP and UDP traffic, delivering significant performance improvements across a wide range of traffic types.
- Maximizes WAN efficiency: deduplication can reduce WAN traffic by as much as 99% by eliminating the transfer of duplicate information. When performed at the byte level, repetitive patterns can be detected and eliminated even when the backup/replication solution is already performing similar functions at the block level.
- Increases geographic distances: by reducing the impact of latency, enterprises can extend the distances between data centers and disaster recovery locations, increasing operational flexibility.
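To make the first two benefits concrete, here is a minimal sketch of the general idea: two endpoints keep matching stores of chunks they have already exchanged, and the sender transmits a short fingerprint in place of any chunk the receiver already holds. The names (`Sender`, `Receiver`, `wire_bytes`, `sample_payload`), the 1 KiB fixed chunk size, and the use of SHA-256 fingerprints are all assumptions made for illustration. This is not how any particular WAN optimization product works, and real appliances handle cache eviction, framing, and synchronization that the sketch ignores.

```python
import hashlib

CHUNK_SIZE = 1024  # hypothetical chunk size, chosen only for illustration


class Sender:
    """WAN-side encoder that remembers which chunks it has already sent."""

    def __init__(self):
        self.seen = set()  # fingerprints the remote peer is assumed to hold

    def encode(self, payload: bytes):
        messages = []
        for i in range(0, len(payload), CHUNK_SIZE):
            chunk = payload[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.seen:
                # Repeat data: send only the 32-byte fingerprint.
                messages.append(("ref", digest))
            else:
                self.seen.add(digest)
                messages.append(("raw", chunk))
        return messages


class Receiver:
    """Remote decoder that rebuilds the payload from its local chunk store."""

    def __init__(self):
        self.store = {}

    def decode(self, messages) -> bytes:
        out = bytearray()
        for kind, body in messages:
            if kind == "raw":
                self.store[hashlib.sha256(body).digest()] = body
                out += body
            else:
                out += self.store[body]  # served locally, not over the WAN
        return bytes(out)


def wire_bytes(messages) -> int:
    """Bytes that cross the WAN, ignoring framing overhead."""
    return sum(len(body) for _, body in messages)


def sample_payload() -> bytes:
    """16 KiB of deterministic, non-repeating test data."""
    return b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(512))
```

In this toy setup, sending the 16 KiB sample payload through one `Sender`/`Receiver` pair twice costs 16,384 bytes the first time and 512 bytes (sixteen 32-byte fingerprints) the second, a reduction of about 97% on the repeat transfer. Actual savings depend entirely on how repetitive the traffic is.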
In addition, WAN acceleration devices typically provide greater accuracy than storage devices when searching for repetitive patterns. This is because individual bytes of data are examined as opposed to blocks, which enables more repetitive patterns to be discovered, even within the same replication stream. Furthermore, when data deduplication is performed at the network layer, it works across all IP traffic, regardless of the application. Data previously sent via email, file, or web transfer will therefore immediately register as a “hit” when it is sent across the WAN again as part of a backup or replication process. In other words, the application itself may not consider the data repetitive, so data deduplication may not help from a storage standpoint. From a WAN perspective, however, it is duplicate data, and a WAN acceleration appliance will treat it as such.
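One way to see why byte-granular detection can find repeats that block-aligned detection misses is to shift a stream by a single byte. The sketch below compares fixed-size blocks against content-defined chunking, where chunk boundaries are chosen by a rolling hash over the most recent bytes. The Gear-style hash, the parameters, and every function name here are assumptions for illustration only; this is a stand-in technique, not a description of what any WAN acceleration appliance actually implements.

```python
import hashlib

# Hypothetical parameters, chosen only for illustration.
BLOCK_SIZE = 64   # fixed-block size in bytes
CUT_MASK = 0x3F   # cut when the low 6 hash bits are zero (~64-byte average)

# Deterministic per-byte table for the rolling hash (an assumption of this sketch).
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
        for b in range(256)]


def fixed_chunks(data: bytes):
    """Split at fixed offsets, as a block-aligned deduplicator would."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def content_chunks(data: bytes):
    """Split wherever a rolling hash of the most recent bytes hits a pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes shift out of the low bits, so the cut decision
        # depends only on the last few bytes, not on absolute position.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if (h & CUT_MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def repeated_fraction(old_chunks, new_chunks) -> float:
    """Fraction of the new stream's bytes found in chunks already seen."""
    known = {hashlib.sha256(c).digest() for c in old_chunks}
    hit = sum(len(c) for c in new_chunks if hashlib.sha256(c).digest() in known)
    return hit / sum(len(c) for c in new_chunks)


def sample_data() -> bytes:
    """8 KiB of deterministic, non-repeating test data."""
    return b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest()
                    for i in range(256))
```

With `original = sample_data()` and `shifted = b"X" + original`, the fixed-block comparison `repeated_fraction(fixed_chunks(original), fixed_chunks(shifted))` finds nothing to reuse, because the one-byte insertion misaligns every block. The content-defined comparison should recognize most of the shifted stream as already seen, since its boundaries resynchronize shortly after the inserted byte.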
Less is More
Data deduplication is a proven technique that improves the performance, reliability, and efficiency of data backup and recovery. By utilizing this technology in both the storage medium and across the WAN, enterprises can improve their data protection processes even more. In addition, deduplication can be combined with other WAN optimization techniques to enable a variety of other strategic IT initiatives, including server centralization, virtualization, and application delivery.
Deduplication has had a significant impact on both the networking and storage communities. And while some are still unsure of its best use, one thing is certain: when it comes to backing up and recovering data, less can actually be more.