Tech On Tap :: Insights for Simplifying Data Management | NetApp(R) - Network Appliance
TECH ONTAP ARCHIVE - JANUARY 2008 (PDF)
Jeremy Merrill
Product and Partner Engineer, NetApp
Since joining NetApp over two years ago, Jeremy has focused exclusively on issues of data protection in both remote offices and data centers. Recently, he has worked on data protection and disaster recovery as it pertains to VMware. Jeremy has spent over seven years in the storage industry, including working extensively with backup, disaster recovery, and storage management in data center environments.
Combining Deduplication and VMware DR
Cascading Savings Improves Cost Effectiveness
By Jeremy Merrill

The shift from physical servers to consolidated, virtualized server infrastructures has had undeniable IT benefits. However, the rapid move to VMware has caused traditional approaches to disaster recovery (DR) to become outmoded and added a layer of complexity to DR implementation.

DR for VMware® Virtual Infrastructure 3 (VI3) requires that all your VMs (virtual machines) be regularly replicated to a remote site, consuming significant storage and network bandwidth. By using NetApp deduplication on your primary VMware storage systems, you can substantially reduce the amount of data in your primary storage environment. This reduction results in a cascading benefit to your downstream infrastructure, reducing the bandwidth necessary for replication and the storage necessary at the DR site.

The cost savings from deduplication can make DR feasible in situations where it would otherwise have been cost-prohibitive. For instance, one customer reported that after deduplicating his VMware Virtual Desktop Infrastructure (VDI) environment, the storage and bandwidth needed to provide DR for his desktops were relatively minor, which made adding DR feasible for his VDI environment as well as his VI3 environment.

In this article I'm going to explore what it takes to implement deduplication with VMware DR. I'll also talk about leveraging the replicated data in your DR environment for DR testing and other purposes.

Implementing Deduplication in the Primary VMware Environment

Because each virtual machine in a VMware environment requires dedicated storage for its operating system, there is a large amount of duplication. You may have numerous VMs that are all installed with more or less the same operating system and applications.

If you have 100 VMs running the same OS and each virtual machine requires 10GB to 20GB of storage, that's 1TB to 2TB of storage dedicated to almost identical copies of the same data. Applying NetApp deduplication can eliminate much of this redundancy.

In general terms, if you have X virtual machines assigned to a storage volume, after deduplication you will need approximately 1/X the amount of operating system storage you would require in a non-deduplicated environment. Obviously, the actual results you achieve will depend on how many VMs you have in a volume and how similar they are.
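The 1/X rule of thumb above can be turned into a quick estimate. The following is only a rough sketch in Python: the function name and the `unique_fraction` parameter (the share of each VM's OS image that is not common to the other VMs) are assumptions made for illustration, not part of any NetApp tool.

```python
def estimate_os_storage_gb(vm_count, os_gb_per_vm, unique_fraction=0.0):
    """Estimate OS storage (GB) in one volume before and after deduplication.

    unique_fraction is a hypothetical parameter: the share of each VM's OS
    image that is NOT common to the other VMs (0.0 means identical images).
    """
    if vm_count < 1:
        raise ValueError("vm_count must be at least 1")
    if os_gb_per_vm <= 0:
        raise ValueError("os_gb_per_vm must be positive")
    if not 0.0 <= unique_fraction <= 1.0:
        raise ValueError("unique_fraction must be between 0 and 1")

    before = vm_count * os_gb_per_vm
    shared = os_gb_per_vm * (1.0 - unique_fraction)      # kept once per volume
    unique = vm_count * os_gb_per_vm * unique_fraction   # kept once per VM
    after = shared + unique
    return {
        "before_gb": before,
        "after_gb": after,
        "savings_pct": 100.0 * (1.0 - after / before),
    }


if __name__ == "__main__":
    # 100 VMs at 10GB each: identical images, then 10% unique data per VM.
    for fraction in (0.0, 0.1):
        print(fraction, estimate_os_storage_gb(100, 10, fraction))
```

With identical images, the 100 VMs need about 10GB instead of 1,000GB, which is the 1/X case. At 10% unique data per VM the estimate rises to about 109GB, which shows how quickly the similarity of the VMs starts to dominate the result.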

In practice, customers typically see space savings of 50% or more in ESX VI3 environments, with some obtaining storage savings as high as 90%. These figures are for deduplication of the entire VMware storage environment, including application data, not just operating systems. In VDI environments, customers typically see space savings of up to 90%.

Another advantage of NetApp deduplication is that not only can it run on primary storage, it can run on any existing NetApp volume. Even if your VMware infrastructure is well established, you can run deduplication and free up significant storage space. All that is required is the deduplication license (which is free of charge) plus a NearStore® license on the target storage system.

Configuring for Disaster Recovery

While the reduction in storage use in your primary storage environment is a significant benefit by itself, the true leverage you get from deduplication becomes apparent when you implement disaster recovery with NetApp SnapMirror®. Because deduplication significantly reduces the amount of data that must be replicated, it reduces both the space needed in your DR location and the network bandwidth needed between sites. After deduplication, you may be able to configure DR over a slower link than would otherwise be possible, and it will be easier and faster to get your DR environment up and running.


Figure 1) Applying deduplication in a VMware environment with replication for DR.

To configure DR, you first deduplicate all the volumes in which your data stores reside in your primary VMware storage environment. Then you create SnapMirror relationships between your primary volumes and target volumes at your DR site.

Unlike many other replication solutions, SnapMirror does not require that the target configuration be identical to the source. You can use a different NetApp storage system and less expensive disk (e.g., SATA instead of Fibre Channel disk) in your DR site if you wish.

When SnapMirror first runs, it will sync each source volume with its target. This process is typically the most bandwidth-intensive part of SnapMirror implementation, but since the source volumes are all deduplicated, the amount of data that has to be transferred is much less than would otherwise be the case. This approach is ideal for customers who have slow links and wouldn't otherwise have the bandwidth to do the initial sync but who can manage the incremental updates that occur thereafter.
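To get a feel for what deduplication does to the initial sync, you can estimate the transfer time before and after. This is a back-of-the-envelope sketch, not a sizing tool: the function names, the decimal unit conversion, and the `efficiency` factor (the assumed usable fraction of the link) are all illustrative assumptions.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.8):
    """Hours needed to move data_gb over a link rated at link_mbps.

    efficiency is an assumed usable fraction of the nominal link rate.
    Decimal units are used: 1GB = 8,000 megabits.
    """
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    megabits = data_gb * 8 * 1000
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600.0


def initial_sync_hours(source_gb, savings_pct, link_mbps, efficiency=0.8):
    """Initial sync time for a volume whose size was cut by savings_pct."""
    if not 0.0 <= savings_pct < 100.0:
        raise ValueError("savings_pct must be in [0, 100)")
    deduplicated_gb = source_gb * (1.0 - savings_pct / 100.0)
    return transfer_hours(deduplicated_gb, link_mbps, efficiency)


if __name__ == "__main__":
    # 2,000GB of VM data over a 100Mbps link, without and with 50% savings.
    print(round(initial_sync_hours(2000, 0, 100), 1), "hours without deduplication")
    print(round(initial_sync_hours(2000, 50, 100), 1), "hours with 50% savings")
```

Because transfer time scales linearly with the amount of data, a 50% space saving on the source halves the estimated time for the initial sync over the same link.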

Note that because deduplication acts at the volume level, you must use Volume SnapMirror to get the maximum benefit. Volume SnapMirror acts on the entire volume, so your mirror always retains the same level of deduplication as the original, saving space, decreasing bandwidth utilization, and speeding up the mirror update process.

Once you complete the initial sync, you configure SnapMirror to run on a set schedule to keep your DR site up to date. In each iteration, SnapMirror transfers only the blocks that have changed, so it uses network bandwidth very efficiently.
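The idea of sending only changed blocks can be illustrated with a small conceptual model. This is not how SnapMirror is implemented internally. It is a sketch that assumes two point-in-time views of a volume, each represented as a hypothetical mapping from block number to block contents.

```python
def plan_update(previous, current):
    """Compare two point-in-time views of a volume.

    previous and current map block numbers to block contents (bytes).
    Returns the blocks that must be sent to the mirror and the block
    numbers the mirror can free.
    """
    to_send = {
        number: data
        for number, data in current.items()
        if previous.get(number) != data      # new or modified block
    }
    to_free = sorted(number for number in previous if number not in current)
    return to_send, to_free


if __name__ == "__main__":
    last_transfer = {0: b"boot", 1: b"os", 2: b"app-v1", 3: b"temp"}
    now = {0: b"boot", 1: b"os", 2: b"app-v2", 4: b"logs"}
    send, free = plan_update(last_transfer, now)
    print("blocks to send:", sorted(send))
    print("blocks to free:", free)
```

In this example only blocks 2 and 4 cross the network, and block 3 is released on the mirror. Blocks 0 and 1 are unchanged and are never re-sent.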

You will need to run deduplication periodically on the primary site. Depending on your particular needs this can be done:

  • On a schedule that you specify
  • Automatically whenever there is 20% new data in a volume
  • Manually as needed (for instance, after installing a big patch)
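The three options above amount to a simple decision rule, sketched here in Python. The class and function names are hypothetical, and I am assuming that the 20% figure is measured as new data relative to the total data in the volume. This models the decision only; it does not call any storage system interface.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

AUTO_THRESHOLD = 0.20  # the "20% new data" figure from the list above


@dataclass
class VolumeState:
    total_gb: float                            # data currently in the volume
    new_gb: float                              # data written since the last run
    next_scheduled: Optional[datetime] = None  # None means no schedule is set


def dedup_trigger(volume, now, manual_request=False, auto_enabled=True):
    """Return why deduplication should run now, or None if it should not."""
    if manual_request:
        return "manual"
    if volume.next_scheduled is not None and now >= volume.next_scheduled:
        return "schedule"
    if auto_enabled and volume.total_gb > 0:
        # Assumption: the threshold compares new data with total volume data.
        if volume.new_gb / volume.total_gb >= AUTO_THRESHOLD:
            return "auto"
    return None


if __name__ == "__main__":
    now = datetime(2008, 1, 15, 2, 0)
    print(dedup_trigger(VolumeState(500, 50), now))    # 10% new data
    print(dedup_trigger(VolumeState(500, 150), now))   # 30% new data
    print(dedup_trigger(VolumeState(500, 50), now, manual_request=True))
```

A manual request is checked first in this sketch so that an administrator can always force a run, for example right after a large patch.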

With SnapMirror, whatever happens on the primary volumes is automatically reflected on your secondary volumes, so there is no need to run deduplication at your DR site. Because your secondary volumes are mirrors, they "inherit" the deduplicated state from the primary volumes.

Leveraging Your DR Environment

Once you've got all your data at your DR site and it's being updated regularly via SnapMirror, it's still not the end of the story. NetApp also makes it possible to leverage the data stored at your DR site for DR testing, development, or a variety of other purposes.


Figure 2) Leveraging FlexClone® at the DR site allows replicated data to be used for a variety of purposes.

In a typical DR testing scenario, all the data for the test must be copied to another set of disks before testing commences. That means you need 2X the storage space right off the bat, and you've got time-consuming copies to make before you can start testing.

With NetApp FlexClone technology, you can make space-efficient, writable clones of any or all of your DR volumes; additional space is only consumed as you make changes to the cloned volume. These FlexClone volumes allow you to capture a static view of your DR data at a fixed point in time without disrupting ongoing SnapMirror updates and without requiring massive amounts of additional storage.
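The space behavior described above can be pictured with a toy copy-on-write model. This is a conceptual sketch only, with hypothetical class and method names. It does not reflect how FlexClone is implemented, but it shows why a fresh clone costs almost nothing and grows only as blocks are changed.

```python
class WritableClone:
    """Toy model of a writable clone that shares blocks with its parent."""

    def __init__(self, parent_blocks):
        # Copying the block map copies references only, not block contents,
        # and freezes the clone's view at this point in time.
        self._shared = dict(parent_blocks)
        self._own = {}  # blocks written through the clone

    def read(self, number):
        return self._own.get(number, self._shared[number])

    def write(self, number, data):
        # New data lands in the clone's own space; the parent is untouched.
        self._own[number] = data

    def extra_blocks(self):
        """Blocks consumed beyond those shared with the parent."""
        return len(self._own)


if __name__ == "__main__":
    dr_volume = {0: b"os", 1: b"app", 2: b"data"}
    clone = WritableClone(dr_volume)
    print("extra blocks after cloning:", clone.extra_blocks())

    clone.write(1, b"app-patched")   # DR test applies a patch
    dr_volume[2] = b"data-updated"   # a mirror update arrives meanwhile

    print("clone sees:", clone.read(1), clone.read(2))
    print("DR volume still has:", dr_volume[1])
    print("extra blocks after one write:", clone.extra_blocks())
```

The clone keeps its fixed point-in-time view while the DR volume continues to receive updates, and the only additional space used is the one block the test modified.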

Using FlexClone you can reduce the time it takes for DR testing from 24 hours or more down to a few hours, because the process is fast, reliable, efficient, and far less resource intensive. It's also possible to use FlexClone in a similar fashion for application development work, data mining, patch testing, and so on.

A DR site represents a substantial investment in resources. Using FlexClone, you can leverage those resources for other tasks without negatively impacting DR readiness. By simplifying DR testing, FlexClone makes it easier to meet company-mandated DR testing requirements to ensure DR readiness.

Conclusion

Applying NetApp deduplication to your primary VMware storage yields substantial benefits across both your primary and DR infrastructure. In a typical scenario, you can reduce your primary storage needs by 40% to 60%. This reduction's trickle-down effect reduces the storage needed at the DR site and the bandwidth needed for DR by a corresponding amount, making DR faster and more cost effective. With NetApp FlexClone, you can leverage the data in your DR site for DR testing, application test/dev, or other activities to maximize resource utilization.


 

RELATED INFORMATION

Engineering Perspective: Deduplication Comes of Age

At its heart, NetApp deduplication relies on the time-honored computer science technique of reference counting.

When deduplication is enabled on a volume, it computes a database of fingerprints for all of the in-use blocks in the volume (a process known as gathering). Once this initial setup is finished, the volume is ready for deduplication.

To avoid slowing down ordinary file operations, the search for duplicates is done as a separate batch process. As the file system gets updated during normal use, WAFL® creates a log describing the changes to its data blocks. This log grows until one of the following occurs:

  • The administrator issues a start command
  • The next time specified in the schedule occurs
  • The changes to the log exceed a predetermined threshold

Read the full article.
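For readers who want a concrete picture of fingerprints and reference counting, here is a minimal sketch. It is a generic illustration of the technique, not NetApp's implementation: the class name is hypothetical, SHA-256 stands in for whatever fingerprint the product uses, and duplicates are detected at write time for brevity, whereas the process described above runs as a separate batch job.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one copy of each distinct block."""

    def __init__(self):
        self._blocks = {}     # fingerprint -> block contents, stored once
        self._refcount = {}   # fingerprint -> number of references

    def write(self, data):
        """Store a block and return its fingerprint."""
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint not in self._blocks:
            self._blocks[fingerprint] = data        # first copy is kept
        self._refcount[fingerprint] = self._refcount.get(fingerprint, 0) + 1
        return fingerprint

    def release(self, fingerprint):
        """Drop one reference; free the block when none remain."""
        self._refcount[fingerprint] -= 1
        if self._refcount[fingerprint] == 0:
            del self._refcount[fingerprint]
            del self._blocks[fingerprint]

    def references(self, fingerprint):
        return self._refcount.get(fingerprint, 0)

    def physical_blocks(self):
        return len(self._blocks)


if __name__ == "__main__":
    store = DedupStore()
    os_block = b"\x00" * 4096
    first = store.write(os_block)         # VM 1 writes an OS block
    second = store.write(os_block)        # VM 2 writes the same block
    store.write(b"\x01" * 4096)           # a block unique to one VM
    print("physical blocks:", store.physical_blocks())
    print("references to shared block:", store.references(first))
```

Three logical writes occupy two physical blocks here, and the shared block is freed only after both references to it are released.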


NetApp Snapshot Copies and VMware

NetApp Snapshot™ technology is ideally suited for use with VMware. A recent Tech OnTap article explored five uses of Snapshot and other NetApp technologies derived from it in VMware environments:

  • Near-instantaneous VM backup
  • Fast and flexible VM recovery
  • Accelerated data management through cloning
  • Disaster recovery
  • Application backup and management

Learn more. Read Five Ways to Use NetApp Snapshot Copies in VMware Environments.


Choosing a Replication Option

NetApp offers several tools for data replication: SnapMirror, MetroCluster, and ReplicatorX™.

If you have only NetApp storage systems, SnapMirror or MetroCluster is the best replication solution in most circumstances. Functionality ranges from asynchronous to fully synchronous and goes beyond disaster recovery with smart copies for DR testing, test/dev, data migration, data mining, offloaded backup, and so on.

To find out more about SnapMirror, read the NetApp SnapMirror Best Practices Guide (PDF).

Choose ReplicatorX if you have third-party primary storage and you want a tiered storage solution (different, less expensive storage on the back end) or you have a variety of primary storage systems and need a solution that can cover all of them.

Also, you should choose ReplicatorX if you have federated applications (SAP, DB2 EEE, etc.) and data needs to be recovered to exactly the same point in time across multiple servers. The ReplicatorX solution has patented global clock synchronization technology that guarantees write-order consistency across multiple servers at the target location.
