Five Ways to Use NetApp Snapshot™ Copies with VMware VI3
By Mike Slisinger
Using VMware Infrastructure 3 (VI3) with NetApp storage has proven to be a topic of keen interest to Tech OnTap readers. A recent spot survey on VMware and NetApp identified backup and restore as the area of greatest interest, closely followed by high availability and disaster recovery. The use of NetApp Snapshot copies with VMware figured prominently in your comments, serving as the impetus for this article.
Like Microsoft (via VSS) and Oracle, VMware offers native snapshot functionality that can capture previous states of data. One of the key benefits of VMware Infrastructure 3 is the ability to consolidate multiple servers onto a single physical server as independent virtual machines (VMs). On a consolidated host every resource counts, so offloading copy services to a NetApp array frees the resources that ESX snapshots would otherwise consume and lets you allocate them to additional VMs.
NetApp Snapshot technology represents an ideal complement to the native snapshot capabilities built into VMware. Because NetApp Snapshot copies occur on the storage system, actions such as backup and restore have no impact on server resources or server performance.
Additionally, NetApp Snapshot technology can protect any datastore used by VMware in a single operation. For instance, if a VMFS datastore contains multiple virtual machines, one NetApp Snapshot copy captures the state of all of them at once.
Although these advantages are important, the key differentiator of NetApp Snapshot copies is efficiency. Unlike competing snapshot technologies, which use copy-on-write algorithms or make complete data copies, NetApp Snapshot technology consumes very little space and very few storage system resources.
Figure 1) A comparison of snapshot technologies.
The NetApp WAFL® file system never overwrites existing blocks in place. When data changes, WAFL writes the new data to free blocks and simply keeps the original blocks, which preserves a consistent, point-in-time view. This means that additional storage is consumed only as changes are made, and there is virtually no overhead on storage system processors, buses, and so on, even when hundreds of Snapshot copies are retained. For more details, read the NetApp technical report A Storage Networking Appliance.
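The difference between the approaches compared in Figure 1 can be seen in a toy model. The sketch below is a deliberate simplification, not WAFL code: it treats a volume as a flat map from logical to physical blocks and copies that whole map for each snapshot (real WAFL duplicates only the root of its block tree). It counts physical writes for copy-on-write, which copies an old block aside before overwriting it, and for a never-overwrite design, which writes changes to new blocks and leaves the originals in place for the snapshot. All class names are hypothetical.

```python
# Toy model for illustration only; this is not NetApp or WAFL code.
# It counts physical block writes to contrast two snapshot designs.

class CopyOnWriteVolume:
    """Overwrites blocks in place; supports one snapshot for simplicity."""

    def __init__(self, data):
        self.blocks = dict(enumerate(data))  # live data, updated in place
        self.saved = {}                      # old contents preserved for the snapshot
        self.writes = 0

    def snapshot(self):
        self.saved = {}

    def write(self, lbn, value):
        if lbn not in self.saved:
            # Extra I/O: copy the old block aside before overwriting it.
            self.saved[lbn] = self.blocks[lbn]
            self.writes += 1
        self.blocks[lbn] = value
        self.writes += 1

    def read_snapshot(self, lbn):
        return self.saved.get(lbn, self.blocks[lbn])


class WriteAnywhereVolume:
    """Never overwrites a block; a snapshot is a frozen copy of the block map."""

    def __init__(self, data):
        self.physical = list(data)                            # append-only store
        self.active = {lbn: lbn for lbn in range(len(data))}  # logical -> physical
        self.snapshots = []
        self.writes = 0

    def snapshot(self):
        # Copies pointers only; no data blocks are read or written.
        self.snapshots.append(dict(self.active))

    def write(self, lbn, value):
        self.physical.append(value)  # new data goes to a free block
        self.active[lbn] = len(self.physical) - 1
        self.writes += 1

    def read(self, lbn, snap=None):
        table = self.active if snap is None else self.snapshots[snap]
        return self.physical[table[lbn]]


data = ["a", "b", "c", "d"]
cow, wa = CopyOnWriteVolume(data), WriteAnywhereVolume(data)
for vol in (cow, wa):
    vol.snapshot()
    vol.write(1, "B")
    vol.write(2, "C")

print(cow.writes, wa.writes)           # 4 2
print(wa.read(1), wa.read(1, snap=0))  # B b
print(cow.read_snapshot(1))            # b
print(len(wa.physical) - len(data))    # 2 (space grows only with changes)
```

Both volumes end up holding the same data, but the copy-on-write volume performs an extra write for every block it protects, while the never-overwrite volume's extra space is exactly the changed blocks.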
This article examines five ways in which IT teams leverage NetApp Snapshot technology and related NetApp technologies in their VMware environments to obtain:
- Near instantaneous backup
- Fast and flexible virtual machine recovery
- Accelerated data management
- Robust disaster recovery
- Application data protection and management
#1 Near Instantaneous Backup of Virtual Machines
Backing up VMware servers can be a challenging proposition because of the sheer number of virtual machines (dozens to hundreds) that must be protected. In addition, a virtualized server environment has fewer total I/O resources than a physical server environment, because many virtual machines share the I/O paths of a few physical hosts. This means that you probably need to change the way you do backups when you move to a virtual environment, which is why it helps to offload backups onto storage systems wherever possible.
Because NetApp Snapshot copies don't actually move any data, they are created almost instantaneously. Users typically retain a specified number of Snapshot copies on their storage systems for convenient restore. Snapshot copies can also serve as the basis for backups to tape, and SnapVault®, the NetApp disk-to-disk backup solution, can move Snapshot copies to a secondary disk storage system for greater data protection.
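Retaining "a specified number" of Snapshot copies amounts to a rolling retention policy. The sketch below models one; the class and the snapshot names are hypothetical and do not correspond to any Data ONTAP command or API.

```python
from collections import deque


class SnapshotRetention:
    """Keep only the newest `keep` snapshot names (hypothetical helper)."""

    def __init__(self, keep):
        self.keep = keep
        self.copies = deque()  # oldest on the left, newest on the right

    def take(self, name):
        """Record a new snapshot and return the names that expire."""
        self.copies.append(name)
        expired = []
        while len(self.copies) > self.keep:
            expired.append(self.copies.popleft())
        return expired


policy = SnapshotRetention(keep=3)
deleted = []
for day in range(1, 6):
    deleted += policy.take(f"nightly-day{day}")

print(list(policy.copies))  # ['nightly-day3', 'nightly-day4', 'nightly-day5']
print(deleted)              # ['nightly-day1', 'nightly-day2']
```

Because Snapshot copies consume space only for changed blocks, the retention count is mostly a trade-off between how far back you can restore and how much changed data you keep.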
Loyola Marymount University (LMU) recently moved to a virtualized server and storage environment with VMware and NetApp to overcome server sprawl and limits on power, cooling, and real estate, and to improve availability for critical applications.
Figure 2) A virtualized server-to-storage environment.
LMU's VMware environment consists of nine ESX 3.0.1 hosts in three clusters connected to a NetApp FAS3050 storage system via iSCSI. Storage for each virtual machine is provided by one or more RDM (raw device mapping) LUNs. LMU uses NetApp Snapshot technology to protect its VMware environment, citing such reasons as:
- Zero downtime
- Zero CPU load on host
- No need for backup agents
LMU developed its own script to automate backups and is working with NetApp to document its implementation. This script uses the VMware snapshot capability to quiesce the virtual machine so that a consistent NetApp Snapshot copy can be created. You can find out more about LMU's VMware environment in a recent NetApp/VMware Webcast. You can learn more about the use of NetApp Snapshot technology for VMware data protection by reading the NetApp technical report Best Practices for NetApp and VMware ESX Server 3.0.
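LMU's script itself is not shown here, so the following is only a sketch of the sequence described above, under stated assumptions: each VM is quiesced by taking a VMware snapshot, a single NetApp Snapshot copy is then taken of the volume, and the VMware snapshots are removed afterward (that final step is an assumption, since the text does not describe it). The three callables are hypothetical hooks standing in for whatever VMware and storage tooling a real script would call; no real API is implied.

```python
from typing import Callable, Iterable, List


def backup_with_quiesce(
    vms: Iterable[str],
    vmware_snapshot: Callable[[str], None],         # hypothetical hook
    vmware_remove_snapshot: Callable[[str], None],  # hypothetical hook
    array_snapshot: Callable[[str, str], None],     # hypothetical hook
    volume: str,
    snap_name: str,
) -> None:
    """Quiesce each VM via a VMware snapshot, take one storage-side
    Snapshot copy of the volume, then remove the VMware snapshots."""
    quiesced: List[str] = []
    try:
        for vm in vms:
            vmware_snapshot(vm)  # assumed to quiesce the guest before returning
            quiesced.append(vm)
        array_snapshot(volume, snap_name)  # one copy covers every VM on the volume
    finally:
        # Remove VMware snapshots even if a step above failed, so that
        # their delta files do not keep growing on the datastore.
        for vm in quiesced:
            vmware_remove_snapshot(vm)


# Usage with stand-in hooks that just record the call order.
log = []
backup_with_quiesce(
    ["vm1", "vm2"],
    vmware_snapshot=lambda vm: log.append(("quiesce", vm)),
    vmware_remove_snapshot=lambda vm: log.append(("release", vm)),
    array_snapshot=lambda vol, name: log.append(("array", vol, name)),
    volume="vmfs_vol1",
    snap_name="nightly",
)
print(log)
```

Passing the hooks in as parameters keeps the ordering logic testable without a live ESX host or storage system; the `try`/`finally` is the important design choice, because a failed storage step should not leave VMs running on open VMware snapshots.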
#2 Fast and Flexible Virtual Machine Recovery
Recovery is a critical issue in any data center environment. According to Gartner Group (2000), 40% of application downtime is caused by operator errors, another 40% by application errors, and only 20% by technology failures.
If you have dozens or hundreds of virtual machines, problems requiring recovery are almost a certainty. As the figures above suggest, most restores are not due to hardware failure. Instead, they involve an accidental deletion or an event in which a virtual machine instance becomes corrupted. Protecting your VMware environment with NetApp Snapshot technology ensures that you have multiple restore options on the existing storage system:
- Use NetApp SnapRestore® to revert to the latest (or most appropriate) saved Snapshot copy. This effectively resets your environment to the state it was in at the time the copy was created.
- Use NetApp FlexClone® to clone the Snapshot copy and create a writable version, mount the resulting clone, and extract the broken or missing component you need. For instance, if a volume contains multiple virtual machines, this method can restore a single one without affecting the others.
To return to the example discussed in the previous section, LMU primarily uses the second approach. The Snapshot copies that LMU creates for backup purposes include multiple virtual machines. Using FlexClone, LMU can quickly create a writable version of the RDM used by a particular virtual machine and return it to service, as illustrated in the recent VMware Webcast. LMU reports that the process is nearly instantaneous (reducing restore time from hours to minutes) and easily automated.
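The trade-off between the two restore options can be shown with a toy model in which one volume holds several VMs. The function and VM names below are hypothetical, and the sketch shows only which data ends up where, not how SnapRestore or FlexClone work internally or how much space they use.

```python
# Toy model: a volume maps each VM name to the state of its disk image.
# Illustrates restore semantics only; it does not model space or real APIs.

def snaprestore_volume(snapshot: dict) -> dict:
    """Whole-volume revert: every VM on the volume returns to the snapshot state."""
    return dict(snapshot)


def restore_one_vm(active: dict, snapshot: dict, vm: str) -> dict:
    """Clone-and-extract: copy one VM out of a writable clone of the snapshot."""
    clone = dict(snapshot)   # stands in for a FlexClone of the Snapshot copy
    restored = dict(active)  # other VMs keep their current state
    restored[vm] = clone[vm]
    return restored


snap = {"web01": "v1", "db01": "v1"}
active = {"web01": "corrupted", "db01": "v2"}  # db01 has valid newer changes

print(snaprestore_volume(snap))               # {'web01': 'v1', 'db01': 'v1'}
print(restore_one_vm(active, snap, "web01"))  # {'web01': 'v1', 'db01': 'v2'}
```

Reverting the whole volume also discards db01's legitimate newer changes, which is why extracting a single VM from a clone is attractive when a volume is shared by many VMs.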
#3 Accelerated Data Management Using Cloning
As mentioned in the previous section, NetApp FlexClone is a useful tool for recovery in VMware environments. The FlexClone capability is built on two unique features of NetApp Data ONTAP®: Snapshot copies and flexible volume technology (FlexVol™ volumes). A FlexVol volume is a virtual and dynamic storage entity that can grow or shrink without being constrained by the characteristics of underlying physical storage. FlexVol volumes are built within aggregates (an underlying set of disks defined by the storage administrator on a NetApp storage system). Any FlexVol volume automatically spreads its data across all spindles in its underlying aggregate, so even a small volume benefits from the I/O capacity of every disk in the aggregate.
Figure 3) NetApp FlexVol technology.
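The relationship between volumes and aggregates can be sketched as a toy model. Everything here is a hypothetical simplification rather than Data ONTAP behavior: each volume's size is accounted against the aggregate's total capacity (as if every volume reserved its full size), and newly written blocks are placed round-robin across all disks, which stands in for WAFL's real allocation policy.

```python
class Aggregate:
    """Toy pool of disks shared by FlexVol-style volumes (illustrative only)."""

    def __init__(self, disks: int, blocks_per_disk: int):
        self.disks = disks
        self.capacity = disks * blocks_per_disk
        self.volumes = {}  # volume name -> size in blocks
        self._next_disk = 0

    def reserved(self) -> int:
        return sum(self.volumes.values())

    def resize(self, name: str, size: int) -> None:
        """Create, grow, or shrink a volume; only the pool total is checked."""
        others = self.reserved() - self.volumes.get(name, 0)
        if others + size > self.capacity:
            raise ValueError("not enough free space in aggregate")
        self.volumes[name] = size

    def place_blocks(self, count: int) -> list:
        """Spread newly written blocks round-robin across every disk."""
        placement = []
        for _ in range(count):
            placement.append(self._next_disk)
            self._next_disk = (self._next_disk + 1) % self.disks
        return placement


aggr = Aggregate(disks=8, blocks_per_disk=1000)
aggr.resize("vm_small", 100)
aggr.resize("vm_small", 400)  # grow in place; no disks are added or moved
aggr.resize("vm_big", 7000)
print(aggr.reserved())                    # 7400
print(sorted(set(aggr.place_blocks(8))))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

In this model, resizing a volume changes only bookkeeping, and even a handful of writes from a small volume lands on every disk in the aggregate.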
FlexClone uses the same mechanism as Snapshot technology to create writable clones of FlexVol volumes (and of the LUNs they contain). As with Snapshot copies, clones can be created nearly instantaneously with effectively no storage overhead, because they share the same underlying storage blocks as the baseline volume or LUN. As the baseline and the clone diverge (for example, due to updates to the clone's data blocks), the new blocks are written to disk, so the clone consumes additional space only for the changed blocks.
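The block sharing described above can be illustrated with a toy model in which a parent volume and its clone point into one shared, append-only block store. This is a sketch, not FlexClone's implementation: here the clone copies the parent's current block map, standing in for the Snapshot copy a real FlexClone volume is based on, and all class and function names are hypothetical.

```python
class BlockStore:
    """Append-only physical blocks shared by a parent volume and its clones."""

    def __init__(self):
        self.blocks = []

    def put(self, data) -> int:
        self.blocks.append(data)
        return len(self.blocks) - 1


class Volume:
    def __init__(self, store, block_map=None):
        self.store = store
        self.map = dict(block_map or {})  # logical block -> physical block

    def write(self, lbn, data):
        self.map[lbn] = self.store.put(data)  # never overwrite in place

    def read(self, lbn):
        return self.store.blocks[self.map[lbn]]

    def clone(self):
        """Writable clone: copies pointers only and shares all data blocks."""
        return Volume(self.store, self.map)


def unique_blocks(vol, base):
    """Physical blocks used by `vol` but not by `base`: the clone's own footprint."""
    return set(vol.map.values()) - set(base.map.values())


store = BlockStore()
base = Volume(store)
for i in range(1000):
    base.write(i, f"block{i}")

lab_clone = base.clone()
print(len(unique_blocks(lab_clone, base)))  # 0: a new clone shares every block
lab_clone.write(7, "patched")
lab_clone.write(8, "patched")
print(len(unique_blocks(lab_clone, base)))  # 2: only diverged blocks cost space
print(base.read(7), lab_clone.read(7))      # block7 patched
```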
Naturally, this ability to create space-efficient clones of important data volumes is valuable in any server environment. FlexClone volumes are frequently used in database environments to create copies of production databases for test, development, QA, and so on. In addition to the recovery example discussed earlier, there are several additional uses of FlexClone that are unique to virtual environments:
- Clone a production virtual machine for testing. Have you ever wanted to test something with production data? With FlexClone, you can instantly clone an entire VMFS datastore and use the cloned volume in your lab environment to duplicate the production server without scrounging up double the storage or waiting for time-consuming data copies.
- Create a gold image of a virtual machine configuration and clone it to provision new virtual machines for permanent or temporary use. This works best when you are using an RDM LUN for each virtual machine. Once you have the virtual machine configured the way you want it with appropriate software and so on, you can use FlexClone to duplicate it many times in a matter of minutes. This method is extremely space efficient and avoids the copy-out time required by other provisioning methods.
For example, I'm aware of an independent software vendor that uses the latter cloning method to support its sales staff. The company has multiple software products and serves a variety of markets. It maintains a gold master virtual machine configuration for each application in each market, including appropriate data sets. A salesperson at a customer site can access this facility through the Web. A FlexClone volume of the appropriate gold master is created automatically, providing a virtual demonstration environment tailored to the particular customer and allowing the demonstration to start immediately. When the demonstration is complete, the FlexClone volume is deleted, leaving the original gold master unaffected.
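The space arithmetic behind this gold master approach can be sketched as follows. The class, the gold master name, and the per-demo change sizes are hypothetical; the model simply assumes that a clone costs nothing when created and then grows by the data it changes, while a full copy costs the whole image every time.

```python
from itertools import count


class GoldMasterPool:
    """Hypothetical tracker for temporary clones of one gold master image.

    Space is modeled as: gold image + data each clone has changed.
    """

    def __init__(self, name: str, image_gb: float):
        self.name = name
        self.image_gb = image_gb
        self.clones = {}  # clone id -> GB of changed blocks
        self._ids = count(1)

    def provision(self) -> str:
        clone_id = f"{self.name}-demo{next(self._ids)}"
        self.clones[clone_id] = 0.0  # a fresh clone shares every block
        return clone_id

    def record_changes(self, clone_id: str, changed_gb: float) -> None:
        self.clones[clone_id] += changed_gb

    def destroy(self, clone_id: str) -> None:
        del self.clones[clone_id]  # the gold master itself is never modified

    def space_gb(self) -> float:
        return self.image_gb + sum(self.clones.values())

    def full_copy_space_gb(self) -> float:
        """What the same environments would cost as full copies."""
        return self.image_gb * (1 + len(self.clones))


pool = GoldMasterPool("crm-retail", image_gb=20.0)
demos = [pool.provision() for _ in range(5)]
for d in demos:
    pool.record_changes(d, 0.5)  # assume each demo writes about 0.5 GB

print(pool.space_gb())            # 22.5
print(pool.full_copy_space_gb())  # 120.0
pool.destroy(demos[0])            # demonstration finished
print(len(pool.clones), pool.image_gb)  # 4 20.0
```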
This approach also makes sense in lab and development environments, classrooms, or anywhere you need to rapidly provision a set of virtual machines with defined capabilities for temporary use.
#4 Robust Disaster Recovery
The virtual infrastructure of VMware lends itself to disaster recovery. If you have a copy of your virtual machine data in a remote location, you can rapidly restart one or many virtual machines, as necessary. However, VMware does not have replication mechanisms built in, so replication has to be done by third-party software, either on the VMware host itself or, preferably, externally. Replicating externally makes sense, given the potential for I/O constraints on server hardware hosting numerous virtual machines.
The most effective way to provide disaster recovery for VMware involves asynchronous replication over a wide area network (WAN) using NetApp SnapMirror® software. SnapMirror uses the built-in Snapshot capability of Data ONTAP to efficiently identify and replicate only changed blocks. Other common replication methods use time-consuming and resource-intensive file system walks to check timestamps and identify files that have changed, and then replicate whole files, even when only a single block is altered. When SnapMirror prepares to replicate, the first thing it does is create a Snapshot copy. By comparing the current Snapshot copy to the one from the previous round of replication, changed blocks (not files) can be quickly and easily identified for replication, saving significant time, storage system resources, and network bandwidth.
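The contrast between block-level and file-level change detection can be sketched with a toy model. It is illustrative only and does not reproduce SnapMirror's actual algorithm: each Snapshot copy is represented as a map from logical block to physical location, and because a never-overwrite file system writes every change to a new location, comparing the two maps reveals changed blocks without reading any data. Deleted blocks are ignored for brevity, and the file names and sizes are hypothetical.

```python
def changed_blocks(prev: dict, curr: dict) -> set:
    """Logical blocks whose physical location differs between two snapshot maps."""
    return {lbn for lbn, pbn in curr.items() if prev.get(lbn) != pbn}


def file_level_transfer(files: dict, changed: set) -> int:
    """Blocks a whole-file replicator would send: every block of any touched file."""
    return sum(len(blocks) for blocks in files.values() if changed & set(blocks))


# Two virtual disk files of 1,000 blocks each on one volume.
files = {"vm1.vmdk": range(0, 1000), "vm2.vmdk": range(1000, 2000)}
prev = {lbn: lbn for lbn in range(2000)}  # block map at the last transfer
curr = dict(prev)
curr[5] = 2000     # one block of vm1.vmdk rewritten to a new location
curr[1500] = 2001  # one block of vm2.vmdk rewritten to a new location

delta = changed_blocks(prev, curr)
print(len(delta))                         # 2
print(file_level_transfer(files, delta))  # 2000
```

In this example, block-level replication sends 2 blocks while whole-file replication sends 2,000, because a single changed block makes the replicator resend each entire virtual disk file.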
At a recent VMware technical conference, VMworld 2006, a Canadian reseller of VMware and NetApp presented a detailed technical explanation of a VMware and SnapMirror business continuance implementation for its customer, Imperial Parking. The company has two main data centers, in Calgary and Vancouver, over 500 miles apart. Different service level categories (gold, silver, and bronze) were defined and each virtual machine was assigned to a category based on business requirements. Appropriate processes were defined for restarting services in the event of failure.
Imperial Parking estimates that in the first year, the new configuration resulted in over $480K Canadian in avoided costs and/or savings. To learn more, download the presentation and audio: Server Consolidation and Business Continuity Using VMware ESX Server and NetApp SnapMirror Technology.
#5 Application Data Protection and Management
The final piece of the puzzle is the ability to protect and manage data for enterprise applications running within virtual machines. As with everything else we've discussed, the preference here is to move as much of the heavy lifting as possible off the VMware server. NetApp accomplishes this through its SnapManager® software, which includes versions tailored for Microsoft® Exchange and SQL Server as well as Oracle® databases.
These software products run within the virtual machine, which currently uses the Microsoft iSCSI software initiator to access NetApp storage directly. A future version of SnapDrive® for Windows will support RDM LUNs, which will allow virtual machines to use either the Fibre Channel or iSCSI initiators of the VMware server itself.
Each SnapManager product is designed to work in concert with the application so that databases are appropriately quiesced before Snapshot copies are created. The result is consistent, verified backups without excessive data movement by the server. Restores, tracking of valid Snapshot copies, and other necessary data management tasks are carried out correctly, without disruption, and with minimal server overhead.
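Tracking which Snapshot copies are valid can be sketched as a small catalog that offers only verified copies for restore. This is a hypothetical model, not SnapManager's interface or data format; the record fields, the names, and the rule "restore from the newest verified copy" are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackupRecord:
    name: str
    taken_at: int  # any increasing timestamp, e.g. minutes since midnight
    verified: bool = False


class BackupCatalog:
    """Hypothetical catalog of application-consistent Snapshot copies."""

    def __init__(self):
        self.records: List[BackupRecord] = []

    def add(self, name: str, taken_at: int) -> BackupRecord:
        rec = BackupRecord(name, taken_at)
        self.records.append(rec)
        return rec

    def mark_verified(self, name: str) -> None:
        for rec in self.records:
            if rec.name == name:
                rec.verified = True
                return
        raise KeyError(name)

    def restore_candidate(self, before: Optional[int] = None) -> Optional[BackupRecord]:
        """Newest verified copy, optionally taken no later than `before`."""
        eligible = [
            r for r in self.records
            if r.verified and (before is None or r.taken_at <= before)
        ]
        return max(eligible, key=lambda r: r.taken_at, default=None)


catalog = BackupCatalog()
catalog.add("exch_0800", 480)
catalog.mark_verified("exch_0800")
catalog.add("exch_1200", 720)
catalog.mark_verified("exch_1200")
catalog.add("exch_1600", 960)  # verification has not run yet

print(catalog.restore_candidate().name)              # exch_1200
print(catalog.restore_candidate(before=600).name)    # exch_0800
```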
The Pennsylvania Attorney General’s office recently completed an upgrade of its entire IT infrastructure, moving from over 130 standalone servers with direct-attached storage to an environment consisting of 40 dual-processor blade servers running VMware ESX Server and a pair of NetApp storage systems: one for primary storage and one at its disaster recovery site. NetApp SnapManager for Exchange with Single Mailbox Recovery and NetApp SnapManager for SQL Server provide backup/restore and data management capabilities within the virtual environment. The customer cited as particular benefits the space-efficient, disk-based, verified backups of Exchange and the ability to quickly recover entire mailboxes, single messages, and contacts.
To learn more about this deployment, view a recent webcast or read the customer success story.
Conclusion
The unique capabilities of NetApp Snapshot technology form the basis of an ideal set of data protection and management solutions for a VMware Virtual Infrastructure. Backup, restore, cloning, disaster recovery, and application data management are all facilitated through Snapshot copies or products based on Snapshot technology.