Tech On Tap :: Insights for Simplifying Data Management | NetApp(R) - Network Appliance
TECH ONTAP ARCHIVE - MAY 2006 (PDF)
Bruce Moxon
Senior Director of Strategic Technology and Grid Guru, NetApp
Bruce Moxon works with enterprise customers deploying grid computing solutions. He brings more than 20 years of experience in scale-out computing architectures for both scientific and commercial applications and writes, speaks, and teaches extensively on the continuing evolution of grid computing. Bruce has architected and developed solutions for a number of high-throughput computing environments, including Perlegen Sciences' SNP discovery system, Bank of America's CRM and analytics systems, and NASA's Earth Observing System.
Reducing Time-to-Deployment:
Improving Agility for Database Development, Test, and QA
By Bruce Moxon

Last month I introduced the notion of whole application virtualization, contending that a number of core storage services are necessary to deliver on this vision and the promise of grid computing. This month, I explore two of those services, mirroring and cloning, in the context of a rapid application development model for database-centric applications that increases an organization's agility.

Every database administrator (DBA) has experienced it: a request from the application development organization for a copy of a production database to support application development and testing. For databases larger than a few hundred gigabytes, such a request sets off a potentially time-consuming series of planning, provisioning, and implementation activities. (Oh, well, there goes the weekend...)

The Plan...

With traditional tools and approaches, the DBA must first develop a strategy for replicating the relevant portions of the production database while minimizing the impact on production operations.

Depending on an organization's operations, this plan typically takes one of two fundamental approaches: building a dev/test copy of the database from backup tapes, or extracting some or all of the data from the production database using a variety of vendor-supplied and homegrown tools. Backup-based replication is more straightforward and does not intrude on the production system, but it can be a lengthy process, especially for tape-based backups. Extraction-based replication can be faster, especially if partial replication is sufficient for the scenarios under development or test, but it may require further planning to ensure that the production system is minimally impacted while the data is being extracted.

Provisioning...

Once the replication plan is in place, adequate resources, including both servers and storage, must be provisioned. The servers must then be loaded with the appropriate software "stack" (operating system, drivers, database, application servers, and so on), and the database must be restored or created and populated.

Replication...

For many organizations, replicating a large database (hundreds of gigabytes) can take hours or even days. Replications that require restores from tape are particularly lengthy and labor-intensive.
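
The arithmetic behind "hours or even days" is simple: streaming time alone is database size divided by sustained throughput. The Python sketch below assumes illustrative rates of 30 MB/s for tape and 150 MB/s for disk; these are hypothetical figures, not benchmarks of any particular product.

```python
# Back-of-the-envelope streaming time for restoring a database copy.
# The throughput figures are hypothetical examples, not measurements.

def restore_hours(db_size_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to stream db_size_gb at a sustained throughput_mb_s."""
    size_mb = db_size_gb * 1024          # GB -> MB (binary units)
    return size_mb / throughput_mb_s / 3600


if __name__ == "__main__":
    for label, rate in [("tape, assumed 30 MB/s", 30.0),
                        ("disk, assumed 150 MB/s", 150.0)]:
        print(f"300 GB from {label}: {restore_hours(300, rate):.1f} h")
```

With these assumed rates, a 300 GB restore takes roughly 2.8 hours from tape and about 0.6 hours from disk, before any provisioning, software installation, or database recovery work begins, and the total grows with every additional copy required.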

In many large-scale environments, multiple copies of production databases may be required to support a range of routine activities, including application and database development, test, QA, and perhaps user training. Each additional copy multiplies the physical and human resources required to support the effort.

Reducing Time-to-Deployment for Database Applications

Across the storage industry, several key technologies have matured in recent years to help organizations address this growing need for physical database replication and rapid deployment.

Disk-to-disk backup technology has become the norm in many organizations, greatly increasing the speed at which database replicas can be constructed from backups. Snapshots provide space-efficient, point-in-time copies of data containers (file systems, volumes, LUNs) that can be used for rapid recovery and, increasingly, as the basis for creating full writable data copies in a technique known as cloning.

While these technologies greatly simplify the task of replicating the database data, organizations employing these approaches are still faced with two significant operational challenges. First, the act of cloning the production data can adversely impact production application performance. As a result, cloning may need to be restricted to "off" or "nonpeak" hours. Second, this approach still requires a complete copy of the underlying data to be made. This can pose nontrivial provisioning challenges, especially when copying large databases multiple times.

The NetApp Fulcrum...

Due to its unique "DNA," which includes Data ONTAP® and WAFL®, Network Appliance has developed innovative implementations of these general industry approaches in ways that deliver unmatched speed, efficiency, and simplicity to the task of database replication and deployment.

With Data ONTAP 7G, Network Appliance introduced FlexClone™ volumes. These are writable flexible volumes created from NetApp Snapshot™ copies. As a result, they are near-instantaneous, space-efficient, writable "clones" that share the same physical blocks with the baseline file system or LUN. As the baseline and clone data containers diverge (for example, due to continuing updates in the production database or development changes to a cloned database), only the divergent blocks need to be written to disk.
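
A toy model helps show why a clone can be created almost instantly yet consume space only as it diverges. The Python sketch below assumes a volume is nothing more than a map from block numbers to contents; the class names are hypothetical, and the model illustrates the general block-sharing idea rather than how WAFL or FlexClone is actually implemented.

```python
# Toy model of a writable clone sharing blocks with a read-only snapshot.
# Class names are hypothetical; this is not how WAFL or FlexClone is built.
from __future__ import annotations


class Snapshot:
    """Read-only, point-in-time map of block number -> block contents."""

    def __init__(self, blocks: dict[int, bytes]) -> None:
        # Copies the block map (references only, not the block data),
        # so later changes to the live volume do not affect the snapshot.
        self._blocks = dict(blocks)

    def read(self, n: int) -> bytes:
        return self._blocks[n]


class Clone:
    """Writable view over a snapshot; only overwritten blocks use new space."""

    def __init__(self, base: Snapshot) -> None:
        self.base = base
        self.private: dict[int, bytes] = {}  # divergent blocks only

    def read(self, n: int) -> bytes:
        # Serve the clone's own copy if the block has diverged,
        # otherwise fall through to the shared snapshot block.
        if n in self.private:
            return self.private[n]
        return self.base.read(n)

    def write(self, n: int, data: bytes) -> None:
        # New data goes to a private block; the shared block is untouched.
        self.private[n] = data

    def new_blocks(self) -> int:
        return len(self.private)


if __name__ == "__main__":
    production = {n: b"row-%d" % n for n in range(1000)}
    snap = Snapshot(production)        # point-in-time view
    dev = Clone(snap)                  # no data blocks copied
    dev.write(7, b"dev-change")        # development diverges
    production[8] = b"prod-update"     # production keeps changing
    print(dev.read(7), dev.read(8), dev.new_blocks())
    # b'dev-change' b'row-8' 1
```

Creating the clone copies no data blocks, and after one write it holds exactly one private block. Later updates to production do not reach the clone either, because the snapshot's block map is fixed at the moment it was taken.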

The result is that working copies of very large databases can be created within a few minutes and with minimal incremental storage. A typical 300GB database can be cloned and brought up in just a couple of minutes. (NetApp customer Ben Rockwood provides specific examples of FlexClone volume creation in his article "The Versatile Storage Platform.")
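
The space savings can be estimated with simple arithmetic. The sketch below uses the 300GB database from the text; the number of copies (five) and the fraction of blocks each clone changes (5%) are hypothetical assumptions, and the model ignores clone metadata and the Snapshot space consumed as the parent volume keeps changing.

```python
# Rough storage comparison: N full physical copies vs. N clones.
# Copy count and change rate are hypothetical assumptions.

def full_copy_gb(db_gb: float, copies: int) -> float:
    """Each full copy needs the whole database again."""
    return db_gb * copies


def clone_gb(db_gb: float, copies: int, change_rate: float) -> float:
    """Each clone needs space only for the blocks it changes
    (clone metadata and parent Snapshot growth are ignored)."""
    return db_gb * change_rate * copies


if __name__ == "__main__":
    db, n, rate = 300.0, 5, 0.05   # 5 dev/test/QA copies, 5% of blocks changed
    print(f"full copies: {full_copy_gb(db, n):.0f} GB")   # 1500 GB
    print(f"clones:      {clone_gb(db, n, rate):.0f} GB")  # 75 GB
```

Under these assumptions, five full copies need 1,500GB of additional storage while five clones need about 75GB; the real ratio depends entirely on how far each copy diverges from its parent.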

FlexClone volumes can only be created on the same storage system on which the Snapshot copy of the original data resides, so cloned data shares the same storage resources as its source. Where the production storage system is not fully utilized, or where NetApp quality-of-service capabilities can be employed, this may be sufficient to shield the production database from the additional load imposed by development activities. More commonly, however, operational requirements will dictate that the development databases be created on a separate storage system. Enter NetApp SnapMirror® software.

SnapMirror can be used to continually replicate production data (synchronously, if desired) to a secondary storage system with minimal impact on the production environment. Cloning and subsequent development activity can then take place on the secondary system, completely "out-of-band" of the production system. This secondary system might also serve as the organization's disaster recovery (DR) target, or it might be built as a low-cost development system using lower-cost controllers, iSCSI, and SATA drives.
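
Asynchronous mirroring of this kind is commonly done by shipping only the blocks that changed between the last snapshot both systems share and a newer snapshot of the source. The Python sketch below models that idea with hypothetical function names; it is a simplification for illustration, not the SnapMirror protocol or API.

```python
# Toy model of snapshot-based incremental mirroring. Names are
# hypothetical; this is not the SnapMirror protocol or API.
from __future__ import annotations

Blocks = dict[int, bytes]  # block number -> block contents


def changed_blocks(base: Blocks, new: Blocks) -> Blocks:
    """Blocks that were added or modified in `new` relative to `base`."""
    return {n: data for n, data in new.items() if base.get(n) != data}


def deleted_blocks(base: Blocks, new: Blocks) -> set[int]:
    """Block numbers present in `base` but freed in `new`."""
    return set(base) - set(new)


def mirror_update(mirror: Blocks, base: Blocks, new: Blocks) -> int:
    """Bring `mirror` (assumed equal to `base`) up to date with `new`
    by applying only the delta; return the number of blocks sent."""
    delta = changed_blocks(base, new)
    mirror.update(delta)
    for n in deleted_blocks(base, new):
        mirror.pop(n, None)
    return len(delta)


if __name__ == "__main__":
    snap1 = {n: b"v1" for n in range(10_000)}  # baseline, already mirrored
    mirror = dict(snap1)                        # initial full transfer
    snap2 = dict(snap1)
    snap2[42] = b"v2"                           # production modifies two blocks
    snap2[99] = b"v2"
    sent = mirror_update(mirror, snap1, snap2)
    print(sent, mirror == snap2)                # 2 True
```

Only the two changed blocks cross the network in this example. Development clones would then be created from the mirrored data on the secondary system, so production bears only the cost of the incremental transfers.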

A blade server–based deployment depicting this scenario is shown below. In this scenario, new database servers would be dynamically provisioned from the blade server pool, using the database containers cloned on the secondary (Dev/Test) system.

[Figure: Cloning off a mirror. Dev/Test database volumes are cloned from the mirrored production volume ("Prod Mirror") on the secondary system.]

In environments where third-party storage is deployed in the production environment, the same approach can be employed with native database mirroring capabilities, with a variety of emerging heterogeneous replication technologies, including NetApp SnapDrive® and V-Series technologies, and potentially with third-party replication tools such as those in Topio's Data Protection Suite™ or IBM's SAN Volume Controller (SVC).

The same approach can also be applied in other operational scenarios to improve utilization and agility without impacting production system performance. Examples include:

  • Offloading end-of-month reporting to cloned databases
  • Offloading intensive ad hoc queries and data warehousing extract-transform-load (ETL) operations in support of business analytics
  • Distributed development, using SnapMirror to replicate data across geographically distributed sites and employing FlexClone at the remote site to create copies for development, QA, or training

Through the NetApp SnapManager® suite of products, these capabilities are being integrated into standard database management frameworks, such as Oracle® Grid Control and Microsoft® SQL Server Enterprise Manager. This integration gives DBAs faster, more space-efficient database backup, recovery, and cloning, all from the management suite they already use.

The implications for many database development organizations are far-reaching. NetApp's unique FlexClone implementation, coupled with appropriate replication technology, can significantly reduce the time required to provision database replicas for a variety of development and operational scenarios. Because the approach is so space efficient, many organizations are finding that they can not only do what they did before faster, but also do much more of it (for example, running many simultaneous dev/test cycles).

Finally, greater agility and reduced physical storage requirements are accompanied by simpler operational procedures, which together help deliver on the promise of grid computing.


RELATED INFORMATION

A Storage Networking Appliance

In the early 1990s, Network Appliance revolutionized storage networking with a simple architecture that relied on NVRAM, integrated RAID, consistency points, and a unique file system to do things that the file servers of the time could not.

This technology is still the basis of every product that NetApp offers and includes:

  • The WAFL file system
  • Snapshot copies
  • Consistency points and NVRAM
  • FlexVol and FlexClone technology
  • RAID and RAID-DP™

If you only read one paper about NetApp technology, read A Storage Networking Appliance.

 


Advantages of NetApp FlexClone Technology in Database Environments

This five-minute Swingbench demo shows:

  • Performance benefits of running OLTP database loads on aggregated storage
  • Ability to grow and shrink volumes in seconds
  • Ability to create database clones in under a minute for testing
[Figure: Transaction Throughput]



FlexClone Performance

Unlike other implementations of cloning technology, FlexClone volumes are implemented as a simple extension of existing core mechanisms. As a result, the performance of FlexClone volumes is nearly identical to the performance of flexible volumes.

Creating a FlexClone volume is nearly identical to creating a Snapshot copy and usually completes within seconds. Clone metadata is held in memory just like that of a regular volume, so the impact on storage system memory consumption is the same as adding another volume.

Get the details. Read the 32-page Introduction to Data ONTAP 7G (pdf).

 
