TECH ONTAP ARCHIVE - FEBRUARY 2007 (PDF)
Matthew Taylor
Professional Services Engineer, NetApp Global Services
Matthew Taylor has been working with NetApp storage for more than seven years. Prior to joining NetApp, he worked as a Windows® and storage administrator for a large manufacturing company. Matt joined NetApp in July 2005 and since that time has worked on-site supporting the top enterprise account described in this article. During that time, he has helped the customer’s multiple business units grow their NetApp storage environment from 30 systems to over 90.
Technical Case Study: Thin Provisioning for Disk-to-Disk Backup
By Matthew Taylor

With traditional storage provisioning, you rely on storage end users to identify their requirements and then allocate all the disk space they think they will need for each application up front. Unfortunately, end users are notoriously unreliable at estimating requirements. If you allocate a 1TB volume for the new “killer app” that’s guaranteed to be a success, nine times out of 10 when you go back and look a year later you’ll find that it’s using only half the allocated space (or less).

When it comes to disk-to-disk backup, provisioning can be even more complicated. You not only need an estimate of the growth in primary storage usage, you also need to know the rate of change in each volume. This case study looks at an innovative customer application of thin provisioning on secondary storage for disk-to-disk backup to overcome these uncertainties. Over the course of a year the customer increased primary storage capacity from 500TB to 900TB without needing any additional secondary storage. This was a direct result of tremendous increases in disk utilization through the use of thin provisioning.

This article provides an introduction to thin provisioning in a NetApp environment and documents a real-world implementation, including:

  • Change rates
  • Safety measures
  • Volume configuration
  • Volume settings
  • Monitoring practices

Background: How NetApp Approaches Thin Provisioning
Thin provisioning addresses the limitations of the traditional approach to storage provisioning. Conceptually, it works the same way as insurance. A typical insurance company holds policies far in excess of what it can pay out at one time, but the number of actual claims in a given period never exceeds the company’s working capital, and it stays in the black. Having a large enough and diverse enough pool of customers helps ensure that an insurance company creates a risk-sharing rather than a risk-taking environment.

Similarly, with thin provisioning a storage system presents more storage space to the servers connecting to it than it actually has available. Consider a storage system with 15TB of usable storage capacity. With thin provisioning, a storage administrator may map volumes of 0.5TB to each of 45 servers, making 22.5TB of storage visible to hosts.   

Free space on the storage system serves as a buffer pool for all volumes. Physical storage space is allocated to each volume on demand as data is written, so if all 45 hosts were to use all the space provisioned to them, the system would run out of physical capacity. You have to monitor the storage system and add capacity when needed, but instead of making capacity planning decisions and provisioning to meet the needs of each individual volume, you plan and provision for the needs of the entire storage system. This is easier, less prone to mistakes, and results in much more efficient storage utilization, so less storage is needed.
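The arithmetic behind the 15TB example can be sketched in a few lines of Python. This is only an illustration of the oversubscription described above; the function names are hypothetical and are not part of any NetApp tool.

```python
def overcommit_ratio(usable_tb, volume_sizes_tb):
    """Provisioned (host-visible) capacity divided by usable physical capacity."""
    return sum(volume_sizes_tb) / usable_tb


def exhaustion_fill_fraction(usable_tb, volume_sizes_tb):
    """Average fill level across all volumes at which the shared pool runs out."""
    return usable_tb / sum(volume_sizes_tb)


# Figures from the example in the text: 45 servers, 0.5TB each, 15TB usable.
volumes_tb = [0.5] * 45
provisioned_tb = sum(volumes_tb)

print(provisioned_tb)                                # 22.5
print(overcommit_ratio(15.0, volumes_tb))            # 1.5
print(round(exhaustion_fill_fraction(15.0, volumes_tb), 2))
```

In this example the pool is exhausted once the volumes are, on average, about two-thirds full, which is why monitoring moves from the individual volume to the storage system as a whole.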

NetApp Data ONTAP® 7G with FlexVol® technology provides a built-in mechanism for enabling thin provisioning. By simply setting the guarantee parameter on each volume to an appropriate value, thin provisioning can be enabled without host or application customization. This level of simplicity in configuring thin provisioning is unique to NetApp.

When you create a volume on a NetApp system, you don't have to dedicate specific disk blocks to the volume. Instead, blocks are allocated on demand as data is written. In this way, multiple volumes share the same pool of free storage, and you don’t have to guess up front which volumes will grow and by how much. You simply add more capacity when free storage gets low and grow a volume if more space is required. With FlexVol you don’t pay a performance penalty for this approach. Even the smallest volumes utilize a large number of disks for optimal performance.

Case Study: Thin Provisioning for Disk-to-Disk Backup

The customer described in this case study is a large company that sells backup services to its internal customers with a fixed retention guarantee (normally 45 days). The customer utilizes NetApp storage systems for both primary and secondary storage. All systems are running Data ONTAP 7G.

Primary Storage and Application Environment
Characteristics of the primary storage requiring backup include:

  • 72 NetApp storage systems with about 900TB of capacity.
  • Seven days of Snapshot™ copies retained on local storage for quick restores.
  • Databases range from 100GB to 6TB in size.
  • Approximately 150 groups and services are served by this storage.

Oracle® databases are the most critical and most volatile of the applications supported by this storage. Each database is considered independent; this means that it must be possible to back up and (more importantly) restore each one individually. Database turnover is often very low, but at times may reach a 100% rate of change because of people loading new information. The storage team has no control over or visibility into what might occur on particular primary storage volumes, so the backup system has to adapt readily.

Secondary Storage and Disk-to-Disk Backup Environment
The secondary storage and backup environment consists of:

  • Six NetApp NearStore® R200 storage systems using 320GB SATA disks with approximately 430TB of total raw capacity
  • NetApp SnapVault® software:
      • SnapVault starts with a baseline copy on secondary storage that mirrors the source volume or qtree. (A qtree is a subvolume that has its own quotas and permissions.)
      • When a nightly backup is scheduled, SnapVault creates a Snapshot copy of the primary volume and transfers only the blocks that have changed to secondary storage. (For databases, in-house scripts put the database in hot backup mode before creating a Snapshot copy.)
      • Snapshot copies are maintained on secondary storage for a prescribed time so that data can be restored from any point in time.
  • Approximately 800 qtrees are in SnapVault relationships.
  • From 14 to 45 days’ worth of SnapVault backups are retained for each qtree.
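
To see why change rate matters so much for sizing secondary storage, consider a rough Python sketch of the space one qtree might consume: a baseline copy plus the changed data from each retained nightly backup. This is a deliberate simplification, not how SnapVault accounts for space internally. It assumes each night's changed blocks are distinct and are held until they age out, and the database size and change rates are hypothetical.

```python
def secondary_space_gb(primary_gb, daily_change_rates):
    """Baseline copy plus the changed data kept for each retained nightly backup.

    daily_change_rates holds one fraction (0.0 to 1.0) per retained day.
    """
    changed_gb = sum(primary_gb * rate for rate in daily_change_rates)
    return primary_gb + changed_gb


# Hypothetical 1,000GB database retained for 45 days.
quiet = [0.01] * 45             # 1% of the data changes every night
spiky = [0.01] * 44 + [1.0]     # same, but one night the whole database is reloaded

print(round(secondary_space_gb(1000, quiet)))
print(round(secondary_space_gb(1000, spiky)))
```

Under these assumptions a single full reload adds roughly another full copy of the database to the secondary footprint, which is hard to anticipate when every volume has to be sized individually in advance.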

Why Thin Provisioning?
After about a year running this configuration, it became clear to the customer that differences in the change rate of different data sets were resulting in significant underutilization of the R200 systems. Utilization was only at 40%, and yet the IT team was always concerned about secondary storage space since it was almost fully allocated. Manually managing 800 separate qtrees was impractical and painful.

The IT team was initially considering the concept of thin provisioning for another storage project. When NetApp demonstrated thin provisioning to the company’s storage administrators, however, the team recognized an opportunity to leverage this approach to solve its backup challenges. The team found thin provisioning more appealing for its backup environment because performance wasn’t as big a concern—secondary storage was only occasionally accessed for restores—and it was possible to make changes to the backup environment as necessary (move qtrees to new aggregates and so on) without impacting production applications.

Converting to Thin Provisioning
Implementing thin provisioning was easy. The IT team simply made two adjustments to the volumes that housed the secondary qtrees for each SnapVault relationship:

  • Changed volume guarantee setting to “none.”
  • Sized each volume to match the size of the aggregate containing the volume.

With these changes, any volume can potentially grow to the full size of its aggregate, but no volume is guaranteed space. All volumes are free to grow as long as free space exists.

As a safety measure, the company created one “fully guaranteed” volume in each aggregate containing 20% of the total space. In normal operation this volume is not used and serves only as an emergency backstop. If an aggregate were to fill unexpectedly, a storage administrator could release this space so that operations can continue while the distribution of qtrees is rebalanced between aggregates.
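
The interaction between unguaranteed volumes and the 20% backstop can be modeled with a small Python sketch. It is a toy model under stated assumptions, not Data ONTAP behavior: the class, its fields, and the volume names are hypothetical, every thin volume is assumed to be sized to the aggregate, and space is tracked only in whole terabyte figures.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    size_tb: float
    reserve_fraction: float = 0.20          # the fully guaranteed backstop volume
    used_tb: dict = field(default_factory=dict)   # thin volume name -> TB written
    reserve_released: bool = False

    @property
    def reserve_tb(self):
        """Space still held back by the backstop volume."""
        return 0.0 if self.reserve_released else self.size_tb * self.reserve_fraction

    @property
    def free_tb(self):
        """Shared free space available to every thin volume."""
        return self.size_tb - self.reserve_tb - sum(self.used_tb.values())

    def write(self, volume, tb):
        """Any volume may grow, but only while the shared pool has room."""
        if tb > self.free_tb:
            raise RuntimeError("aggregate full")
        self.used_tb[volume] = self.used_tb.get(volume, 0.0) + tb

    def release_reserve(self):
        """Emergency action: give the backstop space back to the pool."""
        self.reserve_released = True


agg = Aggregate(size_tb=10.0)
agg.write("vol1", 5.0)
agg.write("vol2", 3.0)
print(agg.free_tb)        # 0.0, the remaining 2TB is held by the backstop

agg.release_reserve()
agg.write("vol1", 0.5)    # backups can continue while qtrees are rebalanced
print(agg.free_tb)        # 1.5
```

The point of the sketch is that no single volume has a meaningful limit of its own; the only limit that matters is the free space in the aggregate, with the backstop buying time once that is gone.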

The actual conversion process took some time because of the 800 qtrees in SnapVault relationships. The team had to work through the environment sequentially, volume by volume and qtree by qtree. The company also used this as an opportunity to remap its qtree-to-volume relationships, which increased the total time for the conversion.

Changes to Monitoring Practices
A set of best practices was established that called for "administrative closing" of aggregates to new SnapVault secondary qtrees after the aggregate became 60% full and for outmigration of qtrees to other aggregates to begin at 85% full.

This actually reduced the number of qtree migrations between aggregates on secondary storage versus the previous traditional provisioning environment. The overly large space demands of the old model made free space a problem and required more frequent moves. The alerting and monitoring done by NetApp Operations Manager (formerly known as DataFabric® Manager, or DFM) was customized to account for the oversubscription of the aggregates and the need to provide a more appropriate "aggregate full" threshold. The company also changed from a policy of monitoring free space on volumes to monitoring free space on aggregates.
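
The two thresholds in this policy amount to a simple per-aggregate check. The Python sketch below shows one way to express it; the function name and the returned labels are hypothetical and do not correspond to Operations Manager settings.

```python
CLOSE_TO_NEW_QTREES = 0.60   # stop placing new SnapVault secondary qtrees here
START_OUTMIGRATION = 0.85    # begin moving qtrees to other aggregates


def aggregate_action(used_tb, size_tb):
    """Classify an aggregate by how full it is, following the policy in the text."""
    fraction_full = used_tb / size_tb
    if fraction_full >= START_OUTMIGRATION:
        return "migrate-out"
    if fraction_full >= CLOSE_TO_NEW_QTREES:
        return "closed"
    return "open"


print(aggregate_action(5, 10))   # open
print(aggregate_action(7, 10))   # closed
print(aggregate_action(9, 10))   # migrate-out
```

Because the check is made per aggregate rather than per volume, the number of things to watch drops from hundreds of qtrees and volumes to a handful of aggregates.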

Result: 70% Utilization, No New Secondary Storage Despite 80% Increase in Primary Storage Capacity
This thin provisioning methodology has been in place for a year with no outages, and no aggregates have been filled. Before the migration started, the company was concerned with free space almost every day, but as the migration went forward, it steadily reclaimed free space from formerly underutilized volumes. This free space made it possible to add new customers and services into the backup system without purchasing additional storage. Over the course of the last year, primary storage capacity has grown from 500TB to 900TB without requiring any additional secondary storage capacity. Before the switch to thin provisioning, the company had been considering adding an additional R200.

This particular data center was continuously pinched for floor space, power, and cooling, so these savings represent a significant benefit beyond the savings in capital outlays. The company has now been able to delay the purchase of any new secondary storage for a year as a result of thin provisioning and the increased efficiency it provides. Storage utilization went from less than 40% (due to mostly underutilized volumes) to closer to 70%.

Customer Recommendations
This customer doesn’t hesitate to recommend the use of thin provisioning in a disk-to-disk backup environment. The company also uses thin provisioning for home directories on the production side of the house. According to company practice, each of 4,500 users has up to 1GB of network file storage as a home directory, which would require 4.5TB of total storage. Using thin provisioning, the company meets this requirement with only 600GB of actual disk storage.
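
The home directory figures work out as follows. This short Python calculation simply restates the numbers above, treating 1TB as 1,000GB; the variable names are illustrative.

```python
users = 4500
quota_gb = 1          # maximum home directory size per user
physical_gb = 600     # actual disk storage in use, per the text

provisioned_gb = users * quota_gb
overcommit = provisioned_gb / physical_gb
avg_used_mb = physical_gb / users * 1000

print(provisioned_gb)        # 4500, i.e. 4.5TB
print(overcommit)            # 7.5
print(round(avg_used_mb))    # 133
```

In other words, the average user consumes roughly 133MB of a 1GB allowance, so the environment is provisioned at 7.5 times its physical footprint.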

Despite these successes, the customer is quick to point out that thin provisioning may not be appropriate for all uses. For OLTP applications, for instance, it is much harder to move data around without impacting the application should storage become critical, so important database applications are probably not a good choice for thin provisioning or should not be aggressively thin provisioned.
 

RELATED INFORMATION

Thin Provisioning for NetApp SAN Environments

One of the big disadvantages of traditional disk arrays is that they force you to allocate dedicated storage space to a disk volume or LUN when you create it. Since it's often hard to gauge the amount of space you'll need up front, you end up overprovisioning and wasting valuable disk space.

In contrast, when you create a LUN on a NetApp system, you don't have to dedicate specific disk blocks. Instead, blocks are allocated only as data is written. In this way, multiple LUNs can flexibly share the same pool of free storage. You simply add more capacity when free storage gets low, and you can painlessly grow a LUN if more space is required.



The Versatile Storage Platform
To truly appreciate the versatility of the NetApp architecture, it is important to view how the storage is managed and how the storage is accessed as related to functionality. Too often storage vendors separate these two concepts, which creates overly specialized storage systems that eventually become isolated islands.

In May, NetApp user Ben Rockwood provided his own overview of how the NetApp Data ONTAP operating system manages data on disk and how that data is accessed from client systems.

Highlighted technologies include:
  • RAID, RAID-DP, and traditional volumes
  • Aggregates and FlexVol volumes
  • Snapshot functionality and FlexClone™ technology
  • LUN creation and masking



A Quick Primer on NetApp Data Protection Software

NetApp customers have two potential alternatives for data protection: SnapMirror® or SnapVault software.

SnapMirror is replication software intended for disaster recovery solutions. The mirror is an exact replica of data on the primary storage that can be mounted read/write to recover from failure. If a backup is deleted on the source, it will go away on the mirror at the next replication.

SnapVault, in contrast, is intended for disk-to-disk backup. It retains all backup copies as they appeared at the time they were created on primary storage for a user-specified period of time. Secondary storage used by SnapVault cannot be mounted read/write. Backups must be recovered from secondary storage to the original or an alternative primary storage system in order to restart.

At a more technical level, SnapVault takes a point-in-time image based on qtrees, while SnapMirror copies an entire image at the level of a LUN inside a volume.
