Tech On Tap :: Insights for Simplifying Data Management | NetApp(R) - Network Appliance
TECH ONTAP ARCHIVE - JULY 2007 (PDF)

NetApp A-SIS Deduplication:
Top 10 Customer Questions Answered

In May, NetApp announced a new deduplication technology that can significantly increase the amount of data stored in a set amount of disk space: Advanced Single Instance Storage (A-SIS) deduplication. This technology is available (at no charge!) for NetApp NearStore® R200 and NearStore on FAS systems.

A recent NetApp TechTalk Webcast and chat session generated a substantial number of questions about the new technology, and we saw a very high response to last month's Tech OnTap engineering perspective.

The full 23-page chat transcript (pdf) is now available. The top 10 customer questions about A-SIS deduplication are answered below:

  1. What exactly is A-SIS deduplication?
  2. How do I add the A-SIS deduplication capability to a system?
  3. Why is an R200 or NearStore license required?
  4. Are there any plans to remove the NearStore license requirement?
  5. What do you mean by "light-duty" primary storage?
  6. Can the system perform other operations while A-SIS deduplication is running?
  7. Can I estimate my space savings before installing A-SIS deduplication?
  8. Can previously written volume data be deduplicated?
  9. Can I schedule the time when deduplication is run?
  10. What kind of space savings can I expect?

1. What exactly is A-SIS deduplication?

A-SIS deduplication is a general-purpose space reduction feature available on NearStore R200 systems and NearStore on FAS systems. When A-SIS deduplication is enabled on a flexible volume, the data in that volume is scanned at intervals and duplicate blocks are removed, reclaiming disk space. Note: A-SIS deduplication is not supported on R100, R150, FAS250, FAS270, or any products in the 800 or 900 families.

2. How do I add the A-SIS deduplication capability to a system?

The A-SIS license enables the deduplication capability. This license is available for R200 systems and for any FAS system that has a NearStore license installed. There is no charge for the A-SIS deduplication license. A-SIS deduplication requires a minimum Data ONTAP® version of 7.2.2 for FAS3000/6000 systems.

3. Why is an R200 or NearStore license required to enable A-SIS deduplication?

Although A-SIS deduplication is application transparent, it has not been tested with mission-critical primary applications. By requiring a NearStore R200 system (or NearStore on FAS license) to enable A-SIS deduplication, we can help ensure that customers do not attempt to use A-SIS deduplication in performance-intensive application environments that have not been fully tested.

4. Are there any plans to remove the NearStore license requirement for A-SIS deduplication?

We are continuously monitoring customer adoption of A-SIS deduplication in backup, archival, and light-duty primary storage environments. As we gain more experience with A-SIS deduplication in these environments, we fully expect its use to broaden. Until we gain that experience, however, A-SIS deduplication will require the NearStore license.

5. What do you mean by "light-duty" primary storage? Is this appropriate for A-SIS deduplication?

By light-duty primary storage, we mean volumes that contain primary (first copy) data but are not performance driven. Examples include user home directories, document directories, and application volumes that experience heavy I/O loads during the day but are quiescent at night and on weekends. These volumes might very well benefit from A-SIS deduplication, provided the system has the performance headroom to absorb the additional overhead that deduplication imposes.

6. Can the system perform other operations while A-SIS deduplication is running?

Yes. A-SIS deduplication runs as a background process, and the system can perform any other operation while it runs.

7. Can I estimate my space savings before installing A-SIS deduplication?

Yes. A space estimation tool is available to NetApp SEs. It is a standalone application that runs on a Linux® client, "crawls" through any NFS volume of up to 2TB (NetApp or non-NetApp), and estimates the space savings you would get with A-SIS deduplication.
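
To give a feel for what such a crawl involves, here is a minimal sketch in Python. It is not the NetApp tool: the 4 KB block size, the SHA-256 fingerprint, and the function names estimate_savings and savings_percent are all assumptions made for illustration, and a real estimate would depend on how the target volume lays out its blocks.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # assumed block size for this sketch


def estimate_savings(root):
    """Walk `root`, fingerprint fixed-size blocks, return (total, unique) counts."""
    seen = set()  # fingerprints of every distinct block found so far
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while True:
                        block = f.read(BLOCK_SIZE)
                        if not block:
                            break
                        total += 1
                        # SHA-256 is an illustrative choice of fingerprint.
                        seen.add(hashlib.sha256(block).digest())
            except OSError:
                continue  # skip files that cannot be read
    return total, len(seen)


def savings_percent(total, unique):
    """Share of blocks that are duplicates of a block already counted."""
    return 0.0 if total == 0 else 100.0 * (total - unique) / total
```

Pointing estimate_savings at a mounted directory and passing the two counts to savings_percent gives a rough upper bound on block-level savings for that tree under these assumptions.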

8. Can previously written volume data be deduplicated?

Yes. A CLI command signals A-SIS deduplication to scan and deduplicate existing data on a volume. This command can be run at any time on a volume that contains previously written data, and we recommend it be run whenever A-SIS deduplication is first enabled on a volume.

9. Since A-SIS deduplication is run after the data is written to the volume, can I schedule the time when this deduplication is run?

Yes. A CLI command allows you to set individual A-SIS deduplication schedules for each volume.

10. What kind of space savings can I expect with A-SIS deduplication?

The space savings of any deduplication product depend on the number of duplicate objects that can be found and removed. A-SIS deduplication is no different. Based on internal testing and customer feedback, the table below shows sample space savings achieved by A-SIS deduplication in typical environments:

Dataset                    Sample Space Savings Observed
Full System Backups*       20:1 (95%) over time
VMware Images              85%
Tech Pubs                  50%
Software Archives          50%
Database                   30 – 50%
Home Dirs                  30 – 50%
Web & MS Office            30 – 45%
Oil & Gas Seismic Data     30%
E-mail Archive             20%

*In data backup environments, space savings grow over time as repetitive full backups are retained. For example, tests with CommVault Galaxy provided a 20:1 space reduction over time, assuming daily full backups with a 2% daily file modification rate.
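
The arithmetic behind a figure like this can be sketched with a simplified model. The assumptions here are ours, not a description of how the CommVault test was measured: every full backup has the same logical size, each day's 2% of modified data is entirely new, and everything else deduplicates perfectly against earlier backups. The function names are hypothetical.

```python
def dedup_ratio(num_backups, daily_change=0.02):
    """Logical-to-physical ratio after retaining `num_backups` daily fulls."""
    logical = num_backups  # in units of one full backup
    # First backup is stored whole; each later one adds only its changed data.
    physical = 1 + daily_change * (num_backups - 1)
    return logical / physical


def ratio_to_percent(ratio):
    """Convert an N:1 reduction ratio into percent space saved."""
    return 100.0 * (1 - 1 / ratio)


def backups_needed(target_ratio, daily_change=0.02):
    """Smallest number of retained fulls that reaches `target_ratio`."""
    # In this model the ratio approaches 1/daily_change but never reaches it.
    if daily_change > 0 and target_ratio >= 1 / daily_change:
        raise ValueError("target ratio is not reachable in this model")
    n = 1
    while dedup_ratio(n, daily_change) < target_ratio:
        n += 1
    return n
```

Under these assumptions, backups_needed(20) returns 33, so roughly a month of retained daily fulls reaches 20:1, and ratio_to_percent(20) gives the 95% shown in the table. The same model caps the ratio just below 50:1 at a 2% change rate, however many backups are kept.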

More Information

Didn't find the answers you were looking for? Several hundred questions were submitted during a July 19 TechTalk chat session. For more details, check out the full 23-page transcript (pdf).


 

RELATED INFORMATION

Engineering Perspective: Deduplication Comes of Age

At its heart, A-SIS deduplication relies on the time-honored computer science technique of reference counting.

When A-SIS deduplication is enabled on a volume, it computes a database of fingerprints for all of the in-use blocks in the volume (a process known as "gathering"). Once this initial setup is finished, the volume is ready for deduplication.

To avoid slowing down ordinary file operations, the search for duplicates is done as a separate batch process. As the file system gets updated during normal use, WAFL® creates a log describing the changes to its data blocks. This log accumulates until one of the following occurs:

  • The administrator issues a sis start command
  • The next time specified in the sis config schedule occurs
  • The changes to the log exceed a predetermined threshold
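
The combination of fingerprints, a change log, and reference counting described above can be illustrated with a toy in-memory model. This is a sketch under our own assumptions, not WAFL or A-SIS code: the ToyVolume class, its method names, and the use of SHA-256 are hypothetical, and fingerprints are computed lazily during the batch pass rather than in a separate "gathering" step.

```python
import hashlib


class ToyVolume:
    """Toy model: writes are logged, a later batch pass shares duplicate blocks."""

    def __init__(self):
        self.blocks = {}        # physical block id -> bytes
        self.refcount = {}      # physical block id -> number of references
        self.files = {}         # file name -> list of physical block ids
        self.fingerprints = {}  # fingerprint -> physical block id holding that data
        self.change_log = []    # (file name, block index) written since last pass
        self._next_id = 0

    def _release(self, bid):
        """Drop one reference; free the block when nothing points at it."""
        self.refcount[bid] -= 1
        if self.refcount[bid] == 0:
            data = self.blocks.pop(bid)
            del self.refcount[bid]
            fp = hashlib.sha256(data).digest()
            if self.fingerprints.get(fp) == bid:
                del self.fingerprints[fp]

    def write(self, name, data_blocks):
        """Ordinary write path: store each block as-is and log the change."""
        for old in self.files.pop(name, []):
            self._release(old)
        ids = []
        for index, data in enumerate(data_blocks):
            bid = self._next_id
            self._next_id += 1
            self.blocks[bid] = data
            self.refcount[bid] = 1
            ids.append(bid)
            self.change_log.append((name, index))
        self.files[name] = ids

    def read(self, name):
        return [self.blocks[bid] for bid in self.files[name]]

    def dedup_pass(self):
        """Batch pass over the change log; returns the number of blocks freed."""
        freed = 0
        for name, index in self.change_log:
            ids = self.files.get(name)
            if ids is None or index >= len(ids):
                continue  # stale log entry: the file has since changed
            bid = ids[index]
            data = self.blocks[bid]
            fp = hashlib.sha256(data).digest()
            keeper = self.fingerprints.get(fp)
            if keeper is None:
                self.fingerprints[fp] = bid
            elif keeper != bid and self.blocks[keeper] == data:
                # Byte comparison guards against fingerprint collisions.
                ids[index] = keeper
                self.refcount[keeper] += 1
                self._release(bid)
                freed += 1
        self.change_log.clear()
        return freed
```

Here dedup_pass stands in for whichever of the three triggers fires. Writing two files that share one block and then calling dedup_pass leaves a single physical copy of that block with a reference count of 2, while read still returns each file's original contents.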

Learn more. Read the article.


How Data Deduplication Fits into the NetApp Master Plan

The following is an excerpt from a recent post on NetApp founder Dave's blog:

Buying less storage is the small picture. The big picture is that we want to help customers create a disk-based copy for all of their primary storage. …

Interesting things start to happen when you create a disk-based copy of everything. Instead of doing searches on primary storage, which could hurt performance, why not search the secondary copy? If the people running decision support systems want their own copy of a critical database, why not clone the secondary instead of paying for a whole new copy? Why not create lots of cloned copies for the test and development team preparing to upgrade to the next version of Oracle or SAP?

When you create a copy of everything, and add functionality like Snapshot™ copies and clones, what you end up with is a smart copy infrastructure that can completely change the way you think about data management.

This won't happen overnight. We understand that. But anything that helps people reduce the cost of creating copies helps us achieve our vision more quickly. In the short run, data deduplication helps customers save space and save money, but what's more important is that by reducing the cost of copies, it helps us achieve our master plan.

Read the full Dave's Blog post on deduplication.
