Death by Scale:
Three Secrets to Avoiding It
By John Hanna
NetApp has built its reputation on simplicity and ease of management. Despite this success, however, it is becoming obvious that modern IT is quite literally awash in data – a rising tide of information that threatens to overwhelm even the most disciplined and well-funded IT departments. Increasing complexity and management difficulty are the inevitable results of continued storage infrastructure growth.
So what happens when the five systems and 35TB you were initially managing become 100 systems supporting 1000TB? Many companies are experiencing annual data growth of over 50% plus increased retention requirements, which means that in just a few years your data could easily triple or quadruple. How are you going to manage an environment that is expanding exponentially?
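As a rough check on those numbers, the compound growth can be worked out directly. The sketch below assumes a steady 50% annual growth rate and the 35TB starting point from the example above. The function names are illustrative and are not part of any NetApp tool.

```python
import math


def years_to_multiply(factor: float, annual_growth: float) -> float:
    """Years needed for capacity to grow by `factor` at a compound annual rate."""
    return math.log(factor) / math.log(1.0 + annual_growth)


def projected_capacity(start_tb: float, annual_growth: float, years: int) -> float:
    """Capacity in TB after `years` of compound growth."""
    return start_tb * (1.0 + annual_growth) ** years


if __name__ == "__main__":
    # Assumed inputs: 50% annual growth, 35TB starting capacity.
    for factor in (3, 4):
        print(f"{factor}x takes {years_to_multiply(factor, 0.50):.1f} years")
    for year in range(5):
        print(f"year {year}: {projected_capacity(35, 0.50, year):.1f} TB")
```

At 50% a year, capacity triples in about 2.7 years and quadruples in about 3.4 years, before any extra growth from longer retention is counted.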
You hear about companies managing petabytes of storage with just a few dedicated storage admins. One company that I can't name publicly had over 5,000TB of NetApp storage worldwide (5 petabytes!) across over 600 storage systems the last time I checked in with them. To make matters more complex, the footprint has at least doubled every 12 months.
This particular customer has eight people dedicated to storage at its HQ and various worldwide locations, and relies heavily on three dedicated NetApp support engineers. Counting all eleven, that's an average of just over 450TB per person!
How do companies manage that much storage?
In this case, the company has standardized on a minimum number of server, database, and storage vendors. The fact that NetApp uses a single operating system for its entire range of products – from entry-level systems to high-performance primary storage to near-line storage – is a significant advantage for them. But obviously these are not the only keys to their success.
In this article, I'm going to talk about three ways NetApp engineers are designing products that can help customers of all sizes succeed in the face of massive data growth:
- Scalability Secret #1: Become a grouper
- Scalability Secret #2: Don't reinvent the wheel
- Scalability Secret #3: Delegate, delegate, delegate
NetApp is developing a suite of management software that puts these secrets to work to simplify the management of large amounts of storage. NetApp Operations Manager (Ops Mgr, formerly called DataFabric® Manager or DFM) is the foundation of this approach. In the following sections, I'll refer specifically to Operations Manager and related products.
Scalability Secret #1: Become a Grouper
NetApp Solution: Hierarchical groups can save you hundreds of hours every year, versus dealing directly with individual assets.
If you were hoping to find out how to become a large tropical fish in this section, sorry to disappoint you. (If anyone knows that secret, please get back to me.) The type of grouper I'm talking about is the kind that groups things to make them simpler. Any time you can group related storage assets together and operate or report on them as a whole, you can save a lot of time and manage more storage with fewer people.
The hierarchical grouping feature of NetApp Ops Mgr lets you create logical groups to significantly simplify management, monitoring, and reporting tasks. Ops Mgr allows sets of storage assets – storage systems, volumes, LUNs, etc. – to be organized into groups that match the way your organization uses the assets – by location, by project, by application, by department, etc. By establishing appropriate groupings you can do filtered reporting, maintain better control over sets of resources in less time, and target alerts specifically to resource owners so that problems are directed to the best people for quick resolution.
Each group is represented by an icon in Ops Mgr that allows you to quickly monitor and manage the group, drill down to see historical trends or real-time information, or access individual group assets. Each group icon is accompanied by an alert icon that lets you immediately see all alerts that are relevant to that group.
Assets can belong to more than one group, and one group can contain another. For instance, you might define a regional group that contains several data center groups. Each data center group might consist of primary and secondary storage groups, or groups based on application type, or both. You may also want different groupings for business purposes and reporting versus day-to-day management. Obviously, deciding the best way to define groups takes some thought, but the up-front work pays off.
Once groups are configured, you can quickly dive into filtered views of information that pertain only to the group you're interested in. For instance, you might define a group that contains all the storage assets used by a corporate database. By drilling into that group, you can see everything about the underlying assets, down to the level of a broken disk, while excluding everything that isn't of interest.
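To make the grouping idea concrete, here is a minimal sketch of hierarchical groups as a data structure. It is not the Ops Mgr implementation or API. The class, function, asset names, and alert texts are all hypothetical, chosen only to show how nested groups and overlapping membership give you filtered views.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Group:
    """A named set of assets that can also contain other groups (hypothetical model)."""
    name: str
    assets: Set[str] = field(default_factory=set)
    subgroups: List["Group"] = field(default_factory=list)

    def all_assets(self) -> Set[str]:
        """Assets in this group plus everything in its nested groups."""
        found = set(self.assets)
        for sub in self.subgroups:
            found |= sub.all_assets()
        return found


def alerts_for(group: Group, alerts: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the (asset, message) alerts that belong to the given group."""
    members = group.all_assets()
    return [alert for alert in alerts if alert[0] in members]


# A regional group containing a data center group with primary/secondary storage.
primary = Group("dc1-primary", {"filer01:/vol/db", "filer02:/vol/home"})
secondary = Group("dc1-secondary", {"nearstore01:/vol/backup"})
dc1 = Group("dc1", subgroups=[primary, secondary])
region = Group("emea", subgroups=[dc1])

# The same assets can also belong to an application-oriented group.
database = Group("corporate-db", {"filer01:/vol/db", "nearstore01:/vol/backup"})

alerts = [
    ("filer01:/vol/db", "volume almost full"),
    ("filer02:/vol/home", "disk failed"),
    ("filer09:/vol/scratch", "snapshot failed"),
]

print(sorted(region.all_assets()))
print(alerts_for(database, alerts))
```

The database owner sees only the alert on the database volume, while the regional view rolls up everything beneath it. A real tool would also need to guard against a group containing itself, which this sketch does not do.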
NetApp has also applied this approach to backup and disaster recovery. If you want to protect a group of storage assets using the same policy, you can simply group the storage assets into the same "dataset" using a tool called NetApp Protection Manager. On the back end, resource pools group secondary storage to simplify management and increase flexibility. Read last month's article on Protection Manager or watch the demo to learn more.
Scalability Secret #2: Don't Reinvent the Wheel
NetApp Solution: Avoid custom configuration whenever possible.
This one seems obvious enough, but in the absence of the right tools and the right planning, too many of us end up repeating the same time-consuming processes over and over again. To help avoid this type of problem with storage configuration, the customer I mentioned earlier worked closely with NetApp SupportEdge Services to track usage patterns in different organizations and geographies, and has developed processes to account for these differences during capacity planning and forecasting. They have established standard storage system configurations to meet the distinct requirements of a range of applications, eliminating the need to create a custom hardware and software configuration for each new storage system.
The Data ONTAP Configuration Management feature of Ops Mgr lets you maintain the same Data ONTAP software configuration across multiple systems. This includes all system configuration information such as administration options, audit, AutoSupport, and CIFS and NFS options, plus many of the standard system configuration files (for example usermap.cfg and hosts.equiv). After you install a new storage system, you can quickly apply a configuration, and you're ready to provision volumes or LUNs and go to work.
You can create a configuration template either by pulling it from an existing system or by creating it from scratch, and then assign the template to a configuration group. Configuration groups are also hierarchical – one group can contain another, enabling configuration templates to inherit the configuration details from the group above.
Whenever you add a new storage system, you can simply push the appropriate configuration to it, and all the appropriate options are set automatically, reducing the effort needed to scale out your storage infrastructure.
Once a system is configured, you can monitor to ensure configuration compliance and receive alerts if the configuration on a system changes. If a configuration change has occurred, you can correct it by simply pushing the template out to the system again.
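The template, inheritance, and compliance ideas can be sketched in a few lines. This is an illustrative model only, not how Ops Mgr stores or pushes configurations. The class and function names are hypothetical, and the option names and values are placeholders rather than a checked list of Data ONTAP options.

```python
from typing import Dict, Optional, Tuple


class ConfigGroup:
    """A configuration group whose template inherits from its parent group."""

    def __init__(self, name: str, settings: Dict[str, str],
                 parent: Optional["ConfigGroup"] = None) -> None:
        self.name = name
        self.settings = settings
        self.parent = parent

    def effective_settings(self) -> Dict[str, str]:
        """Parent settings first, then this group's own settings override them."""
        merged = self.parent.effective_settings() if self.parent else {}
        merged.update(self.settings)
        return merged


def find_drift(template: Dict[str, str],
               actual: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {option: (expected, found)} for options that do not match the template."""
    return {
        option: (expected, actual.get(option))
        for option, expected in template.items()
        if actual.get(option) != expected
    }


def push(template: Dict[str, str], actual: Dict[str, str]) -> Dict[str, str]:
    """Return the system's settings after the template has been applied."""
    updated = dict(actual)
    updated.update(template)
    return updated


# Placeholder option names and values, for illustration only.
global_group = ConfigGroup("global", {"autosupport.enable": "on", "nfs.v3.enable": "on"})
emea_group = ConfigGroup("emea", {"timezone": "Europe/London"}, parent=global_group)

template = emea_group.effective_settings()
new_system = {"autosupport.enable": "off", "nfs.v3.enable": "on"}

drift = find_drift(template, new_system)
print(drift)

new_system = push(template, new_system)
print(find_drift(template, new_system))
```

Running the compliance check again after the push reports no drift, which mirrors the correct-by-reapplying workflow described above.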
Again, NetApp has applied this approach in a similar way to simplify adding backup or disaster recovery for new data. Unless the data requires some unique protection policy that doesn't apply to anything else you've got, you can add protection with one or two clicks using Protection Manager. Even creating and applying a brand new policy only takes four steps.
Scalability Secret #3: Delegate, Delegate, Delegate
NetApp Solution: Role-based administration lets you grant the ability to perform specialized tasks to other admins without giving up control.
The only thing better than doing a job quickly and efficiently is getting someone else to do it. One of the trends that has occurred as IT infrastructures have become more complex is increased specialization. Today we have server admins, application admins, and storage admins. Within the storage admin category, there is a trend toward subspecialties such as backup, monitoring, and provisioning. Additionally, these roles are increasingly distributed across multiple locations and data centers.
Obviously, you would like each person to be able to carry out the specific operations they need to perform without necessarily giving everyone the keys to the kingdom. Ops Mgr uses role-based access control (RBAC) to allow you to delegate the ability to perform particular tasks on a particular set of assets to particular people. For instance, a backup operator might be able to kick off jobs in the local data center, but not define or change what those jobs consist of or perform any other tasks for which he or she is not authorized. This provides protection from both mistakes and malfeasance. It also makes it possible to delegate particular, well-defined storage tasks to application or server admins so that they can perform limited storage tasks without always involving a storage admin.
Roles are assigned to users or groups based on defined responsibilities. Each role consists of one or more capabilities, or possible actions, plus resources such as storage systems or volumes that can be acted on.
Global is the highest group in the hierarchy, and any group one level below Global can be defined as a resource available to a role. Using the graphic in the previous section as an example, a role defined for business report access with no management rights could be restricted to viewing data only in the Business Reports level of the hierarchy, which locks that role out of viewing anything in the Storage Ops level. Roles can restrict access to groups only at this first level below Global. The backup operator role discussed earlier can initiate or schedule backups only within a certain group. A backup administrator role might inherit the backup operator role and add the ability to create backup relationships and perform restores across the enterprise. These capabilities make it easy to distribute administration tasks across the enterprise for improved scaling without sacrificing security or risking disaster.
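The relationship between roles, capabilities, and resources can be modeled roughly as follows. This is a sketch under stated assumptions, not the Ops Mgr RBAC implementation: the class, the capability strings, and the group names are hypothetical, and it assumes that inherited capabilities apply within the inheriting role's own resource scope.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Role:
    """A role: a set of capabilities plus the resource they can act on."""
    name: str
    capabilities: Set[str]
    resource: str = "Global"  # "Global" or a group one level below Global
    inherits: List["Role"] = field(default_factory=list)

    def effective_capabilities(self) -> Set[str]:
        """Own capabilities plus those of every inherited role."""
        caps = set(self.capabilities)
        for parent in self.inherits:
            caps |= parent.effective_capabilities()
        return caps

    def allows(self, capability: str, group_path: Tuple[str, ...]) -> bool:
        """Check a capability against a group path such as ("Global", "Storage Ops", "DC1")."""
        # Scoping is checked only at the first level below Global.
        in_scope = self.resource == "Global" or (
            len(group_path) > 1 and group_path[1] == self.resource
        )
        return in_scope and capability in self.effective_capabilities()


# Hypothetical roles matching the examples in the text.
operator = Role("backup-operator", {"backup.start"}, resource="Storage Ops")
admin = Role("backup-admin", {"backup.create_relationship", "backup.restore"},
             inherits=[operator])
report_viewer = Role("report-viewer", {"report.view"}, resource="Business Reports")

dc1 = ("Global", "Storage Ops", "DC1")
reports = ("Global", "Business Reports")

print(operator.allows("backup.start", dc1))       # operator may start jobs here
print(operator.allows("backup.restore", dc1))     # but may not perform restores
print(report_viewer.allows("report.view", dc1))   # report viewer is locked out of Storage Ops
print(admin.allows("backup.restore", reports))    # admin acts across the enterprise
```

Keeping capabilities and resources separate is what lets the same operator role be reused for a different data center simply by pointing it at a different group.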
What Is the Future of Data Management?
These are a few of the things that NetApp manageability tools do today to help keep your storage infrastructure manageable, even in the event of exponential growth. To deliver our ultimate vision for how storage environments should be managed, NetApp continues to build on the core functionality of Ops Mgr and to leverage an Integrated Data Management approach.
Integrated Data Management is about integrating the underlying data and storage management processes into the data administrators' (application admin, DBA, Exchange admin, server admin, backup admin, etc.) view of the world. This enables data administrators to carry out some data management tasks themselves within the confines set by the storage administrator. It increases the flexibility and productivity of data administrators, and it lets storage administrators focus on high-value tasks while retaining the necessary control over the infrastructure.
NetApp engineers are working on an additional suite of products that leverage and complement Ops Mgr to streamline particular tasks. Protection Manager, which was released earlier this year, is the first phase of this effort. This policy-based management application simplifies the configuration, management, and monitoring of Snapshot copies, disk-based backup, and replication across your entire NetApp storage infrastructure, and can incorporate other servers and storage using Open Systems SnapVault (OSSV). In the future, NetApp will add more tools to its manageability family, including an enhanced version of the existing Performance Advisor as well as a tool to facilitate rapid provisioning.
Ultimately, NetApp believes that building more and more tools designed solely for storage administrators is a losing battle. Instead, NetApp is committed to removing the one-off requests barraging these folks (which have turned them into storage help desks), delegating administration tasks across the admin tiers in an organization, and automating processes across the stack under the control of storage administration policy. And that's how we help our customers avoid death by scale.