A Case Study: How NetApp IT Achieved 60% Utilization While Saving 40,000 kWh per Month
By Stacey Rosenberry, Gary Garcia, and Devinder Singh
In March, NetApp will publish a new white paper in the Vision series. As a Tech OnTap member, you're invited to enjoy early access to the report, Reducing Data Center Power Consumption through Efficient Storage.
In 2006, NetApp IT undertook a project to increase utilization and upgrade hardware. This project required migrating from old, inefficiently used storage systems to new, more scalable systems. A significant benefit of the migration was the adoption of NetApp Data ONTAP® 7G and FlexVol™ technology.
This consolidation yielded significant results:
- Storage utilization increased from less than 40% to an average of 60%.
- Storage footprint reduced from 24.83 racks to 5.48.
- 50 storage systems replaced with 10.
- Direct power consumption decreased by 41,184 kWh per month.
- $59,305 in annual electricity costs eliminated.
- Substantial capacity and performance gains.
This article explains the challenges that NetApp faced, the different phases of the consolidation, and key results. For details of exactly how we achieved (and calculated!) some of these results, see the sidebar.
The Challenge: Low Storage Utilization and Inefficient, Aging Hardware
Like many companies, Network Appliance has experienced rapid, sustained growth in recent years. With a 30% annual growth rate, simply adding more disks to our installed storage systems was not a viable long-term solution. The NetApp IT team was experiencing challenges in three key areas:
- Low storage utilization. Overall storage utilization per volume was less than 40%. In many cases, additional spindles had been deployed to provide adequate application performance, resulting in unused capacity.
- Aging hardware. This project focused on a variety of older hardware, including 34 F760s, 12 F820s and F840s, and 4 F880s. These systems were running older versions of the Data ONTAP operating system, which did not allow the team to take advantage of advanced features such as FlexVol technology. These older systems also use lower capacity drives with lower overall storage density, resulting in a storage environment with a large number of storage systems and greater management complexity.
- Space, cooling, and power constraints. The 50 storage systems involved in this project had a combined maximum power consumption of 329kW and required additional power to meet cooling needs. Our current data center has 6,500 square feet, of which 70% is built out for use. Building out the remaining 30% would require significant retrofits to add power and cooling capacity at significant expense.
Additional Project Challenges
When we started the upgrade project, we realized that this was not just an infrastructure process; bringing our business applications up to modern best practices also required that we rationalize the network topology, the data storage layouts and application code. Our project methodology was adapted to integrate with each application team, using planned software release windows opportunistically. Although we primarily set out to tackle our storage issues, it was impossible to ignore the rest of the environment.
- Applications
This storage environment supports a wide variety of critical business applications used by more than 20 business groups. One thing that worked to our advantage was that rather than distributing business-critical applications across multiple worldwide data centers, in most cases NetApp was already using a single global instance of applications, reducing complexity versus enterprises that have widely distributed applications.
- Servers
Naturally, the applications were spread across an even larger number of servers. The impact of the storage migration on each server had to be assessed, and each server had to be migrated to the new storage environment. The difficult part was not the server-to-storage relationship, but rather the relationships between a shared storage infrastructure and the application set. In effect, the migration was many application migrations; the server-to-storage relationships were simply the context.
- Networks
Network Appliance had adopted a segmented network strategy, but legacy systems still depended primarily on one monolithic flat network that mixed development and production and exposed applications unnecessarily to network "weather". This project gave us an excellent opportunity to bring legacy systems into best practices.
- Resources
The upfront coordination and support of NetApp application developers and storage administrators was a must. We also had to coordinate the efforts of the DBA, UNIX® server, and Windows® server teams. Without buy-in from management to have these resources available, this project wouldn’t have been completed.
The Solution: Consolidate Data across 109 Applications, 343 Servers, and 50 Storage Systems
Phase I: Discovery
The project began with a thorough audit of the entire environment, including applications, servers, and networks.
- Our initial discovery indicated that we needed to consider 109 different applications. Each application had at least two environments (development and production), while some tier 1 applications had as many as eight discrete environments.
- These applications were utilizing 343 servers. By talking to application owners, we found that 148 of these servers would not require migration and that 18 could be decommissioned. This left 177 servers whose data would need to be migrated to consolidated storage.
- Application storage was being provided by 50 separate storage systems with 53.6TB of stored data on 331 volumes. We discovered just under 5,161 mounts to these servers. In many cases, information was hard-coded and would need to be changed by each application team before we could proceed.
Phase II: Analysis
Based on the audit, NetApp IT decided to implement the following changes:
- Decommission 50 storage systems and replace them with 10 of the latest model storage systems (at that time, the FAS980c) running Data ONTAP 7G.
- Host the new storage systems in segmented networks so that performance could be better managed between applications.
- Migrate existing servers to the new network infrastructure.
- Migrate 46 applications. We decided 44 applications were already compliant with storage standards and learned that 19 could be decommissioned.
- Convert all mounts to standardized references; eliminate all references to specific storage systems.
This sounds simple enough, but it represents a significant amount of change with a lot of dependencies. As Dave Robbins, senior director of NetApp Global Infrastructure, pointed out, "NetApp IT may own the plumbing, but the application folks own the furniture and ultimately we can't screw up the house during the remodel."
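The counts reported in the discovery and analysis phases can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the numbers come from the text above, while the variable names and the `remaining` helper are assumptions introduced here.

```python
# Counts reported in the discovery and analysis phases of this article.
servers_total = 343
servers_no_migration = 148
servers_decommission = 18

apps_total = 109
apps_migrate = 46
apps_compliant = 44
apps_decommission = 19


def remaining(total, *excluded):
    """Return what is left of `total` after subtracting each excluded count."""
    return total - sum(excluded)


# Servers whose data must move to consolidated storage.
servers_to_migrate = remaining(servers_total, servers_no_migration, servers_decommission)

# Applications not covered by any of the three categories (should be zero).
apps_unaccounted = remaining(apps_total, apps_migrate, apps_compliant, apps_decommission)

print(servers_to_migrate)  # 177
print(apps_unaccounted)    # 0
```

Both tallies reconcile: 177 servers remain to be migrated, and the three application categories account for all 109 applications.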
Phase III: Implementation
The project began with an intensive manual process of cleaning up the data. Every data set had to be reviewed. We had developed scripts that allowed us to do an inventory of mount points (where they are connected, and so on), but ultimately each mount had to be scrutinized by someone from the responsible application team, and each team had to decide what to keep, what to archive, and what to delete. Programmers also had to go back and fix any hard-coded mounts and other dependencies that would break during the migration.
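The inventory scripts themselves are not reproduced in this article. As a rough idea of what such a script might look like, the sketch below scans fstab-style text for NFS entries and counts mounts per storage system. Everything in it is hypothetical: the host names, paths, and the `nfs_mounts` function are invented for illustration and are not the scripts NetApp IT used.

```python
from collections import Counter

# Hypothetical fstab-style sample; host names and paths are invented.
SAMPLE_FSTAB = """\
# device            mountpoint   type  options  dump pass
filer01:/vol/app1   /mnt/app1    nfs   rw,hard  0 0
filer01:/vol/logs   /mnt/logs    nfs   rw,hard  0 0
filer07:/vol/dev    /mnt/dev     nfs   ro       0 0
/dev/sda1           /            ext3  defaults 1 1
"""


def nfs_mounts(fstab_text):
    """Return (storage_host, export_path, mount_point) for each NFS entry."""
    mounts = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) < 3 or not fields[2].startswith("nfs"):
            continue  # not an NFS file system
        host, sep, export = fields[0].partition(":")
        if sep:  # keep only "host:/path" style devices
            mounts.append((host, export, fields[1]))
    return mounts


inventory = nfs_mounts(SAMPLE_FSTAB)

# Mounts per storage system: entries that name a specific system directly
# are the ones that would need a standardized reference before migration.
per_host = Counter(host for host, _, _ in inventory)

for host, count in sorted(per_host.items()):
    print(host, count)
```

A listing like this only shows where mounts point. Deciding what to keep, archive, or delete still requires review by the owning application team, as described above.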
Next, we installed the new storage systems and configured new networks utilizing segmented VLANs to isolate application traffic. With those tasks complete, data migration could begin. We worked through the applications one at a time. For each application we established a migration team and developed a move plan. Two to four application projects were run concurrently. Actual data movement was carried out using either NDMPcopy or NetApp SnapMirror® replication software. Once an application was migrated, we made old volumes obsolete and decommissioned old storage systems.
Project Results
The storage consolidation phase of this technology refresh has provided a broad range of benefits that address the storage challenges described above.
Challenge #1: Low storage utilization
Result: An average of 60% storage utilization
- Disk utilization increased from less than 40% to more than 60%. This was a direct result of the move to Data ONTAP 7G and FlexVol. Using flexible volumes, we have been able to spread application volumes across a large number of spindles for performance without sacrificing disk space. Increased utilization means that we need less total disk capacity, which decreases power consumption and cooling requirements and simplifies management. For the Cognos application highlighted previously, for example, utilization jumped from an average of 28% across 8 storage systems (a high of 80% and low of 4%) to an average of 85%.
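To see why higher utilization translates into less disk, consider a simplified model in which utilization is stored data divided by provisioned capacity. This is an illustrative assumption, not the method used for the article's figures. It applies the 53.6TB of stored data from the discovery phase and treats 40% and 60% as exact values, although the article reports them as bounds.

```python
# Stored data reported in the discovery phase (TB).
stored_tb = 53.6


def provisioned_tb(stored, utilization):
    """Capacity needed to hold `stored` at a given utilization fraction.

    Assumes utilization = stored / provisioned (a simplification).
    """
    return stored / utilization


before_tb = provisioned_tb(stored_tb, 0.40)  # about 134 TB
after_tb = provisioned_tb(stored_tb, 0.60)   # about 89 TB

# Fraction of provisioned capacity no longer needed for the same data.
capacity_saved = 1 - after_tb / before_tb    # about one third

print(round(before_tb, 1), round(after_tb, 1), round(capacity_saved, 2))
```

Under these assumptions, the same data needs roughly a third less provisioned capacity, which is the source of the power, cooling, and management benefits described above.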
Challenge #2: Aging, inefficient hardware
Result: Significant gains in capacity, performance, flexibility, reliability, and ease of management
- Increased capacity and performance. Although in the short term we reduced our storage requirement by improving utilization, this upgrade also positions NetApp to quickly expand storage capacity in the data center as necessary. Replacement of older disks with 144GB disks substantially increases the capacity of each disk shelf. Each of the new systems has a maximum capacity of 64TB, meaning that the 10 storage systems deployed can support up to 640TB. These 10 storage systems also offer significantly more performance and capability than the 50 systems they replaced.
- Increased operation flexibility. The move to consolidated storage on Data ONTAP 7G makes it much easier to add capacity (and less expensive as a result of better utilization). With FlexVol volumes, we can easily add new volumes or grow or shrink existing volumes to meet changing demands.
- Increased stability and reliability. All new storage systems are clustered for improved data availability, and all RAID groups utilize RAID-DP™ for greater protection against disk failure. Using diagonal parity, RAID-DP can recover from two disk failures in the same RAID group, yet offers the same performance as NetApp RAID 4.
- Simplified management by replacing 50 storage systems with 10. Now we have only 10 storage systems to manage, and we took care to rationalize volume names, mounts, and exports while eliminating hard-coded dependencies to ensure smoother operations going forward.
Challenge #3: Space, cooling, and power constraints
Result: Reduced storage footprint to under 6 racks and cut annual power costs by about $60,000
- Substantially reduced data center footprint. As shown in the following table, through this consolidation we’ve been able to reduce our storage footprint from 24.83 standard 47U foot racks to 5.48 racks.
- Reduced power consumption and electricity costs. In total, the storage equipment that we decommissioned drew a maximum of 1631 amps, or 329kW, and was replaced with equipment drawing a maximum of 331 amps, or 69kW. This resulted in an electricity savings estimated at $59,305 annually (see the sidebar for details). Additionally, the resulting decrease in heat load works out to 93.549 tons of air conditioning.
| | Original | After Consolidation |
|---|---|---|
| Rack Space (racks) | 24.83 | 5.48 |
| Disk Utilization | <40% | >60% |
| Direct Power Usage | 329kW (Max) | 69kW (Max) |
| Estimated Annual Power Savings | - | $60,000 |
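The power and footprint figures can be sanity-checked against one another. The sketch below uses only numbers stated in this article. The electricity rate it derives is inferred from those numbers and is not stated in the text; the sidebar describes the actual calculation.

```python
# Figures reported in this article.
monthly_kwh_saved = 41_184
annual_savings_usd = 59_305
max_kw_before = 329
max_kw_after = 69
racks_before = 24.83
racks_after = 5.48

# Annual energy saved, from the monthly figure.
annual_kwh_saved = monthly_kwh_saved * 12

# Electricity rate implied by the two reported figures (an inference,
# not a number given in the article).
implied_rate_usd_per_kwh = annual_savings_usd / annual_kwh_saved

# Reduction in maximum (nameplate) power draw and in rack footprint.
max_kw_reduction = max_kw_before - max_kw_after
rack_reduction_fraction = 1 - racks_after / racks_before

print(annual_kwh_saved)                    # 494208
print(round(implied_rate_usd_per_kwh, 3))  # 0.12
print(max_kw_reduction)                    # 260
print(round(rack_reduction_fraction, 2))   # 0.78
```

The reported savings are consistent with a rate of about $0.12 per kWh. The 260kW drop in maximum draw is a nameplate figure and is not the same as measured consumption, so it should not be expected to match the monthly kWh savings directly.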
Finally, as part of this project, the team reorganized the NetApp network infrastructure. The segmented network architecture allows us to isolate application traffic using VLANs for better, more predictable performance and improved security.