NetApp - News - Tech OnTap Newsletter
What Makes the FAS6070 so Fast?
An Engineering Roundtable on FAS6070 Storage Controller Performance.
Tom Holland, senior engineer, Platform Hardware Engineering. Tom has been a member of the NetApp Engineering team for six years and helped develop the F800, R200, and FAS6000 platforms.
Naresh Patel, senior engineer, Performance Engineering. Naresh has a Ph.D. in performance and has worked on platform and Data ONTAP® performance for seven years.
Howard Gage, manager, Hardware Engineering. Howard is also a six-year NetApp veteran and was responsible for delivering the F800, FAS900, and FAS6000 series platforms.
The article below was originally published in the June 2006 edition of the Tech OnTap newsletter.
The new FAS6000 storage family features two models, the FAS6030 and FAS6070. While the FAS6030 provides significant performance improvements, the FAS6070 has actually measured twice the performance of the previous high-end NetApp storage system, the FAS980 model. Here's a quick comparison of how the two systems stack up:
It's rare that a company releases a new high-end platform that provides a 100% performance improvement. To find out how NetApp did it, NetApp product engineer and best practice author Chris Lueth tapped some of the key engineers who worked on the new platform.
Chris: What types of goals did NetApp set for this new platform?
Howard: There were a number of dimensions. Of course we looked at raw performance, but we also focused on improving database workload performance with 64-bit addressability to support memory sizes greater than 4GB. NetApp has a unique advantage in its ability to provide true unified storage (that is, consolidating multiple types of storage and protocols on a single box: primary and nearline, CIFS and NFS, FC and iSCSI target mode, etc.), so we wanted to design in enough flexible I/O to allow customers to take advantage of that. We also decided to move to embedded I/O to reduce cost.
Naresh: The other key area of focus was high-performance computing, which stretches the limits of most systems. In the past we had a relatively small number of clients simultaneously accessing data on our systems. Now we see chip vendors and computer animation studios that have thousands of Linux® servers, and they often want to concurrently read and write large amounts of data on a single system. This means that the performance requirements placed on a single system are growing considerably, so our goal is to improve overall price/performance by delivering excellent performance with the smallest number of systems.
Chris: Were there any particular technology considerations that impacted platform design?
Howard: Processor frequencies are becoming fairly static, so the trend is toward adding processing power through additional processors and, going forward, through multicore processors. The challenge that we faced involved finding the right processor architecture to take advantage of cost-effective symmetric multiprocessing. We needed an architecture that offered 64-bit addressing, scalable memory bandwidth, and scalable I/O architecture.
Chris: What are some key architectural improvements to the new FAS6000 platform that allow the high-end FAS6070 to perform twice as fast compared to previous high-end NetApp systems?
Howard: Certainly the transition from older front side bus technology to HyperTransport™ links has provided tremendous internal and I/O bandwidth for the FAS6000 series relative to previous NetApp storage. There are four 4GT/s (giga-transfers per second) links that operate concurrently in each FAS6070 controller, and at those speeds each link has tremendous burst bandwidth. If you compare the 4GT/s speed of the FAS6070 HyperTransport links to the 0.4GT/s on the FAS980, you'll get a feel for where some of the performance improvement comes from. That's an order of magnitude difference.
Naresh: In fact, the four HyperTransport links form a square interconnect with CPUs directly connected in each corner and deliver an aggregate burst bandwidth of 32GB/s per FAS6070 controller, or 64GB/s for an active/active, dual-controller configuration. This is a huge improvement over the FAS980, which peaked at only 3.2GB/s and was limited by a shared bus. Having ten times the internal bus bandwidth of a FAS980 gave us a lot of headroom as we scaled up other components inside the FAS6070. This ranged from increasing NVRAM size to increased onboard I/O to the adoption of PCI Express (PCIe) technology for expansion cards.
One of the other things we wanted to address with this new platform is memory subsystem performance. You don't want multi-GHz CPUs spending a lot of time waiting for memory accesses to complete. A memory bottleneck that affects storage controller performance ultimately affects application performance and limits options for infrastructure improvements such as storage consolidation. To prevent this, memory controllers are integrated in the CPU so that memory is directly connected rather than being routed through a slower intermediate chip. This reduces the time it takes to retrieve data from memory, ultimately using CPU cycles more efficiently. With industry-leading memory bandwidth and low memory latency, this is the fastest hardware platform NetApp has ever designed.
As a result, the FAS6070 can maintain consistently high performance for applications ranging from intense transactional database workloads to very large e-mail deployments that cause pressure on memory. It facilitates consolidation projects by enabling different applications and workloads to be served from a single storage controller without starving the memory resources available to each. And in either case, with more memory and faster memory access, necessary background activities such as data replication and backups have considerably less impact on the performance of customer applications.
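The interconnect figures Howard and Naresh quote can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope illustration: it uses the burst and peak numbers from the interview, the constant and function names are made up for this example, and the even split of the aggregate across the four links is an assumption, not a published per-link specification.

```python
# Burst/peak figures quoted in the interview (not sustained throughput).
FAS6070_LINKS = 4               # HyperTransport links per controller
FAS6070_AGGREGATE_GBPS = 32.0   # aggregate burst bandwidth per controller, GB/s
FAS6070_LINK_GT = 4.0           # giga-transfers per second on each link
FAS980_PEAK_GBPS = 3.2          # peak bandwidth of the FAS980 shared bus, GB/s
FAS980_BUS_GT = 0.4             # giga-transfers per second on the FAS980


def interconnect_summary():
    """Derive the comparison numbers from the quoted figures."""
    return {
        # Assumption: the aggregate is spread evenly over the four links.
        "per_link_gbps": FAS6070_AGGREGATE_GBPS / FAS6070_LINKS,
        # Active/active pair: two controllers, each with its own interconnect.
        "dual_controller_gbps": 2 * FAS6070_AGGREGATE_GBPS,
        # How many times the FAS980 peak fits into the FAS6070 aggregate.
        "bandwidth_ratio": FAS6070_AGGREGATE_GBPS / FAS980_PEAK_GBPS,
        # Ratio of the raw transfer rates Howard compares.
        "transfer_rate_ratio": FAS6070_LINK_GT / FAS980_BUS_GT,
    }


if __name__ == "__main__":
    for name, value in interconnect_summary().items():
        print(f"{name}: {value:.1f}")
```

Both ratios come out at roughly ten, which matches the "order of magnitude" and "ten times the internal bus bandwidth" remarks in the discussion.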
Tom: I'd also point out that the performance gains aren't only because of the increased internal bandwidth. With the FAS6000 series, NetApp has transitioned to a native 64-bit architecture in a platform that can take full advantage of it. In addition to operating on larger chunks of data in a single operation, the 64-bit architecture provides a less obvious benefit to a high-performance system. With the 32-bit architecture of the FAS980, accessing more than 4GB of memory requires multiple operations; but on the FAS6070, all 32GB in each controller can be directly accessed in a single operation. A good way to summarize would be to say that the FAS6000 is by far the best data flow engine we've ever designed.
Chris: Okay, let's dive into a couple of items a little further. How much more NVRAM memory does the FAS6070 have, and how does that help customers?
Tom: Whereas the FAS980 ships with 512MB of onboard memory on its NVRAM card, the FAS6070 has 2GB of memory on its NVRAM card. Since NVRAM is exercised more for write-intensive workloads, the extra memory on the FAS6070 NVRAM card helps customers see better performance for this type of workload. We have also changed the system interface on NVRAM to PCIe (PCI Express). This eliminates potential bottlenecks that the older PCI-X-based slots might introduce. The new NVRAM provides adequate bandwidth to take advantage of the larger NVRAM memory during write-intensive workloads.
Naresh: In general, the larger NVRAM allows WAFL® to write to disk more efficiently, so customers see higher sustained write throughput from the system. Also, when there is a burst of writes in what typically isn't a write-intensive workload, such as a database, the extra memory on the NVRAM card helps absorb a longer burst of writes because writing to NVRAM is much faster than writing to disks.
With the database example, if a database has logging activity at certain periods, then the resulting writes would go very fast, thus freeing the database engine for serving business activities. In short, we increased the amount of NVRAM memory based on the high throughput capabilities of the FAS6070 to ensure optimal performance during writes.
Chris: That's a good segue into another one of the items we wanted to go into a little more detail on. What exactly is PCIe, and what is the customer benefit?
Howard: PCIe was designed to overcome bandwidth limitation issues with earlier PCI and PCI-X expansion slots. While the FAS6070 already has significant onboard I/O even without expansion cards, most enterprise deployments will likely populate many of the available expansion slots to accommodate their I/O needs. Since the FAS6070 was designed for high performance, we selected PCIe to take advantage of the bandwidth improvements it provides.
Chris: So how does the extra PCIe bandwidth translate into a tangible customer benefit?
Naresh: A 100 MHz PCI-X slot peaks at 0.8GB/s, while an x8 PCIe slot peaks at 4GB/s. A 100 MHz PCI-X bus could also be shared between two slots, so a couple of fast HBAs could become limited by the PCI-X bandwidth. PCIe, on the other hand, is a point-to-point link, so each HBA has its own high-speed private path to memory. For customers, this means that they can add high-speed HBAs such as 10GbE and not have to worry about slot location; any available PCIe slot in the FAS6070 has plenty of bandwidth headroom.
Tom: In addition to increased bandwidth, PCIe provides improved RAS features. For example, instead of a shared bus, each link is point to point. A failed device on one HBA won't prevent the use of an HBA in another slot. Also, the protocol has improved error detection and includes the ability to correct errors if they occur on the link, so the system can survive certain errors that would be fatal to PCI-X.
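The slot bandwidth figures Naresh mentions can be reproduced from the bus parameters. The sketch below is an illustration under stated assumptions, not a description of the FAS6070 hardware: it assumes a 64-bit PCI-X bus, first-generation PCIe signaling (2.5 GT/s per lane with 8b/10b encoding), and that the 4GB/s figure counts both directions of the link. The interview does not spell out any of these, and the function names are hypothetical.

```python
def pcix_peak_mbps(clock_mhz=100, bus_width_bits=64):
    """Peak MB/s of a parallel PCI-X bus (shared by every slot on that bus)."""
    return clock_mhz * bus_width_bits // 8


def pcie_gen1_peak_mbps(lanes=8, both_directions=True):
    """Peak MB/s of a first-generation PCIe link.

    Assumption: 2.5 GT/s per lane with 8b/10b encoding, so 2500 Mb/s raw
    becomes 2000 Mb/s of data, or 250 MB/s per lane per direction.
    """
    per_lane_per_direction = 2500 * 8 // 10 // 8
    directions = 2 if both_directions else 1
    return lanes * per_lane_per_direction * directions


# Line rate of one 10GbE port in MB/s, per direction (10,000 Mb/s / 8).
TEN_GBE_MBPS = 10_000 // 8

if __name__ == "__main__":
    pcix = pcix_peak_mbps()
    print("PCI-X 100 MHz bus:", pcix, "MB/s")
    print("PCI-X share with two busy slots:", pcix // 2, "MB/s")
    print("PCIe x8, both directions:", pcie_gen1_peak_mbps(), "MB/s")
    print("PCIe x8, one direction:", pcie_gen1_peak_mbps(both_directions=False), "MB/s")
    print("10GbE line rate:", TEN_GBE_MBPS, "MB/s")
```

Under these assumptions, a single 10GbE port at line rate already exceeds what a 100 MHz PCI-X bus can carry, while one direction of an x8 PCIe link still has headroom. That is consistent with the point about not having to worry about slot location.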
Chris: Are there other things about the FAS6070 that you're particularly proud of?
Naresh: We did introduce a new 10GbE TCP/IP Offload Engine Ethernet adapter with the FAS6000 launch and as part of the Data ONTAP 7.2 release. The main reason for making this card available as quickly as we did was customer feedback about the need for higher-throughput Ethernet infrastructure and for reduced Ethernet port density. Many customers are transitioning from current 1GbE infrastructures to 10GbE, much like the transition from 10/100Mb Ethernet to 1GbE in years past, and they told us that they wanted 10GbE support in our storage products. Along with increasing the Ethernet throughput with this adapter, we also provided TCP/IP Offload Engine (TOE) support for this card and the 4-port 1GbE card. The TOE functionality basically saves CPU cycles on the storage controller by offloading most of the networking activity to its own processing engine, hence the name. Depending on the workload and storage platform, the TOE aspect of the card could provide significant performance improvement.
Tom: I'd definitely call out the failed FRU indicators. It's a fairly simple idea to provide LED failure indicators on the motherboard to highlight faulty FRUs. What's clever about them is that these LEDs stay lit when the power is disconnected and the motherboard tray has been opened up. I'm not aware of any other system where you can power the system off and an LED will keep blinking to identify the failed component. The goal behind this approach with the failure indicators is to help make serviceability much easier and faster.
Howard: Something else we should highlight is another new NetApp technology. The Remote LAN Module (RLM) is bundled with the FAS6000 series. The RLM is a standalone service processor inside the FAS6000 system that offers remote platform management of the FAS6000. The RLM provides remote access to the system console over its Ethernet interface, using the secure SSH protocol. The RLM allows you to do remote troubleshooting, including power cycles and obtaining console logs, that just isn't possible with the serial console port. The RLM is also integrated with NetApp AutoSupport to call home and generate a customer case for any problems detected. We added the RLM to improve serviceability and manageability. (Learn more about RLM on the NOW™ customer site.)
Naresh: One last thing I'd add is that the FAS6070 was designed for deployments where considerable scalability is required. The FAS6070 can handle up to 1008 disk drives and, when using the 500GB SATA disks, up to 504TB of storage. This type of capacity scaling enables the FAS6070 to support large numbers of clients, evolving customer application requirements, and multiple applications and types of storage on a single system.
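The capacity figure Naresh quotes follows directly from the drive count and drive size. The short sketch below reproduces it, assuming the quoted 504TB is raw capacity in decimal units (1TB = 1000GB), before any RAID parity, spare drives, or file system overhead. The interview does not say how much of that is usable, and the function names are hypothetical.

```python
def raw_capacity_tb(drive_count, drive_size_gb):
    """Raw capacity in decimal terabytes (1 TB = 1000 GB)."""
    return drive_count * drive_size_gb / 1000


def decimal_tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


if __name__ == "__main__":
    raw_tb = raw_capacity_tb(drive_count=1008, drive_size_gb=500)
    print(f"Raw capacity: {raw_tb:.0f} TB")
    print(f"Same capacity in binary units: {decimal_tb_to_tib(raw_tb):.1f} TiB")
```

The binary-unit conversion is included only because operating systems often report capacity in powers of two, which makes the same set of disks appear smaller than the decimal figure.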