How a 1,500-Node Diskless Server Farm
Evolved into a Fully Virtual Ecosystem
By Gregg Ferguson
When we first got the idea for our engineering test lab in Research Triangle Park, it was in response to a growing need within NetApp to be able to test our products against large grids or server farms and to quickly reproduce any customer problems that might occur in such environments. Our original plan was to use server blades in which each blade booted from a local disk. However, as the project progressed, it became clear that the time and administrative overhead required to copy a boot image to a thousand local disks would result in more time spent configuring and managing the cluster than running actual tests.
Instead, we designed our test lab to include 1,120 server blades booting over iSCSI. We dubbed the lab Kilo-Client, and we believe it was the largest iSCSI-based diskless server farm in the world when it was launched in 2005 (and may still be!). We later added 98 blades with iSCSI HBAs and 280 blades capable of booting over Fibre Channel. Check out the sidebar for specific hardware and software components.
The result: a 1,500-node server farm that packs massive performance and flexibility into a footprint of just over 389 square feet.
While the motivation for Kilo-Client today remains more or less the same as it was in the beginning, the lab has evolved to keep up with emerging technologies. Plus, over the past two years we've learned a lot about operating and maintaining a big environment. This article focuses on aspects of the current test lab design that customers and partners tend to find most interesting, including:
- Rapid server provisioning
- Provisioning virtual environments
- Booting over iSCSI, FC, and NFS
- iSCSI over 10 Gigabit Ethernet
- Management automation
- Thin provisioning
- Data center cooling
Before I dive in, however, I want to make it clear that the Kilo-Client builds on just five or six different technologies, each of which is currently in use at hundreds of NetApp customer sites. Creating the architecture was mostly a matter of pulling together all those elements–each of which I had been exposed to during my years as a NetApp SE–in a single infrastructure.
In short: there is absolutely nothing in our test lab that can't be leveraged by any NetApp customer.
Rapid Server Provisioning
One of our early goals was to quickly provision a compute grid capable of meeting specific test characteristics. This meant that servers had to be quickly booted with any OS/application environment. We solved this problem using NetApp FlexClone® technology to enable the rapid creation of system images without making full physical copies of those images.
A set of "golden" boot images is created (as iSCSI and Fibre Channel SAN LUNs) for each operating system and application stack required in the server farm. Using SnapMirror® and FlexClone, we can quickly reproduce hundreds of clones (a FlexClone clone for each server being configured for a test); only host-specific "personalization" needs to be added to the core image for each provisioned server. This unique approach affords near-instantaneous image provisioning with a near-zero footprint (only the blocks of the images that differ need to be added to the storage system, which keeps track of the individual images), enabling us to configure and boot all or a subset of our nearly 1,500 blades in minutes.
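The space savings described above come from clones sharing unchanged blocks with the golden image. The following is a toy model of that idea only, not a description of how FlexClone is implemented; the class names, block counts, and hostname format are all hypothetical and chosen purely for illustration.

```python
class GoldenImage:
    """Read-only base image: a mapping of block number -> block contents."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)


class Clone:
    """A clone that shares the base image's blocks and stores only its own changes."""

    def __init__(self, base):
        self.base = base
        self.delta = {}  # holds only the blocks this clone has written

    def read(self, block_no):
        # Prefer the clone's own copy; otherwise fall through to the shared base.
        if block_no in self.delta:
            return self.delta[block_no]
        return self.base.blocks[block_no]

    def write(self, block_no, data):
        # Writes land in the clone's delta; the base image is never modified.
        self.delta[block_no] = data

    def extra_blocks(self):
        # Number of blocks this clone consumes beyond the shared base.
        return len(self.delta)


# Hypothetical sizes: a 1,000-block golden image and 100 clones.
golden = GoldenImage({n: f"os-block-{n}" for n in range(1000)})
clones = [Clone(golden) for _ in range(100)]

# Host-specific "personalization": each clone changes a single block.
for i, clone in enumerate(clones):
    clone.write(0, f"hostname=blade{i:04d}")

total_extra = sum(c.extra_blocks() for c in clones)
print(f"blocks in golden image: {len(golden.blocks)}")
print(f"extra blocks across all clones: {total_extra}")
```

In this sketch, 100 personalized clones add only 100 blocks on top of the 1,000-block base, where 100 full copies would need 100,000 blocks.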
Virtual Environments
What we ultimately discovered was that simply provisioning the server environment–although our method works extremely well–is not enough. What NetApp engineers wanted, and what we really needed to be able to do, was to rapidly provision complete virtual environments, including compute grids, interconnect fabrics, and storage grids.
That's exactly what we do today. We can automatically configure a compute grid running almost any OS (including VMware) and connect it via a vLAN (IP), vSAN (Fibre Channel), NFS, or even CIFS (we don't have the capability to boot over CIFS but can test CIFS functionality) to any of five possible storage grids. A typical virtual environment–which might include 100 servers, multiple OSs, and five to six storage controllers–is usually up and running in an hour or less. The most complex environment we ever created took about 10 hours to get up and running, and involved 500 servers, 30 NetApp FAS 6070s, 72 shelves of 300GB FC drives (~500TB), and the Data ONTAP® GX operating system.
Figure 1) A true virtual environment.
At any given time, our lab is running 12–15 virtual environments that are used for everything from product and interoperability testing to troubleshooting to proof-of-concept testing. Tests can be preempted by halting a server and creating a space-efficient, derived clone of that system (using FlexClone). Test configurations of any environment can be preserved or shared with other users and re-run months or years later, perhaps even on a different system with the same architecture. Once we've built an environment, we never have to rebuild it. For example, say we build a Red Hat Linux® environment and the team requesting this environment loads Oracle 10g™. After they're done with the test, they can create a clone, and this pre-configured environment can be reused as necessary in the future.
A final important point is that these virtual environments can be accessed and managed from anywhere in the world. A NetApp engineer working in any of six global facilities or a systems engineer at any location worldwide can schedule resources and run tests remotely.
Booting over iSCSI, FC, or NFS
One of the unique differentiators of NetApp storage is the ability to support iSCSI, FC, and network storage (NFS and CIFS) all from a single storage platform. Lots of customers find it most efficient to deploy iSCSI for some applications and Fibre Channel SAN for others, and to support additional applications using network-attached storage. As a result, we face new challenges on a daily basis in our test lab, and having a highly flexible environment that can support just about any protocol we throw at it is a huge advantage.
The original Kilo-Client design allowed us to boot our server blades over iSCSI using hardware initiators (iSCSI HBAs). Today we can boot servers using any of four approaches:
- Over iSCSI using the hardware initiator (1,218 blades)
- Over iSCSI using the software initiator (entire environment)
- Over Fibre Channel using an FC HBA (280 blades)
- Over NFS (entire environment)
This allows us to test and compare various environments and booting methods. If we aren't specifically testing booting methods, we tailor our approach based on test requirements. For instance, if someone wants to perform Fibre Channel testing with fault injection, we would typically boot the servers being used for other tests over iSCSI or NFS to leave the Fibre Channel free for the testing.
iSCSI over 10GbE
A while back I was asked to do a presentation about the Kilo-Client design at an event sponsored by blade.org. After my talk, virtually every vendor at the show wanted to sell me his or her new technology for use with the Kilo-Client. I was even approached by one overenthusiastic salesman in the bathroom!
When I got home, I went through all the cards that had been forced on me and discovered that several were from vendors of 10 Gigabit Ethernet gear. I called them up, and ultimately we created a test kit using the IBM BladeCenter with 10 Gigabit Ethernet switch modules from BLADE Network Technologies and NetXen controllers, connected to a NetApp cluster also outfitted with 10 Gigabit Ethernet cards. The result was a configuration with 10 Gigabit Ethernet from end to end that was capable of diskless booting using iSCSI. Reps from BLADE Network Technologies and NetXen and I took that configuration to an event in New Orleans, where it generated a lot of interest. The hardware went on to shows in Paris and Singapore (although I did not).
So far, we've done mostly functional testing, but this architecture gives us the ability to do large-scale performance comparisons involving 10 Gigabit Ethernet versus Fibre Channel–as well as anything else we might want to test.
Automated Configuration Management
When the Kilo-Client was created, we had a few scripts to help with configuration and that was about it. As we freely admitted at the time, that was the weakest element. Today, our work follows a predictable pattern of schedule→provision→monitor→adjust resources based on load→de-provision→re-schedule and so on.
We now have an automation framework in place that handles all of those steps. It is about 70% of the way there, which is a big improvement. Customers struggling with scalability issues are interested in our management approach, because it shows how a very limited staff can effectively manage a dynamic, high-volume, high-request environment.
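The schedule, provision, monitor, adjust, de-provision, re-schedule pattern can be pictured as a small state machine. The sketch below is purely illustrative and is not the lab's automation framework: the `Stage` and `Environment` names are hypothetical, and the extra transitions (looping from adjust back to monitor, and de-provisioning straight from monitor) are assumptions added to make the cycle practical.

```python
from enum import Enum


class Stage(Enum):
    SCHEDULE = "schedule"
    PROVISION = "provision"
    MONITOR = "monitor"
    ADJUST = "adjust"
    DEPROVISION = "de-provision"


# Allowed transitions. The main path follows the cycle described in the text;
# the entries marked "assumption" are not stated in the article.
NEXT_STAGES = {
    Stage.SCHEDULE: {Stage.PROVISION},
    Stage.PROVISION: {Stage.MONITOR},
    Stage.MONITOR: {Stage.ADJUST, Stage.DEPROVISION},  # assumption: may skip adjust
    Stage.ADJUST: {Stage.MONITOR, Stage.DEPROVISION},  # assumption: may loop back
    Stage.DEPROVISION: {Stage.SCHEDULE},               # re-schedule
}


class Environment:
    """Tracks where one test environment is in its lifecycle."""

    def __init__(self, name):
        self.name = name
        self.stage = Stage.SCHEDULE
        self.history = [Stage.SCHEDULE]

    def advance(self, target):
        # Reject any transition that is not in the table above.
        if target not in NEXT_STAGES[self.stage]:
            raise ValueError(
                f"{self.name}: cannot go from {self.stage.value} to {target.value}"
            )
        self.stage = target
        self.history.append(target)


# Walk one hypothetical environment through a full cycle.
demo = Environment("example-env")
for step in (Stage.PROVISION, Stage.MONITOR, Stage.ADJUST,
             Stage.DEPROVISION, Stage.SCHEDULE):
    demo.advance(step)
print(" -> ".join(stage.value for stage in demo.history))
```

Encoding the allowed transitions in one table keeps the lifecycle rules in a single place, which is one way a small team might keep many concurrent environments consistent.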
Thin Provisioning
I never actually used the term "thin provisioning" in association with Kilo-Client until a Gartner analyst pointed out that this is one of the best large-scale, real-world examples of it. He's right–our lab is highly space efficient because cloned images (LUNs) only consume additional disk space as boot images change, providing over 1,500-fold efficiency of capacity.
For example, let's say we wanted to boot all 1,498 servers with Red Hat Linux. The total storage requirement in our test lab would be 7.63TB (assuming 20GB for each of seven boot storage systems and 5GB per blade). In a traditional server farm–or even with traditional diskless booting–we would need a full 20GB per server, so our total storage requirement would be about 30TB. Ouch! As I said upfront, we'd spend more time configuring and managing the cluster than running the tests.
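The arithmetic behind those two figures can be checked with a few lines. This is just a worked version of the numbers quoted above; the constant names are made up for the example, and the conversion of 1,000GB per TB is an assumption that happens to reproduce the 7.63TB figure.

```python
SERVERS = 1498               # blades to boot
BOOT_STORAGE_SYSTEMS = 7     # storage systems holding a golden image
GOLDEN_IMAGE_GB = 20         # size of one full boot image
PER_BLADE_GB = 5             # space allowed per blade for its changes
GB_PER_TB = 1000             # assumption: decimal terabytes

# Cloned approach: one golden image per boot storage system,
# plus the per-blade allowance for blocks that change.
cloned_gb = BOOT_STORAGE_SYSTEMS * GOLDEN_IMAGE_GB + SERVERS * PER_BLADE_GB

# Traditional approach: a full image for every server.
traditional_gb = SERVERS * GOLDEN_IMAGE_GB

print(f"cloned:      {cloned_gb / GB_PER_TB:.2f} TB")
print(f"traditional: {traditional_gb / GB_PER_TB:.2f} TB")
print(f"saved:       {(traditional_gb - cloned_gb) / GB_PER_TB:.2f} TB")
```

The cloned total works out to 7,630GB and the traditional total to 29,960GB, which rounds to the 30TB quoted above.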
Cooling Design for Dense Configurations
One of the questions I get asked most often is, "How in the world do you cool this beast?" Part of this ties into the point I made about thin provisioning: there just isn't as much to cool as there would be in a traditional environment.
Still, 1,500 blades, 7,102 fabric ports, and 87 storage controllers are an awful lot of equipment packed into a dense area. In our original data center, we used a hot aisle/cold aisle approach. We created a cold aisle on the front side of the equipment (where air is drawn in) by adding extra cooling equipment there. That gave us as much as a 30-degree delta from front to back.
We recently moved to a new data center, and in our new lab we took a different approach–we created a cold room. We purchased new floor-to-ceiling cabinets and made sure that all openings were completely sealed from front to back, creating an air conditioning plenum. The only place for the cooled air in front of the equipment to go is through the equipment, and it never mixes with heated air coming out the back. Air pressure is also slightly higher on the cold side to ensure the flow is only in one direction. Using this approach, we get about 8 kilowatts of cooling in the lab versus 4 kilowatts with the previous design.
Incidentally, some visitors have asked if we use controllable power strips that power down clients that are not used. In truth we didn't even consider it since our goal has been 100% utilization since day 1. The servers are 100% reserved and automated tests run overnight so there are no shut-down periods.
Summary
Over the past two years we've learned a lot about managing a large-scale environment. We've also discovered from customers and analysts that this architecture is impacting how they think about technology and data center design. Key benefits include:
- Huge reductions in server provisioning time
- Massive flexibility in the infrastructure for quick reconfiguration
- Ability to manage the infrastructure with a small team
- Ability to save and re-use an environment that has been configured
The ultimate promise of this architecture is scalability. How can a company grow at 30% without increasing hardware at the same 30% rate? Many companies can no longer build out their data centers fast enough to accommodate growth, and the types of technologies we're using have the potential to bend that acquisition curve.
Find Out More
I've touched on a lot of different topics here, and space limitations have kept me from going into much detail on any of them. Although the Kilo-Client architecture is a moving target, there are additional resources available. Most recently, EMA released a white paper that provides an outside assessment of the Kilo-Client architecture. A technical case study from January 2007 also provides more details.
Got Ideas?
Since our test lab now has 1,500 nodes, the name Kilo-Client no longer fits. We're in the process of brainstorming new names–Data Center Virtualization Center, anyone?–and thought we'd invite you to submit your ideas.
If you're the first person to suggest the name we end up using, you'll get a one-on-one, one-hour briefing with my team, and Tech OnTap will treat you to a Microbrew box with a home brewing kit, t-shirts, and more.