The NetApp Kilo-Client
A 1,500-node diskless enterprise server farm running iSCSI and FCP.

David Brown and Gregg Ferguson
Engineering Support Manager and Laboratory Administrator, NetApp Engineering

David and Gregg helped design, build, and manage the NetApp Kilo-Client. This project is one of several firsts for them. As an engineer at Lawrence Livermore National Laboratory, David was a founding member of CIAC, the Computer Incident Advisory Capability. David also holds a patent for a 10 Gbps mobile data center design and helped start the Engineering Support program at the NetApp office in Research Triangle Park (RTP), North Carolina. Gregg, a 22-year industry veteran, held a variety of engineering, systems administrator, and IT manager positions before joining NetApp in 2000. After opening the original NetApp RTP facility, Gregg spent over four years in the field as a systems engineer before returning to the NetApp Engineering IT team.


The article below was originally published in the March 2006 edition of the Tech OnTap newsletter. It was updated in December 2006 to reflect the addition of 380 blades (1,500 total) and FC SAN support.

Over the past few years, a variety of new technologies have emerged that promise to change the way data centers are designed and managed. Among these are large computing grids with hundreds or thousands of hosts; IP SANs running the iSCSI protocol for inexpensive, consolidated block storage; the ability to boot diskless servers over either FC or IP SANs; and sophisticated cloning technologies for advanced storage provisioning.

[Figure: The Kilo-Client architecture supports multiple OS.]

In early 2006, NetApp Engineering combined these technologies to build what we believe is the world's largest iSCSI-based diskless server farm. All server nodes boot from NetApp storage using either iSCSI or Fibre Channel SAN and can be rapidly rebooted to run Linux®, Windows®, or other operating systems. The system leverages NetApp FlexClone™ technology to rapidly create system images without making full physical copies of those images. The baseline image is shared among all of the nodes of the farm; only host-specific "personalization" needs to be added to the core image for each of the provisioned servers. This unique approach affords near-instantaneous image provisioning with near-zero footprint (only the blocks of the images that differ need to be added to the storage system, which keeps track of the individualized images).
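The shared-baseline idea described above can be illustrated with a toy copy-on-write model. This is only a sketch of the general technique, not how FlexClone or Data ONTAP is implemented: the class names, the 1,000-block image size, and the hostname format are all hypothetical and chosen for illustration.

```python
class GoldenImage:
    """Read-only baseline image (hypothetical model): block number -> contents."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)


class Clone:
    """A clone stores only the blocks that differ from its parent image."""

    def __init__(self, parent):
        self.parent = parent
        self.delta = {}  # per-host "personalization" blocks only

    def write(self, block_no, data):
        # Writes never touch the shared baseline; they land in the delta.
        self.delta[block_no] = data

    def read(self, block_no):
        # Prefer the host-specific block, fall back to the shared baseline.
        if block_no in self.delta:
            return self.delta[block_no]
        return self.parent.blocks[block_no]

    def extra_blocks(self):
        return len(self.delta)


# Assumed sizes for illustration: a 1,000-block image shared by 1,500 nodes.
golden = GoldenImage({n: b"base" for n in range(1000)})
clones = [Clone(golden) for _ in range(1500)]
for i, clone in enumerate(clones):
    clone.write(0, f"hostname=node{i:04d}".encode())  # one personalized block

shared_total = len(golden.blocks) + sum(c.extra_blocks() for c in clones)
full_copy_total = len(golden.blocks) * len(clones)
print(shared_total, full_copy_total)
```

In this toy model, creating a clone allocates nothing up front, and the storage cost grows only with the number of blocks each host changes.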

The Kilo-Client project began in 2005, when the NetApp Engineering IT team was challenged to create a testing environment capable of meeting or exceeding the most extreme conditions NetApp products encounter in the field.

The NetApp team designed a 1,120-blade server farm and originally planned to use diskfull server blades in which each blade booted from a local disk. However, as the project progressed, it became clear that the time and administrative overhead required to copy a boot image to over a thousand local disks would result in more time spent configuring and managing the cluster than running actual tests.

Instead, the team chose a diskless approach. Although they weren't aware of anyone using iSCSI-based SAN boot for 1,000+ nodes, earlier that year Gregg had helped deploy a 250-node diskless environment based on FC SAN technology. The team also had access to internal expertise managing a large grid environment (a 400+-node diskfull testbed already in use at the NetApp Pittsburgh Technology Center).

The result is the NetApp Kilo-Client, a 1,500-node diskless server farm with all nodes booting from a back-end network running both iSCSI and Fibre Channel SAN. This architecture packs massive performance into a footprint of just over 235 square feet and offers many unique advantages:

  • Provisioning time reduced 10x to 500x. A set of "golden" boot images is created (as iSCSI and Fibre Channel SAN LUNs) for each operating system and application stack required in the server farm. NetApp FlexClone technology is used to create space-efficient copies of those golden images nearly instantaneously. Each copy is then "personalized" with individual server information (IP address, hostname, and so on) and used to network boot the blades, potentially across the entire 1,500-host system.
  • Space-efficient copies. The cloned images (LUNs) consume additional disk space only as the boot images change, providing over 1,500-fold capacity efficiency.
  • Enables preemption and resumption of system state. Tests can be preempted by halting a server and creating a space-efficient, derived clone of that system (using FlexClone). Test configurations can be preserved or shared with other users and rerun months or years later, perhaps even on a different system of the same architecture.
  • Highly efficient utilization. Instead of having equipment sitting idle during an extended setup, saving LUNs allows the team to quickly reinstate previous tests. Most of the configuration work has been automated using scripts, so the team can switch from one operating environment to another in little more than the time it takes to reboot the blades.
  • Scalable and flexible. By partitioning the grid into smaller subsets, a variety of smaller tests can be run simultaneously (with a different OS for each test as needed), or the team can harness the full power of all 1,500 nodes for the most massive workload tests. When people asked for Solaris, which won't boot over iSCSI, we gave them Solaris virtual machines. Then the Open Systems SnapVault folks asked for 1,000 clients, so we used the free VMware Server for Linux to give them 1,000 virtual machines on 200 physical clients.
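The personalization and partitioning steps in the list above might be planned along the following lines. This is a minimal sketch under stated assumptions, not the team's actual scripts: the `BootAssignment` record, the `plan_partition` helper, the image names, the `kc` hostname prefix, and the subnets are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class BootAssignment:
    """Hypothetical per-blade record: which golden image to clone and how to personalize it."""
    blade: int
    golden_image: str
    hostname: str
    ip: str


def plan_partition(blades, golden_image, subnet, prefix="kc"):
    """Assign one golden image to a subset of blades, with a unique hostname and IP each."""
    blades = list(blades)
    addresses = ipaddress.ip_network(subnet).hosts()
    plan = [
        BootAssignment(blade, golden_image, f"{prefix}{blade:04d}", str(addr))
        for blade, addr in zip(blades, addresses)
    ]
    if len(plan) < len(blades):
        raise ValueError(f"{subnet} has too few host addresses for {len(blades)} blades")
    return plan


# Two simultaneous tests on disjoint subsets of the 1,500 blades (assumed split).
linux_test = plan_partition(range(0, 1000), "linux-golden", "10.0.0.0/22")
windows_test = plan_partition(range(1000, 1500), "windows-golden", "10.0.4.0/23")

# Virtual machine density from the article: 1,000 VMs on 200 physical clients.
vms_per_client = 1000 // 200
print(len(linux_test), len(windows_test), vms_per_client)
```

Keeping the plan as plain data separates the decision of which blade boots which image from the storage operations that would clone and personalize the LUNs.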

For details, see the BayLISA presentation The Kilo-Client Host Swarm.






 
Related Information
Future of IP Storage Q&A
NetApp Supports iSCSI Software SAN Boot Capabilities from Microsoft
Kilo-Client Overview
  • 1,500 Blades
    • 1,178 blades with QLogic iSCSI hardware initiators for testing CIFS, NFS, and other protocols
    • 322 blades with QLogic Fibre Channel HBAs capable of SAN booting via FCP or iBoot (to be used as additional CIFS or NFS clients, or for FCP testing)
  • Network Infrastructure
    • 14 Cisco 4948 switches (boot infrastructure)
    • 10 Cisco 7609 switches (test infrastructure)
    • Multiple 10GbE switches (test environments)
  • SAN Boot Storage
    • Five NetApp FAS980 and one FAS6070 cluster (active boot images; one per 224-252 blades)
    • NetApp NearStore® (master boot images)
  • Test Virtual Environments
    • 20 FAS3050s in Data ONTAP® GX Cluster
    • 26 FAS6070s in Data ONTAP GX Cluster
    • Five FAS3050s for Data ONTAP 7G Testing
    • Seven FAS980s for Data ONTAP 7G Testing
    • Two FAS270s for Data ONTAP 7G Testing
    • Two FAS6070s for Data ONTAP 7G Testing
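As a quick consistency check on the figures in this overview, the blade counts and boot-storage ratios can be tallied as follows. The sketch assumes that "Five NetApp FAS980 and one FAS6070 cluster" means six boot storage clusters in total; the variable names are illustrative only.

```python
# Blade counts from the overview list.
blades = {"iscsi_hardware_initiator": 1178, "fibre_channel_hba": 322}
total_blades = sum(blades.values())

# Assumption: five FAS980 clusters plus one FAS6070 cluster serve boot images.
boot_clusters = 5 + 1
min_per_cluster, max_per_cluster = 224, 252

# Range of blades the boot clusters can serve at the stated ratios.
capacity_low = boot_clusters * min_per_cluster
capacity_high = boot_clusters * max_per_cluster
average_per_cluster = total_blades / boot_clusters

print(total_blades, capacity_low, capacity_high, average_per_cluster)
```

Under that assumption the 1,500 blades fall within the stated range of 224 to 252 blades per boot cluster.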
See the slide deck. (PDF)
BayLISA Presentation:
The NetApp Kilo-Client
At the BayLISA March general meeting, David and Gregg described the server farm in detail and explained how particular technologies were chosen.

Watch BayLISA videos of the Kilo-Client presentation:

Project Overview (10 mins.)
Architecture (35 mins.)
Q&A Session (20 mins.)

Read the presentation slide deck. (PDF)