NetApp - News - Tech OnTap Newsletter


 
The NetApp Kilo-Client
A 1,500-node diskless enterprise server farm running iSCSI and FCP.

David Brown and Gregg Ferguson
Engineering Support Manager and Laboratory Administrator, NetApp Engineering

David and Gregg helped design, build, and manage the NetApp Kilo-Client. This project is one of several firsts for them. As an engineer at Lawrence Livermore National Laboratory, David was a founding member of the Computer Incident Advisory Capability (CIAC). David also holds a patent for a 10 Gbps mobile data center design and helped start the Engineering Support program at the NetApp office in Research Triangle Park (RTP), North Carolina. Gregg, a 22-year industry veteran, held a variety of engineering, systems administration, and IT management positions before joining NetApp in 2000. After opening the original NetApp RTP facility, Gregg spent over four years in the field as a systems engineer before returning to the NetApp Engineering IT team.


The article below was originally published in the March 2006 edition of the Tech OnTap newsletter. It was updated in December 2006 to reflect the addition of 380 blades (1,500 total) and FC SAN support. To receive the Tech OnTap newsletter and enjoy other great benefits, sign up today.

Over the past few years, a variety of new technologies have emerged that promise to change the way data centers are designed and managed. Among these are large computing grids with hundreds or thousands of hosts; IP SANs running the iSCSI protocol for inexpensive, consolidated block storage; the ability to boot diskless servers over either FC or IP SANs; and sophisticated cloning technologies for advanced storage provisioning.

[Figure: The Kilo-Client Architecture Supports Multiple OS]

In early 2006, NetApp Engineering combined these technologies to build what we believe is the world's largest iSCSI-based diskless server farm. All server nodes boot from NetApp storage using either iSCSI or Fibre Channel SAN and can be rapidly rebooted to run Linux®, Windows®, or other operating systems. The system leverages NetApp FlexClone™ technology to rapidly create system images without making full physical copies of those images. The baseline image is shared among all of the nodes of the farm; only host-specific "personalization" needs to be added to the core image for each of the provisioned servers. This unique approach affords near-instantaneous image provisioning with near-zero footprint (only the blocks of the images that differ need to be added to the storage system, which keeps track of the individualized images).
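
The shared-baseline idea described above can be illustrated with a toy copy-on-write model. This is only a sketch under stated assumptions: it is not how FlexClone is implemented, and the class names, the 1,000-block image size, the hostname pattern, the IP addressing scheme, and the choice of block 0 for host settings are all hypothetical. It shows why each personalized clone adds only its changed blocks to the total footprint.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BaseImage:
    """A read-only 'golden' image, modeled as a list of blocks."""
    blocks: list[bytes]


@dataclass
class Clone:
    """A writable clone that stores only blocks that differ from its parent."""
    parent: BaseImage
    delta: dict[int, bytes] = field(default_factory=dict)

    def read(self, index: int) -> bytes:
        # Use the clone's private block if one exists, else the shared block.
        return self.delta.get(index, self.parent.blocks[index])

    def write(self, index: int, data: bytes) -> None:
        if not 0 <= index < len(self.parent.blocks):
            raise IndexError("block index out of range")
        # Copy-on-write: the parent image is never modified.
        self.delta[index] = data

    def private_blocks(self) -> int:
        return len(self.delta)


def personalize(clone: Clone, hostname: str, ip: str, config_block: int = 0) -> None:
    """Write host-specific settings into one block (hypothetical layout)."""
    clone.write(config_block, f"hostname={hostname};ip={ip}".encode())


# Hypothetical sizes: a 1,000-block golden image and 1,500 clones.
golden = BaseImage(blocks=[b"base-%d" % i for i in range(1000)])
clones = []
for n in range(1500):
    c = Clone(golden)
    personalize(c, f"node{n:04d}", f"10.0.{n // 250}.{n % 250 + 1}")
    clones.append(c)

# Blocks needed if every node had a full physical copy of the image.
full_copy_blocks = len(clones) * len(golden.blocks)
# Blocks actually stored: one shared image plus each clone's private blocks.
actual_blocks = len(golden.blocks) + sum(c.private_blocks() for c in clones)
print(full_copy_blocks, actual_blocks)
```

In this toy model each clone changes a single block, so the stored total is the golden image plus one block per node. Real boot images diverge further over time, and the saving shrinks accordingly.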

The Kilo-Client project began in 2005, when the NetApp Engineering IT team was challenged to create a testing environment capable of meeting or exceeding the most extreme conditions NetApp products encounter in the field.

The NetApp team designed a 1,120-blade server farm and originally planned to use diskful server blades, each booting from a local disk. However, as the project progressed, it became clear that the time and administrative overhead required to copy a boot image to over a thousand local disks would result in more time spent configuring and managing the cluster than running actual tests.

Instead, the team chose a diskless approach. Although they weren't aware of anyone using iSCSI-based SAN boot for 1,000+ nodes, earlier that year Gregg had helped deploy a 250-node diskless environment based on FC SAN technology. The team also had access to internal expertise in managing a large grid environment (a diskful testbed of more than 400 nodes already in use at the NetApp Pittsburgh Technology Center).

The result is the NetApp Kilo-Client, a 1,500-node diskless server farm with all nodes booting from a back-end network running both iSCSI and Fibre Channel SAN. This architecture packs massive performance into a footprint of just over 235 square feet and offers many unique advantages:

  • Provisioning time reduced 10x to 500x. A set of "golden" boot images is created (as iSCSI and Fibre Channel SAN LUNs) for each operating system and application stack required in the server farm. NetApp FlexClone technology is used to create space-efficient copies of those golden images nearly instantaneously. The copies are then "personalized" with individual server information (such as IP address and hostname) and used to network boot the blades, potentially across the entire 1,500-host system.
  • Space-efficient copies. The cloned images (LUNs) consume additional disk space only as the boot images change, providing over 1,500-fold capacity efficiency.
  • Enables preemption and resumption of system state. Tests can be preempted by halting a server and creating a space-efficient, derived clone of that system (using FlexClone). Test configurations can be preserved or shared with other users and re-run months or years later, perhaps even on a different system of the same architecture.
  • Highly efficient utilization. Instead of having equipment sitting idle during an extended setup, saving LUNs allows the team to quickly reinstate previous tests. Most of the configuration work has been automated using scripts, so the team can switch from one operating environment to another in little more than the time it takes to reboot the blades.
  • Scalable and flexible. By partitioning the grid into smaller subsets, a variety of smaller tests can be run simultaneously (with a different OS for each test as needed), or the team can harness the full power of all 1,500 nodes for the most massive workload tests. When users asked for Solaris, which won't boot over iSCSI, we gave them Solaris virtual machines. When the Open Systems SnapVault team asked for 1,000 clients, we used the free VMware Server for Linux to provide 1,000 virtual machines on 200 physical clients.
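
The partitioning and virtual machine arithmetic in the list above can be sketched in a few lines. This is an illustration only, not the team's actual scripts: the function names, the test names, and the node counts per test are hypothetical. Only the totals (1,500 nodes, 1,000 virtual machines on 200 physical clients) come from the article.

```python
import math

TOTAL_NODES = 1500  # grid size stated in the article


def partition_grid(requests, total_nodes=TOTAL_NODES):
    """Assign contiguous, non-overlapping node ranges to named tests.

    requests: list of (test_name, os_name, node_count) tuples.
    Returns a dict mapping test_name -> (os_name, range of node indices).
    """
    needed = sum(count for _, _, count in requests)
    if needed > total_nodes:
        raise ValueError(f"requested {needed} nodes, only {total_nodes} available")
    plan, start = {}, 0
    for name, os_name, count in requests:
        plan[name] = (os_name, range(start, start + count))
        start += count
    return plan


def vms_per_host(vm_count, host_count):
    """Minimum number of VMs each physical host must run to reach vm_count."""
    return math.ceil(vm_count / host_count)


# Hypothetical test mix that uses the whole grid with a different OS per test.
plan = partition_grid([
    ("linux-stress", "Linux", 800),
    ("windows-load", "Windows", 500),
    ("vm-clients", "Linux + VMware Server", 200),
])

for name, (os_name, nodes) in plan.items():
    print(f"{name}: {os_name}, nodes {nodes.start}-{nodes.stop - 1}")

# 1,000 virtual clients on 200 physical clients, as in the article.
print(vms_per_host(1000, 200))  # 5
```

Contiguous ranges keep the bookkeeping simple. A real scheduler would also need to track which nodes are busy and which saved LUNs belong to each test.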

For details, see the BayLISA presentation The Kilo-Client Host Swarm.






 
© 2007 Network Appliance, Inc. | All rights reserved. | Specifications subject to change without notice.