Tech On Tap :: Insights for Simplifying Data Management | NetApp(R) - Network Appliance
TECH ONTAP ARCHIVE - OCTOBER 2006 (PDF)
John Fullbright
Nick Triantos
Global SAN Systems Engineer, NetApp
A member of the elite Global Systems Engineering Group, Nick helps top NetApp global enterprise customers solve their toughest technical challenges. Nick has been in systems or support engineering roles for nearly 15 years, including positions at HP as an account support engineer (Server Group) and presales technical consultant (Storage Group). Nick maintains a blog and has previously authored Tech OnTap articles on choosing iSCSI or FC SAN and tips for simplifying SANs.
NetApp SE Lab Report:
SAN Boot with VMware ESX 3.0.0
By Nick Triantos

Enterprises with large numbers of servers are increasingly turning to diskless servers that boot from SAN (FC or IP) to reduce costs, consolidate storage, and streamline provisioning. Although SAN boot is not new, the introduction of the blade server has helped accelerate its adoption. Blade servers provide greater manageability, reduced hardware costs, and simpler cable management in addition to power, cooling, and real-estate savings.

One of the most popular platforms that lends itself to booting from SAN is VMware ESX Server. More and more enterprises are deploying VMware ESX Server to consolidate hundreds of physical servers onto a few diskless blades in a single blade chassis. Others have elected to deploy VMware using racks of standalone 1U Intel-based servers.

Given that most servers today ship with internal SATA drives, which are not supported for hosting the ESX Server image, SAN boot becomes an attractive choice. Storage-based snapshots, cloning, and replication add a further advantage: you can quickly recover a corrupted ESX image from a clone or a snapshot and restore it on the same physical server, or replicate it to a remote site for DR purposes.

Like most technology vendors, Network Appliance has labs in the vast majority of its sales offices. These labs are primarily used for technology demonstrations, and they utilize a variety of operating systems and third-party software. The NetApp Dallas office also has a systems engineering lab, where NetApp SEs can delve deep into specific technologies. This lab has included a VMware ESX Server environment since the 2.5.1 release, and we have been booting ESX Server from a fabric since the 2.5.3 release.

In August I decided to upgrade to the new ESX 3.0.0 release to get a feel for the changes, and I'm definitely impressed with the results.

With ESX 3.0.0, VMware has made significant advancements in supporting boot from SAN. Many of the requirements from the previous release have been relaxed. Based on my experience, the setup process is quick and easy and, at least for testing purposes, the environment has been working flawlessly.

Setting Up ESX 3.0.0 for SAN Boot

Setting up VMware for SAN boot with NetApp storage was a breeze. The whole process didn't take more than 20 or 25 minutes from the time I provisioned the boot LUN to the time the ESX image installation had completed.

The following table shows our setup.

Component          Configuration
Server             IBM x346
CPU                2x Xeon 3.2GHz
Memory             8GB
FC HBA             2x QLA 2340
FC Switch          MDS 9120
External Storage   NetApp FAS3050c
Table 1. NetApp SE Lab Setup


After the installation, it was time to create virtual machines (VMs) and install the guest operating system. I chose to install VMs on LUNs over iSCSI so I could get a feel for VMware's implementation. Configuring the iSCSI initiator was a breeze, and I was able to install the guest operating system with no issues. Given that ESX currently does not provide a multipathing mechanism for iSCSI LUNs, I chose to implement NIC teaming, which essentially serves the same purpose.

Suggested Edits to the Default ESX 3.0.0 Configuration

If you're interested in SAN boot with ESX 3.0.0, there are some things you need to consider. First, I highly recommend that before making any HBA purchasing decisions you contact your storage vendor and carefully review the VMware I/O Compatibility Guide for ESX Server 3.0. You will find that certain HBA models are not supported for SAN booting.

Additionally, there are a number of tweaks and customizations that you can make to achieve higher performance and nondisruptive failovers if hardware failures occur. I recommend three simple changes to the default ESX Server 3.0.0 settings:

  • Enable the BIOS on only one HBA.
  • Modify the Execution Throttle/Queue Depth.
  • Modify the PortDownRetryCount parameter.

I go into more detail on each point in the following sections. However, keep in mind that this advice hasn't been fully tested or approved by NetApp engineering, so I can't claim that this is the right answer for all environments.

Tip #1: Enable the BIOS on only one HBA.

You need to enable the BIOS on the second HBA only if you need to reboot the server while the original HBA used for booting, its cable, or its FC switch has failed. In this scenario, you would use QLogic Fast!UTIL to select the active HBA, enable the BIOS, scan the bus to discover the boot LUN, and assign the WWPN and LUN ID to the active HBA. However, when both HBA connections are functional, only one needs to have its BIOS enabled.

Tip #2: Modify the Execution Throttle/Queue Depth.

The Execution Throttle/Queue Depth signifies the maximum number of outstanding commands that can execute on any one HBA port. The default for ESX 3.0.0 is 32, but the best value for your environment depends on a couple of factors:

  • Total number of LUNs exposed through the array target ports
  • Array target port queue depth

The formula to determine the value is:

Queue Depth = Target Queue Depth / Total number of LUNs mapped from the array

This formula ensures that a simultaneous heavy load on every LUN will not flood the Target Port and cause QFULL conditions. A QFULL condition signals that the Target port cannot accept any more I/O. In most operating systems, upon receiving a QFULL condition from the Target, the HBA driver decreases the LUN's maximum queue depth to a minimum value, typically "1", thereby throttling I/O to the Target port. When the Target stops returning QFULL conditions, the HBA driver gradually increases the LUN queue depth again, slowly restoring I/O to the Target port.

Here's an example of how the above formula can help you avoid a QFULL condition. If a Target Port has a queue depth of 1024 and 64 LUNs are exposed through that port, then the Queue Depth on each host should be set to 16 outstanding I/Os per LUN. This is the safest approach and guarantees no QFULL conditions:

16 Outstanding I/Os per LUN x 64 LUNs = 1024 = Target Port Queue Depth
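
As a minimal sketch of the formula above, the calculation can be written as a small Python helper. The function name and the choice to round down (with a floor of 1) are assumptions made for illustration; they do not come from any NetApp or VMware tool.

```python
def per_lun_queue_depth(target_port_queue_depth: int, total_luns: int) -> int:
    """Queue Depth = Target Queue Depth / total number of LUNs mapped from the array."""
    if target_port_queue_depth < 1 or total_luns < 1:
        raise ValueError("both arguments must be positive integers")
    # Round down so that depth * LUNs stays within the target port queue.
    # Assumption: never return less than 1; if there are more LUNs than
    # queue slots, that floor of 1 can still oversubscribe the port.
    return max(1, target_port_queue_depth // total_luns)


# The example from the text: a 1024-deep target port exposing 64 LUNs.
depth = per_lun_queue_depth(1024, 64)
print(depth)       # 16
print(depth * 64)  # 1024
```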

But be careful. If you perform a separate queue depth calculation for each host using the above formula, you can still run into QFULL conditions.

Here's why. Let's expand the previous example and assume that we have a total of 64 LUNs and four ESX hosts, each of which has 16 LUNs mapped.

Performing the calculation for each ESX host separately would yield:

Queue Depth = 1024 / 16 LUNs = 64 Outstanding I/Os per LUN

However, a simultaneous heavy load on all 64 LUNs across the four ESX servers would yield:

64 Outstanding I/Os per LUN x 64 LUNs = 4096

This is much greater than the queue depth of the physical array Target Port (1024). It is an undesirable condition that, under certain circumstances, will generate a QFULL and throttle I/O.
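
The multi-host pitfall can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes that every LUN is mapped to exactly one host and that all hosts share the same target port, as in the example above.

```python
def worst_case_outstanding(per_lun_depth: int, luns_per_host: int, hosts: int) -> int:
    """Total I/Os that could be outstanding at the target port at once."""
    return per_lun_depth * luns_per_host * hosts


def is_oversubscribed(target_port_queue_depth: int, per_lun_depth: int,
                      luns_per_host: int, hosts: int) -> bool:
    """True if a simultaneous load on every LUN could exceed the target queue."""
    total = worst_case_outstanding(per_lun_depth, luns_per_host, hosts)
    return total > target_port_queue_depth


TARGET = 1024  # target port queue depth from the example

# Per-host calculation: 1024 / 16 LUNs = 64 outstanding I/Os per LUN.
print(worst_case_outstanding(64, 16, 4))     # 4096
print(is_oversubscribed(TARGET, 64, 16, 4))  # True

# Array-wide calculation: 1024 / 64 LUNs = 16 outstanding I/Os per LUN.
print(worst_case_outstanding(16, 16, 4))     # 1024
print(is_oversubscribed(TARGET, 16, 16, 4))  # False
```

Dividing by the total number of LUNs behind the port, rather than by the LUNs seen by one host, is what keeps the worst case within the target queue.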

To Change the Queue Depth on a QLogic HBA
  1. Create a copy of /etc/vmware/esx.conf.
  2. Locate the following entry for each HBA:
    /device/002:02.0/name = "QLogic Corp QLA231x/2340 (rev 02)"
    /device/002:02.0/options = ""
  3. Modify as shown:
    /device/002:02.0/name = "QLogic Corp QLA231x/2340 (rev 02)"
    /device/002:02.0/options = "ql2xmaxqdepth=xxx"
    Where xxx is the queue depth value.
  4. Reboot.
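
For readers who prefer to script the edit, here is a hedged Python sketch that rewrites the options line in a copy of the file's text. It is not a VMware utility. The function name is hypothetical, and it assumes the options string is a space-separated list of key=value pairs, which the article does not state. It works on a string, so nothing is written to /etc/vmware/esx.conf.

```python
import re

# Matches lines such as: /device/002:02.0/options = ""
OPTIONS_LINE = re.compile(
    r'^(?P<prefix>\s*(?P<device>/device/\S+)/options\s*=\s*)"(?P<opts>[^"]*)"\s*$'
)


def set_module_option(conf_text: str, device: str, key: str, value: int) -> str:
    """Return conf_text with key=value set in the options line of one device."""
    out = []
    for line in conf_text.splitlines():
        m = OPTIONS_LINE.match(line)
        if m and m.group("device") == device:
            # Keep unrelated options; drop any previous setting of this key.
            opts = [o for o in m.group("opts").split()
                    if not o.startswith(key + "=")]
            opts.append(f"{key}={value}")
            line = f'{m.group("prefix")}"{" ".join(opts)}"'
        out.append(line)
    trailing = "\n" if conf_text.endswith("\n") else ""
    return "\n".join(out) + trailing


sample = (
    '/device/002:02.0/name = "QLogic Corp QLA231x/2340 (rev 02)"\n'
    '/device/002:02.0/options = ""\n'
)
# 16 is the illustrative per-LUN queue depth from the earlier example.
updated = set_module_option(sample, "/device/002:02.0", "ql2xmaxqdepth", 16)
print(updated)
```

Working on a copy of the text mirrors step 1 of the procedure: keep the original file until the new settings have been verified after a reboot.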

Tip #3: Modify the PortDownRetryCount parameter.

The PortDownRetryCount parameter must be set, using Fast!UTIL, to the value recommended by your storage vendor. This setting specifies the number of times the adapter's driver retries a command to a port that is returning Port Down status. The resulting value for ESX Server is 2 * n + 5, where n is the value of PortDownRetryCount from the HBA BIOS.

You can change this value directly in the HBA, or you can do it after you've installed ESX by editing the /etc/vmware/esx.conf file. To edit the file, locate the "options=" entry under the HBA model you are using and make the following change.

To Change the PortDownRetryCount Parameter
  1. Create a copy of /etc/vmware/esx.conf.
  2. Locate the following entry for each HBA:
    /device/002:02.0/name = "QLogic Corp QLA231x/2340 (rev 02)"
    /device/002:02.0/options = ""
  3. Modify as shown:
    /device/002:02.0/name = "QLogic Corp QLA231x/2340 (rev 02)"
    /device/002:02.0/options = "qlport_down_retry=xxx"
    Where xxx is the value recommended by your storage vendor. The equivalent setting for Emulex HBAs is "lpfc_nodev_tmo". The default is "30".
  4. Reboot.
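
The relationship between the HBA BIOS setting and the value ESX Server uses (2 * n + 5, as described under Tip #3) can be sketched in Python as follows. The function name is hypothetical, and the input value of 8 is an arbitrary illustration, not a vendor recommendation.

```python
def esx_port_down_retry(bios_port_down_retry_count: int) -> int:
    """Value used by ESX Server: 2 * n + 5, where n is the HBA BIOS setting."""
    if bios_port_down_retry_count < 0:
        raise ValueError("PortDownRetryCount cannot be negative")
    return 2 * bios_port_down_retry_count + 5


# Arbitrary illustrative BIOS setting of n = 8.
print(esx_port_down_retry(8))  # 21
```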

Overall Assessment

So far, my experience with SAN booting VMware ESX 3.0.0 has been nothing but positive. From a procedural perspective, the process is certainly much easier than with previous releases. In addition, I've found the ESX host's reliability during storage controller failover testing to be rock solid so far.


 

RELATED INFORMATION

Love VMware But Hate Backups?

IT teams typically report that server virtualization reduces the number of servers, network ports, floor space, maintenance, and electricity required. Once the honeymoon is over, however, many of these same IT teams find themselves trapped in a backup nightmare.

In a recent Tech OnTap article, NetApp systems engineer and VMware expert Vaughn Stewart explains how to simplify backups for virtual infrastructures.

Get the details. Read the article.


Best Practices for Effective Business Application Management

Auto-parts manufacturer Magna International consolidated 65 servers into 4 physical servers using VMware and NetApp storage.

In a recent Webcast, Magna IT systems analyst Chris Green describes his experiences with NetApp and VMware and explains how the deployment helped improve server utilization, reduce deployment time, and achieve DR efficiency.

Other guest speakers include:

  • Brian Byun, VP, Products and Alliances at VMware
  • Rich Clifton, VP and GM, NetApp Networked Storage business unit

Watch the TechTalk Webcast.
