Tag: Multipathing

Here’s the short and sweet for configuring the best practice SATP rule for XtremIO storage on ESXi 5.5 using PowerCLI (5.8 Release 1, in my case). I can’t claim any credit beyond aggregation and adaptation: the parameters are from the XtremIO user guide and the script comes from VirtuallyHyper.com (thanks!). See my earlier post about the SATP rule itself and how to manually implement it: VMware ESXi Round Robin NMP IOPS=1.

#Variables
$cluster = "Production"

foreach($esx in Get-Cluster $cluster | Get-VMHost){
    $esxcli = Get-EsxCli -VMHost $esx
    # List existing XtremIO SATP rules
    # $esxcli.storage.nmp.satp.rule.list() | where {$_.description -like "*XtremIO*"}
    # Create a new SATP rule for XtremIO
    # add() positional args: boot, claimoption, description, device, driver, force, model, option, psp, pspoption, satp, transport, type, vendor
    $result = $esxcli.storage.nmp.satp.rule.add($null,"tpgs_off","XtremIO Active/Active",$null,$null,$null,"XtremApp",$null,"VMW_PSP_RR","iops=1","VMW_SATP_DEFAULT_AA",$null,"vendor","XtremIO")
    # List the XtremIO SATP rules again to confirm the addition
    # $esxcli.storage.nmp.satp.rule.list() | where {$_.description -like "*XtremIO*"}
    Write-Host "Host:", $esx.Name, "Result:", $result
}
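
One caveat worth calling out: an SATP rule is applied when a device is claimed, so XtremIO volumes that were already presented to the host keep their existing policy until they are reclaimed or the host is rebooted. Below is a minimal sketch of how the same PowerCLI objects could bring those existing devices in line; the display-name filter and the V1 esxcli parameter ordering are my assumptions, so verify both against your PowerCLI release before running it.

foreach($esx in Get-Cluster $cluster | Get-VMHost){
    $esxcli = Get-EsxCli -VMHost $esx
    # Assumption: XtremIO LUNs can be identified by their display name; adjust the filter for your environment
    $xioDevices = $esxcli.storage.nmp.device.list() | where {$_.DeviceDisplayName -like "XtremIO*"}
    foreach($dev in $xioDevices){
        # Switch the already-claimed device to Round Robin (assumed V1 args: default, device, psp)
        $esxcli.storage.nmp.device.set($null, $dev.Device, "VMW_PSP_RR")
        # Drop the path switching frequency to 1 I/O (assumed V1 args: bytes, cfgfile, device, iops, type, useano)
        $esxcli.storage.nmp.psp.roundrobin.deviceconfig.set($null, $null, $dev.Device, 1, "iops", $null)
        Write-Host "Host:", $esx.Name, "Device:", $dev.Device, "set to RR, iops=1"
    }
}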

Categories: Storage, Technology, Virtualization

Tuesday, October 7, was a big day for me. After searching for more than three months for the cause of a repeated storage connectivity failure, I finally found a chunk of definitive data. The scientific method would be happy: I had a hypothesis, a consistently reproducible test, and a clear answer to a question that had hung unanswered in the ether for two months.

My environment had never seemed eccentric or exceptional until EMC, VMware, and I were unable to explain why our ESXi hosts could not sustain a storage controller failover (June). It was a “non-disruptive update” sans the “non-”. The array, for its part, reported no internal issues. The VMs and hosts depending on its disks didn’t agree.

As with any troubleshooting, the key follow-up is being able to reproduce the problem and to gather sufficient logs when you do, so that yet another downtime event isn’t needed afterward. We achieved the first part (repro) with ease, but came up short on the analytical data to explain why (August). Since this was a production environment, repeated hard crashes of database servers weren’t in the cards.

The other organizations participating in this Easter egg hunt suspected the QLogic 8262 Converged Network Adapter firmware as the culprit, apparently after receiving indications to that effect from QLogic. As that data came to me second-hand, I can’t say whether it was a guess or a hard-evidence-based hypothesis. Our CNAs were running the latest firmware available from Dell’s FTP update site (via the Lifecycle Controller), but that repository stays a few revisions behind for some unknown yet intentional reason (ask Dell).

Categories: Storage, Technology, Virtualization

When we started our initial foray into the all-flash array space, we had to put on the brakes when the “best practice” recommendations started flying from the SEs and the guides. In a perfect world, we’d be entirely on the new array (Pure Storage was first), but migration is a necessary process. We also wanted a clear path back if the POCs failed. The recommended IOPS limit before switching paths with Round Robin native multipathing (NMP) was one of those settings.

From the EMC XtremIO Storage Array User Guide 2.4:

For best performance, it is recommended to do the following:

  • Set the native round robin path selection policy on XtremIO volumes presented to the ESX host.
  • Set the vSphere NMP Round Robin path switching frequency to XtremIO volumes from the default value (1000 I/O packets) to 1.

These settings ensure optimal distribution and availability of load between I/O paths to the XtremIO storage.

I never pursued that path to see whether HP 3PAR would tolerate it, since other settings were clearly incompatible, but apparently HP came to their own realization on the matter. That said, please use caution in environments running more than just these two arrays, and watch out for the other “best practices” for all-flash arrays. Setting the queue depth to the maximum (256) or raising concurrent operations to 64 will likely overwhelm non-flash arrays, or cause I/O loss when they are under pressure.
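
Whatever mix of arrays you’re running, it’s worth reading back what each host actually has in effect before and after a change. A quick, hedged sketch along these lines (reusing the $cluster variable from the script earlier on this page) should show VMW_PSP_RR with an IOPS limit of 1 on the XtremIO volumes; the property names assume the V1 Get-EsxCli output, and the display-name filter is again my assumption.

# Inspect the PSP and Round Robin configuration reported for XtremIO devices on each host
foreach($esx in Get-Cluster $cluster | Get-VMHost){
    $esxcli = Get-EsxCli -VMHost $esx
    $esxcli.storage.nmp.device.list() | where {$_.DeviceDisplayName -like "XtremIO*"} |
        select Device, PathSelectionPolicy, PathSelectionPolicyDeviceConfig | Format-Table -AutoSize
}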

Categories: Storage, Technology, Virtualization