Tag: ESXi

With the release of ESXi 6.0 Update 1a, which fixed the network connectivity issue that plagued every ESXi 6.0 release until October 6, I have begun my own journey from 5.5 to 6.0. This time I'm taking a new approach: using Update Manager to perform an in-place upgrade rather than the fresh installs I have always preferred.

Why? Because I learned at VMworld 2015 from the authorities (designers) that upgrading is actually VMware’s recommended path. You can read more from my notes on session INF5123.

What follows below assumes that you have already rebuilt or upgraded to vCenter 6.0 Update 1. In Update 1, the Web Client now supports Update Manager so that everything can be performed there. No more thick client! Now if we can just get rid of Flash…

Step 1: Import ESXi Image

From the home landing page of the vSphere Web Client, navigate here:

  1. Update Manager
  2. Select an Update Manager server
  3. Go to the Manage tab
  4. Select ESXi Images
  5. Click Import ESXi Image…
  6. Browse to the ISO

[Screenshot: importing the ESXi 6 image into Update Manager]

Technology Virtualization

Last week I ran across a tweet about a VMware Labs fling that brings ESXtop statistics into the vSphere Web Client as a plugin. If you’re not familiar with “flings”, they are experimental tools built by VMware engineers and shared with the community. Anyway, this one jumped onto my list immediately.

Download ESXtopNGC Plugin: https://labs.vmware.com/flings/esxtopngc-plugin

The first thing you might notice is the System Requirements’ sole item: “vCenter Server Appliance 5.5”. Hmm. I run vCenter Server on Windows since Update Manager still requires it and I don’t see the value of having both the vCSA and a Windows VM, as opposed to just one Windows VM. A few comments quickly came in, though, confirming that it works just fine as a plugin on Windows vCenter, too.

Here’s how to install it:

1. Download ESXtopNGC (“Agree & Download” in the top left)

[Screenshot: ESXtopNGC fling download page]

Technology Virtualization

Here’s the short and sweet of configuring the best-practice SATP rule for XtremIO storage on ESXi 5.5 using PowerCLI (5.8 Release 1, in my case). I can’t claim any credit beyond aggregation and adaptation: the parameters come from the XtremIO user guide and the script comes from VirtuallyHyper.com (thanks!). See my earlier post about the SATP rule itself and how to implement it manually: VMware ESXi Round Robin NMP IOPS=1.

# Variables
$cluster = "Production"

foreach ($esx in Get-Cluster $cluster | Get-VMHost) {
    $esxcli = Get-EsxCli -VMHost $esx
    # List any existing XtremIO SATP rules (uncomment to check before adding)
    # $esxcli.storage.nmp.satp.rule.list() | where {$_.description -like "*XtremIO*"}
    # Create the SATP rule: VMW_SATP_DEFAULT_AA, Round Robin PSP, IOPS=1, matched on vendor "XtremIO"
    $result = $esxcli.storage.nmp.satp.rule.add($null,"tpgs_off","XtremIO Active/Active",$null,$null,$null,"XtremApp",$null,"VMW_PSP_RR","iops=1","VMW_SATP_DEFAULT_AA",$null,"vendor","XtremIO")
    # List the XtremIO rules again to confirm the addition (uncomment to verify)
    # $esxcli.storage.nmp.satp.rule.list() | where {$_.description -like "*XtremIO*"}
    Write-Host "Host:", $esx.Name, "Result:", $result
}
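
One caveat worth noting: a SATP rule only applies to devices claimed after it exists, so LUNs that were already presented keep their previous policy until they are reclaimed or set manually. The quick check below is not part of the original script; it's a hedged sketch using standard PowerCLI cmdlets to confirm what the existing XtremIO devices are actually doing (the vendor string comes from the rule above).

# Not from the original script: sanity-check existing XtremIO LUNs on each host
foreach ($esx in Get-Cluster $cluster | Get-VMHost) {
    # List disk LUNs whose vendor matches the rule and show their current path policy
    Get-ScsiLun -VmHost $esx -LunType disk |
        Where-Object { $_.Vendor -eq "XtremIO" } |
        Select-Object CanonicalName, MultipathPolicy, CommandsToSwitchPath
}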

Storage Technology Virtualization

Tuesday, October 7, was a big day for me. After searching for more than three months for the cause of a repeated storage connectivity failure, I finally found a chunk of definitive data. The scientific method would be happy: I had a hypothesis, a consistently reproducible test, and a clear answer to a question that had hung in the ether unanswered for two months.

My environment had never seemed eccentric or exceptional until EMC, VMware, and I were unable to explain why our ESXi hosts could not sustain a storage controller failover (June). It was a “non-disruptive update” sans the “non-”. The array itself reported no issues; the VMs and hosts depending on its disks disagreed.

As with any troubleshooting, a key follow-up is being able to reproduce the problem and gather sufficient logs when you do, so that another downtime event isn’t necessary afterward. We achieved the first part (repro) with ease, but came up short on analytical data to explain why (August). Since this was a production environment, repeated hard crashes of database servers weren’t in the cards.

The other organizations participating in this Easter egg hunt were suspicious of the QLogic 8262 Converged Network Adapter firmware as the culprit, apparently after receiving indications to that effect from QLogic. Since that data came second-hand, I can’t say whether it was a guess or a hypothesis grounded in hard evidence. Our CNAs were running the latest firmware available from Dell’s FTP update site (via the Lifecycle Controller), but that repository stays a few revisions behind for some unknown yet intentional reason (ask Dell).

Storage Technology Virtualization

When we started our initial foray into the all-flash array space, we had to put on the brakes when the “best practice” recommendations started flying from the SEs and guides. In a perfect world we’d be entirely on the new array (Pure Storage was first), but migration is a necessary process, and we also wanted a clear path back if the POCs failed. The recommended number of IOPS before switching paths with Round Robin native multipathing (NMP) was one of those settings.

From the EMC XtremIO Storage Array User Guide 2.4:

For best performance, it is recommended to do the following:

  • Set the native round robin path selection policy on XtremIO volumes presented to the ESX host.
  • Set the vSphere NMP Round Robin path switching frequency to XtremIO volumes from the default value (1000 I/O packets) to 1.

These settings ensure optimal distribution and availability of load between I/O paths to the XtremIO storage.
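
The SATP claim-rule approach (see the PowerCLI script earlier on this page) covers newly claimed devices; for a handful of volumes that are already presented, the same two recommendations can also be applied per device. This is my own hedged sketch rather than anything from the XtremIO guide; the host name is hypothetical and the vendor match assumes the LUNs report themselves as “XtremIO”.

# Hedged sketch: apply Round Robin with an IOPS limit of 1 to existing XtremIO LUNs
$esx = Get-VMHost "esx01.example.com"             # hypothetical host name
Get-ScsiLun -VmHost $esx -LunType disk |
    Where-Object { $_.Vendor -eq "XtremIO" } |    # assumes the LUNs report Vendor "XtremIO"
    Set-ScsiLun -MultipathPolicy RoundRobin -CommandsToSwitchPath 1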

I never pursued that path to see if HP 3PAR would tolerate it, since other settings were clearly incompatible, but apparently HP came to their own realization on the matter. That said, please use caution in environments running more than just these two arrays, and watch out for the other “best practices” for all-flash arrays. Setting the queue depth to its maximum (256) or raising concurrent operations to 64 will likely overwhelm non-flash arrays under pressure or cause I/O loss.

Storage Technology Virtualization

Yesterday, I used Storage vMotion to empty a datastore into another one. Today, when I went to unmount it, I hit a couple of snags:

  1. One host was using the datastore for its core dump location (evidenced by a locked /vmkdump directory)
  2. After that was cleared (details below), two hosts still wouldn’t let it go
  3. Thinking that a host reboot might clear the lock, I entered maintenance mode, but had a VM on each host that wouldn’t leave
  4. Trying to power off one of those VMs and migrate it cold left it stuck in a powered-off state

The power of Google led to a few answers, and the desire for a shortcut led to a final one :).
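
Before getting into the fixes, here is the quick check I now run first. It isn’t from the original troubleshooting, just a hedged PowerCLI sketch (the datastore name is hypothetical) to see what still references a datastore before attempting the unmount.

# Hedged sketch: find what still references a datastore before unmounting it
$ds = Get-Datastore "Old-Datastore-01"             # hypothetical datastore name
Get-VM -Datastore $ds                              # VMs (or stale registrations) still living on it
Get-VMHost -Datastore $ds | Select-Object Name     # hosts that still have it mounted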

Storage Technology Virtualization

As part of a project consolidating mission-critical services, I am moving a few VMs between vSphere/vCenter datacenters. The key word here is “datacenters”, and, for emphasis, they are managed by different vCenter servers operating in Linked Mode. Because of this setup, the migration isn’t a simple cluster & storage vMotion.

Here’s the process I am following. I hope it helps; if you use another method, feel free to comment.

1. Enable SSH on an ESXi host in both the source and destination clusters; on the source host, also open outbound SSH on the host firewall (a PowerCLI sketch of this step follows the list below)

  • In vSphere Client, go to the “Configuration” tab on each host
  • Under “Software” on the left side of the right pane, select “Security Profile”
  • In the top right under “Services”, click “Properties…”
  • Scroll down to “SSH” and click “Options…”
  • Select “Start and stop manually”, then click “Start” and return to the Security Profile page
  • On the source ESXi host, also click “Properties…” under “Firewall”
  • In Firewall Properties, check “SSH Client” and click “OK”
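
If you would rather script these two changes, the same result is reachable with PowerCLI. This is my own hedged sketch, not part of the walkthrough above, and the host names are hypothetical.

# Hedged sketch: start SSH on both hosts and open the outbound "SSH Client" rule on the source
$src = Get-VMHost "esx-source.example.com"    # hypothetical source host
$dst = Get-VMHost "esx-dest.example.com"      # hypothetical destination host

# Start the SSH service (key "TSM-SSH") on each host
foreach ($esx in @($src, $dst)) {
    Get-VMHostService -VMHost $esx | Where-Object { $_.Key -eq "TSM-SSH" } | Start-VMHostService
}

# On the source host only, enable the outbound SSH Client firewall exception
Get-VMHostFirewallException -VMHost $src -Name "SSH Client" | Set-VMHostFirewallException -Enabled:$true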

Technology Virtualization

If you regularly SSH into your ESX hosts, this may be old news to you. But if you’re like me and mostly manage your ESX hosts via vSphere Client, you might have a surprise waiting for you when you upgrade to ESX & ESXi 4.1. With the advent of ESX Active Directory integration, VMware kindly decided to impose some new requirements on local user accounts. What does this mean for you?

For me, it meant that when I tried to SSH into my ESX host, I ran into “Access is denied.” With only one non-root user account on the system, that meant no remote access to the host itself. Root is restricted to interactive (console) access, so that was no help either. Thankfully, the Dell Remote Access Card (DRAC) put me on the console, so to speak, and let me poke around as root.

The solution, though, came from a Google search, a somewhat unhelpful VMware KB article (1024235), and a little connecting of the dots. AD integration places a new dependency on the local “Administrators” role. If local user accounts aren’t in that role, they can’t get in.
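
As an aside that isn’t part of the original fix, the same permission can also be granted from PowerCLI. This rough, hedged sketch connects directly to the host (more on why in a moment); the host and user names are hypothetical.

# Hedged sketch: connect directly to the ESX host and put a local account in the Admin role
Connect-VIServer "esx01.example.com" -User root           # hypothetical host; you will be prompted for the password
$account = Get-VMHostAccount -Id "localadmin" -User       # hypothetical local user account
$adminRole = Get-VIRole -Name "Admin"                     # the host's built-in administrator role
New-VIPermission -Entity (Get-Folder -NoRecursion) -Principal $account -Role $adminRole -Propagate:$true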

Oddly enough, vSphere Client has to be targeted directly at the ESX host (not vCenter) to edit the role and local users. Looking while connected through vCenter won’t get you anywhere. So, here we go:

Security Technology Virtualization

We’ve been running ESX since the days of v2.5, but with the news that v4.1 will be the last “fat” version with a Red Hat-based service console, we decided it was time to transition to ESXi. The 30+ step guide below describes our process using an EMC CLARiiON CX3 SAN and Dell hosts with redundant QLogic HBAs (a Fibre Channel environment).

  1. Document network/port mappings in vSphere Client on the existing ESX server (see the PowerCLI sketch after this list)
  2. Put host into maintenance mode
  3. Shutdown host
  4. Remove host from Storage Group in EMC Navisphere
  5. Create dedicated Storage Group per host for the boot LUN in Navisphere
  6. Create the 5GB boot LUN for the host
  7. Add the boot LUN to the host’s Storage Group
  8. Connect to the host console via the Dell Remote Access Card (DRAC)
  9. Attach ESXi media via DRAC virtual media
  10. Power on host (physically or via the DRAC)
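
For step 1, I find it easier to capture the mappings with PowerCLI than with screenshots. This is not part of the original guide, just a hedged sketch; the host name and output paths are hypothetical.

# Hedged sketch: dump vSwitch and port group mappings before rebuilding the host
$esx = Get-VMHost "esx01.example.com"    # hypothetical host name
Get-VirtualSwitch -VMHost $esx |
    Select-Object Name, @{ Name = "Nics"; Expression = { $_.Nic -join "," } } |
    Export-Csv "C:\esx01-vswitches.csv" -NoTypeInformation
Get-VirtualPortGroup -VMHost $esx |
    Select-Object Name, VirtualSwitchName, VLanId |
    Export-Csv "C:\esx01-portgroups.csv" -NoTypeInformation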

Storage Technology Virtualization