Tag: QLogic

With the release of ESXi 6.0 Update 1a, which fixed the network connectivity issue that plagued all ESXi 6.0 releases until October 6, I have begun my own journey from 5.5 to 6.0. I'm taking an approach that is new for me, though: using Update Manager to perform an in-place upgrade rather than the fresh installs I have always preferred.

Why? Because I learned at VMworld 2015, from the authorities (the designers themselves), that upgrading is actually VMware's recommended path. You can read more in my notes on session INF5123.

What follows assumes that you have already rebuilt or upgraded to vCenter 6.0 Update 1. As of Update 1, the Web Client supports Update Manager, so everything can be performed there. No more thick client! Now if we can just get rid of Flash…
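
If you want to confirm where everything stands first, a quick PowerCLI check covers both the vCenter and the hosts (a minimal sketch; the vCenter name is a placeholder for your own server):

    # Minimal PowerCLI sketch: confirm vCenter and host versions before upgrading.
    # "vcenter.example.com" is a placeholder for your own vCenter server.
    Connect-VIServer -Server vcenter.example.com

    # Version and build of the vCenter you just connected to
    $global:DefaultVIServer | Select-Object Name, Version, Build

    # ESXi version and build for every host, to confirm the 5.5 starting point
    Get-VMHost | Select-Object Name, Version, Build | Sort-Object Name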

Step 1: Import ESXi Image

From the home landing page of the vSphere Web Client, navigate here:

  • Update Manager
    • Select an Update Manager server
      • Go to Manage
        • Then ESXi Images
          • Import ESXi Image…
            • Browse to the ISO

[Screenshot: Import ESXi Image dialog (esxi6_import)]

Technology, Virtualization

Tuesday, October 7, was a big day for me. After searching for more than three months for the cause of a repeated storage connectivity failure, I finally found a chunk of definitive data. The scientific method would be happy: I had a hypothesis, a consistently reproducible test, and at last a clear finding on a question that had hung unanswered in the ether for two months.

My environment had never seemed eccentric or exceptional until EMC, VMware, and I were unable to explain why our ESXi hosts could not sustain a storage controller failover (June). It was a "non-disruptive update" sans the "non-". The array itself reported no internal issues; the VMs and hosts depending on its disks disagreed.

As with any troubleshooting, a key follow-up is being able to reproduce the problem and to gather sufficient logs when you do, so that another downtime event isn't necessary afterward. We achieved the first part (repro) with ease, but came up short on analytical data to explain why (August). Since this was a production environment, repeated hard crashes of database servers weren't in the cards.

The other organizations participating in this Easter egg hunt suspected the QLogic 8262 Converged Network Adapter firmware as the culprit, apparently after receiving indications to that effect from QLogic. Since that data came second-hand, I can't say whether it was a guess or a hard-evidence-based hypothesis. Our CNAs were running the latest firmware available from Dell's FTP update site (via the Lifecycle Controller), but that repository stays a few revisions behind for some unknown yet intentional reason (ask Dell).
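
If you want to check your own CNA driver and firmware revisions from PowerCLI rather than the Lifecycle Controller, here is a rough sketch (the host and vmnic names are placeholders, and the exact property names can vary by PowerCLI release):

    # Sketch: query driver and firmware versions for a host NIC/CNA via esxcli.
    # "esx01.example.com" and "vmnic0" are placeholders for your environment.
    $vmhost = Get-VMHost -Name esx01.example.com
    $esxcli = Get-EsxCli -VMHost $vmhost -V2

    # Equivalent to running "esxcli network nic get -n vmnic0" on the host
    $nicInfo = $esxcli.network.nic.get.Invoke(@{nicname = 'vmnic0'})
    $nicInfo.DriverInfo | Select-Object Driver, Version, FirmwareVersion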

Storage, Technology, Virtualization

We received an update from our Dell team today. It looks like the politics of pinning down the root cause are going to make all of us suffer for at least another six months. Read on…

3Sept2014 BG – We investigated this issue. It looks like it is due to anomalous behavior of the Hyper-V NDIS stack. The driver allocates MSI-X interrupts for VMQs at load time and sets interrupt affinity based on the RSS processor set returned by the NdisGetRssProcessorInformation call, as per the MSDN documentation. None of the host CPUs below the base RSS processor number (RssBaseProcNumber) are in the list of RSS processors.

Later on, NDIS specifies cpu0 when it sends the OID to allocate a VMQ. The driver doesn't find any MSI-X interrupt to satisfy the VMQ allocation OID, and hence the driver fails the OID.
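
Translated into something you can check yourself, the mismatch the driver describes is visible with the in-box NetAdapter cmdlets (a sketch; the adapter name is a placeholder):

    # Sketch: compare the RSS processor set with the VMQ base processor.
    # "SLOT 4 Port 1" is a placeholder; use Get-NetAdapter to find your names.
    Get-NetAdapterRss -Name 'SLOT 4 Port 1' |
        Select-Object Name, BaseProcessorNumber, MaxProcessorNumber, RssProcessorArray

    # The VMQ view of the same adapter, including its base processor
    Get-NetAdapterVmq -Name 'SLOT 4 Port 1' |
        Select-Object Name, Enabled, BaseProcessorNumber, MaxProcessors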

Microsoft, Networking, Technology, Virtualization

We’ve confirmed with Dell Support that QLogic has identified a bug involving the QLE82xx network driver (at least through version 5.3.12.0925) and Virtual Machine Queues (VMQ). As of August 15, 2014, QLogic reports that they have reproduced the issue but have not resolved it.
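
A quick way to see whether a given host is running an affected build is the in-box driver query (a sketch; the QLogic filter is an assumption, so adjust it to match your adapters):

    # Sketch: list driver versions for QLogic adapters to compare against 5.3.12.0925.
    Get-NetAdapter |
        Where-Object { $_.InterfaceDescription -like '*QLogic*' } |
        Select-Object Name, InterfaceDescription, DriverVersion, DriverDate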

We have a case open with Microsoft Hyper-V Support, and that case number has been shared with Dell and QLogic to coordinate troubleshooting and support as the issue becomes more visible in the community. I'll post updates as we have them. There is word of a beta driver coming at some point, which we've expressed interest in testing.

Networking, Technology, Virtualization

In Part 1, I laid out a brief summary of VMQ and an example of the configuration that is appropriate for our four-socket, ten-core Hyper-V host. Here in Part 2, I’ll unpack the issue we’re facing in spite of our textbook configuration.
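
As a refresher, that configuration boils down to commands of this shape (a sketch with hypothetical adapter names and processor numbers, not our exact values):

    # Sketch: give each physical port its own slice of cores for VMQ.
    # Names and numbers below are hypothetical; with hyperthreading enabled,
    # VMQ base/max processors should land on even-numbered logical processors.
    Set-NetAdapterVmq -Name 'SLOT 4 Port 1' -BaseProcessorNumber 2 -MaxProcessors 8
    Set-NetAdapterVmq -Name 'SLOT 4 Port 2' -BaseProcessorNumber 20 -MaxProcessors 8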

Following the guidance in VMQ Deep Dive, Part 2, and using the commands in Part 1 of this blog, we find the queue list below. The three line items are the default queues given to the Hyper-V host (HV01) on its physical ports and the related logical switch. We should be seeing the VMs on this host in that list as well, but we aren't.

[Screenshot: VMQ queue list showing only the host default queues (vmq_ps_not_zero)]
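
For reference, that list comes from the in-box queue query (a sketch; on a healthy host, each VM's virtual NIC should also appear here with its own queue):

    # Sketch: list the VMQ queues actually allocated on the host.
    # In our case, only the three host default queues are returned.
    Get-NetAdapterVmqQueue |
        Select-Object Name, QueueID, MacAddress, VlanID, VmFriendlyName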

The Windows System event log shows the following error, which appears to point at the queuing failure.

[Screenshot: System event log error for the failed VMQ allocation (vmq_oid_failed)]
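
To pull that entry out of the log without scrolling through Event Viewer, something like this works (a sketch; filter by ProviderName once you know the exact source of the error on your system):

    # Sketch: fetch recent error-level entries from the System log (Level 2 = Error).
    Get-WinEvent -FilterHashtable @{ LogName = 'System'; Level = 2 } -MaxEvents 20 |
        Select-Object TimeCreated, ProviderName, Id, Message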

Microsoft, Networking, Technology, Virtualization