Category: Microsoft

We’re in the process of updating our load balancing platforms and are migrating several test/dev and backoffice applications from Kemp Virtual LoadMaster (VLM) load balancers to F5 BIG-IP Local Traffic Manager (LTM) virtual edition (VE). Wow…three abbreviations in the first sentence. Buckle up :).

One of the services that we migrated this week was the Azure Rights Management Server (RMS) Connector. If you’re here, you probably know what the Azure RMS connector is, but just in case, here’s the short definition from Microsoft:

The Microsoft Rights Management (RMS) connector lets you quickly enable existing on-premises servers to use their Information Rights Management (IRM) functionality with the cloud-based Microsoft Rights Management services.

This seemed like it would be a simple migration when I started, but I couldn’t get the health monitor to report the servers up. It was all green on Kemp, but not on F5, with all the same check parameters. I tried it in a browser (IE, Chrome and Firefox) and that puzzled me more. The only way to load the check page (https://<azurermsconnector_fqdn>/_wmcs/certification/servercertification.asmx) was to authenticate with domain user credentials. Strange. It seems that the VLM, at least in v7.1-20b, was accepting a 401 as a healthy response. But that didn’t help with BIG-IP…

BIG-IP (F5) has a great CLI utility that helped verify that authentication was indeed the holdup on its health monitor. Using openssl s_client, I sent the same request as the monitor and received a 401 Unauthorized error:

 <h2>401 - Unauthorized: Access is denied due to invalid credentials.</h2>
 <h3>You do not have permission to view this directory or page using the credentials that you supplied.</h3>

I tried putting some domain credentials into the username and password fields on the BIG-IP health monitor, but that only succeeded in locking out the account. Then my colleague came across something that pointed us to use the account UPN ([email protected]) and that did the trick. Here’s the step-by-step from start to finish:

 1. Create HTTPS Health Monitor (Local Traffic > Monitors > Create…)

Fields below are required or otherwise different from default values

  • Name: rms_conn_https (feel free to name it according to your format)
  • Description: HTTPS monitor for Rights Management Server Connector
  • Type: HTTPS
  • Parent Monitor: https
  • Send String: GET /_wmcs/certification/servercertification.asmx HTTP/1.1\nHost: rmsconnector.domain.com\n
    (make sure you change the “Host:” to the FQDN of your RMS connector)
  • Receive String: ServerCertificationWebService
  • User Name: [email protected] (replace with your domain user service account)
  • Password: <password of above account>
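
To make the monitor's logic concrete, here is a minimal Python sketch of the check described above: send the request with credentials, then look for the receive string in the response. It assumes the monitor presents the user name and password as HTTP Basic credentials, which is my assumption and not something verified against BIG-IP here. The function names and the account `svc-account@domain.com` are hypothetical.

```python
import base64


def build_probe(host, path, upn, password):
    """Build an HTTP/1.1 request carrying Basic credentials.

    Assumption: the monitor sends the UPN and password as a Basic
    Authorization header. Names here are illustrative only.
    """
    token = base64.b64encode(f"{upn}:{password}".encode("utf-8")).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"Authorization: Basic {token}",
        "Connection: close",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines)


def is_healthy(response_text, receive_string="ServerCertificationWebService"):
    """Report the member up only if the receive string is in the response."""
    return receive_string in response_text


probe = build_probe(
    "rmsconnector.domain.com",
    "/_wmcs/certification/servercertification.asmx",
    "svc-account@domain.com",  # hypothetical service account UPN
    "example-password",        # hypothetical password
)
```

With a check like this, the 401 page shown earlier fails because it never contains the receive string, which matches what the BIG-IP monitor reported before the UPN was used.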


Microsoft Networking Technology

We’ve been setting up Active Directory Federation Services (ADFS) on Windows Server 2012 R2 to tie in with Office 365, and we ran into a snag with load balancing ADFS on our aging F5 BIG-IP LTM. It’s on the dinosaur end of the historical timeline, or to put it another way, “it’s in its sunset year”, and the latest supported code is 10.2.4.

This poses a bit of an issue with monitoring the ADFS servers, since the version shipping with Windows Server 2012 R2 requires an SSL/TLS feature called “Server Name Indication”, or SNI. The prehistoric 10.2.4 BIG-IP code doesn’t support SNI. Thankfully, Microsoft provides a way to monitor the servers over HTTP (instead of HTTPS), but the documentation we found (links below) was lacking an important detail.

Microsoft Networking Technology

We received an update from our Dell team today. It looks like politics of who’s the root cause are going to make all of us suffer for another 6 months at least. Read on…

3Sept2014 BG – We investigated this issue. Looks like it is due to anomalous behavior of HyperV NDIS stack. Driver allocates MSIX interrupts for VMQs at load time and sets interrupt affinity bases on the RSS processor set returned by NdisGetRssProcessorInformation call as per MSDN documentation. None of the host CPU is in the list of RSS processors below the base RSS Processor number RssBaseProcNumber.

Later on NDIS specifies cpu0 when it send OID to allocate VMQ. Driver doesn’t find any MSIX interrupt to satisfy the VMQ allocate OID and hence driver fails the OID.
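
As a way to picture the failure Dell describes, here is a toy Python model, not the driver's actual logic. It assumes interrupts exist only for CPUs at or above the RSS base processor number, so a queue request for cpu0 has nothing to bind to. The CPU count of 40 and the base number of 2 are hypothetical values chosen for illustration.

```python
def msix_cpus(total_cpus, rss_base_proc):
    """CPUs that receive an interrupt at driver load time.

    Toy assumption: the RSS processor set starts at the base processor
    number, so nothing below it gets an interrupt.
    """
    return set(range(rss_base_proc, total_cpus))


def allocate_vmq(requested_cpu, cpus_with_interrupts):
    """Succeed only if an interrupt was set up for the requested CPU."""
    return requested_cpu in cpus_with_interrupts


cpus = msix_cpus(total_cpus=40, rss_base_proc=2)  # hypothetical host values
cpu0_ok = allocate_vmq(0, cpus)   # the request NDIS makes in the report
cpu2_ok = allocate_vmq(2, cpus)   # a request inside the RSS set
```

In this model the cpu0 request fails while a request inside the RSS set succeeds, which is the mismatch the update attributes to the NDIS stack.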

Microsoft Networking Technology Virtualization

On Monday I had the privilege of participating in an unplanned recovery drill after maintenance on our site UPS and generator tripped over itself (four times). Needless to say, a lot of our infrastructure doesn’t take too kindly to unexpected darkness, and its lack of choreography means that things come up out of order. But that’s not the focus here, just the context. Most everything was restored in relatively short order thanks to good documentation of this nature:

  1. Confirm power is reliable and not likely to fail again in the immediate future
  2. Power on network switches (core, rack, etc)
  3. Power on storage array(s) (HP, EMC, etc)
  4. Power on virtualization hosts (ESXi, Hyper-V)
  5. Connect to each virtualization host directly (vSphere Client to hostname / RDP to Hyper-V Console)
  6. Confirm presence of all storage (LUNs, datastores, CSVs)
  7. Confirm recognition/identities of hosted virtual machines (out-of-order boot may see VMs as “unknown”)
  8. If any storage is missing or VMs “unknown”, reboot hosts to confirm storage accessibility
  9. If VMs auto-power on, force power off to prevent incorrect boot order (i.e. VMs on before Active Directory)
  10. Power on Active Directory servers
  11. Reboot AD servers (one at a time) until they come up smoothly, recognize network (as “domain network”), and serve DHCP, DNS, etc
  12. Power on vCenter and VMM servers
  13. Power on DFS and shared file servers
  14. Power on System Center Operations Manager server to begin monitoring
  15. Power on load balancers
  16. Etc…
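
The ordering in the list above is really a dependency chain, and it can be sketched in Python with the standard library's `graphlib` (Python 3.9+). The dependency map below is my reading of the steps, not part of the original runbook: I assume everything after Active Directory depends only on Active Directory.

```python
from graphlib import TopologicalSorter

# Each key lists what must already be running before it is powered on.
# This mapping is an interpretation of the runbook, not an official one.
DEPENDS_ON = {
    "network switches": [],
    "storage arrays": ["network switches"],
    "virtualization hosts": ["storage arrays"],
    "Active Directory": ["virtualization hosts"],
    "vCenter / VMM": ["Active Directory"],
    "DFS / file servers": ["Active Directory"],
    "Operations Manager": ["Active Directory"],
    "load balancers": ["Active Directory"],
}


def boot_order(depends_on):
    """Return one power-on order in which every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())


order = boot_order(DEPENDS_ON)
for step, name in enumerate(order, start=1):
    print(f"{step}. Power on {name}")
```

Writing the order down as dependencies makes it obvious why VMs that auto-power on before Active Directory need to be forced off: they violate an edge in the graph.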

Back on topic, this post is about DFS-R (Distributed File System Replication), mentioned in Step 13, and only fully understood in this context today. I probably should have known this by now, but there’s a reason why Step 13 isn’t enough to get DFS-R operational. It catches me by surprise every time someone reports that data is out of sync, and every time I’ve had to manually re-sync the data before doing an authoritative sync, because some data comes from each side. Finally today, I know why and how to fix it.

Microsoft Technology

A few days ago, one of our VMs running on Hyper-V 2012 R2 became stuck and locked in a “Backing up…” status. We use Veeam Backup & Replication 7.0 and had noticed that this particular VM had been reverting to crash-consistent backups for the prior three days. The summary said it was a transient VSS error, so we didn’t dive deeper until it persisted. That’s when we saw it was stuck.

The problem with “stuck” is that Hyper-V won’t let it go. Users on Social TechNet discuss this issue here, but in a nutshell, it requires a reboot (often hard) of the Hyper-V host, because the VM process locks and the host won’t transition it, even when the VM is fully shut down at the guest level. Thus, we evacuated everything else and then power cycled the host. Windows Server 2012 R2 didn’t react so well to that, and subsequently required booting into Safe Mode to finally realize it was okay and able to boot normally.

That next backup window had a bunch of warnings about change block tracking (CBT) files (.avhdx) not matching, but it performed full backups fine. Not so the day after. Failed. Failed. Failed…

I maintenance mode’d and rebooted the Hyper-V hosts, restarted their VSS services, etc, but still they failed.

Then I tried a backup with our DPM server, which used to back up our Hyper-V VMs. It succeeded. So it wasn’t a host issue like I originally thought.

Microsoft Technology Virtualization

In Part 1, I laid out a brief summary of VMQ and an example of the configuration that is appropriate for our four-socket, ten-core Hyper-V host. Here in Part 2, I’ll unpack the issue we’re facing in spite of our textbook configuration.

Following the guidance in VMQ Deep Dive, Part 2, and using the commands in Part 1 of this blog, we get the queue list below. The three line items are default queues given to the Hyper-V host (HV01) on its physical ports and the related logical switch. We should be seeing the VMs on this host in that list as well, but we aren’t.

vmq_ps_not_zero

The Windows System event log shows the following error, which seems to be the issue with queuing.

vmq_oid_failed

Microsoft Networking Technology Virtualization

In our ongoing (sort-of pilot) migration from VMware vSphere 5.5 to Microsoft Hyper-V 2012 R2, we encountered a very concerning and puzzling issue with backups. The transition had been smooth for the most part and we used the project to bring aging Windows/SQL 2008 servers up to 2012 R2 and 2014, respectively. Two of our SQL environments had moved over just fine and were being backed up successfully with Microsoft Data Protection Manager 2012 R2 for the time being (other products are being considered, including Veeam). The third such SQL environment ran into a host of VSS errors once its data was populated and a backup was attempted.

[Screenshot: DPM 2012 R2 – Job Failed]

Background (before/after):

  • Hypervisor: vSphere 5.5 to Hyper-V 2012 R2
  • Guest OS: Windows Server 2008 to 2012 R2 (SQL Server 2008 to 2014)
  • Backup product: EMC Avamar 7.0.1 to MS DPM 2012 R2
  • Backup method: Crash-consistent image to VSS-quiesced image

 

We had seen an occasional VSS-related backup failure from time to time in DPM, but most were tied to available disk space for the protection group (DPM doesn’t do so well with deduplication of images, so growing has been near-continual). Retrying didn’t make a difference this time, though. We restarted VSS writers and even took downtime to restart the VM. Still the same failure.

Microsoft Technology Virtualization

Last week I encountered a briefly puzzling situation that’s worth noting as a tip when replacing a server on the network and needing to keep the same hostname. We’re a…

Microsoft Networking Technology

Microsoft has been gaining ground in the virtualization sphere one step at a time since Hyper-V first premiered. While some increments were negligible (or merely painfully obvious), they achieved significant breakthroughs in late 2013 with the release of all things “2012 R2”. The puzzle piece on which we’ll focus here is VMQ (specifically dynamic VMQ, or dVMQ).

VMQ gives Hyper-V and System Center Virtual Machine Manager (VMM) Logical Switches what Receive Side Scaling (RSS) provides to physical servers; namely, it leverages multiple compute cores/interrupts to increase network traffic efficiency. The network teaming (or Load-Balancing Fail-Over, LBFO) configuration is important here, because it affects how VMQ maps queues to processors. The full table of possibilities is given halfway down the page of TechNet’s VMQ Deep Dive, Part 2. In a nutshell, some configurations need NIC queues to overlap the same processors (so that all queues are everywhere), while others need segregation (so every queue has its own unique core).
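
To illustrate the difference between overlapping and segregated processor sets, here is a rough Python sketch that computes a base processor number and processor count per NIC. It is an illustration under stated assumptions, not Microsoft's algorithm: hyper-threading is assumed off, core 0 is assumed reserved for the host, and NUMA boundaries are ignored. The function name, NIC names, and 40-core count are hypothetical.

```python
def assign_vmq_processors(nics, total_cores, overlap, reserve_core0=True):
    """Plan a processor range for each NIC in a team.

    overlap=True  -> every NIC shares the same processor range.
    overlap=False -> each NIC gets its own non-overlapping slice.
    Assumptions: no hyper-threading, no NUMA awareness.
    """
    first = 1 if reserve_core0 else 0
    usable = total_cores - first
    if not nics or usable < 1:
        raise ValueError("need at least one NIC and one usable core")

    plan = {}
    if overlap:
        for nic in nics:
            plan[nic] = {"BaseProcessorNumber": first, "MaxProcessors": usable}
    else:
        share = usable // len(nics)  # leftover cores stay unassigned
        if share < 1:
            raise ValueError("not enough cores to give each NIC its own")
        for i, nic in enumerate(nics):
            plan[nic] = {
                "BaseProcessorNumber": first + i * share,
                "MaxProcessors": share,
            }
    return plan


# Hypothetical two-port team on a 40-core host.
shared = assign_vmq_processors(["NIC1", "NIC2"], 40, overlap=True)
split = assign_vmq_processors(["NIC1", "NIC2"], 40, overlap=False)
```

The dictionary keys mirror the `-BaseProcessorNumber` and `-MaxProcessors` parameters of `Set-NetAdapterVmq`. Which teaming configurations call for the shared plan and which for the split plan is covered by the table in VMQ Deep Dive, Part 2.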

Microsoft Networking Technology Virtualization

If you’re thinking about Windows Server 2012, or maybe you aren’t, but you have a lot of uncompressed archive data like log files, syslog, documents, etc, check out the new deduplication feature. It’s a real charmer.

As an example for this post, last Friday I undertook to apply this new functionality to our syslog server. It runs Kiwi Syslog (now owned by SolarWinds) and has held about 400GB of logging that we’ve pruned at the 3-month mark to keep it under control. With this in mind, I spun up a new VM, imaged it from our SCCM server with Windows Server 2012, and signed on.

Microsoft Storage Technology