ESXi Change Tracking File & Datastore Unmount Issues

Yesterday, I used Storage vMotion to empty a datastore into another one. Today, when I went to unmount it, I hit a couple of snags:

  1. One host was using the datastore for its core dump location (evidenced by a locked /vmkdump directory)
  2. After that was cleared (details below), two hosts still wouldn’t let it go
  3. Thinking that a host reboot might clear the lock, I entered maintenance mode, but had a VM on each host that wouldn’t leave
  4. Powering off one of those VMs to migrate it cold left it stuck powered off

The power of Google led to a few answers, and the desire for a shortcut led to a final one :).

 

1. Core dump location (vmkdump)

A VMware community thread provided the first solution. Start by enabling Tech Support Mode (TSM) and opening an SSH session to the host that owns the dumpfile. To find that host, run vmkfstools -D <dumpfile> and look at the lock owner, which identifies the owning host's MAC address. List the core dump files:

  • esxcli system coredump file list

Then, if the dumpfile on the datastore is active, unconfigure it:

  • esxcli system coredump file set -u

Next, remove the dumpfile:

  • esxcli system coredump file remove -f <dumpfile including full path>

Finally, confirm that the file and its directory are gone from the datastore (via SSH or the datastore browser), and delete anything left behind.
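For anyone who prefers to script it, here is a rough sketch of the steps above as two small shell helpers. The function names and the ESXCLI override variable are my own inventions for illustration. The MAC parsing assumes that vmkfstools -D prints the lock owner as "owner <uuid>", with the host's MAC address as the last 12 hex digits of that UUID. Note that the sketch always unconfigures the dump file, so check the list output first if you are not sure it is the active one.

```shell
#!/bin/sh
# Sketch only: meant for an SSH session on the ESXi host. Names are hypothetical.

# Read `vmkfstools -D <file>` output on stdin and print the lock owner's MAC.
# Assumption: the owner is printed as "owner <uuid>" and the last 12 hex
# digits of that UUID are the owning host's MAC address.
owner_mac() {
    sed -n 's/.*owner [0-9a-fA-F-]*-\([0-9a-fA-F]\{12\}\).*/\1/p' |
        head -n 1 |
        sed 's/\(..\)/\1:/g; s/:$//'
}

# List, unconfigure, and remove a file-based core dump.
# Set ESXCLI=echo to print the commands instead of running them.
release_coredump_file() {
    dumpfile=$1    # full path to the dumpfile on the datastore
    ${ESXCLI:-esxcli} system coredump file list &&
        ${ESXCLI:-esxcli} system coredump file set -u &&
        ${ESXCLI:-esxcli} system coredump file remove -f "$dumpfile"
}
```

Usage would look something like `vmkfstools -D <dumpfile> 2>&1 | owner_mac` to find the owning host, then `release_coredump_file <dumpfile including full path>` on that host.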

 

2. Clingy hosts (error; resolution follows)

[Screenshot: ctk_datastore_in_use]

 

3 & 4. Maintenance mode can’t evict a VM (from each of the two clingy hosts)

[Screenshot: ctk_migrate]

Initially I thought this was only on one host and one VM (I later found out it was one on each host). The two VMs that had issues had been storage vMotion’d the day before, seemingly without issue. Apparently their CTK files didn’t come across properly (or shouldn’t have).

On the first host and VM, I found VMware KB articles 2009244 and 2001004, which mandated powering off the VMs. The first VM happened to be tolerant of a brief power-off, so I did it. Actually, I powered it off before finding and following KB 2009244, because I thought it might just have a bad handle on its VMX file (we’ve seen that once before), but then we couldn’t power it back on…

[Screenshot: ctk_power_on]

Always lovely when your VM is stuck off. But that’s where the KB came in. I renamed the CTK files (for both VMDKs that had been moved), and the VM moved hosts and powered up properly. Its original host rebooted merrily as well.
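The rename itself is simple enough to sketch in a few lines of shell. This is only an illustration, not the KB's exact procedure: the function name and the ".bak" suffix are arbitrary, and it assumes the change tracking files use the usual <disk>-ctk.vmdk naming. It should only be run against a VM that is powered off.

```shell
#!/bin/sh
# Sketch only: rename change tracking files in a powered-off VM's directory.
# Assumptions: CTK files are named <disk>-ctk.vmdk; ".bak" is an arbitrary suffix.
rename_ctk_files() {
    vmdir=$1    # e.g. the VM's folder under /vmfs/volumes/<datastore>/
    for ctk in "$vmdir"/*-ctk.vmdk; do
        [ -e "$ctk" ] || continue    # the glob matched nothing
        mv "$ctk" "$ctk.bak" && echo "renamed: $ctk"
    done
}
```

Renaming rather than deleting keeps the originals around in case something else turns out to be wrong.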

Then I went to repeat the process on the second clingy host, only its VM was very much in use. Hmm… I had almost resigned myself to emailing the users and planning for an after-hours reboot, when a thought came to mind.

Snapshots. When Changed Block Tracking (CBT, the feature that uses CTK files) gets enabled by an application like Avamar, it takes a quick snapshot to do so. The backup itself works the same way: it takes a snapshot and reads from it. So, following this line, why not try a snapshot to clear/release the CTK file?

Most pleasantly, I took a simple snapshot (no memory or quiescing) and the VM became right as rain and migrated successfully.
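If you would rather do this from the host's shell than from the client, something like the sketch below should produce the same kind of snapshot. The function name, the default snapshot name, and the VIM_CMD override are made up for this example. The VM ID is the numeric one shown by vim-cmd vmsvc/getallvms, and the two trailing zeros turn off memory and quiescing.

```shell
#!/bin/sh
# Sketch only: take a snapshot with no memory and no quiescing from the ESXi shell.
# Set VIM_CMD=echo to print the command instead of running it.
snapshot_plain() {
    vmid=$1                   # numeric ID from `vim-cmd vmsvc/getallvms`
    name=${2:-ctk-release}    # hypothetical snapshot name
    # Argument order: vmid, name, description, includeMemory, quiesced
    ${VIM_CMD:-vim-cmd} vmsvc/snapshot.create "$vmid" "$name" "release CTK lock" 0 0
}
```

For example, `snapshot_plain 42` would snapshot the VM with ID 42 under the default name.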

I think this last part was more of a subset of the KB scenario, but perhaps it will help another admin with reluctant VMs.
