Tag: Avamar

This week it was finally time to put our old EMC Avamar backup/DR grids out to pasture, and while I had removed most of the configurations from them already, I still needed to sanitize the disks. Unfortunately, a quick search of support.emc.com and Google revealed that “securedelete” operations on Avamar grids require EMC Professional Services engagements. Huh? I want to throw the thing away, not spend more money on it…

A few folks offered up re-initializing the RAID volumes on the disks as one way to prepare for decommissioning. That’s definitely an option. Another is to wipe the data from within the OS, which achieves much the same result but provides a level of detailed assurance that the PowerEdge RAID Controller doesn’t (unless your PERC can be configured for repeated passes of random data).

Totally a side note: when I started this, I was under the misconception that this method would preserve the OS and let a second-hand user redeploy the grid without returning to the EMC mothership. As you’ll see below, one of the paths we wipe holds /home and the rest of the OS. :x

Under the covers, Avamar is stripped-down Linux (2.6.32.59-0.17-default GNU/Linux, as of Avamar 7.1), so standard Linux tools provided the starting point. The one I chose, and that I now have running across 10 storage nodes and 30 PuTTY windows, is “shred”.

Shred is as simple as it sounds: it overwrites the target disk with as many passes of random data as you ask for. So for Avamar, how many disks is that?

[Screenshot: df output showing the data partitions on an Avamar storage node]
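
In shell terms, the whole exercise boils down to something like the sketch below, run on each storage node. Treat it as illustrative only: the mount points and device names are placeholders, so check your own df output before pointing shred at anything.

# List the data partitions on this storage node; on an Avamar storage node
# they typically show up as /data01, /data02, and so on
df -h | grep -i data

# Overwrite each underlying device with three passes of random data plus a
# final pass of zeros (-z), printing progress along the way (-v).
# The device names below are examples only. This is destructive and, as
# noted above, takes /home and the rest of the OS with it.
for dev in /dev/sdb /dev/sdc /dev/sdd; do
    shred -v -n 3 -z "$dev" &
done
wait    # let every background shred finish before logging out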

Categories: Security, Storage, Technology

If you use a backup product that leverages VMware’s changed block tracking (CBT), you have probably also found cases when CBT wasn’t the right fit. In the world of EMC Avamar, I’ve found that VM image-level backups will slow to a crawl if enough blocks change but don’t quite reach the level (25%) necessary to automatically revert to a full backup.

When I created a case with EMC Support, they dug into the logs and then pointed to best practices that recommend disabling CBT when more than 10,000 blocks regularly change between backups. The next problem: the top search result for enabling and disabling CBT was a VMware KB stating that step 1 was to power off the VM. Backups are running long and the next maintenance window isn’t for two weeks. Hmm… It turns out there are two ways around that:

  1. Avamar method
  2. PowerCLI method

Categories: Technology, Virtualization

Avamar Error Code: 10059

2014-08-05 11:13:27 avvcbimage Error <17780>: Snapshot cannot be performed because Host 'esx01.domain.com' is currently in Maintenance Mode (Log #1)
2014-08-05 11:13:27 avvcbimage FATAL <0000>: [IMG0009] The Host 'esx01.domain.com' is in a state that cannot perform snapshot operations. (Log #1)
2014-08-05 11:13:27 avvcbimage Error <0000>: [IMG0009] createSnapshot: snapshot creation failed (Log #1)

You may also notice these details in the session drill-down:

2014-08-05 11:13:23 avvcbimage Info <16005>: Login(https://vcenter.domain.com:443/sdk) problem with reused sessionID='52e2b2cb-225f-0229-9b25-929a652617fb' contacting data center 'Datacenter'.
2014-08-05 11:13:23 avvcbimage Warning <0000>: [IMG0014] Problem logging into URL 'https://vcenter.domain.com:443/sdk' with session cookie.

At first, the list of failed VM backups seemed to have no correlation: multiple hosts, various OSes, different policy groups. But the session details above revealed the root cause: Avamar thought the VMs were on a host that was in maintenance mode (or, in a previous case, powered off). It’s a bit hard to snapshot a VM on a host that isn’t running the VM, or isn’t running at all.

Categories: Storage, Technology, Virtualization

Terrible initial implementation. High-downtime expansion. Unreliable backups. Absentee support. That’s EMC Avamar.

On the tiny upside, deduplication works great…when backups work.

In September 2011, our tragedy began. We’re a 99% VMware-virtualized shop, and we bought into EMC Avamar on the promise that its VMware-oriented design would make for low-maintenance, high-reliability backups. In our minds, this was a sort of near-warm redundancy, with backup sets that could restore mission-critical systems to another site in <6 hours. Sales even pitched that we could take backups every four to six hours and thus reduce our RPO. Not to be.

Before continuing, I should qualify all that gloom and woe by saying that we have had a few stretches of uneventful reliability, but only when we avoided changing anything. And during one of those supposedly quiet stretches, a bug in core functionality rendered critical backups unusable. But I digress…

Categories: Storage, Technology, Virtualization