Tag: backup

As Pure//Accelerate approaches, one of my favorite aspects of winning solutions comes to mind. It’s a virtue that transforms products into MVPs, rather than the drama generators so common on the court and in the field. What is it?

Simplicity

Businesses have enough knobs and pain points with tier-1 Oracle/SAP deployments and SQL, SharePoint and Exchange farms. The last thing they need is for storage and data protection to jump on the pile. That’s why enterprises need Pure Storage and Rubrik.

From the ground up, Pure and Rubrik have simplicity in their DNA. If you have a FlashArray on the floor, then you already know the freedom and ease it brings to storage infrastructure. Gone are the days of tweaking RAID sets or tuning LUNs to squeeze out a few performance points. With a few cables and a vSphere plugin, Pure serves up datastores and gets out of the way.

Rubrik brings the same unobtrusive value to data protection and is the perfect pairing for Pure. From rack & go to policy-driven automation to instant recovery, Rubrik drives straight to the point with beautiful simplicity.

Rack & Go

The first thing that stands out with Rubrik is its lean footprint: it doesn’t eat up precious data center space. When we deployed Rubrik at ExponentHR, we shrank our backup layout from 14RU at each data center to just 4RU, with an even greater reduction in power consumption and cabling complexity.

With the previous product, the physical installation wasn’t easy, but it paled in comparison to the configuration and learning-curve challenges. In contrast, the entire Rubrik deployment took 90 minutes to install and configure at both sites, including drive time. Starting the engine took nothing more than a set of vCenter credentials.

Storage Technology Virtualization

This week it was finally time to put our old EMC Avamar backup/DR grids out to pasture, and while I had removed most of the configurations from them already, I still needed to sanitize the disks. Unfortunately, a quick search of support.emc.com and Google revealed that “securedelete” operations on Avamar grids require EMC Professional Services engagements. Huh? I want to throw the thing away, not spend more money on it…

A few folks suggested re-initializing the RAID volumes on the disks as one way to prepare for decommissioning. That’s definitely one option. Another is to wipe the data from within, which achieves much the same result but provides a level of assurance that the PowerEdge RAID Controller (PERC) doesn’t give (unless your PERC can be configured for repeated passes of random data).

Totally a side note: when I started this, I had the misconception that this method would preserve the OS and allow a second-hand user to redeploy it without returning to the EMC mothership. As you’ll note below, one of the paths we wipe is the location of /home and the rest of the OS. :x

Under the covers, Avamar is a stripped-down Linux (2.6.32.59-0.17-default GNU/Linux as of Avamar 7.1), so standard Linux tools provided the starting point. The tool I chose, and now have running across 10 storage nodes and 30 PuTTY windows, is “shred”.

Shred is as simple as it sounds: it overwrites the target disk as many times as you tell it to. So for Avamar, how many disks is that?

[Screenshot: df output from an Avamar storage node]
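A dry-run sketch of that workflow in Python: take `df` output from a storage node, collect the whole disks behind the `/dev/*` filesystems, and print one `shred` command per disk. The sample `df` text, the `/dev/sdX` device naming, and the helper names `disks_from_df` and `shred_commands` are assumptions for illustration; a real Avamar node's layout will differ. The script only prints commands. Running them destroys the data (and, as noted above, the OS) irreversibly.

```python
import re
import shlex
from typing import List

# Hypothetical `df` output; NOT captured from a real Avamar node.
SAMPLE_DF = """\
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda2       10321208 4500000   5297500  46% /
/dev/sda1         132206   20000    105380  16% /boot
/dev/sdb1      976762584 1000000 975762584   1% /data01
/dev/sdc1      976762584 1000000 975762584   1% /data02
tmpfs            8162948       0   8162948   0% /dev/shm
"""

def disks_from_df(df_text: str) -> List[str]:
    """Return unique whole-disk paths (e.g. /dev/sda) backing /dev/* filesystems."""
    disks: List[str] = []
    for line in df_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields or not fields[0].startswith("/dev/"):
            continue  # skip tmpfs and other pseudo filesystems
        # Assumes SCSI-style names: strip the partition number, /dev/sdb1 -> /dev/sdb
        match = re.fullmatch(r"(/dev/sd[a-z]+)\d*", fields[0])
        if match and match.group(1) not in disks:
            disks.append(match.group(1))
    return disks

def shred_commands(disks: List[str], passes: int = 3) -> List[str]:
    """Build GNU shred commands: -v shows progress, -n sets overwrite passes (3 is shred's default)."""
    return [f"shred -v -n {passes} {shlex.quote(d)}" for d in disks]

# Dry run: print the commands for review instead of executing them.
for cmd in shred_commands(disks_from_df(SAMPLE_DF)):
    print(cmd)
```

Printing rather than executing keeps a human in the loop, which matters when the same session is open in 30 PuTTY windows.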

Security Storage Technology

With Virtualization Field Day 5 (VFD5) coming up this week, the timing seems right for an update on Rubrik in action. For a refresher on what Rubrik is, check out Mike Preston’s #VFD5 Preview – Rubrik. I’ll be using some of what he shared as launching points for elaboration and on-the-ground validation.

Share Nothing – Do Everything

I believe that this is both the most important and likely the most overlooked characteristic of Rubrik and its architecture. It is crucial because it defines how users manage the solution, build redundancy in and around it, and assess performance needs. I also believe it is overlooked because it is like the foundation of a great building: most of it is under the surface and behind the scenes, supporting the prominent edifices above it, such as Time Machine-like simplicity.

One way I can describe it is “multi-master management and operations”, though even that falls short, because Rubrik has no slaves. Every node is a master. Some data protection solutions have redundant storage nodes that all depend on a single control node. If the control node runs into trouble, the plethora of storage behind it can do nothing but sit and maintain integrity. With Rubrik, every node has the command authority to manage, support, and execute across the infrastructure.
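To make the contrast concrete, here is a toy Python model of the two designs just described. `Node`, `pick_node_with_controller`, and `pick_node_masterless` are hypothetical names; this illustrates the failure mode only and is not how Rubrik actually schedules work.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    name: str
    healthy: bool = True

def pick_node_with_controller(controller: Node, storage_nodes: List[Node]) -> Optional[str]:
    """Controller-dependent design: only the control node may assign work."""
    if not controller.healthy:
        return None  # storage nodes can only sit and maintain integrity
    return next((n.name for n in storage_nodes if n.healthy), None)

def pick_node_masterless(nodes: List[Node]) -> Optional[str]:
    """Masterless design: any healthy node can accept, schedule, and run the job."""
    return next((n.name for n in nodes if n.healthy), None)

cluster = [Node("node1", healthy=False), Node("node2"), Node("node3")]
print(pick_node_with_controller(Node("ctrl", healthy=False), cluster))  # None
print(pick_node_masterless(cluster))  # node2
```

The whole difference sits in the first `if`: in the controller design, one unhealthy node takes scheduling for the entire cluster down with it, while in the masterless design losing a node only removes that node's capacity.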

Storage Technology Virtualization

On March 24th, Duncan Epping published a new blog post titled “Startup intro: Rubrik. Backup and recovery redefined” and then tweeted it. On that same day in another part of the world (my office), we had paperwork in hand, waiting to be inked, to refresh aging EMC Avamar Gen4 nodes with an Avamar/DataDomain combo. We had looked at several other options from HP, Dell, and Veeam, but they were all more of the same, each with a minor pro or con and nothing worth writing about (including Avamar/DD). No one had really advanced what VCB (VMware Consolidated Backup) brought to the market in 2007.


Then I saw Duncan’s tweet, and I thought to myself, “Hey! This sounds like what we were trying to get when we bought Avamar in 2011!” So I hopped over to rubrik.com, which pretty much consisted of the Aurora Borealis and an “Early Access” button: simplicity from the start! :) The next day, Mike and the guys at Rubrik walked us through a demo that confirmed the revolutionary impression I’d started to gather from Duncan. Sign me up!

On April 29th, it hit the floor in two data centers, with Eric and Ray shepherding the process (we’re talking beta here, so it’s only prudent to have some authorities on hand to ensure success). Lunch and the 15-minute drive between sites took up most of the install time. Seriously. The installs were complete and protecting VMs before the clock struck noon.

Storage Technology Virtualization

If you use a backup product that leverages VMware’s changed block tracking (CBT), you have probably also found cases where CBT wasn’t the right fit. In the world of EMC Avamar, I’ve found that VM image-level backups slow to a crawl when many blocks change, but not enough to reach the 25% threshold at which Avamar automatically reverts to a full backup.

When I opened a case with EMC Support, they dug into the logs and pointed to best practices that recommend disabling CBT when more than 10,000 blocks regularly change between backups. The problem I hit next was that the top search result, a VMware KB article on enabling/disabling CBT, said step 1 was to power off the VM. Backups are running long, and the next maintenance window isn’t for two weeks. Hmm…

  1. Avamar method
  2. PowerCLI method
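The two thresholds discussed above (Avamar's ~25% switch to a full backup and EMC's 10,000-block guidance) can be sketched in Python as a simple check. This is a hypothetical model: the constant and function names are mine, "regularly" is assumed to mean every recent backup, and Avamar's internal logic is not shown here.

```python
from typing import List

# Thresholds quoted in the post; how Avamar applies them internally is an assumption.
FULL_BACKUP_RATIO = 0.25   # Avamar reverts to a full backup at ~25% changed blocks
CBT_BLOCK_LIMIT = 10_000   # EMC best practice: consider disabling CBT above this

def backup_mode(changed_blocks: int, total_blocks: int) -> str:
    """Hypothetical model: full backup once the changed ratio reaches the threshold."""
    if changed_blocks / total_blocks >= FULL_BACKUP_RATIO:
        return "full"
    return "incremental"

def should_disable_cbt(changed_history: List[int]) -> bool:
    """'Regularly' modeled as: every recent backup exceeded the block limit."""
    return bool(changed_history) and all(n > CBT_BLOCK_LIMIT for n in changed_history)

history = [48_000, 52_000, 47_500]  # hypothetical changed-block counts for one VM
total = 1_000_000                   # hypothetical VM size in blocks
print([backup_mode(n, total) for n in history])  # ['incremental', 'incremental', 'incremental']
print(should_disable_cbt(history))               # True
```

A VM like this one lands in the slow zone the post describes: far above the 10,000-block guidance, yet under 5% changed, so Avamar never falls back to a full backup on its own.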

Technology Virtualization

Avamar Error Code: 10059

2014-08-05 11:13:27 avvcbimage Error <17780>: Snapshot cannot be performed because Host 'esx01.domain.com' is currently in Maintenance Mode (Log #1)
2014-08-05 11:13:27 avvcbimage FATAL <0000>: [IMG0009] The Host 'esx01.domain.com' is in a state that cannot perform snapshot operations. (Log #1)
2014-08-05 11:13:27 avvcbimage Error <0000>: [IMG0009] createSnapshot: snapshot creation failed (Log #1)

You may also notice these details in the session drill-down:

2014-08-05 11:13:23 avvcbimage Info <16005>: Login(https://vcenter.domain.com:443/sdk) problem with reused sessionID='52e2b2cb-225f-0229-9b25-929a652617fb' contacting data center 'Datacenter'.
2014-08-05 11:13:23 avvcbimage Warning <0000>: [IMG0014] Problem logging into URL 'https://vcenter.domain.com:443/sdk' with session cookie.

At first, the list of failed VM backups seemed uncorrelated: multiple hosts, various OSes, different policy groups. But the session details above revealed the root cause. Avamar thought the VMs were on a host that was in maintenance mode (or, in a previous case, powered off). It’s a bit hard to snapshot a VM on a host that isn’t running the VM, or isn’t running at all.
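When failures look uncorrelated, grouping the error lines by the ESXi host they name can surface the common factor. Below is a minimal Python sketch that does this for the log lines quoted above and also flags the reused-session message. The regular expressions are fitted only to these sample lines (other Avamar versions may log differently), and `parse`, `failures_by_host`, and `reused_session` are hypothetical helpers, not part of Avamar.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Patterns fitted to the sample lines in this post only.
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<component>\S+) (?P<level>\w+) "
    r"<(?P<code>\d+)>: (?P<message>.*)$"
)
HOST_RE = re.compile(r"Host '(?P<host>[^']+)'")

def parse(line: str) -> Optional[Dict[str, str]]:
    """Split one avvcbimage log line into its fields, or return None."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None

def failures_by_host(lines: Iterable[str]) -> Counter:
    """Count Error/FATAL entries per ESXi host named in the message."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse(line)
        if entry is None or entry["level"] not in ("Error", "FATAL"):
            continue
        host = HOST_RE.search(entry["message"])
        if host:
            counts[host.group("host")] += 1
    return counts

def reused_session(lines: Iterable[str]) -> bool:
    """True if any entry reports a problem with a reused vCenter session."""
    return any("reused sessionID" in line or "session cookie" in line for line in lines)

log = [
    "2014-08-05 11:13:23 avvcbimage Info <16005>: Login(https://vcenter.domain.com:443/sdk) "
    "problem with reused sessionID='52e2b2cb-225f-0229-9b25-929a652617fb' contacting data center 'Datacenter'.",
    "2014-08-05 11:13:27 avvcbimage Error <17780>: Snapshot cannot be performed because "
    "Host 'esx01.domain.com' is currently in Maintenance Mode (Log #1)",
    "2014-08-05 11:13:27 avvcbimage FATAL <0000>: [IMG0009] The Host 'esx01.domain.com' "
    "is in a state that cannot perform snapshot operations. (Log #1)",
]
print(failures_by_host(log))  # Counter({'esx01.domain.com': 2})
print(reused_session(log))    # True
```

Run across a whole night's session logs, a host that dominates the counts while also showing reused-session warnings points at stale VM placement rather than anything about the VMs themselves.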

Storage Technology Virtualization

A few days ago, one of our VMs running on Hyper-V 2012 R2 became stuck and locked in a “Backing up…” status. We use Veeam Backup & Replication 7.0 and had noticed that this particular VM had been reverting to crash-consistent backups for the prior three days. The summary said it was a transient VSS error, so we didn’t dive deeper until it persisted. That’s when we saw it was stuck.

The problem with “stuck” is that Hyper-V won’t let it go. Users on Social TechNet discuss this issue here, but in a nutshell, it requires a reboot (often a hard one) of the Hyper-V host, because the VM’s process locks up and the host won’t transition it, even after the guest has fully shut down. So we evacuated everything else and power cycled the host. Windows Server 2012 R2 didn’t react well to that and needed a boot into Safe Mode before it finally decided it was okay and could boot normally.

That next backup window had a bunch of warnings about changed block tracking (CBT) files (.avhdx) not matching, but it performed full backups fine. Not so the day after. Failed. Failed. Failed…

I maintenance mode’d and rebooted the Hyper-V hosts, restarted their VSS services, etc., but still the backups failed.

Then I tried a backup with our DPM server, which we used to back up our Hyper-V VMs. It succeeded. So it wasn’t a host issue like I originally thought.

Microsoft Technology Virtualization