Tag: disaster recovery

Rubrik makes instant recovery easy everywhere. As I wrote four months ago, it only takes a few clicks to bring a previous version of any protected VM into production. In version 2.0, the great folks at Rubrik extended this capability with replication.

Replication is a word that means many things to many people, and it can quickly get abused in comparisons. In our previous data protection solution, replication of backups was limited to scheduled jobs, which in practice meant our off-site backups were anywhere from 3 hours (best case) to 48 hours (worst case) old, with no guarantees.

Rubrik takes a refreshingly different approach. In its policy-based world, backups are driven by SLAs (gold, silver, bronze, etc.), which are defined by the frequency and retention of snapshots. Replication is married to these policies and is triggered as soon as a VM backup completes.

For example, this morning one of our mission-critical SQL servers in our Gold Repl SLA domain started a backup job at 6:35am and completed that job one minute later at 6:36am. Gold Repl takes snapshots every 4 hours, keeps those hourlies for 3 days, and then keeps dailies for a month. As the “Repl” denotes, it also replicates and retains 3 days of those backups at another site. Oh, and as the cherry on top, it additionally archives the oldest backups to Amazon S3. Pretty comprehensive, eh?
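To make the policy concrete, here is a minimal Python sketch of how an SLA domain like Gold Repl could be modeled, with replication chained to backup completion instead of running on its own schedule. The class and field names, the VM name `SQL01`, the date, and the S3 bucket are hypothetical illustrations, not Rubrik's actual API or configuration, and "a month" is assumed to mean 30 days.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class SlaDomain:
    """Hypothetical SLA domain; field names are illustrative, not Rubrik's API."""
    name: str
    snapshot_every: timedelta
    keep_frequent_for: timedelta                  # the "hourlies"
    keep_dailies_for: timedelta
    replica_keep_for: Optional[timedelta] = None  # None means no replication
    archive_target: Optional[str] = None          # e.g. an S3 bucket

    def frequent_snapshots_retained(self) -> int:
        # Steady-state number of frequent snapshots kept locally.
        return self.keep_frequent_for // self.snapshot_every


@dataclass
class ReplicationQueue:
    """Hypothetical queue: replication is queued when a backup finishes."""
    jobs: list[tuple[str, datetime]] = field(default_factory=list)

    def on_backup_complete(self, vm: str, sla: SlaDomain, finished: datetime) -> None:
        # Event-driven: there is no separate replication schedule, so the
        # off-site copy is queued the moment the local snapshot exists.
        if sla.replica_keep_for is not None:
            self.jobs.append((vm, finished))


gold_repl = SlaDomain(
    name="Gold Repl",
    snapshot_every=timedelta(hours=4),
    keep_frequent_for=timedelta(days=3),
    keep_dailies_for=timedelta(days=30),           # "a month", assumed 30 days
    replica_keep_for=timedelta(days=3),
    archive_target="s3://example-archive-bucket",  # hypothetical bucket name
)

queue = ReplicationQueue()
# Only the 6:35/6:36 times come from the post; the date and VM name are made up.
started = datetime(2015, 6, 1, 6, 35)
finished = started + timedelta(minutes=1)
queue.on_backup_complete("SQL01", gold_repl, finished)

print(gold_repl.frequent_snapshots_retained())  # 18 (72 hours / 4 hours)
print(queue.jobs[0][1].strftime("%H:%M"))       # 06:36
```

The point of the design is the hook: the off-site copy's age is bounded by the snapshot interval plus backup and transfer time, rather than by an unrelated replication schedule.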


Storage Technology Virtualization

This session was decent, and few could fault it for lacking technical detail. Rather, the struggle here was not losing sight of the forest for the trees. A better strategy might have been to start with the demo from the end and use it as the use case from which to explain each component and contrast old methods with new features.

The latest releases, SRM 6.1, vSphere 6, and NSX 6.2 (as well as SRM Air), bring incredible new capabilities to disaster recovery and orchestrated failover plans. The technology has definitely reached the point where it is worth engaging a trusted VMware partner who can understand your specific environment and then architect one or more solution sets using the many VMware options, as well as third-party products.

Technology Virtualization

On March 24th, Duncan Epping published a new blog post entitled “Startup intro: Rubrik. Backup and recovery redefined” and then tweeted a link to it. That same day, in another part of the world (my office), we had paperwork in hand, waiting to be inked, to refresh aging EMC Avamar Gen4 nodes with an Avamar/Data Domain combo. We had looked at several other options from HP, Dell, and Veeam, but they were all just more of the same: a minor pro or con here and there, but nothing worth writing about (Avamar/DD included). No one had really advanced what VCB (VMware Consolidated Backup) brought to the market in 2007.


Then I saw Duncan’s tweet, and I thought to myself, “Hey! This sounds like what we were trying to get when we bought Avamar in 2011!” So I hopped over to rubrik.com, which pretty much consisted of the Aurora Borealis and a button to click for “Early Access”–simplicity from the start! :) The next day, Mike and the guys at Rubrik walked through a demo that confirmed the revolutionary impression I’d started to gather from Duncan. Sign me up!

On April 29th, Rubrik hit the floor in two data centers, with Eric and Ray shepherding the process (we’re talking beta here, so it’s only prudent to have some authorities on hand to ensure success). Lunch and the 15-minute drive between sites took longer than any part of the install. Seriously. Both installs were complete and protecting VMs before the clock struck noon.

Storage Technology Virtualization

A few days ago, one of our VMs running on Hyper-V 2012 R2 became stuck in a “Backing up…” status. We use Veeam Backup & Replication 7.0 and had noticed that this particular VM had been falling back to crash-consistent backups for the prior three days. The job summary called it a transient VSS error, so we didn’t dig deeper until it persisted. That’s when we saw it was stuck.

The problem with “stuck” is that Hyper-V won’t let it go. Users on Social TechNet have discussed this issue, but in a nutshell, the fix requires a reboot (often a hard one) of the Hyper-V host, because the VM’s process locks up and the host won’t transition it, even when the guest is fully shut down. So we evacuated everything else from the host and then power cycled it. Windows Server 2012 R2 didn’t react well to that and had to be booted into Safe Mode before it realized everything was okay and would boot normally again.

The next backup window produced a bunch of warnings about changed block tracking (CBT) files (.avhdx) not matching, but the full backups completed fine. Not so the day after. Failed. Failed. Failed…

I put the Hyper-V hosts into maintenance mode and rebooted them, restarted their VSS services, etc., but the backups still failed.

Then I tried a backup with our DPM server, which used to back up our Hyper-V VMs. It succeeded. So it wasn’t a host issue, as I had originally thought.

Microsoft Technology Virtualization

SQL. The Veeam Backup & Replication (B&R) server, like most things these days, requires a SQL Server database for its back end. However, as is also common, its System Requirements list doesn’t specify which SQL Server features are actually needed. Given the breadth of Microsoft SQL Server and its ever-increasing number of features, it would be easy to overload a simple back-end instance with superfluous services that absorb resources while never being used.

So for this install, I’m sticking to the basics. Perhaps I could go one more minimalist step and drop “Client Tools Connectivity” as well, but I’d rather not re-run setup for that one non-service feature, given that the Veeam B&R console would reasonably seem to count as a “client tool” (purists out there, feel free to chime in). Also note that I am using SQL Server 2014 for this, which requires Veeam B&R 7.0 “Patch 4”, released just a few days ago on June 5th. Don’t miss it (it also includes the R2 update that adds support for Windows Server 2012 R2).
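If you would rather script this than click through the wizard, SQL Server setup also accepts the feature list on the command line. Below is a minimal Python sketch that only assembles (it does not run) an unattended `setup.exe` argument list restricted to Database Engine Services (`SQLENGINE`) and, optionally, Client Tools Connectivity (`Conn`), assuming those two are the “basics” meant here. The helper name, the `VEEAMSQL` instance name, and the sysadmin group are hypothetical, and a real install typically needs more parameters (service accounts, for example), as described in Microsoft’s command-prompt installation documentation.

```python
from __future__ import annotations


def sql_setup_args(instance: str, sysadmin: str, include_conn: bool = True) -> list[str]:
    """Build a minimal unattended SQL Server setup command line (hypothetical helper)."""
    features = ["SQLENGINE"]        # Database Engine Services only
    if include_conn:
        features.append("Conn")     # Client Tools Connectivity
    return [
        "setup.exe",
        "/Q",                                # quiet mode, no UI
        "/ACTION=Install",
        "/FEATURES=" + ",".join(features),
        f"/INSTANCENAME={instance}",
        f"/SQLSYSADMINACCOUNTS={sysadmin}",
        "/IACCEPTSQLSERVERLICENSETERMS",
    ]


# Hypothetical instance name and domain group, for illustration only.
args = sql_setup_args("VEEAMSQL", r"CONTOSO\veeam-admins")
print(" ".join(args))
# On the SQL host you could pass `args` to subprocess.run(args, check=True).
```

Passing `include_conn=False` gives the even more minimalist variant discussed above.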

Technology Virtualization

Terrible initial implementation. High-downtime expansion. Unreliable backups. Absentee support. That’s EMC Avamar.

On the tiny upside, deduplication works great…when backups work.

In September 2011, our tragedy began. We’re a 99% VMware-virtualized shop, and we bought into EMC Avamar on the promise that its VMware readiness and design orientation would make for low-maintenance, high-reliability backups. In our minds, this was a sort of near-warm redundancy, with backup sets that could restore mission-critical systems to another site in under 6 hours. Sales even pitched that we could take backups every four to six hours and thus reduce our RPO. It was not to be.

Before continuing, I should qualify all that gloom and woe by saying that we have had a few stretches of uneventful reliability, but only when we avoided changing anything. And during one of those supposedly reliable stretches, a bug in core functionality rendered critical backups unusable. But I digress…

Storage Technology Virtualization

Speakers: Lee Dilworth (VMware), Chad Sakac (EMC)

Premise:
– lots of confusion between disaster recovery (DR) and disaster avoidance (DA)

Part 1: Disaster Avoidance vs. Disaster Recovery

Disaster Avoidance
– you know a host will go down
– Host: vMotion
– Site: vMotion

Disaster Recovery
– unplanned host outage
– Host: VMware HA
– Site: SRM
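The matrix above boils down to two questions: is the outage planned, and does it affect a host or a whole site? A minimal Python sketch of that decision follows; it encodes only the mappings from these session notes, and the function and enum names are hypothetical.

```python
from enum import Enum


class Scope(Enum):
    HOST = "host"
    SITE = "site"


def protection_mechanism(planned: bool, scope: Scope) -> str:
    """Map the DA/DR matrix from the session notes to a VMware feature.

    planned=True  -> disaster avoidance: you know the outage is coming.
    planned=False -> disaster recovery: the outage has already happened.
    """
    if planned:
        # DA: move running VMs away ahead of time, at host or site scope.
        return "vMotion"
    # DR: restart VMs after the fact, locally (HA) or at the other site (SRM).
    return "VMware HA" if scope is Scope.HOST else "SRM"


for planned in (True, False):
    for scope in Scope:
        kind = "DA" if planned else "DR"
        print(f"{kind} / {scope.value}: {protection_mechanism(planned, scope)}")
```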

*** More content forthcoming (to fill in the blanks) ***

Technology Virtualization

Speakers: Lee Dilworth, Clive Wenman (VMware)

Understanding the Use Cases and Implementation Options

Prior to SRM 5, SRM relied on array-based replication
– requires the same versions of vCenter and SRM, but ESX versions can vary
SRM 5 now supports vSphere Replication (in addition to array-based)
– vSphere Replication requires matching versions of all vSphere components
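The version rules from this slide can be expressed as a quick pre-flight check. This is a hedged Python sketch that encodes only what the session stated (array-based replication needs matching vCenter and SRM versions while ESX may differ; vSphere Replication needs everything to match). It is not VMware’s official compatibility matrix, and the class, function, and version strings are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SiteVersions:
    """Product versions at one site (hypothetical structure)."""
    vcenter: str
    srm: str
    esx: str


# Components that must match across sites, per the session notes.
MUST_MATCH = {
    "array": ("vcenter", "srm"),                              # ESX may vary
    "vsphere": tuple(f.name for f in fields(SiteVersions)),   # everything
}


def version_mismatches(protected: SiteVersions, recovery: SiteVersions,
                       replication: str) -> list[str]:
    """Return the components whose versions differ but are required to match."""
    if replication not in MUST_MATCH:
        raise ValueError(f"unknown replication type: {replication!r}")
    return [name for name in MUST_MATCH[replication]
            if getattr(protected, name) != getattr(recovery, name)]


prod = SiteVersions(vcenter="5.0", srm="5.0", esx="5.0")
dr = SiteVersions(vcenter="5.0", srm="5.0", esx="4.1")
print(version_mismatches(prod, dr, "array"))    # []
print(version_mismatches(prod, dr, "vsphere"))  # ['esx']
```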

SRM: Site Recovery Manager
SRA: Storage Replication Adapter

The SRM 5 UI shows both sites in a single interface

vSphere Replication offers a cost-effective alternative to array-based replication
– does not replace array-based for the foreseeable future

Technology Virtualization