Category: Virtualization

With Virtualization Field Day 5 (VFD5) coming up this week, the timing seems right for an update on Rubrik in action. For a refresher on what Rubrik is, check out Mike Preston’s #VFD5 Preview – Rubrik. I’ll use some of what he shared as launching points for elaboration and on-the-ground validation.

Share Nothing – Do Everything

I believe this is both the most important and the most overlooked characteristic of Rubrik and its architecture. It is crucial because it defines how users manage the solution, build redundancy in and around it, and assess performance needs. It is overlooked, I think, because it is like the foundation of a great building–most of it is below the surface and behind the scenes, enabling prominent features like Time Machine-like simplicity.

One way I can describe it is “multi-master management and operations”, though even that falls short, because Rubrik has no slaves: every node is a master. Some data protection solutions have redundant storage nodes that all depend on a single control node. If issues arise with the control node, the plethora of storage behind it can do nothing but sit and maintain integrity. With Rubrik, every node has command authority to manage, support, and execute across the infrastructure.

Categories: Storage, Technology, Virtualization

On March 24th, Duncan Epping published a blog post titled “Startup intro: Rubrik. Backup and recovery redefined” and subsequently tweeted said post. On that same day in another part of the world (my office), we had paperwork in hand, waiting to be inked, to refresh aging EMC Avamar Gen4 nodes with an Avamar/DataDomain combo. We had looked at several other options from HP, Dell, and Veeam, but it was all more of the same–a minor pro here, a minor con there, nothing worth writing about (including Avamar/DD). No one had really advanced what VCB (VMware Consolidated Backup) brought to the market in 2007.


Then I saw Duncan’s tweet, and I thought to myself, “Hey! This sounds like what we were trying to get when we bought Avamar in 2011!” So I hopped over to rubrik.com, which pretty much consisted of the Aurora Borealis and an “Early Access” button–simplicity from the start! :) The next day, Mike and the guys at Rubrik walked through a demo that confirmed the revolutionary impression I’d started to gather from Duncan. Sign me up!

On April 29th, it hit the floor in two data centers with Eric and Ray shepherding the process (we’re talking beta here, so it’s only prudent to have some authorities on hand to ensure success). Lunch and the 15-minute drive between sites were the longest parts of the install. Seriously. The installs were complete and protecting VMs before the clock struck noon.

Categories: Storage, Technology, Virtualization

When I wrote the “Doing It Again” posts about XtremIO and Pure Storage, I didn’t actually think I would get the chance. EMC’s concessions around our initial XtremIO purchase made our next site refresh seem like a foregone conclusion. When the chips were counted, however, the hand went to another player: Pure Storage.

Last Friday, the Pure hardware arrived. Unpacking and racking was simple–no cage nuts needed, and the only necessary tool (a screwdriver) is included in the “Open Me First” box. The same instructions that I respected during our 2013 POC led the way. I recall that back then, QA on their readability was done by the CEO’s 12-year-old son: if he could follow them, they were customer-ready. Unconventional but effective.

This morning, the Pure SE (@purebp) and I finished the cabling and boot-up config. Three IP addresses, two copper switch ports, and four FC interfaces. The longest part was my perfectionistic cable runs. What can I say? The only spaghetti I like is the edible Italian kind. Fiber and copper should be neat and clean.

Categories: Storage, Technology, Virtualization

The procedure for upgrading EMC XtremIO storage arrays to their latest major code release (3.0) has caused no shortage of conversation in the enterprise storage community. Granted, a large portion of that derives from competitors and marketing materials keen to take advantage of this hurdle on the XtremIO track.

For those unfamiliar, the hurdle is the disruptive and destructive nature of the 3.0 upgrade process. To move from 2.4 to 3.0, customers must migrate all data off the brick(s) to another storage platform. EMC promises to provide loaner gear to swing said data for the upgrade, but that doesn’t alleviate the infrastructure and migration impact of such a task (especially if some workloads are physical and lack niceties like Storage vMotion).
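Where Storage vMotion is available, the swing itself can at least be scripted. Here’s a minimal PowerCLI sketch–my own illustration, not EMC’s procedure–assuming hypothetical datastore names “XtremIO-DS01” (source) and “Loaner-DS01” (the loaner gear):

```powershell
# Hypothetical names: adjust to your source and loaner datastores
$source = Get-Datastore "XtremIO-DS01"
$target = Get-Datastore "Loaner-DS01"

# Storage vMotion every VM off the XtremIO datastore (VMs stay powered on)
Get-VM -Datastore $source | ForEach-Object {
    Move-VM -VM $_ -Datastore $target
}
```

Physical workloads, of course, still need a host-side migration plan of their own.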

We’ve had our share of challenges getting to this point, as you can read from prior posts, but we’re finally here. Since others are following closely behind, I thought it would be helpful to document the steps necessary to complete this upgrade (where possible, I’ll include the actual upgrade-to-3.0 tech details, but those are mostly handled by EMC).

1. Create an EMC Support Request (SR) requesting upgrade

“Support Request” is sort of a misnomer here, because the XtremIO 3.0 upgrade isn’t handled by the EMC Support team. Due to the nature of swinging data, loaner hardware, etc., EMC Sales and Professional Services actually handle the process as a “free upgrade”.

Categories: Storage, Technology, Virtualization

If you use a backup product that leverages VMware’s changed block tracking (CBT), you have probably also found cases where CBT wasn’t the right fit. In the world of EMC Avamar, I’ve found that VM image-level backups slow to a crawl if enough blocks change but don’t quite reach the threshold (25%) that automatically triggers a full backup.

When I opened a case with EMC Support, they dug into the logs and pointed to best practices that recommend disabling CBT when more than 10,000 blocks regularly change between backups. The next problem: the top search result for enabling/disabling CBT was a VMware KB article whose step 1 was to power off the VM. Backups were running long, and the next maintenance window wasn’t for two weeks. Hmm…

  1. Avamar method
  2. PowerCLI method
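The PowerCLI method can be sketched roughly as follows–a hedged example, assuming a hypothetical VM named “DB01”. The ReconfigVM call flips the CBT flag, and a snapshot create/remove cycle (a stun/unstun) lets the change take effect while the VM keeps running:

```powershell
# Hypothetical VM name
$vm = Get-VM "DB01"

# Build a config spec that turns Changed Block Tracking off
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.ChangeTrackingEnabled = $false
$vm.ExtensionData.ReconfigVM($spec)

# A snapshot create/remove cycle makes the change effective on a running VM
New-Snapshot -VM $vm -Name "cbt-toggle" |
    Remove-Snapshot -Confirm:$false
```

Set `$spec.ChangeTrackingEnabled = $true` to re-enable CBT the same way later.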

Categories: Technology, Virtualization

HP 3PAR recently released version 3.2.1 of the InForm OS, which most notably brought in-line deduplication to the already rock-solid storage platform. Last week, I wrote briefly about it and included screenshots of estimates in the CLI. Today, I’d like to share real-world results.

I’d like to give particular thanks to Ivan Iannaccone of the HP 3PAR team for reaching out and for early access to the 4.6.1 IMC with dedupe in the GUI.

After I ran the estimate in the previous post, I learned from Ivan that estimates (and jobs) of multiple virtual volumes (VVs) in the same common provisioning group (CPG) will return increased data reduction ratios (read: less used space). Thus, when I received the new InForm Management Console (IMC) yesterday, I ran a new estimate against two VDI (Microsoft RemoteFX) VVs to see how the numbers panned out.

[Screenshot: 3PAR IMC dedupe estimate for the two RemoteFX VVs]

As you can see, the dedupe ratio rose from 2.31 to 2.83. Every little bit helps–but what is the actual deduplication ratio?
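For context, a dedupe ratio converts to percent space saved as 1 − 1/ratio. A quick PowerShell helper (my own illustration, not part of the IMC) shows what the jump means:

```powershell
# Convert a deduplication ratio to percent space saved
function Get-SpaceSavings ($ratio) {
    [math]::Round((1 - 1 / $ratio) * 100, 1)
}

Get-SpaceSavings 2.31   # ~56.7 percent saved
Get-SpaceSavings 2.83   # ~64.7 percent saved
```

So the improved estimate works out to roughly eight more points of capacity back.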

Categories: Storage, Technology, Virtualization

Last week I ran across a tweet about a VMware Labs fling that brings ESXtop statistics into the vSphere Web Client as a plugin. If you’re not familiar with “flings”, they are experimental tools built by VMware engineers and shared with the community. Anyway, this fling jumped onto my list immediately.

Download ESXtopNGC Plugin: https://labs.vmware.com/flings/esxtopngc-plugin

The first thing you might notice is the System Requirements’ sole item: “vCenter Server Appliance 5.5”. Hmm. I run vCenter Server on Windows since Update Manager still requires it, and I don’t see the value of running both the vCSA and a Windows VM when one Windows VM will do. A few comments quickly came in, though, confirming that it works just fine as a plugin on Windows vCenter, too.

Here’s how to install it:

1. Download ESXtopNGC (“Agree & Download” in the top left)

[Screenshot: ESXtopNGC fling download page]

Categories: Technology, Virtualization

Here’s the short and sweet for configuring the best practice SATP rule for XtremIO storage on ESXi 5.5 using PowerCLI (5.8 Release 1, in my case). I can’t claim any credit beyond aggregation and adaptation: the parameters are from the XtremIO user guide and the script comes from VirtuallyHyper.com (thanks!). See my earlier post about the SATP rule itself and how to manually implement it: VMware ESXi Round Robin NMP IOPS=1.

# Variables
$cluster = "Production"

foreach ($esx in Get-Cluster $cluster | Get-VMHost) {
    $esxcli = Get-EsxCli -VMHost $esx

    # List existing XtremIO SATP rules (uncomment to check before adding)
    # $esxcli.storage.nmp.satp.rule.list() | where {$_.description -like "*XtremIO*"}

    # Create a new SATP rule for XtremIO: default Active/Active SATP,
    # Round Robin PSP, and path switching after every I/O (iops=1)
    $result = $esxcli.storage.nmp.satp.rule.add($null, "tpgs_off", "XtremIO Active/Active", $null, $null, $null, "XtremApp", $null, "VMW_PSP_RR", "iops=1", "VMW_SATP_DEFAULT_AA", $null, "vendor", "XtremIO")

    # List XtremIO rules again to confirm (uncomment to verify)
    # $esxcli.storage.nmp.satp.rule.list() | where {$_.description -like "*XtremIO*"}

    Write-Host "Host:", $esx.Name, "Result:", $result
}
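To confirm the rule actually changed path behavior on new LUNs, a quick check of the XtremIO devices’ multipath policy works. A sketch using the same $cluster variable:

```powershell
# Check multipath policy and IOPS switch count on XtremIO LUNs
foreach ($esx in Get-Cluster $cluster | Get-VMHost) {
    Get-ScsiLun -VmHost $esx -LunType disk |
        Where-Object { $_.Vendor -eq "XtremIO" } |
        Select-Object CanonicalName, MultipathPolicy, CommandsToSwitchPath
}
```

Keep in mind the SATP rule only applies at claim time, so existing LUNs need a reclaim or reboot to pick it up.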

Categories: Storage, Technology, Virtualization

Tuesday, October 7, was a big day for me. After searching for more than three months for the cause of a repeated storage connectivity failure, I finally found a chunk of definitive data. The scientific method would be happy–I had a hypothesis, a consistently reproducible test, and a clear finding for a question that had hung unanswered in the ether for months.

My environment had never seemed eccentric or exceptional until EMC, VMware, and I were unable to explain why our ESXi hosts could not sustain a storage controller failover (June). It was a “non-disruptive update” sans the “non-”. The array itself reported no issues; the VMs and hosts depending on its disks didn’t agree.

As with any troubleshooting, the key follow-up is reproducing the issue and gathering sufficient logs when you do, so that yet another downtime event isn’t needed. We achieved the first part (repro) with ease, but came up short on analytical data to explain why (August). This being a production environment, repeated hard crashes of database servers weren’t in the cards.

The other organizations participating in this Easter egg hunt suspected the QLogic 8262 Converged Network Adapter firmware as the culprit, apparently after receiving indications to that effect from QLogic. As that data came second-hand, I can’t say whether it was a guess or a hard-evidence-based hypothesis. Our CNAs were running the latest firmware available from Dell’s FTP update site (via the Lifecycle Controller), but that repository stays a few revisions behind for some unknown yet intentional reason (ask Dell).

Categories: Storage, Technology, Virtualization

Scripting and automation are, I admit, new territory for me. I’ve heard enough clarion calls to grow personally and professionally, though, to know I have to gain ground here. Hopefully this is the first step in building my knowledge base of such tools.

In this entry, I need to solve for two configuration tasks.

Syslog

First, I am concluding an evaluation of Splunk and need to reset my vSphere ESXi 5.5 hosts’ global syslog target to only our existing syslog server. My inclination was to click through the vSphere Client and change it host by host–that would have taken less time than writing this post so far. However, since I’m in search of a new syslog solution, possibly VMware Log Insight, I know I will need to do this again.
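A minimal PowerCLI sketch of that reset–the syslog address “udp://syslog01:514” is a hypothetical placeholder for our real server:

```powershell
# Point every host's global log host back at the existing syslog server
Get-VMHost | ForEach-Object {
    Set-VMHostSysLogServer -VMHost $_ -SysLogServer "udp://syslog01:514"
}
```

Swapping in a different target later (Log Insight, say) is then a one-line change instead of a click-through.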

Categories: Technology, Virtualization