Tag: EMC

This morning, Dell and EMC announced their impending merger as Dell and Silver Lake set out to acquire EMC and its holdings with cash and stock, while maintaining VMware as an independent, publicly-traded company. The event sets off incredible tidal waves financially and technologically and raises many questions.

To that end, the CEOs and other principals from Dell, EMC, VMware, and Silver Lake held conference calls with shareholders and media/analysts this morning. The following nine questions from participants on the latter call (New York Times, Financial Times, Boston Globe, Wikibon, and others) cover most of what is on everyone’s minds. In keeping with Dell’s privately held status (and EMC’s soon-to-be), “no comment” showed up a few times where we all hoped to find insight. Time will tell.

Security Storage Technology Virtualization

This week it was finally time to put our old EMC Avamar backup/DR grids out to pasture, and while I had removed most of the configurations from them already, I still needed to sanitize the disks. Unfortunately, a quick search of support.emc.com and Google revealed that “securedelete” operations on Avamar grids require EMC Professional Services engagements. Huh? I want to throw the thing away, not spend more money on it…

A few folks offered up re-initializing the RAID volumes on the disks as one way to prepare for decommissioning. That’s definitely one option. Another is to wipe the data from within the OS, which has much the same result but provides a degree of detailed assurance that the PowerEdge RAID Controller (PERC) doesn’t give (unless your PERC can be configured for repeated passes of random data).

Totally a side note: when I started this, I had the misconception that this method would preserve the OS and allow a second-hand user to redeploy it without returning to the EMC mothership. As you’ll note below, one of the paths we wipe is the location of /home and the rest of the OS. :x

Under the covers, Avamar is stripped-down Linux (2.6.32.59-0.17-default GNU/Linux, as of Avamar 7.1), so standard Linux tools provided the starting point. The tool I chose, and that I now have running across 10 storage nodes and 30 PuTTY windows, is “shred”.

Shred is as simple as it sounds: it overwrites the target disk with as many passes as you ask for. So for Avamar, how many disks is that?

[Screenshot: df output on an Avamar storage node]
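As a rough sketch of the approach (the device names below are hypothetical; use whatever df reports as backing the data partitions on your nodes), the loop runs shred against each data device with three random passes and a final pass of zeros:

    # Run on each Avamar storage node (hence the many PuTTY windows).
    # WARNING: destructive. Per the side note above, pointing this at the
    # root/home devices also destroys the OS.

    df -h    # identify the data partitions and their backing devices

    # /dev/sdb and /dev/sdc are placeholders; substitute the devices df reported.
    # -v shows progress, -n 3 makes three random passes, -z adds a final zero pass.
    for dev in /dev/sdb /dev/sdc; do
        shred -v -n 3 -z "$dev"
    done

Kick it off under screen or nohup if you don’t trust the SSH sessions to stay up for the duration.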

Security Storage Technology

Leading up to EMC World 2015, IT Central Station asked how I would compare EMC XtremIO and HP 3PAR. Until recently, the flash storage conversation in my organization and many others has centered on XtremIO and Pure Storage, the leaders of the all-flash array (AFA) space. To that end, I’ve written a few posts already.

In 2015, though, the HP giant began to rouse and challenge the mainstream status quo with its 3PAR offering. Quantifying the 3PAR platform is different from quantifying XtremIO or Pure, as it can seem amorphous given the many ways it can be quoted. Are you asking for all flash? 3PAR will give you that and lay claim to the best-of-breed title. Oh, but you want some mass storage akin to archival or virtual tape, too? 3PAR changes jerseys and shouts, “I’m it!” Is it, though? Let’s put 3PAR against XtremIO and see how they measure up!

Define the Conversation

 The hard part about these comparisons and competitive analyses is that we aren’t talking about products of the same species or specialization. I struggle to put it properly, but consider it this way. In pre-AFA days (the age of traditional spinners like NetApp FAS3040, EMC CLARiiON or VNX, and even last-gen 3PAR), the contest was like pitting a Toyota Camry against a Nissan Altima. They did most of the same things with minor strengths, weaknesses, and preferences.

Talking about XtremIO versus 3PAR 74xx is more of a discussion about construction-grade, heavy-duty cranes versus massive earth movers. They are in the same genus/genre, but are far from the same thing. Since they are different, we need to speak to some of the principles behind the questions and be willing to engage in a little philosophy rather than getting hung up on shallow metrics.

Architecture + Organization + Potential

I’d like to steer this post to three foundational topics, some where 3PAR and XtremIO are curiously aligned, and others where they diverge notably. In Architecture, I’ll highlight the product frameworks and touch on performance. In Organization, I’ll focus on the companies behind the arrays and what I’ve observed through recent interactions. Ending with Potential, I’ll look to the future, something that is very important since we’re all prone to think primarily about solving today’s problems.

Storage Technology

In September 2013, my organization and I started a journey into the realm of flash storage. The initial foray took us into two camps and lasted much longer than we expected. In fact, our 2013 storage decision carried with it lessons and tests that continued until it was once again time for another upgrade, our 2015 replacement at a sister site.

History

In 2013, while smaller start-ups were plentiful, EMC’s pre-release XtremIO (GA in December 2013) and Pure Storage were the only mainstream contenders. Granted, Pure was still technically a start-up, but then again, XtremIO was an unreleased product purchased by EMC without broad field experience. Everyone was young.

Much of this has already been hashed out in my prior posts, but the short story is that we decided to forgo Pure Storage in 2013 based on a belief in promises by EMC that XtremIO would deliver everything that Pure did and more. The two metrics in question were data reduction and performance. We assumed that in the land of enterprise storage, high availability was a given.

Storage Technology

The procedure for upgrading EMC XtremIO storage arrays to their latest major code release (3.0) has caused no shortage of conversation in the enterprise storage community. Granted, a large portion of that derives from competitors and marketing materials that are keen to take advantage of this hurdle in the XtremIO track.

For those unfamiliar, the hurdle is the disruptive and destructive nature of the 3.0 upgrade process. To move from 2.4 to 3.0, customers must move all data from the brick(s) to another storage platform. EMC promises to provide the loaner gear to swing said data for the upgrade, but that doesn’t alleviate the infrastructure and migration impact of such a task (especially if some things are physical and without niceties like Storage vMotion).

We’ve had our share of challenges getting to this point, as you can read from prior posts, but we’re finally here. Since others are following closely behind, I thought it would be helpful to document the steps necessary to complete this upgrade (where possible, I’ll include the actual upgrade-to-3.0 tech details, but those are mostly handled by EMC).

1. Create an EMC Support Request (SR) requesting the upgrade

This step is something of a misnomer, because the XtremIO 3.0 upgrade isn’t handled by the EMC Support team. Due to the nature of swinging data, loaner hardware, etc., EMC Sales and Professional Services actually handle the process as a “free upgrade”.

Storage Technology Virtualization

If you use a backup product that leverages VMware’s changed block tracking (CBT), you have probably also found cases when CBT wasn’t the right fit. In the world of EMC Avamar, I’ve found that VM image-level backups will slow to a crawl if enough blocks change but don’t quite reach the level (25%) necessary to automatically revert to a full backup.

When I created a case with EMC Support, they dug into the logs and then pointed to best practices that recommend disabling CBT when more than 10,000 blocks regularly change between backups. The problem I hit next was that the top result and KB for enabling/disabling CBT was a VMware post stating that step 1 was to power off the VM. Backups are running long and the next maintenance window isn’t for two weeks. Hmm…

  1. Avamar method
  2. PowerCLI method

Technology Virtualization

We began our hands-on exploration of all-flash arrays in September 2013, and for all intents and purposes, the testing has never really concluded. If I knew then what I know now, I would have conducted a number of tests quickly during the official “Proof of Concept” (POC) phases.

All of the tests below are worth doing on the named products, as well as on similar products that officially support the actions. Some tests particularly target a product architecture; where applicable, I’ll note that. As with any storage array, the best and first test should be running real data (day-to-day workloads) atop it. The points below build on that assumption.

1. Capacity: Fill It Up!

This test is most practically focused on Pure Storage, given its history and architecture. At the same time, the concept is worth considering for XtremIO.

In 2013 and before, Pure’s array dashboard showed a capacity bar graph that extended from 0% to 100%. At 80%, the array gave a warning that space was low, but failed to indicate the significance of this threshold. The code releases up to that point put an immediate write throttle on processing when the array passed it; in short, everything but reads ground to a halt. The philosophy of what percentage truly constitutes “full” was reassessed and redefined around the turn of the year to better protect the array and the user experience.

Pure’s architecture still needs a space buffer for its garbage collection (GC), which I believe is guarded by the redefinition of “full”. However, I have heard of at least one user experience where running near full caused performance issues due to GC running out of space (even with the protected buffer). If you’re testing Pure, definitely fill it up with a mix of data (especially non-dedupe-friendly data) to see how it behaves in the 80s and 90s.
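If you want a hedged, low-tech way to generate that kind of filler, writing files straight from /dev/urandom inside a test VM whose virtual disk lives on the array works well, since random data neither compresses nor dedupes (the mount point and sizes below are hypothetical; scale them to your array):

    # Write 100 x 10 GiB files of random data to a filesystem backed by the array.
    # Adjust count/size to push utilization through the 80% and 90% marks while
    # watching the array dashboard and latency.
    for i in $(seq 1 100); do
        dd if=/dev/urandom of=/mnt/aatest/fill_${i}.bin bs=1M count=10240
    done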

For XtremIO, it’s a conceptual consideration. I haven’t filled up our array, but it doesn’t do anything that requires unprotected buffer space, so the risk isn’t particularly notable (feel free to still try!). The thing here is to think about what comes next when it does get full. The product road map is supposed to support hot expansion, but today it requires swinging data between bricks (e.g., copying from a 1 X-Brick array to 2 X-Bricks, from 2 X-Bricks to 4, and so on).

Storage Technology

A fellow technologist asked a very fair and controversial question in a comment on “IOPS Matter: VMware Native Multipathing Rule Attribute Affects Storage Failover,” which pertains to my VMware-XtremIO environment. Since my response was running quite long, I thought it better to re-post the question here, followed by the answer.

“We are looking at purchasing a new all-flash SAN for our SQL environment running on VMware 5.5 — in your experience between Pure and EMC XIO, if you had it to do over, which would you buy? We are looking at the X-Brick 10TB against the Pure FA-405 6TB models. SQL compression is about 1.7:1 and dedup is almost nothing until we talk about storing multiple copies of our 300GB database for dev, test, staging, etc. Other than consistent finger-pointing from vendor to vendor, I’m not seeing much difference that would concern me in either direction other than price and that Pure’s 6TB might not exactly match the 8-9TB available in the XIO. Feedback?”

That’s quite the question, the answer to which would become headliner marketing material for whichever product was endorsed. Thankfully for me, the politically “safe” response of “it depends” is actually true. Factors like price, observable data reduction, and I/O patterns all sway the arrow.

Storage Technology

Tuesday, October 7, was a big day for me. After searching for more than three months for the cause of a repeated storage connectivity failure, I finally found a chunk of definitive data. The scientific method would be happy: I had a hypothesis, a consistently reproducible test, and a clear finding for a proposition that had hung in the ether unanswered for two months.

My environment had never seemed eccentric or exceptional until EMC, VMware, and I were unable to explain why our ESXi hosts could not sustain a storage controller failover (June). It was a “non-disruptive update” sans the “non-”. The array itself indicated no issues; the VMs and hosts depending on its disks didn’t agree.

As with any troubleshooting, a key follow-up is being able to reproduce the problem and to gather sufficient logs when you do, so that another downtime event isn’t necessary afterward. We achieved the first part (repro) with ease, but came up short on analytical data to find out why (August). Since this was a production environment, repeated hard crashes of database servers weren’t in the cards.

The other participant organizations in this Easter egg hunt were suspicious of the QLogic 8262 Converged Network Adapter firmware as the culprit, apparently after receiving indications to that effect from QLogic. As that data came second-hand, I can’t say whether that was a guess or a hard-evidence-based hypothesis. Our CNAs were running the latest available from Dell’s FTP update site (via the Lifecycle Controller), but that repository stays a few revisions behind for some unknown yet intentional reason (ask Dell).
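For anyone retracing this, here is a hedged sketch of how to confirm what driver and firmware the CNAs are actually running on an ESXi host (the vmnic name is a placeholder for one of the CNA ports):

    # List NICs, then pull driver and firmware details for a specific CNA port.
    esxcli network nic list
    esxcli network nic get -n vmnic4

    # The storage side of the converged adapter shows up here as well.
    esxcli storage core adapter list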

Storage Technology Virtualization

When we started our initial foray into the all-flash array space, we had to put on the brakes when the “best practice” recommendations started flying from the SEs and the guides. In a perfect world, we’d have been entirely on the new array (Pure Storage was first), but migration is a necessary process, and we also wanted a clear way back if the POCs failed. The recommended number of IOPS before switching paths with Round Robin native multipathing (NMP) was one of those settings.

From the EMC XtremIO Storage Array User Guide 2.4:

For best performance, it is recommended to do the following:

  • Set the native round robin path selection policy on XtremIO volumes presented to the ESX host.
  • Set the vSphere NMP Round Robin path switching frequency to XtremIO volumes from the default value (1000 I/O packets) to 1.

These settings ensure optimal distribution and availability of load between I/O paths to the XtremIO storage.
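In practice, that guidance translates to esxcli along these lines. This is a sketch rather than EMC’s verbatim procedure; the device identifier is a placeholder, and the vendor/model strings in the claim rule follow EMC’s published host connectivity guidance for XtremIO:

    # Per existing device: switch an XtremIO volume to Round Robin and change
    # paths after every single I/O (naa.514f0c5... is a placeholder identifier).
    esxcli storage nmp device set --device naa.514f0c5xxxxxxxxxx --psp VMW_PSP_RR
    esxcli storage nmp psp roundrobin deviceconfig set --device naa.514f0c5xxxxxxxxxx --type iops --iops 1

    # Or add a SATP rule so newly presented XtremIO volumes pick up the policy automatically.
    esxcli storage nmp satp rule add --satp VMW_SATP_DEFAULT_AA --psp VMW_PSP_RR \
      --psp-option "iops=1" --vendor XtremIO --model XtremApp \
      --claim-option tpgs_off --description "XtremIO Active/Active"

Note that the SATP rule only takes effect at claim time, so devices that are already presented typically need a reboot (or an unclaim/reclaim) before it applies.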

I never pursued that path to see if HP 3PAR would tolerate it, since other settings were clearly incompatible, but apparently HP came to its own realization on the matter. That said, please use caution in environments running more than just these two arrays, and watch out for the other “best practices” for all-flash arrays. Setting the queue depth to the maximum (256) or raising concurrent operations to 64 will likely overwhelm non-flash arrays or cause I/O loss when they are under pressure.

Storage Technology Virtualization