Tag: deduplication

Since the advent of thin provisioning, the concept of “data efficiency” has been used to describe how storage arrays deliver on large capacity requests while reserving only what’s actually occupied by data. For me, 3PAR (pre-HP) pioneered this with its thin provisioning and “Zero Detect” feature combo, which it liked to deem a sort of deduplication 1.0: zeroes were detected and deduplicated in the write stream.
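
Conceptually, zero detect is simple: inspect each incoming block, and if it is entirely zeroes, record that fact in metadata instead of allocating backing capacity. A minimal Python sketch of the idea (illustrative only; real arrays do this in hardware and at much finer granularity):

def write_block(store: dict, zero_map: set, lba: int, data: bytes) -> None:
    # Zero detect: an all-zero block becomes a metadata entry, not an allocation.
    if data == bytes(len(data)):      # block is entirely zeroes
        zero_map.add(lba)             # remember that this LBA reads as zeroes
        store.pop(lba, None)          # release any previously allocated block
    else:
        zero_map.discard(lba)
        store[lba] = data             # only non-zero data consumes real space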

With the wider implementation of deduplication and compression, the “data efficiency” term has stepped (or been pushed) into marketing spotlights around the industry. HP 3PAR still promotes it, EMC XtremIO positions it at the top of their array metrics, and Pure Storage has it at the top-left of their capacity bar.

Is it bad to calculate and show? No. It’s a real statistic after all, typically just logical data written divided by physical capacity consumed. Does it have any power or intrinsic value on its own, though? No.

Storage Technology

HP 3PAR recently released version 3.2.1 of the InForm OS, which most notably brought in-line deduplication to the already rock-solid storage platform. Last week, I wrote briefly about it and included screenshots of estimates in the CLI. Today, I’d like to share real-world results.

I’d like to give particular thanks to Ivan Iannaccone of the HP 3PAR team for reaching out and for early access to the 4.6.1 IMC with dedupe in the GUI.

After I ran the estimate in the previous post, I learned from Ivan that estimates (and jobs) spanning multiple virtual volumes (VVs) in the same common provisioning group (CPG) will return higher data reduction ratios (read: less used space), since duplicate blocks are matched across the volumes rather than within each one alone. Thus, when I received the new InForm Management Console (IMC) yesterday, I ran a new estimate against two VDI (Microsoft RemoteFX) VVs to see how the numbers panned out.

[Screenshot: IMC dedupe estimate preview for the two RemoteFX VVs]

As you can see, the dedupe ratio rose from 2.31 to 2.83. Every little bit helps, but what is the actual deduplication ratio?
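
To put those ratios in capacity terms (with a hypothetical data set size): 2.31:1 means the data consumes 1/2.31, about 43%, of its logical size, while 2.83:1 means about 35%. On 1 TiB of logical VDI data, that works out to roughly 443 GiB versus 362 GiB physical, or about 80 GiB reclaimed simply because the combined estimate counts cross-VV duplicates once.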

Storage Technology Virtualization

Today we updated our HP 3PAR P10400 array from InForm OS version 3.1.3 MU1 to 3.2.1 MU1. The big change here is the introduction of Thin Deduplication. Currently it only supports virtual volumes that reside entirely on SSD flash drives (no Adaptive Optimization (AO) tiering allowed), but word from our account team is that other media types are on the road map.

One of the most interesting features is the ability to run an analysis and estimate the deduplication ratio of data currently on a virtual volume (VV). Not every data type is dedupe-friendly, so this saves you and your disks the headache and wear of converting a volume to a Thin Deduplicated Virtual Volume (TDVV) only to find out it doesn’t save you anything.

To run the analysis (or “dry run”), open the 3PAR CLI and run:

checkvv -dedup_dryrun <vv_name>
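
For example, against a hypothetical volume name:

checkvv -dedup_dryrun vdi_rfx_01

The dry run reports an estimated dedupe ratio for the volume’s current data without converting the volume or rewriting anything, so there’s no wear cost to checking first.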

Storage Technology

We began our hands-on exploration of all-flash arrays in September 2013, and for all intents and purposes, the testing has never really concluded. If I knew then what I know now, I would have conducted a number of tests quickly during the official “Proof of Concept” (POC) phases.

All of the below tests are worth doing on the named products, as well as on other similar products that officially support the actions. Some tests particularly target a product’s architecture; where applicable, I’ll note that. As with any storage array, the best and first test should be running real data (day-to-day workloads) atop it. The points below assume that as a given.

1. Capacity: Fill It Up!

This test is most practically focused on Pure Storage, given its history and architecture. At the same time, the concept is worth thinking through with XtremIO.

In 2013 and before, Pure’s array dashboard showed a capacity bar graph that extended from 0% to 100%. At 80%, the array gave a warning that space was low, but failed to indicate the significance of this threshold: the code releases up to that point put an immediate write throttle on processing when the array passed it. In short, everything but reads ground to a halt. That philosophy of what percentage truly constitutes “full” was reassessed and redefined around the turn of the year to better protect the array and the user experience.

Pure’s architecture still needs a space buffer for its garbage collection (GC), which I believe is guarded by the redefinition of “full”. However, I have heard of at least one user experience where running near full caused performance issues due to GC running out of space (even with the protected buffer). If you’re testing Pure, definitely fill it up with a mix of data (especially non-dedupe-friendly data) and see how it behaves in the 80s and 90s percent-full range. One way to generate such data is sketched below.
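
Cryptographically random data is the safe bet for non-dedupe-friendly filler, since it neither compresses nor deduplicates. A minimal Python sketch (the mount point and sizes are hypothetical; point it at a LUN you can afford to fill):

import os

CHUNK = 8 * 1024 * 1024   # 8 MiB writes
GIB = 1024 ** 3

def fill(path: str, gib: int) -> None:
    # Stream unique random bytes so the array's data reduction reclaims nothing.
    remaining = gib * GIB
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(os.urandom(n))
            remaining -= n

fill("/mnt/pure_test/fill_01.bin", 100)   # 100 GiB per file; repeat as needed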

For XtremIO, it’s a conceptual consideration. I haven’t filled up our array, and it doesn’t do anything that requires unprotected buffer space, so the risk isn’t particularly notable (feel free to still try!). The thing here is to think about what comes next when it does get full. The product road map is supposed to support hot expansion, but today growing means swinging data between arrays (e.g., copying from a 1 X-Brick array to a 2 X-Brick array, from 2 X-Bricks to 4, and so on).

Storage Technology

A fellow technologist asked a very fair and controversial question in a comment to IOPS Matter: VMware Native Multipathing Rule Attribute Affects Storage Failover, which pertains to my VMware-XtremIO environment. Since my response was running quite long, I thought it better to re-post the question here, followed by the answer.

“We are looking at purchasing a new all-flash SAN for our SQL environment running on VMware 5.5 — in your experience between Pure and EMC XIO, if you had it to do over, which would you buy? We are looking at the X-Brick 10TB against the Pure FA-405 6TB models. SQL compression is about 1.7:1 and dedup is almost nothing until we talk about storing multiple copies of our 300GB database for dev, test, staging, etc. Other than consistent finger-pointing from vendor to vendor, I’m not seeing much difference that would concern me in either direction other than price and that Pure’s 6TB might not exactly match the 8-9TB available in the XIO. Feedback?”

That’s quite the question, the answer to which would become headliner marketing material for whichever product was endorsed. Thankfully for me, the politically “safe” response of “it depends” is actually true. Factors like price, observable data reduction, and I/O patterns all sway the arrow.
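
To make the data-reduction factor concrete using the commenter’s own numbers (illustrative arithmetic only): a 300GB database at 1.7:1 compression lands near 176GB physical. Spin up three more copies for dev, test, and staging and the logical footprint grows to 1.2TB, yet with copy-level dedupe the unique data stays near that same ~176GB plus whatever the copies diverge over time. That swing, 1.2TB logical against a few hundred GB physical, is why the multiple-copies detail matters more than the raw 6TB versus 8-9TB comparison.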

Storage Technology

Several months ago I walked through some of the issues we faced when XtremIO hit the floor and we found it not to be exactly what the marketing collateral might present. While the product was very much a 1.0 (in spite of its Gen2 name), EMC Support gave a full-court-press response to the issues, and our account team delivered on additional product. Now it’s 100% production, and we live and die by its field performance. So how’s it doing?

For an organized rundown, I’ll hit the high points of Justin Warren’s Storage Field Day 5 (SFD5) review and append a few of my own notes.

  • Scale-Out vs. Scale-Up: The Impact of Sharing
  • Compression: Needed & Coming
  • Snapshots & Replication
  • XtremIO > Alternatives? It Depends

Storage Technology

If you’re thinking about Windows Server 2012 (or maybe you aren’t, but you have a lot of uncompressed archive data like log files, syslog output, and documents), check out the new deduplication feature. It’s a real charmer.

As an example for this post, last Friday I undertook to apply this new functionality to our syslog server. It runs Kiwi Syslog (now owned by SolarWinds) and has held about 400GB of logging that we’ve pruned at the 3-month mark to keep it under control. With this in mind, I spun up a new VM, imaged it from our SCCM server with Windows Server 2012, and signed on.
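
For reference, enabling the feature on a data volume comes down to a handful of PowerShell commands, something along these lines (the E: volume is illustrative; verify the parameters against your build):

Install-WindowsFeature -Name FS-Data-Deduplication
Enable-DedupVolume -Volume "E:"
Start-DedupJob -Volume "E:" -Type Optimization
Get-DedupStatus -Volume "E:"

The optimization job can also be left to the built-in schedule rather than kicked off manually.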

Microsoft Storage Technology

Terrible initial implementation. High-downtime expansion. Unreliable backups. Absentee support. That’s EMC Avamar.

On the tiny upside, deduplication works great…when backups work.

In September 2011, our tragedy began. We’re a 99% VMware-virtualized shop and bought into EMC Avamar on the promise that its VMware readiness and design orientation would make for low-maintenance, high-reliability backups. In our minds, this was a sort of near-warm redundancy, with backup sets that could restore mission-critical systems to another site in under six hours. Sales even pitched that we could take backups every four to six hours and thus reduce our RPO. Not to be.

Before continuing, I should qualify all that gloom and woe by saying that we have had a few stretches of uneventful reliability, but only when we avoided changing anything. And during one of those supposedly quiet stretches, a bug in core functionality rendered critical backups unusable. But I digress…

Storage Technology Virtualization