Since the advent of thin provisioning, the concept of “data efficiency” has been used to describe how storage arrays deliver on large capacity requests while only reserving what’s actually occupied by data. For me, 3PAR (pre-HP) pioneered this in their thin provisioning and “Zero Detect” feature combo, which they like to deem a sort of deduplication 1.0 (they deduped zeroes in stream).
With the wider implementation of deduplication and compression, the “data efficiency” term has stepped (or been pushed) into marketing spotlights around the industry. HP 3PAR still promotes it, EMC XtremIO positions it at the top of their array metrics, and Pure Storage has it at the top-left of their capacity bar.
Is it bad to calculate and show? No. It’s a real statistic after all. Does it have any power or intrinsic value, though? No.
Thin provisioning is like Fibre Channel or iSCSI connectivity in enterprise arrays. An array that doesn’t include it doesn’t rate the classification of “modern enterprise storage” in my book. Do most people consider air conditioning a perk when shopping for a car? I doubt it; it’s a given today.
Beyond the fact that it should be a given, the real issue I have with “Data Efficiency” is the inappropriate and near-fraudulent use of it in sales & marketing communication.
[Embedded tweet: screenshot of an XtremIO dashboard reporting a 418:1 “Data Efficiency” ratio]

— Jonathan Frappier (@jfrappier) September 25, 2015
The above tweet has a few other issues as well, but for the topic at hand, the efficiency stat of 418 to 1 is the focus. It’s incredible at face value! Who wouldn’t want an XtremIO!?
It’s a completely fabricated number, however. For testing and whatever else is going on with that array, the admin has provisioned 1.1 petabytes on top of 15 terabytes of physical storage. Even with magic pixie dust, no storage vendor can achieve that squeeze job (or anything reasonably close to it through mainstream over-provisioning).
The number that matters is “Data Reduction.” It has a hard backing of deduplication + compression and reflects how much more data you are fitting on the array as a result of software capabilities. That said, even this number can be inflated with synthetic activities like cloning (as is also occurring in the above image).
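To make the distinction concrete, here is a minimal sketch of how the two ratios differ. The numbers are hypothetical (the written-data figure is invented for illustration; only the ~1.1 PB provisioned and 15 TB physical figures come from the tweet above), and the formulas reflect how these metrics are commonly reported, not any one vendor’s exact math:

```python
# Hypothetical array stats, all in TB (written-data figure is illustrative).
provisioned = 1126.4   # logical capacity handed out via thin provisioning (~1.1 PB)
written     = 40.0     # data actually written by hosts (assumed for this example)
physical    = 15.0     # physical capacity consumed after dedupe + compression

# "Data Reduction": written data vs. physical footprint -- a real measure
# of what deduplication and compression are actually accomplishing.
reduction = written / physical

# "Data Efficiency": provisioned vs. physical -- dominated by how aggressively
# the admin over-subscribed, not by the array's software.
efficiency = provisioned / physical

print(f"Data Reduction:  {reduction:.1f}:1")    # 2.7:1
print(f"Data Efficiency: {efficiency:.1f}:1")   # 75.1:1
```

Notice that the efficiency figure balloons simply because a huge logical capacity was provisioned against a small physical pool; the reduction figure stays honest because it only compares data written to data stored.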
I’m a big proponent of integrity in the workplace and sales cycle, which I realize is idealistic when so much money is on the line with these vendors. Still, they are disgracing the worthy products they represent by using these tactics to seal deals. Let the products stand on their own two feet!
The final takeaway of all these words is that “data efficiency” is more of a reflection on a storage admin’s provisioning style than it is on the storage array’s performance.
If I only provision what I will use, my efficiency (setting aside dedupe + compression) will be 1:1. I’m not over-subscribing and I’m using it all; it’s like paying for everything with cash.
If I create massive volumes to leave the OS or hypervisor lots of overhead, then my efficiency will soar; it’s like paying with high-limit credit cards. The limit is sky-high, but trying to use it all would bankrupt your array (and your résumé; most arrays don’t take kindly to being completely full, which means downtime). At the end of the day, my number will be different than yours, and that’s all it is: different.
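The cash-versus-credit contrast above can be sketched in a few lines. Both “admins” consume the same physical capacity; only the provisioning style changes, and the efficiency ratio swings wildly as a result (all figures are hypothetical):

```python
# Same physical consumption, two provisioning styles (hypothetical numbers, in TB).
physical_used = 10.0  # capacity actually consumed on the array

def efficiency(provisioned_tb, physical_tb):
    """'Data Efficiency' as commonly reported: provisioned / physical."""
    return provisioned_tb / physical_tb

# "Cash" admin: provisions only what will actually be used.
print(f"{efficiency(10.0, physical_used):.0f}:1")    # 1:1

# "Credit card" admin: hands out huge volumes for OS/hypervisor headroom.
print(f"{efficiency(500.0, physical_used):.0f}:1")   # 50:1
```

Same array, same data, same software: a 50x difference in the headline metric, driven entirely by the admin’s style.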
So leave “efficiency” out of your POC conversation. Keep it to what’s objective. Reduction is the name of the game. How’s yours?