In September 2013, my organization and I started a journey into the realm of flash storage. The initial foray took us into two camps and lasted much longer than we expected. In fact, our 2013 storage decision bore with it lessons and tests that lasted until it was once again time to make another upgrade, our 2015 replacement at a sister site.
In 2013, smaller start-ups were plentiful, but EMC’s pre-release XtremIO (GA in December 2013) and Pure Storage were the only mainstream contenders. Granted, Pure was still technically a start-up, but then again, XtremIO was an unreleased product that EMC had purchased without broad field experience. Everyone was young.
Much of this has already been hashed out in my prior posts, but the short story is that in 2013 we decided to forgo Pure Storage based on EMC’s promises that XtremIO would deliver everything Pure did and more. The two metrics were data reduction and performance. In the land of enterprise storage, we assumed high availability was a given.
In the following months of early 2014, we learned not to assume anything, and also how to help a young product mature through bug discovery. During a controller replacement for a failed Fibre Channel (FC) module, we hit an InfiniBand bug that took down the XtremIO array for three hours. EMC fixed that bug in the next release. Then we discovered that virtual machines with EFI firmware could not (re)boot on XtremIO, which led to EMC’s “best practice” of lowering ESXi’s advanced setting Disk.DiskMaxIOSize from the default 32767 to 4096. In June we had our largest issue, when our hosts lost connectivity during a “non-disruptive” upgrade. Troubleshooting that lasted more than six months and was never cleanly resolved. Another “best practice”–setting native multipathing to switch paths after every I/O–turned out to be necessary to avoid the issue with our QLogic converged network adapters (CNAs).
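For reference, those two ESXi tweaks can be applied from the host shell roughly as follows. The device identifier is a placeholder, and you should verify the settings against current VMware/EMC guidance before applying them:

```shell
# Lower the maximum I/O size ESXi passes to the array (default is 32767 KB)
esxcli system settings advanced set -o /Disk/DiskMaxIOSize -i 4096

# Put the device on the Round Robin path selection policy, then switch
# paths after every I/O (naa.xxxx is a placeholder for your device ID)
esxcli storage nmp device set --device=naa.xxxx --psp=VMW_PSP_RR
esxcli storage nmp psp roundrobin deviceconfig set \
    --type=iops --iops=1 --device=naa.xxxx
```

Both changes are per-host, and the Round Robin setting is per-device, so plan on scripting it across your cluster.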
As for data reduction, XtremIO started out at 1.5:1, in contrast to Pure’s 4.2:1. The compression coming in XIOS 3.0 was supposed to bring parity between those numbers, but it never panned out. Instead, the 1.5x deduplication dropped to 1.3x (block sizes increased to 8K), and compression added its own 1.3x, for a net of roughly 1.7x. EMC addressed the shortfall with complimentary hardware so that our raw capacity would match or exceed Pure’s logical capacity. They stand by their word, and for that, we are thankful.
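As a sanity check on those figures: net data reduction is the product of the dedupe and compression ratios, not the sum, which is why 1.3x on each axis lands near 1.7x rather than 2.6x.

```python
# Net data reduction is multiplicative, not additive.
dedupe = 1.3        # XtremIO dedupe after the move to 8K blocks
compression = 1.3   # compression added in XIOS 3.0

net = dedupe * compression
print(f"net reduction: {net:.2f}x")  # 1.69x, i.e. the ~1.7x observed
```

The same math explains the original hope: a 2.8x compression gain would have been needed on top of 1.5x dedupe to reach Pure’s 4.2x.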
Most recently, I discovered that our dataset changed somewhat during the spring of 2014, so our data-reduction comparison may have been slightly skewed. Table compression was introduced into part of our database structure, leading to lower reduction numbers in our latest Pure implementation. Even so, we now see nearly 2x better data reduction on Pure than on XtremIO.
Today, we have 8 months of uptime under our belt on XtremIO (2 x 20TB bricks), not counting the migration off of it in December 2014 for the XIOS 3.0 upgrade; VMware Storage vMotion gets the credit for zero downtime there. We haven’t yet had a chance to request the latest minor update for XIOS, so our first non-disruptive upgrade is still in the future.
Performance-wise, XtremIO has never skipped a beat. Mass backups by our SQL servers don’t faze it (unless you consider a jump from 0.5ms to 1.5ms latency for under 15 minutes an issue), and it will surely be the last bottleneck in our virtual infrastructure. CPU and RAM will need to scale exponentially first, and even then it would only be a capacity issue (which further extends the performance lead).
Strangely enough, it is that capacity factor that caused us to reconsider the competition when it came time to purchase a sister SAN for our XtremIO. For years, we faced the inverse problem of excess capacity that could not support the performance demands. Now we had performance that couldn’t stretch to meet capacity (due to the low data-reduction ratio).
The Candidates in 2015
To replace our circa-2012 HP 3PAR V400 (P10400), we engaged EMC, HP, Pure, and Dell for solutions and quotes. Other vendors got a light investigation, but their products tended to be niche, focusing on one characteristic rather than the bigger picture, or they simply weren’t mature enough to lean on (especially given our history above with “beta” XtremIO).
Dell Compellent was all about hardware. Their technical sales team openly stated that they viewed deduplication as too risky and relegated compression to nearline storage only. They also seemed stuck in the past with a focus on mixing SLC, MLC, and “mixed-use” SSDs on top of spinning disks. I hear that they are working on modernizing, but that’s still coming. If hardware is all that matters to you, though, Dell was the cheapest.
HP 3PAR was actually the lead contender right up until Pure stepped in. We like our history with 3PAR and the arrays we have on the floor. They have never failed us–*sigh* well, once, but that was the 1.0 of their deduplication, and true to form, I found the rarest of bugs for them (I’m lucky like that). Anyway, 3PAR has been a winner for us since 2009. The GUI and CLI are straightforward (once you learn the lingo of CPGs, VVs, VLUNs, etc.), and management has consistently taken a minimum of effort.
What argued most strongly to me, then and now, is the 3PAR product and support team. Ivan and his team are fully engaged and accessible, and they currently have a full tank of momentum in their innovation engine. HP was late to this party–hence, they weren’t in the picture in fall 2013–but they are here now, and the 3PAR line is a winning pick.
3PAR’s solution for us involved half of the capacity in flash and half in 15K SAS disks. You might think that’s moving backwards in a flash world, but unless your data truly pushes the limits of the storage autobahn, this isn’t a right-and-wrong debate. It’s a better-and-best one. Furthermore, 3PAR Adaptive Flash Cache (AFC) makes that spinning stuff behave much closer to flash speed.
Physics, rack space, and power do matter, though, which is what moves the limelight to all-flash arrays.
EMC XtremIO was the incumbent, with an offer on the table for a killer deal on a matching partner to our existing pair of 2 x 20TB bricks. Initial estimates, however, put our data needs above what we could count on that pair to deliver (Round 1 taught us not to assume more than 1:1 reduction). In an XtremIO world, that means doubling capacity (4 x 20TB). There are no odd-numbered solutions, except a single brick.
That steep upgrade path carries a lot of cost and a significant chunk of rack space with it. Two bricks consume 13U; four require 25U, I believe (one less than double because you only need one InfiniBand switch). And while EMC was able to extend the exceptional pricing to the four-brick solution, it was still above the rest of the candidates. If there were a way to guarantee data-reduction ratios, that wouldn’t have been the case, because even 1.7x changes the equation. However, we had originally bet on much more than that, and no amount of good will can change (XIOS) software design.
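To make the trade-off concrete, here is a rough sketch of what the two XtremIO configurations yield per rack unit, assuming the ~1.7x reduction we actually observed (brick counts and U figures are from above; the ratio itself is entirely workload-dependent):

```python
# Illustrative only: logical TB = raw TB x observed reduction ratio.
reduction = 1.7  # what we actually saw on our data, not the original promise

configs = {
    "2 x 20TB bricks": {"raw_tb": 40, "rack_u": 13},
    "4 x 20TB bricks": {"raw_tb": 80, "rack_u": 25},
}

for name, c in configs.items():
    logical = c["raw_tb"] * reduction
    print(f"{name}: ~{logical:.0f}TB logical in {c['rack_u']}U "
          f"({logical / c['rack_u']:.1f} TB/U)")
```

Run the same arithmetic with a 4x ratio and the two-brick pair covers what the four-brick solution delivers at 1.7x, which is exactly why the reduction ratio dominated our cost comparison.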
That brings us to Pure Storage. Coming out of 2013, we didn’t have anything bad to say about Pure; we simply believed a competitor’s word and went forward on faith. This time around we had the facts and put aside the unproven. Pure won our business on the following factors:
- Innovation: the highest data reduction
- Simplicity: deployment and management
- Cost: lowest price for logical capacity (see “Innovation”)
- Support: great support in 2013 and same today
- Environment: lowest power and least rack space (8U for our setup)
- Expansion: granular, customer-handled, and non-disruptive
- Availability: architecture and track record in our environment
In our internal discussions, we put different filters and weights upon those factors as we measured all of the candidates, and regardless of the weighting, Pure came out on top. That’s important to do in any organization–don’t let the price run away with the discussion. It surely matters, but support, management, growth plans, and availability can make operational costs far exceed the upfront ones.
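A weighted decision matrix like the one we used can be sketched in a few lines. The scores below are hypothetical placeholders, not our actual ratings, and the factor list is trimmed for brevity; the point is to re-run the ranking under several weightings and see whether the winner holds.

```python
# Hypothetical 1-5 scores per factor -- substitute your own ratings.
scores = {
    "Pure":    {"reduction": 5, "simplicity": 5, "cost": 4, "support": 5},
    "XtremIO": {"reduction": 2, "simplicity": 3, "cost": 3, "support": 4},
    "3PAR":    {"reduction": 3, "simplicity": 4, "cost": 4, "support": 5},
}

def rank(weights):
    """Return the vendor with the highest weighted total."""
    totals = {vendor: sum(weights[f] * s for f, s in fs.items())
              for vendor, fs in scores.items()}
    return max(totals, key=totals.get)

# Try a cost-heavy and a support-heavy weighting; a robust winner
# should come out on top under both.
cost_heavy    = {"reduction": 1, "simplicity": 1, "cost": 3, "support": 1}
support_heavy = {"reduction": 1, "simplicity": 1, "cost": 1, "support": 3}
print(rank(cost_heavy), rank(support_heavy))
```

If the winner flips when you shift weight between cost and support, that is the signal to dig deeper rather than let any single factor run away with the decision.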
We are now fully migrated to our new Pure array and it is going well. Our data reduction is different from our 2013 POC–3x vs 4x–but we know why that is the case (our dataset changed), so it’s explainable. Actually, we have plans for improving it as we address our next backup & recovery project, but that’s for a different post :).
In this arena, I want to emphasize that choosing between the leaders is not a right-and-wrong choice. It is subjective and depends on your data and environment. 3PAR is good. XtremIO is good. Pure is good. What makes one or more of them great is you. Your data, your environment, your workload, your future. Your matrix is different from mine. Choose the solution that adds up.