Microsoft Storage Spaces Is Hot Garbage For Parity Storage

I love parity storage. Whether it’s traditional RAID 5/6, erasure coding, raidz/raidz2, whatever. It gives you redundancy on your data without requiring double the drives that mirroring or mirroring+striping would require.

The drawback is that write performance isn’t as good as mirroring+striping, but for my purposes (lots of video files, cold storage, etc.) parity is perfect.

In my primary storage array, I use double redundancy on my parity, so effectively N+2. I can lose any 2 drives without losing any data.

I had a simple Storage Spaces mirror on my Windows 10 Pro desktop which consisted of (2) 5 TB drives using ReFS. This had four problems:

  • It was getting close to full
  • The drives were getting old
  • ReFS isn’t supported anymore on Windows 10 Pro (you need Windows 10 Pro for Workstations)
  • Dropbox (which I use extensively) is dropping support for ReFS volumes.

ReFS had some nice features such as checksumming (though for data checksumming, you had to turn it on), but given the type of data I store on it, the checksumming isn’t that important (longer-lived data is stored on Dropbox and/or my ZFS array). I do require Dropbox, so back to NTFS it is.
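
For reference, the data checksumming on ReFS is a per-file/per-folder integrity setting that you toggle from PowerShell. A minimal sketch (the D:\Data path is just an example, assuming a ReFS volume):

    # Check whether integrity streams (data checksumming) are enabled on a folder
    Get-FileIntegrity -FileName 'D:\Data'

    # Turn them on; new files created under the folder inherit the setting
    Set-FileIntegrity -FileName 'D:\Data' -Enable $true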

I deal with a lot of large files (video, cold-storage VM virtual disks, ISOs, etc.) and parity storage is great for that. For boot volumes, OS, applications, and other latency-sensitive operations, it’s SSD or NVMe all the way. But the bulk of my storage requirements is, well, bulk storage.

I had a few more drives from the Best Buy Easystore sales (8 TB drives, related to the WD Reds, for about $129 during their most recent sale), so I decided to use three of them to create a RAID 5 array. (I know there are objections to RAID 5 these days in favor of RAID 6; while I agree with some of them, they’re not applicable to this workload, so RAID 5 is fine.)

So I’ve got 3 WD Easystore shucked drives. Cool. I’ll create a RAID 5 array.

[Screenshot: Disk Management’s new volume menu, with the RAID-5 option grayed out]

Shit. Notice how the RAID-5 option is grayed out? Yeah, somewhere along the line Microsoft removed the ability to create RAID 5 volumes in its non-server operating systems. Instead, Microsoft’s solution is to use the newer Storage Spaces. OK, fine. I’ll use Storage Spaces. There’s a parity option, so like RAID 5, I can do N+1 (or like RAID 6, N+2, etc.).
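
For reference, the same parity space can be scripted. Here’s a rough PowerShell sketch of what the GUI ends up doing (the pool and disk friendly names are just ones I made up):

    # Grab the blank drives that are eligible for pooling
    $disks = Get-PhysicalDisk -CanPool $true

    # Create a pool, then a parity (RAID 5-style) virtual disk on it
    New-StoragePool -FriendlyName 'BulkPool' `
        -StorageSubSystemFriendlyName 'Windows Storage*' -PhysicalDisks $disks
    New-VirtualDisk -StoragePoolFriendlyName 'BulkPool' -FriendlyName 'BulkParity' `
        -ResiliencySettingName Parity -PhysicalDiskRedundancy 1 -UseMaximumSize

    # Initialize, partition, and format the resulting disk as NTFS
    Get-VirtualDisk -FriendlyName 'BulkParity' | Get-Disk |
        Initialize-Disk -PassThru |
        New-Partition -AssignDriveLetter -UseMaximumSize |
        Format-Volume -FileSystem NTFS -NewFileSystemLabel 'Bulk'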

I set up a parity storage space (the UI is pretty easy) and gave it a quick test. At first, it started copying at 270 MB/s, then it dropped off a cliff to… 32 MB/s.

[Screenshot: file copy dialog showing the transfer rate collapsing to 32 MB/s]

That’s it: 32 MB/s. What. The. Eff. I’ve got SD cards that can write faster. My guess is that some OS caching was allowing it to copy at 270 MB/s (the hard drives aren’t capable of 270 MB/s). But the hard drives ARE capable of far more than 32 MB/s. Tom’s Hardware found the Reds capable of 200 MB/s sequential writes. I was able to get 180 MB/s with some file copies on a raw NTFS-formatted drive, which is in line with Tom’s Hardware’s conclusion.
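
If you want to reproduce that kind of quick-and-dirty sequential test, a rough PowerShell sketch is below (the paths are hypothetical, and OS caching can inflate the number if the test file is smaller than RAM):

    # Time a big file copy to the volume under test and work out MB/s
    $src = 'C:\temp\bigfile.bin'   # any multi-GB test file
    $dst = 'E:\bigfile.bin'        # the volume being tested

    $seconds = (Measure-Command { Copy-Item $src $dst }).TotalSeconds
    $sizeMB  = (Get-Item $src).Length / 1MB
    '{0:N0} MB/s' -f ($sizeMB / $seconds)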

Now, I don’t need a whole lot of write performance for this volume. And I pretty much only need it for occasional sequential reads and writes. But 32 MB/s is not enough.

I know what some of you are thinking. “Well Duh, RAID 5/parity is slower for writes because of the XOR calculations”.

I know from experience on similar (and probably slower) drives that RAID 5 is not that slow, even on spinning disks. The XOR calculations are barely a blip in the processor for even halfway modern systems. I’ve got a Linux MD RAID system with 5 drives, and I can get ~400 MB/s of writes (from a simple dd write test).

While it’s true that RAID 5 writes are slower than, say, RAID 10, they’re not that slow. I set up a RAID 5 array on a Windows Server 2016 machine (more on that later) using the exact same drives, and it was able to push 113 MB/s.

[Screenshot: file copy to the Windows Server 2016 RAID 5 array running at about 113 MB/s]

It might have been able to do more, but it was limited by the bottleneck of the Ethernet connection (about 125 MB/s) and the built-in Dell NIC. I didn’t have an SSD to install Windows Server 2016 on and had to use an HDD that was slower than the drives the RAID 5 array was built with, so that’s the best I could do. Still, even if that was the maximum, I’ll be perfectly happy with 113 MB/s for sequential writes.

So here’s where I got crafty. The reason I had a Windows Server 2016 machine was that I thought if I created a RAID 5 volume in Windows Server 2016 (which you can), I could simply import the volume into Windows 10 Pro.

Unfortunately, after a few attempts, I determined that it won’t work.

[Screenshot: Disk Management showing the imported volume and its drives as Failed]

The volume shows as failed, and the individual drives show as failed as well.

So now I’m stuck with a couple of options:

  • Fake RAID
  • Drive mirroring
  • Parity but suck it up and deal with 32 MB/s
  • Parity and buy a pair of small SSDs to act as cache to speed up writes
  • Buy a hardware RAID card

Fake Hardware RAID

Early on in my IT career, I’d been fooled by fake RAID. Fake RAID is the feature that many motherboards and inexpensive SATA cards offer: you can set up RAID (typically 0, 1, or 5) in the motherboard BIOS.

But here’s the thing: It’s not a dedicated RAID card. The RAID operations are done by the general CPU. It has all the disadvantages of hardware RAID (difficult to troubleshoot, more fragile configurations, very difficult to migrate) and none of the advantages (hardware RAID offloads operations to a dedicated CPU on the RAID card, which fake RAID doesn’t have).

For me, it’s more important to have portability of the drives (just pull disks out of one system and into another). So fake RAID is out.

Drive Mirroring

Having tested drive mirroring, I can say it’s definitely a better-performing option.

Parity with Sucky Performance

I could just suck it up and deal with 32 MB/s. But I’m not going to. I don’t need SSD/NVMe speeds, but I need something faster than 32 MB/s. I’m often dealing with multi-gigabyte files, and 32 MB/s is a significant hindrance to that.

Parity with SSD Cache

About $50 would get me two 120 GB SSDs. As long as I wasn’t doing a massive copy beyond 120 GB of data, I should get great performance. For my given workload of bulk storage (infrequent reads/writes, mostly sequential in nature) this should be fine. The initial copy of my old mirrored array is going to take a while, but that’s OK.

The trick with an SSD cache is that you have to use PowerShell in order to configure it. The Windows 10 GUI doesn’t allow it.
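
Roughly, the approach looks like this. It’s a sketch of the idea rather than exactly what I ran (the pool name, drive-model match strings, and cache size are all mine):

    # Add the two SSDs to the existing pool alongside the spinning drives
    $ssds = Get-PhysicalDisk -CanPool $true | Where-Object FriendlyName -like '*SSD-Model*'
    Add-PhysicalDisk -StoragePoolFriendlyName 'BulkPool' -PhysicalDisks $ssds

    # Make sure the media types are set so Storage Spaces knows SSDs from spinning rust
    Get-StoragePool -FriendlyName 'BulkPool' | Get-PhysicalDisk |
        Where-Object FriendlyName -like '*SSD-Model*' |
        Set-PhysicalDisk -MediaType SSD

    # Create the parity space with a write-back cache carved out of the SSDs
    New-VirtualDisk -StoragePoolFriendlyName 'BulkPool' -FriendlyName 'BulkParityCached' `
        -ResiliencySettingName Parity -PhysicalDiskRedundancy 1 `
        -WriteCacheSize 32GB -UseMaximumSize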

After some fiddling, I was able to get a Storage Space going with SSD cache.

And… the performance was worse than with the spinning drives by themselves. Testing the drives individually, I found that the SSDs had worse sequential performance than the spinning rust. I’d assumed the SSDs would do better, a silly assumption now that I think about it. At least I’m only out $50, and I can probably re-purpose them for something else.

The performance for random I/O is probably better, but that’s not what my workload is on these drives. My primary need is sequential performance for this volume.

Buy A Hardware RAID Card

I don’t like hardware RAID cards. They’re expensive, the software to manage them tends to be really awful, and they make portability of drives a problem. With software RAID, I can pull drives out of one system and put them into another, and voila, the volume is there. That can be done with a hardware RAID card, but it’s trickier.

The performance benefit that they provide is just about gone too, given how fast modern CPUs are and how many cores they have, compared to the relatively slow CPUs on hardware RAID cards (typically less than a GHz, and only one or two cores).

Conclusion

So in the end, I’m going with a mirrored pair of 8 TB drives, and I have two more drives I can add when I want to bring the volume to 16 TB.
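
If the mirror is done as a Storage Spaces two-way mirror (rather than a classic Disk Management mirrored volume), the setup looks roughly like this, with the later expansion commented out (names are mine again):

    # Two-way mirror across the pair of 8 TB drives in the pool
    New-VirtualDisk -StoragePoolFriendlyName 'BulkPool' -FriendlyName 'BulkMirror' `
        -ResiliencySettingName Mirror -NumberOfDataCopies 2 -UseMaximumSize

    # Later, add the other two drives to the pool and grow the space:
    # Add-PhysicalDisk -StoragePoolFriendlyName 'BulkPool' -PhysicalDisks (Get-PhysicalDisk -CanPool $true)
    # Resize-VirtualDisk -FriendlyName 'BulkMirror' -Size 16TB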

Thoughts On Why Storage Spaces Parity Is Such Hot Fucking Garbage

There’s a pervasive thought in IT that parity storage is very slow unless you have a dedicated RAID card. While probably true at one time, much like the jumbo frame myth, it’s not true anymore. A halfway modern CPU is capable of dozens of gigabytes per second of RAID 5/6 parity or erasure-coding calculations. If you’re only pushing a couple hundred megabytes per second, it’s barely a blip on the CPU.

It’s the reason huge honking storage arrays (EMC, Dell, NetApp, VMware vSAN, etc.) don’t do RAID cards. They just (for the most part) throw x86 cores at it through either scale-up or scale-out controllers.

So why does Storage Space parity suck so bad? I’m not sure. It’s got to be an implementation problem. It’s definitely not a CPU bottleneck. It’s a shame too, because it’s very easy to manage and more flexible than traditional software RAID.

(way)TL;DR

Tried parity in storage spaces. It sucked bigtime. Tried other shit, didn’t work. Just went with mirrored.

4 Responses to Microsoft Storage Spaces Is Hot Garbage For Parity Storage

  1. Markus Schloesser says:

    Because of all that described mess, I use DrivePool. Standards-based, reliable, gets the job done, inexpensive

  2. J-Dub says:

    While I agree that parity-based storage spaces blow, I think I have some light to shed on your SSD caching issue. In my experience the SSD caching works pretty well with enterprise SSDs, more specifically SSDs with power loss protection (PLP).

    What you are running into is likely because you used consumer SSDs ($50 for 2x drives?) which pretty much never have power loss protection.

    Most consumer and enterprise drives use a fast DRAM or SLC flash cache to absorb writes and then dole them out to the slower MLC or QLC flash. The difference is the PLP. Enterprise drives include capacitors on board to ensure that during a power loss event the DRAM or SLC cache has time to flush to the primary storage before the drive runs out of power.

    In your typical Windows environment these SSDs are treated the same. However, in Storage Spaces the caching mechanism requires a “verified write”. Drives without PLP are not considered to have a “verified write” until the SSD cache has fully written to the drive, basically ignoring the fact that you wrote to the fast DRAM or SLC cache and waiting for the drive to write to the MLC or QLC. Oftentimes bypassing the cache on a value-branded consumer drive yields pretty pitiful results, as the MLC or QLC struggles to maintain even spinning-rust levels of I/O.

    The OS can identify drives with PLP based on their S.M.A.R.T. data and will let those drives use their caching devices as a verified write, allowing you to see the increase in speed you’d expect from using SSDs.

    I can say from first-hand experience that in a 2x-drive SSD cache mirror, 2x 480 GB Intel S3710s trounce 2x technically faster 500 GB Samsung 850 EVO drives.

  3. RM says:

    Are you sure those Easystore drives weren’t SMR?
