PSA: Virtual Interfaces (in ESXi) Aren’t Limited To Reported Interface Speeds

There is an incorrect assumption that comes up from time to time, one that I shared for a while: that VMware ESXi virtual NIC (vNIC) interfaces are limited to their reported “speed”.

In my stand-alone ESXi 7.0 installation, I have two options for NICs: vmxnet3 and e1000. The vmxnet3 interface shows up at 10 Gigabit on the VM, and the e1000 shows up as a 1 Gigabit interface. Let’s test them both.

One test system is a Rocky Linux installation, the other is CentOS 8 (RIP CentOS). They’re both on the same ESXi host on the same virtual switch. The test program is iperf3, installed from the default package repositories. If you want to test this on your own, it really doesn’t matter which OS you use, as long as it’s decently recent and both VMs are on the same vSwitch. I’m not optimizing for throughput, just pushing enough traffic to try to exceed the reported link speed.

The ESXi host is 7.0 running on an older Intel Xeon E3 with 4 cores (no hyperthreading).

Running iperf3 on the vmxnet3 interfaces, that show up as 10 Gigabit on the Rocky VM:

[ 1.323917] vmxnet3 0000:0b:00.0 ens192: renamed from eth0
[ 4.599575] IPv6: ADDRCONF(NETDEV_UP): ens192: link is not ready
[ 4.602889] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 5 vectors allocated
[ 4.604520] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps

It also shows up as 10 Gigabit on the CentOS 8 VM:

[ 2.526942] vmxnet3 0000:0b:00.0 ens192: renamed from eth0
[ 7.715785] IPv6: ADDRCONF(NETDEV_UP): ens192: link is not ready
[ 7.719561] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 5 vectors allocated
[ 7.720221] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps

I ran the iperf3 server on the CentOS box and the client on the Rocky box, though that shouldn’t matter much:

vmxnet3 NIC

[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.38 GBytes 20.4 Gbits/sec 0 1004 KBytes
[ 5] 1.00-2.00 sec 2.63 GBytes 22.6 Gbits/sec 0 1.22 MBytes
[ 5] 2.00-3.00 sec 2.59 GBytes 22.3 Gbits/sec 0 1.22 MBytes
[ 5] 3.00-4.00 sec 2.56 GBytes 22.0 Gbits/sec 0 1.28 MBytes
[ 5] 4.00-5.00 sec 2.65 GBytes 22.7 Gbits/sec 0 1.28 MBytes
[ 5] 5.00-6.00 sec 2.60 GBytes 22.4 Gbits/sec 0 1.28 MBytes
[ 5] 6.00-7.00 sec 2.62 GBytes 22.5 Gbits/sec 0 1.28 MBytes
[ 5] 7.00-8.00 sec 2.55 GBytes 21.9 Gbits/sec 0 1.28 MBytes
[ 5] 8.00-9.00 sec 2.52 GBytes 21.6 Gbits/sec 0 1.28 MBytes
[ 5] 9.00-10.00 sec 2.46 GBytes 21.1 Gbits/sec 0 1.28 MBytes
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 25.6 GBytes 22.0 Gbits/sec 0 sender
[ 5] 0.00-10.04 sec 25.6 GBytes 21.9 Gbits/sec receiver

So around 22 Gigabits per second, VM to VM with vmxnet3 NICs that report as 10 Gigabit.

What about the e1000 NICs? They show up as 1 Gigabit (just showing one here, but they are both the same):

[43830.168188] e1000e 0000:13:00.0 ens224: renamed from eth0
[43830.182559] IPv6: ADDRCONF(NETDEV_UP): ens224: link is not ready
[43830.245789] IPv6: ADDRCONF(NETDEV_UP): ens224: link is not ready
[43830.247271] IPv6: ADDRCONF(NETDEV_UP): ens224: link is not ready
[43830.247994] e1000e 0000:13:00.0 ens224: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[43830.249059] IPv6: ADDRCONF(NETDEV_CHANGE): ens224: link becomes ready

e1000 NIC

[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.42 GBytes 12.2 Gbits/sec 905 597 KBytes
[ 5] 1.00-2.00 sec 924 MBytes 7.75 Gbits/sec 87 607 KBytes
[ 5] 2.00-3.00 sec 842 MBytes 7.07 Gbits/sec 0 626 KBytes
[ 5] 3.00-4.00 sec 861 MBytes 7.22 Gbits/sec 0 638 KBytes
[ 5] 4.00-5.00 sec 849 MBytes 7.12 Gbits/sec 0 655 KBytes
[ 5] 5.00-6.00 sec 878 MBytes 7.36 Gbits/sec 0 679 KBytes
[ 5] 6.00-7.00 sec 862 MBytes 7.24 Gbits/sec 0 683 KBytes
[ 5] 7.00-8.00 sec 854 MBytes 7.16 Gbits/sec 0 690 KBytes
[ 5] 8.00-9.00 sec 874 MBytes 7.33 Gbits/sec 0 690 KBytes
[ 5] 9.00-10.00 sec 856 MBytes 7.18 Gbits/sec 197 608 KBytes
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 9.04 GBytes 7.76 Gbits/sec 1189 sender
[ 5] 0.00-10.04 sec 9.04 GBytes 7.73 Gbits/sec receiver

So I got about 7 or so Gigabits per second even with the e1000 driver, even though it shows up as 1 Gigabit. It makes sense they don’t get as much as the vmxnet3 NIC as the e1000 NIC is optimized for compatibility (looking like an Intel E1000 chipset to the VM) and not performance, but still.

My ESXi host is older, with a CPU that’s about 9 years old, so with a faster CPU and more cores, it’s probable I could pass even more than 22 Gbit/7 Gbit respectively. But it was still sufficient to demonstrate that VM transfer speeds are *not* limited by the reported vNIC interface speed.
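
To put numbers on that, here is a small Python sketch that compares the measured throughput with the speed the guest reports. The figures are the receiver-side results from the two runs above. The function name is made up for illustration, and on a Linux guest the reported speed could be read from /sys/class/net/<interface>/speed (in Mbps) instead of being typed in.

```python
def times_over_link_speed(measured_gbps: float, reported_mbps: int) -> float:
    """Return how many times the measured rate exceeds the reported link speed."""
    return measured_gbps * 1000 / reported_mbps


# Receiver-side results from the iperf3 runs above: (Gbit/s measured, Mbps reported).
runs = {
    "vmxnet3": (21.9, 10000),
    "e1000": (7.73, 1000),
}

for nic, (gbps, mbps) in runs.items():
    ratio = times_over_link_speed(gbps, mbps)
    print(f"{nic}: {gbps} Gbit/s measured vs {mbps} Mbps reported ({ratio:.2f}x)")
```

That works out to roughly 2.2 times the reported speed for vmxnet3 and about 7.7 times for e1000.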

This is probably true for other hypervisors (KVM, Hyper-V, etc.) but I’m not sure. Let me know if you know in the comments.

Cut-Through Switching Isn’t A Thing Anymore

So, cut-through switching isn’t a thing anymore. It hasn’t been for a while really, though in the age of VXLAN, it’s really not a thing. And of course with all things IT, there are exceptions. But by and large, Cut-through switching just isn’t a thing.

And it doesn’t matter.

Cut-through versus store-and-forward was a preference years ago. The idea is that cut-through switching had less latency than store and forward (it does, to a certain extent). It was also the preferred method, and purchasing decisions may have been made (and sometimes still are, mostly erroneously) on whether a switch is cut-through or store-and-forward.

In this article I’m going to cover two things:

  • Why you can’t really do cut-through switching
  • Why it doesn’t matter that you can’t do cut-through switching

Why You Can’t Do Cut-Through Switching (Mostly)

You can’t do cut-through switching when you change speeds. If the bits in a frame arrive at 10 Gigabits, they need to go into a buffer before they’re sent over a 100 Gigabit uplink, because the uplink would otherwise run out of bits to send partway through the frame. The reverse is also true: a frame arriving on an interface 10 times faster than the one it’s leaving on piles up and has to be buffered (though the frame itself isn’t slowed down).

So any switch that uses a higher-speed uplink than its host-facing ports (which is most of them) is store-and-forward.
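
A rough back-of-the-envelope sketch of the speed mismatch, in Python. The 1500-byte frame size is an assumption for illustration, and preamble and inter-frame gap are ignored.

```python
def serialization_time_us(frame_bytes: int, link_gbps: float) -> float:
    """Time in microseconds to clock a frame onto (or off of) a link."""
    # 1 Gbit/s is 1000 bits per microsecond.
    return frame_bytes * 8 / (link_gbps * 1000)


FRAME_BYTES = 1500  # assumed frame size, for illustration only

arrive_10g = serialization_time_us(FRAME_BYTES, 10)   # receiving on the host-facing port
send_100g = serialization_time_us(FRAME_BYTES, 100)   # transmitting on the uplink

print(f"Receiving at 10 Gigabit takes {arrive_10g:.2f} us")
print(f"Sending at 100 Gigabit takes  {send_100g:.2f} us")
```

If the switch started transmitting at 100 Gigabit as soon as the header arrived, it would finish sending what it had long before the rest of the frame showed up, so it has to wait for the whole frame first.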

Just about every chassis switch involves speed changes. Even if you’re going from a 10 Gigabit port on one line card to a 10 Gigabit port on another line card, there’s a speed change involved. The line card is connected to another line card via a fabric module (typically), and that connection from line card to fabric module is via a higher speed link (typically 100 Gigabit).

There’s also often a speed change when going from one module to another. Even if, say, the line cards were 100 Gigabit and the fabric module were 100 Gigabit, the link between them is usually a slightly higher speed in order to account for internal encapsulations. That’s right, there’s often an internal encapsulation (such as Broadcom’s HiGig2) that slightly enlarges the frames bouncing around inside of a chassis. You never see it, because the encap is added when the packet enters the switch and removed before it leaves the switch. The speed is slightly bumped to account for this, hence a slight speed change. That would necessitate store-and-forward.

As Ivan Pepelnjak noted, I got the next part wrong (about Layer 3, and probably VXLAN). The other reasons stand, however.

You can’t do cut-through switching when doing Layer 3. Any Layer 3 operation involves re-writing part of the header (decrementing the TTL) and as such a new CRC for the frame that packet is encapsulated into is needed. This requires storing the entire packet (for a very, very brief amount of time).

So any Layer 3 operation is inherently store-and-forward.
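
As the note above says, the Layer 3 claim does not hold up. One way to see why, sketched in Python: the rewritten fields sit at the front of the frame and the CRC sits at the very end, so a checksum can be accumulated as the bytes stream through without holding the whole frame. This uses zlib.crc32, which uses the same CRC-32 polynomial as the Ethernet FCS. The frame contents are made up, and bit ordering on the wire is ignored.

```python
import zlib


def streaming_crc32(chunks) -> int:
    """Accumulate a CRC-32 over chunks as they pass through, without storing them."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # feed the running value back in
    return crc


# Hypothetical frame contents, split the way they might arrive over time.
header = bytes([0x45, 0x00, 0x00, 0x54]) + b"rewritten-header"
payload = b"payload bytes " * 50
chunks = [header, payload[:300], payload[300:]]

# The chunk-by-chunk result matches a CRC computed over the complete frame.
assert streaming_crc32(chunks) == zlib.crc32(header + payload)
```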

Any VXLAN is store-and-forward. See above about Layer 3, as VXLAN is Layer 3 by nature.

You can’t do cut-through switching any time a buffer is utilized. Any time two frames are destined for the same interface at the same time, one of them has to wait in a buffer. Any time a buffer is utilized, it’s store-and-forward. That one is hopefully obvious.

So a higher-speed uplink, any Layer 3 operation, any use of buffers, and of course any use of VXLAN all mean store-and-forward. That covers about 99.9% of use cases in the data center. Even if your switch is capable of cut-through, you’re probably not using it.

It Doesn’t Matter That Everything Is (Mostly) Store-and-Forward

Network engineers/architects/whathaveyou of a certain age probably have it ingrained that “cut-through: good” and “store-and-forward: bad”. It’s one of those persistent notions that may have been true at one time (though I’m not sure cut-through was ever that advantageous in most cases), but no longer is. The notions that hardware RAID is better than software RAID (it isn’t anymore), that LAGs should be powers of 2 (not a requirement on most gear), that jumbo frames increase performance (minuscule to no performance benefit today in most cases), and that MPLS is faster (it hasn’t been for about 20 years) are just a few others that come to mind.

“Cut-through switching is faster” was technically true, and still is, but it’s important to define what you mean by “faster”. Cut-through switching doesn’t increase throughput. It doesn’t make a 10 Gigabit link a 25 Gigabit link, or a 25 Gigabit link a 100 Gigabit link, etc. So when we talk about “faster”, we don’t mean throughput.

What it does is cut the amount of time a frame spends in a single switch.

With 10 Gigabit Ethernet a common speed, and most switches these days supporting 25 Gigabit, the serialization delay (the amount of time it takes to transmit or receive a frame) is minuscule. The port-to-port latency of most DC switches is 1 or 2 microseconds at this point. Compared to other latencies (app latency, OS network stack latency, etc.) this is imperceptible. If you halved the latency or even doubled the latency, most applications wouldn’t be able to tell the difference. Even benchmarks wouldn’t be able to tell the difference.

Cutting down the port-to-port latency was the selling point of cut-through switching. A frame’s header could be leaving the egress interface while its tail end was still coming in on the ingress interface. But since the speeds are so fast, it’s not really a significant cause of communication latency. Storing the frame/packet just long enough to get the entire frame and then forward it doesn’t cause any significant delay.
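
To get a feel for the numbers, here is a rough Python sketch of the extra per-hop delay that store-and-forward adds. That is approximately the time it takes to receive the rest of the frame after the switch has seen enough of the header to make a forwarding decision. The frame sizes and the 64-byte lookup point are assumptions for illustration, not measurements from any switch.

```python
HEADER_BYTES = 64  # assumed bytes a cut-through switch needs before it can forward


def serialization_time_us(frame_bytes: int, link_gbps: float) -> float:
    """Time in microseconds to receive a frame at the given link speed."""
    return frame_bytes * 8 / (link_gbps * 1000)


def store_and_forward_penalty_us(frame_bytes: int, link_gbps: float) -> float:
    """Extra wait per hop for the full frame versus just the header."""
    full_frame = serialization_time_us(frame_bytes, link_gbps)
    header_only = serialization_time_us(min(HEADER_BYTES, frame_bytes), link_gbps)
    return full_frame - header_only


for gbps in (1, 10, 25, 100):
    for size in (64, 1500, 9000):
        penalty = store_and_forward_penalty_us(size, gbps)
        print(f"{gbps:>3} Gigabit, {size:>4}-byte frame: +{penalty:.3f} us per hop")
```

Under these assumptions, a 1500-byte frame at 10 Gigabit costs a little over 1 microsecond extra per hop, and less than half a microsecond at 25 Gigabit.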

From iSCSI to VMotion to SQL to whatever, the difference between cut-through and store-and-forward is unmeasurable.

Where Cut-Through Makes Sense

There are a very small number of cases where cut-through switching makes sense, most notably high-frequency trading. In these rare cases where latency absolutely needs to be cut down, cut-through can be achieved. However, there are lots of compromises to be made.

If you want cut-through, your switches cannot be chassis. They need to be top-of-rack switches with a single ASIC (no interconnects). The interface speed needs to be the same throughout the network to avoid speed changes. You can only do Layer 2, no Layer 3 and of course no VXLAN.

The network needs to be vastly overprovisioned. Any time you have two packets trying to leave an interface at the same time, one has to be buffered, and that will dramatically increase latency (far beyond store-and-forward latency). The packet sizes will also need to be small so as to reduce latency.

Too-Long; Didn’t Read

The bad news is you probably can’t do cut-through switching. But the good news is that you don’t need to.

Requiem for FCoE

FCoE is dead. We’re beyond the point of even asking if FCoE is dead; we all know it just is. It was never widely adopted and it’s likely never going to be widely adopted. It enjoyed a few awkward deployments here and there, and a few isolated islands in the overall data center market, but it never caught on the way it was intended to.

So What Killed FCoE?

So what killed FCoE? Here I’m going to share a few thoughts on why FCoE is dead, and really never was A Thing(tm).

It Was Never Cheaper

Ethernet is the champion of connectivity. It’s as ubiquitous as water in an ocean and air in the... well, air. All the other mediums (ATM, Frame Relay, FDDI, Token Ring) have long ago fallen by the wayside. Even mighty InfiniBand has fallen. Only Fibre Channel still stands as the alternative for a very narrow use case.

The thought is that the sheer volume of Ethernet ports would make them cheaper (and that still might happen), but right now there is no real price benefit from using FCoE versus FC.

In the beginning, especially, FCoE was quite a bit more expensive than running separate FC and Ethernet options.

Even if it comes out as a draw, the extra management and clumsy integration with management styles make them more expensive from a practical perspective. Which brings me to the next point:

Fibre Channel and Ethernet/IP Networks are Just Managed Differently

The joke is that you can unplug any Ethernet cable for up to 7 seconds, plug it back in, and you don’t have to tell anyone. If you unplug any Fibre Channel cable for even 2 seconds, find a new job.

Fibre Channel is really SCSI over Fibre Channel (and now NVMe over Fibre Channel, though that’s uncommon). And SCSI is a high-maintenance payload. IP-based protocols have various recovery mechanisms at various levels if payloads are lost, or the protocols don’t care. SCSI does care if a message is lost; it cares a lot. Its recovery mechanisms are time consuming, and it’s still possible to end up with data corruption.

As a result, Fibre Channel networks are handled with a lot more care than a traditional Ethernet/IP network. The environment is a lot more static, with changes made infrequently, whereas Ethernet/IP networks, especially with EVPN/VXLAN implementations, are only getting more dynamic. Dynamic and Fibre Channel don’t go well together. FCoE doesn’t change that.

Trying to impose the same rules of Fibre Channel management onto an Ethernet/IP switch generally doesn’t go over well.

Fibre Channel Interconnectivity Has Always Sucked

Fibre Channel switches are designed around open standards (such as ANSI T11). They’re well documented and well understood. And yet few people build fabrics that include both Cisco and Brocade (now part of Broadcom) switches.

They implemented the standards slightly differently, and there’s lots of orchestration and control plane stuff going on (yes, I know, super technical here).

There are a few ways around this, such as interoperability mode, but it’s clumsy and awkward and seldom used (except perhaps in migrating from one vendor to another).

There’s also NPIV in combination with NPV/Access Virtual Gateway mode (Cisco and Brocade’s “proxy” mode, respectively), but that makes the NPV/Access Virtual Gateway switches “invisible” to the fabric, getting around the fabric services integration.

Ethernet itself is way more interoperable. You wouldn’t think twice about connecting a Cisco switch to an Arista switch via Ethernet/IP. Or a Juniper switch to an Extreme Networks switch. The protocols are simpler, and way more interoperable. That’s an advantage to those technologies. FCoE forces you to go the single-vendor route, since FC is generally single-vendor.

(One exception that we’re seeing is VXLAN/EVPN. Right now you would not build a VXLAN/EVPN network with two vendors, and it could be that it’s never a good idea to. That might be a next blog post.)

Fibre Channel Generally is in Decline

While not a direct reason why FCoE is dead, it certainly didn’t help. When FCoE was developed, Fibre Channel was in its heyday. It was, for a while, the very best way to do storage. Now there’s a lot of options out there, and many of them are better suited for most environments than Fibre Channel. And there’s not much innovation in declining tech.

Fibre Channel in general is dying off, but like a lot of technology in IT, it’s dying very, very slowly. Unix servers peaked around 2004, and have been in decline since. Still though, both IBM and Oracle (Sun) continue to do respectable business in the Unix market.

Probably a better way to describe Fibre Channel in general is to call it a legacy technology. Enterprise IT especially is very sedimentary and full of legacy tech. That’s the technology that isn’t growing, expanding, but we still need to keep it around because modernization is either not possible or too costly (or management makes poor choices…)

Fibre Channel is likely to be around for a while, and while there will be new deployments here and there (I was involved in one recently) it will mostly be deployed and refreshed to “keep the lights on”, so to speak. Fibre Channel is mostly a “scale up” technology, and storage has moved to “scale out” where Fibre Channel is not as well suited.

Since Fibre Channel is in decline, the need to put it on Ethernet is, via the transitive property, also in decline.

One Place FCoE Will Continue (and Thrive)

Cisco UCS uses FCoE for their B-series blades. It works, and it works well. It’s its own little island of FCoE, and doesn’t require any special configuration. Fabric and the hosts see native Fibre Channel, so operationally it’s no different than regular Fibre Channel connectivity to a SAN. It works because it’s mostly hidden from everyone involved. It just looks like regular FC.

I think FCoE will continue in that environment as long as B-series blades support Fibre Channel.

One Way FCoE Might Come Back

There’s one scenario I think possible (though not likely) where FCoE makes a resurgence, and even becomes the dominant way Fibre Channel is deployed: when native Fibre Channel switches no longer make sense.

Right now development in Fibre Channel is not… much of a thing. 64 GFC has been a standard for a while, and only recently has Brocade had a product. Cisco has announced future support for 64 GFC but hasn’t released any switches or line cards that have it. There’s also a 128 GFC and 256 GFC standard (using four lanes, much like 40, 100, and 400 Gigabit Ethernet) but as far as I know the interfaces have never been produced. The 128 GFC standard has been around for 5 years, and the 256 GFC standard for about 2 years, and interfaces haven’t been produced. I don’t foresee either being implemented. Ever.

So it’s certainly possible that 64 GFC is the last interface speed that Fibre Channel will see. There doesn’t seem to be much demand for faster, and the vendors (Cisco and Brocade/Broadcom) seem to be taking a wait-and-see approach. Ethernet is getting all the speed increases, with 400 Gigabit interfaces shipping, 100 Gigabit commonplace and relatively cheap, and plans for 800 Gigabit already being finalized.

So if there’s demand in Fibre Channel for faster than 64 GFC (such as for ISLs), to get to those speeds it might need to be Ethernet. I think it would be in the form of a switch that we treat like a Fibre Channel switch, in that we build a single-vendor SAN, use zones and zonesets, and it only carries storage traffic. There would be A/B fabrics, etc. Hosts would have separate FCoE and Ethernet interfaces, and wouldn’t try to combine the two. But instead of native Fibre Channel interfaces, the interfaces would be FCoE. You can do this today: you can build a Fibre Channel fabric composed entirely of FCoE interfaces from the host to the storage array. It’s just not currently practical given cost and switch model availability.

Final Thoughts

So Fibre Channel over Ethernet is pretty much dead. It never really became A Thing, whereas Fibre Channel was most certainly A Thing. But now Fibre Channel is a legacy technology, so while we’ll continue to see it for years to come, it’s not an area that’s likely to see a lot of innovation or investment.

The Three Levels of Data Protection for Data Hoarders

The following post is aimed at photographers and other digital hoarders: those of us who want to keep various digital assets not just for a few years, but for a lifetime, and even multiple lifetimes (passed down, etc.).

There are three levels of data protection: Data resiliency, data backup, and data archive.

Data Resiliency (Redundant Disks, RAID, NAS/DAS)

Data resiliency is when you have multiple disks in some sort of redundant configuration. Typically this is some type of RAID array, though there are other technologies now that operate similarly to RAID (such as ZFS, Storage Spaces, etc.). This will protect you from a drive failure. It will not, however, protect you from accidental file deletion, theft, flood/natural disaster, etc. The drives have the same file system on them, and thus have a lot of “shared fate”, where if something happens to one, it can happen to the other.

To put it simply, while there are some scenarios where your data is protected by data resiliency (drive failure), there are scenarios where it isn’t (flood, theft).

RAID is not backup.

Data Backup

One of the maxims we have in the IT industry in which I’ve worked for the past 20 years is RAID is not backup. As stated in the previous section, there are scenarios where RAID will not keep your data safe. What will make it safer in the short term is to have a good backup solution. Data backup is not generally a long-term solution, but it is something that’s good to have.

A data backup is a mechanism where files are copied from your active environment to a non-active environment. Probably the best general backup mechanism I’ve seen is Time Machine from Apple. You can designate a drive, typically an external one, and the system automatically backs up files to that drive. You can browse the history of your file and file systems and retrieve something you deleted months ago.

There are lots of cloud solutions now, where your data is backed up to a cloud service like Dropbox, Backblaze, etc. Short term, I like these solutions. I do not like them for long term solutions.

I don’t like them for archive.

Backup is not archive.


Data Archive

Archive is probably what most of us really want long term. Our treasured photos, memories, projects, etc., we want to keep them forever. Not only do we want them to last our entire lifetime, we want to be able to pass them to our heirs.

Over the years and decades, your data will have different homes. Multiple drives or even arrays, copied from one to the other.

I don’t like any backup solutions for archive, as backup solutions are too tied to a particular platform. The best archive solution is putting your files in a plain file structure.

For photos, I prefer having the JPEGs, raw files, HEIFs, etc., just in file systems. I don’t like them stored in photo management systems like Apple Photos or Adobe Lightroom. These systems change/evolve over time, and it can make accessing them a decade from now difficult. I’ve run into this with Apple iPhoto, which transitioned to Photos a few years ago. Photos will convert an older iPhoto library into Photos, but it’s not always perfect. It’s just much easier to have the basic files in a basic file structure.

These files will be copied onto multiple hard drives so there are multiple copies, and moved every few years (about 5 years or so) since hard drives have a limited life span.
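
One thing that can help when moving an archive from one drive to the next is checking that the copy really matches the original. The following is a minimal Python sketch of one way to do that with SHA-256 checksums. Checksumming is my suggestion rather than something the approach above requires, and the function names and example paths are hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(root) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB blocks so large raw files don't fill memory.
                for block in iter(lambda: f.read(1024 * 1024), b""):
                    digest.update(block)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_copies(original, copy) -> list:
    """Return relative paths that are missing from the copy or differ in content."""
    source = build_manifest(original)
    target = build_manifest(copy)
    return sorted(name for name in source if target.get(name) != source[name])


# Example (hypothetical mount points):
# problems = compare_copies("/Volumes/archive-old", "/Volumes/archive-new")
# print(problems or "copy matches")
```

An empty list means every file on the old drive exists on the new one with identical contents.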

Archive can often be associated with backup, but I like to keep the two distinct, as I feel there are different strategies between them.


There are a lot more details that go into these three concepts of course, but I hope this will get you thinking about your long-term plan for your treasured files.

Wow: NVMe and PCIe Gen 4

Recently it’d come to my attention that my old PC rig wasn’t cutting it.

Considering it was 10 years old, it was doing really well. I mean, I went from HDD to 500 GB SSD to 1 TB SSD, upped the RAM, and replaced the GPU at least once. But still, it was a 4-core system (8 threads) and it had performed admirably.

The Intel NIC was needed because the built-in ASUS Realtek NIC was a piece of crap, only able to push about 90 MB/s. The Intel NIC was able to push 120 MB/s (close to the theoretical max for 1 Gigabit which is 125 MB/s).

The thing that broke the camel’s back, however, was video. Specifically 4K video. I’ve been doing video edits and so forth in 1080p, but moving to 4K and the power of Premiere Pro (as opposed to iMovie) was just killing my system. 1080p was a challenge, and 4K made it keel over.

I tend to get obsessive about new tech purchases. My first flat screen TV purchase in 2006 was the result of about a month of in-depth research. I pore over specs and reviews for everything from parachutes (btw, did you know I’m a skydiver?) to RAM.

Eventually, here’s the system I settled on:

AMD came out of nowhere and launched third-generation Ryzen, which took AMD from a budget has-been to a major contender in the desktop world. Plus, they were the first to come out with PCIe Gen 4.0, which allows each lane of PCIe to give you 2 GB/s of bandwidth. M.2 drives can connect to 4 lanes, giving a possible throughput of 8 GB/s.

Compare that with SATA 3, at 600 MB/s, and that’s quite a difference. SATA is fine for spinning rust, but it’s clear NVMe is the only way to unlock SSD storage’s potential.

When I built the system, I initially installed Linux (CentOS 7.6, to be exact) just to run a few benchmarks. I was primarily interested in the NVMe drive and the throughput I could expect. The drive advertises 5 GB/s reads and 4.3 GB/s writes.

Using dd if=/dev/zero of=testfile with various block sizes and counts to write a 100 GB file, I was able to get about 2.8 GB/s writes. Not quite what the drive had promised in terms of writes, but much better than the 120. I was able to get about 3.2 GB/s reads.

For various reasons (including that while Linux is a fantastic OS in lots of regards, it still sucks on the desktop, especially for my particular needs) I loaded up Windows 10. CrystalDiskMark is a good free benchmark and I was able to test my new NVMe drive there.

I ran it, thinking I’d get the same results as on Linux. Nope!

I got pretty much what the drive promised.

As a comparison, here’s how my old SATA SSD fared:

About 10x performance. Here are a few takeaways:

  • PCIe 4 does matter for storage throughput. Would I actually notice the difference between PCIe 3 and PCIe 4 in my day-to-day operations? Probably not. But I’m working with 4K video and some people are already working with 6K and even 8K video, and that’s not too far down the line for me.
  • SATA is dead for SSD storage. The new drives are more than capable of utterly overwhelming SATA 3 (600 MB/s, LOL). Right now, SATA is sufficient for HDDs, but as platters get bigger sequential reads will continue to climb.
  • I don’t doubt that Linux can do the same, it’s just that my methodology failed me. The dd command from /dev/zero had never failed to be the best way to test write speeds for HDDs and SATA SSDs, but now I need to find another method for Linux (or perhaps there is some type of bottleneck in Linux).
  • New PCIe 4 NVMe SSDs are super fast and can be had for a relatively low amount of money ($180 USD for 1 TB). They’re insanely fast.
  • I need a new way to benchmark Linux storage.
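
One possible reason for the gap is that a single dd process issues one write at a time, while benchmark tools typically keep several requests in flight. The Python sketch below is one rough way to test that idea: it writes several files at once from separate threads and reports the aggregate rate. It has not been run against the drive above, the function names and file names are made up, and a dedicated tool such as fio is likely a better long-term answer.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK = os.urandom(1024 * 1024)  # 1 MiB of incompressible data, reused for every write


def write_file(path: str, total_bytes: int) -> int:
    """Write total_bytes to path in 1 MiB blocks, then flush it to the device."""
    written = 0
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            chunk = BLOCK[: min(len(BLOCK), total_bytes - written)]
            written += f.write(chunk)
        os.fsync(f.fileno())  # make sure the data has left the page cache
    return written


def parallel_write_mb_per_s(directory: str, workers: int, bytes_per_worker: int) -> float:
    """Write one file per worker concurrently and return the aggregate MB/s."""
    paths = [os.path.join(directory, f"bench-{i}.tmp") for i in range(workers)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        totals = list(pool.map(write_file, paths, [bytes_per_worker] * workers))
    elapsed = time.perf_counter() - start
    for path in paths:
        os.remove(path)  # clean up the temporary benchmark files
    return sum(totals) / elapsed / 1e6


# Example (hypothetical mount point, 4 writers of 8 GiB each):
# print(parallel_write_mb_per_s("/mnt/nvme", workers=4, bytes_per_worker=8 * 1024**3))
```

Comparing the result with one worker against four or eight would show whether a single writer is the limit, or whether something else is going on.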

For ESXi: Realtek NICs Are Awful And Don’t Use Them

OK, this isn’t really a controversial opinion. This is more of a guide for those who run into these problems when trying to set up their first whitebox/homelab systems for ESXi.

So it goes something like this: You’ve got an old desktop, gaming rig, or workstation. You decide you’ll retire it to your home data center (or basement, or laundry room) as a hypervisor. ESXi by itself (no vSphere controller) is free, and here’s how to download and get the license key.

For most desktop/workstation type of hardware, you can install ESXi from the general ESXi installer except for one aspect: Many of these types of systems use Realtek, Marvell, or other desktop/consumer grade NICs, and there’s not an ESXi driver for these. And for good reasons: They suck.

So you have the choice: try to use a special custom ISO installer with the Realtek/Marvell/etc. driver loaded, or buy a different NIC. In most of IT, there’s usually more than one right answer, and a heaping dose of “it depends”. However, for this particular question (Realtek or buy another NIC) there’s only one right answer: buy another NIC.

Realtek NICs suck. They don’t perform well, they’re a pain to work with for ESXi, so just buy a NIC. The other desktop NICs don’t fare much better. If it’s not recognized by ESXi, it’s a pretty good bet it’s shit.

You can get a one or two port Intel Pro 1000 NIC on eBay for $20-30 USD. These NICs work great. I’ve even replaced the Realtek NIC on my Windows 10 Pro workstation and went from 700 Mbps to fully saturating a gigabit NIC for file transfers. (Make sure they’re Intel Server NICs, the Pro NICs, and not the desktop NICs.)

For $20-30 additional, you can install ESXi on just about any desktop or workstation hardware with the standard ESXi installer. I’m sure there are edge cases, but for me desktop/workstation plus Intel Pro NIC has worked fine.

Certification Exam Questions That I Hate

In my 11-year career as an IT instructor, I’ve had to pass a lot of certification exams. In many cases not on the first try. Sometimes for fair reasons, and sometimes, it feels, for unfair reasons. Recently I had to take the venerable Cisco CCNA R&S exam again. For various reasons I’d allowed it to expire, and hadn’t taken many exams for a while. But recently I needed to re-certify, which reminded me of the whole process.

Having taken so many exams (50+ in the past 11 years) I’ve developed some opinions on the style and content of exams.


In particular, I’ve identified some types of questions I utterly loathe for their lack of aptitude measurement, uselessness, and overall jackassery. Plus, a couple of styles that I like.

This criticism applies to all certification exams, from various vendors, and isn’t even limited to IT.

To Certify, Or Not To Certify

The question of the usefulness of certification is not new.

On one hand, you have a need to weed out the know-its from the know-it-nots, a way to effectively measure a person’s aptitude in a given subject. A certification exam, in its purest form, is meant to probe the knowledge of the applicant.

On the other hand, you have an army of test-dumping dullards, passing exams and unable to explain even basic concepts. That results in a cat-and-mouse game between the exam creators and the dump sites.

And mixed in, you have a barrage of badly formed questions that are more appropriate to your local pub’s trivia night than to a professional aptitude measurement.

So in this article I’m going to discuss the type of questions I despise. Not just because they’re hard, but because I can’t see how they accurately or fairly judge a person’s aptitude.

Note: I made all of these questions up. As far as I know, they do not appear on any certification exam from any vendor. This is not a test-dump. 

Pedantic Trivia

The story goes that Albert Einstein was once asked how many feet are in a mile. His response was this: “I don’t know, why should I fill my brain with facts I can find in two minutes in any standard reference book?”


I really relate to Einstein here (we’re practically twinsies). So many exam questions I’ve sat through were pure pedantic trivia. The knowledge of the answer had no bearing on the aptitude of the applicant.

Here’s an example, similar to ones I recall on various exams:

What is the order of ink cartridges in your printer? Choose one.

A: Black, Magenta, Cyan, Yellow

B: Yellow, Cyan, Magenta, Black

C: Magenta, Cyan, Black, Yellow

Assuming you have a printer with color cartridges, can you remember the order they go in? Do you care? Does it matter? Chances are there’s a diagram to tell you where to put them.

Some facts are so obscure they’re not worth knowing. That’s why reference sources are there.

I can even make the same argument about certain details of things you use regularly in your job. Take VRRP for example. For network administrators, VRRP and similar protocols are a way to have two or more routers available to answer for a single IP address, increasing availability. This is a fundamental networking concept, one that any network administrator should know.

VRRP uses a concept known as a vMAC. This is a MAC address that sits with the floating IP address, together making a virtual router that can move between physical routers.

So far, everything I’ve described about VRRP (and much more that I haven’t) would be fair game for test questions. But a question that I think is useless is the following:

The vMAC for VRRP is (where XX is the virtual router ID): 

A: 00:01:5A:01:00:XX

B: 00:00:5A:01:00:XX

C: 00:01:5E:00:FF:XX

D: 00:00:5E:00:01:XX

I’m willing to bet that if you ask 10 good CCIEs what the vMAC address of a VRRP virtual router is, none would be able to recite it. Knowledge of this address has no bearing on your ability to administer a network. How VRRP works is important to understand, but this minutiae is useless.
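This is the kind of detail that is easier to encode once, or look up, than to memorize. A minimal sketch, assuming the IPv4 virtual MAC prefix from the VRRP RFCs; the function name is made up for illustration:

```python
def vrrp_virtual_mac(vrid: int) -> str:
    """Return the IPv4 VRRP virtual MAC for a virtual router ID (1-255)."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    # The first five octets are fixed; the last octet is the VRID in hex.
    return f"00:00:5E:00:01:{vrid:02X}"


if __name__ == "__main__":
    print(vrrp_virtual_mac(10))
```

Nobody needs this in their head; it fits in a reference page or a four-line helper.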


I have two theories where these questions come from.

Theory 1: I’ve written test questions (for chapter review, I don’t think I’ve written actual certification questions) and I know it’s difficult to come up with good questions. Test banks are often in the hundreds, and it can be a slog to make enough. Trivia questions are easy to come up with and easy to verify.

Theory 2: Test dumpers. In the cat and mouse game between test writers and test dumpers, vendors might feel the need to up the difficulty level because pass rates get too high (which I think only hurts the honest people).


Exact Commands

Another one I really despise is when a question asks you for the exact command to do something. For example:

Which command will send the contents of one directory to a remote server using SSH?

A: tar -cvf  – directory | ssh root@ “cd /home/user/; tar -xvf -” 

B: tar -xvf – directory | ssh root@ “cd /home/user/; tar -xvf -” 

C: tar -cvf  – directory > ssh root@ “cd /home/user/;  tar -cvf -” 

D: ssh root@ “cd /home/user/ tar -xvf -” > tar -xvf directory

For common tasks, such as deleting files, that’s probably fair game (though not terribly useful). Most CLIs (IOS, Bash, PowerShell) have tab completion, help, etc., so any command syntax can be looked up. Complex pipes like the one above are the kind I use with some regularity, but I often have to look them up.


The Unclear Questions

I see these in certification tests all the time. It’ll be a question like the following:

What are some of the benefits of a pleasant, warm, sunny day? (Choose Three)

  • A: Vitamin D from sunlight
  • B: Ability to have a picnic in a park
  • C: No need for adverse weather clothing
  • D: Generally improves most people’s disposition

Look at those answers. You could make an argument for any of the four, though the question is looking for three. They’re all pretty correct. Reasonable people, even intelligent, experienced people, can disagree on what the correct answer is.

Questions I Do Like

I try not to complain about something if I don’t have something positive to contribute. So here’s my contribution: These are test questions that I think are more than fair. If I don’t know the answers to these types of questions, I deserve, in every sense of fairness, to get the question wrong.

Scenario Questions

A scenario question is something like this: “Given X, what would happen”.

For example, if a BPDU was received on a PortFast-enabled interface, what would happen?


If a host with an IP netmask combo of was to try to communicate with a host configured on the same Layer 2 segment with an IP address of, would they be able to communicate? 

I like those types of questions because they test your understanding of how things work. That’s far more important for determining competency I think.
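The second scenario boils down to a check anyone can reason through: does each host consider the other one to be on its own subnet? Here is a rough sketch using Python’s standard `ipaddress` module. The addresses are hypothetical stand-ins (not the ones from the question), the function names are made up, and the check deliberately ignores gateways, proxy ARP, and anything else that could paper over a mismatch.

```python
import ipaddress


def sees_as_local(src_interface: str, dst_ip: str) -> bool:
    """True if a host configured as src_interface (address/prefix) treats dst_ip as on-link."""
    network = ipaddress.ip_interface(src_interface).network
    return ipaddress.ip_address(dst_ip) in network


def can_talk_directly(a_interface: str, b_interface: str) -> bool:
    """Both hosts must see each other as local to exchange traffic without a router."""
    a = ipaddress.ip_interface(a_interface)
    b = ipaddress.ip_interface(b_interface)
    return sees_as_local(a_interface, str(b.ip)) and sees_as_local(b_interface, str(a.ip))


if __name__ == "__main__":
    # Hypothetical hosts on the same Layer 2 segment with mismatched netmasks.
    print(can_talk_directly("10.0.0.5/24", "10.0.0.200/25"))
    print(can_talk_directly("192.168.1.10/24", "192.168.1.20/24"))
```

In the first pair, the /24 host thinks the other host is local, but the /25 host disagrees and would send its replies to a gateway instead. Working that out takes understanding, not memorization.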

There are some network basics that might seem like trivia, but that are important to know. For example:

What is the order of a TCP handshake?





This question is fundamental to the operations of networks, and I would hope any respectable network engineer would know this. This would be important for TCP dump analysis, and other fundamental troubleshooting.


If you write test questions, ask yourself: Would the best people doing what this question tests get this answer right? Is it overly pedantic? Is there a clear answer? 

This was mostly written as a frustration piece. But I think I’m not alone in this frustration.

A Discussion On Storage Overhead

Let’s talk about transmission overhead.

For various types of communications protocols, ranging from Ethernet to Fibre Channel to SATA to PCIe, there are typically additional bits that are transmitted to help with error correction, error detection, and/or clock sync. These additional bits eat up some of the bandwidth, and are referred to generally as just “the overhead”.

1 Gigabit Ethernet, 8 Gigabit Fibre Channel, and SATA I, II, and III all use 8/10 overhead, which means that for every eight bits of data, an additional two bits are sent.

The difference is who pays for those extra bits. With Ethernet, Ethernet pays. With Fibre Channel and SATA, the user pays.

1 Gigabit Ethernet has a data rate of 1 gigabit per second. However, the actual transmission rate (baud, the rate at which raw 1s and 0s are transmitted) for Gigabit Ethernet is 1.25 gigabaud. This is to make up for the 8/10 overhead.

SATA and Fibre Channel, however, do not up the baud rate to accommodate the 8/10 overhead. As such, even though 1,000 Megabits / 8 bits per byte = 125 MB/s, Gigabit Fibre Channel only provides 100 MB/s. 25 MB/s is eaten up by the extra 2 bits in the encoding. The same is true for SATA. SATA III is capable of transmitting at 6 Gigabits per second, which is 750 MB/s. However, 150 MB/s of that is eaten up by the extra 2 bits, so SATA III can transmit 600 MB/s instead.
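The arithmetic above is easy to check. The following is a small sketch rather than anything official: the function name and parameters are my own, and it uses the round line rates from this article (1.25 gigabaud for Gigabit Ethernet, 1 for Gigabit Fibre Channel, 6 for SATA III).

```python
def payload_mb_per_s(line_rate_gbaud: float, data_bits: int = 8, coded_bits: int = 10) -> float:
    """Usable megabytes per second after encoding overhead is removed.

    line_rate_gbaud is the raw rate on the wire; data_bits/coded_bits is the
    fraction of transmitted bits that carry actual data (8 of every 10 here).
    """
    payload_bits_per_s = line_rate_gbaud * 1e9 * data_bits / coded_bits
    return payload_bits_per_s / 8 / 1e6  # bits -> bytes -> megabytes


if __name__ == "__main__":
    for name, gbaud in [("Gigabit Ethernet", 1.25),
                        ("Gigabit Fibre Channel", 1.0),
                        ("SATA III", 6.0)]:
        print(f"{name}: {payload_mb_per_s(gbaud):.0f} MB/s")
```

Ethernet comes out at 125 MB/s because the line rate was raised to cover the overhead; the other two come out at 100 and 600 MB/s because it was not.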


There’s a new type of raw data transmission hitting the networking world called PAM 4. Right now it’s used in 400 Gigabit Ethernet. 400 Gigabit Ethernet is 4 channels of 50 Gigabit links. You’ll probably notice the math on that doesn’t check out: 4 x 50 = 200, not 400. That’s where PAM 4 comes in: The signal rate is still 50 gigabaud, but instead of the signal switching between two possible values (0, 1), it switches between 4 possible values (0, 1, 2, 3). Thus, each clock cycle can represent 2 bits of data instead of 1 bit of data, doubling the transmission rate.
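The relationship is just bits per symbol = log2(signal levels). A quick sketch with the numbers from the paragraph above (4 lanes at 50 gigabaud); the function name is mine, and encoding and error-correction overhead are ignored to keep the math visible.

```python
import math


def link_gbps(lanes: int, gbaud_per_lane: float, signal_levels: int) -> float:
    """Aggregate bit rate for a multi-lane link, ignoring encoding/FEC overhead."""
    bits_per_symbol = math.log2(signal_levels)  # 2 levels -> 1 bit, 4 levels -> 2 bits
    return lanes * gbaud_per_lane * bits_per_symbol


if __name__ == "__main__":
    print("Two-level signaling:", link_gbps(4, 50, 2), "Gbit/s")
    print("PAM 4:", link_gbps(4, 50, 4), "Gbit/s")
```

Same baud rate, same lane count, twice the bits.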

Higher Level Protocol Overhead

For networking storage on Ethernet, there’s also additional overhead for IP, TCP/UDP, and possibly others (VXLAN for example). In my next article, I’ll talk about why they don’t really matter that much.

A Primer for Home NAS Storage Speed Units and Abbreviations

One of the most common points of confusion I see with regard to storage is how speed is measured.

In tech, there are some cultural conventions about which units speeds are discussed in.

  • In the networking world, we measure bits per second
  • In the storage and server world, we measure speed in bytes per second

Of course they both say the same thing, just in different units. You could measure bytes per second in the networking world and bits per second in the server/storage world, but it’s not the “native” method and could add to confusion.

For NAS, we have a bit of a conundrum in that we’re talking about both worlds. So it’s important to communicate effectively which method you’re using to measure speed: bits or bytes.

Generally speaking, if you want to talk about Bytes, you capitalize the B. If you want to talk about bits, the b is lower case. I.e. 100 MB/s (100 Megabytes per second) and 100 Mbit or Mb (100 Megabit per second).

This is important because there are 8 bits in a byte, so the difference in speed is pretty stark depending on whether you’re talking about bits per second or bytes per second. Examples:

  • 200 Mb/s is written to mean 200 Megabits per second
  • 200 MB/s is written to mean 200 Megabytes per second

Again, the speed difference is pretty stark:

  • 200 Mb/s (Megabits per second, about 1/5th of the total rate available on Gigabit Ethernet) = 25 Megabytes per second
  • 200 MB/s (Megabytes per second, almost double what a Gigabit Ethernet link could send) = 1.6 Gigabits/second

200 Mb/s easily fits in a Gigabit Ethernet link. 200 MB/s is more than a Gigabit Ethernet link could handle.
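If you want to sanity check these conversions yourself, here is a tiny sketch. The function names are made up for illustration, and it only deals in decimal mega units.

```python
BITS_PER_BYTE = 8


def megabits_to_megabytes(mbit_per_s: float) -> float:
    """Mb/s -> MB/s."""
    return mbit_per_s / BITS_PER_BYTE


def megabytes_to_megabits(mbyte_per_s: float) -> float:
    """MB/s -> Mb/s."""
    return mbyte_per_s * BITS_PER_BYTE


def fits_on_link(mbit_per_s: float, link_mbit_per_s: float = 1000) -> bool:
    """Does the stream fit on the link (Gigabit Ethernet by default)?"""
    return mbit_per_s <= link_mbit_per_s


if __name__ == "__main__":
    print(megabits_to_megabytes(200))                 # 200 Mb/s in MB/s
    print(megabytes_to_megabits(200))                 # 200 MB/s in Mb/s
    print(fits_on_link(200))                          # 200 Mb/s on Gigabit Ethernet
    print(fits_on_link(megabytes_to_megabits(200)))   # 200 MB/s on Gigabit Ethernet
```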


It’s generally acceptable to write bits per second as Xb, Xbit, Xbit/s, and Xbps, where X is the multiplier prefix (Mega, Giga, Tera, etc.)

The following are examples of 1.21 Gigabits per second:

  • 1.21 Gbps
  • 1.21 Gb/s
  • 1.21 Gbit/s

It’s generally acceptable to write bytes per second as XB, XByte, XByte/s, and XBps, where X is the multiplier (Mega, Giga, Tera, etc.)

The following are examples of 1.21 Gigabytes per second:

  • 1.21 GBps (less common)
  • 1.21 GB/s
  • 1.21 GByte/s

A Gigabit Ethernet interface can theoretically handle 125 MB/s (1,000 Mbit / 8 bits per byte = 125). Depending on your NIC, horsepower, and systems, you may or may not be able to reach that. But that’s the theoretical limit for Gigabit Ethernet.

10 Gigabit Ethernet (10GE) can theoretically handle 1250 MB/s (10,000 Mbit / 8 bits per byte).
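Since the only thing separating bits from bytes in these abbreviations is the case of one letter, a parser makes the convention concrete. This is a rough sketch under my own assumptions: the regular expression, the names, and the set of accepted spellings are illustrative and cover only the forms listed above, with decimal prefixes.

```python
import re

# Decimal multipliers only; binary prefixes are a separate topic.
PREFIXES = {"k": 1e3, "K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}

# Case matters: "b"/"bit" means bits, "B"/"Byte" means bytes.
_RATE = re.compile(r"^\s*([\d.]+)\s*([kKMGT])(b|B|bit|Byte)(?:/s|ps)?\s*$")


def to_bits_per_second(text: str) -> float:
    """Convert strings like '200 Mb/s' or '1.21 GByte/s' to bits per second."""
    match = _RATE.match(text)
    if match is None:
        raise ValueError(f"unrecognized rate: {text!r}")
    value, prefix, unit = match.groups()
    bits = float(value) * PREFIXES[prefix]
    if unit in ("B", "Byte"):
        bits *= 8  # bytes -> bits
    return bits


if __name__ == "__main__":
    for rate in ["200 Mb/s", "200 MB/s", "1.21 Gbps", "1.21 GBps", "1.21 GByte/s"]:
        print(f"{rate:>14} = {to_bits_per_second(rate) / 1e9:.2f} Gbit/s")
```

The same number with a capital B comes out eight times larger, which is the whole point of being careful about it.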

Binary Multipliers

There’s also KiB (kibibyte) and Kib (kibibit), where kibi is a 1024 multiplier, and not 1,000. GiB (gibibyte) and TiB (tebibyte) are 1024³ and 1024⁴ bytes, respectively.

The idea is to be native to the binary numbers, rather than multiples of 10 (decimal).

We don’t tend to use those measurements in network or storage transmit/receive rates, but it’s showing up more and more in raw storage measurements.
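The place most people run into this is drive capacity: a drive sold in decimal terabytes looks smaller when a tool reports it in binary units. A small sketch, with names of my own choosing and an 8 TB drive as the example:

```python
TB = 1000 ** 4   # terabyte (decimal)
TIB = 1024 ** 4  # tebibyte (binary)


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return terabytes * TB / TIB


if __name__ == "__main__":
    print(f"8 TB is about {tb_to_tib(8):.2f} TiB")
```

Nothing went missing; it is the same number of bytes divided by a bigger unit.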


SATA I, II, and III are 1.5, 3, and 6 Gigabits/second respectively. They push 150, 300, and 600 MB/s respectively. You’ll probably note that math doesn’t check out: 6 Gigabits/second divided by 8 bits in a byte is 750 MB/s, not 600 MB/s, so where did the extra 150 MB/s go? I’ll cover that in the next article.


Microsoft Storage Spaces Is Hot Garbage For Parity Storage

I love parity storage. Whether it’s traditional RAID 5/6, erasure coding, raidz/raidz2, whatever. It gives you redundancy on your data without requiring double the drives that mirroring or mirroring+striping would require.

The drawback is write performance is not as good as mirroring+striping, but for my purposes (lots of video files, cold storage, etc.) parity is perfect.

In my primary storage array, I use double redundancy on my parity, so effectively N+2. I can lose any 2 drives without losing any data.

I had a simple Storage Spaces mirror on my Windows 10 Pro desktop which consisted of (2) 5 TB drives using ReFS. This had four problems:

  • It was getting close to full
  • The drives were getting old
  • ReFS isn’t supported anymore on Windows 10 Pro (you need Windows 10 Pro for Workstations)
  • Dropbox (which I use extensively) is dropping support for ReFS-based file systems.

ReFS had some nice features such as checksumming (though for data checksumming, you had to turn it on), but given the type of data I store on it, the checksumming isn’t that important (longer-lived data is stored either on Dropbox and/or my ZFS array). I do require Dropbox, so back to NTFS it is.

I deal with a lot of large files (video, cold-storage VM virtual disks, ISOs, etc.) and parity storage is great for that. For boot volumes, OS, applications, and other latency-sensitive operations, it’s SSD or NVMe all the way. But the bulk of my storage requirements is, well, bulk storage.

I had a few more drives from the Best Buy Easystore sales (8 TB drive, related to the WD Reds, for about $129 during their most recent sale) so I decided to use three of them and create myself a RAID 5 array (I know there are objections to RAID 5 these days in favor of RAID 6, while I agree with some of them, they’re not applicable to this workload, so RAID 5 is fine).

So I’ve got 3 WD Easystore shucked drives. Cool. I’ll create a RAID 5 array.


Shit. Notice how the RAID-5 section is grayed out? Yeah, somewhere along the line Windows removed the ability to create RAID 5 volumes in their non-server operating systems. Instead Microsoft’s solution is to use the newer Storage Spaces. OK, fine. I’ll use storage spaces. There’s a parity option, so like RAID 5, I can do N+1 (or like RAID 6, N+2, etc.).

I set up a parity storage space (the UI is pretty easy) and gave it a quick test. At first, it started sending at 270 MB/s, then it dropped off a cliff to… 32 MB/s.



That’s it. 32 MB/s. What. The. Eff. I’ve got SD cards that can write faster. My guess is that some OS caching was allowing it to copy at 270 MB/s (the hard drives aren’t capable of 270 MB/s). But the hard drives ARE capable of far more than 32 MB/s. Tom’s Hardware found the Reds capable of 200 MB/s sequential writes. I was able to get 180 MB/s with some file copies on a raw NTFS formatted drive, which is in line with Tom’s Hardware’s conclusion.

Now, I don’t need a whole lot of write performance for this volume. And I pretty much only need it for occasional sequential reads and writes. But 32 MB/s is not enough.

I know what some of you are thinking. “Well Duh, RAID 5/parity is slower for writes because of the XOR calculations”.

I know from experience on similar (and probably slower) drives, that RAID 5 is not that slow, even on spinning disks. The XOR calculations are barely a blip in the processor for even halfway modern systems. I’ve got a Linux MD RAID system, with 5 drives and I can get ~400 MB/s of writes (from a simple dd write test).
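To show how little work the parity math really is, here is a toy sketch of XOR parity across three data blocks, plus a rebuild of a lost block. It is a simplification I made up for illustration: real RAID 5 works on much larger stripes and rotates parity across the drives, and nothing here reflects how MD RAID or Storage Spaces is actually implemented.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]  # blocks on three data drives
    parity = xor_blocks(data)           # block on the parity drive

    # Pretend the second drive died: XOR the survivors with the parity block.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])
```

One XOR per byte per drive is about as cheap as an operation gets.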

While it’s true RAID 5 writes are slower than, say, RAID 10, they’re not that slow. I set up a RAID 5 array on a Windows Server 2016 machine (more on that later) using the exact same drives, and it was able to push 113 MB/s.


It might have been able to do more, but it was limited by the bottleneck of the Ethernet connection (about 125 MB/s) and the built-in Dell NIC. I didn’t have an SSD to install Windows Server 2016 on and had to use an HDD that was slower than the drives the RAID 5 array was built with, so that’s the best I could do. Still, even if that was the maximum, I’ll be perfectly happy with 113 MB/s for sequential writes.

So here’s where I got crafty. The reason I had a Windows 2016 server was that I thought if I created a RAID 5 volume in Windows 2016 (which you can) I could simply import the volume into Windows 10 Pro.

Unfortunately, after a few attempts, I determined that that won’t work.


The volume shows failed and the individual drives show failed as well.

So now I’m stuck with a couple of options:

  • Fake RAID
  • Drive mirroring
  • Parity but suck it up and deal with 32 MB/s
  • Parity and buy a pair of small SSDs to act as cache to speed up writes
  • Buy a Hardware RAID Card

Fake Hardware RAID

Early on in my IT career, I’d been fooled by fake RAID. Fake RAID is the feature that many motherboards and inexpensive SATA cards offer: You can set up RAID (0, 1, 5 typically) in the motherboard BIOS.

But here’s the thing: It’s not a dedicated RAID card. The RAID operations are done by the general CPU. It has all the disadvantages of hardware RAID (difficult to troubleshoot, more fragile configurations, very difficult to migrate) and none of the advantages (hardware RAID offloads operations to a dedicated CPU on the RAID card, which fake RAID doesn’t have).

For me, it’s more important to have portability of the drives (just pull disks out of one system and into another). So fake RAID is out.

Drive Mirroring

Having tested drive mirroring performance, it’s definitely a better performing option.

Parity with Sucky Performance

I could just suck it up and deal with 32 MB/s. But I’m not going to. I don’t need SSD/NVMe speeds, but I need something faster than 32 MB/s. I’m often dealing with multi-gigabyte files, and 32 MB/s is a significant hindrance to that.
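To put a number on that hindrance, here is a quick back-of-the-envelope sketch. The 50 GB file size is a made-up example, the function name is mine, and the two rates are the ones measured above (32 MB/s on Storage Spaces parity, 113 MB/s on the Windows Server RAID 5 test).

```python
def copy_minutes(size_gb: float, rate_mb_per_s: float) -> float:
    """Minutes to copy size_gb (decimal gigabytes) at a sustained rate in MB/s."""
    return size_gb * 1000 / rate_mb_per_s / 60


if __name__ == "__main__":
    size_gb = 50  # hypothetical file size
    for rate in (32, 113):
        print(f"{size_gb} GB at {rate} MB/s: {copy_minutes(size_gb, rate):.1f} minutes")
```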

Parity with SSD Cache

About $50 would get me two 120 GB SSDs. As long as I wasn’t doing a massive copy beyond 120 GB of data, I should get great performance. For my given workload of bulk storage (infrequent reads/writes, mostly sequential in nature) this should be fine. The initial copy of my old mirrored array is going to take a while, but that’s OK.

The trick with an SSD cache is that you have to use PowerShell in order to configure it. The Windows 10 GUI doesn’t allow it.

After some fiddling, I was able to get a Storage Space going with SSD cache.

And… the performance was worse than with the drives by themselves. Testing the drives by themselves, I found that the SSDs had worse sequential performance than the spinning rust. I’d assumed the SSDs would do better, a silly assumption now that I think about it. At least I’m out only $50, and I can probably re-purpose them for something else.

The performance for random I/O is probably better, but that’s not what my workload is on these drives. My primary need is sequential performance for this volume.

Buy A Hardware RAID Card

I don’t like hardware RAID cards. They’re expensive, the software to manage them tends to be really awful, and they make portability of drives a problem. With software RAID, I can pull drives out of one system and put them into another, and voila, the volume is there. That can be done with a hardware RAID card, but it’s trickier.

The performance benefit that they provide is just about gone too, given how fast modern CPUs are and how many cores they have, compared to the relatively slow CPUs on hardware RAID cards (typically less than a GHz, and only one or two cores).


So in the end, I’m going with a mirrored pair of 8 TB drives, and I have two more drives I can add when I want to bring the volume to 16 TB.
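For comparison, here is the capacity trade-off between the layouts I considered. It is a simplified sketch with function names of my own: it assumes equal-sized drives and ignores file system and metadata overhead.

```python
def parity_usable_tb(drives: int, size_tb: float, parity_drives: int = 1) -> float:
    """Usable capacity for N+1 (RAID 5 style) or N+2 (RAID 6 style) parity."""
    if drives <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drives - parity_drives) * size_tb


def mirror_usable_tb(drives: int, size_tb: float, copies: int = 2) -> float:
    """Usable capacity when every block is stored `copies` times."""
    return drives // copies * size_tb


if __name__ == "__main__":
    print("3 x 8 TB parity (N+1):", parity_usable_tb(3, 8), "TB")
    print("2 x 8 TB mirror:", mirror_usable_tb(2, 8), "TB")
    print("4 x 8 TB mirror:", mirror_usable_tb(4, 8), "TB")
```

Three drives in parity would have given me the same 16 TB that the mirror needs four drives to reach, which is exactly why the parity performance is so disappointing.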

Thoughts On Why Storage Spaces Parity Is Such Hot Fucking Garbage

There’s a pervasive thought in IT that parity storage is very slow unless you have a dedicated RAID card. While probably true at one time, much like the jumbo frame myth, it’s no longer true. A halfway modern CPU is capable of dozens of Gigabytes per second of RAID 5/6 or whatever parity/erasure coding. If you’re just doing a couple hundred megabytes per second, it’s barely a blip in the CPUs.

It’s the reason huge honking storage arrays (EMC, Dell, NetApp, VMware VSAN etc.) don’t do RAID cards. They just (for the most part) throw x86 cores at it through either scale-up or scale-out controllers.

So why does Storage Spaces parity suck so bad? I’m not sure. It’s got to be an implementation problem. It’s definitely not a CPU bottleneck. It’s a shame too, because it’s very easy to manage and more flexible than traditional software RAID.


Tried parity in storage spaces. It sucked bigtime. Tried other shit, didn’t work. Just went with mirrored.