It’s relatively easy to enable vSAN on our Nested lab, so let’s take a short detour.
I originally set up my nested lab with vSAN but immediately saw very low disk I/O performance of about 2 IOPS, so I switched to NFS storage. Even NFS had sub-par performance, as my previous posts showed, leading to the discovery of extensive dropped network packets caused by numerous issues. I’ve now rebuilt the lab from the ground up and solved the dropped packets, so it’s time to show that vSAN works nested as well.
I should repeat that nested vSAN / ESXi is for lab use only and is not supported by VMware for production use.
vSAN with all-flash disks supports erasure coding:
- RAID-5 ( 4 hosts minimum )
- RAID-6 ( 6 hosts minimum )
So our lab design for vSAN requires:
- Add 1 nested ESXi host for a total of 4 hosts.
- Add one SSD for cache and one SSD for capacity per disk group on each host.
( These are backed by hybrid ZFS / NAS storage under the nested ESXi hosts )
- One VMkernel port for vSAN traffic
- A new VLAN 30 and IP subnet for vSAN traffic
- VLAN 30 enabled on the nested ESXi VM uplinks
- VLAN 30 enabled on the physical switch and physical host uplinks
- Install the 4th nested ESXi host per the previous lab posts.
- Install disks for vSAN in each host. I’m using two disk groups on each host.
Add (2) 5GB disks for cache ( cache should be >10% of the capacity tier )
Add (2) 260GB disks for capacity ( must be >255GB )
- On the distributed switch, add a new port group for vSAN with these properties:
Teaming: Active: Uplink 4; Standby: Uplink 3; others unused.
( Note: Uplinks 3 and 4 are shared with NFS and vMotion traffic, while Uplinks 1 and 2 are used for LAN and VXLAN traffic – see the previous design info )
- Create a new VMkernel port on each host in an isolated network / VLAN.
IP: 10.30.1.x ( a new IP for each host in the 10.30.1.0/24 subnet, VLAN 30 )
Assign it to the dvS vSAN port group.
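If you prefer the command line, the same VMkernel setup can be sketched with esxcli on each nested host. This is a sketch, not a tested recipe – the interface name vmk3, the IP, and the dvS port ID are assumptions for illustration; look up your own values in the vSphere client:

```shell
# Create the VMkernel interface on the dvS vSAN port group
# ( dvs-name and dvport-id are hypothetical examples )
esxcli network ip interface add --interface-name=vmk3 \
  --dvs-name=Lab-dvS --dvport-id=100

# Give it a static IP in the vSAN subnet ( VLAN 30 )
esxcli network ip interface ipv4 set -i vmk3 \
  -I 10.30.1.11 -N 255.255.255.0 -t static

# Tag the interface for vSAN traffic, then verify
esxcli vsan network ip add -i vmk3
esxcli vsan network list
```

Repeat on each host, incrementing the IP ( 10.30.1.11, .12, .13, .14 ).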
- On the nested ESXi VM, add VLAN 30 to the allowed VLANs on the 3rd and 4th vmnics, which are the uplinks from the VM.
- On the physical switch, add VLAN 30 to the SAN uplink ports from the physical ESXi host. On my Cisco 2970 switch the commands are:

config t
 interface range GigabitEthernet0/5 , GigabitEthernet0/6
 switchport trunk allowed vlan add 30
 exit
exit

- Then define the new VLAN 30 on the physical switch. Note that on IOS the VLAN name is set under vlan 30, not interface vlan 30, and the 2970 has no per-VLAN MTU – jumbo frames are enabled globally with system mtu jumbo, which takes effect after a reload:

config t
 vlan 30
  name VSAN
 exit
 system mtu jumbo 9000
exit
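To confirm the trunk and VLAN changes took effect, a couple of standard IOS show commands can be used ( output layout will vary by switch model ):

```
show vlan id 30          ! VLAN 30 should be listed as active, named VSAN
show interfaces trunk    ! Gi0/5 and Gi0/6 should now allow VLAN 30
```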
- With all the previous vSAN prerequisites set, just enable vSAN:
Navigate to the nested cluster “Compute”.
Edit Settings, and enable vSAN.
- Mark all (4) disks on each host as flash by selecting a set of disks, then clicking the 4th icon, “Mark as Flash”. Verify the drive type shows “Marked as Flash”.
- Select the 5GB disks as ‘cache’, and the 260GB disks as ‘capacity’.
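Because the nested disks are virtual and not detected as SSDs, the “Mark as Flash” step can also be sketched from the ESXi shell with a SATP claim rule. The device ID below is a made-up example – substitute your own from the device list:

```shell
# List devices to find the naa/mpx identifiers of the nested vSAN disks
esxcli storage core device list

# Add a claim rule that tags the device as SSD ( device ID is hypothetical )
esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL \
  --device=mpx.vmhba0:C0:T1:L0 --option="enable_ssd"

# Reclaim the device so the rule takes effect
esxcli storage core claiming reclaim -d mpx.vmhba0:C0:T1:L0
```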
With this new build, I was curious about performance, so I ran a Storage vMotion into the vSAN datastore.
It appears we are now getting about 100 IOPS and 6 MB/s, which is much better than before with the network drops. This is certainly usable for labs, but not high performance. Why? I can think of two issues:
- 10Gb networking is required for all-flash vSAN with dedup, but we have only 1Gb physical uplinks.
- The underlying storage is not true SSD but a ZFS-backed hybrid NAS. The NAS has a small L2ARC SSD cache for reads and an SSD SLOG for caching writes.
Looking into the vSAN network speeds, I tried running a Storage vMotion into vSAN ( policy RAID-10 ). During the vMotion, I moved the nested ESXi VMs to different physical hosts – and it didn’t affect the storage performance.
The following vSAN performance graphs show max IOPS around 90 and significant latencies growing past 300ms, with no difference whether the nested hosts are on the same physical host or spread out. This suggests the performance issue is related to marking our disks as SSD when they are not really SSDs, but hybrid ZFS-backed storage.
After reconfiguring vSAN in hybrid mode, with HDD capacity disks, performance increased a bit: nearly 2x the IOPS, 3x the throughput, and less than half the latency. So yes, you can run all-flash mode in the lab without true underlying flash disks, but performance will greatly suffer.
In my case the underlying ZFS hybrid flash storage typically had latencies under 5ms, but after adding vSAN load to the lab, the underlying ZFS latencies went up to over 15ms.
So again, the recommendation is to only use all-flash mode with true flash, unless you can tolerate very low IOPS and high latencies on your vSAN.
One reason for the bad write performance above is described here:
This describes the RAID-5 vSAN ‘write amplification’, in which each write requires 2 reads and 2 writes.
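A quick back-of-the-envelope sketch of that amplification – the 100 IOPS figure is the rough frontend number measured above; the rest follows from the 2-reads-plus-2-writes pattern of a RAID-5 partial-stripe write:

```shell
#!/bin/sh
# RAID-5 partial-stripe write: read old data + old parity,
# then write new data + new parity = 4 backend I/Os per frontend write.
FRONTEND_WRITE_IOPS=100
BACKEND_IOS_PER_WRITE=$((2 + 2))    # 2 reads + 2 writes
BACKEND_IOPS=$((FRONTEND_WRITE_IOPS * BACKEND_IOS_PER_WRITE))
echo "Backend I/Os per frontend write: $BACKEND_IOS_PER_WRITE"
echo "Backend IOPS for $FRONTEND_WRITE_IOPS frontend write IOPS: $BACKEND_IOPS"
```

So even a modest 100 write IOPS from the guest turns into roughly 400 backend operations on already-slow nested disks.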