ESX/NSX nested lab – Part II

This part continues to build out NSX on top of the nested ESX cluster created in the previous part, ESX/NSX nested lab – Part I.

Part I included some basic NSX setup for convenience, such as:

  1. NSX Manager installed from OVF, and linked to vCenter.
  2. Physical switch ports for the ESXi uplinks allow vLAN 100 for vxlan.
  3. Distributed switch set to jumbo 9000 MTU ( > 1600 MTU is required for NSX ).
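
As a quick sanity check before installing NSX, the dvS MTU can be confirmed from any host's ESXi shell.  This is a sketch – the switch name in the output will be whatever your dvS is called:

esxcli network vswitch dvs vmware list

The output includes an MTU field, which should report 9000 for the distributed switch described in Part I.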

 

  1. Install NSX on hosts
    1. Create IP pools
      Networking & Security – Groups – IP Pools
      1. Controller-pool  ( in management IP LAN )
        10.24.10.1 – 10.24.10.5    /16
      2. vTEP-pool ( create new unused LAN subnet + vLAN for vxlan )
        10.100.1.10 – 10.100.1.20  /24  vlan: 100,  GW: 10.100.1.1, no DNS
        The vxlan GW is not used here – this would typically be a hardware device that routes vxlan to vLAN.
    2. Set up a new vLAN 100 for the vxlan segment on the physical switch
      Important: Set the vxlan vLAN ( vlan 100 ) MTU to 1600 on the physical switch.
      E.g. for a Cisco 2970 switch:

      config t
      vlan 100
      name vxlan
      mtu 1600
      exit
      exit
    3. Install one Controller ( into NSX-compute cluster )
      For a lab, only a single controller is needed – not three.
      Controllers can run on hosts that are not NSX-prepared ( e.g. in the Management cluster ).
    4. Install NSX to Compute cluster ( with 3 Nested ESX/NSX hosts )
    5. Prepare vxlan on hosts
    6. Setup Segment ID
      1. pool: 5000-5999      ( 1000 segments )
      2. Multicast: 239.1.1.1-239.1.4.254  ( for optional multicast )
    7. Setup transport zone ‘Global’
      1. To include NSX-compute cluster,  Unicast mode
  2. Test vTEP connectivity
    After the NSX install, each host should have a vTEP ( with an IP from the vTEP-pool )
    connected to the autogenerated vxlan portgroup.
    Log in to one of the nested hosts and ping the vTEPs on the other hosts.  Make sure at least one nested host is on a different physical host, to verify physical connectivity.

    1. ssh esx-n1 ( host )
      esxcfg-vmknic -l  ( show vmkernel NICs and IPs ).  Note vmk3 is used for vxlan.

      vmkping ++netstack=vxlan -s 1470 -d -I vmk3 10.100.1.10
      vmkping ++netstack=vxlan -s 1570 -d -I vmk3 10.100.1.11
    2. The 2nd command should succeed, showing that MTU 1600 is working. If only the 1st command pings, then you need to set the MTU on vlan 100.  Ping all combinations of the nested hosts' vTEP IPs ( a loop sketch is shown below ).
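
    To cover all the combinations quickly, a minimal loop like the one below can be run from each nested host's ESXi shell.  This is only a sketch – vmk3 and the vTEP IPs shown are the values from this lab, so substitute whatever NSX actually assigned from the vTEP-pool.

      # Run from each nested host; list the other hosts' vTEP IPs here.
      for ip in 10.100.1.10 10.100.1.11 10.100.1.12; do
        vmkping ++netstack=vxlan -s 1570 -d -I vmk3 $ip
      done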

Links

  1. Nested ESXi 6.5 virtual appliance   (vGhetto)
  2. Nested ESXi LearnSwitch   (vGhetto)
  3. How to clone a nested ESXi VM  (vGhetto)
  4. NSX issue on dvs in nested ESXi  (telecomOccasionally)
  5. Nested Virtualization  (Limitless) installing ESX, trunked portgroup

vSphere / ESXi homelab setup

Introduction

4AERO RACK (2018)

This post documents my evolving home-lab setup as of December 2018.

Since many of my clients are smaller SMBs, they often need hands-on help with physical storage, networking and security in addition to VMware.  Because my background is in software ( not IT ), I needed a lab that uses physical hardware ( e.g. production servers and network equipment ).

Equipment

  • My lab has gone through several iterations, with the following equipment used as of Dec 2018.  This lab is for learning hardware and networking in addition to VMware, so it is a bit more involved.  I try to segregate traffic on vLANs, just like production.
  • I have only 2 physical ESXi servers, but plenty of cores and RAM to run a nested ESXi environment ( described later ).
  • Note that older equipment is often very reasonably priced on eBay.  R630s are now showing up for great prices.  1Gb SFP optics and LC fibre cables are also very inexpensive.
  • Servers
    • Dell R710,  dual socket, 6 cores, 144GB RAM,  4 x 1Gb NIC
    • Dell R630, dual socket, 10 cores, 256GB RAM,  4 x 1Gb NIC, 2 x 10Gb SFP+
  • Shared Storage  ( Running FreeNAS 11.2 )
      • Dell 2950, 2 socket, 4 core, 32GB RAM, 2 x 1Gb NIC
        6 x 4TB LFF WD Red SATA for ZFS pool
        1 x Intel 320, 80GB SSD,  L2ARC ( with power fail caps )
        1 x Intel 3700, 100GB SSD, ZFS SLOG ( with power fail caps )
        See FreeNAS blog entry.

  • Networking
    • 2 x Cisco 2970G-24TS rack mount switches
      24 x 1G ports on each switch
      4 x SFP uplink ports on each switch
    • Note: The 2970G switch in my mobile rack is uplinked to its sister switch in my office over 2 x SFP fibre connections ( LACP on the uplinks, passing 802.1Q vLAN trunks up to the office ).  This was my first hands-on experience with LC fibre and switch-to-switch LAG.
  • Power
    • APC SUA3000RM2U 3000VA UPS
      To reduce chance of disk corruption on FreeNAS due to power interruptions
      Available inexpensively from refurbups.com
  • Backup
    • For static ISO files, templates, etc., I sync FreeNAS NFS shares to Amazon S3.
    • For VMs, I take daily ZFS snapshots and replicate them to cloud storage.
    • Both of these are done from FreeNAS GUI
      ( see FreeNAS backup and replication )
  • Security
    • Perimeter FW –  pfSense VM
      vLAN trunks for WAN, LAN over LAG (fibre) from rack to office.
    • Backup FW –  Ubiquiti EdgeRouter X
      Since the main FW is a VM, this small device is for emergency use if my lab cluster is down and I need to get to the Internet.  The EdgeRouter also supports
      ECMP and BGP for uplinks from an NSX Edge.
  • Remote access
    • Console
      Physical KVM for my older servers  and iDRAC Enterprise for newer ones.
      Note that modern browser security often prevents older iDRAC from working.
    • VPN remote access
      OpenVPN ( part of pfSense server )
  • Rack –  This is an older screw-mount style 4-post mobile rack and KVM found for under $300 on eBay for local pickup.  The rack has several 4-inch fans in the top for cooling ( I added more ).  I also added egg-crate foam to the inside walls, which deadens the sound quite a bit.

Configuration

Network Design

My existing physical lab design now uses a single distributed vSwitch (dvS) with (4) 1G uplinks from each physical host.

Normally you would use a separate dvS for the LAN and SAN segments to keep traffic separate.  However, there are two reasons for using one dvS:

  1. When migrating to 10G converged networking your hosts typically have only 2 uplinks ( and therefore only a single dvS is supported )
  2. When setting up a nested ESXi environment it’s easier to have only a single dvS

Since I’m not yet ready to upgrade my lab to 10G switches,  the current design is a compromise.

dvS uplinks 1 and 2 are dedicated to LAN, DMZ, WAN and VTEP traffic, while uplinks 3 and 4 are for SAN and vMotion.  This keeps the traffic separated, and each pair of uplinks is redundant at the physical level.

Note that for this design each dv-portgroup must be set to use the correct set of uplinks.
For example, a LAN portgroup teaming policy would use Uplink 1 (active) and Uplink 2 (standby), with the other uplinks set to unused.

When migrating later to 10G networking, there would be only 2 redundant uplinks, with all traffic separated by vLAN.  Separating the storage and network traffic is not really needed with 10G, and NIOC can manage it.

Distributed Switch (dvS) Setup

The single dvS and the associated physical switch ( Cisco 2970G-24TS ) are set up as follows:

  • Distributed vSwitch – dvS
    Settings:   switch MTU=9000
    Uplinks ( all traffic on vLAN's )

    • Uplink 1 and Uplink 2
      LAN, DMZ, WAN traffic on vLANs 24, 26 and 902
      Typical physical switch settings for Cisco 2970, uplink 1:

      interface GigabitEthernet0/11
      description ESX2 vmnic0 LAN
      switchport trunk encapsulation dot1q
      switchport trunk allowed vlan 24,26,902
      switchport mode trunk
      spanning-tree portfast trunk

      Repeat the same settings on Gi0/12 for uplink 2.

    • Uplink 3 and Uplink 4
      NFS, vMotion traffic on vLANs 25, 881
      Typical physical switch settings for Cisco 2970, uplink 3:

      interface GigabitEthernet0/5
      description ESX2 vmnic3 SAN,vMot
      switchport trunk encapsulation dot1q
      switchport trunk allowed vlan 25,881
      switchport mode trunk
      spanning-tree portfast trunk

      Repeat the same settings on Gi0/6 for uplink 4.

  • Portgroups settings on dvS
    • dv-LAN
      Settings:   ( Uplinks 1 and 2 used )

      vlan 24
      Teaming:
      Active: Uplink 1
      Standby: Uplink 2
      Unused:   Uplink 3, Uplink 4

    • dv-NFS
      Settings:  ( Uplinks 3 and 4 used )

      vlan 25
      Teaming:
      Active: Uplink 3
      Standby: Uplink 4
      Not used:  Uplink 1, Uplink 2

    • dv-Motion
      Settings:  ( Uplinks 3 and 4 used )

      vlan 881
      Teaming:
      Active: Uplink 3
      Standby: Uplink 4
      Not used:  Uplink 1, Uplink 2

Verify shared storage and vMotion are working correctly.
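
A quick way to confirm the jumbo MTU path end to end is a do-not-fragment vmkping from a host's storage vmkernel to the FreeNAS box.  This is a sketch – vmk1 and 192.168.25.10 are placeholders, so use your own NFS/vMotion vmkernel and the storage server's address on the storage vLAN:

esxcfg-vmknic -l
vmkping -I vmk1 -s 8972 -d 192.168.25.10

The first command lists the vmkernel NICs so you can confirm which vmk carries NFS and that its MTU is 9000; the 8972-byte payload plus IP/ICMP headers exactly fills a 9000-byte frame, so the ping only succeeds if jumbo frames work along the entire path.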

Physical switch vLANs

Best practice is to avoid using the default vLAN (1) on the switch.

ESX management traffic, iDRAC, etc. run on a Management vLAN
configured on the switch.  Storage traffic is on a separate vLAN, 25.  As shown above under ‘dvS Setup’, the Management vLAN 24 is passed over trunks into ESX for the management portgroup.

interface Vlan24
description Mgmt
no ip address
no ip route-cache
!
interface Vlan25
description Storage NFS iSCSI
no ip address
no ip route-cache

FYI – ‘no ip route-cache’ is the default, since this is only a layer-2 switch.
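
To double-check the switch side of the vLAN and trunk configuration, the standard IOS show commands are enough ( Gi0/11 is just the example uplink port from the dvS setup above ):

show vlan brief
show interfaces trunk
show interfaces Gi0/11 switchport

The first confirms the vLANs exist, the second shows which vLANs are allowed and forwarding on each trunk, and the third shows the trunking mode of an individual uplink port.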

Storage notes

I’m using NFS storage in my setup, so I have only a single “NFS” portgroup on the dvS, with an active and a standby uplink for redundancy.  For simplicity I’m not using LACP.

Note that my FreeNAS storage also supports iSCSI, but in that case proper redundancy requires a multipath setup.  This is typically done by using two portgroups ( iSCSI-1, iSCSI-2 ), each one using a specific uplink ( and no standby ).  iSCSI is likely not compatible with LACP uplinks.
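
From the ESXi side, a couple of esxcli commands are handy for confirming the NFS datastore and the vmkernel it rides on.  This is a sketch – datastore and vmk names will differ in your lab:

esxcli storage nfs list
esxcli network ip interface list

The first lists mounted NFS datastores and whether they are accessible; the second shows each vmkernel interface along with its MTU, which should be 9000 for the NFS/vMotion vmks in this design.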


ESX/NSX nested lab – Part I

This part covers the setup of the nested ESX VMs.  The second part will cover nested NSX.

Setup

This section covers what worked for me and the major steps required.   You may need to reference the listed links for more info.

The steps to setup the lab are:

  1. I have two physical ESXi hosts running ESX 6.5 ( typical specs ):
    128GB RAM, 2 sockets, 24 cores, NFS or iSCSI storage
    dvSwitch v6.5.0
    Note: NSX vibs not installed on the 2 physical host(s) as it breaks things.
    NSX manager 6.4.4 already installed and integrated with vCenter.
    Virtual and Physical switches for LAN enabled for MTU: 1600.
    Virtual and Physical switches for SAN, vMotion set for jumbo MTU: 9000.
  2. The distributed switch, dvS, can be used for both the physical hosts and the nested hosts, if the uplinks from the nested hosts are set up to mirror the configuration of the physical hosts.  Each nested host has 4 vNICs that are used for uplinks – to mirror the 4 physical NICs on my physical hosts.  Note that the nested vNICs are attached to the special “Nested…” portgroups on the switch.  This can get confusing pretty fast, so see the diagram below on how the special nested portgroups are configured.
  3. Install ESXi 6.5 to a VM for each host, named esxN1, esxN2, etc.
    using vHW:v13

    1. Set CPU=6, cores per socket=6.  Expand the CPU section and set:
      Hardware Virtualization ( Expose HW assisted virtualization to the guest )
      Note that NSX controllers require 4 vCPUs – so your virtual hosts should have at least that many cores.  Also set cores/socket the same as the CPU count, so that you only use one ESX license instead of 6, allowing more nested hosts ( assuming you have VMUG Advantage licenses or equivalent ).  A sample .vmx excerpt is shown after this list.
    2. Set RAM=8GB or more
    3. Install a single 4GB disk for ESXi image, no other disks.  If the disk is increased to 8GB,  then a local scratch partition will be used for logs, else the logs will be on RAM disk.
    4. Setup 4 network interfaces.  Connect first of these to your management network.
    5. Under VM options, Force EFI setup on next boot.
  4. Install ESX-Learnswitch vib onto ESXi 6.5 physical host (link #2)
    Note that ESXi 6.7 includes LearnSwitch, so no install is needed.
    Summary:
    scp VMware-ESX-6.5.0-xxxxx-learnswitch.zip root@esxhost:/tmp
    ssh root@esxhost
    esxcli software vib install -d /tmp/VMware-ESX-6.5.0-xxxxx-learnswitch.zip
  5. Prepare dVS portgroups for nested ESXi according to (link #2):
    ( diagram: nested ESXi portgroups – nestpg )
    I’m using Nest-LAN1, Nest-LAN2, Nest-SAN1, Nest-SAN2 so that each
    nested ESXi host has 2 LAN uplinks and 2 SAN uplinks, for symmetry with my physical hosts.  Some migrations between vSwitch and dvS are difficult without having two uplinks for each PG.
    Note: these 4 portgroups are used ONLY for the nested ESXi VM uplinks, not the nested hosts or any other VMs.  Since link status isn’t likely to be correct for these PGs, we map them 1-to-1 to the vNICs, so only the physical host’s PGs make teaming decisions, based on the more likely accurate link status of the physical NICs.

    1. Nested-LAN ( for Mgmt, VTEP ) properties
      1. Security – allow promiscuous mode, allow forged transmit,  allow mac changes
      2. VLAN set to trunk mode VLAN = 24,100   (see Link #5)
        For LAN and  vxlan traffic
        Ensure that the physical uplink passes vLANs 24, 100.
      3. associated vmkernel set to 1600 MTU, and dvS MTU set to at least 1600
        This is for future use of NSX vxlan.
    2.  Nested-SAN ( for vMotion, NFS )  properties
      1. Security – allow promiscuous mode, allow forged transmit, allow mac changes
      2. VLAN – trunk mode vlan = 25,881
        Physical uplink passes vLANs 25, 881
      3. associated vmKernel set to 9000 MTU
    3. Setup LearnSwitch for these 4 dv Portgroups
      Extract python script per link #2.
      Set vcenter admin/password and four Portgroup name(s) in cfg script
      Run:
      python learnswitch_cfg.py  vcenter_IP  dvsNAME esxHost1IP add
      python learnswitch_cfg.py  vcenter_IP  dvsNAME esxHost2IP add
  6. Install new ESXi VM from ISO
    1. Assign Mgmt IP
    2. Setup time server
  7. Join Host to new Cluster ‘compute’

    ( screenshot: nested ESX VMs esx-n1, etc. and the Compute cluster with nested hosts )
  8. Configure new host
    Add additional vmkernels for vMotion, NFS, etc.  Initially these will need to run on 2 local vSwitches: vSwitch0 for LAN, vSwitch1 for SAN/vMotion.
  9. Join the host to the vDS, and migrate vmkernel ports and uplinks to the vDS.
  10. Test that NFS storage and vMotion are working correctly.
    Test vMotion between nested ESX hosts running on different physical hosts.
    Verify NFS storage latency is normal.
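
For reference, the nested-host VM settings from step 3 can be cross-checked directly in each nested ESXi VM's .vmx file.  The excerpt below is a sketch based on the sizing used above – adjust the values to your own lab.  The vSphere UI option 'Expose hardware assisted virtualization to the guest OS' is what sets vhv.enable.

numvcpus = "6"
cpuid.coresPerSocket = "6"
memSize = "8192"
vhv.enable = "TRUE"

memSize is in MB, so 8192 corresponds to the 8GB of RAM from step 3.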

Continue on with ESX/NSX nested lab – Part II setup.

Links

  1. Nested ESXi 6.5 virtual appliance   (vGhetto)
  2. Nested ESXi LearnSwitch   (vGhetto)
  3. How to clone a nested ESXi VM  (vGhetto)
  4. NSX issue on dvs in nested ESXi  (telecomOccasionally)
  5. Nested Virtualization  (Limitless) installing ESX, trunked portgroup

Changes

  1. Add nested ESXi PG graphic