VMworld 2015: DRS Advancements in vSphere 6 #INF5306

This was by far my longest session, as Naveen let the clock fly by. I guess that’s the benefit of being the last session of the day! He definitely made the most of it, though, and crammed a ton of great information on DRS and HA, both present and future, into the session.

I feel like the notes below actually capture a substantial amount of the practical information, so please enjoy. DRS has always been the magic sauce in vSphere and it’s only getting better.

Biggest joy of DRS in vSphere 6.0: vMotion operations are 60% faster!

Live Notes & Commentary

Talk Outline:

  • Strength in Numbers
  • Metrics, Constraints & Cost Benefits Analysis
  • What’s New in vSphere 6
  • Ubiquitous DRS
  • Advanced Options
  • Best Practices
  • Future Directions
  • Q&A

Strength in Numbers

  • 92% of clusters have DRS enabled
  • 79% run fully automated
  • 87% have affinity & anti-affinity rules
  • 43% have resource pools
  • 99.8% use host maintenance mode

DRS Metrics, Constraints & Cost Benefit Analysis

  • Metrics used for initial placement (IP) & load balancing (LB)
    • Innumerable host- and VM-level stats considered in IP & LB
  • A few important host metrics
    • CPU reserved
    • Memory reserved
  • A few important VM metrics
    • CPU active, run and peak
    • Memory overhead, growth-rate
    • Active, consumed and idle memory
    • Shared memory pages, balloon, swapped, etc
  • VM happiness is the most important metric!
    • If VM’s demand and entitlement for resources are always met, then VM is “happy”
    • During initial placement, DRS ensures minimum performance impact on already running VMs
    • During load balancing, DRS ensures least performance impact on source & destination hosts
  • Constraints for IP & LB
    • HA admission control policies (slot-based, reserved % for failover, etc)
    • Affinity and anti-affinity rules
    • # of concurrent vMotions
    • Time to complete vMotion
    • Datastore connectivity
    • vCPU to pCPU ratio
    • Reservation, limits or shares settings
    • Agent VMs (e.g. vShield Edge)
    • Special VMs (e.g. SMP-FT, vFlash, latency-sensitive VMs)
  • Cost Benefit & minGoodness
    • Cost-Benefit Analysis: cost of migration vs potential benefits
    • Costs:
      • Per-vMotion, a reservation of 30% of a CPU core for 1GbE and 100% of a CPU core for 10GbE
      • Memory consumption of “shadow VM” at the destination host
      • Memory reclamation impact at the destination host, if all of its memory is already allocated
    • Benefits
      • Positive performance benefits to VMs at source host
      • Positive performance gains for migrated VM at destination host
      • Overall load at source & dest is improved (normalized entitlement)
    • Each analysis results in a rating ranging from -2 to +2
    • minGoodness: score derived through “what-if” analysis that indicates positive or negative impact on cluster
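
The per-vMotion CPU cost quoted above is easy to turn into arithmetic. The following is a minimal sketch, not DRS's actual cost model: the function name, the link labels, and the 2600 MHz core speed are assumptions for illustration, and only the 30% and 100% figures come from the session.

```python
# Per-vMotion CPU reservation as quoted in the session notes:
# 30% of one core on 1GbE, 100% of one core on 10GbE.
CORE_FRACTION_PER_VMOTION = {"1GbE": 0.30, "10GbE": 1.00}


def vmotion_cpu_cost_mhz(link, concurrent_vmotions, core_mhz):
    """Estimate the CPU (in MHz) set aside on a host for concurrent vMotions.

    Hypothetical helper: a straight multiplication of the quoted per-vMotion
    fraction, the number of concurrent migrations, and the core clock speed.
    """
    if link not in CORE_FRACTION_PER_VMOTION:
        raise ValueError(f"unknown link type: {link}")
    if concurrent_vmotions < 0:
        raise ValueError("concurrent_vmotions must be non-negative")
    return CORE_FRACTION_PER_VMOTION[link] * concurrent_vmotions * core_mhz


if __name__ == "__main__":
    # Hypothetical host with 2600 MHz cores.
    for link, count in (("1GbE", 4), ("10GbE", 8)):
        cost = vmotion_cpu_cost_mhz(link, count, 2600)
        print(f"{count} vMotions over {link}: about {cost:.0f} MHz reserved")
```

This is only the CPU side of the cost. The shadow VM memory and reclamation costs listed above are not modeled here.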

What’s New in vSphere 6.0

Network-aware DRS (NDRS v1)

  • Ability to specify bandwidth reservation for important VMs
  • Initial placement based on VM bandwidth reservation
  • Automatic remediation in response to reservation violations due to:
    • pNIC saturation
    • pNIC failure

Cross-VC xvMotion Placement

  • Unified host and datastore recommendation for x-VC vMotion
  • Runs a combined DRS and SDRS algorithm to generate a tuple (host + DS)
  • CPU, memory, and network reservations are considered as part of HA admission control
  • All the constraints are respected as part of the placement

Rule Migration

  • VM-to-VM affinity and anti-affinity rules are carried over during:
    • Cross-cluster vMotion
    • Cross-VC vMotion
  • Initial placement enforces the affinity and anti-affinity constraints
    • If needed, prerequisite moves are made to accommodate the rules

Improved Overhead Computation

  • Background: Static overhead is the minimum memory needed to power-on a VM
    • ESX requires this memory to be reserved for successful power-on
    • Previously, a pre-computed static overhead was used for placement decisions
  • DRS now accurately computes the static overhead
  • Greatly improves the consolidation during group power-on (typical in VDI environments)
    • Example:
      • ESX 5.5: 8 vCPU, 128GB RAM VM required 12GB RAM static overhead
      • ESX 6.0: 8 vCPU, 128GB RAM VM only requires 986MB RAM static overhead
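
To see why this matters for group power-on, the two figures in the example can be scaled up to a batch of VMs. This is a small worked calculation, assuming a hypothetical batch of 10 identical VMs; the helper name is made up for illustration, and only the 12GB and 986MB values come from the session.

```python
MB_PER_GB = 1024

# Static overhead for an 8 vCPU, 128GB VM, as quoted in the session notes.
OVERHEAD_55_MB = 12 * MB_PER_GB  # ESX 5.5: 12GB
OVERHEAD_60_MB = 986             # ESX 6.0: 986MB


def total_static_overhead_gb(per_vm_overhead_mb, vm_count):
    """Memory (GB) that must be reserved to power on vm_count identical VMs."""
    return per_vm_overhead_mb * vm_count / MB_PER_GB


if __name__ == "__main__":
    vm_count = 10  # hypothetical batch size
    old = total_static_overhead_gb(OVERHEAD_55_MB, vm_count)
    new = total_static_overhead_gb(OVERHEAD_60_MB, vm_count)
    reduction = 1 - OVERHEAD_60_MB / OVERHEAD_55_MB
    print(f"ESX 5.5: {old:.1f} GB, ESX 6.0: {new:.1f} GB")
    print(f"Per-VM static overhead reduced by about {reduction:.0%}")
```

For these figures the per-VM overhead drops by roughly 92%, which leaves far more headroom for placement during a group power-on.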

Cluster Scale and Performance Improvements

  • Cluster scale increase
    • Single cluster scale: 64 hosts & 8,000 VMs
    • DRS & HA extensively tested at maximum scale
  • Performance improvements
    • Up to 66% improvements in operational throughput
    • VM power-on latency reduced by 25%
    • VM clone latency reduced by 18%
    • vMotion operation is 60% faster — WOW!
    • Faster host maintenance mode
      • Parallelization of power-off VM evacuation
    • DRS performance white paper on vmware.com
    • vCenter performance talk: #INF4764

Ubiquitous DRS

Extensive Algorithm Usage

  • DRS is the lynchpin of the SDDC vision
  • vSphere High Availability
    • DRS also leveraged for restarting protected VMs
  • vSphere Update Manager
    • Rolling upgrade relies on DRS algorithm to suggest candidate host for upgrade
    • DRS algorithm facilitates the host evacuation
  • vCloud Director (VCD)
    • pVDC and org-VDC constructs map to DRS resource pool
  • vCloud Air
    • Multi-tenancy resource isolation and allocation models leverage RP & DRS resource controls
  • Fault Tolerance
    • Uses DRS algorithm to choose the host for secondary-VM when FT is enabled
  • ESX Agent Manager (EAM)

Advanced Options

Ideally, none of these options will be needed, but some environments may require them.

  • Uniform VM distribution across all hosts — “Peanut Butter Spreading”
    • LimitVMsPerESXHost
    • LimitVMsPerESXHostPercent
  • vCPU to pCPU ratio (per core or cluster-level)
    • MaxVcpusPerCore
    • MaxVcpusPerClusterPct
  • vMem to pMem ratio (per host or cluster-level)
    • MaxHostOvercommitPct
    • MaxClusterOvercommitPct
  • Active memory vs Consumed memory management
    • PercentIdleMBInMemDemand (change from 35% to 100%)
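
The last option is easier to reason about with numbers. DRS memory demand is commonly described as active memory plus a percentage of idle consumed memory, with PercentIdleMBInMemDemand supplying that percentage. The sketch below assumes that formula, which was not spelled out in the session, and uses a hypothetical VM with 16GB consumed and 4GB active. The function name is made up for illustration.

```python
def memory_demand_mb(active_mb, consumed_mb, percent_idle_in_demand):
    """Estimate the memory demand DRS would see for one VM.

    Assumed formula (not taken from the session notes):
        demand = active + (percent / 100) * (consumed - active)
    """
    if not 0 <= percent_idle_in_demand <= 100:
        raise ValueError("percent_idle_in_demand must be between 0 and 100")
    if active_mb > consumed_mb:
        raise ValueError("active memory cannot exceed consumed memory")
    idle_consumed_mb = consumed_mb - active_mb
    return active_mb + (percent_idle_in_demand / 100) * idle_consumed_mb


if __name__ == "__main__":
    # Hypothetical VM: 16384 MB consumed, 4096 MB active.
    for pct in (35, 100):  # the two values mentioned in the notes
        demand = memory_demand_mb(4096, 16384, pct)
        print(f"PercentIdleMBInMemDemand={pct}: demand is about {demand:.0f} MB")
```

Under this assumption, setting the option to 100 makes DRS balance on consumed memory rather than mostly on active memory, which is the "Active vs Consumed" trade-off the bullet refers to.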

Best Practices

  • Tip #1: Full storage connectivity
    • All the hosts have access to all the datastores
    • Results in an efficient initial placement, load balancing and workload consolidation
    • VM availability is improved significantly
  • Tip #2: Power management settings
    • Set host BIOS power management to “OS Control” mode
    • Set ESX power management active policy to “Balanced” mode (default since 5.0)
  • Tip #3: Threshold setting
    • DRS aggressiveness is controlled by the “migration threshold” setting
    • Conservative settings could lead to greater imbalance in the cluster
    • Aggressive settings could lead to more vMotions in the system
    • Default setting of “3” works best
  • Tip #4: Automation level
    • Fully automated mode is the best choice
    • All solution interoperability, scale and performance testing is done in fully automated mode
  • Tip #5: Beware of Resource Pool priority inversion!
    • RPs are a great way to achieve resource isolation and priority resource allocation
    • Make sure that cramming more VMs into a pool won’t dilute the shares each VM receives
  • Tip #6: Avoid setting CPU-affinity
    • Setting CPU-affinity provides very little performance benefit
    • Contrary to popular belief, the cores are *not* exclusively reserved for a VM with CPU-affinity
    • For performance benefits, set the VM as “latency sensitive” in ESXi 5.5 and 6.0
      • ESX CPU scheduler and DRS will ensure this VM gets the highest priority and preference
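
The priority inversion in Tip #5 is simple to show with numbers. This is a simplified sketch, assuming sibling pools under full contention, equal shares for every VM inside a pool, and no reservations or limits. The pool names, share values, and VM counts are hypothetical.

```python
def per_vm_fraction(pools):
    """Return each pool's per-VM slice of the contended resource.

    pools maps a pool name to (pool_shares, vm_count). Sibling pools split
    the resource in proportion to their shares, then each pool's slice is
    divided evenly among its VMs (a simplifying assumption).
    """
    total_shares = sum(shares for shares, _ in pools.values())
    if total_shares <= 0:
        raise ValueError("total shares must be positive")
    return {
        name: (shares / total_shares) / vm_count
        for name, (shares, vm_count) in pools.items()
        if vm_count > 0
    }


if __name__ == "__main__":
    # Hypothetical cluster: the "high priority" pool is crammed with VMs.
    pools = {"High": (8000, 40), "Low": (2000, 5)}
    for name, fraction in per_vm_fraction(pools).items():
        print(f"{name}: {fraction:.1%} of cluster resources per VM")
```

In this example the High pool holds 80% of the shares, but each of its 40 VMs ends up with about 2%, while each VM in the Low pool gets about 4%. The pool with the higher priority delivers less per VM, which is the inversion to watch for.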

Future Directions

  • Proactive High Availability
    • Most existing availability solutions are reactive
    • Improves the availability of VMs
    • Proactive evacuation of VMs based on hardware health metrics
    • Tight integration, qualification and certification with hardware vendors
    • Vendor plugin evaluates host health based on hardware version and component redundancy
      • Moderately degraded
      • Severely degraded
    • VI admin can configure the DRS action for each health state event
      • Host maintenance mode
      • Host quarantine mode
    • VI admin can filter the events
  • NDRS v2
    • Take pNIC saturation into account during IP and LB
    • Tighter integration with NSX
      • Traffic flow-id to identify chatty VMs & co-locate chatty VMs
      • Elephant and mice flows: separate the paths of elephants and mice
      • Network layout topology: leverage for availability and performance optimization
  • Proactive DRS
    • Projected demand of VMs is included as part of IP and LB
    • Tighter integration with vROPS analytics engine for projected demands
    • Periodic and seasonality demands incorporated into decision making
    • Under no circumstance will current VM demands be clipped to satisfy future VM demands
  • What-If Analysis
    • A sandbox tab in UI to run “what-if” analysis
    • VM availability assessment by simulating host failures
      • On an existing cluster, simulate “N” host failures to evaluate impact
    • Cluster over-commitment during maintenance mode
      • Simulate hosts in MM and see workload distribution
    • Capacity planning simulation
    • Evaluate savings in a DPM-enabled cluster
  • Auto-Scale of VMs
    • Horizontal and vertical scaling to maintain end-to-end SLA guarantees
    • Horizontal scaling
      • Spin-up and spin-down VMs based on the workload
      • VI admin specifies desired end-to-end latency threshold in 3-tier deployment
      • vScale monitors latency at each tier
    • Vertical scaling
      • Increase CPU and memory resources to meet performance goals
      • DRS resource control “reservation” leveraged to increase or decrease resources
      • CPU/memory hot-add is an additional option for database tier
  • Hybrid DRS
    • Make vCloud Air a seamless extension of enterprise data center capacity through policy-based scheduling
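
The “simulate N host failures” idea from the What-If Analysis item can be illustrated with a toy capacity check. This is only a sketch of the concept, not the planned feature: it assumes a single resource (memory), treats capacity as one pooled number, ignores fragmentation and placement rules, and fails the largest hosts first as a worst case. All names and numbers are hypothetical.

```python
def survives_host_failures(host_capacity_gb, vm_demand_gb, failures):
    """Return True if total VM demand still fits after losing `failures` hosts.

    Worst case: the hosts with the most capacity are the ones that fail.
    """
    if failures < 0 or failures > len(host_capacity_gb):
        raise ValueError("failures must be between 0 and the number of hosts")
    surviving_count = len(host_capacity_gb) - failures
    # Sort ascending and keep the smallest hosts, i.e. drop the largest ones.
    surviving = sorted(host_capacity_gb)[:surviving_count]
    return sum(surviving) >= sum(vm_demand_gb)


if __name__ == "__main__":
    hosts = [256, 256, 512, 512]        # hypothetical host memory in GB
    vms = [64] * 10 + [32] * 4          # hypothetical VM demand, 768 GB total
    for n in range(len(hosts) + 1):
        verdict = "fits" if survives_host_failures(hosts, vms, n) else "does NOT fit"
        print(f"{n} host failure(s): demand {verdict}")
```

A real what-if analysis would also have to respect affinity rules, reservations, and per-host fit, so a pooled check like this is an optimistic upper bound.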

VMworld 2015 | Tuesday | DRS Advancements in vSphere 6, Advanced Concepts, and Future Directions (INF5306)

Naveen Nagaraj, Director R&D
