VMworld: vMotion in vSphere 5 Best Practices (VSP2122)

Speaker: Sreekanth Setty (VMware)

– What is vMotion?
– Memory Iterative Pre-Copy
– Enhancements in vSphere 5
– vMotion Performance
– Best Practices

What is vMotion?
– enables live migration between physical machines
– transparent to the virtual machine’s guest OS
– key enabler of DRS, DPM, FT

What needs to be migrated?
– processor and device state (CPU, network, SVGA, etc.)
– – leverages the vSphere checkpoint state serialization infrastructure (state is around 8MB)
– – destination host sends RARP to physical switch to notify of new location
– disk (use shared storage between source and destination hosts)
– memory (pre-copy memory while VM is running)
– – uses iterative pre-copy
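The RARP notification in the list above can be sketched as a raw Ethernet frame. This is a minimal illustration that assumes the standard RARP layout from RFC 903 (EtherType 0x8035, opcode 3 "request reverse"), with the VM's MAC as both sender and target hardware address and zeroed protocol addresses. The talk does not describe the exact frame ESXi emits, and `build_rarp_frame` and `mac_to_bytes` are hypothetical helpers.

```python
import struct

BROADCAST = b"\xff" * 6
ETHERTYPE_RARP = 0x8035   # RFC 903
HTYPE_ETHERNET = 1
PTYPE_IPV4 = 0x0800
OP_REQUEST_REVERSE = 3    # RARP "request reverse"


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC: {mac}")
    return raw


def build_rarp_frame(vm_mac: str) -> bytes:
    """Hypothetical helper: broadcast a RARP frame sourced from the VM's MAC.

    Physical switches learn a source MAC on the port a frame arrives on, so
    after the move they forward the VM's traffic toward the destination host.
    """
    mac = mac_to_bytes(vm_mac)
    ethernet_header = BROADCAST + mac + struct.pack("!H", ETHERTYPE_RARP)
    rarp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        HTYPE_ETHERNET, PTYPE_IPV4, 6, 4, OP_REQUEST_REVERSE,
        mac, b"\x00" * 4,   # sender hardware / protocol address
        mac, b"\x00" * 4,   # target hardware / protocol address
    )
    frame = ethernet_header + rarp_payload
    # Pad to the 60-byte Ethernet minimum; the NIC appends the 4-byte FCS.
    return frame.ljust(60, b"\x00")
```

The frame is broadcast so every switch on the segment updates its MAC table, not just the one the destination host is attached to.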

Memory Iterative Pre-Copy
– First Phase: Trace Phase/HEAT Phase
– – send VM cold pages
– – trace all of the VM’s memory
– – impact: brief drop in throughput
– Subsequent Phase
– – passes over memory again and again, each pass re-sending only the pages dirtied during the previous pass
– Switch-over Phase
– – VM is momentarily quiesced on the source and resumed on the destination
– – impact: half-second pause of VM
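The three phases can be illustrated with a toy model of iterative pre-copy. It assumes constant send and dirty rates (in pages per second) and a hypothetical `switchover_budget`: the number of pages small enough to copy while the VM is quiesced. Real vMotion tracks dirtied pages through the traces installed in the first phase; `simulate_precopy` is not VMware's algorithm, only a sketch of why the amount left to send shrinks by roughly dirty_rate / send_rate per pass.

```python
def simulate_precopy(total_pages, send_rate, dirty_rate,
                     switchover_budget, max_passes=30):
    """Toy model of iterative pre-copy; rates are in pages per second.

    Returns (pages sent in each pre-copy pass, pages left for switch-over),
    where the second element is None if pre-copy never converged.
    """
    passes = []
    to_send = total_pages  # the first pass copies all of the VM's memory
    for _ in range(max_passes):
        passes.append(to_send)
        # Pages the guest dirties while this pass is on the wire must be
        # re-sent next pass. Integer arithmetic keeps page counts whole.
        to_send = min(total_pages, dirty_rate * to_send // send_rate)
        if to_send <= switchover_budget:
            # Small enough to copy while the VM is briefly quiesced.
            return passes, to_send
    return passes, None


# Guest dirties memory at 1/10 of the link rate: converges after 3 passes.
print(simulate_precopy(1_000_000, 100_000, 10_000, 1_000))
```

If the guest dirties pages faster than the link can send them, every pass is as large as the last and the model returns `None`, which is the pathological case the vSphere 5 changes below target.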

vMotion in vSphere 4.1
– *pre-copy memory from source to destination
– quiesce VM on source machine
– transfer device state from source to destination
– resume VM on destination
– *copy remainder of memory from source to destination (resume during page-in or RDPI)
– free VM resources on source machine
– may cut over the VM before memory is fully copied, which can result in significant page faults on the destination as it retrieves not-yet-copied memory
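The cost of cutting over early can be shown with a toy model of resume during page-in. It assumes a background copier that pushes a fixed, hypothetical `background_batch` of missing pages between guest memory accesses, and counts how often the guest touches a page still on the source. `resume_during_page_in` is an illustrative name, not a vSphere interface.

```python
from collections import deque


def resume_during_page_in(missing_pages, guest_accesses, background_batch=1):
    """Count remote page faults after an early cut-over (toy model).

    missing_pages: page numbers still on the source host, in the order the
        background copier would send them.
    guest_accesses: page numbers the guest touches on the destination.
    background_batch: pages the copier pushes between guest accesses.
    """
    pending = deque(missing_pages)   # background copy queue
    not_here = set(missing_pages)    # pages absent on the destination
    faults = 0
    for page in guest_accesses:
        if page in not_here:
            # The guest stalls while this page is fetched over the network.
            faults += 1
            not_here.discard(page)
        # The background copier pushes the next not-yet-copied pages.
        sent = 0
        while pending and sent < background_batch:
            nxt = pending.popleft()
            if nxt in not_here:
                not_here.discard(nxt)
                sent += 1
    return faults
```

With no background progress (`background_batch=0`), every access to a missing page faults, which is why a large remainder at cut-over hurts the guest.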

vMotion in vSphere 5 (enhancements to the two starred steps above)
– pre-copy memory
– – improvements in minimizing tracing
– – new optimizations to saturate 10GigE
– – multi-NIC enablement to reduce pre-copy time
– – SDPS: stun during page-send; kicks in during pathological cases
– – – injects minor delays into guest memory writes so that pages are sent faster than they are dirtied, letting pre-copy converge
– – – engaged only when pre-copy is making backward progress (failing to converge)
– copy remainder of memory from source to destination
– – improvements to reduce both duration and impact on guest during switch-over phase
– – RDPI is disabled entirely in favor of SDPS
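SDPS can be added to a toy pre-copy model as a throttle on the guest's dirty rate. The sketch assumes that "injecting delays into writes" can be modeled as capping the dirty rate at a hypothetical fraction (`target_ratio`) of the send rate, and that SDPS engages only once a pass fails to shrink the amount left to send. `precopy_with_sdps` is illustrative and not VMware's implementation.

```python
def precopy_with_sdps(total_pages, send_rate, dirty_rate,
                      switchover_budget, target_ratio=0.5, max_passes=30):
    """Iterative pre-copy with an SDPS-style throttle (toy model).

    Rates are in pages per second. Returns (pages per pass, pages left for
    switch-over or None, whether the throttle was engaged).
    """
    passes = []
    to_send = total_pages
    effective_dirty = dirty_rate
    throttled = False
    for _ in range(max_passes):
        passes.append(to_send)
        nxt = min(total_pages, effective_dirty * to_send // send_rate)
        if nxt <= switchover_budget:
            return passes, nxt, throttled
        if nxt >= to_send and not throttled:
            # Backward progress: slow the guest's writes so that it dirties
            # pages at only a fraction of the link rate from now on.
            effective_dirty = int(send_rate * target_ratio)
            throttled = True
        to_send = nxt
    return passes, None, throttled
```

A guest dirtying memory at twice the link rate never converges without the throttle; with it, the model finishes in 11 passes. A well-behaved guest never triggers SDPS, matching the notes' point that it kicks in only in pathological cases.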

How to gauge vMotion performance
– resource usage during vMotion (CPU, memory, network)
– total duration
– switch-over time
– performance impact on applications running inside the guest
– – application latency and throughput during vMotion
– – time to return to normal levels of performance
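The guest-side metrics above can be computed from a throughput trace. This sketch assumes a load generator that records (timestamp in seconds, throughput) samples sorted by time, a pre-migration baseline, and the time vMotion completed. The helper names and the 5% tolerance are hypothetical, and the stall measure is only as precise as the sampling interval.

```python
def longest_stall(samples, floor=0.0):
    """Longest stretch (seconds) with throughput at or below `floor`.

    Approximates the switch-over pause seen by the application; a stall ends
    at the first sample above the floor.
    """
    longest, start = 0.0, None
    for t, value in samples:
        if value <= floor and start is None:
            start = t                          # stall begins
        elif value > floor and start is not None:
            longest = max(longest, t - start)  # stall ends
            start = None
    return longest


def recovery_time(samples, baseline, vmotion_end, tolerance=0.05):
    """Seconds after vMotion ends until throughput is within tolerance of baseline.

    Returns None if throughput never recovers within the trace.
    """
    threshold = baseline * (1 - tolerance)
    for t, value in samples:
        if t >= vmotion_end and value >= threshold:
            return t - vmotion_end
    return None


trace = [(0, 100), (1, 100), (2, 0), (2.5, 0), (3, 60), (4, 90), (5, 97), (6, 100)]
print(longest_stall(trace), recovery_time(trace, baseline=100, vmotion_end=3))
```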

vMotion Performance on 1GigE vs. 10GigE
– 8-10x performance improvement on 10GigE
– for idle and moderately loaded VMs, significant reductions in duration
– for heavily loaded VMs, vSphere 4.1 struggles to converge, which leads to dropped connections
– – in vSphere 5.0, SDPS kicks in and no connections are dropped
– takeaway: consider 10GigE NICs
– in vSphere 5, multi-NIC enablement further improves vMotion performance
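Back-of-the-envelope arithmetic shows why link speed dominates. The sketch computes a lower bound on a single full pre-copy pass, assuming line-rate transfer and ideal scaling across NICs. It ignores protocol overhead and re-sent dirty pages, and the `efficiency` knob is a hypothetical fudge factor.

```python
def ideal_copy_seconds(memory_gib, link_gbps, nics=1, efficiency=1.0):
    """Lower bound (seconds) for one full pre-copy pass over memory_gib of RAM."""
    bits = memory_gib * 2**30 * 8                  # GiB -> bits
    return bits / (link_gbps * 1e9 * nics * efficiency)


for gbps, nics in [(1, 1), (10, 1), (10, 2)]:
    print(f"{nics} x {gbps}GigE: {ideal_copy_seconds(16, gbps, nics):.1f} s")
```

For a 16 GiB VM this gives about 137 s at 1 Gbit/s and about 14 s at 10 Gbit/s. The ideal 10x ratio is an upper bound; the 8-10x measured improvement reported in the session sits just under it.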

Best Practices
– switch to 10GigE vMotion network
– use multiple NICs for vMotion
– – configure all vmnics under one vSwitch
– – configure each vmknic to use a different vmnic as its active vmnic (rest marked as standby)
– – vMotion will transparently switch over to standby vmnic if active fails
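The multi-NIC layout above (one active uplink per vmknic, the rest as standby) can be modeled as a small policy table. This is not a vSphere API; it only illustrates the mapping and the failover behavior. The names `vmk1` and `vmnic0` are example identifiers, and `vmotion_teaming` and `effective_uplink` are hypothetical helpers. In practice the policy is set per port group in the vSphere Client or from the CLI.

```python
def vmotion_teaming(vmknics, vmnics):
    """Give each vmknic a different active vmnic; mark the rest as standby."""
    if len(vmknics) > len(vmnics):
        raise ValueError("need at least one vmnic per vmknic")
    policy = {}
    for i, vmk in enumerate(vmknics):
        active = vmnics[i]
        standby = [nic for nic in vmnics if nic != active]
        policy[vmk] = {"active": active, "standby": standby}
    return policy


def effective_uplink(entry, failed):
    """Uplink a vmknic actually uses, given a set of failed vmnics."""
    if entry["active"] not in failed:
        return entry["active"]
    for nic in entry["standby"]:   # fail over in standby order
        if nic not in failed:
            return nic
    return None                    # no healthy uplink left


policy = vmotion_teaming(["vmk1", "vmk2"], ["vmnic0", "vmnic1"])
print(policy["vmk1"], effective_uplink(policy["vmk1"], {"vmnic0"}))
```

Giving each vmknic a distinct active uplink lets vMotion spread traffic across both NICs in normal operation, while the standby list preserves connectivity if one fails.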
– if concerned about vMotion performance
– – place swap files on shared storage
– – placing swap on host-local storage or an SSD swap cache can degrade vMotion performance
– use ESX clusters composed of matching NUMA architectures when using vNUMA feature
– – the VM's vNUMA topology is set at power-on, based on the NUMA topology of the physical host
– – vMotion to a host with a different NUMA topology may result in reduced performance
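A cluster can be checked for matching NUMA topology before relying on vNUMA. This sketch assumes a hypothetical inventory that maps host names to node count and cores per node. It is not a vSphere API; real values would come from each host's hardware information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumaTopology:
    nodes: int            # NUMA nodes (typically sockets)
    cores_per_node: int


def is_uniform(hosts):
    """True if every host in the cluster reports the same NUMA topology."""
    return len(set(hosts.values())) <= 1


def vnuma_safe_targets(source_host, hosts):
    """Hosts whose topology matches the host where the VM was powered on."""
    src = hosts[source_host]
    return sorted(name for name, topo in hosts.items()
                  if name != source_host and topo == src)


cluster = {
    "esx01": NumaTopology(2, 8),
    "esx02": NumaTopology(2, 8),
    "esx03": NumaTopology(4, 6),
}
print(is_uniform(cluster), vnuma_safe_targets("esx01", cluster))
```

Because the vNUMA layout is fixed at power-on, only hosts in the returned list preserve the layout the guest was sized for.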

Summary
– vMotion in vSphere 5 has a number of new features
– – multi-NIC enablement for vMotion
– – better 10GigE utilization
– – Metro vMotion (support for higher-latency networks)
– – successful migrations even with network limitations or pathological workloads
– dramatic improvements in performance in vSphere 5
– – two-fold improvements in duration and impact on application performance
– – consistent performance gains in the range of 30% on tier 1 workloads
– – duration cut by a factor of 3 with multiple NICs
