VMworld: Enhancements in vStorage VMFS 5 (VSP2376)

Speaker: Mostafa Khalil (VMware)

Agenda:
– VMFS3 Limitations
– VMFS5 Enhancements
– LVM Changes
– VMFS5 Changes

# Excellent presentation and deep-dive on VMFS5 and its benefits in vSphere 5

VMFS3 Limitations
– 2TB per extent, 32 extents
– max file size limited by block size; the largest block size, 8MB, allows 2TB files
– can’t change block size in VMFS
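
The VMFS3 limits above come down to simple arithmetic. A minimal Python sketch, assuming binary units and assuming that the maximum file size scales linearly with block size (256GB at 1MB, as mentioned in the double indirect addressing notes, up to 2TB at 8MB). The function names are illustrative only, not VMware APIs:

```python
GB = 1024 ** 3
TB = 1024 ** 4

def vmfs3_max_file_size(block_size_mb):
    """Approximate VMFS3 max file size for a block size of 1, 2, 4 or 8 MB.

    Assumption: the limit scales linearly, 256 GB per 1 MB of block size.
    """
    if block_size_mb not in (1, 2, 4, 8):
        raise ValueError("VMFS3 block size must be 1, 2, 4 or 8 MB")
    return block_size_mb * 256 * GB

def vmfs3_max_volume_size(extent_tb=2, max_extents=32):
    """Largest spanned VMFS3 volume: 2 TB per extent times 32 extents."""
    return extent_tb * max_extents * TB

for mb in (1, 2, 4, 8):
    print(f"{mb}MB block -> {vmfs3_max_file_size(mb) // GB}GB max file")
print(f"max volume -> {vmfs3_max_volume_size() // TB}TB")
```

Because the block size is fixed at format time, picking 1MB on VMFS3 caps every file on that volume at 256GB until the datastore is reformatted.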

VMFS5 Enhancements
– storage stack able to utilize 2TB+ LUNs
– support for GPT partitioning scheme
– support for 16-byte command descriptor blocks (CDB)
– support for 2TB+ LVM extents
– backward compatible with VMFS3
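
The 2TB ceiling that GPT and 16-byte CDBs lift comes from 32-bit sector addressing. A rough sketch, assuming 512-byte sectors: MBR partition entries and 10-byte SCSI read/write commands carry 32-bit block addresses, while GPT and 16-byte commands carry 64-bit ones. The helper name is made up for illustration:

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors
TIB = 1024 ** 4

def max_addressable_bytes(lba_bits, sector_bytes=SECTOR_BYTES):
    """Capacity reachable with a logical block address of the given width."""
    return (2 ** lba_bits) * sector_bytes

limit_32 = max_addressable_bytes(32)  # MBR partition table, 10-byte CDB
limit_64 = max_addressable_bytes(64)  # GPT, 16-byte CDB

print(limit_32 // TIB, "TiB with 32-bit LBAs")
print(limit_64 // TIB, "TiB with 64-bit LBAs")
```

The 64-bit figure is far beyond the 64TB VMFS5 maximum, so the addressing format stops being the limiting factor.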

LVM, VMFS, platform changes
– SCSI changes
– – 2TB+ passthrough RDM support
– – GPT support (no partition table recovery logic yet)
– LVM changes
– – spanned device table
– – LVM Extended Metadata

*** 2TB+ only supported in physical RDMs in this release
** esxcli command expanded to consolidate functions of other esx___ commands

Spanned Device Table
– device table on Dev-0
– – device “naa” name is stored in SD table
– – list of offline extents for spanned volume
– only supported on VMFS5 (LVM2)
– 512 LV per device instead of the original 1024
– maintains backward compatibility
– Spanned Device Descriptor (SDDescriptor)
– – only on newly formatted VMFS5 volumes; upgraded volumes will not have it
– – only on dev-0 (head extent)

List offline extents
– ~ # vmkfstools -Ph /vmfs/volumes/Storage1
– VMFS5 only has 1MB block size (on newly formatted volumes)
– upgraded volumes will retain their block sizes until they are formatted
– vmkfstools remains a separate command from esxcli because it is specific to file systems

VMFS5 Maximums
– VMFS size: 64TB
– LUN size: 64TB
– VMDK size: 2TB
– RDM size:
– – 2TB non-pass-through RDM
– – 2TB+ pass-through RDM (16 byte CDB)
– no more multiple block sizes on VMFS (only 1MB)

VMFS5 Improvements
– new VMFS volumes only have 1MB block size
– freshly created VMFS5 volumes can be tagged as “ATS-only”: no SCSI-2 reservations (if array supports VAAI)
– – ATS: atomic test & set, VAAI primitive
– – assumes that the region lock will succeed
– – VMFS5 checks for ATS support, writes it to the metadata, and doesn’t have to check in the future
– – enables VM booting, etc. to happen concurrently because operations don’t have to wait for locks
– small file packing for files under 1K
– sub-block size is 8K (from 64K) for new volumes
– new PB2 system file, for double indirect PB file capability
– LVM supports >2TB extents
– new volume mount work flow
– ability to un-mount a volume
– current version: 5.54

Scalability & Other Changes
– increased file system limits
– – max files increased from 30720 to 130000
– – max file size increased from 256GB to 64TB
– APD support
– – planned datastore maintenance
– – – ability to un/mount VMFS volumes; no auto-mount; user-level script can be invoked to mount datastore based on a policy
– – – event framework for late discovery of devices: LVM sends vmkevent so that user level can mount the volume based on a policy
– – permanent device loss
– – – PSA fast fails the IOs immediately
– – – PSA API registers notification of device loss
– better workers/memory management for datamover

When does VMFS leverage ATS on VAAI?
– new VMFS5 on single-extent on VAAI hardware uses ATS only
– on multi-extent, only allows spanning on ATS hardware
– upgraded VMFS5 and VMFS3 volumes use ATS but fall back to SCSI-2 reservations under contention
– to turn off the ATS-only feature: # vmkfstools --configATSOnly 0
– – device path: /dev/device/disk:partition
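
The rules above can be read as a small decision table. A hedged sketch follows; the function names and mode strings are hypothetical, and the behavior on arrays without ATS support is an assumption (plain SCSI-2 reservations), since the session notes do not spell it out:

```python
def locking_mode(newly_formatted_vmfs5, array_supports_ats):
    """Pick the on-disk locking mode a volume would use.

    newly_formatted_vmfs5: True for a freshly created VMFS5 volume,
        False for VMFS3 or a VMFS5 volume upgraded from VMFS3.
    array_supports_ats: True if the array offers the VAAI ATS primitive.
    """
    if not array_supports_ats:
        return "scsi2"  # assumption: no ATS means SCSI-2 reservations
    if newly_formatted_vmfs5:
        return "ats-only"
    return "ats-with-scsi2-fallback"

def can_add_extent(volume_is_ats_only, extent_supports_ats):
    """An ATS-only volume may only span onto ATS-capable devices."""
    return extent_supports_ats or not volume_is_ats_only

print(locking_mode(True, True))
print(locking_mode(False, True))
print(can_add_extent(True, False))
```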

ATS-only volumes
– replaces “reserve-read-modify-write-release” semantic of the on-disk lock acquiring process
– requires VAAI capable storage arrays
– on ATS-only volumes, if array firmware is downgraded, the volume will not be mounted
– – ATS only feature would need to be turned off
– VMFS5 uses ATS for all the following operations:
– – acquiring on-disk lock
– – upgrading optimistic locks
– – unlocking RO/MW locks
– – acquiring heartbeat
– – cleaning heartbeat
– – marking/replaying heartbeat
– – reclaiming a heartbeat
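
To illustrate why ATS removes the need for "reserve-read-modify-write-release", here is a toy model of an on-disk lock record taken with a compare-and-write. This is a conceptual sketch only: the class, the lock values, and the host IDs are invented, and a Python mutex stands in for the atomicity that the array provides in hardware.

```python
import threading

class AtsDisk:
    """Toy model of one lock sector on an array that supports ATS."""

    def __init__(self, value=b"free"):
        self._value = value
        self._mutex = threading.Lock()  # stands in for the array's atomicity

    def read(self):
        with self._mutex:
            return self._value

    def atomic_test_and_set(self, expected, new):
        """Write `new` only if the sector still holds `expected`."""
        with self._mutex:
            if self._value != expected:
                return False
            self._value = new
            return True

def acquire_lock(disk, host_id):
    """Try to take the lock without reserving the whole LUN."""
    seen = disk.read()
    if seen != b"free":
        return False  # held by another host
    # If another host won the race after our read, the ATS call fails.
    return disk.atomic_test_and_set(seen, host_id)

def release_lock(disk, host_id):
    """Free the lock, but only if this host still owns it."""
    return disk.atomic_test_and_set(host_id, b"free")

disk = AtsDisk()
print(acquire_lock(disk, b"host-a"))
print(acquire_lock(disk, b"host-b"))
print(release_lock(disk, b"host-a"))
print(acquire_lock(disk, b"host-b"))
```

Only the single lock sector is contended, so other hosts can keep issuing I/O to the rest of the LUN while one host takes or releases a lock.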

Small file packing
– new “zero level address (zla)” type added
– data is automatically pre-fetched on inode access
– once the file grows, no truncation back to small block
– – only user files use small data packing
– – directories are made with small blocks, upgraded to file blocks once >1 small block
– small file packing performance
– – 6% improvement in boot storm on new VMFS5 volumes (good for VDI)
– – 10-12% degradation in boot storm on upgraded volumes (suggest reformatting)
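
The "no truncation back" rule can be sketched as a storage tier that only ever moves upward. This is an illustrative model, not VMFS code: the exact thresholds (1KB for packed data, one 8K sub-block before moving to file blocks) and all names are assumptions based on the figures in these notes.

```python
from enum import IntEnum

class Tier(IntEnum):
    PACKED = 0      # data kept with the file descriptor (zero level address)
    SUB_BLOCK = 1   # a single 8K sub-block
    FILE_BLOCK = 2  # regular 1MB file blocks

PACK_LIMIT = 1024          # assumed cutoff for small file packing
SUB_BLOCK_SIZE = 8 * 1024  # sub-block size on new VMFS5 volumes

def tier_for_size(size):
    """Smallest tier that could hold a file of `size` bytes."""
    if size <= PACK_LIMIT:
        return Tier.PACKED
    if size <= SUB_BLOCK_SIZE:
        return Tier.SUB_BLOCK
    return Tier.FILE_BLOCK

class FileModel:
    """Tracks a file's tier; growing promotes it, shrinking never demotes."""

    def __init__(self):
        self.size = 0
        self.tier = Tier.PACKED

    def resize(self, new_size):
        self.size = new_size
        self.tier = max(self.tier, tier_for_size(new_size))
        return self.tier

f = FileModel()
print(f.resize(512).name)
print(f.resize(4096).name)
print(f.resize(100).name)
```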

Double Indirect Addressing
– increases the file descriptor max capacity
– switch happens once the file grows beyond 256GB on 1MB block size volume
– 1MB (desc) * 1024 (PB) * 1024 (PB) * 64 = 64TB
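
The capacity formula above can be checked with a few lines of Python. The factor names are my own labels, and the 256 descriptor slots used for the single indirect case are an assumption chosen because they reproduce the 256GB switch-over point; only the double indirect factors (1MB, 1024, 1024, 64) come from the session:

```python
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4

BLOCK_SIZE = 1 * MB  # file block size on new VMFS5 volumes
PTRS_PER_PB = 1024   # addresses held by one pointer block (PB)

def single_indirect_max(descriptor_slots=256):
    """Descriptor -> pointer block -> file block (assumed 256 slots)."""
    return descriptor_slots * PTRS_PER_PB * BLOCK_SIZE

def double_indirect_max(top_level=64):
    """Descriptor -> PB2 -> pointer block -> file block."""
    return top_level * PTRS_PER_PB * PTRS_PER_PB * BLOCK_SIZE

print(single_indirect_max() // GB, "GB with single indirect addressing")
print(double_indirect_max() // TB, "TB with double indirect addressing")
```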

VMFS5 upgrade
– online and in-place
– requires that all hosts run ESXi 5
