virtualizationvelocity
  • Home
  • About
  • VMware Explore
    • VMware Explore 2025
    • VMware Explore 2024
    • VMware Explore 2023
    • VMware Explore 2022
  • VMworld
    • VMworld 2021
    • VMworld 2020
    • VMworld 2019
    • VMworld 2018
    • VMworld 2017
    • VMworld 2016
    • VMWorld 2015
    • VMWorld 2014
  • vExpert
  • The Class Room
  • VMUG Advantage
  • AI Model Compute Planner
  • AI-Q Game
  • Video Hub
  • Tech-Humor
  • Contact

You Won’t Believe What VMware Just Did for GPU Virtualization at Explore 2025

8/25/2025

0 Comments

 
Picture
At VMware Explore 2025, one of the most talked-about technical deep dives was Accelerating AI Workloads: Mastering vGPU Management in VMware Environments, led by Shawn Kelly, Principal Architect at Broadcom, and Justin Murray, Product Marketing Engineer at Broadcom.
​
A vGPU (virtual GPU) allows a single physical GPU to be split into multiple virtual instances so that different virtual machines or workloads can share its power. This makes it possible to maximize GPU usage, reduce costs, and run AI workloads more efficiently across enterprise environments. As organizations scale AI initiatives, mastering vGPU management is quickly becoming essential.

Model Store, AI Gateway, and Deployment Made Simple

Kelly and Murray began with a look at Model Store and AI Gateway, new services designed to simplify how enterprises deploy AI models.
  • Model Store acts as a local repository for managing and versioning AI models.
  • AI Gateway provides a streamlined deployment workflow, aligning with the OpenAI API specification so existing applications built for public APIs can be seamlessly redirected to on-premises GPUs.

This ensures that sensitive data stays secure while still giving developers the same agility they’ve come to expect from cloud providers.
Picture

Time Slice vs MIG: Two Paths to Smarter GPU Sharing

The session then explored how GPUs can be shared among workloads, highlighting two different approaches:
  • Time Slice vGPUs: A round-robin model that allocates 100% of the GPU to a VM for short bursts. This maximizes utilization and is great for workloads that spike intermittently.
  • MIG (Multi-Instance GPU): A hardware-based method of splitting GPUs into fully isolated slices. MIG ensures predictable performance and stronger isolation, making it ideal for multi-tenant scenarios.

​Both methods have clear advantages: Time Slice drives efficiency, while MIG ensures consistency and security.
Picture

Scaling Beyond One GPU with NVLink & Device Groups

For large language models (LLMs) or compute-heavy training tasks, one GPU isn’t enough.

​VMware showed how NVIDIA NVLink and HGX architectures expand scalability:
  • Run 70B+ parameter models across 8x H100/H200 GPUs.
  • Use device groups to split or pool GPU resources flexibly.
  • Scale to 56 vGPU slices per host when combining MIG with VMware’s virtualization stack.

This level of scalability is what makes VMware environments capable of powering enterprise-grade AI workloads.
Picture

GPU-Aware vMotion: A Game Changer in vSphere 9.0

One of the biggest breakthroughs announced was GPU-enabled vMotion in vSphere 9.0.
Historically, migrating GPU-backed VMs was difficult. VMware solved this by pre-copying static model weights (about 70% of GPU memory) while the VM stays live, and only transferring the dynamic cache during stun time.

The results are impressive:
  • 3x faster vMotion stun times.
  • Seamless migration for even 70B parameter models across multiple GPUs.
  • Greater resiliency without locking workloads to a single host.

​This is a huge win for AI operations teams looking to balance performance with flexibility.
Picture

Monitoring & Operational Visibility

To round out the session, VMware revealed new GPU monitoring dashboards that help admins ensure GPUs are being fully utilized and protected.

Key capabilities include:
  • Real-time GPU and memory usage insights.
  • Thermal and hardware health tracking to prevent failures.
  • Cluster-level visibility into which workloads are consuming GPUs.

​For enterprises investing in high-value GPUs, this level of operational control ensures ROI and prevents idle resources.
Picture

Why This Matters

Accelerating AI Workloads: Mastering vGPU Management in VMware Environments demonstrated that VMware is extending its leadership in virtualization to the GPU era.

​With Time Slice and MIG vGPUs, NVLink scaling, GPU-aware vMotion, and powerful monitoring, VMware vSphere 9.0 is now one of the most AI-ready platforms in the enterprise market.

At VMware Explore 2025, the message was loud and clear: the next frontier of virtualization isn’t just about CPUs, memory, or storage; it’s GPUs.
0 Comments



Leave a Reply.

    Picture
    Picture
    Picture
    Picture
    Picture
    Picture
    Picture
    Picture

    RSS Feed

Virtualization Velocity

© 2025 Brandon Seymour. All rights reserved.

Privacy Policy | Contact

Follow:

LinkedIn X Facebook Email
  • Home
  • About
  • VMware Explore
    • VMware Explore 2025
    • VMware Explore 2024
    • VMware Explore 2023
    • VMware Explore 2022
  • VMworld
    • VMworld 2021
    • VMworld 2020
    • VMworld 2019
    • VMworld 2018
    • VMworld 2017
    • VMworld 2016
    • VMWorld 2015
    • VMWorld 2014
  • vExpert
  • The Class Room
  • VMUG Advantage
  • AI Model Compute Planner
  • AI-Q Game
  • Video Hub
  • Tech-Humor
  • Contact