Virtualization Velocity
Your Definitive Source for Actionable Insights on Cloud, Virtualization & Modern Enterprise IT

My First NVIDIA AI Workbench Install: Lessons, Steps, and GPU Benchmarking

8/5/2025

Installing NVIDIA AI Workbench for the first time was both exciting and a learning experience. I quickly realized that when working with GPU-accelerated workloads, matching versions of Python, CUDA, cuDNN, and PyTorch is critical to avoiding errors.

By the end, not only was my installation successful, but I was also able to benchmark my GPU's performance against the CPU.

My Build

Here's the system I installed NVIDIA AI Workbench on:
  • Processor: Intel Core i7‑7800X @ 3.50GHz — 6 cores / 12 threads
  • Graphics Card: NVIDIA GeForce RTX 3060 (12GB VRAM)
  • RAM: 128GB DDR4
  • Storage: 2TB NVMe SSD
  • OS: Windows 11 Pro (64-bit)

This setup provides more than enough power for local AI workloads, model fine-tuning, and CUDA-accelerated development.

Steps I Took to Install NVIDIA AI Workbench

Why Install CUDA, cuDNN, and PyTorch Alongside NVIDIA AI Workbench?

While NVIDIA AI Workbench is the main environment you interact with, it doesn't automatically include every GPU-acceleration component you'll need for AI development.
These three installations are essential for unlocking the full power of your NVIDIA GPU inside Workbench and other AI tools:
  • CUDA Toolkit – NVIDIA’s GPU computing platform. It provides the compiler (nvcc), runtime, and core libraries needed for applications to run computations on the GPU. Without CUDA, your GPU is essentially “invisible” to deep learning frameworks.
  • cuDNN – NVIDIA’s Deep Neural Network library. It contains highly optimized GPU implementations of operations like convolution, pooling, and activation functions. Deep learning frameworks like PyTorch and TensorFlow use cuDNN to run neural networks efficiently. Without it, workloads fall back to much slower CPU code.
  • PyTorch (with CUDA Support) – One of the most popular AI/ML frameworks. Installing PyTorch with the CUDA-enabled wheel ensures it can use your GPU for training and inference. This not only accelerates workloads inside Workbench but also lets you test models locally before running them in containerized environments.

In short:
  • CUDA gives you the tools to talk to your GPU.
  • cuDNN gives you the speed for deep learning workloads.
  • PyTorch gives you the framework to actually build and run AI models with that speed.
Without these, NVIDIA AI Workbench would still run, but your GPU wouldn't be fully utilized and your AI workloads would be drastically slower.
Since this was my first time installing NVIDIA Workbench, I documented every step and captured screenshots so others can follow along without hitting the same roadblocks I did.

Below is the detailed process, with download links and installation tips.

1. Download & Install Python (Compatible Version)

Download: Python 3.12.7 (64-bit)
  • When running the installer:
    1. Check “Add Python to PATH” on the first screen (critical).
    2. Install pip (comes by default).
    3. Choose “Install for all users” if available.

Why this version?
While Python 3.13 was available at the time, PyTorch CUDA wheels didn't yet support it. Python 3.12.7 is currently the sweet spot for compatibility with CUDA 12.x.
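Before moving on, it's worth confirming that the interpreter on your PATH is actually the one you just installed; a stale Python left over from an older install is a common source of wheel mismatches. A quick sanity check (the 3.12 ceiling reflects PyTorch CUDA wheel support at the time of writing, so treat it as a moving target):

```python
import sys

# Print the interpreter version and where it lives, so you can spot
# a stale Python being picked up from PATH.
print("Python:", sys.version.split()[0])
print("Executable:", sys.executable)

# PyTorch CUDA wheels supported up to Python 3.12 at the time of writing.
if sys.version_info[:2] > (3, 12):
    print("Warning: this Python may be too new for current PyTorch CUDA wheels.")
```

If the executable path points somewhere unexpected, fix your PATH before installing anything else.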

2. Install NVIDIA CUDA Toolkit

Download: CUDA Toolkit 12.5
  • Select Windows → x86_64 → 11 → exe (network).
  • Run the installer and choose Express Install unless you need a custom location.
  • After installation, add these paths to your System Environment Variables → Path:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\libnvvp


Verification:

nvcc --version

The output should show release 12.5.
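If you want to script this check, you can parse the `nvcc --version` output in Python. The `parse_nvcc_release` helper below is my own sketch, not an NVIDIA tool; the sample string is an abbreviated form of what nvcc prints:

```python
import re

def parse_nvcc_release(output: str) -> str:
    """Pull the 'release X.Y' version out of `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    if not match:
        raise ValueError("no release string found in nvcc output")
    return match.group(1)

# Abbreviated example of the text nvcc 12.5 prints:
sample = "Cuda compilation tools, release 12.5, V12.5.40"
print(parse_nvcc_release(sample))  # 12.5

# Against a live install you could feed it the real output, e.g.:
# subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
```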

3. Install cuDNN

Download: cuDNN 9.11 for Windows (NVIDIA Developer account required).
  • Extract the ZIP.
  • Copy files into your CUDA installation:
    • bin/*.dll → C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin
    • include/*.h → C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include
    • lib/x64/*.lib → C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\lib\x64
Verification:

where cudnn64*

Should return the .dll path in your CUDA bin folder.
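Once PyTorch is installed (step 4), you can also confirm cuDNN from Python: `torch.backends.cudnn.version()` returns an integer rather than a dotted version string. The `format_cudnn_version` helper below is my own addition, and it assumes the common major*1000 + minor*100 + patch encoding (e.g. 8902 for 8.9.2); newer cuDNN releases may encode differently, so sanity-check against your install:

```python
def format_cudnn_version(v: int) -> str:
    """Decode the integer torch.backends.cudnn.version() returns,
    assuming the common major*1000 + minor*100 + patch encoding
    (e.g. 8902 -> '8.9.2')."""
    major, rest = divmod(v, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"

print(format_cudnn_version(8902))  # 8.9.2

# With PyTorch installed (step 4), uncomment to check the live install:
# import torch
# print("cuDNN enabled:", torch.backends.cudnn.is_available())
# print("cuDNN version:", torch.backends.cudnn.version())
```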

4. Install PyTorch with CUDA Support

Command:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Why cu121?
PyTorch labels its wheels by the CUDA runtime version. CUDA 12.1 wheels work perfectly with CUDA 12.5 drivers.
Verification:
Once installed, verify that PyTorch can detect your GPU and the correct CUDA version by running this in Command Prompt:

python -c "import torch; print('CUDA Available:', torch.cuda.is_available()); print('CUDA Version:', torch.version.cuda); print('GPU Name:', torch.cuda.get_device_name(0))"

If everything is set up correctly, you'll see something like:

CUDA Available: True
CUDA Version: 12.1
GPU Name: NVIDIA GeForce RTX 3060

If CUDA Available shows False or you get an error, recheck:
  • Your CUDA Toolkit installation
  • cuDNN installation
  • That you installed the correct PyTorch CUDA build
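To narrow down which of those layers is the problem, I find a small diagnostic script handy. This one is my own sketch, not an official NVIDIA or PyTorch tool; it just reports what each layer can see:

```python
def diagnose():
    """Return a rough checklist of likely causes when CUDA isn't
    visible to PyTorch. Not exhaustive, just a starting point."""
    findings = []
    try:
        import torch
    except ImportError:
        return ["PyTorch is not installed in this interpreter (see step 4)."]
    if torch.version.cuda is None:
        findings.append("This PyTorch build is CPU-only; reinstall with the cu121 index URL.")
    if not torch.cuda.is_available():
        findings.append("torch.cuda.is_available() is False; check the NVIDIA driver and CUDA Toolkit.")
    if not findings:
        findings.append("CUDA looks healthy: " + torch.cuda.get_device_name(0))
    return findings

for line in diagnose():
    print(line)
```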

5. Install NVIDIA AI Workbench

Download: NVIDIA AI Workbench
  • Download the installer and follow the prompts.
  • Once installed, AI Workbench will detect your system CUDA/cuDNN configuration and allow you to run AI workloads locally.

6. Run a GPU Test

Once everything is installed, use the benchmark scripts in the next section to ensure your GPU is being used correctly.
Tip:
For the GPU benchmarking examples in sections 1–3, copy each code example into Notepad (or your preferred text editor) and save it with a .py extension, such as basic_cuda_test.py, gpu_benchmark.py, or gpu_vs_cpu.py.
Once saved, you can run each script by opening Command Prompt or PowerShell, navigating to the folder where the file is saved, and running:
python script_name.py
Replace script_name.py with the name of the file you saved.

1. Basic CUDA Test

Use this script to confirm PyTorch detects your GPU and that CUDA is available:


import torch

print("PyTorch Version:", torch.__version__)
print("CUDA Available:", torch.cuda.is_available())
print("CUDA Version:", torch.version.cuda)
print("GPU Count:", torch.cuda.device_count())
print("GPU Name:",
      torch.cuda.get_device_name(0) if torch.cuda.is_available() else "None")

2. GPU Benchmark Test

This script runs a quick matrix multiplication benchmark on your GPU using PyTorch:


import torch
import time

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)
if device.type == "cuda":
    print("GPU:", torch.cuda.get_device_name(0))
print("CUDA Version:", torch.version.cuda)
print("PyTorch Version:", torch.__version__)

# Create two large matrices directly on the target device
size = 10000
a = torch.randn(size, size, device=device)
b = torch.randn(size, size, device=device)

# Warm-up (the first CUDA call carries one-time initialization cost)
c = torch.mm(a, b)
if device.type == "cuda":
    torch.cuda.synchronize()

# Benchmark: synchronize so the timer waits for the GPU to finish
start_time = time.time()
for _ in range(10):
    c = torch.mm(a, b)
if device.type == "cuda":
    torch.cuda.synchronize()
end_time = time.time()

print(f"Time for 10 matrix multiplications ({size}x{size}): "
      f"{end_time - start_time:.2f} seconds")

3. GPU vs CPU Comparison

This script compares performance between GPU and CPU for matrix multiplications:


import torch
import time

size = 5000

# CPU test
a_cpu = torch.randn(size, size)
b_cpu = torch.randn(size, size)
start_cpu = time.time()
for _ in range(10):
    c_cpu = torch.mm(a_cpu, b_cpu)
end_cpu = time.time()

# GPU test (falls back to CPU if CUDA is unavailable)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_cuda = device.type == "cuda"
a_gpu = a_cpu.to(device)
b_gpu = b_cpu.to(device)

# Warm-up and synchronize so the timer measures only the multiplications
c_gpu = torch.mm(a_gpu, b_gpu)
if use_cuda:
    torch.cuda.synchronize()
start_gpu = time.time()
for _ in range(10):
    c_gpu = torch.mm(a_gpu, b_gpu)
if use_cuda:
    torch.cuda.synchronize()
end_gpu = time.time()

# Results
if use_cuda:
    print("GPU:", torch.cuda.get_device_name(0))
print("CUDA Version:", torch.version.cuda)
print("PyTorch Version:", torch.__version__)
print(f"GPU Time for 10 multiplications: {end_gpu - start_gpu:.2f} seconds")
print(f"CPU Time for 10 multiplications: {end_cpu - start_cpu:.2f} seconds")
print(f"Speedup (GPU vs CPU): {(end_cpu - start_cpu) / (end_gpu - start_gpu):.2f}x faster")
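To put those raw seconds in context, you can convert them into throughput: one n×n matrix multiplication costs roughly 2·n³ floating-point operations. The helper below is my own addition for interpreting the benchmark numbers, not part of PyTorch:

```python
def matmul_gflops(n: int, iterations: int, seconds: float) -> float:
    """Approximate achieved GFLOP/s for `iterations` n x n matrix
    multiplications, each costing ~2*n^3 floating-point operations."""
    flops = 2 * (n ** 3) * iterations
    return flops / seconds / 1e9

# Example: 10 multiplications of 5000x5000 matrices in 2.0 seconds
print(f"{matmul_gflops(5000, 10, 2.0):.1f} GFLOP/s")  # 1250.0 GFLOP/s
```

Comparing the result against your GPU's published FP32 peak gives a rough sense of how efficiently the benchmark is using the card.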

Tip: Using ChatGPT to troubleshoot compatibility issues and to get exact install commands and scripts was a great time-saver and helped me avoid common mistakes.

Final Thoughts

The install process was more complex than expected, but my system is now fully set up for GPU-accelerated AI workloads.

With the RTX 3060, 128GB of RAM, and a matched CUDA stack, I can run PyTorch models locally with significant speed advantages over CPU-only execution.

© 2025 Brandon Seymour. All rights reserved.
