GPU Virtualization: Nvidia's Meteoric Rise Fueled by the AI Revolution

Nvidia unveiled remarkable earnings in its August report this year, and its triple-digit year-over-year growth has rightfully captured a lot of attention. Revenue reached an impressive $13.5 billion in the second quarter, a 101% increase over the prior year and well above the company’s own guidance of $11 billion. Mind you, $6 billion of that was pure profit!

Nvidia’s success can be attributed to its positioning in the rapidly evolving AI landscape. The company is meeting the rising demand for its GPU chips, which have become indispensable for powering large language models and other AI-driven workloads.

Back in 2020, during the COVID-19 pandemic, GPUs were in short supply as cryptocurrency mining reached its peak. As a major global chipmaker, Nvidia launched an initiative to reallocate unused GPU power to COVID-19 research.

Today, Nvidia’s GPUs are indispensable for the majority of AI systems, including the widely used ChatGPT chatbot. This is what has propelled the company’s extraordinary growth, and further substantial growth is expected.

Earlier this year, X (formerly Twitter) purchased 10,000 GPUs for an artificial intelligence project, a purchase estimated to be worth tens of millions of dollars. The clusters were housed in one of X’s two data centers.

So why are GPUs selling like freshly baked bread?

The evolution of GPUs

Graphics Processing Units (GPUs) have come a long way from their humble origins as dedicated components for rendering visuals in video games. In today’s computing landscape, the importance of GPUs extends far beyond gaming graphics.

GPUs play a crucial role in diverse applications in modern computing, particularly in the realm of artificial intelligence (AI). By accelerating graphics rendering and enabling massive parallel processing, they have become essential components in gaming setups, workstations, servers, and other devices ranging from smartphones to supercomputers.

There are many different types of GPUs, built to support different kinds of workloads, and they are all relevant today. In this article, however, we will focus on the importance of GPUs in the recent boom of AI technologies and finish with a comparison of two key GPU virtualization technologies supported by Nvidia: MIG (Multi-Instance GPU) and vGPU (Virtual GPU).

Why are GPUs so important for machine learning?

Machine learning is a subfield of artificial intelligence (AI) that focuses on the development of algorithms and models that enable computers to learn from data and make predictions or decisions without being explicitly programmed. The fundamental idea behind machine learning is to give computers the ability to learn patterns and insights from data, allowing them to improve their performance over time.

The relationship between machine learning and Graphics Processing Units (GPUs) is very tight and has become increasingly important over the years. GPUs, originally designed for rendering graphics in video games, have proven to be highly effective for accelerating the training and inference processes in machine learning models.

Although CPUs are great for executing a variety of general tasks sequentially, GPUs leverage parallel computing to break complex problems down into many smaller calculations that run simultaneously. This makes them great for managing the distributed computational processes needed for machine learning.

Several factors contribute to the synergy between GPUs and machine learning:

Deep Learning and Neural Networks
Deep learning, a subset of machine learning, is notable for image recognition and natural language processing. GPUs are great for the large-scale matrix calculations inherent in deep learning.

Neural networks model problems by connecting nodes, allowing decision-making that captures small components of complex problem-solving models. With their layers of interconnected nodes, they demand intensive computation in both training and inference. GPUs excel at the matrix operations essential for training and running neural networks, leveraging their parallel architecture to process multiple data points simultaneously.
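To make the parallelism concrete, here is a minimal sketch in plain Python of why matrix multiplication suits a GPU: each output row (indeed, each output element) depends only on the inputs, never on another output. The function names are ours, chosen for illustration, and the thread pool only stands in for the idea of independent workers. It will not make pure Python faster, and it is not how a GPU schedules work.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, col):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(row, col))


def matmul_sequential(a, b):
    """Compute a @ b one output element after another, CPU-style."""
    cols = list(zip(*b))  # columns of b
    return [[dot(row, col) for col in cols] for row in a]


def matmul_parallel(a, b, workers=4):
    """Compute a @ b with each output row handled as an independent task."""
    cols = list(zip(*b))

    def one_row(row):
        # Needs only this row of a and the columns of b, nothing from other rows.
        return [dot(row, col) for col in cols]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so rows stay in place.
        return list(pool.map(one_row, a))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_sequential(a, b))  # [[19, 22], [43, 50]]
print(matmul_parallel(a, b))    # same result, rows computed independently
```

Because no task waits on another, the work can be spread over as many workers as are available, which is the property a GPU exploits with its thousands of cores.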

Frameworks and Libraries Optimization
Popular machine learning frameworks and libraries, such as TensorFlow and PyTorch, have optimized their operations to take advantage of GPU capabilities. These libraries use APIs (Application Programming Interfaces) like CUDA (Compute Unified Device Architecture) to enable efficient communication between the software and GPU hardware.
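In practice, this usually means application code only has to say where a tensor should live, and the framework handles the CUDA calls. The sketch below assumes PyTorch may or may not be installed and falls back to the CPU if it is missing or no CUDA GPU is visible. The helper names `pick_device` and `demo_matmul` are ours, not part of PyTorch.

```python
def pick_device():
    """Return "cuda" if PyTorch is installed and can see a CUDA GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed: nothing to accelerate
    return "cuda" if torch.cuda.is_available() else "cpu"


def demo_matmul(n=256):
    """Multiply two random n x n matrices on the chosen device.

    Returns the result's shape as a tuple, or None if PyTorch is missing.
    """
    try:
        import torch
    except ImportError:
        return None
    device = pick_device()
    x = torch.randn(n, n, device=device)  # allocated directly on the device
    y = torch.randn(n, n, device=device)
    return tuple((x @ y).shape)  # the matmul runs wherever x and y live


print(pick_device())
print(demo_matmul())
```

The same script runs unchanged on a laptop without a GPU and on a GPU server. Only the device string differs.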

Big Data
Big data plays a crucial role in machine learning development: as businesses and technologies gather more data about their products and services, they supply the very large training datasets that advanced learning algorithms need.

Distributed Computing and Scalability
In large-scale machine learning tasks, multiple GPUs can be used in parallel, either within one machine or across different machines, to distribute the workload. This distributed computing approach enhances scalability, allowing researchers and practitioners to train more complex models on larger datasets.
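One common way to distribute the workload is data parallelism: each GPU gets a slice of the batch, computes a partial gradient, and the partial results are combined. The sketch below imitates that in plain Python for a one-parameter model y = w * x with a squared-error loss. The "devices" are ordinary list slices, and all names are illustrative assumptions rather than any framework's API.

```python
def shard(batch, n_devices):
    """Split batch into n_devices contiguous, nearly equal shards."""
    size, rem = divmod(len(batch), n_devices)
    shards, start = [], 0
    for i in range(n_devices):
        end = start + size + (1 if i < rem else 0)
        shards.append(batch[start:end])
        start = end
    return shards


def grad_sum(w, samples):
    """Sum over samples of d/dw (w*x - y)**2, which is 2*(w*x - y)*x."""
    return sum(2 * (w * x - y) * x for x, y in samples)


def data_parallel_grad(w, batch, n_devices):
    """Mean gradient over the batch, computed shard by shard."""
    # Each entry would be computed on its own GPU in a real setup.
    partials = [grad_sum(w, s) for s in shard(batch, n_devices)]
    # Combining the partial sums plays the role of an all-reduce step.
    return sum(partials) / len(batch)


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), (5.0, 10.0)]
print(data_parallel_grad(1.0, batch, n_devices=1))  # -22.0
print(data_parallel_grad(1.0, batch, n_devices=2))  # -22.0, same gradient
```

The gradient is the same however many shards are used, which is why adding GPUs can shorten training time without changing what the model learns from a given batch.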

Cloud platforms
Whether you are using managed Kubernetes services to orchestrate containerized workloads or building with AI/ML and data analytics tools in the cloud, GPU-accelerated computing can be integrated into managed cloud services to help you optimize application performance.

Nvidia MIG vs. Nvidia vGPU

Now, let’s delve into the comparison between two key GPU virtualization technologies from Nvidia: MIG (Multi-Instance GPU) and vGPU (Virtual GPU).

NVIDIA vGPU software creates virtual GPUs that can be shared across multiple GPU-enabled VMs and accessed from any device, anywhere. A physical GPU can be exposed to VMs as one or multiple vGPU instances, and several GPUs can also be aggregated and allocated to a single VM.

With NVIDIA MIG, a single GPU can be split into multiple processing devices, where each device is assigned a fixed segment of GPU memory and cores in accordance with the specifications outlined by the profile. (See image below)

vGPU software supports powerful GPU performance across a spectrum of workloads, from graphics-intensive virtual workstations to data science and AI applications. It runs on a physical GPU in a cloud or enterprise data center server. This allows IT to combine the management and security advantages of virtualization with the performance of NVIDIA GPUs that modern workloads require.

In vGPU mode, the GPU’s memory is statically divided, while its compute capability is time-shared among the VMs that use the GPU concurrently. While a VM is running on the GPU, it has exclusive access to the GPU’s entire compute capability but is limited to its allocated share of GPU memory.

MIG is different from vGPU. MIG partitions the physical GPU into independent instances, each with its own high-bandwidth memory, cache, and compute cores that are not shared with other instances. Each GPU instance operates independently, which provides maximum isolation between workloads. Whereas vGPU shares all of the GPU’s cores, so performance can vary with the load, MIG provides guaranteed performance based on the instance profile, because each MIG instance has exclusive control over its GPU partition.

In other words, with MIG, both memory and compute capability are statically partitioned. A VM using a GPU in MIG mode can access only its assigned memory and use only its designated compute cores. So even if the GPU has idle cores that are not assigned to the VM, the VM cannot use them.
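The difference between the two modes comes down to simple resource accounting, sketched below in Python. This is a toy model under loud assumptions: the GPU is split equally among VMs, time-sharing is perfectly fair, and context-switch overhead is ignored. The 40 GB and 84 compute units are made-up round numbers, not the specification of any real GPU or MIG profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Share:
    memory_gb: float          # memory a VM can address
    effective_compute: float  # compute units available to it on average


def _check(n_vms, busy_vms):
    if not 1 <= busy_vms <= n_vms:
        raise ValueError("need 1 <= busy_vms <= n_vms")


def vgpu_share(total_memory_gb, total_compute, n_vms, busy_vms):
    """vGPU: memory is split statically, compute is time-shared.

    Only VMs that are actually busy compete for GPU time, so a busy VM
    benefits when its neighbours are idle.
    """
    _check(n_vms, busy_vms)
    return Share(total_memory_gb / n_vms, total_compute / busy_vms)


def mig_share(total_memory_gb, total_compute, n_vms, busy_vms):
    """MIG: memory and compute are both split statically.

    busy_vms is accepted for symmetry but has no effect: idle partitions
    cannot be borrowed.
    """
    _check(n_vms, busy_vms)
    return Share(total_memory_gb / n_vms, total_compute / n_vms)


# Hypothetical GPU: 40 GB, 84 compute units, shared by 4 VMs.
print(vgpu_share(40, 84, n_vms=4, busy_vms=1))  # full compute while others idle
print(mig_share(40, 84, n_vms=4, busy_vms=1))   # still a quarter of the compute
print(vgpu_share(40, 84, n_vms=4, busy_vms=4))  # under full load, same as MIG here
```

Real behaviour also depends on scheduling overhead and on how well a single workload can fill the whole GPU, neither of which this model captures.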

According to a study run by VMware, the computational results remain the same whichever mode a VM uses to process its workload. The only difference is performance, measured by wall-clock time.

Both vGPU and MIG modes have distinct advantages and drawbacks: vGPU mode time-shares the computational cores, while MIG mode statically partitions them. This difference in how cores are distributed raises the question of which mode offers the best performance (specifically, the shortest run time) for a given workload.

The results of the VMware study show that vGPU mode performs best (measured by wall-clock time) on workloads in which data transfers and/or CPU computations are interspersed with CUDA computations.

MIG mode, on the other hand, performs best on workloads dominated by large CUDA kernels, with minimal interruption from data transfers or CPU computations.

In scenarios with aggregated data transfers and CUDA computations, MIG mode performs better when two or fewer VMs run concurrently. Conversely, vGPU mode comes out ahead when three or more VMs run concurrently.
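These findings can be condensed into a small rule of thumb. The Python sketch below only restates the study results summarized in this article. The pattern names are our own labels, and the output should be treated as a starting point for your own benchmarking rather than a guarantee.

```python
from enum import Enum


class Pattern(Enum):
    """Workload shapes described above (labels are illustrative)."""
    INTERLEAVED = "interleaved"      # transfers/CPU work interspersed with CUDA
    LARGE_KERNELS = "large_kernels"  # long CUDA kernels, few interruptions
    AGGREGATED = "aggregated"        # transfers grouped, then CUDA computation


def suggest_mode(pattern, concurrent_vms=1):
    """Return "vGPU" or "MIG" following the reported study results."""
    if concurrent_vms < 1:
        raise ValueError("concurrent_vms must be at least 1")
    if pattern is Pattern.INTERLEAVED:
        return "vGPU"
    if pattern is Pattern.LARGE_KERNELS:
        return "MIG"
    if pattern is Pattern.AGGREGATED:
        # Reported crossover: MIG up to two VMs, vGPU from three.
        return "MIG" if concurrent_vms <= 2 else "vGPU"
    raise ValueError(f"unknown pattern: {pattern!r}")


print(suggest_mode(Pattern.INTERLEAVED))                   # vGPU
print(suggest_mode(Pattern.AGGREGATED, concurrent_vms=2))  # MIG
print(suggest_mode(Pattern.AGGREGATED, concurrent_vms=3))  # vGPU
```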

Are you looking for cost-efficient GPU virtualization? With tens and sometimes hundreds of thousands of US dollars on the table, it is crucial to make the right decisions. Contact us today, and let us guide you in choosing the optimal solution.

Images by macrovector and Freepik