Using GPU Resources

Kubernetes, the backbone of Release AI, provides mechanisms to allocate GPUs to your applications. This is crucial for running compute-intensive tasks such as machine learning model training, deep learning inference, and high-performance data processing.
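
For context, Kubernetes exposes GPUs as an extended resource that containers request through resource limits, while node labels steer scheduling. The raw pod manifest below is a minimal sketch of that mechanism, assuming the NVIDIA device plugin is installed on the node; it is illustrative only, and the manifests Release generates from the configurations in this guide may differ:

apiVersion: v1
kind: Pod
metadata:
  name: mlapp-sketch # hypothetical name, for illustration only
spec:
  nodeSelector:
    nvidia.com/gpu: "true" # schedule only onto nodes carrying this label
  containers:
    - name: mlapp
      image: "registry.k8s.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1 # GPU request; requires the NVIDIA device plugin

In Release you never write this manifest yourself; the services configuration shown in the sections below covers both pieces.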

Example Configurations for GPU Usage

Selecting GPUs for Your Service

To start, you need to specify that your service requires GPUs. This is done by setting the nvidia_com_gpu limits, which correspond to the Kubernetes nvidia.com/gpu resource. Below is an example configuration:

services:
 - name: mlapp
   image: "registry.k8s.io/cuda-vector-add:v0.1"
   replicas: 4
   nvidia_com_gpu:
     limits: 1 # Number of GPUs, must be an integer
   node_selector:
     - key: nvidia.com/gpu
       value: "true"

In this example:

  • The mlapp service is configured to run with an image that leverages CUDA.

  • The replicas field is set to 4, meaning four instances of this service will run.

  • The nvidia_com_gpu.limits is set to 1, specifying that each instance will use one GPU.

  • The node_selector ensures that the service is scheduled on nodes with GPU resources.
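
Note that the GPU limit applies per replica: with replicas set to 4 and limits set to 1, this example needs 4 × 1 = 4 GPUs of cluster capacity before all four instances can be scheduled.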

Selecting an Instance Type Known to Have GPUs

Choosing the right instance type ensures that the nodes running your application actually provide the GPU hardware it needs. Here’s how you can specify an instance type with GPUs:

node_selector:
  - key: nvidia.com/gpu.product
    value: A10G
  - key: beta.kubernetes.io/instance-type
    value: g5.12xlarge

In this example:

  • The nvidia.com/gpu.product key is set to A10G, specifying that the nodes should have NVIDIA A10G GPUs.

  • The beta.kubernetes.io/instance-type key is set to g5.12xlarge, an AWS instance type that provides four NVIDIA A10G GPUs.
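
The same pattern applies to other GPU and instance type combinations. The sketch below is hypothetical: Tesla-T4 is a common product label for NVIDIA T4 GPUs, but the exact value depends on the GPU feature discovery software labeling your nodes, so verify it against your cluster's actual node labels:

node_selector:
  - key: nvidia.com/gpu.product
    value: Tesla-T4 # hypothetical label value; confirm against your node labels
  - key: beta.kubernetes.io/instance-type
    value: g4dn.xlarge # AWS instance type with a single NVIDIA T4 GPU

Also note that newer Kubernetes releases have replaced the beta.kubernetes.io/instance-type label with node.kubernetes.io/instance-type; use whichever label your nodes actually carry.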

Complete Configuration Example

Combining the GPU selection and instance type configuration, here is the complete setup:

services:
 - name: mlapp
   image: "registry.k8s.io/cuda-vector-add:v0.1"
   replicas: 2
   nvidia_com_gpu:
     limits: 2 # Number of GPUs, must be an integer
   node_selector:
     - key: nvidia.com/gpu
       value: "true"
     - key: nvidia.com/gpu.product
       value: A10G
     - key: beta.kubernetes.io/instance-type
       value: g5.12xlarge

In this example:

  • The services section defines the application services in your deployment.

  • The mlapp service runs a Docker image optimized for CUDA operations.

  • The replicas field specifies the number of instances to run.

  • The nvidia_com_gpu.limits is set to 2, ensuring that each instance is allocated two GPUs.

  • The node_selector filters the nodes to only those with the specified GPU (A10G) and instance type (g5.12xlarge).
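
Taken together, this configuration requests 2 replicas × 2 GPUs = 4 GPUs in total. A g5.12xlarge instance provides exactly four A10G GPUs, so both replicas can fit on a single node if its GPUs are otherwise free; if not, the scheduler waits for additional matching capacity to become available.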
