NVIDIA KAI Scheduler Enables Gang Scheduling and Workload Prioritization in Ray with KubeRay
Original by Ekin Karabulut, rewritten by AI News Staff
Source: https://developer.nvidia.com
October 3, 2025

The NVIDIA KAI Scheduler is now natively integrated with KubeRay, extending the scheduling engine from NVIDIA Run:ai to Ray clusters. This integration introduces capabilities such as gang scheduling, workload autoscaling, workload prioritization, and hierarchical queues, designed to optimize infrastructure by coordinating job starts, efficiently sharing GPUs, and prioritizing workloads.
Key features enabled by this integration include:
- Gang scheduling: Ensures that distributed Ray workloads launch all workers and actors together, preventing partial allocations that can stall training or inference pipelines.
- Workload and cluster autoscaling: Allows Ray clusters to scale up as resources become available and scale down as demand decreases, aligning compute resources with workload needs without manual intervention. This is particularly useful for offline batch inference workloads.
- Workload priorities: Enables high-priority inference jobs to automatically preempt lower-priority batch training jobs when resources are limited, maintaining application responsiveness.
- Hierarchical queuing with priorities: Facilitates the creation of queues for different project teams with clear priorities, allowing higher-priority queues to borrow idle resources from other teams when capacity is available.
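As a sketch of how a Ray workload opts into these behaviors, the RayCluster below carries a queue label that tells KAI Scheduler which queue to charge the pods against. The label key (`kai.scheduler/queue`), queue name (`team-a`), and image tag are assumptions based on common KAI Scheduler and KubeRay conventions, not details stated in this post:

```yaml
# Hypothetical RayCluster submitted through KubeRay to KAI Scheduler.
# The kai.scheduler/queue label key and the queue name are assumptions.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-sample
  labels:
    kai.scheduler/queue: team-a   # queue this workload is charged against
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
  workerGroupSpecs:
    - groupName: gpu-workers
      replicas: 2   # with gang scheduling, all replicas start together or not at all
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
              resources:
                limits:
                  nvidia.com/gpu: 1
```

With gang scheduling enabled, the head and both GPU workers are placed as a single unit, avoiding the partial allocations described above.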
Trying out the integration requires the following prerequisites:
- A Kubernetes cluster with one NVIDIA A10G GPU.
- NVIDIA GPU Operator installed.
- NVIDIA KAI Scheduler deployed.
- KubeRay Operator nightly image or Helm chart configured to use KAI Scheduler (--set batchScheduler.name=kai-scheduler).
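The last prerequisite can also be expressed as a Helm values fragment instead of a `--set` flag; the chart key below comes directly from the flag quoted above, while the release and chart names in the usage note are the usual community defaults rather than anything mandated by this post:

```yaml
# values.yaml fragment for the KubeRay operator Helm chart,
# equivalent to --set batchScheduler.name=kai-scheduler
batchScheduler:
  name: kai-scheduler
```

Applied with, for example, `helm install kuberay-operator kuberay/kuberay-operator -f values.yaml`.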
KAI Scheduler queues are governed by three parameters:
- Quota: The deserved share of resources to which a queue is entitled.
- Limit: The upper bound on how many resources a queue can consume.
- Over Quota Weight: Determines how surplus resources are distributed among queues with the same priority; higher weights receive a larger portion of the extra capacity.
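Put together, a hierarchical queue pair expressing these three parameters might look like the sketch below. The `apiVersion`, field names, and numeric values are assumptions drawn from KAI Scheduler's Queue resource conventions, not taken from this post:

```yaml
# Hypothetical parent/child queue pair; field names are assumptions.
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: department-a
spec:
  resources:
    gpu:
      quota: 4            # deserved share of GPUs
      limit: 8            # hard upper bound on consumption
      overQuotaWeight: 2  # share of surplus vs. sibling queues
---
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: team-a
spec:
  parentQueue: department-a   # hierarchical queuing: team-a borrows within department-a
  resources:
    gpu:
      quota: 2
      limit: 4
      overQuotaWeight: 1
```

Under this layout, `team-a` is entitled to 2 GPUs, can burst to 4 when capacity is idle, and receives surplus in proportion to its over quota weight relative to sibling teams.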