skypilot-org/skypilot

[Core] nvcr dynamo image doesn't work on SkyPilot

Open

Opened on Jan 13, 2026

Labels: good first issue, good starter issues

Description

Launching with the base image nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.7.1 gets stuck at runtime initialization:

sky launch -c test3 --infra k8s --image-id docker:nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.7.1 -- echo hi
Command to run: echo hi
Considered resources (1 node):
--------------------------------------------------------------------------------------
 INFRA                        INSTANCE   vCPUs   Mem(GB)   GPUS   COST ($)   CHOSEN
--------------------------------------------------------------------------------------
 Kubernetes (---------)   -          2       2         -      0.00          ✔
--------------------------------------------------------------------------------------
Launching a new cluster 'test3'. Proceed? [Y/n]:
⠙ Waiting for 'sky.launch' request to be scheduled: 7812b61a-6f28-41f2-9d70-ee29afda028f
⠼ Waiting for 'sky.launch' request to be scheduled: 7812b61a-6f28-41f2-9d70-ee29afda028f
⚙︎ Launching on Kubernetes.
└── Pod is up.
⠧ Preparing SkyPilot runtime (1/3 - initializing)  View logs: sky logs --provision test3

From provision log:

+ '[' -f /etc/apt/sources.list ']'
+ update_apt_sources mirror.umd.edu /etc/apt/sources.list
+ local host=mirror.umd.edu
+ local apt_file=/etc/apt/sources.list
++ prefix_cmd
+++ id -u
++ '[' 1000 -ne 0 ']'
++ echo sudo
+ sudo sed -i -E 's|https?://[a-zA-Z0-9.-]+\.ubuntu\.com/ubuntu|http://mirror.umd.edu/ubuntu|g' /etc/apt/sources.list
environment: line 73: sudo: command not found
+ apt_update_install_with_retries rsync fuse
+ echo 'Install failed with mirror (ubuntu): mirror.umd.edu'
+ restore_source
+ '[' -f /etc/apt/sources.list.backup_skypilot/sources.list ']'
+ set -e
+ '[' 1 = 1 ']'
+ echo 'Error: required package install failed across all mirrors: rsync fuse'
+ return 1
+ echo 'Error: core package installation failed.'
+ exit 1

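I suspect the cause is that the container runs as a non-root user (uid 1000 in the log above) and the image does not ship sudo, so the apt steps cannot run with root privileges. Adding this config to run the container as root works: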

# Kubernetes-specific configuration
config:
  kubernetes:
    pod_config:
      spec:
        containers:
        - securityContext:
            # Run as root to allow SkyPilot to install necessary packages
            runAsUser: 0
            runAsGroup: 0
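
To check whether an image would hit the same condition, here is a minimal sketch of the decision the provision log appears to exercise (is the user root, and if not, is sudo available?). The function name `privilege_mode` and its two optional arguments are hypothetical and only for illustration. This is not SkyPilot's actual `prefix_cmd` implementation.

```shell
# Hypothetical helper, not part of SkyPilot: report how privileged
# commands could be run for a given uid and sudo availability.
#   $1 = numeric uid (defaults to the current user's, via `id -u`)
#   $2 = "yes"/"no" for whether sudo exists (defaults to a PATH lookup)
privilege_mode() {
  local uid="${1:-$(id -u)}"
  local has_sudo="${2:-}"
  if [ -z "$has_sudo" ]; then
    if command -v sudo >/dev/null 2>&1; then
      has_sudo=yes
    else
      has_sudo=no
    fi
  fi
  if [ "$uid" -eq 0 ]; then
    echo root   # already root: no prefix needed
  elif [ "$has_sudo" = yes ]; then
    echo sudo   # non-root, sudo found on PATH (it may still need a password)
  else
    echo none   # non-root and no sudo: the case seen in the log above
  fi
}

# Report the mode for the current environment.
privilege_mode
```

Run inside the image, a result of `none` would match the `sudo: command not found` failure above, in which case running the container as root (as in the config above) is one workaround.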
