Description
I'm uncertain about the root cause of this one. What I see is that SkyPilot is unable to allocate an instance, even though I can go to the RunPod UI and allocate one myself. (For example, if I specify an A40 in SkyPilot, it can't allocate one, but I can create one in the UI.)
It seems like there is some kind of mismatch between the catalog and the API calls that SkyPilot is making.
Needs investigation to get to the root cause.
Sorry I don't have more details.
One other thought as I'm typing this: I restrict the regions in which the GPU can be allocated (I do the same thing in the UI, so that's not the problem). I'm wondering, though, whether SkyPilot is correctly iterating through the regions during allocation. For reference, the resources config is built like this:
# Build any_of configuration with cloud/region/accelerator combinations
any_of_configs: list[dict[str, Any]] = []
for gpu_spec in gpu_specs:
    # Format accelerator spec with count (e.g., "L4:2" for 2 GPUs)
    accelerator_spec = f"{gpu_spec.name}:{gpu_count}" if gpu_count > 1 else gpu_spec.name
    runpod_regions = ["CA", "US"]
    # Create image_id dict mapping all regions to the image
    runpod_image_id = None
    if "runpod" in image_ids_by_cloud:
        runpod_image_id = {region: image_ids_by_cloud["runpod"] for region in runpod_regions}
    for region in runpod_regions:
        config_entry: dict[str, Any] = {
            "cloud": "runpod",
            "accelerators": accelerator_spec,
            "region": region
        }
        # Add image_id dict if specified
        if runpod_image_id:
            config_entry["image_id"] = runpod_image_id
        any_of_configs.append(config_entry)
return {
    "any_of": any_of_configs,
}
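
One way to narrow this down might be to try each `any_of` entry on its own and record which region/accelerator combinations fail, rather than handing the whole list over at once. Below is a minimal sketch of that idea, not SkyPilot code: `probe_any_of`, `try_launch`, and `fake_launch` are hypothetical names made up for illustration. In a real investigation `try_launch` would wrap whatever launch call is actually in use; here it is replaced by a stand-in that pretends only "US" has capacity.

```python
from typing import Any, Callable


def probe_any_of(
    task_resources: dict[str, Any],
    try_launch: Callable[[dict[str, Any]], None],
) -> list[tuple[dict[str, Any], str]]:
    """Try each any_of candidate on its own and record the outcome."""
    results: list[tuple[dict[str, Any], str]] = []
    for candidate in task_resources.get("any_of", []):
        try:
            try_launch(candidate)
        except Exception as exc:
            # Keep going so that every region/accelerator pair gets tried.
            results.append((candidate, f"failed: {exc}"))
        else:
            results.append((candidate, "ok"))
    return results


def fake_launch(candidate: dict[str, Any]) -> None:
    # Hypothetical stand-in for a real launch call (assumption):
    # pretends that only the "US" region has capacity.
    if candidate["region"] != "US":
        raise RuntimeError(f"no capacity in {candidate['region']}")


if __name__ == "__main__":
    resources = {
        "any_of": [
            {"cloud": "runpod", "accelerators": "A40", "region": "CA"},
            {"cloud": "runpod", "accelerators": "A40", "region": "US"},
        ]
    }
    for candidate, outcome in probe_any_of(resources, fake_launch):
        print(candidate["region"], candidate["accelerators"], outcome)
```

If every entry fails individually while the same GPU and region can be created in the UI, that would point toward the catalog or the API calls; if single entries succeed but the combined `any_of` does not, the region iteration would be the more likely suspect.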