Description
Your current environment
PyTorch version: 2.5.1
Is debug build: False
OS: openEuler 22.03 (LTS-SP4) (aarch64)
GCC version: (GCC) 10.3.1
Clang version: Could not collect
CMake version: version 4.0.0
Libc version: glibc-2.34
Python version: 3.10.17 (main, Apr 30 2025, 11:54:22) [GCC 10.3.1] (64-bit runtime)
Python platform: Linux-5.15.0-91-generic-aarch64-with-glibc2.34
CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: HiSilicon
Model name: Kunpeng-920
Model: 0
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 4
Stepping: 0x1
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache: 12 MiB (192 instances)
L1i cache: 12 MiB (192 instances)
L2 cache: 96 MiB (192 instances)
L3 cache: 192 MiB (8 instances)
NUMA node(s): 8
NUMA node0 CPU(s): 0-23
NUMA node1 CPU(s): 24-47
NUMA node2 CPU(s): 48-71
NUMA node3 CPU(s): 72-95
NUMA node4 CPU(s): 96-119
NUMA node5 CPU(s): 120-143
NUMA node6 CPU(s): 144-167
NUMA node7 CPU(s): 168-191
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==26.4.0
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.51.3
[pip3] transformers-stream-generator==0.0.5
[conda] Could not collect
vLLM Version: 0.8.5.post1
vLLM Ascend Version: 0.8.5rc1
ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_TILING_SIZE=10240
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_OPSRUNNER_KERNEL_CACHE_TYPE=3
ATB_RUNNER_POOL_SIZE=64
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_LAUNCH_KERNEL_WITH_TILING=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 23.0.6 Version: 23.0.6 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 0 910B3 | OK | 167.8 45 0 / 0 |
| 0 | 0000:C1:00.0 | 43 0 / 0 51357/ 65536 |
+===========================+===============+====================================================+
| 1 910B3 | OK | 184.9 49 0 / 0 |
| 0 | 0000:01:00.0 | 61 0 / 0 51355/ 65536 |
+===========================+===============+====================================================+
| 2 910B3 | OK | 173.0 45 0 / 0 |
| 0 | 0000:C2:00.0 | 51 0 / 0 51356/ 65536 |
+===========================+===============+====================================================+
| 3 910B3 | OK | 175.9 47 0 / 0 |
| 0 | 0000:02:00.0 | 54 0 / 0 51356/ 65536 |
+===========================+===============+====================================================+
| 4 910B3 | OK | 170.0 45 0 / 0 |
| 0 | 0000:81:00.0 | 50 0 / 0 51336/ 65536 |
+===========================+===============+====================================================+
| 5 910B3 | OK | 165.9 48 0 / 0 |
| 0 | 0000:41:00.0 | 48 0 / 0 51357/ 65536 |
+===========================+===============+====================================================+
| 6 910B3 | OK | 188.4 47 0 / 0 |
| 0 | 0000:82:00.0 | 65 0 / 0 51336/ 65536 |
+===========================+===============+====================================================+
| 7 910B3 | OK | 185.1 49 0 / 0 |
| 0 | 0000:42:00.0 | 62 0 / 0 51380/ 65536 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| 0 0 | 2427072 | | 16065 |
| 0 0 | 2427078 | | 16045 |
| 0 0 | 2427071 | | 16047 |
+===========================+===============+====================================================+
| 1 0 | 2427065 | | 16045 |
| 1 0 | 2427069 | | 16065 |
| 1 0 | 2427070 | | 16045 |
+===========================+===============+====================================================+
| 2 0 | 2427080 | | 16064 |
| 2 0 | 2427062 | | 16045 |
| 2 0 | 2427063 | | 16047 |
+===========================+===============+====================================================+
| 3 0 | 2427074 | | 16064 |
| 3 0 | 2427083 | | 16047 |
| 3 0 | 2427084 | | 16045 |
+===========================+===============+====================================================+
| 4 0 | 2427077 | | 16047 |
| 4 0 | 2427066 | | 16045 |
| 4 0 | 2427068 | | 16044 |
+===========================+===============+====================================================+
| 5 0 | 2427079 | | 16045 |
| 5 0 | 2427081 | | 16065 |
| 5 0 | 2427067 | | 16047 |
+===========================+===============+====================================================+
| 6 0 | 2427064 | | 16045 |
| 6 0 | 2427073 | | 16045 |
| 6 0 | 2427075 | | 16047 |
+===========================+===============+====================================================+
| 7 0 | 2427076 | | 16045 |
| 7 0 | 2427082 | | 16067 |
| 7 0 | 2427085 | | 16067 |
+===========================+===============+====================================================+
CANN:
package_name=Ascend-cann-toolkit
version=8.1.RC1
innerversion=V100R001C21SPC001B238
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.1.RC1/aarch64-linux
🐛 Describe the bug
1. First, download the model from Hugging Face (using artget pull or any other method).
2. Enter the Docker container and run: vllm serve fixie-ai/ultravox-v0_5-llama-3_1-8b --enable-chunked-prefill=False --max-model-len 7186 --served-model-name fixie-ai/ultravox --dtype=bfloat16 --trust-remote-code
3. You will then get this error: