Description
Reminder
- I have read the README and searched the existing issues.
System Info
QWEN2-1.5B(0.5B)
Works.
QWEN2-7B(MoE)
Works, but requires bf16 (see #4278).
QWEN2-72B
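Works, with one minor issue: it only launches on 8 cards (stage 3). On 16 cards it runs out of memory (OOM); the cause still needs investigation.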
glm4
Works after commenting out the torch.jit lines and using bf16. See #4339 and #3788.
chatglm3
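Same approach as glm4. After merging the model, however, every file in the original model folder except *bin and pytorch_model.bin.index.json must be copied into the merged folder. See #1307.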
DeepSeek (MoE)
gemma
Works.
LLaMA-3
Works.
Baichuan-2
Works.
PHI3
Fails with the following error:
File "/home/hadoop-friday-llm/.local/lib/python3.8/site-packages/urllib3/connection.py", line 615, in connect
  contents = read_file_cached(tiktoken_bpe_file, expected_hash)
File "/home/hadoop-friday-llm/.local/lib/python3.8/site-packages/tiktoken/load.py", line 64, in read_file_cached
  contents = read_file(blobpath)
File "/home/hadoop-friday-llm/.local/lib/python3.8/site-packages/tiktoken/load.py", line 25, in read_file
  resp = requests.get(blobpath)
File "/home/hadoop-friday-llm/.local/lib/python3.8/site-packages/requests/api.py", line 73, in get
  self.sock = sock = self._new_conn()
File "/home/hadoop-friday-llm/.local/lib/python3.8/site-packages/urllib3/connection.py", line 203, in _new_conn
  return request("get", url, params=params, **kwargs)
File "/home/hadoop-friday-llm/.local/lib/python3.8/site-packages/requests/api.py", line 59, in request
  conn.connect()
File "/home/hadoop-friday-llm/.local/lib/python3.8/site-packages/urllib3/connection.py", line 615, in connect
  self._validate_conn(conn)
File "/home/hadoop-friday-llm/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
  return session.request(method=method, url=url, **kwargs)
File "/home/hadoop-friday-llm/.local/lib/python3.8/site-packages/requests/sessions.py", line 589, in request
  return tokenizer_class.from_pretrained(
File "/home/hadoop-friday-llm/.cache/huggingface/modules/transformers_modules/Phi-3-small-8k-instruct/tokenization_phi3_small.py", line 190, in from_pretrained
  raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x7f4053c11070>: Failed to resolve 'openaipublic.blob.core.windows.net' ([Errno -2] Name or service not known)
Mistral-7B-v0.1
Works.
Mixtral-8x7B-v0.1
On 8 cards (64G), stage 3 is required.
CodeLlama-7b-hf(13B)
Works.
Yi1.5
Works.
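For readers unfamiliar with the "stage 3" and "bf16" settings mentioned for several models above, here is a minimal sketch of a DeepSpeed configuration that enables both. It is not the exact configuration used in these runs. The function name `build_zero3_bf16_config` is hypothetical, and the `"auto"` values assume the config is consumed through the Hugging Face Trainer integration, which fills them in from the training arguments.

```python
import json


def build_zero3_bf16_config():
    """Return a minimal DeepSpeed config dict with ZeRO stage 3 and bf16.

    Hypothetical helper for illustration; not the config used in the report.
    """
    return {
        # "auto" is resolved by the Hugging Face Trainer integration (assumption).
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        # bf16 instead of fp16, as several entries above require.
        "bf16": {"enabled": True},
        "zero_optimization": {
            # Stage 3 shards parameters, gradients, and optimizer state.
            "stage": 3,
            "overlap_comm": True,
            "contiguous_gradients": True,
            # Gather full 16-bit weights when saving so the checkpoint is usable.
            "stage3_gather_16bit_weights_on_model_save": True,
        },
    }


# Serialize to JSON, the format DeepSpeed config files use.
config_json = json.dumps(build_zero3_bf16_config(), indent=2)
print(config_json)
```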
Reproduction
llamafactory
Expected behavior
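I picked a set of representative models and re-ran the experiments on NPU, hoping that all of them would succeed. I would appreciate help with the phi3 failure: the model is confirmed to be stored locally and is loaded via an absolute path.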
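One possible direction for the phi3 failure, offered as a sketch rather than a confirmed fix: the traceback shows tiktoken trying to download a BPE file from openaipublic.blob.core.windows.net, which happens regardless of where the model weights are stored. tiktoken first looks in the directory named by the `TIKTOKEN_CACHE_DIR` environment variable for a file whose name is the SHA-1 hex digest of the download URL, so placing the file there in advance on an offline machine may avoid the network request. The URL below is an assumption (the actual one should be taken from tokenization_phi3_small.py), and both the helper name `tiktoken_cache_path` and the cache directory are hypothetical.

```python
import hashlib
import os

# Assumption: the BPE file the tokenizer requests. Verify the real URL in
# tokenization_phi3_small.py before relying on this.
ASSUMED_BPE_URL = (
    "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"
)


def tiktoken_cache_path(blob_url, cache_dir):
    """Return the path where tiktoken expects a cached copy of blob_url.

    tiktoken names cache entries by the SHA-1 hex digest of the URL string.
    """
    cache_key = hashlib.sha1(blob_url.encode()).hexdigest()
    return os.path.join(cache_dir, cache_key)


# Hypothetical cache directory on the offline machine.
cache_dir = os.path.join(os.path.expanduser("~"), "tiktoken_cache")
print(tiktoken_cache_path(ASSUMED_BPE_URL, cache_dir))
```

To try it, download the BPE file on a machine with network access, copy it to the printed path on the NPU host, and set `TIKTOKEN_CACHE_DIR` to that directory before the tokenizer is loaded.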
Others
No response