good first issue
説明
Hey,
I am trying to run
python3 -m fastchat.serve.cli --model-path tiiuae/falcon-7b --device mps
tiiuae/falcon-7b
Prompt:
User: Who is King Charles?
This leads to the following error:
Assistant: /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/generation/utils.py:723: UserWarning: MPS: no support for int64 repeats mask, casting it to int32 (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Repeat.mm:236.)
input_ids = input_ids.repeat_interleave(expand_size, dim=0)
loc("varianceEps"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/c2cb9645-dafc-11ed-aa26-6ec1e3b3f7b3/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x12x1xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
[1] 64954 abort python3 -m fastchat.serve.cli --model-path tiiuae/falcon-7b --device mps
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Is there anything I am missing? Thanks!