Phi-3 mini 4k instruct with MICROSOFT's quantization · mlc-ai/mlc-llm#2273

Repository metrics

Stars: 16,227
PR merge metrics: average time to merge 4 days; 2 PRs merged in the last 30 days

Description

⚙️ Request New Models

Link to an existing implementation (e.g. Hugging Face/Github): https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf
Is this model architecture supported by MLC-LLM? Yes

Additional context

I know others have made this request already (https://github.com/mlc-ai/mlc-llm/issues/2246, https://github.com/mlc-ai/mlc-llm/pull/2222, https://github.com/mlc-ai/mlc-llm/issues/2238, https://github.com/mlc-ai/mlc-llm/issues/2205).

But I am requesting something different: I am suggesting that you do not quantize or modify the weights of the model but that you instead use Microsoft's already 4-bit quantized weights.

The reason is that I suspect (although it is not explicit in their repo) they used quantization-aware training to build these GGUF files. I have tested the regular 32-bit model against the 4-bit GGUF one, and their performance is almost equivalent. That is not what I have seen so far with MLC's quantized models, which tend to be less accurate than their 32-bit counterparts.

Is there a way to use Microsoft's own quantized weights?

Thank you! Federico

Contributor guide

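Research direction: Investigate how MLC LLM loads model weights and check whether GGUF-quantized weights, specifically microsoft/Phi-3-mini-4k-instruct-gguf, can be loaded directly from Hugging Face. Review the model loader code and check whether MLC LLM already supports the GGUF format. Consider whether a new converter or loader is needed.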
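A first step in that investigation could be to inspect what the GGUF file actually contains. The following is a minimal sketch, assuming the little-endian GGUF v2/v3 header layout (4-byte magic `GGUF`, `uint32` version, `uint64` tensor count, `uint64` metadata key-value count). `parse_gguf_header` and `GGUFHeader` are hypothetical names for illustration; they are not part of MLC LLM.

```python
import struct
from typing import NamedTuple

GGUF_MAGIC = b"GGUF"
# Assumed v2/v3 layout after the magic: version (uint32),
# tensor count (uint64), metadata key-value count (uint64), little-endian.
HEADER_FORMAT = "<IQQ"
HEADER_SIZE = len(GGUF_MAGIC) + struct.calcsize(HEADER_FORMAT)  # 24 bytes


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def parse_gguf_header(data: bytes) -> GGUFHeader:
    """Parse the fixed-size header at the start of a GGUF file (hypothetical helper)."""
    if len(data) < HEADER_SIZE or data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (missing magic or truncated header)")
    version, tensor_count, kv_count = struct.unpack_from(
        HEADER_FORMAT, data, len(GGUF_MAGIC)
    )
    if version < 2:
        # GGUF v1 stored the counts as uint32, so this layout does not apply.
        raise ValueError(f"unsupported GGUF version {version}")
    return GGUFHeader(version, tensor_count, kv_count)
```

Usage, with a placeholder path: `with open("path/to/model.gguf", "rb") as f: print(parse_gguf_header(f.read(HEADER_SIZE)))`. The metadata and tensor-info sections that follow the header record the file type and each tensor's quantization type, which is what would determine whether an existing MLC LLM loader could be reused.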
Tech stack: Python
Domain: machine learning, AI
Issue type: feature
Difficulty: 3
Estimated time: 1-2 days
Activity: active
Clarity: mostly clear
Prerequisites: Python, MLC LLM basics
Beginner-friendliness: 40
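If a new converter turns out to be needed, it would have to decode GGUF's block-quantized tensors. As an illustration only, the sketch below dequantizes the simplest GGUF 4-bit type, Q4_0: each block of 32 weights is stored as a float16 scale `d` followed by 16 bytes of packed 4-bit values, the low nibbles hold weights 0 to 15, the high nibbles hold weights 16 to 31, and each weight is `(q - 8) * d`. It is an assumption that the Microsoft file uses Q4_0; it may use another type (for example a K-quant) with a different block layout. `dequantize_q4_0` is a hypothetical helper, not an MLC LLM or llama.cpp API.

```python
import struct

QK4_0 = 32                       # weights per Q4_0 block
BLOCK_BYTES = 2 + QK4_0 // 2     # float16 scale + 16 bytes of packed nibbles


def dequantize_q4_0(raw: bytes) -> list[float]:
    """Decode a run of Q4_0 blocks into floats (hypothetical helper)."""
    if len(raw) % BLOCK_BYTES != 0:
        raise ValueError("length is not a multiple of the Q4_0 block size")
    out: list[float] = []
    for off in range(0, len(raw), BLOCK_BYTES):
        (d,) = struct.unpack_from("<e", raw, off)  # little-endian float16 scale
        qs = raw[off + 2 : off + BLOCK_BYTES]
        # Low nibbles -> first half of the block, high nibbles -> second half.
        low = [((b & 0x0F) - 8) * d for b in qs]
        high = [((b >> 4) - 8) * d for b in qs]
        out.extend(low + high)
    return out
```

Dequantizing this way would reproduce Microsoft's quantized values in floating point, but gives up the 4-bit memory savings; keeping the weights at 4 bits would instead require mapping the GGUF blocks onto one of MLC LLM's own quantization schemes.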
