Phi-3 mini 4k instruct with MICROSOFT's quantization · mlc-ai/mlc-llm#2273

Repository metrics

Stars: 16,227
PR merge metrics: average time to merge 4d; 2 PRs merged in the last 30d

Description

⚙️ Request New Models

Link to an existing implementation (e.g. Hugging Face/Github): https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf
Is this model architecture supported by MLC-LLM? Yes

Additional context

I know others have made this request already (https://github.com/mlc-ai/mlc-llm/issues/2246, https://github.com/mlc-ai/mlc-llm/pull/2222, https://github.com/mlc-ai/mlc-llm/issues/2238, https://github.com/mlc-ai/mlc-llm/issues/2205).

But I am requesting something different: rather than quantizing or otherwise modifying the model's weights yourselves, I suggest using the 4-bit quantized weights that Microsoft already publishes.

The reason is that I suspect (although their repo does not say so explicitly) that they used quantization-aware training to build these GGUF files. I have tested the regular 32-bit model against the 4-bit GGUF one, and their performance is almost equivalent. That is not what I have seen so far with MLC's quantized models, which tend to be less accurate than their 32-bit counterparts.

Is there a way to use Microsoft's own quantized weights?

Thank you! Federico

Contributor guide

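Investigation approach: Investigate how MLC LLM loads model weights and check whether GGUF quantized weights can be loaded directly from Hugging Face, focusing in particular on microsoft/Phi-3-mini-4k-instruct-gguf. Review the model loader code and the current state of GGUF support in MLC LLM. Consider whether a new converter or loader is needed.
Tech stack: Python
Area: machine learning, AI
Issue type: Feature
Difficulty: 3
Estimated time: 1-2 days
Activity: Active
Clarity: Mostly clear
Prerequisites: Python, MLC LLM basics
Beginner-friendliness: 40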
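
As a starting point for checking what a GGUF file contains before thinking about a loader, the fixed-size file header can be read with the Python standard library alone. The sketch below is a minimal illustration, not part of MLC LLM: the function name `read_gguf_header` is hypothetical, the layout assumed is the little-endian GGUF v2+ header (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key/value count), and the sample bytes are synthetic rather than taken from the Phi-3 file.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_header(stream):
    """Read the fixed-size GGUF header from a binary stream.

    Hypothetical helper; assumes a little-endian GGUF v2+ layout.
    """
    magic = stream.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    # uint32 version, uint64 tensor_count, uint64 metadata_kv_count
    raw = stream.read(20)
    if len(raw) != 20:
        raise ValueError("truncated GGUF header")
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header bytes for demonstration only (not real model values).
fake_file = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 1)
header = read_gguf_header(io.BytesIO(fake_file))
print(header)
```

To try this on a real download, pass a file opened with `open(path, "rb")` instead of the `io.BytesIO` buffer. The per-tensor quantization types live in the tensor-info section after the metadata, which this sketch does not parse.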
