Phi-3 mini 4k instruct with MICROSOFT's quantization · mlc-ai/mlc-llm#2273

3 comments · 0 reactions · 0 assignees · Python · 1,220 forks

Labels: help wanted, new-models

Repository metrics

Stars: 16,227
PR merge metrics: average merge time 4 days, 2 PRs merged in the last 30 days

Description

⚙️ Request New Models

Link to an existing implementation (e.g. Hugging Face/Github): https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf
Is this model architecture supported by MLC-LLM? Yes

Additional context

I know others have made this request already (https://github.com/mlc-ai/mlc-llm/issues/2246, https://github.com/mlc-ai/mlc-llm/pull/2222, https://github.com/mlc-ai/mlc-llm/issues/2238, https://github.com/mlc-ai/mlc-llm/issues/2205).

But I am requesting something different: rather than quantizing or otherwise modifying the model's weights yourselves, I suggest using Microsoft's already 4-bit-quantized weights.

The reason is that I suspect (although it is not explicit in their repo) that they used quantization-aware training to build these GGUF files. I have tested the regular 32-bit model against the 4-bit GGUF one, and their performance is almost equivalent. That is not what I've seen so far with MLC's quantized models, which tend to be less accurate than their 32-bit counterparts.
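
To make the accuracy concern concrete, here is a minimal sketch of how post-training quantization loses information. It assumes a generic symmetric round-to-nearest 4-bit scheme with one scale per group of 32 weights; this is an illustration only and is not claimed to be the scheme MLC or Microsoft actually use. The function names and the synthetic Gaussian weights are hypothetical.

```python
import random

def quantize_group_4bit(values):
    """Symmetric round-to-nearest quantization of one group to codes in -7..7."""
    max_abs = max(abs(v) for v in values)
    # One scale per group; fall back to 1.0 for an all-zero group.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale

def dequantize_group(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]

def mean_abs_error(weights, group_size=32):
    """Average absolute difference between weights and their quantized copy."""
    total = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        codes, scale = quantize_group_4bit(group)
        restored = dequantize_group(codes, scale)
        total += sum(abs(a - b) for a, b in zip(group, restored))
    return total / len(weights)

# Synthetic stand-in for a weight tensor (assumption: roughly Gaussian weights).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
print(f"mean abs error after 4-bit round trip: {mean_abs_error(weights):.6f}")
```

Quantization-aware training would instead let the model adapt to this rounding during training, which is one possible explanation for a smaller accuracy gap.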

Is there a way to use Microsoft's own quantized weights?

Thank you! Federico

Contributor Guide

Research direction: Investigate how MLC LLM loads model weights and check whether it can load GGUF-quantized weights directly from Hugging Face, in particular from microsoft/Phi-3-mini-4k-instruct-gguf. Look at the model loader code and any existing GGUF support in MLC LLM. Consider whether a new converter or loader is needed.
Tech stack: Python
Domain: machine learning, AI
Issue type: Feature
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Mostly clear
Prerequisites: Python, MLC LLM basics
Beginner friendliness: 40
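
As a possible first step for the research direction in the contributor guide, the sketch below reads the fixed-size header of a GGUF file using only the Python standard library. It assumes the little-endian GGUF v2/v3 layout (4-byte magic `GGUF`, a uint32 version, then uint64 tensor and metadata key-value counts). The helper name `read_gguf_header` is hypothetical and is not part of MLC LLM; the demo writes a synthetic header with made-up counts rather than reading the real Phi-3 file.

```python
import os
import struct
import tempfile

GGUF_MAGIC = b"GGUF"

def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != GGUF_MAGIC:
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        if version < 2:
            # Assumption: v1 stored 32-bit counts, which this sketch does not handle.
            raise ValueError("GGUF v1 is not handled in this sketch")
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return version, tensor_count, kv_count

# Demo on a synthetic header (the counts below are made up for illustration).
with tempfile.NamedTemporaryFile(delete=False, suffix=".gguf") as tmp:
    tmp.write(GGUF_MAGIC + struct.pack("<IQQ", 3, 195, 24))
print(read_gguf_header(tmp.name))
os.remove(tmp.name)
```

Pointing the same function at a downloaded GGUF file would show how many tensors and metadata entries a converter or loader has to handle; parsing the tensor descriptors and their quantization types is a further step not covered here.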
