mlc-ai/mlc-llm

Phi-3 mini 4k instruct with Microsoft's quantization

Open

#2273 opened on May 4, 2024

(3 comments) (0 reactions) (0 assignees)
Labels: help wanted, new-models


Description

⚙️ Request New Models

Additional context

I know others have made this request already (https://github.com/mlc-ai/mlc-llm/issues/2246, https://github.com/mlc-ai/mlc-llm/pull/2222, https://github.com/mlc-ai/mlc-llm/issues/2238, https://github.com/mlc-ai/mlc-llm/issues/2205).

But I am requesting something different: instead of quantizing or otherwise modifying the model's weights yourselves, I suggest using Microsoft's already 4-bit-quantized weights directly.

The reason is that I suspect (although their repo does not say so explicitly) that they used quantization-aware training to build these GGUF files. I have tested the regular 32-bit model against the 4-bit GGUF one, and the performance is almost equivalent. That is not what I have seen so far with MLC's quantized models, which tend to be less accurate than their 32-bit counterparts.
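
To illustrate the kind of accuracy loss I mean, here is a toy sketch of plain post-training round-to-nearest 4-bit quantization and the round-trip error it leaves behind. Everything in it is an assumption made for illustration: the function names, the group size of 32, and the random "weights" are hypothetical, and this is not MLC's actual quantization scheme nor whatever Microsoft used for the GGUF files.

```python
import random


def quantize_group_4bit(weights):
    """Symmetric round-to-nearest 4-bit quantization of one group of floats.

    Hypothetical helper for illustration only. Returns integer codes in
    [-8, 7] plus the per-group scale needed to map them back to floats.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Map 4-bit integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def mean_abs_error(weights, group_size=32):
    """Average absolute difference between weights and their 4-bit round trip."""
    total = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        codes, scale = quantize_group_4bit(group)
        restored = dequantize_group(codes, scale)
        total += sum(abs(a - b) for a, b in zip(group, restored))
    return total / len(weights)


# Toy stand-in for a weight tensor; real model weights are not Gaussian noise.
random.seed(0)
toy_weights = [random.gauss(0.0, 0.02) for _ in range(1024)]
error = mean_abs_error(toy_weights)
print(f"mean absolute round-trip error: {error:.6f}")
```

With round-to-nearest applied after training, this rounding error is baked in and the model never gets a chance to adapt to it. Quantization-aware training lets the weights adjust to the 4-bit grid during training, which is why I suspect the ready-made files hold up better.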

Is there a way to use Microsoft's own quantized weights?

Thank you! Federico
