Phi-3 mini 4k instruct with MICROSOFT's quantization · mlc-ai/mlc-llm#2273

3 comments · 0 reactions · 0 assignees · Python · 1,220 forks

Labels: help wanted, new-models

Repository metrics

Stars: 16,227
PR merge metrics: average merge time 4 days; 2 PRs merged in the last 30 days

Description

⚙️ Request New Models

Link to an existing implementation (e.g. Hugging Face/Github): https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf
Is this model architecture supported by MLC-LLM? Yes

Additional context

I know others have made this request already (https://github.com/mlc-ai/mlc-llm/issues/2246, https://github.com/mlc-ai/mlc-llm/pull/2222, https://github.com/mlc-ai/mlc-llm/issues/2238, https://github.com/mlc-ai/mlc-llm/issues/2205).

But I am requesting something different: I am suggesting that you do not quantize or modify the weights of the model but that you instead use Microsoft's already 4-bit quantized weights.

The reason is that I suspect (although their repo does not say so explicitly) that they used quantization-aware training to build these GGUF files. I have tested the regular 32-bit model against the 4-bit GGUF one, and the performance is almost equivalent, which is not what I have seen so far with MLC's quantized models (they tend to be less accurate than their 32-bit counterparts).

Is there a way to use Microsoft's own quantized weights?

Thank you! Federico

Contributor guide

Research direction: Investigate how MLC LLM loads model weights and check whether it can load GGUF-quantized weights directly from Hugging Face, in particular from microsoft/Phi-3-mini-4k-instruct-gguf. Examine the model loader code and the existing support for the GGUF format in MLC LLM. Assess whether a new converter or loader is needed.
Tech stack: Python
Domain: machine learning, AI
Issue type: Feature
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Fairly clear
Prerequisites: Python, MLC LLM basics
Beginner-friendly: 40
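
A possible first step for the investigation above is to confirm what a downloaded GGUF file actually contains before looking at any loader code. The following is a minimal sketch, not part of MLC LLM: it assumes the GGUF version 2 or later header layout (4-byte magic `GGUF`, then a little-endian uint32 version, uint64 tensor count, and uint64 metadata key/value count). The function name `read_gguf_header` and the file path are hypothetical.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (kv count)


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header from the first bytes of a file.

    Assumption: GGUF version 2 or later, little-endian. Version 1 files
    used 32-bit counts and are not handled by this sketch.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes, got {len(data)}")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file: bad magic bytes")
    # "<" means little-endian with no padding between fields.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


if __name__ == "__main__":
    # Hypothetical path: replace with the file downloaded from Hugging Face.
    with open("path/to/model.gguf", "rb") as f:
        print(read_gguf_header(f.read(HEADER_SIZE)))
```

This only reads the header; the per-tensor quantization types live in the tensor info section that follows the metadata, and whether MLC LLM can consume those tensors without re-quantizing is the open question of this issue.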
