Phi-3 mini 4k instruct with MICROSOFT's quantization · mlc-ai/mlc-llm#2273

Repository metrics

Stars: 16,227
PR merge metrics: average merge time 4 days; 2 PRs merged in the last 30 days

Description

⚙️ Request New Models

Link to an existing implementation (e.g. Hugging Face/Github): https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf
Is this model architecture supported by MLC-LLM? Yes

Additional context

I know others have made this request already (https://github.com/mlc-ai/mlc-llm/issues/2246, https://github.com/mlc-ai/mlc-llm/pull/2222, https://github.com/mlc-ai/mlc-llm/issues/2238, https://github.com/mlc-ai/mlc-llm/issues/2205).

But I am requesting something different: instead of quantizing or otherwise modifying the model weights yourselves, I suggest using Microsoft's already 4-bit-quantized weights.

The reason is that I suspect (although their repo does not say so explicitly) that they used quantization-aware training to produce these GGUF files. I have tested the regular 32-bit model against the 4-bit GGUF one, and the output quality is almost equivalent, which is not what I have seen so far with MLC's quantized models (they tend to be less accurate than their 32-bit counterparts).

Is there a way to use Microsoft's own quantized weights?

Thank you! Federico
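For context on what "use Microsoft's own quantized weights" would involve: GGUF stores quantized tensors as fixed-size blocks, so an MLC LLM loader would have to either repack those blocks into a layout its own kernels accept or dequantize them. As an illustration only, here is a minimal pure-Python sketch that dequantizes the simplest GGUF 4-bit type, Q4_0, assuming the ggml layout: blocks of 32 weights, each stored as one fp16 scale `d` followed by 16 bytes of packed 4-bit values, with each weight decoded as `d * (q - 8)`. The Microsoft file may well use a different type (for example a K-quant such as Q4_K, whose blocks are more complex), and `dequantize_q4_0` is a hypothetical helper, not an MLC LLM or ggml API.

```python
import struct
from typing import List

QK4_0 = 32                      # weights per Q4_0 block (ggml layout, assumed)
BLOCK_BYTES = 2 + QK4_0 // 2    # fp16 scale + 16 bytes of packed 4-bit values


def dequantize_q4_0(data: bytes) -> List[float]:
    """Hypothetical helper: decode raw Q4_0 blocks into a flat list of floats."""
    if len(data) % BLOCK_BYTES:
        raise ValueError("Q4_0 data must be a whole number of 18-byte blocks")
    out: List[float] = []
    for start in range(0, len(data), BLOCK_BYTES):
        # Little-endian IEEE half-precision scale at the start of the block.
        d = struct.unpack_from("<e", data, start)[0]
        qs = data[start + 2:start + BLOCK_BYTES]
        # Low nibbles hold weights 0..15, high nibbles hold weights 16..31;
        # subtracting 8 maps the unsigned range 0..15 to -8..7.
        lo = [((b & 0x0F) - 8) * d for b in qs]
        hi = [((b >> 4) - 8) * d for b in qs]
        out.extend(lo + hi)
    return out
```

Dequantizing to floats and then re-quantizing with a different scheme may not preserve the original quantization, which is why supporting (or losslessly repacking) the GGUF block formats directly is the more interesting option for this request.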

Contributor guide

Research direction: Examine how MLC LLM loads model weights and check whether it can directly load GGUF quantized weights from Hugging Face, in particular from microsoft/Phi-3-mini-4k-instruct-gguf. Look at the model loader code and at any existing support for the GGUF format in MLC LLM. Consider whether a new converter or loader is needed.
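A useful first step for this investigation is to see which quantization types the GGUF file actually contains. Below is a minimal standalone sketch, assuming GGUF version 2 or later (little-endian, 64-bit counts), that walks the file header, skips the metadata key/value pairs, and lists each tensor's name, shape and ggml type. It is not part of MLC LLM; `read_gguf_tensor_infos` and the other helpers are hypothetical names, and the type-id table is deliberately partial.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

# Partial map of ggml tensor type ids to names, per the GGUF/ggml spec.
GGML_TYPE_NAMES = {0: "F32", 1: "F16", 2: "Q4_0", 3: "Q4_1", 6: "Q5_0",
                   7: "Q5_1", 8: "Q8_0", 10: "Q2_K", 11: "Q3_K",
                   12: "Q4_K", 13: "Q5_K", 14: "Q6_K"}

# Byte sizes of fixed-width GGUF metadata value types (8 = string, 9 = array).
_FIXED_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1,
                10: 8, 11: 8, 12: 8}


def _read(f: BinaryIO, fmt: str):
    """Read and unpack one little-endian scalar, failing on truncation."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("truncated GGUF file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    return f.read(_read(f, "<Q")).decode("utf-8")


def _skip_value(f: BinaryIO, vtype: int) -> None:
    """Skip one metadata value of the given GGUF value type."""
    if vtype in _FIXED_SIZES:
        f.read(_FIXED_SIZES[vtype])
    elif vtype == 8:
        _read_string(f)
    elif vtype == 9:  # array: element type (uint32), count (uint64), elements
        elem_type = _read(f, "<I")
        for _ in range(_read(f, "<Q")):
            _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_tensor_infos(f: BinaryIO) -> List[Tuple[str, tuple, str]]:
    """Return (name, shape, ggml type name) for every tensor in the file."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    if _read(f, "<I") < 2:
        raise ValueError("only GGUF v2+ is handled in this sketch")
    n_tensors = _read(f, "<Q")
    n_kv = _read(f, "<Q")
    for _ in range(n_kv):
        _read_string(f)                 # metadata key
        _skip_value(f, _read(f, "<I"))  # value type, then the value itself
    infos = []
    for _ in range(n_tensors):
        name = _read_string(f)
        n_dims = _read(f, "<I")
        # Dimensions are in ggml order (innermost dimension first).
        shape = tuple(_read(f, "<Q") for _ in range(n_dims))
        ttype = _read(f, "<I")
        _read(f, "<Q")  # offset into the tensor data section, unused here
        infos.append((name, shape, GGML_TYPE_NAMES.get(ttype, f"type_{ttype}")))
    return infos
```

Usage (the path is a placeholder): `with open("path/to/model.gguf", "rb") as f: print(read_gguf_tensor_infos(f))`. Only the header is read, so this stays cheap even for multi-gigabyte files.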
Tech stack: Python
Domain: machine learning, AI
Issue type: Feature
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Fairly clear
Prerequisites: Python, MLC LLM basics
Beginner accessibility: 40
