[Feature]: Refactor Int8ScaledMMLinearLayerConfig to use QuantKey · vllm-project/vllm#32268


Labels: feature request, good first issue, help wanted

Description

🚀 The feature, motivation and pitch

Replace the boolean configuration fields in Int8ScaledMMLinearLayerConfig (is_static_input_scheme and is_channelwise) with QuantKey objects to provide a more structured, type-safe quantization configuration API.

Ideally we should change this:

@dataclass
class Int8ScaledMMLinearLayerConfig(ScaledMMLinearLayerConfig):
    is_static_input_scheme: bool
    is_channelwise: bool
    input_symmetric: bool

to this:

@dataclass
class Int8ScaledMMLinearLayerConfig(ScaledMMLinearLayerConfig):
    weight_quant_key: QuantKey
    activation_quant_key: QuantKey
    input_symmetric: bool

Parallel work in #27814 has split the configuration into separate Int8 and FP8 config classes and uses QuantKey for the FP8 config class.
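
To make the intended mapping concrete, here is a minimal, self-contained sketch of how the two booleans might translate into quant keys. It does not import vLLM: QuantKeySketch is a hypothetical, simplified stand-in for QuantKey (the real class may have different fields), and Int8ConfigOld, Int8ConfigNew, and to_quant_key_config are illustrative names only. It also assumes that weights always use static scales, that static activation quantization is per-tensor, and that dynamic activation quantization is per-token; the actual refactor may choose differently.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for vLLM's QuantKey.
# The real class may expose different fields.
@dataclass(frozen=True)
class QuantKeySketch:
    dtype: str            # quantized dtype, e.g. "int8"
    static: bool          # True if scales are fixed ahead of time
    granularity: str      # "per_tensor", "per_channel", or "per_token"
    symmetric: bool = True


# Current shape of the config, as described in this issue.
@dataclass
class Int8ConfigOld:
    is_static_input_scheme: bool
    is_channelwise: bool
    input_symmetric: bool


# Proposed shape of the config, as described in this issue.
@dataclass
class Int8ConfigNew:
    weight_quant_key: QuantKeySketch
    activation_quant_key: QuantKeySketch
    input_symmetric: bool


def to_quant_key_config(old: Int8ConfigOld) -> Int8ConfigNew:
    """Translate the boolean flags into quant keys (illustrative mapping)."""
    # Assumption: weight scales are always static; is_channelwise only
    # selects between per-channel and per-tensor weight scales.
    weight_key = QuantKeySketch(
        dtype="int8",
        static=True,
        granularity="per_channel" if old.is_channelwise else "per_tensor",
    )
    # Assumption: static activations are per-tensor, dynamic are per-token.
    activation_key = QuantKeySketch(
        dtype="int8",
        static=old.is_static_input_scheme,
        granularity="per_tensor" if old.is_static_input_scheme else "per_token",
        symmetric=old.input_symmetric,
    )
    return Int8ConfigNew(weight_key, activation_key, old.input_symmetric)
```

With keys like these, kernel selection could compare a single key object instead of checking several booleans, which is the type-safety benefit the proposal is after.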

Alternatives

No response

Additional context