Repository Issues

hzy46/MInference_latest

[NeurIPS'24 Spotlight, ICLR'25, ICML'25] Speeds up long-context LLM inference with approximate, dynamic sparse attention computation, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Stars: 0
Forks: 0
Indexed issues: 0
Open beginner issues: 0
Latest indexed: Not indexed yet
Last GitHub push: May 30, 2025
License: No license data
Contributing guide: None
Code of conduct: None
Dominant language: Python
PR merge metrics: Pending
Beginner labels: None indexed

Issues

0 open indexed issues

No open indexed issues found for this repository.