lm-sys/FastChat

[Feature Request] Add a ctranslate2 model worker

Open

#2133, created on August 1, 2023

2 comments, 1 reaction, 1 assignee. Python, 38,959 stars, 4,736 forks.
Labels: enhancement, good first issue

Description

According to some recent analysis on Twitter, CTranslate2 can serve LLMs slightly faster than vLLM, possibly with a small quality increase, at least for Llama 2.

This could either be a model worker that's added directly to FastChat, or a doc with extensive guidance on how to write a custom model worker (with mostly working implementation code) that anyone can use in their own project. The second option might be best, since people could then extend FastChat with custom model workers without having to change the base project too much.
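To make the second option more concrete, here is a rough sketch of the shape such a doc example might take. It is not FastChat's actual worker API: `GenerationBackend`, `EchoBackend`, and `CustomModelWorker` are hypothetical names made up for illustration, and the chunk format (`text`, `error_code`) is an assumption. A real CTranslate2 backend would presumably wrap `ctranslate2.Generator` behind the same `stream` method; the echo backend below only stands in for it so the sketch runs without any model.

```python
from dataclasses import dataclass
from typing import Iterator, Protocol


class GenerationBackend(Protocol):
    """Hypothetical interface: anything that can stream text for a prompt."""

    def stream(self, prompt: str, max_new_tokens: int) -> Iterator[str]: ...


class EchoBackend:
    """Stand-in backend: yields the prompt's words back, one per step."""

    def stream(self, prompt: str, max_new_tokens: int) -> Iterator[str]:
        for word in prompt.split()[:max_new_tokens]:
            yield word + " "


@dataclass
class CustomModelWorker:
    """Hypothetical worker: turns a request dict into response chunks."""

    model_name: str
    backend: GenerationBackend

    def generate_stream(self, params: dict) -> Iterator[dict]:
        text = ""
        max_new_tokens = params.get("max_new_tokens", 16)
        for piece in self.backend.stream(params["prompt"], max_new_tokens):
            text += piece
            # Each chunk carries the cumulative text generated so far.
            yield {"text": text, "error_code": 0}

    def generate(self, params: dict) -> dict:
        # Non-streaming call: drain the stream and keep the last chunk.
        last = {"text": "", "error_code": 0}
        for last in self.generate_stream(params):
            pass
        return last


worker = CustomModelWorker("echo-model", EchoBackend())
result = worker.generate({"prompt": "hello custom worker", "max_new_tokens": 2})
print(repr(result["text"]))
```

Keeping the backend behind a small interface like this is one way to let people plug in CTranslate2, or anything else, without touching the base project.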

If this is appealing I can get to it at some point.
