lm-sys/FastChat

[Feature Request] Add a ctranslate2 model worker

Open

#2133 opened on August 1, 2023

2 comments · 1 reaction · 1 assignee · Python · 38,959 stars · 4,736 forks
Labels: enhancement, good first issue

Description

According to some recent analysis on Twitter, CTranslate2 can serve LLMs slightly faster than vLLM, possibly with a small quality increase, at least for Llama 2.

This could be either a model worker added directly to FastChat or a document with extensive instructions on how to write a custom model worker (with mostly working implementation code) that anyone can use in their own project. The second option might be best, since people could then extend FastChat with custom model workers without having to change the base project too much.
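To make the second option more concrete, here is a minimal sketch of the shape such a guide could walk through. Everything in it is hypothetical and for illustration only: `GenerationBackend`, `EchoBackend`, `CustomModelWorker`, and the `text` / `error_code` response fields are assumed names, not FastChat's or CTranslate2's actual API. The idea is simply that the worker depends on a small backend interface, so an inference engine can be swapped in without touching the rest of the project.

```python
from typing import Any, Dict, Iterator, Protocol


class GenerationBackend(Protocol):
    """Hypothetical interface: anything that turns a prompt into text chunks."""

    def stream(self, prompt: str, max_new_tokens: int) -> Iterator[str]:
        ...


class EchoBackend:
    """Stand-in backend that echoes prompt words back.

    A real backend would wrap an inference engine (for example CTranslate2)
    and yield decoded text chunks instead.
    """

    def stream(self, prompt: str, max_new_tokens: int) -> Iterator[str]:
        for word in prompt.split()[:max_new_tokens]:
            yield word + " "


class CustomModelWorker:
    """Hypothetical worker that delegates generation to a pluggable backend."""

    def __init__(self, model_name: str, backend: GenerationBackend) -> None:
        self.model_name = model_name
        self.backend = backend

    def generate_stream(self, params: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
        # Yield the accumulated text after every chunk so a caller can stream it.
        text = ""
        max_new_tokens = params.get("max_new_tokens", 256)
        for chunk in self.backend.stream(params["prompt"], max_new_tokens):
            text += chunk
            yield {"text": text, "error_code": 0}

    def generate(self, params: Dict[str, Any]) -> Dict[str, Any]:
        # Non-streaming variant: drain the stream and return the final state.
        result: Dict[str, Any] = {"text": "", "error_code": 0}
        for result in self.generate_stream(params):
            pass
        return result


worker = CustomModelWorker("demo-model", EchoBackend())
final = worker.generate({"prompt": "hello from a custom worker", "max_new_tokens": 3})
```

Under these assumptions, supporting a new engine would only mean writing another class with a `stream` method; the worker itself stays unchanged.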

If this is appealing, I can get to it at some point.
