[Feature Request] Add a ctranslate2 model worker · lm-sys/FastChat#2133

(2 comments) (1 reaction) (1 assignee) Python (4,736 forks)

enhancement, good first issue

Repository metrics

Stars: (38,959 stars)
PR merge metrics: (no PRs merged in the last 30 days)

Description

According to some recent analysis on Twitter, CTranslate2 can serve LLMs a little faster than vLLM and (maybe?) with a small quality increase, at least for Llama 2.

This could either be a model worker that's added directly to FastChat, or a doc with extensive documentation on how to write a custom model worker (with mostly working implementation code) that anyone can use in their own project. The second option might be best, so that people can extend FastChat with custom model workers without having to change the base project too much.
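
To make the second option concrete, here is a rough sketch of what such a doc's example code might look like. It is not FastChat's actual worker class: `CustomModelWorker`, `GenerateFn`, and `make_ctranslate2_backend` are hypothetical names, and the `{"text", "error_code"}` response shape is an assumption. The CTranslate2 part assumes a model that has already been converted to CTranslate2 format and a Hugging Face tokenizer for it, and it returns the whole completion as a single chunk rather than streaming token by token.

```python
from typing import Callable, Dict, Iterator

# Hypothetical backend signature: (prompt, max_new_tokens) -> iterator of text chunks.
GenerateFn = Callable[[str, int], Iterator[str]]


class CustomModelWorker:
    """Hypothetical minimal worker; a stand-in, not FastChat's real worker class."""

    def __init__(self, model_name: str, generate_fn: GenerateFn):
        self.model_name = model_name
        self.generate_fn = generate_fn

    def generate_stream(self, params: Dict) -> Iterator[Dict]:
        # Yield the accumulated text after each chunk so a client can show partial output.
        text = ""
        max_new_tokens = params.get("max_new_tokens", 256)
        for chunk in self.generate_fn(params["prompt"], max_new_tokens):
            text += chunk
            yield {"text": text, "error_code": 0}  # assumed response shape

    def generate(self, params: Dict) -> Dict:
        # Non-streaming variant: drain the stream and keep the last message.
        result = {"text": "", "error_code": 0}
        for result in self.generate_stream(params):
            pass
        return result


def make_ctranslate2_backend(
    model_dir: str, tokenizer_name: str, device: str = "cpu"
) -> GenerateFn:
    """Build a backend from a model directory already converted to CTranslate2 format."""
    # Imported lazily so the worker class above can be used without these packages.
    import ctranslate2
    import transformers

    generator = ctranslate2.Generator(model_dir, device=device)
    tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_name)

    def generate_fn(prompt: str, max_new_tokens: int) -> Iterator[str]:
        # CTranslate2 generators take token strings rather than token ids.
        tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
        results = generator.generate_batch(
            [tokens],
            max_length=max_new_tokens,
            include_prompt_in_result=False,
        )
        # Emit the whole completion as one chunk; token-level streaming is left out.
        yield tokenizer.decode(results[0].sequences_ids[0], skip_special_tokens=True)

    return generate_fn
```

Because the backend is just a callable, the worker can be tried out with a fake generator first and pointed at `make_ctranslate2_backend(...)` (or any other inference engine) later, which is the kind of extension that would not require changes to the base project.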

If this is appealing I can get to it at some point.

Contributor guide

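Approach: Research FastChat's model worker architecture and CTranslate2's inference API. Either contribute a new worker or write documentation that includes sample code.
Tech stack: Python
Area: backend
Issue type: feature
Difficulty: 3
Estimated time: 1-2 days
Activity: new
Clarity: clear
Prerequisites: Python, FastChat, CTranslate2
Beginner friendliness: 60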
