xrsrke/pipegoose

Large-scale 4D parallelism pre-training for 🤗 Transformers in Mixture of Experts *(still a work in progress)*

Language: Python | Stars: 87 | Forks: 19 | Watchers: 87 | Open issues: 32 | License: MIT License
Details
Repository information
Owner: xrsrke
Last pushed: 2023-12-14
Last updated: 2025-12-14
