xrsrke/pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
Details
仓库信息
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
Stats
Loading...
Loading
--
Loading
--
Loading
--
Loading
--