help wanted
Description
I have already applied many memory-saving tricks, but training still runs out of memory (OOM) on an 8 GB GPU. Inference works fine now.
This is strange, because I know that some textual inversion and DreamBooth implementations can be trained on 8 GB.
What is the secret of Automatic1111's optimization? Although xformers may help a bit, the current sliced attention should require even less memory than xformers.
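For reference, here is a rough back-of-the-envelope sketch of how large the attention score matrix is in a single forward pass, with and without slicing. The numbers are assumptions for illustration, not measurements from this repo: a 512x512 image (64x64 latent, so 4096 tokens at the highest-resolution self-attention layer), 8 heads, fp16 scores, and a hypothetical helper `attention_scores_bytes`.

```python
def attention_scores_bytes(batch, heads, tokens, bytes_per_elem=2, slice_size=None):
    """Bytes of the (tokens x tokens) score matrices that exist at the same time.

    Without slicing, all batch*heads matrices are materialized together.
    With slicing, only `slice_size` of them are computed per step.
    """
    rows = batch * heads
    if slice_size is not None:
        rows = min(rows, slice_size)
    return rows * tokens * tokens * bytes_per_elem


# Assumed setup: 64x64 latent -> 4096 tokens, 8 heads, batch size 1, fp16.
tokens = 64 * 64
full = attention_scores_bytes(batch=1, heads=8, tokens=tokens)
sliced = attention_scores_bytes(batch=1, heads=8, tokens=tokens, slice_size=1)

print(f"full attention:   {full / 2**20:.0f} MiB")    # 256 MiB
print(f"sliced attention: {sliced / 2**20:.0f} MiB")  # 32 MiB
```

This only covers the forward peak. During training, autograd usually keeps each slice's softmax output for the backward pass, so the saved activations may add up to roughly the full matrix again. That could be one reason inference fits while training does not, but I have not verified it here.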
Does it make sense to move the text encoder and VAE off the GPU during training?
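To put a rough number on that question, here is a small estimate of weight-related memory. Everything in it is an assumption for illustration: approximate Stable Diffusion v1 parameter counts (about 860M for the UNet, 123M for the text encoder, 84M for the VAE), frozen modules held in fp16, and a fully trained UNet using fp32 weights, fp32 gradients, and Adam with two fp32 states per parameter. Activations are not counted.

```python
# Approximate parameter counts (assumed, Stable Diffusion v1 scale).
PARAMS = {
    "unet": 860_000_000,
    "text_encoder": 123_000_000,
    "vae": 84_000_000,
}


def frozen_bytes(n_params, bytes_per_param=2):
    """Frozen module: only the weights are resident (fp16 assumed)."""
    return n_params * bytes_per_param


def trainable_bytes(n_params, weight_bytes=4, grad_bytes=4, optim_bytes=8):
    """Trained module: weights + gradients + Adam states (fp32 assumed)."""
    return n_params * (weight_bytes + grad_bytes + optim_bytes)


# What moving the frozen text encoder and VAE off the GPU would free.
offload_savings = frozen_bytes(PARAMS["text_encoder"]) + frozen_bytes(PARAMS["vae"])

# What full UNet fine-tuning needs before any activations are stored.
unet_train = trainable_bytes(PARAMS["unet"])

print(f"offloading text encoder + VAE frees about {offload_savings / 2**30:.2f} GiB")
print(f"full UNet training state needs about {unet_train / 2**30:.1f} GiB")
```

Under these assumptions, offloading the frozen modules frees well under 1 GiB, while the UNet's weights, gradients, and optimizer states alone exceed 8 GB. So the offload may help at the margin, but the bigger savings would probably have to come from training fewer parameters (as textual inversion does), an 8-bit optimizer, or gradient checkpointing.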