huggingface/candle

Performance issues compared to Pytorch

Open

#1139 opened on Oct 20, 2023


Description

Hello. I raised this on the Discord and worked with a member there to make sure I wasn't doing anything wrong. I benchmarked a release build of my Candle code with cuDNN enabled against equivalent PyTorch code, and Candle is about 4x slower.

I have attached the code I used for the comparison. It contains both the original Python/PyTorch implementation of the RealESRGAN RRDBNet architecture and my Candle implementation.

I'm short on time, otherwise I would have set up a proper repo with a script or program that runs both tests automatically, but this is the best I can do at the moment. To use either script, you'll probably need to adjust the paths in each one to point to the model and to your test images (test images are not included). I recommend trying ~10 smallish images (e.g. 128x128).
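To make runs comparable across the two implementations, a small timing harness can help: it records each image's wall-clock time, reports the first run separately (the results below show it is much slower in both frameworks), and takes the median of the remaining runs. This is a minimal std-only sketch, not code from the attached zip; `time_runs` and `steady_state_median` are hypothetical helper names, and the closure is a dummy workload standing in for the real "load image, run model, save output" step. If the real closure only enqueues GPU work, the measured time may not include kernel execution unless the closure also waits for the result (for example by copying the output back to the host).

```rust
use std::time::{Duration, Instant};

/// Times `run(i)` for each of `n` inputs and returns one duration per call.
/// `run` is a hypothetical stand-in for "load image i, run the model, save the output".
fn time_runs<F: FnMut(usize)>(n: usize, mut run: F) -> Vec<Duration> {
    (0..n)
        .map(|i| {
            let start = Instant::now();
            run(i);
            start.elapsed()
        })
        .collect()
}

/// Median of all runs except the first, which likely includes one-off
/// warm-up costs (e.g. device setup, allocation) and is reported separately.
/// Returns the upper median when the remaining count is even.
fn steady_state_median(times: &[Duration]) -> Option<Duration> {
    let mut rest: Vec<Duration> = times.iter().skip(1).copied().collect();
    if rest.is_empty() {
        return None;
    }
    rest.sort();
    Some(rest[rest.len() / 2])
}

fn main() {
    // Dummy CPU workload in place of the real model call.
    let times = time_runs(10, |i| {
        let s: u64 = (0..100_000u64).map(|x| x ^ i as u64).sum();
        std::hint::black_box(s);
    });
    println!("first run: {:?}", times[0]);
    println!("steady-state median: {:?}", steady_state_median(&times).unwrap());
}
```

Using the median rather than the mean keeps a single slow outlier (such as a late allocation) from skewing the steady-state figure.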

Context from Discord: https://discord.com/channels/879548962464493619/1136218819447238726/1164985040854339736

Code: rust_candle_test.zip

The model (I had to upload it to Google Drive): https://drive.google.com/file/d/1AyvArWkR3qonMV2pBtk3zDkct0yJrh5Z/view?usp=sharing

For reference, here are the results from my benchmark:

PyTorch:

Model took 323.999ms // First run, takes a long time
Saved 00000.png
Model took 35.4905ms // Second run, and subsequent runs after, take significantly less
Saved 00001.png

Candle:

Model took 262.1319ms // First run, takes a while but less time than PyTorch
Saved 00000.png
Model took 124.97ms // Second run, and subsequent runs after, take less time but still far longer than PyTorch
Saved 00001.png
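The relative slowdown can be read directly off the numbers above. This small sketch (the `slowdown` helper is hypothetical) divides the Candle time by the PyTorch time for each run: on the first run Candle is faster (about 0.81x), while in steady state it is about 3.52x slower for this pair of runs.

```rust
/// Ratio of Candle time to PyTorch time; a value above 1.0 means Candle is slower.
fn slowdown(candle_ms: f64, torch_ms: f64) -> f64 {
    candle_ms / torch_ms
}

fn main() {
    // Timings copied from the benchmark output above, in milliseconds.
    let (torch_first, torch_steady) = (323.999, 35.4905);
    let (candle_first, candle_steady) = (262.1319, 124.97);

    println!("first run:    {:.2}x", slowdown(candle_first, torch_first));
    println!("steady state: {:.2}x", slowdown(candle_steady, torch_steady));
}
```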

Please let me know if you need any more information. Candle is a very interesting and promising project; it just doesn't seem to be as optimized as PyTorch yet.
