rosejn/cortex

Cortex projects are bound to a specific cuda library version

Open

#107 opened on February 23, 2017

Labels: bug, harder, help wanted

Description

This has bothered me for some time and there isn't too much I can do about it but here we go:

The runtime dependency on the cuda libraries is not ideal the way it is structured.

  1. If the user does not have cuda installed the entire system now fails.
  2. If the user does not have cudnn installed the entire system now fails.
  3. If the user has a newer or older version of cuda installed, the entire system fails at startup, regardless of the fact that we aren't using any blisteringly new features of cuda.

What people have done for many years with opengl is bind to the actual shared library dynamically. They then look up the symbols they need in the shared library, and those symbols, along with the opengl version detected (via an API call into the library), dictate their path forward. They dynamically switch rendering paths depending on the feature set available in opengl and often on the specific hardware features available on the card.

Because the binding is dynamic, the program will still start if opengl isn't present, and can then exit with a nice error message. Also, because the binding is dynamic and they search for specific symbols in the shared library, they can have one wrapper library that binds to several versions of opengl and just exposes the symbols it finds.
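To make the idea concrete, here is a minimal sketch of that kind of late binding, written in Python with `ctypes` purely for illustration (cortex itself would need this at the JVM/native level). The candidate library names and the helper names `try_load` and `runtime_version` are assumptions of mine, not anything in cortex; `cudaRuntimeGetVersion` is the cuda runtime call that reports the installed version.

```python
import ctypes
import ctypes.util

# Assumed candidate names; the real file name differs by platform and cuda release.
CUDA_RUNTIME_CANDIDATES = ["cudart", "libcudart.so", "libcudart.dylib"]


def try_load(candidates):
    """Return the first shared library that loads, or None if none of them do."""
    for name in candidates:
        # find_library resolves short names like "cudart"; fall back to the raw name.
        path = ctypes.util.find_library(name) or name
        try:
            return ctypes.CDLL(path)
        except OSError:
            continue
    return None


def runtime_version(lib):
    """Ask the library for its version, or return None if that is not possible."""
    if lib is None or not hasattr(lib, "cudaRuntimeGetVersion"):
        return None
    version = ctypes.c_int(0)
    status = lib.cudaRuntimeGetVersion(ctypes.byref(version))
    return version.value if status == 0 else None  # 0 is cudaSuccess


cuda = try_load(CUDA_RUNTIME_CANDIDATES)
if cuda is None:
    # The program still starts and can report the problem instead of crashing.
    print("cuda runtime not found; GPU backend disabled")
else:
    print("cuda runtime version:", runtime_version(cuda))
```

The point is only that a missing or mismatched library becomes a value the program can inspect at runtime, rather than a link failure at startup.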

This is the ideal situation. Currently in cortex, for instance, you have to change the project.clj in order to bind to a different version of cuda, despite the fact that we aren't using any new features in that version, so from a dynamic linking perspective this is unnecessary. This is completely unnecessary incidental complexity that will come back to bite us at some point.

The right answer here is to use an intermediate library that can do dynamic loading across the different platforms and find the symbols. You then set each global function pointer to the symbol's address if it is found, or leave it null if it is not (see GLEW, the OpenGL Extension Wrangler: http://glew.sourceforge.net/).
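A rough sketch of that wrangler pattern, again in Python with `ctypes` as a stand-in: each requested symbol maps to a callable if the library exports it and to `None` otherwise. To keep it runnable without cuda it is pointed at the C math library on a POSIX system; `resolve_symbols` and the symbol `cos_from_the_future` are made-up names for illustration.

```python
import ctypes
import ctypes.util


def resolve_symbols(lib, names):
    """Map each name to its function object, or None when the symbol is missing."""
    return {name: getattr(lib, name, None) for name in names}


# Assumption: a POSIX system where the math library can be located; if
# find_library fails, CDLL(None) searches the symbols already loaded
# into the running process instead.
libm = ctypes.CDLL(ctypes.util.find_library("m") or None)

# "cos_from_the_future" stands in for a symbol that only a newer library has.
table = resolve_symbols(libm, ["cos", "cos_from_the_future"])

cos = table["cos"]
if cos is not None:
    # Declare the C signature before calling: double cos(double).
    cos.restype = ctypes.c_double
    cos.argtypes = [ctypes.c_double]
    print("cos(0.0) =", cos(0.0))

if table["cos_from_the_future"] is None:
    print("newer symbol not present; taking the older code path")
```

The same table could be filled from several versions of a library, since it only records which symbols were actually found.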

Then we at least allow the program to decide if cuda is a necessary dependency and, furthermore, if particular versions of cuda (and cudnn, npp, cublas) are necessary dependencies. What is stopping me from going there is the lack of a proper cross platform build system where I can build a library for at least linux, mac, and windows. That and the time required to actually do this.
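Once symbols resolve to either a value or nothing, letting the program decide what counts as required could look roughly like the sketch below. The function names, the backend labels, and the `allow_cpu_fallback` flag are hypothetical; the symbol names are real cuda, cublas, and cudnn entry points used only as examples of what a program might require.

```python
def missing_symbols(table, required):
    """Return the required symbol names that did not resolve, sorted."""
    return sorted(name for name in required if table.get(name) is None)


def choose_backend(table, required, allow_cpu_fallback=True):
    """Pick "cuda" when every required symbol resolved, else fall back or fail."""
    missing = missing_symbols(table, required)
    if not missing:
        return "cuda"
    if allow_cpu_fallback:
        return "cpu"
    raise RuntimeError(
        "cuda backend unavailable; missing symbols: " + ", ".join(missing)
    )


# Placeholder objects stand in for resolved function pointers.
REQUIRED = ["cudaMalloc", "cublasSgemm_v2", "cudnnCreate"]
full = {name: object() for name in REQUIRED}
no_cudnn = dict(full, cudnnCreate=None)

print(choose_backend(full, REQUIRED))      # every symbol present
print(choose_backend(no_cudnn, REQUIRED))  # cudnn missing, fallback allowed
```

A program that truly needs the GPU would pass `allow_cpu_fallback=False` and get one clear error naming the missing symbols.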

There may be a solution in the dynamic linking facilities now present in Java, but that path needs to be researched. To do this with javacpp we would need to build a small wrapper library that does the dynamic binding to the shared libraries and resolves the symbols in them.

In any case, a best-in-class CUDA development system would not have this issue. I suspect the same type of issue would be present should we decide to put effort into opencl.
