Cortex projects are bound to a specific cuda library version · rosejn/cortex#107

(5 comments) (0 reactions) (0 assignees) Clojure (109 forks)

bug, harder, help wanted

Repository metrics

Stars: 1,269
PR merge metrics: no merged PRs in the last 30 days

Description

This has bothered me for some time and there isn't too much I can do about it but here we go:

The runtime dependency on the cuda libraries is not ideal the way it is structured.

If the user does not have cuda installed the entire system now fails.
If the user does not have cudnn installed the entire system now fails.
If the user has a newer (or older) version of cuda installed the entire system fails at startup, regardless of the fact that we aren't using any blisteringly new features of cuda.

What people have done for many years with opengl is bind to the actual shared library dynamically. They then look for the symbols they need in the shared library, and those symbols, along with the version of opengl detected (with an API call from the library), dictate their path forward. They dynamically switch rendering paths depending on the feature set available in opengl and oftentimes the specific hardware features available on the card.

Because the binding is dynamic, the program will still start if opengl isn't present but will exit with a nice error message. Also, because the binding is dynamic and they search for specific symbols in the shared library, they can have one wrapper library that binds to several versions of opengl and just exposes the symbols it finds.
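To make the idea concrete for cuda, here is a minimal sketch in Python with ctypes (not cortex code). The candidate library names are assumptions and real installs vary by platform and version; `load_first_available`, `cuda_runtime_version` and `describe_cuda` are hypothetical helper names. `cudaRuntimeGetVersion` is the runtime's own version query and is only called if a library actually loaded.

```python
import ctypes

# Assumed candidate names; adjust for the platforms and versions you care about.
CUDART_CANDIDATES = ["libcudart.so", "libcudart.dylib", "cudart64_12.dll"]


def load_first_available(candidates):
    """Return a handle for the first library that loads, or None if none do."""
    for name in candidates:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue  # not installed under this name, try the next one
    return None


def cuda_runtime_version(lib):
    """Ask the loaded runtime for its version; None if it cannot be queried."""
    if lib is None:
        return None
    try:
        fn = lib.cudaRuntimeGetVersion
    except AttributeError:
        return None  # library loaded but does not export the symbol
    fn.argtypes = [ctypes.POINTER(ctypes.c_int)]
    fn.restype = ctypes.c_int
    version = ctypes.c_int(0)
    status = fn(ctypes.byref(version))
    return version.value if status == 0 else None


def describe_cuda():
    """Start up regardless, and report what was found instead of crashing."""
    lib = load_first_available(CUDART_CANDIDATES)
    version = cuda_runtime_version(lib)
    if version is None:
        return "cuda runtime not found; GPU backend disabled"
    return "cuda runtime version %d detected" % version
```

The point is only that a missing or mismatched library becomes a value the program can inspect rather than a failure at load time.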

This is the ideal situation. Currently in cortex, for instance, you have to change the project.clj in order to bind to a different version of cuda, despite the fact that we aren't using any new features in that version and thus, from a dynamic linking perspective, this is unnecessary. This is completely unnecessary incidental complexity that will come back to bite at some point.

The right answer here is to use an intermediate library that can do dynamic loading across the different platforms and find the symbols. You then set a global pointer to each symbol's address if it is found, or leave it null if it is not (see GLEW, the OpenGL Extension Wrangler: http://glew.sourceforge.net/).
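A rough sketch of that wrangler-style symbol table, again in Python with ctypes rather than a real native wrapper. `bind_symbols` and `missing_symbols` are hypothetical names. The usage example binds against `ctypes.pythonapi` purely as a stand-in for a library that is always loadable; with cuda you would pass the handle for the cuda library instead.

```python
import ctypes


def bind_symbols(lib, names):
    """Map each name to its function pointer, or None when it is missing.

    `lib` may be None (library not installed); every entry is then None.
    """
    table = {}
    for name in names:
        # Attribute lookup on a ctypes library handle raises AttributeError for
        # an unknown symbol, so getattr with a default gives us the null case.
        table[name] = getattr(lib, name, None) if lib is not None else None
    return table


def missing_symbols(table):
    """List the names that could not be resolved."""
    return sorted(name for name, fn in table.items() if fn is None)


# Stand-in usage: the Python C API is always available to ctypes.
table = bind_symbols(ctypes.pythonapi, ["Py_GetVersion", "cortex_demo_missing_symbol"])
print(missing_symbols(table))
```

Callers then check the table entry before using a function, the same way GLEW users check a function pointer before calling an extension.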

Then we at least allow the program to decide if cuda is a necessary dependency and furthermore if particular versions of cuda (and cudnn, npp, cublas) are necessary dependencies. What is stopping me from going there is a proper cross platform build system where I can build a library for at least linux, mac, and windows. That and the time required to actually do this.
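Once availability is just data, the "is this a necessary dependency" decision can live in the program. A small sketch of such a policy, assuming a hypothetical `choose_backend` function and a dict saying which of the libraries named above were found; falling back to a "cpu" backend is an assumption for illustration:

```python
GPU_LIBS = ("cuda", "cudnn", "npp", "cublas")


def choose_backend(available, required=()):
    """Return "gpu" if every GPU library was found, otherwise "cpu".

    `available` maps a library name to True/False.
    `required` lists libraries the caller cannot run without; if any of
    those is missing we fail with a readable message instead of a link error.
    """
    missing_required = [name for name in required if not available.get(name)]
    if missing_required:
        raise RuntimeError(
            "required libraries missing: " + ", ".join(missing_required)
        )
    if all(available.get(name) for name in GPU_LIBS):
        return "gpu"
    return "cpu"
```

A program that treats cudnn as optional calls `choose_backend(found)` and gets the fallback; one that cannot run without it passes `required=("cudnn",)` and gets the error message.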

There may be a solution in the dynamic linking facilities now present in Java but that path needs to be researched. To do this with javacpp we would need to build a small wrapper library that did the dynamic binding to the shared libraries and the symbols in the shared libraries.

In any case, a best-in-class CUDA development system would not have this issue. I suspect the same type of issue would be present should we decide to put effort into opencl.

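Contributor guide

Research directions: Investigate using a dynamic loading library (such as GLEW) to bind to the CUDA libraries at run time rather than at compile time. Explore ways to implement cross-platform dynamic loading with javacpp in Clojure. Review the current CUDA version binding in project.clj and explore how to load symbols conditionally. Consider the build system requirements for Linux, macOS, and Windows.
Tech stack: java
Domain: machine learning
Issue type: bug
Difficulty: 4
Estimated time: more than 1 week
Activity status: active
Clarity: clear
Prerequisites: Clojure, CUDA
Beginner friendliness: 30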
