CUDA

Skill level
1 Basic skills

I've never programmed in CUDA (or OpenCL) directly, so my knowledge is limited to using software already built for CUDA (or OpenCL). Because our computer-vision and machine-learning algorithms are (1) computationally intensive and (2) applied to real-time live video streams (often in HD), we have needed the massively parallel computing capabilities of GPUs to make these algorithms usable in real time. TensorFlow and the latest releases of OpenCV can use Nvidia's proprietary CUDA platform to get the necessary performance gains from Nvidia GPU hardware. Therefore, our group uses Tesla- and Maxwell-class Nvidia GPUs.

Because the Nvidia GPU hardware, drivers and libraries all have to match each other exactly, a fair amount of effort is needed to download exactly the right versions of the Nvidia drivers and libraries to match the versions of TensorFlow or OpenCV we are currently using with our GPUs. One cannot simply download the latest code and expect things to work. OpenPose, for example, was compiled for an older version of TensorFlow and cannot use anything newer than CUDA 9, whereas OpenCV 4.3 and higher can use CUDA 10 or higher. Because OpenCV must be matched to a particular Nvidia GPU, the entire library must be rebuilt from source. I have been through this exercise enough times that I am now fairly good at building CUDA-compatible TensorFlow, Docker and OpenCV images. I currently use a custom-compiled version of OpenCV 4.3 to run the EAST* deep-learning network on a CUDA GPU.
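As a rough illustration of matching a source build to the GPU, here is a minimal Python sketch that assembles CMake options for a CUDA-enabled OpenCV build. It assumes OpenCV 4.3+ built together with the opencv_contrib modules (where the CUDA modules live), the CUDA 10+ minimum mentioned above, and a Maxwell-class card (compute capability 5.x). The function names and the contrib path are hypothetical, and the option set should be checked against the OpenCV build documentation for the exact version being compiled.

```python
# Sketch: assemble CMake options for a CUDA-enabled OpenCV 4.3+ build.
# Assumptions (not from the original text): opencv_contrib is checked out at
# CONTRIB_PATH, and the GPU's compute capability is supplied by the caller.

CONTRIB_PATH = "../opencv_contrib/modules"  # hypothetical location

# Minimum CUDA major version assumed here, per the note that OpenCV 4.3+ uses CUDA 10 or higher.
MIN_CUDA_MAJOR = 10


def parse_version(text):
    """Turn a version string such as '10.2' into a tuple of ints, e.g. (10, 2)."""
    return tuple(int(part) for part in text.split("."))


def opencv_cuda_cmake_args(cuda_version, compute_capabilities):
    """Return -D options for building OpenCV with its CUDA DNN backend.

    cuda_version: installed CUDA toolkit version, e.g. "10.2".
    compute_capabilities: GPU architectures to compile for, e.g. ["5.2"] for Maxwell.
    """
    if parse_version(cuda_version)[0] < MIN_CUDA_MAJOR:
        raise ValueError(f"CUDA {cuda_version} is below the CUDA {MIN_CUDA_MAJOR} minimum assumed here")
    if not compute_capabilities:
        raise ValueError("at least one compute capability is required")
    return [
        "-DWITH_CUDA=ON",
        "-DWITH_CUDNN=ON",
        "-DOPENCV_DNN_CUDA=ON",
        # Kernels are compiled only for the listed architectures, so this list
        # should include the GPU that is actually installed.
        "-DCUDA_ARCH_BIN=" + ";".join(compute_capabilities),
        "-DOPENCV_EXTRA_MODULES_PATH=" + CONTRIB_PATH,
    ]


if __name__ == "__main__":
    # Example: CUDA 10.2 on a Maxwell GPU with compute capability 5.2.
    print("cmake " + " ".join(opencv_cuda_cmake_args("10.2", ["5.2"])) + " ..")
```

Rejecting an older toolkit up front is a cheap way to catch a version mismatch before a build that can take a long time.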

*The EAST DNN is described in the paper EAST: An Efficient and Accurate Scene Text Detector. I use a TensorFlow re-implementation, available on GitHub, that is built on the ResNet-50 architecture rather than the PVANet architecture described in the paper. That model was then converted by the OpenVINO toolkit into an Intel-optimized model, which the OpenCV 4.3 DNN module can now execute with Nvidia CUDA 10+ support. The CUDA-accelerated version of the EAST DNN runs about 8X faster than the non-accelerated version on my Maxwell-based hardware.
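For illustration, here is a minimal Python sketch of running an EAST model through the OpenCV DNN module with the CUDA backend selected. It assumes the frozen TensorFlow EAST graph used by OpenCV's own text-detection sample (the output layer names and mean values below come from that sample), not the OpenVINO-converted model described above, whose layer names would differ. The helper names are hypothetical, and decoding the score and geometry maps into rotated boxes (plus non-maximum suppression) is omitted.

```python
# Sketch: run an EAST text detector through OpenCV's DNN module on a CUDA GPU.
# Assumption (not from the original text): the model is the frozen TensorFlow
# graph used in OpenCV's text-detection sample; other exports use other layer names.

EAST_OUTPUT_LAYERS = [
    "feature_fusion/Conv_7/Sigmoid",  # per-pixel text confidence scores
    "feature_fusion/concat_3",        # rotated-box geometry
]


def round_to_multiple_of_32(value):
    """EAST expects input width and height that are multiples of 32 (rounds down, minimum 32)."""
    return max(32, (value // 32) * 32)


def load_east_net(model_path, use_cuda=True):
    import cv2  # imported here so the helper above works without OpenCV installed

    net = cv2.dnn.readNet(model_path)
    if use_cuda:
        # Only takes effect if OpenCV was compiled with the CUDA DNN backend.
        net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    return net


def detect_text(net, frame):
    import cv2

    height, width = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(
        frame,
        1.0,
        (round_to_multiple_of_32(width), round_to_multiple_of_32(height)),
        (123.68, 116.78, 103.94),  # ImageNet RGB channel means, as in the OpenCV sample
        swapRB=True,
        crop=False,
    )
    net.setInput(blob)
    scores, geometry = net.forward(EAST_OUTPUT_LAYERS)
    return scores, geometry
```

Usage would be something like `net = load_east_net("frozen_east_text_detection.pb")` (a placeholder path) followed by `detect_text(net, frame)` for each frame read from `cv2.VideoCapture`. Timing the same loop with `use_cuda=False` is one way to measure a speedup like the one quoted above.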

Note: OpenCL (originally developed by Apple and now maintained as an open standard by the Khronos Group, which also maintains OpenGL and Vulkan) is not currently supported by TensorFlow. While Nvidia supports OpenCL, OpenCL uses JIT compilation (as OpenGL does for shaders), which makes it adaptable to a greater range of GPUs, but at the expense of about half the performance (see A Performance Comparison of CUDA and OpenCL). OpenCV 4.x now supports Vulkan and OpenCL, but I have not yet tried using them.
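OpenCV's DNN module picks its compute path through a backend/target pair, so trying OpenCL (or Vulkan) is mostly a matter of choosing a different pair. The Python sketch below shows one assumed fallback order: CUDA first, then OpenCL, then the CPU. The function names are hypothetical, and a CUDA device count above zero does not by itself prove that the DNN CUDA backend was compiled in.

```python
# Sketch: choose an OpenCV DNN backend/target pair, preferring CUDA, then OpenCL,
# then plain CPU. The ordering is an assumption for illustration; Vulkan
# (DNN_BACKEND_VKCOM / DNN_TARGET_VULKAN) could be added the same way.


def backend_preference(have_cuda, have_opencl):
    """Return (backend, target) constant names in the order they should be tried."""
    choices = []
    if have_cuda:
        choices.append(("DNN_BACKEND_CUDA", "DNN_TARGET_CUDA"))
    if have_opencl:
        # OpenCL kernels are JIT-compiled for whatever GPU is present at run time.
        choices.append(("DNN_BACKEND_OPENCV", "DNN_TARGET_OPENCL"))
    choices.append(("DNN_BACKEND_OPENCV", "DNN_TARGET_CPU"))  # always available
    return choices


def configure_net(net):
    import cv2  # lazy import keeps backend_preference usable without OpenCV

    # Returns 0 without a CUDA build and -1 if the driver is missing or incompatible.
    have_cuda = cv2.cuda.getCudaEnabledDeviceCount() > 0
    have_opencl = cv2.ocl.haveOpenCL()
    backend, target = backend_preference(have_cuda, have_opencl)[0]
    net.setPreferableBackend(getattr(cv2.dnn, backend))
    net.setPreferableTarget(getattr(cv2.dnn, target))
    return backend, target
```

Keeping the preference order as plain names makes it easy to log which path was chosen when the same image runs on machines with different GPUs.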

Experiences using this skill are shown below:



Barco Labs (research)

[I know, this section just echoes the same material as the résumé. I plan to expand it later.] Worked with PhDs, staff and university interns researching disruptive technologies. Barco Labs deliverables are research papers, patents and demos. Any research that might become a viable product in 2 to 5 years is then handed off to one of the product divisions. (Due to the trade-secret nature of this research, some details cannot be revealed.) Accomplishments: