I was given the task of researching automatic white-balance algorithms with the goal of calibrating multiple video cameras to the same color balance. The problem was that when switching between cameras covering the same scene, the viewer saw a noticeable color shift, especially when the cameras came from different manufacturers. In addition, the cameras needed to render a neutral color balance under varying room lighting while ignoring certain objects within the scene, such as computer monitors and projection screens, for both exposure and white balance.
Since I was already using homography and object segmentation for DNN text detection to find computer monitors and projection screens within a scene, I just added object masking to the same code so that these objects were removed from the video stream used for exposure and color balance adjustments. Here is a before and after view with the masked area shown as black in the masked rendering (with the computed homography matrix shown by the green outline).
Note: I actually used masking, not blackened pixels as shown above, so the masked area had no effect on the exposure histogram one way or the other.
I then looked at different techniques for evaluating overall exposure using histogram algorithms (provided by OpenCV). An edge case was detecting over- and underexposure, where a large population of pixels piles up at the top or bottom end of the histogram. I experimented with various algorithms to spread the pixel population across the histogram as evenly as possible by sending VISCA-IP commands to the camera to change the camera lens aperture. (VISCA is a professional video-camera control protocol widely adopted by many manufacturers.)
One problem I ran into was that when an RGB video frame was converted to grey-scale by OpenCV, the grey-scale histogram suffered an aliasing effect where every 7th bin cancelled out to zero. Some form of histogram smoothing was therefore necessary. KDE (kernel density estimation) is the usual technique for this, but it turned out to be quite slow when using a Gaussian kernel. I therefore tried cubic-spline smoothing, which was much faster, and when I compared the cubic-spline results against KDE, the agreement was close enough for reliable exposure evaluation.
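The spline approach can be sketched with SciPy's UnivariateSpline on a toy histogram that reproduces the every-7th-bin dropout (the Gaussian shape and the smoothing factor s are my own illustrative choices, not the project's values):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Toy histogram with the periodic dropout described above: every 7th
# bin is zero even though the underlying distribution is smooth.
bins = np.arange(256)
hist = np.exp(-0.5 * ((bins - 128) / 40.0) ** 2) * 1000.0
hist[::7] = 0.0  # simulate the aliasing artifact

# Cubic smoothing spline (k=3). The smoothing factor s trades fidelity
# for smoothness; large enough here to bridge the zeroed bins.
spline = UnivariateSpline(bins, hist, k=3, s=1e7)
smoothed = np.clip(spline(bins), 0.0, None)
```

The smoothed curve fills in the zeroed bins with values close to their neighbors, so the exposure evaluation no longer sees phantom gaps.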
Another problem was video latency. The VISCA commands to change the aperture were being sent directly to the camera, but the result of an aperture change did not appear in the RTSP video stream right away due to delay through the video pipeline. The software therefore had to wait until a change in aperture actually appeared in the video stream before making the next adjustment, which really slowed down the feedback loop needed to adjust the exposure. Because the VISCA controller was synchronous, it had to run in a separate thread to keep the video-processing loop from dropping frames. And, to minimize the impact of video latency, I devised a relative aperture-change algorithm which reduced the number of round-trip commands to the camera needed to reach the correct exposure. Here is a UML activity diagram of the approach.
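The separate-thread arrangement might look something like this (the command strings and the visca_send stub are hypothetical placeholders; a real implementation would write VISCA-over-IP packets to the camera and block on its replies):

```python
import queue
import threading

# Hypothetical synchronous VISCA send; in the real system this would
# transmit a VISCA-over-IP packet and block until the camera replies.
def visca_send(command):
    return f"ack:{command}"

class ViscaWorker:
    """Runs blocking VISCA I/O on its own thread so the video loop never stalls."""

    def __init__(self):
        self.commands = queue.Queue()
        self.replies = []
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            cmd = self.commands.get()
            if cmd is None:          # sentinel -> shut down
                break
            self.replies.append(visca_send(cmd))

    def submit(self, cmd):
        # Called from the video loop; enqueues and returns immediately.
        self.commands.put(cmd)

    def close(self):
        self.commands.put(None)
        self.thread.join()

w = ViscaWorker()
w.submit("iris_relative:+2")   # hypothetical relative-aperture command
w.submit("iris_relative:-1")
w.close()
print(w.replies)  # ['ack:iris_relative:+2', 'ack:iris_relative:-1']
```

The FIFO queue preserves command order while the video loop stays free to process every frame.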
The results were quite good unless the room lighting was very uneven or a spotlight was used on the subject in an otherwise darkened room.
The next task was to align the white balance (WB) of all the cameras so when switching between them the viewer would not notice a change in color balance. I evaluated all the OpenCV auto-WB algorithms (available at that time). This included the following algorithms:
- Simple WB
- This algorithm independently stretches each color channel's histogram to cover the full intensity range. The results were disappointing, so I went on to the next algorithm.
- Gray World WB
- This algorithm scales each color channel so that their averages are equal, on the assumption that the scene as a whole averages out to grey. The results from this algorithm were also disappointing.
- Learning-based WB
- This algorithm looked more promising, as it is a machine-learning algorithm pre-trained on a large set of test images. It is based on a paper by Cheng, Price, Cohen & Brown, "Effective learning-based illuminant estimation using simple features", in the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1000–1008, 2015. This was the first machine-learning algorithm I had run across that didn't use a neural-network approach. Instead, it uses an older technique involving regression trees. The algorithm first generates 4 feature tuples: (1) average chromaticity, which is the same measure Gray World WB uses, (2) brightest chromaticity, (3) dominant chromaticity and (4) a mode of the 300 most common colors. The chromaticity is based on camera RGB stimulus responses rather than the CIE tri-stimulus responses of the human eye. (I had to spend a lot of time studying colorimetric theory to understand all this.) These four feature tuples, from thousands of images, were then used to train a set of regression trees. A trained regression tree for a red and green chromaticity feature tuple might look like this:
If 3 of the 4 regression-tree predictions on the feature tuples were in close agreement, that WB prediction was used; otherwise, the average of the 4 predictions was used. Unfortunately, this algorithm yielded results that were worse overall than either of the two simpler WB algorithms. Maybe the reason for the poor results was that the room scenes I was using were so different from the typical scenes used to train the algorithm that it didn't know what to do. The algorithm does provide access to the four feature tuples so one can train it with one's own set of images, but I decided that was not worth the effort. In any case, I gave up on this most promising of the OpenCV WB algorithms.
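For reference, the Gray World idea itself fits in a few lines of NumPy. This is a sketch of the underlying assumption, not OpenCV's cv2.xphoto.createGrayworldWB implementation (which adds a saturation threshold on top):

```python
import numpy as np

def gray_world(img):
    """Gray World WB: scale each channel so all three share the same mean.

    img: float RGB array of shape (H, W, 3).
    """
    means = img.reshape(-1, 3).mean(axis=0)   # per-channel averages
    gains = means.mean() / means              # push every mean to the grey level
    return img * gains

# A frame with a strong yellow cast: red/green high, blue low.
frame = np.dstack([np.full((4, 4), 200.0),
                   np.full((4, 4), 190.0),
                   np.full((4, 4), 110.0)])
balanced = gray_world(frame)
print(balanced.reshape(-1, 3).mean(axis=0))  # all three channel means now equal
```

The simplicity is also the weakness: any scene that doesn't actually average to grey (a red stage curtain, say) gets pushed toward the wrong balance.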
Another problem with all these OpenCV algorithms was that it was not possible to retrieve the RGB gain values they computed for proper white balance. Instead, each WB algorithm internally called cv::xphoto::applyChannelGains(input_image, output_image, Bgain, Ggain, Rgain), so the only thing one could do was provide an input video frame and get a corrected output frame as the result. What I really wanted was access to those hidden RGB gain adjustments so the camera's CCD RGB gains could be adjusted in the camera itself via VISCA. That way the white-balanced video from the camera would already be corrected, without the need for frame-by-frame video post-processing.
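Since applyChannelGains is just a per-channel multiply, the hidden gains could in principle be recovered after the fact by comparing channel means before and after correction. A NumPy sketch of that idea (my own illustration of the workaround, not something the project actually did):

```python
import numpy as np

def estimate_gains(src, dst):
    """Estimate per-channel gains given an input frame and its
    gain-corrected output, using the ratio of channel means."""
    src_means = src.reshape(-1, 3).mean(axis=0)
    dst_means = dst.reshape(-1, 3).mean(axis=0)
    return dst_means / src_means

rng = np.random.default_rng(0)
frame = rng.uniform(50, 200, size=(8, 8, 3))
true_gains = np.array([1.20, 1.00, 0.85])   # hypothetical B, G, R gains
corrected = frame * true_gains              # what a channel-gain apply does

print(estimate_gains(frame, corrected))     # ≈ [1.20, 1.00, 0.85]
```

Even so, recovering gains by round-tripping frames through the library is clumsy compared to an API that simply exposes them.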
Because of these OpenCV issues, I gave up on the entire OpenCV WB approach and worked out my own automated WB algorithm. I can't explain in detail how my algorithm works because it is still under a Barco trade-secret non-disclosure. This much I can reveal.
Instead of trying to figure out the correct color balance from the scene alone, I use a grey card inserted somewhere within the scene, with an ArUco marker that allows the software to locate the grey card automatically within the video frame. My prototype is triple-threaded, with one thread for the VISCA controller, another for video processing and a third for graphing WB convergence times (needed for optimizing the algorithm). Once the algorithm is trained, it normally converges to the correct WB within 2 seconds. Exposure is now also measured off the grey card instead of from the overall scene, which has the advantage of reaching the correct exposure in spotlight-illuminated rooms (assuming the grey card is lit by the spotlight during camera calibration). Here is an example of how the grey card is used, with real-time colorimetric data added for algorithm debugging. (Due to the COVID-19 shelter-in-place order in effect at the time, these frame grabs were done in my bedroom.) The uncorrected frame grab looks like this (with an overall yellowish color cast due to the 3500K spotlight on the bookcase):
And, the corrected frame after applying the appropriate CCD gains within the camera (via VISCA) came out like this:
You can see the correlated color temperature (CCT) as seen by the camera is now 6547K, which is close to the ideal D65 illuminant of 6504K. The results have also been consistent across different camera models at different zooms and under different room lighting situations. Overall, the results have been quite satisfactory.
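For the curious, a CCT readout like the one in the debug overlay can be computed from CIE 1931 xy chromaticity with McCamy's approximation (a standard published formula, shown here for illustration; not necessarily the one the project used):

```python
def mccamy_cct(x, y):
    """McCamy's approximation: CCT in kelvin from CIE 1931 (x, y)."""
    n = (x - 0.3320) / (0.1858 - y)
    return 449.0 * n**3 + 3525.0 * n**2 + 6823.3 * n + 5520.33

# The D65 white point (x = 0.3127, y = 0.3290) should come out near 6504 K.
print(round(mccamy_cct(0.3127, 0.3290)))
```

The approximation is accurate to within a few kelvin for illuminants near the daylight locus, which is exactly the range a white-balance overlay cares about.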