Since I had already computed the homography and object segmentation for the text readability prototype, inserting one video stream into the other was straightforward: I used the OpenCV library to apply a perspective warp (derived from the homography matrix) to the content stream so that it could be seamlessly composited into the room camera's view, creating an augmented reality mashup. The result was a much clearer rendering of the projected content as seen by remote users in the composite video stream. It also let my smart exposure and white-balance software adjust for correct room exposure without having to worry about how that would affect the readability of the projected content. (Very often, the projected content would appear overexposed after adjusting for proper room exposure.) Here is a before and after example.
Since our machine-learning software was also tracking people, including their head, arm, and leg movements, it would have been possible to generate a mask outlining a person's body and overlay it on top of the composited video so that the person's body would not be cropped by the inserted content video. I discussed this with the team, but I never actually implemented that enhancement.
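Since this enhancement was never built, the following is purely a hypothetical sketch of how the final overlay step might have looked: given a person segmentation mask from the tracking software, the person's pixels from the original camera frame are re-painted on top of the composited frame so the inserted content appears behind them. The function name and mask format are my assumptions.

```python
import numpy as np

def overlay_person(composited, original, person_mask):
    """Re-paint the person's pixels from the original camera frame over
    the composited frame, so the inserted content does not crop the body.

    person_mask: uint8 mask, 255 where the segmentation model found the
    person, 0 elsewhere (hypothetical output of the tracking software).
    """
    out = composited.copy()
    out[person_mask > 0] = original[person_mask > 0]
    return out
```

The person region would simply punch through the inserted content, giving the impression that they are standing in front of it.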