One annoying artifact of averaging all frames is that static content such as TV channel logos appears in the final image, once for every frame:
Similarly, some frames contain a small black border, which also ruins the image. My old way of thinking was that I should make a filter which ignores largely inconsistent pixels. But really, all we need to do is crop the image…
It is the same deal with logos: what we want is to ignore the area which contains the logo. That means we have to select that area, which is pretty easy to do manually. But what about doing it automatically?
Logo detection, first attempt
The logo is positioned in exactly the same place on each and every frame, so the pixels belonging to the logo have consistent values across frames. The idea is that if we take the difference between frames, any pixel whose difference falls below a certain threshold is part of the logo.
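A minimal sketch of this first attempt, assuming the frames are already aligned and available as grayscale numpy arrays (the function name and threshold value are mine, not from the actual implementation):

```python
import numpy as np

def naive_logo_mask(frames, threshold=10):
    """Flag pixels whose value stays (nearly) constant across all frames.

    frames: list of aligned grayscale frames with identical shapes.
    Returns a boolean mask, True where a logo is suspected.
    """
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    # Per-pixel spread across frames; a small spread means the pixel
    # barely changes, which is what we expect from a static logo.
    spread = stack.max(axis=0) - stack.min(axis=0)
    return spread < threshold
```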
It doesn’t work that well, because the assumption that all other areas are inconsistent doesn’t always hold. Notice how in the above image the whole right side has about the same color in every frame. Furthermore, the logo isn’t always consistent either, as it is usually a bit transparent and changes depending on the background.
Below is an example of this method (top) and the final method (bottom). Notice how the logo looks dark, yet the thresholded version to the right captured almost none of it. While it is not very noticeable, the very right side is actually about as dark as the logo, so any threshold higher than the one used would include incorrect areas.
Working logo detection
The idea behind this method is that instead of trying to find consistency in the frames, we estimate the original image and try to find the inconsistencies. The original image doesn’t contain the TV logo, so the inconsistency should be high where the logo is. For each frame, we then find the difference between that frame and the estimated image. To ignore inconsistencies such as compression artifacts and the like, we keep only the differences that are consistent across all the frames. A graphical overview of this method is given below:
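A rough sketch of the same pipeline in code, again assuming aligned grayscale numpy arrays; combining the per-frame differences by taking their minimum is just one way of “finding the consistencies”, and the threshold is a placeholder:

```python
import numpy as np

def detect_logo(frames, threshold=8):
    """Flag pixels that deviate from an estimate of the original image
    in every single frame."""
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    estimate = stack.mean(axis=0)      # crude estimate of the original
    diffs = np.abs(stack - estimate)   # per-frame deviation from it
    # A one-off deviation (e.g. a compression artifact) only shows up
    # in a few frames; taking the minimum keeps only deviations that
    # are present in all of them.
    consistent = diffs.min(axis=0)
    return consistent > threshold
```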
The old average method is actually a decent estimate of the original image. While the logo does appear in it, it is quite a bit dimmer than in the individual frames, as it is averaged with frames where it does not appear at that spot. As seen in the “Static stitched difference” image, it works well. However, it is far from perfect, as there are some bright areas right around the logo. That is the afterimage of the logos in the estimate; if we could come up with a better estimate, it would disappear.
But that is exactly what we can do now, since if we ignore the area we just found, the estimate no longer contains the logos. While our estimate was good enough here, that is not always the case; see this example of a credits text we would like to detect and remove: (click on the image to see it in full size)
The first detection (to the left) was far from good enough, but if we make a new estimate of the original image based on that detection, we get closer to the truth and thus a better detection. We can keep iterating until the detection is stable.
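A sketch of that iteration, building on the function above; it assumes every pixel is unobstructed in at least some frames, which holds in a stitched image where a screen-static overlay lands at a different canvas position in each frame:

```python
import numpy as np

def refine_logo_mask(frames, threshold=8, max_iters=10):
    """Alternate between estimating the original image and detecting
    pixels that disagree with it, until the detection is stable."""
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    weights = np.ones_like(stack)          # initially trust every pixel
    for _ in range(max_iters):
        # New estimate, ignoring the pixels flagged in the last pass.
        total = np.maximum(weights.sum(axis=0), 1e-9)
        estimate = (stack * weights).sum(axis=0) / total
        new_weights = (np.abs(stack - estimate) <= threshold).astype(float)
        if np.array_equal(new_weights, weights):
            break                          # detection is stable
        weights = new_weights
    # Pixels rejected in every frame make up the final logo mask.
    return weights.sum(axis=0) == 0
```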
Not quite automatic yet
The thresholds have been selected manually, but I do believe it might be able to work nearly automatically. If, instead of thresholding at the end of each detection, we keep the result in gray-scale, we could use it as the probability that each pixel is correct. The less likely a pixel is to be correct, the less it affects the estimate in the presence of other pixels, hopefully resulting in a better estimate. It will most likely require more iterations to reach a good estimate though, and we might still have to do a final thresholding manually.
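A sketch of what that could look like; the exponential falloff and its parameter are pure guesses on my part, and any function that decreases with the difference would do:

```python
import numpy as np

def soft_weights(frames, estimate, falloff=10.0):
    """Map each pixel's deviation from the estimate to a confidence
    weight in (0, 1] instead of a hard yes/no decision."""
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    return np.exp(-np.abs(stack - estimate) / falloff)

def weighted_estimate(frames, weights):
    """Average the frames, letting low-confidence pixels contribute less."""
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    return (stack * weights).sum(axis=0) / np.maximum(weights.sum(axis=0), 1e-9)
```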
I could test it; I just haven’t invested the time needed to do so yet…
Further work
A few pixels in the mask are missing, which could affect the image negatively. Using dilation here might be a good idea.
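For instance, with scipy one could close the small holes and then grow the mask by a pixel or two (the amount of growth here is a guess):

```python
from scipy.ndimage import binary_closing, binary_dilation

def clean_mask(mask, grow=1):
    """Fill small holes in the detected mask, then grow it slightly so
    stray pixels around the logo are covered as well."""
    return binary_dilation(binary_closing(mask), iterations=grow)
```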
In case some areas are completely covered by credits or the like in all frames, there exist inpainting algorithms to guess the missing pixels. Could be interesting to mess around with.
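OpenCV ships one such inpainting algorithm; a minimal sketch, assuming an 8-bit image and the boolean mask from the detection step:

```python
import cv2
import numpy as np

def fill_covered_areas(image, mask):
    """Guess the pixels that were covered in every frame.

    image: 8-bit grayscale or BGR image; mask: boolean, True where the
    pixels are missing. Uses Telea's inpainting method.
    """
    mask8 = mask.astype(np.uint8) * 255
    return cv2.inpaint(image, mask8, 3, cv2.INPAINT_TELEA)
```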
Also, it does not currently appear to work correctly on the chroma channels. In the credits example, the credits are still slightly visible, and most of that comes from the chroma channels. I’m not quite sure why the credits still affect the chroma channels, but I guess it is related to chroma subsampling. Probably nothing too major.
But I’m getting close to understanding how the image alignment issues appear, and I want to try to see if I can solve that first.