While researching digital signal processing, I came across an interesting term: Super Resolution. Super Resolution is a field that attempts to improve the resolution of an image using the information in one or more images. This is exactly what I was doing with Overmix: using multiple images to reduce noise.
However, another aspect of Super Resolution uses sub-pixel shifts between the images to improve sharpness. This could not only solve the issue with the imperfect alignment I was having, it could outright improve the quality beyond what I had thought possible.
(I had actually tried sub-pixel alignment when I ran into the issue, and I speculated that it might increase sharpness. But after much work I only managed to make it align properly; it did not reduce the blur that was present even without it, so I didn’t press further.)
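To make the sub-pixel idea concrete, here is the simplest "shift-and-add" form of it in one dimension. It is a minimal sketch under idealized assumptions: no blur, no noise, and every frame's shift is known exactly and is a multiple of 1/`factor` pixel. The names `sample_low_res` and `shift_and_add` are hypothetical, and this is not how Overmix does it. Each low-resolution frame samples the fine grid at a different phase, so frames with different sub-pixel shifts fill in different fine-grid positions, while frames without sub-pixel shifts only repeat the same samples.

```python
def sample_low_res(high, factor, phase):
    """Simulate one low-res frame: every `factor`-th high-res sample, starting at `phase`.

    A phase of p corresponds to a shift of p / factor low-res pixels.
    """
    return high[phase::factor]


def shift_and_add(frames, factor, length):
    """Place each low-res sample back at its high-res position and average overlaps.

    `frames` is a list of (phase, samples) pairs; positions nothing landed on stay None.
    """
    total = [0.0] * length
    count = [0] * length
    for phase, samples in frames:
        for i, v in enumerate(samples):
            pos = phase + i * factor
            total[pos] += v
            count[pos] += 1
    return [t / c if c else None for t, c in zip(total, count)]


high = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

# Three frames shifted by 0, 1/3 and 2/3 of a low-res pixel recover every sample.
all_phases = [(p, sample_low_res(high, 3, p)) for p in range(3)]
print(shift_and_add(all_phases, 3, len(high)))

# Two frames with no sub-pixel shift between them leave two thirds of the grid empty.
only_phase_0 = [(0, sample_low_res(high, 3, 0))] * 2
print(shift_and_add(only_phase_0, 3, len(high)))
```

Real Super Resolution also has to estimate the shifts and undo blur, which is where most of the difficulty lies.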
Super Resolution has its limits, however. First, since it tries to estimate the original image, it cannot magically surpass it and give unlimited precision. If the image was created in “480p”, even a 1080p BD upscale will still only give the “480p” image. And if the original was blurry by nature, Super Resolution will produce a blurry image as well, unlike a sharpening filter.
That raises the question: why is anime blurry, and why does it not align to the pixel grid? In one sample, I got the same misalignment with both the 720p TV version and the 1080p BD version. If the misalignment were caused by downscaling, the issue would be smaller at 1080p, but it isn’t. Most anime does not appear to push the limits of 1080p, but given these misalignment issues, I suspect the studios’ rendering pipelines aren’t optimal.
The other limit is the set of images available for the estimation. If the images we have do not contain any hints about what the original image looks like, we can’t guess it. Thus, if there are no sub-pixel shifts between the images, Super Resolution can’t do much. And that is actually an issue, because most slides only move vertically, which means we only get vertical sub-pixel shifts. In those cases we can only hope to improve detail in the vertical direction.
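The vertical-only limitation can be checked by listing which sub-pixel positions of a finer grid a set of frames actually samples. This is a small sketch: `covered_phases` is a hypothetical helper, shifts are given in original pixels and snapped to the nearest 1/`factor` pixel, and the slide offsets below are made-up example values, not measurements.

```python
def covered_phases(shifts, factor):
    """Sub-pixel phases (x, y) on a grid `factor` times finer that a set of frames samples.

    `shifts` are (dx, dy) frame offsets in original pixels, snapped to the nearest 1/factor pixel.
    """
    return {(round(dx * factor) % factor, round(dy * factor) % factor) for dx, dy in shifts}


# Hypothetical offsets for a slide that only scrolls vertically.
vertical_slide = [(0, 0.0), (0, 3.25), (0, 7.5), (0, 10.75)]
phases = covered_phases(vertical_slide, 4)
print(sorted(phases))        # [(0, 0), (0, 1), (0, 2), (0, 3)]
print(len(phases) / 4 ** 2)  # 0.25
```

Every covered phase has x = 0, so the extra samples only add information along the vertical axis. A quarter of the 4×4 sub-pixel positions are covered, no matter how many vertically shifted frames are added.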
Using all available information
Since Super Resolution uses the information in the images, the more information we can get, the better.
First of all, the closer we get to the source, the better, since we then don’t have to estimate the defects introduced by each conversion. A PNG screenshot is better than a JPEG, and the TV MPEG-2 transport stream is better than a 10-bit re-encode.
One thing to note here is that a PNG screenshot is (with all players I have tried) an 8-bit image, not 10-bit (16-bit*) as in Hi10p H.264. So using PNG screenshots would lose us 2 bits.
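What losing 2 bits means can be shown in a few lines. This sketch assumes the reduction is a plain truncation (`to_8bit` is a hypothetical helper); a player might round or dither instead, but in every case at most 256 levels per channel survive, so runs of four neighboring 10-bit values collapse into one 8-bit value.

```python
def to_8bit(v10):
    """Reduce a 10-bit sample (0..1023) to 8 bits by dropping the two lowest bits."""
    return v10 >> 2


print(len({to_8bit(v) for v in range(1024)}))      # 256 distinct levels remain
print([to_8bit(v) for v in (400, 401, 402, 403)])  # [100, 100, 100, 100]
```

A smooth 10-bit gradient such as 400, 401, 402, 403 becomes a single flat 8-bit step, which is exactly the kind of fine variation that multi-frame estimation could otherwise use.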
More importantly, however, PNG cannot represent an image from an MPEG stream directly. The issue is that PNG has no Y’CbCr support and stores color images as RGB, while MPEG uses Y’CbCr. Y’CbCr is a different color space, designed to reduce the bandwidth required for images and video. The human eye is most sensitive to luminance and much less so to color, which Y’CbCr takes advantage of by separating the two. MPEG then (normally) uses chroma subsampling, the practice of reducing the resolution of the planes that contain the color information. A 1280×720 encode will normally have one plane at 1280×720 and two at 640×360.
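A rough sketch of 4:2:0 chroma subsampling and of the storage it saves. The helper names are hypothetical, and averaging each 2×2 block is an assumed box filter; real encoders may use other filters and chroma sample positions.

```python
def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging each 2x2 block.

    `plane` is a list of rows; width and height are assumed to be even.
    """
    return [
        [(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
         for x in range(0, len(plane[0]), 2)]
        for y in range(0, len(plane), 2)
    ]


def sample_counts(width, height):
    """Number of stored samples for 4:2:0 Y'CbCr versus full-resolution RGB."""
    ycbcr = width * height + 2 * (width // 2) * (height // 2)
    rgb = 3 * width * height
    return ycbcr, rgb


print(subsample_420([[10, 20, 30, 30], [10, 20, 50, 50]]))  # [[15.0, 40.0]]
print(sample_counts(1280, 720))                              # (1382400, 2764800)
```

For a 1280×720 frame, the Y’ plane holds 921,600 samples and each 640×360 chroma plane holds 230,400, so the frame needs half as many samples as full-resolution RGB.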
So to save as a PNG, the video player upscales the chroma planes and converts to RGB, losing valuable information.
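The color conversion alone is already lossy, apart from the chroma upscaling. The sketch below assumes 8-bit limited-range BT.709 Y’CbCr and 8-bit RGB output (a given player might use BT.601, full range, or different rounding), and the function names are hypothetical. A mid-gray survives the round trip, but a valid Y’CbCr code value that lies outside the RGB gamut gets clamped and cannot be recovered.

```python
KR, KB = 0.2126, 0.0722  # BT.709 luma coefficients
KG = 1 - KR - KB


def clamp8(x):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, round(x)))


def ycbcr_to_rgb(y, cb, cr):
    """8-bit limited-range (16-235/240) BT.709 Y'CbCr -> 8-bit full-range R'G'B'."""
    yn = (y - 16) / 219
    pb = (cb - 128) / 224
    pr = (cr - 128) / 224
    r = yn + 2 * (1 - KR) * pr
    b = yn + 2 * (1 - KB) * pb
    g = (yn - KR * r - KB * b) / KG
    return tuple(clamp8(c * 255) for c in (r, g, b))


def rgb_to_ycbcr(r, g, b):
    """Inverse of ycbcr_to_rgb (without clamping): 8-bit R'G'B' -> 8-bit limited-range Y'CbCr."""
    rn, gn, bn = r / 255, g / 255, b / 255
    yn = KR * rn + KG * gn + KB * bn
    pb = (bn - yn) / (2 * (1 - KB))
    pr = (rn - yn) / (2 * (1 - KR))
    return round(16 + 219 * yn), round(128 + 224 * pb), round(128 + 224 * pr)


# A mid-gray survives the round trip...
print(rgb_to_ycbcr(*ycbcr_to_rgb(128, 128, 128)))
# ...but this out-of-gamut value is clamped to (0, 84, 0) and comes back as something else.
print(ycbcr_to_rgb(16, 16, 16), rgb_to_ycbcr(*ycbcr_to_rgb(16, 16, 16)))
```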
Going even further, video is compressed using a combination of key- and delta-frames. Key-frames store a whole image, while delta-frames only store how to get from one frame to the next. The specifics of how those frames were compressed are again valuable information. (But I don’t know much about how this is done.)
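The key-/delta-frame idea, reduced to a toy: the first frame is stored whole, and every later frame only stores the samples that changed. This is an illustration with hypothetical `encode`/`decode` helpers, not how MPEG works internally. Real codecs predict blocks from reference frames using motion vectors and store a transformed, quantized residual rather than exact per-sample changes.

```python
def encode(frames):
    """Store the first frame whole (key-frame) and later frames as changed positions (delta-frames)."""
    key = list(frames[0])
    deltas = []
    prev = key
    for frame in frames[1:]:
        deltas.append([(i, v) for i, (p, v) in enumerate(zip(prev, frame)) if p != v])
        prev = frame
    return key, deltas


def decode(key, deltas):
    """Rebuild every frame by applying each delta to the previous frame."""
    frames = [list(key)]
    for delta in deltas:
        frame = list(frames[-1])
        for i, v in delta:
            frame[i] = v
        frames.append(frame)
    return frames


frames = [[1, 2, 3, 4], [1, 2, 9, 4], [1, 2, 9, 5]]
key, deltas = encode(frames)
print(deltas)  # [[(2, 9)], [(3, 5)]]
print(decode(key, deltas) == frames)  # True
```

Even in this toy, the deltas carry information the decoded frames alone do not show directly, such as which parts of the image did not change between frames.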
Status of Overmix
Overmix now accepts a custom file format that can store 8- and 10-bit chroma-subsampled Y’CbCr images. I created an application using libVLC that takes libVLC’s decoded output with minimal preprocessing and stores it in this format. (It also makes it easier to save every frame in the slide.)
Overmix now uses only the Y’ plane for alignment, instead of all three RGB planes. My next goal is to redo the alignment algorithm. Currently it aligns each image against a rendered average of all previously added images, as otherwise the slight misalignment would accumulate with each added frame. Instead, I will try a multi-pass method: first roughly align all images, then do a sub-pixel alignment afterwards. Sub-pixel alignment will, at least initially, be done by upscaling, as optical flow makes no sense to me yet.
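The two-pass idea (rough whole-pixel alignment, then sub-pixel refinement by upscaling) can be sketched in one dimension. This is a minimal illustration, not Overmix’s code. The names `upscale_linear`, `mse`, `best_offset` and `align` are hypothetical; it aligns 1D signals rather than 2D Y’ planes; the matching cost (mean squared difference, requiring at least half of the image to overlap) and the use of linear interpolation for upscaling are assumptions.

```python
import math


def upscale_linear(signal, factor):
    """Upscale a 1D signal by an integer factor with linear interpolation.

    Output index j corresponds to input position j / factor.
    """
    out = []
    for a, b in zip(signal, signal[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(signal[-1])
    return out


def mse(ref, img, offset, min_overlap):
    """Mean squared difference between img[i] and ref[i + offset] over the overlap."""
    total, count = 0.0, 0
    for i, v in enumerate(img):
        j = i + offset
        if 0 <= j < len(ref):
            total += (ref[j] - v) ** 2
            count += 1
    return total / count if count >= min_overlap else math.inf


def best_offset(ref, img, candidates):
    """Candidate offset with the lowest error; at least half of img must overlap."""
    return min(candidates, key=lambda o: mse(ref, img, o, len(img) // 2))


def align(ref, img, max_shift, factor):
    """Return the offset (in original pixels) at which img[i] best matches ref at i + offset."""
    # Pass 1: coarse search over whole-pixel offsets.
    coarse = best_offset(ref, img, range(-max_shift, max_shift + 1))
    # Pass 2: upscale both signals and search within one original pixel of the coarse result.
    ref_up, img_up = upscale_linear(ref, factor), upscale_linear(img, factor)
    center = coarse * factor
    fine = best_offset(ref_up, img_up, range(center - factor, center + factor + 1))
    return fine / factor


# Example: a soft bump, and a copy of it sampled 2.25 pixels further along.
bump = lambda x: math.exp(-((x - 20) ** 2) / (2 * 3.0 ** 2))
ref = [bump(i) for i in range(40)]
img = [bump(i + 2.25) for i in range(40)]
print(align(ref, img, max_shift=5, factor=4))
```

Upscaling by `factor` limits the precision to 1/`factor` pixel. The refinement only searches near the coarse result, which keeps the expensive upscaled comparison small.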
Then I need to redo the rendering system, as it is currently optimized for images aligned to the pixel grid, which will clearly no longer be the case.
I haven’t worked on Overmix for quite some time due to university, but I should have plenty of time over the next three months, so hopefully I will get it done before they are over.