While researching digital signal processing I came across an interesting term: Super Resolution. Super Resolution is a field which attempts to improve the resolution of an image by using the information in one or more images. This is exactly what I was doing with Overmix, using multiple images to reduce noise.
However, another aspect of Super Resolution uses sub-pixel shifts between the images to improve the sharpness of the image. This could not only solve the issue with the imperfect alignment I was having, it could outright improve the quality further than I had thought possible.
(I had actually tried to use sub-pixel alignment when I ran into the issue, and I speculated it might increase sharpness. But after much work I only managed to make the images align properly; it did not reduce the blur that was present even without sub-pixel alignment, so I didn't press it further.)
Limits
Super Resolution has its limits, however. First of all, as it tries to estimate the original image, it cannot magically surpass it and give unlimited precision. If the image was created in “480p”, even a 1080p BD upscale will still only give you that “480p” image. If the original was blurry by nature, Super Resolution will result in a blurry image as well, unlike a sharpness filter.
And that raises the question: why is anime blurry, and why does it not align on the pixel grid? With one sample, I got the same misalignment with both the 720p TV version and the 1080p BD version. If this were caused by downscaling, the issue would be smaller at 1080p, but it isn't. Most anime does not appear to push the boundaries of 1080p, but since there are misalignment issues I suspect the rendering pipeline isn't optimal.
The other limit is the available images used for the estimation. If the images we have do not contain any hints about what the original image looks like, we cannot guess it. Thus if there are no sub-pixel shifts in an image, Super Resolution can't do much. And that is actually an issue, because most slides only move vertically, which means we only have vertical sub-pixel shifts. In those cases we can only hope to improve detail in the vertical direction.
Using all available information
Since Super Resolution uses the information in the images, the more of it we can get, the better.
First of all, the closer we can get to the source the better, as we then don't have to estimate the defects that happen with each conversion. A PNG screenshot is better than a JPEG, and the TV MPEG-2 transport stream is better than a 10-bit re-encode.
One thing to notice here is that a PNG screenshot is (with all players I have tried) an 8-bit image, not the 10 bits (16-bit*) of Hi10p H.264. So using PNG screenshots would lose us 2 bits.
More importantly, however, PNG cannot represent an image from an MPEG stream directly. The issue is that PNG only supports RGB while MPEG uses Y'CbCr. Y'CbCr is a different color space, invented to reduce the bandwidth required for images and video. The human eye is most sensitive to luminance and not so much to color, which Y'CbCr takes advantage of. MPEG then (normally) uses chroma subsampling, which is the practice of reducing the resolution of the planes containing the color information. A 1280×720 encode will normally have one plane at 1280×720 and two at 640×360.
So to save as a PNG, the video player upscales the chroma planes and converts to RGB, losing valuable information.
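To illustrate where information is lost, here is a rough sketch of what the player has to do before it can hand you a PNG, assuming full-range BT.709 and nearest-neighbour chroma upscaling for simplicity (real players use other matrices, limited range and better filters):

```python
import numpy as np

def yuv420_to_rgb8(Y, Cb, Cr):
    """Sketch of the PNG screenshot path: upscale 4:2:0 chroma planes and
    convert full-range BT.709 Y'CbCr to 8-bit RGB. Both steps discard
    information we would rather keep."""
    # Nearest-neighbour chroma upscaling; the half-resolution planes can
    # never regain the detail that was dropped during encoding.
    Cb = Cb.repeat(2, axis=0).repeat(2, axis=1).astype(np.float64) - 128
    Cr = Cr.repeat(2, axis=0).repeat(2, axis=1).astype(np.float64) - 128
    Y = Y.astype(np.float64)

    # Full-range BT.709 matrix (an assumption; streams may use BT.601 etc.).
    r = Y + 1.5748 * Cr
    g = Y - 0.1873 * Cb - 0.4681 * Cr
    b = Y + 1.8556 * Cb

    # Clipping and rounding to 8 bits throws away the remaining precision
    # (and any extra bits a 10-bit source had).
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255).round().astype(np.uint8)
```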
Going even further, video is compressed using a combination of key-frames and delta-frames. Key-frames store a whole image while delta-frames only store how to get from one frame to another. The specifics of how those frames were compressed are again valuable information. (But I don't know much about how this is done.)
Status of Overmix
Overmix now accepts a custom file format which can store 8- and 10-bit chroma subsampled Y’CbCr images. I created an application using libVLC that takes the output with minimal preprocessing and stores it in this format. (It also makes it easier to save every frame in the slide.)
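This is not the actual Overmix format, but a minimal sketch of the idea: keep the planes exactly as the decoder hands them over, at their native subsampled resolutions and bit depth, instead of converting to RGB.

```python
import struct
import numpy as np

def write_yuv420(path, Y, Cb, Cr, bit_depth=8):
    """Hypothetical minimal container: a small header followed by the Y',
    Cb and Cr planes, untouched. 10-bit planes are stored as 16-bit values,
    as they come out of the decoder."""
    dtype = np.uint8 if bit_depth == 8 else np.uint16
    with open(path, "wb") as f:
        f.write(b"YUV0")                                        # magic
        f.write(struct.pack("<HHB", Y.shape[1], Y.shape[0], bit_depth))
        for plane in (Y, Cb, Cr):
            f.write(np.ascontiguousarray(plane, dtype=dtype).tobytes())
```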
Overmix now uses only the Y' plane for alignment, instead of all three channels in RGB. My next goal is to redo the alignment algorithm. Currently it renders an average of all previously added images to align against, as otherwise the slight misalignment would propagate with each added frame. However, I will now try a multi-pass method, where it roughly aligns all images first and then does a sub-pixel alignment afterwards. Sub-pixel alignment will, at least in the beginning, be done by upscaling, as optical flow makes no sense to me yet.
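As a rough sketch of what I mean by sub-pixel alignment through upscaling (hypothetical code, not what Overmix will end up using): upscale both images, find the best whole-pixel offset at that scale, and divide the result back down.

```python
import numpy as np
from scipy.ndimage import zoom

def subpixel_offset(reference, image, scale=4, search=8):
    """Estimate a (dy, dx) offset with 1/scale pixel precision by aligning
    upscaled copies of the two Y' planes with a brute-force SAD search."""
    ref = zoom(reference.astype(np.float64), scale, order=1)
    img = zoom(image.astype(np.float64), scale, order=1)
    h, w = ref.shape
    crop = (slice(search, h - search), slice(search, w - search))

    best_error, best_offset = np.inf, (0.0, 0.0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(img, (dy, dx), axis=(0, 1))
            # Mean absolute difference, ignoring the wrapped-around border.
            error = np.abs(ref[crop] - shifted[crop]).mean()
            if error < best_error:
                best_error, best_offset = error, (dy / scale, dx / scale)
    return best_offset
```

With scale=4 this gives quarter-pixel precision; the rough first pass would already have brought the images within the small search window.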
Then I need to redo the render system, as it is currently optimized for aligned images, and that will clearly no longer be the case.
I haven't worked on Overmix for quite some time due to university work, but for the next three months I should have plenty of time, so hopefully I will get it done before they are over.
July 12th, 2013 10:07
I'm trying to understand super-resolution but can't grasp what sub-pixel alignment is and how it could be used to enhance the resolution of the image. Could you please help me with that?
July 14th, 2013 12:22
With this kind of super resolution you have to imagine you have one large High Resolution (HR) image. Your sequence of Low Resolution (LR) images are downscaled versions of this HR image.
Let us say that the HR image is 25 times larger than your LR images in each dimension. Thus, pixel 0×0 in your LR image represents all pixels in the range 0×0 to 24×24 inclusive. (Knowing this scale is not important, it is just to make the examples easier to understand.)
Another LR image's 0×0 pixel might represent the range 25×25 to 49×49. This would mean that this pixel is exactly the same as the other LR image's 1×1 pixel (not considering noise).
An LR image might not have an offset of exactly 25, however, and we might have the 0×0 pixel represent 10×10 to 34×34 instead. This is a sub-pixel shift: the pixel is not the same as any of the 0×0, 0×1, 1×0 or 1×1 pixels in the other two LR images.
The image does not contain any more information than the other LR images, but it carries another set of information describing the same HR image. So when we have both, we can start trying to estimate what, for example, the 10×10 to 24×24 pixel would be: the part that is in both the 0×0 to 24×24 pixel and the 10×10 to 34×34 pixel, but not in the 25×25 to 49×49 pixel (etc.). The more different offsets we have, the better we can estimate smaller parts of the image.
The problem, however, is that we do not know the exact offsets, so we have to try to align the images with sub-pixel precision in order to estimate them. How well we can find those offsets is directly linked to how accurate our estimate of the smaller pixels will be.
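If it helps, here is a tiny 1D sketch (hypothetical, not Overmix) of the simplest way to combine the images once the offsets are known: spread each LR sample back over the HR positions it covers and average, so that overlapping samples with different offsets start to constrain the HR signal on a finer grid than one LR pixel.

```python
import numpy as np

def shift_and_add_1d(lr_signals, offsets, hr_length, block=25):
    """Each sample i of an LR signal with offset o covers the HR positions
    [o + i*block, o + (i+1)*block). Accumulate every sample over the range
    it covers and average the overlaps."""
    total = np.zeros(hr_length)
    weight = np.zeros(hr_length)
    for lr, offset in zip(lr_signals, offsets):
        for i, value in enumerate(lr):
            start = offset + i * block
            total[start:start + block] += value
            weight[start:start + block] += 1
    return total / np.maximum(weight, 1)
```

Real super resolution methods refine this estimate much further, but even this crude average shows where the extra offsets add information.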
October 28th, 2013 16:56
Hi, may I suggest another practical application for super resolution?
Digital photography took over analog photography for the masses. However, there are a lot of folks who want to continue using their analog gear. The problem is: scanner technology is no longer being developed. One cannot extract all the information from the film and save it to a digital file. So there is an idea: use super resolution algorithms to enhance the quality of scans. It is well known that the practical limit for film scanning is somewhere between 4000 and 5000 dpi. However, cheap flatbed scanners cannot deliver more than 2400 dpi (low end 1200 dpi). Do you think your software could be used to mix 2-4 different scans into a bigger one?
October 29th, 2013 22:54
Super resolution could certainly be used for that, but keep in mind that you would need 4 scans with perfect displacement (a half-pixel shift horizontally, vertically, and in both directions) to get something close to 4800×4800 dpi with a 2400×2400 dpi scanner. So I would guess that would be 6-8 scans in practice, doable but not really that practical.
I did a quick check and it appears that you can get scanners capable of 9600×9600 dpi (optical) for negative scanning. Quite a few more options if 4800×4800 dpi is enough. (Specs should not be blindly trusted though…)
My software is not done yet; progress is steady but slow as I'm doing this in my spare time. When done it could be used for this, as long as the images are not rotated relative to each other. 48-bit RGB input/output is low on my priority list.