Jan 25 2014

A year of Overmixing

Category: Overmix, Programs, Software | Spiller @ 01:54

It is now about a year since I started this project, and never in my wildest dreams would I have imagined how much work I would end up putting into it. So here is a quick overview of my progress.

(Overmix is now hosted on GitHub: https://github.com/spillerrec/Overmix, with downloads here: https://github.com/spillerrec/Overmix/releases )

Automatic aligning

While zooming and rotation are not supported (yet, at least), horizontal and vertical movement is detected rather reliably and can be done to sub-pixel precision. Sub-pixel precision is rather slow and memory intensive though, as it works by upscaling the images. Aligning images with transparent regions still needs some work, and I also believe the whole process can be made quite a bit faster.
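
A minimal sketch of the idea (an illustration, not Overmix's actual code): upscale both frames, then brute-force the offset with the smallest mean absolute difference. One pixel of offset in the 4x-upscaled images then corresponds to a quarter-pixel shift in the originals, which is also why the approach costs so much memory.

#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// Grayscale image: row-major samples plus dimensions.
struct Image {
    int w = 0, h = 0;
    std::vector<double> px;
    double at(int x, int y) const { return px[y * w + x]; }
};

// 4x nearest-neighbour upscale; one pixel of offset in the result
// equals a quarter-pixel shift in the original.
Image upscale4(const Image& in) {
    Image out{ in.w * 4, in.h * 4, std::vector<double>(size_t(in.w) * in.h * 16) };
    for (int y = 0; y < out.h; y++)
        for (int x = 0; x < out.w; x++)
            out.px[y * out.w + x] = in.at(x / 4, y / 4);
    return out;
}

// Mean absolute difference over the overlap of a and b, with b shifted by (dx, dy).
double difference(const Image& a, const Image& b, int dx, int dy) {
    double sum = 0; int count = 0;
    for (int y = 0; y < a.h; y++)
        for (int x = 0; x < a.w; x++) {
            int bx = x - dx, by = y - dy;
            if (bx < 0 || by < 0 || bx >= b.w || by >= b.h) continue;
            sum += std::fabs(a.at(x, y) - b.at(bx, by));
            count++;
        }
    return count ? sum / count : std::numeric_limits<double>::max();
}

// Brute-force search; the winning (dx, dy) is in quarter-pixel units.
std::pair<int, int> align(const Image& a4, const Image& b4, int range) {
    double best = std::numeric_limits<double>::max();
    std::pair<int, int> offset{0, 0};
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            double d = difference(a4, b4, dx, dy);
            if (d < best) { best = d; offset = {dx, dy}; }
        }
    return offset;
}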

De-noising

Noise is removed by averaging all frames, and this works very well, even on Blu-ray rips. However, since I have not yet been able to render while taking sub-pixel alignment into account, it blurs the image ever so slightly.
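
The averaging itself is trivial once the frames are aligned; a minimal sketch over equally sized, already-aligned grayscale frames:

#include <vector>

// Average N aligned, equally sized frames (row-major samples).
// Random codec noise cancels out while the shared image content remains.
std::vector<double> average(const std::vector<std::vector<double>>& frames) {
    std::vector<double> out(frames.at(0).size(), 0.0);
    for (const auto& frame : frames)
        for (size_t i = 0; i < out.size(); i++)
            out[i] += frame[i];
    for (double& value : out)
        value /= frames.size();
    return out;
}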

10-bit raw input and de-telecine

I have made a frame dumper to grab the raw YUV data from the video streams, to ensure no quality loss happens between the video file and Overmix. This actually revealed that VLC outputs in pretty low quality and appears to have some trouble with Hi10p content and color spaces. Most noticeable, however, is that VLC uses nearest-neighbor for chroma upsampling, which looks especially bad with reds. Having direct access to the chroma channels also opens up the possibility of doing chroma restoration using super resolution.

I have made several tools for my custom "dump" format, including the video dumper, a Windows shell thumbnailer (Vista/7/8), a Qt5 image plugin and a compressor to reduce the file size. They can be found in a separate repository here: https://github.com/spillerrec/dump-tools

De-telecine has also been added, as it is necessary in order to work with the interlaced MPEG2 transport streams anime is usually broadcast in.
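
For reference, 3:2 pulldown turns four film frames A B C D into five interlaced frames by repeating fields: At+Ab, Bt+Bb, Bt+Cb, Ct+Db, Dt+Db (t = top field, b = bottom field). Undoing it is then a matter of re-pairing fields; a sketch assuming a known, constant pulldown phase (real broadcasts additionally require detecting the phase):

#include <vector>

struct Frame {
    int w = 0, h = 0;
    std::vector<double> px; // row-major
};

// Weave a progressive frame from the top field (even lines) of one
// frame and the bottom field (odd lines) of another.
Frame weave(const Frame& top, const Frame& bottom) {
    Frame out = top;
    for (int y = 1; y < out.h; y += 2)
        for (int x = 0; x < out.w; x++)
            out.px[y * out.w + x] = bottom.px[y * bottom.w + x];
    return out;
}

// Recover film frames A B C D from each run of five telecined frames:
// [At+Ab] [Bt+Bb] [Bt+Cb] [Ct+Db] [Dt+Db]
std::vector<Frame> detelecine(const std::vector<Frame>& in) {
    std::vector<Frame> out;
    for (size_t i = 0; i + 4 < in.size(); i += 5) {
        out.push_back(in[i]);                       // A
        out.push_back(in[i + 1]);                   // B
        out.push_back(weave(in[i + 3], in[i + 2])); // C = Ct + Cb
        out.push_back(weave(in[i + 4], in[i + 3])); // D = Dt + Db
    }
    return out;
}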

Deconvolution

Using deconvolution I have found that I'm able to make the resulting images much sharper, at the expense of a little bit of noise. It appears that anime is significantly blurred, perhaps because of the way it has been rendered, or perhaps intentionally. More on this later…
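
The exact deconvolution Overmix uses isn't detailed here, so as a generic illustration: Richardson-Lucy deconvolution sharpens an image by iteratively dividing out an assumed point spread function (PSF), the blur kernel the image is believed to have been convolved with.

#include <algorithm>
#include <vector>

// 2D convolution with a (2r+1)x(2r+1) kernel, clamping at the borders.
std::vector<double> convolve(const std::vector<double>& img, int w, int h,
                             const std::vector<double>& kernel, int r) {
    std::vector<double> out(img.size(), 0.0);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            double sum = 0;
            for (int ky = -r; ky <= r; ky++)
                for (int kx = -r; kx <= r; kx++) {
                    int sx = std::clamp(x + kx, 0, w - 1);
                    int sy = std::clamp(y + ky, 0, h - 1);
                    sum += img[sy * w + sx] * kernel[(ky + r) * (2 * r + 1) + (kx + r)];
                }
            out[y * w + x] = sum;
        }
    return out;
}

// Richardson-Lucy: estimate *= (observed / (estimate x psf)) x psf'.
// For a symmetric PSF the mirrored kernel psf' equals psf, as assumed here.
std::vector<double> richardson_lucy(const std::vector<double>& observed,
                                    int w, int h,
                                    const std::vector<double>& psf, int r,
                                    int iterations) {
    std::vector<double> estimate = observed;
    for (int i = 0; i < iterations; i++) {
        auto blurred = convolve(estimate, w, h, psf, r);
        std::vector<double> ratio(observed.size());
        for (size_t j = 0; j < ratio.size(); j++)
            ratio[j] = observed[j] / std::max(blurred[j], 1e-6);
        auto correction = convolve(ratio, w, h, psf, r);
        for (size_t j = 0; j < estimate.size(); j++)
            estimate[j] *= correction[j];
    }
    return estimate;
}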

TV-logo detection and removal

It works decently, as shown in a recent post, and can also remove other static content such as credits. I have tried to do it fully automatically, but that doesn't yet work well enough compared to the manual process. Perhaps some Otsu thresholding combined with dilation would work?
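
For the record, Otsu's method picks the threshold that maximizes the between-class variance of the histogram; a sketch over 8-bit values:

#include <array>
#include <cstdint>
#include <vector>

// Otsu's method: the threshold that maximizes between-class variance.
int otsu_threshold(const std::vector<uint8_t>& pixels) {
    std::array<double, 256> hist{};
    for (uint8_t p : pixels) hist[p]++;

    double total = double(pixels.size()), sum_all = 0;
    for (int i = 0; i < 256; i++) sum_all += i * hist[i];

    double sum_bg = 0, weight_bg = 0, best_var = 0;
    int best_t = 0;
    for (int t = 0; t < 256; t++) {
        weight_bg += hist[t];                 // pixels at or below t
        if (weight_bg == 0) continue;
        double weight_fg = total - weight_bg; // pixels above t
        if (weight_fg == 0) break;
        sum_bg += t * hist[t];
        double mean_bg = sum_bg / weight_bg;
        double mean_fg = (sum_all - sum_bg) / weight_fg;
        double var = weight_bg * weight_fg * (mean_bg - mean_fg) * (mean_bg - mean_fg);
        if (var > best_var) { best_var = var; best_t = t; }
    }
    return best_t;
}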

I intend to do further work with this method to see if I can get it to work with moving items, and to do the reverse: separating out the moving item. This is interesting because some panning scenes move the background independently of the foreground (to fake depth of field).

Steam removal

A proof of concept showed that if the steam is moving, we can choose from each frame the parts which contain the least amount of steam. It doesn't remove the steam completely, but it can make a significant difference. The current implementation handles the colors incorrectly, however; this is fixable, but not something I care to do unless requested…

Animation

Some scenes contain a repeating animation while doing vertical/horizontal movement. H-titles especially seem to have a lot of this, but it can also be found in mouth movement and the like. While still in its early stages, I have successfully managed to separate the animation into its individual frames and stitch those, using a manual global threshold.

I doubt it would currently work with minor animations such as mouth movement; noise would probably mess it up. So I'm considering investigating other ways of calculating the difference, perhaps using an edge-detected image instead, or computing local differences.
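
The separation can be sketched as simple clustering by difference (assuming frames have already been aligned and cropped to a common region): compare each frame against a representative of every animation frame found so far, and open a new group when nothing matches within the manually chosen threshold.

#include <cmath>
#include <vector>

using Frame = std::vector<double>; // aligned, equally sized frames

double mean_abs_diff(const Frame& a, const Frame& b) {
    double sum = 0;
    for (size_t i = 0; i < a.size(); i++) sum += std::fabs(a[i] - b[i]);
    return sum / a.size();
}

// Group frames into animation frames: a frame joins the first group whose
// representative differs by less than `threshold`, otherwise it opens a new group.
std::vector<std::vector<Frame>> separate(const std::vector<Frame>& frames,
                                         double threshold) {
    std::vector<std::vector<Frame>> groups;
    for (const Frame& f : frames) {
        bool placed = false;
        for (auto& group : groups)
            if (mean_abs_diff(f, group.front()) < threshold) {
                group.push_back(f);
                placed = true;
                break;
            }
        if (!placed) groups.push_back({f});
    }
    return groups;
}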



Nov 23 2013

Too much steam in your anime?

Category: Anime, Overmix, Programs, Software | Spiller @ 03:54

Well, Overmix is here with a dehumidifier to solve your problem. Too damp? Run it once and watch as your surroundings become clearer.

Your local hot spring before:

and after:

Can't get enough of Singing in the Rain? Don't worry, just put it in reverse and experience the downpour.

Normal rainy day:

The real deal:

This is another multi-frame approach, and really just as simple as using the average. Since the steam lightens the image, all you have to do is take the darkest pixel at each position. (In other words, the lighter a pixel is, the more likely it is to be steam.) Since the steam is moving, this way you use the least steamy parts of each frame to obtain a stitched image with the smallest amount of steam.

If we do the opposite and take the brightest pixel, we can increase the amount of steam. That is not really that interesting in itself, but the second example shows how we can use this to bring out features that would otherwise be treated as noise. We could also combine it with the average approach using a range, to deal with the real noise, but I did this for fun so I didn't go that far.
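
Both directions are a single comparison per pixel; a minimal sketch over aligned grayscale frames. (For color images one would pick the whole pixel with the lowest luma rather than taking channel-wise minima, which would mix colors from different frames.)

#include <algorithm>
#include <vector>

// Per-pixel minimum across aligned frames: keeps the least steamy sample
// at each position. Swap in std::max to get the "downpour" instead.
std::vector<double> least_steam(const std::vector<std::vector<double>>& frames) {
    std::vector<double> out = frames.at(0);
    for (const auto& frame : frames)
        for (size_t i = 0; i < out.size(); i++)
            out[i] = std::min(out[i], frame[i]);
    return out;
}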

While this is a fairly simple method, it highlights that we can use multiple frames not just to improve quality, but also to analyze and manipulate the image. I have several neat ideas I want to try out, but more about those when I have something working.


Nov 09 2013

Colorspaces and VLC

Category: Overmix, Programs, Software | Spiller @ 21:23

There are two colorspaces commonly used in video today, defined in Rec. 601 and Rec. 709 respectively. Simply speaking, Rec. 601 is mainly used for analog and SD sources, while Rec. 709 is mainly for HD TV and BD.
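
The two standards differ in their luma coefficients, so decoding with the wrong set visibly shifts the colors; a sketch of the conversion for full-range values (real video additionally uses studio-swing ranges, which are omitted here):

// Y'CbCr -> R'G'B' for full-range values: Y' in [0,1], Cb/Cr in [-0.5,0.5].
// Rec. 601: kr = 0.299,  kb = 0.114
// Rec. 709: kr = 0.2126, kb = 0.0722
struct RGB { double r, g, b; };

RGB ycbcr_to_rgb(double y, double cb, double cr, double kr, double kb) {
    double kg = 1.0 - kr - kb;
    RGB out;
    out.r = y + 2.0 * (1.0 - kr) * cr;
    out.b = y + 2.0 * (1.0 - kb) * cb;
    out.g = (y - kr * out.r - kb * out.b) / kg; // the weighted sum must give y back
    return out;
}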

So how does VLC handle this? It assumes everything is Rec. 601, and you get something like this:

The bottom left is from a DVD and the top right is from the BD release. In comparison, here is how it looks in Overmix, using Rec. 601 for the DVD release and Rec. 709 for the BD release:

VLC also seems to ignore the gamma difference between Rec. 601/709 and sRGB, and it handles 10-bit content in a way that reduces color accuracy to worse than 8-bit sources. Behold, the histogram from a Hi10p source:

Free stuff might be nice, but this is what you get…

EDIT: I messed up the studio-swing removal in Overmix (which is now fixed), so the colors were slightly off. It was consistent between Rec. 601/709 so the comparison still holds. Overmix might be nice, but this is what you get…



Nov 03 2013

First Super Resolution results

Category: Overmix, Programs, Software | Spiller @ 23:22

Just five months later… Here are some early results using artificial data.

Using Wikimedia Commons' "picture of the day" for October 31, 2013 by Diego Delso (CC BY-SA 3.0), I created LR (low resolution) images which were 4 times smaller in each direction. Each LR image had its own offset, and to have one LR image for every possible offset, 16 images were created.
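
A sketch of how such a test set can be generated (illustrative code, not the exact procedure used): box-downsample by 4 at each of the 16 possible phase offsets.

#include <vector>

struct Image {
    int w = 0, h = 0;
    std::vector<double> px;
    double at(int x, int y) const { return px[y * w + x]; }
};

// Box-downsample by `factor`, starting at phase offset (ox, oy).
// Each distinct (ox, oy) in [0, factor) yields one differently shifted LR image.
Image downsample(const Image& hr, int factor, int ox, int oy) {
    Image lr{ (hr.w - ox) / factor, (hr.h - oy) / factor, {} };
    lr.px.resize(size_t(lr.w) * lr.h);
    for (int y = 0; y < lr.h; y++)
        for (int x = 0; x < lr.w; x++) {
            double sum = 0;
            for (int sy = 0; sy < factor; sy++)
                for (int sx = 0; sx < factor; sx++)
                    sum += hr.at(x * factor + ox + sx, y * factor + oy + sy);
            lr.px[y * lr.w + x] = sum / (factor * factor);
        }
    return lr;
}

// All 16 offsets for factor 4:
// for (int oy = 0; oy < 4; oy++)
//     for (int ox = 0; ox < 4; ox++)
//         lr_set.push_back(downsample(hr, 4, ox, oy));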

To detect the sub-pixel alignment afterwards, the images were upscaled to 4x their size and ordinary pixel-based alignment was used. The upscaled versions were only used for the alignment and were discarded afterwards. The final image was then rendered at 4x resolution using cubic interpolation, taking the sub-pixel alignment into account. Lastly, the image was deconvolved in GIMP using the G'MIC plugin to remove blur. The results are shown below:

The left side shows the LR image (upscaled using nearest-neighbor interpolation) and the original image respectively. The right side shows the SR (super resolution) results, using different interpolation methods. Both are cubic; the top uses Mitchell and the bottom uses Spline. In simple terms, Spline is blurrier than Mitchell but has fewer blocking artifacts. Mitchell is usually a pretty good choice (it is a compromise between several other cubic interpolation methods), but the blocking is quite noticeable here. Using Spline avoids that, and since we attempt to remove blur afterwards, it works pretty well. Do notice, however, that Mitchell recovers slightly more detail in the windows to the right.

But while Mitchell often appears slightly sharper, it also tends to mess up more often, which can clearly be seen on the "The power of" building to the left. The windows are strangely mixed into each other, while they are perfectly aligned when using Spline.
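
Both filters belong to the cubic BC-family described by Mitchell and Netravali: Mitchell uses B = C = 1/3, while Spline (assuming the cubic B-spline variant is meant) uses B = 1, C = 0. A sketch of the shared kernel:

#include <cmath>

// Cubic BC-spline family (Mitchell & Netravali).
// Mitchell:        b = 1.0 / 3.0, c = 1.0 / 3.0
// Cubic B-spline:  b = 1.0,       c = 0.0
double cubic_bc(double x, double b, double c) {
    x = std::fabs(x);
    if (x < 1.0)
        return ((12 - 9 * b - 6 * c) * x * x * x
              + (-18 + 12 * b + 6 * c) * x * x
              + (6 - 2 * b)) / 6.0;
    if (x < 2.0)
        return ((-b - 6 * c) * x * x * x
              + (6 * b + 30 * c) * x * x
              + (-12 * b - 48 * c) * x
              + (8 * b + 24 * c)) / 6.0;
    return 0.0;
}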

Conclusion

The results are much better than the LR images; however, it is more a magnification of 2x than the optimal 4x. And to make matters worse, this is optimally generated data, without blur or noise.

However, this is the simplest way of doing SR and I believe other methods give better results. Next I want to try the Fourier-based approach, which is one of the early SR methods. It should give pretty good results, but it is not used much anymore because it does not work for rotated or skewed images.

Using artificial data has really shown me why I have had so little success so far. I'm mainly working with anime screenshots, and the amount of detail which can be restored there is probably not that great. My goal is actually more to avoid the blurriness that occurs when frames are not aligned perfectly. So while it should have been obvious, lesson learned: do not test on data which you are not sure can give a result… What I did gain from this is that anime tends to be rather blurry and that image deconvolution can help a lot. When I understand this blurriness in detail, I will probably write more about it.


Jun 07 2013

Overmix and Super Resolution?

Category: Anime, Overmix, Programs, Software | Spiller @ 22:43

While researching digital signal processing I found an interesting term: super resolution. Super resolution is a field which attempts to improve the resolution of an image by using the information in one or more images. This is exactly what I was doing with Overmix: using multiple images to reduce noise.

However, another aspect of super resolution uses sub-pixel shifts between the images to improve the sharpness of the result. This could not only solve the imperfect-alignment issue I was having, it could outright improve the quality further than I had thought possible.

(I had actually tried to use sub-pixel alignment when I first ran into the issue, and I speculated it might be able to increase sharpness. But after much work I only managed to make it align properly; the blur was no better than without it, so I didn't press it further.)

Limits

Super resolution has its limits, however. First of all, as it tries to estimate the original image, it cannot magically surpass it and give unlimited precision. If the image was created in "480p", even a 1080p BD upscale will still only give the "480p" image. If the original was blurry by nature, super resolution will produce a blurry image as well, unlike a sharpening filter.

And that raises the question: why is anime blurry, and why does it not align on the pixel grid? With one sample, I got the same misalignment with both the 720p TV version and the 1080p BD version. If this were caused by downscaling, the issue would be smaller at 1080p; however, it isn't. Most anime does not appear to push the boundaries of 1080p, but since there are misalignment issues, I suspect the rendering pipeline isn't optimal.

The other limit is the images available for the estimation. If the images we have do not contain any hints of what the original looks like, we cannot guess it. So if there are no sub-pixel shifts between the images, super resolution can't do much. And that is actually an issue, because most slides only move vertically, which means we only get vertical sub-pixel shifts. In those cases we can only hope to improve detail in the vertical direction.

Using all available information

Since super resolution uses the information in the images, the more of it we can get, the better.

First of all, the closer we can get to the source the better, as we then don't have to estimate the defects introduced by each conversion. A PNG screenshot is better than a JPEG, and the TV MPEG2 transport stream is better than a 10-bit re-encode.

One thing to notice here is that a PNG screenshot is (with all players I have tried) an 8-bit image, not 10-bit (16-bit*) as with Hi10p h264. So using PNG screenshots would lose us 2 bits.

More importantly, however, a PNG cannot represent an image from an MPEG stream directly. The issue is that PNG only supports RGB, while MPEG uses Y'CbCr. Y'CbCr is a different color space, invented to reduce the required bandwidth of images/video. The human eye is most sensitive to luminance and less so to color, which Y'CbCr takes advantage of. MPEG then (normally) uses chroma subsampling, the practice of reducing the resolution of the planes containing the color information. A 1280×720 encode will normally have one plane at 1280×720 and two at 640×360.
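
In memory, such a 4:2:0 frame is just three planes stored back to back; a sketch of the layout for an 8-bit frame:

#include <cstdint>
#include <vector>

// Planar 8-bit Y'CbCr 4:2:0: a full-resolution luma plane followed by
// two chroma planes at half resolution in each direction.
struct Yuv420Frame {
    int w, h;
    std::vector<uint8_t> data; // Y, then Cb, then Cr

    Yuv420Frame(int width, int height)
        : w(width), h(height),
          data(size_t(width) * height + 2 * size_t(width / 2) * (height / 2)) {}

    uint8_t* y()  { return data.data(); }
    uint8_t* cb() { return data.data() + w * h; }
    uint8_t* cr() { return cb() + (w / 2) * (h / 2); }
};

// For 1280x720: Y is 1280x720 = 921600 bytes, Cb and Cr are 640x360
// = 230400 bytes each, i.e. 1.5 bytes per pixel in total.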

So to save as a PNG, the video player upscales the chroma planes and converts to RGB, losing valuable information.

Going even further, video is compressed using a combination of key- and delta-frames. Key-frames store a whole image, while delta-frames only store how to get from one frame to the next. The specifics of how those frames were compressed are again valuable information. (But I don't know much about how this is done.)

Status of Overmix

Overmix now accepts a custom file format which can store 8- and 10-bit chroma-subsampled Y'CbCr images. I created an application using libVLC that takes the output with minimal preprocessing and stores it in this format. (It also makes it easier to save every frame in a slide.)

Overmix now only uses the Y' plane for alignment, instead of all three channels in RGB. My next goal is to redo the alignment algorithm. Currently it renders an average of all previously added images to align against, as otherwise the slight misalignments would propagate with each added frame. However, I will now try a multi-pass method, where all images are roughly aligned first and a sub-pixel alignment is done on them afterwards. Sub-pixel alignment will, at least at the start, be done by upscaling, as optical flow makes no sense to me yet.
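
In sketch form, that current scheme looks roughly like this (vertical-only offsets and simplified edge handling for brevity; not the actual Overmix code):

#include <cmath>
#include <limits>
#include <vector>

using Frame = std::vector<double>; // equally sized, row-major frames

// Mean absolute difference between the running average and frame f shifted down by dy.
double diff(const Frame& avg, const Frame& f, int w, int h, int dy) {
    double sum = 0; int n = 0;
    for (int y = 0; y < h; y++) {
        int fy = y - dy;
        if (fy < 0 || fy >= h) continue;
        for (int x = 0; x < w; x++) { sum += std::fabs(avg[y * w + x] - f[fy * w + x]); n++; }
    }
    return n ? sum / n : std::numeric_limits<double>::max();
}

// Align each frame against the average of all frames aligned so far,
// so small errors don't accumulate from frame to frame.
std::vector<int> align_all(const std::vector<Frame>& frames, int w, int h, int range) {
    std::vector<int> offsets{0};
    Frame sum = frames.at(0);
    for (size_t i = 1; i < frames.size(); i++) {
        Frame avg(sum.size());
        for (size_t j = 0; j < sum.size(); j++) avg[j] = sum[j] / i;
        int best_dy = 0;
        double best = std::numeric_limits<double>::max();
        for (int dy = -range; dy <= range; dy++) {
            double d = diff(avg, frames[i], w, h, dy);
            if (d < best) { best = d; best_dy = dy; }
        }
        offsets.push_back(best_dy);
        for (int y = 0; y < h; y++) { // accumulate the newly aligned frame
            int fy = y - best_dy;
            if (fy < 0 || fy >= h) continue;
            for (int x = 0; x < w; x++) sum[y * w + x] += frames[i][fy * w + x];
        }
    }
    return offsets;
}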

Then I need to redo the render system, as it is currently optimized for aligned images, and that will clearly no longer be the case.

I haven't worked on Overmix for quite some time due to university work, but for the next three months I should have plenty of time, so hopefully I will get it done before they are over.



Feb 28 2013

Stitching anime screenshots in overdrive

Category: Anime, Overmix, Programs, Software | Spiller @ 00:44

I have been developing a new application named Overmix, which attempts to improve the quality of anime screenshot stitching. This article will briefly explain what stitching is, which issues affect the quality, and how Overmix tries to fix them. At the end, a short summary of the results so far is given.

Background

One common animation technique is panning, where the camera moves/pans over the image, showing only a part of it at a time:

animation of pan shot

(The shot on YouTube: http://youtu.be/DsHjblyEG88?t=6m25s)

Very little movement actually happens during the shot; in fact, only the mouth is moving (presumably to reduce animation costs). This makes it possible to combine the frames into one large image, which is known as "stitching".

Source quality

The issue is, however, that more often than not the video quality isn't that great. The video has been compressed, and especially if the source is a TV transmission or webcast, visual artifacts can be quite noticeable:

Example of noise artifacts

The two most significant artifacts in anime encodes are noise (shown above) and color banding/posterization (shown below).

Example of color banding

Reducing artifacts

A stitch is normally done by taking two frames, finding the offset between the two images, and then softening the edges between them to make the transition less apparent (usually by applying a gradient to the alpha channel).
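
That feathering step is simple enough to sketch: blend each row of the overlap region with a linear alpha ramp (illustrative code, not how any particular stitching tool does it):

#include <vector>

// Feather a seam: across an overlap, fade linearly from image a to image b.
// a and b hold the same row of the overlap region from each image.
std::vector<double> blend_overlap(const std::vector<double>& a,
                                  const std::vector<double>& b) {
    std::vector<double> out(a.size());
    for (size_t x = 0; x < a.size(); x++) {
        double alpha = (x + 0.5) / a.size(); // 0 at a's edge, 1 at b's edge
        out[x] = (1.0 - alpha) * a[x] + alpha * b[x];
    }
    return out;
}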

Since this is a time-consuming process, as few frames as possible are used. My idea is to do the opposite: use as many frames as possible. The reason is that the artifacts are not static; they differ slightly in every frame. As a result, every frame carries a slightly different set of information. The goal is then to derive the original information from this set of inconsistent information.

Just by using the average, we can get quite decent results:

Comparison between average and single

(Right is a single frame, left is the average of all unique frames.)

Results

Noise artifacts nearly disappear completely when simply averaging all frames together, even when the source has a significant amount of noise. Color banding is also reduced, but by much more varying amounts.

Even with modern TV encodes, stitches see a significant improvement from this technique, and the difference is visible at normal magnification. Surprisingly, even with good BD encodes there is usually a slight improvement, though it normally requires 2-4x magnification to be noticeable.

It has also become clear that it is often not possible to achieve a perfect alignment when sticking to the pixel grid. This causes the result to be slightly blurrier than the original. It is an area which still requires work.

Using the average to derive the result is not always desirable, as the encode might contain information not related to the image. Such information could be subtitles, TV logos or simply errors in the source. See the following image as an example: the right-most column of pixels was completely black and shows up as lines in the averaged image.

Stitched with Overmix

However, the currently devised algorithms have a tendency to choke on the slight misalignment mentioned previously and cause unwanted artifacts. Whether this is best solved by fixing the misalignment or by improving the algorithms is up for discussion.

Binaries and source code

Overmix is licensed under the GPLv3 and can be found here: Overmix on GitHub

Binaries for 64-bit Windows can also be found on GitHub here: Overmix releases



Feb 16 2013

Headphone stand

Category: Lego, Technics | Spiller @ 17:31

The wires on my headphones kept breaking, so I decided to buy a set of wireless ones. I bought the Sennheiser RS 160, which uses a portable transmitter, instead of the larger stationary RS 170. However, the RS 170's transmitter doubles as a headphone stand (and charger), and I wanted some way of safely storing my headphones, so I built a stand in Lego Technics.

I had two ambitions: to make it look a bit fancier than what I usually build, and to mainly use pieces that I rarely use (as otherwise I would probably disassemble it whenever I needed them). When I noticed the large box of angled beams I never use, it seemed perfect for this.

Headphone stand made with Lego Technics

In the end it just became two plates with four straight beams built from angled beams, but I still think it gives it a nice touch.

The beams are not locked and are free to move, which should make the stand very unstable; however, it is not. This is partially because of the friction pins, but mainly because each beam is placed at a slightly different angle than the others. So when the top plate wants to move in one direction, one of the beams restricts the movement, as its angle is slightly off. It is not perfect, but as long as you are not rough with it, it stands.



Nov 03 2012

NXT console v0.1

Category: Lego, Mindstorms, NXC, Programs, Software | Spiller @ 20:27

A long time ago on Mindboards there was some talk about displaying text output like it is done in a console, but I never ended up writing any code. Since it has been quite a while since I last wrote anything in NXC, I did this as a quick brush-up project.

It supports scrolling up and down with the left and right buttons on the NXT, and supports the control characters '\n', '\t', '\a' and '\b'. '\b' only works on the text you are currently adding, though.

Download:

NXT console v0.1


Jul 21 2012

IE10 flip ahead and standards

Category: Software, Webdevelopment | Spiller @ 23:13

According to Within Windows, IE10 has added a new feature to simplify page navigation. It is called "flip ahead" and causes the browser to automatically find the next page when you click on the right side of the page. (It also shows a fancy slide animation which I guess tablet users will enjoy.) To quote Within Windows: "There are no futile attempts at tapping tiny links or looking for "next page" links on a badly designed website."

There were two kinds of responses in the comments: those praising the feature, and those noting that this feature has been in Opera for years. As an avid Opera user I of course know about this feature and have been using it for a long time. (The main difference is that Opera doesn't do the fancy animation and has about 10 different ways of activating it.)

But I'm not trying to be an Opera fanboy and rant about IE copying this feature. Rather, I'm happy that they did, and hopefully the other browsers will follow. Because this is an awesome feature, well, when it works. Sometimes the page you end up on can be completely unexpected. And that is the issue: it isn't really that reliable, which is not that strange when you consider the implementation.

The way Opera implements it (and most likely IE as well) is, according to users on the web, by using a list of words (in several languages) which are likely to appear in links pointing to the next page. If it finds a link which matches one of those entries, it uses that as the next page.

So this works when the page uses something common like "next page". However, one specific site might use "more destruction" instead of "next". Will it work then? Perhaps, but in that case, what if another site didn't have more than one page but did have a link to a site called "More destruction"? You could end up on a completely unrelated site or page. Such cases could be fixed, but there will always be some other special case.

So as a web developer you will either have to carefully test your site in IE (and risk different behavior in Opera), or wait for some way or standard to specify the next page with some form of metadata. Within Windows says to lurk on the IE blog for tips on tailoring your site to this feature, but there is no need to wait for them to blog about it, because there already is a way to specify this. Actually, it has been there for about 15 years; it is part of the HTML 4 specification. It is a single element placed in the HEAD: [Document relationships: the LINK element]

<LINK rel="Next" href="Chapter3.html">

Let's quote the spec: "Next: Refers to the next document in a linear sequence of documents. User agents may choose to preload the 'next' document, to reduce the perceived load time." Seems like it took the IE guys 15 years to notice this…

So why do browsers guess? Because way too many sites do not provide this information. And worse yet, a lot of people got it wrong, so several aliases were added to the HTML5 spec… (I therefore recommend using the HTML5 spec as a reference for this instead.) Opera does support it, but because of the number of websites that don't provide it, the feature still seems shaky at best. Now that a bigger browser like IE supports it, hopefully this will change, but it will still take time before the majority of websites add it. And the "poorly designed" websites Within Windows mentioned might never do so…

To conclude this rambling: It (again) saddens me to see the state of the web today.

EDIT: Seems like MS really wants to try the impossible and get it working on all sites; just hear this: "Using Flip Ahead requires end user opt-in, and sends your browsing history to Microsoft to improve the quality of the experience." [Web browsing in Windows 8 Release Preview with IE10]

