Aug 13 2015

Minimizing JPEG artifacts

Category: Overmix, Programs, Software | Spiller @ 18:21

The original goal with Overmix was to reduce compression noise by averaging frames, and this works quite well. However, video compression is quite complex, and simply averaging pixel values fails to take that into consideration. Since it really is that complex, I will start by considering the case where each frame is simply compressed using JPEG. (There actually happens to be a video format called Motion JPEG which does just that.) Video is usually encoded using motion compensation techniques to reduce redundancy between frames, so this is a major simplification. I am also using the same synthetic data from the de-mosaic post (which does not contain sub-pixel movement), compressing each frame to JPEG at quality 25.

Before we begin we need to understand how the images were compressed. There are actually many good articles about JPEG compression written in an easy-to-understand manner; see for example Wikipedia’s example or the article on WhyDoMath.
As a quick overview: JPEG segments the image into 8×8 blocks which are then processed individually. Each block is transformed into the frequency domain using the Discrete Cosine Transform (DCT). Informally, this separates details (high frequencies) from the basic structure (low frequencies). The quality is then reduced using a quantisation table, which determines how precisely each coefficient (think of it as a pixel) in the DCT is preserved. For lower quality settings, the quantisation table degrades the high frequencies more. The same table is used for all 8×8 blocks. After the quantisation step, the values are rearranged into a specific pattern and losslessly compressed.
Line-art tends to suffer from this compression, since the sharp edge of a line requires precise high-frequency information to reproduce.
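
To make the quantisation step a bit more concrete, here is a small Python sketch of what happens to a single 8×8 block. The quantisation table here is made up for illustration; a real encoder derives it from the quality setting (25 in our case).

import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    # 2D DCT with orthonormal scaling, applied to a block shifted to a signed range
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

# Illustrative quantisation table: high frequencies (bottom/right) are punished harder
quant = np.full((8, 8), 16.0)
quant[4:, :] = 64.0
quant[:, 4:] = 64.0

def jpeg_block_roundtrip(block):
    coeffs = dct2(block - 128.0)             # transform to the frequency domain
    quantised = np.round(coeffs / quant)     # this rounding is where detail is lost
    return idct2(quantised * quant) + 128.0  # dequantise and transform back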

The idea for minimizing the JPEG artifacts is to estimate an image which best explains our observed compressed JPEG frames. Each frame is offset relative to the others, so the 8×8 blocks will cover different areas in each frame most of the time.
As the initial estimate we use the average of each pixel value, like the original naive method. Then we try to improve our estimate in an iterative manner. This is done by going through all our JPEG compressed frames and trying to recover the details lost in quantisation. For each 8×8 block in the compressed frame, we apply the DCT to the equivalent 8×8 area in the estimated image. Then for each coefficient we check whether it quantises to the same value as in the compressed frame. If so, we use the unquantised coefficient from the estimate to replace the one in the compressed frame. The idea is that we slowly rebuild each compressed frame to its former glory, using the accumulated information from the other frames to estimate more precise coefficients which, when compressed with the same settings, would still produce the original compressed frame.
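
In pseudo-Python, rebuilding one frame during an iteration could look roughly like this (reusing the dct2/idct2 helpers from the sketch above; the data layout is simplified and this is not the actual Overmix code). How the rebuilt frames are then combined into the next estimate is left out here.

import numpy as np

def rebuild_frame(estimate, offset, coeff_blocks, quant):
    # coeff_blocks maps a block position (bx, by) to its quantised DCT coefficients
    ox, oy = offset
    rebuilt = {}
    for (bx, by), quantised in coeff_blocks.items():
        area = estimate[oy + by*8 : oy + by*8 + 8,
                        ox + bx*8 : ox + bx*8 + 8]
        coeffs = dct2(area - 128.0)
        # Does the estimate's coefficient quantise to the same value as the frame's?
        agrees = np.round(coeffs / quant) == quantised
        # If so keep the unquantised coefficient, otherwise fall back to the frame's value
        rebuilt[(bx, by)] = np.where(agrees, coeffs, quantised * quant)
    return rebuilt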


The figure shows a comparison at 2x magnification. From left to right: a single compressed frame, the average of all frames pixel by pixel, the JPEG estimation method just explained after 300 iterations, and the original image the compressed frames were generated from.

The most significant improvement comes from the averaging: most of the obnoxious artifacts are gone and the image appears quite smooth. However, it is quite blurry compared to the original and has a lot of ringing, which is especially noticeable on the hand (second row).
The estimation improves the result a little: the edges are slightly sharper and it brings out some of the details, for example the button in the first row and the black stuff in the third row. However it still has ringing, and even though that is reduced, the result is a bit noisier, making it less pleasing visually.

The problem here is that the quantisation is quite severe and there are many possible images which could explain our compressed frames. Our frames simply do not contain enough redundancy to narrow it down. The only way to improve the estimation is to have prior knowledge about our image.

My quick and dirty approach was to use the noise reduction feature of waifu2x. It is a learning-based approach and waifu2x has been specially trained on anime/line-art images. Each compressed frame was put through waifu2x on High noise reduction, and the result was used as the initial estimate for that frame. Afterwards the algorithm was run as before, and the result can be seen below:


From left to right: Single frame after being processed by waifu2x, average of the frames processed by waifu2x, the estimation without using waifu2x (unchanged from the previous comparison), and finally the estimation when using waifu2x.

The sharpness in the new estimate is about the same, but both ringing and especially noise have been reduced. It has a bit more noise than the averaged image, but notice that the details in the black stuff (third row) in the averaged image are even worse than in the old average (see the first comparison). This is likely because it is not line-art and waifu2x therefore does not give great results there. Nevertheless, the estimation still managed to bring it back: no matter how bad our initial estimate is, the algorithm still forces the resulting image to conform to the input images.

But two aspects make this a “quick and dirty” addition. The prior knowledge is only applied to the initial estimate and ignored for the following iterations, which might be the reason it is slightly more noisy. It still worked here, but there is no guarantee. Secondly, waifu2x is trained using presets and ignores how the image was actually compressed.



Jul 14 2015

Removing mosaic censor

Category: Anime, Overmix, Programs, Software | Spiller @ 12:01

Note: Due to the nature of censored content, I’m showcasing synthetic data in this post.

Japan has its crazy censorship laws, so most hentai anime uses mosaic censoring to hide the important bits. Of course with my work on Overmix I’m interested in camera pans that can be stitched, and those could look like the following:

What is interesting however is that, while it might not be too apparent at first, the mosaic censor changes from frame to frame. Instead of censoring the image which is panned, a specific area is marked as “dirty” and censoring is applied on each frame separately. So looking even closer, we can see that the mosaics are arranged in a grid which is relative to the viewpoint (i.e. fixed) and not the panned image (which is moving).

So depending on how the panned image is offset from the viewpoint, each tile in the mosaic will cover a different area in different frames. If we take all our frames and find the average for each pixel, we can see a vast improvement already:

It is however quite blurry… Super-resolution has taught us that if we have several images of the same motif where each image carries slightly different information, we can exploit that to increase the resolution. So let’s try to do better.

Before we can start, we need to know how the mosaic censoring works to finish our degradation model. Contrary to what I thought, it is not done by averaging all pixels inside each tile. Instead a single pixel is picked (which I assume is the center one) and used as the color of the entire tile.
The approach is therefore really simple. We create a high-resolution (HR) grid, and for each frame we plot each tile’s color into the HR grid at the pixel located at the center of the tile. When we have done that for all the frames, we end up with the following:
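
A minimal sketch of that plotting step could look like the following, assuming the frame offsets and the tile size are already known (the names are made up and border handling is ignored):

import numpy as np

def plot_mosaic_samples(frames, offsets, tile, hr_shape):
    hr = np.zeros(hr_shape)
    known = np.zeros(hr_shape[:2], dtype=bool)
    for frame, (dx, dy) in zip(frames, offsets):
        h, w = frame.shape[:2]
        # The mosaic grid is fixed to the viewport, so with a different offset
        # the tile centres land on different pixels in the HR grid
        for ty in range(0, h - tile + 1, tile):
            for tx in range(0, w - tile + 1, tile):
                cy, cx = ty + tile // 2, tx + tile // 2
                hr[dy + cy, dx + cx] = frame[cy, cx]   # the whole tile has this colour
                known[dy + cy, dx + cx] = True
    return hr, known   # pixels where 'known' is False still need to be filled in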

There are a lot of black pixels; these are pixels which were not covered by any of our frames, pixels we know nothing about. There is no way of extracting them from our source images, so we need to fake them using interpolation. I used the “Inpaint – diffusion” filter in G’MIC and the result isn’t too bad:

Those “missing pixels” fully control how well the decensoring will work. If the viewpoint only pans vertically we will only have vertical strips of known pixels, so we cannot increase the resolution in the horizontal direction. If we have movement in both dimensions it tends to work pretty well, as you can see in the overview image below:

Full image available below: (click to view full resolution)


May 23 2015

Extracting an image overlay

Category: Anime, Overmix, Programs, Software | Spiller @ 05:15

I’m getting close to having a Super-Resolution implementation, but I took a small detour applying what I have learned to a slightly easier problem: extracting overlaid images. Here is an example:
Background merged with overlay
The character is looking into a glass cage and the reflection is added by adding the face as a semi-transparent layer. We also have the background image, i.e. the image without the overlay:
Background layer
A simple way of trying to extract the overlayed image is to subtract the background image, and indeed we can somewhat improve the situation:
Extracted overlay using Gimp
Another approach is to try to estimate the overlaid image which, when overlaid on top of the background image, will reproduce the merged image.
This is done by starting with a rough estimate and then iteratively improving it. (The initial estimate is just the merged image in this case.)
In each iteration we take our estimate and overlay it on the background. If our estimate is off, the result will obviously differ from the merged image, so we find the difference between the two and use that difference to improve our estimate. After doing enough iterations, we end up with the following:
Extracted overlay using estimation
While not perfect, this is quite a bit better. I still haven’t added regularization which stabilizes the image and thus could improve it, but I’m not quite sure how it affects the image in this context.
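
As a rough sketch (not the actual Overmix code), assuming the overlay’s opacity alpha is known and the images are floating point arrays in [0, 1], the iteration could look like this:

import numpy as np

def extract_overlay(merged, background, alpha, iterations=300, step=1.0):
    overlay = merged.copy()   # initial estimate: just the merged image
    for _ in range(iterations):
        composited = alpha * overlay + (1.0 - alpha) * background
        error = merged - composited          # how far off the estimate is
        overlay += step * alpha * error      # nudge the estimate towards agreement
        overlay = np.clip(overlay, 0.0, 1.0)
    return overlay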

Super-Resolution works in a similar fashion, just instead of overlaying an image, it downscales a high-resolution estimate and compares it against the low-resolution images. It just so happens that I never implemented downscaling…


Mar 04 2015

Real Super-Resolution

Category: Overmix, Programs, Software | Spiller @ 01:47

While looking at OpenCV, I managed to find a working implementation of one of the papers I was interested in understanding. It is in Japanese, but you can find the code with some comments in English here: http://opencv.jp/opencv2-x-samples/usage_of_sparsemat_2_superresolution
A small warning if you try to run it: it is very memory intensive. With a 1600×1200 image it used 10GB of RAM on my system. It also crashes if your image’s dimensions are not a multiple of the resolution enhancement factor.

All tests are done with 16 low-resolution images and with the resolution increased 4 times. The image below is the result for the best case, where the images are positioned evenly. The left image is one of the 16 low-resolution (LR) images, the right is the original, and the middle is the super-resolution result after 180 iterations:

SR in perfect case

There is a bit of ringing around the high-contrast edges, but notice how it manages to slightly bring out the lines in the eye, even though they look completely flat in the LR image.

Below is the same, but with the input images having been degraded by noise and errors. While it does lose a little bit of detail, the results are still fairly good and with less noise than the input images.

SR with noisy input

The last test is with random sub-pixel displacements instead of optimal ones. The optimal result is shown to the left as a comparison. It is clear that it loses its effectiveness, as the image becomes more blocky.

SR with random alignment

My plan is to use this implementation as an aid to understand the parts of the article I don’t fully understand. I would like to try this out with DVD anime sources, but this method (or at least this implementation) just wouldn’t work with 150+ images. You can wait for a slow algorithm to terminate, but memory is more of a hard limit. However, this method allows separate blur/scaling matrices for each LR image, so it can probably be improved by keeping them identical.



Feb 28 2015

cgCompress getting ready for release

Category: cgCompress, Programs, Software | Spiller @ 02:55

For about a year I have been using my tool to compress VN event graphics, and I have been very pleased with the results. The usefulness of the program greatly increased when I found games having event graphics with up to 200 small variations, instead of just the 2-5 I thought was the norm.

I have been putting the final touches to the program, with the two major additions being:

  • The final output is now validated against the input images, to make sure the compressor does not produce faulty output without you knowing. So far it has been very reliable.
  • The compressor can now also compress images containing transparency. The original implementation didn’t take it into account, which complicated adding it as an afterthought, but with a bit of rewriting it now works quite well.
    The importance of this is that it can now also be used for compressing character sprites.

The code is available at: cgCompress at GitHub

So why am I not making a release yet? Because I don’t think it will be very useful if it does not properly integrate with your desktop environment. So my next step is to create a decoder for the Windows Imaging Component (WIC) framework. This will make it supported in many Microsoft applications, including Windows Photo Viewer and File Explorer.

I have already been experimenting a bit with WIC, and while I can’t say I can make a high quality implementation, getting something to work shouldn’t be too hard.



Nov 18 2014

Animated stitches

Category: Overmix, Programs, Software | Spiller @ 00:35

Most stitches are mostly static, with perhaps a little mouth movement. Some however do contain significant movement, and Overmix has never been intended to merge this into a single image in a sensible fashion.

This is because I have yet to see a program which can do this perfectly. While I have seen several which make something that looks decent at first glance, they usually have several issues. Common ones are lines not connecting properly in places and straight lines ending up curved.

For me, it needs to be perfect. If I need to manually fix it up, or even redo it from scratch, not much is gained. The goal with Overmix was always to reach a level I would not be able to reach without its assistance. Thus there is no reason to pursue a silver-bullet solution, especially if it gives worse results than doing it manually.

Instead the approach I have taken is to detect which images belong to which movement. For example, if an arm is moving, we want to figure out which images are those where the arm has yet to start moving, which are those where the arm has stopped moving, and all those in between. In other words, we end up with a group of images for each frame the animator drew.

These groups can be combined individually without any issues and the set of resulting images can be merged manually. However since we might have reduced 100 video frames to 10 animated frames, we can take advantage of the nice denoising and debanding properties of Overmix, which will improve the final quality of the stitch.

Cyclic movement

One interesting use-case which can be fully automated is cyclic movement, i.e. movement which ends the way it started, and continues to loop. This is often people doing repetitive motions such as waving goodbye.

Of course the real benefit is when the viewport is moving, as that would be cumbersome to do manually. The following example was a slow pan-up over the course of 89 frames where the wind is rustling the character’s clothes and hair around, reduced to 22 frames:

animated stitch

Notice how the top part of some frames is missing, as the scene ended before the top part had been in view for every frame. The same was the case for the bottom, but since it contained no movement, any frame could fill in the missing information.

(The animation can be downloaded in FullHD resolution APNG here (98 MiB).)

Algorithm

The main difficulty is making a distinction between noise and movement. (Noise can be compression artifacts, but also other things such as TV logos, etc.) A few methods were tried, but the best and simplest of them takes advantage of the fact that most Japanese animation reduces animation costs by using a lower frame rate. This is typically 3 video frames for each animated frame, though it can vary throughout the animation!

The idea is to compute the difference between each frame and the previous one. Since there are usually 3 consecutive frames without animation, this will return a low difference. But as soon as we hit a frame which contains the next part of the animation, a high difference will appear, causing a spike on the graph. Doing this for every frame gives a result like this:

Graph of frame differences

Using this, we can determine a noise threshold by drawing a line (shown in purple) which intersects as many blue lines as possible. While this is mostly tested on cyclic movement, it works surprisingly well.

The ever returning issue of sub-pixel alignment strikes back though. When the stitch contains movement in both directions, the sub-pixel misalignment can cause the difference to become large enough to cause issues. This can easily be avoided by simply using sub-pixel alignment, but as of now this is quite a bit slower in Overmix.

Once the threshold has been determined, the images are separated into groups. If the difference between the last image in a group and the next image is below the threshold, the image is added to that group. If it could not be added to any group, a new group containing that image is created. This is done for all the images.
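
A simplified sketch of the grouping, with a plain mean absolute pixel difference standing in for whatever metric Overmix actually uses, and with the threshold passed in by hand:

import numpy as np

def difference(a, b):
    # Mean absolute pixel difference between two (aligned) frames
    return np.mean(np.abs(a.astype(np.int32) - b.astype(np.int32)))

def group_frames(frames, threshold):
    groups = []
    for frame in frames:
        for group in groups:
            if difference(group[-1], frame) < threshold:
                group.append(frame)   # continues an existing animation frame
                break
        else:
            groups.append([frame])    # no match, so this starts a new animation frame
    return groups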

Further work

Notice that the file size of the APNG image is nearly 100 MB. This is because each of the 22 images is rendered independently of the others and thus results in 22 completely different images. But the background is the same for each and every frame, so we are not taking advantage of the information about the background found in the other frames. Thus, by detecting which parts of the frames are consistent and which differ when rendering, we can both improve quality and reduce file size.

Aligning the resulting frames can be tricky when there is a lot of movement in a cyclic animation, because the images bear only little resemblance to each other. Even when this does work, sub-pixel alignment and rendering are more important than usual, since otherwise the ±0.5 error will show up as a shaky animation. I have an idea of how to solve the alignment issue, but my math knowledge is currently too lacking to actually implement it.



Mar 16 2014

Fullscreen canvas, not as easy as it appears

Category: Software, Webdevelopment | Spiller @ 14:25

This is a rant about the erroneous information you will get when you try to find out how to implement something. You have been warned. See the end for the solution that was good enough for me.

I have been working on a WebGL based LDraw viewer, so I can embed 3D views of my Lego models on this blog (project at Github: WebLDraw). Obviously I want to drool at them in glorious fullscreen, and with the HTML5 fullscreen API it should be fairly simple. You just have to call ‘requestFullscreen()’ on your element.

Well, almost. In Firefox it applies 100% width and height automatically, but not in Chrome, resulting in a small canvas on a large black background filling your entire screen. Awesome, it uses your screen real estate as efficiently as Metro apps in Windows 8!

Applying a bit of CSS to do it manually should be fairly easy, but that just doesn’t work for a canvas, as it is rendered at a specific resolution, and the CSS style just upscales it. Googling for ‘canvas resize fullscreen’ will give you something like this:

function on_fullscreen_change() {
   canvas.width = window.innerWidth;
   canvas.height = window.innerHeight;
}
document.addEventListener( 'fullscreenchange', on_fullscreen_change );

Great, now my canvas is bigger as soon as I enter fullscreen. And not only that, it stays just as big as soon as I exit fullscreen, because NO, do not exit the glorious fullscreen environment, it is perfect. So a bit more googling and I ended up using ‘document.fullscreenEnabled’, which I quickly found out just tells me whether fullscreen is supported. The correct way was to do:

if( canvas == document.fullscreenElement )

Now I can finally enter and exit fullscreen properly. Except that my canvas did not have the correct size; ‘window.innerWidth’ does not give the correct width. But Google has all the answers, you just need to do any of:

  • document.width
  • document.body.clientWidth
  • canvas.offsetWidth
  • canvas.getBoundingClientRect().width
  • canvas.style.width = window.innerWidth + “px” (CSS style, fancy)
  • screen.availWidth

None of them gives the correct result however. People don’t seem to notice because they stretch it with ‘width: 100%’ anyway. ‘screen.availWidth’ excited me though, it actually returned 1920, the width of my screen. Except that ‘screen.availHeight’ returned 1160 because YES, I do have a taskbar in my desktop environment, and NO, I don’t want to know it is 40px high when it is hidden anyway because I’m in fullscreen mode…

I really wonder why web development needs to differentiate between ‘screen.availWidth’ and ‘screen.width’, which gives the screen’s full width. Anyway, that was the final piece in the puzzle, and my Dart implementation ended up looking like this (most of it maps pretty closely to a JavaScript implementation):

original_width = canvas.width;
original_height = canvas.height;
canvas.onFullscreenChange.listen( (t){
  if( canvas == document.fullscreenElement ){
    canvas.width = window.screen.width;
    canvas.height = window.screen.height;
  }
  else{
    canvas.width = original_width;
    canvas.height = original_height;
  }
} );

Some of the confusion about the fullscreen API is hard to avoid, because many articles only apply to the old proprietary APIs. But 7 ways to get the monitor width, none of which are correct? I just can’t even guess how it could end up that bad…



Feb 10 2014

Compressing VN CGs

Category: Anime, cgCompress, Programs, Software | Spiller @ 06:48

I have a lot of images on my computer: random fanart from the web, screenshots of movies I have seen, etc. I recently saw that one of my image folders was 80 GB big, so it is no wonder I care so much about image compression.

I was looking through a visual novel CG collection when I thought: shouldn’t this compress well? After all, VN CGs tend to have a lot of similar images with minor modifications like different facial expressions. So I did a quick test: how well does it compress using different lossless compression algorithms?

Chart showing compression ratios

As expected, PNG is quite a bit better than simply zipping BMP images, and WebP fares even better. But what is this? Compressing BMP images with 7z literally kills the competition!

The giant gap from ZIP to 7z does not come from LZMA being superior to Deflate, but from the fact that ZIP only allows files to be compressed individually, while 7z can treat all files as one big blob of data. This is also why a general purpose compression algorithm can beat the ones optimized for images, as PNG and WebP also compress images individually.
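
The effect is easy to reproduce; here is a small hypothetical Python sketch that compresses the same files individually and as one concatenated blob, using the LZMA algorithm 7z uses (ignoring that ZIP actually uses Deflate; ‘cg_folder’ is a made-up path):

import lzma
from pathlib import Path

data = [f.read_bytes() for f in sorted(Path('cg_folder').glob('*.bmp'))]

per_file = sum(len(lzma.compress(d)) for d in data)  # each file compressed on its own, ZIP-style
solid = len(lzma.compress(b''.join(data)))           # all files as one blob, like a solid 7z archive

print('per file:', per_file, 'bytes  solid:', solid, 'bytes')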

A note on the comparison: CG collections usually have an average of 2-3 versions of each image; here we checked an extreme case with 13 versions. This obviously exaggerates the results, but the trend still stands.

Doing it better

BMP is the superior solution? There is no way I can accept that, we need to do something about that!

If you have ever worked with GIF animations you probably know that you can reduce the size by only storing the differences between frames. That is exactly what we want to do, but using PNG or WebP to compress those differences. The problem is that we need to store the differences plus information on how they should be combined to recreate all the images, and there isn’t a good file format for that.

How to get from one CG to another using the difference

So I have created a format based on OpenRaster, which is a layered image format intended to compete with PSD (Photoshop) and XCF (GIMP). I wanted to use it without modifications, but having multiple images in one file, while planned, appears to be far into the future. (I want it now!) It is basically a ZIP file which contains ordinary image files and an XML document describing layers, blend modes, etc.

The next part is automatically creating such a file from a series of images. For this I have written cgCompress (Github page) and while there is still a lot of work to be done, it has proven that we can do better. Fundamentally this is done by creating all the differences and then, with a greedy algorithm, selecting the ones which add the least to the total file size. This continues frame by frame until we have recreated all the original images. I have also worked on an optimal solver, but I have not been able to get it to work with more than 3-5 images (because of time complexity).
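
In spirit the greedy step looks something like the sketch below, where encode and diff are placeholders for the real WebP encoding and difference code; the actual cgCompress works on layer compositions and is considerably more involved.

def greedy_plan(images, encode, diff):
    # For each image, pick the cheapest representation: the full image,
    # or a difference against an image we have already stored
    stored = [images[0]]
    plan = [('full', None, len(encode(images[0])))]
    for image in images[1:]:
        best = ('full', None, len(encode(image)))
        for base_index, base in enumerate(stored):
            size = len(encode(diff(base, image)))
            if size < best[2]:
                best = ('diff', base_index, size)
        plan.append(best)
        stored.append(image)
    return plan   # which representation (and against which base) to store for each image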

Using the greedy algorithm I managed to reduce the file size by 25.6% compared to 7z-compressed BMP images (lossless WebP used internally):

Comparison of compressed BMP and cgCompress

This is a compression rate of a whopping 88.7%! Of course, this is only because we are dealing with 13 very similar images. 67.2% of the file size is the start image, and without a better image compression algorithm we can do very little to improve that. That means the 12 remaining images each use 2.7% (1/13 is 7.7%); not much to work with, but I believe I can still make improvements.

This is just one case though; while uncommon, some images still need further optimization to get near-perfect results. I have tried compressing an entire CG collection of 154 images and my results were as follows:

Chart showing compression ratios for an entire CG collection

Compared to 7z-compressed BMP there was an improvement of 24.0%, and compared to WebP it is 61.1%. On average the set contained 3.92 variations per image; cgCompress manages to store 2.57 times as many images as ordinary WebP in the same space. The difference between those two numbers is the overhead cgCompress requires to recreate all 3.92 variations per image, and it depends on how different the variations are. While I don’t know how low it can get, I do believe there is room for improvement here.

I included lossy WebP here as well (done at quality 95) to give a sense of the difference between lossless and lossy compression. cgCompress definitely closes the gap, but if you don’t care about keeping your images lossless, lossy WebP is still the way to go. (It should be possible to use lossy compression together with the ideas used in cgCompress though.)

Conclusion

cgCompress can significantly reduce the space needed to store visual novel CG collections. While only a moderate improvement of ~25% over 7z-compressed BMP, compressed archives only work well for archiving or transferring over networks. cgCompress, being based on OpenRaster, has proper thumbnailing, viewer support and potentially meta-data. With PNG and WebP being the direct contenders, cgCompress provides a big leap in compression ratio.

On a personal note, going from concept to something I can use in 8 days is quite an achievement for me. While the cgCompress code isn’t too great, I’m still quite happy with how this turned out.



Jan 25 2014

A year of Overmixing

Category: Overmix, Programs, Software | Spiller @ 01:54

It is now about a year since I started this project, and never in my wildest dreams would I have imagined how much work I would put into it. So here is a quick overview of my progress.

(Overmix is now located at github: https://github.com/spillerrec/Overmix, download here: https://github.com/spillerrec/Overmix/releases )

Automatic aligning

While zooming and rotation are not supported (yet at least), horizontal and vertical movement is detected rather reliably and can be done to sub-pixel precision. Sub-pixel precision is rather slow and memory intensive though, as it works by upscaling the images. It still needs some work when aligning images with transparent regions, and I also believe it can be made quite a bit faster.

De-noising

Noise is removed by averaging all frames and this works very well, even on Blu-ray rips. However, since I have not been able to render while taking sub-pixel alignment into account, it blurs the image ever so slightly.

10-bit raw input and de-telecine

I have made a frame dumper to grab the raw YUV data from the video streams, to ensure no quality loss happens between the video file and Overmix. This actually showed that VLC outputs in pretty low quality, and appears to have some trouble with Hi10p content and color spaces. Most noticeable however is that VLC uses nearest-neighbor for chroma upsampling, which looks especially bad with reds. Having access directly to the chroma channels also opens up the possibility of doing chroma restoration using super resolution.

I have made several tools for my custom “dump” format, including the video dumper, a Windows shell thumbnailer (Vista/7/8), a Qt5 image plugin and a compressor to reduce the file size. They can be found in their separate repository here: https://github.com/spillerrec/dump-tools

De-telecining has also been added, as it is necessary in order to work with the interlaced MPEG-2 transport streams anime is usually broadcast in.

Deconvolution

Using deconvolution I have found that I’m able to make the resulting images much sharper, at the expense of a little bit of noise. It appears that anime is significantly blurred, perhaps because of the way it has been rendered, or perhaps intentionally. More on this later…

TV-logo detection and removal

It works decently as shown in a recent post and can also remove other static content such as credits. I have tried to do it fully automatically, but it doesn’t work well enough compared to the manual process. Perhaps some Otsu thresholding combined with dilation will work?

I’m intending to do further work with this method to see if I can get it to work with moving items, and to do the reverse, separating out the moving item. This is interesting as some panning scenes move the background independently of the foreground (to fake depth of field).

Steam removal

A proof-of-concept showed that if the steam is moving, we can choose the parts from each frame which contain the least amount of steam. It doesn’t remove the steam completely, but it can make a significant difference. The current implementation however handles the colors incorrectly, which is fixable, but not something I care to do unless requested…

Animation

Some scenes contain a repeating animation while doing vertical/horizontal movement. H-titles especially seem to have a lot of this, but it can also be found as mouth movement and similar. While still in its early stages, I have successfully managed to separate the animation into its individual frames and stitch those, using a manual global threshold.

I doubt it would currently work with minor animations, such as changes to the mouth; noise would probably mess it up. So I’m considering investigating other ways of calculating the difference, perhaps using an edge-detected image instead, or computing local differences.


