October 20, 2009
Flash Median Filter

As you might have read in Eugene's blog, he, Nicoptere and I were having a little competition over who could come up with the fastest implementation of a median filter. Median filters have the unpleasant property that they cannot be optimized as easily as, for example, blur filters. The median filter is also one of those filters that are not very well suited for an implementation in Pixel Bender due to the way Pixel Bender processes images (except maybe for a trivial 3x3 median).

What a median filter does is take all the pixels within a certain radius around a pixel, sort them numerically (channel by channel) and then return the color that ends up in the middle of the sorted list. You can imagine that doing this for every pixel in an image is quite a processing effort. Fortunately, people much smarter than me have found ways to speed this up considerably. The paper I had stumbled across sometime in the past was this one by Simon Perreault and Patrick Hebert. And whilst the authors even provided the C source for this filter, I did not bother to look at it, since I thought that things that are fast in C are not necessarily fast in Flash, and decided to roll my own version of the algorithm based on maintaining the histograms and the kernel in the form of sorted linked lists. I figured that this would speed up the process of joining and separating the histograms, since I could use a kind of "zipper" technique to do that. To my astonishment the first tests were extremely slow - even though I tested the speed with a release version (testing with the debug version is about 10 times slower).
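
For reference, the naive approach described at the beginning of this paragraph looks roughly like this - a sketch with made-up names, red channel only, and not the code I actually use. It is exactly this per-pixel gathering and sorting of (2*radius+1)^2 values that the histogram-based methods avoid:

import flash.display.BitmapData;

// Naive median for the red channel only, just to illustrate the definition:
// collect every red value within the radius, sort, take the middle one.
function bruteForceMedianRed( src:BitmapData, radius:int ):BitmapData
{
    var dst:BitmapData = src.clone();
    for ( var y:int = 0; y < src.height; y++ )
    {
        for ( var x:int = 0; x < src.width; x++ )
        {
            var values:Array = [];
            for ( var dy:int = -radius; dy <= radius; dy++ )
            {
                for ( var dx:int = -radius; dx <= radius; dx++ )
                {
                    // clamp to the image borders
                    var px:int = Math.max( 0, Math.min( src.width - 1, x + dx ));
                    var py:int = Math.max( 0, Math.min( src.height - 1, y + dy ));
                    values.push( ( src.getPixel( px, py ) >> 16 ) & 0xff );
                }
            }
            values.sort( Array.NUMERIC );
            var median:uint = values[ values.length >> 1 ];
            dst.setPixel( x, y, ( median << 16 ) | ( src.getPixel( x, y ) & 0x00ffff ));
        }
    }
    return dst;
}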

The breakthrough came with a tip Ralph Hauwert gave me: whenever you create a linked list class in ActionScript, make sure that you use the "final" keyword for it. Until then I had thought that this keyword acted more as a decorative element without real impact. Well, I'm happy that I was wrong, since the speedup in this case was extraordinary.
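
Just to make that concrete, this is the shape of such a node class - a minimal hypothetical version, the actual classes in my source have a few more fields (why "final" helps is explained further down in the comments):

// The "final" on the class is what made the difference.
final class ListNode
{
    public var value:int;      // the color value this node stands for
    public var count:int;      // how often that value occurs in the histogram
    public var next:ListNode;  // link to the node with the next larger value
}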

When you compare my implementation with Eugene's, note that a kernel size of 7x7 in his example is the equivalent of a radius of 3 in mine (3+3+1=7), and 17x17 is a radius of 8 (8+8+1=17). As far as I can see my version is faster up to a radius of 7, but after that his version surpasses mine. I have to study his code now to see how he has solved it.

Since the median filter still takes considerable rendering time I added an "asynchronous" switch which allows the effect to be rendered in the background without stalling the rest of the application. Of course this slows down the whole rendering process.
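
Since the Flash Player is single-threaded, "asynchronous" here simply means slicing the work over several frames. This is not my actual implementation, just a sketch of the general pattern with made-up names:

import flash.display.Shape;
import flash.events.Event;

// Calls processRow( row ) for a limited number of rows on every frame and
// onComplete() once all rows are done, so the rest of the app keeps running.
function renderAsync( totalRows:int, rowsPerFrame:int,
                      processRow:Function, onComplete:Function ):void
{
    var ticker:Shape = new Shape();  // any DisplayObject dispatches ENTER_FRAME
    var currentRow:int = 0;

    function step( e:Event ):void
    {
        var end:int = Math.min( totalRows, currentRow + rowsPerFrame );
        while ( currentRow < end ) processRow( currentRow++ );
        if ( currentRow >= totalRows )
        {
            ticker.removeEventListener( Event.ENTER_FRAME, step );
            onComplete();
        }
    }

    ticker.addEventListener( Event.ENTER_FRAME, step );
}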

Here is my Median example
source code and zipped sources

Posted at October 20, 2009 01:05 PM
Comments

Cool!
I've updated my post. We have to merge our results into one ultimate solution! ;)

Posted by: Eugene on October 20, 2009 03:00 PM

Yes, I have had a look at your code now and I see that you do the histogram quite differently than I do. It baffles me that your version is that much faster with the big radii - especially since you maintain a 256-item histogram, whereas my list is only as long as the number of different colors. But maybe that's the point - with big radii pretty much all of the different values between 0 and 255 are being used anyway, and then my additional conditionals are biting back.

I'll maybe try to implement the "coarse" histogram technique described in the paper - I think that might give me a few more milliseconds.

Posted by: Mario Klingemann on October 20, 2009 03:12 PM

while ( true )
- just a method to keep redoing something until you call break?

where are you actually using a linked list, and why do linked lists work so well? they don't make sense - yet.

Posted by: r on October 20, 2009 03:14 PM

The row histograms and the main kernel are linked lists, as well as the internal queue that controls when a pixel is being dropped from a row histogram.

Linked lists work so well in Flash since there is no lookup necessary to retrieve an object via an index (or worse, a hash): the "next" property IS already the next object.

In this special case I found linked lists particularly well suited since (I think) they are rather effective when I want to join several of them together to form the kernel.

list A: 1-3-10-12-55
list B: 2-3-11-12-40
joined A+B: 1-2-3-10-11-12-40-55

The nice thing is that since I know that all lists are sorted I do not always have to go back to the beginning of the list if I combine them. And I do not have to use any sort() function afterwards.
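
Roughly like this - a hypothetical sketch of that zipper merge, assuming the minimal ListNode class sketched further up in the post (not the exact code from my filter):

// Merges the sorted list "b" into the sorted list "a" and returns the new
// head. Because both lists are sorted, the insertion point only ever moves
// forward - no need to restart at the head or to call sort() afterwards.
function mergeSortedLists( a:ListNode, b:ListNode ):ListNode
{
    var head:ListNode = new ListNode();   // dummy head keeps the edge cases simple
    head.next = a;
    var prev:ListNode = head;

    while ( b != null )
    {
        // advance to the spot where b.value belongs
        while ( prev.next != null && prev.next.value < b.value )
        {
            prev = prev.next;
        }
        var nextB:ListNode = b.next;
        if ( prev.next != null && prev.next.value == b.value )
        {
            prev.next.count += b.count;   // value already there: just add the counts
        }
        else
        {
            b.next = prev.next;           // splice the node of "b" into "a"
            prev.next = b;
            prev = b;
        }
        b = nextB;
    }
    return head.next;
}

With the two example lists above (and all counts at 1) this produces exactly 1-2-3-10-11-12-40-55, with the 3 and the 12 ending up with a count of 2.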

Posted by: Mario Klingemann on October 20, 2009 03:27 PM

Yeah, I thought about it. Since at some point you will need all the colors it is better to have them from the very beginning... That's the only difference in our versions, I think. And this is why yours is faster with a small number of histogram colors.
It should be interesting to try the method from the paper...

Posted by: Eugene on October 20, 2009 03:28 PM

Did Ralph explain why the 'final' keyword improves performance? Has he blogged about it?

Thanks for the interesting article.

Alec

Posted by: Alec on October 20, 2009 03:58 PM

Ralph said that he will explain the why in a blog post soon.

Posted by: Mario Klingemann on October 20, 2009 04:02 PM

now if you could use that to create an animated transition - that would be pretty sweet ;)

Posted by: p on October 20, 2009 04:15 PM

The only way to use that in a transition would be to precalculate the frames. In general the median filter is usually used for noise removal or for intermediate calculations in image processing. Of course you can also use it for painterly effects.

Posted by: Mario Klingemann on October 20, 2009 04:22 PM

In what sense are you using the linked lists? Especially, how and when do you need to sort them?

You could also use an AVL or Red-Black tree instead. Merging two trees can be optimized as well. But you have a lookup time of O(n log n) instead of O(n) in the linked list. If you used a linked list backed by a vector for random access, you would have O(n log n) array operations.

The choice between AVL and RB trees depends on the trade-off between insertion/removal and retrieval cost. But AVL has a guaranteed height of 1.44*log n compared to 2*log n for the RB tree.

Posted by: Joa Ebert on October 20, 2009 08:13 PM

As noted in the AVM2 performance paper from 2004 or so, final is faster because the whole prototype/trait chain does not need to be considered.

Posted by: Joa Ebert on October 20, 2009 08:17 PM

very cool!

I've got a feeling this could be used for some sort of water-on-paper effect if it could be applied only to an area of the image, easing off as it approaches the edges of the region.

Posted by: li on October 21, 2009 01:41 AM

@joa: the way the median algorithm from this paper works is that it first maintains a histogram of kernel height k for every pixel column. Then it combines k column histograms to form the median kernel from which it retrieves the median value. This median kernel moves from left to right through the image by removing the leftmost column and adding a new column to the right. Once it has reached the right edge the kernel is cleared. At this moment all the column histograms drop their topmost value and a new column value is added at the bottom. The kernel moves to the left edge and down one row and is filled with radius + 1 column histograms, then the process repeats. At first I tried to do a zigzag movement through the bitmap in order to avoid the complete clearing of the kernel, but it turned out that the process involved in dynamically removing and adding those pixels that are involved in the "down and reverse" process created a bigger overhead than the clearing and filling.
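
To make that flow a bit more concrete, here is a stripped-down sketch of the traversal for a single grayscale channel. It uses plain 256-bin Vectors for the histograms (so it is closer to the paper's and Eugene's layout than to my linked lists) and simply leaves the border pixels untouched - illustration only, not the actual filter code:

// gray holds the grayscale values of a w x h image, r is the radius.
function medianSweep( gray:Vector.<int>, w:int, h:int, r:int ):Vector.<int>
{
    var k:int = 2 * r + 1;
    var area:int = k * k;
    var out:Vector.<int> = gray.slice();  // border pixels keep their original values

    // one 256-bin histogram of height k per pixel column
    var columns:Vector.<Vector.<int>> = new Vector.<Vector.<int>>( w, true );
    var x:int, y:int, i:int;
    for ( x = 0; x < w; x++ )
    {
        columns[ x ] = new Vector.<int>( 256, true );
    }
    // fill the column histograms with the first k rows
    for ( y = 0; y < k; y++ )
    {
        for ( x = 0; x < w; x++ )
        {
            columns[ x ][ gray[ y * w + x ] ]++;
        }
    }

    var kernel:Vector.<int> = new Vector.<int>( 256, true );

    for ( y = r; y < h - r; y++ )
    {
        // clear the kernel and fill it with the leftmost k column histograms
        for ( i = 0; i < 256; i++ ) kernel[ i ] = 0;
        for ( x = 0; x < k; x++ )
        {
            for ( i = 0; i < 256; i++ ) kernel[ i ] += columns[ x ][ i ];
        }
        out[ y * w + r ] = medianOf( kernel, area );

        // slide to the right: drop the leftmost column, add a new one on the right
        for ( x = r + 1; x < w - r; x++ )
        {
            for ( i = 0; i < 256; i++ )
            {
                kernel[ i ] += columns[ x + r ][ i ] - columns[ x - r - 1 ][ i ];
            }
            out[ y * w + x ] = medianOf( kernel, area );
        }

        // move every column histogram down one row:
        // drop its topmost pixel, add the pixel below its bottom
        if ( y + r + 1 < h )
        {
            for ( x = 0; x < w; x++ )
            {
                columns[ x ][ gray[ ( y - r ) * w + x ] ]--;
                columns[ x ][ gray[ ( y + r + 1 ) * w + x ] ]++;
            }
        }
    }
    return out;
}

// walks the histogram until half of the kernel area has been passed
function medianOf( hist:Vector.<int>, area:int ):int
{
    var half:int = ( area >> 1 ) + 1;
    var sum:int = 0;
    for ( var i:int = 0; i < 256; i++ )
    {
        sum += hist[ i ];
        if ( sum >= half ) return i;
    }
    return 255;
}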

Linked lists are used as the column histograms and for the median kernel (which is a combined histogram). Each linked list item of a histogram has an index (which is the color value) and a count of how often that color appears. When a new value is inserted into the histogram, bigger values are passed through to the end of the list (I guess that is something like a bucket sort). When a value already exists in a list, the counter is increased.
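
A rough sketch of that insertion, again assuming the minimal ListNode class from the post (and again not the exact code from the filter):

// A sorted insert with counts: walk past the smaller values, increase the
// counter if the value is already present, otherwise splice in a new node.
// Returns the (possibly new) head of the list.
function insertValue( head:ListNode, value:int ):ListNode
{
    var fresh:ListNode;
    if ( head == null || value < head.value )
    {
        fresh = new ListNode();
        fresh.value = value;
        fresh.count = 1;
        fresh.next = head;
        return fresh;   // the inserted value becomes the new head
    }
    var node:ListNode = head;
    while ( node.value != value && node.next != null && node.next.value <= value )
    {
        node = node.next;
    }
    if ( node.value == value )
    {
        node.count++;   // value already in the histogram
    }
    else
    {
        fresh = new ListNode();
        fresh.value = value;
        fresh.count = 1;
        fresh.next = node.next;
        node.next = fresh;
    }
    return head;
}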

When I combine two lists I start at the beginning of the list but once a value has been found I can continue searching from the already found element since I know the to-be-inserted values will be bigger.

But thanks for your tips - I will check if I can improve the current method.

Posted by: Mario Klingemann on October 21, 2009 09:55 AM

Brilliant stuff. The image output is different between Eugene's implementation and yours (for instance, check out your test app at radius 50 and Eugene's at 100, and look at the bottom left corner especially). I seem to prefer the look of yours; it's a bit more "painterly" - the colors flow more smoothly into each other. I would think, given the nature of the algorithm, this would make yours a bit more accurate.

Posted by: Brett Walker on October 22, 2009 08:37 PM

Oh wow - well spotted! I really thought our results would be identical. And even though it might be interesting from an artistic standpoint, I would rather be sure that the result is correct. I guess I have to rebuild the brute force method and do a difference between the two to see which one is off.

Posted by: Mario Klingemann on October 22, 2009 08:51 PM

You should be able to speed up all these Vector accesses around 10 times by using haXe + the flash.Memory API :D

check http://ncannasse.fr/blog/virtual_memory_api

Best !

Posted by: Nicolas on October 23, 2009 12:03 AM

...or using Joa's TDSI ;-) - but even with 10x faster ByteArray access I doubt that this would speed up the filter 10x. The biggest bottleneck is the joining and separating of the histograms, and in my implementation they are not even using Vectors or ByteArrays.

Posted by: Mario Klingemann on October 23, 2009 10:24 AM

Have you tested it in FP 10.1? It works really oddly in IE...

Posted by: Jloa on November 22, 2009 11:39 PM

I don't fully understand how the linked lists automatically sort themselves, and how you add them together.

I am not from a computer science background - would it be possible to get a layman's explanation of how the linked lists are working? I've built my own (brute force, and slow) implementation of a Median Shift filter and I am interested in how your implementation works here.

Posted by: Aubrey Taylor on December 12, 2009 04:37 AM