March 14, 2009
Optimizing Flash Based Face Detection

Yesterday Seb Lee-Delisle pointed us via Twitter to this great example of Flash-based real-time face detection. It turns out that more than half a year ago Ohtsuka Masakazu had already ported the face detection part of OpenCV to AS3 and added the source code to the Spark project, which is like an ActionScript candy box full of surprises.


So I had a look at the source code and found one thing I wanted to change. Face detection is based on so-called Haar cascades; simplified, one could say that a cascade is a very long list of zones in an image that get checked for certain features. In the original version this list is an XML file of almost 1 MB. Because that is so big, the XML file is compressed into a zip file, where it fortunately shrinks to just 100K. So in the original version the zip file first has to be loaded, then unpacked and parsed. My consideration was that SWFs get zipped anyway, so why not turn the whole XML file directly into a class?
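To make the "very long list of zones" idea a bit more concrete, here is a minimal sketch in Python (not ActionScript, and not the Marilena code). The data layout, the names `Feature`, `Stage` and `window_passes`, and all numbers are made up for illustration; a real cascade has many stages and thousands of zones. The point is only that a window is rejected as soon as one stage fails.

```python
from dataclasses import dataclass
from typing import List, Tuple

# One weighted zone inside the detection window: (x, y, width, height, weight).
Rect = Tuple[int, int, int, int, float]


@dataclass
class Feature:
    rects: List[Rect]
    threshold: float
    pass_value: float  # added to the stage score when the feature fires
    fail_value: float  # added otherwise


@dataclass
class Stage:
    features: List[Feature]
    threshold: float


def zone_sum(img, x, y, w, h):
    """Brute-force sum of the grey values inside one zone."""
    return sum(img[row][col] for row in range(y, y + h) for col in range(x, x + w))


def feature_value(img, feature):
    """Weighted sum over all zones of one feature."""
    return sum(wt * zone_sum(img, x, y, w, h) for x, y, w, h, wt in feature.rects)


def window_passes(img, stages):
    """Run the stages in order and stop at the first one that fails."""
    for stage in stages:
        score = 0.0
        for f in stage.features:
            fired = feature_value(img, f) >= f.threshold
            score += f.pass_value if fired else f.fail_value
        if score < stage.threshold:
            return False  # early rejection: later stages are never evaluated
    return True


# Hypothetical one-stage cascade: "left half brighter than right half".
STAGES = [Stage([Feature([(0, 0, 2, 4, 1.0), (2, 0, 2, 4, -1.0)], 10.0, 1.0, 0.0)], 0.5)]
EDGE = [[9, 9, 1, 1] for _ in range(4)]  # bright left, dark right
FLAT = [[5, 5, 5, 5] for _ in range(4)]  # no contrast at all

print(window_passes(EDGE, STAGES), window_passes(FLAT, STAGES))
```

A detector would slide such a window over the image at several scales and keep the positions where every stage passes.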

So that's what I did. My adapted version no longer needs to load any external files (which was my goal), and the whole SWF even ends up about 50K smaller than the original version. I also made a few other optimizations, like replacing Arrays with linked lists and reading from a ByteArray instead of using getPixel(). The downside is that, since the Haar data is now hardcoded into the class, you cannot use this class to track anything but faces (which in theory the original could). But I expect that most people will not even know how to prepare a different Haar cascade recognition set; I'm one of them.

What's interesting is that this code will even run in Flash Player 9, since it uses neither Pixel Bender nor Alchemy nor the new Vector data type. Obviously, by targeting Flash Player 10 one could add a few more optimizations, but that's something for another weekend.

Example 1: Ohtsuka Masakazu's modified sample file

Example 2: Real-time webcam face tracking

Here's the modified source code, which includes two example files: Marilena_mod10.zip

Posted at March 14, 2009 06:56 PM
Comments

works great for me, even in a low light environment (and even with headphones on :) )

Posted by: lars on March 14, 2009 07:19 PM

Great! Spark project is the ActionScript neverland.
I've been trying to implement real-time face detection in Flash, combining Haar Cascades and skin segmentation techniques. The results are pretty good - my problem is still the lack of knowledge of how to create proper training data and extract information from it. Computer vision is the new/old thing for Flash! :)

Posted by: Gabriel Laet on March 14, 2009 07:52 PM

Yeah, seems like this Spark project does contain a lot of interesting stuff. It's a pity it does not have English versions of everything, though Google Translate helps a little.

This, for example, got my attention: http://www.libspark.org/wiki/yossy/swfassist It seems to be a SWF compiler/decompiler written in AS3 :) That's something I am playing with, as it promises a number of cool possibilities :) Will need to check the code of this project later.

Posted by: wonderwhy-er on March 14, 2009 10:39 PM

WOW! really nice! Thanks a lot!

Maybe... could you explain why reading from a byte array is faster than the normal getPixel() function?

Posted by: Flo on March 15, 2009 02:02 PM

In this special case reading the pixels from a ByteArray seems to be slightly faster. The reason is that astonishingly only the red channel is used so whilst the old version looked something like this:

pix = bitmapData.getPixel(x-1,y-1) >> 16;

the changed version looks like this:

pix = byteArray[index+=4];

Here "index+=4" means that we are advancing the pointer by 4 bytes, to the red byte of the next pixel.

It's not a huge speedup anyway, and I would never blindly trust that ByteArray access is faster than getPixel, but it is always worth a try when optimizing bitmap-related code.

Posted by: Mario Klingemann on March 15, 2009 02:22 PM
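The two ways of getting at the red channel described in the comment above can be compared side by side. This is a small sketch in Python rather than ActionScript; the pixel values are made up, and it assumes the byte buffer stores each pixel as four bytes in A, R, G, B order (big-endian 32-bit ARGB), so the red byte sits at offset 1 of every group of four.

```python
import struct

# Three made-up 32-bit ARGB pixels.
pixels = [0xFF102030, 0xFF405060, 0xFF708090]

# Variant 1: per-pixel lookup that returns 24-bit RGB, then shift right by 16
# so that only the red channel is left.
reds_shift = [(argb & 0xFFFFFF) >> 16 for argb in pixels]

# Variant 2: copy all pixels into one byte buffer up front, then step through
# it 4 bytes at a time, starting at the first red byte (offset 1).
raw = b"".join(struct.pack(">I", argb) for argb in pixels)
reds_bytes = list(raw[1::4])

print(reds_shift, reds_bytes)
```

Both variants read the same values; the second one trades a function call and a shift per pixel for a single indexed read.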

Very cool. Have you thought about trying to add any feature detection to this (e.g., trying to find the eyes/nose/mouth)?

Seems like you could do something rough by estimating space within the rectangle, but without a general idea of the angle of the face, this could be difficult. Perhaps it would be possible to rerun the detection algorithm on the subspace searching for eye shapes or something similar?

Posted by: Noel on March 15, 2009 09:09 PM

I guess there are other Haar cascades for further analysis of a detected face area, like eyes, mouth or nose.

But as I mentioned in the post above I have hardcoded the specific xml data that was contained in the face.zip file into an AS class. As far as I could tell this specific Haar cascade looks only for non-tilted faces. If you intend to use other cascades you'd have to go back to the original Marilena classes.

One could also try some simpler methods to for example find the eyes, like searching for local minima or circular dots.

Posted by: Mario Klingemann on March 15, 2009 09:27 PM

Hi

@Mario: thanks a lot for your ByteArray explanation!

@Noel: a few years ago a student at MIT built an attention meter ( http://web.media.mit.edu/~jackylee/attention.htm ). He analyzed the face tracking data via OpenCV and stored it in a txt file. Then he read the data into Flash and showed it as a drawn smiley. Maybe his source code could be a help...

Posted by: Flo on March 16, 2009 09:58 AM

this is great, i was following a blog a while back talking about porting OpenCV but didn't know it was done yet. thanks for the update

Posted by: Marc Pelland on March 16, 2009 03:44 PM

hi,

here you have cascade xml for eyes, mouth, nose, ...

http://gias720.dis.ulpgc.es/Gias/modesto.html

direct link : ftp://mozart.dis.ulpgc.es/pub/Software/HaarClassifiers/FaceFeaturesDetectors.zip

Posted by: francoist on March 16, 2009 10:49 PM

Everybody who is interested in face tracking with flash should also check out Steve Shipman's project "deface":
http://cosmodro.me/blog/2009/jan/11/deface-flash-10/

Posted by: Mario Klingemann on March 16, 2009 10:55 PM

Hi Mario,

I guess you haven't written the HaarClass by hand ? :p

Do you have something that does that ? would it be possible for you to share it ? so we could test several Haars ?

thanks

Posted by: francoist on March 17, 2009 11:03 AM

What I did was add several custom toString() methods to the original classes. Each returned an ActionScript representation of the class including its variables, which allowed me to call haarCascade.toString() and get back the "baked" class.

The thing is that I then did further optimizations along the way, during which I deleted those methods again (stupid, I know).

I think the best I could do would be to write an extra parser that loads one of the standard XML files and turns it into a pre-baked class.

One thing I removed was the "tilt" property, since it was not being used in that cascade, and AFAIK tilt is not handled by Marilena either. So in case other cascades use tilt, one would have to write that part of the code first.

Posted by: Mario Klingemann on March 17, 2009 11:14 AM
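The "extra parser" idea from the comment above, reading an XML file once and emitting source code that already contains the data, could look roughly like this. It is a Python sketch, not the deleted toString() code; the XML layout, the `bake` function and the `BAKED_CASCADE` name are invented for illustration, and it does not follow the real OpenCV cascade schema.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily simplified cascade XML (NOT the OpenCV format).
SAMPLE_XML = """
<cascade>
  <stage threshold="0.5">
    <rect x="0" y="0" w="2" h="4" weight="1"/>
    <rect x="2" y="0" w="2" h="4" weight="-1"/>
  </stage>
  <stage threshold="1.25">
    <rect x="1" y="1" w="2" h="2" weight="2"/>
  </stage>
</cascade>
"""


def bake(xml_text, name="BAKED_CASCADE"):
    """Return source text that defines the cascade as a literal."""
    root = ET.fromstring(xml_text)
    lines = [f"{name} = ["]
    for stage in root.findall("stage"):
        rects = ", ".join(
            "({x}, {y}, {w}, {h}, {weight})".format(**rect.attrib)
            for rect in stage.findall("rect")
        )
        lines.append(f"    ({stage.get('threshold')}, [{rects}]),")
    lines.append("]")
    return "\n".join(lines)


source = bake(SAMPLE_XML)
print(source)

# Check that the generated text is valid code that rebuilds the data.
namespace = {}
exec(source, namespace)
baked = namespace["BAKED_CASCADE"]
```

The generated text would be saved as a source file and compiled into the application, so nothing has to be loaded or parsed at runtime.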

just a thought, wouldn't it be interesting to replace all the bitmapdata stuff with a shader (pb) ?
I'm a bit of a newbie on that, but looking at your cameraBitmap class, you are doing some color transform, bitmap matrix. I believe that all those operations are heavy for flash (I always found the bitmapdata slow anyway).
if so maybe the getpixel (pretty heavy too) could be done inside the shader as well.

would that make it faster ? or is it not worth it to try ?


thanks

Posted by: francoist on March 18, 2009 06:53 PM

Well, many people have the misconception that just by using Pixel Bender everything becomes faster and that the native bitmapData methods are slow by nature. That is simply not the case. There are very good uses for Pixel Bender, but there are also very good reasons why you might want to keep using the old-school methods.

First of all - the CameraBitmap class is just a helper class that I am always using when I'm using a webcam - it is not actually part of the face tracking algorithm and in the example the colorMatrix is not even being used.

One calculation-heavy part of the face tracking algorithm is calculating averages over different areas of the image at different scales. The method being used is in my eyes already pretty well optimized; the only way to speed it up would be to find an entirely different method of calculating the averages. But especially when it comes to calculating averages, Pixel Bender is the worst tool of choice. One reason is that there are no loops in the Flash version of it. The other is that PB works on a "run the whole code on each single pixel again" basis, so it cannot use any of the classic optimizations you would normally use, like caching already-read pixels.

So to answer your question - no, I don't think that PB would make it faster. Using Alchemy probably would. See Ralph Hauwert's latest post: http://www.unitzeroone.com/blog/2009/03/18/flash-10-massive-amounts-of-3d-particles-with-alchemy-source-included/

Posted by: Mario Klingemann on March 18, 2009 07:20 PM
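The comment above does not show how the averages are computed. The classic technique in Viola-Jones style detectors is a summed-area table (integral image), sketched below in Python; that this matches what Marilena does line by line is an assumption, and the function names are made up. It also illustrates the caching argument: the table is built from running sums that reuse previously read pixels, which is exactly what a per-pixel kernel cannot do.

```python
def integral_image(img):
    """ii[y][x] holds the sum of all pixels above and to the left of (x, y)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum over any rectangle with four table lookups, whatever its size."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def rect_average(ii, x, y, w, h):
    return rect_sum(ii, x, y, w, h) / (w * h)


IMG = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
II = integral_image(IMG)
print(rect_sum(II, 1, 1, 2, 2), rect_average(II, 1, 1, 2, 2))
```

The table is built once per frame; after that every zone, at every scale, costs the same four reads.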

oki, thanks a lot for your explanation, really informative!

Posted by: francoist on March 18, 2009 08:32 PM

Mario, can you please knock all that head spinning code down to 140 characters or less? ;)


Posted by: Michael Kaufman on March 21, 2009 03:37 AM

hey ya'll

i've used haarscascades in opencv (c & python) for years...

rad that haars ported to As3. wouldn't have expected it to be fast enough!!

for those wanting to program their own haars - in essence it's easy: just need a LOT of images (thousands). you feed what you want to recognize, then also what you do NOT want it to recognize... and you get an XML to load.

that's it! but configuring opencv & choosing good sample images = the trick.

here's the link to opencv. http://opencv.willowgarage.com/wiki/

Posted by: nano on March 23, 2009 12:33 AM

Does anyone know why only haarcascade_frontalface_alt works, but the other xml files from OpenCV, e.g. haarcascade_frontalface_default, fail to detect faces?

Thank you very much!

Posted by: yith on April 17, 2009 07:32 AM

@Mario,

The optimizations are great! There's a big difference between the original and your version.

FYI to change the cascades, you need to find an OpenCV-format cascade file, zip it, and use the fzip python script to add an adler32 checksum (required for fzip). Then you just make the path in the facedetector class point to the right file.

@Yith,

I'm having the same problem! All the OpenCV cascades other than frontalface_alt seem to give false positives.

@francoist

As nano mentioned, the cascades are generated from source images. Here is a link to a tutorial:

http://note.sonots.com/SciSoftware/haartraining.html

Posted by: Brian Liu on April 19, 2009 03:10 PM
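For reference, the Adler-32 checksum mentioned in the comment above is easy to compute. The sketch below is Python and only shows the checksum itself, once via the standard `zlib` module and once spelled out; it does not reproduce the fzip script or the way it writes the checksum into the zip file, and the payload is made up.

```python
import zlib


def adler32_reference(data: bytes) -> int:
    """Plain Adler-32: two running sums modulo 65521."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % 65521
        b = (b + a) % 65521
    return (b << 16) | a


# Made-up stand-in for the contents of a cascade file.
payload = b"<opencv_storage>example cascade data</opencv_storage>"

checksum = zlib.adler32(payload) & 0xFFFFFFFF
print(hex(checksum), checksum == adler32_reference(payload))
```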

Thanks Liu!

So, do you have any idea on solving the problem? (I found DeFace can use the other OpenCV xml files.)

Posted by: yith on April 20, 2009 06:15 AM

yith,

shoot me an email and maybe we figure this out together.

Posted by: Brian Liu on April 21, 2009 12:46 AM

I modified the classes to load the raw xml cascades and the frontalface_alt is the only one working. Some deeper review to the ported class needs to be done, I think.

Posted by: Julian on April 23, 2009 01:37 AM

Ah - I think I know why it is not working: the originally ported AS class ignores the first two factors of each check zone which AFAIK are responsible for the rotation of the recognition rectangle. In my optimized class therefore I completely removed those factors (which are always 0 in the default cascades).

So in order to fix this one has to go back to the original java or C code, check the class that deals with the transformation of the pixels and port that part over to AS.

Posted by: Mario Klingemann on April 23, 2009 10:38 AM

SO FAST! COOOL!

OpenCV C++ example work slower !!!

Excellent worky!

Posted by: peko on April 29, 2009 09:52 PM

ha! U are cool!
It work quickly
I like u

Posted by: anthonyyuan on May 1, 2009 10:14 AM

Very good work!!!

@Brian: Did you finally find a solution for using the other cascades? I used the original Marilena sources and created my own zip file using the python script, but it didn't work... help, please! :)

Posted by: David on May 3, 2009 07:00 AM

If anyone is interested, I've set up a project page for a full OpenCV port. (MariLena is only a partial port of a single class) The gcode hosted project is here: http://code.google.com/p/as-opencv/

Posted by: Brian Liu on May 3, 2009 01:31 PM

any other cascades in as? maybe tutorial to translate?

Posted by: sogetsu on May 18, 2009 02:34 AM

Hey,
Just presented a paper at a computer vision conference about this real-time Flash face detection project. We made some optimizations to the original Viola-Jones object detection algorithm. The result is a pretty fast face detector. We're working on turning it into an easy-to-use API. Check out the site http://www.flashfacedetection.com

Posted by: Theo Ephraim on May 28, 2009 06:52 PM

just a thought, how would detection of angled faces be possible?

Posted by: vin on July 26, 2009 08:24 PM

That's almost what I am searching! Can this be used for gesture recognition with a little effort?

Posted by: Eric on July 27, 2009 10:06 AM

cool;

Posted by: shop on July 29, 2009 07:56 AM

wow!
it works great on my face, on all the faces from the magazines i've found, and even for a few seconds on my gf's golden retriever's face (he really has human expressions).

i'm only not sure does it work when i close my eyes :)

Posted by: 1GR3 on August 5, 2009 01:36 PM

Is it possible to find the position of the eyes ?

Posted by: jey on September 17, 2009 12:57 AM

Nice! Just got it working, not sure if this is a fp10 update issue, but I had to add the following code (taken from the original squidder example) into the camera init code to get it working:

was:
public function CameraBitmap( .. )
{
    ..
    __cam = Camera.getCamera();
}

changed to:
public function CameraBitmap( .. )
{
    ..
    // Look up the index of the camera with this driver name; fall back to 0.
    var index:int = 0;
    for ( var i:int = 0; i < Camera.names.length; i++ )
    {
        if ( Camera.names[ i ] == "USB Video Class Video" )
        {
            index = i;
        }
    }
    // Camera.getCamera() expects the camera's index, passed as a string.
    __cam = Camera.getCamera( String( index ) );
}


Looks like some weird change has happened in the web cam api that now requires you to specifically name the camera.

Posted by: gavin on September 29, 2009 08:32 AM
© Copyright Mario Klingemann
