Monday, March 9th, 2009...1:45 pm

Augmented Reality Primer

The Topps baseball card augmented reality story in the NY Times is getting a lot of buzz today, so I thought I would share the conclusions a coworker, Trent Grover, and I reached after starting on some AR demos.

I’ve started hacking away at this AR stuff by evaluating a whole bunch of different technical strategies. After downloading and testing different open source demos and libraries, I believe I’ve settled on an approach.

AR breaks down nicely into two main tasks: tracking and 3D rendering. The options I came up with for these were:

Tracking

  • ARToolkit – The old open source standard. I’ve compiled the source, modified things a bit, figured out how to create and track my own custom glyphs, calibrated cameras, etc. It tracks pretty well, but there’s room for improvement (models sometimes flicker on and off when the tracker “loses” the glyph for a frame at a time here and there, problems with changing lighting conditions, etc).
  • ARToolkitPlus – An open source revision of ARToolkit that has various performance improvements, including nice optimizations for mobile devices. It also has an auto-thresholding capability that helps deal with changing/adverse lighting conditions.
  • FlarToolkit – A port of ARToolkit to ActionScript. When combined with Papervision3D, this would allow the entire app to run in Flash. This is a pretty cool find, as it would allow an AR application to be easily distributed on the web. The big downside is that its tracking performance is really slow (running locally on my kick @$$ machine, it is limited to 15-20 fps without much 3D thrown on top of it).
  • OpenCV – Open source computer vision library. It has quite a few different tracking methods that might be more suitable for natural feature tracking, and it has nice, easy-to-use camera control and image processing functionality. I’ve used this library a lot in the past for a slew of different image processing applications.
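
To give a feel for what "auto-thresholding" can mean, here is a minimal sketch of Otsu's method, a generic way to pick a black/white cutoff from the image histogram instead of hard-coding one. This is an illustration only: it is not necessarily what ARToolkitPlus does internally, and the function name `otsu_threshold` and the flat list of 8-bit grayscale pixels are assumptions made for the example.

```python
def otsu_threshold(pixels):
    """Pick a cutoff t (0-255) that best separates dark from light pixels.

    `pixels` is a flat list of 8-bit grayscale values (assumed input format).
    Pixels <= t count as dark, pixels > t as light.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    w_dark = 0    # number of pixels at or below the candidate cutoff
    sum_dark = 0  # sum of their intensities
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue  # nothing on the dark side yet
        if w_light == 0:
            break     # everything is on the dark side, no point going on
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor).
        score = w_dark * w_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize(pixels, t):
    """Return True for light pixels, False for dark ones."""
    return [p > t for p in pixels]
```

The point is that the cutoff follows the lighting: a dim frame and a bright frame of the same glyph each get their own threshold, so the black and white regions stay separated in both.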

Rendering

  • OpenGL – The old standard. Perfectly functional and fast, but requires a lot of coding effort to produce a nice simple app (not many shortcuts).
  • Ogre3D – An open source graphics engine (not game engine). This would take some of the pain out of producing a similar app with OpenGL directly while still achieving the same performance, but I haven’t worked with it before.
  • Torque Game Engine – A cheap game engine marketed to indie developers. This is a pretty good game engine (for really cheap) that has much more built-in functionality than Ogre. I’ve worked with it a little bit before, but it’s probably overkill for this sort of app.
  • Papervision3D – An open source 3D engine for Flash (all ActionScript). This is a pretty full-featured rendering engine (not game engine) that has some really nice and unique features for interactivity. Since it’s newer, it also supports some more up-to-date and fully functional 3D model and animation importers. (Sandy and Away3D are two other open source 3D engines for Flash with similar capabilities.)
  • Processing – A Java-based open source programming language created with designers and artists in mind. I’ve never used it before, but it seems to have similar capabilities to the Flash-based engines, while still providing access to the improved performance of OpenGL.

The Decision
For the tracking chunk of this project, I’m going to use a combination of OpenCV and ARToolkitPlus. Wrapping all the image processing in OpenCV allows me to take advantage of my previous experience with it. I can perform all the glyph pose estimation with ARToolkitPlus functions, but can then use OpenCV to enhance the results (by either pre- or post-processing the video feed). This also makes things easier if we decide to try writing a different tracker later (for natural feature or color-based tracking).
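
As one example of the kind of post-processing that could sit around the tracker, here is a minimal sketch of a filter that damps the one-frame flicker described for ARToolkit above: when the glyph is lost, it keeps returning the last good pose for a few frames before giving up. The class name `PoseHold`, the `max_missed_frames` parameter, and the convention that a lost glyph is reported as `None` are all assumptions for illustration, not part of OpenCV or ARToolkitPlus.

```python
class PoseHold:
    """Hold the last good pose for a few frames after the tracker loses the glyph."""

    def __init__(self, max_missed_frames=3):
        # How many consecutive lost frames we are willing to paper over (assumed default).
        self.max_missed_frames = max_missed_frames
        self._last_pose = None
        self._missed = 0

    def update(self, pose):
        """Feed in this frame's tracker result; `None` means no glyph was found.

        Returns the pose to render with, or `None` if the model should be hidden.
        """
        if pose is not None:
            self._last_pose = pose
            self._missed = 0
            return pose

        self._missed += 1
        if self._last_pose is not None and self._missed <= self.max_missed_frames:
            return self._last_pose  # brief dropout: reuse the previous pose

        self._last_pose = None      # glyph has really gone away
        return None
```

Calling `update()` once per video frame is enough; the renderer only sees the model disappear when the glyph has been missing for more than `max_missed_frames` frames in a row. The trade-off is a short lag before the model vanishes when the glyph is actually removed.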

For the rendering/interaction chunk of the project, I’ve decided to start with Papervision3D because it will allow us to leverage our existing in-house Flash expertise (mostly for any 2D GUI framework that we’ll want to tie into our 3D functionality). This has the added benefit of giving me one more excuse to get comfy with Papervision so we can begin to fold it into other project work that comes down the pipe. If Papervision’s performance doesn’t turn out to be up to snuff, I can still move to one of the other alternatives without losing much time.

To get all this working together, I’ll create a C++ server that performs the tracking and a Flash client that performs the rendering. A custom socket class allows Flash to pass a camera image to OpenCV/ARToolkit for processing, which then sends the calculated 3D pose matrix back to Flash so that Papervision content can be thrown on top of it. Marijn Speelman did something similar with face tracking.
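
To make the message flow a bit more concrete, here is a rough sketch of one way such an exchange could be framed on the wire. It is written in Python purely for illustration (the real pieces would be C++ and ActionScript), and every detail is an assumption rather than the actual protocol: a 4-byte length prefix in front of each message, little-endian byte order, and the pose sent as a 4x4 matrix of 32-bit floats in row-major order. The function names are hypothetical.

```python
import struct

HEADER_FORMAT = "<I"   # assumed framing: 4-byte little-endian payload length
POSE_FORMAT = "<16f"   # assumed pose layout: 16 little-endian float32, row-major 4x4


def pack_message(payload: bytes) -> bytes:
    """Prefix a payload (camera image bytes or an encoded pose) with its length."""
    return struct.pack(HEADER_FORMAT, len(payload)) + payload


def unpack_message(buffer: bytes):
    """Try to pull one complete message off the front of a receive buffer.

    Returns (payload, remaining_bytes), or (None, buffer) if more data is needed.
    """
    header_size = struct.calcsize(HEADER_FORMAT)
    if len(buffer) < header_size:
        return None, buffer
    (length,) = struct.unpack_from(HEADER_FORMAT, buffer)
    end = header_size + length
    if len(buffer) < end:
        return None, buffer
    return buffer[header_size:end], buffer[end:]


def encode_pose(matrix) -> bytes:
    """Flatten a 4x4 pose matrix (list of 4 rows of 4 numbers) into 64 bytes."""
    flat = [value for row in matrix for value in row]
    return struct.pack(POSE_FORMAT, *flat)


def decode_pose(payload: bytes):
    """Rebuild the 4x4 pose matrix from its 64-byte encoding."""
    flat = struct.unpack(POSE_FORMAT, payload)
    return [list(flat[i:i + 4]) for i in range(0, 16, 4)]
```

The length prefix matters because a socket delivers a stream of bytes, not whole messages: the receiver keeps appending to a buffer and calls something like `unpack_message` until it returns a complete payload. Whatever byte order is chosen, both ends have to agree on it.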

I think this direction gives us the best chance of success while greatly broadening what we learn along the way.

–Trent Grover

One other library that Trent has yet to mess with is BazAR. It is a computer vision library that allows for ‘natural feature tracking’, or in layman’s terms, the ability for the computer to track photos rather than black-and-white glyphs.
