In late 2011 Qualcomm, one of the world leaders in ARM processors, released a series of tools dedicated to computer vision and augmented reality, optimized for ARM architectures. Two of them immediately caught our attention: FastCV, a low-level computer vision SDK reminiscent of the famous OpenCV, and Vuforia, a high-level, out-of-the-box augmented reality SDK. We installed their demo application and were quite impressed by the quality of the visual tracking. So, a few weeks ago, we decided to try making something out of these SDKs and started working on a mash-up with our own SDK that would combine the advantages of both solutions. Moodstocks SDK would provide robust, real-time image recognition (thanks to its smart sync feature), and we would use one of Qualcomm’s SDKs to add visual tracking of the detected object.
Here is how it would work:
- Moodstocks SDK processes the camera stream in real time and, at some point, matches the current frame with an object that belongs to its database. It passes this information to one of Qualcomm’s SDKs. This step has all the advantages of Moodstocks SDK, as it uses our synchronization service: applications are lighter to download than those that bundle a database, and the database can change, however slightly, without requiring an application update each time.
- Once the image has been properly recognized, Qualcomm’s SDK locates the object in the current frame and starts tracking it in the following frames, with the impressive smoothness we observed in the demo application, which adds to the user experience.
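The two steps above can be sketched as a small per-frame state machine. This is only an illustration in Python, not code from either SDK: `Pipeline`, `recognize`, `locate` and `track` are hypothetical names standing in for the Moodstocks recognition call and for whatever location and tracking functions the Qualcomm side would provide.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Frame = bytes                    # stand-in for a camera frame (assumption)
Box = Tuple[int, int, int, int]  # x, y, width, height of the tracked object


@dataclass
class Pipeline:
    # All three callables are hypothetical placeholders, not real SDK functions.
    recognize: Callable[[Frame], Optional[str]]    # recognition step: frame -> image id
    locate: Callable[[Frame, str], Optional[Box]]  # find the recognized object in the frame
    track: Callable[[Frame, Box], Optional[Box]]   # follow it from one frame to the next
    target: Optional[str] = None
    box: Optional[Box] = None

    def on_frame(self, frame: Frame) -> Tuple[Optional[str], Optional[Box]]:
        if self.box is None:
            # Not tracking yet: run recognition, then try to locate the match.
            match = self.recognize(frame)
            if match is not None:
                self.target, self.box = match, self.locate(frame, match)
        else:
            # Already tracking: only update the object position.
            self.box = self.track(frame, self.box)
        if self.box is None:
            # Location failed or tracking was lost: go back to recognition.
            self.target = None
        return self.target, self.box
```

The point of this design is that recognition and tracking never run on the same frame: recognition is only needed until an object is found, and again once tracking is lost.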
We decided to begin with Vuforia SDK, as it seemed easier to use. Its strength for most developers is its simplicity: you add Vuforia SDK to your application, upload all the images you want to recognize and track (via a convenient web interface), and in return you download one single qcar-resource.dat file containing the data necessary to perform these operations in your application. However, this all-in-one packaging was the source of several problems in our case:
- the first is that you cannot separate image recognition against a database from the location and tracking part. A possible trick to separate these two functions would have been to create several of these .dat files, one for each image we wanted to track: as image recognition against a database of only one image is a trivial task, Vuforia SDK would in effect do nothing more than tracking. This could have worked if Vuforia did not support only a single .dat file, which must be named qcar-resource.dat… and juggling with on-the-fly file renaming would not have been a really practical solution.
- the second is that these .dat files are quite heavy, and we wanted to keep the advantages of our synchronization. Using one (or several) of these files would require managing a second database, heavier than the one used for our image recognition.
In the end, Vuforia appeared to be too high-level and not flexible enough for our needs… all these tricks were far too dirty for us to implement, so we decided to try our luck with the lower-level FastCV SDK.
We began by looking for inspiration in the sample code and documentation of the SDK. That was quickly done: the two samples illustrate hardly a tenth of the functions available in the library, and the documentation for each function fits in two lines at most. But well, ok, this SDK is still quite young and needs some time to mature. We went on anyway. After a bit of fumbling through the documentation, FastCV SDK seemed to contain everything we needed to implement the tracking part itself without too much pain: a ready-to-use tracking function, and an efficient feature detector to feed it. Of course, it would require a bit of wrapping to work, but it had the advantage of doing what we needed it to do, nothing more, nothing less. So we had what we needed to track features… Cool. But we couldn’t just track any features in the frame: we first had to locate the object recognized by Moodstocks SDK and extract features to track only from this object. And that’s when things got messy.
We tested several different tactics using the tools included in FastCV. Our first idea involved a quite classical method in image matching: pre-process the image to track in order to extract interesting features, describe them, and search for these features in the camera frames at runtime. As the task of finding a reference image in a camera frame when we already know it’s there (from our own SDK) is quite simple, we hoped to do this using only a few features. This would have let us package the necessary lightweight information (i.e. the descriptors) in a smart way, by encoding it directly within each image identifier. So much for the theory… but in practice, it turned out that FastCV offered only one descriptor, which we had never heard of. From its concise documentation, it seemed to be some kind of custom, lightweight SURF-like descriptor. We tested it, and realized that it would never be robust and discriminative enough to allow image detection with the small number of reference features we could afford… and we found ourselves once more back at the beginning of our research. We tried a few other methods, before realizing that as we went on, our ideas were becoming heavier and more impractical… we just couldn’t find a simple and smart enough way to get through this tracking initialization step using FastCV, and creating a 1000-line-long, barely understandable piece of code would not have served the purpose of illustrating what could be done using both FastCV and our SDK.
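To give an idea of the packaging we had in mind, here is a minimal Python sketch of encoding a handful of reference features directly within an image identifier. It is an illustration only, not something we shipped: the `name~payload` layout, the 8-byte descriptor length and the `pack_id` / `unpack_id` helpers are all assumptions made for this example, and none of it relies on FastCV.

```python
import base64
import struct
from typing import List, Tuple

DESC_LEN = 8  # assumed descriptor size in bytes, purely illustrative
Feature = Tuple[int, int, bytes]  # keypoint x, keypoint y, descriptor

# One record per feature: two little-endian uint16 coordinates + descriptor.
_RECORD = struct.Struct(f"<HH{DESC_LEN}s")


def pack_id(name: str, features: List[Feature]) -> str:
    """Append the reference features to the image name as URL-safe base64."""
    for _, _, desc in features:
        if len(desc) != DESC_LEN:
            raise ValueError("descriptor has unexpected length")
    payload = b"".join(_RECORD.pack(x, y, d) for x, y, d in features)
    # '~' is not part of the base64 alphabet, so it is a safe separator.
    return name + "~" + base64.urlsafe_b64encode(payload).decode("ascii")


def unpack_id(identifier: str) -> Tuple[str, List[Feature]]:
    """Recover the image name and its reference features from an identifier."""
    name, _, blob = identifier.rpartition("~")
    payload = base64.urlsafe_b64decode(blob)
    features = [
        _RECORD.unpack_from(payload, offset)
        for offset in range(0, len(payload), _RECORD.size)
    ]
    return name, features
```

With this layout each feature costs 12 bytes, i.e. 16 characters of identifier, which is why the approach only makes sense when a few features are enough to locate the image.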
So after a few days of work, we simply decided to give up on this project, at least for now. Let it not be said that we criticize Qualcomm’s SDKs: I would personally encourage developers to use them. It is only that neither of them suited the specific use we had in mind. Vuforia is a perfect SDK for any developer who wants to implement painless augmented reality and doesn’t specifically care about the size or flexibility of their apps. It won’t fail to amaze their users and make their eyes shine. As for FastCV, I would encourage anyone who works in mobile computer vision to use it as an optimized, efficient toolbox from which to pick elementary functions that will boost their applications. As for us, we’ll keep an eye on the evolution of these two young SDKs and won’t hesitate to resume this project if the opportunity arises.