1. Software for computer vision
Feature Extraction for LIBPMK
November 2008
This package provides a framework for detecting interest points and extracting descriptors, mostly for images, but it is extendable to just about anything (such as videos or text). In the image domain, it provides further capabilities for automatically detecting image types and converting between formats so that particular detector/extractor implementations need not worry about low-level details. This library is written as an extension to LIBPMK and fully integrates with it.
» Project page and source code
Object Localization with an Implicit Shape Model
MIT Vision Interfaces group; Spring 2008
This is an implementation of the Leibe et al. implicit shape model, which is an algorithm for detecting and localizing instances of an object in a large image. We used this as a baseline in a recent NIPS submission (details coming shortly if it gets in) to detect instances of books in a cluttered environment. This library is written as an extension of LIBPMK.
» Project page and source code
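To give a rough feel for the voting step in an implicit shape model, here is a minimal Python sketch. It is not the code in this package: the names `vote_for_centers`, `best_center`, and `codebook_offsets` are hypothetical, and it assumes a simple binned accumulator in place of the continuous mode search used by Leibe et al. Each matched codebook entry votes for the object centers it was seen with during training, and the fullest cell wins.

```python
from collections import defaultdict

def vote_for_centers(matches, codebook_offsets, cell=10):
    """Accumulate votes for the object center on a coarse grid.

    matches: iterable of (x, y, entry), a feature location and the
        codebook entry it was matched to.
    codebook_offsets: dict mapping entry -> list of (dx, dy) offsets
        from the feature to the object center (hypothetical training data).
    """
    votes = defaultdict(float)
    for x, y, entry in matches:
        offsets = codebook_offsets[entry]
        for dx, dy in offsets:
            key = (int((x + dx) // cell), int((y + dy) // cell))
            # Each feature spreads one unit of vote over its offsets.
            votes[key] += 1.0 / len(offsets)
    return votes

def best_center(votes, cell=10):
    """Return the midpoint of the strongest cell and its score."""
    (cx, cy), score = max(votes.items(), key=lambda kv: kv[1])
    return ((cx + 0.5) * cell, (cy + 0.5) * cell), score

# Toy data: one "corner" entry and one ambiguous "spine" entry.
codebook = {"corner": [(20, 20)], "spine": [(-30, 0), (30, 0)]}
matches = [(80, 80, "corner"), (130, 105, "spine"), (70, 102, "spine")]
center, score = best_center(vote_for_centers(matches, codebook))
```

The ambiguous "spine" votes land in several cells, but only the true center collects votes from all three features.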
Object Recognition with the Spatial PMK
MIT Vision Interfaces group; Fall 2007
This is an implementation of the Lazebnik et al. spatial pyramid match, which is essentially a uniform-bin PMK over the spatial coordinates of features that have been quantized by appearance. The method was shown to perform well on image databases where object classes have some spatial consistency or where the object is well-localized. This library is written as an extension of LIBPMK.
» Project page and source code
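As an illustration of the idea behind the spatial pyramid match, here is a small Python sketch. It is not this library's API: the function names are hypothetical, and it assumes features arrive as `(x, y, word)` triples with coordinates normalized to [0, 1) and `word` being the appearance cluster index. Histograms are intersected on grids of 1x1, 2x2, ... cells, with finer levels weighted more heavily as in the Lazebnik et al. formulation.

```python
from collections import Counter

def spatial_histogram(features, level):
    """Count features per (visual word, grid cell) at one pyramid level."""
    cells = 2 ** level
    return Counter(
        (word, int(x * cells), int(y * cells)) for x, y, word in features
    )

def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(n, h2[k]) for k, n in h1.items())

def spatial_pyramid_match(a, b, max_level):
    """Weighted sum of intersections over levels 0..max_level."""
    total = 0.0
    for level in range(max_level + 1):
        if level == 0:
            weight = 1 / 2 ** max_level
        else:
            weight = 1 / 2 ** (max_level - level + 1)
        total += weight * intersection(
            spatial_histogram(a, level), spatial_histogram(b, level)
        )
    return total

# Two toy images with two features each (words 0 and 1).
img_a = [(0.1, 0.1, 0), (0.9, 0.9, 1)]
img_b = [(0.2, 0.2, 0), (0.1, 0.9, 1)]
score = spatial_pyramid_match(img_a, img_b, max_level=1)
```

In this toy case both features match at the coarsest level, but only the word-0 feature still matches once the image is split into four cells.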
Adaptive Vocabulary Forests for Recognition and Retrieval
MIT Vision Interfaces group; ICCV 2007
This is an implementation of adaptive vocabulary forests, which is joint research that I submitted to ICCV 2007. It is built on LIBPMK and implements vocabulary trees on top of it, along with some toy demos. It includes the source code we used for our ICCV demo, in which photos were taken with my camera phone and uploaded to a laptop running a tree server in the background, so that any client could upload photos and perform searches. The package contains over 15,000 lines of code!
» Project page, source code, and paper (PDF)
LIBPMK: A Pyramid Match Toolkit
Fall 2006-2007, MIT Vision Interface Lab
LIBPMK provides a fast C++ implementation of the Pyramid Match algorithm, as well as a flexible framework with which users can easily and quickly run experiments. The library includes a lot of built-in functionality written from scratch, such as k-means and hierarchical clustering, handling of data sets too large to fit in memory, creation of multi-resolution histograms, and fast pyramid matching. The experimental framework wraps around LIBSVM to provide an easy way to train and test SVMs.
» Project page (includes documentation and C++ source code)
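For readers unfamiliar with the pyramid match, the following Python sketch shows the core computation under simplifying assumptions. It does not reproduce LIBPMK's C++ API: the function names are hypothetical, and it uses the uniform-bin variant in which the bin side doubles at each level. Matches that first appear at a coarser level count for less, since the points they pair are farther apart.

```python
from collections import Counter

def histogram(points, side):
    """Count points per grid cell of the given side length."""
    return Counter(tuple(int(c // side) for c in p) for p in points)

def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(n, h2[k]) for k, n in h1.items())

def pyramid_match(xs, ys, levels):
    """Sum newly matched points per level, weighted by 1 / bin side."""
    score = 0.0
    previous = 0
    for i in range(levels):
        side = 2 ** i
        current = intersection(histogram(xs, side), histogram(ys, side))
        # Only matches that did not already exist at a finer level count.
        score += (current - previous) / side
        previous = current
    return score

# Two small 2-D point sets.
xs = [(0.5, 0.5), (3.5, 3.5)]
ys = [(0.6, 0.4), (2.5, 3.5)]
score = pyramid_match(xs, ys, levels=3)
```

Here one pair matches at the finest level (weight 1) and the second pair only matches after the bins double in size (weight 1/2).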
Optical Flow: Motion Field and Focus of Expansion
Fall 2005, MIT 6.866 Machine Vision project
This project is an implementation of an iterative method for computing optical flow. Its input is a movie file in any format playable by mplayer (most things should work). The program overlays the estimated motion field on a grayscale version of the original video. In the case of translational motion along the z-axis (the camera zooming in and out), you can also have it estimate the focus of expansion and draw a dot there. One of my secondary goals was to keep memory usage efficient so that it scales well with the length of the input movie.
» Source code
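One common way to estimate a focus of expansion from a motion field, sketched here in Python, is a least-squares fit. This is an assumption for illustration and not necessarily how the program above does it; the function name `focus_of_expansion` is hypothetical. Under pure forward translation every flow vector points away from a single image point, so each vector `(u, v)` at `(x, y)` contributes the constraint `v*x0 - u*y0 = v*x - u*y`.

```python
def focus_of_expansion(flow):
    """Least-squares focus of expansion from (x, y, u, v) samples.

    Solves the 2x2 normal equations of the stacked line constraints
    v*x0 - u*y0 = v*x - u*y.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for x, y, u, v in flow:
        c = v * x - u * y
        a11 += v * v
        a12 -= u * v
        a22 += u * u
        b1 += v * c
        b2 -= u * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("flow vectors are parallel or zero; no unique solution")
    x0 = (a22 * b1 - a12 * b2) / det
    y0 = (a11 * b2 - a12 * b1) / det
    return x0, y0

# Synthetic field expanding from (3, 2).
samples = [(0, 0), (5, 1), (4, 6), (1, 4)]
flow = [(x, y, x - 3, y - 2) for x, y in samples]
x0, y0 = focus_of_expansion(flow)
```

With noisy real flow the constraints will not agree exactly, and the least-squares solution is the point closest to all of the flow lines.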
2. Other stuff
A Remote Slide Advancer for your Nokia S60 Phone
Just for fun
Since both my mobile phone and my laptop have Bluetooth, I made a little clicker that lets me advance slides using the arrow keys on my cell phone. I've used this while giving talks rather than buying a separate slide advancer. From the back, my phone (Nokia N95) actually looks like a camera, so it occasionally raises eyebrows when people see me pressing buttons on it. You can also program any of the keypad buttons to execute an arbitrary command; I once used it to trigger some movies in the middle of a presentation.
» Source code
Real Life Pong
Spring 2007, MIT 6.883
This is an implementation of the game Pong where the paddles are controlled by moving in the physical world. You run around with a GPS device and a cell phone. The game is displayed on the cell phone, and as you move around, your paddle moves with you. The GPS device and the phone communicate over Bluetooth, and you can communicate with a game server over Bluetooth or Wi-Fi. When indoors, the GPS device can be replaced with a Cricket. The game can also be displayed by a ceiling-mounted projector pointed at the floor, so you can actually move around in the game world.
» Source code
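The core mapping from a position fix to a paddle position can be sketched in a few lines of Python. This is an illustrative guess, not the game's actual code: the function name, the use of latitude alone, and the playing-field bounds are all assumptions.

```python
def paddle_position(lat, lat_min, lat_max, field_height, paddle_height):
    """Map a latitude inside the play area to the paddle's top edge.

    The result is clamped so the paddle never leaves the field.
    """
    span = lat_max - lat_min
    if span <= 0:
        raise ValueError("lat_max must be greater than lat_min")
    t = (lat - lat_min) / span
    t = min(1.0, max(0.0, t))
    return t * (field_height - paddle_height)

# A player standing halfway across a (hypothetical) play area.
middle = paddle_position(42.3605, 42.3600, 42.3610, 200, 40)
# A player who has wandered past the edge is clamped.
clamped = paddle_position(50.0, 42.3600, 42.3610, 200, 40)
```

Real GPS fixes jitter by several meters, so a practical version would probably smooth the readings before mapping them.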
Motion Description Language Interpreter
Spring 2002, MCS6, Stuyvesant High School
This was my final project for MCS6, Computer Graphics. We defined a simple motion description language (MDL) and wrote a script parser that takes a sequence of commands and generates images and animations. It can render a number of 3D shapes (box, sphere, torus) or arbitrary polygons. It implements Gouraud shading for the lighting effects. A number of sample scripts are included in the tarball. Our group also made a web-based interface to the MDL renderer and a ray tracer.
» Source code and demos
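Gouraud shading computes lighting only at the vertices and interpolates the result across each polygon. The Python sketch below shows that interpolation for a single triangle using barycentric weights. It is an illustration only: the function names are hypothetical, and the original renderer may well have interpolated along scanlines instead.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2-D point p with respect to triangle abc."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb

def gouraud_intensity(p, triangle, intensities):
    """Blend the per-vertex intensities at point p."""
    wa, wb, wc = barycentric(p, *triangle)
    ia, ib, ic = intensities
    return wa * ia + wb * ib + wc * ic

# A triangle whose vertices were lit to intensities 0.0, 1.0 and 0.5.
triangle = ((0, 0), (4, 0), (0, 4))
lit = (0.0, 1.0, 0.5)
at_vertex = gouraud_intensity((4, 0), triangle, lit)
at_edge_mid = gouraud_intensity((2, 0), triangle, lit)
```

At a vertex the result equals that vertex's intensity, and along an edge it varies linearly between the two endpoints.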
Liar's Poker for the TI-89
Fall 2002, Stuyvesant High School
We liked playing Liar's Poker instead of paying attention to Dr. Majewski in Physics C, so when he banned cards from the classroom, I made a TI-89 version. We would pass the calculator around so that it looked like we were doing work when we were actually playing 6-player games of Liar's Poker.
» Source code