Researchers from MIT, Adobe, and Microsoft have demonstrated that they can recover intelligible speech from silent video footage of commonplace objects, such as a potato chip bag or a potted plant, that were in the room during a conversation. No James Bond–like gadgets were involved. Rather, the researchers measured the minute vibrations of the objects as captured in high-speed video clips, which were shot from 15 feet away through soundproof glass.
Graduate student Abe Davis is first author on the paper that describes these findings. “When sound hits an object, it causes the object to vibrate. The motion of this vibration creates a very subtle visual signal that’s usually invisible to the naked eye,” he explains.
To reconstruct audio from video, the number of video frames captured per second must be higher than the frequency of the audio signal. For some of their experiments, the team used a high-speed camera that captured 2,000 to 6,000 frames per second (commercial high-speed cameras can capture upwards of 100,000 frames per second).
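The frame-rate requirement can be illustrated with a small sketch. It is not the researchers' method, only a demonstration of the general sampling limit: a pure tone is captured faithfully when the frame rate is more than twice its frequency, and otherwise it shows up as a lower "alias" frequency. The 2,000 and 6,000 frames-per-second figures come from the article; the 60 frames-per-second rate, the test tones, and all function names are assumptions chosen for illustration.

```python
import math


def is_captured_without_aliasing(tone_hz, fps):
    """True if the frame rate is more than twice the tone frequency."""
    return fps > 2 * tone_hz


def apparent_frequency(tone_hz, fps):
    """Frequency (Hz) a pure tone appears to have once sampled at fps."""
    nearest_multiple = fps * round(tone_hz / fps)
    return abs(tone_hz - nearest_multiple)


def sample_tone(tone_hz, fps, n_frames):
    """Sample a unit-amplitude cosine tone once per video frame."""
    return [math.cos(2 * math.pi * tone_hz * n / fps) for n in range(n_frames)]


if __name__ == "__main__":
    # Illustrative values only: 60 fps stands in for an ordinary camera.
    for fps in (60, 2000, 6000):
        for tone_hz in (100, 1000, 2500):
            ok = is_captured_without_aliasing(tone_hz, fps)
            seen = apparent_frequency(tone_hz, fps)
            print(f"{fps:>5} fps, {tone_hz:>4} Hz tone: "
                  f"faithful={ok}, appears as {seen} Hz")
```

In this sketch, a 100 Hz tone sampled at 60 frames per second produces exactly the same samples as a 20 Hz tone, which is why a frame rate that is too low cannot, on its own, distinguish the two.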
However, they were also able to exploit a quirk of ordinary digital cameras, known as the “rolling shutter,” to recover audio from video shot at a frame rate lower than the frequency of the sound. The resulting audio was still clear enough to identify the person speaking.
Davis describes the findings as a “new kind of imaging.” “We’re recovering sounds from objects,” he says. “That gives us a lot of information about the sound that’s going on around the object, but it also gives us a lot of information about the object itself, because different objects are going to respond to sound in different ways.” In the next phase of their research, the team will try to determine the structural properties of objects based on their visible responses to short bursts of sound.
In addition to Davis, the paper was authored by Frédo Durand and Bill Freeman PhD ’92, both MIT professors of computer science and engineering; Neal Wadhwa, a graduate student in Freeman’s group; Michael Rubinstein PhD ’14 of Microsoft Research, who completed his PhD with Freeman; and Gautham Mysore of Adobe Research. The team will present their paper at Siggraph, a computer graphics conference, later this month.
Listen to WBUR’s Melissa Block talk to Abe Davis about his research.