A person watching videos that show things opening—a door, a book, curtains, a blooming flower, a yawning dog—easily understands that the same type of action is depicted in each clip. “Computer models fail miserably to identify these things. How do humans do it so effortlessly?” asks Dan Gutfreund, a principal investigator at the MIT-IBM Watson AI Laboratory and a staff member at IBM Research. “We process information as it happens in space and time. How can we teach computer models to do that?”
Such are the big questions behind one of the new projects underway at the MIT-IBM Watson AI Laboratory, a collaboration for research on the frontiers of artificial intelligence. Launched last fall, the lab connects MIT and IBM researchers to work together on AI algorithms, the application of AI to industries, the physics of AI, and ways to use AI to advance shared prosperity.

The Moments in Time data set is one of three new projects funded by the lab. It pairs Gutfreund with Aude Oliva, a principal research scientist at the MIT Computer Science and Artificial Intelligence Laboratory, as the project’s principal investigators. Moments in Time is built on a collection of 1 million annotated videos of dynamic events unfolding within three seconds. Gutfreund and Oliva, who is also the MIT executive director at the MIT-IBM Watson AI Lab, are using these clips to address one of the next big steps for AI: teaching machines to recognize actions.
“As we grow up, we look around, we see people and objects moving, we hear sounds that people and objects make. We have a lot of visual and auditory experiences. An AI system needs to learn the same way and be fed with videos and dynamic information,” Oliva says. For every action category in the data set, such as cooking, running, or opening, there are more than 2,000 videos. The short clips enable computer models to better learn the diversity of meaning around specific actions and events. “This data set can serve as a new challenge to develop AI models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis,” Oliva adds.

Oliva and Gutfreund, along with additional researchers from MIT and IBM, met weekly for more than a year to tackle technical issues, such as how to choose the action categories for annotations, where to find the videos, and how to put together a wide array so the AI system learns without bias. The team also developed machine learning models, which were then used to scale the data collection.
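To make that structure concrete, here is a minimal sketch, in Python with PyTorch, of how a labeled collection of short clips like this might be organized and fed to an action-recognition model. The directory layout, the decode_clip helper, and the file extension are illustrative assumptions for the example, not the lab's released tooling.

```python
# Minimal sketch (not the official Moments in Time loader): assumes clips are
# stored as <root>/<action_label>/<clip>.mp4 and that a decode_clip() helper,
# supplied by the caller, turns a path into a fixed-size tensor of frames.
from pathlib import Path
from torch.utils.data import Dataset, DataLoader

class ShortClipDataset(Dataset):
    """Pairs each ~3-second clip with the action label given by its folder name."""
    def __init__(self, root: str, decode_clip):
        self.labels = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
        self.label_to_idx = {name: i for i, name in enumerate(self.labels)}
        self.samples = []                      # list of (path, label_index)
        for label in self.labels:
            for clip in (Path(root) / label).glob("*.mp4"):
                self.samples.append((clip, self.label_to_idx[label]))
        self.decode_clip = decode_clip         # e.g. returns (channels, frames, H, W)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        return self.decode_clip(path), label

# Usage: wrap in a DataLoader and feed batches to a video model such as a 3D CNN.
# loader = DataLoader(ShortClipDataset("moments_clips/", decode_clip),
#                     batch_size=16, shuffle=True)
```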
One key goal at the lab is the development of AI systems that move beyond specialized tasks to tackle more complex problems and benefit from robust and continuous learning. “We are seeking new algorithms that not only leverage big data when available, but also learn from limited data to augment human intelligence,” said Sophie V. Vandebroek, chief operating officer of IBM Research, about the collaboration.
In addition to pairing the unique technical and scientific strengths of each organization, the collaboration brings MIT researchers an influx of resources, signaled by IBM’s $240 million investment in AI efforts over the next 10 years, dedicated to the MIT-IBM Watson AI Lab. And the alignment of MIT and IBM interests in AI is proving beneficial, according to Oliva.
“IBM came to MIT with an interest in developing new ideas for an artificial intelligence system based on vision. I proposed a project where we build data sets to feed the model about the world. It had not been done before at this level. It was a novel undertaking. Now people can go to our website and download the data set and our deep-learning computer models.”
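As a rough illustration of what working with such a download could look like, the sketch below runs a pretrained video classifier over a single clip and returns its top predicted action categories. The checkpoint and label-file names, the tensor shape, and the preprocessing are assumptions made for this example; they are not the project’s published interface.

```python
# Illustrative inference sketch, not the project's released code: assumes a
# downloaded checkpoint ("moments_model.pth") containing a full PyTorch video
# classifier and a matching list of category names ("category_names.txt").
import torch

def classify_clip(clip: torch.Tensor, model_path="moments_model.pth",
                  labels_path="category_names.txt", top_k=5):
    # clip: float tensor shaped (channels, frames, height, width), already normalized
    model = torch.load(model_path, map_location="cpu")   # hypothetical full-model checkpoint
    model.eval()
    with torch.no_grad():
        logits = model(clip.unsqueeze(0))                # add a batch dimension
        probs = torch.softmax(logits, dim=1).squeeze(0)
    labels = [line.strip() for line in open(labels_path)]
    top = torch.topk(probs, top_k)
    return [(labels[int(i)], float(p)) for p, i in zip(top.values, top.indices)]
```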
In addition, Oliva says, MIT and IBM researchers have published an article describing the performance of neural network models trained on the data set, and the data set itself was deepened by the collaborators’ shared viewpoints. “IBM researchers gave us ideas to add action categories to have more richness in areas like health care and sports. They broadened our view. They gave us ideas about how AI can make an impact from the perspective of business and the needs of the world,” she says.
A longer version of this story originally appeared on MIT News on April 4, 2018.