Antonio Torralba

Head of AI+D Faculty, Professor of Electrical Engineering and Computer Science, MIT

Panel: Deep Time & Intelligence

Antonio Torralba, CLEVRER, Neural Symbolic Dynamic Reasoning. Courtesy of Antonio Torralba.
Antonio Torralba, Denoising, Harmonic Convolution. Courtesy of Antonio Torralba.

In a series of recent papers and dataset designs, Antonio Torralba has begun to make a case for the centrality of multimodal sensory perception to the future of AI. Whereas syntax once provided a framework for heuristic AI, and while language can still provide a useful framework for tagging and structuring non-linguistic data, Torralba is building neural networks that reason about motion from color and sound, associate sounds with images, and link motions to sounds. “Deep Time & Intelligence” grounds its discussion of AI training in Torralba’s expertise.

Biography: MIT CSAIL

Symposium Schedule

Panel: Deep Time & Intelligence
Video Release: Friday, April 2, 2021 / 9:00am EST
Live Q&A: Monday, April 5, 2021 / 11:00am–12:00pm EST

Related Works

CoLlision Events for Video REpresentation and Reasoning (CLEVRER)


A diagnostic video dataset for the systematic evaluation of computational models on a wide range of reasoning tasks. Motivated by the theory of human causal judgment, CLEVRER includes four types of questions: descriptive (e.g., “what color”), explanatory (“what’s responsible for”), predictive (“what will happen next”), and counterfactual (“what if”).

VirtualHome

VirtualHome is a multi-agent platform to simulate activities in a household. Agents are represented as humanoid avatars, which can interact with the environment through high-level instructions. You can use VirtualHome to render videos of human activities, or train agents to perform complex tasks.

Harmonic Convolution


Harmonic Convolution is an operation that helps deep networks model priors in audio signals by explicitly exploiting their harmonic structure. It does so by designing kernels whose support lies on sets of harmonic series, rather than on local neighborhoods as in standard convolutional kernels.
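The difference in kernel support can be illustrated with a deliberately simplified sketch. It assumes a single 1-D magnitude spectrum (no time axis), integer-multiple frequency bins, and fixed rather than learned weights. The function names `harmonic_conv1d` and `local_conv1d` are hypothetical and chosen for illustration; this is not the published Harmonic Convolution implementation.

```python
def harmonic_conv1d(spectrum, weights):
    """Simplified harmonic 'convolution' along a frequency axis.

    For each output bin f, combine the input bins at the harmonic series
    f, 2f, 3f, ... instead of a contiguous neighborhood around f.
    weights[k - 1] is the weight applied to harmonic k.
    Bins that fall beyond the end of the spectrum contribute zero.
    """
    n = len(spectrum)
    out = [0.0] * n
    for f in range(n):
        total = 0.0
        for k, w in enumerate(weights, start=1):
            idx = k * f  # k-th harmonic of bin f
            if idx < n:
                total += w * spectrum[idx]
        out[f] = total
    return out


def local_conv1d(spectrum, weights):
    """Ordinary local kernel (cross-correlation, as in deep learning layers).

    For each output bin f, combine the contiguous bins f - r .. f + r,
    where r = len(weights) // 2. Out-of-range bins contribute zero.
    """
    n = len(spectrum)
    r = len(weights) // 2
    out = [0.0] * n
    for f in range(n):
        total = 0.0
        for j, w in enumerate(weights):
            idx = f + j - r  # neighbor offset relative to f
            if 0 <= idx < n:
                total += w * spectrum[idx]
        out[f] = total
    return out


if __name__ == "__main__":
    # A toy harmonic sound: fundamental at bin 3, overtones at bins 6 and 9.
    spectrum = [0.0] * 12
    spectrum[3] = spectrum[6] = spectrum[9] = 1.0
    weights = [1.0, 1.0, 1.0]

    print(harmonic_conv1d(spectrum, weights))
    # [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
    print(local_conv1d(spectrum, weights))
    # [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
```

In this toy example the harmonic kernel responds most strongly at the fundamental (bin 3), because its support lines up with the sound's overtones, while the local kernel only smooths neighboring bins and produces a flat response across the peaks.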