Head of AI+D Faculty, Professor of Electrical Engineering and Computer Science, MIT
Panel: Deep Time & Intelligence
In a series of recent papers and training-set designs, Antonio Torralba has been building a case for the centrality of multi-modal sensory perception in the future of AI. Whereas syntax once provided a framework for heuristic AI, and language can still help tag and structure non-linguistic data, Torralba is building neural networks that reason about motion given color and sound, or that tie sounds to images and motions to sounds. “Deep Time & Intelligence” grounds its discussion of AI training in Torralba’s expertise.
Biography: MIT CSAIL
CLEVRER is a diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. Motivated by the theory of human causal judgment, CLEVRER includes four types of questions: descriptive (e.g., “what color”), explanatory (“what’s responsible for”), predictive (“what will happen next”), and counterfactual (“what if”).
VirtualHome is a multi-agent platform to simulate activities in a household. Agents are represented as humanoid avatars, which can interact with the environment through high-level instructions. You can use VirtualHome to render videos of human activities, or train agents to perform complex tasks.
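As a sketch of what those high-level instructions look like, the snippet below writes a small VirtualHome-style activity program and parses it. Programs are sequences of steps of the form "[Action] <object> (instance_id)"; the particular program and the parser helper are illustrative, not taken from the official VirtualHome codebase.

```python
import re

# An illustrative VirtualHome-style activity program: each step is a
# high-level instruction "[Action] <object> (instance_id)".
program = [
    "[Walk] <fridge> (1)",
    "[Open] <fridge> (1)",
    "[Grab] <milk> (1)",
    "[Close] <fridge> (1)",
]

STEP = re.compile(r"\[(\w+)\] <(\w+)> \((\d+)\)")

def parse_step(step):
    """Split one step string into (action, object, instance id)."""
    action, obj, idx = STEP.match(step).groups()
    return action, obj, int(idx)

steps = [parse_step(s) for s in program]
```

A program like this can be handed to the simulator to render a video of the activity, or used as a supervision target when training agents.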
Harmonic Convolution is an operation that helps deep networks model priors in audio signals by explicitly exploiting harmonic structure. This is done by engineering the kernels to be supported on sets of harmonic series, rather than on local neighborhoods as standard convolutional kernels are.
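A toy sketch of the idea, assuming a magnitude spectrogram laid out as (frequency bins × time frames): for each base frequency bin, the kernel combines the bins at its integer multiples rather than its local neighbors. This is a simplified illustration of the principle, not the paper's exact formulation.

```python
import numpy as np

def harmonic_conv1d(spec, weights):
    """Toy harmonic convolution over the frequency axis.

    spec: (n_freq, n_time) magnitude spectrogram.
    weights: (n_harmonics,) one weight per harmonic.
    The output at base bin f sums the bins at f, 2f, 3f, ...
    instead of a local frequency neighborhood.
    """
    n_freq, _ = spec.shape
    out = np.zeros_like(spec)
    for f in range(1, n_freq):          # skip the DC bin
        for k, w in enumerate(weights):
            h = f * (k + 1)             # k-th harmonic of base bin f
            if h < n_freq:
                out[f] += w * spec[h]
    return out
```

Because the kernel's support follows the harmonic series, a tone with energy at a fundamental and its overtones is aggregated at the fundamental's bin, which is the prior the operation is designed to encode.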