ML asks “wHY?!”

Claudia Zhu
4 min read · Feb 17, 2021

Review of MIT’s WHY!

Conceptual Contributions

How we determine the “why” behind an image is an interesting and challenging question. One account attributes this ability to “theory of mind”: psychophysics researchers hypothesize that our capacity to reliably infer another person’s motivation stems from our ability to impute our own beliefs to others. That is, if we experience a certain emotion while doing something, we infer that others experience that emotion as well.

This paper seeks to computationally deduce the motivation behind people’s actions in images. The problem is challenging because it is far from clear how to reduce the reasoning behind theory of mind to a set of operations that a machine can run. However, since humans perform this task consistently, the authors believe that introducing the problem to the computer vision community will spur research in this direction, and they introduce a new dataset to facilitate that research. The dataset was created by a group of human workers who annotated why people are likely undertaking the actions shown in photographs. These annotations were then combined with state-of-the-art image features to train data-driven classifiers that predict a person’s motivation from images.
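
To make the "annotations plus image features" idea concrete, here is a minimal sketch of training a classifier that maps a feature vector to a motivation label. Everything in it is an assumption made for illustration: the three-dimensional feature vectors, the two motivation labels, and the nearest-centroid classifier itself, which is a simple stand-in and not the model or features used in the paper.

```python
from collections import defaultdict


def train_centroids(features, labels):
    """Average the feature vectors that share a motivation label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


# Hypothetical toy data: made-up "image features" paired with the
# motivation a human annotator might have written for each photo.
train_features = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]
train_labels = [
    "to relax",
    "to relax",
    "to catch a train",
    "to catch a train",
]

centroids = train_centroids(train_features, train_labels)
prediction = predict(centroids, [0.85, 0.15, 0.05])
print(prediction)
```

In a real pipeline the feature vectors would come from an image model and the labels from the human annotations; the point here is only the shape of the data-driven step, where labelled examples go in and a motivation predictor comes out.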

The paper presents an incipient framework for inferring people’s motivations in images. First, the authors propose to give computer vision systems access to many of the human experiences by using state-of-the-art language models on estimated on…
