From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions

Peter Young, Alice Lai, Micah Hodosh, Julia Hockenmaier


Abstract
We propose to use the visual denotations of linguistic expressions (i.e. the set of images they describe) to define novel denotational similarity metrics, which we show to be at least as beneficial as distributional similarities for two tasks that require semantic inference. To compute these denotational similarities, we construct a denotation graph, i.e. a subsumption hierarchy over constituents and their denotations, based on a large corpus of 30K images and 150K descriptive captions.
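The core idea in the abstract, treating an expression's meaning as the set of images it describes, can be illustrated with a minimal sketch. The Jaccard overlap used here and the toy image-ID sets are illustrative assumptions, not the paper's exact metric definitions:

```python
def denotational_similarity(den_a: set, den_b: set) -> float:
    """Jaccard overlap between the visual denotations
    (sets of image IDs) of two linguistic expressions."""
    if not den_a and not den_b:
        return 0.0
    return len(den_a & den_b) / len(den_a | den_b)

# Hypothetical denotations: IDs of images whose captions
# are described by each expression.
den = {
    "a dog runs": {1, 2, 3, 5},
    "an animal moves": {1, 2, 3, 4, 5, 6},
    "a cat sleeps": {7, 8},
}

print(denotational_similarity(den["a dog runs"], den["an animal moves"]))
```

Because "an animal moves" subsumes "a dog runs", their denotations overlap heavily, yielding a high similarity; unrelated expressions share no images and score zero.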
Anthology ID:
Q14-1006
Volume:
Transactions of the Association for Computational Linguistics, Volume 2
Year:
2014
Address:
Cambridge, MA
Editors:
Dekang Lin, Michael Collins, Lillian Lee
Venue:
TACL
Publisher:
MIT Press
Pages:
67–78
URL:
https://aclanthology.org/Q14-1006
DOI:
10.1162/tacl_a_00166
Cite (ACL):
Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2:67–78.
Cite (Informal):
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions (Young et al., TACL 2014)
PDF:
https://aclanthology.org/Q14-1006.pdf
Video:
https://aclanthology.org/Q14-1006.mp4
Data
Flickr30k