Can You Spot the Semantic Predicate in this Video?

Christopher Reale, Claire Bonial, Heesung Kwon, Clare Voss


Abstract
We propose a method to improve human activity recognition in video by leveraging semantic information about the target activities from an expert-defined linguistic resource, VerbNet. Our hypothesis is that activities that share similar event semantics, as defined by the semantic predicates of VerbNet, will be more likely to share some visual components. We use a deep convolutional neural network approach as a baseline and incorporate linguistic information from VerbNet through multi-task learning. We present results of experiments showing the added information has negligible impact on recognition performance. We discuss how this may be because the lexical semantic information defined by VerbNet is generally not visually salient given the video processing approach used here, and how we may handle this in future approaches.
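As a rough illustration of the multi-task setup described in the abstract, the sketch below shows a shared video feature backbone feeding two classification heads, one for the UCF101 activity label and one for an auxiliary VerbNet semantic-predicate label, with the two cross-entropy losses summed. This is a minimal assumed sketch, not the authors' exact model: the backbone stand-in, feature dimension, number of predicate classes, and the aux_weight term are all placeholders.

# Hedged multi-task sketch (assumed architecture, not the paper's exact network):
# a shared feature extractor feeds an activity head and an auxiliary
# VerbNet-predicate head; the two cross-entropy losses are combined.
import torch
import torch.nn as nn

class MultiTaskActivityNet(nn.Module):
    def __init__(self, feature_dim=512, num_activities=101, num_predicates=20):
        super().__init__()
        # Stand-in for a deep video CNN backbone; here a single linear layer
        # over precomputed clip features so the example stays self-contained.
        self.backbone = nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU())
        self.activity_head = nn.Linear(256, num_activities)   # UCF101 activity classes
        self.predicate_head = nn.Linear(256, num_predicates)  # VerbNet predicate classes (count is hypothetical)

    def forward(self, clip_features):
        shared = self.backbone(clip_features)
        return self.activity_head(shared), self.predicate_head(shared)

def multitask_loss(activity_logits, predicate_logits,
                   activity_labels, predicate_labels, aux_weight=0.5):
    ce = nn.CrossEntropyLoss()
    # aux_weight is a free hyperparameter controlling how strongly the
    # auxiliary predicate task influences the shared representation.
    return ce(activity_logits, activity_labels) + aux_weight * ce(predicate_logits, predicate_labels)

if __name__ == "__main__":
    model = MultiTaskActivityNet()
    feats = torch.randn(8, 512)          # batch of 8 clip feature vectors
    act_y = torch.randint(0, 101, (8,))  # activity labels
    pred_y = torch.randint(0, 20, (8,))  # VerbNet predicate labels
    act_logits, pred_logits = model(feats)
    loss = multitask_loss(act_logits, pred_logits, act_y, pred_y)
    loss.backward()
    print(loss.item())

In such a setup, the hope is that gradients from the predicate head push the shared features toward semantics shared across activities; the paper reports that this added information had negligible impact on recognition performance.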
Anthology ID:
W18-4307
Volume:
Proceedings of the Workshop Events and Stories in the News 2018
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, U.S.A.
Editors:
Tommaso Caselli, Ben Miller, Marieke van Erp, Piek Vossen, Martha Palmer, Eduard Hovy, Teruko Mitamura, David Caswell, Susan W. Brown, Claire Bonial
Venue:
EventStory
Publisher:
Association for Computational Linguistics
Pages:
55–60
URL:
https://aclanthology.org/W18-4307
Cite (ACL):
Christopher Reale, Claire Bonial, Heesung Kwon, and Clare Voss. 2018. Can You Spot the Semantic Predicate in this Video?. In Proceedings of the Workshop Events and Stories in the News 2018, pages 55–60, Santa Fe, New Mexico, U.S.A. Association for Computational Linguistics.
Cite (Informal):
Can You Spot the Semantic Predicate in this Video? (Reale et al., EventStory 2018)
PDF:
https://aclanthology.org/W18-4307.pdf
Data
UCF101