Shuhei Kurita


2023

pdf bib
ARKitSceneRefer: Text-based Localization of Small Objects in Diverse Real-World 3D Indoor Scenes
Shunya Kato | Shuhei Kurita | Chenhui Chu | Sadao Kurohashi
Findings of the Association for Computational Linguistics: EMNLP 2023

3D referring expression comprehension is a task to ground text representations onto objects in 3D scenes. It is a crucial task for indoor household robots or augmented reality devices to localize objects referred to in user instructions. However, existing indoor 3D referring expression comprehension datasets typically cover larger object classes that are easy to localize, such as chairs, tables, or doors, and often overlook small objects, such as cooking tools or office supplies. Based on the recently proposed diverse and high-resolution 3D scene dataset of ARKitScenes, we construct the ARKitSceneRefer dataset focusing on small daily-use objects that frequently appear in real-world indoor scenes. ARKitSceneRefer contains 15k objects of 1,605 indoor scenes, which are significantly larger than those of the existing 3D referring datasets, and covers diverse object classes of 583 from the LVIS dataset. In empirical experiments with both 2D and 3D state-of-the-art referring expression comprehension models, we observed the task difficulty of the localization in the diverse small object classes.

pdf bib
Query-based Image Captioning from Multi-context 360cdegree Images
Koki Maeda | Shuhei Kurita | Taiki Miyanishi | Naoaki Okazaki
Findings of the Association for Computational Linguistics: EMNLP 2023

A 360-degree image captures the entire scene without the limitations of a camera’s field of view, which makes it difficult to describe all the contexts in a single caption. We propose a novel task called Query-based Image Captioning (QuIC) for 360-degree images, where a query (words or short phrases) specifies the context to describe. This task is more challenging than the conventional image captioning task, which describes salient objects in images, as it requires fine-grained scene understanding to select the contents consistent with user’s intent based on the query. We construct a dataset for the new task that comprises 3,940 360-degree images and 18,459 pairs of queries and captions annotated manually. Experiments demonstrate that fine-tuning image captioning models further on our dataset can generate more diverse and controllable captions from multiple contexts of 360-degree images.

pdf bib
Language and Robotics: Toward Building Robots Coexisting with Human Society Using Language Interface
Yutaka Nakamura | Shuhei Kurita | Koichiro Yoshino
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Tutorial Abstract

2022

pdf bib
Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows
Keisuke Shirai | Atsushi Hashimoto | Taichi Nishimura | Hirotaka Kameko | Shuhei Kurita | Yoshitaka Ushiku | Shinsuke Mori
Proceedings of the 29th International Conference on Computational Linguistics

We present a new multimodal dataset called Visual Recipe Flow, which enables us to learn a cooking action result for each object in a recipe text. The dataset consists of object state changes and the workflow of the recipe text. The state change is represented as an image pair, while the workflow is represented as a recipe flow graph. We developed a web interface to reduce human annotation costs. The dataset allows us to try various applications, including multimodal information retrieval.

pdf bib
Iterative Span Selection: Self-Emergence of Resolving Orders in Semantic Role Labeling
Shuhei Kurita | Hiroki Ouchi | Kentaro Inui | Satoshi Sekine
Proceedings of the 29th International Conference on Computational Linguistics

Semantic Role Labeling (SRL) is the task of labeling semantic arguments for marked semantic predicates. Semantic arguments and their predicates are related in various distinct manners, of which certain semantic arguments are a necessity while others serve as an auxiliary to their predicates. To consider such roles and relations of the arguments in the labeling order, we introduce iterative argument identification (IAI), which combines global decoding and iterative identification for the semantic arguments. In experiments, we first realize that the model with random argument labeling orders outperforms other heuristic orders such as the conventional left-to-right labeling order. Combined with simple reinforcement learning, the proposed model spontaneously learns the optimized labeling orders that are different from existing heuristic orders. The proposed model with the IAI algorithm achieves competitive or outperforming results from the existing models in the standard benchmark datasets of span-based SRL: CoNLL-2005 and CoNLL-2012.

2021

pdf bib
Co-Teaching Student-Model through Submission Results of Shared Task
Kouta Nakayama | Shuhei Kurita | Akio Kobayashi | Yukino Baba | Satoshi Sekine
Findings of the Association for Computational Linguistics: EMNLP 2021

Shared tasks have a long history and have become the mainstream of NLP research. Most of the shared tasks require participants to submit only system outputs and descriptions. It is uncommon for the shared task to request submission of the system itself because of the license issues and implementation differences. Therefore, many systems are abandoned without being used in real applications or contributing to better systems. In this research, we propose a scheme to utilize all those systems which participated in the shared tasks. We use all participated system outputs as task teachers in this scheme and develop a new model as a student aiming to learn the characteristics of each system. We call this scheme “Co-Teaching.” This scheme creates a unified system that performs better than the task’s single best system. It only requires the system outputs, and slightly extra effort is needed for the participants and organizers. We apply this scheme to the “SHINRA2019-JP” shared task, which has nine participants with various output accuracies, confirming that the unified system outperforms the best system. Moreover, the code used in our experiments has been released.

2019

pdf bib
Multi-Task Semantic Dependency Parsing with Policy Gradient for Learning Easy-First Strategies
Shuhei Kurita | Anders Søgaard
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In Semantic Dependency Parsing (SDP), semantic relations form directed acyclic graphs, rather than trees. We propose a new iterative predicate selection (IPS) algorithm for SDP. Our IPS algorithm combines the graph-based and transition-based parsing approaches in order to handle multiple semantic head words. We train the IPS model using a combination of multi-task learning and task-specific policy gradient training. Trained this way, IPS achieves a new state of the art on the SemEval 2015 Task 18 datasets. Furthermore, we observe that policy gradient training learns an easy-first strategy.

2018

pdf bib
Neural Adversarial Training for Semi-supervised Japanese Predicate-argument Structure Analysis
Shuhei Kurita | Daisuke Kawahara | Sadao Kurohashi
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Japanese predicate-argument structure (PAS) analysis involves zero anaphora resolution, which is notoriously difficult. To improve the performance of Japanese PAS analysis, it is straightforward to increase the size of corpora annotated with PAS. However, since it is prohibitively expensive, it is promising to take advantage of a large amount of raw corpora. In this paper, we propose a novel Japanese PAS analysis model based on semi-supervised adversarial training with a raw corpus. In our experiments, our model outperforms existing state-of-the-art models for Japanese PAS analysis.

2017

pdf bib
Neural Joint Model for Transition-based Chinese Syntactic Analysis
Shuhei Kurita | Daisuke Kawahara | Sadao Kurohashi
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present neural network-based joint models for Chinese word segmentation, POS tagging and dependency parsing. Our models are the first neural approaches for fully joint Chinese analysis that is known to prevent the error propagation problem of pipeline models. Although word embeddings play a key role in dependency parsing, they cannot be applied directly to the joint task in the previous work. To address this problem, we propose embeddings of character strings, in addition to words. Experiments show that our models outperform existing systems in Chinese word segmentation and POS tagging, and perform preferable accuracies in dependency parsing. We also explore bi-LSTM models with fewer features.