Akiko Aizawa

Also published as: Akiko N. Aizawa


2019

Unsupervised Rewriter for Multi-Sentence Compression
Yang Zhao | Xiaoyu Shen | Wei Bi | Akiko Aizawa

Multi-sentence compression (MSC) aims to generate a grammatical but reduced compression from multiple input sentences while retaining their key information. The previously dominant approach to MSC is the extraction-based word graph approach. A few variants further leveraged lexical substitution to yield more abstractive compressions. However, two limitations exist. First, the word graph approach, which simply concatenates fragments from multiple sentences, may yield non-fluent or ungrammatical compressions. Second, lexical substitution is often inappropriate when context information is not considered. To tackle these issues, we present a neural rewriter for multi-sentence compression that does not need any parallel corpus. Empirical studies show that our approach achieves comparable results on automatic evaluation and improves the grammaticality of compressions according to human evaluation. A parallel corpus with more than 140,000 (sentence group, compression) pairs is also constructed as a by-product for future research.

2018

Using Formulaic Expressions in Writing Assistance Systems
Kenichi Iwatsuki | Akiko Aizawa

Formulaic expressions (FEs) used in scholarly papers, such as ‘there has been little discussion about’, are helpful for non-native English speakers. However, it is time-consuming for users to manually search for an appropriate expression every time they want to consult FE dictionaries. For this reason, we tackle the task of semantic searches of FE dictionaries. At the start of our research, we identified two salient difficulties in this task. First, the paucity of example sentences in existing FE dictionaries results in a shortage of context information, which is necessary for acquiring semantic representations of FEs. Second, while a semantic category label is assigned to each FE in many FE dictionaries, it is difficult to predict the labels from user input, forcing users to manually designate the semantic category when searching. To address these difficulties, we propose a new framework for semantic searches of FEs and a new method that leverages both existing dictionaries and domain sentence corpora. Further, we expand an existing FE dictionary, both as a step toward building a more comprehensive and domain-specific FE dictionary and to verify the effectiveness of our method.

What Makes Reading Comprehension Questions Easier?
Saku Sugawara | Kentaro Inui | Satoshi Sekine | Akiko Aizawa

A challenge in creating a dataset for machine reading comprehension (MRC) is to collect questions that require a sophisticated understanding of language to answer beyond using superficial cues. In this work, we investigate what makes questions easier across 12 recent MRC datasets with three question styles (answer extraction, description, and multiple choice). We propose to employ simple heuristics to split each dataset into easy and hard subsets and examine the performance of two baseline models for each of the subsets. We then manually annotate questions sampled from each subset with both validity and requisite reasoning skills to investigate which skills explain the difference between easy and hard questions. From this study, we observed that (i) the baseline performances for the hard subsets degrade remarkably compared to those for the entire datasets, (ii) hard questions require knowledge inference and multiple-sentence reasoning in comparison with easy questions, and (iii) multiple-choice questions tend to require a broader range of reasoning skills than answer extraction and description questions. These results suggest that one might overestimate recent advances in MRC.

UC3M-NII Team at SemEval-2018 Task 7: Semantic Relation Classification in Scientific Papers via Convolutional Neural Network
Víctor Suárez-Paniagua | Isabel Segura-Bedmar | Akiko Aizawa

This paper reports our participation in SemEval-2018 Task 7 on extraction and classification of relationships between entities in scientific papers. Our approach is based on the use of a Convolutional Neural Network (CNN) trained on 350 abstracts with manually annotated entities and relations. Our hypothesis is that this deep learning model can be applied to extract and classify relations between entities in scientific papers at the same time. We use the Part-of-Speech and the distances to the target entities as part of the embedding for each word, and we blind all the entities by marker names. In addition, we use sampling techniques to overcome the imbalance issues of this dataset. Our architecture obtained an F1-score of 35.4% for the relation extraction task and 18.5% for the relation classification task with a basic configuration of the one-step CNN.

Universal Dependencies for Ainu
Hajime Senuma | Akiko Aizawa

A Language Model based Evaluator for Sentence Compression
Yang Zhao | Zhiyuan Luo | Akiko Aizawa

We herein present a language-model-based evaluator for deletion-based sentence compression and view this task as a series of deletion-and-evaluation operations using the evaluator. More specifically, the evaluator is a syntactic neural language model that is first built by learning the syntactic and structural collocation among words. Subsequently, a series of trial-and-error deletion operations are conducted on the source sentences via a reinforcement learning framework to obtain the best target compression. An empirical study shows that the proposed model can effectively generate more readable compressions, comparable or superior to those of several strong baselines. Furthermore, we introduce a 200-sentence test set for a large-scale dataset, setting a new baseline for future research.

2017

Seq2seq for Morphological Reinflection: When Deep Learning Fails
Hajime Senuma | Akiko Aizawa

Evaluation Metrics for Machine Reading Comprehension: Prerequisite Skills and Readability
Saku Sugawara | Yusuke Kido | Hikaru Yokono | Akiko Aizawa

Knowing the quality of reading comprehension (RC) datasets is important for the development of natural-language understanding systems. In this study, two classes of metrics were adopted for evaluating RC datasets: prerequisite skills and readability. We applied these classes to six existing datasets, including MCTest and SQuAD, and highlighted the characteristics of the datasets according to each metric and the correlation between the two classes. Our dataset analysis suggests that the readability of RC datasets does not directly affect the question difficulty and that it is possible to create an RC dataset that is easy to read but difficult to answer.

A Conditional Variational Framework for Dialog Generation
Xiaoyu Shen | Hui Su | Yanran Li | Wenjie Li | Shuzi Niu | Yang Zhao | Akiko Aizawa | Guoping Long

Deep latent variable models have been shown to facilitate response generation for open-domain dialog systems. However, these latent variables are highly randomized, leading to uncontrollable generated responses. In this paper, we propose a framework allowing conditional response generation based on specific attributes. These attributes can be either manually assigned or automatically detected. Moreover, the dialog states for both speakers are modeled separately in order to reflect personal features. We validate this framework on two different scenarios, where the attribute refers to genericness and sentiment states, respectively. The experimental results demonstrate the potential of our model, which can generate meaningful responses in accordance with the specified attributes.

Toward Universal Dependencies for Ainu
Hajime Senuma | Akiko Aizawa

2016

Discourse Relation Sense Classification with Two-Step Classifiers
Yusuke Kido | Akiko Aizawa

Measuring Cognitive Translation Effort with Activity Units
Moritz Jonas Schaeffer | Michael Carl | Isabel Lacruz | Akiko Aizawa

An Analysis of Prerequisite Skills for Reading Comprehension
Saku Sugawara | Akiko Aizawa

Typed Entity and Relation Annotation on Computer Science Papers
Yuka Tateisi | Tomoko Ohta | Sampo Pyysalo | Yusuke Miyao | Akiko Aizawa

We describe our ongoing effort to establish an annotation scheme for describing the semantic structures of research articles in the computer science domain, with the intended use of developing search systems that can refine their results by the roles of the entities denoted by the query keys. In our scheme, mentions of entities are annotated with ontology-based types, and the roles of the entities are annotated as relations with other entities described in the text. So far, we have annotated 400 abstracts from the ACL anthology and the ACM digital library. In this paper, the scheme and the annotated dataset are described, along with the problems found in the course of annotation. We also show the results of automatic annotation and evaluate the corpus in a practical setting in application to topic extraction.

English-to-Japanese Translation vs. Dictation vs. Post-editing: Comparing Translation Modes in a Multilingual Setting
Michael Carl | Akiko Aizawa | Masaru Yamada

Speech-enabled interfaces have the potential to become one of the most efficient and ergonomic environments for human-computer interaction and for text production. However, not much research has been carried out to investigate in detail the processes and strategies involved in the different modes of text production. This paper introduces and evaluates a corpus of more than 55 hours of English-to-Japanese user activity data that were collected within the ENJA15 project, in which translators were observed while writing and speaking translations (translation dictation) and during machine translation post-editing. The transcription of the spoken data, keyboard logging and eye-tracking data were recorded with Translog-II, post-processed and integrated into the CRITT Translation Process Research-DB (TPR-DB), which is publicly available under a creative commons license. The paper presents the ENJA15 data as part of a large multilingual Chinese, Danish, German, Hindi and Spanish translation process data collection of more than 760 translation sessions. It compares the ENJA15 data with the other language pairs and reviews some of its particularities.

Learning Succinct Models: Pipelined Compression with L1-Regularization, Hashing, Elias-Fano Indices, and Quantization
Hajime Senuma | Akiko Aizawa

The recent proliferation of smart devices necessitates methods to learn small-sized models. This paper demonstrates that if there are m features in total but only n = o(√m) features are required to distinguish examples, with Ω(log m) training examples and reasonable settings, it is possible to obtain a good model in a succinct representation using n log2(m/n) + o(m) bits, by using a pipeline of existing compression methods: L1-regularized logistic regression, feature hashing, Elias–Fano indices, and randomized quantization. An experiment shows that a noun phrase chunking task for which an existing library requires 27 megabytes can be compressed to less than 13 kilobytes without notable loss of accuracy.

SideNoter: Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation
Takeshi Abekawa | Akiko Aizawa

In this paper, we discuss our ongoing efforts to construct a scientific paper browsing system that helps users to read and understand advanced technical content distributed in PDF. Since PDF is a format specifically designed for printing, the layout and logical structures of documents are indistinguishably embedded in the file. It requires much effort to extract natural language text from PDF files and, conversely, to display semantic annotations produced by NLP tools on the original page layout. In our browsing system, we tackle these issues caused by the gap between printable documents and plain text. Our system provides ways to extract natural language sentences from PDF files together with their logical structures, and also to map arbitrary textual spans to their corresponding regions on page images. We set up a demonstration system using papers published in the ACL Anthology and demonstrate the enhanced search and refined recommendation functions, which we plan to make widely available to NLP researchers.

Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Tutorial Abstracts
Marcello Federico | Akiko Aizawa

2015

CroVeWA: Crosslingual Vector-Based Writing Assistance
Hubert Soyer | Goran Topić | Pontus Stenetorp | Akiko Aizawa

Technical Term Extraction Using Measures of Neology
Christopher Norman | Akiko Aizawa

Distant-supervised Language Model for Detecting Emotional Upsurge on Twitter
Yoshinari Fujinuma | Hikaru Yokono | Pascual Martínez-Gómez | Akiko Aizawa

2014

Corpus for Coreference Resolution on Scientific Papers
Panot Chaimongkol | Akiko Aizawa | Yuka Tateisi

Annotation of Computer Science Papers for Semantic Relation Extraction
Yuka Tateisi | Yo Shidahara | Yusuke Miyao | Akiko Aizawa

Significance of Bridging Real-world Documents and NLP Technologies
Tadayoshi Hara | Goran Topić | Yusuke Miyao | Akiko Aizawa

Japanese to English Machine Translation using Preordering and Compositional Distributed Semantics
Sho Hoshino | Hubert Soyer | Yusuke Miyao | Akiko Aizawa

2013

Sense Disambiguation: From Natural Language Words to Mathematical Terms
Minh-Quoc Nghiem | Giovanni Yoko Kristianto | Goran Topić | Akiko Aizawa

Diagnosing Causes of Reading Difficulty using Bayesian Networks
Pascual Martínez-Gómez | Akiko Aizawa

Relation Annotation for Understanding Research Papers
Yuka Tateisi | Yo Shidahara | Yusuke Miyao | Akiko Aizawa

Modeling Comma Placement in Chinese Text for Better Readability using Linguistic Features and Gaze Information
Tadayoshi Hara | Chen Chen | Yoshinobu Kano | Akiko Aizawa

2012

Building Japanese Predicate-argument Structure Corpus using Lexical Conceptual Structure
Yuichiroh Matsubayashi | Yusuke Miyao | Akiko Aizawa

Automatic Translation of Scholarly Terms into Patent Terms Using Synonym Extraction Techniques
Hidetsugu Nanba | Toshiyuki Takezawa | Kiyoko Uchiyama | Akiko Aizawa

Predicting Word Fixations in Text with a CRF Model for Capturing General Reading Strategies among Readers
Tadayoshi Hara | Daichi Mochihashi | Yoshinobu Kano | Akiko Aizawa

Recognizing Personal Characteristics of Readers using Eye-Movements and Text Features
Pascual Martínez-Gómez | Tadayoshi Hara | Akiko Aizawa

Framework of Semantic Role Assignment based on Extended Lexical Conceptual Structure: Comparison with VerbNet and FrameNet
Yuichiroh Matsubayashi | Yusuke Miyao | Akiko Aizawa

2011

Clustering Comparable Corpora For Bilingual Lexicon Extraction
Bo Li | Eric Gaussier | Akiko Aizawa

Analyzing the characteristics of academic paper categories by using an index of representativeness
Takafumi Suzuki | Kiyoko Uchiyama | Ryota Tomisaka | Akiko Aizawa

2010

Mining Coreference Relations between Formulas and Text using Wikipedia
Minh Nghiem Quoc | Keisuke Yokoi | Yuichiroh Matsubayashi | Akiko Aizawa

2003

Analysis of Source Identified Text Corpora: Exploring the Statistics of the Reused Text and Authorship
Akiko Aizawa

2002

A Method of Cluster-Based Indexing of Textual Data
Akiko Aizawa

2000

Automatic Thesaurus Generation through Multiple Filtering
Kyo Kageura | Keita Tsuji | Akiko N. Aizawa