Weiwei Sun


2019

pdf bib
Parsing Chinese Sentences with Grammatical Relations
Weiwei Sun | Yufei Chen | Xiaojun Wan | Meichun Liu
Computational Linguistics, Volume 45, Issue 1 - March 2019

We report our work on building linguistic resources and data-driven parsers in the grammatical relation (GR) analysis for Mandarin Chinese. Chinese, as an analytic language, encodes grammatical information in a highly configurational rather than morphological way. Accordingly, it is possible and reasonable to represent almost all grammatical relations as bilexical dependencies. In this work, we propose to represent grammatical information using general directed dependency graphs. Both only-local and rich long-distance dependencies are explicitly represented. To create high-quality annotations, we take advantage of an existing TreeBank, namely, Chinese TreeBank (CTB), which is grounded on the Government and Binding theory. We define a set of linguistic rules to explore CTB’s implicit phrase structural information and build deep dependency graphs. The reliability of this linguistically motivated GR extraction procedure is highlighted by manual evaluation. Based on the converted corpus, data-driven, including graph- and transition-based, models are explored for Chinese GR parsing. For graph-based parsing, a new perspective, graph merging, is proposed for building flexible dependency graphs: constructing complex graphs via constructing simple subgraphs. Two key problems are discussed in this perspective: (1) how to decompose a complex graph into simple subgraphs, and (2) how to combine subgraphs into a coherent complex graph. For transition-based parsing, we introduce a neural parser based on a list-based transition system. We also discuss several other key problems, including dynamic oracle and beam search for neural transition-based parsing. Evaluation gauges how successful GR parsing for Chinese can be by applying data-driven models. The empirical analysis suggests several directions for future study.

pdf bib
CUNY-PKU Parser at SemEval-2019 Task 1: Cross-Lingual Semantic Parsing with UCCA
Weimin Lyu | Sheng Huang | Abdul Rafae Khan | Shengqiang Zhang | Weiwei Sun | Jia Xu
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the systems of the CUNY-PKU team in SemEval 2019 Task 1: Cross-lingual Semantic Parsing with UCCA. We introduce a novel model by applying a cascaded MLP and BiLSTM model. Then, we ensemble multiple system-outputs by reparsing. In particular, we introduce a new decoding algorithm for building the UCCA representation. Our system won the first place in one track (French-20K-Open), second places in four tracks (English-Wiki-Open, English-20K-Open, German-20K-Open, and German-20K-Closed), and third place in one track (English-20K-Closed), among all seven tracks.

pdf bib
Graph-Based Meaning Representations: Design and Processing
Alexander Koller | Stephan Oepen | Weiwei Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

This tutorial is on representing and processing sentence meaning in the form of labeled directed graphs. The tutorial will (a) briefly review relevant background in formal and linguistic semantics; (b) semi-formally define a unified abstract view on different flavors of semantic graphs and associated terminology; (c) survey common frameworks for graph-based meaning representation and available graph banks; and (d) offer a technical overview of a representative selection of different parsing approaches.

2018

pdf bib
Semantic Role Labeling for Learner Chinese: the Importance of Syntactic Parsing and L2-L1 Parallel Data
Zi Lin | Yuguang Duan | Yuanyuan Zhao | Weiwei Sun | Xiaojun Wan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper studies semantic parsing for interlanguage (L2), taking semantic role labeling (SRL) as a case task and learner Chinese as a case language. We first manually annotate the semantic roles for a set of learner texts to derive a gold standard for automatic SRL. Based on the new data, we then evaluate three off-the-shelf SRL systems, i.e., the PCFGLA-parser-based, neural-parser-based and neural-syntax-agnostic systems, to gauge how successful SRL for learner Chinese can be. We find two non-obvious facts: 1) the L1-sentence-trained systems performs rather badly on the L2 data; 2) the performance drop from the L1 data to the L2 data of the two parser-based systems is much smaller, indicating the importance of syntactic parsing in SRL for interlanguages. Finally, the paper introduces a new agreement-based model to explore the semantic coherency information in the large-scale L2-L1 parallel data. We then show such information is very effective to enhance SRL for learner texts. Our model achieves an F-score of 72.06, which is a 2.02 point improvement over the best baseline.

pdf bib
Neural Maximum Subgraph Parsing for Cross-Domain Semantic Dependency Analysis
Yufei Chen | Sheng Huang | Fang Wang | Junjie Cao | Weiwei Sun | Xiaojun Wan
Proceedings of the 22nd Conference on Computational Natural Language Learning

We present experiments for cross-domain semantic dependency analysis with a neural Maximum Subgraph parser. Our parser targets 1-endpoint-crossing, pagenumber-2 graphs which are a good fit to semantic dependency graphs, and utilizes an efficient dynamic programming algorithm for decoding. For disambiguation, the parser associates words with BiLSTM vectors and utilizes these vectors to assign scores to candidate dependencies. We conduct experiments on the data sets from SemEval 2015 as well as Chinese CCGBank. Our parser achieves very competitive results for both English and Chinese. To improve the parsing performance on cross-domain texts, we propose a data-oriented method to explore the linguistic generality encoded in English Resource Grammar, which is a precisionoriented, hand-crafted HPSG grammar, in an implicit way. Experiments demonstrate the effectiveness of our data-oriented method across a wide range of conditions.

pdf bib
Accurate SHRG-Based Semantic Parsing
Yufei Chen | Weiwei Sun | Xiaojun Wan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We demonstrate that an SHRG-based parser can produce semantic graphs much more accurately than previously shown, by relating synchronous production rules to the syntacto-semantic composition process. Our parser achieves an accuracy of 90.35 for EDS (89.51 for DMRS) in terms of elementary dependency match, which is a 4.87 (5.45) point improvement over the best existing data-driven model, indicating, in our view, the importance of linguistically-informed derivation for data-driven semantic parsing. This accuracy is equivalent to that of English Resource Grammar guided models, suggesting that (recurrent) neural network models are able to effectively learn deep linguistic knowledge from annotations.

pdf bib
Language Generation via DAG Transduction
Yajie Ye | Weiwei Sun | Xiaojun Wan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A DAG automaton is a formal device for manipulating graphs. By augmenting a DAG automaton with transduction rules, a DAG transducer has potential applications in fundamental NLP tasks. In this paper, we propose a novel DAG transducer to perform graph-to-program transformation. The target structure of our transducer is a program licensed by a declarative programming language rather than linguistic structures. By executing such a program, we can easily get a surface string. Our transducer is designed especially for natural language generation (NLG) from type-logical semantic graphs. Taking Elementary Dependency Structures, a format of English Resource Semantics, as input, our NLG system achieves a BLEU-4 score of 68.07. This remarkable result demonstrates the feasibility of applying a DAG transducer to resolve NLG, as well as the effectiveness of our design.

pdf bib
Pre- and In-Parsing Models for Neural Empty Category Detection
Yufei Chen | Yuanyuan Zhao | Weiwei Sun | Xiaojun Wan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Motivated by the positive impact of empty category on syntactic parsing, we study neural models for pre- and in-parsing detection of empty category, which has not previously been investigated. We find several non-obvious facts: (a) BiLSTM can capture non-local contextual information which is essential for detecting empty categories, (b) even with a BiLSTM, syntactic information is still able to enhance the detection, and (c) automatic detection of empty categories improves parsing quality for overt words. Our neural ECD models outperform the prior state-of-the-art by significant margins.

2017

pdf bib
Parsing for Grammatical Relations via Graph Merging
Weiwei Sun | Yantao Du | Xiaojun Wan
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

This paper is concerned with building deep grammatical relation (GR) analysis using data-driven approach. To deal with this problem, we propose graph merging, a new perspective, for building flexible dependency graphs: Constructing complex graphs via constructing simple subgraphs. We discuss two key problems in this perspective: (1) how to decompose a complex graph into simple subgraphs, and (2) how to combine subgraphs into a coherent complex graph. Experiments demonstrate the effectiveness of graph merging. Our parser reaches state-of-the-art performance and is significantly better than two transition-based parsers.

pdf bib
The Covert Helps Parse the Overt
Xun Zhang | Weiwei Sun | Xiaojun Wan
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

This paper is concerned with whether deep syntactic information can help surface parsing, with a particular focus on empty categories. We design new algorithms to produce dependency trees in which empty elements are allowed, and evaluate the impact of information about empty category on parsing overt elements. Such information is helpful to reduce the approximation error in a structured parsing model, but increases the search space for inference and accordingly the estimation error. To deal with structure-based overfitting, we propose to integrate disambiguation models with and without empty elements, and perform structure regularization via joint decoding. Experiments on English and Chinese TreeBanks with different parsing models indicate that incorporating empty elements consistently improves surface parsing.

pdf bib
Semantic Dependency Parsing via Book Embedding
Weiwei Sun | Junjie Cao | Xiaojun Wan
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We model a dependency graph as a book, a particular kind of topological space, for semantic dependency parsing. The spine of the book is made up of a sequence of words, and each page contains a subset of noncrossing arcs. To build a semantic graph for a given sentence, we design new Maximum Subgraph algorithms to generate noncrossing graphs on each page, and a Lagrangian Relaxation-based algorithm tocombine pages into a book. Experiments demonstrate the effectiveness of the bookembedding framework across a wide range of conditions. Our parser obtains comparable results with a state-of-the-art transition-based parser.

pdf bib
Parsing to 1-Endpoint-Crossing, Pagenumber-2 Graphs
Junjie Cao | Sheng Huang | Weiwei Sun | Xiaojun Wan
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We study the Maximum Subgraph problem in deep dependency parsing. We consider two restrictions to deep dependency graphs: (a) 1-endpoint-crossing and (b) pagenumber-2. Our main contribution is an exact algorithm that obtains maximum subgraphs satisfying both restrictions simultaneously in time O(n5). Moreover, ignoring one linguistically-rare structure descreases the complexity to O(n4). We also extend our quartic-time algorithm into a practical parser with a discriminative disambiguation model and evaluate its performance on four linguistic data sets used in semantic dependency parsing.

pdf bib
Quasi-Second-Order Parsing for 1-Endpoint-Crossing, Pagenumber-2 Graphs
Junjie Cao | Sheng Huang | Weiwei Sun | Xiaojun Wan
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a new Maximum Subgraph algorithm for first-order parsing to 1-endpoint-crossing, pagenumber-2 graphs. Our algorithm has two characteristics: (1) it separates the construction for noncrossing edges and crossing edges; (2) in a single construction step, whether to create a new arc is deterministic. These two characteristics make our algorithm relatively easy to be extended to incorporiate crossing-sensitive second-order features. We then introduce a new algorithm for quasi-second-order parsing. Experiments demonstrate that second-order features are helpful for Maximum Subgraph parsing.

2016

pdf bib
Transition-Based Parsing for Deep Dependency Structures
Xun Zhang | Yantao Du | Weiwei Sun | Xiaojun Wan
Computational Linguistics, Volume 42, Issue 3 - September 2016

pdf bib
Towards Accurate and Efficient Chinese Part-of-Speech Tagging
Weiwei Sun | Xiaojun Wan
Computational Linguistics, Volume 42, Issue 3 - September 2016

2015

pdf bib
A Data-Driven, Factorization Parser for CCG Dependency Structures
Yantao Du | Weiwei Sun | Xiaojun Wan
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Peking: Building Semantic Dependency Graphs with a Hybrid Parser
Yantao Du | Fan Zhang | Xun Zhang | Weiwei Sun | Xiaojun Wan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

pdf bib
Grammatical Relations in Chinese: GB-Ground Extraction and Data-Driven Parsing
Weiwei Sun | Yantao Du | Xin Kou | Shuoyang Ding | Xiaojun Wan
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Peking: Profiling Syntactic Tree Parsing Techniques for Semantic Graph Parsing
Yantao Du | Fan Zhang | Weiwei Sun | Xiaojun Wan
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

pdf bib
Capturing Long-distance Dependencies in Sequence Models: A Case Study of Chinese Part-of-speech Tagging
Weiwei Sun | Xiaochang Peng | Xiaojun Wan
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Data-driven, PCFG-based and Pseudo-PCFG-based Models for Chinese Dependency Parsing
Weiwei Sun | Xiaojun Wan
Transactions of the Association for Computational Linguistics, Volume 1

We present a comparative study of transition-, graph- and PCFG-based models aimed at illuminating more precisely the likely contribution of CFGs in improving Chinese dependency parsing accuracy, especially by combining heterogeneous models. Inspired by the impact of a constituency grammar on dependency parsing, we propose several strategies to acquire pseudo CFGs only from dependency annotations. Compared to linguistic grammars learned from rich phrase-structure treebanks, well designed pseudo grammars achieve similar parsing accuracy and have equivalent contributions to parser ensemble. Moreover, pseudo grammars increase the diversity of base models; therefore, together with all other models, further improve system combination. Based on automatic POS tagging, our final model achieves a UAS of 87.23%, resulting in a significant improvement of the state of the art.

2012

pdf bib
Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations
Weiwei Sun | Xiaojun Wan
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
Weiwei Sun | Hans Uszkoreit
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Semantic Cohesion Model for Phrase-Based SMT
Minwei Feng | Weiwei Sun | Hermann Ney
Proceedings of COLING 2012

2011

pdf bib
A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Weiwei Sun
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Enhancing Chinese Word Segmentation Using Unlabeled Data
Weiwei Sun | Jia Xu
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf bib
Semantics-Driven Shallow Parsing for Chinese Semantic Role Labeling
Weiwei Sun
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Improving Chinese Semantic Role Labeling with Rich Syntactic Features
Weiwei Sun
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Discriminative Parse Reranking for Chinese with Homogeneous and Heterogeneous Annotations
Weiwei Sun | Rui Wang | Yi Zhang
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
Automatic Acquisition of Chinese Novel Noun Compounds
Meng Wang | Chu-Ren Huang | Shiwen Yu | Weiwei Sun
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

pdf bib
Word-based and Character-based Word Segmentation Models: Comparison and Combination
Weiwei Sun
Coling 2010: Posters

2009

pdf bib
Chinese Semantic Role Labeling with Shallow Parsing
Weiwei Sun | Zhifang Sui | Meng Wang | Xin Wang
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Chinese Function Tag Labeling
Weiwei Sun | Zhifang Sui
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

pdf bib
Prediction of Thematic Rank for Structured Semantic Role Labeling
Weiwei Sun | Zhifang Sui | Meng Wang
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

pdf bib
Prediction of Maximal Projection for Semantic Role Labeling
Weiwei Sun | Zhifang Sui | Haifeng Wang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
The Integration of Dependency Relation Classification and Semantic Role Labeling Using Bilayer Maximum Entropy Markov Models
Weiwei Sun | Hongzhan Li | Zhifang Sui
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning