Bilingual Autoencoders with Global Descriptors for Modeling Parallel Sentences

Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, Min Zhang


Abstract
Parallel sentence representations are important for bilingual and cross-lingual tasks in natural language processing. In this paper, we explore a bilingual autoencoder approach to modeling parallel sentences. We extract sentence-level global descriptors (e.g., min, max) from word embeddings and construct two monolingual autoencoders over these descriptors for the source and target languages. To tightly connect the two autoencoders with bilingual correspondences, we force them to share the same decoding parameters and minimize a corpus-level semantic distance between the two languages. Optimized towards a joint objective of reconstruction and semantic errors, our bilingual autoencoder learns continuous-valued latent representations for parallel sentences. Both intrinsic evaluations and extrinsic evaluations on statistical machine translation tasks show that our autoencoder achieves substantial improvements over the baselines.
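
The abstract describes min/max global descriptors pooled from word embeddings and two autoencoders tied by a shared decoder plus a semantic distance between the latent codes of parallel sentences. The following Python sketch illustrates that idea only; it is not the authors' implementation, and the layer sizes, activations, and the alpha weight on the semantic term are illustrative assumptions.

import torch
import torch.nn as nn

def global_descriptors(word_embs: torch.Tensor) -> torch.Tensor:
    """Concatenate element-wise min and max over a sentence's word embeddings.
    word_embs: (sent_len, emb_dim) -> returns a vector of size 2 * emb_dim."""
    return torch.cat([word_embs.min(dim=0).values, word_embs.max(dim=0).values])

class BilingualAutoencoder(nn.Module):
    """Two language-specific encoders with a shared decoder (illustrative sketch)."""
    def __init__(self, desc_dim: int, hidden_dim: int):
        super().__init__()
        self.enc_src = nn.Linear(desc_dim, hidden_dim)   # source-language encoder
        self.enc_tgt = nn.Linear(desc_dim, hidden_dim)   # target-language encoder
        self.dec_shared = nn.Linear(hidden_dim, desc_dim)  # shared decoding parameters

    def forward(self, x_src, x_tgt):
        h_src = torch.sigmoid(self.enc_src(x_src))
        h_tgt = torch.sigmoid(self.enc_tgt(x_tgt))
        return h_src, h_tgt, self.dec_shared(h_src), self.dec_shared(h_tgt)

def joint_loss(x_src, x_tgt, model, alpha=0.5):
    """Reconstruction error for both languages plus a semantic distance
    between the latent codes of the parallel sentence pair (alpha is assumed)."""
    h_src, h_tgt, rec_src, rec_tgt = model(x_src, x_tgt)
    reconstruction = ((rec_src - x_src) ** 2).mean() + ((rec_tgt - x_tgt) ** 2).mean()
    semantic = ((h_src - h_tgt) ** 2).mean()
    return reconstruction + alpha * semantic

Under these assumptions, training would sum the joint loss over a parallel corpus, so that shared decoding parameters and the semantic term together pull source and target representations of the same sentence pair close in the latent space.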
Anthology ID:
C16-1240
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
2548–2558
URL:
https://aclanthology.org/C16-1240
Cite (ACL):
Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, and Min Zhang. 2016. Bilingual Autoencoders with Global Descriptors for Modeling Parallel Sentences. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2548–2558, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Bilingual Autoencoders with Global Descriptors for Modeling Parallel Sentences (Zhang et al., COLING 2016)
PDF:
https://aclanthology.org/C16-1240.pdf