Multilingual Corpora
This page lists multilingual corpora. For monolingual corpora, see List of resources by language.
See also Multilingual resources.
- ACQUIS COMMUNAUTAIRE Multilingual Corpus
- CELEX - The Dutch Center for Lexical Information
- CHILDES - Child Language Data Exchange System (component of TalkBank)
- CLUVI Corpus (Galician-English-Spanish-French parallel corpus)
- Centers for Disease Control - Chinese, French, Japanese, and Spanish information on SARS
- COMPARA corpus
- Debian free software community
- EMILLE corpus
- European Parliament Proceedings Parallel Corpus 1996-2003
- EuroWordNet
- GlossaNet
- HamleDT: harmonized dependency treebanks for many languages in a common annotation style
- Hansard French-English parallel corpus
- Learner Behaviour on the Internet
- Leipzig Corpora Collection: Large monolingual raw corpora for 17+ languages, free downloads
- Le Monde Diplomatique-Die Tageszeitung Translation Corpus - French-German, aligned (parallel)
- MuchMore Springer Bilingual Corpus
- MULTEXT-East: Multilingual Corpora for Eastern and Central European Languages
- Multilingual Corpora: Available Resources
- Tanaka Corpus: Japanese-English sentence pairs
- MultiSemCor
- Newspapers on the Internet
- OPUS - an open source parallel corpus
- PolyU Language Bank
- Public registry of the Council of the EU
- The Bible as a Resource for Translation Software
- The ECI Multilingual corpus
- Slovenian Corpus FIDA and FIDA+
- SMULTRON Corpus: parallel treebank of English, German, and Swedish
- The TalkBank System
- UN declaration of human rights in multiple languages
- UN parallel corpora
- UNITEX
- Useful links about parallel corpora, by Olivier Kraif
- WaCky Project
- Wortlisten: spoken German, English, French, and Dutch