Multilingual Corpora
This page lists multilingual corpora. For monolingual corpora, see List of resources by language.
See also Multilingual resources.
- ACQUIS COMMUNAUTAIRE Multilingual Corpus
- CELEX - The Dutch Center for Lexical Information
- CHILDES - Child Language Data Exchange System (component of TalkBank)
- CLUVI Corpus (Galician-English-Spanish-French parallel corpus)
- Centre for Disease Control - Chinese, French, Japanese, Spanish info on SARS
- COMPARA corpus
- Debian free software community
- EMILLE corpus
- European Parliament Proceedings Parallel Corpus 1996-2003
- EuroWordNet
- GlossaNet
- HamleDT: harmonized dependency treebanks of many languages in a common annotation style
- Hansard French-English parallel corpus
- Learner Behaviour on the Internet
- Leipzig Corpora Collection: Large monolingual raw corpora for 17+ languages, free downloads
- Le Monde Diplomatique-Die Tageszeitung Translation Corpus - French-German, aligned (parallel)
- MuchMore Springer Bilingual Corpus
- MULTEXT-East: Multilingual Corpora for Eastern and Central European Languages
- Multilingual Corpora: Available Resources
- Tanaka Corpus: Japanese-English sentence pairs
- MultiSemCor
- Newspapers on the Internet
- OPUS - an open source parallel corpus
- PolyU Language Bank
- Public registry of the Council of the EU
- The Bible as a Resource for Translation Software
- The ECI Multilingual corpus
- Slovenian Corpus FIDA and FIDA+
- SMULTRON Corpus: parallel treebank of English, German, and Swedish
- The TalkBank System
- UN declaration of human rights in multiple languages
- UN parallel corpora
- UNITEX
- Useful links about parallel corpora, by Olivier Kraif
- WaCky Project
- Wortlisten: spoken German, English, French, and Dutch