Multilingual Corpora
Latest revision as of 09:26, 16 February 2021
This page lists multilingual corpora. For monolingual corpora, see List of resources by language.
See also Multilingual resources.
- ACQUIS COMMUNAUTAIRE Multilingual Corpus
- CELEX - The Dutch Center for Lexical Information
- CHILDES - Child Language Data Exchange System (component of TalkBank)
- CLUVI Corpus (Galician-English-Spanish-French parallel corpus)
- Centers for Disease Control and Prevention - SARS information in Chinese, French, Japanese and Spanish
- COMPARA corpus
- Debian free software community
- EMILLE corpus
- European Parliament Proceedings Parallel Corpus 1996-2003
- EuroWordNet
- GlossaNet
- HamleDT: harmonized dependency treebanks of many languages in a common annotation style
- Hansard French-English parallel corpus
- Learner Behaviour on the Internet
- Leipzig Corpora Collection: Large monolingual raw corpora for 17+ languages, free downloads
- Le Monde Diplomatique-Die Tageszeitung Translation Corpus - French-German, aligned (parallel)
- MuchMore Springer Bilingual Corpus
- MULTEXT-East: Multilingual Corpora for Eastern and Central European Languages
- Multilingual Corpora: Available Resources
- MultiSemCor
- Newspapers on the Internet
- OPUS - an open source parallel corpus
- PolyU Language Bank
- Public registry of the Council of the EU
- The Bible as a Resource for Translation Software
- The ECI Multilingual corpus
- Slovenian Corpus FIDA and FIDA+
- SMULTRON Corpus: parallel treebank of English, German and Swedish
- Tanaka Corpus: Japanese-English sentence pairs
- The TalkBank System
- UN declaration of human rights in multiple languages
- United Nations Parallel Corpus
- MultiUN: UN parallel corpora
- UNITEX
- Useful links about parallel corpora, by Olivier Kraif
- WaCky Project
- Wortlisten: spoken German, English, French, and Dutch