Difference between revisions of "Multilingual Corpora"

From ACL Wiki
(add another UN parallel corpus)
(Fixed OPUS link, the old one was dead)
 
(5 intermediate revisions by 2 users not shown)
Line 1: Line 1:
For individual languages, see [[List of resources by language]].
+
This page lists ''multilingual'' corpora. For ''monolingual'' corpora, see [[List of resources by language]].
  
 
See also [[Multilingual resources]].
 
Line 6: Line 6:
 
<!-- Please keep this list in alphabetical order -->
 
 
*[http://wt.jrc.it/lt/Acquis/ ACQUIS COMMUNAUTAIRE Multilingual Corpus]
 
*[http://spraakbanken.gu.se/ Bank of Swedish]
+
*[http://www.kun.nl/celex CELEX - The Dutch Center for Lexical Information]
 +
*[https://childes.talkbank.org/ CHILDES - Child Language Data Exchange System] (component of [https://talkbank.org TalkBank])
 
*[http://sli.uvigo.es/CLUVI/ CLUVI Corpus (Galician-English-Spanish-French parallel corpus)]
 
*[http://hnk.ffzg.hr/ Croatian National Corpus (HNK)]
 
*[http://ucnk.ff.cuni.cz/ Czech National Corpus (CNC)]
 
*[http://www.kun.nl/celex CELEX - The Dutch Center for Lexical Information]
 
 
*[http://www.cdc.gov/ncidod/sars/languages.htm Centre for Disease Control - Chinese, French, Japanese, Spanish info on SARS]
 
 
*[http://www.linguateca.pt/COMPARA/ COMPARA corpus]
 
Line 17: Line 15:
 
*[http://www.statmt.org/europarl/ European Parliament Proceedings Parallel Corpus 1996-2003]
 
 
*[http://www.illc.uva.nl/EuroWordNet EuroWordNet]
 
*[http://www.france.diplomatie.fr/label_france/index.html French Foreign Ministry's magazine]
 
 
*[http://glossa.fltr.ucl.ac.be/ GlossaNet]
 
*[http://hometown.aol.com/mit2haiti/JA-HC-kr.htm Haitian Creole corpus - Teknoloji pou lang kreyol]
 
 
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.
 
*[http://corpus.nytud.hu/mnsz/ Hungarian National Corpus]
 
 
*[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T20 Hansard French-English parallel corpus]
 
*[http://www.ucl.ac.uk/english-usage/ice/avail.htm ICE corpora]
 
*[http://korpus.pl/ IPI PAN Corpus of Polish]
 
 
*[http://www.tu-chemnitz.de/phil/InternetGrammar/ Learner Behaviour on the Internet]
 
 
*[http://corpora.informatik.uni-leipzig.de Leipzig Corpora Collection: Large monolingual raw corpora for 17+ languages, free downloads]
 
Line 34: Line 27:
 
*[http://multisemcor.itc.it MultiSemCor]
 
 
*[http://www.ims.uni-stuttgart.de/info/Newspapers.html Newspapers on the Internet]
 
*[http://logos.uio.no/opus/ OPUS - an open source parallel corpus]
+
*[https://opus.nlpl.eu/ OPUS - an open source parallel corpus]
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html Oslo Corpus of Bosnian]
 
 
*[http://langbank.engl.polyu.edu.hk/indexl.html PolyU Language Bank]
 
*[http://www.corpusdoportugues.org/ Portuguese Corpus]
 
 
*[http://register.consilium.eu.int/ Public registry of the Council of the EU]
 
*[http://www.ruscorpora.ru/ Russian National Corpus (RNK)]
 
 
*[http://www.multilingual.com/allen51.htm The Bible as a Resource for Translation Software]
 
 
*[http://www.cogsci.ed.ac.uk/elsnet/eci.html The ECI Multilingual corpus]
 
 
*[http://www.fida.net/ Slovenian Corpus FIDA] and [http://www.fidaplus.net/ FIDA+]
 
 
*[http://www.ling.su.se/DaLi/research/smultron/index.htm SMULTRON Corpus], a parallel treebank of English, German and Swedish
*[http://www.corpusdelespanol.org/ Spanish Corpus]
+
*[https://talkbank.org The TalkBank System]
 
*[http://www.unhchr.ch/udhr/index.htm UN declaration of human rights in multiple languages]
 
 
*[http://conferences.unite.un.org/UNCorpus/ UN parallel corpora]
 

Latest revision as of 10:26, 16 February 2021

This page lists multilingual corpora. For monolingual corpora, see List of resources by language.

See also Multilingual resources.