Difference between revisions of "Corpora for English"

From ACL Wiki
<!-- Please keep this list in alphabetical order -->
===Arabic===
*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1]

===Bosnian===
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html The Oslo Corpus of Bosnian Texts]

===Bulgarian===
*[http://www.hf.uio.no/easteur-orient/bulg/mat/ Corpus of spoken Bulgarian]

===Czech===
*[http://ucnk.ff.cuni.cz/english/index.html Czech National Corpus]

===Danish===
*[http://korpus.dsl.dk/korpus2000/indgang.php Danish news corpus]

===English===
*[ftp://ftp.cs.cornell.edu/pub/smart/time/ 1963 Time Magazine corpus]
*[http://www.cornelsen.de/international/ An Empirical Grammar of the English Verb System]
*[http://thetis.bl.uk/ BNC Online Service]
*[http://info.ox.ac.uk/bnc/ BRITISH NATIONAL CORPUS - WORLD EDITION]

===Finnish===
*[http://www.csc.fi/kielipankki/ Finnish text bank]

===French===
*[http://atilf.atilf.fr/dmf.htm Base Textuelle de Moyen Francais]

===German===
*[http://www.coli.uni-sb.de/sfb378/negra-corpus/ A Syntactically Annotated Corpus of German Newspaper Texts]
*[http://www.ims.uni-stuttgart.de/projekte/tc/CQP.html Experimental Corpus Query System (University of Stuttgart, Germany)]

===Haitian Creole===
*[http://hometown.aol.com/mit2haiti/Index4.html HAITIAN CREOLE ELECTRONIC TEXTS]

===Italian===
*[http://www.uni-duisburg.de/Fak2/FremdPhil/Romanistik/Personal/Burr/humcomp/ Oxford Text Archive Corpus of Italian Newspapers]

===Japanese===
*[http://www.csse.monash.edu.au/~jwb/afaq/jitadoushi.html List of Japanese transitive-intransitive verb pairs]

===Polish===
*[http://korpus.pl/en/ IPI PAN Polish Corpus]

===Romanian===
*[http://www.cs.unt.edu/~rada/downloads.html Romanian NLP]

===Sanskrit===
*[http://sanskritlibrary.org/ Sanskrit Library]

===Slovenian===
*[http://nl.ijs.si/elan/#corpus Slovene-English Parallel Corpus]

===Spanish===
*[http://www.corpusdelespanol.org/ Corpus del Espanol]
*[http://www.lllf.uam.es/~fmarcos/informes/corpus/corpulee.html Corpus de referencia de la lengua Espanola contemporanea: corpus oral peninsular]

===Swahili===
*[http://www.csc.fi/kielipankki/aineistot/hcs/index.phtml.en Helsinki Corpus of Swahili (HCS)]

----
*[http://www.ldc.upenn.edu/Catalog/LDC2001S97.html 2000 NIST Speaker Recognition Evaluation Corpus]
*[http://ixa.si.ehu.es/Ixa/resources/sensecorpus A Web Corpus and Topic Signatures for All WordNet 1.6 Nominal Senses (v 1.0)]
*[http://odur.let.rug.nl/~vannoord/trees/ Alpino Treebank]
*[http://www.aot.ru/search1.html AOT]
*[http://pioneer.chula.ac.th/~awirote/ling/corpuslst.htm Corpus Resources (Chulalongkorn University, Thailand)]
*[ftp://ftp.cs.cornell.edu/pub/smart/cran/ Cranfield collection]
*[http://corpus.rae.es/creanet.html CREA]
*[http://www.eat.rl.ac.uk/ Edinburgh Associative Thesaurus (EAT)]
*[http://www.hum.uva.nl/~ewn EuroWordNet]
*[http://rali.iro.umontreal.ca/ Hansards Corpus - Searchable]
*[http://www.hcrc.ed.ac.uk/maptask/ HCRC Map Task Corpus XML annotations]
*[http://nats-www.informatik.uni-hamburg.de/~ingo/icopost/ ICOPOST]
*[http://www.ims.uni-stuttgart.de/projekte/TC.html IMS Corpus Toolbox, Univ. of Stuttgart]
*[http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/ IMS Corpus Workbench (CWB)]
*[http://cecl.fltr.ucl.ac.be/Cecl-Projects/Icle/icle.htm International Corpus of Learner English]
*[http://www.ipds.uni-kiel.de/links/datenmaterial.en.html Kiel University's Institute of Phonetics and Speech Processing]
*[http://www.nilc.icmc.usp.br/lacioweb Lacio Web Corpora]
*[http://www.vuw.ac.nz/llc/ LANGUAGE LEARNING CENTER - ACADEMIC CORPUS]
*[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]
*[ftp://ftp.cs.cornell.edu/pub/smart/med/ Medlars collection]
*[http://www.cs.cmu.edu/web/books.html On-line books at CMU]
*[http://logos.uio.no/opus/ OPUS -- An Open Source Parallel Corpus]
*[http://elex.amu.edu.pl/~przemka/PICLE_search.php Polish subcorpus of the International Corpus of Learner English]
*[http://www.cirp.es/WXN/wxn/frames/proxectos.html Ramon Piero Center for Research]
*[http://about.reuters.com/researchandstandards/corpus/ Reuters Corpus]
*[http://www.ldc.upenn.edu/Catalog/LDC2001S97.html Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio]
*[http://www.ldc.upenn.edu/Catalog/LDC2001S99.html Speech in Noisy Environments 2 (SPINE2 CODED) Coded Audio]
*[http://nora.hd.uib.no/index-e.html The CORPORA DataCenter (Norway)]
*[ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/ The Moby Corpus]
*[http://www.hf.uio.no/tekstlab/prosjekter/SOFIE.htm The Sofie Treebank - A Parallel Treebank of North European Languages]

Revision as of 19:31, 10 November 2006

This list needs some cleaning. Please help.
