Difference between revisions of "Corpora for English"

From ACL Wiki
Line 8: Line 8:
 
<!-- Please keep this list in alphabetical order -->
 
  
 +
*[ftp://ftp.cs.cornell.edu/pub/smart/time/ 1963 Time Magazine corpus]
 
*[http://www.elda.fr/catalogue/en/speech/S0115.html American English SpeechDat-Car]
 
 
*[http://americannationalcorpus.org/ American National Corpus (ANC)]
 
Line 78: Line 79:
 
*[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]
 
  
==Uncategorized==
+
==Arabic==
<!-- Please keep this list in alphabetical order -->
 
 
 
===Arabic===
 
 
*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1]
 
===Bosnian===
+
==Bosnian==
 
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html The Oslo Corpus of Bosnian Texts]
 
===Bulgarian===
+
==Bulgarian==
 
*[http://www.hf.uio.no/easteur-orient/bulg/mat/ Corpus of spoken Bulgarian]
 
===Croatian===
+
==Croatian==
 
*[http://riznica.ihjj.hr/en/ Croatian Language Corpus at the IHJJ]
 
===Czech===
+
==Czech==
 
*[http://ucnk.ff.cuni.cz/english/index.html Czech National Corpus]
 
===Danish===
+
==Danish==
 
*[http://korpus.dsl.dk/korpus2000/indgang.php Danish news corpus]
 
===English===
+
 
*[ftp://ftp.cs.cornell.edu/pub/smart/time/ 1963 Time Magazine corpus]
+
==Finnish==
*[http://www.cornelsen.de/international/ An Empirical Grammar of the English Verb System]
 
*[http://thetis.bl.uk/ BNC Online Service]
 
*[http://info.ox.ac.uk/bnc/ BRITISH NATIONAL CORPUS - WORLD EDITION]
 
===Finnish===
 
 
*[http://www.csc.fi/kielipankki/ Finnish text bank]
 
===French===
+
==French==
 
*[http://atilf.atilf.fr/dmf.htm Base Textuelle de Moyen Francais]
 
===German===
+
==German==
 
*[http://www.coli.uni-sb.de/sfb378/negra-corpus/ A Syntactically Annotated Corpus of German Newspaper Texts]
 
 
*[http://www.ims.uni-stuttgart.de/projekte/tc/CQP.html Experimental Corpus Query System (University of Stuttgart, Germany)]
 
===Haitian Creole===
+
==Haitian Creole==
 
*[http://hometown.aol.com/mit2haiti/Index4.html HAITIAN CREOLE ELECTRONIC TEXTS]
 
===Italian===
+
==Italian==
 
*[http://www.uni-duisburg.de/Fak2/FremdPhil/Romanistik/Personal/Burr/humcomp/ Oxford Text Archive Corpus of Italian Newspapers]
 
===Japanese===
+
==Japanese==
 
*[http://www.csse.monash.edu.au/~jwb/afaq/jitadoushi.html list of Japanese transitive - intransitive verb pairs]
 
===Polish===
+
==Polish==
 
*[http://korpus.pl/en/ IPI PAN Polish Corpus]
 
===Romanian===
+
==Romanian==
 
*[http://www.cs.unt.edu/~rada/downloads.html Romanian NLP]
 
===Sanskrit===
+
==Sanskrit==
 
*[http://sanskritlibrary.org/ Sanskrit Library]
 
  
===Slovenian===
+
==Slovenian==
 
*[http://nl.ijs.si/elan/#corpus Slovene-English Parallel Corpus]
 
===Spanish===
+
==Spanish==
 
*[http://www.corpusdelespanol.org/ Corpus del Espanol]
 
 
*[http://www.lllf.uam.es/~fmarcos/informes/corpus/corpulee.html Corpus de referencia de la lengua Espanola contemporanea: corpus oral peninsular]
 
===Swahili===
+
==Swahili==
 
*[http://www.csc.fi/kielipankki/aineistot/hcs/index.phtml.en Helsinki Corpus of Swahili (HCS)]
 
----
 
  
 +
==Uncategorized==
  
 
*[http://www.ldc.upenn.edu/Catalog/LDC2001S97.html 2000 NIST Speaker Recognition Evaluation Corpus]
 

Revision as of 20:23, 24 April 2008

For languages other than English, see List of resources by language.

English


Slovak

Italian

Link collections

Corpora tools

Arabic

Bosnian

Bulgarian

Croatian

Czech

Danish

Finnish

French

German

Haitian Creole

Italian

Japanese

Polish

Romanian

Sanskrit

Slovenian

Spanish

Swahili

Uncategorized