Difference between revisions of "Corpora for English"

From ACL Wiki
Line 66:
 
*[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]
 
  
==Arabic==
 
*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1]
 
==Bosnian==
 
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html The Oslo Corpus of Bosnian Texts]
 
==Bulgarian==
 
*[http://www.hf.uio.no/easteur-orient/bulg/mat/ Corpus of spoken Bulgarian]
 
==Croatian==
 
*[http://riznica.ihjj.hr/en/ Croatian Language Corpus at the IHJJ]
 
==Czech==
 
*[http://ucnk.ff.cuni.cz/english/index.html Czech National Corpus]
 
==Danish==
 
*[http://korpus.dsl.dk/korpus2000/indgang.php Danish news corpus]
 
  
 
==Finnish==
 

Revision as of 13:01, 26 April 2008

For languages other than English, see List of resources by language.

English


Link collections

Corpora tools


Finnish

French

German

Haitian Creole

Italian

Japanese

Polish

Romanian

Sanskrit

Slovenian

Spanish

Swahili

Uncategorized