Corpora (English)

From ACLWiki
 
*[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]
 
  
==Arabic==
 
*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1]
 
==Bosnian==
 
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html The Oslo Corpus of Bosnian Texts]
 
==Bulgarian==
 
*[http://www.hf.uio.no/easteur-orient/bulg/mat/ Corpus of spoken Bulgarian]
 
==Croatian==
 
*[http://riznica.ihjj.hr/en/ Croatian Language Corpus at the IHJJ]
 
==Czech==
 
*[http://ucnk.ff.cuni.cz/english/index.html Czech National Corpus]
 
==Danish==
 
*[http://korpus.dsl.dk/korpus2000/indgang.php Danish news corpus]
 
  
 
==Finnish==
 

Revision as of 16:01, 26 April 2008

For languages other than English, see List of resources by language.

Contents

English


Link collections

Corpora tools


Finnish

French

German

Haitian Creole

Italian

Japanese

Polish

Romanian

Sanskrit

Slovenian

Spanish

Swahili

Uncategorized
