Difference between revisions of "Corpora for English"

From ACL Wiki
m (→‎English: misclassified)
Line 30:
 *[http://www.cs.fit.edu/~mmahoney/compression/text.html Large Text Compression Benchmark's 1G sample of Wikipedia]
 *[http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/naive-bayes/bow-0.8/stopwords.c List of English stopwords]
−*[http://www.lsi.upc.es/~nlp/tools/mapping.html Mapping WordNet Versions 1.6 and 2.0]
 *[http://www.cs.cornell.edu/People/pabo/movie-review-data/ Movie Review Data]
 *[http://mwe.stanford.edu/resources/ Multiword Expression Resources]

Revision as of 04:39, 2 March 2007


English

German

Multilingual

Russian

Slovak

Italian

Link collections

Corpora tools

Uncategorized

Arabic

Bosnian

Bulgarian

Czech

Danish

English

Finnish

French

German

Haitian Creole

Italian

Japanese

Polish

Romanian

Sanskrit

Slovenian

Spanish

Swahili