Corpora for English

From ACL Wiki

Revision as of 20:29, 24 April 2008 by Pdturney (talk | contribs)


For languages other than English, see List of resources by language.

Please help us move non-English items below into the List of resources by language.

English

Link collections

Corpora tools

List of stop words
Poliqarp - open-source, XML-aware indexer, search engine, and concordancer
The Sketch Engine
Treebank tokenization scheme

Arabic

Arabic Newswire Part 1

Bosnian

The Oslo Corpus of Bosnian Texts

Bulgarian

Corpus of spoken Bulgarian

Croatian

Croatian Language Corpus at the IHJJ

Czech

Czech National Corpus

Danish

Danish news corpus

Finnish

Finnish text bank

French

Base Textuelle de Moyen Francais

German

Haitian Creole

Haitian Creole Electronic Texts

Italian

Oxford Text Archive Corpus of Italian Newspapers

Japanese

List of Japanese transitive-intransitive verb pairs

Polish

IPI PAN Polish Corpus

Romanian

Romanian NLP

Sanskrit

Sanskrit Library

Slovenian

Slovene-English Parallel Corpus

Spanish

Swahili

Helsinki Corpus of Swahili (HCS)

Uncategorized

Retrieved from "https://aclweb.org/aclwiki/index.php?title=Corpora_for_English&oldid=5074"
