Difference between revisions of "Corpora, datasets, lexicons"
Line 1: | Line 1: | ||
+ | Local lists: | ||
+ | |||
* [[Corpora]] | * [[Corpora]] | ||
* [[Datasets]] | * [[Datasets]] | ||
* [[Lexicons]] | * [[Lexicons]] | ||
+ | External lists: | ||
+ | |||
+ | * [http://devoted.to/corpora David Lee's Bookmarks for Corpus-based Linguists] | ||
== Corpora == | == Corpora == | ||
Line 31: | Line 36: | ||
* [http://www.corpusdelespanol.org/ Spanish Corpus] | * [http://www.corpusdelespanol.org/ Spanish Corpus] | ||
* [http://www.csse.monash.edu.au/~jwb/tanakacorpus.html Tanaka Corpus: Japanese-English sentence pairs] | * [http://www.csse.monash.edu.au/~jwb/tanakacorpus.html Tanaka Corpus: Japanese-English sentence pairs] | ||
Revision as of 13:03, 2 November 2006
Local lists:
- Corpora
- Datasets
- Lexicons
External lists:
- David Lee's Bookmarks for Corpus-based Linguists
Corpora
English
(alphabetical order)
- American National Corpus (ANC)
- Biomedical corpora
- British National Corpus (BNC)
- Brown Corpus
- Collins Wordbanks
- Gutenberg
- Oxford English Corpus
- WebCorp
Multilingual
(alphabetical order)
- Bank of Swedish
- Oslo Corpus of Bosnian
- Croatian National Corpus (HNK)
- Czech National Corpus (CNC)
- Hungarian National Corpus
- IPI PAN Corpus of Polish
- Portuguese Corpus
- Russian National Corpus (RNK)
- Slovak National Corpus (SNK)
- Slovenian Corpus FIDA and FIDA+
- Spanish Corpus
- Tanaka Corpus: Japanese-English sentence pairs