Difference between revisions of "Corpora, datasets, lexicons"
Line 14:
  * [http://www.ldc.upenn.edu/ Linguistic Data Consortium (LDC)]
  * [http://www.psych.rl.ac.uk/ MRC Psycholinguistic Database]
+ * [http://www.cs.utexas.edu/~mfkb/nn/ Noun Compound Repository]
  * [http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html Reuters-21578 Text Categorization Collection]
  * [http://w3.usf.edu/FreeAssociation/ University of South Florida Free Association Norms]
Revision as of 17:02, 19 October 2006
Corpora
- American National Corpus (ANC)
- British National Corpus (BNC)
- Brown Corpus
- Collins Wordbanks
- David Lee's Bookmarks for Corpus-based Linguists
- Gutenberg
- Oxford English Corpus
- WebCorp
Datasets
- Linguistic Data Consortium (LDC)
- MRC Psycholinguistic Database
- Noun Compound Repository
- Reuters-21578 Text Categorization Collection
- University of South Florida Free Association Norms
- WordSimilarity-353 Test Collection