Difference between revisions of "Knowledge collections and datasets (English)"
Line 6:
  * [http://framenet.icsi.berkeley.edu/ FrameNet]
  * [http://www.psych.rl.ac.uk/ MRC Psycholinguistic Database]
− * [
+ * [[Noun compound repository|Noun Compound Repository]]
  * [http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html Reuters-21578 Text Categorization Collection]
  * [[Spam filtering datasets]]
Revision as of 12:46, 3 January 2007
Datasets for Computational Linguistics and Natural Language Processing.
- Clustering by Committee - terms clustered and organized using the Distributional Hypothesis
- DIRT Paraphrase Collection - Discovery of Inference Rules from Text
- Edinburgh Associative Thesaurus (EAT)
- FrameNet
- MRC Psycholinguistic Database
- Noun Compound Repository
- Reuters-21578 Text Categorization Collection
- Spam filtering datasets
- TEASE - Acquisition of Entailment Relations from the Web
- University of South Florida Free Association Norms
- VerbOcean - verbs organized by semantic relation, including temporal precedence and strength
- WordNet
- WordSimilarity-353 Test Collection