Difference between revisions of "Knowledge collections and datasets (English)"
Revision as of 11:30, 24 April 2008
Knowledge collections and datasets for Computational Linguistics and Natural Language Processing.
For languages other than English, see List of resources by language.
- Clustering by Committee - terms clustered and organized using the Distributional Hypothesis
- DIRT Paraphrase Collection - Discovery of Inference Rules from Text
- Edinburgh Associative Thesaurus (EAT)
- FrameNet
- MRC Psycholinguistic Database
- Noun Compound Repository
- Preposition Project
- Reuters-21578 Text Categorization Collection
- SAT Analogy Questions - a way of evaluating algorithms for measuring relational similarity
- Spam filtering datasets
- TEASE - Acquisition of Entailment Relations from the Web
- TOEFL Synonym Questions - a way of evaluating algorithms for measuring degree of similarity between two words
- University of South Florida Free Association Norms
- VerbOcean - verbs organized by semantic relation, including temporal precedence and strength
- WordNet
- WordSimilarity-353 Test Collection
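
Several entries above (the SAT Analogy Questions and the TOEFL Synonym Questions) are described as ways of evaluating similarity algorithms. As a rough illustration of what such an evaluation can look like, the sketch below scores a word-similarity function on multiple-choice synonym items: for each item, the choice most similar to the stem is taken as the prediction, and accuracy is the fraction of items answered correctly. Everything here is an assumption made for illustration: the `(stem, choices, answer_index)` item format, the function names, and the three invented toy items are hypothetical and are not taken from the TOEFL or SAT collections. The character-bigram measure is only a stand-in that captures spelling overlap, not meaning.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical item format: (stem word, candidate choices, index of the correct choice).
Question = Tuple[str, Sequence[str], int]


def char_bigram_similarity(a: str, b: str) -> float:
    """Toy stand-in measure: Dice coefficient over character bigrams.

    This is orthographic, not semantic; a real evaluation would plug in
    the similarity algorithm under test instead.
    """
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def accuracy(questions: Sequence[Question],
             similarity: Callable[[str, str], float]) -> float:
    """Fraction of items where the highest-scoring choice is the keyed answer."""
    if not questions:
        return 0.0
    correct = 0
    for stem, choices, answer in questions:
        scores = [similarity(stem, choice) for choice in choices]
        # On ties, list.index returns the first choice with the top score.
        predicted = scores.index(max(scores))
        correct += int(predicted == answer)
    return correct / len(questions)


# Invented items, used only to exercise the code above.
toy_questions: List[Question] = [
    ("colour", ["color", "sound", "taste"], 0),
    ("grey", ["gray", "loud", "soft"], 0),
    ("big", ["large", "bigot", "tiny"], 0),
]

if __name__ == "__main__":
    print(accuracy(toy_questions, char_bigram_similarity))
```

On the third toy item the spelling-based measure prefers "bigot" to "large", which is the kind of error a synonym test is meant to expose. Swapping in a different `similarity` function leaves the evaluation loop unchanged.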