Knowledge collections and datasets (English)

Revision as of 06:00, 13 May 2007

Datasets for Computational Linguistics and Natural Language Processing.

Clustering by Committee - terms clustered and organized using the Distributional Hypothesis
DIRT Paraphrase Collection - Discovery of Inference Rules from Text
Edinburgh Associative Thesaurus (EAT)
FrameNet
MRC Psycholinguistic Database
Noun Compound Repository
Reuters-21578 Text Categorization Collection (http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html)
SAT Analogy Questions - a way of evaluating algorithms for measuring relational similarity
Spam filtering datasets
TEASE - Acquisition of Entailment Relations from the Web
TOEFL Synonym Questions - a way of evaluating algorithms for measuring degree of similarity between two words
University of South Florida Free Association Norms (http://w3.usf.edu/FreeAssociation/)
VerbOcean - verbs organized by semantic relation, including temporal precedence and strength
WordNet
WordSimilarity-353 Test Collection