Difference between revisions of "Knowledge collections and datasets (English)"

From ACL Wiki
Line 1: Line 1:
 
Datasets for Computational Linguistics and Natural Language Processing.
 
+ <!-- Please keep this list in alphabetical order -->
  
 
* [[Clustering by Committee]] - terms clustered and organized using the [[Distributional Hypothesis]]
 
Line 8: Line 9:
 
* [[Noun compound repository|Noun Compound Repository]]
 
 
* [http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html Reuters-21578 Text Categorization Collection]
 
+ * [[SAT Analogy Questions]] - a way of evaluating algorithms for measuring relational similarity
 
* [[Spam filtering datasets]]
 
 
* [[TEASE]] - Acquisition of Entailment Relations from the Web
 
+ * [[TOEFL Synonym Questions]] - a way of evaluating algorithms for measuring degree of similarity between two words
 
* [http://w3.usf.edu/FreeAssociation/ University of South Florida Free Association Norms]
 
 
* [[VerbOcean]] - verbs organized by semantic relation, including temporal precedence and strength
 

Revision as of 07:00, 13 May 2007

Datasets for Computational Linguistics and Natural Language Processing.

Additional Dataset Collections