Knowledge collections and datasets (English)

From ACL Wiki

Revision as of 14:42, 7 February 2009 by ChristianPietsch (talk | contribs) (+NLG:Data sets)

Knowledge collections and datasets for Computational Linguistics and Natural Language Processing.

For languages other than English, see List of resources by language.

Clustering by Committee - terms clustered and organized using the Distributional Hypothesis
DIRT Paraphrase Collection - Discovery of Inference Rules from Text
Edinburgh Associative Thesaurus (EAT)
FrameNet
MRC Psycholinguistic Database
Preposition Project
Noun Compound Repository
Reuters-21578 Text Categorization Collection
SAT Analogy Questions - a way of evaluating algorithms for measuring relational similarity
Spam filtering datasets
TEASE - Acquisition of Entailment Relations from the Web
TOEFL Synonym Questions - a way of evaluating algorithms for measuring the degree of similarity between two words
University of South Florida Free Association Norms
VerbOcean - verbs organized by semantic relation, including temporal precedence and strength
WordNet
WordSimilarity-353 Test Collection

See also NLG:Data sets for a collection of data sets used for building natural language generation systems.

Additional Dataset Collections

Linguistic Data Consortium (LDC)

