Difference between revisions of "Corpora, datasets, lexicons"

Revision as of 12:58, 2 November 2006

Corpora

English

(alphabetical order)

Multilingual

(alphabetical order)

Other lists of corpora

(alphabetical order)

@@ Line 1: / Line 1: @@
 * [[Corpora]]
 * [[Datasets]]
@@ Line 37: / Line 36: @@
 * [http://devoted.to/corpora David Lee's Bookmarks for Corpus-based Linguists]
 * [[Resources]]
-== Datasets ==
-* [http://www.eat.rl.ac.uk/ Edinburgh Associative Thesaurus (EAT)]
-* [http://www.ldc.upenn.edu/ Linguistic Data Consortium (LDC)]
-* [http://www.psych.rl.ac.uk/ MRC Psycholinguistic Database]
-* [http://www.cs.utexas.edu/~mfkb/nn/ Noun Compound Repository]
-* [http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html Reuters-21578 Text Categorization Collection]
-* [http://w3.usf.edu/FreeAssociation/ University of South Florida Free Association Norms]
-* [http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/wordsim353.html WordSimilarity-353 Test Collection]
-== Lexicons ==
-(alphabetical order)
-* [http://clipdemos.umiacs.umd.edu/catvar/ Catvar 2.0: The Categorial Variation Database] - for example, the ''developing'' cluster: {''develop'' (V), ''developer'' (N), ''developed'' (AJ), ''developing'' (N), ''developing'' (AJ), ''development'' (N)}
-* [http://www.wjh.harvard.edu/%7Einquirer/spreadsheet_guide.htm General Inquirer]
-* [http://www.csse.monash.edu.au/~jwb/edict_doc.html JMdict: Japanese-Multilingual Dictionary file]
-* [http://www.umiacs.umd.edu/~bonnie/LCS_Database_Documentation.html LCS Database: Lexical Conceptual Structures]
-* [http://www.dcs.shef.ac.uk/research/ilash/Moby/ Moby lexicon project]
-* [http://www.signiform.com/tt/htm/tt.htm ThoughtTreasure]
-=== WordNet and enhancements ===
-(alphabetical order)
-* [http://xwn.hlt.utdallas.edu/ eXtended WordNet] - glosses are syntactically parsed, transformed into logic forms, and content words are semantically disambiguated
-* [http://patty.isti.cnr.it/~esuli/software/SentiWordNet/ SentiWordNet] - assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity
-* [http://wordnet.princeton.edu/ WordNet] - the original
-* [http://tcc.itc.it/research/textec/topics/disambiguation/wordnetdomains.html WordNet Domains] - augmented with Domain Labels, such as POLITICS, ECONOMY, SPORT
