Difference between revisions of "Corpora, datasets, lexicons"

From ACL Wiki
Line 47:
  == Lexicons ==
  (alphabetical order)
- * [http://clipdemos.umiacs.umd.edu/catvar/ Catvar 2.0: The Categorial Variation Database]
+ * [http://clipdemos.umiacs.umd.edu/catvar/ Catvar 2.0: The Categorial Variation Database] - for example, the ''developing'' cluster: {''develop'' (V), ''developer'' (N), ''developed'' (AJ), ''developing'' (N), ''developing'' (AJ), ''development'' (N)}
  * [http://www.wjh.harvard.edu/%7Einquirer/spreadsheet_guide.htm General Inquirer]
  * [http://www.csse.monash.edu.au/~jwb/edict_doc.html JMdict: Japanese-Multilingual Dictionary file]

Revision as of 06:50, 2 November 2006

Miscellaneous

Corpora

English

(alphabetical order)

Multilingual

(alphabetical order)

Other lists of corpora

(alphabetical order)

Datasets

Lexicons

(alphabetical order)

WordNet and enhancements

(alphabetical order)

  • eXtended WordNet - WordNet glosses are syntactically parsed and transformed into logic forms, and their content words are semantically disambiguated
  • SentiWordNet - assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity
  • WordNet - the original
  • WordNet Domains - WordNet synsets augmented with domain labels, such as POLITICS, ECONOMY, and SPORT