Difference between revisions of "Corpora, datasets, lexicons"

Revision as of 06:45, 2 November 2006

Miscellaneous

Resources

Corpora

English

(alphabetical order)

Multilingual

(alphabetical order)

Other lists of corpora

(alphabetical order)

David Lee's Bookmarks for Corpus-based Linguists

Datasets

Lexicons

WordNet - the original
- eXtended WordNet - glosses are syntactically parsed and transformed into logic forms, and their content words are semantically disambiguated
- WordNet Domains - synsets are augmented with domain labels, such as POLITICS, ECONOMY, and SPORT
- SentiWordNet - assigns each WordNet synset three sentiment scores: positivity, negativity, and objectivity

@@ Line 5: / Line 5: @@
 == Corpora ==
+=== English ===
+(alphabetical order)
 * [http://americannationalcorpus.org/ American National Corpus (ANC)]
 * [http://compbio.uchsc.edu/ccp/corpora/index.shtml Biomedical corpora]
-* [http://www.tekstlab.uio.no/Bosnian/Corpus.html The Oslo Corpus of Bosnian]
 * [http://www.natcorp.ox.ac.uk/ British National Corpus (BNC)]
 * [http://clwww.essex.ac.uk/w3c/corpus_ling/content/corpora/list/private/brown/brown.html Brown Corpus]
 * [http://www.collins.co.uk/books.aspx?group=154 Collins Wordbanks]
+* [http://www.gutenberg.org/wiki/Main_Page Gutenberg]
+* [http://www.askoxford.com/oec/mainpage/?view=uk Oxford English Corpus]
+* [http://www.webcorp.org.uk/guide/ WebCorp]
+=== Multilingual ===
+(alphabetical order)
+* [http://spraakbanken.gu.se/ Bank of Swedish]
+* [http://www.tekstlab.uio.no/Bosnian/Corpus.html Oslo Corpus of Bosnian]
 * [http://hnk.ffzg.hr/ Croatian National Corpus (HNK)]
 * [http://ucnk.ff.cuni.cz/ Czech National Corpus (CNC)]
-* [http://devoted.to/corpora David Lee's Bookmarks for Corpus-based Linguists]
-* [http://www.gutenberg.org/wiki/Main_Page Gutenberg]
 * [http://corpus.nytud.hu/mnsz/ Hungarian National Corpus]
 * [http://korpus.pl/ IPI PAN Corpus of Polish]
-* [http://www.askoxford.com/oec/mainpage/?view=uk Oxford English Corpus]
 * [http://www.corpusdoportugues.org/ Portuguese Corpus]
 * [http://www.ruscorpora.ru/ Russian National Corpus (RNK)]
@@ Line 23: / Line 29: @@
 * [http://www.fida.net/ Slovenian Corpus FIDA] and [http://www.fidaplus.net/ FIDA+]
 * [http://www.corpusdelespanol.org/ Spanish Corpus]
-* [http://spraakbanken.gu.se/ Bank of Swedish]
 * [http://www.csse.monash.edu.au/~jwb/tanakacorpus.html Tanaka Corpus: Japanese-English sentence pairs]
-* [http://www.webcorp.org.uk/guide/ WebCorp]
+=== Other lists of corpora ===
+(alphabetical order)
+* [http://devoted.to/corpora David Lee's Bookmarks for Corpus-based Linguists]
 == Datasets ==


