Difference between revisions of "Corpora (English)"

From ACL Wiki
(+MultiUN corpora)
(9 intermediate revisions by 5 users not shown)
Line 1: Line 1:
 
For languages other than English, see [[List of resources by language]].
 
 
 
<!-- Please keep this list in alphabetical order -->
 
  
Line 14: Line 13:
 
*[http://www.comp.lancs.ac.uk/computing/research/ucrel/bnc.html British National Corpus project page (from UCREL)]
 
 
*[http://clwww.essex.ac.uk/w3c/corpus_ling/content/corpora/list/private/brown/brown.html Brown Corpus]
 
 +
*[http://boston.lti.cs.cmu.edu/Data/clueweb09/ ClueWeb]
 +
*[http://computing.open.ac.uk/coda/data.html CODA Parallel Annotated Monologue-Dialogue Corpus]
 
*[http://www.collins.co.uk/books.aspx?group=154 Collins Wordbanks]
 
 
*[http://www.cs.cornell.edu/home/llee/data/convote.html Congressional floor-debate transcripts, with support/oppose labels]
 
Line 24: Line 25:
 
*[http://usna.edu/LangStudy/BNC/ Exploring Words and Phrases from the British National Corpus]
 
 
*[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus] - 426 gigabytes of text
 
 +
*[http://gmb.let.rug.nl Groningen Meaning Bank] semantically annotated corpus
 
*[http://www.gutenberg.org/wiki/Main_Page Gutenberg]
 
 
*[http://prize.hutter1.net/ Hutter Prize for Lossless Compression of Human Knowledge 100M sample of Wikipedia]
 
Line 42: Line 44:
 
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]
 
 
*[http://www.cs.rochester.edu/research/cisd/resources/trains.html TRAINS Dialogue Corpus]
 
 +
*[http://ebiquity.umbc.edu/resource/html/id/351 UMBC Webbase Corpus]
 +
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora]
 +
*[http://www.let.rug.nl/~bos/vpe/ VP Ellipsis corpus]
 +
*[http://wacky.sslmit.unibo.it/ WaCky]
 
*[http://www.webcorp.org.uk/guide/ WebCorp]
 
 +
* [http://www.statmt.org/wmt13/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl
  
  
Line 50: Line 57:
 
*[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]
 
 
*[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]
 
*[http://www.alphabit.net Isabella Chiari: Corpora, Software and Linguistic resources]
 
 
*[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]
 
  

Revision as of 15:40, 10 December 2013

For languages other than English, see List of resources by language.


Link collections

Corpora tools