Difference between revisions of "Corpora (English)"

From ACL Wiki
(+WMT corpora)
(Added: Araneum)
 
(3 intermediate revisions by 2 users not shown)
Line 6: Line 6:
 
*[http://americannationalcorpus.org/ American National Corpus (ANC)]
 
*[http://americannationalcorpus.org/FirstRelease/ AMERICAN NATIONAL CORPUS FIRST RELEASE]
 
 +
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum], Gigaword English web corpus
 +
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum Asiaticum], Gigaword Asian English web corpus
 
*[http://compbio.uchsc.edu/ccp/corpora/index.shtml Biomedical corpora]
 
*[http://homepage.mac.com/bncweb/ BNCweb a web-based interface to the British National Corpus]
 
Line 27: Line 29:
 
*[http://gmb.let.rug.nl Groningen Meaning Bank] semantically annotated corpus
 
*[http://www.gutenberg.org/wiki/Main_Page Gutenberg]
 
 +
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style
 
*[http://prize.hutter1.net/ Hutter Prize for Lossless Compression of Human Knowledge 100M sample of Wikipedia]
 
*[http://nora.hd.uib.no/icame.html ICAME]
 
Line 44: Line 47:
 
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]
 
*[http://www.cs.rochester.edu/research/cisd/resources/trains.html TRAINS Dialogue Corpus]
 
 +
*[http://ebiquity.umbc.edu/resource/html/id/351 UMBC Webbase Corpus]
 +
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora]
 
*[http://www.let.rug.nl/~bos/vpe/ VP Ellipsis corpus]
 
*[http://wacky.sslmit.unibo.it/ WaCky]
 

Latest revision as of 13:33, 8 March 2015

For languages other than English, see List of resources by language.


Link collections

Corpora tools