Difference between revisions of "Corpora for English"

From ACL Wiki
m (Jonsafari moved page Corpora (English) to Corpora for English: align with other related articles)
m (Move *[http://www.grsampson.net/RSue.html SUSANNE Analytic Scheme] from Uncategorized resource to Resources for English, Corpora for English, Free and Downloadable)
 
(3 intermediate revisions by 2 users not shown)
Line 8: Line 8:
 
*[http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/naive-bayes/bow-0.8/stopwords.c English stop words (from SMART)]
 
*[http://gmb.let.rug.nl Groningen Meaning Bank] semantically annotated corpus
 +
*[https://corpling.uis.georgetown.edu/gum/ GUM - Georgetown University Multilayer corpus], multiple parses, coreference, entities, sentence types and RST
 
*[https://www.gutenberg.org Project Gutenberg]
 +
*[http://www.ucl.ac.uk/english-usage/ice/avail.htm International Corpus of English]
 
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.
 
*[http://prize.hutter1.net/ Hutter Prize for Lossless Compression of Human Knowledge 100M sample of Wikipedia]
Line 15: Line 17:
 
*[http://mwe.stanford.edu/resources/ Multiword Expression Resources]
 
*[http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/susanne/0.html Susanne: Annotated American English Corpus]
 +
*[http://www.grsampson.net/RSue.html SUSANNE Analytic Scheme]
 
*[http://www-users.york.ac.uk/~sp20/corpus.html The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English]
 
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]
Line 66: Line 69:
 
<!-- Please keep this list in alphabetical order -->
  
 +
*[http://corpus-tools.org/annis/ ANNIS] - open source search tool for complex multilayer corpora
 
*[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]
 
*[http://korpus.pl/index.php?page=poliqarp Poliqarp] - open source XML-aware indexer, search engine and concordancer

Latest revision as of 17:58, 2 September 2019

For languages other than English, see List of resources by language.

Free and Downloadable

Proprietary or Require Prior Permission


Link collections

Corpora tools