Difference between revisions of "Corpora for English"

From ACL Wiki
Line 97: Line 97:
  *[http://www.philol.msu.ru/~lex/corpus/ Russian Newspaper Corpus]
  *[http://schools.keldysh.ru/uvk1838/Sciper/volume2/langres/russiclr.htm Russicon Resources]
+ *[http://bokrcorpora.narod.ru Bokr Russian Reference Corpus]

  ==Slovak==
Line 109: Line 110:
  *[http://corpus.cilta.unibo.it:8080/coris_ita.html Corpus di Italiano Scritto contemporaneo (CORIS/CODIS)]
  *[http://tlio.ovi.cnr.it/TLIO/ Tesoro della lingua italiana delle origini (TLIO)]
+
+ ==Link Collections==
+
+ *[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]
+ *[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]
+ *[http://www.alphabit.net Isabella Chiari: Corpora, Software and Linguistic resources]
+ *[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]
+
+ ==Corpora Tools==
+
+ *[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]
+ *[http://www.sketchengine.co.uk/ The Sketch Engine]
+ *[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]

  ==Uncategorized==
Line 118: Line 132:
  *[http://odur.let.rug.nl/~vannoord/trees/ Alpino Treebank]
  *[http://www.cornelsen.de/international/ An Empirical Grammar of the English Verb System]
- *[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]
  *[http://www.aot.ru/search1.html AOT]
  *[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1]
  *[http://atilf.atilf.fr/dmf.htm Base Textuelle de Moyen Francais]
  *[http://thetis.bl.uk/ BNC Online Service]
- *[http://bokrcorpora.narod.ru Bokr Russian Reference Corpus]
  *[http://info.ox.ac.uk/bnc/ BRITISH NATIONAL CORPUS - WORLD EDITION]
- *[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]
  *[http://www.lllf.uam.es/~fmarcos/informes/corpus/corpulee.html Corpus de referencia de la lengua Espanola contemporanea: corpus oral peninsular]
  *[http://www.corpusdelespanol.org/ Corpus del Espanol]
Line 151: Line 162:
  *[http://www.vuw.ac.nz/llc/ LANGUAGE LEARNING CENTER - ACADEMIC CORPUS]
  *[http://www.csse.monash.edu.au/~jwb/afaq/jitadoushi.html list of Japanese transitive - intransitive verb pairs]
- *[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]
  *[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]
  *[ftp://ftp.cs.cornell.edu/pub/smart/med/ Medlars collection]
Line 177: Line 187:
  *[ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/ The Moby Corpus]
  *[http://www.tekstlab.uio.no/Bosnian/Corpus.html The Oslo Corpus of Bosnian Texts]
- *[http://www.sketchengine.co.uk/ The Sketch Engine]
  *[http://www.hf.uio.no/tekstlab/prosjekter/SOFIE.htm The Sofie Treebank - A Parallel Treebank of North European Languages]
- *[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]

Revision as of 05:38, 3 November 2006

This list needs some cleaning. Please help.

English

German

Multilingual

Russian

Slovak


Italian

Link Collections

Corpora Tools

Uncategorized