Difference between revisions of "Corpora for English"

@@ Line 137: / Line 137: @@
 <!-- Please keep this list in alphabetical order -->
+===Arabic===
+*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1]
+===Bosnian===
+*[http://www.tekstlab.uio.no/Bosnian/Corpus.html The Oslo Corpus of Bosnian Texts]
+===Bulgarian===
+*[http://www.hf.uio.no/easteur-orient/bulg/mat/ Corpus of spoken Bulgarian]
+===Czech===
+*[http://ucnk.ff.cuni.cz/english/index.html Czech National Corpus]
+===Danish===
+*[http://korpus.dsl.dk/korpus2000/indgang.php Danish news corpus]
+===English===
 *[ftp://ftp.cs.cornell.edu/pub/smart/time/ 1963 Time Magazine corpus]
+*[http://www.cornelsen.de/international/ An Empirical Grammar of the English Verb System]
+*[http://thetis.bl.uk/ BNC Online Service]
+*[http://info.ox.ac.uk/bnc/ BRITISH NATIONAL CORPUS - WORLD EDITION]
+===Finnish===
+*[http://www.csc.fi/kielipankki/ Finnish text bank]
+===French===
+*[http://atilf.atilf.fr/dmf.htm Base Textuelle de Moyen Francais]
+===German===
+*[http://www.coli.uni-sb.de/sfb378/negra-corpus/ A Syntactically Annotated Corpus of German Newspaper Texts]
+*[http://www.ims.uni-stuttgart.de/projekte/tc/CQP.html Experimental Corpus Query System (University of Stuttgart, Germany)]
+===Haitian Creole===
+*[http://hometown.aol.com/mit2haiti/Index4.html HAITIAN CREOLE ELECTRONIC TEXTS]
+===Italian===
+*[http://www.uni-duisburg.de/Fak2/FremdPhil/Romanistik/Personal/Burr/humcomp/ Oxford Text Archive Corpus of Italian Newspapers]
+===Japanese===
+*[http://www.csse.monash.edu.au/~jwb/afaq/jitadoushi.html list of Japanese transitive - intransitive verb pairs]
+===Polish===
+*[http://korpus.pl/en/ IPI PAN Polish Corpus]
+===Romanian===
+*[http://www.cs.unt.edu/~rada/downloads.html Romanian NLP]
+===Sanskrit===
+*[http://sanskritlibrary.org/ Sanskrit Library]
+===Slovenian===
+*[http://nl.ijs.si/elan/#corpus Slovene-English Parallel Corpus]
+===Spanish===
+*[http://www.corpusdelespanol.org/ Corpus del Espanol]
+*[http://www.lllf.uam.es/~fmarcos/informes/corpus/corpulee.html Corpus de referencia de la lengua Espanola contemporanea: corpus oral peninsular]
+===Swahili===
+*[http://www.csc.fi/kielipankki/aineistot/hcs/index.phtml.en Helsinki Corpus of Swahili (HCS)]
+----
 *[http://www.ldc.upenn.edu/Catalog/LDC2001S97.html 2000 NIST Speaker Recognition Evaluation Corpus]
-*[http://www.coli.uni-sb.de/sfb378/negra-corpus/ A Syntactically Annotated Corpus of German Newspaper Texts]
 *[http://ixa.si.ehu.es/Ixa/resources/sensecorpus A Web Corpus and Topic Signatures for All WordNet 1.6 Nominal Senses (v 1.0)]
 *[http://odur.let.rug.nl/~vannoord/trees/ Alpino Treebank]
-*[http://www.cornelsen.de/international/ An Empirical Grammar of the English Verb System]
 *[http://www.aot.ru/search1.html AOT]
-*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1]
-*[http://atilf.atilf.fr/dmf.htm Base Textuelle de Moyen Francais]
-*[http://thetis.bl.uk/ BNC Online Service]
-*[http://info.ox.ac.uk/bnc/ BRITISH NATIONAL CORPUS - WORLD EDITION]
-*[http://www.lllf.uam.es/~fmarcos/informes/corpus/corpulee.html Corpus de referencia de la lengua Espanola contemporanea: corpus oral peninsular]
-*[http://www.corpusdelespanol.org/ Corpus del Espanol]
-*[http://www.hf.uio.no/easteur-orient/bulg/mat/ Corpus of spoken Bulgarian]
 *[http://pioneer.chula.ac.th/~awirote/ling/corpuslst.htm Corpus Resources (Chulalongkorn University, Thailand)]
 *[ftp://ftp.cs.cornell.edu/pub/smart/cran/ Cranfield collection]
 *[http://corpus.rae.es/creanet.html CREA]
-*[http://ucnk.ff.cuni.cz/english/index.html Czech National Corpus]
-*[http://korpus.dsl.dk/korpus2000/indgang.php Danish news corpus]
 *[http://www.eat.rl.ac.uk/ Edinburgh Associative Thesaurus (EAT)]
 *[http://www.hum.uva.nl/~ewn EuroWordNet]
-*[http://www.ims.uni-stuttgart.de/projekte/tc/CQP.html Experimental Corpus Query System (University of Stuttgart, Germany)]
-*[http://www.csc.fi/kielipankki/ Finnish text bank]
-*[http://hometown.aol.com/mit2haiti/Index4.html HAITIAN CREOLE ELECTRONIC TEXTS]
 *[http://rali.iro.umontreal.ca/ Hansards Corpus - Searchable]
 *[http://www.hcrc.ed.ac.uk/maptask/ HCRC Map Task Corpus XML annotations]
-*[http://www.csc.fi/kielipankki/aineistot/hcs/index.phtml.en Helsinki Corpus of Swahili (HCS)]
 *[http://nats-www.informatik.uni-hamburg.de/~ingo/icopost/ ICOPOST]
 *[http://www.ims.uni-stuttgart.de/projekte/TC.html IMS Corpus Toolbox, Univ. of Stuttgart]
 *[http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/ IMS Corpus Workbench (CWB)]
 *[http://cecl.fltr.ucl.ac.be/Cecl-Projects/Icle/icle.htm International Corpus of Learner English]
-*[http://korpus.pl/en/ IPI PAN Polish Corpus]
 *[http://www.ipds.uni-kiel.de/links/datenmaterial.en.html Kiel University's Institute on Phonetics and Speech Processing]
 *[http://www.nilc.icmc.usp.br/lacioweb Lacio Web Corpora]
 *[http://www.vuw.ac.nz/llc/ LANGUAGE LEARNING CENTER - ACADEMIC CORPUS]
-*[http://www.csse.monash.edu.au/~jwb/afaq/jitadoushi.html list of Japanese transitive - intransitive verb pairs]
 *[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]
 *[ftp://ftp.cs.cornell.edu/pub/smart/med/ Medlars collection]
@@ Line 181: / Line 207: @@
 *[http://www.cs.cmu.edu/web/books.html On-line books at CMU]
 *[http://logos.uio.no/opus/ OPUS -- An Open Source Parallel Corpus]
-*[http://www.uni-duisburg.de/Fak2/FremdPhil/Romanistik/Personal/Burr/humcomp/ Oxford Text Archive Corpus of Italian Newspapers]
 *[http://elex.amu.edu.pl/~przemka/PICLE_search.php Polish subcorpus of the International Corpus of Learner English]
 *[http://www.cirp.es/WXN/wxn/frames/proxectos.html Ramon Piero Center for Research]
 *[http://about.reuters.com/researchandstandards/corpus/ Reuters Corpus]
-*[http://www.cs.unt.edu/~rada/downloads.html Romanian NLP]
-*[http://sanskritlibrary.org/ Sanskrit Library]
-*[http://nl.ijs.si/elan/#corpus Slovene-English Parallel Corpus]
 *[http://www.ldc.upenn.edu/Catalog/LDC2001S97.html Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio]
 *[http://www.ldc.upenn.edu/Catalog/LDC2001S99.html Speech in Noisy Environments 2 (SPINE2 CODED) Coded Audio]
@@ Line 197: / Line 219: @@
 *[http://nora.hd.uib.no/index-e.html The CORPORA DataCenter (Norway)]
 *[ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/ The Moby Corpus]
-*[http://www.tekstlab.uio.no/Bosnian/Corpus.html The Oslo Corpus of Bosnian Texts]
 *[http://www.hf.uio.no/tekstlab/prosjekter/SOFIE.htm The Sofie Treebank - A Parallel Treebank of North European Languages]

Revision as of 18:31, 10 November 2006

Contents

English

German

Multilingual

Russian

Slovak

Italian

Link collections

Corpora tools

Uncategorized

Arabic

Bosnian

Bulgarian

Czech

Danish

English

Finnish

French

German

Haitian Creole

Italian

Japanese

Polish

Romanian

Sanskrit

Slovenian

Spanish

Swahili
