Difference between revisions of "Corpora for English"
Line 78: | Line 78:
==English corpora==
*[http://www.elda.fr/catalogue/en/speech/S0115.html American English SpeechDat-Car]
+ *[http://americannationalcorpus.org/ American National Corpus (ANC)]
*[http://americannationalcorpus.org/FirstRelease/ AMERICAN NATIONAL CORPUS FIRST RELEASE]
+ *[http://compbio.uchsc.edu/ccp/corpora/index.shtml Biomedical corpora]
*[http://homepage.mac.com/bncweb/ BNCweb a web-based interface to the British National Corpus]
*[http://devoted.to/corpora Bookmarks for Corpus-based Linguists]
*[http://info.ox.ac.uk/bnc/ British National Corpus (from Oxford University)]
+ *[http://www.natcorp.ox.ac.uk/ British National Corpus (BNC)]
*[http://www.comp.lancs.ac.uk/computing/research/ucrel/bnc.html British National Corpus project page (from UCREL)]
+ *[http://clwww.essex.ac.uk/w3c/corpus_ling/content/corpora/list/private/brown/brown.html Brown Corpus]
+ *[http://www.collins.co.uk/books.aspx?group=154 Collins Wordbanks]
*[http://www.athel.com/corpdes.html Corpus of Spoken Professional English]
*[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm Dialogue Diversity Corpus]
Line 90: | Line 95:
*[http://www-personal.umich.edu/~jlawler/levin.html English Verb Classes And Alternations: A Preliminary Investigation (Index)]
*[http://usna.edu/LangStudy/BNC/ Exploring Words and Phrases from the British National Corpus]
+ *[http://www.gutenberg.org/wiki/Main_Page Gutenberg]
*[http://nora.hd.uib.no/icame.html ICAME]
*[http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/naive-bayes/bow-0.8/stopwords.c List of English stopwords]
Line 95: | Line 101:
*[http://www.cs.cornell.edu/People/pabo/movie-review-data/ Movie Review Data]
*[http://mwe.stanford.edu/resources/ Multiword Expression Resources]
+ *[http://www.askoxford.com/oec/mainpage/?view=uk Oxford English Corpus]
*[http://pie.usna.edu/ Phrases in English]
*[http://homepages.feis.herts.ac.uk/~comrcml/Lyon-thesis.ps Restricted English Corpus from Dr. Caroline Lyon for PhD]
Line 104: | Line 111:
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]
*[http://www.cs.rochester.edu/research/cisd/resources/trains.html TRAINS Dialogue Corpus]
+ *[http://www.webcorp.org.uk/guide/ WebCorp]
==German corpora==
Revision as of 13:06, 2 November 2006
- 1963 Time Magazine corpus
- 2000 NIST Speaker Recognition Evaluation Corpus
- A Syntactically Annotated Corpus of German Newspaper Texts
- A Web Corpus and Topic Signatures for All WordNet 1.6 Nominal Senses (v 1.0)
- Alpino Treebank
- An Empirical Grammar of the English Verb System
- Annotated list of resources on statistical NLP and corpus-based CL
- AOT
- Arabic Newswire Part 1
- Base Textuelle de Moyen Francais
- BNC Online Service
- Bokr Russian Reference Corpus
- BRITISH NATIONAL CORPUS - WORLD EDITION
- Collections of texts and corpora
- Corpus de referencia de la lengua Espanola contemporanea: corpus oral peninsular
- Corpus del Espanol
- Corpus of spoken Bulgarian
- Corpus Resources (Chulalongkorn University, Thailand)
- Cranfield collection
- CREA
- Czech National Corpus
- Danish news corpus
- Edinburgh Associative Thesaurus (EAT)
- EuroWordNet
- Experimental Corpus Query System (University of Stuttgart, Germany)
- Finnish text bank
- HAITIAN CREOLE ELECTRONIC TEXTS
- Hansards Corpus - Searchable
- HCRC Map Task Corpus XML annotations
- Helsinki Corpus of Swahili (HCS)
- ICOPOST
- IMS Corpus Toolbox, Univ. of Stuttgart
- IMS Corpus Workbench (CWB)
- International Corpus of Learner English
- IPI PAN Polish Corpus
- Kiel University's Institute on Phonetics and Speech Processing
- Lacio Web Corpora
- LANGUAGE LEARNING CENTER - ACADEMIC CORPUS
- list of Japanese transitive - intransitive verb pairs
- List of stop words
- Manuel Barbera: General Corpora and Corpus Linguistics Resources
- Medlars collection
- Miscellaneous Word Lists from Oxford University
- Multilingual Text Tools and Corpora
- Name lists from US census
- Nexing Corpus
- On-line books at CMU
- OPUS -- An Open Source Parallel Corpus
- Oxford Text Archive Corpus of Italian Newspapers
- Polish subcorpus of the International Corpus of Learner English
- Ramon Piero Center for Research
- Reuters Corpus
- Romanian NLP
- Russian Corpora
- Russian Corpus Page
- Russian Corpus Site
- Russian Newspaper Corpus
- Russicon Resources
- Sanskrit Library
- Slovak National Corpus
- Slovene-English Parallel Corpus
- Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio
- Speech in Noisy Environments 2 (SPINE2 CODED) Coded Audio
- Survey of Electronic Corpora (by Jane A. Edwards, file at CMU)
- Survey of English Usage, University College, London
- Switchboard Transcription Project
- TELRI Research Archive of Computational Tools and Resources
- The Childes Corpus - Children's language
- The CORPORA DataCenter (Norway)
- The Moby Corpus
- The Oslo Corpus of Bosnian Texts
- The Sketch Engine
- The Sofie Treebank - A Parallel Treebank of North European Languages
- Treebank tokenization scheme
English corpora
- American English SpeechDat-Car
- American National Corpus (ANC)
- AMERICAN NATIONAL CORPUS FIRST RELEASE
- Biomedical corpora
- BNCweb a web-based interface to the British National Corpus
- Bookmarks for Corpus-based Linguists
- British National Corpus (from Oxford University)
- British National Corpus (BNC)
- British National Corpus project page (from UCREL)
- Brown Corpus
- Collins Wordbanks
- Corpus of Spoken Professional English
- Dialogue Diversity Corpus
- Electronic Text Center -- University of Virginia
- English Intonation in the British Isles - The IViE Corpus
- English stop words (from SMART)
- English Verb Classes And Alternations: A Preliminary Investigation (Index)
- Exploring Words and Phrases from the British National Corpus
- Gutenberg
- ICAME
- List of English stopwords
- Mapping WordNet Versions 1.6 and 2.0
- Movie Review Data
- Multiword Expression Resources
- Oxford English Corpus
- Phrases in English
- Restricted English Corpus from Dr. Caroline Lyon for PhD
- Sketch Engine
- Susanne: Annotated American English Corpus
- The BNC Index (for the BNCWorld Edition)
- The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English
- The Dialogue Diversity Corpus
- The LUCY Corpus - Documentation
- TRAINS Dialogue Corpus
- WebCorp
German corpora
Multilingual corpora
- ACQUIS COMMUNAUTAIRE Multilingual Corpus
- CELEX - The Dutch Center for Lexical Information
- Centre for Disease Control - Chinese, French, Japanese, Spanish info on SARS
- COMPARA corpus
- Debian free software community
- EMILLE corpus
- European Parliament Proceedings Parallel Corpus 1996-2003
- EuroWordNet
- French Foreign Ministry's magazine
- GlossaNet
- Haitian Creole corpus - Teknoloji pou lang kreyol
- Hansard French-English parallel corpus
- ICE corpora
- Learner Behaviour on the Internet
- MuchMore Springer Bilingual Corpus
- MULTEXT-East: Multilingual Corpora for Eastern and Central European Languages
- Multilingual Corpora: Available Resources
- MultiSemCor
- Newspapers on the Internet
- OPUS - an open source parallel corpus
- PolyU Language Bank
- Public registry of the Council of the EU
- The Bible as a Resource for Translation Software
- The ECI Multilingual corpus
- UN declaration of human rights in multiple languages
- UNITEX
- Useful links about parallel corpora, by Olivier Kraif
- WaCky Project
- Wortlisten: spoken German, English, French, and Dutch