Difference between revisions of "Corpora for English"
Revision as of 03:21, 8 January 2008
English
- American English SpeechDat-Car
- American National Corpus (ANC)
- AMERICAN NATIONAL CORPUS FIRST RELEASE
- Biomedical corpora
- BNCweb a web-based interface to the British National Corpus
- Bookmarks for Corpus-based Linguists
- British National Corpus (from Oxford University)
- British National Corpus (BNC)
- British National Corpus project page (from UCREL)
- Brown Corpus
- Collins Wordbanks
- Corpus of Spoken Professional English
- Dialogue Diversity Corpus
- Electronic Text Center -- University of Virginia
- English Intonation in the British Isles - The IViE Corpus
- English stop words (from SMART)
- English Verb Classes And Alternations: A Preliminary Investigation (Index)
- Exploring Words and Phrases from the British National Corpus
- GOV2 Corpus - 426 gigabytes of text
- Gutenberg
- Hutter Prize for Lossless Compression of Human Knowledge 100M sample of Wikipedia
- ICAME
- Large Text Compression Benchmark's 1G sample of Wikipedia
- List of English stopwords
- Movie Review Data
- Multi-Perspective Question Answering (MPQA)
- Multiword Expression Resources
- Oxford English Corpus
- Phrases in English
- Restricted English Corpus from Dr. Caroline Lyon for PhD
- Sketch Engine
- Susanne: Annotated American English Corpus
- The BNC Index (for the BNCWorld Edition)
- The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English
- The Dialogue Diversity Corpus
- The LUCY Corpus - Documentation
- TRAINS Dialogue Corpus
- WebCorp
Galician
- Linguistic Corpus of the University of Vigo (CLUVI)
- Technical Corpus of Galician (CTG)
- Tesouro informatizado da lingua galega (TILG)
German
Multilingual
- ACQUIS COMMUNAUTAIRE Multilingual Corpus
- Bank of Swedish
- CLUVI Corpus (Galician-English-Spanish-French parallel corpus)
- Croatian National Corpus (HNK)
- Czech National Corpus (CNC)
- CELEX - The Dutch Center for Lexical Information
- Centre for Disease Control - Chinese, French, Japanese, Spanish info on SARS
- COMPARA corpus
- Debian free software community
- EMILLE corpus
- European Parliament Proceedings Parallel Corpus 1996-2003
- EuroWordNet
- French Foreign Ministry's magazine
- GlossaNet
- Haitian Creole corpus - Teknoloji pou lang kreyol
- Hungarian National Corpus
- Hansard French-English parallel corpus
- ICE corpora
- IPI PAN Corpus of Polish
- Learner Behaviour on the Internet
- MuchMore Springer Bilingual Corpus
- MULTEXT-East: Multilingual Corpora for Eastern and Central European Languages
- Multilingual Corpora: Available Resources
- Tanaka Corpus: Japanese-English sentence pairs
- MultiSemCor
- Newspapers on the Internet
- OPUS - an open source parallel corpus
- Oslo Corpus of Bosnian
- PolyU Language Bank
- Portuguese Corpus
- Public registry of the Council of the EU
- Russian National Corpus (RNK)
- The Bible as a Resource for Translation Software
- The ECI Multilingual corpus
- Slovenian Corpus FIDA and FIDA+
- Spanish Corpus
- UN declaration of human rights in multiple languages
- UNITEX
- Useful links about parallel corpora, by Olivier Kraif
- WaCky Project
- Wortlisten: spoken German, English, French, and Dutch
Russian
- Bokr Russian Reference Corpus
- HANCO: The Helsinki annotated corpus of Russian texts
- Russian Corpora
- Russian Corpora
- Russian Corpus Site
- The Russian National Corpus
- Russian Newspaper Corpus
- Russicon Resources
Slovak
Italian
- LIP - Lessico di frequenza dell'Italiano Parlato - Access via BADIP
- ColFIS Corpus e Lessico di Frequenza dell'Italiano Scritto
- Corpus di Italiano Scritto contemporaneo (CORIS/CODIS)
- Tesoro della lingua italiana delle origini (TLIO)
Link collections
- Collections of texts and corpora
- Manuel Barbera: General Corpora and Corpus Linguistics Resources
- Isabella Chiari: Corpora, Software and Linguistic resources
- Annotated list of resources on statistical NLP and corpus-based CL
Corpora tools
- List of stop words
- Poliqarp - open source XML-aware indexer, search engine and concordancer
- The Sketch Engine
- Treebank tokenization scheme
Uncategorized
Arabic
Bosnian
Bulgarian
Croatian
Czech
Danish
English
- 1963 Time Magazine corpus
- An Empirical Grammar of the English Verb System
- BNC Online Service
- BRITISH NATIONAL CORPUS - WORLD EDITION
Finnish
French
German
- A Syntactically Annotated Corpus of German Newspaper Texts
- Experimental Corpus Query System (University of Stuttgart, Germany)
Haitian Creole
Italian
Japanese
Polish
Romanian
Sanskrit
Slovenian
Spanish
Swahili
- 2000 NIST Speaker Recognition Evaluation Corpus
- A Web Corpus and Topic Signatures for All WordNet 1.6 Nominal Senses (v 1.0)
- Alpino Treebank
- AOT
- Corpus Resources (Chulalongkorn University, Thailand)
- Cranfield collection
- CREA
- Edinburgh Associative Thesaurus (EAT)
- EuroWordNet
- Hansards Corpus - Searchable
- HCRC Map Task Corpus XML annotations
- ICOPOST
- IMS Corpus Toolbox, Univ. of Stuttgart
- IMS Corpus Workbench (CWB)
- International Corpus of Learner English
- Kiel University's Institute on Phonetics and Speech Processing
- Lacio Web Corpora
- LANGUAGE LEARNING CENTER - ACADEMIC CORPUS
- Manuel Barbera: General Corpora and Corpus Linguistics Resources
- Medlars collection
- Miscellaneous Word Lists from Oxford University
- Multilingual Text Tools and Corpora
- Name lists from US census
- Nexing Corpus
- On-line books at CMU
- OPUS -- An Open Source Parallel Corpus
- Polish subcorpus of the International Corpus of Learner English
- Ramon Piero Center for Research
- Reuters Corpus
- Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio
- Speech in Noisy Environments 2 (SPINE2 CODED) Coded Audio
- Survey of Electronic Corpora (by Jane A. Edwards, file at CMU)
- Survey of English Usage, University College, London
- Switchboard Transcription Project
- TELRI Research Archive of Computational Tools and Resources
- The Childes Corpus - Children's language
- The CORPORA DataCenter (Norway)
- The Moby Corpus
- The Sofie Treebank - A Parallel Treebank of North European Languages