Search results

  • * [http://www.isv.cbs.dk/~mbk/treebank/ PAROLE Corpus (SGML format)] (GPL) * [http://korpus.dsl.dk/korpus2000/indgang.php Danish news corpus]
    1 KB (174 words) - 09:38, 26 May 2014
  • * [http://www.csc.fi/kielipankki/aineistot/hcs/index.phtml.en Helsinki Corpus of Swahili (HCS)]
    151 bytes (21 words) - 14:35, 26 April 2008
  • ...ww.panl10n.net/english/OutputsIndonesia2.htm 500,000 Word Bahasa Indonesia Corpus and Parallel English Translation] (A-NC-SA 3.0 licence) ...n.net/english/OutputsIndonesia2.htm 500,000 Word Bahasa Indonesia Parallel Corpus with Penn Treebank] (A-NC-SA 3.0 licence)
    1 KB (174 words) - 23:28, 14 November 2018
  • ==POS Tagger, Morphological Analyzer, Lemmatizer, Corpus== The tagger and its related files are distributed under GNU GPL license. Corpus is licensed.
    2 KB (295 words) - 09:18, 30 June 2014
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English ...p://www.ling.su.se/staff/sofia/suc/suc.html Stockholm Umeå Corpus] (Tagged Corpus, freely available for research purposes)
    1 KB (169 words) - 05:38, 29 June 2020
  • * '''Name of Dataset:''' MSF2 Corpus * '''Citation:''' If you use the MSF2 corpus in your research, please include the following citation in any resulting pa
    2 KB (224 words) - 05:01, 4 May 2020
  • * [http://sli.uvigo.es/CTG/ Technical Corpus of Galician (CTG)] * [http://sli.uvigo.es/CTAG/ POS-tagged Technical Corpus of Galician (CTAG)]
    2 KB (308 words) - 11:21, 4 August 2014
  • * [http://ucts.uniba.sk/aranea_about/ Araneum Polonicum], Gigaword Polish web corpus * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
    3 KB (459 words) - 13:22, 8 March 2015
  • *[http://devoted.to/corpora Corpus-based Linguists (site maintained by David Lee)] ...tanford.edu/links/statnlp.html Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources]
    2 KB (305 words) - 00:23, 13 February 2007
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English * [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
    1 KB (131 words) - 09:50, 26 May 2014
  • * [http://ucts.uniba.sk/aranea_about/ Araneum Italicum], Gigaword Italian web corpus * [http://www.istc.cnr.it/material/database/colfis/ ColFIS Corpus e Lessico di Frequenza dell'Italiano Scritto]
    3 KB (456 words) - 15:24, 15 March 2019
  • * [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
    291 bytes (32 words) - 13:06, 25 March 2010
  • * [http://www.statmt.org/wmt10/training-giga-fren.tar 10^9 French-English corpus] ...//ucts.uniba.sk/aranea_about/ Araneum Francogallicum], Gigaword French web corpus
    3 KB (389 words) - 08:57, 17 June 2015
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
    398 bytes (50 words) - 09:42, 26 May 2014
  • * '''Citation:''' If you use this corpus in your research, please include the following citation in any resulting pa * '''Description:''' 1,443 compound nouns extracted from the British National Corpus and annotated with semantic relations. For more information and pointers to
    1 KB (151 words) - 09:22, 24 October 2012
  • ...ona.dlsi.ua.es/~fran/setimes/ Southeast European Times] (paragraph aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
    319 bytes (34 words) - 07:02, 20 September 2007
  • | Corpus-based, predictive | Corpus-based, distributional
    5 KB (590 words) - 02:05, 6 September 2020
  • ...s/compare_contexts_NMRs.pdf Learning noun-modifier semantic relations with corpus-based and Wordnet-based features]. In ''Proceedings of the 21st National Co ...ter D. and Michael L. Littman. (2005). [http://arxiv.org/abs/cs.LG/0508103 Corpus-based learning of analogies and semantic relations]. ''Machine Learning'',
    2 KB (197 words) - 17:57, 3 January 2007
  • The SumTime corpus is structured as a database, and presented in text (CSV) and MDB (Microsoft ...s] and [https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip SumTime corpus] instead.
    2 KB (353 words) - 14:09, 6 August 2020
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English ...p://www.statmt.org/wmt15/translation-task.html WMT News Crawl] monolingual corpus. Currently 14M tokens.
    2 KB (300 words) - 04:38, 29 June 2020
