Search results

  • * [http://www.isv.cbs.dk/~mbk/treebank/ PAROLE Corpus (SGML format)] (GPL) * [http://korpus.dsl.dk/korpus2000/indgang.php Danish news corpus]
    1 KB (174 words) - 09:38, 26 May 2014
  • * [http://www.csc.fi/kielipankki/aineistot/hcs/index.phtml.en Helsinki Corpus of Swahili (HCS)]
    151 bytes (21 words) - 14:35, 26 April 2008
  • ...ww.panl10n.net/english/OutputsIndonesia2.htm 500,000 Word Bahasa Indonesia Corpus and Parallel English Translation] (A-NC-SA 3.0 licence) ...n.net/english/OutputsIndonesia2.htm 500,000 Word Bahasa Indonesia Parallel Corpus with Penn Treebank] (A-NC-SA 3.0 licence)
    1 KB (174 words) - 23:28, 14 November 2018
  • ==POS Tagger, Morphological Analyzer, Lemmatizer, Corpus== The tagger and its related files are distributed under GNU GPL license. Corpus is licensed.
    2 KB (295 words) - 09:18, 30 June 2014
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English ...p://www.ling.su.se/staff/sofia/suc/suc.html Stockholm Umeå Corpus] (Tagged Corpus, freely available for research purposes)
    1 KB (169 words) - 05:38, 29 June 2020
  • * '''Name of Dataset:''' MSF2 Corpus * '''Citation:''' If you use the MSF2 corpus in your research, please include the following citation in any resulting pa
    2 KB (224 words) - 05:01, 4 May 2020
  • * [http://sli.uvigo.es/CTG/ Technical Corpus of Galician (CTG)] * [http://sli.uvigo.es/CTAG/ POS-tagged Technical Corpus of Galician (CTAG)]
    2 KB (308 words) - 11:21, 4 August 2014
  • * [http://ucts.uniba.sk/aranea_about/ Araneum Polonicum], Gigaword Polish web corpus * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
    3 KB (459 words) - 13:22, 8 March 2015
  • *[http://devoted.to/corpora Corpus-based Linguists (site maintained by David Lee)] ...tanford.edu/links/statnlp.html Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources]
    2 KB (305 words) - 00:23, 13 February 2007
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English * [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
    1 KB (131 words) - 09:50, 26 May 2014
  • * [http://ucts.uniba.sk/aranea_about/ Araneum Italicum], Gigaword Italian web corpus * [http://www.istc.cnr.it/material/database/colfis/ ColFIS Corpus e Lessico di Frequenza dell'Italiano Scritto]
    3 KB (456 words) - 15:24, 15 March 2019
  • * [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
    291 bytes (32 words) - 13:06, 25 March 2010
  • * [http://www.statmt.org/wmt10/training-giga-fren.tar 10^9 French-English corpus] ...//ucts.uniba.sk/aranea_about/ Araneum Francogallicum], Gigaword French web corpus
    3 KB (389 words) - 08:57, 17 June 2015
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
    398 bytes (50 words) - 09:42, 26 May 2014
  • * '''Citation:''' If you use this corpus in your research, please include the following citation in any resulting pa * '''Description:''' 1,443 compound nouns extracted from the British National Corpus and annotated with semantic relations. For more information and pointers to
    1 KB (151 words) - 09:22, 24 October 2012
  • ...ona.dlsi.ua.es/~fran/setimes/ Southeast European Times] (paragraph aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
    319 bytes (34 words) - 07:02, 20 September 2007
  • | Corpus-based, predictive | Corpus-based, distributional
    5 KB (590 words) - 02:05, 6 September 2020
  • ...s/compare_contexts_NMRs.pdf Learning noun-modifier semantic relations with corpus-based and Wordnet-based features]. In ''Proceedings of the 21st National Co ...ter D. and Michael L. Littman. (2005). [http://arxiv.org/abs/cs.LG/0508103 Corpus-based learning of analogies and semantic relations]. ''Machine Learning'',
    2 KB (197 words) - 17:57, 3 January 2007
  • The SumTime corpus is structured as a database, and presented in text (CSV) and MDB (Microsoft ...s] and [https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip SumTime corpus] instead.
    2 KB (353 words) - 14:09, 6 August 2020
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English ...p://www.statmt.org/wmt15/translation-task.html WMT News Crawl] monolingual corpus. Currently 14M tokens.
    2 KB (300 words) - 04:38, 29 June 2020
  • ...ed German-English phrase-aligned parallel corpus, a subset of the EuroParl corpus (4000 sentences for each language, the tool at least is LGPL) ...ttp://ucts.uniba.sk/aranea_about/ Araneum Germanicum], Gigaword German web corpus
    4 KB (575 words) - 02:10, 26 August 2016
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
    425 bytes (49 words) - 21:18, 16 December 2015
  • * [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, ...skiTaggingSiKDD2005.pdf Learning PoS tagging from a tagged Macedonian text corpus]". ''Proceedings of SiKDD 2005 (Conference on Data Mining and Data Warehous
    2 KB (195 words) - 17:04, 7 October 2010
  • ...ona.dlsi.ua.es/~fran/setimes/ Southeast European Times] (paragraph aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
    323 bytes (34 words) - 07:40, 8 January 2008
  • * [http://ucnk.ff.cuni.cz/english/index.html Czech National Corpus]
    548 bytes (72 words) - 08:56, 17 June 2015
  • * '''Recall:''' percentage of named entities defined in the corpus that were found by the program * '''Training data:''' Train split of CONLL-2003 corpus
    3 KB (378 words) - 07:29, 12 July 2019
  • ...es/inlg2006specialsession/INLG-0626.pdf Evaluations of NLG Systems: Common Corpus and Tasks or Common Dimensions and Metrics?] ...s/inlg2006specialsession/INLG-0627.pdf Building a Semantically Transparent Corpus for the Generation of Referring Expressions.]
    3 KB (361 words) - 05:44, 8 February 2009
  • * [http://www.ninjal.ac.jp/english/products/bccwj/ Balanced Corpus of Contemporary Written Japanese (BCCWJ)] (subset is web searchable at Koto * [http://www.edrdg.org/projects/tanaka/tanakacorpus.html Tanaka Corpus] by Tanaka Yasuhito, edited by Jim Breen, under a CC-BY-SA 3.0 licence
    4 KB (558 words) - 20:40, 11 October 2017
  • *[http://www.ling.ohio-state.edu/~jonsafari/corpora VOA Persian Corpus 2003-2008] (public domain) *[https://www.clarin.si/repository/xmlui/handle/11356/1042 Orwell's 1984 Corpus in MULTEXT-EAST] (public domain)
    5 KB (619 words) - 09:58, 23 February 2016
  • ...n of datasets that contains spam messages, and ham messages from the Enron corpus. See [http://www.aueb.gr/users/ion/docs/ceas2006_paper.pdf this article] fo
    814 bytes (135 words) - 09:07, 19 November 2006
  • ==Kannada POS tagger, Morph analyzer, Corpus== [http://sivareddy.in/downloads Download]. [http://corpus.leeds.ac.uk/tools/ Alternate source]
    751 bytes (101 words) - 03:43, 24 November 2011
  • ...coverage parser for the English language. An evaluation with the [[SUSANNE corpus]] shows that MINIPAR achieves about 88% precision and 80% recall with respe
    737 bytes (99 words) - 11:58, 17 November 2006
  • * '''Citation:''' If you use the TempEval-3 Platinum corpus in your research, please include the following citation in any resulting pa ...and temporal relations by multiple experts and an adjudicator. This is the corpus used to rank participant systems in the TempEval-3 evaluation exercise. Ann
    2 KB (250 words) - 10:44, 23 April 2013
  • ...pusa.net/XXmendea/Konts_arrunta_fr.html XX century's Basque corpus] Basque corpus XX century * [http://www.ztcorpusa.net ZT corpus] Basque Corpus of Science and Technology
    5 KB (728 words) - 09:35, 26 May 2014
  • * July 15, 2011 Completion of corpus selection [TBC]
    622 bytes (71 words) - 10:26, 6 April 2011
  • '''Title:''' ''SeedLing: Building and Using a Seed corpus for the Human Language Project''<br> '''Note:''' Plaintext corpus for >1000 languages with python API<br>
    3 KB (403 words) - 07:46, 29 June 2014
  • * [http://www.degruyter.com/journals/cllt Corpus Linguistics and Linguistic Theory]
    7 KB (866 words) - 14:12, 11 November 2018
  • ....is/icelandic_treebank/Download IcePaHC] - the Icelandic Parsed Historical Corpus. 440000 words (12th-19th century texts, phrase structure + PoS + lemma anno
    885 bytes (102 words) - 01:09, 15 April 2011
  • * [http://optima.jrc.it/Acquis/ JRC-Acquis] parallel corpus, 20926909 words, Maltese sentence-aligned with 22 other languages. Public d
    730 bytes (100 words) - 15:40, 20 June 2011
  • ...ttp://www.statmt.org/setimes/ Southeast European Times] (paragraph aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
    631 bytes (63 words) - 16:59, 7 October 2010
  • | We parsed this corpus using Minipar, extracted subject-predicate-object triples from the results,
    862 bytes (99 words) - 06:28, 22 December 2009
  • | Corpus-based | Corpus-based
    5 KB (687 words) - 11:23, 28 June 2015
  • ! Corpus, window size, vector size | 5B corpus (Araneum + Wikipedia + UkWac), window 3, 1000 dimensions
    4 KB (521 words) - 15:14, 25 January 2017
  • | corpus-based | corpus-based
    2 KB (276 words) - 12:42, 28 June 2015
  • ...nym dictionaries: an acronym dictionary constructed automatically from the corpus and a synonym dictionary that contains geographical terms.
    935 bytes (108 words) - 06:23, 27 September 2011
  • ...y viable given recent advances in NLP and machine learning technology, and corpus availability.
    923 bytes (128 words) - 05:15, 25 June 2012
  • ...ims.uni-stuttgart.de/projekte/TIGER/ Linguistic Interpretation of a German Corpus]† *[http://ysomeya.hp.infoseek.co.jp/ Online Business Letter Corpus KWIC Concordancer]†
    19 KB (2,777 words) - 03:00, 12 September 2019
  • | Corpus-based | Corpus-based
    9 KB (1,199 words) - 09:37, 16 June 2020
  • | Corpus-based | Corpus-based
    9 KB (1,170 words) - 10:03, 22 March 2017
  • * '''Training data:''' sections 2-21 of Wall Street Journal corpus * '''Testing data:''' section 23 of Wall Street Journal corpus
    3 KB (437 words) - 14:23, 28 October 2013
