Search results

  • ...ed German-English phrase-aligned parallel corpus, a subset of the EuroParl corpus (4000 sentences for each language, the tool at least is LGPL) ...ttp://ucts.uniba.sk/aranea_about/ Araneum Germanicum], Gigaword German web corpus
    4 KB (575 words) - 02:10, 26 August 2016
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
    425 bytes (49 words) - 21:18, 16 December 2015
  • * [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, ...skiTaggingSiKDD2005.pdf Learning PoS tagging from a tagged Macedonian text corpus]". ''Proceedings of SiKDD 2005 (Conference on Data Mining and Data Warehous
    2 KB (195 words) - 17:04, 7 October 2010
  • ...ona.dlsi.ua.es/~fran/setimes/ Southeast European Times] (paragraph aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
    323 bytes (34 words) - 07:40, 8 January 2008
  • * [http://ucnk.ff.cuni.cz/english/index.html Czech National Corpus]
    548 bytes (72 words) - 08:56, 17 June 2015
  • * '''Recall:''' percentage of named entities defined in the corpus that were found by the program * '''Training data:''' Train split of CONLL-2003 corpus
    3 KB (378 words) - 07:29, 12 July 2019
  • ...es/inlg2006specialsession/INLG-0626.pdf Evaluations of NLG Systems: Common Corpus and Tasks or Common Dimensions and Metrics?] ...s/inlg2006specialsession/INLG-0627.pdf Building a Semantically Transparent Corpus for the Generation of Referring Expressions.]
    3 KB (361 words) - 05:44, 8 February 2009
  • * [http://www.ninjal.ac.jp/english/products/bccwj/ Balanced Corpus of Contemporary Written Japanese (BCCWJ)] (subset is web searchable at Koto * [http://www.edrdg.org/projects/tanaka/tanakacorpus.html Tanaka Corpus] by Tanaka Yasuhito, edited by Jim Breen, under a CC-BY-SA 3.0 licence
    4 KB (558 words) - 20:40, 11 October 2017
  • *[http://www.ling.ohio-state.edu/~jonsafari/corpora VOA Persian Corpus 2003-2008] (public domain) *[https://www.clarin.si/repository/xmlui/handle/11356/1042 Orwell's 1984 Corpus in MULTEXT-EAST] (public domain)
    5 KB (619 words) - 09:58, 23 February 2016
  • ...n of datasets that contains spam messages, and ham messages from the Enron corpus. See [http://www.aueb.gr/users/ion/docs/ceas2006_paper.pdf this article] fo
    814 bytes (135 words) - 09:07, 19 November 2006
  • ==Kannada POS tagger, Morph analyzer, Corpus== [http://sivareddy.in/downloads Download]. [http://corpus.leeds.ac.uk/tools/ Alternate source]
    751 bytes (101 words) - 03:43, 24 November 2011
  • ...coverage parser for the English language. An evaluation with the [[SUSANNE corpus]] shows that MINIPAR achieves about 88% precision and 80% recall with respe
    737 bytes (99 words) - 11:58, 17 November 2006
  • * '''Citation:''' If you use the TempEval-3 Platinum corpus in your research, please include the following citation in any resulting pa ...and temporal relations by multiple experts and an adjudicator. This is the corpus used to rank participant systems in the TempEval-3 evaluation exercise. Ann
    2 KB (250 words) - 10:44, 23 April 2013
  • ...pusa.net/XXmendea/Konts_arrunta_fr.html XX century's Basque corpus] Basque corpus XX century * [http://www.ztcorpusa.net ZT corpus] Basque Corpus of Science and Technology
    5 KB (728 words) - 09:35, 26 May 2014
  • * July 15, 2011 Completion of corpus selection [TBC]
    622 bytes (71 words) - 10:26, 6 April 2011
  • '''Title:''' ''SeedLing: Building and Using a Seed corpus for the Human Language Project''<br> '''Note:''' Plaintext corpus for >1000 languages with python API<br>
    3 KB (403 words) - 07:46, 29 June 2014
  • * [http://www.degruyter.com/journals/cllt Corpus Linguistics and Linguistic Theory]
    7 KB (866 words) - 14:12, 11 November 2018
  • ....is/icelandic_treebank/Download IcePaHC] - the Icelandic Parsed Historical Corpus. 440000 words (12th-19th century texts, phrase structure + PoS + lemma anno
    885 bytes (102 words) - 01:09, 15 April 2011
  • * [http://optima.jrc.it/Acquis/ JRC-Acquis] parallel corpus, 20926909 words, Maltese sentence-aligned with 22 other languages. Public d
    730 bytes (100 words) - 15:40, 20 June 2011
  • ...ttp://www.statmt.org/setimes/ Southeast European Times] (paragraph aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
    631 bytes (63 words) - 16:59, 7 October 2010
