Search results

Page text matches

  • VOA Corpus (small) This corpus is in the public domain
    168 bytes (27 words) - 03:46, 11 August 2015
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English ...://ucts.uniba.sk/aranea_about/ Araneum Hungaricum], Gigaword Hungarian web corpus
    814 bytes (103 words) - 07:44, 26 June 2016
  • ...tp://ucts.uniba.sk/aranea_about/ Araneum Hispanicum], Gigaword Spanish web corpus * [http://www.corpusdelespanol.org/ Corpus del Español] (website only)
    1 KB (155 words) - 04:40, 29 June 2020
  • ...p://ucts.uniba.sk/aranea_about/ Araneum Nederlandicum], Gigaword Dutch web corpus * [http://www.statmt.org/europarl Europarl corpus] - sentence-aligned with English
    893 bytes (114 words) - 19:04, 5 September 2019
  • * [http://corporavm.uni-koeln.de/colonia/ Colonia], corpus of historical Portuguese. * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
    955 bytes (127 words) - 04:09, 4 May 2020
  • *[http://wt.jrc.it/lt/Acquis/ ACQUIS COMMUNAUTAIRE Multilingual Corpus] ...sli.uvigo.es/CLUVI/ CLUVI Corpus (Galician-English-Spanish-French parallel corpus)]
    3 KB (480 words) - 09:26, 16 February 2021
  • * '''Name of Dataset:''' ABC Corpus. * '''Citation:''' If you use the ABC Corpus in your research, please include the following citation in any resulting pa
    1 KB (187 words) - 18:58, 24 June 2008
  • SUMTIME-METEO is a parallel corpus of naturally occurring weather forecast texts and the ... The corpus has 1045 parallel data-text units and is
    1 KB (197 words) - 14:46, 7 February 2009
  • ==Telugu POS tagger, Morph analyzer, Lemmatizer, Corpus== Keywords: Telugu, Part of Speech tagger, Lemmatizer, Morph Analyser, Corpus
    1 KB (135 words) - 08:55, 26 May 2014
  • * [http://www.tekstlab.uio.no/Bosnian/Corpus.html Oslo Corpus of Bosnian Texts] ...ona.dlsi.ua.es/~fran/setimes/ Southeast European Times] (paragraph aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
    394 bytes (47 words) - 12:44, 26 April 2008
  • ...s", the Russian portion is 876 MB, the other languages in the multilingual corpus are: English/French/Spanish/Arabic/Chinese/German ...wmt15/translation-task.html#download WMT corpora], including the Yandex 1M corpus, News Commentary, and News Crawl
    2 KB (269 words) - 07:55, 17 June 2015
  • * [http://gtweb.uit.no/korp/ Corpus for North Sámi, South Sámi, parallel corpus North Sámi - Norwegian] ...torio.uit.no/freecorpus/orig/sme/ Original files + metadata for North Sámi corpus]
    1 KB (190 words) - 06:38, 16 August 2017
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English * [http://ucts.uniba.sk/aranea_about/ Araneum Slovacum], Gigaword Slovak web corpus
    794 bytes (102 words) - 12:28, 8 March 2015
  • *[http://americannationalcorpus.org/ American National Corpus (ANC)] ...://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm Dialogue Diversity Corpus]
    5 KB (788 words) - 17:58, 2 September 2019
  • * [http://ucts.uniba.sk/aranea_about/ Araneum Sinicum], Gigaword Chinese web corpus ...icl_groups/corpus/dwldform1.asp Word Segmented and POS tagged People Daily Corpus at ICL of Peking University]
    2 KB (264 words) - 17:42, 2 September 2019
  • * [http://en.wikipedia.org/wiki/Corpus_linguistics Corpus Linguistics] * [http://en.wikipedia.org/wiki/Text_corpus Text Corpus]
    1 KB (163 words) - 07:26, 17 January 2007
  • ....2 mil. tokens synchronic (text from 1990 on), standard Croatian reference corpus; lemmatised and MSD-tagged with the Croatian MultText East tagset using hyb ...Language Corpus] (continuously growing (currently approx. 100 mil. tokens) corpus of Croatian covering various genres and time periods, using Philologic for
    2 KB (233 words) - 04:17, 25 June 2012
  • * [http://www.statmt.org/setimes/ Southeast European Times], sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
    1 KB (148 words) - 08:36, 26 May 2014
  • * [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, * [http://tscorpus.com/ TS Corpus] (PoSTagged Turkish Corpus. The corpus also presents morphological and lemma tags of the data. Consists of 491 Mil
    2 KB (251 words) - 07:40, 17 June 2015
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English * [http://nl.ijs.si/elan/ IJS - ELAN] Slovene-English Parallel Corpus
    1 KB (141 words) - 08:52, 26 May 2014
