Search results

  • ....2 mil. tokens synchronic (text from 1990 on), standard Croatian reference corpus; lemmatised and MSD-tagged with the Croatian MULTEXT-East tagset using hyb ...Language Corpus] (continuously growing (currently approx. 100 mil. tokens) corpus of Croatian covering various genres and time periods, using PhiloLogic for
    2 KB (233 words) - 05:17, 25 June 2012
  • * [http://www.statmt.org/setimes/ Southeast European Times], sentence-aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
    1 KB (148 words) - 09:36, 26 May 2014
  • * [http://www.statmt.org/setimes/ Southeast European Times] (sentence-aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, * [http://tscorpus.com/ TS Corpus] (PoS-tagged Turkish corpus. The corpus also provides morphological and lemma tags for the data. Consists of 491 Mil
    2 KB (251 words) - 08:40, 17 June 2015
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English * [http://nl.ijs.si/elan/ IJS - ELAN] Slovene-English Parallel Corpus
    1 KB (141 words) - 09:52, 26 May 2014
  • * [http://www.isv.cbs.dk/~mbk/treebank/ PAROLE Corpus (SGML format)] (GPL) * [http://korpus.dsl.dk/korpus2000/indgang.php Danish news corpus]
    1 KB (174 words) - 09:38, 26 May 2014
  • * [http://www.csc.fi/kielipankki/aineistot/hcs/index.phtml.en Helsinki Corpus of Swahili (HCS)]
    151 bytes (21 words) - 14:35, 26 April 2008
  • ...ww.panl10n.net/english/OutputsIndonesia2.htm 500,000 Word Bahasa Indonesia Corpus and Parallel English Translation] (A-NC-SA 3.0 licence) ...n.net/english/OutputsIndonesia2.htm 500,000 Word Bahasa Indonesia Parallel Corpus with Penn Treebank] (A-NC-SA 3.0 licence)
    1 KB (174 words) - 23:28, 14 November 2018
  • ==POS Tagger, Morphological Analyzer, Lemmatizer, Corpus== The tagger and its related files are distributed under the GNU GPL license. The corpus is licensed.
    2 KB (295 words) - 09:18, 30 June 2014
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English ...p://www.ling.su.se/staff/sofia/suc/suc.html Stockholm Umeå Corpus] (Tagged Corpus, freely available for research purposes)
    1 KB (169 words) - 05:38, 29 June 2020
  • * '''Name of Dataset:''' MSF2 Corpus * '''Citation:''' If you use the MSF2 corpus in your research, please include the following citation in any resulting pa
    2 KB (224 words) - 05:01, 4 May 2020
  • * [http://sli.uvigo.es/CTG/ Technical Corpus of Galician (CTG)] * [http://sli.uvigo.es/CTAG/ POS-tagged Technical Corpus of Galician (CTAG)]
    2 KB (308 words) - 11:21, 4 August 2014
  • ...nagement; applications of machine learning techniques to disambiguation of corpus data. Besides the research objectives, the NLP Laboratory aims at training
    844 bytes (111 words) - 15:28, 14 January 2019
  • * [http://ucts.uniba.sk/aranea_about/ Araneum Polonicum], Gigaword Polish web corpus * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
    3 KB (459 words) - 13:22, 8 March 2015
  • *[http://devoted.to/corpora Corpus-based Linguists (site maintained by David Lee)] ...tanford.edu/links/statnlp.html Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources]
    2 KB (305 words) - 00:23, 13 February 2007
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English * [http://www.statmt.org/setimes/ Southeast European Times] (sentence-aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
    1 KB (131 words) - 09:50, 26 May 2014
  • * [http://ucts.uniba.sk/aranea_about/ Araneum Italicum], Gigaword Italian web corpus * [http://www.istc.cnr.it/material/database/colfis/ ColFIS Corpus e Lessico di Frequenza dell'Italiano Scritto]
    3 KB (456 words) - 15:24, 15 March 2019
  • * [http://www.statmt.org/setimes/ Southeast European Times] (sentence-aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
    291 bytes (32 words) - 13:06, 25 March 2010
  • * [http://www.statmt.org/wmt10/training-giga-fren.tar 10^9 French-English corpus] ...//ucts.uniba.sk/aranea_about/ Araneum Francogallicum], Gigaword French web corpus
    3 KB (389 words) - 08:57, 17 June 2015
  • ...nowledge based systems, professional development, computational modelling, corpus linguistics, etc. I am autistic, and am carer for disabled wife and son, so
    1 KB (191 words) - 13:38, 10 September 2016
  • * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
    398 bytes (50 words) - 09:42, 26 May 2014
