Search results

Resources for Croatian
....2 mil. tokens synchronic (text from 1990 on), standard Croatian reference corpus; lemmatised and MSD-tagged with the Croatian MULTEXT-East tagset using hyb ...Language Corpus] (continuously growing (currently approx. 100 mil. tokens) corpus of Croatian covering various genres and time periods, using PhiloLogic for

2 KB (233 words) - 05:17, 25 June 2012
Resources for Bulgarian
* [http://www.statmt.org/setimes/ Southeast European Times], sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English

1 KB (148 words) - 09:36, 26 May 2014
Resources for Turkish
* [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,
* [http://tscorpus.com/ TS Corpus] (PoSTagged Turkish Corpus. The corpus also presents morphological and lemma tags of the data. Consists of 491 Mil

2 KB (251 words) - 08:40, 17 June 2015
Resources for Slovenian
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
* [http://nl.ijs.si/elan/ IJS - ELAN] Slovene-English Parallel Corpus

1 KB (141 words) - 09:52, 26 May 2014
Resources for Danish
* [http://www.isv.cbs.dk/~mbk/treebank/ PAROLE Corpus (SGML format)] (GPL)
* [http://korpus.dsl.dk/korpus2000/indgang.php Danish news corpus]

1 KB (174 words) - 09:38, 26 May 2014
Resources for Swahili
* [http://www.csc.fi/kielipankki/aineistot/hcs/index.phtml.en Helsinki Corpus of Swahili (HCS)]

151 bytes (21 words) - 14:35, 26 April 2008
Resources for Indonesian
...ww.panl10n.net/english/OutputsIndonesia2.htm 500,000 Word Bahasa Indonesia Corpus and Parallel English Translation] (A-NC-SA 3.0 licence) ...n.net/english/OutputsIndonesia2.htm 500,000 Word Bahasa Indonesia Parallel Corpus with Penn Treebank] (A-NC-SA 3.0 licence)

1 KB (174 words) - 23:28, 14 November 2018
Resources for Hindi
==POS Tagger, Morphological Analyzer, Lemmatizer, Corpus== The tagger and its related files are distributed under GNU GPL license. Corpus is licensed.

2 KB (295 words) - 09:18, 30 June 2014
Resources for Swedish
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English ...p://www.ling.su.se/staff/sofia/suc/suc.html Stockholm Umeå Corpus] (Tagged Corpus, freely available for research purposes)

1 KB (169 words) - 05:38, 29 June 2020
MSF2 The Portuguese/Spanish corpus of Multi-Sentence Fusion (Repository)
* '''Name of Dataset:''' MSF2 Corpus
* '''Citation:''' If you use the MSF2 corpus in your research, please include the following citation in any resulting pa

2 KB (224 words) - 05:01, 4 May 2020
Resources for Galician
* [http://sli.uvigo.es/CTG/ Technical Corpus of Galician (CTG)]
* [http://sli.uvigo.es/CTAG/ POS-tagged Technical Corpus of Galician (CTAG)]

2 KB (308 words) - 11:21, 4 August 2014
User:Nlpassist
...nagement; applications of machine learning techniques to disambiguation of corpus data. Besides the research objectives, the NLP Laboratory aims at training

844 bytes (111 words) - 15:28, 14 January 2019
Resources for Polish
* [http://ucts.uniba.sk/aranea_about/ Araneum Polonicum], Gigaword Polish web corpus
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English

3 KB (459 words) - 13:22, 8 March 2015
Lists of resources
*[http://devoted.to/corpora Corpus-based Linguists (site maintained by David Lee)] ...tanford.edu/links/statnlp.html Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources]

2 KB (305 words) - 00:23, 13 February 2007
Resources for Romanian
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
* [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,

1 KB (131 words) - 09:50, 26 May 2014
Resources for Italian
* [http://ucts.uniba.sk/aranea_about/ Araneum Italicum], Gigaword Italian web corpus
* [http://www.istc.cnr.it/material/database/colfis/ ColFIS Corpus e Lessico di Frequenza dell'Italiano Scritto]

3 KB (456 words) - 15:24, 15 March 2019
Resources for Serbian
* [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,

291 bytes (32 words) - 13:06, 25 March 2010
Resources for French
* [http://www.statmt.org/wmt10/training-giga-fren.tar 10^9 French-English corpus] ...//ucts.uniba.sk/aranea_about/ Araneum Francogallicum], Gigaword French web corpus

3 KB (389 words) - 08:57, 17 June 2015
User:E.s.atwell
...nowledge based systems, professional development, computational modelling, corpus linguistics, etc. I am autistic, and am carer for disabled wife and son, so

1 KB (191 words) - 13:38, 10 September 2016
Resources for Estonian
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English

398 bytes (50 words) - 09:42, 26 May 2014
