Search results

Resources for Danish
* [http://www.isv.cbs.dk/~mbk/treebank/ PAROLE Corpus (SGML format)] (GPL) * [http://korpus.dsl.dk/korpus2000/indgang.php Danish news corpus]

1 KB (174 words) - 09:38, 26 May 2014
Resources for Swahili
* [http://www.csc.fi/kielipankki/aineistot/hcs/index.phtml.en Helsinki Corpus of Swahili (HCS)]

151 bytes (21 words) - 14:35, 26 April 2008
Resources for Indonesian
...ww.panl10n.net/english/OutputsIndonesia2.htm 500,000 Word Bahasa Indonesia Corpus and Parallel English Translation] (A-NC-SA 3.0 licence) ...n.net/english/OutputsIndonesia2.htm 500,000 Word Bahasa Indonesia Parallel Corpus with Penn Treebank] (A-NC-SA 3.0 licence)

1 KB (174 words) - 23:28, 14 November 2018
Resources for Hindi
==POS Tagger, Morphological Analyzer, Lemmatizer, Corpus== The tagger and its related files are distributed under GNU GPL license. Corpus is licensed.

2 KB (295 words) - 09:18, 30 June 2014
Resources for Swedish
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English ...p://www.ling.su.se/staff/sofia/suc/suc.html Stockholm Umeå Corpus] (Tagged Corpus, freely available for research purposes)

1 KB (169 words) - 05:38, 29 June 2020
MSF2 The Portuguese/Spanish corpus of Multi-Sentence Fusion (Repository)
* '''Name of Dataset:''' MSF2 Corpus * '''Citation:''' If you use the MSF2 corpus in your research, please include the following citation in any resulting pa

2 KB (224 words) - 05:01, 4 May 2020
Resources for Galician
* [http://sli.uvigo.es/CTG/ Technical Corpus of Galician (CTG)] * [http://sli.uvigo.es/CTAG/ POS-tagged Technical Corpus of Galician (CTAG)]

2 KB (308 words) - 11:21, 4 August 2014
Resources for Polish
* [http://ucts.uniba.sk/aranea_about/ Araneum Polonicum], Gigaword Polish web corpus * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English

3 KB (459 words) - 13:22, 8 March 2015
Lists of resources
*[http://devoted.to/corpora Corpus-based Linguists (site maintained by David Lee)] ...tanford.edu/links/statnlp.html Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources]

2 KB (305 words) - 00:23, 13 February 2007
Resources for Romanian
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English * [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,

1 KB (131 words) - 09:50, 26 May 2014
Resources for Italian
* [http://ucts.uniba.sk/aranea_about/ Araneum Italicum], Gigaword Italian web corpus * [http://www.istc.cnr.it/material/database/colfis/ ColFIS Corpus e Lessico di Frequenza dell'Italiano Scritto]

3 KB (456 words) - 15:24, 15 March 2019
Resources for Serbian
* [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,

291 bytes (32 words) - 13:06, 25 March 2010
Resources for French
* [http://www.statmt.org/wmt10/training-giga-fren.tar 10^9 French-English corpus] ...//ucts.uniba.sk/aranea_about/ Araneum Francogallicum], Gigaword French web corpus

3 KB (389 words) - 08:57, 17 June 2015
Resources for Estonian
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English

398 bytes (50 words) - 09:42, 26 May 2014
1443 Semantically Annotated Compound Nouns (Repository)
* '''Citation:''' If you use this corpus in your research, please include the following citation in any resulting pa * '''Description:''' 1,443 compound nouns extracted from the British National Corpus and annotated with semantic relations. For more information and pointers to

1 KB (151 words) - 09:22, 24 October 2012
Resources for Montenegrin
...ona.dlsi.ua.es/~fran/setimes/ Southeast European Times] (paragraph aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,

319 bytes (34 words) - 07:02, 20 September 2007
MEN Test Collection (State of the art)
| Corpus-based, predictive | Corpus-based, distributional

5 KB (590 words) - 02:05, 6 September 2020
Citations of the Diverse Noun Compound Dataset
...s/compare_contexts_NMRs.pdf Learning noun-modifier semantic relations with corpus-based and Wordnet-based features]. In ''Proceedings of the 21st National Co ...ter D. and Michael L. Littman. (2005). [http://arxiv.org/abs/cs.LG/0508103 Corpus-based learning of analogies and semantic relations]. ''Machine Learning'',

2 KB (197 words) - 17:57, 3 January 2007
Data sets for NLG blog
The SumTime corpus is structured as a database, and presented in text (CSV) and MDB (Microsoft ...s] and [https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip SumTime corpus] instead.

2 KB (353 words) - 14:09, 6 August 2020
Resources for Finnish
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English ...p://www.statmt.org/wmt15/translation-task.html WMT News Crawl] monolingual corpus. Currently 14M tokens.

2 KB (300 words) - 04:38, 29 June 2020
Resources for German
...ed German-English phrase-aligned parallel corpus, a subset of the EuroParl corpus (4000 sentences for each language, the tool at least is LGPL) ...ttp://ucts.uniba.sk/aranea_about/ Araneum Germanicum], Gigaword German web corpus

4 KB (575 words) - 02:10, 26 August 2016
Resources for Lithuanian
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English

425 bytes (49 words) - 21:18, 16 December 2015
Resources for Macedonian
* [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, ...skiTaggingSiKDD2005.pdf Learning PoS tagging from a tagged Macedonian text corpus]". ''Proceedings of SiKDD 2005 (Conference on Data Mining and Data Warehous

2 KB (195 words) - 17:04, 7 October 2010
Resources for Serbo-Croatian
...ona.dlsi.ua.es/~fran/setimes/ Southeast European Times] (paragraph aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,

323 bytes (34 words) - 07:40, 8 January 2008
Resources for Czech
* [http://ucnk.ff.cuni.cz/english/index.html Czech National Corpus]

548 bytes (72 words) - 08:56, 17 June 2015
CONLL-2003 (State of the art)
* '''Recall:''' percentage of named entities defined in the corpus that were found by the program * '''Training data:''' Train split of CONLL-2003 corpus

3 KB (378 words) - 07:29, 12 July 2019
Sharing data and evaluation (NLG)
...es/inlg2006specialsession/INLG-0626.pdf Evaluations of NLG Systems: Common Corpus and Tasks or Common Dimensions and Metrics?] ...s/inlg2006specialsession/INLG-0627.pdf Building a Semantically Transparent Corpus for the Generation of Referring Expressions.]

3 KB (361 words) - 05:44, 8 February 2009
Resources for Japanese
* [http://www.ninjal.ac.jp/english/products/bccwj/ Balanced Corpus of Contemporary Written Japanese (BCCWJ)] (subset is web searchable at Koto * [http://www.edrdg.org/projects/tanaka/tanakacorpus.html Tanaka Corpus] by Tanaka Yasuhito, edited by Jim Breen, under a CC-BY-SA 3.0 licence

4 KB (558 words) - 20:40, 11 October 2017
Resources for Persian
*[http://www.ling.ohio-state.edu/~jonsafari/corpora VOA Persian Corpus 2003-2008] (public domain) *[https://www.clarin.si/repository/xmlui/handle/11356/1042 Orwell's 1984 Corpus in MULTEXT-EAST] (public domain)

5 KB (619 words) - 09:58, 23 February 2016
Spam filtering datasets
...n of datasets that contains spam messages, and ham messages from the Enron corpus. See [http://www.aueb.gr/users/ion/docs/ceas2006_paper.pdf this article] fo

814 bytes (135 words) - 09:07, 19 November 2006
Resources for Kannada
==Kannada POS tagger, Morph analyzer, Corpus== [http://sivareddy.in/downloads Download]. [http://corpus.leeds.ac.uk/tools/ Alternate source]

751 bytes (101 words) - 03:43, 24 November 2011
Minipar
...coverage parser for the English language. An evaluation with the [[SUSANNE corpus]] shows that MINIPAR achieves about 88% precision and 80% recall with respe

737 bytes (99 words) - 11:58, 17 November 2006
TempEval-3 Platinum TimeML annotations (Repository)
* '''Citation:''' If you use the TempEval-3 Platinum corpus in your research, please include the following citation in any resulting pa ...and temporal relations by multiple experts and an adjudicator. This is the corpus used to rank participant systems in the TempEval-3 evaluation exercise. Ann

2 KB (250 words) - 10:44, 23 April 2013
Resources for Basque
...pusa.net/XXmendea/Konts_arrunta_fr.html XX century's Basque corpus] Basque corpus XX century * [http://www.ztcorpusa.net ZT corpus] Basque Corpus of Science and Technology

5 KB (728 words) - 09:35, 26 May 2014
Draft Schedule for SemEval 3
* July 15, 2011 Completion of corpus selection [TBC]

622 bytes (71 words) - 10:26, 6 April 2011
CoNLL 2014 (Resources by paper)
'''Title:''' ''SeedLing: Building and Using a Seed corpus for the Human Language Project''<br> '''Note:''' Plaintext corpus for >1000 languages with python API<br>

3 KB (403 words) - 07:46, 29 June 2014
Journals
* [http://www.degruyter.com/journals/cllt Corpus Linguistics and Linguistic Theory]

7 KB (866 words) - 14:12, 11 November 2018
Resources for Icelandic
....is/icelandic_treebank/Download IcePaHC] - the Icelandic Parsed Historical Corpus. 440,000 words (12th-19th century texts, phrase structure + PoS + lemma anno

885 bytes (102 words) - 01:09, 15 April 2011
Resources for Maltese
* [http://optima.jrc.it/Acquis/ JRC-Acquis] parallel corpus, 20,926,909 words, Maltese sentence-aligned with 22 other languages. Public d

730 bytes (100 words) - 15:40, 20 June 2011
Resources for Albanian
...ttp://www.statmt.org/setimes/ Southeast European Times] (paragraph aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian,

631 bytes (63 words) - 16:59, 7 October 2010
OPENU Collection - RTE Users
| We parsed this corpus using Minipar, extracted subject-predicate-object triples from the results,

862 bytes (99 words) - 06:28, 22 December 2009
ESL Synonym Questions (State of the art)
| Corpus-based | Corpus-based

5 KB (687 words) - 11:23, 28 June 2015
Bigger analogy test set (State of the art)
! Corpus, window size, vector size | 5B corpus (Araneum + Wikipedia + UkWac), window 3, 1000 dimensions

4 KB (521 words) - 15:14, 25 January 2017
Similar-Associated-Both Test Collection (State of the art)
| corpus-based | corpus-based

2 KB (276 words) - 12:42, 28 June 2015
IKOMA2 - RTE Users
...nym dictionaries: an acronym dictionary constructed automatically from the corpus and a synonym dictionary that contains geographical terms.

935 bytes (108 words) - 06:23, 27 September 2011
Lexical Acquisition
...y viable given recent advances in NLP and machine learning technology, and corpus availability.

923 bytes (128 words) - 05:15, 25 June 2012
Uncategorized resources
...ims.uni-stuttgart.de/projekte/TIGER/ Linguistic Interpretation of a German Corpus]† *[http://ysomeya.hp.infoseek.co.jp/ Online Business Letter Corpus KWIC Concordancer]†

19 KB (2,777 words) - 03:00, 12 September 2019
WordSimilarity-353 Test Collection (State of the art)
| Corpus-based | Corpus-based

9 KB (1,199 words) - 09:37, 16 June 2020
SAT Analogy Questions (State of the art)
| Corpus-based | Corpus-based

9 KB (1,170 words) - 10:03, 22 March 2017
Parsing (State of the art)
* '''Training data:''' sections 2-21 of Wall Street Journal corpus * '''Testing data:''' section 23 of Wall Street Journal corpus

3 KB (437 words) - 14:23, 28 October 2013
