https://aclweb.org/aclwiki/api.php?action=feedcontributions&user=Biem&feedformat=atomACL Wiki - User contributions [en]2024-03-29T15:36:06ZUser contributionsMediaWiki 1.35.2https://aclweb.org/aclwiki/index.php?title=Resources_for_German&diff=11408Resources for German2016-02-17T05:49:17Z<p>Biem: </p>
<hr />
<div>==Corpora==<br />
===Free license===<br />
* [http://www.computing.dcu.ie/~ygraham/software.html RIA Open Source Rule Induction Tool] includes an LFG-parsed German-English phrase-aligned parallel corpus, a subset of the EuroParl corpus (4000 sentences for each language, the tool at least is LGPL)<br />
* [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
* [http://www.statmt.org/wmt15/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl<br />
<br />
===Unknown license===<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
* [http://ucts.uniba.sk/aranea_about/ Araneum Germanicum], Gigaword German web corpus<br />
* [http://www.phonetik.uni-muenchen.de/Bas/BasKorporaeng.html Bavarian Archive for Speech Signals Corpora]<br />
* [http://corpora.ids-mannheim.de/~cosmas/ COSMAS II]<br />
* [http://www.ims.uni-stuttgart.de/projekte/tc/CQP.html Experimental Corpus Query System (University of Stuttgart, Germany)]<br />
* [http://www.wortschatz.uni-leipzig.de/ German plain text and Co-occurrences at LCC]<br />
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.<br />
* [http://www.coli.uni-sb.de/sfb378/negra-corpus/negra-corpus.html NEGRA Corpus]<br />
* [http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/ TIGER treebank]<br />
* [http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml Tübingen Treebank of Written German (TüBa-D/Z)]<br />
* [http://www.sfs.uni-tuebingen.de/en_tuebads.shtml Tübingen Treebank of Spoken German (TüBa-D/S, aka Verbmobil treebank)]<br />
* [http://www.sfs.uni-tuebingen.de/en_tuepp.shtml Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z)]<br />
* [http://www.coli.uni-saarland.de/~gparis/LMD-TAZ_corpus/ Le Monde Diplomatique-Die Tageszeitung Translation Corpus] - French-German, aligned (parallel)<br />
<br />
==Evaluation datasets==<br />
* [http://www.ukp.tu-darmstadt.de/data/semRelDatasets Semantic relatedness evaluation]<br />
* [https://www.lt.informatik.tu-darmstadt.de/de/data/german-named-entity-recognition/ Named Entity Tagging]<br />
* [https://www.ukp.tu-darmstadt.de/data/lexical-substitution/lexical-substitution-dataset-german/ Lexical Substitution]<br />
* [https://www.lt.informatik.tu-darmstadt.de/de/data/open-source-acoustic-models-for-german-distant-speech-recognition/ Distant Speech recognition]<br />
<br />
== Grammars ==<br />
* [[Generation grammars|KPML generation grammar]]<br />
* [http://abisource.com/projects/link-grammar/ Link Grammar Parser], includes prototype German dictionaries.<br />
<br />
== Morphological analysis ==<br />
=== Free software ===<br />
* [https://code.google.com/p/morphisto/ Morphisto], based on [[SMOR]], is an [[SFST]]-based analyser and generator for German. (The morphology is GPLv2, but the lexicon is proprietary/non-commercial: CC-BY-SA-NC v3)<br />
* [http://www.danielnaber.de/morphologie/index_en.html German morphology data], based on [http://www.wolfganglezius.de/doku.php?id=cl:morphy Morphy], licensed under CC-BY-SA 3.0<br />
<br />
==Lexicons==<br />
===Free software===<br />
* [http://www-user.tu-chemnitz.de/~fri/ding/ DING] - German-English Dictionary with approximately 253,000 entries (GPL 2 or later).<br />
* [http://www.openthesaurus.de/ OpenThesaurus] - German synonyms and associated terms (LGPL)<br />
* [https://github.com/tudarmstadt-lt/GermaNER GermaNER] - German Named Entity Tagger, mixed LGPL/ASL2.0, free for commercial and academic use<br />
* [https://www.lt.informatik.tu-darmstadt.de/de/software/dependency-collapsing/ Dependency Collapser/Propagator] - produces Stanford Collapsed Dependency-style annotations on top of dependency parser output<br />
<br />
===Proprietary/gratis===<br />
* [http://www.ims.uni-stuttgart.de/tcl/RESOURCES/German-Lexicon-en.html Lexical information for German] ("The data is freely available for education, research and other '''non-commercial''' purposes.")<br />
* [http://www.canoo.net/ Canoo.net] - German Dictionaries and Grammars<br />
<br />
===Unknown license===<br />
* [http://www.ims.uni-stuttgart.de/projekte/IMSLex/ IMSLex German Lexicon] (no license information, but only "sample" download)<br />
* [http://www.cl.uzh.ch/CL/siclemat/sprachanalyse/molif/ mOlif morphological analyzer] (broken link)<br />
<br />
==Resource Access==<br />
* [http://wortschatz.uni-leipzig.de/Webservices/ Web service access to German language statistics]<br />
<br />
==Timeline Analysis==<br />
* [http://wortschatz.uni-leipzig.de/wort-des-tages/ German Words of the Day]<br />
* [http://www.sfs.uni-tuebingen.de/~lothar/nw/ Wortwarte (selection of German neologisms for each day) ]<br />
<br />
[[Category:Resources by language|German]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)&diff=10662TWSI Turk bootstrap Word Sense Inventory (Repository)2014-05-09T10:20:15Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T005 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) <br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), February 1st, 2010 <br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet Conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. <br />
<br />
* ''' See also:''' [[TWSI Turk bootstrap Word Sense Inventory 2.0 (Repository)]] : more data, same format.<br />
<br />
* '''Download:''' [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip TWSI version 1]; [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip Supplementary Data version 1]; full data: http://www.lt.informatik.tu-darmstadt.de/de/data/twsi-turk-bootstrap-word-sense-inventory/<br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_2.0_(Repository)&diff=10661TWSI Turk bootstrap Word Sense Inventory 2.0 (Repository)2014-05-09T10:19:46Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T006 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) 2.0, includes [[TWSI Turk bootstrap Word Sense Inventory (Repository)]]<br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), October 18th, 2010<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet Conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T006, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. Version 2: Collection of more than 118,000 sentences for additional 615 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:'''<br />
Download link for full TWSI data: http://www.lt.informatik.tu-darmstadt.de/de/data/twsi-turk-bootstrap-word-sense-inventory/<br />
<br />
- Version 1<br />
[http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip TWSI version 1]; [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip Supplementary Data version 1]<br />
- Version 2: Due to file size limitations the data is not available here. Please download it at: http://www.lt.informatik.tu-darmstadt.de/de/data/twsi-turk-bootstrap-word-sense-inventory/<br />
<br />
<br />
<br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_2.0_(Repository)&diff=10660TWSI Turk bootstrap Word Sense Inventory 2.0 (Repository)2014-05-09T10:18:59Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T006 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) 2.0, includes [[TWSI Turk bootstrap Word Sense Inventory (Repository)]]<br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), October 18th, 2010<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet Conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T006, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. Version 2: Collection of more than 118,000 sentences for additional 615 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:'''<br />
Download link for full TWSI data: http://www.lt.informatik.tu-darmstadt.de/de/data/twsi-turk-bootstrap-word-sense-inventory/<br />
<br />
- Version 1<br />
[http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip TWSI version 1]; [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip Supplementary Data version 1]<br />
- Version 2: Due to file size limitations the data is not available here. Please download it at: [http://www.ukp.tu-darmstadt.de/data/lexical-resources/twsi-lexical-substitutions] <br />
<br />
<br />
<br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:ParticipantSubmissionsDISCO2011.tar.gz&diff=10582File:ParticipantSubmissionsDISCO2011.tar.gz2014-03-10T17:17:35Z<p>Biem: uploaded a new version of &quot;File:ParticipantSubmissionsDISCO2011.tar.gz&quot;: Submissions of Participants for shared task</p>
<hr />
<div>Submissions of the participating systems for the DISCO 2011 shared task.</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:AnnotationJudgmentsDiISCO2011.tar.gz&diff=10581File:AnnotationJudgmentsDiISCO2011.tar.gz2014-03-10T17:16:25Z<p>Biem: uploaded a new version of &quot;File:AnnotationJudgmentsDiISCO2011.tar.gz&quot;: (not used for task) sentence level judgments</p>
<hr />
<div>This archive contains the sentence-level judgments as obtained using Mturk. From these, the scores for the shared task were aggregated.</div>Biemhttps://aclweb.org/aclwiki/index.php?title=DISCo_2011_shared_task_data:_Compositionality_judgments_(Repository)&diff=10580DISCo 2011 shared task data: Compositionality judgments (Repository)2014-03-10T17:11:11Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2011T007 <br />
<br />
* '''Name of Dataset:''' DISCo 2011 shared task dataset, see http://disco2011.fzi.de/ <br />
<br />
* '''Contributor:''' Chris Biemann, TU Darmstadt, Germany, biem@cs.tu-darmstadt.de<br />
<br />
* '''Copyright:''' (c) 2011, Chris Biemann. Deposited in the [[ACL Data and Code Repository]] by Chris Biemann.<br />
<br />
* '''Licensing:''' This work is not licensed. You can use it as you wish.<br />
<br />
* '''Citation:''' If you use the DISCo 2011 shared task dataset in your research, please include the following citation in any resulting papers: <br />
<br />
:: Biemann, C. and Giesbrecht, E. (2011): Distributional Semantics and Compositionality 2011: Shared Task Description and Results. Proceedings of the ACL-HLT 2011 Workshop on Distributional Semantics and Compositionality (DISCo 2011), Portland, Oregon, USA. http://aclweb.org/anthology-new/W/W11/W11-1304.pdf<br />
<br />
* '''Description:''' The DISCo 2011 shared task dataset contains compositionality judgments for ADJ_NN, V_SUBJ and V_OBJ phrases for German and English. These were aggregated over judgments on 5 sentence contexts each. The sentence-level judgments are also available in this dataset. <br />
<br />
* '''Download''': <br />
http://aclweb.org/aclwiki/code/3/38/Disco2011-shared-task-complete-dataset.zip (''Training and Test Data, Eval Scripts'') <br />
<br />
http://aclweb.org/aclwiki/code/4/41/AnnotationJudgmentsDiISCO2011.tar.gz (''Sentence-Level judgments (not used directly in competition)'')<br />
<br />
http://aclweb.org/aclwiki/code/d/de/ParticipantSubmissionsDISCO2011.tar.gz (''Participating systems output'') <br />
<br />
[[Category:Data and code repository]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:Disco2011-shared-task-complete-dataset.zip&diff=10579File:Disco2011-shared-task-complete-dataset.zip2014-03-10T17:09:22Z<p>Biem: uploaded a new version of &quot;File:Disco2011-shared-task-complete-dataset.zip&quot;: DISCO 2011 Complete Dataset (Training and Test Data, Eval Scripts)</p>
<hr />
<div>This archive contains data sets for compositionality judgments for English and German as well as the official scoring scripts.<br />
The data was collected via Amazon Mechanical Turk. Workers were presented with a sentence containing a bolded target phrase and were asked to score how literal the phrase was on a scale from 0 to 10. <br />
4-5 different, randomly sampled sentences from the WaCKy corpora for UK English and German were presented to 4 workers each. <br />
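The collection procedure above yields several worker scores per phrase. As a rough sketch of how such sentence-level judgments could be reduced to one score per phrase, the snippet below averages all scores for each (relation, phrase) pair. The arithmetic mean, the tuple layout, and the example phrases and scores are assumptions made for illustration; they are not taken from the archive or from the official scoring scripts.

```python
from collections import defaultdict


def aggregate_scores(judgments):
    """Average all worker scores for each (relation, phrase) pair.

    judgments: iterable of (relation, phrase, sentence_id, score) tuples,
    where score is a literalness rating between 0 and 10.
    The tuple layout is hypothetical, not the archive's file format.
    """
    scores_by_phrase = defaultdict(list)
    for relation, phrase, _sentence_id, score in judgments:
        if not 0 <= score <= 10:
            raise ValueError(f"score out of range: {score}")
        scores_by_phrase[(relation, phrase)].append(score)
    return {key: sum(vals) / len(vals) for key, vals in scores_by_phrase.items()}


# Hypothetical judgments: two sentences for one phrase, one sentence for another.
judgments = [
    ("ADJ_NN", "red tape", 1, 2),
    ("ADJ_NN", "red tape", 1, 3),
    ("ADJ_NN", "red tape", 2, 1),
    ("ADJ_NN", "red tape", 2, 2),
    ("V_OBJ", "read book", 1, 9),
    ("V_OBJ", "read book", 1, 10),
]
for key, value in aggregate_scores(judgments).items():
    print(key, value)
```

Averaging over all scores at once weights every worker judgment equally; averaging per sentence first and then per phrase would give a different result whenever sentences have unequal numbers of judgments.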
<br />
Phrases consist of two lemmas and come in three grammatical relations:<br />
- ADJ_NN: adjective modifying a noun<br />
- V_SUBJ: noun as a subject of a verb<br />
- V_OBJ: noun as an object of a verb<br />
Passive constructions were resolved to active constructions for relation assignment purposes.<br />
<br />
Phrases were extracted semi-automatically. The relations were assigned by patterns and manually checked for validity. Phrases were selected so as to balance the data set while controlling for frequency.</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_2.0_(Repository)&diff=9331TWSI Turk bootstrap Word Sense Inventory 2.0 (Repository)2012-05-11T12:45:46Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T006 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) 2.0, includes [[TWSI Turk bootstrap Word Sense Inventory (Repository)]]<br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), October 18th, 2010<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet Conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T006, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. Version 2: Collection of more than 118,000 sentences for additional 615 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:'''<br />
Download link for full TWSI data: http://www.ukp.tu-darmstadt.de/data/lexical-resources/twsi-lexical-substitutions<br />
<br />
<br />
- Version 1<br />
[http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip TWSI version 1]; [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip Supplementary Data version 1]<br />
- Version 2: Due to file size limitations the data is not available here. Please download it at: [http://www.ukp.tu-darmstadt.de/data/lexical-resources/twsi-lexical-substitutions] <br />
<br />
<br />
<br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=DISCo_2011_shared_task_data:_Compositionality_judgments_(Repository)&diff=9097DISCo 2011 shared task data: Compositionality judgments (Repository)2011-12-06T16:58:43Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2011T007 <br />
<br />
* '''Name of Dataset:''' DISCo 2011 shared task dataset, see http://disco2011.fzi.de/ <br />
<br />
* '''Contributor:''' Chris Biemann, TU Darmstadt, Germany, biemann@tk.informatik.tu-darmstadt.de<br />
<br />
* '''Copyright:''' (c) 2011, Chris Biemann. Deposited in the [[ACL Data and Code Repository]] by Chris Biemann.<br />
<br />
* '''Licensing:''' This work is not licensed. You can use it as you wish.<br />
<br />
* '''Citation:''' If you use the DISCo 2011 shared task dataset in your research, please include the following citation in any resulting papers: <br />
<br />
:: Biemann, C. and Giesbrecht, E. (2011): Distributional Semantics and Compositionality 2011: Shared Task Description and Results. Proceedings of the ACL-HLT 2011 Workshop on Distributional Semantics and Compositionality (DISCo 2011), Portland, Oregon, USA. http://aclweb.org/anthology-new/W/W11/W11-1304.pdf<br />
<br />
* '''Description:''' The DISCo 2011 shared task dataset contains compositionality judgments for ADJ_NN, V_SUBJ and V_OBJ phrases for German and English. These were aggregated over judgments on 5 sentence contexts each. The sentence-level judgments are also available in this dataset. <br />
<br />
* '''Download''': <br />
http://aclweb.org/aclwiki/code/3/38/Disco2011-shared-task-complete-dataset.zip (''Training and Test Data, Eval Scripts'') <br />
<br />
http://aclweb.org/aclwiki/code/4/41/AnnotationJudgmentsDiISCO2011.tar.gz (''Sentence-Level judgments (not used directly in competition)'')<br />
<br />
http://aclweb.org/aclwiki/code/d/de/ParticipantSubmissionsDISCO2011.tar.gz (''Participating systems output'') <br />
<br />
[[Category:Data and code repository]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=DISCo_2011_shared_task_data:_Compositionality_judgments_(Repository)&diff=9096DISCo 2011 shared task data: Compositionality judgments (Repository)2011-12-06T16:58:19Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2011T007 <br />
<br />
* '''Name of Dataset:''' DISCo 2011 shared task dataset, see http://disco2011.fzi.de/ <br />
<br />
* '''Contributor:''' Chris Biemann, TU Darmstadt, Germany, biemann@tk.informatik.tu-darmstadt.de<br />
<br />
* '''Copyright:''' (c) 2011, Chris Biemann. Deposited in the [[ACL Data and Code Repository]] by Chris Biemann.<br />
<br />
* '''Licensing:''' This work is not licensed. You can use it as you wish.<br />
<br />
* '''Citation:''' If you use the DISCo 2011 shared task dataset in your research, please include the following citation in any resulting papers: <br />
<br />
:: Biemann, C. and Giesbrecht, E. (2011): Distributional Semantics and Compositionality 2011: Shared Task Description and Results. Proceedings of the ACL-HLT 2011 Workshop on Distributional Semantics and Compositionality (DISCo 2011), Portland, Oregon, USA. http://aclweb.org/anthology-new/W/W11/W11-1304.pdf<br />
<br />
* '''Description:''' The DISCo 2011 shared task dataset contains compositionality judgments for ADJ_NN, V_SUBJ and V_OBJ phrases for German and English. These were aggregated over judgments on 5 sentence contexts each. The sentence-level judgments are also available in this dataset. <br />
<br />
* '''Download''': <br />
- http://aclweb.org/aclwiki/code/3/38/Disco2011-shared-task-complete-dataset.zip (''Training and Test Data, Eval Scripts'') <br />
- http://aclweb.org/aclwiki/code/4/41/AnnotationJudgmentsDiISCO2011.tar.gz (''Sentence-Level judgments (not used directly in competition)'')<br />
- http://aclweb.org/aclwiki/code/d/de/ParticipantSubmissionsDISCO2011.tar.gz (''Participating systems output'') <br />
<br />
[[Category:Data and code repository]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:ParticipantSubmissionsDISCO2011.tar.gz&diff=9095File:ParticipantSubmissionsDISCO2011.tar.gz2011-12-06T16:56:59Z<p>Biem: Submissions of the participating systems for the DISCO 2011 shared task.</p>
<hr />
<div>Submissions of the participating systems for the DISCO 2011 shared task.</div>Biemhttps://aclweb.org/aclwiki/index.php?title=DISCo_2011_shared_task_data:_Compositionality_judgments_(Repository)&diff=9094DISCo 2011 shared task data: Compositionality judgments (Repository)2011-12-06T16:55:19Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2011T007 <br />
<br />
* '''Name of Dataset:''' DISCo 2011 shared task dataset, see http://disco2011.fzi.de/ <br />
<br />
* '''Contributor:''' Chris Biemann, TU Darmstadt, Germany, biemann@tk.informatik.tu-darmstadt.de<br />
<br />
* '''Copyright:''' (c) 2011, Chris Biemann. Deposited in the [[ACL Data and Code Repository]] by Chris Biemann.<br />
<br />
* '''Licensing:''' This work is not licensed. You can use it as you wish.<br />
<br />
* '''Citation:''' If you use the DISCo 2011 shared task dataset in your research, please include the following citation in any resulting papers: <br />
<br />
:: Biemann, C. and Giesbrecht, E. (2011): Distributional Semantics and Compositionality 2011: Shared Task Description and Results. Proceedings of the ACL-HLT 2011 Workshop on Distributional Semantics and Compositionality (DISCo 2011), Portland, Oregon, USA. http://aclweb.org/anthology-new/W/W11/W11-1304.pdf<br />
<br />
* '''Description:''' The DISCo 2011 shared task dataset contains compositionality judgments for ADJ_NN, V_SUBJ and V_OBJ phrases for German and English. These were aggregated over judgments on 5 sentence contexts each. The sentence-level judgments are also available in this dataset. <br />
<br />
* '''Download''': <br />
* http://aclweb.org/aclwiki/code/3/38/Disco2011-shared-task-complete-dataset.zip (''Training and Test Data, Eval Scripts'') <br />
* http://aclweb.org/aclwiki/code/4/41/AnnotationJudgmentsDiISCO2011.tar.gz (''Sentence-Level judgments (not used directly in competition)'') <br />
<br />
[[Category:Data and code repository]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:AnnotationJudgmentsDiISCO2011.tar.gz&diff=9093File:AnnotationJudgmentsDiISCO2011.tar.gz2011-12-06T16:53:34Z<p>Biem: This archive contains the sentence-level judgments as obtained using Mturk. From these, the scores for the shared task were aggregated.</p>
<hr />
<div>This archive contains the sentence-level judgments as obtained using Mturk. From these, the scores for the shared task were aggregated.</div>Biemhttps://aclweb.org/aclwiki/index.php?title=POS_Tagging_(State_of_the_art)&diff=8927POS Tagging (State of the art)2011-08-17T13:11:38Z<p>Biem: /* WSJ */</p>
<hr />
<div>==Test collections==<br />
* '''Performance measure:''' per-token accuracy. (By convention, this is measured on all tokens, including punctuation tokens and other unambiguous tokens.)<br />
* '''English'''<br />
** '''Penn Treebank''' ''Wall Street Journal'' (WSJ). The splits of data for this data set were not standardized early on (unlike for parsing) and early work uses various data splits defined by counts of tokens or by sections. Most work from 2002 on adopts the following data splits, introduced by Collins (2002):<br />
*** '''Training data:''' sections 0-18<br />
*** '''Development test data:''' sections 19-21<br />
*** '''Testing data:''' sections 22-24<br />
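As a minimal sketch of the measure and the data split described above, the snippet below maps a WSJ section number to its split and computes per-token accuracy over all tokens, punctuation included. The function names and the toy tag sequences are illustrative assumptions and do not come from any of the taggers listed on this page.

```python
def wsj_split(section):
    """Map a WSJ section number (0-24) to the split listed above."""
    if 0 <= section <= 18:
        return "train"
    if 19 <= section <= 21:
        return "dev"
    if 22 <= section <= 24:
        return "test"
    raise ValueError(f"not a WSJ section: {section}")


def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag.

    Both arguments are lists of sentences; each sentence is a list of tags.
    Every token counts, including punctuation and unambiguous tokens.
    """
    total = 0
    correct = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence length mismatch")
        total += len(gold_sent)
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
    return correct / total if total else 0.0


# Hypothetical toy data: two sentences, six tokens, one tagging error.
gold = [["DT", "NN", "VBZ", "."], ["NNP", "."]]
pred = [["DT", "NN", "VBD", "."], ["NNP", "."]]
print(wsj_split(23), token_accuracy(gold, pred))
```

Because punctuation and other unambiguous tokens are included, accuracy computed this way is higher than accuracy restricted to ambiguous tokens, which matters when comparing figures across papers.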
<br />
<br />
== Tables of results ==<br />
<br />
===WSJ===<br />
<br />
{| border="1" cellpadding="5" cellspacing="1" width="100%"<br />
|-<br />
! System name<br />
! Short description<br />
! Main publications<br />
! Software<br />
! All tokens<br />
! Unknown words<br />
|-<br />
| TnT*<br />
| Hidden Markov model<br />
| Brants (2000)<br />
| [http://www.coli.uni-saarland.de/~thorsten/tnt/ TnT]<br />
| 96.46%<br />
| 85.86%<br />
|-<br />
| GENiA Tagger**<br />
| Maximum entropy cyclic dependency network<br />
| Tsuruoka et al. (2005)<br />
| [http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/ GENiA]<br />
| 97.05%<br />
| Not available<br />
|-<br />
| Averaged Perceptron<br />
| Averaged Perceptron discriminative sequence model<br />
| Collins (2002)<br />
| Not available<br />
| 97.11%<br />
| Not available<br />
|-<br />
| Maxent easiest-first<br />
| Maximum entropy bidirectional easiest-first inference<br />
| Tsuruoka and Tsujii (2005)<br />
| [http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/postagger/ Easiest-first]<br />
| 97.15%<br />
| Not available<br />
|-<br />
| SVMTool<br />
| SVM-based tagger and tagger generator<br />
| Giménez and Márquez (2004)<br />
| [http://www.lsi.upc.es/~nlp/SVMTool/ SVMTool]<br />
| 97.16%<br />
| 89.01%<br />
|-<br />
| Stanford Tagger 1.0<br />
| Maximum entropy cyclic dependency network<br />
| Toutanova et al. (2003)<br />
| [http://nlp.stanford.edu/software/tagger.shtml Stanford Tagger]<br />
| 97.24%<br />
| 89.04%<br />
|-<br />
| Stanford Tagger 2.0<br />
| Maximum entropy cyclic dependency network<br />
| [http://nlp.stanford.edu/software/tagger.shtml Stanford Tagger]<br />
| [http://nlp.stanford.edu/software/tagger.shtml Stanford Tagger]<br />
| 97.32%<br />
| 90.79%<br />
|-<br />
| LTAG-spinal<br />
| Bidirectional perceptron learning<br />
| Shen et al. (2007)<br />
| [http://www.cis.upenn.edu/~xtag/spinal/ LTAG-spinal]<br />
| 97.33%<br />
| Not available<br />
|-<br />
| SCCN<br />
| Semi-supervised condensed nearest neighbor<br />
| Søgaard (2011)<br />
| [http://cst.dk/anders/scnn/ SCCN]<br />
| 97.50%<br />
| Not available<br />
|}<br />
<br />
(*) TnT: Accuracy is as reported by Giménez and Márquez (2004) for the given test collection. Brants (2000) reports 96.7% token accuracy and 85.5% unknown word accuracy on a 10-fold cross-validation of the Penn WSJ corpus.<br />
<br />
(**) GENiA: Results are for models trained and tested on the given corpora (to be comparable to other results). The distributed GENiA tagger is trained on a mixed training corpus and gets 96.94% on WSJ, and 98.26% on GENiA biomedical English.<br />
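The "Unknown words" column in the table reports accuracy restricted to test tokens that did not occur in the training data. The following sketch shows one way to compute such a figure. It assumes an exact, case-sensitive string match against the training vocabulary; the papers cited here may define unknown words slightly differently, and the function name and toy data are hypothetical.

```python
def unknown_word_accuracy(train_tokens, test_tokens, gold_tags, predicted_tags):
    """Accuracy over test tokens that never occur in the training tokens.

    Assumption: a token is "unknown" if its exact surface form is absent
    from the training vocabulary. Returns None if no test token is unknown.
    """
    if not (len(test_tokens) == len(gold_tags) == len(predicted_tags)):
        raise ValueError("test tokens and tag sequences must have equal length")
    vocabulary = set(train_tokens)
    unknown_pairs = [
        (gold, pred)
        for token, gold, pred in zip(test_tokens, gold_tags, predicted_tags)
        if token not in vocabulary
    ]
    if not unknown_pairs:
        return None
    return sum(gold == pred for gold, pred in unknown_pairs) / len(unknown_pairs)


# Hypothetical toy data: "cat" and "meows" are unseen in training.
train_tokens = ["the", "dog", "barks", "."]
test_tokens = ["the", "cat", "meows", "."]
gold_tags = ["DT", "NN", "VBZ", "."]
predicted_tags = ["DT", "NN", "NN", "."]
print(unknown_word_accuracy(train_tokens, test_tokens, gold_tags, predicted_tags))
```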
<br />
== References ==<br />
<br />
* Brants, Thorsten. 2000. [http://acl.ldc.upenn.edu/A/A00/A00-1031.pdf TnT -- A Statistical Part-of-Speech Tagger]. ''6th Applied Natural Language Processing Conference''.<br />
<br />
* Collins, Michael. 2002. [http://people.csail.mit.edu/mcollins/papers/tagperc.pdf Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms]. ''EMNLP 2002''.<br />
<br />
* Giménez, J., and Márquez, L. 2004. [http://www.lsi.upc.es/~nlp/SVMTool/lrec2004-gm.pdf SVMTool: A general POS tagger generator based on Support Vector Machines]. ''Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC'04)''. Lisbon, Portugal. <br />
<br />
* Shen, L., Satta, G., and Joshi, A. 2007. [http://acl.ldc.upenn.edu/P/P07/P07-1096.pdf Guided learning for bidirectional sequence classification]. ''Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007)'', pages 760-767.<br />
<br />
* Søgaard, Anders. 2011. Semi-supervised condensed nearest neighbor for part-of-speech tagging. ''The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT)''. Portland, Oregon.<br />
<br />
* Toutanova, K., Klein, D., Manning, C.D., and Singer, Y. 2003. [http://nlp.stanford.edu/kristina/papers/tagging.pdf Feature-rich part-of-speech tagging with a cyclic dependency network]. ''Proceedings of HLT-NAACL 2003'', pages 252-259.<br />
<br />
* Tsuruoka, Yoshimasa, Yuka Tateishi, Jin-Dong Kim, Tomoko Ohta, John McNaught, Sophia Ananiadou, and Jun'ichi Tsujii. 2005. "[http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/papers/pci05.pdf Developing a Robust Part-of-Speech Tagger for Biomedical Text]", ''Advances in Informatics - 10th Panhellenic Conference on Informatics'', '''LNCS 3746''', pp. 382-392.<br />
<br />
* Tsuruoka, Yoshimasa and Jun'ichi Tsujii. 2005. "[http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/papers/emnlp05bidir.pdf Bidirectional Inference with the Easiest-First Strategy for Tagging Sequence Data]", ''Proceedings of HLT/EMNLP 2005'', pp. 467-474.<br />
<br />
== See also ==<br />
* [[POS Induction (State of the art)]]<br />
* [[Part-of-speech tagging]]<br />
* [[State of the art]]<br />
<br />
<br />
[[Category:State of the art]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=POS_Tagging_(State_of_the_art)&diff=8926POS Tagging (State of the art)2011-08-17T13:10:10Z<p>Biem: </p>
<hr />
<div>==Test collections==<br />
* '''Performance measure:''' per-token accuracy. (By convention, this is measured on all tokens, including punctuation tokens and other unambiguous tokens.)<br />
* '''English'''<br />
** '''Penn Treebank''' ''Wall Street Journal'' (WSJ). The splits of data for this data set were not standardized early on (unlike for parsing) and early work uses various data splits defined by counts of tokens or by sections. Most work from 2002 on adopts the following data splits, introduced by Collins (2002):<br />
*** '''Training data:''' sections 0-18<br />
*** '''Development test data:''' sections 19-21<br />
*** '''Testing data:''' sections 22-24<br />
<br />
<br />
== Tables of results ==<br />
<br />
===WSJ===<br />
<br />
{| border="1" cellpadding="5" cellspacing="1" width="100%"<br />
|-<br />
! System name<br />
! Short description<br />
! Main publications<br />
! Software<br />
! All tokens<br />
! Unknown words<br />
|-<br />
| TnT*<br />
| Hidden Markov model<br />
| Brants (2000)<br />
| [http://www.coli.uni-saarland.de/~thorsten/tnt/ TnT]<br />
| 96.46%<br />
| 85.86%<br />
|-<br />
| GENiA Tagger**<br />
| Maximum entropy cyclic dependency network<br />
| Tsuruoka et al. (2005)<br />
| [http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/ GENiA]<br />
| 97.05%<br />
| Not available<br />
|-<br />
| Averaged Perceptron<br />
| Averaged Perceptron discriminative sequence model<br />
| Collins (2002)<br />
| Not available<br />
| 97.11%<br />
| Not available<br />
|-<br />
| Maxent easiest-first<br />
| Maximum entropy bidirectional easiest-first inference<br />
| Tsuruoka and Tsujii (2005)<br />
| [http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/postagger/ Easiest-first]<br />
| 97.15%<br />
| Not available<br />
|-<br />
| SVMTool<br />
| SVM-based tagger and tagger generator<br />
| Giménez and Márquez (2004)<br />
| [http://www.lsi.upc.es/~nlp/SVMTool/ SVMTool]<br />
| 97.16%<br />
| 89.01%<br />
|-<br />
| Stanford Tagger 1.0<br />
| Maximum entropy cyclic dependency network<br />
| Toutanova et al. (2003)<br />
| [http://nlp.stanford.edu/software/tagger.shtml Stanford Tagger]<br />
| 97.24%<br />
| 89.04%<br />
|-<br />
| Stanford Tagger 2.0<br />
| Maximum entropy cyclic dependency network<br />
| [http://nlp.stanford.edu/software/tagger.shtml Stanford Tagger]<br />
| [http://nlp.stanford.edu/software/tagger.shtml Stanford Tagger]<br />
| 97.32%<br />
| 90.79%<br />
|-<br />
| LTAG-spinal<br />
| bidirectional perceptron learning<br />
| Shen et al. (2007)<br />
| [http://www.cis.upenn.edu/~xtag/spinal/ LTAG-spinal]<br />
| 97.33%<br />
| Not available<br />
|-<br />
| Søgaard<br />
| Semi-supervised condensed nearest neighbor<br />
| Søgaard (2011)<br />
| [http://cst.dk/anders/sccn/]<br />
| 97.50%<br />
| Not available<br />
|}<br />
<br />
(*) TnT: Accuracy is as reported by Giménez and Márquez (2004) for the given test collection. Brants (2000) reports 96.7% token accuracy and 85.5% unknown word accuracy on a 10-fold cross-validation of the Penn WSJ corpus.<br />
<br />
(**) GENiA: Results are for models trained and tested on the given corpora (to be comparable to other results). The distributed GENiA tagger is trained on a mixed training corpus and gets 96.94% on WSJ, and 98.26% on GENiA biomedical English.<br />
<br />
== References ==<br />
<br />
* Brants, Thorsten. 2000. [http://acl.ldc.upenn.edu/A/A00/A00-1031.pdf TnT -- A Statistical Part-of-Speech Tagger]. ''6th Applied Natural Language Processing Conference''.<br />
<br />
* Collins, Michael. 2002. [http://people.csail.mit.edu/mcollins/papers/tagperc.pdf Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms]. ''EMNLP 2002''.<br />
<br />
* Giménez, J., and Márquez, L. 2004. [http://www.lsi.upc.es/~nlp/SVMTool/lrec2004-gm.pdf SVMTool: A general POS tagger generator based on Support Vector Machines]. ''Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC'04)''. Lisbon, Portugal. <br />
<br />
* Shen, L., Satta, G., and Joshi, A. 2007. [http://acl.ldc.upenn.edu/P/P07/P07-1096.pdf Guided learning for bidirectional sequence classification]. ''Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL 2007)'', pages 760-767.<br />
<br />
* Søgaard, Anders. 2011. Semi-supervised condensed nearest neighbor for part-of-speech tagging. ''The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT)''. Portland, Oregon.<br />
<br />
* Toutanova, K., Klein, D., Manning, C.D., and Singer, Y. 2003. [http://nlp.stanford.edu/kristina/papers/tagging.pdf Feature-rich part-of-speech tagging with a cyclic dependency network]. ''Proceedings of HLT-NAACL 2003'', pages 252-259.<br />
<br />
* Tsuruoka, Yoshimasa, Yuka Tateishi, Jin-Dong Kim, Tomoko Ohta, John McNaught, Sophia Ananiadou, and Jun'ichi Tsujii. 2005. "[http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/papers/pci05.pdf Developing a Robust Part-of-Speech Tagger for Biomedical Text, Advances in Informatics]" - ''10th Panhellenic Conference on Informatics'', '''LNCS 3746''', pp. 382-392, 2005 <br />
<br />
* Tsuruoka, Yoshimasa and Jun'ichi Tsujii. 2005. "[http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/papers/emnlp05bidir.pdf Bidirectional Inference with the Easiest-First Strategy for Tagging Sequence Data]", ''Proceedings of HLT/EMNLP 2005'', pp. 467-474.<br />
<br />
== See also ==<br />
* [[POS Induction (State of the art)]]<br />
* [[Part-of-speech tagging]]<br />
* [[State of the art]]<br />
<br />
<br />
[[Category:State of the art]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=DISCo_2011_shared_task_data:_Compositionality_judgments_(Repository)&diff=8893DISCo 2011 shared task data: Compositionality judgments (Repository)2011-06-30T09:01:16Z<p>Biem: </p>
<hr />
<div><br />
<br />
* '''ADCR ID:''' ADCR2011T007 <br />
<br />
* '''Name of Dataset:''' DISCo 2011 shared task dataset, see http://disco2011.fzi.de/ <br />
<br />
* '''Contributor:''' Chris Biemann, TU Darmstadt, Germany, biemann@tk.informatik.tu-darmstadt.de<br />
<br />
* '''Copyright:''' (c) 2011, Chris Biemann. Deposited in the [[ACL Data and Code Repository]] by Chris Biemann.<br />
<br />
* '''Licensing:''' This work is not licensed. You can use it as you wish.<br />
<br />
* '''Citation:''' If you use the DISCo 2011 shared task dataset in your research, please include the following citation in any resulting papers: <br />
<br />
:: Biemann, C. and Giesbrecht, E. (2011): Distributional Semantics and Compositionality 2011: Shared Task Description and Results. Proceedings of the ACL-HLT 2011 Workshop on Distributional Semantics and Compositionality (DISCo 2011), Portland, Oregon, USA. http://aclweb.org/anthology-new/W/W11/W11-1304.pdf<br />
<br />
* '''Description:''' The DISCo 2011 shared task dataset contains compositionality judgments for ADJ_NN, V_SUBJ and V_OBJ phrases for German and English. These were aggregated over judgments on 5 sentence contexts each. The sentence-level judgments are also available in this dataset. <br />
<br />
* '''Download''': http://aclweb.org/aclwiki/code/3/38/Disco2011-shared-task-complete-dataset.zip <br />
<br />
<br />
[[Category:Data and code repository]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:Disco2011-shared-task-complete-dataset.zip&diff=8892File:Disco2011-shared-task-complete-dataset.zip2011-06-30T09:00:26Z<p>Biem: </p>
<hr />
<div>This archive contains data sets for compositionality judgments for English and German as well as the official scoring scripts.<br />
The data was collected on Amazon Mechanical Turk. Workers were presented with a sentence containing a bolded target phrase and were asked to score how literal the phrase was on a scale from 0 to 10. <br />
4-5 different, randomly sampled sentences from the WaCKy corpora for UK English and German were presented to 4 workers each. <br />
<br />
Phrases consist of two lemmas and come in three grammatical relations:<br />
- ADJ_NN: adjective modifying a noun<br />
- V_SUBJ: noun as a subject of a verb<br />
- V_OBJ: noun as an object of a verb<br />
Passive constructions were resolved to active constructions for relation assignment purposes.<br />
<br />
Phrases were extracted semi-automatically. The relations were assigned by patterns and manually checked for validity. Phrases were selected in a way as to balance the data set while controlling for frequency.</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:Disco2011-shared-task-complete-dataset.zip&diff=8891File:Disco2011-shared-task-complete-dataset.zip2011-06-30T08:59:35Z<p>Biem: This archive contains data sets for compositionality judgments for English and German as well as the official scoring scripts.
The data was collected from Amazon turk. Workers were presented a sentence with a bolded target phrase and were asked to score h</p>
<hr />
<div>This archive contains data sets for compositionality judgments for English and German as well as the official scoring scripts.<br />
The data was collected on Amazon Mechanical Turk. Workers were presented with a sentence containing a bolded target phrase and were asked to score how literal the phrase was on a scale from 0 to 10. <br />
4-5 different, randomly sampled sentences from the WaCKy corpora for UK English and German were presented to 4 workers each. <br />
<br />
Phrases consist of two lemmas and come in three grammatical relations:<br />
- ADJ_NN: adjective modifying a noun<br />
- V_SUBJ: noun as a subject of a verb<br />
- V_OBJ: noun as an object of a verb<br />
Passive constructions were resolved to active constructions for relation assignment purposes.<br />
<br />
Phrases were extracted semi-automatically. The relations were assigned by patterns and manually checked for validity. Phrases were selected in a way as to balance the data set while controlling for frequency.</div>Biemhttps://aclweb.org/aclwiki/index.php?title=DISCo_2011_shared_task_data:_Compositionality_judgments_(Repository)&diff=8890DISCo 2011 shared task data: Compositionality judgments (Repository)2011-06-30T08:57:39Z<p>Biem: </p>
<hr />
<div>''Please copy and paste this text (while in edit mode) into the metadata file for your contribution to the [[ACL Data and Code Repository]] and edit it as appropriate.''<br />
<br />
<br />
* '''ADCR ID:''' ADCR2011T007 <br />
<br />
* '''Name of Dataset:''' DISCo 2011 shared task dataset, see http://disco2011.fzi.de/ <br />
<br />
* '''Contributor:''' Chris Biemann, TU Darmstadt, Germany, biemann@tk.informatik.tu-darmstadt.de<br />
<br />
* '''Copyright:''' (c) 2011, Chris Biemann. Deposited in the [[ACL Data and Code Repository]] by Chris Biemann.<br />
<br />
* '''Licensing:''' This work is not licensed. You can use it as you wish.<br />
<br />
* '''Citation:''' If you use the DISCo 2011 shared task dataset in your research, please include the following citation in any resulting papers: <br />
<br />
:: Biemann, C. and Giesbrecht, E. (2011): Distributional Semantics and Compositionality 2011: Shared Task Description and Results. Proceedings of the ACL-HLT 2011 Workshop on Distributional Semantics and Compositionality (DISCo 2011), Portland, Oregon, USA. http://aclweb.org/anthology-new/W/W11/W11-1304.pdf<br />
<br />
* '''Description:''' The DISCo 2011 shared task dataset contains compositionality judgments for ADJ_NN, V_SUBJ and V_OBJ phrases for German and English. These were aggregated over judgments on 5 sentence contexts each. The sentence-level judgments are also available in this dataset. <br />
<br />
* '''Download''': (''link to download file'') <br />
<br />
<br />
[[Category:Data and code repository|Template for data]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=DISCo_2011_shared_task_data:_Compositionality_judgments_(Repository)&diff=8889DISCo 2011 shared task data: Compositionality judgments (Repository)2011-06-30T08:57:27Z<p>Biem: </p>
<hr />
<div>''Please copy and paste this text (while in edit mode) into the metadata file for your contribution to the [[ACL Data and Code Repository]] and edit it as appropriate.''<br />
<br />
<br />
* '''ADCR ID:''' ADCR2011T007 <br />
<br />
* '''Name of Dataset:''' DISCo 2011 shared task dataset, see http://disco2011.fzi.de/ <br />
<br />
* '''Contributor:''' Chris Biemann, TU Darmstadt, Germany, biemann@tk.informatik.tu-darmstadt.de<br />
<br />
* '''Copyright:''' (c) 2011, Chris Biemann. Deposited in the [[ACL Data and Code Repository]] by Chris Biemann.<br />
<br />
* '''Licensing:''' This work is not licensed. You can use it as you wish.<br />
<br />
* '''Citation:''' If you use the DISCo 2011 shared task dataset in your research, please include the following citation in any resulting papers: <br />
<br />
:: Biemann, C. and Giesbrecht, E. (2011): Distributional Semantics and Compositionality 2011: Shared Task Description and Results. Proceedings of the ACL-HLT 2011 Workshop on Distributional Semantics and Compositionality (DISCo 2011), Portland, Oregon, USA. http://aclweb.org/anthology-new/W/W11/W11-1304.pdf<br />
<br />
* '''Description:''' The DISCo 2011 shared task dataset contains compositionality judgments for ADJ_NN, V_SUBJ and V_OBJ phrases for German and English. These were aggregated over judgments on 5 sentence contexts each. The sentence-level judgments are also available in this dataset. <br />
<br />
* '''Download''': (''link to download file'') <br />
<br />
<br />
[[Category:Data and code repository|Template for data]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=DISCo_2011_shared_task_data:_Compositionality_judgments_(Repository)&diff=8888DISCo 2011 shared task data: Compositionality judgments (Repository)2011-06-30T08:56:57Z<p>Biem: New page: ''Please copy and paste this text (while in edit mode) into the metadata file for your contribution to the ACL Data and Code Repository and edit it as appropriate.'' * '''ADCR ID:'''...</p>
<hr />
<div>''Please copy and paste this text (while in edit mode) into the metadata file for your contribution to the [[ACL Data and Code Repository]] and edit it as appropriate.''<br />
<br />
<br />
* '''ADCR ID:''' ADCR2011T007 <br />
<br />
* '''Name of Dataset:''' DISCo 2011 shared task dataset, see http://disco2011.fzi.de/ <br />
<br />
* '''Contributor:''' Chris Biemann, TU Darmstadt, Germany, biemann@tk.informatik.tu-darmstadt.de<br />
<br />
* '''Copyright:''' (c) 2011, Chris Biemann. Deposited in the [[ACL Data and Code Repository]] by Chris Biemann.<br />
<br />
* '''Licensing:''' This work is not licensed. You can use it as you wish.<br />
<br />
* '''Citation:''' If you use the DISCo 2011 shared task dataset in your research, please include the following citation in any resulting papers: <br />
<br />
:: Biemann, C. and Giesbrecht, E. (2011): Distributional Semantics and Compositionality 2011: Shared Task Description and Results. Proceedings of the ACL-HLT 2011 Workshop on Distributional Semantics and Compositionality (DISCo 2011), Portland, Oregon, USA. http://aclweb.org/anthology-new/W/W11/W11-1304.pdf<br />
<br />
* '''Description:''' The DISCo 2011 shared task dataset contains compositionality judgments for ADJ_NN, V_SUBJ and V_OBJ phrases for German and English. These were aggregated over judgments on 5 sentence contexts each. The sentence-level judgments are also available in this dataset. <br />
<br />
* '''Download''': (''link to download file'') <br />
<br />
<br />
[[Category:Data and code repository|Template for data]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=Resources_by_Date_(Repository)&diff=8887Resources by Date (Repository)2011-06-30T08:50:16Z<p>Biem: </p>
<hr />
<div>This page is part of the [[ACL Data and Code Repository]].<br />
<br />
[[Instructions for contributors (Repository)|Uploading]]? When generating an ADCR ID (ACL Data and Code Repository Identifier), please use "T" for textual corpora, "S" for speech corpora, "C" for code, and "L" for language resources such as lexicons.<br />
<br />
<br />
* '''ADCR ID (Month Day, Year): Name of dataset or software (Repository)'''<br />
<br />
<br />
* ADCR2008T001 (June 9, 2008): [[CLAIR collection of fraud email (Repository)]]<br />
* ADCR2008C002 (June 11, 2008): [[Hierarchical Bayes Compiler (Repository)]]<br />
* ADCR2008C003 (June 11, 2008): [[MegaM: Maximum Entropy Model Optimization Package (Repository)]]<br />
* ADCR2008T004 (June 25, 2008): [[1443 Semantically Annotated Compound Nouns (Repository)]]<br />
* ADCR2010T005 (February 1, 2010): [[TWSI Turk bootstrap Word Sense Inventory (Repository)]]<br />
* ADCR2010L001 (March 5, 2010): [[Database of Catalan Adjectives (Repository)]]<br />
* ADCR2010T006 (October 18, 2010): [[TWSI Turk bootstrap Word Sense Inventory 2.0 (Repository)]]<br />
* ADCR2011T007 (June 30, 2011): [[DISCo 2011 shared task data: Compositionality judgments (Repository)]]<br />
* ADCR2008_006<br />
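
The identifier scheme described at the top of this page can be checked mechanically. The sketch below is illustrative only: the pattern (the prefix `ADCR`, a four-digit year, one type letter, and a three-digit serial number) is inferred from the identifiers listed above and is an assumption, not a documented specification, and the names `ADCR_PATTERN`, `RESOURCE_TYPES` and `parse_adcr_id` are hypothetical.

```python
import re
from typing import Dict, Optional, Union

# Type letters as given in the uploading instructions on this page.
RESOURCE_TYPES = {
    "T": "textual corpus",
    "S": "speech corpus",
    "C": "code",
    "L": "language resource",
}

# Assumed layout: ADCR + 4-digit year + type letter + 3-digit serial.
ADCR_PATTERN = re.compile(r"^ADCR(\d{4})([TSCL])(\d{3})$")


def parse_adcr_id(adcr_id: str) -> Optional[Dict[str, Union[int, str]]]:
    """Split an ADCR ID into year, resource type and serial number.

    Returns None when the string does not follow the assumed layout.
    """
    match = ADCR_PATTERN.match(adcr_id.strip())
    if match is None:
        return None
    year, letter, serial = match.groups()
    return {
        "year": int(year),
        "type": RESOURCE_TYPES[letter],
        "serial": int(serial),
    }
```

Under this assumed pattern, an entry such as `ADCR2008_006` is reported as malformed because it has no type letter.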
<br />
<br />
[[Category:Data and code repository|Resources by Date]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_2.0_(Repository)&diff=8259TWSI Turk bootstrap Word Sense Inventory 2.0 (Repository)2010-10-20T21:12:19Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T006 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) 2.0, includes [[TWSI Turk bootstrap Word Sense Inventory (Repository)]]<br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), October 18th, 2010<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T006, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. Version 2: Collection of more than 118,000 sentences for additional 615 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:'''<br />
- Version 1<br />
[http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip] - TWSI version 1 [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip] Supplementary Data version 1; <br />
- Version 2: split due to file size limitations<br />
TWSI version 2 letters A-M: [http://aclweb.org/aclwiki/index.php?title=Image:Turkboot615_A-M.zip] TWSI version 2 letters N-Z: [http://aclweb.org/aclwiki/index.php?title=Image:Turkboot615_N-Z.zip]<br />
Supplementary Data version 2 (1): [http://aclweb.org/aclwiki/index.php?title=Image:TWSI615_source_sentences_1.zip] Supplementary Data version 2 (2): [http://aclweb.org/aclwiki/index.php?title=Image:TWST615_source_sentences_2.zip] <br />
<br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)&diff=8257TWSI Turk bootstrap Word Sense Inventory (Repository)2010-10-20T21:11:02Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T005 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) <br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), February 1st, 2010 <br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. <br />
<br />
* ''' See also:''' [[TWSI Turk bootstrap Word Sense Inventory 2.0 (Repository)]] : more data, same format.<br />
<br />
* '''Download:''' [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip] - TWSI version 1 [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip] Supplementary Data version 1; <br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_2.0_(Repository)&diff=8256TWSI Turk bootstrap Word Sense Inventory 2.0 (Repository)2010-10-20T21:09:59Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T006 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) 2.0, includes [[TWSI Turk bootstrap Word Sense Inventory (Repository)]]<br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), October 18th, 2010<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. Version 2: Collection of more than 118,000 sentences for additional 615 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:'''<br />
- Version 1<br />
[http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip] - TWSI version 1 [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip] Supplementary Data version 1; <br />
- Version 2: split due to file size limitations<br />
TWSI version 2 letters A-M: [http://aclweb.org/aclwiki/index.php?title=Image:Turkboot615_A-M.zip] TWSI version 2 letters N-Z: [http://aclweb.org/aclwiki/index.php?title=Image:Turkboot615_N-Z.zip]<br />
Supplementary Data version 2 (1): [http://aclweb.org/aclwiki/index.php?title=Image:TWSI615_source_sentences_1.zip] Supplementary Data version 2 (2): [http://aclweb.org/aclwiki/index.php?title=Image:TWST615_source_sentences_2.zip] <br />
<br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)&diff=8255TWSI Turk bootstrap Word Sense Inventory (Repository)2010-10-20T21:05:05Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T005 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) <br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), February 1st, 2010 <br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. Version 2: Collection of more than 118,000 sentences for additional 615 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:''' [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip] - TWSI version 1 [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip] Supplementary Data version 1; <br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_2.0_(Repository)&diff=8250TWSI Turk bootstrap Word Sense Inventory 2.0 (Repository)2010-10-18T21:04:35Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T005 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) 2.0, includes [[TWSI Turk bootstrap Word Sense Inventory (Repository)]]<br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), October 18th, 2010<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. Version 2: Collection of more than 118,000 sentences for additional 615 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:'''<br />
- Version 1<br />
[http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip] - TWSI version 1 [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip] Supplementary Data version 1; <br />
- Version 2: split due to file size limitations<br />
TWSI version 2 letters A-M: [http://aclweb.org/aclwiki/index.php?title=Image:Turkboot615_A-M.zip] TWSI version 2 letters N-Z: [http://aclweb.org/aclwiki/index.php?title=Image:Turkboot615_N-Z.zip]<br />
Supplementary Data version 2 (1): [http://aclweb.org/aclwiki/index.php?title=Image:TWSI615_source_sentences_1.zip] Supplementary Data version 2 (2): [http://aclweb.org/aclwiki/index.php?title=Image:TWST615_source_sentences_2.zip] <br />
<br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_2.0_(Repository)&diff=8249TWSI Turk bootstrap Word Sense Inventory 2.0 (Repository)2010-10-18T21:02:12Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T005 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) 2.0, includes [[TWSI Turk bootstrap Word Sense Inventory (Repository)]]<br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), October 18th, 2010<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. Version 2: Collection of more than 118,000 sentences for additional 615 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:'''<br />
- Version 1<br />
[http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip] - TWSI version 1 [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip] Supplementary Data version 1; <br />
- Version 2: split due to file size limitations<br />
TWSI version 2 letters A-M: TWSI version 2 letters N-Z:<br />
Supplementary Data version 2 (1): Supplementary Data version 2 (2): <br />
<br />
<br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_2.0_(Repository)&diff=8248TWSI Turk bootstrap Word Sense Inventory 2.0 (Repository)2010-10-18T21:01:44Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T005 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) 2.0, includes [[TWSI Turk bootstrap Word Sense Inventory (Repository)]]<br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), October 18th, 2010<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. Version 2: Collection of more than 118,000 sentences for additional 615 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:'''<br />
- Version 1<br />
[http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip] - TWSI version 1 [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip] Supplementary Data version 1; <br />
- Version 2: split due to file size limitations<br />
TWSI version 2 letters A-M: TWSI version 2 letters N-Z:<br />
Supplementary Data version 2 (1): Supplementary Data version 2 (2): <br />
<br />
<br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_2.0_(Repository)&diff=8247TWSI Turk bootstrap Word Sense Inventory 2.0 (Repository)2010-10-18T21:01:14Z<p>Biem: New page: * '''ADCR ID:''' ADCR2010T005 * '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) 2.0, includes TWSI Turk bootstrap Word Sense Inventory (Repository) * '''Contribut...</p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T005 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) 2.0, includes [[TWSI Turk bootstrap Word Sense Inventory (Repository)]]<br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), October 18th, 2010<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. Version 2: Collection of more than 118,000 sentences for additional 615 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:'''<br />
* Version 1<br />
[http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip] - TWSI version 1 [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip] Supplementary Data version 1; <br />
* Version 2: split due to file size limitations<br />
TWSI version 2 letters A-M: TWSI version 2 letters N-Z:<br />
Supplementary Data version 2 (1): Supplementary Data version 2 (2): <br />
<br />
<br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)&diff=8246TWSI Turk bootstrap Word Sense Inventory (Repository)2010-10-18T20:59:21Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T005 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) <br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), February 1st, 2010<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet Conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. Version 2: Collection of more than 118,000 sentences for an additional 615 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:''' [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip] - TWSI version 1 [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip] Supplementary Data version 1; <br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=Resources_by_Date_(Repository)&diff=8245Resources by Date (Repository)2010-10-18T20:58:46Z<p>Biem: </p>
<hr />
<div>This page is part of the [[ACL Data and Code Repository]].<br />
<br />
[[Instructions for contributors (Repository)|Uploading]]? When generating an ADCR ID (ACL Data and Code Repository Identifier), please use "T" for textual corpora, "S" for speech corpora, "C" for code, and "L" for language resources such as lexicons.<br />
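<br />
The identifier scheme above can be checked mechanically. The following is a minimal sketch, not an official ACL tool: the pattern (the prefix "ADCR", a four-digit year, one type letter, and a three-digit running number) is inferred from the identifiers listed on this page, and the names <code>parse_adcr_id</code>, <code>AdcrId</code> and <code>ADCR_TYPES</code> are hypothetical.<br />

```python
import re
from typing import NamedTuple

# Type letters as described in the instructions above.
ADCR_TYPES = {
    "T": "textual corpus",
    "S": "speech corpus",
    "C": "code",
    "L": "language resource",
}

# Assumed layout, inferred from IDs such as ADCR2010T005:
# "ADCR" + 4-digit year + type letter + 3-digit running number.
_ADCR_RE = re.compile(r"^ADCR(\d{4})([TSCL])(\d{3})$")


class AdcrId(NamedTuple):
    year: int
    kind: str
    number: int


def parse_adcr_id(text: str) -> AdcrId:
    """Split an ADCR ID into year, resource kind and running number."""
    match = _ADCR_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a well-formed ADCR ID: {text!r}")
    year, letter, number = match.groups()
    return AdcrId(int(year), ADCR_TYPES[letter], int(number))
```

Under these assumptions, <code>parse_adcr_id("ADCR2010T005")</code> yields the year 2010, the kind "textual corpus" and the number 5, while a placeholder without a type letter raises <code>ValueError</code>.<br />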
<br />
<br />
* '''ADCR ID (Month Day, Year): Name of dataset or software (Repository)'''<br />
<br />
<br />
* ADCR2008T001 (June 9, 2008): [[CLAIR collection of fraud email (Repository)]]<br />
* ADCR2008C002 (June 11, 2008): [[Hierarchical Bayes Compiler (Repository)]]<br />
* ADCR2008C003 (June 11, 2008): [[MegaM: Maximum Entropy Model Optimization Package (Repository)]]<br />
* ADCR2008T004 (June 25, 2008): [[1443 Semantically Annotated Compound Nouns (Repository)]]<br />
* ADCR2010T005 (February 1, 2010): [[TWSI Turk bootstrap Word Sense Inventory (Repository)]]<br />
* ADCR2010L001 (March 5, 2010): [[Database of Catalan Adjectives (Repository)]]<br />
* ADCR2010T006 (October 18, 2010): [[TWSI Turk bootstrap Word Sense Inventory 2.0 (Repository)]]<br />
* ADCR2008_006<br />
<br />
<br />
[[Category:Data and code repository|Resources by Date]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:Turkboot615_N-Z.zip&diff=8244File:Turkboot615 N-Z.zip2010-10-18T20:57:11Z<p>Biem: This file describes the data format of the TWSI (Turk bootstrap Word Sense Inventory) version 2.0.
This is the second part, target letters N-Z.
For the description of the process, please consult the paper for further documentation. In short, three Mturk</p>
<hr />
<div>This file describes the data format of the TWSI (Turk bootstrap Word Sense Inventory) version 2.0. <br />
This is the second part, target letters N-Z.<br />
<br />
For a description of the process, please consult the paper. In short, three Mturk tasks were used to yield the data provided here:<br />
- "Substitutable words in context": Workers are presented with a sentence containing a target word and supply substitutions<br />
- "Are these words used with the same meaning?": Workers are presented with a pair of sentences with the same target word marked in bold and decide whether the meanings are identical, similar or different<br />
- "Match the Meaning": Workers are presented with a sense inventory represented by prototypical sentences and align further sentences with the same target word to those senses. <br />
<br />
The TWSI is organized by target word: for the next 615 most frequent nouns in English Wikipedia (dump from January 3rd, 2008) that are not already included in TWSI 1.0, all targets are organized into senses. Each sense is associated with substitutions and with sentences in which the target word was used in this sense. <br />
<br />
This data has been curated and extracted from the output of a turk bootstrapping acquisition cycle. Raw data is not included here, but is available upon request.</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:Turkboot615_A-M.zip&diff=8243File:Turkboot615 A-M.zip2010-10-18T20:56:28Z<p>Biem: TWSI (Turk bootstrap Word Sense Inventory) version 2.0.
This is the first part, target letters A-M.
For the description of the process, please consult the paper for further documentation. In short, three Mturk tasks were used to yield the data provided </p>
<hr />
<div>TWSI (Turk bootstrap Word Sense Inventory) version 2.0. <br />
This is the first part, target letters A-M.<br />
<br />
For a description of the process, please consult the paper. In short, three Mturk tasks were used to yield the data provided here:<br />
- "Substitutable words in context": Workers are presented with a sentence containing a target word and supply substitutions<br />
- "Are these words used with the same meaning?": Workers are presented with a pair of sentences with the same target word marked in bold and decide whether the meanings are identical, similar or different<br />
- "Match the Meaning": Workers are presented with a sense inventory represented by prototypical sentences and align further sentences with the same target word to those senses. <br />
<br />
The TWSI is organized by target word: for the next 615 most frequent nouns in English Wikipedia (dump from January 3rd, 2008) that are not already included in TWSI 1.0, all targets are organized into senses. Each sense is associated with substitutions and with sentences in which the target word was used in this sense. <br />
<br />
This data has been curated and extracted from the output of a turk bootstrapping acquisition cycle. Raw data is not included here, but is available upon request.</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:TWST615_source_sentences_2.zip&diff=8242File:TWST615 source sentences 2.zip2010-10-18T20:53:42Z<p>Biem: Supplementary data for the TWSI Turk Bootstrap Word Sense Inventory TWSI 2.0
Part 2/2: concatenate parts to get full file.
The file "wiki_title_sent.txt" in this archive contains 4 tab-separated columns:
- sentence-id from corpus as referenced through</p>
<hr />
<div>Supplementary data for the TWSI Turk Bootstrap Word Sense Inventory TWSI 2.0 <br />
Part 2/2: concatenate parts to get full file.<br />
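<br />
A minimal sketch of reading the concatenated parts, assuming each archive has been unpacked to a plain-text part file, that the parts are UTF-8 encoded, and that every line holds the four tab-separated columns described on this page. The names <code>read_source_sentences</code> and <code>SourceSentence</code> are hypothetical and are not part of the TWSI distribution.<br />

```python
from pathlib import Path
from typing import Iterable, Iterator, NamedTuple


class SourceSentence(NamedTuple):
    sentence_id: str       # sentence-id as referenced throughout the resource
    sentence_number: str   # number of the sentence within its article
    article_title: str
    sentence: str


def read_source_sentences(parts: Iterable[Path]) -> Iterator[SourceSentence]:
    """Yield rows from the part files in order, as if they were one file."""
    for part in parts:
        # UTF-8 is an assumption; adjust if the files use another encoding.
        with open(part, encoding="utf-8", newline="") as handle:
            for line in handle:
                line = line.rstrip("\r\n")
                if not line:
                    continue  # skip empty lines between parts
                # Split on the first three tabs only, so a tab inside the
                # sentence text (if any) stays in the last column.
                fields = line.split("\t", 3)
                if len(fields) != 4:
                    raise ValueError(f"expected 4 columns in {part}: {line!r}")
                yield SourceSentence(*fields)
```

Passing the part files in order (part 1 first, then part 2) gives the same rows as reading the full concatenated file.<br />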
<br />
The file "wiki_title_sent.txt" in this archive contains 4 tab-separated columns: <br />
<br />
- sentence-id from corpus as referenced throughout the resource<br />
- number of sentence within article<br />
- title of article<br />
- sentence</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:TWSI615_source_sentences_1.zip&diff=8241File:TWSI615 source sentences 1.zip2010-10-18T20:53:10Z<p>Biem: Supplementary data for the TWSI Turk Bootstrap Word Sense Inventory TWSI 2.0
Part 1/2: concatenate parts to get full file.
The file "wiki_title_sent.txt" in this archive contains 4 tab-separated columns:
- sentence-id from corpus as referenced through</p>
<hr />
<div>Supplementary data for the TWSI Turk Bootstrap Word Sense Inventory TWSI 2.0 <br />
Part 1/2: concatenate parts to get full file.<br />
<br />
The file "wiki_title_sent.txt" in this archive contains 4 tab-separated columns: <br />
<br />
- sentence-id from corpus as referenced throughout the resource<br />
- number of sentence within article<br />
- title of article<br />
- sentence</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)&diff=8240TWSI Turk bootstrap Word Sense Inventory (Repository)2010-10-18T18:40:10Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T005 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) <br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), February 1st, 2010 / October 18th, 2010.<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet Conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Version 1: Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions. Version 2: Collection of more than 118,000 sentences for an additional 615 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:''' [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip] - TWSI version 1 [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip] Supplementary Data version 1; <br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)&diff=8003TWSI Turk bootstrap Word Sense Inventory (Repository)2010-06-07T17:04:54Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T005 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) <br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), February 1st, 2010.<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet Conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:''' [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip] - TWSI version 1 [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397_source_sentences.zip] Supplementary Data<br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:TWSI397_source_sentences.zip&diff=8002File:TWSI397 source sentences.zip2010-06-07T17:04:04Z<p>Biem: Supplementary data for the TWSI Turk Bootstrap Word Sense Inventory TWSI 1.0
The file "wiki_title_sent.txt" in this Archive is an extended version of the file "corpus/wiki_titles.txt" in the TWSI 1.0.
It contains 4 tab-separated columns:
- sentence-id f</p>
<hr />
<div>Supplementary data for the TWSI Turk Bootstrap Word Sense Inventory TWSI 1.0<br />
<br />
The file "wiki_title_sent.txt" in this Archive is an extended version of the file "corpus/wiki_titles.txt" in the TWSI 1.0.<br />
It contains 4 tab-separated columns:<br />
<br />
- sentence-id from corpus as referenced throughout the resource<br />
- number of sentence within article<br />
- title of article<br />
- sentence</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:TWSI397.zip&diff=7782File:TWSI397.zip2010-02-02T01:22:25Z<p>Biem: uploaded a new version of "Image:TWSI397.zip": The TWSI is organized by target word: For the most frequent 397 nouns in English Wikipedia (dump used from January 3rd, 2008), all targets are organized into senses. With each sense, there are associated </p>
<hr />
<div>This file describes the data format of the TWSI (Turk bootstrap Word Sense Inventory) version 1.0. <br />
For a description of the process, please consult the paper. In short, three Mturk tasks were used to yield the data provided here:<br />
- "Substitutable words in context": Workers are presented with a sentence containing a target word and supply substitutions<br />
- "Are these words used with the same meaning?": Workers are presented with a pair of sentences with the same target word marked in bold and decide whether the meanings are identical, similar or different<br />
- "Match the Meaning": Workers are presented with a sense inventory represented by prototypical sentences and align further sentences with the same target word to those senses. <br />
<br />
The TWSI is organized by target word: for the 397 most frequent nouns in English Wikipedia (dump from January 3rd, 2008), all targets are organized into senses. Each sense is associated with substitutions and with sentences in which the target word was used in this sense. <br />
<br />
This data has been curated and extracted from the output of a turk bootstrapping acquisition cycle. Raw data is not included here, but is available upon request.</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)&diff=7781TWSI Turk bootstrap Word Sense Inventory (Repository)2010-02-02T00:06:23Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T005 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) <br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), February 1st, 2010.<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet Conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:''' [http://aclweb.org/aclwiki/index.php?title=Image:TWSI397.zip] - TWSI version 1<br />
<br />
<br />
[[Category:Data and code repository|TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=File:TWSI397.zip&diff=7780File:TWSI397.zip2010-02-02T00:05:18Z<p>Biem: This file describes the data format of the TWSI (Turk bootstrap Word Sense Inventory) version 1.0.
For the description of the process, please consult the paper for further documentation. In short, three Mturk tasks were used to yield the data provided he</p>
<hr />
<div>This file describes the data format of the TWSI (Turk bootstrap Word Sense Inventory) version 1.0. <br />
For a description of the process, please consult the paper. In short, three Mturk tasks were used to yield the data provided here:<br />
- "Substitutable words in context": Workers are presented with a sentence containing a target word and supply substitutions<br />
- "Are these words used with the same meaning?": Workers are presented with a pair of sentences with the same target word marked in bold and decide whether the meanings are identical, similar or different<br />
- "Match the Meaning": Workers are presented with a sense inventory represented by prototypical sentences and align further sentences with the same target word to those senses. <br />
<br />
The TWSI is organized by target word: for the 397 most frequent nouns in English Wikipedia (dump from January 3rd, 2008), all targets are organized into senses. Each sense is associated with substitutions and with sentences in which the target word was used in this sense. <br />
<br />
This data has been curated and extracted from the output of a turk bootstrapping acquisition cycle. Raw data is not included here, but is available upon request.</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)&diff=7779TWSI Turk bootstrap Word Sense Inventory (Repository)2010-02-02T00:01:37Z<p>Biem: </p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T005 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) <br />
<br />
* '''Contributor:''' [http://wortschatz.uni-leipzig.de/~cbiemann/ Chris Biemann], Powerset (a Microsoft company), February 1st, 2010.<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet Conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Collection of more than 50,000 sentences for 397 frequent target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:''' [http://aclweb.org/aclwiki/index.php?] - TWSI version 1<br />
<br />
<br />
[[Category:Data and code repository|TWSI Turk bootstrap Word Sense Inventory (Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=TWSI_Turk_bootstrap_Word_Sense_Inventory_(Repository)&diff=7778TWSI Turk bootstrap Word Sense Inventory (Repository)2010-02-02T00:00:42Z<p>Biem: New page: * '''ADCR ID:''' ADCR2010T005 * '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) * '''Contributor:''' [http://tangra.si.umich.edu/~radev/ Chris Biemann], Powerset (a M...</p>
<hr />
<div>* '''ADCR ID:''' ADCR2010T005 <br />
<br />
* '''Name of Dataset:''' TWSI (Turk bootstrap Word Sense Inventory) <br />
<br />
* '''Contributor:''' [http://tangra.si.umich.edu/~radev/ Chris Biemann], Powerset (a Microsoft company), February 1st, 2010.<br />
<br />
* '''Copyright:''' (c) 2010, Microsoft Corp. <br />
<br />
* '''Licensing:''' This work is licensed under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 ].<br />
<br />
* '''Citation:''' If you use the Turk bootstrap Word Sense Inventory in your research, please include the following citation in any resulting papers: <br />
<br />
::* C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet Conference, Mumbai, India. ''ACL Data and Code Repository'', ADCR2010T005, http://aclweb.org/aclwiki.<br />
<br />
* '''Description:''' Collection of more than 50,000 sentences for 397 target nouns from Wikipedia, sense-labeled and with substitutions.<br />
<br />
* '''Download:''' [http://aclweb.org/aclwiki/index.php?] - TWSI version 1<br />
<br />
<br />
[[Category:Data and code repository|TWSI Turk bootstrap Word Sense Inventory (Repository)]]</div>Biemhttps://aclweb.org/aclwiki/index.php?title=Resources_by_Date_(Repository)&diff=7777Resources by Date (Repository)2010-02-01T23:56:33Z<p>Biem: </p>
<hr />
<div>This page is part of the [[ACL Data and Code Repository]].<br />
<br />
[[Instructions for contributors (Repository)|Uploading]]? When generating an ADCR ID (ACL Data and Code Repository Identifier), please use "T" for textual corpora, "S" for speech corpora, "C" for code, and "L" for language resources such as lexicons.<br />
<br />
<br />
* '''ADCR ID (Month Day, Year): Name of dataset or software (Repository)'''<br />
<br />
<br />
* ADCR2008T001 (June 9, 2008): [[CLAIR collection of fraud email (Repository)]]<br />
* ADCR2008C002 (June 11, 2008): [[Hierarchical Bayes Compiler (Repository)]]<br />
* ADCR2008C003 (June 11, 2008): [[MegaM: Maximum Entropy Model Optimization Package (Repository)]]<br />
* ADCR2008T004 (June 25, 2008): [[1443 Semantically Annotated Compound Nouns (Repository)]]<br />
* ADCR2010T005 (February 1, 2010): [[TWSI Turk bootstrap Word Sense Inventory (Repository)]]<br />
* ADCR2008_006<br />
<br />
<br />
[[Category:Data and code repository|Resources by Date]]</div>Biem