File:Disco2011-shared-task-complete-dataset.zip
Latest revision as of 10:09, 10 March 2014
This archive contains data sets of compositionality judgments for English and German, together with the official scoring scripts. The data was collected on Amazon Mechanical Turk. Workers were shown a sentence with the target phrase in bold and asked to rate how literal the phrase was on a scale from 0 to 10. For each phrase, 4-5 different sentences, randomly sampled from the WaCKy corpora for UK English and German, were each presented to 4 workers.
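Since each phrase receives several ratings (multiple sentences, 4 workers per sentence), one plausible way to obtain a single score per phrase is to average them. The following is a minimal sketch only: the tuple layout, the function name `mean_literality`, and the choice of a plain mean are assumptions for illustration, not the archive's actual file format or official aggregation method.

```python
from collections import defaultdict
from statistics import mean


def mean_literality(judgments):
    """Average the 0-10 literality ratings per phrase.

    `judgments` is an iterable of (phrase, sentence_id, worker_id, score)
    tuples. This layout is hypothetical, not the archive's real format.
    """
    by_phrase = defaultdict(list)
    for phrase, _sentence_id, _worker_id, score in judgments:
        # Ratings were collected on a 0-10 scale; reject anything else.
        if not 0 <= score <= 10:
            raise ValueError(f"score out of range for {phrase!r}: {score}")
        by_phrase[phrase].append(score)
    return {phrase: mean(scores) for phrase, scores in by_phrase.items()}
```

Called with a list of such tuples, the function returns a dictionary mapping each phrase to its mean rating; a median or a per-sentence average would be equally reasonable alternatives.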
Phrases consist of two lemmas and come in three grammatical relations:
- ADJ_NN: adjective modifying a noun
- V_SUBJ: noun as a subject of a verb
- V_OBJ: noun as an object of a verb
Passive constructions were resolved to active constructions for relation assignment purposes.
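The phrase structure described above (a relation tag plus two lemmas) could be modelled as follows. This is an illustrative sketch: the names `Relation`, `Phrase`, and `parse_phrase` are hypothetical, and the order of the two lemmas is an assumption rather than something the archive specifies here.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The three grammatical relations used in the data set."""
    ADJ_NN = "adjective modifying a noun"
    V_SUBJ = "noun as a subject of a verb"
    V_OBJ = "noun as an object of a verb"


@dataclass(frozen=True)
class Phrase:
    relation: Relation
    lemmas: tuple  # exactly two lemmas; their order is an assumption


def parse_phrase(relation_tag, lemma1, lemma2):
    """Build a Phrase; raises KeyError for an unknown relation tag."""
    return Phrase(Relation[relation_tag], (lemma1, lemma2))
```

Looking the tag up by name in the enum means a misspelled or unsupported relation fails immediately instead of passing through silently.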
Phrases were extracted semi-automatically: the relations were assigned by patterns and then manually checked for validity. Phrases were selected so as to balance the data set while controlling for frequency.
File history
| Date/Time | Size | User | Comment |
---|---|---|---|---|
current | 10:09, 10 March 2014 | 265 KB | Biem | DISCO 2011 Complete Dataset (Training and Test Data, Eval Scripts) |
| 01:59, 30 June 2011 | 265 KB | Biem | This archive contains data sets for compositionality judgments for English and German as well as the official scoring scripts. The data was collected from Amazon turk. Workers were presented a sentence with a bolded target phrase and were asked to score h |