File:Disco2011-shared-task-complete-dataset.zip
This archive contains data sets of compositionality judgments for English and German, together with the official scoring scripts. The judgments were collected on Amazon Mechanical Turk. Workers were shown a sentence containing a bolded target phrase and asked to rate how literal the phrase was on a scale from 0 to 10. For each phrase, 4 to 5 different sentences were randomly sampled from the WaCKy corpora for UK English and German, and each sentence was presented to 4 workers.
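With 4 to 5 sentences per phrase and 4 workers per sentence, each phrase receives 16 to 20 individual judgments. This page does not say how these are combined into a single phrase score; the official scoring scripts and gold files in the archive define that. The sketch below assumes a plain mean over all judgments, and the scores in it are made-up values for illustration only. The function name `phrase_score` is hypothetical.

```python
from statistics import mean

# Made-up judgments for ONE phrase: one inner list per sampled sentence,
# each holding the 0-10 literalness scores from the 4 workers who saw it.
judgments = [
    [8, 9, 7, 10],
    [6, 8, 9, 7],
    [9, 9, 8, 10],
    [7, 6, 8, 9],
]

def phrase_score(per_sentence):
    """Hypothetical aggregation: average every worker score for a phrase.

    Returns (mean score, number of judgments). Assumes a simple mean,
    which may differ from how the official data was aggregated.
    """
    scores = [s for sentence in per_sentence for s in sentence]
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must lie in the range 0 to 10")
    return mean(scores), len(scores)

score, n = phrase_score(judgments)
print(f"{n} judgments, mean literalness {score:.3f}")
```

A mean treats every worker equally; a median or per-worker normalization would be reasonable alternatives if some annotators are noisy.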
Phrases consist of two lemmas, and each instantiates one of three grammatical relations:

- ADJ_NN: adjective modifying a noun
- V_SUBJ: noun as the subject of a verb
- V_OBJ: noun as the object of a verb

For the purpose of relation assignment, passive constructions were resolved to their active counterparts.
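Resolving passives to actives means that the surface subject of a passive clause is labeled as an object, and the passive agent (the "by" phrase) as a subject. The page only says relations were assigned by patterns, so the following is a minimal sketch of that resolution rule alone, under assumed dependency labels `"subj"`, `"obj"` and `"agent"`; the labels and the function `verb_noun_relation` are hypothetical, not the extraction code used for the data set.

```python
from enum import Enum

class Relation(Enum):
    ADJ_NN = "ADJ_NN"  # adjective modifying a noun
    V_SUBJ = "V_SUBJ"  # noun as the subject of a verb
    V_OBJ = "V_OBJ"    # noun as the object of a verb

# Hypothetical dependency labels: "subj" (surface subject), "obj" (direct
# object), "agent" (by-phrase of a passive clause).
_ACTIVE = {"subj": Relation.V_SUBJ, "obj": Relation.V_OBJ}
_PASSIVE = {"subj": Relation.V_OBJ, "agent": Relation.V_SUBJ}

def verb_noun_relation(dep: str, passive: bool) -> Relation:
    """Assign V_SUBJ or V_OBJ as if the clause were in the active voice."""
    table = _PASSIVE if passive else _ACTIVE
    try:
        return table[dep]
    except KeyError:
        raise ValueError(f"{dep!r} is not a verb-noun relation here") from None

# "The players broke the rules" vs. "The rules were broken by the players":
# "rules" is V_OBJ in both, "players" is V_SUBJ in both.
print(verb_noun_relation("obj", passive=False))
print(verb_noun_relation("subj", passive=True))
print(verb_noun_relation("agent", passive=True))
```

Both voices therefore yield the same (verb, noun, relation) triple, e.g. break/rule as V_OBJ.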
Phrases were extracted semi-automatically: relations were assigned by patterns and then manually checked for validity. Phrases were selected so as to balance the data set while controlling for frequency.
File history
| Version | Date/Time | Size | User |
|---|---|---|---|
| current | 04:59, 30 June 2011 | 265 KB | Biem |