Difference between revisions of "Data sets for NLG"

From ACL Wiki
(Focus on Referring Expression Generation: heading renamed: it's GRE)
(+PIL corpus)
Line 7: Line 7:
 
This page lists sets of structured data to be used as input for natural language generation tasks, or to inform research on NLG.
 
  
== Focus on Content Selection, Aggregation ==
+
== Focus on studying the generation target ==
 +
=== PIL: Patient Information Leaflet corpus ===
 +
The [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ Patient Information Leaflet (PIL) corpus] is a [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/searchtool/search.html searchable] and [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ browsable] collection of patient information leaflets available in various document formats as well as structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton. ([http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL-corpus-2.0.tar.gz direct download link])
  
 +
== Focus on content selection, aggregation ==
 
=== SumTime Meteo ===
 
  
Line 19: Line 22:
 
Project link: http://www.csd.abdn.ac.uk/research/sumtime/
 
  
== Focus on Generating Referring Expressions ==
+
== Focus on generating referring expressions ==
 
+
 
Referring expression generation is a sub-task of NLG with an active research community.  
 
  
Line 31: Line 33:
 
The [http://www.csd.abdn.ac.uk/~agatt/tuna/corpus/ TUNA Reference Corpus] is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment, and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis. ([http://www.csd.abdn.ac.uk/~agatt/tuna/corpus/corpus.zip direct download link])
 
  
== Focus on Lexicalization ==
+
== Focus on lexicalization ==
 
...
 
  
== Focus on Syntax, Realization ==
+
== Focus on syntax, realization ==
 
...
 
  

Revision as of 03:50, 10 February 2009


This page lists sets of structured data to be used as input for natural language generation tasks, or to inform research on NLG.

Focus on studying the generation target

PIL: Patient Information Leaflet corpus

The Patient Information Leaflet (PIL) corpus is a searchable and browsable collection of patient information leaflets available in various document formats as well as structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton. (direct download link)

Focus on content selection, aggregation

SumTime Meteo

These data contain predictions for meteorological parameters such as precipitation, temperature, wind speed, and cloud cover at various altitudes, at regular intervals for various points in the area of interest.

The weather corpus currently exists as an Access database and, alternatively, as CSV (ASCII) files.

Download and Info: SumTime-Meteo

Project link: http://www.csd.abdn.ac.uk/research/sumtime/

Focus on generating referring expressions

Referring expression generation is a sub-task of NLG with an active research community.

GRE3D3: Spatial Relations in Referring Expressions

A Web-based production experiment was conducted by Jette Viethen under the supervision of Robert Dale. The resulting GRE3D3 corpus contains 720 referring expressions for simple objects in simple 3D scenes. (direct download link)

TUNA Reference Corpus

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment, and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis. (direct download link)

Focus on lexicalization

...

Focus on syntax, realization

...

This page was imported semi-automatically from the NLG Resources Wiki, which was run by ACL SIGGEN from 2005 to 2009. Please correct conversion errors and help update its contents.

This page is now associated with the Natural Language Generation Portal.