Data sets for NLG blog

From ACL Wiki
This blog supplements [[Data sets for NLG]] with comments about these data sets from users, authors and other interested parties.  We are especially interested in comments about appropriate and inappropriate usage of a data set, "best practice" use of a data set, useful additional information about a data set (e.g., its scope or how it was constructed), and pointers to related data sets that may be more appropriate for some users.  Links to relevant papers and other resources are welcome.
  
=== Weathergov ===
 
The Weathergov corpus contains the output of a template-based weather forecast generator, not human-written forecasts ([https://ehudreiter.com/2017/05/09/weathergov/ blog post]). Hence ML on Weathergov is an exercise in reverse engineering a template-based NLG system, not in training an NLG system from human data.  If you want to train on human-written weather forecasts, consider using the [https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip SumTime corpus] instead.
 
The Weathergov corpus contains the output of a template-based weather forecast generator, not human-written forecasts ([https://ehudreiter.com/2017/05/09/weathergov/ blog post]). Hence ML on Weathergov is an exercise in reverse engineering a template-based NLG system, not in training an NLG system from human data.  If you want to train on human-written weather forecasts, consider using the [https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip SumTime corpus] instead.

Revision as of 06:21, 21 August 2019

This blog is a supplement to Data sets for NLG, which lists comments about these data sets from users, authors and other interested parties. We are especially interested in comments about appropriate and inappropriate usage of a data set, "best practice" use of a data set, useful additional information about a data set (eg, scope, how it was constructed), and pointers to related data sets which may be more appropriate for some users. Links to relevant papers and other resources are welcome

Weathergov

The Weathergov corpus contains the output of a template-based weather forecast generator, not human-written forecasts (blog post). Hence ML on Weathergov is an exercise in reverse engineering a template-based NLG system, not in training an NLG system from human data. If you want to train on human-written weather forecasts, consider using the SumTime corpus instead.