SIGFSM Shared task

From ACL Wiki
Revision as of 14:11, 19 July 2013 by Maletti

Shared task notes for [SIGFSM]

A good shared task

  • introduces a new NLP application or concept to the community
  • focuses on a standard task, but introduces a more uniform
 way to measure the quality of the solutions, thus helping
 rather than distracting from the ongoing efforts, or
  • provides a well-specified setting for doing a similar kind of task
 in languages that are not fixed in advance.

Things to avoid

  • the solution requires a lot of specialized language technology
 that does not have wide relevance.
  • the time window is too short
  • only the best solutions would survive, leaving the others redundant.
  • the shared task does not provide excitement and learning
  • success requires much local infrastructure.
  • the task focuses on English, ignoring 6000 other languages.
  • the task is data intensive or logic intensive.
  • the task does not generate opportunities for new languages.

Preliminary ideas

  • Morphology for more languages:
     Morphological analyzers are often presented in LREC-type
     papers, which often suffer from poor statistical testing.
     The task could aim to set a new standard for doing such
     research.  The competition could be about reaching new scientific
     conclusions and having other good characteristics in the
     contributions.
     It is open whether such characteristics would be stated
     in advance or expected to be discovered by the winners.
  • Phonology:
     Learning a set of phonological rules.  Theory is open,
     but the evaluation would be based on:
     (i)  generalizations made (tested on an unseen data set)
     (ii) linguistic elegance (judged by a jury)
  • Implementation of FST algorithms:
     (i)   Implementation of determinization or composition
           algorithms for certain tasks.
           (determinization is often the most time-consuming step,
           although it rarely needs to be, because the size
           rarely blows up exponentially)
     (ii)  Fast compilation of extended regular expressions
           (negations, compositions, projections).
           --- But do we need more libraries?  I guess not.
     (iii) Packing of open source lexical transducers.
           This is a very interesting topic, but I also see
           a shared task here as very problematic due to
           ongoing efforts and multiple approaches.
     (iv)  Fast implementations and learning of HMMs and
           similar models.
           Overlaps with some prior tasks.
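
To make the determinization item (i) above more concrete, here is a minimal sketch of the classical subset construction. It is restricted to an unweighted, epsilon-free acceptor rather than a full transducer, and the names determinize, accepts and nfa_delta are hypothetical, chosen only for illustration. It is not taken from any existing library and is not a proposed reference implementation for the task.

```python
from collections import deque


def determinize(start, finals, delta):
    """Subset construction for an unweighted, epsilon-free NFA.

    delta maps (state, symbol) -> set of successor states.
    Returns (dfa_start, dfa_finals, dfa_delta); DFA states are frozensets
    of NFA states, and only reachable subsets are ever built.
    """
    alphabet = sorted({sym for (_, sym) in delta})
    dfa_start = frozenset([start])
    dfa_finals = set()
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        subset = queue.popleft()
        if subset & finals:
            dfa_finals.add(subset)
        for sym in alphabet:
            # Union of all NFA successors of the states in this subset.
            target = frozenset(
                t for s in subset for t in delta.get((s, sym), ())
            )
            if not target:
                continue  # no transition; the DFA stays partial
            dfa_delta[(subset, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_start, dfa_finals, dfa_delta


def accepts(dfa, word):
    """Run a (partial) DFA produced by determinize on a sequence of symbols."""
    start, finals, delta = dfa
    state = start
    for sym in word:
        state = delta.get((state, sym))
        if state is None:
            return False
    return state in finals


# Toy NFA (an assumption for illustration): strings over {a, b}
# whose second-to-last symbol is "a".
nfa_delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "a"): {2},
    (1, "b"): {2},
}
dfa = determinize(0, {2}, nfa_delta)
```

Because only reachable subsets are constructed, the result stays small whenever the exponential worst case does not materialize, which is the situation the note above describes as typical. A shared task entry would additionally have to handle weights, output labels and epsilon transitions, none of which this sketch covers.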