SIGFSM Shared task

Shared task notes for [SIGFSM]

A good shared task

introduces a new NLP application or concept to the community,
focuses on a standard task, but introduces a more uniform
 way to measure the quality of the solutions, thus helping
 rather than distracting from the ongoing efforts, or
is a well-specified setting for doing a similar kind of thing
 for unspecified languages.

Things to avoid

the solution requires a lot of specialized language technology
 that does not have wide relevance.
the time window is too short.
only the best solutions would survive, leaving the others redundant.
the shared task does not provide excitement and learning.
success requires much local infrastructure.
the task focuses on English, ignoring 6000 other languages.
the task is data-intensive or logic-intensive.
the task does not generate opportunities for new languages.

Preliminary ideas

Morphology for more languages:

     Often morphological analyzers are presented as LREC-type
     papers.  They often suffer from poor statistical testing.

     The task could aim to set a new standard for doing such
     research.  The competition could be about reaching new scientific
     conclusions and having other good characteristics in the
     contributions.

     It is open whether such characteristics would be stated
     in advance or expected to be discovered by the winners.
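
     As one illustration of what sounder statistical testing could
     look like, the sketch below computes a paired percentile
     bootstrap confidence interval for the accuracy difference
     between two analyzers scored on the same test items.  This is
     an assumption about one possible protocol, not something the
     task prescribes; the function name paired_bootstrap_ci and all
     its parameters are hypothetical.

```python
import random


def paired_bootstrap_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(A) - accuracy(B) on one shared test set.

    correct_a / correct_b: per-item booleans (True if the analyzer got the item right).
    Returns (observed_difference, lower_bound, upper_bound).
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two equally long, non-empty result lists")
    n = len(correct_a)
    # Per-item difference: +1 if only A is right, -1 if only B is right, 0 otherwise.
    deltas = [int(a) - int(b) for a, b in zip(correct_a, correct_b)]
    observed = sum(deltas) / n

    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    resampled = []
    for _ in range(n_resamples):
        # Resample test items with replacement, keeping the A/B pairing intact.
        sample = [deltas[rng.randrange(n)] for _ in range(n)]
        resampled.append(sum(sample) / n)
    resampled.sort()

    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return observed, resampled[lo_idx], resampled[hi_idx]
```

     If the resulting interval contains zero, the test set gives no
     clear evidence that one analyzer is better than the other.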

Phonology:

     Learning a set of phonological rules.  Theory is open,
     but the evaluation would be based on:

     (i)  generalizations made (tested on an unseen data set)
     (ii) linguistic elegance (judged by a jury)
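
     Criterion (i) could be scored roughly as follows.  This is a
     minimal sketch that assumes a submission can be reduced to an
     ordered list of regular-expression rewrite rules and that
     generalization is measured as exact-match accuracy on held-out
     (underlying, surface) pairs.  The rule format, the function
     names and the toy final-devoicing data are all invented for
     illustration.

```python
import re


def apply_rules(rules, underlying):
    """Apply ordered (pattern, replacement) rewrite rules to an underlying form."""
    form = underlying
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


def heldout_accuracy(rules, heldout_pairs):
    """Fraction of held-out (underlying, surface) pairs the rules reproduce exactly."""
    hits = sum(apply_rules(rules, u) == s for u, s in heldout_pairs)
    return hits / len(heldout_pairs)


# Hypothetical submission: word-final devoicing of b, d and g only.
rules = [(r"b$", "p"), (r"d$", "t"), (r"g$", "k")]

# Invented held-out pairs; the last one needs a rule the submission lacks.
heldout = [
    ("hund", "hunt"),
    ("tag", "tak"),
    ("lob", "lop"),
    ("hunde", "hunde"),
    ("brav", "braf"),
]

score = heldout_accuracy(rules, heldout)
```

     Criterion (ii) is a jury judgment and is not captured by a
     score of this kind.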

Implementation of FST algorithms:

     (i)   Implementation of determinization or composition
           algorithms for certain tasks.

           (determinization is often the most time-consuming step,
           although it rarely needs to be, because the size
           rarely blows up exponentially)

     (ii)  Fast compilation of extended regular expressions
           (negations, compositions, projections).
           --- But do we need more libraries?  I guess not.

     (iii) Packing of open source lexical transducers.
           This is a very interesting topic, but I also see
           a shared task on it as very problematic, due to
           ongoing efforts and multiple approaches.

     (iv)  Fast implementations and learning of HMMs and
           similar models.
           Overlaps with some prior tasks.
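
     To make item (i) concrete, the sketch below shows the classic
     subset construction for an epsilon-free nondeterministic
     acceptor.  It is a simplification: a shared task would more
     likely target weighted automata or transducers, where
     determinization is more involved.  The function names and the
     arc encoding (a dict from (state, symbol) to a set of target
     states) are assumptions made for this example.

```python
from collections import deque


def determinize(start, finals, arcs):
    """Subset construction for an epsilon-free NFA.

    arcs maps (state, symbol) -> set of target states.
    Returns (dfa_start, dfa_finals, dfa_arcs, dfa_states), where each
    DFA state is a frozenset of NFA states.
    """
    finals = set(finals)
    dfa_start = frozenset([start])
    dfa_arcs = {}
    dfa_finals = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        subset = queue.popleft()
        if subset & finals:
            dfa_finals.add(subset)
        # Collect, per symbol, every NFA state reachable from this subset.
        # Scanning all arcs keeps the sketch short; a real implementation
        # would index arcs by source state.
        moves = {}
        for (src, symbol), targets in arcs.items():
            if src in subset:
                moves.setdefault(symbol, set()).update(targets)
        for symbol, targets in moves.items():
            target = frozenset(targets)
            dfa_arcs[(subset, symbol)] = target
            if target not in seen:  # only reachable subsets are ever built
                seen.add(target)
                queue.append(target)
    return dfa_start, dfa_finals, dfa_arcs, seen


def accepts(dfa, word):
    """Run the deterministic automaton on a string of symbols."""
    state, finals, arcs, _ = dfa
    for symbol in word:
        state = arcs.get((state, symbol))
        if state is None:
            return False
    return state in finals


# Toy NFA over {a, b} accepting the strings that end in "ab".
nfa_arcs = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa = determinize(0, {2}, nfa_arcs)
```

     Only reachable subsets are constructed, so the result here has
     three states although the worst case is exponential in the
     number of NFA states.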
