Constrained Conditional Model

A '''Constrained Conditional Model''' (CCM) is a [[machine learning]] and inference framework that augments the learning of conditional (probabilistic or discriminative) models with declarative constraints (written, for example, using a first-order representation) as a way to support decisions in an expressive output space while maintaining modularity and tractability of training and inference.

Models of this kind have recently attracted much attention within the NLP community. Formulating problems as constrained optimization problems over the output of learned models has several advantages. It allows one to focus on modeling the problem by providing the opportunity to incorporate domain-specific knowledge as global constraints expressed in a first-order language. Using this declarative framework frees the developer from low-level feature engineering while capturing the problem's domain-specific properties and guaranteeing exact inference. From a machine learning perspective, it allows decoupling the model generation (learning) stage from the constrained inference stage, which helps simplify learning while improving the quality of the solutions.

==Motivation==

Making decisions in many learning domains (such as natural language processing and computer vision) often involves assigning values to sets of interdependent variables, where the expressive dependency structure can influence, or even dictate, what assignments are possible. These settings apply to structured learning problems such as semantic role labeling, but also to cases that require combining multiple pre-learned components, such as summarization, textual entailment and question answering. In all these cases, it is natural to formulate the decision problem as a constrained optimization problem, with an objective function composed of learned models, subject to domain- or problem-specific constraints.

Constrained Conditional Models form a learning and inference framework that augments the learning of conditional (probabilistic or discriminative) models with declarative constraints (written, for example, using a first-order representation) as a way to support decisions in an expressive output space while maintaining modularity and tractability of training and inference. In most applications of this framework in NLP, following <ref>Dan Roth and Wen-tau Yih, [http://l2r.cs.uiuc.edu/~danr/Papers/RothYi04.pdf "A Linear Programming Formulation for Global Inference in Natural Language Tasks."] ''CoNLL'', (2004).</ref>, Integer Linear Programming (ILP) has been used as the inference framework, although other inference algorithms can be used for that purpose.
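
The decision problem is commonly written in the following general form (a sketch of the formulation used in the CCM literature; the exact notation varies across papers):

:<math>\hat{y} = \arg\max_{y} \; \sum_{i} w_i \, \phi_i(x,y) \;-\; \sum_{k} \rho_k \, d_{C_k}(x,y)</math>

Here the <math>\phi_i</math> are features or the outputs of learned models with weights <math>w_i</math>, each <math>C_k</math> is a declarative constraint, <math>d_{C_k}(x,y)</math> measures the degree to which the assignment <math>y</math> violates it, and <math>\rho_k</math> is the associated penalty (taking <math>\rho_k \to \infty</math> yields a hard constraint). When the models and constraints are linear in Boolean decision variables, this objective can be optimized directly with an ILP solver.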

==Training Paradigms==

=== Learning Local vs. Global Models ===

The objective function used by CCMs can be decomposed and learned in several ways, ranging from complete joint training of the model along with the constraints to completely decoupling the learning stage from the inference stage. In the latter case, several local models are learned independently, and the dependencies between these models are considered only at decision time via a global decision process. The advantages of each approach are discussed in <ref>Vasin Punyakanok and Dan Roth and Wen-Tau Yih and Dav Zimak, [http://l2r.cs.uiuc.edu/~danr/Papers/PRYZ05.pdf "Learning and Inference over Constrained Output."] ''IJCAI'', (2005).</ref>, which studies the two training paradigms: (1) local models, L+I (learning + inference), and (2) a global model, IBT (inference-based training). It shows both theoretically and experimentally that while IBT (joint training) is best in the limit, under some conditions (basically, "good" components) L+I can generalize better.
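
The contrast between the two regimes can be made concrete with a small sketch. The code below is purely illustrative (it is not the cited authors' implementation): a toy sequence-labeling task with a single hard constraint, exhaustive constrained inference standing in for an ILP solver, and perceptron-style updates. All function names, the feature set, and the toy constraint are assumptions made only for this example.

<pre>
# Illustrative sketch (not the authors' code): contrasts L+I and IBT training
# for a toy sequence-labeling CCM with one hard constraint.
import itertools

LABELS = ["O", "B"]

def features(x_tok, y_lab):
    """Simple indicator features: (token, label) plus a label bias."""
    return {("tok", x_tok, y_lab): 1.0, ("bias", y_lab): 1.0}

def score(w, x, y):
    return sum(w.get(f, 0.0) * v for t, l in zip(x, y)
               for f, v in features(t, l).items())

def satisfies_constraint(y):
    # Declarative (hard) constraint: at most one "B" per sequence.
    return y.count("B") <= 1

def constrained_argmax(w, x):
    """Exhaustive constrained inference; a real system would use an ILP solver."""
    best, best_s = None, float("-inf")
    for y in itertools.product(LABELS, repeat=len(x)):
        if satisfies_constraint(list(y)):
            s = score(w, x, list(y))
            if s > best_s:
                best, best_s = list(y), s
    return best

def update(w, x, gold, pred):
    """Perceptron update toward gold, away from pred."""
    for t, g, p in zip(x, gold, pred):
        if g != p:
            for f, v in features(t, g).items():
                w[f] = w.get(f, 0.0) + v
            for f, v in features(t, p).items():
                w[f] = w.get(f, 0.0) - v

def train_LplusI(data, epochs=5):
    """L+I: train each local (per-token) decision ignoring the constraint;
    the constraint is applied only at prediction time."""
    w = {}
    for _ in range(epochs):
        for x, gold in data:
            for t, g in zip(x, gold):
                p = max(LABELS, key=lambda l: sum(
                    w.get(f, 0.0) * v for f, v in features(t, l).items()))
                update(w, [t], [g], [p])
    return w

def train_IBT(data, epochs=5):
    """IBT: the update is driven by the *constrained* global prediction,
    so learning and inference are coupled."""
    w = {}
    for _ in range(epochs):
        for x, gold in data:
            pred = constrained_argmax(w, x)
            update(w, x, gold, pred)
    return w

if __name__ == "__main__":
    data = [(["the", "dog", "barks"], ["O", "B", "O"]),
            (["a", "cat", "sleeps"], ["O", "B", "O"])]
    for name, trainer in [("L+I", train_LplusI), ("IBT", train_IBT)]:
        w = trainer(data)
        print(name, constrained_argmax(w, ["the", "cat", "barks"]))
</pre>

The only difference between the two trainers is where constrained inference enters: after training (L+I) or inside the update loop (IBT).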

=== Minimally Supervised CCM ===

CCMs can help reduce supervision by using domain knowledge (expressed as constraints) to drive learning. These settings were studied in <ref>Ming-Wei Chang and Lev Ratinov and Dan Roth, [http://l2r.cs.uiuc.edu/~danr/Papers/ChangRaRo07.pdf "Guiding Semi-Supervision with Constraint-Driven Learning."] ''ACL'', (2007).</ref> and <ref>Ming-Wei Chang and Lev Ratinov and Dan Roth, [http://l2r.cs.uiuc.edu/~danr/Papers/ChangRaRo08.pdf "Constraints as Prior Knowledge."] ''ICML Workshop on Prior Knowledge for Text and Language Processing'', (2008).</ref>. These works introduce semi-supervised Constraint-Driven Learning (CODL) and show that incorporating domain knowledge significantly improves the performance of the learned model.
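
A schematic version of the CODL loop described above is sketched below. It roughly follows the algorithm of Chang et al. (2007); the helper functions, parameter names, and the exact interpolation scheme are assumptions made for illustration and should be checked against the paper.

<pre>
# Illustrative sketch (not the authors' code) of semi-supervised
# Constraint-Driven Learning (CODL). `learn` and `constrained_top_k` are
# assumed to be supplied by the application, e.g. a feature-weight learner
# and a constrained decoder; `gamma` conservatively balances the supervised
# model against the model re-estimated from self-labeled data (the exact
# interpolation used in the paper may differ).
def codl(labeled, unlabeled, learn, constrained_top_k,
         rounds=5, K=5, gamma=0.9):
    w0 = learn(labeled)          # initial model from the small labeled set
    w = dict(w0)
    for _ in range(rounds):
        self_labeled = []
        for x in unlabeled:
            # constrained inference proposes the K best labelings that
            # (softly) satisfy the declarative constraints
            for y in constrained_top_k(w, x, K):
                self_labeled.append((x, y))
        w_new = learn(labeled + self_labeled)
        # conservative interpolation with the supervised model
        w = {f: gamma * w0.get(f, 0.0) + (1 - gamma) * w_new.get(f, 0.0)
             for f in set(w0) | set(w_new)}
    return w
</pre>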

=== Learning over Latent Representations ===

CCMs have also been applied to latent learning frameworks, where the learning problem is defined over a latent representation layer. Since the notion of a ''correct representation'' is inherently ill-defined, no gold-labeled data regarding the representation decision is available to the learner. Identifying the correct (or optimal) learning representation is viewed as a structured prediction process and is therefore modeled as a CCM. This problem was studied in both supervised <ref>Ming-Wei Chang and Dan Goldwasser and Dan Roth and Vivek Srikumar, [http://l2r.cs.uiuc.edu/~danr/Papers/CGRS10.pdf "Discriminative Learning over Constrained Latent Representations."] ''NAACL'', (2010).</ref> and unsupervised <ref>Ming-Wei Chang and Dan Goldwasser and Dan Roth and Yuancheng Tu, [http://l2r.cs.uiuc.edu/~danr/Papers/CGRT10.pdf "Unsupervised Constraint Driven Learning For Transliteration Discovery."] ''NAACL'', (2009).</ref> settings, and in all cases explicitly modeling the interdependencies between representation decisions via constraints resulted in improved performance.

== CCM for Natural Language Processing Applications ==

The advantages of the CCM declarative formulation and the availability of off-the-shelf solvers have led to a large variety of natural language processing tasks being formulated within this framework, including semantic role labeling <ref>Vasin Punyakanok, Dan Roth, Wen-tau Yih and Dav Zimak, [http://l2r.cs.uiuc.edu/~danr/Papers/PRYZ04.pdf "Semantic Role Labeling via Integer Linear Programming Inference."] ''COLING'', (2004).</ref>, syntactic parsing <ref>Sagae, K. and Miyao, Y. and Tsujii, J., [http://www.aclweb.org/anthology/P07-1079 "HPSG Parsing with Shallow Dependency Constraints."] ''ACL'', (2007).</ref>, coreference resolution <ref>P. Denis and J. Baldridge, "Joint Determination of Anaphoricity and Coreference Resolution using Integer Programming." ''NAACL-HLT'', (2007).</ref>, summarization <ref>J. Clarke and M. Lapata, [http://www.jair.org/media/2433/live-2433-3730-jair.ps "Global Inference for Sentence Compression: An Integer Linear Programming Approach."] ''Journal of Artificial Intelligence Research (JAIR)'', (2008).</ref>, transliteration <ref>D. Goldwasser and D. Roth, [http://l2r.cs.uiuc.edu/~danr/Papers/GoldwasserRo08a.pdf "Transliteration as Constrained Optimization."] ''EMNLP'', (2008).</ref> and joint information extraction <ref>D. Roth and W. Yih, [http://l2r.cs.uiuc.edu/~danr/Papers/RothYi07.pdf "Global Inference for Entity and Relation Identification via a Linear Programming Formulation."] ''Introduction to Statistical Relational Learning'', MIT Press, (2007).</ref>.

Most of these works use an Integer Linear Programming solver to solve the decision problem. Although solving an Integer Linear Program is, in the worst case, exponential in the size of the decision problem, in practice large-scale problems can be solved efficiently using state-of-the-art solvers and sophisticated formulations <ref>André F. T. Martins, Noah A. Smith, and Eric P. Xing, [http://www.cs.cmu.edu/~nasmith/papers/martins+smith+xing.acl09.pdf "Concise Integer Linear Programming Formulations for Dependency Parsing."] ''ACL'', (2009).</ref>.
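
As a concrete, deliberately tiny illustration of how such a decision problem can be handed to an off-the-shelf ILP solver, the sketch below encodes one classic SRL-style constraint (each core argument label is assigned to at most one candidate span) on top of hypothetical classifier scores. The scores, variable names, and the use of the open-source PuLP library are assumptions made only for this example; none of this is code from the cited systems.

<pre>
# A minimal, hypothetical example of posing a CCM decision as an ILP with PuLP.
# The scores stand in for the output of learned per-span classifiers.
import pulp

spans = ["span1", "span2", "span3"]          # candidate argument spans
labels = ["A0", "A1", "O"]                   # core arguments + "no argument"

scores = {
    ("span1", "A0"): 1.2, ("span1", "A1"): 0.3, ("span1", "O"): 0.1,
    ("span2", "A0"): 0.9, ("span2", "A1"): 0.8, ("span2", "O"): 0.2,
    ("span3", "A0"): 0.2, ("span3", "A1"): 0.7, ("span3", "O"): 0.6,
}

prob = pulp.LpProblem("constrained_inference", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", list(scores), cat="Binary")  # x[(span, label)]

# objective: total score of the chosen assignment
prob += pulp.lpSum(scores[k] * x[k] for k in scores)

# each span receives exactly one label
for s in spans:
    prob += pulp.lpSum(x[(s, l)] for l in labels) == 1

# declarative constraint: each core argument label is used at most once
for l in ["A0", "A1"]:
    prob += pulp.lpSum(x[(s, l)] for s in spans) <= 1

prob.solve()
for s in spans:
    chosen = [l for l in labels if x[(s, l)].value() == 1]
    print(s, chosen[0])
</pre>

Hard constraints like these correspond to an infinite penalty in the objective given earlier; soft constraints can instead be added as weighted penalty terms in the objective.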

== Resources ==

* '''CCM Tutorial''' [http://l2r.cs.uiuc.edu/~danr/Talks/ILP-CCM-Tutorial-NAACL10.pdf Integer Linear Programming in NLP – Constrained Conditional Models, NAACL-2010]
* '''CCM Software''' [http://cogcomp.cs.illinois.edu/page/software_view/11 Learning Based Java]

== External links ==

* [http://l2r.cs.uiuc.edu/~cogcomp/wpt.php?pr_key=CCM University of Illinois Cognitive Computation Group]
* [http://www-tsujii.is.s.u-tokyo.ac.jp/ilpnlp/ Workshop on Integer Linear Programming for Natural Language Processing, NAACL-2009]

==References==
<references/>
