DIRT Paraphrase Collection

From ACL Wiki
Revision as of 06:43, 8 January 2008 by Pdturney


DIRT (Discovery of Inference Rules from Text) is both an algorithm and a resulting knowledge collection created by Dekang Lin and Patrick Pantel at the University of Alberta. The algorithm automatically learns paraphrase expressions from text using the Distributional Hypothesis over paths in dependency trees. A path, extracted from a parse tree, is an expression that represents a binary relationship between two nouns. In short, if two paths tend to link the same sets of words, DIRT hypothesizes that the meanings of the corresponding patterns are similar.
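The idea that two paths are similar when they link the same sets of words can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the triples are invented toy data, the names build_slots, jaccard, path_similarity and rank_paraphrases are hypothetical, and plain Jaccard overlap of slot fillers is swapped in for the mutual-information weighting used in the published algorithm. The two slot scores are combined with a geometric mean, so a path pair must agree on both its X fillers and its Y fillers to score well.

```python
from collections import defaultdict
from math import sqrt

# Toy (path, X filler, Y filler) triples, invented for illustration only.
TRIPLES = [
    ("X solves Y", "committee", "problem"),
    ("X solves Y", "government", "crisis"),
    ("X solves Y", "engineer", "bug"),
    ("X resolves Y", "committee", "problem"),
    ("X resolves Y", "government", "dispute"),
    ("X resolves Y", "engineer", "bug"),
    ("X eats Y", "child", "apple"),
    ("X eats Y", "engineer", "lunch"),
]


def build_slots(triples):
    """Map each path to the sets of words observed in its X and Y slots."""
    slots = defaultdict(lambda: {"X": set(), "Y": set()})
    for path, x, y in triples:
        slots[path]["X"].add(x)
        slots[path]["Y"].add(y)
    return slots


def jaccard(a, b):
    """Overlap of two filler sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def path_similarity(slots, p1, p2):
    """Geometric mean of the X-slot and Y-slot overlaps of two paths."""
    sim_x = jaccard(slots[p1]["X"], slots[p2]["X"])
    sim_y = jaccard(slots[p1]["Y"], slots[p2]["Y"])
    return sqrt(sim_x * sim_y)


def rank_paraphrases(slots, path, k=20):
    """Return up to k other paths, most similar to `path` first."""
    scored = [
        (other, path_similarity(slots, path, other))
        for other in list(slots)
        if other != path
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    slots = build_slots(TRIPLES)
    for candidate, score in rank_paraphrases(slots, "X solves Y"):
        print(f"{candidate}\t{score:.3f}")
```

On the toy data, "X resolves Y" shares all of its X fillers and most of its Y fillers with "X solves Y" and so ranks first, while "X eats Y" shares no Y fillers and scores zero.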

The DIRT knowledge collection is the output of the DIRT algorithm run over a 1 GB set of newspaper text (San Jose Mercury, Wall Street Journal and AP Newswire from the TREC-9 collection). The algorithm extracted 7 million paths (231,000 unique) from the parse trees, and paraphrases were generated from these paths. For example, here are the top-ranked paraphrases of "X solves Y" generated by DIRT:

Y is solved by X, X resolves Y, X finds a solution to Y, X tries to solve Y, X deals with Y, Y is resolved by X, X addresses Y, X seeks a solution to Y, X does something about Y, X solution to Y, Y is resolved in X, Y is solved through X, X rectifies Y, X copes with Y, X overcomes Y, X eases Y, X tackles Y, X alleviates Y, X corrects Y, X is a solution to Y, X makes Y worse, X irons out Y
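Each paraphrase in the list above is a template with two slots, X and Y, so it can be applied by substituting concrete fillers. The following Python sketch shows one way this might be done. It is an assumption-laden illustration: the distribution format of the collection is not described on this page, so the templates are held as plain strings, and the name instantiate is hypothetical.

```python
import re

# A few of the paraphrase templates listed above, stored as plain strings.
# The actual file format of the collection is not described here, so this
# representation is an assumption.
PARAPHRASES = [
    "Y is solved by X",
    "X resolves Y",
    "X finds a solution to Y",
]


def instantiate(template, x, y):
    """Replace the standalone slot markers X and Y with concrete fillers.

    Both slots are substituted in a single pass, so a filler that itself
    contains the letter X or Y is never substituted a second time.
    """
    fillers = {"X": x, "Y": y}
    return re.sub(r"\b[XY]\b", lambda match: fillers[match.group(0)], template)


if __name__ == "__main__":
    for template in PARAPHRASES:
        print(instantiate(template, "the committee", "the problem"))
```

Matching on word boundaries keeps the substitution from touching capital letters inside ordinary words, which matters if a template ever contains a word beginning with X or Y.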

Acquiring the Resource

The DIRT knowledge collection is available for research purposes by contacting its authors.



Please cite the following publication when using this resource:

  • Dekang Lin and Patrick Pantel. 2001. Discovery of Inference Rules for Question Answering. Natural Language Engineering 7(4):343-360.


Discovery of Inference Rules from Text. Dekang Lin and Patrick Pantel. US Patent – A facility for discovering a set of inference rules (or paraphrases) by analyzing a corpus of natural language text.


Dekang Lin

Patrick Pantel