Difference between revisions of "TOEFL Synonym Questions (State of the art)"

From ACL Wiki
Line 5:
-{| class="wikitable"
+{| border="1" cellpadding="5" cellspacing="1" width="100%"
 |-
 ! Reference for algorithm
 ! Reference for experiment
 ! Algorithm
-! % Correct
+! Correct
 ! 95% confidence
 |-
-| Resnik 1995
+| Resnik (1995)
-| Jarmasz and Szpakowicz 2003
+| Jarmasz and Szpakowicz (2003)
 | hybrid
 | 20.31%
 | 12.89–31.83%
 |-
-| Leacock and Chodrow 1998
+| Leacock and Chodrow (1998)
-| Jarmasz and Szpakowicz 2003
+| Jarmasz and Szpakowicz (2003)
 | lexicon-based
 | 21.88%
 | 13.91–33.21%
 |-
-| Lin 1998
+| Lin (1998)
-| Jarmasz and Szpakowicz 2003
+| Jarmasz and Szpakowicz (2003)
 | hybrid
 | 24.06%
 | 15.99–35.94%
 |-
-| Jiang and Conrath 1997
+| Jiang and Conrath (1997)
-| Jarmasz and Szpakowicz 2003
+| Jarmasz and Szpakowicz (2003)
 | hybrid
 | 25.00%
 | 15.99–35.94%
 |-
-| Landauer and Dumais 1997
+| Landauer and Dumais (1997)
-| Landauer and Dumais 1997
+| Landauer and Dumais (1997)
 | corpus-based
 | 64.38%
Line 44:
 |-
 | Average non-English US college applicant
-| Landauer and Dumais 1997
+| Landauer and Dumais (1997)
 | human
 | 64.50%
 | 53.01–74.88%
 |-
-| Turney 2001
+| Turney (2001)
-| Turney 2001
+| Turney (2001)
 | corpus-based
 | 73.75%
 | 62.71–82.96%
 |-
-| Hirst and St.-Onge 1998
+| Hirst and St.-Onge (1998)
-| Jarmasz and Szpakowicz 2003
+| Jarmasz and Szpakowicz (2003)
 | lexicon-based
 | 77.91%
 | 68.17–87.11%
 |-
-| Jarmasz and Szpakowicz 2003
+| Jarmasz and Szpakowicz (2003)
-| Jarmasz and Szpakowicz 2003
+| Jarmasz and Szpakowicz (2003)
 | lexicon-based
 | 78.75%
 | 68.17–87.11%
 |-
-| Terra and Clarke 2003
+| Terra and Clarke (2003)
-| Terra and Clarke 2003
+| Terra and Clarke (2003)
 | corpus-based
 | 81.25%
 | 70.97–89.11%
 |-
-| Rapp 2003
+| Rapp (2003)
-| Rapp 2003
+| Rapp (2003)
 | corpus-based
 | 92.50%
 | ?
 |-
-| Turney et al. 2003
+| Turney et al. (2003)
 | Turney et al. 2003
 | hybrid
 | 97.50%
 | 91.26–99.70%
+|-
 |}

 * 95% confidence interval calculated using Binomial Exact Test
+* table rows sorted in order of increasing percent correct

Revision as of 10:52, 11 May 2007

  • TOEFL = Test of English as a Foreign Language
  • 80 multiple-choice synonym questions; 4 choices per question
  • introduced in Landauer and Dumais (1997)
  • subsequently used by many other researchers


{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! Reference for algorithm
! Reference for experiment
! Algorithm
! Correct
! 95% confidence interval
|-
| Resnik (1995)
| Jarmasz and Szpakowicz (2003)
| hybrid
| 20.31%
| 12.89–31.83%
|-
| Leacock and Chodrow (1998)
| Jarmasz and Szpakowicz (2003)
| lexicon-based
| 21.88%
| 13.91–33.21%
|-
| Lin (1998)
| Jarmasz and Szpakowicz (2003)
| hybrid
| 24.06%
| 15.99–35.94%
|-
| Jiang and Conrath (1997)
| Jarmasz and Szpakowicz (2003)
| hybrid
| 25.00%
| 15.99–35.94%
|-
| Landauer and Dumais (1997)
| Landauer and Dumais (1997)
| corpus-based
| 64.38%
| 52.90–74.80%
|-
| Average non-English US college applicant
| Landauer and Dumais (1997)
| human
| 64.50%
| 53.01–74.88%
|-
| Turney (2001)
| Turney (2001)
| corpus-based
| 73.75%
| 62.71–82.96%
|-
| Hirst and St.-Onge (1998)
| Jarmasz and Szpakowicz (2003)
| lexicon-based
| 77.91%
| 68.17–87.11%
|-
| Jarmasz and Szpakowicz (2003)
| Jarmasz and Szpakowicz (2003)
| lexicon-based
| 78.75%
| 68.17–87.11%
|-
| Terra and Clarke (2003)
| Terra and Clarke (2003)
| corpus-based
| 81.25%
| 70.97–89.11%
|-
| Rapp (2003)
| Rapp (2003)
| corpus-based
| 92.50%
| ?
|-
| Turney et al. (2003)
| Turney et al. (2003)
| hybrid
| 97.50%
| 91.26–99.70%
|}


  • 95% confidence interval calculated using Binomial Exact Test
  • table rows sorted in order of increasing percent correct
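
The page does not say how the Binomial Exact Test intervals were computed. One common reading is the Clopper-Pearson interval, and the sketch below assumes that reading. It uses only the Python standard library; the function names (`binom_cdf`, `bisect_decreasing`, `exact_binomial_ci`) are illustrative and do not come from any of the cited papers. It also assumes a whole number of correct answers out of 80, whereas some table entries (for example 20.31%) are not whole counts, and the page does not state how those were handled.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_decreasing(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, confidence=0.95):
    """Clopper-Pearson interval for k correct answers out of n questions."""
    alpha = 1 - confidence
    # Lower bound: the p at which P(X >= k) equals alpha/2 (0 when k == 0).
    lower = 0.0 if k == 0 else bisect_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # Upper bound: the p at which P(X <= k) equals alpha/2 (1 when k == n).
    upper = 1.0 if k == n else bisect_decreasing(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


# 97.50% correct on the 80 questions corresponds to 78 correct answers.
low, high = exact_binomial_ci(78, 80)
print(f"{100 * low:.2f}-{100 * high:.2f}%")
```

Under these assumptions, 78 correct out of 80 should come out close to the 91.26–99.70% interval listed for Turney et al. (2003). Bisection on the binomial CDF is chosen here only to avoid a dependency on a statistics library.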