MUC-7 (State of the art)
- Performance measure: F = 2 * Precision * Recall / (Recall + Precision) (see the worked sketch after this list)
- Precision: percentage of named entities found by the algorithm that are correct
- Recall: percentage of named entities defined in the corpus that were found by the program
- Exact calculation of precision and recall is explained in the [MUC scoring software manual](http://www.itl.nist.gov/iad/894.02/related_projects/muc/muc_sw/muc_sw_manual.html)
- Training data: Training section of MUC-7 dataset
- Dryrun data: Dryrun section of MUC-7 dataset
- Testing data: Formal section of MUC-7 dataset
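The F-measure above follows directly from entity counts: precision is the number of correct entities divided by the number of entities the system guessed, and recall is the number of correct entities divided by the number of entities in the gold corpus. The Python sketch below illustrates the calculation with hypothetical counts; the official MUC scorer linked above additionally handles details such as partial credit, which this illustration omits.

```python
# Minimal sketch of the F-measure used in the table below.
# The counts are hypothetical; the official MUC scorer also
# handles partial credit and per-slot alignment, omitted here.

def f_measure(correct: int, guessed: int, possible: int) -> float:
    """F = 2 * P * R / (P + R), with P = correct / guessed (precision)
    and R = correct / possible (recall)."""
    if guessed == 0 or possible == 0:
        return 0.0
    precision = correct / guessed
    recall = correct / possible
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 80 correct entities out of 90 guessed, with 100 entities
# in the gold corpus -> P = 0.889, R = 0.800, F = 84.21%.
print(f"{f_measure(80, 90, 100):.2%}")  # 84.21%
```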
Table of results
| System name | Short description | System type (1) | Main publications | Software | Results |
|---|---|---|---|---|---|
| Annotator | Human annotator | - | [MUC-7 proceedings](http://www.itl.nist.gov/iad/894.02/related_projects/muc/proceedings/muc_7_toc.html) | - | 97.60% |
| LTG | Best MUC-7 participant | H | Mikheev, Grover and Moens (1998) | - | 93.39% |
| Balie | Unsupervised approach: no prior training | U | Nadeau, Turney and Matwin (2006) | [balie.sourceforge.net](http://balie.sourceforge.net) | 77.71% (2) |
| Baseline | Vocabulary transfer from training to testing | S | Whitelaw and Patrick (2003) | - | 58.89% (2) |
- (1) System type: R = hand-crafted rules, S = supervised learning, U = unsupervised learning, H = hybrid
- (2) Calculated on Enamex types only.
References
Mikheev, A., Grover, C. and Moens, M. (1998). [Description of the LTG system used for MUC-7](http://www-nlpir.nist.gov/related_projects/muc/proceedings/muc_7_proceedings/ltg_muc7.pdf). Proceedings of the Seventh Message Understanding Conference (MUC-7). Fairfax, Virginia.
Nadeau, D., Turney, P. D. and Matwin, S. (2006). [Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity](http://iit-iti.nrc-cnrc.gc.ca/publications/nrc-48727_e.html). Proceedings of the 19th Canadian Conference on Artificial Intelligence. Québec, Canada.
Whitelaw, C. and Patrick, J. (2003). [Evaluating Corpora for Named Entity Recognition Using Character-Level Features](http://www.springerlink.com/content/ju66c6a2734fl20u/). Proceedings of the 16th Australian Conference on AI. Perth, Australia.

See also

- Named Entity Recognition
- State of the art