Difference between revisions of "Temporal Information Extraction (State of the art)"
(Adds references for Clinical TempEval papers) |
|||
(26 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
− | == | + | == TempEval 2007 == |
+ | * '''TempEval''', ''Temporal Relation Identification'', 2007: [http://www.timeml.org/tempeval/ web page] | ||
+ | |||
+ | == TempEval 2010 == | ||
+ | * '''TempEval-2''', ''Evaluating Events, Time Expressions, and Temporal Relations'', 2010: [http://www.timeml.org/tempeval2/ web page] | ||
+ | |||
+ | == TempEval 2013 == | ||
+ | * '''TempEval-3''', ''Evaluating Time Expressions, Events, and Temporal Relations'', 2013: [http://www.cs.york.ac.uk/semeval-2013/task1/ web page] | ||
− | ==Performance measures== | + | === Performance measures === |
− | ==Results== | + | === Results === |
− | + | Tables show the best result for each system. Lower-scoring runs for the same system are not shown. |
− | ===Task A: Temporal expression extraction and normalisation=== | + | ====Task A: Temporal expression extraction and normalisation==== |
− | + | {| width="100%" class="wikitable sortable" | |
− | {| | ||
|- | |- | ||
− | ! rowspan="3" | System name | + | ! rowspan="3" | System name (best run) |
! rowspan="3" | Short description | ! rowspan="3" | Short description | ||
! rowspan="3" | Main publication | ! rowspan="3" | Main publication | ||
Line 32: | Line 38: | ||
! Value | ! Value | ||
|- | |- | ||
− | | HeidelTime | + | | HeidelTime (t) |
− | | | + | | rule-based |
− | Strötgen | + | <ref name="Stroetgen-2013">Strötgen, J., Zell, J., and Gertz, M. [http://www.aclweb.org/anthology/S/S13/S13-2003.pdf Heideltime: Tuning english and developing spanish resources for tempeval-3]. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 15–19.</ref> |
− | | | + | | 83.85 |
− | | | + | | 78.99 |
− | | | + | | 81.34 |
− | | | + | | 93.08 |
− | | | + | | 87.68 |
− | | | + | | 90.30 |
− | | | + | | 90.91 |
− | | | + | | '''85.95''' |
− | | | + | | '''77.61''' |
− | | | + | | [http://dbs.ifi.uni-heidelberg.de/index.php?id=129 Download] |
− | | | + | | [http://www.gnu.org/licenses/gpl.html GNU GPL v3] |
|- | |- | ||
− | | NavyTime | + | | NavyTime (1,2) |
− | | | + | | rule-based |
− | | Chambers | + | | <ref name="Chambers-2013">Chambers, N. [http://www.aclweb.org/anthology/S/S13/S13-2012.pdf Navytime: Event and time ordering from raw text]. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 73–77.</ref> |
− | | | + | | 78.72 |
− | | | + | | '''80.43''' |
− | | | + | | 79.57 |
− | | | + | | 89.36 |
− | | | + | | '''91.30''' |
− | | | + | | '''90.32''' |
− | | | + | | 88.90 |
− | | | + | | 78.58 |
− | | | + | | 70.97 |
− | | | + | | - |
− | | | + | | - |
|- | |- | ||
− | | ManTIME | + | | ManTIME (4) |
− | | | + | | CRF, probabilistic post-processing pipeline, rule-based normaliser |
− | Filannino | + | <ref name="Filannino-2013">Filannino, M., Brown, G., and Nenadic, G. [http://www.aclweb.org/anthology/S/S13/S13-2009.pdf ManTIME: Temporal expression identification and normalization in the Tempeval-3 challenge]. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 53–57.</ref> |
− | | | + | | 78.86 |
− | | | + | | 70.29 |
− | | | + | | 74.33 |
− | | | + | | 95.12 |
− | | | + | | 84.78 |
− | | | + | | 89.66 |
− | | | + | | 86.31 |
− | | | + | | 76.92 |
− | | | + | | 68.97 |
− | | | + | | [http://www.cs.man.ac.uk/~filannim/projects/tempeval-3/ Demo & Download] |
− | | | + | | [http://www.gnu.org/licenses/gpl-2.0.html GNU GPL v2] |
|- | |- | ||
| SUTime | | SUTime | ||
− | | | + | | deterministic rule-based |
− | | Chang | + | | <ref name="Chang-2013">Chang, A., and Manning, C. D. [http://www.aclweb.org/anthology/S/S13/S13-2013.pdf SUTime: Evaluation in TempEval-3]. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 78–82.</ref> |
− | | | + | | 78.72 |
− | | | + | | '''80.43''' |
− | | | + | | 79.57 |
− | | | + | | 89.36 |
− | | | + | | '''91.30''' |
− | | | + | | '''90.32''' |
− | | | + | | 88.90 |
− | | | + | | 74.60 |
− | | | + | | 67.38 |
− | | | + | | [http://nlp.stanford.edu/software/sutime.shtml Demo & Download] |
− | | | + | | [http://www.gnu.org/licenses/gpl-2.0.html GNU GPL v2] |
|- | |- | ||
− | | ATT | + | | ATT (2) |
− | | | + | | MaxEnt, third party normalisers |
− | | Jung | + | | <ref name="Jung-2013">Jung, H., and Stent, A. [http://www.aclweb.org/anthology/S/S13/S13-2004.pdf ATT1: Temporal annotation using big windows and rich syntactic and semantic features]. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 20–24.</ref> |
− | | | + | | '''90.57''' |
− | | | + | | 69.57 |
− | | | + | | 78.69 |
− | | | + | | '''98.11''' |
− | | | + | | 75.36 |
− | | | + | | 85.25 |
− | | | + | | 91.34 |
− | | | + | | 76.91 |
− | | | + | | 65.57 |
− | | | + | | - |
− | | | + | | - |
|- | |- | ||
− | | ClearTK | + | | ClearTK (1,2) |
− | | | + | | SVM, Logistic Regression, third party normaliser |
− | Bethard, 2013 | + | <ref name="Bethard-2013">Bethard, S. [http://www.aclweb.org/anthology/S/S13/S13-2002.pdf ClearTK-TimeML: A minimalist approach to tempeval 2013]. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 10–14.</ref> |
− | | | + | | 85.94 |
− | | | + | | 79.71 |
− | | | + | | '''82.71''' |
− | | | + | | 93.75 |
− | | | + | | 86.96 |
− | | | + | | 90.23 |
− | | | + | | '''93.33''' |
− | | | + | | 71.66 |
− | | | + | | 64.66 |
− | | | + | | [https://code.google.com/p/cleartk/ Download] |
− | | | + | | [http://opensource.org/licenses/BSD-3-Clause BSD-3 Clause] |
|- | |- | ||
| JU-CSE | | JU-CSE | ||
+ | | CRF, rule-based normaliser | ||
+ | | <ref name="Kolya-2013">Kolya, A. K., Kundu, A., Gupta, R., Ekbal, A., and Bandyopadhyay, S. [http://www.aclweb.org/anthology/S/S13/S13-2011.pdf JU_CSE: A CRF based approach to annotation of temporal expression, event and temporal relations]. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 64–72.</ref> | ||
+ | | 81.51 | ||
+ | | 70.29 | ||
+ | | 75.49 | ||
+ | | 93.28 | ||
+ | | 80.43 | ||
+ | | 86.38 | ||
+ | | 87.39 | ||
+ | | 73.87 | ||
+ | | 63.81 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | | KUL (2) | ||
+ | | Logistic regression, post-processing, rule-based normaliser | ||
+ | <ref name="Kolomiyets-2013">Kolomiyets, O., and Moens, M.-F. [http://www.aclweb.org/anthology/S/S13/S13-2014.pdf KUL: Data-driven approach to temporal parsing of newswire articles]. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 83–87.</ref> | ||
+ | | 76.99 | ||
+ | | 63.04 | ||
+ | | 69.32 | ||
+ | | 92.92 | ||
+ | | 76.09 | ||
+ | | 83.67 | ||
+ | | 88.56 | ||
+ | | 75.24 | ||
+ | | 62.95 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | | FSS-TimEx | ||
+ | | rule-based | ||
+ | | <ref name="Zavarella-2013">Zavarella, V., and Tanev, H. [http://www.aclweb.org/anthology/S/S13/S13-2010.pdf FSS-TimEx for tempeval-3: Extracting temporal information from text]. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 58–63.</ref> | ||
+ | | 52.03 | ||
+ | | 46.38 | ||
+ | | 49.04 | ||
+ | | 90.24 | ||
+ | | 80.43 | ||
+ | | 85.06 | ||
+ | | 81.08 | ||
+ | | 68.47 | ||
+ | | 58.24 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | |} | ||
+ | |||
+ | ====Task B: Event extraction and classification==== | ||
+ | {| width="100%" class="wikitable sortable" | ||
+ | |- | ||
+ | ! rowspan="3" | System name (best run) | ||
+ | ! rowspan="3" | Short description | ||
+ | ! rowspan="3" | Main publication | ||
+ | ! colspan="3" | Identification | ||
+ | ! colspan="3" | Attributes | ||
+ | ! rowspan="3" | Overall score | ||
+ | ! rowspan="3" | Software | ||
+ | ! rowspan="3" | License | ||
+ | |- | ||
+ | ! colspan="3" | Strict matching | ||
+ | ! colspan="3" | Accuracy | ||
+ | |- | ||
+ | ! Pre. | ||
+ | ! Rec. | ||
+ | ! F1 | ||
+ | ! Class | ||
+ | ! Tense | ||
+ | ! Aspect | ||
+ | |- | ||
+ | | ATT (1) | ||
| | | | ||
+ | | <ref name="Jung-2013"/> | ||
+ | | 81.44 | ||
+ | | 80.67 | ||
+ | | '''81.05''' | ||
+ | | 88.69 | ||
+ | | 73.37 | ||
+ | | 90.68 | ||
+ | | '''71.88''' | ||
| | | | ||
| | | | ||
+ | |- | ||
+ | | KUL (2) | ||
| | | | ||
+ | | <ref name="Kolomiyets-2013"/> | ||
+ | | 80.69 | ||
+ | | 77.99 | ||
+ | | 79.32 | ||
+ | | 88.46 | ||
+ | | - | ||
+ | | - | ||
+ | | 70.17 | ||
| | | | ||
| | | | ||
+ | |- | ||
+ | | ClearTK (4) | ||
| | | | ||
+ | | <ref name="Bethard-2013"/> | ||
+ | | 81.40 | ||
+ | | 76.38 | ||
+ | | 78.81 | ||
+ | | 86.12 | ||
+ | | 78.20 | ||
+ | | 90.86 | ||
+ | | 67.87 | ||
+ | | [https://code.google.com/p/cleartk/ Download] | ||
+ | | [http://opensource.org/licenses/BSD-3-Clause BSD-3 Clause] | ||
+ | |- | ||
+ | | NavyTime (1) | ||
| | | | ||
+ | | <ref name="Chambers-2013"/> | ||
+ | | 80.73 | ||
+ | | 79.87 | ||
+ | | 80.30 | ||
+ | | 84.03 | ||
+ | | 75.79 | ||
+ | | 91.26 | ||
+ | | 67.48 | ||
| | | | ||
| | | | ||
+ | |- | ||
+ | | Temp: (ESAfeature) | ||
| | | | ||
+ | | X, 2013 | ||
+ | | 78.33 | ||
+ | | 61.61 | ||
+ | | 68.97 | ||
+ | | 79.09 | ||
+ | | - | ||
+ | | - | ||
+ | | 54.55 | ||
| | | | ||
| | | | ||
|- | |- | ||
− | | | + | | JU_CSE |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
| | | | ||
+ | | <ref name="Kolya-2013"/> | ||
+ | | 80.85 | ||
+ | | 76.51 | ||
+ | | 78.62 | ||
+ | | 67.02 | ||
+ | | 74.56 | ||
+ | | 91.76 | ||
+ | | 52.69 | ||
| | | | ||
| | | | ||
|- | |- | ||
− | | FSS- | + | | FSS-TimeEx |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
| | | | ||
+ | | <ref name="Zavarella-2013"/> | ||
+ | | 63.13 | ||
+ | | 67.11 | ||
+ | | 65.06 | ||
+ | | 66.00 | ||
+ | | - | ||
+ | | - | ||
+ | | 42.94 | ||
| | | | ||
| | | | ||
Line 169: | Line 290: | ||
|} | |} | ||
− | ===Task | + | ====Task C: Annotating relations given gold entities==== |
+ | |||
+ | ====Task C relation only: Annotating relations given gold entities and related pairs==== | ||
+ | |||
+ | ====Task ABC: Temporal awareness evaluation==== | ||
+ | |||
+ | == Clinical TempEval 2015 == | ||
+ | * '''Clinical TempEval 2015''', ''Clinical TempEval'', 2015: [http://alt.qcri.org/semeval2015/task6/ web page] | ||
+ | |||
+ | === Performance measures === | ||
+ | |||
+ | === Results === | ||
+ | Tables show the best result for each system. Lower-scoring runs for the same system are not shown. | ||
+ | |||
+ | ====Time expressions==== | ||
+ | {| width="100%" class="wikitable sortable" | ||
+ | |- | ||
+ | ! rowspan="2" | System name (best run) | ||
+ | ! rowspan="2" | Short description | ||
+ | ! rowspan="2" | Main publication | ||
+ | ! colspan="3" | Span | ||
+ | ! colspan="4" | Class | ||
+ | ! rowspan="2" | Software | ||
+ | ! rowspan="2" | License | ||
+ | |- | ||
+ | ! P | ||
+ | ! R | ||
+ | ! F1 | ||
+ | ! P | ||
+ | ! R | ||
+ | ! F1 | ||
+ | ! A | ||
+ | |- | ||
+ | | Baseline | ||
+ | | Memorize | ||
+ | | <ref name="Bethard-2015">Steven Bethard, Leon Derczynski, Guergana Savova, James Pustejovsky and Marc Verhagen. [http://www.aclweb.org/anthology/S15-2136 SemEval-2015 Task 6: Clinical TempEval]. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), (Denver, Colorado, June 2015), Association for Computational Linguistics, pp. 806-814.</ref> | ||
+ | | 0.743 | ||
+ | | 0.372 | ||
+ | | 0.496 | ||
+ | | 0.723 | ||
+ | | 0.362 | ||
+ | | 0.483 | ||
+ | | 0.974 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | | KPSCMI: run 1 | ||
+ | | Rule-based | ||
+ | | - | ||
+ | | 0.272 | ||
+ | | 0.782 | ||
+ | | 0.404 | ||
+ | | 0.223 | ||
+ | | 0.642 | ||
+ | | 0.331 | ||
+ | | 0.819 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | | KPSCMI: run 3 | ||
+ | | Supervised machine learning | ||
+ | | - | ||
+ | | 0.693 | ||
+ | | 0.706 | ||
+ | | 0.699 | ||
+ | | 0.657 | ||
+ | | 0.669 | ||
+ | | 0.663 | ||
+ | | 0.948 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | | UFPRSheffield-SVM: run 2 | ||
+ | | Supervised machine learning | ||
+ | | <ref name="Tissot-2015">Hegler Tissot, Genevieve Gorrell, Angus Roberts, Leon Derczynski and Marcos Didonet Del Fabro. [http://www.aclweb.org/anthology/S15-2141 UFPRSheffield: Contrasting Rule-based and Support Vector Machine Approaches to Time Expression Identification in Clinical TempEval]. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), (Denver, Colorado, June 2015), Association for Computational Linguistics, pp. 835-839.</ref> | ||
+ | | 0.741 | ||
+ | | 0.655 | ||
+ | | 0.695 | ||
+ | | 0.723 | ||
+ | | 0.640 | ||
+ | | 0.679 | ||
+ | | 0.977 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | | UFPRSheffield-Hynx: run 5 | ||
+ | | Rule-based | ||
+ | | <ref name="Tissot-2015"/> | ||
+ | | 0.411 | ||
+ | | 0.795 | ||
+ | | 0.542 | ||
+ | | 0.391 | ||
+ | | 0.756 | ||
+ | | 0.516 | ||
+ | | 0.952 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | | BluLab: run 1-3 | ||
+ | | Supervised machine learning | ||
+ | | <ref name="Velupillai-2015">Sumithra Velupillai, Danielle L Mowery, Samir Abdelrahman, Lee Christensen and Wendy Chapman. [http://www.aclweb.org/anthology/S15-2137 BluLab: Temporal Information Extraction for the 2015 Clinical TempEval Challenge]. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), (Denver, Colorado, June 2015), Association for Computational Linguistics, pp. 815-819.</ref> | ||
+ | | 0.797 | ||
+ | | 0.664 | ||
+ | | 0.725 | ||
+ | | 0.778 | ||
+ | | 0.652 | ||
+ | | 0.709 | ||
+ | | 0.978 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | |} | ||
− | === | + | ====Event expressions==== |
+ | {| width="100%" class="wikitable sortable" | ||
+ | |- | ||
+ | ! rowspan="2" | System name (best run) | ||
+ | ! rowspan="2" | Short description | ||
+ | ! rowspan="2" | Main publication | ||
+ | ! colspan="3" | Span | ||
+ | ! colspan="4" | Modality | ||
+ | ! colspan="4" | Degree | ||
+ | ! colspan="4" | Polarity | ||
+ | ! colspan="4" | Type | ||
+ | ! rowspan="2" | Software | ||
+ | ! rowspan="2" | License | ||
+ | |- | ||
+ | ! P | ||
+ | ! R | ||
+ | ! F1 | ||
+ | ! P | ||
+ | ! R | ||
+ | ! F1 | ||
+ | ! A | ||
+ | ! P | ||
+ | ! R | ||
+ | ! F1 | ||
+ | ! A | ||
+ | ! P | ||
+ | ! R | ||
+ | ! F1 | ||
+ | ! A | ||
+ | ! P | ||
+ | ! R | ||
+ | ! F1 | ||
+ | ! A | ||
+ | |- | ||
+ | | Baseline | ||
+ | | Memorize | ||
+ | | <ref name="Bethard-2015"/> | ||
+ | | 0.876 | ||
+ | | 0.810 | ||
+ | | 0.842 | ||
+ | | 0.810 | ||
+ | | 0.749 | ||
+ | | 0.778 | ||
+ | | 0.924 | ||
+ | | 0.871 | ||
+ | | 0.806 | ||
+ | | 0.838 | ||
+ | | 0.995 | ||
+ | | 0.800 | ||
+ | | 0.740 | ||
+ | | 0.769 | ||
+ | | 0.913 | ||
+ | | 0.846 | ||
+ | | 0.783 | ||
+ | | 0.813 | ||
+ | | 0.966 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | | BluLab: run 1-3 | ||
+ | | Supervised machine learning | ||
+ | | <ref name="Velupillai-2015"/> | ||
+ | | 0.887 | ||
+ | | 0.864 | ||
+ | | 0.875 | ||
+ | | 0.834 | ||
+ | | 0.813 | ||
+ | | 0.824 | ||
+ | | 0.942 | ||
+ | | 0.882 | ||
+ | | 0.859 | ||
+ | | 0.870 | ||
+ | | 0.994 | ||
+ | | 0.868 | ||
+ | | 0.846 | ||
+ | | 0.857 | ||
+ | | 0.979 | ||
+ | | 0.834 | ||
+ | | 0.812 | ||
+ | | 0.823 | ||
+ | | 0.941 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | |} | ||
− | === | + | ====Temporal relations==== |
+ | Phase 1: text only | ||
+ | {| width="100%" class="wikitable sortable" | ||
+ | |- | ||
+ | ! rowspan="2" | System name (best run) | ||
+ | ! rowspan="2" | Short description | ||
+ | ! rowspan="2" | Main publication | ||
+ | ! colspan="3" | To Document Time | ||
+ | ! colspan="6" | Narrative Containers | ||
+ | ! rowspan="2" | Software | ||
+ | ! rowspan="2" | License | ||
+ | |- | ||
+ | ! P | ||
+ | ! R | ||
+ | ! F1 | ||
+ | ! P | ||
+ | ! R | ||
+ | ! F1 | ||
+ | ! P | ||
+ | ! R | ||
+ | ! F1 | ||
+ | |- | ||
+ | | Baseline | ||
+ | | Memorize | ||
+ | | <ref name="Bethard-2015"/> | ||
+ | | 0.600 | ||
+ | | 0.555 | ||
+ | | 0.577 | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | | Baseline | ||
+ | | TIMEX3 to closest EVENT | ||
+ | | <ref name="Bethard-2015"/> | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | 0.368 | ||
+ | | 0.061 | ||
+ | | 0.104 | ||
+ | | 0.400 | ||
+ | | 0.061 | ||
+ | | 0.106 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | | BluLab: run 2 | ||
+ | | Supervised machine learning | ||
+ | | <ref name="Velupillai-2015"/> | ||
+ | | 0.712 | ||
+ | | 0.693 | ||
+ | | 0.702 | ||
+ | | 0.080 | ||
+ | | 0.142 | ||
+ | | 0.102 | ||
+ | | 0.094 | ||
+ | | 0.179 | ||
+ | | 0.123 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | |} | ||
− | == | + | Phase 2: manual EVENTs and TIMEX3s |
− | + | {| width="100%" class="wikitable sortable" | |
− | + | |- | |
− | + | ! rowspan="2" | System name (best run) | |
+ | ! rowspan="2" | Short description | ||
+ | ! rowspan="2" | Main publication | ||
+ | ! colspan="3" | To Document Time | ||
+ | ! colspan="6" | Narrative Containers | ||
+ | ! rowspan="2" | Software | ||
+ | ! rowspan="2" | License | ||
+ | |- | ||
+ | ! P | ||
+ | ! R | ||
+ | ! F1 | ||
+ | ! P | ||
+ | ! R | ||
+ | ! F1 | ||
+ | ! P | ||
+ | ! R | ||
+ | ! F1 | ||
+ | |- | ||
+ | | Baseline | ||
+ | | Memorize | ||
+ | | <ref name="Bethard-2015"/> | ||
+ | | - | ||
+ | | - | ||
+ | | 0.608 | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | | Baseline | ||
+ | | TIMEX3 to closest EVENT | ||
+ | | <ref name="Bethard-2015"/> | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | 0.514 | ||
+ | | 0.170 | ||
+ | | 0.255 | ||
+ | | 0.554 | ||
+ | | 0.170 | ||
+ | | 0.260 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | | BluLab: run 2 | ||
+ | | Supervised machine learning | ||
+ | | <ref name="Velupillai-2015"/> | ||
+ | | - | ||
+ | | - | ||
+ | | 0.791 | ||
+ | | 0.109 | ||
+ | | 0.210 | ||
+ | | 0.143 | ||
+ | | 0.140 | ||
+ | | 0.254 | ||
+ | | 0.181 | ||
+ | | - | ||
+ | | - | ||
+ | |- | ||
+ | |} | ||
==References== | ==References== | ||
+ | <references/> | ||
+ | Unsorted | ||
* UzZaman, N., Llorens, H., Derczynski, L., Allen, J., Verhagen, M., and Pustejovsky, J. [http://www.aclweb.org/anthology/S/S13/S13-2001.pdf Semeval-2013 task 1: Tempeval-3: Evaluating time expressions, events, and temporal relations]. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 1–9. | * UzZaman, N., Llorens, H., Derczynski, L., Allen, J., Verhagen, M., and Pustejovsky, J. [http://www.aclweb.org/anthology/S/S13/S13-2001.pdf Semeval-2013 task 1: Tempeval-3: Evaluating time expressions, events, and temporal relations]. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 1–9. | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
* Laokulrat, N., Miwa, M., Tsuruoka, Y., and Chikayama, T. [http://www.aclweb.org/anthology/S/S13/S13-2015.pdf UTTime: Temporal relation classification using deep syntactic features]. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 88–92. | * Laokulrat, N., Miwa, M., Tsuruoka, Y., and Chikayama, T. [http://www.aclweb.org/anthology/S/S13/S13-2015.pdf UTTime: Temporal relation classification using deep syntactic features]. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 88–92. | ||
Latest revision as of 20:45, 9 June 2015
TempEval 2007
- TempEval, Temporal Relation Identification, 2007: web page
TempEval 2010
- TempEval-2, Evaluating Events, Time Expressions, and Temporal Relations, 2010: web page
TempEval 2013
- TempEval-3, Evaluating Time Expressions, Events, and Temporal Relations, 2013: web page
Performance measures
Results
Tables show the best result for each system. Lower-scoring runs for the same system are not shown.
Task A: Temporal expression extraction and normalisation
System name (best run) | Short description | Main publication | Identification | Normalisation | Overall score | Software | License | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Strict matching | Lenient matching | Accuracy | |||||||||||
Pre. | Rec. | F1 | Pre. | Rec. | F1 | Type | Value | ||||||
HeidelTime (t) | rule-based | [1] | 83.85 | 78.99 | 81.34 | 93.08 | 87.68 | 90.30 | 90.91 | 85.95 | 77.61 | Download | GNU GPL v3 |
NavyTime (1,2) | rule-based | [2] | 78.72 | 80.43 | 79.57 | 89.36 | 91.30 | 90.32 | 88.90 | 78.58 | 70.97 | - | - |
ManTIME (4) | CRF, probabilistic post-processing pipeline, rule-based normaliser | [3] | 78.86 | 70.29 | 74.33 | 95.12 | 84.78 | 89.66 | 86.31 | 76.92 | 68.97 | Demo & Download | GNU GPL v2 |
SUTime | deterministic rule-based | [4] | 78.72 | 80.43 | 79.57 | 89.36 | 91.30 | 90.32 | 88.90 | 74.60 | 67.38 | Demo & Download | GNU GPL v2 |
ATT (2) | MaxEnt, third party normalisers | [5] | 90.57 | 69.57 | 78.69 | 98.11 | 75.36 | 85.25 | 91.34 | 76.91 | 65.57 | - | - |
ClearTK (1,2) | SVM, Logistic Regression, third party normaliser | [6] | 85.94 | 79.71 | 82.71 | 93.75 | 86.96 | 90.23 | 93.33 | 71.66 | 64.66 | Download | BSD-3 Clause |
JU-CSE | CRF, rule-based normaliser | [7] | 81.51 | 70.29 | 75.49 | 93.28 | 80.43 | 86.38 | 87.39 | 73.87 | 63.81 | - | - |
KUL (2) | Logistic regression, post-processing, rule-based normaliser | [8] | 76.99 | 63.04 | 69.32 | 92.92 | 76.09 | 83.67 | 88.56 | 75.24 | 62.95 | - | - |
FSS-TimEx | rule-based | [9] | 52.03 | 46.38 | 49.04 | 90.24 | 80.43 | 85.06 | 81.08 | 68.47 | 58.24 | - | - |
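The Performance measures section is empty, but the Task A figures are consistent with the overall score being the lenient-matching F1 scaled by the value accuracy. The sketch below is a minimal check of that reading against four rows copied from the table. The function name `value_f1` and the formula itself are assumptions inferred from the numbers, not definitions stated on this page.

```python
# (lenient F1, value accuracy, reported overall score), copied from the Task A table.
TASK_A_ROWS = {
    "HeidelTime (t)": (90.30, 85.95, 77.61),
    "NavyTime (1,2)": (90.32, 78.58, 70.97),
    "ManTIME (4)": (89.66, 76.92, 68.97),
    "SUTime": (90.32, 74.60, 67.38),
}


def value_f1(lenient_f1, value_accuracy):
    """Assumed definition: lenient F1 scaled by value accuracy (both in percent)."""
    return lenient_f1 * value_accuracy / 100.0


for name, (f1, accuracy, reported) in TASK_A_ROWS.items():
    computed = value_f1(f1, accuracy)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

Under this assumption the recomputed values match the reported overall scores to within rounding for these rows, which would also explain why NavyTime and SUTime share identical identification scores yet differ in overall score.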
Task B: Event extraction and classification
System name (best run) | Short description | Main publication | Identification | Attributes | Overall score | Software | License | ||||
---|---|---|---|---|---|---|---|---|---|---|---|
Strict matching | Accuracy | ||||||||||
Pre. | Rec. | F1 | Class | Tense | Aspect | ||||||
ATT (1) | | [5] | 81.44 | 80.67 | 81.05 | 88.69 | 73.37 | 90.68 | 71.88 | | |
KUL (2) | | [8] | 80.69 | 77.99 | 79.32 | 88.46 | - | - | 70.17 | | |
ClearTK (4) | | [6] | 81.40 | 76.38 | 78.81 | 86.12 | 78.20 | 90.86 | 67.87 | Download | BSD-3 Clause |
NavyTime (1) | | [2] | 80.73 | 79.87 | 80.30 | 84.03 | 75.79 | 91.26 | 67.48 | | |
Temp: (ESAfeature) | | X, 2013 | 78.33 | 61.61 | 68.97 | 79.09 | - | - | 54.55 | | |
JU_CSE | | [7] | 80.85 | 76.51 | 78.62 | 67.02 | 74.56 | 91.76 | 52.69 | | |
FSS-TimEx | | [9] | 63.13 | 67.11 | 65.06 | 66.00 | - | - | 42.94 | | |
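For Task B the overall score column appears to be the strict-matching F1 scaled by the class accuracy, and the rows are listed in descending order of that score. The following sketch checks both observations against the table. The formula and the helper names `class_f1` and `is_sorted_descending` are assumptions made for illustration, not definitions given on this page.

```python
# (system, strict F1, class accuracy, reported overall score), in table order.
TASK_B_ROWS = [
    ("ATT (1)", 81.05, 88.69, 71.88),
    ("KUL (2)", 79.32, 88.46, 70.17),
    ("ClearTK (4)", 78.81, 86.12, 67.87),
    ("NavyTime (1)", 80.30, 84.03, 67.48),
    ("Temp: (ESAfeature)", 68.97, 79.09, 54.55),
    ("JU_CSE", 78.62, 67.02, 52.69),
    ("FSS-TimEx", 65.06, 66.00, 42.94),
]


def class_f1(strict_f1, class_accuracy):
    """Assumed definition: strict F1 scaled by class accuracy (both in percent)."""
    return strict_f1 * class_accuracy / 100.0


def is_sorted_descending(values):
    """True when each value is at least as large as the one after it."""
    return all(a >= b for a, b in zip(values, values[1:]))


computed = [class_f1(f1, acc) for _, f1, acc, _ in TASK_B_ROWS]
max_gap = max(abs(c - row[3]) for c, row in zip(computed, TASK_B_ROWS))
print(f"largest gap between computed and reported: {max_gap:.3f}")
print("rows sorted by overall score:", is_sorted_descending(computed))
```

Tense and aspect accuracy do not enter this calculation, which fits the fact that systems reporting "-" for those attributes still receive an overall score.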
Task C: Annotating relations given gold entities
Task ABC: Temporal awareness evaluation
Clinical TempEval 2015
- Clinical TempEval 2015, Clinical TempEval, 2015: web page
Performance measures
Results
Tables show the best result for each system. Lower-scoring runs for the same system are not shown.
Time expressions
System name (best run) | Short description | Main publication | Span | Class | Software | License | |||||
---|---|---|---|---|---|---|---|---|---|---|---|
P | R | F1 | P | R | F1 | A | |||||
Baseline | Memorize | [10] | 0.743 | 0.372 | 0.496 | 0.723 | 0.362 | 0.483 | 0.974 | - | - |
KPSCMI: run 1 | Rule-based | - | 0.272 | 0.782 | 0.404 | 0.223 | 0.642 | 0.331 | 0.819 | - | - |
KPSCMI: run 3 | Supervised machine learning | - | 0.693 | 0.706 | 0.699 | 0.657 | 0.669 | 0.663 | 0.948 | - | - |
UFPRSheffield-SVM: run 2 | Supervised machine learning | [11] | 0.741 | 0.655 | 0.695 | 0.723 | 0.640 | 0.679 | 0.977 | - | - |
UFPRSheffield-Hynx: run 5 | Rule-based | [11] | 0.411 | 0.795 | 0.542 | 0.391 | 0.756 | 0.516 | 0.952 | - | - |
BluLab: run 1-3 | Supervised machine learning | [12] | 0.797 | 0.664 | 0.725 | 0.778 | 0.652 | 0.709 | 0.978 | - | - |
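The P, R and F1 columns are assumed here to follow the standard definitions, with F1 the harmonic mean of precision and recall. As a minimal sketch under that assumption, the span F1 values in the time expressions table can be recomputed from the P and R columns; the helper name `f1_score` is illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (system, span P, span R, reported span F1), copied from the time expressions table.
SPAN_ROWS = [
    ("Baseline", 0.743, 0.372, 0.496),
    ("KPSCMI: run 1", 0.272, 0.782, 0.404),
    ("KPSCMI: run 3", 0.693, 0.706, 0.699),
    ("UFPRSheffield-SVM: run 2", 0.741, 0.655, 0.695),
    ("UFPRSheffield-Hynx: run 5", 0.411, 0.795, 0.542),
    ("BluLab: run 1-3", 0.797, 0.664, 0.725),
]

for name, p, r, reported in SPAN_ROWS:
    print(f"{name}: recomputed {f1_score(p, r):.3f}, reported {reported:.3f}")
```

The recomputed values agree with the reported ones to within 0.001; the small differences are expected because the P and R inputs are themselves rounded to three decimals.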
Event expressions
System name (best run) | Short description | Main publication | Span | Modality | Degree | Polarity | Type | Software | License | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
P | R | F1 | P | R | F1 | A | P | R | F1 | A | P | R | F1 | A | P | R | F1 | A | |||||
Baseline | Memorize | [10] | 0.876 | 0.810 | 0.842 | 0.810 | 0.749 | 0.778 | 0.924 | 0.871 | 0.806 | 0.838 | 0.995 | 0.800 | 0.740 | 0.769 | 0.913 | 0.846 | 0.783 | 0.813 | 0.966 | - | - |
BluLab: run 1-3 | Supervised machine learning | [12] | 0.887 | 0.864 | 0.875 | 0.834 | 0.813 | 0.824 | 0.942 | 0.882 | 0.859 | 0.870 | 0.994 | 0.868 | 0.846 | 0.857 | 0.979 | 0.834 | 0.812 | 0.823 | 0.941 | - | - |
Temporal relations
Phase 1: text only
System name (best run) | Short description | Main publication | To Document Time | Narrative Containers | Software | License | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
P | R | F1 | P | R | F1 | P | R | F1 | |||||
Baseline | Memorize | [10] | 0.600 | 0.555 | 0.577 | - | - | - | - | - | - | - | - |
Baseline | TIMEX3 to closest EVENT | [10] | - | - | - | 0.368 | 0.061 | 0.104 | 0.400 | 0.061 | 0.106 | - | - |
BluLab: run 2 | Supervised machine learning | [12] | 0.712 | 0.693 | 0.702 | 0.080 | 0.142 | 0.102 | 0.094 | 0.179 | 0.123 | - | - |
Phase 2: manual EVENTs and TIMEX3s
System name (best run) | Short description | Main publication | To Document Time | Narrative Containers | Software | License | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
P | R | F1 | P | R | F1 | P | R | F1 | |||||
Baseline | Memorize | [10] | - | - | 0.608 | - | - | - | - | - | - | - | - |
Baseline | TIMEX3 to closest EVENT | [10] | - | - | - | 0.514 | 0.170 | 0.255 | 0.554 | 0.170 | 0.260 | - | - |
BluLab: run 2 | Supervised machine learning | [12] | - | - | 0.791 | 0.109 | 0.210 | 0.143 | 0.140 | 0.254 | 0.181 | - | - |
References
- ↑ Strötgen, J., Zell, J., and Gertz, M. Heideltime: Tuning english and developing spanish resources for tempeval-3. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 15–19.
- ↑ 2.0 2.1 Chambers, N. Navytime: Event and time ordering from raw text. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 73–77.
- ↑ Filannino, M., Brown, G., and Nenadic, G. ManTIME: Temporal expression identification and normalization in the Tempeval-3 challenge. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 53–57.
- ↑ Chang, A., and Manning, C. D. SUTime: Evaluation in TempEval-3. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 78–82.
- ↑ 5.0 5.1 Jung, H., and Stent, A. ATT1: Temporal annotation using big windows and rich syntactic and semantic features. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 20–24.
- ↑ 6.0 6.1 Bethard, S. ClearTK-TimeML: A minimalist approach to tempeval 2013. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 10–14.
- ↑ 7.0 7.1 Kolya, A. K., Kundu, A., Gupta, R., Ekbal, A., and Bandyopadhyay, S. JU_CSE: A CRF based approach to annotation of temporal expression, event and temporal relations. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 64–72.
- ↑ 8.0 8.1 Kolomiyets, O., and Moens, M.-F. KUL: Data-driven approach to temporal parsing of newswire articles. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 83–87.
- ↑ 9.0 9.1 Zavarella, V., and Tanev, H. FSS-TimEx for tempeval-3: Extracting temporal information from text. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 58–63.
- ↑ 10.0 10.1 10.2 10.3 10.4 10.5 Steven Bethard, Leon Derczynski, Guergana Savova, James Pustejovsky and Marc Verhagen. SemEval-2015 Task 6: Clinical TempEval. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), (Denver, Colorado, June 2015), Association for Computational Linguistics, pp. 806-814.
- ↑ 11.0 11.1 Hegler Tissot, Genevieve Gorrell, Angus Roberts, Leon Derczynski and Marcos Didonet Del Fabro. UFPRSheffield: Contrasting Rule-based and Support Vector Machine Approaches to Time Expression Identification in Clinical TempEval. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), (Denver, Colorado, June 2015), Association for Computational Linguistics, pp. 835-839.
- ↑ 12.0 12.1 12.2 12.3 Sumithra Velupillai, Danielle L Mowery, Samir Abdelrahman, Lee Christensen and Wendy Chapman. BluLab: Temporal Information Extraction for the 2015 Clinical TempEval Challenge. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), (Denver, Colorado, June 2015), Association for Computational Linguistics, pp. 815-819.
Unsorted
- UzZaman, N., Llorens, H., Derczynski, L., Allen, J., Verhagen, M., and Pustejovsky, J. Semeval-2013 task 1: Tempeval-3: Evaluating time expressions, events, and temporal relations. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 1–9.
- Laokulrat, N., Miwa, M., Tsuruoka, Y., and Chikayama, T. UTTime: Temporal relation classification using deep syntactic features. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 88–92.