TempEval 2007
- TempEval, Temporal Relation Identification, 2007: web page
TempEval 2010
- TempEval-2, Evaluating Events, Time Expressions, and Temporal Relations, 2010: web page
TempEval 2013
- TempEval-3, Evaluating Time Expressions, Events, and Temporal Relations, 2013: web page
Performance measures
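The entity tasks below report precision (P), recall (R), and F1 over extracted expressions. A minimal sketch of these measures, assuming the standard TempEval-3 definitions (S = system entities, H = gold entities; strict matching requires an exact span match, lenient matching any span overlap):

```latex
% Precision, recall, and F1 over extracted entities.
% S = entities predicted by the system, H = gold (human) entities.
% Under strict matching an entity is correct only with an exact span
% match; under lenient matching any span overlap counts.
\[
P = \frac{|S \cap H|}{|S|}, \qquad
R = \frac{|S \cap H|}{|H|}, \qquad
F_1 = \frac{2\,P\,R}{P + R}
\]
```

The per-task overall scores combine these with attribute accuracy, as noted above each table.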
Results
Tables show the best run for each system; lower-scoring runs for the same system are not shown.
Task A: Temporal expression extraction and normalisation
Identification is scored with precision (Pre.), recall (Rec.), and F1 under strict and lenient matching; normalisation accuracy is reported for the TIMEX3 type and value attributes. The overall score is the lenient-matching F1 multiplied by the value accuracy (for HeidelTime, 90.30 × 85.95% = 77.61).

| System name (best run) | Short description | Main publication | Strict Pre. | Strict Rec. | Strict F1 | Lenient Pre. | Lenient Rec. | Lenient F1 | Type acc. | Value acc. | Overall score | Software | License |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HeidelTime (t) | rule-based | Strötgen et al., 2013 | 83.85 | 78.99 | 81.34 | 93.08 | 87.68 | 90.30 | 90.91 | 85.95 | 77.61 | Download | GNU GPL v3 |
| NavyTime (1,2) | rule-based | Chambers, 2013 | 78.72 | 80.43 | 79.57 | 89.36 | 91.30 | 90.32 | 88.90 | 78.58 | 70.97 | - | - |
| ManTIME (4) | CRF, probabilistic post-processing pipeline, rule-based normaliser | Filannino et al., 2013 | 78.86 | 70.29 | 74.33 | 95.12 | 84.78 | 89.66 | 86.31 | 76.92 | 68.97 | Demo & Download | GNU GPL v2 |
| SUTime | deterministic rule-based | Chang et al., 2013 | 78.72 | 80.43 | 79.57 | 89.36 | 91.30 | 90.32 | 88.90 | 74.60 | 67.38 | Demo & Download | GNU GPL v2 |
| ATT (2) | MaxEnt, third-party normalisers | Jung et al., 2013 | 90.57 | 69.57 | 78.69 | 98.11 | 75.36 | 85.25 | 91.34 | 76.91 | 65.57 | - | - |
| ClearTK (1,2) | SVM, logistic regression, third-party normaliser | Bethard, 2013 | 85.94 | 79.71 | 82.71 | 93.75 | 86.96 | 90.23 | 93.33 | 71.66 | 64.66 | Download | BSD 3-Clause |
| JU_CSE | CRF, rule-based normaliser | Kolya et al., 2013 | 81.51 | 70.29 | 75.49 | 93.28 | 80.43 | 86.38 | 87.39 | 73.87 | 63.81 | - | - |
| KUL (2) | logistic regression, post-processing, rule-based normaliser | Kolomiyets et al., 2013 | 76.99 | 63.04 | 69.32 | 92.92 | 76.09 | 83.67 | 88.56 | 75.24 | 62.95 | - | - |
| FSS-TimEx | rule-based | Zavarella et al., 2013 | 52.03 | 46.38 | 49.04 | 90.24 | 80.43 | 85.06 | 81.08 | 68.47 | 58.24 | - | - |
|
Task B: Event extraction and classification
Identification is scored with precision (Pre.), recall (Rec.), and F1 under strict matching; attribute accuracy is reported for the class, tense, and aspect attributes. The overall score is the strict F1 multiplied by the class accuracy (for ATT, 81.05 × 88.69% = 71.88).

| System name (best run) | Short description | Main publication | Strict Pre. | Strict Rec. | Strict F1 | Class acc. | Tense acc. | Aspect acc. | Overall score | Software | License |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ATT (1) | - | Jung et al., 2013 | 81.44 | 80.67 | 81.05 | 88.69 | 73.37 | 90.68 | 71.88 | - | - |
| KUL (2) | - | Kolomiyets et al., 2013 | 80.69 | 77.99 | 79.32 | 88.46 | - | - | 70.17 | - | - |
| ClearTK (4) | - | Bethard, 2013 | 81.40 | 76.38 | 78.81 | 86.12 | 78.20 | 90.86 | 67.87 | Download | BSD 3-Clause |
| NavyTime (1) | - | Chambers, 2013 | 80.73 | 79.87 | 80.30 | 84.03 | 75.79 | 91.26 | 67.48 | - | - |
| Temp (ESAfeature) | - | X, 2013 | 78.33 | 61.61 | 68.97 | 79.09 | - | - | 54.55 | - | - |
| JU_CSE | - | Kolya et al., 2013 | 80.85 | 76.51 | 78.62 | 67.02 | 74.56 | 91.76 | 52.69 | - | - |
| FSS-TimEx | - | Zavarella et al., 2013 | 63.13 | 67.11 | 65.06 | 66.00 | - | - | 42.94 | - | - |
Task C: Annotating relations given gold entities
Task C relation only: Annotating relations given gold entities and related pairs
Task ABC: Temporal awareness evaluation
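As a sketch of the temporal awareness measure used for Task ABC, following the graph-based formulation of UzZaman et al. (2013): with closure(G) the temporal closure of a relation graph and G⁻ its reduced form, a system relation is credited if it is verifiable from the closure of the reference annotation, and vice versa:

```latex
% Temporal awareness precision and recall (UzZaman et al., 2013).
% S = system relation graph, H = reference relation graph.
% Reduced graphs G^- avoid rewarding relations that are merely
% implied by other relations in the same graph.
\[
P = \frac{|S^{-} \cap \mathrm{closure}(H)|}{|S^{-}|}, \qquad
R = \frac{|H^{-} \cap \mathrm{closure}(S)|}{|H^{-}|}, \qquad
F_1 = \frac{2\,P\,R}{P + R}
\]
```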
Clinical TempEval 2015
- Clinical TempEval 2015, Clinical TempEval, 2015: web page
Performance measures
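The tables below report P/R/F1 for spans and relations, plus an accuracy column (A) for attributes. A minimal sketch of the attribute accuracy, assuming it is computed over the entities whose spans the system identified correctly:

```latex
% Attribute accuracy for the "A" columns (assumed denominator:
% entities with a correctly identified span, S \cap H). An entity
% counts as correct if its predicted attribute value (class,
% modality, degree, polarity, or type) matches the gold value.
\[
A = \frac{\bigl|\{\, e \in S \cap H : \mathrm{attr}_S(e) = \mathrm{attr}_H(e) \,\}\bigr|}{|S \cap H|}
\]
```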
Results
Tables show the best run for each system; lower-scoring runs for the same system are not shown.
Time expressions
| System name (best run) | Short description | Main publication | Span P | Span R | Span F1 | Class P | Class R | Class F1 | Class A | Software | License |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline: memorize | - | - | 0.743 | 0.372 | 0.496 | 0.723 | 0.362 | 0.483 | 0.974 | - | - |
| KPSCMI: run 1 | Rule-based | - | 0.272 | 0.782 | 0.404 | 0.223 | 0.642 | 0.331 | 0.819 | - | - |
| KPSCMI: run 3 | Supervised machine learning | - | 0.693 | 0.706 | 0.699 | 0.657 | 0.669 | 0.663 | 0.948 | - | - |
| UFPRSheffield-SVM: run 2 | Supervised machine learning | - | 0.741 | 0.655 | 0.695 | 0.723 | 0.640 | 0.679 | 0.977 | - | - |
| UFPRSheffield-Hynx: run 5 | Rule-based | - | 0.411 | 0.795 | 0.542 | 0.391 | 0.756 | 0.516 | 0.952 | - | - |
| BluLab: run 1-3 | Supervised machine learning | - | 0.797 | 0.664 | 0.725 | 0.778 | 0.652 | 0.709 | 0.978 | - | - |
Event expressions
| System name (best run) | Short description | Main publication | Span P | Span R | Span F1 | Modality P | Modality R | Modality F1 | Modality A | Degree P | Degree R | Degree F1 | Degree A | Polarity P | Polarity R | Polarity F1 | Polarity A | Type P | Type R | Type F1 | Type A | Software | License |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | Memorize | - | 0.876 | 0.810 | 0.842 | 0.810 | 0.749 | 0.778 | 0.924 | 0.871 | 0.806 | 0.838 | 0.995 | 0.800 | 0.740 | 0.769 | 0.913 | 0.846 | 0.783 | 0.813 | 0.966 | - | - |
| BluLab: run 1-3 | Supervised machine learning | - | 0.887 | 0.864 | 0.875 | 0.834 | 0.813 | 0.824 | 0.942 | 0.882 | 0.859 | 0.870 | 0.994 | 0.868 | 0.846 | 0.857 | 0.979 | 0.834 | 0.812 | 0.823 | 0.941 | - | - |
Temporal relations
Relations to the document time (DocTime) and narrative container relations (NC) are each scored with precision (P), recall (R), and F1; two score sets (NC1, NC2) are reported for narrative containers.

Phase 1: text only

| System name (best run) | Short description | Main publication | DocTime P | DocTime R | DocTime F1 | NC1 P | NC1 R | NC1 F1 | NC2 P | NC2 R | NC2 F1 | Software | License |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | Memorize | - | 0.600 | 0.555 | 0.577 | - | - | - | - | - | - | - | - |
| Baseline | TIMEX3 to closest EVENT | - | - | - | - | 0.368 | 0.061 | 0.104 | 0.400 | 0.061 | 0.106 | - | - |
| BluLab: run 2 | Supervised machine learning | - | 0.712 | 0.693 | 0.702 | 0.080 | 0.142 | 0.102 | 0.094 | 0.179 | 0.123 | - | - |

Phase 2: manual EVENTs and TIMEX3s

| System name (best run) | Short description | Main publication | DocTime P | DocTime R | DocTime F1 | NC1 P | NC1 R | NC1 F1 | NC2 P | NC2 R | NC2 F1 | Software | License |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | Memorize | - | - | - | 0.608 | - | - | - | - | - | - | - | - |
| Baseline | TIMEX3 to closest EVENT | - | - | - | - | 0.433 | 0.162 | 0.235 | 0.469 | 0.162 | 0.240 | - | - |
| BluLab: run 2 | Supervised machine learning | - | - | - | 0.791 | 0.109 | 0.210 | 0.143 | 0.140 | 0.254 | 0.181 | - | - |
References
- UzZaman, N., Llorens, H., Derczynski, L., Allen, J., Verhagen, M., and Pustejovsky, J. SemEval-2013 Task 1: TempEval-3: Evaluating time expressions, events, and temporal relations. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 1–9.
- Bethard, S. ClearTK-TimeML: A minimalist approach to TempEval 2013. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 10–14.
- Strötgen, J., Zell, J., and Gertz, M. HeidelTime: Tuning English and developing Spanish resources for TempEval-3. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 15–19.
- Jung, H., and Stent, A. ATT1: Temporal annotation using big windows and rich syntactic and semantic features. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 20–24.
- Filannino, M., Brown, G., and Nenadic, G. ManTIME: Temporal expression identification and normalization in the TempEval-3 challenge. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 53–57.
- Zavarella, V., and Tanev, H. FSS-TimEx for TempEval-3: Extracting temporal information from text. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 58–63.
- Kolya, A. K., Kundu, A., Gupta, R., Ekbal, A., and Bandyopadhyay, S. JU_CSE: A CRF based approach to annotation of temporal expression, event and temporal relations. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 64–72.
- Chambers, N. NavyTime: Event and time ordering from raw text. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 73–77.
- Chang, A., and Manning, C. D. SUTime: Evaluation in TempEval-3. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 78–82.
- Kolomiyets, O., and Moens, M.-F. KUL: Data-driven approach to temporal parsing of newswire articles. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 83–87.
- Laokulrat, N., Miwa, M., Tsuruoka, Y., and Chikayama, T. UTTime: Temporal relation classification using deep syntactic features. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 88–92.