Paloma Martínez

Also published as: Paloma Martinez


2023

pdf bib
PTUK-HULAT at ArAIEval Shared Task Fine-tuned Distilbert to Predict Disinformative Tweets
Areej Jaber | Paloma Martinez
Proceedings of ArabicNLP 2023

Disinformation involves the dissemination of incomplete, inaccurate, or misleading information; it has the objective, goal, or purpose of deliberately or intentionally lying to others aboutthe truth. The spread of disinformative information on social media has serious implications, and it causes concern among internet users in different aspects. Automatic classification models are required to detect disinformative posts on social media, especially on Twitter. In this article, DistilBERT multilingual model was fine-tuned to classify tweets either as dis-informative or not dis-informative in Subtask 2A of the ArAIEval shared task. The system outperformed the baseline and achieved F1 micro 87% and F1 macro 80%. Our system ranked 11 compared with all participants.

2022

pdf bib
UC3M-PUCPR at SemEval-2022 Task 11: An Ensemble Method of Transformer-based Models for Complex Named Entity Recognition
Elisa Schneider | Renzo M. Rivera-Zavala | Paloma Martinez | Claudia Moro | Emerson Paraiso
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This study introduces the system submitted to the SemEval 2022 Task 11: MultiCoNER (Multilingual Complex Named Entity Recognition) by the UC3M-PUCPR team. We proposed an ensemble of transformer-based models for entity recognition in cross-domain texts. Our deep learning method benefits from the transformer architecture, which adopts the attention mechanism to handle the long-range dependencies of the input text. Also, the ensemble approach for named entity recognition (NER) improved the results over baselines based on individual models on two of the three tracks we participated in. The ensemble model for the code-mixed task achieves an overall performance of 76.36% F1-score, a 2.85 percentage point increase upon our individually best model for this task, XLM-RoBERTa-large (73.51%), outperforming the baseline provided for the shared task by 18.26 points. Our preliminary results suggest that contextualized language models ensembles can, even if modestly, improve the results in extracting information from unstructured data.

2020

pdf bib
Combining financial word embeddings and knowledge-based features for financial text summarization UC3M-MC System at FNS-2020
Jaime Baldeon Suarez | Paloma Martínez | Jose Luis Martínez
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

This paper describes the systems proposed by HULAT research group from Universidad Carlos III de Madrid (UC3M) and MeaningCloud (MC) company to solve the FNS 2020 Shared Task on summarizing financial reports. We present a narrative extractive approach that implements a statistical model comprised of different features that measure the relevance of the sentences using a combination of statistical and machine learning methods. The key to the model’s performance is its accurate representation of the text, since the word embeddings used by the model have been trained with the summaries of the training dataset and therefore capture the most salient information from the reports. The systems’ code can be found at https://github.com/jaimebaldeon/FNS-2020.

2019

pdf bib
Deep neural model with enhanced embeddings for pharmaceutical and chemical entities recognition in Spanish clinical text
Renzo Rivera | Paloma Martínez
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks

In this work, we introduce a Deep Learning architecture for pharmaceutical and chemical Named Entity Recognition in Spanish clinical cases texts. We propose a hybrid model approach based on two Bidirectional Long Short-Term Memory (Bi-LSTM) network and Conditional Random Field (CRF) network using character, word, concept and sense embeddings to deal with the extraction of semantic, syntactic and morphological features. The approach was evaluated on the PharmaCoNER Corpus obtaining an F-measure of 85.24% for subtask 1 and 49.36% for subtask2. These results prove that deep learning methods with specific domain embedding representations can outperform the state-of-the-art approaches.

2017

pdf bib
Exploring Convolutional Neural Networks for Sentiment Analysis of Spanish tweets
Isabel Segura-Bedmar | Antonio Quirós | Paloma Martínez
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Spanish is the third-most used language on the internet, after English and Chinese, with a total of 7.7% (more than 277 million of users) and a huge internet growth of more than 1,400%. However, most work on sentiment analysis has been focused on English. This paper describes a deep learning system for Spanish sentiment analysis. To the best of our knowledge, this is the first work that explores the use of a convolutional neural network to polarity classification of Spanish tweets.

pdf bib
LABDA at SemEval-2017 Task 10: Extracting Keyphrases from Scientific Publications by combining the BANNER tool and the UMLS Semantic Network
Isabel Segura-Bedmar | Cristóbal Colón-Ruiz | Paloma Martínez
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the system presented by the LABDA group at SemEval 2017 Task 10 ScienceIE, specifically for the subtasks of identification and classification of keyphrases from scientific articles. For the task of identification, we use the BANNER tool, a named entity recognition system, which is based on conditional random fields (CRF) and has obtained successful results in the biomedical domain. To classify keyphrases, we study the UMLS semantic network and propose a possible linking between the keyphrase types and the UMLS semantic groups. Based on this semantic linking, we create a dictionary for each keyphrase type. Then, a feature indicating if a token is found in one of these dictionaries is incorporated to feature set used by the BANNER tool. The final results on the test dataset show that our system still needs to be improved, but the conditional random fields and, consequently, the BANNER system can be used as a first approximation to identify and classify keyphrases.

pdf bib
LABDA at SemEval-2017 Task 10: Relation Classification between keyphrases via Convolutional Neural Network
Víctor Suárez-Paniagua | Isabel Segura-Bedmar | Paloma Martínez
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper, we describe our participation at the subtask of extraction of relationships between two identified keyphrases. This task can be very helpful in improving search engines for scientific articles. Our approach is based on the use of a convolutional neural network (CNN) trained on the training dataset. This deep learning model has already achieved successful results for the extraction relationships between named entities. Thus, our hypothesis is that this model can be also applied to extract relations between keyphrases. The official results of the task show that our architecture obtained an F1-score of 0.38% for Keyphrases Relation Classification. This performance is lower than the expected due to the generic preprocessing phase and the basic configuration of the CNN model, more complex architectures are proposed as future work to increase the classification rate.

2016

pdf bib
LABDA at the 2016 BioASQ challenge task 4a: Semantic Indexing by using ElasticSearch
Isabel Segura-Bedmar | Adrián Carruana | Paloma Martínez
Proceedings of the Fourth BioASQ workshop

2015

pdf bib
Exploring Word Embedding for Drug Name Recognition
Isabel Segura-Bedmar | Víctor Suárez-Paniagua | Paloma Martínez
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis

2014

pdf bib
Detecting drugs and adverse events from Spanish social media streams
Isabel Segura-Bedmar | Ricardo Revert | Paloma Martínez
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)

pdf bib
Extracting drug indications and adverse drug reactions from Spanish health social media
Isabel Segura-Bedmar | Santiago de la Peña González | Paloma Martínez
Proceedings of BioNLP 2014

2013

pdf bib
SemEval-2013 Task 9 : Extraction of Drug-Drug Interactions from Biomedical Texts (DDIExtraction 2013)
Isabel Segura-Bedmar | Paloma Martínez | María Herrero-Zazo
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2010

pdf bib
UC3M System: Determining the Extent, Type and Value of Time Expressions in TempEval-2
María Teresa Vicente-Díez | Julián Moreno Schneider | Paloma Martínez
Proceedings of the 5th International Workshop on Semantic Evaluation

2008

pdf bib
An Empirical Approach to a Preliminary Successful Identification and Resolution of Temporal Expressions in Spanish News Corpora
María Teresa Vicente-Díez | Doaa Samy | Paloma Martínez
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Dating of contents is relevant to multiple advanced Natural Language Processing (NLP) applications, such as Information Retrieval or Question Answering. These could be improved by using techniques that consider a temporal dimension in their processes. To achieve it, an accurate detection of temporal expressions in data sources must be firstly done, dealing with them in an appropriated standard format that captures the time value of the expressions once resolved, and allows reasoning without ambiguity, in order to increase the range of search and the quality of the results to be returned. These tasks are completely necessary for NLP applications if an efficient temporal reasoning is afterwards expected. This work presents a typology of time expressions based on an empirical inductive approach, both from a structural perspective and from the point of view of their resolution. Furthermore, a method for the automatic recognition and resolution of temporal expressions in Spanish contents is provided, obtaining promising results when it is tested by means of an evaluation corpus.

pdf bib
A preliminary approach to extract drugs by combining UMLS resources and USAN naming conventions
Isabel Segura-Bedmar | Paloma Martínez | Doaa Samy
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

2002

pdf bib
Integrating Spanish Linguistic Resources in a Web Site Assistant
Paloma Martínez | Ana García-Serrano | Alberto Ruiz-Cristina
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

pdf bib
Adapting and Extending Lexical Resources in a Dialogue System
Ana García-Serrano | Paloma Martínez | Luis Rodrigo
Proceedings of the ACL 2001 Workshop on Human Language Technology and Knowledge Management