Helena Gómez-Adorno

Also published as: Helena Gomez-Adorno, Helena Gómez-adorno


2023

HOMO-MEX: A Mexican Spanish Annotated Corpus for LGBT+phobia Detection on Twitter
Juan Vásquez | Scott Andersen | Gemma Bel-enguix | Helena Gómez-adorno | Sergio-luis Ojeda-trueba
The 7th Workshop on Online Abuse and Harms (WOAH)

In the past few years, the NLP community has actively worked on detecting LGBT+Phobia in online spaces, using publicly available textual data. Most of these efforts target the English language and its variants, since it is the language most studied by the NLP community. Nevertheless, efforts towards creating corpora in other languages are active worldwide. Despite this, Spanish remains understudied with regard to digital LGBT+Phobia. The only corpus we found in the literature was for the Peninsular Spanish dialects, which use LGBT+phobic terms different from those in the Mexican dialect. For this reason, we present Homo-MEX, a novel corpus for detecting LGBT+Phobia in Mexican Spanish. In this paper, we describe our data-gathering and annotation process. We also present a classification benchmark using various traditional machine learning algorithms and two pre-trained deep learning models to showcase the classification potential of our corpus.

2020

Automatic Word Association Norms (AWAN)
Jorge Reyes-Magaña | Gerardo Sierra Martínez | Gemma Bel-Enguix | Helena Gomez-Adorno
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon

Word Association Norms (WAN) are collections that present stimulus words and the set of their associated responses. Such collections are widely used in diverse areas of expertise. In order to reduce the effort needed to obtain a good-quality resource that can be reproduced in many languages with minimal sources, a methodology to build Automatic Word Association Norms (AWAN) is proposed. The methodology takes two simple inputs: a) a dictionary, and b) pre-processed word embeddings. This new kind of WAN is evaluated in two ways: i) learning word embeddings based on the node2vec algorithm and comparing them with human-annotated benchmarks, and ii) performing a lexical search for a reverse dictionary. Both evaluations are done on a weighted graph with the AWAN lexical elements. The results showed that the methodology produces good-quality AWANs.

MineriaUNAM at SemEval-2020 Task 3: Predicting Contextual Word Similarity Using a Centroid Based Approach and Word Embeddings
Helena Gomez-Adorno | Gemma Bel-Enguix | Jorge Reyes-Magaña | Benjamín Moreno | Ramón Casillas | Daniel Vargas
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents our systems for Task 3 of SemEval-2020, which aims to predict the effect that context has on human perception of the similarity of words. The task consists of two subtasks in English, Croatian, Finnish, and Slovenian: (1) predicting the change in similarity and (2) predicting the human similarity scores, both for a pair of words within two different contexts. We tackled the problem by developing two systems. The first uses a centroid approach and word vectors. The second uses the ELMo language model, which is trained for each pair of words with the given context. Our approach achieved the highest score in subtask 2 for the English language.
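
As a rough illustration of what a centroid approach over word vectors can look like for the two subtasks, the following minimal sketch averages a target word's vector with the vectors of its context words and compares two words by cosine similarity. It is not the authors' system: the function names, the way the target word is mixed into the centroid, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def contextual_vector(word, context, embeddings):
    """Hypothetical contextual representation: the centroid of the target
    word's vector and the vectors of the context tokens found in the
    embedding table (unknown tokens are skipped)."""
    vecs = [embeddings[w] for w in context if w in embeddings]
    vecs.append(embeddings[word])
    return centroid(vecs)

def contextual_similarity(w1, w2, context, embeddings):
    """Similarity of two words within one context (cf. subtask 2)."""
    return cosine(contextual_vector(w1, context, embeddings),
                  contextual_vector(w2, context, embeddings))

def similarity_change(w1, w2, context_a, context_b, embeddings):
    """Change in similarity between two contexts (cf. subtask 1)."""
    return (contextual_similarity(w1, w2, context_b, embeddings)
            - contextual_similarity(w1, w2, context_a, embeddings))

# Toy embeddings, invented purely for this example.
toy_embeddings = {
    "bank": [1.0, 0.0],
    "river": [0.0, 1.0],
    "money": [1.0, 0.2],
    "water": [0.1, 1.0],
}

print(similarity_change("bank", "river",
                        ["money"], ["water"], toy_embeddings))
```

In practice the toy table would be replaced by pre-trained word vectors for each language, and the weighting between the target word and its context is a design choice that would need tuning.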

Enhancing Job Searches in Mexico City with Language Technologies
Gerardo Sierra Martínez | Gemma Bel-Enguix | Helena Gómez-Adorno | Juan Manuel Torres Moreno | Tonatiuh Hernández-García | Julio V Guadarrama-Olvera | Jesús-Germán Ortiz-Barajas | Ángela María Rojas | Tomas Damerau | Soledad Aragón Martínez
Proceedings of the 1st Workshop on Language Technologies for Government and Public Administration (LT4Gov)

In this paper, we present the enhancement of the Demanded Skills Diagnosis (DiCoDe: Diagnóstico de Competencias Demandadas), a system developed by Mexico City’s Ministry of Labor and Employment Promotion (STyFE: Secretaría de Trabajo y Fomento del Empleo de la Ciudad de México) that seeks to reduce information asymmetries between job seekers and employers. The project uses web scraping techniques to retrieve job vacancies posted on private job portals on a daily basis, with the purpose of informing training and individual case management policies as well as labor market monitoring. To this end, a collaboration project between STyFE and the Language Engineering Group (GIL: Grupo de Ingeniería Lingüística) was established in order to enhance DiCoDe by applying NLP models and semantic analysis. Through this collaboration, the macro-structure of DiCoDe’s job vacancy system and its geographic referencing at the city hall (municipality) level were improved. More specifically, dictionaries were created to identify demanded competencies, skills and abilities (CSA), and algorithms were developed for dynamically classifying vacancies and identifying terms for free-text searches, in order to improve the results and processing time of queries.

2019

MineriaUNAM at SemEval-2019 Task 5: Detecting Hate Speech in Twitter using Multiple Features in a Combinatorial Framework
Luis Enrique Argota Vega | Jorge Carlos Reyes-Magaña | Helena Gómez-Adorno | Gemma Bel-Enguix
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper presents our approach to Task 5 of SemEval-2019, which aims at detecting hate speech against immigrants and women on Twitter. The task consists of two sub-tasks, in Spanish and English: (A) detection of hate speech and (B) classification of hateful tweets as aggressive or not, and identification of the harassed target as an individual or a group. We used linguistically motivated features and several types of n-grams (words, characters, function words, punctuation symbols, POS, among others). For sub-task A, we trained a Support Vector Machine using a combinatorial framework, whereas for sub-task B we followed a multi-label approach using the Random Forest classifier. Our approach achieved the highest F1-score in sub-task A for the Spanish language.