Application of process mining techniques to improve cardiac risk prediction.
Authorship
A.C.L.
Master in Massive Data Analysis Technologies: Big Data
Defense date
07.16.2025 17:00
Summary
Cardiac rehabilitation constitutes a structured clinical process involving multiple interdependent phases, individualized medical decisions, and the participation of various healthcare professionals. This sequential and adaptive nature allows the program to be modeled as a business process, facilitating its analysis. However, studies in this context face significant limitations inherent to real medical databases: data is often scarce, owing both to economic cost and to the time required for collection; many existing records are not useful for specific analyses; and missing values are common, as not all patients undergo the same tests. To address these limitations, this work proposes an architecture based on a conditional variational autoencoder (CVAE) for synthesizing realistic and coherent clinical records. The main objective is to increase the size and diversity of the available dataset in order to improve the predictive capacity of cardiac risk models and avoid resorting to certain risky tests, such as ergometry. The results show that the proposed architecture generates coherent and realistic synthetic data whose use improves the accuracy of the classifiers employed to detect risk, surpassing state-of-the-art deep learning approaches.
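As a rough illustration of the generative core described above, the following sketch shows a minimal conditional VAE in PyTorch, where the condition vector (e.g., a patient's risk label) is concatenated to both the encoder input and the latent code; layer sizes and dimensions are placeholders, not the thesis's actual architecture.

import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, x_dim, c_dim, z_dim=16, h_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(torch.cat([z, c], dim=1)), mu, logvar

def cvae_loss(x_hat, x, mu, logvar):
    recon = nn.functional.mse_loss(x_hat, x, reduction="sum")     # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL regularizer
    return recon + kl

# Synthesis: sample z ~ N(0, I) and decode it together with the desired
# condition c to obtain a new synthetic clinical record.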
Direction
LAMA PENIN, MANUEL (Tutorships)
VIDAL AGUIAR, JUAN CARLOS (Co-tutorships)
Court
MUCIENTES MOLINA, MANUEL FELIPE (Chairman)
IGLESIAS RODRIGUEZ, ROBERTO (Secretary)
RIOS VIQUEIRA, JOSE RAMON (Member)
Design and Implementation of an Accelerator for Data Quality Controls in Data Ingestion and Transformation Processes
Authorship
J.C.M.
Master in Massive Data Analysis Technologies: Big Data
Defense date
07.16.2025 16:00
Summary
This Master's Thesis presents the design and implementation of an accelerator for the automated configuration of data quality controls in data ingestion and transformation processes, within modern analytical architectures based on Snowflake. The solution adopts a declarative approach that allows users to define the desired controls without having to implement the underlying technical logic, promoting reusability, traceability, and scalability. The architecture relies on native Snowflake components such as streams, tasks, and stored procedures to build a reactive system capable of autonomously executing controls whenever changes occur in their configuration. Supported validations include freshness checks, uniqueness, null values, and consistency between tables. The system is complemented by a web interface that enables visual definition of controls without writing SQL code, and a Power BI dashboard that facilitates monitoring by both technical and business users. These components enhance its practical applicability and support adoption in real-world environments without requiring advanced expertise. As a proof of concept, a use case with data from the Formula 1 championship was simulated, demonstrating the system's effectiveness in error detection, result structuring, and automation of repetitive tasks. Although currently limited to Snowflake and not validated under high-concurrency scenarios, its modular and generalizable design enables future extension to new control types and organizational environments.
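To make the declarative idea concrete, the sketch below compiles a control definition into a validation query. The field names and the compiler itself are illustrative; the actual accelerator materializes these checks with Snowflake streams, tasks, and stored procedures rather than a Python function.

def compile_control(control):
    # Translate a declarative control definition into a validation query
    # that returns a "failures" count; zero failures means the check passed.
    t = control["table"]
    if control["type"] == "not_null":
        return f"SELECT COUNT(*) AS failures FROM {t} WHERE {control['column']} IS NULL"
    if control["type"] == "unique":
        c = control["column"]
        return (f"SELECT COUNT(*) AS failures FROM "
                f"(SELECT {c} FROM {t} GROUP BY {c} HAVING COUNT(*) > 1)")
    if control["type"] == "freshness":
        return (f"SELECT CASE WHEN DATEDIFF('hour', MAX({control['column']}), "
                f"CURRENT_TIMESTAMP()) > {control['max_age_hours']} THEN 1 ELSE 0 END "
                f"AS failures FROM {t}")
    raise ValueError(f"unknown control type: {control['type']}")

print(compile_control({"type": "unique", "table": "RACES", "column": "RACE_ID"}))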
Direction
RIOS VIQUEIRA, JOSE RAMON (Tutorships)
Graña Omil, Ángel (Co-tutorships)
Court
VIDAL AGUIAR, JUAN CARLOS (Chairman)
GARCIA POLO, FRANCISCO JAVIER (Secretary)
MERA PEREZ, DAVID (Member)
FireWatcher: Detection of forest fires in early stages with low-power systems
Authorship
M.J.C.F.
Master's Degree in Internet of Things - IoT
Defense date
07.14.2025 11:00
Summary
This project aims to develop an IoT system for early wildfire detection in remote rural areas. The device is powered by solar energy, allowing autonomous and sustainable operation. It uses gas, temperature, and humidity sensors to detect potential fire outbreaks and transmits data wirelessly via LoRaWAN. Energy harvesting and power management strategies were implemented to extend the system's lifespan. Future work includes training prediction models and deploying a real sensor network.
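A hedged sketch of the node's sensing logic: a simple outbreak heuristic plus compact payload packing for LoRaWAN's small frames. The thresholds, scaling factors, and byte layout are assumptions for illustration, not the project's actual firmware.

import struct

def fire_risk(gas_ppm, temp_c, rh_pct, gas_baseline):
    # Heuristic: a sharp rise in combustion gases combined with hot,
    # dry air suggests a possible outbreak worth reporting immediately.
    gas_spike = gas_ppm > 1.5 * gas_baseline
    hot_and_dry = temp_c > 35.0 and rh_pct < 30.0
    return gas_spike and hot_and_dry

def encode_payload(gas_ppm, temp_c, rh_pct):
    # Pack the three readings into 6 bytes (signed 16-bit, fixed point)
    # to keep LoRaWAN airtime, and hence energy use, to a minimum.
    return struct.pack("<hhh", int(gas_ppm * 10), int(temp_c * 100), int(rh_pct * 100))

Between transmissions the microcontroller would deep-sleep, which is where most of the solar energy budget is won.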
Direction
PARDO SECO, FERNANDO RAFAEL (Tutorships)
VAZQUEZ ALVAREZ, ALVARO (Co-tutorships)
Court
CARIÑENA AMIGO, MARIA PURIFICACION (Chairman)
Burguillo Rial, Juan Carlos (Secretary)
Pardo Martínez, Xoan Carlos (Member)
Application of Novel Anomaly Detection Models in Big Data Contexts
Authorship
M.F.L.
Master in Massive Data Analysis Technologies: Big Data
Defense date
07.16.2025 16:30
Summary
This work addresses the problem of early anomaly detection in corporate cybersecurity environments, where the volume and velocity of data hinder the application of traditional solutions. The project is set within a realistic context of network traffic and log analysis that monitors user behavior with respect to applications, enabling the identification of suspicious patterns in UEBA (User and Entity Behavior Analytics) scenarios. In this context, a beta-Variational Autoencoder (beta-VAE) model is proposed and developed for anomaly detection, featuring a mean-field variational inference approach and capable of modeling complex behaviors through a probabilistic latent space. The solution is implemented on a scalable architecture based on Big Data ecosystem technologies such as Kafka, Spark, and Airflow. Unlike traditional autoencoders, the VAE provides advantages in latent space regularization and in quantifying the uncertainty associated with each reconstruction, which improves the interpretability and explanation of detected anomalies. The model was implemented in a modular and flexible manner, facilitating tuning and scalability, and was evaluated using public cybersecurity datasets. The results show competitive performance in anomaly detection, surpassing classical approaches in certain aspects and clearly demonstrating its potential for integration into streaming data processing pipelines. In addition, the model was integrated into a professional and reproducible infrastructure capable of operating in real time over large-scale data streams, representing a significant contribution to automated threat detection in Big Data environments. Finally, future work is outlined, focusing on real-world data application, performance optimization, and improved accuracy through advanced probabilistic interpretability techniques.
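The scoring idea can be condensed into a few lines: with beta > 1 the KL term is weighted more heavily, regularizing the latent space, and the per-event ELBO serves as an anomaly score. A sketch, assuming a trained encoder/decoder with Gaussian latents:

import torch

def beta_vae_score(x, x_hat, mu, logvar, beta=4.0):
    # Per-sample score: reconstruction error plus beta-weighted KL.
    # Higher scores mean the event is poorly explained by the model of
    # normal behavior; the KL part also conveys reconstruction uncertainty.
    recon = torch.sum((x_hat - x) ** 2, dim=1)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    return recon + beta * kl

# The alert threshold would be calibrated offline, e.g. a high percentile
# of scores on a benign validation stream, then applied per micro-batch
# in the Spark pipeline.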
Direction
GALLEGO FONTENLA, VICTOR JOSE (Tutorships)
Cereijo García, Pablo (Co-tutorships)
Court
MUCIENTES MOLINA, MANUEL FELIPE (Chairman)
IGLESIAS RODRIGUEZ, ROBERTO (Secretary)
RIOS VIQUEIRA, JOSE RAMON (Member)
Characterization of oceanographic conditions through dynamic clustering of the wave spectrum
Authorship
M.G.L.
Master in Massive Data Analysis Technologies: Big Data
Defense date
07.16.2025 16:30
Summary
This work presents a methodology for characterizing oceanographic conditions through the clustering of two-dimensional wave spectra. A nonparametric Bayesian model based on Switching Linear Dynamical Systems over Gaussian Processes (SLDS-GP), combined with a Hierarchical Dirichlet Process (HDP), was employed. This approach automatically identifies patterns in sea-wave conditions without predefining the number of groups, adapting to the complexity of the observed data. The methodology was applied to spectral data from a buoy of the National Data Buoy Center (NDBC) located off the coast of North Carolina (USA), covering the years 2017 and 2018. The analysis identified twelve clusters representative of the marine dynamics in the area. A spectral partitioning procedure was then applied to each cluster to recognize the sea systems present and classify them according to their physical meaning (wind sea, swell, or transitional sea). The results show that the model is capable of detecting coherent structures in the two-dimensional frequency-direction space, as well as capturing certain transition dynamics between states, offering an interpretable representation of wave conditions.
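As a pointer to what spectral partitioning involves, the sketch below extracts candidate sea systems as local maxima of a two-dimensional spectrum; the actual procedure then grows each peak into a region before classifying it as wind sea, swell, or transitional sea. Array shapes and the noise floor are illustrative assumptions.

import numpy as np
from scipy.ndimage import label, maximum_filter

def candidate_systems(E, floor_ratio=0.05):
    # E: energy density over (frequency, direction) bins.
    # A bin is a candidate peak if it is the maximum of its 3x3
    # neighborhood and lies above a relative noise floor.
    peaks = (E == maximum_filter(E, size=3)) & (E > floor_ratio * E.max())
    labeled, n_systems = label(peaks)
    return np.argwhere(peaks), n_systems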
Direction
FELIX LAMAS, PAULO MANUEL (Tutorships)
RODRIGUEZ PRESEDO, JESUS MARIA (Co-tutorships)
Court
VIDAL AGUIAR, JUAN CARLOS (Chairman)
GARCIA POLO, FRANCISCO JAVIER (Secretary)
MERA PEREZ, DAVID (Member)
Biomedical Image Segmentation Based on Foundation Models Adapted Without Retraining and With Uncertainty Estimation
Authorship
F.G.S.
Master in Artificial Intelligence
Defense date
07.18.2025 09:30
Summary
Two important shortcomings limit the effectiveness of current learning-based solutions for biomedical image segmentation. One major issue is that new segmentation tasks typically demand the training or fine-tuning of new models, a resource-intensive process requiring significant machine learning expertise that is often beyond the reach of medical researchers and clinicians. The second critical limitation is that most existing segmentation methods yield only a single, deterministic segmentation mask, despite the considerable uncertainty often present regarding what constitutes correct segmentation. This uncertainty arises from both inherent data variability (aleatoric) and the model's own knowledge gaps (epistemic). This work specifically addresses the estimation of these uncertainties in the segmentation process. By understanding and quantifying these uncertainties, we can significantly increase the explainability and interpretability of segmentation models, enabling more confident and informed decision-making in vital medical applications. We propose to develop a generalized method to analyze these different uncertainty types without requiring model retraining.
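One retraining-free way to obtain such estimates, shown here as a sketch rather than the method finally adopted, is test-time augmentation: run the frozen segmentation model on several perturbed copies of the input and read uncertainty off the spread of the predictions.

import torch

@torch.no_grad()
def tta_uncertainty(model, x, n=8, noise=0.05):
    # model: frozen network returning per-pixel logits for a binary mask.
    # Perturb the input n times, average the probabilities, and compute
    # the pixelwise predictive entropy as an uncertainty map.
    probs = []
    for _ in range(n):
        probs.append(torch.sigmoid(model(x + noise * torch.randn_like(x))))
    p = torch.stack(probs).mean(0)
    eps = 1e-8
    entropy = -(p * (p + eps).log() + (1 - p) * (1 - p + eps).log())
    return p, entropy   # mean mask and uncertainty map

Disagreement across input perturbations loosely tracks aleatoric effects, while ensembling distinct prompts or models targets the epistemic part.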
Direction
Pardo López, Xosé Manuel (Tutorships)
Court
IGLESIAS RODRIGUEZ, ROBERTO (Chairman)
CORRALES RAMON, JUAN ANTONIO (Secretary)
ALONSO MORAL, JOSE MARIA (Member)
Predictive analysis on time series for cost reduction in raw material procurement within the food industry.
Authorship
X.I.M.
Master in Massive Data Analysis Technologies: Big Data
Defense date
07.16.2025 17:00
Summary
SOSFood is a European initiative that proposes the use of massive data exploitation and Machine Learning technologies to provide a holistic view of the European food system, building personalized predictive tools that support actors in the food chain in making well-informed decisions. This work develops a predictive analysis framework based on the proposal and application of time series modeling to reduce the procurement costs of raw materials, specifically for a Greek company participating in the project. Forecasting models based on statistical methods from the ARIMA family and deep learning methods from the N-BEATS architecture are implemented on the use case. An analysis and comparison of results is carried out, assessing the reliability of the predictions and their possible incorporation into the decision support system.
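A minimal sketch of the statistical branch of the comparison, using statsmodels; the file name, column, and ARIMA order are placeholders (in practice the order would be selected via information criteria).

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical input: a monthly raw-material price series.
series = pd.read_csv("prices.csv", index_col=0, parse_dates=True).squeeze("columns")
train, test = series[:-12], series[-12:]

fit = ARIMA(train, order=(1, 1, 1)).fit()
forecast = fit.forecast(steps=len(test))
mape = np.mean(np.abs(forecast.values - test.values) / np.abs(test.values)) * 100
print(f"12-month MAPE: {mape:.1f}%")  # one reliability proxy for the forecasts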
Direction
FELIX LAMAS, PAULO MANUEL (Tutorships)
RODRIGUEZ PRESEDO, JESUS MARIA (Co-tutorships)
Court
VIDAL AGUIAR, JUAN CARLOS (Chairman)
GARCIA POLO, FRANCISCO JAVIER (Secretary)
MERA PEREZ, DAVID (Member)
Autotracking module for long-range aerial objects
Authorship
I.A.M.S.
Master in Artificial Intelligence
Defense date
07.18.2025 10:00
Summary
This thesis evaluates the tracking methods integrated in MMTracking, an open-source library that provides implementations of single-object and multiple-object trackers, exploring the capabilities of each tracker in detail to identify the method that best captures relevant information about the moving object. The evaluation was performed on well-known benchmarks such as MOTChallenge and OTB2015, which provide diverse conditions and scenarios. The results led to a comprehensive analysis of each method, showing which tracker handles each scenario best. Additionally, this study contributes to the ongoing research on tracking algorithms by providing insights and identifying areas for improvement.
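For reference, OTB-style evaluation reduces to intersection-over-union between predicted and ground-truth boxes; a self-contained sketch of the success score (boxes given as x, y, w, h):

import numpy as np

def iou(a, b):
    # Intersection-over-union of two boxes in (x, y, w, h) format.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def success_auc(pred, gt, thresholds=np.linspace(0, 1, 21)):
    # OTB-style success: fraction of frames whose IoU exceeds each
    # threshold, averaged over thresholds (area under the success curve).
    ious = np.array([iou(p, g) for p, g in zip(pred, gt)])
    return np.mean([(ious > t).mean() for t in thresholds])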
Direction
MUCIENTES MOLINA, MANUEL FELIPE (Tutorships)
Blanco Freire, Lara (Co-tutorships)
Dago Casas, Pablo (Co-tutorships)
Court
IGLESIAS RODRIGUEZ, ROBERTO (Chairman)
CORRALES RAMON, JUAN ANTONIO (Secretary)
ALONSO MORAL, JOSE MARIA (Member)
KIME: Kumite Intelligent Movement Evaluation
Authorship
H.M.C.
Master in Artificial Intelligence
Defense date
07.18.2025 10:30
Summary
This thesis addresses the challenge of objectively analyzing light-contact Karate (Kumite) fights, where rapid and precise techniques must be judged in real time, by harnessing computer vision and deep learning solely from video recordings. Traditional scoring relies on human referees, which introduces subjectivity, potential bias, and limited capacity to process large volumes of footage for athlete scouting and performance evaluation. To overcome these limitations, three interrelated components were developed. First, a data extraction pipeline was devised to locate and segment moments of interest in full-length match videos. By combining scoreboard change detection via a lightweight CNN and manual validation, a curated dataset of scoring and non-scoring events was generated. Second, a workflow was created to distinguish the two fighters, Aka and Ao, through tatami boundary detection, person detection, instance segmentation, and color-based filtering. Object tracking was then applied to reduce computational load while maintaining identity consistency across frames, resulting in a validated classification dataset. Finally, transfer learning strategies were explored for classifying individual frames as scoring or non-scoring actions and assigning the correct athlete and point value. Two approaches were compared: freezing the feature extractor versus fine-tuning upper layers of a pretrained image classifier. The frozen-backbone model demonstrated superior generalization and achieved a low false-positive rate, an attribute essential for real-world integration into semi-automated judging or analytics systems. Overall, this work demonstrates the feasibility of a non-intrusive, video-only solution for Kumite analysis and lays a foundation for further development toward real-time deployment, enhanced explainability, and expanded tactical insights.
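The frozen-backbone variant can be illustrated in a few lines of PyTorch; the specific backbone (a ResNet-18 here) and head are stand-ins for the pretrained classifier used in the thesis.

import torch.nn as nn
from torchvision import models

def build_frozen_classifier(n_classes):
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in net.parameters():
        p.requires_grad = False                        # backbone stays fixed
    net.fc = nn.Linear(net.fc.in_features, n_classes)  # only the new head trains
    return net

Training only the head both cuts compute and, as the results above indicate, reduces overfitting on a modest dataset.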
Direction
MUCIENTES MOLINA, MANUEL FELIPE (Tutorships)
RODRIGUEZ FERNANDEZ, ISMAEL (Co-tutorships)
Court
IGLESIAS RODRIGUEZ, ROBERTO (Chairman)
CORRALES RAMON, JUAN ANTONIO (Secretary)
ALONSO MORAL, JOSE MARIA (Member)
Analysis of Federated Learning Strategies Robust to Heterogeneous Data
Authorship
R.M.E.
Master in Massive Data Analysis Technologies: Big Data
Defense date
07.15.2025 16:00
Summary
Federated Learning (FL) enables decentralized training across clients without exposing raw data, but suffers under heterogeneous data distributions. Traditional algorithms often fail to generalize globally or to personalize locally. In this work, we present FLProtector, a dual-model FL framework where each client learns a local increment over a shared global model and dynamically selects which model to use at inference time. This selection is performed via a client-specific autoencoder, trained to detect out-of-distribution inputs. To improve robustness, FLProtector also incorporates a gradient consistency-based aggregation mechanism, which adaptively downweights updates from clients that deviate from the expected optimization path. We evaluate FLProtector under varying degrees of heterogeneity using the challenging Digit-Five benchmark. Results show that it consistently outperforms standard FL methods and state-of-the-art personalized approaches, achieving a superior balance between personalization and generalization. The system proves robust even in the presence of adversarial clients, and ablation studies confirm the complementary roles of its core components. Finally, the approach demonstrates competitive performance without requiring sensitive hyperparameter tuning, making it a practical solution for real-world FL deployments.
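The aggregation idea can be sketched as follows, assuming each client update has been flattened into a vector: weights come from cosine similarity to the consensus direction, clamped at zero so strongly deviating (possibly adversarial) updates are discarded. This is an illustrative reduction of the mechanism, not the exact FLProtector rule.

import torch
import torch.nn.functional as F

def consistency_weighted_average(updates):
    # updates: list of 1-D tensors, one flattened model delta per client.
    stacked = torch.stack(updates)
    consensus = stacked.mean(dim=0)
    sims = torch.stack([F.cosine_similarity(u, consensus, dim=0) for u in updates])
    w = sims.clamp_min(0.0)            # opposing updates get zero weight
    w = w / w.sum().clamp_min(1e-8)
    return (w[:, None] * stacked).sum(dim=0)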
Direction
IGLESIAS RODRIGUEZ, ROBERTO (Tutorships)
GARCIA POLO, FRANCISCO JAVIER (Co-tutorships)
Court
RODRIGUEZ PRESEDO, JESUS MARIA (Chairman)
Triñanes Fernández, Joaquín Ángel (Secretary)
GALLEGO FONTENLA, VICTOR JOSE (Member)
Simulation of Basketball Strategies through Reinforcement Learning in a Board Game Environment
Authorship
L.M.L.
Master in Massive Data Analysis Technologies: Big Data
Defense date
07.15.2025 17:00
Summary
Reinforcement Learning is an advanced artificial intelligence technique that allows an agent to learn to make optimal decisions through interaction with its environment and feedback received in the form of rewards. This Master’s Thesis explores its application in the sports domain, specifically in basketball, through the development of a simulated environment implemented with a modular architecture. For this purpose, the Gymnasium library was used for defining the environment, Pygame for visualization, and MLflow for managing and tracking experiments, which facilitated the analysis of the agent. The environment, structured on a board divided into cells representing the basketball court, allows players to move, pass the ball, or shoot according to defined rules. The simulation was developed through four progressive versions, each introducing a higher level of complexity: from an isolated agent without opposition, through the incorporation of passing actions and shooting probabilities, to an advanced version with active defense, differentiated functional roles, and real data extracted from the NBA. Throughout training, the agent showed a clear evolution: from impulsive actions such as immediate shots to more elaborate strategies in the advanced versions, prioritizing ball possession, searching for spaces, and selecting the most suitable player to shoot. To evaluate this progression, specific metrics were defined such as average reward per episode, duration of plays, percentage of early shots, and heat maps showing the spatial distribution of key events. The integration of real NBA data allowed analysis of the agent’s ability to adapt its decisions to functional profiles such as point guards, forwards, or centers, demonstrating a partial understanding of basketball’s tactical logic. Although the system presents certain limitations, such as the mapping of real data to the board or excessive defensive penalties, the results indicate the viability of reinforcement learning to simulate tactical sports behaviors. Moreover, this work establishes a solid foundation for future expansions, including multi-agent training with decentralized coordination, autonomous defensive learning, or the simulation of complete game sequences including restarts, fouls, and possession changes. All of this brings the model closer to a more functional and realistic representation of basketball.
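A stripped-down sketch of such a board environment in Gymnasium; the grid size, action set, and shot-probability rule are invented for illustration (the thesis's versions add passing, active defense, roles, and NBA-derived probabilities).

import gymnasium as gym
import numpy as np
from gymnasium import spaces

class CourtEnv(gym.Env):
    def __init__(self, rows=5, cols=7):
        self.rows, self.cols = rows, cols
        self.observation_space = spaces.MultiDiscrete([rows, cols])
        self.action_space = spaces.Discrete(5)       # 4 moves + shoot
        self.basket = (rows // 2, cols - 1)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = np.array([self.rows // 2, 0])
        return self.pos.copy(), {}

    def step(self, action):
        if action < 4:                                 # up, down, left, right
            d = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
            self.pos = np.clip(self.pos + d, 0, [self.rows - 1, self.cols - 1])
            return self.pos.copy(), 0.0, False, False, {}
        # Shoot: success probability decays with distance to the basket.
        dist = abs(self.pos[0] - self.basket[0]) + abs(self.pos[1] - self.basket[1])
        made = self.np_random.random() < max(0.1, 1.0 - 0.15 * dist)
        return self.pos.copy(), 2.0 if made else -0.5, True, False, {}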
Direction
Losada Carril, David Enrique (Tutorships)
González López, Manuel (Co-tutorships)
Court
RODRIGUEZ PRESEDO, JESUS MARIA (Chairman)
Triñanes Fernández, Joaquín Ángel (Secretary)
GALLEGO FONTENLA, VICTOR JOSE (Member)
Optimization of Human Resource Management: Assignment, Feasibility, and Performance Analysis in Projects
Authorship
B.M.V.
Master in Massive Data Analysis Technologies: Big Data
Defense date
07.16.2025 17:30
Summary
This Master's Thesis aims to design and implement a system for the efficient allocation of human resources to business projects, based on advanced data analysis and machine learning techniques. The system addresses a real planning problem in dynamic environments by integrating multiple variables such as staff skills, workload, execution times, and previous experience to provide personalized task assignment recommendations. It is based on data extracted from Jira, a project management platform, applying techniques for data cleaning, anonymization, and semantic enrichment. The prediction model is structured into several functional modules: time estimation, skills classification, generation of employee technical profiles, and prediction of optimal candidates. The system architecture is supported by a PostgreSQL relational database and is orchestrated through automated scripts deployed with Docker, allowing for manual or scheduled execution. The generated results are presented in an Excel file with structured sheets to facilitate analysis by managers, including filters by tasks, employees, and projects. The system evaluation was carried out in collaboration with a project manager, validating the usefulness of the recommendations in a real-world environment. The conclusions acknowledge the system’s limitations, such as the absence of subjective variables (motivations, preferences) and the need to improve the granularity in assigning experience by skill. Finally, future improvement directions are proposed, such as the integration of a work calendar, the use of interactive dashboards (Power BI), the expansion of the pre-labeled dataset, and the adoption of advanced orchestration tools like Apache Airflow. This project represents a modular, scalable, and data-driven solution to optimize human talent management in organizations, with a practical approach focused on strategic decision-making.
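The candidate-prediction module boils down to ranking employees by a weighted score over the variables listed above; a toy sketch with invented fields and weights:

def score_candidate(task, emp, w_skill=0.5, w_load=0.3, w_exp=0.2):
    # Skill overlap, spare capacity, and prior similar work, each
    # normalized to [0, 1] and combined linearly.
    skill = len(task["skills"] & emp["skills"]) / max(1, len(task["skills"]))
    spare = 1.0 - min(1.0, emp["hours_assigned"] / emp["hours_capacity"])
    exp = min(1.0, emp["similar_tasks_done"] / 10)
    return w_skill * skill + w_load * spare + w_exp * exp

task = {"skills": {"python", "sql"}}
staff = [
    {"name": "A", "skills": {"python"}, "hours_assigned": 30,
     "hours_capacity": 40, "similar_tasks_done": 8},
    {"name": "B", "skills": {"python", "sql"}, "hours_assigned": 38,
     "hours_capacity": 40, "similar_tasks_done": 2},
]
print(sorted(staff, key=lambda e: score_candidate(task, e), reverse=True)[0]["name"])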
Direction
Sánchez Vila, Eduardo Manuel (Tutorships)
Ramos Macías, Óscar (Co-tutorships)
Court
MUCIENTES MOLINA, MANUEL FELIPE (Chairman)
IGLESIAS RODRIGUEZ, ROBERTO (Secretary)
RIOS VIQUEIRA, JOSE RAMON (Member)
Development of a Computer Vision-Based Tool for the Automatic Detection of Basketball Shots and Court Position Analysis.
Authorship
A.M.R.
Master in Artificial Intelligence
Defense date
07.17.2025 09:30
Summary
This work presents a modular computer vision system for automatic detection of basketball shots and court position analysis using single-camera footage. Aimed at democratizing access to sports analytics and reducing reliance on manual annotation, the tool integrates state-of-the-art object detection (YOLO and RT-DETR), tracking (ByteTrack), and homography-based court mapping to position players. It detects shot attempts, classifies outcomes (made/missed), assigns possession, and generates both annotated videos and structured datasets. Evaluated on real amateur videos, the system demonstrates robust performance across spatial, temporal, and classification metrics. These results highlight its potential as a practical and accessible solution for automated basketball analytics.
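Court mapping rests on a single homography estimated from a few known landmarks; a sketch with OpenCV (the pixel and court coordinates below are made up):

import cv2
import numpy as np

# Four image points of known court landmarks and their real-court
# coordinates in meters.
img_pts = np.float32([[102, 410], [1180, 395], [900, 640], [300, 655]])
court_pts = np.float32([[0, 0], [15, 0], [11, 7], [4, 7]])

H, _ = cv2.findHomography(img_pts, court_pts)

def to_court(feet_xy):
    # Project a player's foot point from pixels to court coordinates.
    p = cv2.perspectiveTransform(np.float32([[feet_xy]]), H)
    return p[0, 0]

print(to_court((600, 500)))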
Direction
MUCIENTES MOLINA, MANUEL FELIPE (Tutorships)
MALLO ANTELO, JAIME (Co-tutorships)
Court
Taboada González, José Ángel (Chairman)
MERA PEREZ, DAVID (Secretary)
CONDORI FERNANDEZ, OLINDA NELLY (Member)
State-of-the-Art Voice Models for Galician Language Using a Small-to-Medium TTS Dataset
Authorship
A.M.S.
Master in Artificial Intelligence
Defense date
07.17.2025 10:00
Summary
Text-to-speech (TTS) synthesis plays a crucial role in human-computer interaction and remains a hot research topic in the speech technology and machine learning communities. With advances in deep learning techniques and increased computing power, deep neural network-based TTS systems have emerged as a powerful alternative to traditional methods. Recently, end-to-end deep learning TTS models have produced impressive natural-sounding and high-quality results. However, extending these models to multiple languages and speakers is challenging, especially for low-to-medium resource languages such as Galician. In our study, we use an open small-to-medium Galician TTS dataset to train different voice models in Galician. We also apply synthetic data generation to address identified shortcomings in the original dataset. We explore state-of-the-art architectures, including training from scratch and transfer learning techniques. The resulting models are validated and compared through subjective and automatic evaluations.
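One simple dataset audit that can expose the kind of shortcomings mentioned above is character (or phoneme) coverage: symbols too rare for the model to learn reliably are natural targets for synthetic data generation. A sketch, not the thesis's actual analysis:

from collections import Counter

def rare_symbols(transcripts, min_share=1e-4):
    # Count symbol frequencies over the corpus transcripts and flag
    # those whose relative frequency falls below min_share.
    counts = Counter(ch for t in transcripts for ch in t.lower())
    total = sum(counts.values())
    return sorted((ch, n) for ch, n in counts.items() if n / total < min_share)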
Direction
BUGARIN DIZ, ALBERTO JOSE (Tutorships)
MAGARIÑOS IGLESIAS, MARIA DEL CARMEN (Co-tutorships)
Court
Taboada González, José Ángel (Chairman)
MERA PEREZ, DAVID (Secretary)
CONDORI FERNANDEZ, OLINDA NELLY (Member)
Anomaly Detection Using Autoencoder Models in Industrial Environments
Authorship
F.M.S.
Master in Artificial Intelligence
Defense date
07.17.2025 10:30
Summary
The increasing connectivity and automation in Industry 4.0 environments have introduced new challenges for ensuring operational reliability and security. Anomaly detection plays a crucial role in identifying failures and cyberattacks that could compromise production systems. This work investigates the use of autoencoder-based models for unsupervised anomaly detection in both network traffic and sensor data, collected from a simulated cocktail production system. A fully-connected autoencoder is employed to detect deviations in Modbus network flows, while a sequence-to-one LSTM-autoencoder is used to model temporal patterns in multivariate sensor streams. Both models are trained on normal data and evaluated under realistic attack scenarios, including Modbus register tampering and SYN Flood denial-of-service. Experimental results demonstrate that autoencoders can effectively detect anomalies in industrial settings, with LSTM-based models offering improved performance in environments with cyclic behavior.
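A compact sketch of the sequence-to-one idea in PyTorch: the encoder summarizes a sensor window in its final hidden state, the decoder reconstructs only the last reading, and the reconstruction error is the anomaly score. Layer sizes are illustrative.

import torch
import torch.nn as nn

class Seq2OneAE(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, n_features)

    def forward(self, x):              # x: (batch, window, n_features)
        _, (h, _) = self.encoder(x)
        return self.decoder(h[-1])     # reconstruction of the last step

def anomaly_score(model, x):
    # Error against the true last reading; thresholded with a high
    # percentile computed on attack-free training data.
    with torch.no_grad():
        return ((model(x) - x[:, -1, :]) ** 2).mean(dim=1)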
Direction
CARIÑENA AMIGO, MARIA PURIFICACION (Tutorships)
Pérez Vilarelle, Laura (Co-tutorships)
Court
Taboada González, José Ángel (Chairman)
MERA PEREZ, DAVID (Secretary)
CONDORI FERNANDEZ, OLINDA NELLY (Member)
Advanced Evolution of a Formula 1 Data System: Predictive Analytics and New Metrics
Authorship
L.P.M.
Master in Massive Data Analysis Technologies: Big Data
Defense date
07.15.2025 17:30
Summary
This project presents the development of a comprehensive analysis and prediction system focused on the world of Formula 1, combining machine learning techniques, data processing, and interactive visualization. It emerges as a response to the lack of open, accessible, and technically robust tools that allow for the exploration of driver and team performance from an analytical perspective. The proposed solution is structured around a complete and up-to-date database, designed to integrate technical, meteorological, and contextual information about each race session. It is fed through automated processes that enable continuous data collection, cleaning, and normalization. Based on this foundation, regression models have been built to estimate both qualifying times and ideal lap times, using advanced algorithms such as XGBoost, Random Forest, and multilayer neural networks. These models are trained and validated using robust techniques, taking into account the variable and dynamic nature of data in Formula 1. In addition to predictive models, the system includes a set of custom metrics that allow for evaluating driver performance, identifying systematic deviations from expectations, and analyzing strategic decisions such as tire choices and race efficiency. These indicators offer a richer and more contextualized perspective than traditional metrics. All analysis is integrated into an interactive interface developed with Streamlit, which connects in real time to a database hosted on Snowflake. This tool allows for the visual exploration of both predictions and metrics, adapting to different user profiles, from technical staff to enthusiasts.
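The regression setup can be summarized as follows, here as a sketch with scikit-learn's Random Forest; the file and feature names are placeholders for the system's actual schema, and the chronological split keeps future races out of training.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("laps.csv", parse_dates=["session_date"]).sort_values("session_date")
features = ["air_temp", "track_temp", "rain", "team_form", "driver_form"]
cut = int(len(df) * 0.8)                    # train on the past only
train, test = df.iloc[:cut], df.iloc[cut:]

model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(train[features], train["quali_time_s"])
pred = model.predict(test[features])
print("MAE (s):", mean_absolute_error(test["quali_time_s"], pred))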
Direction
MUCIENTES MOLINA, MANUEL FELIPE (Tutorships)
Curra Durán, Alberto (Co-tutorships)
Court
RODRIGUEZ PRESEDO, JESUS MARIA (Chairman)
Triñanes Fernández, Joaquín Ángel (Secretary)
GALLEGO FONTENLA, VICTOR JOSE (Member)
Scraping and Authorship Analysis: Identifying Writing Patterns in Online Forums Using Machine Learning Models
Authorship
R.R.J.
Master in Massive Data Analysis Technologies: Big Data
Defense date
07.16.2025 16:00
Summary
In a digital environment characterized by anonymity and the rise of fake identities, authorship attribution emerges as an essential tool for inferring who is behind a text based on its linguistic style. This work addresses authorship attribution in online forums, with a particular focus on users involved in discussions about cryptocurrency transactions. To this end, data was collected using web scraping techniques, selecting posts from previously identified authors. The texts were preprocessed and represented using character n-grams, applying vectorization schemes such as TF-IDF. Different classification models were then trained and evaluated, ranging from traditional approaches such as SVM, Rocchio, or Random Forest to deep language models such as BERT. The results allow for comparing the performance of the different models and analyzing their ability to identify persistent stylistic patterns even when users operate with disposable accounts or multiple aliases.
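The core of the traditional pipeline fits in a few lines of scikit-learn; character n-grams capture style (spelling habits, punctuation) rather than topic, which is what persists across aliases. The example posts and labels are invented.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

posts = ["buying btc, escrow only pls", "selling xmr!! dm me asap",
         "escrow only. no exceptions"]
authors = ["u1", "u2", "u1"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ("svm", LinearSVC()),
])
clf.fit(posts, authors)
print(clf.predict(["no exceptions, escrow only"]))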
Direction
Losada Carril, David Enrique (Tutorships)
Pérez Vilarelle, Laura (Co-tutorships)
Court
MUCIENTES MOLINA, MANUEL FELIPE (Chairman)
IGLESIAS RODRIGUEZ, ROBERTO (Secretary)
RIOS VIQUEIRA, JOSE RAMON (Member)
Development of an AI-assisted Data Operating System (Data OS)
Authorship
A.R.P.
Master in Massive Data Analysis Technologies: Big Data
Defense date
07.16.2025 17:30
Summary
This project presents the implementation of a metadata-driven ingestion system built on the Microsoft Fabric platform. The main objective was to migrate and adapt a framework originally developed in Azure Data Factory, preserving its functional logic while leveraging the native capabilities of the new environment. The system is designed to handle data loads from heterogeneous sources into a cloud-based analytical environment. The entire process is metadata-driven, using structured external files to define the behavior of each object without modifying execution logic. The document includes a detailed analysis of the original framework, the technical migration process, the improvements implemented, and the results obtained, as well as a proposal for future steps to continue evolving the system within the corporate ecosystem.
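The metadata-driven principle reduces to a runner that reads object definitions and dispatches loads without per-table code; a sketch with an invented metadata schema (the framework's real fields and Fabric activities differ):

import json

def run_ingestion(metadata_path):
    with open(metadata_path) as f:
        objects = json.load(f)
    for obj in objects:
        if not obj.get("enabled", True):
            continue   # behavior is toggled in metadata, not in code
        print(f"loading {obj['source']}.{obj['object']} -> {obj['target']} "
              f"(mode={obj['load_mode']})")
        # Here the platform's copy activity or notebook would be invoked
        # for obj; the runner itself never changes when objects are added.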
Direction
RIOS VIQUEIRA, JOSE RAMON (Tutorships)
Acedo Nieto, Carlos (Co-tutorships)
Martínez Torres, María (Co-tutorships)
Court
VIDAL AGUIAR, JUAN CARLOS (Chairman)
GARCIA POLO, FRANCISCO JAVIER (Secretary)
MERA PEREZ, DAVID (Member)
IoT system for opportunistic communications in maritime-shore environments
Authorship
R.J.S.G.
Master's Degree in Internet of Things - IoT
Defense date
07.14.2025 11:30
Summary
Although digitisation and IoT are reaching many sectors of society and industry, the primary sector, and especially its more traditionalist parts such as shellfishing, is lagging behind in the adoption of new technologies. Resistance to change and the difficulty of deploying solutions at sea mean that mussel rafts lack any digital monitoring or control mechanisms, which makes the sector extremely inefficient. This work proposes an IoT system that can withstand the extreme conditions of the sea while monitoring variables of interest. In particular, the focus was placed on monitoring mussel production on the rafts by measuring their sinking, which is caused by the fattening of the product. The system was developed through several successive phases of design, implementation and testing. The end result is a device that has proven able to send data reliably from the sea, using LoRaWAN communication technology together with an energy harvesting and storage system based on solar panels. The transmitted data is stored in a time series database and can be visualised through a control panel. All of this follows the Opportunistic Edge Computing paradigm: information is processed as close as possible to the end devices, avoiding overloading Cloud infrastructures and allowing flexible coverage thanks to the possible movement of Cloud access gateways to areas with no coverage.
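Since LoRaWAN payloads are limited to a few dozen bytes, readings such as the raft's sinking depth are typically packed into a compact binary frame before uplink. A minimal sketch using only the Python standard library follows; the field layout and scaling are assumptions, not the device's actual frame format.

    # Hypothetical compact uplink frame: device id (1 byte), sinking depth
    # in centimetres (2 bytes), battery voltage in hundredths of a volt
    # (2 bytes), giving a 5-byte payload on air.
    import struct

    def encode_uplink(device_id: int, depth_m: float, battery_v: float) -> bytes:
        return struct.pack(">BHH", device_id,
                           int(depth_m * 100),     # metres -> centimetres
                           int(battery_v * 100))   # volts  -> hundredths

    def decode_uplink(frame: bytes) -> dict:
        device_id, depth_cm, batt = struct.unpack(">BHH", frame)
        return {"device": device_id,
                "depth_m": depth_cm / 100,
                "battery_v": batt / 100}

    frame = encode_uplink(7, depth_m=1.42, battery_v=3.81)
    print(len(frame), decode_uplink(frame))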
Direction
Cotos Yáñez, José Manuel (Tutorships)
Fernández Caramés, Tiago (Co-tutorships)
Court
CARIÑENA AMIGO, MARIA PURIFICACION (Chairman)
Burguillo Rial, Juan Carlos (Secretary)
Pardo Martínez, Xoan Carlos (Member)
Detection of Hate Speech in Spanish Texts Published on Social Media
Authorship
O.T.P.
Master in Masive Data Analisys Tecnologies: Big Data
Defense date
07.15.2025 16:30
Summary
The rise of hate speech on social media platforms poses serious technological and social challenges. This study presents a comprehensive approach to the automatic detection of hate speech in Spanish texts, with a focus on misogyny and anti-immigrant sentiment. We carry out an in-depth linguistic and semantic analysis. Additionally, we evaluate multiple classification strategies, ranging from classical machine learning models to Transformer-based architectures and Large Language Models (LLMs). Furthermore, we explore the impact of different preprocessing strategies and apply a domain adaptation technique based on guided masking using hate-related lexicons. Experiments are conducted on two benchmark datasets, AMI (IberEval 2018) and HatEval (SemEval 2019), yielding results that outperform both the original competition winners and subsequent work. Our findings highlight the importance of preprocessing, robust hyperparameter search, and adaptation mechanisms in improving classification performance. Moreover, LLMs such as GPT-4o achieve competitive results without task-specific fine-tuning, and adapted general-purpose models show measurable improvements over their non-adapted counterparts.
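The guided-masking adaptation mentioned above continues masked-language-model pretraining while preferentially masking terms drawn from hate-related lexicons. A toy sketch of the masking step itself, with a harmless placeholder lexicon and mask token rather than any real resource:

    # Toy lexicon-guided masking: tokens found in the lexicon are masked
    # with a higher probability than other tokens, biasing MLM adaptation
    # toward domain vocabulary. The lexicon entries are placeholders.
    import random

    LEXICON = {"termA", "termB"}   # stand-ins for hate-related lexicon entries
    MASK, P_GUIDED, P_RANDOM = "[MASK]", 0.5, 0.05

    def guided_mask(tokens: list[str], rng: random.Random) -> list[str]:
        out = []
        for tok in tokens:
            p = P_GUIDED if tok.lower() in LEXICON else P_RANDOM
            out.append(MASK if rng.random() < p else tok)
        return out

    rng = random.Random(0)
    print(guided_mask("this post contains termA and ordinary words".split(), rng))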
Direction
FERNANDEZ PICHEL, MARCOS (Tutorships)
ARAGON SAENZPARDO, MARIO EZRA (Co-tutorships)
Court
RODRIGUEZ PRESEDO, JESUS MARIA (Chairman)
Triñanes Fernández, Joaquín Ángel (Secretary)
GALLEGO FONTENLA, VICTOR JOSE (Member)
Optimization of hotel review categorization using machine learning techniques and analysis of their impact on decision-making in the Quality department
Authorship
A.V.R.
Master in Masive Data Analisys Tecnologies: Big Data
Defense date
07.15.2025 18:00
Summary
In the tourism sector, the analysis and classification of customer reviews are essential for establishments to improve their services, optimize user experience, and maintain competitiveness in an increasingly demanding market. Due to the high volume of reviews generated daily, their linguistic diversity, and the semantic complexity of the content, conducting an exhaustive manual analysis is unfeasible. Therefore, it is crucial to have automatic categorization systems capable of efficiently processing and classifying this information. This work builds on an initial hierarchical multi-label automatic categorization system, focused on maximizing the detection of all relevant aspects within the reviews. A set of progressive improvements is proposed, including the expansion of the training set, the update of embedding models, and the integration of techniques based on advanced language models. These improvements lead to a more balanced system that significantly increases the ability to detect relevant categories without compromising classification quality, clearly outperforming the initial setup and providing greater value for decision-making in the continuous improvement of services.
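As a flat (non-hierarchical) illustration of the kind of multi-label categorizer the work builds on, the following scikit-learn sketch trains one binary classifier per category over TF-IDF features; the reviews, categories and model choices are invented and do not reproduce the thesis's hierarchical system or its embedding models.

    # Minimal multi-label review categorizer: one binary classifier per
    # category (one-vs-rest) over TF-IDF features. Data is invented.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MultiLabelBinarizer

    reviews = ["room was clean but breakfast was cold",
               "great pool, noisy neighbours at night"]
    labels = [["cleanliness", "food"], ["facilities", "noise"]]

    # Encode label sets as a binary indicator matrix, one column per category.
    mlb = MultiLabelBinarizer()
    y = mlb.fit_transform(labels)

    model = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("ovr", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
    ])
    model.fit(reviews, y)
    pred = model.predict(["breakfast was excellent and the room spotless"])
    print(mlb.inverse_transform(pred))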
Direction
MUCIENTES MOLINA, MANUEL FELIPE (Tutorships)
Comesaña García, Alejandra (Co-tutorships)
Court
RODRIGUEZ PRESEDO, JESUS MARIA (Chairman)
Triñanes Fernández, Joaquín Ángel (Secretary)
GALLEGO FONTENLA, VICTOR JOSE (Member)