Development of a profiling and analysis tool for real-time Big Data applications
Authorship
A.B.P.
Bachelor’s Degree in Informatics Engineering
Defense date
02.20.2025 11:45
Summary
With the exponential advancement of technology, increasingly large amounts of data are being generated and managed. To handle this enormous volume of data, various tools have been developed over the years in the field of Big Data. One of the most relevant techniques is High-Performance Computing (HPC), as it enables efficient use of the available resources in computing clusters to accelerate the processing and analysis of large datasets. The IgnisHPC framework combines HPC and Big Data tasks, allowing the development of multi-language applications in which processing is parallelized. Other Big Data frameworks, such as Spark and Hadoop, provide web interfaces that greatly facilitate the visualization of cluster states. Therefore, this final degree project will develop a profiling tool that extracts data from each of the jobs a user has launched and displays them in a web interface, significantly improving the analysis of task execution status. The first step will be to choose a database for storing the interface data. Then, the frontend (the application's user interface) will be developed, after an initial discussion of the most suitable web technology. Next, the backend will be implemented to handle requests from IgnisHPC, ensuring the database is updated and the changes are reflected in the interface. Finally, functions will be added to IgnisHPC to send HTTP requests to the interface's backend.
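As an illustration of the final step, here is a minimal sketch of how a job event could be reported to the interface's backend over HTTP. The endpoint, port, and payload fields are hypothetical, chosen for the example; they are not the actual IgnisHPC profiling API.

```python
import requests  # assumes the `requests` package is available

# Hypothetical endpoint and payload; the real IgnisHPC backend API may differ.
BACKEND_URL = "http://localhost:8080/api/jobs"

def report_task_event(job_id, task_id, state, elapsed_ms):
    """Send one task-state update so the backend can refresh the database."""
    payload = {
        "job": job_id,
        "task": task_id,
        "state": state,          # e.g. "RUNNING", "FINISHED", "FAILED"
        "elapsed_ms": elapsed_ms,
    }
    # The backend persists the event and the web interface reflects the change.
    requests.post(BACKEND_URL, json=payload, timeout=5)
```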
Direction
PICHEL CAMPOS, JUAN CARLOS (Tutorships)
PIÑEIRO POMAR, CESAR ALFREDO (Co-tutorships)
Court
VAZQUEZ CENDON, MARIA ELENA (Chairman)
VARELA HERNANDEZ, ANXO (Secretary)
CONDORI FERNANDEZ, OLINDA NELLY (Member)
Search Technologies and Longitudinal Analysis of Song Lyrics: The Case of Bruce Springsteen's Discography
Authorship
M.B.C.
Bachelor’s Degree in Informatics Engineering
Defense date
02.20.2025 12:15
Summary
Natural language processing (NLP) technologies have undergone significant evolution in recent years, driven by advances in model architectures such as Transformers and access to large volumes of data. These techniques enable the development of models capable of understanding, interpreting, and generating human-like text, facilitating applications such as emotion analysis, advanced conversational assistants, personalized recommendation systems, and automatic content generation. In this work, different text processing and information retrieval models are used to conduct an exploratory analysis of song lyrics and extract conclusions about Bruce Springsteen's discography. This analysis is particularly interesting due to the artist's extensive musical career and the emotional complexity of his persona: the artist himself has stated in various interviews that he experienced several depressive episodes throughout his life and has always attempted to reflect his emotional state in his work. This type of study is transferable to other artists; however, Springsteen was chosen due to the vast amount of available data and the ability to validate the obtained results against information he has publicly shared. The development of this project presents several technological challenges, starting with the automatic extraction of the song corpus (for which focused crawling and information extraction methods were employed), the development of search technologies (allowing users to perform free-text queries on song lyrics), and subsequently, advanced language analysis (to detect emotions or other psychological aspects in the lyrics, for instance). First, an indexer was created to allow song searches by title or lyrics, returning the most relevant results in a reasonable time. Then, different variants of the state-of-the-art BERT model were applied to analyze and explore the discography. This includes the ability to generate song summaries, topic clouds, emotion analysis, and estimates of disorder indicators such as depression. These techniques were implemented as separate modules, and an interactive widget-based interface was developed so that users can easily filter and visualize information useful for their research. Beyond exploring the song corpus and enabling customized searches, the developed technology allows for analyzing potential correlations between the artist's emotional phases and his musical style, both in terms of lyrics (topics, emotions, etc.) and structure (rhythm and key). Based on the software results and music theory, and by comparing different periods of the artist's career, a certain correlation can be observed. In some cases, the differences between songs from a depressive period and those from a neutral or joyful period are quite pronounced.
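The work applies BERT variants for emotion analysis; as a sketch of that general pattern, the following uses a publicly available emotion classifier from Hugging Face. The specific model name is an assumption for illustration, not necessarily the one used in the thesis.

```python
from transformers import pipeline

# A public DistilRoBERTa emotion model; an assumption for illustration only.
classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None,  # return scores for all emotion classes
)

lyrics = "I'm driving a stolen car on a pitch black night..."
scores = classifier(lyrics[:512])[0]      # truncate to the model's input limit
print(sorted(scores, key=lambda s: -s["score"])[:3])  # top three emotions
```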
Direction
Losada Carril, David Enrique (Tutorships)
FERNANDEZ PICHEL, MARCOS (Co-tutorships)
Court
VAZQUEZ CENDON, MARIA ELENA (Chairman)
VARELA HERNANDEZ, ANXO (Secretary)
CONDORI FERNANDEZ, OLINDA NELLY (Member)
Yatter 2.0: Support for the translation of declarative rules between YARRRML and RML
Authorship
R.B.V.
Bachelor’s Degree in Informatics Engineering
Defense date
02.21.2025 10:30
Summary
This project covers an improvement of the code and an expansion of the functionalities of the Yatter tool. This application facilitates the generation of RML mappings, which are used to produce RDF content in projects related to artificial intelligence and the semantic web, from YARRRML, a language designed to create mappings in a more developer-friendly format. The implemented improvements optimize performance, make the code more readable and modular, and expand its ability to handle data and cover more complex and diverse mappings.
Direction
CHAVES FRAGA, DAVID (Tutorships)
Court
CHAVES FRAGA, DAVID (Student’s tutor)
Production plant of formic acid from methanol and carbon monoxide
Authorship
P.C.L.
Bachelor's Degree in Chemical Engineering
Defense date
02.10.2025 09:30
Summary
The project consists of the design of a plant for the production of 10,000 t/year of formic acid in two stages. The first stage consists of the carbonylation of methanol with carbon monoxide to give methyl formate. This compound is then hydrolysed in a second reactor to formic acid at 85% purity. Paula Campo López is in charge of the design of the R-202 formate hydrolysis reactor in Section 200. On the other hand, Alejandro de Prado González is responsible for the design of the T-302 distillation column in Section 300.
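For reference, the two stages described correspond to the classical methyl formate route:

\[
\mathrm{CH_3OH + CO \longrightarrow HCOOCH_3} \qquad \text{(carbonylation)}
\]
\[
\mathrm{HCOOCH_3 + H_2O \longrightarrow HCOOH + CH_3OH} \qquad \text{(hydrolysis)}
\]

so the methanol released in the hydrolysis step can, in principle, be recycled to the first reactor.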
Direction
FRANCO URIA, MARIA AMAYA (Tutorships)
SINEIRO TORRES, JORGE (Co-tutorships)
Court
HOSPIDO QUINTANA, ALMUDENA (Chairman)
GONZALEZ GARCIA, SARA (Secretary)
González Álvarez, Julia (Member)
Elliptic curves and applications in cryptography
Authorship
X.C.A.
Double Bachelor's Degree in Informatics Engineering and Mathematics
Defense date
07.02.2025 11:30
Summary
The aim of this work is to provide a thorough study of elliptic curves, a particular case of algebraic curves that has occupied a prominent position in several branches of mathematics, such as algebraic geometry and number theory, and that has found significant applications in modern cryptography. In order to provide a detailed analysis of both the theoretical aspects and their applications, this bachelor thesis begins by introducing key concepts and results from algebraic geometry that serve as the fundamental framework for the subsequent development. Next, the formal definition of an elliptic curve is presented, as well as its classification based on the j-invariant. After discussing one of its most notable properties, namely its group structure, the focus shifts to the theoretical properties of elliptic curves over finite fields, which are the main object of interest in the final part, where their practical application in cryptography is explored.
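For context, over a field of characteristic different from 2 and 3, every elliptic curve can be brought into short Weierstrass form, and the j-invariant mentioned above has a closed expression:

\[
E:\; y^2 = x^3 + ax + b, \qquad \Delta = -16\,(4a^3 + 27b^2) \neq 0,
\]
\[
j(E) = 1728\, \frac{4a^3}{4a^3 + 27b^2},
\]

and over an algebraically closed field two elliptic curves are isomorphic exactly when their j-invariants coincide.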
Direction
ALONSO TARRIO, LEOVIGILDO (Tutorships)
Court
GARCIA RODICIO, ANTONIO (Chairman)
CAO LABORA, DANIEL (Secretary)
Gómez Tato, Antonio M. (Member)
Design of a CO2 Capture Facility Integrated with a Ca(OH)2 Recovery Cycle
Authorship
M.C.D.
Bachelor's Degree in Chemical Engineering
Defense date
02.10.2025 10:10
Summary
This project aims to design a CO2 capture facility integrated with a Ca(OH)2 recovery cycle. The plant will be located in Curtis, in the province of A Coruña, specifically in the Curtis-Teixeiro industrial park. It will have a capacity of 1 Mt/year of CO2 with a purity of 97.12% and will operate continuously for 330 days per year, 24 hours per day. The process involves capturing CO2 directly from the atmosphere and, simultaneously, recovering CO2 from the combustion gases generated in the plant to meet its energy demands. To achieve this, chemical absorption using the carbonation-calcination cycle will be employed. The facility is divided into four main sections: absorption, reaction, calcination, and hydration. Paula Fariña will design the A-101 absorber in the absorption section, while María Chao will be responsible for designing the R-201 reactor in the reaction section.
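The four sections map naturally onto the standard reactions of the carbonation-calcination (calcium looping) cycle; a plausible summary, stated here for reference only:

\[
\mathrm{Ca(OH)_2 + CO_2 \longrightarrow CaCO_3 + H_2O} \qquad \text{(absorption)}
\]
\[
\mathrm{CaO + CO_2 \longrightarrow CaCO_3} \qquad \text{(carbonation)}
\]
\[
\mathrm{CaCO_3 \longrightarrow CaO + CO_2} \qquad \text{(calcination, yielding a concentrated stream)}
\]
\[
\mathrm{CaO + H_2O \longrightarrow Ca(OH)_2} \qquad \text{(hydration, closing the cycle)}
\]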
Direction
BELLO BUGALLO, PASTORA MARIA (Tutorships)
Court
HOSPIDO QUINTANA, ALMUDENA (Chairman)
GONZALEZ GARCIA, SARA (Secretary)
González Álvarez, Julia (Member)
Implementation and Optimization of a Virtualization Environment with Proxmox VE for Efficient Resource Management in a Data Center
Authorship
D.C.P.
Bachelor’s Degree in Informatics Engineering
Defense date
02.20.2025 12:45
Summary
This work seeks to design, implement and optimize a virtualization environment for a small organization in order to improve the management efficiency of computational, storage and network resources in its data center, guaranteeing stable performance and adequate use of the available hardware. The main part of the work focuses on achieving this with Proxmox VE, configuring a three-node cluster with a shared storage system and high-availability and backup mechanisms. Proxmox VE is shown to be an efficient solution for environments with limited resources, providing advanced tools for centralized management and optimization of hardware usage. However, limitations were also identified, such as the reliance on the NFS server, which could create bottlenecks. Finally, extensions are proposed that include migration to a distributed storage system and the incorporation of automation and monitoring tools. Additionally, budget permitting, another avenue for improvement could be evaluating alternative virtualization environments to improve performance and scalability in future implementations.
Direction
CARIÑENA AMIGO, MARIA PURIFICACION (Tutorships)
Álvarez Calvo, Francisco Javier (Co-tutorships)
Court
VAZQUEZ CENDON, MARIA ELENA (Chairman)
VARELA HERNANDEZ, ANXO (Secretary)
CONDORI FERNANDEZ, OLINDA NELLY (Member)
Production plant of formic acid from methanol and carbon monoxide
Authorship
A.D.P.G.
Bachelor's Degree in Chemical Engineering
Defense date
02.10.2025 09:30
Summary
The project consists of the design of a plant for the production of 10,000 t/year of formic acid in two stages. The first stage consists of the carbonylation of methanol with carbon monoxide to give methyl formate. This compound is then hydrolysed in a second reactor to formic acid at 85% purity. Paula Campo López is in charge of the design of the R-202 formate hydrolysis reactor in Section 200. On the other hand, Alejandro de Prado González is responsible for the design of the T-302 distillation column in Section 300.
Direction
FRANCO URIA, MARIA AMAYA (Tutorships)
SINEIRO TORRES, JORGE (Co-tutorships)
Court
HOSPIDO QUINTANA, ALMUDENA (Chairman)
GONZALEZ GARCIA, SARA (Secretary)
González Álvarez, Julia (Member)
The fast Fourier transform
Authorship
P.D.V.
Double Bachelor's Degree in Informatics Engineering and Mathematics
Defense date
07.02.2025 11:00
Summary
Since their rediscovery in the 1950s, the class of algorithms known as Fast Fourier Transforms (FFTs) has been fundamental in numerous fields within mathematics, science and engineering. It comes as no surprise that the Cooley-Tukey algorithm (commonly known as the FFT) is widely recognized as one of the most important algorithms of the 20th century. In this project, we aim to provide a structured and well-grounded approach to the development of FFTs. We begin with the mathematical foundations of Lp spaces and the continuous Fourier Transform, which provides a new way of looking at functions through their frequency spectrum. Later, we introduce the Discrete Fourier Transform (DFT) as a numerical tool that enables Fourier methods. Computing the DFT for large input sizes is only feasible thanks to FFT algorithms. Finally, we present a brief overview of two major application domains: digital signal processing and data compression. In particular, we review digital audio filters and examine the role of FFTs in JPEG image compression.
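For reference, the DFT of a sequence \(x_0,\dots,x_{N-1}\) and the even/odd splitting behind the radix-2 Cooley-Tukey algorithm are:

\[
X_k = \sum_{n=0}^{N-1} x_n\, e^{-2\pi i nk/N}, \qquad k = 0,\dots,N-1,
\]
\[
X_k = E_k + e^{-2\pi i k/N}\, O_k, \qquad X_{k+N/2} = E_k - e^{-2\pi i k/N}\, O_k, \qquad 0 \le k < N/2,
\]

where \(E_k\) and \(O_k\) are the length-\(N/2\) DFTs of the even- and odd-indexed samples; applying the split recursively reduces the cost from \(O(N^2)\) to \(O(N \log N)\) operations.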
Direction
LOPEZ SOMOZA, LUCIA (Tutorships)
Court
QUINTELA ESTEVEZ, PEREGRINA (Chairman)
TRINCHET SORIA, ROSA Mª (Secretary)
DIAZ RAMOS, JOSE CARLOS (Member)
Study and application of AWS Rekognition for automatic recognition of clothing labels in user images
Authorship
E.F.D.S.
Double Bachelor's Degree in Informatics Engineering and Mathematics
Defense date
02.20.2025 10:00
Summary
Nowadays there are multiple tools to perform image classification, such as convolutional neural networks and transformers. However, the Zara brand continues to perform labeling manually, resulting in a set of inaccurate labels. For this reason, this study explores the implementation of automated methods to improve the results obtained manually. The purpose of this research is to evaluate and analyze the effectiveness of the AWS Rekognition Custom Labels service for labeling garments. The adopted strategy aims to identify the limits of the service for this task through a feasibility analysis of the source dataset. The project starts with a preliminary analysis of the dataset to determine its suitability for model training. Subsequently, an examination of the service constraints is performed, considering five main variables: the total number of images, the interrelationship between labels, the type of label, the number of images available for each label, and the influence of each label on the others. To achieve this, several resources are used, such as the service itself, an initial dataset, and a REST API developed for this project. The main findings include the low relevance of the total number of images, as well as the limitations associated with the label type and the importance of labels not being overly interrelated.
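A minimal sketch of querying a trained Custom Labels model with boto3; the region, project-version ARN, and confidence threshold are placeholders, and the truncated ARN must be filled in with a real one.

```python
import boto3

# Placeholder region; the project-version ARN below is deliberately truncated.
client = boto3.client("rekognition", region_name="eu-west-1")

def label_garment(image_path, min_confidence=50):
    """Return (label, confidence) pairs predicted by a Custom Labels model."""
    with open(image_path, "rb") as f:
        response = client.detect_custom_labels(
            ProjectVersionArn="arn:aws:rekognition:...",  # placeholder ARN
            Image={"Bytes": f.read()},
            MinConfidence=min_confidence,
        )
    return [(l["Name"], l["Confidence"]) for l in response["CustomLabels"]]
```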
Direction
Carreira Nouche, María José (Tutorships)
Rodríguez Díez, Helio (Co-tutorships)
Court
ARIAS RODRIGUEZ, JUAN ENRIQUE (Chairman)
Querentes Hermida, Raquel Esther (Secretary)
PIÑEIRO POMAR, CESAR ALFREDO (Member)
Metaheuristics for the TSP: A didactic and computational tour
Authorship
E.F.D.S.
Double Bachelor's Degree in Informatics Engineering and Mathematics
Defense date
02.13.2025 12:45
Summary
Throughout the history of computing, routing problems have attracted great interest due to their many applications in fields such as planning and logistics. This study focuses on the traveling salesman problem (TSP), and specifically on metaheuristics, techniques that solve it approximately in polynomial time. The main objective of this study is to provide a guide to understanding four of the most important ones, both theoretically and computationally: tabu search, simulated annealing, the genetic algorithm, and ant colony optimization. For this purpose, a literature review was carried out, gathering and synthesizing the relevant information about them. For the computational part, R implementations of all the metaheuristics were developed and evaluated on different instances from the TSPLIB library. The results show that no single metaheuristic outperforms the rest in every aspect. Tabu search and ant colony optimization obtain very promising results in terms of distance to the optimal cost; however, they are more time-consuming than the other two. Simulated annealing obtains somewhat worse results than the previous ones, but very quickly. Finally, the genetic algorithm obtains very poor results in a relatively acceptable time. In conclusion, this work serves as a guide for people who want to understand these concepts.
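The thesis implements the four metaheuristics in R; purely as an illustration of the kind of procedure evaluated, here is a minimal Python sketch of simulated annealing for the TSP with 2-opt neighborhood moves. Parameters and cooling schedule are illustrative defaults, not the settings used in the study.

```python
import math
import random

def tour_length(tour, dist):
    """Total length of a closed tour given a distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def simulated_annealing(dist, t0=100.0, cooling=0.995, iters=20000):
    n = len(dist)
    tour = list(range(n))
    random.shuffle(tour)
    cur_len = tour_length(tour, dist)
    best, best_len = tour[:], cur_len
    t = t0
    for _ in range(iters):
        i, j = sorted(random.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]   # 2-opt: reverse a segment
        cand_len = tour_length(cand, dist)
        # Accept improvements always; accept worse tours with Boltzmann probability.
        if cand_len < cur_len or random.random() < math.exp((cur_len - cand_len) / t):
            tour, cur_len = cand, cand_len
            if cur_len < best_len:
                best, best_len = tour[:], cur_len
        t *= cooling                                           # geometric cooling schedule
    return best, best_len
```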
Direction
CASAS MENDEZ, BALBINA VIRGINIA (Tutorships)
Court
RODRIGUEZ CASAL, ALBERTO (Chairman)
ALONSO TARRIO, LEOVIGILDO (Secretary)
SALGADO SECO, MODESTO RAMON (Member)
Memetic algorithms for the MC-TTRP
Authorship
N.F.O.
Double Bachelor's Degree in Informatics Engineering and Mathematics
Defense date
07.03.2025 10:40
Summary
A metaheuristic is a high-level search procedure designed to guide subordinate heuristics in order to efficiently explore solution spaces in complex optimization problems, especially those where exact methods are computationally infeasible. These techniques do not guarantee finding the optimal solution, but seek to obtain good-quality solutions in reasonable times, which makes them especially useful in real environments. Within this framework, evolutionary algorithms, which are inspired by principles of biological evolution to explore complex search spaces, stand out. Among them, genetic and memetic algorithms are particularly relevant. Genetic algorithms employ mechanisms such as selection, crossover and mutation to generate new solutions, while memetic algorithms combine this global exploration with local improvement strategies to further optimize each solution. These methods have been successfully applied to a variety of complex problems, including routing problems, which consist of finding the optimal set of routes that a fleet of vehicles must follow to serve a set of customers. A generalization of routing problems is the multi-compartment truck and trailer routing problem (MC-TTRP). This problem considers two types of compartmentalized vehicles, trucks and towed trailers, and two types of customers with different service constraints and multiple types of cargo, resulting in several types of routes to optimize distribution. In this work we have explored genetic and memetic algorithms, studying how the operators involved work and how to build a memetic algorithm capable of solving a complex problem. Routing problems have also been studied, with particular emphasis on the MC-TTRP, for which a mixed-integer linear programming model is presented that describes the problem mathematically. Using this knowledge, a C++ algorithm has been implemented to compute routes for any instance of the MC-TTRP.
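The thesis implementation is in C++; the generic structure it describes (a genetic loop with an embedded local-search step) can be sketched as follows, with the problem-specific operators passed in as functions. This is a sketch of the general scheme, not the actual MC-TTRP operators.

```python
import random

def memetic(init, fitness, crossover, mutate, local_search,
            pop_size=30, generations=500):
    """Generic memetic loop: GA-style global search plus local refinement."""
    pop = [local_search(init()) for _ in range(pop_size)]
    for _ in range(generations):
        # Tournament selection of two parents (lower fitness = better).
        p1 = min(random.sample(pop, 3), key=fitness)
        p2 = min(random.sample(pop, 3), key=fitness)
        child = local_search(mutate(crossover(p1, p2)))   # the "memetic" step
        worst = max(range(pop_size), key=lambda i: fitness(pop[i]))
        if fitness(child) < fitness(pop[worst]):
            pop[worst] = child                            # steady-state replacement
    return min(pop, key=fitness)
```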
Direction
CASAS MENDEZ, BALBINA VIRGINIA (Tutorships)
Court
Majadas Soto, José Javier (Chairman)
SALGADO RODRIGUEZ, MARIA DEL PILAR (Secretary)
CASARES DE CAL, MARIA ANGELES (Member)
Reconstruction of Phylogenetic trees using Quantum Computing
Authorship
N.F.O.
Double Bachelor's Degree in Informatics Engineering and Mathematics
Defense date
02.20.2025 10:30
Summary
Quantum computing is a field of computer science that uses principles of quantum physics to solve problems more efficiently than classical computing, especially in areas such as optimization. Bioinformatics, on the other hand, is a field that combines elements of biology and computer science to analyze large biological data sets. A prominent example of this discipline is genomics, which includes the generation of phylogenetic trees, key tools for understanding the biological evolution of species. The reconstruction of these trees represents a computational problem that is very difficult to solve due to its complexity. This work explores whether quantum computing can offer effective solutions to address this problem. In this context, the performance of quantum computation and quantum optimization algorithms has been studied, with emphasis on Quantum Annealing and the Quantum Approximate Optimization Algorithm (QAOA). Based on these approaches, a quantum algorithm capable of reconstructing phylogenies by cutting graphs has been developed. The proposed algorithm was implemented and tested on currently available quantum hardware, obtaining satisfactory results that demonstrate its potential to solve complex problems in the area of bioinformatics.
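Assuming the graph-cut step is formulated as a weighted max-cut (a common encoding for both Quantum Annealing and QAOA), the cost function over binary partition variables \(x_i \in \{0,1\}\) is

\[
C(x) = \sum_{(i,j) \in E} w_{ij}\,\bigl(x_i + x_j - 2x_i x_j\bigr),
\]

i.e. the total weight of the edges crossing the partition; recursively cutting a dissimilarity graph over the taxa is one way to obtain a tree, though the exact construction used in the work may differ.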
Direction
Fernández Pena, Anselmo Tomás (Tutorships)
PICHEL CAMPOS, JUAN CARLOS (Co-tutorships)
Court
ARIAS RODRIGUEZ, JUAN ENRIQUE (Chairman)
Querentes Hermida, Raquel Esther (Secretary)
PIÑEIRO POMAR, CESAR ALFREDO (Member)
Mathematical Aspects of Concept Drift
Authorship
F.F.M.
Double Bachelor's Degree in Informatics Engineering and Mathematics
Defense date
07.03.2025 11:25
Summary
This work addresses the phenomenon of Concept Drift, which arises in dynamic and nonstationary environments where the statistical relationships between model variables change over time, thus affecting the performance of machine learning algorithms. The main objective is to develop a modification of the KSWIN algorithm, part of the RiverML library, which is based on the Kolmogorov-Smirnov test. The proposed modification incorporates multiple hypothesis tests and the Benjamini-Hochberg correction in order to enhance the statistical robustness of the test and reduce the false positive rate. Several configurations of the detector are proposed, targeting both the monitoring of data drawn from continuous distributions and the evaluation of performance metrics. For the latter approach, a mechanism is introduced to identify the type of drift, using non-parametric inference techniques. For the first case, a testing environment with artificially generated data is designed. In the second, the work integrates a comparative study developed in a Bachelor’s Thesis in Computer Engineering, focused on the empirical evaluation of several drift detection algorithms from the literature. The experiments show a significant reduction in the false positive rate without compromising test power, improving the effectiveness of both the original algorithm and other classical detectors. Furthermore, the ability to identify the type of drift adds practical value to one of the proposed configurations.
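A minimal sketch, under assumed window handling, of the core statistical idea: run several Kolmogorov-Smirnov tests and control the false discovery rate with the Benjamini-Hochberg step-up procedure before signaling drift. The windowing scheme here is illustrative, not the one engineered into KSWIN.

```python
import numpy as np
from scipy.stats import ks_2samp

def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up: boolean mask of rejected hypotheses."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * (np.arange(1, m + 1) / m)
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max() + 1   # largest rank meeting the threshold
        mask[order[:k]] = True
    return mask

def drift_detected(reference, recent_windows, alpha=0.05):
    """Compare a reference window against several recent sub-windows with KS tests."""
    pvals = [ks_2samp(reference, w).pvalue for w in recent_windows]
    return bh_reject(pvals, alpha).any()
```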
Direction
CRUJEIRAS CASAIS, ROSA MARÍA (Tutorships)
Court
Majadas Soto, José Javier (Chairman)
SALGADO RODRIGUEZ, MARIA DEL PILAR (Secretary)
CASARES DE CAL, MARIA ANGELES (Member)
Fine-grained semantic indexing of biomedical texts with linguistic models
Authorship
M.G.L.
Bachelor’s Degree in Informatics Engineering
Defense date
02.20.2025 17:00
Summary
This Final Degree Project (TFG) addresses the semantic indexing of biomedical texts through the use of large language models (LLMs), with the aim of improving access to information in biomedicine through the automated assignment of MeSH descriptors. The proposed method consists of several stages. First, the MeSH ontology obtained through BioPortal is preprocessed. Next, biomedical abstracts previously indexed with coarse-grained labels are selected for subsequent semantic refinement. The methodology employs a zero-shot prompting strategy with the LLaMa3 model, developing and optimizing different prompt configurations to improve classification. The ensemble combination of the most effective strategies made it possible to significantly improve the system's performance. Finally, the model is evaluated using standardized metrics (precision, recall and F-measure) to analyze its performance and determine its viability in biomedical indexing tasks. The results show that LLaMa3 outperforms traditional weakly supervised methods in terms of precision, recall and F-measure, consolidating itself as an effective alternative for biomedical indexing. However, challenges persist in terms of computational efficiency and scalability, especially for its implementation on large volumes of data. The analysis of the assigned labels made it possible to identify performance patterns and define strategies to improve the quality of semantic indexing. To address these challenges, semantic search using vector databases is explored as a possible computational optimization strategy. However, the results obtained did not reach the expected indexing quality, suggesting the need for additional adjustments in threshold settings and the representation of the semantic context. In conclusion, this work validates the potential of generative language models in biomedical indexing, highlighting the importance of optimizing their performance and scalability for application to large volumes of data. These findings lay the foundation for future research aimed at improving the efficiency and accuracy of semantic indexing systems in biomedicine.
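As an illustration of the zero-shot prompting stage, a hypothetical prompt template for refining a coarse MeSH heading; the actual prompts engineered and optimized in the work are not reproduced here.

```python
# Hypothetical zero-shot template; wording and fields are assumptions.
PROMPT = """You are a biomedical indexer. Given the abstract below and the
coarse MeSH heading '{coarse}', choose the most specific descriptors that
apply from this candidate list: {candidates}.
Answer with a comma-separated list of descriptors only.

Abstract: {abstract}"""

def build_prompt(coarse, candidates, abstract):
    """Fill the template before sending it to the language model."""
    return PROMPT.format(coarse=coarse,
                         candidates=", ".join(candidates),
                         abstract=abstract)
```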
Direction
TABOADA IGLESIAS, MARÍA JESÚS (Tutorships)
Court
TABOADA IGLESIAS, MARÍA JESÚS (Student’s tutor)
Use of GOES-R data in the cloud for sargassum monitoring
Authorship
B.G.L.
Bachelor’s Degree in Informatics Engineering
Defense date
02.20.2025 11:00
Summary
The massive proliferation events of Sargassum have become a significant environmental and socio-economic issue in the Caribbean Sea. Its monitoring using low-orbit satellite sensors, such as MODIS or OLCI, presents limitations due to the low frequency of image acquisition. In this context, the geostationary satellite GOES-16 offers a promising alternative thanks to its high temporal frequency, enabling continuous monitoring of Sargassum dynamics. This study explores the potential of GOES-16 ABI sensor data for the detection and monitoring of Sargassum, using the NDVI index as an identification tool. A methodology was developed based on the efficient downloading and processing of data available on AWS, including resolution reduction techniques, hourly product generation, NDVI calculation, and noise reduction through statistical filters. The results confirm the feasibility of using GOES-16 for Sargassum tracking, demonstrating a high agreement with previous studies based on other sensors. The high temporal frequency allows for more detailed surveillance, facilitating the prediction of movements and possible coastal strandings. As future improvements, the integration with higher spatial resolution sensors and the implementation of a predictive system based on oceanographic and meteorological models are proposed.
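The NDVI computation itself is standard; a sketch with NumPy, where for GOES-16 ABI the red and near-infrared inputs would come from band 2 (0.64 µm) and band 3 (0.86 µm) reflectances. The masking convention is an illustrative choice.

```python
import numpy as np

def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red), computed pixel-wise."""
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    denom = nir + red
    # Mask pixels where the denominator vanishes (e.g., no-data areas).
    return np.where(denom == 0, np.nan, (nir - red) / np.where(denom == 0, 1, denom))
```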
Direction
Triñanes Fernández, Joaquín Ángel (Tutorships)
Court
ARIAS RODRIGUEZ, JUAN ENRIQUE (Chairman)
Querentes Hermida, Raquel Esther (Secretary)
PIÑEIRO POMAR, CESAR ALFREDO (Member)
Model-based clustering
Authorship
N.G.S.D.V.
Double Bachelor's Degree in Informatics Engineering and Mathematics
Defense date
07.03.2025 12:10
Summary
Clustering is an unsupervised statistical technique that aims to automatically identify homogeneous groups of observations within a dataset. Its usefulness has been consolidated across various disciplines, particularly in the current context of massive data generation, thanks to its ability to identify groups in complex and high-dimensional data. Although heuristic methods such as k-means or hierarchical techniques have traditionally been used, these approaches present limitations, such as the lack of a solid theoretical foundation or the difficulty in determining the optimal number of groups. In contrast, model-based clustering (MBC) offers a statistically grounded alternative by modeling the data as a finite mixture of probability distributions. This approach allows for rigorous inferences, the selection of appropriate models, justifiable determination of the number of groups, and the evaluation of uncertainty in the assignment of observations. This work presents the theoretical foundations of model-based clustering, with a focus on Gaussian mixture models, which are the most widely used, as well as the EM algorithm for parameter estimation and model selection criteria, including the choice of the number of clusters. Additionally, practical examples are presented using the mclust package in R.
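For reference, the finite mixture model at the core of MBC assumes each observation is drawn from a combination of \(K\) Gaussian components,

\[
f(x) = \sum_{k=1}^{K} \pi_k\, \phi(x;\, \mu_k, \Sigma_k), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1,
\]

where \(\phi\) denotes the multivariate normal density; the EM algorithm estimates the parameters by maximum likelihood, and criteria such as the BIC compare models across numbers of components and covariance parameterizations.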
Direction
AMEIJEIRAS ALONSO, JOSE (Tutorships)
Court
Majadas Soto, José Javier (Chairman)
SALGADO RODRIGUEZ, MARIA DEL PILAR (Secretary)
CASARES DE CAL, MARIA ANGELES (Member)
Efficient semantic segmentation of land cover images using an encoder-decoder architecture
Authorship
I.L.C.
Double Bachelor's Degree in Informatics Engineering and Mathematics
Defense date
02.20.2025 11:30
Summary
In the area of remote sensing, there is great interest in collecting land cover information to identify and classify the different types of surfaces present on the ground, such as vegetated areas, water bodies, urban soils, grasslands, forests or agricultural areas, among others. Semantic image segmentation assigns a label to each pixel of the image, classifying pixels into specific categories, which facilitates the interpretation and analysis of satellite or aerial images. The use of deep learning techniques has proven to be effective in the field of computer vision, specifically in semantic segmentation tasks. However, these models are very computationally expensive, and often require the use of specialised hardware and optimisation techniques to improve the efficiency and feasibility of training and inference. In this Bachelor's Thesis, the aim is to test different models with an encoder-decoder architecture, trying to improve the efficiency and feasibility of training even with large amounts of data. Among the existing parallelism techniques for multi-GPU training, data parallelism will be used, selecting a PyTorch module that implements it efficiently. In addition, using 16-bit mixed floating-point precision reduces memory usage and makes better use of the GPU hardware, performing training in half the time without affecting the quality of the segmentation.
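A minimal sketch of the two techniques mentioned, combining PyTorch's DistributedDataParallel with 16-bit automatic mixed precision; the model, data loader, loss, and optimizer settings are placeholders for the actual segmentation setup.

```python
import os
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def train(model, loader, criterion, epochs=1):
    # One process per GPU, launched e.g. with `torchrun --nproc_per_node=N`,
    # which sets the LOCAL_RANK environment variable.
    torch.distributed.init_process_group("nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)
    model = DDP(model.cuda(rank), device_ids=[rank])   # data parallelism
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()               # loss scaling for fp16
    for _ in range(epochs):
        for images, masks in loader:
            images, masks = images.cuda(rank), masks.cuda(rank)
            opt.zero_grad()
            with torch.cuda.amp.autocast():            # mixed-precision forward pass
                loss = criterion(model(images), masks)
            scaler.scale(loss).backward()
            scaler.step(opt)
            scaler.update()
```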
Direction
Argüello Pedreira, Francisco Santiago (Tutorships)
Blanco Heras, Dora (Co-tutorships)
Court
ARIAS RODRIGUEZ, JUAN ENRIQUE (Chairman)
Querentes Hermida, Raquel Esther (Secretary)
PIÑEIRO POMAR, CESAR ALFREDO (Member)
Mathematical Methods of Artificial Intelligence
Authorship
P.L.P.
Double Bachelor's Degree in Informatics Engineering and Mathematics
Defense date
07.03.2025 10:00
Summary
This thesis explores the mathematical foundations of artificial intelligence, focusing on neural networks and their research lines. It begins with a detailed analysis of neural networks, covering foundational concepts such as architecture and training, and also research topics like expressivity, optimization, generalization, and explainability. The Vapnik-Chervonenkis (VC) dimension is introduced as a theoretical framework to quantify the capacity of models, offering insights into their generalization ability and limitations. To address the curse of dimensionality, the thesis discusses dimensionality reduction techniques, including principal component analysis (PCA) and linear discriminant analysis (LDA), showcasing their role in improving model efficiency without sacrificing performance. Finally, the mathematical capabilities of large language models like GPT are evaluated. Leveraging examples from reasoning and problem-solving tasks, this work investigates how these models process and generate mathematically rigorous outputs.
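To make the capacity notion concrete, one standard form of the VC generalization bound reads as follows (in LaTeX; the thesis may state a different variant with other constants):

    Let $\mathcal{H}$ be a hypothesis class with VC dimension
    $d = \mathrm{VC}(\mathcal{H})$, the size of the largest point set that
    $\mathcal{H}$ shatters. Then, for a sample of size $n$ and any
    $\delta \in (0,1)$, with probability at least $1-\delta$ every
    $h \in \mathcal{H}$ satisfies
    \[
      R(h) \;\le\; \widehat{R}_n(h)
        + \sqrt{\frac{d\left(\ln\tfrac{2n}{d} + 1\right) + \ln\tfrac{4}{\delta}}{n}},
    \]
    where $R(h)$ is the true risk and $\widehat{R}_n(h)$ the empirical risk.

The bound makes the trade-off explicit: richer classes (larger $d$) can fit more, but their generalization gap shrinks only as the sample size $n$ grows accordingly.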
Direction
Nieto Roig, Juan José (Tutorships)
Court
Nieto Roig, Juan José (Student’s tutor)
Web Assistant for the Automation of Medical Image Annotation in the Hepatic Field
Authorship
R.O.F.
Bachelor’s Degree in Informatics Engineering
Defense date
02.20.2025 12:00
Summary
This bachelor’s thesis is part of the REMOVIRT H3D project, which aims to create a three-dimensional representation of a patient’s liver using artificial intelligence techniques, enabling surgeons to carry out detailed surgical planning. This first phase of the project focuses on building an annotated database with which to train the neural network. The main objective is to develop a web application that allows annotation of CT or MRI images, together with an automatic assistant that, using a pre-trained neural network, identifies the region of interest and proposes an initial annotation of the liver. To achieve this, an initial investigation was conducted into medical imaging standards, particularly DICOM, and into the technologies needed for the project (Cornerstone.js, Vue.js). Subsequently, we designed and implemented a database that stores medical images and the annotations made on them, while also ensuring the security and traceability of the information, two fundamental pillars of the application. After building the data infrastructure, an interactive web application was developed to enable users to annotate the images manually. This tool integrates advanced features, such as the visualization of the liver's three anatomical views and the management of different annotation versions. The next step was integrating the neural network, which identifies regions of interest and proposes initial annotations that specialists must later review and modify. Future objectives include the continuous retraining of the neural network using real data collected from the platform, the deployment of the different components in Docker containers, and the automation of information traceability checks. Overall, this project aims to lay the foundation for an advanced and accessible system that enhances the diagnosis and treatment of liver cancer, reducing the time and costs associated with surgical interventions while improving the accuracy and efficiency of medical procedures.
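As a small illustration of the DICOM groundwork mentioned above, a CT slice can be parsed in Python with pydicom; the file name, and the assumption that the slice carries rescale tags, are hypothetical and unrelated to the project's actual code.

    # Illustrative only: read one DICOM CT slice and extract pixels + metadata.
    import pydicom

    ds = pydicom.dcmread("ct_slice_0001.dcm")   # hypothetical file name
    print(ds.Modality, ds.Rows, ds.Columns)     # e.g. "CT", 512, 512
    pixels = ds.pixel_array                     # raw stored intensities
    # Convert to Hounsfield units (assumes the CT rescale tags are present):
    hu = pixels * ds.RescaleSlope + ds.RescaleIntercept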
Direction
COMESAÑA FIGUEROA, ENRIQUE (Tutorships)
Court
ARIAS RODRIGUEZ, JUAN ENRIQUE (Chairman)
Querentes Hermida, Raquel Esther (Secretary)
PIÑEIRO POMAR, CESAR ALFREDO (Member)
Design and implementation of an activity management application for Ludibot
Authorship
C.R.R.
Bachelor’s Degree in Informatics Engineering
Defense date
02.20.2025 16:00
Summary
In a project developed by a fellow student a couple of academic years ago, a skill was designed to communicate with Furhat's Ludibot robot. The web application consists of a server that users can access to design the sessions they want to carry out on the computer. Users will also be able to launch sessions directly from the web itself, thereby avoiding problems when loading a session. Finally, the last section built into the application is the statistics module, where users can view the results of a session's execution.
Direction
CATALA BOLOS, ALEJANDRO (Tutorships)
CONDORI FERNANDEZ, OLINDA NELLY (Co-tutorships)
Court
CATALA BOLOS, ALEJANDRO (Student’s tutor)
CONDORI FERNANDEZ, OLINDA NELLY (Student’s tutor)
JSONSchema2SHACL: Extraction and Translation of Constraints for Knowledge Graphs
Authorship
O.S.M.
Bachelor’s Degree in Informatics Engineering
Defense date
02.21.2025 10:00
Summary
The final degree project consists of developing a Python library that enables the extraction of constraints on JSON data based on a given JSONSchema (https://json-schema.org/). Once these constraints are identified, they must be translated into SHACL Shapes (https://www.w3.org/TR/shacl/) for the validation of knowledge graphs constructed from JSON input data. The entire development will be accompanied by a set of unit tests to maintain code coverage above 80%. The tests will be integrated into the CI/CD system of the tool's public repository, and the tool will be available on PyPI. The development will be part of a broader system capable of extracting and combining SHACL Shapes from various sources, such as XSD, CSVW, OWL, or RDB, helping knowledge engineers reduce manual work in generating these constraints.
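The core translation idea can be sketched in a few lines of Python with rdflib; the keyword-to-constraint mapping below ("type" to sh:datatype, "maxLength" to sh:maxLength) is a minimal illustration under assumed names, not the library's actual API.

    # Sketch: turn one JSON Schema string property into a SHACL property shape.
    from rdflib import Graph, Namespace, BNode, Literal
    from rdflib.namespace import RDF, XSD

    SH = Namespace("http://www.w3.org/ns/shacl#")
    EX = Namespace("http://example.org/")

    def schema_to_shape(name, prop):
        g = Graph()
        g.bind("sh", SH)
        shape, pshape = EX[name + "Shape"], BNode()
        g.add((shape, RDF.type, SH.NodeShape))
        g.add((shape, SH.property, pshape))
        g.add((pshape, SH.path, EX[name]))
        if prop.get("type") == "string":              # JSON type -> datatype
            g.add((pshape, SH.datatype, XSD.string))
        if "maxLength" in prop:                       # length constraint
            g.add((pshape, SH.maxLength, Literal(prop["maxLength"])))
        return g

    print(schema_to_shape("email", {"type": "string", "maxLength": 64})
          .serialize(format="turtle"))

A SHACL engine can then validate a knowledge graph built from the JSON input against the resulting shape, which is the validation step the abstract describes.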
Direction
CHAVES FRAGA, DAVID (Tutorships)
Court
CHAVES FRAGA, DAVID (Student’s tutor)
Interactive Analysis Tool of the Ancient Galician-Portuguese Toponymic Inventory
Authorship
P.V.P.
Bachelor’s Degree in Informatics Engineering
Defense date
02.20.2025 11:15
Summary
This Final Degree Project (FDP) details the process of building a computer system that aims to enhance the Ancient Galician-Portuguese Toponymic Inventory (ILG). It is based on four pillars:
- Analysis of the effectiveness of Linguakit: on the one hand, an analysis was carried out prior to the delimitation of the scope of the FDP, with the aim of determining where to direct efforts; on the other, the effectiveness and use of Linguakit were improved and the analysis tool was created.
- REST API: a REST server (Java Spring Boot) that contains most of the business logic.
- Database: inherited from the FDP 'Explotación del inventario toponímico gallego-portugués antiguo' by Andrea Rey Presas, and modified according to new needs. The purpose of this inheritance is to complement the toponymic exploration functionality developed in the aforementioned FDP with the enrichment functionality carried out in this final degree project.
- Web client: an interface (ReactJS) that allows the user to interact with the system and thus use the toponym analysis tool.
Direction
RIOS VIQUEIRA, JOSE RAMON (Tutorships)
VARELA BARREIRO, FRANCISCO JAVIER (Co-tutorships)
Gamallo Otero, Pablo (Co-tutorships)
Court
VAZQUEZ CENDON, MARIA ELENA (Chairman)
VARELA HERNANDEZ, ANXO (Secretary)
CONDORI FERNANDEZ, OLINDA NELLY (Member)