M-Learning: a heuristic approach for delayed rewards in reinforcement learning

Mora Cortés, Marlon Sneider; Perdomo Charry, César Andrey; Perdomo Charry, Oscar Julián


dc.contributor.advisor	Perdomo Charry, César Andrey
dc.contributor.author	Mora Cortés, Marlon Sneider
dc.contributor.author	Perdomo Charry, César Andrey
dc.contributor.author	Perdomo Charry, Oscar Julián
dc.contributor.orcid	Perdomo Charry, Cesar Andrey [0000-0001-7310-4618]
dc.date.accessioned	2025-03-10T20:43:32Z
dc.date.available	2025-03-10T20:43:32Z
dc.date.created	2025-02-21
dc.description.abstract	The current design of reinforcement learning methods requires extensive computational resources. Algorithms such as Deep Q-Network (DQN) have obtained outstanding results in advancing the field. However, the need to tune thousands of parameters and run millions of training episodes remains a significant challenge. This document presents a comparative analysis of the Q-Learning algorithm, which laid the foundations for Deep Q-Learning, and our proposed method, termed M-Learning. The comparison uses Markov Decision Processes with delayed reward as the general test-bench framework. First, this document describes in full the main challenges of implementing Q-Learning, particularly those concerning its multiple parameters. Then, the foundations of our proposed heuristic are presented, including its formulation, and the algorithm is described in detail. To compare both algorithms, they were trained in the Frozen Lake environment. The experimental results, along with an analysis of the best solutions, demonstrate that our proposal requires fewer episodes and exhibits reduced variability in the outcomes. Specifically, M-Learning trains agents 30.7% faster in the deterministic environment and 61.66% faster in the stochastic environment. Additionally, it achieves greater consistency, reducing the standard deviation of scores by 58.37% and 49.75% in the deterministic and stochastic settings, respectively.
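The abstract compares M-Learning against tabular Q-Learning on Frozen Lake, a Markov Decision Process whose only reward arrives at the goal, and singles out Q-Learning's many parameters as a burden. As a rough illustration of that baseline only, here is a minimal sketch of standard tabular Q-Learning with a decaying ε-greedy policy. It does not implement M-Learning, whose formulation is not given in this record. The environment is a hand-written stand-in modeled on the classic 4x4 Frozen Lake map rather than the Gym `FrozenLake-v1` environment, and the `slippery` flag plays the role of the stochastic setting. The action encoding, slip model, and all parameter values (`alpha`, `gamma`, `epsilon`, `epsilon_decay`, episode counts) are illustrative assumptions, not the thesis settings.

```python
import random

# Hypothetical 4x4 Frozen Lake-style map (an assumption, not the thesis setup):
# S = start, F = frozen (safe), H = hole (episode ends, reward 0), G = goal (reward 1).
MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_ROWS, N_COLS = len(MAP), len(MAP[0])
N_STATES, N_ACTIONS = N_ROWS * N_COLS, 4
# Action encoding assumed here: 0 = left, 1 = down, 2 = right, 3 = up.
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def step(state, action, slippery, rng):
    """Apply one action and return (next_state, reward, done).

    When slippery, the agent moves in the intended direction or in one of
    the two perpendicular directions, each with probability 1/3.
    """
    if slippery:
        action = rng.choice([action, (action - 1) % 4, (action + 1) % 4])
    row, col = divmod(state, N_COLS)
    d_row, d_col = MOVES[action]
    # Moving into the border leaves the agent where it is.
    row = min(max(row + d_row, 0), N_ROWS - 1)
    col = min(max(col + d_col, 0), N_COLS - 1)
    cell = MAP[row][col]
    # Delayed reward: the only non-zero reward is given on reaching the goal.
    reward = 1.0 if cell == "G" else 0.0
    return row * N_COLS + col, reward, cell in "GH"


def q_learning(episodes=5000, alpha=0.1, gamma=0.99, epsilon=1.0,
               epsilon_min=0.01, epsilon_decay=0.999, max_steps=100,
               slippery=False, seed=0):
    """Tabular Q-Learning with a decaying epsilon-greedy policy."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])
            next_state, reward, done = step(state, action, slippery, rng)
            # Terminal transitions have no future value to bootstrap from.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
        epsilon = max(epsilon_min, epsilon * epsilon_decay)
    return q


def greedy_success_rate(q, episodes=1000, max_steps=100, slippery=False, seed=1):
    """Fraction of episodes in which the greedy policy reaches the goal."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Greedy action; ties go to the lowest action index.
            action = max(range(N_ACTIONS), key=lambda a: q[state][a])
            state, reward, done = step(state, action, slippery, rng)
            if done:
                if reward == 1.0:
                    wins += 1
                break
    return wins / episodes


if __name__ == "__main__":
    for slippery in (False, True):
        q = q_learning(slippery=slippery)
        rate = greedy_success_rate(q, slippery=slippery)
        print(f"slippery={slippery}: greedy success rate {rate:.2f}")
```

Running the module trains one agent per setting and prints the success rate of the resulting greedy policy. The learning rate, discount factor, exploration schedule, and episode budget in `q_learning` are the kind of settings whose tuning the abstract describes as a burden for Q-Learning.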
dc.format.mimetype	pdf
dc.identifier.uri	http://hdl.handle.net/11349/93453
dc.language.iso	spa
dc.publisher	Universidad Distrital Francisco José de Caldas
dc.relation.references	B. Cottier, R. Rahman, L. Fattorini, N. Maslej, and D. Owen, “The rising costs of training frontier AI models,” 2024.
dc.relation.references	V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” 2013.
dc.relation.references	V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, pp. 529–533, Feb 2015.
dc.relation.references	A. K. Sadhu and A. Konar, “Improving the speed of convergence of multi-agent Q-learning for cooperative task-planning by a robot-team,” Robotics and Autonomous Systems, vol. 92, pp. 66–80, 2017.
dc.relation.references	L. Canese, G. C. Cardarilli, M. M. Dehghan Pir, L. Di Nunzio, and S. Spanò, “Design and development of multi-agent reinforcement learning intelligence on the robotarium platform for embedded system applications,” Electronics, vol. 13, no. 10, 2024.
dc.relation.references	J. Torres, Introducción al aprendizaje por refuerzo profundo: Teoría y práctica en Python. Direct Publishing, Independently Published, 2021.
dc.relation.references	M. Lapan, Deep Reinforcement Learning Hands-On. Birmingham, UK: Packt Publishing, 2018.
dc.relation.references	N. Balaji, S. Kiefer, P. Novotný, G. A. Pérez, and M. Shirmohammadi, “On the complexity of value iteration,” 2019.
dc.relation.references	R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. The MIT Press, second ed., 2018.
dc.relation.references	B. Jang, M. Kim, G. Harerimana, and J. W. Kim, “Q-learning algorithms: A comprehensive classification and applications,” IEEE Access, vol. 7, pp. 133653–133667, 2019.
dc.relation.references	S. Liu, X. Hu, and K. Dong, “Adaptive double fuzzy systems based Q-learning for pursuit-evasion game,” IFAC-PapersOnLine, vol. 55, no. 3, pp. 251–256, 2022. 16th IFAC Symposium on Large Scale Complex Systems: Theory and Applications LSS 2022.
dc.relation.references	A. G. d. Silva Junior, D. H. d. Santos, A. P. F. d. Negreiros, J. M. V. B. d. S. Silva, and L. M. G. Gonçalves, “High-level path planning for an autonomous sailboat robot using Q-learning,” Sensors, vol. 20, no. 6, 2020.
dc.relation.references	M. E. Çimen, Z. Garip, Y. Yalçın, M. Kutlu, and A. F. Boz, “Self adaptive methods for learning rate parameter of Q-learning algorithm,” Journal of Intelligent Systems: Theory and Applications, vol. 6, no. 2, pp. 191–198, 2023.
dc.relation.references	L. Zhang, L. Tang, S. Zhang, Z. Wang, X. Shen, and Z. Zhang, “A self-adaptive reinforcement-exploration Q-learning algorithm,” Symmetry, vol. 13, no. 6, 2021.
dc.relation.references	J. Huang, Z. Zhang, and X. Ruan, “An improved Dyna-Q algorithm inspired by the forward prediction mechanism in the rat brain for mobile robot path planning,” Biomimetics, vol. 9, no. 6, 2024.
dc.relation.references	S. Xu, Y. Gu, X. Li, C. Chen, Y. Hu, Y. Sang, and W. Jiang, “Indoor emergency path planning based on the Q-learning optimization algorithm,” ISPRS International Journal of Geo-Information, vol. 11, no. 1, 2022.
dc.relation.references	A. dos Santos Mignon and R. L. de Azevedo da Rocha, “An adaptive implementation of ϵ-greedy in reinforcement learning,” Procedia Computer Science, vol. 109, pp. 1146–1151, 2017. 8th International Conference on Ambient Systems, Networks and Technologies, ANT-2017 and the 7th International Conference on Sustainable Energy Information Technology, SEIT 2017, 16-19 May 2017, Madeira, Portugal.
dc.relation.references	M. Zhang, W. Cai, and L. Pang, “Predator-prey reward based Q-learning coverage path planning for mobile robot,” IEEE Access, vol. 11, pp. 29673–29683, 2023.
dc.relation.references	W. Jin, R. Gu, and Y. Ji, “Reward function learning for Q-learning-based geographic routing protocol,” IEEE Communications Letters, vol. 23, no. 7, pp. 1236–1239, 2019.
dc.relation.references	X. Ou, Q. Chang, and N. Chakraborty, “Simulation study on reward function of reinforcement learning in gantry work cell scheduling,” Journal of Manufacturing Systems, vol. 50, pp. 1–8, 2019.
dc.relation.references	Y. Li, H. Wang, J. Fan, and Y. Geng, “A novel Q-learning algorithm based on improved whale optimization algorithm for path planning,” PLOS ONE, vol. 17, no. 12, p. e0279438, 2022.
dc.relation.references	S. Mirjalili and A. Lewis, “The whale optimization algorithm,” Advances in Engineering Software, vol. 95, pp. 51–67, 2016.
dc.relation.references	H. Sowerby, Z.-H. Zhou, and M. L. Littman, “Designing rewards for fast learning,” arXiv, vol. abs/2205.15400, 2022.
dc.relation.references	G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “OpenAI Gym,” 2016.
dc.rights.acceso	Open (Full Text)
dc.rights.accessrights	RestrictedAccess
dc.subject.keyword	Reinforcement learning
dc.subject.keyword	Exploration-exploitation dilemma
dc.subject.keyword	Q-Learning
dc.subject.keyword	Frozen Lake
dc.subject.keyword	Heuristic approach
dc.subject.lemb	Electronic Engineering -- Theses and Academic Dissertations
dc.subject.lemb	Data mining
dc.subject.lemb	Learning by experience
dc.subject.lemb	Learning by discovery
dc.title	M-Learning: enfoque heurístico para recompensas diferidas en el aprendizaje por refuerzo
dc.title.titleenglish	M-Learning: heuristic approach for delayed rewards in reinforcement learning
dc.type	bachelorThesis
dc.type.coar	http://purl.org/coar/resource_type/c_7a1f
dc.type.degree	Academic Production
dc.type.driver	info:eu-repo/semantics/bachelorThesis

Files

Original bundle

Name: MoraCortesMarlonSneider2025.pdf
Size: 3.97 MB
Format: Adobe Portable Document Format
Description: Undergraduate thesis

Name: Licencia de uso y publicacion.pdf
Size: 1.92 MB
Format: Adobe Portable Document Format
Description: Use and publication license

License bundle

Name: license.txt
Size: 7 KB
Format: Item-specific license agreed upon to submission

Collections

Electronic Engineering