Implementación de redes tipo transformer en la selección estratégica de perfiles laborales a nivel empresarial
| dc.contributor.advisor | Ferro Escobar, Roberto | |
| dc.contributor.author | Tovar Sánchez, Juan Sebastián | |
| dc.contributor.author | Castro Castellanos, Cristian Camilo | |
| dc.contributor.orcid | Ferro Escobar, Roberto [0000-0002-8978-538X] | |
| dc.date.accessioned | 2025-03-16T20:39:30Z | |
| dc.date.available | 2025-03-16T20:39:30Z | |
| dc.date.created | 2024-08-13 | |
| dc.description | En este proyecto se desarrolla la implementación de un modelo RAG (Retrieval-Augmented Generation), encaminado a su aplicación en el contexto del reclutamiento y la selección de personal (limitado a las áreas relacionadas con la Ingeniería Electrónica). Se parte de la obtención de una base de datos documental (conformada por archivos PDF), que pasa por una fase de preprocesamiento basada en limpieza de texto y tokenización, para posteriormente convertirse en una base de datos vectorizada. Los datos se preparan para el entrenamiento del modelo mediante operaciones de chunking e indexing, lo que permite la inclusión de un LLM (Large Language Model) basado en un modelo transformer; este, junto con mecanismos de búsqueda vectorial y aprendizaje por similitud, permite la generación de lenguaje y la recuperación de información, respectivamente. Al integrar cada una de las partes se conforma el RAG; con base en ello se busca encontrar los mejores parámetros de acuerdo con las condiciones dadas, evaluando el rendimiento obtenido en cada caso en busca del mejor resultado. | |
| dc.description.abstract | In this project, a RAG (Retrieval-Augmented Generation) model is developed for application in the context of recruitment and personnel selection (limited to areas related to Electronic Engineering). The starting point is the creation of a document database (composed of PDF files), followed by a preprocessing phase based on text cleaning and tokenization, after which the corpus is converted into a vectorized database. The data is prepared for model training through chunking and indexing operations, enabling the inclusion of a Large Language Model (LLM) based on a transformer architecture. This model, along with vector search mechanisms and similarity learning, allows for language generation and information retrieval, respectively. By integrating each of these components, the RAG model is constructed. The aim is to find the best parameters under the given conditions, evaluating the performance obtained in each case to achieve the best result (a minimal illustrative sketch of this pipeline follows the metadata table below). | |
| dc.identifier.uri | http://hdl.handle.net/11349/93707 | |
| dc.language.iso | spa | |
| dc.publisher | Universidad Distrital Francisco José de Caldas | |
| dc.relation.references | [Nvidia, 2024a] ¿Qué Es un Modelo Transformer? | Blog de NVIDIA. (n.d.). Retrieved April 20, 2024. | |
| dc.relation.references | [LangChain, 2024] ChatGPT Over Your Data. (n.d.). Retrieved April 21, 2024, from https://blog.langchain.dev/tutorial-chatgpt-over-your-data/ | |
| dc.relation.references | [Lewis, 2020] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-T., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Retrieved April 21, 2024, from https://github.com/huggingface/transformers/blob/master/ | |
| dc.relation.references | [Nvidia, 2024b] What Is Retrieval-Augmented Generation aka RAG | NVIDIA Blogs. (n.d.). Retrieved April 21, 2024, from https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/ | |
| dc.relation.references | [Guu, 2020] Guu, K., Lee, K., Tung, Z., Pasupat, P., & Chang, M.-W. (2020). REALM: Retrieval-Augmented Language Model Pre-Training. | |
| dc.relation.references | [Sun, 2022] Sun, Z., Wang, X., Tay, Y., Yang, Y., & Zhou, D. (2022). Recitation-augmented language models. https://doi.org/10.48550/arxiv.2210.01296 | |
| dc.relation.references | [Dixit, 2022] Dixit, T., Paranjape, B., Hajishirzi, H., & Zettlemoyer, L. (2022). CORE: A retrieve-then-edit framework for counterfactual data generation. https://doi.org/10.48550/arxiv.2210.04873 | |
| dc.relation.references | [Izacard, 2022] Izacard, G., Lewis, P., Lomeli, M., Hosseini, L., Petroni, F., Schick, T., … & Grave, É. (2022). Atlas: Few-shot learning with retrieval-augmented language models. https://doi.org/10.48550/arxiv.2208.03299 | |
| dc.relation.references | [Glass, 2021] Glass, M., Rossiello, G., Chowdhury, M. F. M., & Gliozzo, A. (2021). Robust retrieval augmented generation for zero-shot slot filling. https://doi.org/10.48550/arxiv.2108.13934 | |
| dc.relation.references | [Yang, 2023] Yang, Z., Ping, W., Liu, Z., Korthikanti, V., Nie, W., Huang, D., … & Anandkumar, A. (2023). Re-ViLM: Retrieval-augmented visual language model for zero and few-shot image captioning. https://doi.org/10.48550/arxiv.2302.04858 | |
| dc.relation.references | [Kim, 2021] Kim, B., Seo, S., Han, S., Erdenee, E., & Chang, B. (2021). Distilling the knowledge of large-scale generative models into retrieval models for efficient open-domain conversation. https://doi.org/10.18653/v1/2021.findings-emnlp.286 | |
| dc.relation.references | [Trang, 2020] N. T. M. Trang and M. Shcherbakov, "Vietnamese Question Answering System from Multilingual BERT Models to Monolingual BERT Model," 2020 9th International Conference on System Modeling and Advancement in Research Trends (SMART), Moradabad, India, 2020, pp. 201-206, doi: 10.1109/SMART50582.2020.9337155. | |
| dc.relation.references | [Ghani, 2024] Ghani and I. K. Raharjana, "Chatbots in Academia: A Retrieval-Augmented Generation Approach for Improved Efficient Information Access," 2024 16th International Conference on Knowledge and Smart Technology (KST), Krabi, Thailand, 2024, pp. 259-264, doi: 10.1109/KST61284.2024.10499652. | |
| dc.relation.references | [IBM, 2024] ¿Qué es LangChain? | IBM. (n.d.). Retrieved May 7, 2024, from https://www.ibm.com/mx-es/topics/langchain | |
| dc.relation.references | [Stork, 2024] Descripción general del marco de LlamaIndex | Stork. (n.d.). Retrieved May 7, 2024, from https://www.stork.ai/es/blog/an-overview-of-the-llamaindex-framework | |
| dc.relation.references | [Xataka, 2024] LLaMA 3: qué es y qué novedades tiene la nueva versión de la IA que se integrará en Facebook, Instagram y WhatsApp con Meta AI. (n.d.). Retrieved May 7, 2024, from https://www.xataka.com/basics/llama-3-que-que-novedades-tiene-nueva-version-ia-que-se-integrara-facebook-instagram-whatsapp-meta-ai | |
| dc.relation.references | [Mollá, 2024] Mixtral: El Modelo de Lenguaje de Código Abierto que Transforma la IA - Víctor Mollá. (n.d.). Retrieved May 7, 2024, from https://www.victormolla.com/mixtral-el-modelo-de-lenguaje-de-c%C3%B3digo-abierto-que-transforma-la-ia | |
| dc.relation.references | [Microsoft, 2024] microsoft/MiniLM-L12-H384-uncased · Hugging Face. (n.d.). Retrieved May 13, 2024, from https://huggingface.co/microsoft/MiniLM-L12-H384-uncased | |
| dc.relation.references | [Microsoft, 2023] unilm/minilm at master · microsoft/unilm · GitHub. (n.d.). Retrieved May 13, 2024, from https://github.com/microsoft/unilm/tree/master/minilm | |
| dc.relation.references | [eWeek, 2024] 6 Best Large Language Models (LLMs) in 2024. (n.d.). Retrieved May 21, 2024, from https://www.eweek.com/artificial-intelligence/best-large-language-models/ | |
| dc.relation.references | [Rothman, 2022] D. Rothman and A. Gulli, Transformers for Natural Language Processing: Build, train, and fine-tune deep neural network architectures for NLP with Python, Hugging Face, and OpenAI's GPT-3, ChatGPT, and GPT-4, Packt Publishing, 2022. | |
| dc.relation.references | [Zheng, 2021] X. Zheng, C. Zhang and P. C. Woodland, "Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition," 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Cartagena, Colombia, 2021, pp. 162-168, doi: 10.1109/ASRU51503.2021.9688232. | |
| dc.relation.references | [Liu, 2023] Y. Liu, H. Huang, J. Gao and S. Gai, "A study of Chinese Text Classification based on a new type of BERT pre-training," 2023 5th International Conference on Natural Language Processing (ICNLP), Guangzhou, China, 2023, pp. 303-307, doi: 10.1109/ICNLP58431.2023.00062. | |
| dc.relation.references | [Jhajaria, 2023] S. Jhajaria and D. Kaur, "Study and Comparative Analysis of ChatGPT, GPT and DAll-E2," 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 2023, pp. 1-5, doi: 10.1109/ICCCNT56998.2023.10307823. | |
| dc.relation.references | [Goodfellow, 2016] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, pp. 103-110. | |
| dc.rights.acceso | Abierto (Texto Completo) | |
| dc.rights.accessrights | OpenAccess | |
| dc.subject | LangChain | |
| dc.subject | RAG | |
| dc.subject | LlamaIndex | |
| dc.subject | NLP | |
| dc.subject | Inteligencia artificial | |
| dc.subject.keyword | LangChain | |
| dc.subject.keyword | RAG | |
| dc.subject.keyword | LlamaIndex | |
| dc.subject.keyword | NLP | |
| dc.subject.keyword | Artificial intelligence | |
| dc.subject.lemb | Ingeniería Electrónica -- Tesis y disertaciones académicas | |
| dc.subject.lemb | Inteligencia computacional | spa |
| dc.subject.lemb | Procesamiento de lenguaje natural | spa |
| dc.subject.lemb | Redes transformer (Aprendizaje profundo) | spa |
| dc.subject.lemb | Planificación de recursos humanos | spa |
| dc.subject.lemb | Administración de personal | spa |
| dc.title | Implementación de redes tipo transformer en la selección estratégica de perfiles laborales a nivel empresarial | |
| dc.title.titleenglish | Implementation of transformer-type networks in the strategic selection of job profiles at the corporate level | |
| dc.type | bachelorThesis | |
| dc.type.coar | http://purl.org/coar/resource_type/c_7a1f | |
| dc.type.degree | Monografía | |
| dc.type.driver | info:eu-repo/semantics/bachelorThesis |
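The abstract above names the stages of the pipeline: clean and tokenize text extracted from PDFs, split it into chunks, index the chunk embeddings as a vectorized database, retrieve by similarity, and pass the retrieved context to a transformer LLM for generation. The following is a minimal, hypothetical sketch of that flow, not the thesis code: the `all-MiniLM-L6-v2` embedding model, the placeholder corpus, the chunk sizes, and the `some_llm.generate` call are all assumptions standing in for the components the record only names (MiniLM, Llama 3, Mixtral, LangChain/LlamaIndex).

```python
# Minimal RAG pipeline sketch (illustrative only; see assumptions above).
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split cleaned text into overlapping character windows (chunking step)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Indexing: embed every chunk; the stacked rows act as the vectorized database.
model = SentenceTransformer("all-MiniLM-L6-v2")    # assumed embedding model
docs = ["...cleaned text extracted from one CV PDF...",
        "...cleaned text from another CV..."]       # placeholder corpus
chunks = [c for d in docs for c in chunk(d)]
index = model.encode(chunks, normalize_embeddings=True)  # unit-norm rows

# Retrieval: cosine similarity between the query and every indexed chunk.
def retrieve(query: str, k: int = 3) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = index @ q                   # dot product == cosine (unit vectors)
    return [chunks[i] for i in np.argsort(-scores)[:k]]

# Generation: hand the retrieved context plus the question to the LLM.
context = "\n".join(retrieve("experience with embedded systems design"))
prompt = f"Context:\n{context}\n\nQuestion: which profile best fits the role?"
# answer = some_llm.generate(prompt)    # hypothetical call; the thesis slots a
#                                       # transformer LLM (e.g., Llama 3) here
```

Because the chunk embeddings are normalized, a plain dot product serves as cosine similarity, which is the similarity-based retrieval the abstract describes; a production index would replace the in-memory matrix with FAISS or a dedicated vector database.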
Files

Original bundle
- Name: TovarSanchezJuanSebastian2024.pdf
- Size: 2.63 MB
- Format: Adobe Portable Document Format
- Description: Degree project (Trabajo de Grado)

- Name: TovarSanchezJuanSebastian2024Anexos.zip
- Size: 279.56 KB

- Name: Licencia de uso y publicacion.pdf
- Size: 216.65 KB
- Format: Adobe Portable Document Format

License bundle
- Name: license.txt
- Size: 7 KB
- Format: Item-specific license agreed upon to submission