Automatización del aprovisionamiento de infraestructura para lagos de datos (Data Lakes) en la nube de AWS para organizaciones data driven
| dc.contributor.advisor | Leguizamón Páez, Miguel Ángel | |
| dc.contributor.author | Rodríguez Serrato, Julián David | |
| dc.contributor.author | Quiñonez Zapata, Martín Camilo | |
| dc.contributor.orcid | Leguizamón Páez, Miguel Ángel [0000-0003-0457-0126] | |
| dc.date.accessioned | 2025-11-04T15:02:02Z | |
| dc.date.available | 2025-11-04T15:02:02Z | |
| dc.date.created | 2025-09-29 | |
| dc.description | Este proyecto propone el diseño e implementación de un framework integral que automatiza la creación y gestión de un lago de datos en Amazon Web Services (AWS). La iniciativa surge ante las dificultades que enfrentan las organizaciones para desplegar infraestructuras de datos seguras, escalables y consistentes de forma manual. Mediante el uso de Infraestructura como Código (IaC) con Terraform, pipelines CI/CD con Jenkins y GitHub, y arquitecturas serverless basadas en AWS Lambda y Step Functions, se logra un entorno completamente automatizado que reduce errores, tiempos de aprovisionamiento y costos operativos. La arquitectura sigue el modelo Medallón (Aterrizaje, Bronce, Plata y Oro), garantizando un flujo de datos controlado desde su ingesta hasta el análisis final, integrando servicios como S3, Glue, Athena, IAM, CloudTrail y DataZone. Además, el proyecto aplica principios DevOps y DataOps junto con la metodología Scrum, lo que permitió una implementación iterativa, validación continua y adaptación ágil a los requerimientos. El resultado es una infraestructura modular, reproducible y segura, que demuestra cómo la automatización acelera la transformación digital y consolida el camino hacia una cultura organizacional orientada a los datos. | |
| dc.description.abstract | This project proposes the design and implementation of a comprehensive framework that automates the creation and management of a data lake on Amazon Web Services (AWS). The initiative arises from the difficulties organizations face in manually deploying secure, scalable, and consistent data infrastructures. By using Infrastructure as Code (IaC) with Terraform, CI/CD pipelines with Jenkins and GitHub, and serverless architectures based on AWS Lambda and Step Functions, the project achieves a fully automated environment that reduces errors, provisioning times, and operating costs. The architecture follows the Medallion model (Landing, Bronze, Silver, and Gold), ensuring a controlled data flow from ingestion to final analysis and integrating services such as S3, Glue, Athena, IAM, CloudTrail, and DataZone. Furthermore, the project applies DevOps and DataOps principles along with the Scrum methodology, which enabled iterative implementation, continuous validation, and agile adaptation to requirements. The result is a modular, reproducible, and secure infrastructure that demonstrates how automation accelerates digital transformation and paves the way toward a data-driven organizational culture. | |
| dc.format.mimetype | ||
| dc.identifier.uri | http://hdl.handle.net/11349/99665 | |
| dc.language.iso | spa | |
| dc.publisher | Universidad Distrital Francisco José de Caldas | |
| dc.relation.references | Nargesian, F., Zhu, E., Miller, R. J., & Pu, K. Q. (2019). Data lake management: Challenges and opportunities [Documento técnico]. University of Toronto. https://www.cs.toronto.edu/~fnargesian/Data_Lake_Management.pdf | |
| dc.relation.references | Wieder, P., & Nolte, H. (2022). Toward data lakes as central building blocks for data management and analysis. Frontiers in Big Data, 5, 945720. https://doi.org/10.3389/fdata.2022.945720 | |
| dc.relation.references | Hai, R., Koutras, C., Quix, C., & Jarke, M. (2023). Data lakes: A survey of functions and systems. IEEE Transactions on Knowledge and Data Engineering, 35(12), 12571-12590. https://doi.org/10.1109/TKDE.2023.3270101 | |
| dc.relation.references | Sreepathy, H. V., Rao, B. D., Jaysubramanian, M. K., & Rao, B. D. (2024). Data ingestions as a service (DIaaS): A unified interface for heterogeneous data ingestion, transformation, and metadata management for data lake. IEEE Access, 12, 156131-156145. https://doi.org/10.1109/ACCESS.2024.3479736 | |
| dc.relation.references | Azzabi, S., Alfughi, Z., & Ouda, A. (2024). Data lakes: A survey of concepts and architectures. Computers, 13(7), 183. https://doi.org/10.3390/computers13070183 | |
| dc.relation.references | Khine, P. P., & Wang, Z. S. (2018). Data lake: A new ideology in big data era. ITM Web of Conferences, 17, 03025. https://doi.org/10.1051/itmconf/20181703025 | |
| dc.relation.references | Nambiar, A., & Mundra, D. (2022). An overview of data warehouse and data lake in modern enterprise data management. Big Data and Cognitive Computing, 6(4), 132. https://doi.org/10.3390/bdcc6040132 | |
| dc.relation.references | HashiCorp. (s.f.). Terraform. https://www.terraform.io/ | |
| dc.relation.references | Amazon Web Services. (2021). What is cloud scalability? https://aws.amazon.com/what-is-cloud-scalability/ | |
| dc.relation.references | Morris, K. (2021). Infrastructure as code: Designing and delivering dynamic systems for the cloud age (3ra ed.). O'Reilly Media. | |
| dc.relation.references | Huerlo Quintero, J. R. (2020). Terraform como herramienta para automatizar la creación de infraestructuras siguiendo el concepto "Infraestructura como código" [Tesis de pregrado]. Pontificia Universidad Católica del Ecuador. | |
| dc.relation.references | Rahman, A., Mahdavi-Hezaveh, R., & Williams, L. (2019). A systematic mapping study of infrastructure as code research. Information and Software Technology, 108, 65-77. https://doi.org/10.1016/j.infsof.2018.12.004 | |
| dc.relation.references | Wang, H., Kishiyama, B., Lopez, D., & Yang, J. (2024). An overview of infrastructure as code (IaC) with performance and availability assessment on Google Cloud Platform. En K. Daimi & A. Al Sadoon (Eds.), Proceedings of the Second International Conference on Advances in Computing Research (ACR'24). Lecture Notes in Networks and Systems (Vol. 956). Springer. https://doi.org/10.1007/978-3-031-56950-0_41 | |
| dc.relation.references | Jenkins. (s.f.). Jenkins: Build great things at any scale. https://www.jenkins.io/ | |
| dc.relation.references | Docker Inc. (s.f.). Docker: Accelerated container application development. https://www.docker.com/ | |
| dc.relation.references | Fischer, H., Wiener, M., Strahringer, S., Kotlarsky, J., & Bley, K. (2023). Data-driven organizations: Review, conceptual framework, and empirical illustration. Australasian Journal of Information Systems, 27. https://doi.org/10.3127/ajis.v27i0.4425 | |
| dc.relation.references | Jorba, J., & Joaquín, L. S. (2020). Automatización de infraestructura IT con IaC [Trabajo final de máster]. Universitat Oberta de Catalunya. https://openaccess.uoc.edu/handle/10609/108666 | |
| dc.relation.references | Madhala, P., Li, H., & Helander, N. (2020). Organizational capabilities in data-driven value creation: A literature review. En Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - KMIS (pp. 108-116). SciTePress. https://doi.org/10.5220/0010175601080116 | |
| dc.relation.references | Behera, L., & Chilukoori, V. V. R. (2024). End-to-end data pipelines: Redefining the architecture of data engineering in cloud environments. ESP International Journal of Advancements in Science & Technology, 2(4), 26-33. https://doi.org/10.56472/25839233/IJAST-V2I4P104 | |
| dc.relation.references | Moreno Martínez, J. (2022). CI/CD en infraestructura como código (IaC). Caso real en AWS [Trabajo final de máster]. Universitat Oberta de Catalunya. | |
| dc.relation.references | Ravat, F., & Zhao, Y. (2019). Data lakes: Trends and perspectives. En International Conference on Database and Expert Systems Applications. Springer. | |
| dc.relation.references | Tacuri Pajuña, F. M. (2023). Estrategias de arquitectura de solución escalables con aprovisionamiento de infraestructura automática (Infrastructure as Code - IaC) [Tesis de pregrado]. Universidad Politécnica Salesiana. | |
| dc.relation.references | Kumara, I., Garriga, M., Romeu, A. U., Di Nucci, D., Palomba, F., Tamburri, D. A., & van den Heuvel, W.-J. (2021). The do's and don'ts of infrastructure code: A systematic gray literature review. Information and Software Technology, 137, 106593. https://doi.org/10.1016/j.infsof.2021.106593 | |
| dc.relation.references | Manchana, R. (2023). Building a modern data foundation in the cloud: Data lakes and data lakehouses as key enablers. Journal of Artificial Intelligence, Machine Learning and Data Science, 1(1), 1098-1108. | |
| dc.relation.references | Robertson, K. (2022). Driven by data - A case study on how to become a more data-driven organization [Tesis de pregrado]. Haaga-Helia University of Applied Sciences. | |
| dc.relation.references | IBM. (s.f.). Almacenes de datos, data lakes y lakehouses de datos. https://www.ibm.com/es-es/think/topics/data-warehouse-vs-data-lake-vs-data-lakehouse | |
| dc.relation.references | Integrating data warehouses with data lakes: A unified analytics solution. (2023). Innovative Computer Sciences Journal, 9(1). https://inscipub.com/ICSJ/article/view/ | |
| dc.relation.references | Ravi, V. K., Ayyagar, A., Krishna, K., Goel, P., Chhapola, A., & Jain, A. (2023). Data lake implementation in enterprise environments. International Journal of Progressive Research in Engineering Management and Science, 3(11), 449–469. https://doi.org/10.58257/IPREMS32250 | |
| dc.relation.references | Agudelo Patiño, J. C. (2020). Data lakes: Aplicaciones, herramientas y arquitecturas [Monografía de pregrado]. Universidad Tecnológica de Pereira. | |
| dc.relation.references | Morales, D., & Campos, A. (2024). Plantillas para la automatización de la infraestructura tecnológica en la nube de AWS para startups (CloudFlex) [Tesis de pregrado]. Universidad Distrital Francisco José de Caldas. https://repository.udistrital.edu.co/server/api/core/bitstreams/89e70e29-b623-44d7-9920-25d77164d609/content | |
| dc.relation.references | Ducuara, J. (2023). Migración a una arquitectura en la nube para el procesamiento de datos abiertos oceanográficos [Tesis de pregrado]. Universidad Católica de Colombia. https://repository.ucatolica.edu.co/server/api/core/bitstreams/2c67fc70-d311-48aa-86f9-184a52f2df84/content | |
| dc.relation.references | GitHub. (s.f.). GitHub: Where the world builds software. https://github.com/ | |
| dc.relation.references | Acuña, J. G. (2025, 7 de enero). Costos de un servidor: diferencias entre servidores on premise vs en la nube y costes. Pleo. https://blog.pleo.io/es/costos-servidor | |
| dc.relation.references | Maddula, S. (2024, 3 de octubre). Estimates for data warehouse cost [+comparison]. Hevo. https://hevodata.com/learn/data-warehouse-cost/ | |
| dc.relation.references | Cost breakdown of cloud and on-premise software. (2021, 4 de marzo). Centerbase. https://centerbase.com/blog/cost-breakdown-of-cloud-and-on-premise-software/ | |
| dc.relation.references | How to build a data warehouse from scratch: Cost + examples. (2024, 2 de julio). Airbyte. https://airbyte.com/data-engineering-resources/building-data-warehouse | |
| dc.relation.references | Dearmer, A. (2020, 11 de noviembre). True costs of building and implementing your data warehouse. Integrate.io. https://www.integrate.io/blog/the-true-cost-of-a-data-warehouse/ | |
| dc.relation.references | How much does a data warehouse cost? (2025). Data Sleek. https://data-sleek.com/blog/how-much-does-a-data-warehouse-cost/ | |
| dc.relation.references | Ahmed, I. (2021, 31 de marzo). Estimaciones de costos para la construcción de un almacén de datos. Astera. https://www.astera.com/es/type/blog/building-a-data-warehouse-cost-estimation/ | |
| dc.relation.references | Data warehouse cost guide (updated internal & external costs). (2025). Datakulture. https://datakulture.com/blog/data-warehouse-cost-estimator/ | |
| dc.relation.references | How much does it cost to set up a data warehouse in 2024? (2024, 4 de abril). LinkedIn. https://www.linkedin.com/pulse/how-much-does-cost-set-up-data-warehouse-2024-datakulture-fwevc/ | |
| dc.relation.references | Actian. (s.f.). Total cost of usage: The key to understanding the true costs of a cloud data warehouse [Documento técnico]. https://go.actian.com/rs/176-HNM-524/images/Actian%20Total%20Cost%20of%20Usage%20Whitepaper.pdf | |
| dc.relation.references | Servidor para rack Dell PowerEdge R740: servidores | Dell España. (s. f.). Dell. https://www.dell.com/es-es/shop/servidores-dell-poweredge/smart-selection-poweredge-r740-server/spd/poweredge-r740/per7403r#features_section | |
| dc.relation.references | Servidor para rack Dell PowerEdge R640: servidores | Dell España. (s. f.). Dell. https://www.dell.com/es-es/shop/servidores-dell-poweredge/smart-selection-poweredge-r640-server/spd/poweredge-r640/per6404r | |
| dc.relation.references | Dell EMC PowerStore Price Calculator. (s. f.). https://icgintl.com/dell-emc-powerstore-price-calculator | |
| dc.relation.references | Router-Switch. (s. f.). https://www.router-switch.com/es/s6730-h24x6c.html | |
| dc.relation.references | APC Smart-UPS VT,16 kW /20 kVA al mejor precio. (s. f.). Pst de Colombia Expertos En Servidores, Almacenamiento, Impresión y Redes HP IBM - DELL – ORACLE. https://servidoresalmacenamientoredes.com/ups-apc/12-apc-smart-ups-vt16-kw-20-kva.html | |
| dc.relation.references | Integra-Smart. (s. f.). https://www.integra-smart.com/product-page/aire-acondicionado-precisión-para-data-center | |
| dc.relation.references | Limited, R. S. (s. f.). Precio Huawei USG6000 - Lista de precios de Huawei 2022. https://itprice.com/es/huawei-price-list/usg6000.html | |
| dc.relation.references | Veeam Pricing & Instance Calculator. (s. f.). https://www.veeam.com/solutions/small-business/pricing-calculator.html | |
| dc.relation.references | Revolution Soft. (s. f.). Comprar VMware VSphere Hypervisor (ESXI) 8 - Revolution Soft | Colombia. https://revolutionsoft.com.co/vmware/vmware-vsphere-hypervisor-esxi-8.html#/122-cpus-8_cpus | |
| dc.relation.references | Base Database Service pricing. (s. f.). Oracle. https://www.oracle.com/database/base-database-service/pricing/ | |
| dc.relation.references | Paessler PRTG Network Monitor pricing | Choose your plan. (s. f.). Paessler - the Monitoring Experts. https://www.paessler.com/pricing | |
| dc.relation.references | ingeniero datacenter Salario en Colombia—Salario medio. (s. f.). Talent.com. https://co.talent.com/salary?job=ingeniero+datacenter | |
| dc.relation.references | Sueldo: Senior Database Administrator en Bogota, Colombia 2025. (s. f.). Glassdoor. https://www.glassdoor.com.mx/Sueldos/bogota-colombia-senior-database-administrator-sueldo-SRCH_IL.0,15_IM1064_KO16,45.htm | |
| dc.relation.references | Sueldo: Systems Administrator en Colombia 2025. (s. f.). Glassdoor. https://www.glassdoor.com.mx/Sueldos/colombia-systems-administrator-sueldo-SRCH_IL.0,8_IN54_KO9,30.htm | |
| dc.relation.references | Comunicado de XM sobre las variables del mercado de energía en agosto de 2024. (s. f.). Portal XM. https://www.xm.com.co/noticias/7131-comunicado-de-xm-sobre-las-variables-del-mercado-de-energia-en-agosto-de-2024 | |
| dc.relation.references | Lago de datos en AWS - AWS Pricing Calculator. (s. f.). https://calculator.aws/#/estimate?id=81a100804ff44ff08397c37564ce1db713079ba2 | |
| dc.rights.acceso | Abierto (Texto Completo) | |
| dc.rights.accessrights | OpenAccess | |
| dc.subject | Lagos de datos | |
| dc.subject | AWS | |
| dc.subject | IaC | |
| dc.subject | Automatización | |
| dc.subject | DataOps | |
| dc.subject | DevOps | |
| dc.subject.keyword | Data lake | |
| dc.subject.keyword | AWS | |
| dc.subject.keyword | IaC | |
| dc.subject.keyword | Automation | |
| dc.subject.keyword | DataOps | |
| dc.subject.keyword | DevOps | |
| dc.subject.lemb | Ingeniería Telemática -- Tesis y disertaciones académicas | |
| dc.subject.lemb | Informática en la nube | |
| dc.subject.lemb | Datos masivos | |
| dc.subject.lemb | Automatización | |
| dc.subject.lemb | Amazon Web Services | |
| dc.subject.lemb | Ingeniería de software | |
| dc.title | Automatización del aprovisionamiento de infraestructura para lagos de datos (Data Lakes) en la nube de AWS para organizaciones data driven | |
| dc.title.titleenglish | Automating Infrastructure Provisioning for Data Lakes in the AWS Cloud for Data-Driven Organizations | |
| dc.type | bachelorThesis | |
| dc.type.coar | http://purl.org/coar/resource_type/c_7a1f | |
| dc.type.degree | Monografía | |
| dc.type.driver | info:eu-repo/semantics/bachelorThesis | |