Automatización del aprovisionamiento de infraestructura para lagos de datos (Data Lakes) en la nube de AWS para organizaciones data driven

dc.contributor.advisorLeguizamón Páez, Miguel Ángel
dc.contributor.authorRodríguez Serrato, Julián David
dc.contributor.authorQuiñonez Zapata, Martín Camilo
dc.contributor.orcidLeguizamón Páez, Miguel Ángel [0000-0003-0457-0126]
dc.date.accessioned2025-11-04T15:02:02Z
dc.date.available2025-11-04T15:02:02Z
dc.date.created2025-09-29
dc.descriptionEste proyecto propone el diseño e implementación de un framework integral que automatiza la creación y gestión de un lago de datos en Amazon Web Services (AWS). La iniciativa surge ante las dificultades que enfrentan las organizaciones para desplegar infraestructuras de datos seguras, escalables y consistentes de forma manual. Mediante el uso de Infraestructura como Código (IaC) con Terraform, pipelines CI/CD con Jenkins y GitHub, y arquitecturas serverless basadas en AWS Lambda y Step Functions, se logra un entorno completamente automatizado que reduce errores, tiempos de aprovisionamiento y costos operativos. La arquitectura sigue el modelo Medallón (Aterrizaje, Bronce, Plata y Oro), garantizando un flujo de datos controlado desde su ingesta hasta el análisis final, integrando servicios como S3, Glue, Athena, IAM, CloudTrail y DataZone. Además, el proyecto aplica principios DevOps y DataOps junto con la metodología Scrum, lo que permitió una implementación iterativa, validación continua y adaptación ágil a los requerimientos. El resultado es una infraestructura modular, reproducible y segura, que demuestra cómo la automatización acelera la transformación digital y consolida el camino hacia una cultura organizacional orientada a los datos.
dc.description.abstractThis project proposes the design and implementation of a comprehensive framework that automates the creation and management of a data lake on Amazon Web Services (AWS). The initiative arises from the difficulties organizations face in manually deploying secure, scalable, and consistent data infrastructures. By using Infrastructure as Code (IaC) with Terraform, CI/CD pipelines with Jenkins and GitHub, and serverless architectures based on AWS Lambda and Step Functions, the project achieves a fully automated environment that reduces errors, provisioning times, and operating costs. The architecture follows the Medallion model (Landing, Bronze, Silver, and Gold), ensuring a controlled data flow from ingestion to final analysis and integrating services such as S3, Glue, Athena, IAM, CloudTrail, and DataZone. Furthermore, the project applies DevOps and DataOps principles along with the Scrum methodology, which enabled iterative implementation, continuous validation, and agile adaptation to requirements. The result is a modular, reproducible, and secure infrastructure that demonstrates how automation accelerates digital transformation and consolidates the path toward a data-driven organizational culture.
dc.format.mimetypepdf
dc.identifier.urihttp://hdl.handle.net/11349/99665
dc.language.isospa
dc.publisherUniversidad Distrital Francisco José de Caldas
dc.relation.referencesNargesian, F., Zhu, E., Miller, R. J., & Pu, K. Q. (2019). Data lake management: Challenges and opportunities [Documento técnico]. University of Toronto. https://www.cs.toronto.edu/~fnargesian/Data_Lake_Management.pdf
dc.relation.referencesWieder, P., & Nolte, H. (2022). Toward data lakes as central building blocks for data management and analysis. Frontiers in Big Data, 5, 945720. https://doi.org/10.3389/fdata.2022.945720
dc.relation.referencesHai, R., Koutras, C., Quix, C., & Jarke, M. (2023). Data lakes: A survey of functions and systems. IEEE Transactions on Knowledge and Data Engineering, 35(12), 12571-12590. https://doi.org/10.1109/TKDE.2023.3270101
dc.relation.referencesSreepathy, H. V., Rao, B. D., Jaysubramanian, M. K., & Rao, B. D. (2024). Data ingestions as a service (DIaaS): A unified interface for heterogeneous data ingestion, transformation, and metadata management for data lake. IEEE Access, 12, 156131-156145. https://doi.org/10.1109/ACCESS.2024.3479736
dc.relation.referencesAzzabi, S., Alfughi, Z., & Ouda, A. (2024). Data lakes: A survey of concepts and architectures. Computers, 13(7), 183. https://doi.org/10.3390/computers13070183
dc.relation.referencesKhine, P. P., & Wang, Z. S. (2018). Data lake: A new ideology in big data era. ITM Web of Conferences, 17, 03025. https://doi.org/10.1051/itmconf/20181703025
dc.relation.referencesNambiar, A., & Mundra, D. (2022). An overview of data warehouse and data lake in modern enterprise data management. Big Data and Cognitive Computing, 6(4), 132. https://doi.org/10.3390/bdcc6040132
dc.relation.referencesHashiCorp. (s. f.). Terraform. https://www.terraform.io/
dc.relation.referencesAmazon Web Services. (2021). What is cloud scalability? https://aws.amazon.com/what-is-cloud-scalability/
dc.relation.referencesMorris, K. (2021). Infrastructure as code: Designing and delivering dynamic systems for the cloud age (3ra ed.). O'Reilly Media.
dc.relation.referencesHuerlo Quintero, J. R. (2020). Terraform como herramienta para automatizar la creación de infraestructuras siguiendo el concepto "Infraestructura como código" [Tesis de pregrado]. Pontificia Universidad Católica del Ecuador.
dc.relation.referencesRahman, A., Mahdavi-Hezaveh, R., & Williams, L. (2019). A systematic mapping study of infrastructure as code research. Information and Software Technology, 108, 65-77. https://doi.org/10.1016/j.infsof.2018.12.004
dc.relation.referencesWang, H., Kishiyama, B., Lopez, D., & Yang, J. (2024). An overview of infrastructure as code (IaC) with performance and availability assessment on Google Cloud Platform. En K. Daimi & A. Al Sadoon (Eds.), Proceedings of the Second International Conference on Advances in Computing Research (ACR'24). Lecture Notes in Networks and Systems (Vol. 956). Springer. https://doi.org/10.1007/978-3-031-56950-0_41
dc.relation.referencesJenkins. (s. f.). Jenkins: Build great things at any scale. https://www.jenkins.io/
dc.relation.referencesDocker Inc. (s. f.). Docker: Accelerated container application development. https://www.docker.com/
dc.relation.referencesFischer, H., Wiener, M., Strahringer, S., Kotlarsky, J., & Bley, K. (2023). Data-driven organizations: Review, conceptual framework, and empirical illustration. Australasian Journal of Information Systems, 27. https://doi.org/10.3127/ajis.v27i0.4425
dc.relation.referencesJorba, J., & Joaquín, L. S. (2020). Automatización de infraestructura IT con IaC [Trabajo final de máster]. Universitat Oberta de Catalunya. https://openaccess.uoc.edu/handle/10609/108666
dc.relation.referencesMadhala, P., Li, H., & Helander, N. (2020). Organizational capabilities in data-driven value creation: A literature review. En Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - KMIS (pp. 108-116). SciTePress. https://doi.org/10.5220/0010175601080116
dc.relation.referencesBehera, L., & Chilukoori, V. V. R. (2024). End-to-end data pipelines: Redefining the architecture of data engineering in cloud environments. ESP International Journal of Advancements in Science & Technology, 2(4), 26-33. https://doi.org/10.56472/25839233/IJAST-V2I4P104
dc.relation.referencesMoreno Martínez, J. (2022). CI/CD en infraestructura como código (IaC). Caso real en AWS [Trabajo final de máster]. Universitat Oberta de Catalunya.
dc.relation.referencesRavat, F., & Zhao, Y. (2019). Data lakes: Trends and perspectives. En International Conference on Database and Expert Systems Applications. Springer.
dc.relation.referencesTacuri Pajuña, F. M. (2023). Estrategias de arquitectura de solución escalables con aprovisionamiento de infraestructura automática (Infrastructure as Code - IaC) [Tesis de pregrado]. Universidad Politécnica Salesiana.
dc.relation.referencesKumara, I., Garriga, M., Romeu, A. U., Di Nucci, D., Palomba, F., Tamburri, D. A., & van den Heuvel, W.-J. (2021). The do's and don'ts of infrastructure code: A systematic gray literature review. Information and Software Technology, 137, 106593. https://doi.org/10.1016/j.infsof.2021.106593
dc.relation.referencesManchana, R. (2023). Building a modern data foundation in the cloud: Data lakes and data lakehouses as key enablers. Journal of Artificial Intelligence, Machine Learning and Data Science, 1(1), 1098-1108.
dc.relation.referencesRobertson, K. (2022). Driven by data - A case study on how to become a more data-driven organization [Tesis de pregrado]. Haaga-Helia University of Applied Sciences.
dc.relation.referencesIBM. (s. f.). Almacenes de datos, data lakes y lakehouses de datos. https://www.ibm.com/es-es/think/topics/data-warehouse-vs-data-lake-vs-data-lakehouse
dc.relation.referencesIntegrating data warehouses with data lakes: A unified analytics solution. (2023). Innovative Computer Sciences Journal, 9(1). https://inscipub.com/ICSJ/article/view/
dc.relation.referencesRavi, V. K., Ayyagar, A., Krishna, K., Goel, P., Chhapola, A., & Jain, A. (2023). Data lake implementation in enterprise environments. International Journal of Progressive Research in Engineering Management and Science, 3(11), 449–469. https://doi.org/10.58257/IPREMS32250
dc.relation.referencesAgudelo Patiño, J. C. (2020). Data lakes: Aplicaciones, herramientas y arquitecturas [Monografía de pregrado]. Universidad Tecnológica de Pereira.
dc.relation.referencesMorales, D., & Campos, A. (2024). Plantillas para la automatización de la infraestructura tecnológica en la nube de AWS para startups (CloudFlex) [Tesis de pregrado]. Universidad Distrital Francisco José de Caldas. https://repository.udistrital.edu.co/server/api/core/bitstreams/89e70e29-b623-44d7-9920-25d77164d609/content
dc.relation.referencesDucuara, J. (2023). Migración a una arquitectura en la nube para el procesamiento de datos abiertos oceanográficos [Tesis de pregrado]. Universidad Católica de Colombia. https://repository.ucatolica.edu.co/server/api/core/bitstreams/2c67fc70-d311-48aa-86f9-184a52f2df84/content
dc.relation.referencesGitHub. (s. f.). GitHub: Where the world builds software. https://github.com/
dc.relation.referencesAcuña, J. G. (2025, 7 de enero). Costos de un servidor: diferencias entre servidores on premise vs en la nube y costes. Pleo. https://blog.pleo.io/es/costos-servidor
dc.relation.referencesMaddula, S. (2024, 3 de octubre). Estimates for data warehouse cost [+comparison]. Hevo. https://hevodata.com/learn/data-warehouse-cost/
dc.relation.referencesCost breakdown of cloud and on-premise software. (2021, 4 de marzo). Centerbase. https://centerbase.com/blog/cost-breakdown-of-cloud-and-on-premise-software/
dc.relation.referencesHow to build a data warehouse from scratch: Cost + examples. (2024, 2 de julio). Airbyte. https://airbyte.com/data-engineering-resources/building-data-warehouse
dc.relation.referencesDearmer, A. (2020, 11 de noviembre). True costs of building and implementing your data warehouse. Integrate.io. https://www.integrate.io/blog/the-true-cost-of-a-data-warehouse/
dc.relation.referencesHow much does a data warehouse cost? (2025). Data Sleek. https://data-sleek.com/blog/how-much-does-a-data-warehouse-cost/
dc.relation.referencesAhmed, I. (2021, 31 de marzo). Estimaciones de costos para la construcción de un almacén de datos. Astera. https://www.astera.com/es/type/blog/building-a-data-warehouse-cost-estimation/
dc.relation.referencesData warehouse cost guide (updated internal & external costs). (2025). Datakulture. https://datakulture.com/blog/data-warehouse-cost-estimator/
dc.relation.referencesHow much does it cost to set up a data warehouse in 2024? (2024, 4 de abril). LinkedIn. https://www.linkedin.com/pulse/how-much-does-cost-set-up-data-warehouse-2024-datakulture-fwevc/
dc.relation.referencesActian. (s. f.). Total cost of usage: The key to understanding the true costs of a cloud data warehouse [Documento técnico]. https://go.actian.com/rs/176-HNM-524/images/Actian%20Total%20Cost%20of%20Usage%20Whitepaper.pdf
dc.relation.referencesServidor para rack Dell PowerEdge R740: servidores | Dell España. (s. f.). Dell. https://www.dell.com/es-es/shop/servidores-dell-poweredge/smart-selection-poweredge-r740-server/spd/poweredge-r740/per7403r#features_section
dc.relation.referencesServidor para rack Dell PowerEdge R640: servidores | Dell España. (s. f.). Dell. https://www.dell.com/es-es/shop/servidores-dell-poweredge/smart-selection-poweredge-r640-server/spd/poweredge-r640/per6404r
dc.relation.referencesDell EMC PowerStore Price Calculator. (s. f.). https://icgintl.com/dell-emc-powerstore-price-calculator
dc.relation.referencesRouter-Switch. (s. f.). https://www.router-switch.com/es/s6730-h24x6c.html
dc.relation.referencesAPC Smart-UPS VT,16 kW /20 kVA al mejor precio. (s. f.). Pst de Colombia Expertos En Servidores, Almacenamiento, Impresión y Redes HP IBM - DELL – ORACLE. https://servidoresalmacenamientoredes.com/ups-apc/12-apc-smart-ups-vt16-kw-20-kva.html
dc.relation.referencesIntegra-Smart. (s. f.). https://www.integra-smart.com/product-page/aire-acondicionado-precisión-para-data-center
dc.relation.referencesLimited, R. S. (s. f.). Precio Huawei USG6000 - Lista de precios de Huawei 2022. https://itprice.com/es/huawei-price-list/usg6000.html
dc.relation.referencesVeeam Pricing & Instance Calculator. (s. f.). https://www.veeam.com/solutions/small-business/pricing-calculator.html
dc.relation.referencesRevolution Soft. (s. f.). Comprar VMware VSphere Hypervisor (ESXI) 8 - Revolution Soft | Colombia. https://revolutionsoft.com.co/vmware/vmware-vsphere-hypervisor-esxi-8.html#/122-cpus-8_cpus
dc.relation.referencesBase Database Service pricing. (s. f.). Oracle. https://www.oracle.com/database/base-database-service/pricing/
dc.relation.referencesPaessler PRTG Network Monitor pricing | Choose your plan. (s. f.). Paessler - the Monitoring Experts. https://www.paessler.com/pricing
dc.relation.referencesingeniero datacenter Salario en Colombia—Salario medio. (s. f.). Talent.com. https://co.talent.com/salary?job=ingeniero+datacenter
dc.relation.referencesSueldo: Senior Database Administrator en Bogota, Colombia 2025. (s. f.). Glassdoor. https://www.glassdoor.com.mx/Sueldos/bogota-colombia-senior-database-administrator-sueldo-SRCH_IL.0,15_IM1064_KO16,45.htm
dc.relation.referencesSueldo: Systems Administrator en Colombia 2025. (s. f.). Glassdoor. https://www.glassdoor.com.mx/Sueldos/colombia-systems-administrator-sueldo-SRCH_IL.0,8_IN54_KO9,30.htm
dc.relation.referencesComunicado de XM sobre las variables del mercado de energía en agosto de 2024. (s. f.). Portal XM. https://www.xm.com.co/noticias/7131-comunicado-de-xm-sobre-las-variables-del-mercado-de-energia-en-agosto-de-2024
dc.relation.referencesLago de datos en AWS - AWS Pricing Calculator. (s. f.). https://calculator.aws/#/estimate?id=81a100804ff44ff08397c37564ce1db713079ba2
dc.rights.accesoAbierto (Texto Completo)
dc.rights.accessrightsOpenAccess
dc.subjectLagos de datos
dc.subjectAWS
dc.subjectIaC
dc.subjectAutomatización
dc.subjectDataOps
dc.subjectDevOps
dc.subject.keywordData lake
dc.subject.keywordAWS
dc.subject.keywordIaC
dc.subject.keywordAutomation
dc.subject.keywordDataOps
dc.subject.keywordDevOps
dc.subject.lembIngeniería Telemática -- Tesis y disertaciones académicas
dc.subject.lembInformática en la nube
dc.subject.lembDatos masivos
dc.subject.lembAutomatización
dc.subject.lembAmazon Web Services
dc.subject.lembIngeniería de software
dc.titleAutomatización del aprovisionamiento de infraestructura para lagos de datos (Data Lakes) en la nube de AWS para organizaciones data driven
dc.title.titleenglishAutomating Infrastructure Provisioning for Data Lakes in the AWS Cloud for Data-Driven Organizations
dc.typebachelorThesis
dc.type.coarhttp://purl.org/coar/resource_type/c_7a1f
dc.type.degreeMonografía
dc.type.driverinfo:eu-repo/semantics/bachelorThesis

Files

Original bundle

Name: QuiñonezZapataMartinCamilo2025.pdf
Size: 9.54 MB
Format: Adobe Portable Document Format

Name: Licencia de uso y publicación.pdf
Size: 278.9 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 7 KB
Format: Item-specific license agreed upon to submission