Automating Infrastructure Provisioning for Data Lakes on the AWS Cloud for Data-Driven Organizations
Abstract
This project proposes the design and implementation of a comprehensive framework that automates the creation and management of a data lake on Amazon Web Services (AWS). The initiative arises from the difficulties organizations face when they manually deploy data infrastructure that must be secure, scalable, and consistent. The framework combines Infrastructure as Code (IaC) with Terraform, CI/CD pipelines with Jenkins and GitHub, and serverless architectures based on AWS Lambda and Step Functions to produce a fully automated environment that reduces errors, provisioning time, and operating costs. The architecture follows the Medallion model (Landing, Bronze, Silver, and Gold), which ensures a controlled data flow from ingestion to final analysis, and it integrates services such as S3, Glue, Athena, IAM, CloudTrail, and DataZone. The project also applies DevOps and DataOps principles together with the Scrum methodology, enabling iterative implementation, continuous validation, and agile adaptation to changing requirements. The result is a modular, reproducible, and secure infrastructure that demonstrates how automation accelerates digital transformation and paves the way for a data-driven organizational culture.
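As a minimal illustration of the controlled flow through the Medallion layers, the following Python sketch models how an object key could be promoted from one layer to the next. It is not the project's implementation: the key layout `<layer>/<dataset>/<file>` and the function names `next_layer` and `promote_key` are assumptions introduced here only for illustration.

```python
from typing import Optional

# Layer order taken from the abstract: Landing -> Bronze -> Silver -> Gold.
LAYERS = ("landing", "bronze", "silver", "gold")


def next_layer(layer: str) -> Optional[str]:
    """Return the layer that follows `layer`, or None once Gold is reached."""
    index = LAYERS.index(layer)  # raises ValueError for an unknown layer
    return LAYERS[index + 1] if index + 1 < len(LAYERS) else None


def promote_key(key: str) -> str:
    """Rewrite an object key so it points at the next layer's prefix.

    Assumes keys look like "<layer>/<dataset>/<file>" (hypothetical layout).
    """
    layer, _, rest = key.partition("/")
    target = next_layer(layer)
    if target is None:
        raise ValueError(f"{layer!r} is the final layer; nothing to promote to")
    return f"{target}/{rest}"


if __name__ == "__main__":
    # Walk a hypothetical object through every layer.
    key = "landing/sales/2024-01.csv"
    while next_layer(key.split("/", 1)[0]) is not None:
        key = promote_key(key)
        print(key)
```

Keeping the layer order in a single tuple means data can only move forward one layer at a time, which mirrors the controlled ingestion-to-analysis flow the abstract describes; in a real deployment each promotion step would be carried out by a processing job rather than a key rename.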
