Methodology for managing agricultural information using data cubes to strengthen spatial applications with machine learning
Abstract
Contemporary agriculture faces challenges arising from climate change, economic pressures, and population growth that require the adoption of techniques and systems capable of improving productivity without compromising environmental sustainability or food security. In this context, satellite observations together with machine learning methods can contribute to monitoring and decision-making; however, their practical application is limited by input heterogeneity, the lack of interoperability standards, and radiometric variability among products. This thesis addresses these limitations by proposing a methodology for the management of agricultural information through multitemporal raster data cubes. The proposed methodology defines a modular workflow: requirements definition and field data capture (FieldMaps); acquisition of multitemporal series (PlanetScope and Sentinel-2); preprocessing and normalization (TOA and surface reflectance, atmospheric corrections with ENVI); calculation of spectral indices (NDVI, GNDVI, CLGreen, TVI, among others); spectral segmentation (mean-shift); and assembly of the raster cube in ArcGIS Pro. The cube organizes information by pixel and date, enabling temporal queries and the systematic extraction of training vectors for regression and classification models. Reference meteorological variables (e.g., NASA-POWER) are also incorporated to complement the inputs, with emphasis on their use as auxiliary data. Validation was performed through two case studies. The first involved estimating the phenological stage of onion in Tota (Boyacá), using 17 PlanetScope scenes (Dec 2023–May 2024) and a field control point recorded on 19 May 2024; linear regression, a multilayer perceptron (MLP) neural network, and Random Forest were compared, with the MLP obtaining the best results (R² = 0.91, MSE = 4.07). 
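As a rough illustration of the index calculation and the pixel-by-date organization described in the methodology, the following minimal sketch computes three of the named indices using their commonly published formulas (NDVI, GNDVI, and CLGreen) and stores them in a small in-memory cube. It is not the thesis workflow, which assembles the cube in ArcGIS Pro. The function names, the dictionary layout, and the reflectance values are assumptions made for this example, and TVI is left out because the abbreviation can refer to more than one index.

```python
from datetime import date


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator != 0 else float("nan")


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return _ratio(nir - red, nir + red)


def gndvi(nir, green):
    """Green NDVI: (NIR - Green) / (NIR + Green)."""
    return _ratio(nir - green, nir + green)


def cl_green(nir, green):
    """Green chlorophyll index: NIR / Green - 1."""
    return _ratio(nir, green) - 1.0


def build_cube(scenes):
    """Build a hypothetical cube: date -> row -> column -> {index name: value}.

    `scenes` maps each acquisition date to a dict of 2D reflectance grids
    keyed by band name ("nir", "red", "green"); this layout is an assumption.
    """
    cube = {}
    for day, bands in scenes.items():
        nir, red, green = bands["nir"], bands["red"], bands["green"]
        cube[day] = [
            [
                {
                    "ndvi": ndvi(nir[r][c], red[r][c]),
                    "gndvi": gndvi(nir[r][c], green[r][c]),
                    "cl_green": cl_green(nir[r][c], green[r][c]),
                }
                for c in range(len(nir[r]))
            ]
            for r in range(len(nir))
        ]
    return cube


def pixel_series(cube, row, col, index_name):
    """Temporal query: one pixel's values for one index, ordered by date."""
    return [(day, cube[day][row][col][index_name]) for day in sorted(cube)]


# Hypothetical reflectance values for a 1 x 2 pixel tile on two dates.
scenes = {
    date(2024, 1, 10): {"nir": [[0.50, 0.40]], "red": [[0.10, 0.20]], "green": [[0.25, 0.20]]},
    date(2023, 12, 15): {"nir": [[0.30, 0.30]], "red": [[0.15, 0.20]], "green": [[0.15, 0.20]]},
}
cube = build_cube(scenes)
series = pixel_series(cube, 0, 0, "ndvi")
```

Each list returned by `pixel_series` could serve as one training vector for a regression or classification model. A real raster workflow would more likely use an array library than nested lists, which are used here only to keep the sketch dependency-free.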
The second study addressed detection and classification of agricultural cover in prioritized areas (Putumayo, Guaviare, and Antioquia), comparing Random Forest and the Spectral Angle Mapper (SAM); Random Forest showed higher overall accuracy (94.4%), Kappa = 0.84, and recall for the “Crop” class close to 96%. The analysis highlights that organizing inputs into raster cubes contributes to greater spatial and radiometric coherence among sources and facilitates experimental repeatability. Nonetheless, practical limitations were identified: direct inclusion of meteorological variables in the models produced signs of overfitting in some cases; the availability of Surface Reflectance (SR) or Analysis Ready Data (ARD) products improves spectral consistency; and discrimination of very similar species may require inputs with higher spectral resolution. Consequently, cautious use of auxiliary variables is advised, along with prioritization of SR/ARD products and evaluation of hyperspectral inputs when discrimination requirements justify them. Overall, the document presents a modest technical proposal applicable to operational contexts, accompanied by empirical evidence and practical recommendations for its implementation and scaling in settings with varying resources and capacities.
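For readers less familiar with the agreement metrics reported for the second case study, the sketch below shows how overall accuracy, Cohen's Kappa, and per-class recall are conventionally derived from a confusion matrix. The matrix is hypothetical and does not reproduce the thesis results, and the row and column convention (rows as reference labels, columns as predictions) is an assumption of this example.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement expected from the marginal totals alone.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


def recall(matrix, class_index):
    """Recall for one class: correct predictions over all reference samples."""
    return matrix[class_index][class_index] / sum(matrix[class_index])


# Hypothetical two-class matrix: rows = reference, columns = predicted.
# Class 0 stands in for "Crop", class 1 for everything else.
confusion = [
    [45, 5],
    [5, 45],
]
accuracy = overall_accuracy(confusion)
kappa = cohen_kappa(confusion)
crop_recall = recall(confusion, 0)
```

Kappa is normally lower than overall accuracy because it discounts the agreement that the class proportions alone would produce by chance, which is the usual reason for reporting both values.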
