Metodología machine learning para el tratamiento de imágenes computarizadas en pacientes con cáncer de pulmón
| dc.contributor.advisor | Salcedo Parra, Octavio José | |
| dc.contributor.author | Cely Granados, Oscar Leonardo | |
| dc.contributor.orcid | Salcedo Parra, Octavio José [0000-0002-0767-8522] | |
| dc.date.accessioned | 2025-09-22T19:09:16Z | |
| dc.date.available | 2025-09-22T19:09:16Z | |
| dc.date.created | 2025-06-12 | |
| dc.description | This research proposes a computational methodology for identifying patterns associated with lung cancer, using only machine learning and deep learning tools implemented in Python. The study rests on an analysis of the LIDC-IDRI dataset (The Lung Image Database Consortium and Image Database Resource Initiative), provided by the U.S. National Cancer Institute, which contains medical images in DICOM (Digital Imaging and Communications in Medicine) format. DICOM is the international standard for transmitting, storing, and processing medical images; it combines patient information, acquisition characteristics, and the image itself in a single file. Besides the DICOM images, the dataset includes radiologist segmentations, nodule counts, and clinical diagnoses in structured files. The methodology focuses on processing, integrating, and analyzing large volumes of data in order to explore significant correlations and behaviors among the available variables. Although the aim is not direct clinical diagnosis, the patterns found could serve as a basis for future research and support the development of diagnostic assistance systems. Each patient can generate between 10 and 15 GB of information, which poses significant challenges for efficient processing, organization, and interpretation of the data. This work seeks to strengthen computational analysis applied to lung cancer from an engineering, exploratory perspective centered on making use of complex medical data. | |
| dc.description.abstract | This research proposes a computational methodology aimed at identifying patterns associated with lung cancer, using exclusively machine learning and deep learning tools implemented in Python. The study is based on the analysis of the LIDC-IDRI dataset (The Lung Image Database Consortium and Image Database Resource Initiative), provided by the U.S. National Cancer Institute, which contains medical images in DICOM (Digital Imaging and Communications in Medicine) format. DICOM is the international standard for the transmission, storage, and processing of medical images, allowing the integration of patient information, acquisition characteristics, and the image itself into a single file. In addition to DICOM images, the dataset includes radiologist segmentations, nodule counts, and clinical diagnoses in structured files. This methodology focuses on the processing, integration, and analysis of large volumes of data, with the aim of exploring significant correlations and behaviors within the available variables. Although the purpose is not to provide direct clinical diagnosis, the patterns identified could serve as a basis for future research and support the development of diagnostic assistance systems. Each patient can generate between 10 and 15 GB of information, which poses relevant challenges regarding efficient processing, organization, and data interpretation. This work seeks to contribute to strengthening computational analysis applied to lung cancer, from an engineering, exploratory perspective, centered on leveraging complex medical data. | |
| dc.format.mimetype | ||
| dc.identifier.uri | http://hdl.handle.net/11349/99154 | |
| dc.language.iso | spa | |
| dc.publisher | Universidad Distrital Francisco José de Caldas | |
| dc.rights.acceso | Open (Full Text) | |
| dc.rights.accessrights | OpenAccess | |
| dc.subject | Lung cancer | |
| dc.subject | LIDC-IDRI | |
| dc.subject | Machine learning | |
| dc.subject | Medical images | |
| dc.subject | Deep learning | |
| dc.subject | Data analysis | |
| dc.subject | Convolutional neural networks | |
| dc.subject | Methodology | |
| dc.subject | DICOM | |
| dc.subject.keyword | Lung cancer | |
| dc.subject.keyword | LIDC-IDRI | |
| dc.subject.keyword | Machine learning | |
| dc.subject.keyword | Medical images | |
| dc.subject.keyword | Deep learning | |
| dc.subject.keyword | Data analysis | |
| dc.subject.keyword | Convolutional neural networks | |
| dc.subject.keyword | Methodology | |
| dc.subject.keyword | DICOM | |
| dc.subject.lemb | Maestría en Ciencias de la Información y las Comunicaciones -- Academic theses and dissertations | |
| dc.title | Metodología machine learning para el tratamiento de imágenes computarizadas en pacientes con cáncer de pulmón | |
| dc.title.titleenglish | Machine learning methodology for the processing of computerized images in lung cancer patients | |
| dc.type | masterThesis | |
| dc.type.coar | http://purl.org/coar/resource_type/c_7a1f | |
| dc.type.degree | Research-Innovation | |
| dc.type.driver | info:eu-repo/semantics/bachelorThesis |
