Modelo de procesamiento paralelo en arquitecturas heterogéneas para la construcción de grafos en el ensamblaje de-Novo de genomas
Fecha
Autores
Autor corporativo
Título de la revista
ISSN de la revista
Título del volumen
Editor
Compartir
Director
Altmetric
Resumen
In the present project, a massive parallel processing model on heterogeneous architectures was designed to accelerate and facilitate the processing of k-mers in the tasks related to the construction of graphs in the de-novo genomic assembly. The model includes 3 main contributions: a new data structure called CISK to represent in an indexed and compact way the super k-mers and their minimizers and two massive parallelization patterns, one to obtain the canonical m-mers of a set of reads and another to perform the search for super k-mers based on seeds type minimizer. During the project, 4 evaluation processes were performed: - a preliminary evaluation that allowed determining that the de-novo assembly process is the most complex stage and with the highest computational requirements of a typical workflow of genomic and transcriptomic reads, - a second evaluation that showed that the tasks associated with the treatment of k-mers are processes that represent bottlenecks due to their high demand of memory, - a third evaluation that allowed select the disk partitioning techniques based on super k-mers using seeds type minimizer as base methodology for the design of massive parallel computing model to process k-mers on heterogeneous platforms, - and finally an evaluation of the proposed model that evidenced its advantages obtaining a speed-up of 4.31x on similar processes in highly recognized k-mers counting tools that perform parallelization in CPU. The model implementation code is available in the repository https://github.com/BioinfUD/K-mersCL. This implementation consists of a host code and two kernels in OpenCL, one for canonical minimizer and another for signature.