Event for the BioInformatics and Visualization group

Date: 2012-11-19, 14:00-15:30
Title: Efficient data structures for large-scale genome sequencing data
Abstract: To assemble and/or detect variants in large-scale genomics data, an algorithmic ingredient of choice is the de Bruijn graph. On large datasets, analyzing such a graph requires a significant amount of system memory. This talk presents an ultra-low-memory representation of de Bruijn graphs. We propose a new encoding that occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure that removes critical false positives. Minia, an assembler implementing this structure, performed a complete de novo assembly of a human genome from short reads using 5.7 GB of memory in 23 hours. The article (and, in the near future, the public source code) is available on the Minia webpage: http://minia.genouest.org/
Location: South meeting room, CGFB
Speaker: Guillaume Rizk
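The abstract's core idea, a Bloom filter storing the graph's k-mers plus an exact set of "critical false positives", can be sketched as follows. This is a minimal illustration under several assumptions, not Minia's implementation: the hash scheme (SHA-256 with a seed prefix), the filter sizes, the class names, and the decision to ignore reverse complements are all choices made here for clarity. The critical-false-positive set is also computed from an in-memory set of true k-mers, which a real low-memory tool would avoid. Critical false positives are taken here to be neighbors of true k-mers that the Bloom filter wrongly reports as present.

```python
import hashlib

ALPHABET = "ACGT"


class BloomFilter:
    """Minimal Bloom filter over strings (hash scheme and sizes are illustrative assumptions)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by hashing the item with different seed prefixes.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # May return True for items never added (false positive), never False for added ones.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def neighbors(kmer: str) -> list[str]:
    """All 4 possible successors and 4 possible predecessors (reverse complements ignored)."""
    return [kmer[1:] + c for c in ALPHABET] + [c + kmer[:-1] for c in ALPHABET]


class ProbabilisticDeBruijnGraph:
    """Hypothetical sketch: Bloom filter of k-mers plus an exact set of critical false positives."""

    def __init__(self, kmers, num_bits: int = 1 << 16, num_hashes: int = 3):
        self.bloom = BloomFilter(num_bits, num_hashes)
        for km in kmers:
            self.bloom.add(km)
        # Simplification: uses the true k-mer set in memory only to find critical false positives.
        true_set = set(kmers)
        self.critical_fp = {
            nb
            for km in true_set
            for nb in neighbors(km)
            if nb in self.bloom and nb not in true_set
        }

    def _present(self, kmer: str) -> bool:
        return kmer in self.bloom and kmer not in self.critical_fp

    def successors(self, kmer: str) -> list[str]:
        return [s for s in (kmer[1:] + c for c in ALPHABET) if self._present(s)]

    def predecessors(self, kmer: str) -> list[str]:
        return [p for p in (c + kmer[:-1] for c in ALPHABET) if self._present(p)]


if __name__ == "__main__":
    seq = "ACGTACGGTCATTGCA"
    k = 5
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # A deliberately tiny filter so that some false positives actually occur.
    graph = ProbabilisticDeBruijnGraph(kmers, num_bits=64, num_hashes=2)
    print("critical false positives:", sorted(graph.critical_fp))
    print("successors of", kmers[0], "->", graph.successors(kmers[0]))
```

Because a Bloom filter has no false negatives, every true neighbor passes the membership test, and every false neighbor of a true k-mer that passes it is in the critical set. Neighbor queries made from true k-mers are therefore exact, which is what graph traversal during assembly needs.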

No documents are linked to this event.
