Event for the BioInformatique et Visualisation group

Date: 2015-03-26, 11:00-12:00
Title: A hierarchical model with latent variables to represent high-dimensional and highly correlated data; application to the modeling of genotypic data to help detect genetic risk factors of diseases
Abstract: The first part of the talk will give a general presentation of the FLTM model. We have designed a novel class of probabilistic graphical models, called forest of latent tree models (FLTM), and developed a scalable algorithm to construct such FLTMs. The construction is a bottom-up process based on hierarchical clustering. The (correlated) variables grouped in the same cluster are subsumed by a latent variable, which is created by learning a simple Bayesian network. Our construction algorithm relaxes constraints imposed by alternative proposals based on Bayesian networks, such as a binary tree topology or binary latent variables. Nevertheless, our algorithm can still be run on a laptop (less than one day for 5,000 observations, each described by 100,000 variables). Moreover, when modeling genetic data at the genome scale, our construction algorithm allows long-range dependencies, which makes it more faithful to the underlying biological reality.
The second part of the talk will briefly focus on the construction algorithm. We have studied the impact of the choice of the clustering method plugged into the construction algorithm. So far, we have used two versions of the CAST algorithm, DBSCAN and the Louvain method. We are currently writing a new version of our algorithm to reduce its time and space complexity.
The first motivation for designing the FLTM model was to provide dimension reduction, through its multi-scale approach, for downstream analyses such as visualization, multidimensional partitioning and data mining. Besides visualization of the complex dependencies intrinsic to genetic data (the so-called linkage disequilibrium), another motivation was to implement multilocus genome-wide association studies (GWASs).
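As a rough illustration of the bottom-up construction described above, here is a minimal sketch in Python. It is not the FLTM algorithm itself: it assumes binary variables, replaces the clustering step (CAST, DBSCAN or Louvain in the talk) with a hypothetical greedy correlation threshold, and replaces the learning of a latent variable through a Bayesian network with a plain majority vote. All function names and the toy data are invented for the example.

```python
from statistics import mean

def correlation(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

def greedy_clusters(columns, threshold):
    """Stand-in clustering (assumption): a variable joins the first cluster
    whose seed variable it is strongly correlated with, else starts a new one."""
    clusters = []
    for name in columns:
        for cluster in clusters:
            if abs(correlation(columns[name], columns[cluster[0]])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

def summarize(columns, members):
    """Stand-in latent variable (assumption): majority vote over the cluster's
    binary variables, ties resolving to 0."""
    n = len(columns[members[0]])
    return [int(2 * sum(columns[m][i] for m in members) > len(members))
            for i in range(n)]

def build_forest(columns, threshold=0.7, max_layers=10):
    """Bottom-up construction: returns ({latent: children}, root names)."""
    children = {}
    current = dict(columns)
    for layer in range(1, max_layers + 1):
        clusters = greedy_clusters(current, threshold)
        if all(len(c) == 1 for c in clusters):
            break  # nothing left to merge: the remaining variables are roots
        upper = {}
        for k, members in enumerate(clusters):
            if len(members) == 1:
                upper[members[0]] = current[members[0]]  # carried up unchanged
            else:
                latent = f"H{layer}_{k}"
                children[latent] = members
                upper[latent] = summarize(current, members)
        current = upper
    return children, list(current)

# Toy data: four binary variables observed on eight individuals.
data = {
    "snp1": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1, 1, 1],
    "snp3": [1, 0, 1, 0, 1, 0, 1, 0],
    "snp4": [1, 0, 1, 0, 1, 0, 1, 1],
}
children, roots = build_forest(data)
```

On this toy input the two correlated pairs are each subsumed by one latent variable, and the two latent variables are not merged further, so the result is a forest of two trees rather than a single tree.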
The third part of the talk will focus on the use of the FLTM model to implement GWASs on genetic data. The purpose of GWASs is to identify dependencies, or associations, between genotypic data and a studied disease. GWASs rely on genetic markers, of which the Single Nucleotide Polymorphism (SNP) is one of the most popular classes. In standard (single-SNP) GWASs, linkage disequilibrium is not fully exploited. In contrast, the FLTM model makes it possible to implement a multilocus GWAS in which latent variables (that is, groups of correlated SNPs) are tested against the disease. We will first present a validation study of the use of FLTMs in GWASs, based on both simulated and real data. First results on real data will then be shown. Ongoing work related to GWASs concerns the integration of multiple heterogeneous sources of data (literature, databases available on the Internet, experimental omics data). The objective is to increase the power of the GWAS.
For a more generic use, we are now developing a visualization tool based on the Tulip software. In the specific case of genetic data, the end user will be able to navigate through the FLTM, zoom in on narrower regions of the genome and get feedback about a downstream analysis (e.g. a GWAS) through various layers of annotations. Finally, we are also in the process of parallelizing the code of the construction algorithm. As regards genetic analysis, our collaborators are INSERM in Nantes and INRA in Angers. Much effort is devoted to the use of FLTMs for data mining and visual exploration of genetic data in the context of the ANR project SAMOGWAS (Numerical Models). More generically, however, the FLTM model also suits the compression of spatially highly correlated data (e.g. image processing) and abstraction.
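To make the idea of testing a latent variable against the disease concrete, here is a minimal sketch in Python. The abstract does not say which association test is used, so the Pearson chi-square test on a 2x2 contingency table is an assumption, as are the binary coding of both variables, the function name and the toy data.

```python
from math import erfc, sqrt

def chi2_2x2(variable, disease):
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for two binary (0/1) lists of equal length."""
    n = len(variable)
    table = [[0, 0], [0, 0]]
    for v, d in zip(variable, disease):
        table[v][d] += 1
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    return stat, erfc(sqrt(stat / 2))

# Toy data (invented): a latent variable that perfectly tracks disease status.
latent = [0, 0, 1, 1, 0, 1, 0, 1]
disease = [0, 0, 1, 1, 0, 1, 0, 1]
stat, p_value = chi2_2x2(latent, disease)
```

In a multilocus GWAS, such a test would be repeated for every latent variable, so the resulting p-values would also need a correction for multiple testing.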
Speaker: Christine Sinoquet
Url: LINA, Université de Nantes

No documents are linked to this event.
