Pinpointing disease-causing regulatory genetic variants by multi-omics and machine learning
Determining the genetic cause of rare disorders is crucial for the affected families, enabling genetic testing among relatives and providing a rationale for therapies. However, for most rare disease patients undergoing DNA sequencing, it remains unclear which variant is pathogenic. I will present a blend of multi-omics and machine learning approaches to address this problem.
We and others have shown that sequencing patients' RNA in addition to their DNA boosts the diagnosis rate for rare diseases by revealing pathogenic gene regulatory defects that cannot yet be predicted from genotype. Novel algorithms are needed to realize the potential of RNA sequencing and other omics in revealing the causes of rare diseases. We formalize this problem as an outlier detection task, with the twist that here the outliers are the signal of interest, not artifacts to exclude from the data. I will present OUTRIDER, a method based on a denoising autoencoder that detects expression outliers while controlling for technical and biological confounding effects.
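The outlier-detection framing above can be illustrated with a small sketch. This is not OUTRIDER: as a simplifying assumption it replaces the denoising autoencoder with a truncated SVD to estimate the expression expected from shared confounders, and it scores deviations with plain z-scores. The function names, the simulated data, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_counts(n_samples=100, n_genes=200, seed=0):
    """Simulate an expression matrix with one batch effect and one planted outlier.

    Hypothetical toy data: sample 5 has gene 10 knocked down to 5% of its level.
    """
    rng = np.random.default_rng(seed)
    base = rng.normal(10.0, 1.0, size=n_genes)           # per-gene log2 mean
    batch = rng.integers(0, 2, size=n_samples)           # hidden confounder (0/1)
    loading = rng.normal(0.0, 0.5, size=n_genes)         # per-gene batch effect
    noise = rng.normal(0.0, 0.2, size=(n_samples, n_genes))
    counts = 2.0 ** (base + np.outer(batch, loading) + noise)
    counts[5, 10] *= 0.05                                # planted expression outlier
    return counts


def expression_outliers(counts, n_components=1, z_cutoff=5.0):
    """Return z-scores and a boolean mask of candidate expression outliers.

    Assumption: a rank-`n_components` SVD reconstruction stands in for the
    confounder-controlled expectation that an autoencoder would learn.
    """
    # Correct for sequencing depth, then move to log2 scale.
    size_factors = counts.sum(axis=1, keepdims=True) / counts.sum(axis=1).mean()
    log_expr = np.log2(counts / size_factors + 1.0)

    # Center each gene so the SVD captures covariation, not mean expression.
    centered = log_expr - log_expr.mean(axis=0, keepdims=True)

    # Low-rank reconstruction = expression explained by shared confounders.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = n_components
    expected = (u[:, :k] * s[:k]) @ vt[:k]

    # Whatever the confounders do not explain is scored gene by gene.
    residuals = centered - expected
    z = (residuals - residuals.mean(axis=0)) / residuals.std(axis=0)
    return z, np.abs(z) > z_cutoff


if __name__ == "__main__":
    counts = simulate_counts()
    z, flagged = expression_outliers(counts)
    sample, gene = np.unravel_index(np.argmax(np.abs(z)), z.shape)
    print("most extreme (sample, gene):", (sample, gene))
    print("number of flagged pairs:", int(flagged.sum()))
```

Note that the flagged entries are kept as candidate findings rather than discarded, which is the reversal of the usual role of outlier detection described above. The number of latent components matters: too few leaves confounding in the residuals, while too many lets the model absorb the very outliers one wants to detect.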
Having identified a pathogenic gene regulatory defect, the last piece of the puzzle is the genetic variant causing it. Machine learning applied to high-throughput genomics technologies is making rapid progress in unraveling how every step of gene expression is genetically encoded. However, the lack of standardization of such models has hampered their impact on medical research. We are co-developing Kipoi, a collaborative initiative to define standards and foster the sharing and re-use of trained machine learning models in genomics. Our repository (kipoi.org) contains over 2,000 trained models of transcriptional and post-transcriptional mechanisms. Using a modular modeling approach leveraging Kipoi, we built MMSplice, the top-ranked splicing effect predictor in the CAGI5 challenge (Critical Assessment of Genome Interpretation).
Brechtmann et al. OUTRIDER: A Statistical Method for Detecting Aberrantly Expressed Genes in RNA Sequencing Data, American Journal of Human Genetics, 2018.
Avsec et al. The Kipoi repository accelerates community exchange and reuse of predictive models for genomics, Nature Biotechnology, 2019.
Cheng et al. MMSplice: modular modeling improves the predictions of genetic variant effects on splicing, Genome Biology, 2019.
Assistant Professor for Computational Biology at TUM (Technische Universität München)
Technische Universität München - Dpt Informatics
Director of CBIO, faculty researcher at Mines ParisTech
Domain 3 - U900 - CBIO - Bioinformatics, Biostatistics Epidemiology and Computational Systems