Research

Our research strives to catalyze repeated traversal of the 'genomic medicine cycle,' driving the discovery, biological understanding, and clinical translation of the genetic underpinnings of human disease:
Variant to Disease
Which genetic variants are associated with disease? We address this question by aggregating and analyzing large-scale genomic data from deeply phenotyped disease cohorts and population biobanks.
Variant to Function
How do genetic variants contribute to disease? We develop functionally informed computational frameworks, leveraging increasingly rich multimodal biological data and evolving techniques capable of integrating such data.
Variant to Care
What can we return to participants about their genomic data? Drawing on our discoveries, we work synergistically with clinical partners to inform more accurate and personalized strategies for disease diagnosis and care.

Variant to Disease

Genetic testing identifies millions of variants in an individual’s genome. To pinpoint the tiny fraction that may contribute to a particular disease phenotype, we aggregate genomic data from tens of thousands of individuals with the disease (cases) and compare them with tens of thousands of individuals without that disease (controls). Through case-control association analyses, we identify the genomic variants or regions that are statistically most likely to influence disease risk. These analyses operate across multiple biological and analytical scales, including the exome, genome, proteome, interactome, and population phenome. We draw on public biobanks and assemble deeply phenotyped cohorts to enable the discovery of both common and rare risk variants. We develop new methods, build scalable analytical pipelines, and share foundational resources to catalyze genomic discovery and downstream translational research.
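As a rough illustration of the case-control comparison described above, the sketch below tests whether carriers of a variant (or of any qualifying variant in a gene) are over-represented among cases, using a two-sided Fisher's exact test on a 2x2 table. This is a generic textbook test, not necessarily the method used in our pipelines; the function names and the counts are hypothetical, and real analyses additionally account for ancestry, relatedness, sequencing quality, and multiple testing.

```python
from math import comb


def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Sample odds ratio for the 2x2 table; values > 1 suggest enrichment in cases."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows are cases/controls, columns are carriers/non-carriers.
    """
    n_cases, n_controls = a + b, c + d
    n_carriers = a + c
    total_tables = comb(n_cases + n_controls, n_carriers)

    def prob(x):
        # Hypergeometric probability of x carriers among cases, margins fixed.
        return comb(n_cases, x) * comb(n_controls, n_carriers - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, n_carriers - n_controls)
    highest = min(n_cases, n_carriers)
    # Sum over every table that is no more probable than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: 12 of 1,000 cases and 2 of 1,000 controls carry the variant.
    counts = (12, 988, 2, 998)
    print("odds ratio:", round(odds_ratio(*counts), 2))
    print("p-value:", fisher_exact_two_sided(*counts))
```

An exact test is used here because rare variants yield small carrier counts, where large-sample approximations such as the chi-square test can be unreliable.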

Relevant Publications:

Variant to Function

As we move from identifying which genetic variants are associated with disease to understanding how they contribute to disease risk, our current best insights come from protein-truncating variants (PTVs), which presumably act through loss of function by truncating the full-length gene product. However, PTVs represent only a small fraction – typically less than 5% – of the protein-altering variants uncovered by exome sequencing. This proportion shrinks further when considering that the exome itself constitutes less than 2% of the human genome. In clinical settings, this limitation translates into missed opportunities: despite the availability of genome-wide testing, we are able to return actionable genetic findings for only a small fraction of patients. We aim to develop scalable and interpretable computational frameworks to unlock the vast “dark matter” of the genome beyond PTVs:

95% of the exome: missense variants. Unlike PTVs, missense variants cause only a single amino acid change in the protein and often exhibit variable, context-dependent effects that elude detection by conventional frequency-based genetic association testing. Our approach to capturing this context dependency is grounded in systems biology, emphasizing the interactions among molecular components rather than isolated individual features. We define “interactions” broadly – building on our prior work evaluating missense variants through protein-protein interactions, and extending it to more complex relationships beyond direct physical contact. Harnessing the growing depth of multi-omics data that illuminate complex molecular networks and cellular context, we build multimodal, integrative frameworks to simultaneously address which missense variants contribute to disease and how they do so.

Relevant Publications:

98% of the genome: non-coding variants. Scientists have long recognized that many disease-causing variants reside in genomic regions that do not encode proteins. Yet, systematically distinguishing pathogenic from benign non-coding variants remains difficult, as we lack a clear picture of which stretches of the non-coding genome are essential for human health. Measuring the essentiality – or "constraint" – of protein-coding genes has already transformed disease gene discovery and clinical variant interpretation. We are at the forefront of efforts to extend such a framework to the non-coding genome. Building on our initial progress within the Genome Aggregation Database (gnomAD) consortium, we continue to develop innovative methods that leverage broader genomic context to better characterize the uncharted “dark” genome.

Relevant Publications:

Variant to Care

Ultimately, the genomic medicine cycle is fulfilled when new discoveries deliver tangible benefits back to patients through improved care. Achieving this leap often requires efforts beyond any single research group. We have taken leadership roles in several international consortia, bringing together research institutions and medical centers, and harmonizing clinical and genomic data across deeply phenotyped cohorts worldwide. We implement a collaborative model in which data are shared, jointly analyzed, and returned to participating sites, delivering insights and resources that would be difficult to obtain in isolation. Our flagship projects with the Epi25 Collaborative and the International League Against Epilepsy (ILAE) Consortium have enabled the largest genetic analyses of epilepsy to date, uncovering a broad spectrum of genetic variation underlying the complex genetic architecture of the disorder. Through these efforts, we are beginning to connect with clinicians and provide them with actionable insights directly relevant to their patients. In partnership with UChicago Medicine and the Data for the Common Good (D4CG), we are now extending this collaborative model to integrate genomic data with additional modalities – including imaging, electrophysiology, and electronic health records – and to develop patient-centered, multimodal approaches for more precise mapping of an individual’s genotype to phenotype.

Relevant Publications:

Resources: