Research
Variant to Disease
Genetic testing identifies millions of variants in an individual’s genome. To pinpoint the tiny fraction that may contribute to a particular disease phenotype, we aggregate genomic data from tens of thousands of individuals with the disease (cases) and compare them with tens of thousands of individuals without it (controls). Through case-control association analyses, we identify the genomic variants or regions that are statistically most likely to influence disease risk. These analyses operate across multiple biological and analytical scales, including the exome, genome, proteome, interactome, and population phenome. We draw on public biobanks and assemble deeply phenotyped cohorts to enable the discovery of both common and rare risk variants. We develop new methods, build scalable analytical pipelines, and share foundational resources to catalyze genomic discovery and downstream translational research.
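As a rough illustration of the case-control comparison described above, the Python sketch below tests whether carriers of qualifying variants in a gene are over-represented among cases, using a one-sided Fisher's exact test. It is a simplified stand-in, not the pipeline behind the publications listed in this section: the function names and the carrier counts are hypothetical, and real analyses also adjust for ancestry, sequencing coverage, and multiple testing.

```python
from math import comb


def fisher_exact_greater(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact test for carrier enrichment in cases.

    Returns the probability of seeing at least `case_carriers` carriers among
    the cases if carrier status were independent of disease status
    (hypergeometric upper tail).
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, case_total)
    upper = min(carriers, case_total)
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / denominator


def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Sample odds ratio of being a carrier, cases versus controls."""
    case_noncarriers = case_total - case_carriers
    control_noncarriers = control_total - control_carriers
    if case_noncarriers == 0 or control_carriers == 0:
        return float("inf")
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


if __name__ == "__main__":
    # Hypothetical counts for one gene: 30 of 1,000 cases and 10 of 1,000
    # controls carry a qualifying rare variant.
    p_value = fisher_exact_greater(30, 1000, 10, 1000)
    ratio = odds_ratio(30, 1000, 10, 1000)
    print(f"odds ratio = {ratio:.2f}, one-sided p = {p_value:.3g}")
```

An exact test is used here because rare-variant carrier counts are often too small for a normal approximation to be reliable.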
Relevant Publications:
- Epi25 Collaborative (Chen S, first author). Exome sequencing of 20,979 individuals with epilepsy reveals shared and distinct ultra-rare genetic risk across disorder subtypes. Nature Neuroscience. 2024
- International League Against Epilepsy Consortium on Complex Epilepsies (Chen S*, co-first author). GWAS meta-analysis of over 29,000 people with epilepsy identifies 26 risk loci and subtype-specific genetic architecture. Nature Genetics. 2023
- Sealock JM, Ivankovic F, Liao C, Chen S, Churchhouse C, … Neale BM. Tutorial: Guidelines for quality filtering of whole-exome and whole-genome sequencing data for population-scale association analyses. Nature Protocols. 2025
Variant to Function
As we move from identifying which genetic variants are associated with disease to understanding how they contribute to disease risk, our best current insights come from protein-truncating variants (PTVs), which are presumed to act through loss of function by truncating the full-length gene product. However, PTVs represent only a small fraction – typically less than 5% – of the protein-altering variants uncovered by exome sequencing. This proportion becomes smaller still when considering that the exome itself constitutes less than 2% of the human genome. In clinical settings, this limitation translates into missed opportunities: despite the availability of genome-wide testing, we are able to return actionable genetic findings for only a small fraction of patients. We aim to develop scalable and interpretable computational frameworks to unlock the vast “dark matter” of the genome beyond PTVs:
95% of the exome: missense variants. Unlike PTVs, missense variants cause only a single amino acid change in the protein and often exhibit variable, context-dependent effects that elude detection by conventional frequency-based genetic association testing. Our approach to capturing this context dependency is grounded in systems biology, emphasizing the interactions among molecular components rather than isolated individual features. We define “interactions” broadly – building on our prior work evaluating missense variants through protein-protein interactions, and extending it to more complex relationships beyond direct physical contact. Harnessing the growing depth of multi-omics data that illuminate complex molecular networks and cellular context, we build multimodal, integrative frameworks to simultaneously address which missense variants contribute to disease and how they do so.
Relevant Publications:
- U K, Zhang SM, Pokharel S, Pratyush P, Qaderi F, Liu D, Zhao J, Kc DB, Chen S. Large Context, Deeper Insights: Harnessing Large Language Models for Advancing Protein-Protein Interaction Analysis. Methods in Molecular Biology. 2025
- Chen S*, Fragoza R*, Klei L, Liu Y, Wang J, Roeder K†, Devlin B†, Yu H†. An interactome perturbation framework prioritizes damaging missense mutations for developmental disorders. Nature Genetics. 2018
- Chen S*, Wang J*, Cicek E, Roeder K†, Yu H†, Devlin B†. De novo missense variants disrupting protein-protein interactions affect risk for autism through gene co-expression and protein networks in neuronal cell types. Molecular Autism. 2020
- Chen S, Liu Y, Zhang Y, Wierbowski S, Lipkin S, Wei X, Yu H. A full-proteome, interaction-specific characterization of mutational hotspots across human cancers. Genome Research. 2022
98% of the genome: non-coding variants. Scientists have long recognized that many disease-causing variants reside in genomic regions that do not encode proteins. Yet systematically distinguishing pathogenic from benign non-coding variants remains difficult, as we lack a clear picture of which stretches of the non-coding genome are essential for human health. Measuring the essentiality – or “constraint” – of protein-coding genes has already transformed disease gene discovery and clinical variant interpretation. We are at the forefront of efforts to extend such a framework to the non-coding genome. Building on our initial progress within the Genome Aggregation Database (gnomAD) consortium, we continue to develop innovative methods that leverage broader genomic context to better characterize the uncharted “dark” genome.
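To make the idea of constraint concrete, the Python sketch below compares the number of variants observed in a genomic window with the number expected in the absence of selection. It is an illustration under stated assumptions, not the published method: the expected count is simply taken as an input (in practice it would come from a mutational model), and the function name and the z-like depletion score are hypothetical choices made for this example.

```python
from math import sqrt


def constraint_summary(observed, expected):
    """Summarize depletion of variation in one genomic window.

    Returns (observed/expected ratio, depletion score). The score is
    (expected - observed) / sqrt(expected), so larger positive values mean
    fewer variants than expected, which suggests stronger constraint.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    oe_ratio = observed / expected
    depletion = (expected - observed) / sqrt(expected)
    return oe_ratio, depletion


if __name__ == "__main__":
    # Hypothetical window: 100 variants expected, 50 observed.
    oe_ratio, depletion = constraint_summary(observed=50, expected=100)
    print(f"observed/expected = {oe_ratio:.2f}, depletion score = {depletion:.1f}")
```

A window with far fewer variants than expected is a candidate for being functionally important, because variation there appears to have been removed by selection.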
Relevant Publications:
- Chen S*†, Laurent F*, Goodrich JK, … Genome Aggregation Database Consortium; O'Donnell-Luria A, Solomonson M, Seed C, Martin AR, Talkowski ME, Rehm HL, Daly MJ, Tiao G, Neale BM, MacArthur DG, Karczewski KJ†. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024
Variant to Care
Ultimately, the genomic medicine cycle is complete when new discoveries deliver tangible benefits back to patients through improved care. Achieving this leap often requires efforts beyond any single research group. We have taken leadership roles in several international consortia, bringing together research institutions and medical centers, and harmonizing clinical and genomic data across deeply phenotyped cohorts worldwide. We implement a collaborative model in which data are shared, jointly analyzed, and returned to participating sites, delivering insights and resources that would be difficult to obtain in isolation. Our flagship projects with the Epi25 Collaborative and the International League Against Epilepsy (ILAE) Consortium have enabled the largest genetic analyses of epilepsy to date, uncovering a broad spectrum of genetic variation underlying the complex genetic architecture of the disorder. Through these efforts, we are beginning to connect with clinicians and provide them with actionable insights directly relevant to their patients. In partnership with UChicago Medicine and the Data for the Common Good (D4CG), we are now extending this collaborative model to integrate genomic data with additional modalities – including imaging, electrophysiology, and electronic health records – and to develop patient-centered, multimodal approaches for more precise mapping of an individual’s genotype to phenotype.
Relevant Publications:
- Epi25 Collaborative (Chen S, first author). Exome sequencing of 20,979 individuals with epilepsy reveals shared and distinct ultra-rare genetic risk across disorder subtypes. Nature Neuroscience. 2024
- International League Against Epilepsy Consortium on Complex Epilepsies (Chen S*, co-first author). GWAS meta-analysis of over 29,000 people with epilepsy identifies 26 risk loci and subtype-specific genetic architecture. Nature Genetics. 2023
- Leu C, Avbersek A, Stevelink R, Custodio HM, Chen S, Speed D, … Epi25 Collaborative, EpiPGX Consortium; Sisodiya SM. Genome-wide association meta-analyses of drug-resistant epilepsy. EBioMedicine. 2025
Resources:
- Epi25 WES Browser epi25.broadinstitute.org
- epiGAD www.epigad.org
- D4CG commons.cri.uchicago.edu