


Wenjian Bi, Wei Zhou, Peipei Zhang, Yaoyao Sun, Weihua Yue, Seunggeun Lee (2023) Scalable mixed model methods for set-based association studies on large-scale categorical data analysis and its application to exome-sequencing data in UK Biobank, American Journal of Human Genetics, 110 (5), 762-773

Jangho Kim*, Junhyeong Lee*, Kisung Nam, Seunggeun Lee (2023) Investigation of genetic variants and causal biomarkers associated with brain aging, Scientific Reports, 13 (1), 1526


Adrian I. Campos, Shinichi Namba, ..., Seunggeun Lee, ..., Loic Yengo (2023) Boosting the power of genome-wide association studies within and across ancestries by using polygenic scores, Nature Genetics, In press



Wei Zhou, Wenjian Bi, Zhangchen Zhao, Kushal K Dey, Karthik A Jagadeesh, Konrad J Karczewski, Mark J Daly, Benjamin M Neale, Seunggeun Lee (2022) SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests, Nature Genetics, 54 (10), 1466-1469


Kisung Nam*, Jangho Kim*, Seunggeun Lee (2022) Genome-wide study on 72,298 Korean individuals in Korean biobank data for 76 traits identifies hundreds of novel loci, Cell Genomics, 2 (10), 100189
Zhangchen Zhao*, Lars G Fritsche, Jennifer A Smith, Bhramar Mukherjee, Seunggeun Lee (2022) The Construction of Multi-ethnic Polygenic Risk Score Using Transfer Learning, American Journal of Human Genetics, 109 (11), 1998-2008


Yongwen Zhuang*, Brooke N Wolford, Kisung Nam, Wenjian Bi, Wei Zhou, Cristen J Willer, Bhramar Mukherjee, Seunggeun Lee (2022) Incorporating family disease history and controlling case–control imbalance for population-based genetic association studies, Bioinformatics, 38, 4337-4343


Yatong Li*, Seunggeun Lee (2022) Integrating external controls in case–control studies improves power for rare‐variant tests, Genetic Epidemiology, 46, 145-158

Wei Zhou, ..., Seunggeun Lee, ..., Cristen J. Willer, Mark J. Daly, Benjamin M. Neale (2022) Global Biobank Meta-analysis Initiative: Powering genetic discovery across human disease, Cell Genomics,  2 (10), 100192


Masahiro Kanai, et al (2022) Meta-analysis fine-mapping is often miscalibrated at single-variant resolution, Cell Genomics,  2 (10), 100210

Rounak Dey, Wei Zhou, et al (2022) Efficient and accurate frailty model approach for genome-wide survival association analysis in large-scale biobanks, Nature Communications, 13 (1), 1-13
Vivek Sriram, Manu Shivakumar, Sang-Hyuk Jung, Yonghyun Nam, Lisa Bang, Anurag Verma, Seunggeun Lee, Eun Kyung Choe, Dokyoon Kim (2022) NETMAGE: A human disease phenotype map generator for the network-based visualization of phenome-wide association study results, GigaScience, 11
Jiacong Du, Lauren J Beesley, Seunggeun Lee, Xiang Zhou, Walter Dempsey, Bhramar Mukherjee (2022) Optimal diagnostic test allocation strategy during the COVID‐19 pandemic and beyond, Statistics in Medicine, 41, 310-327
Jingchunzi Shi*, Michael Boehnke, Seunggeun Lee (2021) Trans-ethnic meta-analysis of rare variants in sequencing association studies, Biostatistics, 22, 706-722


Wenjian Bi*, Wei Zhou, Rounak Dey, Bhramar Mukherjee, Joshua N Sampson, Seunggeun Lee (2021) Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes, American Journal of Human Genetics, 108, 5-6

Diptavo Dutta*, Peter VandeHaar, Lars G Fritsche, Sebastian Zöllner, Michael Boehnke, Laura J Scott, Seunggeun Lee (2021) A powerful subset-based method identifies gene set associations and improves interpretation in UK Biobank, American Journal of Human Genetics, 108, 669-681

Yatong Li, Seunggeun Lee (2021) Novel score test to increase power in association test by integrating external controls, Genetic Epidemiology, 45, 293-304

Wenjian Bi*, Seunggeun Lee (2021) Scalable and Robust Regression Methods for Phenome-Wide Association Analysis on Large-Scale Biobank Data, Frontiers in Genetics, 12, 682638

Zhangchen Zhao*, Stephen Salerno, Xu Shi, Seunggeun Lee, Bhramar Mukherjee, Lars G Fritsche (2021) Understanding the Patterns of Serological Testing for COVID-19 Pre-and Post-Vaccination Rollout in Michigan, Journal of Clinical Medicine, 10, 4341
Lars G Fritsche, Ying Ma, Daiwei Zhang, Maxwell Salvatore, Seunggeun Lee, Xiang Zhou, Bhramar Mukherjee (2021) On cross-ancestry cancer polygenic risk scores, PLoS Genetics, 17, e1009670
Seungjin Ryu, Jeehae Han, Trina M Norden‐Krichmar, Quanwei Zhang, Seunggeun Lee, Zhengdong Zhang, Gil Atzmon, Laura J Niedernhofer, Paul D Robbins, Nir Barzilai, Nicholas J Schork, Yousin Suh (2021) Genetic signature of human longevity in PKC and NF‐kB signaling, Aging Cell, 20, e13362


Zhou, W.*#, Zhao, Z.*#, Nielsen, J.B, Fritsche, L.G., LeFaive, J., Gagliano Taliun, S.A., Bi, W., Gabrielsen, M.E., Daly, M.J., Neale, B.M., Hveem, K., Abecasis, G.R., Willer, C.J., Lee S. (2020) Scalable generalized linear mixed model for region-based association tests in large biobanks and cohorts, Nature Genetics, In press, (preprint doi:

# equal contribution

Bi, W.*, Fritsche, L.G., Mukherjee, B., Kim, S., Lee, S. (2020) A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank, American Journal of Human Genetics, In press (

Zhao, Z.*, Bi, W.*, Zhou, W., VandeHaar, P., Fritsche, L.G., Lee, S. (2020) UK-Biobank Whole Exome Sequence Binary Phenome Analysis with Robust Region-based Rare Variant Test, American Journal of Human Genetics, 106, 3-12

Shi, J.*, Boehnke, M., Lee, S. (2020) Trans-ethnic meta-analysis of rare variants in sequencing association studies, Biostatistics, in press, doi:

Zhang, D.*,  Dey, R., Lee, S. (2020), Fast and robust ancestry prediction using principal component analysis, Bioinformatics, in press, preprint:

Taliun, S.A., VandeHaar, P., [... including Lee, S., ...], Abecasis, G.R. (2020),  Exploring and visualizing large-scale genetic associations by using PheWeb, Nature Genetics, 52, 550–552

Dutta, D.*, Brummett,C.M., [... ], Lee, S., Clauw, D.J., Scott, L.J. (2020), Heritability of the fibromyalgia phenotype varies by age, Arthritis & Rheumatology, 72, 815-823


Bi, W.*, Zhao, Z.*, Dey, R., Fritsche, L.G., Mukherjee, B., Lee, S. (2019) A Novel Method for Genome-Wide Scale Phenome-Wide G×E Analysis and its Application to UK Biobank, American Journal of Human Genetics, 105, 1182-1192

Dey, R.*, Nielsen, J.B., Fritsche, L.G., Zhou, W., Zhu, H., Willer, C.J.,  Lee, S. (2019) Robust Meta-Analysis of Biobank-based Genome-wide Association Studies with Unbalanced Binary Phenotypes, Genetic Epidemiology, 43, 462-476

Dey, R.*, and Lee, S. (2019) Asymptotic properties of principal component analysis and shrinkage-bias adjustment under the generalized spiked population model, Journal of Multivariate Analysis, 173, 145-164 

Dutta, D.*, Gagliano Taliun, S.A., Weinstock, J.S., Zawistowski, M., Sidore, C., Fritsche, L.G., Cucca, F., Schlessinger, D., Abecasis, G.R., Brummett, C.M., Lee, S. (2019) Meta-MultiSKAT: Multiple phenotype meta-analysis for region-based association test, Genetic Epidemiology, 43, 800-814 

Dutta, D.*, Brummett, C.M., Moser, S.E., Fritsche, L.G., Tsodikov, A., Lee, S., Clauw, D.J., Scott, L.J. (2019) Heritability of the fibromyalgia phenotype varies by age, Arthritis & Rheumatology, in press



Zhou, W.*, Nielsen, J.B., Fritsche, L.G., Dey, R., Elvestad, M.B., Wolford, B.N., LeFaive, J., VandeHaar, P., Gagliano, S.A., Gifford, A., Bastarache, L.A., Wei, W-Q, Denny, J.C., Lin, M., Hveem, K., Kang, H.M., Abecasis, G.R., Willer, C.J.#, Lee, S.# (2018) Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies, Nature Genetics, 50, 1335-1341

# equal contribution

Nielsen, J.B., Thorolfsdottir, R.B., [... including Lee, S., ...], Willer, C.J. (2018) Biobank-driven genomic discovery yields new insight into atrial fibrillation biology, Nature Genetics, 50, 1234-1239

Dutta, D.*, Scott, L., Boehnke, M., Lee, S.  (2018) Multi-SKAT: General framework to test multiple phenotype associations of rare variants,  Genetic Epidemiology, 43(1), 4-23, (preprint: biorxiv).



Chen, H., Huffman, J.E., [... including Lee, S., ...], Lin, X. (2018) Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole-genome sequencing studies, American Journal of Human Genetics, in press

Yu, Y., Xia, Lu, Lee, S., Zhou, X., Stringham, H.M., Boehnke, M., Mukherjee, B. (2018) Subset-Based Analysis using Gene-Environment Interactions for Discovery of Genetic Associations across Multiple Studies or Phenotypes, Human Heredity, in press




Dey, R.*, Schmidt, E.M., Abecasis, G.R., Lee, S. (2017) A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS, American Journal of Human Genetics, 101, 37-49.


Lee, S., Kim, S., Fuchsberger, C.  (2017) Improving power for rare variant tests by integrating external controls, Genetic Epidemiology, 41, 610-619.

Lee, S., Sun, W., Wright, F.A., Zou, F. (2017) An improved and explicit surrogate variable analysis procedure by coefficient adjustment, Biometrika, 104, 303-316.


He, Z.*, Zhang, M., Lee, S., Smith, J.A., Kardia, S., Diez Roux, A.V., Mukherjee, B. (2017) Set-based tests for gene-environment interaction in longitudinal studies,  Journal of the American Statistical Association, 101, 340-352.


He, Z.*, Lee, S., Zhang, M., Smith, J.A., Guo, X., Palmas, W., Kardia, S., Ionita-Laza, I., Mukherjee, B. (2017) Rare-variant association tests in longitudinal studies, with an application to the Multi-Ethnic Study of Atherosclerosis (MESA), Genetic Epidemiology, 41, 801-810.


He, Z.,  Xu, B., Lee, S., Ionita-Laza, I. (2017) Unified sequence-based association tests allowing for multiple functional annotation scores, and applications to meta-analysis of noncoding variation in Metabochip data, American Journal of Human Genetics,  41, 801-810.


Liu, G., Mukherjee, B., Lee, S., Lee, A.W., Wu, A.H., Bandera, E.V., Jensen, A., Rossing, M.A., Moysich, K.B., Chang-Claude, J., Doherty, J., Gentry-Maharaj, A., Kiemeney, L., Modugno, F., Massuger, L., Goode, E.L., Fridley, B., Terry, K.L., Cramer, D.W., Anton-Culver, H., Ziogas, A., Tyrer, J.P., Schildkraut, J.M., Kjaer, S.K., Webb, P.M., Ness, R.B., Pike, M.C., Menon, U., Berchuck, A., Pharoah, P.D., Risch, H., Pearce, C.L., the Ovarian Cancer Association Consortium (2017) Robust Tests for Additive Gene-Environment Interaction in Case-Control Studies Using Gene-Environment Independence, American Journal of Epidemiology, 187, 366-377.


Gauderman, W.J., Mukherjee, B., Aschard, H., Hsu, L., Lewinger, J.P., Patel, C.J., Witte, J.S., Amos, C., Tai, C., Conti, D., Torgerson, D.G., Lee, S., Chatterjee, N. (2017) Update on the State of the Science for Analytical Methods for Gene-Environment Interactions (GxE), American Journal of Epidemiology, 186, 762-770.


Kim, D., Basile, A., Bang, L., Lee, S.,  Ritchie, M., Saykin, A., Nho, K. (2017) Knowledge-driven binning approach for rare variant association analysis: Application to neuroimaging biomarkers in Alzheimer's disease,  BMC Medical Informatics and Decision Making, DOI:10.1186/s12911-017-0454-0.




Shi, J.* and Lee, S. (2016) A novel random effect model for GWAS meta-analysis and its application to trans-ethnic meta-analysis, Biometrics, 72, 945-54.


Lee, S., Fuchsberger, C., Kim, S., Scott, L. (2016) An efficient resampling method for calibrating single and gene-based rare variant association analysis in case-control studies, Biostatistics, 17, 1-15.



X Wang, Z Zhang, N Morris, T Cai, S Lee,  C Wang, TW Yu, CA Walsh, X Lin. (2016) Rare variant association test in family-based sequencing studies, Briefings in Bioinformatics,  bbw083.


Lin, X., Lee, S., Wu, M.C., Wang, C., Chen, H., Li, Z., Lin, X. (2016) Test for rare variants by environment interactions in sequencing association studies, Biometrics, 17, 1-15.


Mensah-Ablorh, A., Lindstrom, S., Haiman, C.A., Henderson, B.E., Marchand, L.L, Lee, S., Stram, D.O., Eliassen, H., Price, A., Kraft, P. (2016) Meta-analysis of rare variant association tests in multi-ethnic populations, Genetic Epidemiology, 40, 57-65.


Ware EB, Smith JA, Mukherjee B, Lee S, Kardia SL, Diez-Roux AV (2015) Applying Novel Methods for Assessing Individual- and Neighborhood-Level Social and Psychosocial Environment Interactions with Genetic Factors in the Prediction of Depressive Symptoms in the Multi-Ethnic Study of Atherosclerosis, Behav Genet., 46, 89-99.




Ma, C.*, Boehnke, M., Lee, S. and the GoT2D investigators (2015) Evaluating the calibration and power of three gene-based association tests for the X chromosome, Genetic Epidemiology, 39, 499-508.


Urrutia, E., Lee, S., Maity, A., Zhao, N., Shen, J., Li, Y., Wu, M.C. (2015) Rare variant testing across methods and thresholds using the multi-kernel sequence kernel association test (MK-SKAT), Statistics and Its Interface, 8, 495-505.


He, Z.*, Zhang, M., Lee, S., Smith, J.A., Guo, X., Palmas, W., Kardia, S., Diez Roux, A.V., Mukherjee, B. (2015) Set based tests for genetic association in longitudinal studies, Biometrics, 71(3):606-615.  


He, Z.*, Payne, E.K., Mukherjee, B., Lee, S., Smith, J.A., Ware, E.B., Sánchez, B.N., Seeman, T.E., Kardia, S., Diez Roux, A.V. (2015) Association between stress response genes and features of diurnal cortisol curves in the Multi-Ethnic Study of Atherosclerosis: a new multi-phenotype approach for gene-based association tests, PLoS One, 10(5):e0126637.  




Lee, S., Abecasis, G.R., Boehnke, M., Lin, X. (2014). Rare-Variant Association Analysis: Study Designs and Statistical Tests. American Journal of Human Genetics, 95, 5-23.  


Lee, S., Zou, F. and Wright, F.A. (2014). Convergence of sample eigenvalues, eigenvectors and PC scores for ultra-high dimensional data. Biometrika, 101, 484-490.  


Mukherjee, B., Chen, Y-H, Ko, Y-A, He, Z., Lee, S., Zhang, M., Park, SK. (2014). Statistical strategies for modeling gene-environment interactions in longitudinal cohort studies. Statistical Approaches to Gene-Environment Interactions for Complex Phenotypes, Cambridge, MA: MIT Press, in press.  




Lee, S., Teslovich, T., Boehnke, M., Lin, X. (2013). General framework for meta-analysis of rare variants in sequencing association studies. American Journal of Human Genetics, 93, 42-53. 


Ionita-Laza, I.#, Lee, S.#, Makarov, V., Buxbaum, J., Lin, X. (2013). Sequence kernel association tests for the combined effect of rare and common variants. American Journal of Human Genetics, 92, 841-853.

#Joint first author.


Wang, X., Lee, S., Zhu, X., Redline, S., Lin, X. (2013). GEE-based SNP Set Association Test for Continuous and Discrete Traits in Family-Based Association Studies. Genetic Epidemiology, 37, 778-786. 


Lin, X., Lee, S., Christiani, D. and Lin, X. (2013). Test for interactions between a Gene/SNP-set and Environment/Treatment in generalized linear models. Biostatistics, 14(4): 667–681.


Wu, M.C., Maity, A., Lee, S., Simmons, E.M., Molldrem, J.J. and Armistead, P.M. (2013) Kernel machine SNP-set testing under multiple candidate kernels. Genetic Epidemiology, 37, 267-275.


Ionita-Laza, I., Lee, S., Makarov, V., Buxbaum, J., Lin, X. (2013). Family-based association tests for sequence data, and comparisons with population-based association tests. European Journal of Human Genetics, doi: 10.1038/ejhg.2012.308.


Barnett, I., Lee, S. and Lin, X. (2013). Detecting Rare Variant Effects Using Extreme Phenotype Sampling in Sequencing Association Studies. Genetic Epidemiology, 37, 142-151.




Lee, S., Emond, M.J., Bamshad, M.J., Barnes, K.C., Rieder, M.J., Nickerson, D.A., NHLBI GO Exome Sequencing Project-ESP Lung Project Team, Christiani, D.C., Wurfel, M.M. and Lin, X. (2012). Optimal unified approach for rare variant association testing with application to small sample case-control whole-exome sequencing studies. American Journal of Human Genetics, 91, 224-237.


Lee, S., Wu, M. and Lin, X. (2012). Optimal tests for rare variant effects in sequencing association studies. Biostatistics, 13, 762-775.




Wu, M.#, Lee, S.#, Cai, T., Li, Y., Boehnke, M. and Lin, X. (2011). Rare Variant Association Testing for Sequencing Data Using the Sequence Kernel Association Test (SKAT). American Journal of Human Genetics, 89, 82-93. 

#Joint first author.


Lee, S., Wright, F.A. and Zou, F. (2011). Control of population stratification by correlation-selected principal components. Biometrics, 67, 967-974.


Collaborative Cross Consortium (2011). The Genome Architecture of the Collaborative Cross Mouse Genetic Reference Population, Genetics, 190, 389-401.


Sun, W., Lee, S., Zhabotynsky, V., Zou, F., Wright, F.A., Crowley, J.J., Yun, Z., Buus, R., Miller, D., Wang, J., McMillan, L., de Villena, F. and Sullivan, P.F. (2011). Transcriptome atlases of mouse brain reveals differential expression across brain regions and genetic backgrounds. G3: Genes, Genomes, Genetics, 2, 203-211.


Wright, F.A., Strug, L.J., Doshi, V.K., Commander, C.W., Blackman, S.M., Sun, L., Berthiaume, Y., Cutler, D., Cojocaru, A., Collaco, J.M., Corey, M., Dorfman, R., Goddard, K., Green, D., Kent Jr, J.W., Lange, E., Lee, S., Li, W., Luo, J., Mayhew, G., Naughton, K., Pace, R., Pare, P., Rommens, J., Sandford, A., Stonebraker, J.R., Sun, W., Taylor, C., Vanscoy, L.L., Zou, F., Blangero, J., Zielenski, J., O'Neal, W.K., Drumm, M.L., Durie, P.R., Knowles, M.R., Cutting, G.R. (2011). Genome-wide association and linkage identify modifier loci of lung disease severity in cystic fibrosis at 11p13 and 20q13.2. Nature Genetics, 43, 539-546.


Li, W., Sun, L., Corey, M., Zou, F., Lee, S., Cojocaru, A.L., Taylor, C., Blackman, S.M., Stephenson, A., Sandford, A.J., Dorfman, R., Drumm, M.L., Cutting, G.R., Knowles, M.R., Durie, P., Wright, F.A. and Strug, L.J. (2011). Understanding the population structure of North American patients with cystic fibrosis. Clinical Genetics, 79, 136-46.




Lee, S., Zou, F. and Wright, F.A. (2010). Convergence and prediction of principal component scores in high dimensional settings. Annals of Statistics, 38, 3605-3629.


Zou, F., Lee, S., Knowles, M.R. and Wright, F.A. (2010). Quantification of Population Structure Using Correlated SNPs by Shrinkage Principal Components. Human Heredity, 70, 9-22.


Zou, F., Huang, H., Lee, S. and Hoeschele, I. (2010). Nonparametric bayesian variable selection with applications to multiple quantitative trait loci mapping with epistasis and gene-environment interaction. Genetics, 186, 385-394.


Before 2010


Lee, S., Sullivan, P.F., Zou, F. and Wright, F.A. (2008). Comment on a simple and improved correction for population stratification. American Journal of Human Genetics, 82, 524-526.


Jeong, J., Choi, M., Cho, Y.,  Lee, S., Oh, J., Park, J., Cho, Y., Lee, I., Kim, S., Han, S., Choi, K. and Chung, I. (2008). Chronic gastrointestinal symptoms and quality of life in the Korean population. World Journal of Gastroenterology, 14(41), 6388-6394.


Sullivan, P.F., Lin, D., Tzeng, J-Y, van den Oord, E., Perkins, D., Stroup, T.S., Wagner, M., Lee, S., Wright, F.A., Zou, F., Liu, W., Downing, A.M., Lieberman, J. and Close, S.L. (2008). Genomewide association for schizophrenia in the CATIE study: results of stage 1. Molecular Psychiatry, 13(6), 570-84.


* doctoral student/research assistant under my supervision
