

Preprint and submitted

Shi, J.*, Schmidt, E.M., Abecasis, G.R., Lee, S. (2018) A Score Test for Jointly Testing the Fixed and Random Effects in Generalized Linear Mixed Models, submitted.

Dutta, D.*, VandeHaar, P., Scott, L., Boehnke, M., Lee, S. (2019) A powerful subset-based gene-set analysis method identifies novel associations and improves interpretation in UK Biobank, bioRxiv; doi:

Li, Y., Lee, S. (2020) Novel score test to increase power in association test by integrating external controls, submitted


Zhou, W.*#, Zhao, Z.*#, Nielsen, J.B., Fritsche, L.G., LeFaive, J., Gagliano Taliun, S.A., Bi, W., Gabrielsen, M.E., Daly, M.J., Neale, B.M., Hveem, K., Abecasis, G.R., Willer, C.J., Lee, S. (2020) Scalable generalized linear mixed model for region-based association tests in large biobanks and cohorts, Nature Genetics, In press, (preprint doi:

# equal contribution

Bi, W.*, Fritsche, L.G., Mukherjee, B., Kim, S., Lee, S. (2020) A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank, American Journal of Human Genetics, In press (

Zhao, Z.*, Bi, W.*, Zhou, W., VandeHaar, P., Fritsche, L.G., Lee, S. (2020) UK-Biobank Whole Exome Sequence Binary Phenome Analysis with Robust Region-based Rare Variant Test, American Journal of Human Genetics, 106, 3-12

Shi, J.*, Boehnke, M., Lee, S. (2020) Trans-ethnic meta-analysis of rare variants in sequencing association studies, Biostatistics, in press, doi:

Zhang, D.*, Dey, R., Lee, S. (2020) Fast and robust ancestry prediction using principal component analysis, Bioinformatics, in press, preprint:

Taliun, S.A., VandeHaar, P., [... including Lee, S., ...], Abecasis, G.R. (2020),  Exploring and visualizing large-scale genetic associations by using PheWeb, Nature Genetics, 52, 550–552

Dutta, D.*, Brummett, C.M., [... ], Lee, S., Clauw, D.J., Scott, L.J. (2020) Heritability of the fibromyalgia phenotype varies by age, Arthritis & Rheumatology, 72, 815-823


Bi, W.*, Zhao, Z.*, Dey, R., Fritsche, L.G., Mukherjee, B., Lee, S. (2019) A Novel Method for Genome-Wide Scale Phenome-Wide G×E Analysis and its Application to UK Biobank, American Journal of Human Genetics, 105, 1182-1192

Dey, R.*, Nielsen, J.B., Fritsche, L.G., Zhou, W., Zhu, H., Willer, C.J.,  Lee, S. (2019) Robust Meta-Analysis of Biobank-based Genome-wide Association Studies with Unbalanced Binary Phenotypes, Genetic Epidemiology, 43, 462-476

Dey, R.*, and Lee, S. (2019) Asymptotic properties of principal component analysis and shrinkage-bias adjustment under the generalized spiked population model, Journal of Multivariate Analysis, 173, 145-164 

Dutta, D.*, Gagliano Taliun, S.A., Weinstock, J.S., Zawistowski, M., Sidore, C., Fritsche, L.G., Cucca, F., Schlessinger, D., Abecasis, G.R., Brummett, C.M., Lee, S. (2019) Meta-MultiSKAT: Multiple phenotype meta-analysis for region-based association test, Genetic Epidemiology, 43, 800-814 

Dutta, D.*, Brummett, C.M., Moser, S.E., Fritsche, L.G., Tsodikov, A., Lee, S., Clauw, D.J., Scott, L.J. (2019) Heritability of the fibromyalgia phenotype varies by age, Arthritis & Rheumatology, in press



Zhou, W.*, Nielsen, J.B., Fritsche, L.G., Dey, R., Elvestad, M.B., Wolford, B.N., LeFaive, J., VandeHaar, P., Gagliano, S.A., Gifford, A., Bastarache, L.A., Wei, W-Q, Denny, J.C., Lin, M., Hveem, K., Kang, H.M., Abecasis, G.R., Willer, C.J.#, Lee, S.# (2018) Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies, Nature Genetics, 50, 1335-1341

# equal contribution

Nielsen, J.B., Thorolfsdottir, R.B., [... including Lee, S., ...], Willer, C.J. (2018) Biobank-driven genomic discovery yields new insight into atrial fibrillation biology, Nature Genetics, 50, 1234-1239

Dutta, D.*, Scott, L., Boehnke, M., Lee, S.  (2018) Multi-SKAT: General framework to test multiple phenotype associations of rare variants,  Genetic Epidemiology, 43(1), 4-23, (preprint: biorxiv).



Chen, H., Huffman, J.E., [... including Lee, S., ...], Lin, X. (2018) Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole-genome sequencing studies, American Journal of Human Genetics, in press

Yu, Y., Xia, L., Lee, S., Zhou, X., Stringham, H.M., Boehnke, M., Mukherjee, B. (2018) Subset-Based Analysis using Gene-Environment Interactions for Discovery of Genetic Associations across Multiple Studies or Phenotypes, Human Heredity, in press




Dey, R.*, Schmidt, E.M., Abecasis, G.R., Lee, S. (2017) A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS, American Journal of Human Genetics, 101, 37-49.


Lee, S., Kim, S., Fuchsberger, C.  (2017) Improving power for rare variant tests by integrating external controls, Genetic Epidemiology, 41, 610-619.

Lee, S., Sun, W., Wright, F.A., Zou, F. (2017) An improved and explicit surrogate variable analysis procedure by coefficient adjustment, Biometrika, 104, 303-316.


He, Z.*, Zhang, M., Lee, S., Smith, J.A., Kardia, S., Diez Roux, A.V., Mukherjee, B. (2017) Set-based tests for gene-environment interaction in longitudinal studies,  Journal of the American Statistical Association, 101, 340-352.


He, Z.*, Lee, S., Zhang, M., Smith, J.A., Guo, X., Palmas, W., Kardia, S., Iuliana, I., Mukherjee, B. (2017) Rare-variant association tests in longitudinal studies, with an application to the Multi-Ethnic Study of Atherosclerosis (MESA), Genetic Epidemiology, 41, 801-810.


He, Z., Xu, B., Lee, S., Ionita-Laza, I. (2017) Unified sequence-based association tests allowing for multiple functional annotation scores, and applications to meta-analysis of noncoding variation in Metabochip data, American Journal of Human Genetics, 101, 340-352.


Liu, G., Mukherjee, B., Lee, S., Lee, A.W., Wu, A.H., Bandera, E.V., Jensen, A., Rossing, M.A., Moysich, K.B., Chang-Claude, J., Doherty, J., Gentry-Maharaj, A., Kiemeney, L., Modugno, F., Massuger, L., Goode, E.L., Fridley, B., Terry, K.L., Cramer, D.W., Anton-Culver, H., Ziogas, A., Tyrer, J.P., Schildkraut, J.M., Kjaer, S.K., Webb, P.M., Ness, R.B., Pike, M.C., Menon, U., Berchuck, A., Pharoah, P.D., Risch, H., Pearce, C.L., the Ovarian Cancer Association Consortium (2017) Robust Tests for Additive Gene-Environment Interaction in Case-Control Studies Using Gene-Environment Independence, American Journal of Epidemiology, 187, 366-377.


Gauderman, W.Z., Mukherjee, B., Aschard, H., Hsu, L., Lewinger, J.P., Patel, C.J., Witte, J.S., Amos, C., Tai, C., Conti, D., Torgerson, D.G., Lee, S., Chatterjee, N. (2017) Update on the State of the Science for Analytical Methods for Gene-Environment Interactions (GxE), American Journal of Epidemiology, 186, 762-770.


Kim, D., Basile, A., Bang, L., Lee, S.,  Ritchie, M., Saykin, A., Nho, K. (2017) Knowledge-driven binning approach for rare variant association analysis: Application to neuroimaging biomarkers in Alzheimer's disease,  BMC Medical Informatics and Decision Making, DOI:10.1186/s12911-017-0454-0.




Shi, J.* and Lee, S. (2016) A novel random effect model for GWAS meta-analysis and its application to trans-ethnic meta-analysis, Biometrics, 72, 945-54.


Lee, S., Fuchsberger, C., Kim, S., Scott, L. (2016) An efficient resampling method for calibrating single and gene-based rare variant association analysis in case-control studies, Biostatistics, 17, 1-15.



Wang, X., Zhang, Z., Morris, N., Cai, T., Lee, S., Wang, C., Yu, T.W., Walsh, C.A., Lin, X. (2016) Rare variant association test in family-based sequencing studies, Briefings in Bioinformatics, bbw083.


Lin, X., Lee, S., Wu, M.C., Wang, C., Chen, H., Li, Z., Lin, X. (2016) Test for rare variants by environment interactions in sequencing association studies, Biometrics, 17, 1-15.


Mensah-Ablorh, A., Lindstrom, S., Haiman, C.A., Henderson, B.E., Marchand, L.L, Lee, S., Stram, D.O., Eliassen, H., Price, A., Kraft, P. (2016) Meta-analysis of rare variant association tests in multi-ethnic populations, Genetic Epidemiology, 40, 57-65.


Ware, E.B., Smith, J.A., Mukherjee, B., Lee, S., Kardia, S.L., Diez-Roux, A.V. (2015) Applying Novel Methods for Assessing Individual- and Neighborhood-Level Social and Psychosocial Environment Interactions with Genetic Factors in the Prediction of Depressive Symptoms in the Multi-Ethnic Study of Atherosclerosis, Behavior Genetics, 46, 89-99.




Ma, C.*, Boehnke, M., Lee, S. and the GoT2D investigators (2015) Evaluating the calibration and power of three gene-based association tests for the X chromosome, Genetic Epidemiology, 39, 499-508.


Urrutia, E., Lee, S., Maity, A., Zhao, N., Shen, J., Li, Y., Wu, M.C. (2015) Rare variant testing across methods and thresholds using the multi-kernel sequence kernel association test (MK-SKAT), Statistics and Its Interface, 8, 495-505.


He, Z.*, Zhang, M., Lee, S., Smith, J.A., Guo, X., Palmas, W., Kardia, S., Diez Roux, A.V., Mukherjee, B. (2015) Set based tests for genetic association in longitudinal studies, Biometrics, 71(3):606-615.  


He, Z.*, Payne, E.K., Mukherjee, B., Lee, S., Smith, J.A., Ware, E.B., Sánchez, B.N., Seeman, T.E., Kardia, S., Diez Roux, A.V. (2015) Association between stress response genes and features of diurnal cortisol curves in the Multi-Ethnic Study of Atherosclerosis: a new multi-phenotype approach for gene-based association tests, PLoS One, 10(5):e0126637.  




Lee, S., Abecasis, G.R., Boehnke, M., Lin, X. (2014). Rare-Variant Association Analysis: Study Designs and Statistical Tests. American Journal of Human Genetics, 95, 5-23.  


Lee, S., Zou, F. and Wright, F.A. (2014). Convergence of sample eigenvalues, eigenvectors and PC scores for ultra-high dimensional data. Biometrika, 101, 484-490.  


Mukherjee, B., Chen, Y-H, Ko, Y-A, He, Z., Lee, S., Zhang, M., Park, SK. (2014). Statistical strategies for modeling gene-environment interactions in longitudinal cohort studies. Statistical Approaches to Gene-Environment Interactions for Complex Phenotypes, Cambridge, MA: MIT Press, in press.  




Lee, S., Teslovich, T., Boehnke, M., Lin, X. (2013). General framework for meta-analysis of rare variants in sequencing association studies. American Journal of Human Genetics, 93, 42-53. 


Ionita-Laza, I.#, Lee, S.#, Makarov, V., Buxbaum, J., Lin, X. (2013). Sequence kernel association tests for the combined effect of rare and common variants. American Journal of Human Genetics, 92, 841-853.

#Joint first author.


Wang, X., Lee, S., Zhu, X., Redline, S., Lin, X. (2013). GEE-based SNP Set Association Test for Continuous and Discrete Traits in Family-Based Association Studies. Genetic Epidemiology, 37, 778-786. 


Lin, X., Lee, S., Christiani, D. and Lin, X. (2013). Test for interactions between a Gene/SNP-set and Environment/Treatment in generalized linear models. Biostatistics, 14(4): 667–681.


Wu, M.C., Maity, A., Lee, S., Simmons, E.M., Molldrem, J.J. and Armistead, P.M. (2013) Kernel machine SNP-set testing under multiple candidate kernels. Genetic Epidemiology, 37, 267-275.


Ionita-Laza, I., Lee, S., Makarov, V., Buxbaum, J., Lin, X. (2013). Family-based association tests for sequence data, and comparisons with population-based association tests. European Journal of Human Genetics, doi: 10.1038/ejhg.2012.308.


Barnett, I., Lee, S. and Lin, X. (2013). Detecting Rare Variant Effects Using Extreme Phenotype Sampling in Sequencing Association Studies. Genetic Epidemiology, 37, 142-151.




Lee, S., Emond, M.J., Bamshad, M.J., Barnes, K.C., Rieder, M.J., Nickerson, D.A., NHLBI GO Exome Sequencing Project-ESP Lung Project Team, Christiani, D.C., Wurfel, M.M. and Lin, X. (2012). Optimal unified approach for rare variant association testing with application to small sample case-control whole-exome sequencing studies. American Journal of Human Genetics, 91, 224-237.


Lee, S., Wu, M. and Lin, X. (2012). Optimal tests for rare variant effects in sequencing association studies. Biostatistics, 13, 762-775.




Wu, M.#, Lee, S.#, Cai, T., Li, Y., Boehnke, M. and Lin, X. (2011). Rare Variant Association Testing for Sequencing Data Using the Sequence Kernel Association Test (SKAT). American Journal of Human Genetics, 89, 82-93. 

#Joint first author.


Lee, S., Wright, F.A. and Zou, F. (2011). Control of population stratification by correlation-selected principal components. Biometrics, 67, 967-974.


Collaborative Cross Consortium (2011). The Genome Architecture of the Collaborative Cross Mouse Genetic Reference Population, Genetics, 190, 389-401.


Sun, W., Lee, S., Zhabotynsky, V., Zou, F., Wright, F.A., Crowley, J.J., Yun, Z., Buus, R., Miller, D., Wang, J., McMillan, L., de Villena, F. and Sullivan, P.F. (2011). Transcriptome atlases of mouse brain reveals differential expression across brain regions and genetic backgrounds. G3: Genes, Genomes, Genetics, 2, 203-211.


Wright, F.A., Strug, L.J., Doshi, V.K., Commander, C.W., Blackman, S.M., Sun, L., Berthiaume, Y., Cutler, D., Cojocaru, A., Collaco, J.M., Corey, M., Dorfman, R., Goddard, K., Green, D., Kent Jr, J.W., Lange, E., Lee, S., Li, W., Luo, J., Mayhew, G., Naughton, K., Pace, R., Pare, P., Rommens, J., Sandford, A., Stonebraker, J.R., Sun, W., Taylor, C., Vanscoy, L.L., Zou, F., Blangero, J., Zielenski, J., O'Neal, W.K., Drumm, M.L., Durie, P.R., Knowles, M.R., Cutting, G.R. (2011). Genome-wide association and linkage identify modifier loci of lung disease severity in cystic fibrosis at 11p13 and 20q13.2. Nature Genetics, 43, 539-546.


Li, W., Sun, L., Corey, M., Zou, F., Lee, S., Cojocaru, A.L., Taylor, C., Blackman, S.M., Stephenson, A., Sandford, A.J., Dorfman, R., Drumm, M.L., Cutting, G.R., Knowles, M.R., Durie, P., Wright, F.A., and Strug, L.J. (2011). Understanding the population structure of North American patients with cystic fibrosis. Clinical Genetics, 79, 136-46.




Lee, S., Zou, F. and Wright, F.A. (2010). Convergence and prediction of principal component scores in high dimensional settings. Annals of Statistics, 38, 3605-3629.


Zou, F., Lee, S., Knowles, M.R. and Wright, F.A. (2010). Quantification of Population Structure Using Correlated SNPs by Shrinkage Principal Components. Human Heredity, 70, 9-22.


Zou, F., Huang, H., Lee, S. and Hoeschele, I. (2010). Nonparametric Bayesian variable selection with applications to multiple quantitative trait loci mapping with epistasis and gene-environment interaction. Genetics, 186, 385-394.


Before 2010


Lee, S., Sullivan, P.F., Zou, F. and Wright, F.A. (2008). Comment on a simple and improved correction for population stratification. American Journal of Human Genetics, 82, 524-526.


Jeong, J., Choi, M., Cho, Y.,  Lee, S., Oh, J., Park, J., Cho, Y., Lee, I., Kim, S., Han, S., Choi, K. and Chung, I. (2008). Chronic gastrointestinal symptoms and quality of life in the Korean population. World Journal of Gastroenterology, 14(41), 6388-6394.


Sullivan, P.F., Lin, D., Tzeng, J-Y, van den Oord, E., Perkins, D., Stroup, T.S., Wagner, M., Lee, S., Wright, F.A., Zou, F., Liu, W., Downing, A.M., Lieberman, J. and Close, S.L. (2008). Genomewide association for schizophrenia in the CATIE study: results of stage 1. Molecular Psychiatry, 13(6), 570-84.


* doctoral student/research assistant under my supervision
