Biostatistics Seminars

The Department of Biostatistics at the University of Michigan is proud to invite leading scholars from around the world to visit Ann Arbor to share their expertise, wisdom and experience. All are welcome to attend these seminars, which are held in person.

Marc Suchard, MD, PhD

Professor of Biostatistics, Biomathematics, & Human Genetics
University of California, Los Angeles

DATE: Thursday, January 25, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Approximate gradients for inference of partially-observed stochastic processes

ABSTRACT: Bayesian computation remains onerous at scale for inference under many discrete-valued stochastic process-based models, while these models remain ubiquitous across biology and public health. In this talk, we will explore how one can construct computationally efficient approximations to the gradient of the data likelihood under continuous-time Markov chain (CTMC) models with respect to their high-dimensional parameters. CTMCs underpin the most popular models for learning about how rapidly evolving pathogens change over time and space to give rise to human infection, and the dimensionality of these problems is daunting. With these approximations in hand, a new variant of Hamiltonian Monte Carlo (HMC) becomes tractable to explore the parameter posterior, and we bound the approximation error using several small tricks from matrix analysis. This new sampling approach enables the introduction of a novel random-effects CTMC model that captures biological realism previously missing. Applied to the analysis of early SARS-CoV-2 genomes, the random effects remove bias in inference of the location and timing of the pathogen's spillover into humans, while the approximate-gradient-based machinery is over an order of magnitude more time efficient than conventional sampling approaches.

TOPICS: Bayesian Statistics, Bioinformatics, Computational Statistics, COVID-19, Data Integration, Epidemiology and Public Health, Genetics Research, Infectious Diseases, Statistical Genetics, Stochastic Models


Edward Kennedy, PhD

Associate Professor, Statistics and Data Science
Carnegie Mellon University

DATE: Thursday, February 08, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Doubly robust capture-recapture methods for estimating population size

ABSTRACT: Estimation of population size using incomplete lists has a long history across many biological and social sciences. For example, human rights groups often construct partial lists of victims of armed conflicts, to estimate the total number of victims. Earlier statistical methods for this setup often use parametric assumptions, or rely on suboptimal plug-in-type nonparametric estimators; but both approaches can lead to substantial bias, the former via model misspecification and the latter via smoothing. Under an identifying assumption that two lists are conditionally independent given measured covariates, we make several contributions. First, we derive the nonparametric efficiency bound for estimating the capture probability, which indicates the best possible performance of any estimator, and sheds light on the statistical limits of capture-recapture methods. Then we present a new estimator that has a double robustness property new to capture-recapture, and is near-optimal in a nonasymptotic sense, under relatively mild nonparametric conditions. Next, we give a confidence interval construction method for total population size from generic capture probability estimators, and prove nonasymptotic near-validity. Finally, we apply these methods to estimate the number of killings and disappearances in Peru during its internal armed conflict between 1980 and 2000.
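For readers new to the two-list setup, the relationship between capture probability and population size can be sketched with the classical Lincoln-Petersen calculation. This is only a minimal illustration that assumes the two lists are marginally independent with no covariates; it is not the doubly robust estimator described in the abstract, and the function name and list counts below are hypothetical.

```python
def two_list_estimate(n_list1, n_list2, n_both):
    """Classical two-list (Lincoln-Petersen) estimate of population size.

    Assumes the two lists are independent (no covariates), which is a
    simplification of the conditional-independence setting in the abstract.
    Returns (estimated population size, estimated capture probability).
    """
    if n_both <= 0:
        raise ValueError("the lists must overlap for the estimate to exist")
    # Under independence, the share of list 2 that also appears on list 1
    # estimates the probability of being on list 1, and vice versa.
    p_list1 = n_both / n_list2
    p_list2 = n_both / n_list1
    # Capture probability: chance of appearing on at least one list.
    capture_prob = 1.0 - (1.0 - p_list1) * (1.0 - p_list2)
    # Number of distinct individuals actually observed.
    n_observed = n_list1 + n_list2 - n_both
    return n_observed / capture_prob, capture_prob


# Hypothetical counts, chosen only for illustration.
n_hat, psi_hat = two_list_estimate(n_list1=400, n_list2=300, n_both=120)
print(f"estimated capture probability: {psi_hat:.2f}")
print(f"estimated population size: {n_hat:.0f}")
```

The estimate is the observed count divided by the capture probability, which is algebraically the same as n_list1 * n_list2 / n_both. This is why the abstract treats the capture probability as the central quantity to estimate.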

TOPICS: Causal Inference, Health Policy, High-Dimensional Data, Machine Learning, Nonparametric / Semiparametric Modeling, Personalized Medicine, Precision Health


Yong Chen, PhD

Professor, Biostatistics
University of Pennsylvania

DATE: Thursday, February 22, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: PDA: Privacy-preserving Distributed Algorithms and statistical inference in the era of real-world data networks

ABSTRACT: With the increasing availability of electronic health records (EHR) data, it is important to effectively integrate evidence from multiple data sources to enable reproducible scientific discovery. However, we are still facing practical challenges in data integration, such as protection of data privacy, the high dimensionality of features, and heterogeneity across different datasets. Aiming to facilitate efficient multi-institutional data analysis without sharing individual patient data (IPD), we developed a toolbox of Privacy-preserving Distributed Algorithms (PDA) that conduct distributed learning and inference for various models, such as association analyses, causal inference, cluster analyses, counterfactual analyses, and beyond. Our algorithms do not require iterative communication across sites and are able to account for heterogeneity across different hospitals. The validity and efficiency of PDA are also demonstrated with real-world use cases in Observational Health Data Sciences and Informatics (OHDSI), PCORnets including PEDSnet and OneFlorida, and Penn Medicine Biobank (PMBB).

TOPICS: Data Integration


Hilary Finucane, PhD

Associated Scientist in the Program in Medical and Population Genetics; Associate Member in the Genetics Program at the Stanley Center for Psychiatric Research
Broad Institute

DATE: Thursday, March 07, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Insights from complex trait fine-mapping across diverse biobanks

ABSTRACT: Despite the great success of genome-wide association studies (GWAS) in identifying genetic loci significantly associated with diseases, the vast majority of causal variants underlying disease-associated loci have not been identified. In this talk, I will discuss some advantages and pitfalls of statistical fine-mapping. I will then describe our group's fine-mapping of 148 complex traits in three large-scale biobanks (BioBank Japan, FinnGen, and UK Biobank; total n = 811,261) and how biobank fine-mapping is useful for gene prioritization.

TOPICS: Genetics Research, Genomics Research, Statistical Genetics


Debdeep Pati, PhD

Professor, Statistics
Texas A&M University

DATE: Thursday, March 14, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Bayesian fair clustering

ABSTRACT: The advent of ML-driven decision-making and policy formation has led to an increasing focus on algorithmic fairness. As clustering is one of the most commonly used unsupervised machine learning approaches, there has naturally been a proliferation of literature on fair clustering. A popular notion of fairness in clustering requires the clusters to be balanced, i.e., each level of a protected attribute must be approximately equally represented in each cluster. Building upon the original framework developed in Chierichetti et al. (NeurIPS, 2017), this literature has rapidly expanded in various aspects. In this article, we offer a novel model-based formulation of fair clustering, complementing the existing literature, which is almost exclusively based on optimizing appropriate objective functions. We first rigorously define a notion of fair clustering at the population level under a model-misspecified framework, with minimal assumptions on the data-generating mechanism. We then specify a Bayesian model equipped with a novel hierarchical prior specification to encode the notion of balance in the resulting clusters, and whose posterior targets this population-level object. A carefully developed collapsed Gibbs sampler ensures efficient computation, with a key ingredient being a novel scheme for non-uniform sampling from the space of binary matrices with fixed margins, utilizing techniques from optimal transport to construct proposals. Impressive empirical success of the proposed methodology is demonstrated across varied numerical experiments and benchmark data sets.
Importantly, the benefits of our approach are not limited to the specific model we propose: thinking from a generative modeling perspective allows us to provide concrete guidelines for prior calibration that ensure the desired distribution of balance a priori, develop a concrete notion of optimal recovery in the fair clustering problem, and devise schemes for principled performance evaluations of algorithms.

TOPICS: Bayesian Statistics, Nonparametric / Semiparametric Modeling


Andrew Vickers, PhD

Attending Research Methodologist
Memorial Sloan Kettering Cancer Center

DATE: Thursday, March 21, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: If calibration, discrimination, Brier, lift gain, precision recall, F1, Youden, AUC, and 27 other accuracy metrics can’t tell you if a prediction model (or diagnostic test, or marker) is of clinical value, what should you use instead?

ABSTRACT: A typical paper on a prediction model (or diagnostic test or marker) presents some accuracy metrics (say, an AUC of 0.75 and a calibration plot that doesn’t look too bad) and then recommends that the model (or test or marker) can be used in clinical practice. But how high an AUC (or Brier or F1 score) is high enough? What level of miscalibration would be too much? The problem is redoubled when comparing two different models (or tests or markers). What if one prediction model has better discrimination but the other has better calibration? What if one diagnostic test has better sensitivity but worse specificity? Note that it doesn’t help to state a general preference, such as “if we think sensitivity is more important, we should take the test with the higher sensitivity” because this does not allow us to evaluate trade-offs (e.g. test A with sensitivity of 80% and specificity of 70% vs. test B with sensitivity of 81% and specificity of 30%). The talk will start by showing a series of everyday examples of prognostic models, demonstrating that it is difficult to tell which is the better model, or whether to use a model at all, on the basis of routinely reported accuracy metrics such as AUC, Brier or calibration. We then give the background to decision curve analysis, a net benefit approach first introduced about 15 years ago, and show how this methodology gives clear answers about whether to use a model (or test or marker) and which is best. Decision curve analysis has been recommended in editorials in many major journals, including JAMA, JCO and the Annals of Internal Medicine, and is very widely used in the medical literature, with well over 2000 empirical uses a year.
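As a rough illustration of the net benefit idea, the two tests from the abstract can be compared with the standard net benefit formula: true positives per patient minus false positives per patient, with false positives weighted by the odds of the threshold probability. The sensitivities and specificities come from the abstract; the prevalence and threshold probability below are assumptions chosen only for illustration, and the function names are hypothetical.

```python
def net_benefit(sensitivity, specificity, prevalence, threshold):
    """Net benefit per patient of intervening on everyone who tests positive.

    true positives - false positives * (threshold / (1 - threshold)),
    with both rates expressed as proportions of all patients.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    threshold_odds = threshold / (1.0 - threshold)
    return true_pos - false_pos * threshold_odds


def net_benefit_treat_all(prevalence, threshold):
    """Reference strategy: intervene on every patient, with no test at all."""
    return net_benefit(1.0, 0.0, prevalence, threshold)


# Assumed values, not taken from the abstract.
PREVALENCE = 0.20  # assumed proportion of patients with the disease
THRESHOLD = 0.10   # assumed threshold probability for intervening

nb_test_a = net_benefit(0.80, 0.70, PREVALENCE, THRESHOLD)  # test A
nb_test_b = net_benefit(0.81, 0.30, PREVALENCE, THRESHOLD)  # test B
nb_all = net_benefit_treat_all(PREVALENCE, THRESHOLD)

print(f"test A:    {nb_test_a:.4f}")
print(f"test B:    {nb_test_b:.4f}")
print(f"treat all: {nb_all:.4f}")
```

Under these assumed values, test A has the highest net benefit and test B falls below the treat-all strategy (net benefit of treating no one is zero by definition). A decision curve repeats this calculation across a range of threshold probabilities rather than a single one.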

TOPICS: Artificial Intelligence, Personalized Medicine, Predictive Modeling


Ali Shojaie, PhD

Professor of Biostatistics & Statistics, Assoc. Chair
University of Washington

DATE: Thursday, March 28, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Learning from Changing Times: Analyzing Non-Stationary High-Dimensional Time Series

ABSTRACT: High-dimensional time series from changing environments are collected in many applications and are prevalent in biology and medicine. Ignoring these changes by assuming stationarity results in erroneous conclusions. Moreover, as we demonstrate in neuroscience applications, the changes in high-dimensional time series can provide valuable information about changes in brain function. Motivated by these applications, we first present a three-step procedure, based on a total variation penalty, for consistent estimation of both structural change points and parameters of high-dimensional piecewise vector autoregressive (VAR) models. In the second part of the talk, we consider the setting where the changes in the VAR parameters can be modeled by the states of an (unobserved) discrete Markov process, leading to a high-dimensional Markov switching VAR model. We propose an approximate Expectation-Maximization (EM) algorithm to estimate the model parameters and establish the consistency of the resulting estimates.


Fan Bu, PhD; Nicholas Hartman, PhD; Kevin He, PhD; Menggang Yu, PhD

Fan Bu, PhD, Assistant Professor, Biostatistics
Nicholas Hartman, PhD, Assistant Research Professor, Biostatistics
Kevin He, PhD, Associate Professor
Menggang Yu, PhD, Professor
University of Michigan

DATE: Thursday, April 04, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Meet the New Biostatistics Faculty Members

ABSTRACT: Fan Bu's Research Interests: Bayesian statistics, statistical computation, dynamic and stochastic models, spatio-temporal models, and network inference, with applications in infectious disease models, health data science, and computational social science; Likelihood-based inference on federated and distributed data; Vaccine safety surveillance using sequential observational data; Partially observed epidemics on dynamic contact networks

Nicholas Hartman's Research Interests: Survival analysis, clustered/correlated data analysis, healthcare provider profiling, organ transplantation, renal disease; Methods to evaluate prognostic survival models under complex data structures; Adjustments for unobserved confounding in healthcare provider evaluations; Robust estimation methods for clustered data; Policy-change analyses and access to kidney transplantation

Kevin He's Research Interests: Survival analysis, healthcare provider profiling, risk prediction, data integration, machine learning, statistical optimization, causal inference and statistical genetics, organ transplantation, kidney dialysis, psoriasis, cancer and stroke; He currently holds an R01 (PI) for improving statistical methods for profiling healthcare providers.

Menggang Yu's Research Interests: Causal inference and observational studies; Risk prediction; Clinical Biostatistics; Treatment Selection


Min Qian, PhD

Associate Professor, Biostatistics
Columbia University

DATE: Thursday, April 11, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Online Learning for Personalized Policies Using Mobile Health Data

ABSTRACT: With the increasing focus on improving personal health and fitness using smart devices and wearables, it is crucial to create a mobile clinical decision support system. In this work, we consider the development of personalized policies that allow different intervention recommendations for individuals with the same observed features. Personalized policies represent a paradigm shift from one decision rule for all users to an individualized decision rule for each user. Aiming to optimize the expected rewards, we propose using a generalized linear mixed modeling framework where population effects and individual deviations from the population effects are modeled as fixed and random effects, respectively, and synthesized to form the personalized policy. We introduce a contextual bandit algorithm to learn the personalized policies. This approach is theoretically justified using a regret bound and illustrated using the IntelliCare Suite of Apps with the goal of maximizing the push notification response rate given past app usage and other contextual factors.

TOPICS: Causal Inference, Clinical Trials, Machine Learning, Wearable devices and mobile health


Luis E. Nieto-Barajas, PhD

Professor, Statistics
Instituto Tecnológico Autónomo de México

DATE: Tuesday, April 16, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Optimal stratification of survival data via Bayesian nonparametric mixtures

ABSTRACT: The stratified proportional hazards model represents a simple solution to account for heterogeneity within the data while keeping the multiplicative effect on the hazard function. Strata are typically defined a priori by resorting to the values taken by a categorical covariate. A general framework is proposed, which allows for the stratification of a generic accelerated life time model, including as a special case the Weibull proportional hazard model. The stratification is determined a posteriori by taking into account that strata might be characterized by different baseline survivals as well as different effects of the predictors. This is achieved by considering a Bayesian nonparametric mixture model and the posterior distribution it induces on the space of data partitions. The optimal stratification is then identified by means of the variation of information criterion and, in turn, stratum-specific inference is carried out. The performance of the proposed method and its robustness to the presence of right-censored observations are investigated by means of an extensive simulation study. A further illustration is provided by the analysis of a data set extracted from the University of Massachusetts AIDS Research Unit IMPACT Study.

TOPICS: Bayesian Statistics, Computational Statistics, Survival Analysis, Clustering


Yanxun Xu, PhD

Associate Professor, Applied Mathematics and Statistics
Johns Hopkins University

DATE: Thursday, April 18, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Precision Medicine in HIV

ABSTRACT: The use of antiretroviral therapy (ART) has significantly reduced HIV-related mortality and morbidity, transforming HIV infection into a chronic disease, with care now focusing on treatment adherence, comorbidities including mental health, and other long-term outcomes. Since combination ART with three or more drugs of different mechanisms or against different targets is recommended for all people living with HIV (PWH) and they must continue on it indefinitely once started, understanding the long-term ART effects on health outcomes and personalizing ART treatment based on individuals’ characteristics are crucial for optimizing PWH’s health outcomes and facilitating precision medicine in HIV. In this talk, I will present methods designed to learn and understand the impact of ART on the health outcomes of PWH, and explore the future of HIV care through innovative and individualized approaches.

TOPICS: Artificial Intelligence, Bayesian Statistics, Causal Inference, Chronic Diseases, Dynamic Treatment, Epidemiology and Public Health, Global Public Health, Infectious Diseases, LGBT Health, Longitudinal / Correlated Data, Machine Learning, Mental Health, Nonparametric / Semiparametric Modeling, Personalized Medicine, Precision Health