You are viewing the site in preview mode

Skip to main content

Joint modeling of multistate survival processes with informative examination scheme: application to progressions in diabetes

Abstract

Background

Multistate survival models (MSMs) are widely used in the medical field of clinical studies. For example, in type 2 diabetes mellitus (T2D), these models can be applied to describe progression in T2D by predefining several T2D states based on available biometric measurements such as hemoglobin A1 C (HbA1c). In most cases, MSMs come with an assumption that the examination process is independent of disease progression. However, in practice, complete independence between disease progression and examination processes is unrealistic, as the frequency at which a patient accesses healthcare may vary based on treatment and/or control of the health condition.

Methods

We built a joint model of a 4-state transition process of T2D with informative examination scheme (i.e., the patterns of examination times are not random). Risk factors including age, sex, race, and socioeconomic disadvantage were included in a log-linear model examining T2D transition intensities and healthcare visit frequencies. Parameters of the joint model are estimated under the framework of likelihood function by the expectation–maximization (EM) algorithm.

Results

The joint model demonstrated that people living in neighborhoods with greater socioeconomic disadvantage had a lower healthcare visit frequency under all 4 defined T2D statuses. Evaluation of race/ethnicity revealed that comparing to non-Hispanic White patients, Black patients had higher risk for progressing from Normal to Prediabetes, T2D, and Uncontrolled T2D states.

Conclusions

Our joint model offers a framework for analyzing multistate survival processes while accounting for the dependence between disease progression and examination frequency. Unlike traditional MSMs that estimate only transition intensities, our model captures variations in healthcare visit frequencies across different disease states, providing a more comprehensive understanding of disease dynamics and healthcare access patterns.

Peer Review reports

Background

Chronic diseases, including cardiovascular disease (CVD), diabetes, cancer, and chronic kidney disease (CKD) are associated with a high degree of morbidity and mortality and the cost burdens countries worldwide [1, 2]. Many chronic diseases are characterized by progressively worsening stages. For example, T2D can be defined by four states according to the HbA1c value (normal < 5.7%, prediabetes 5.7–6.4%, T2D ≥ 6.5% and uncontrolled T2D ≥ 7% in non-pregnant adults) [3]. Also, a 7-state transition model was defined to learn dementia progression with all states related to the buildup of amyloid plaque in brain and cortical thickness [4]. Examining the rate of disease progression among chronic disease states can improve early detection and intervention to prevent or decrease the severity of disease [5]. Multistate survival models (MSM) are an extension of traditional survival analysis that provides a framework for modeling potentially bi-directional changes between multiple states, including disease deterioration and recovery [6]. MSM models use the theory from continuous time Markov chains (CTMC) [7] to learn the transition rate between several predefined disease states and perform inference on whether these transition rates vary as a function of variables of interest [8].

If the actual transition times between states are known, analytic approaches to build MSM models stem from traditional survival analysis. However, often the exact time of an individual’s health status change cannot be determined. Rather, the transition time between health states is located between two examination times, an occurrence know as interval censoring [9,10,11]. For the formal definition of interval censoring, for patient \(i\), let \({t}_{d}\) denote the time to event, \({t}_{d}\) is interval censored if only an interval \((L, R]\) is observed as \({t}_{d}\in \left(L, R\right].\) For two consecutive observed times \({t}_{ij}\) and \({t}_{i(j+1)}\), interval censoring is considered independent if \(P\left({t}_{ij}<{t}_{d}\le {t}_{i(j+1)}|L={t}_{ij}, R={t}_{i(j+1)}\right)=P({t}_{ij}<{t}_{d}\le {t}_{i(j+1)})\), indicating that the distribution of \(L\) and \(R\) is free of parameters involved in the distribution of \({t}_{d}\) [6, 11]. In some clinical trial studies, all observation times are predefined ensuring the independence of the disease process with the observation scheme [12]. However, in the observational study, the observation times usually depend on the disease process [13]. Grüger et al. [13] formally defined the concept of informative observation scheme, where the distribution of observation times is influenced by the parameters of the disease process. Informative observation scheme will imply dependent interval censoring [6]. So understanding the observation scheme is important to address the challenge of interval censoring in multistate models. There are two types of examination schemes, noninformative or informative schemes (Fig. 1) [13]. In the noninformative examination scheme examination times are independent of disease status. An example in practice is prescheduled examination times, defined as panel data. For informative schemes, examination times are dependent on the patients’ health status. For example, a patient in an advanced stage of a chronic condition may self-select to visit their healthcare provider more frequently. In contrast, individuals in good health may visit their healthcare provider less often or on a predetermined schedule.

Fig. 1
figure 1

A An example of noninformative examination times with all patients having the same examination times. B An example of informative self-selected examination times for patients with different T2D related states. Here the pattern of the examination times does not follow a fixed scheme

Dependent censoring plays an important role in survival analysis, ignoring the dependent censoring will introduce bias in estimators [14,15,16]. For right censoring, several studies using the frailty and copula frameworks have demonstrated the bias that arises when noninformative censoring is assumed [14, 17, 18]. Noninformative censoring is defined as the scenario where the time to event and the time to censoring are distributed independently [14]. For interval censoring, Finkelstein, Goggins, and Schoenfeld [19] employed a joint model of the visit distribution and failure distribution, demonstrating that the Turnbull estimator is biased upward when the probability of making a visit is low before failure and high afterward, if the dependent interval censoring is ignored. Zhang et al. [20] incorporated a latent variable to model the dependence structure between the censoring variables and failure time, effectively addressing the issue of dependent interval censoring. Schneider et al. [18] proposed an approach for clustered survival data that accounts for the dependence between failure time and censoring time using a latent frailty model, demonstrating strong performance.

In multistate transition models, ignoring the dependent interval censoring under the situation of an informative observation scheme will introduce bias as well [15, 16]. Most multistate transition models developed for panel data assume a noninformative examination scheme [12, 15]. The noninformative scheme can be ignored when the likelihood function is used to estimate the related transition intensities [6]. However, if the examination times are informative then biased estimates result using approaches designed for panel data since the examination schemes are not independent of the disease progression process [6, 16]. Chen, Yi, and Cook [15] proposed a progressive multi-state model with informative dropout under a predefined observation scheme. Lange et al. [16] introduced a joint model for the disease process and informative observation scheme that allows for misspecification. However, this model involves the computation of an exponential matrix, which poses challenges when encountering large sample sizes [4, 21].

In this paper, we model disease transitions under an informative examination scheme, with an application of modeling the risk of developing T2D and T2D progression for patients seen at a large U.S. academic medical center. We presume that the T2D process was continuously observed except that the real transition time was interval censored between two consecutive examination times. Four T2D related states were defined based on HbA1c values measured at different examination times, obtained from electronic medical record (EMR) data. An important aspect of the data is that patients in worse health have more records per year in the EMR dataset than those with a healthier status (Fig. 2). The interval length between two consecutive examination times depends on the rate of disease progression and the frequency of patient visits to a healthcare provider. We hypothesize that jointly modeling the pattern of informative examination times with MSM will reduce the bias for estimating the disease transition rate.

Fig. 2
figure 2

Yearly frequency of examinations under the condition of different T2D related states obtained from electronic medical record (EMR) data

We propose a multistate survival model for the analysis of T2D progression with the informative examination times. A Poisson distribution models the frequency of examination times under different T2D statuses. The model estimates parameters related to transition intensities between different T2D states and the rate of examination times under these different states. We develop an EM algorithm for handling interval censoring under the assumption that only a single transition occurs between two consecutive examination times. This assumption helps to alleviate computational complexity. In methods section, we present details for constructing the likelihood and application of the EM algorithm in the joint model. A simulation study is performed to evaluate the variance, bias, and coverage probability of the estimates under a 4-state disease progression model. In application section, a data set of HbA1c measurements obtained from EMR data is applied to our model describing the transition between different T2D related health states. Strengths and weaknesses of the model are considered in discussion section.

Methods

Theoretical background of the continuous time Markov chain (CTMC)

The disease process is described using a nonhomogeneous Markov jump model. Under the influence of a disease process, a patient’s check-in process is modelled by a Poisson process (PP). Let \(\{X\left(t\right)\}\) be the stochastic process denoting the state someone is in at time \(t\), \(X\left(t\right)\in \{\text{1,2},\dots ,s\}\), with \(s\) the total number of states. The main element of interest in this model is the infinitesimal generator matrix or intensity matrix \(Q(t)\). The \((r, c)th\) element of it can be denoted as [12]

$${q}_{rc}(t):=\underset{\Delta t\to {0}^{+}}{\text{lim}}\frac{P\left(X\left(t+\Delta t\right)=c \right| X\left(t\right)=r)}{\Delta t}$$

where \(P\left(X\left(t+\Delta t\right)=c \right| X\left(t\right)=r)\) is the transition probability from state \(r\) to state \(c\) in the interval \((t, t+\Delta t)\), \(r, c\) \(\in \{\text{1,2},\dots ,s\}\). Let \({q}_{r}(t)= -\sum_{c\ne r}{q}_{rc}(t)\) be the \(\left(r, c\right)th\) diagonal element of \(Q\)(t). In our study, a piecewise-constant intensity model is utilized to estimate intensities over time, following prior methodologies [22, 23]. The intensity \({q}_{rc}\left(t\right)\) is assumed to remain constant within each time interval \([{t}_{1}, {t}_{2})\) and is defined as:

$${q}_{rc}\left(t\right)={q}_{rc}\left({t}_{1}\right), {t}_{1}\le t<{t}_{2}$$

A log-linear model is employed to describe the relationship between the intensity and a set of covariates:

$$\text{log}{q}_{rc}\left(t\right)={\beta }_{rc}^{T}Z$$

where Z is the covariate vector and \({\beta }_{rc}^{T}\) is the corresponding coefficients for transition from \(r\) to \(c\). The covariates in Z can be either time-constant or time-varying. For time-varying covariates, the intensity is computed based on their values at the beginning of each time interval.

Check-in process is modulated by CTMC

In Fig. 3, a 4-state T2D transition model is provided. For patient \(i\in \{\text{1,2},\dots ,n\}\), let \(\left({t}_{ij}, X({t}_{ij})\right),j\in \{\text{1,2},\dots ,{k}_{i}\}\) denote the observation time \({t}_{ij}\), and the corresponding state \(X({t}_{ij})\) at \({t}_{ij}\). We assume that only a single diabetes status change can occur between two hospital visiting times. Based on this assumption, if \(X\left({t}_{ij}\right)=X\left({t}_{i(j+1)}\right)\), we have

Fig. 3
figure 3

An example of a 4-state T2D progression model. There are totally 6 observations. Times \(\{{d}_{1},{d}_{2},{d}_{3}\}\in D\) are the actual times for disease state change (unobserved). \({d}_{1}\) is the transition time from Normal to Prediabetes, \({d}_{2}\) is the transition time from Prediabetes to T2D, and \({d}_{3}\) is the transition time from T2D to Uncontrolled T2D

$$X\left(t\right)=X\left({t}_{ij}\right), t\in \left[{t}_{ij},{t}_{i\left(j+1\right)}\right).$$

If \(X\left({t}_{ij}\right)\ne X\left({t}_{i\left(j+1\right)}\right),\)

$$X\left(t\right)=\left\{\begin{array}{c}X\left({t}_{ij}\right), t\in \left[{t}_{ij},{t}_{id}\right)\\ X\left({t}_{i\left(j+1\right)}\right), t\in \left[{t}_{id},{t}_{i\left(j+1\right)}\right),\end{array}\right.$$
(1)

where \({t}_{id},d\in \{\text{1,2},\dots ,{D}_{i}\}\) is the censored time for diabetes status change. All \({t}_{ij}\in [{t}_{id},{t}_{i(d+1)})\) are defined as check-in times under state \(X({t}_{id})\). Let \(\{N\left(t\right), t\in ({t}_{d},{t}_{d+1})\}\) be the counting process for check-in events during \(({t}_{d},{t}_{d+1})\) with a nonhomogeneous Poisson process to model the check-in process. The check-in rates at different times \(t\) are denoted as \({\lambda }_{r}(t)\) under the state \(r\) and \(r=X\left(t\right)\). Similarly, a piecewise constant intensity model with log linear regression is used to model the check in rates under different states as

$$\text{log}{\lambda }_{r}(t)={\boldsymbol{\alpha }}_{{\varvec{r}}}^{{\varvec{T}}}{\varvec{Z}}$$

where Z is the covariate vector and \({\alpha }_{r}^{T}\) is the corresponding coefficients for state \(r\).

Dependent interval censoring

In EMR data of chronic diseases, the true transition times between disease states (\({t}_{d})\) are unknown (Fig. 3). For patient \(i\), once the disease transition time \({t}_{d}\) is censored between the time interval\(({t}_{ij}, {t}_{i(j+1)})\), the length of the interval \(\Delta {t}_{ij}={t}_{i(j+1)}-{t}_{ij}\) will be determined by two factors. The first is the transition rate between disease states. The second factor is the healthcare provider visit frequency. For example, the length between two examination times where a patient with prediabetes is subsequently diagnosed with T2D is the sum of two time segments. The first segment is the time from the last healthcare provider visit while in the prediabetes phase to progression to T2D, while the second segment is the time between progressing to T2D and the patient seeing a healthcare provider again. This scenario of interval censoring is an example of dependent interval censoring. In our application we assume only a single jump can occur between any two observation times.

Likelihood and parameter estimation

Likelihood without interval censoring

For a single patient \(i\) with their record denoted as (\({t}_{ij}, X({t}_{ij})\))\(,j\in \left\{\text{1,2},\dots ,{k}_{i}\right\},\) with \({t}_{ij}\) the record time, \(X({t}_{ij})\) the corresponding disease status, and \({k}_{i}\) the total number of records for patient \(i\). In the absence of interval censoring, all records occur at true transition times. The event transition time corresponds to the true event time so the transition probability or density between \(\left({t}_{ij},{t}_{i(j+1)}\right)\) of patient \(i\) can be constructed based on a joint distribution of the CTMC and Poisson process [24],

$${P}_{i}\left(X\left({t}_{i\left(j+1\right)}\right)=c,{{T}_{ij}=t}_{ij},{T}_{i\left(j+1\right)}={t}_{i(j+1)}|X\left({t}_{ij}\right)=r\right)=\left\{\begin{array}{c}{e}^{-{q}_{i,r}\Delta {t}_{ij}}{\lambda }_{i,r}{e}^{-{\lambda }_{i,r}\Delta {t}_{ij}}, r=c\\ {q}_{i,rc}{e}^{-{q}_{i,r}\Delta {t}_{ij}}{e}^{-{\lambda }_{i,r}\Delta {t}_{ij}}, r\ne c.\end{array}\right.$$
(2)

where \(\Delta {t}_{ij}={t}_{i(j+1)}-{t}_{ij}\), \({q}_{i,rc}={q}_{i.rc}({t}_{ij})\) and \({\lambda }_{i,r}={\lambda }_{i,r}({t}_{ij})\). \({q}_{i,rc}\) and \({\lambda }_{i,r}\) are modelled using log linear models with covariates under a piecewise-constant intensity framework. To reduce the complexity of the notation, we have not inserted the log linear model into Eq. (2) at this stage. They will be incorporated into the formula during the EM estimation process.

Then the full likelihood conditioned on the initial distribution [12] over the whole sample is given by.

\({L}_{full}\left({\varvec{\beta}},\boldsymbol{\alpha }\right)=\prod_{i=1}^{n}\prod_{j=1}^{{k}_{i}-1}{P}_{i}\left(X({t}_{i(j+1)}),{{T}_{ij}=t}_{ij},{T}_{i\left(j+1\right)}{=t}_{i(j+1)}|X\left({t}_{ij}\right)\right)\), (3).

where \({\varvec{\beta}}\) is a vector consist of parameter related to intensities and \(\boldsymbol{\alpha }\) is a vector consist of parameter related to check-in process.

Likelihood with dependent interval censoring

Since the actual transition times between states are unknown, a convolution method was used to estimate the transition probabilities between states. We updated the transition probability between (\({t}_{ij},{t}_{i(j+1)}\)) of patient \(i\) when \(X({t}_{i\left(j+1\right)})\ne X({t}_{ij})\) in Eq. (2) by

$${P}_{i}\left(X\left({t}_{i\left(j+1\right)}\right)=c,{{T}_{ij}=t}_{ij},{T}_{i\left(j+1\right)}={t}_{i(j+1)}|X\left({t}_{ij}\right)=r\right)={\int }_{0}^{\Delta {t}_{ij}}{q}_{i,rc}{e}^{-({q}_{i,r}+{\lambda }_{i,r})t}{\lambda }_{i,c}{e}^{-({q}_{i,c}+{\lambda }_{i,c})(\Delta {t}_{ij}-t)}dt.$$

The full likelihood follows the same form as Eq. (3).

For patients with chronic diseases, changes in their disease states are mainly captured when they have a healthcare provider visit. Hence, we assume that if a status change occurs then the true event time is interval censored. The EM algorithm was applied to maximize the likelihood function and improve computational efficiency.

EM algorithm for parameter estimation

The E step is applied for interval censoring when \(X({t}_{i\left(j+1\right)})\ne X({t}_{ij})\). The expectation of interval censored time \({t}_{d}\) between the interval \(\left({t}_{ij}, {t}_{i(j+1)}\right)\) is expressed as:

$$E\left({t}_{d}|X\left({t}_{ij}\right)=r,X\left({t}_{i\left(j+1\right)}\right)=c\right).$$

The conditional expectation is re-expressed as the expectation under the condition of the length of time interval \(\Delta {t}_{ij}= {t}_{i(j+1)}-{t}_{ij}\),

$$E\left({t}_{d}|X\left({t}_{ij}\right)=r,X\left({t}_{i\left(j+1\right)}\right)=c\right)=\frac{{\Delta {t}_{ij}e}^{{A}_{ij}\Delta {t}_{ij}}}{{e}^{{A}_{ij}\Delta {t}_{ij}}-{e}^{{B}_{ij}\Delta {t}_{ij}}}-\frac{1}{{A}_{ij}-{B}_{ij}},$$

where \({A}_{ij}=-({q}_{r}+{\lambda }_{r})\) and \({B}_{ij}=-({q}_{c}+{\lambda }_{c})\).

Detailed information for the derivation of the conditional expectation is provided in Appendix A.

The intensity and visiting frequency for patient \(i\) are modelled through a log linear model, i.e., log hazard and visiting frequency are linear combinations of variables,

$$\text{log}{q}_{i,rc}={{\varvec{\beta}}}_{{\varvec{r}}{\varvec{c}}}^{{\varvec{T}}}{{\varvec{z}}}_{{\varvec{i}}},$$
$$\text{log}{\lambda }_{i,r}={\boldsymbol{\alpha }}_{{\varvec{r}}}^{{\varvec{T}}}{{\varvec{z}}}_{{\varvec{i}}},$$

where, \({{\varvec{\beta}}}_{{\varvec{r}}{\varvec{c}}}={({\beta }_{rc.0}, {\beta }_{rc.1},\dots ,{\beta }_{rc.p})}^{T}\) are the coefficients of covariates for hazards from state \(r\) to state \(c\), \({\boldsymbol{\alpha }}_{{\varvec{r}}}={({\alpha }_{r.0},{\alpha }_{r.1},\dots ,{\alpha }_{r.p})}^{T}\) are the coefficients of covariates for visiting frequency of state \(r\) and covariate vector \({{\varvec{z}}}_{{\varvec{i}}}={(1, {z}_{i1},\dots ,{z}_{ip})}^{T}\).

The updated values of the parameters in the \((k+1)\) th iteration of the M step are the following:

$${{\varvec{\beta}}}_{rc}^{(k+1)}=argmax \sum_{i=1}^{n}\sum_{j=1}^{{k}_{i}-1}\left({I}_{ij,rc}{{\varvec{\beta}}}_{rc}^{T}{{\varvec{z}}}_{i}-{I}_{ij,r}{e}^{{{\varvec{\beta}}}_{rc}^{T}{{\varvec{z}}}_{i}}{\Delta t}_{ij}^{(k)}\right)$$
$${\boldsymbol{\alpha }}_{r}^{(k+1)}=argmax \sum_{i=1}^{n}\sum_{j=1}^{{k}_{i}-1}\left({I}_{ij,r}{\boldsymbol{\alpha }}_{r}^{T}{{\varvec{z}}}_{i}-{I}_{ij,r}{e}^{{\boldsymbol{\alpha }}_{r}^{T}{{\varvec{z}}}_{i}}{\Delta t}_{ij}^{(k)}\right).$$

Here, \({I}_{ij,rc}\) is the indicator for transition from state \(r\) to \(c\) at time \({t}_{ij}\) and \({I}_{ij\bullet ,r}\) is the indicator for staying in state \(r\) at time \({t}_{ij}\). \({\Delta t}_{ij}^{(k)}\) is the transition time between \({t}_{ij}\) and \({t}_{i(j+1)}\).

Covariance of parameters.

Let \(\uppsi =\left({\varvec{\beta}},\boldsymbol{\alpha }\right), {t}^{obs}\) represent the observed data, and \({t}^{censored}\) represent the interval censored data. The conditional probability of missing data based on the observations, \(K\left({t}^{censored}|{t}^{obs},\uppsi \right)\), is given by

$$K\left({t}^{censored}| {t}^{obs},\uppsi \right)=\frac{{L}_{full}({t}^{censored},{t}^{obs},\uppsi )}{L({t}^{obs},\uppsi )}.$$

Here \({L}_{full}({t}^{censored},{t}^{obs},\uppsi )\) is the complete likelihood based on the complete data \(({t}^{censored},{t}^{obs})\) and \(L({t}^{obs},\uppsi )\) is the likelihood based on the observed data. By taking the negative logarithm of both sides, we have

$$-\text{log}L\left({t}^{obs},\uppsi \right)=-\text{log}{L}_{full}\left({t}^{censored},{t}^{obs},\uppsi \right)+\text{log}K\left({t}^{censored}|{t}^{obs},\uppsi \right).$$

Taking the second derivative, we have

$$I\left(\uppsi \right|{t}^{obs})={I}_{full}\left(\uppsi \right|{t}^{censored},{t}^{obs})-{I}_{m}\left(\uppsi \right|{t}^{censored},{t}^{obs}),$$

where \({I}_{full}\left(\uppsi \right|{t}^{censored},{t}^{obs})\) is the complete information matrix and \({I}_{m}\left(\uppsi \right|{t}^{censored},{t}^{obs})\) is the missing information matrix. The estimation process of \({I}_{full}\left(\uppsi \right|({t}^{censored},{t}^{obs}))\) and \({I}_{m}\left(\uppsi \right|{t}^{censored},{t}^{obs})\) are provided in the Appendix A.

Simulation study

We simulated a 4-state T2D transition mode \(\text{l}\) based on the Gillespie [25] algorithm with the Poisson process, as summarized in Algorithm 1.

figure a

Algorithm 1. Gillespie algorithm with check-in times to construct Markov Modulated Poisson Process (MMPP) process for single patient with interval censoring.

In algorithm 1, \({s}_{i+1}\sim discrete\left(s\left[-{s}_{i}\right]\right)\) indicates that \({s}_{i+1}\) is sampled from the state space of \(s\) excluding\({s}_{i}\), using the probability distribution defined by\(P[{s}_{i}, ]\), Specifically,\(P\left[{s}_{i}, \right]=1-\frac{Q[{s}_{i},]}{|{q}_{{s}_{i}}|}\), where Q \([{s}_{i},]\) represents the transition intensities from state \({s}_{i}\) and \(\left|{q}_{{s}_{i}}\right|\) is the sum of transition rates from state\({s}_{i}\).

We simulated data based on three check-in frequency settings (low, medium, and high) under state 1 (Normal) to show the impact of different check-in frequencies on model results. The check-in frequencies under other states would not vary by check-in frequency settings for comparison purpose. The sample size was 300 in each replicate, with 100 replicates for each setting. Intensities, \({q}_{rc},\) and visiting frequencies \({\lambda }_{r}\) were generated by:

$$\text{log}{q}_{rc}={\beta }_{rc.0}+ {\beta }_{rc.1}{x}_{j} r, c \in (\text{1,2},\text{3,4})$$
$$\text{log}{\lambda }_{r}={\alpha }_{r.0}+ {\alpha }_{r.1}{x}_{j} r \in \left(\text{1,2},\text{3,4}\right),$$

where \({x}_{j}\) was randomly generated binary variable from Bernoulli(0.5). A complete list of parameter values used to generate data for the simulation study can be found in Appendix B. We compared our joint model to two other multistate models, the exact model and the panel model. The exact model treats all time points where transitions occur as true transition times. The panel model accounts for the interval censored nature of the data but assumes the examination scheme is independent of the disease process. Both standard multistate models were fitted with the msm R package. The simulation results related to state 1 are given in Table 1 and Fig. 4. Results related to other states are provided in Appendix B. The code is available on GitHub at [https://github.com/yuxizhucpu/JM_informative].

Table 1 Simulation results from state 1 for the joint model, the exact model that treats all time points as exact time and the panel model that fits model under the panel data style
Fig. 4
figure 4

Parameter estimations for transition intensities from state 1 under three check-in frequency settings for joint model, exact model and panel model based on 100 replicates of 300 subjects

Based on Table 1, the joint model leads to unbiased parameter estimates with high coverage probabilities relative to the exact and panel models. The exact model exhibited a consistent level of bias across three check-in frequency settings. The panel model had larger bias when the check-in frequency decreased. Additionally, as the check-in frequency increased, the parameter estimates of the panel model became increasingly close to those of the exact model (Fig. 4). The panel model also had convergence issue, resulting in SEs being unavailable under the low and medium frequency scenarios. The convergence issues in the panel model can be attributed to two primary factors. First, the parameter estimation of the panel model involves the canonical decomposition of the intensity matrix, which may lack an invertible matrix during the optimization process [26]. Second, the panel model does not adequately account for the dependency between the disease progression and the visit process [27]. This motivates our use of the EM algorithm, which addresses these limitations and enhances model stability.

Application to T2D transition data

The joint model was inspired by the analysis of HbA1c measurements from The Ohio State University Wexner Medical Center EMR from November 1, 2011 through February 7, 2018. Institutional review board (IRB) approval was obtained from The Ohio State University (2018H0173). Since the study was retrospective the IRB waived the requirement for informed consent which allowed access to data with identifiers by the investigators. The main goal was to model the development of T2D with a multistate transition model, while accounting for the challenge of dependent interval censoring within EMR data. In order to learn the transition intensity between different states, patients with only one record were excluded. The model included covariates for age, race/ethnicity, sex, and neighborhood socioeconomic deprivation (area deprivation index) to test whether they were significantly associated with transition times between T2D states and check-in frequencies for different T2D states. The basic demographic information was provided in Table 2. For example, whether patients living in more socioeconomically deprived neighborhoods had less frequent provider visits compared to individuals in less socioeconomically deprived neighborhoods. Neighborhood socioeconomic disadvantage was extracted from the Neighborhood Atlas website based on the zip code in the EMR dataset [28]. The neighborhood socioeconomic disadvantage was defined using the Area Deprivation Index (ADI). The ADI can rank neighborhoods by socioeconomic disadvantage into 10 deciles from 1 to 10 at the state or national level, where a higher ADI indicates greater disadvantage. The evaluation of ADI includes factors such as income, education, employment, and housing quality [28]. In our study, the ADI was categorized into two groups based on the ADI distribution, a less disadvantaged group with ADI value in the top 50% (ADI \(\le\) 5) and more disadvantaged group with ADI value in the bottom 50% (ADI \(>\) 5) [29]. The American Diabetes Association recommends annual T2D screening in patients 45 years and older [30, 31]. Age was categorized into two groups, Younger (< 45) and Older (≥ 45). Races included three main groups: Asian, Black and White. Patients from other groups such as Native Hawaiian were not included in the study as the sample size was small (< 100).

Table 2 provides basic demographic data on the complete dataset

The sample sizes for each subgroup were provided in Fig. 5.

Fig. 5
figure 5

Sample size for each subgroup in the whole dataset

Based on the development of T2D, a 4-state multistate model was considered defined by HbA1c level. State 1 “Normal” was defined as HbA1 C levels below 5.7%, State 2 “Prediabetes” was defined as 5.7% ≤ HbA1 C \(\le\) 6.4%, State 3 “T2D” as 6.5% ≤ HbA1 C \(\le\) 6.9%, and State 4 “Uncontrolled T2D” as HbA1 C ≥ 7%. The intensity matrix \(Q(t)\) at time \(t\) was defined as.

$$Q\left(t\right)=\left(\begin{array}{cccc}-& {q}_{12}(t)& {q}_{13}(t)& {q}_{14}(t)\\ {q}_{21}(t)& -& {q}_{23}(t)& {q}_{24}(t)\\ {q}_{31}(t)& {q}_{32}(t)& -& {q}_{34}(t)\\ {q}_{41}(t)& {q}_{42}(t)& {q}_{43}(t)& -\end{array}\right)$$

Where \({q}_{rc}\left(t\right)\) is the transition intensity for \(r,c\in \{\text{1,2},\text{3,4}\}\). The check-in frequencies under four different T2D states are denoted by \({\lambda }_{r}(t)\). The transition counts between different T2D states among different covariates are summarized in Table 3.

Table 3 Transition numbers between different T2D states among covariates of interest

Model selection and validation

One goal of the model was to evaluate the association of T2D progression with the following factors: sex (male (reference), female), age (\(\le\) 45 (reference), > 45), race (Black (\({R}_{1}\)), Asian (\({R}_{2}\)), White (reference)) and ADI (less disadvantaged (reference), more disadvantaged). The transition rates and check-in frequencies were modelled as a function of these covariates using three nested models with forms below:

  • Model 1: \(\text{log}\left({q}_{i,rc}\right)={\beta }_{rc.0}+{\beta }_{rc.1}{Sex}_{i}+{\beta }_{rc.2}{R}_{1i}+{\beta }_{rc.3}{R}_{2i}+{\beta }_{rc.4}{Age}_{i}+{\beta }_{rc.5}{ADI}_{i}\) 

  • Model 2: \(\text{log}\left({q}_{i,rc}({t}_{ij})\right)={\beta }_{rc.0}+{\beta }_{rc.1}{Sex}_{i}+{\beta }_{rc.2}{R1}_{i}+{\beta }_{rc.3}{R2}_{i}+{\beta }_{rc.4}{Age}_{i}+{\beta }_{rc.5}{ADI}_{i}+{\beta }_{rc.6}{t}_{ij}\) 

  • Model 3:\(\text{log}\left({q}_{i,rc}({t}_{ij})\right)={\beta }_{rc.0}+{\beta }_{rc.1}{Sex}_{i}+{\beta }_{rc.2}{R1}_{i}+{\beta }_{rc.3}{R2}_{i}+{\beta }_{rc.4}{Age}_{i}+{\beta }_{rc.5}{ADI}_{i}+{\beta }_{rc.6}{t}_{ij}+{\beta }_{rc.7}{I}_{1i}+{\beta }_{rc.8}{I}_{2i}+{\beta }_{rc.9}{I}_{3i}+{\beta }_{rc.10}{I}_{4i}\) 

In Model 1, a homogeneous model with constant intensities conditional on the covariates over time is considered. Model 2 adds time \(t\) into Model 1 so that intensities are allowed to change linearly with time. Model 3 is a piecewise constant intensity model using four annual indicator functions (\({I}_{1},{I}_{2},{I}_{3},{I}_{4}\)) accounting for the five years of the study. The coefficients of the four indicator functions are used to assess the effect of each year on intensities compared to the first year. Here, based on the method provided by Hout [6], the hazard at the starting point of the time interval was used to represent the hazard for the entire interval [32].

Model selection from the three proposed models was based on the Akaike information criterion (AIC) value. Table 4 gives value of AIC for each model. Model 2 came with the smallest AIC value.

Table 4 Model selection based on AIC

Our model goodness of fit was evaluated by comparing the predicted one-year prevalence to the observed prevalence [8] based on the overall effect which includes only time in the model. The predicted one-year prevalence can be calculated by.

$${\widehat{p}}_{(t+365)}={p}_{t}P(t, t+365)$$

Where \(P(t, t+365)\) is the transition probability matrix from time \(t\) to time \((t+365)\). \({p}_{t}\) is the observed prevalence at the beginning of the year and \({\widehat{p}}_{(t+365)}\) is the predicted prevalence at the end of the year [33]. Goodness of fit was assessed by comparing \({\widehat{p}}_{(t+365)}\) and \({p}_{(t+365)}\) at the times \(t\in \{0, 365, 730, 1095, 1460\}\). Figure 6 shows that our model fits the data well.

Fig. 6
figure 6

The comparison between predicted prevalence and observed prevalence during the 5 years

Results

The regression coefficient estimates of Model 2 were provided in Tables 5 and 6. Table 5 gives the regression parameter estimates for the disease transition process, while Table 6 gives the regression parameter estimates for the check-in process. For better illustration, hazard ratios of transition intensities and check-in frequencies for sex and ADI are visualized in Figs. 7 and 8, respectively. Related figures for other variables are provided in the appendix C.

Table 5 Estimated transition parameters between T2D states for Model 2
Table 6 Estimated parameters of the check-in process for Model 2
Fig. 7
figure 7

Plot of hazard ratios of transition intensities and check-in frequencies for sex. Male is the reference group

Fig. 8
figure 8

Plot of hazard ratios of transition intensities and check-in frequencies for ADI. More disadvantage area is the reference group

In Fig. 7, the check-in frequencies under four states were denoted by Normal to Normal (check-in frequency), Prediabetes to Prediabetes, T2D to T2D and Uncontrolled T2D to Uncontrolled T2D. Males are the reference group. Women had significantly higher check-in frequencies under Prediabetes (HR:1.06, 95% CI (1.05, 1.08)), T2D (HR:1.04, 95% CI (1.02, 1.08)) and uncontrolled T2D (HR:1.05, 95% CI (1.03, 1.06)). Meanwhile, based on Fig. 7, women had faster recovery rates from severe diabetes states and slower transition rates into severe diabetes. For example, under Prediabetes, the transition rate from Prediabetes to Normal was faster for women than men (HR:1.17 [95% CI, 1.12 − 1.23]). Conversely, the transition rates from Prediabetes to T2D (HR:0.98 [95% CI, 0.93 − 1.04]) and uncontrolled T2D (HR:0.78 [95% CI, 0.73 − 0.85]) were slower in females compared to males.

In Fig. 8 for ADI, patients living in more disadvantage areas had lower check-in frequencies under all four states, Normal (HR:0.90 (95% CI, 0.89 − 0.92)), Prediabetes (HR:0.77 (95% CI, 0.75 − 0.78)), T2D (HR:0.68 (95% CI, 0.66 − 0.70)) and Uncontrolled T2D (HR:0.71 (95% CI, 0.70 − 0.72)). Meanwhile, Patients lived in more disadvantage areas had slower rates of returning to Normal states, e.g. Prediabetes to Normal (HR:0.65 (95% CI, 0.62 − 0.68)), T2D to Normal (HR:0.76 (95% CI, 0.64 − 0.91)) and Uncontrolled T2D to Normal (HR:0.77 (95% CI, 0.66 − 0.90)).

Discussion

For EMR data, the assumption that the examination times are independent of disease progression usually does not hold. The examination times or pattern of recording are highly related to a patient’s disease status. When patients are healthier, the frequency of records is lower compared to patients with health complications. In our diabetes study, this was evident by the increasing trend in the frequency of visits per month as patients transition through progressively worse states related to diabetes. Also, the actual time for changing disease states is censored and usually occurs between two consecutive examination times. The censored actual transition time is dependent on the disease progression rate and the rate for seeing a health provider.

In this paper, motivated by these two features of EMR data, a joint multistate transition model was built that incorporated the informative examination scheme. A novel solution was provided to the interval censoring problem under the joint modeling framework. Based on simulations under three different check-in frequencies, our joint model provided good performance and unbiased estimates compared to a panel model that ignored the informative examination scheme and an exact model that assumed transition times are observed. Applying the panel data formulation in multistate modeling without considering whether time points are informative or noninformative will result in large biased estimates, as will the assumption that exact transition times are observed when they are not [16]. Understanding the data generating mechanism is imperative for developing an appropriate multistate transition model of a given system.

Our model incorporated individual level covariates for both the multistate transition parameters and check-in frequencies. Application to EMR data of HbA1 C measurements to model transitions between T2DM states including normal, prediabetes, T2D, and uncontrolled T2D indicated differences in both processes according to age, gender, race, and Area Deprivation Index. Our model flexibly incorporated transitions between all diabetes states, to capture transitions through progressively worse states in addition to improvements in health status. Based on these data, our model provided information concerning which individuals were at greater risk of progression, which had a higher chance of improvement, and which individuals had lower visit frequencies or record entries from their providers. Our model showed that the five-year probability for a patient in a normal state to T2D state was around 1% and the five-year probability for a person in a normal state staying in a normal state was around 60%. A similar multistate transition model for progression of T2D that defines diabetes states based on fasting glucose showed similar results that a person in a normal state would have around 70% chance to stay in normal state and 1% chance to T2D [34]. The multistate transition model based on fasting glucose only allowed transitions between adjacent states and no information about check-in frequencies among different groups. Another study using multinomial logistic regression also presented the transition probability from patients staying in the normal state is around 60% and the transition probability to T2D is 1.8% [35]. Examining race, we found Black and Asian groups had significantly higher risk of progression to the next diabetes state comparing to the White group. The progression pattern was consistent with the model defined by fasting glucose. Based on the socioeconomic status, people living in more disadvantaged areas had higher risk of progressing to worse diabetes states comparing to those in less disadvantaged areas. The conclusion of association between socioeconomic status and diabetes has been observed previously [36, 37]. Besides the risk factor information, our model also showed inequities by socioeconomic status, as the model found people living in more disadvantaged areas have less frequent visits to physician or hospitals.

A challenge in the development of multistate transition models is the computational burden for estimating the transition parameters. Our solution was to assume only a single transition occurs between two records, and we flexibly allowed for transitions to occur between any two states in our model. This assumption reduced the computational complexity and also provided an intuitive solution to the dependent interval censoring problem. Finally, an iterative optimization solution is derived based on the EM algorithm.

There are several important areas to extend the current findings with further research. For example, the sojourn time under different states can be estimated. We currently use a log linear model to fit the transition intensities, which means the baseline hazard is a constant. Other parametric baseline intensities can be selected such as the Weibull and log-logistic model. Frailty models can be incorporated when transitions have a random cluster effect [6]. Joint modeling with other longitudinal observations can provide further information for multistate transition processes. For example, the repeated BMI measurements can be included in our current model to study BMI trends under different diabetes states. The first limitation of our study is we based these states solely on HbA1c values, without considering diagnoses, provider notes, and medications, or other measurements such as fasting glucose [38]. In the future, all these measures could be used to inform the diabetes status. Meanwhile, our modeling approach is based on the piecewise constant intensity assumption. For models incorporating time-varying covariates, intensities within each interval are computed using the covariate values at the beginning of the interval. However, as the length of the time interval increases, the potential for estimation bias also grows, as covariate values may change within the interval. Introducing additional knots to refine the approximation of intensity could help mitigate this bias, making it a promising direction for future research.

Conclusions

Overall, our joint model makes full use of the information from electronic health records by modeling both the record frequency and transition process. In addition to providing unbiased estimates of the transition model, our model provides information of visiting frequencies under four different diabetes states that identified individuals who are less likely to visit their provider. We developed a novel approach to handle dependent censoring under the joint model, to account for the both the informative nature and interval censoring aspect of electronic health record data.

Data availability

No datasets were generated or analysed during the current study.

References

  1. James SL, Abate D, Abate KH, Abay SM, Abbafati C, Abbasi N, Abbastabar H, Abd-Allah F, Abdela J, Abdelalim A. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. The Lancet. 2018;392(10159):1789–858.

    Article  Google Scholar 

  2. Kyu HH, Abate D, Abate KH, Abay SM, Abbafati C, Abbasi N, Abbastabar H, Abd-Allah F, Abdela J, Abdelalim A. Global, regional, and national disability-adjusted life-years (DALYs) for 359 diseases and injuries and healthy life expectancy (HALE) for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. The Lancet. 2018;392(10159):1859–922.

    Article  Google Scholar 

  3. ElSayed NA, Aleppo G, Aroda VR, Bannuru RR, Brown FM, Bruemmer D, Collins BS, Hilliard ME, Isaacs D, Johnson EL. 6 Glycemic targets: standards of care in diabetes—2023. Diabetes Care. 2023;46(Supplement_1):S97-S110.

    Article  PubMed  Google Scholar 

  4. Williams JP, Storlie CB, Therneau TM Jr, CRJ, Hannig J,. A Bayesian approach to multistate hidden Markov models: application to dementia progression. J Am Stat Assoc. 2020;115(529):16–31.

    Article  CAS  Google Scholar 

  5. Dent KM, Kenneson A, Palumbos JC, Maxwell S, Eichwald J, White K, Mao R, Bale Jr JF, Carey JC. Methodology of a multistate study of congenital hearing loss: preliminary data from Utah newborn screening. Am J Med Genet Semin Med Genet. Wiley Online Library. 2004;125(1):28–34.

  6. Van Den Hout A. Multi-state survival models for interval-censored data. Boca Raton: CRC/Chapman & Hall; 2016.

  7. Marshall G, Jones RH. Multi-state models and diabetic retinopathy. Stat Med. 1995;14(18):1975–83.

    Article  CAS  PubMed  Google Scholar 

  8. Zhu Y, Chiang CW, Wang L, Brock G, Milks MW, Cao W, Zhang P, Zeng D, Donneyong M, Li L: A multistate transition model for statin‐induced myopathy and statin discontinuation. CPT: Pharmacometr Syst Pharmacol 2021, 10(10):1236–1244.

  9. Samuelsen SO, Kongerud J. Interval censoring in longitudinal data of respiratory symptoms in aluminium potroom workers: a comparison of methods. Stat Med. 1994;13(17):1771–80.

    Article  CAS  PubMed  Google Scholar 

  10. Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data: 2nd Edition. New York: Wiley; 2011.

  11. Sun J. The statistical analysis of interval-censored failure time data: Heidelberg: Springer; 2006.

  12. Kalbfleisch J, Lawless JF. The analysis of panel data under a Markov assumption. J Am Stat Assoc. 1985;80(392):863–71.

    Article  Google Scholar 

  13. Gruger J, Kay R, Schumacher M. The validity of inferences based on incomplete observations in disease state models. Biometrics 1991;47:595–605.

  14. Staplin ND. Informative censoring in transplantation statistics. Doctoral Thesis, University of Southampton, School of Mathematics. 2012.

  15. Chen B, Yi GY, Cook RJ. Analysis of interval-censored disease progression data via multi-state models under a nonignorable inspection process. Stats Med. 2010;29(11):1175–89.

    Article  Google Scholar 

  16. Lange JM, Hubbard RA, Inoue LY, Minin VN. A joint model for multistate disease processes and random informative observation times, with applications to electronic medical records data. Biometrics. 2015;71(1):90–101.

    Article  PubMed  Google Scholar 

  17. Huang X, Wolfe RA. A frailty model for informative censoring. Biometrics. 2002;58(3):510–20.

    Article  PubMed  Google Scholar 

  18. Schneider S, Demarqui FN, Colosimo EA, Mayrink VD. An approach to model clustered survival data with dependent censoring. Biometr J. 2020;62(1):157–74.

    Article  Google Scholar 

  19. Finkelstein DM, Goggins WB, Schoenfeld DA. Analysis of failure time data with dependent interval censoring. Biometrics. 2002;58(2):298–304.

    Article  PubMed  Google Scholar 

  20. Zhang Z, Sun L, Sun J, Finkelstein DM. Regression analysis of failure time data with informative interval censoring. Stat Med. 2007;26(12):2533–46.

    Article  PubMed  Google Scholar 

  21. Hatami F, Ocampo A, Graham G, Nichols TE, Ganjgahi H. A scalable approach for continuous time Markov models with covariates. Biostatistics. 2024;25(3):681–701.

    Article  PubMed  Google Scholar 

  22. van den Hout A, Matthews FE. Multi-state analysis of cognitive ability data: a piecewise-constant model and a Weibull model. Stat Med. 2008;27(26):5440–55.

    Article  PubMed  Google Scholar 

  23. Andersen PK, Pohar Perme M. Inference for outcome probabilities in multi-state models. Lifetime Data Anal. 2008;14(4):405–31.

    Article  PubMed  PubMed Central  Google Scholar 

  24. Pishro-Nik H: Introduction to probability, statistics, and random processes: Kappa Research, LLC Blue Bell, PA, USA; 2014.

  25. Gillespie DT. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys. 1976;22(4):403–34.

    Article  CAS  Google Scholar 

  26. Zhu Y, Brock G, Li L. Uniformization and bounded Taylor series in Newton-Raphson method improves computational performance for a multistate transition model estimation and inference. Stat Methods Med Res. 2024;33(11–12):1901–19.

    Article  PubMed  Google Scholar 

  27. Gu Y, Zeng D, Heiss G, Lin D. Maximum likelihood estimation for semiparametric regression models with interval-censored multistate data. Biometrika. 2024;111(3):971–88.

    Article  PubMed  Google Scholar 

  28. Kind AJ, Buckingham WR. Making neighborhood-disadvantage metrics accessible—the neighborhood atlas. New Engl J Med. 2018;378(26):2456.

    Article  PubMed  Google Scholar 

  29. Hu J, Kind AJ, Nerenz D. Area deprivation index predicts readmission risk at an urban teaching hospital. Am J Med Qual. 2018;33(5):493–501.

    Article  PubMed  PubMed Central  Google Scholar 

  30. Mayfield JA. Diagnosis and classification of diabetes mellitus: new criteria. Am Fam Phys. 1998;58(6):1355.

    CAS  Google Scholar 

  31. Pippitt K, Li M, Gurgle HE. Diabetes mellitus: screening and diagnosis. Am Fam Phys. 2016;93(2):103–9.

    Google Scholar 

  32. Bhatt R, van den Hout A, Pashayan N. A multistate survival model of the natural history of cancer using data from screened and unscreened population. Stat Med. 2021;40(16):3791–807.

    Article  PubMed  Google Scholar 

  33. Pinsky M, Karlin S. An introduction to stochastic modeling. Massachusetts: Academic press; 2010.

  34. Nazari M, Nazari SH, Zayeri F, Dehaki MG, Baghban AA. Estimating transition probability of different states of type 2 diabetes and its associated factors using Markov model. Prim Care Diabetes. 2018;12(3):245–53.

    Article  PubMed  Google Scholar 

  35. Gaillard T, Chen H, Effoe VS, Correa A, Carnethon M, Kalyani RR, Echouffo-Tcheugui JB, Joseph JJ, Bertoni AG. Glucometabolic State Transitions: The Jackson Heart Study. Ethnic Dis. 2022;32(3):203–12.

    Article  Google Scholar 

  36. Hwang J, Shon C. Relationship between socioeconomic status and type 2 diabetes: results from Korea National Health and Nutrition Examination Survey (KNHANES) 2010–2012. BMJ Open. 2014;4(8): e005710.

    Article  PubMed  PubMed Central  Google Scholar 

  37. Suwannaphant K, Laohasiriwong W, Puttanapong N, Saengsuwan J, Phajan T: Association between socioeconomic status and diabetes mellitus: the National Socioeconomics Survey, 2010 and 2012. J Clin Diagn Res: JCDR 2017, 11(7):LC18.

  38. Rooney MR, Rawlings AM, Pankow JS, Tcheugui JBE, Coresh J, Sharrett AR, Selvin E. Risk of progression to diabetes among older adults with prediabetes. JAMA Intern Med. 2021;181(4):511–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

Not applicable.

Funding

No funding.

Author information

Authors and Affiliations

Authors

Contributions

Y.Z. wrote the main manuscript, designed the study and performed the data analysis. G.B. wrote the main manuscript and designed the study. L.L. designed the study and reviewed the manuscript. J.J. reviewed the manuscript and revised introduction part. N.T. reviewed the manuscript and provided suggestions.

Corresponding authors

Correspondence to Yuxi Zhu or Guy Brock.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhu, Y., Joseph, J.J., Thomas, N. et al. Joint modeling of multistate survival processes with informative examination scheme: application to progressions in diabetes. BMC Med Res Methodol 25, 97 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12874-025-02543-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12874-025-02543-z

Keywords