This package contains functions to estimate the regression parameters \(\beta= (\beta_1,\beta_2^T)^T\) for the semiparametric transformation model using imputation method, naive method and regression calibration methods. Standard errors for these parameter estimates are also provided. The linear transformation model is given by
\(H(T)= - X\beta_1 - Z^T\beta_2 + e\),
where \(T\) is the interval-censored time-to-event, \(H\) is an unknown monotone transformation function on \((0,\infty)\) with \(H(T)\to -\infty\) as \(T\to 0\), \(X\) is the covariate measured with error and \(Z\) is the error-free covariate, \(e\) is the error with a completely known distribution function and is independent of both \(X\) and \(Z\).
The observed data are collected from \(n\) independent subjects randomly drawn from an underlying population. The data from the \(i\)th subject is of the form \((L_i, R_i, \Delta_i, W_{i1}, \dots, W_{im}, Z_i)\), \(i=1, \dots, n\). For the \(i\)th subject, if the right censoring indicator \(\Delta_i=1\), then the time-to-event \(T_i\) satisfies \(L_i<T_i\leq R_i\), and when \(\Delta_i=0\), the subject is right censored above \(L_i\), and \(L_i<T_i <\infty\). Here \(Z_i\) is a \(p\times 1\) vector of error free covariates, and \(W_{i1}, \dots, W_{im}\) denote the replicated measurements on a surrogate variable for \(X\). The surrogacy implies that conditional on the true covariates, the surrogate variable is independent of the response \(T\). We assume that the observed surrogate \(W\) is related to the unobserved covariate \(X\) through the additive measurement error model,
\(W_{ij}=X_i+U_{ij}\), \(j=1, \dots, m\),
where measurement error \(U_{ij}\)’s are assumed to be independently and identically distributed (iid) \({\rm Normal}(0, \sigma^2_u)\) variables. Furthermore, we assume that \(U\) is completely independent of other observed variables.
In the following example, we will simulate a dataset using the true parameter values \(\beta_1= -1\) and \(\beta_2= 1\) and then use the package to get the estimates of the parameters and their standard errors using the naive (NV), regression calibration (RC) and imputation (IM) methods.
The following function is used in generating epsilon from its CDF. Here, a choice of \(r=0\) generates Cox PH model and \(r=1\) generates Proportional Odds model.
rsep= function(u,r)
{
if(r==0)
return( log(-log(1-u)) )
else
return( log((exp(-r*log(1-u))-1)/r) )
}
n= 200 # sample size
m= 10 # number of failure-time imputations
nrep= 2 # number of repeated measurements for W
gridlen= 0.1 # grid length to divide time intervals
result= NULL
set.seed(15)
# x= rnorm(n, mean= 0, sd=1) # symmetric true X
x= (rgamma(n,shape=2,scale=2)-4)/sqrt(8) # skewed true X
z2= rbinom(n,1,0.5) # error-free covariate Z
Generating the random error terms:
ugen= runif(n)
rfix= 0.0 #characterizes the error-distribution
ep= rsep(ugen,rfix)
truebeta= c(-1,1)
logt= -truebeta[1]*x -truebeta[2]*z2 + ep + 3
ttime= exp(logt) #true failure time
cen= runif(n,0,0.0001)
Creating the intervals \((L_i,R_i],\; i=1, \dots, n\)
len= 0.12
taumat= matrix(0,n,10)
taumat[,10]= 9000000000 # a large value used to denote infinity for right-censored observations;
# if you're changing this, change rcpos below
taumat[,2]= cen
for(i1 in 3:9)
taumat[,i1]= taumat[,2]+(i1-1)*len
right1= rep(0,n)
left= rep(0,n)
for(i2 in 1:n)
{
lenleft= length(which(taumat[i2,2:9]<ttime[i2]))
leftpot= rep(0,8)
leftpot[1:lenleft]= 1
missvec1= c(rbinom(4,1,0.7),rbinom(4,1,0.5))
left[i2]= max(leftpot*missvec1*taumat[i2,2:9])
if(left[i2]==0)
left[i2]= taumat[i2,2]
lenright= length(which(taumat[i2,2:10]>ttime[i2]))
rightpot= rep(0,9)
rightpot[(9-lenright+1):9]= 1
missvec2= c(rbinom(4,1,0.7),rbinom(4,1,0.5),1)
right1temp= rightpot*missvec2*taumat[i2,2:10]
right1[i2]= min(right1temp[right1temp!=0])
}
Identifying the positions of the right-censored observations:
rcpos= which(right1==9000000000)
lrcpos= length(rcpos)
notrcpos= (1:n)[-rcpos]
deltatemp= rep(1,n) # del=1 are uncensored observations
deltatemp[rcpos]= 0
k= sum(deltatemp) # number of data points that are not right censored
Generating measurement error from a skewed distribution:
umat= matrix((rgamma(n*nrep,shape=2,scale=2)-4)*0.70711/sqrt(8),n,nrep)
wmat= x+umat
Creating the data matrix and specifying the number of \(T\) and \(X\) imputations:
datamat= cbind(left,right1,deltatemp,z2)
ntimp= m
nximp= 20
Loading the package in the current workspace
library(icemelt)
The final step is feeding the arguments into the function:
datamat: the datamatrix with columns ‘left time’, ‘right time’, ‘right censoring indicator’ and ‘error-free covariate’ strictly in that order
wmat: the matrix of replicated surrogate measurements of the error-prone covariate
rfix: a real number to specify a particular model from the general class of models included within the linear transformation models (for example: rfix=0 denotes Cox PH model and rfix=1 denotes Proportional Odds model)
gridlen: distance between gridpoints in \(T\) imputation (smaller values indicate more grid points between the left and right ends of the interval)
ntimp: number of \(T\) imputations
nximp: number of \(X\) imputations
head(datamat) # Just seeing the input objects
## left right1 deltatemp z2
## [1,] 0.9600175 9e+09 0 0
## [2,] 0.9600031 9e+09 0 0
## [3,] 0.9600308 9e+09 0 0
## [4,] 0.6000660 9e+09 0 1
## [5,] 0.8400124 9e+09 0 1
## [6,] 0.9600351 9e+09 0 0
head(wmat)
## [,1] [,2]
## [1,] -0.6308733 -0.4966032
## [2,] 1.5712982 1.1165383
## [3,] 0.1825023 -0.9672121
## [4,] -0.5859009 -1.0583013
## [5,] -1.6817932 -1.6694583
## [6,] -0.2043142 -0.5999319
rfix
## [1] 0
gridlen
## [1] 0.1
ntimp
## [1] 10
nximp
## [1] 20
out.im.ph=im(datamat, wmat, rfix, gridlen, ntimp, nximp) # Since rfix=0, we fit a PH models
out.im.ph
## $beta1.est
## [1] -0.835492
##
## $beta2.est
## [1] 0.3013384
##
## $beta1.sd
## [1] 0.4055803
##
## $beta2.sd
## [1] 0.557459
The im() function returns the following estimates and standard errors when a proportional hazard model is fitted to the dataset:
IM estimate of \(\beta_1\) = -0.8354
IM estimate of \(\beta_2\) = 0.3013
Estimated standard error of \(\beta_1\) = 0.4055
Estimated standard error of \(\beta_2\) = 0.5574
Next we use the rc() and nv() functions to calculate the regression calibration and naive estimates respectively.
out.im.po=im(datamat, wmat, 1, gridlen, ntimp, nximp) # Since rfix=1, we fit a PO models
out.im.po
## $beta1.est
## [1] -0.7732787
##
## $beta2.est
## [1] 0.3492097
##
## $beta1.sd
## [1] 0.384812
##
## $beta2.sd
## [1] 0.5824028
out.im.general=im(datamat, wmat, 2, gridlen, ntimp, nximp) # Note that rfix=2 yields a general model
out.im.general
## $beta1.est
## [1] -0.8459139
##
## $beta2.est
## [1] 0.3877627
##
## $beta1.sd
## [1] 0.4387852
##
## $beta2.sd
## [1] 0.6162083
out.rc.ph=rc(datamat, wmat, rfix, gridlen, ntimp)
out.rc.ph
## $beta1.est
## [1] -0.7944614
##
## $beta2.est
## [1] 0.3290693
##
## $beta1.sd
## [1] 0.3690732
##
## $beta2.sd
## [1] 0.5511381
The rc() function returns the following estimates and standard errors:
RC estimate of \(\beta_1\) = -0.7944
RC estimate of \(\beta_2\) = 0.3298
Estimated standard error of \(\beta_1\) = 0.3687
Estimated standard error of \(\beta_2\) = 0.5510
out.nv.ph=nv(datamat, wmat, rfix, gridlen, ntimp)
out.nv.ph
## $beta1.est
## [1] -0.6806057
##
## $beta2.est
## [1] 0.320266
##
## $beta1.sd
## [1] 0.3159862
##
## $beta2.sd
## [1] 0.5510121
The nv() function returns the following estimates and standard errors:
NV estimate of \(\beta_1\) = -0.6803
NV estimate of \(\beta_2\) = 0.3194
Estimated standard error of \(\beta_1\) = 0.3161
Estimated standard error of \(\beta_2\) = 0.5508
The functions imw1() and rcw1() are used when only one replication of the surrogate variable \(W\) is available. While everything else remain same as the example provided above, these functions require a known value for the measurement error variance because it is no longer possible to estimate it from only one replication. Note that the nv() function is capable of handling the single replication scenario. We use the same example as above after making these few modifications. The changed lines are highlighted.
The following function is used in generating epsilon from its CDF. Here, a choice of \(r=0\) generates Cox PH model and \(r=1\) generates Proportional Odds model.
n= 200 # sample size
m= 10 # number of failure-time imputations
nrep= 1 # number of repeated measurements for $W$
sigma2u= 0.5 # measurement error variance
gridlen= 0.1 # grid length to divide time intervals
result= NULL
set.seed(15)
# x= rnorm(n, mean= 0, sd=1) # symmetric true X
x= (rgamma(n,shape=2,scale=2)-4)/sqrt(8) # skewed true X
z2= rbinom(n,1,0.5) # error-free covariate Z
Generating the random error terms:
ugen= runif(n)
rfix= 0.0 # characterizes the error-distribution
ep= rsep(ugen,rfix)
truebeta= c(-1,1)
logt= -truebeta[1]*x -truebeta[2]*z2 + ep + 3
ttime= exp(logt) # true failure time
cen= runif(n,0,0.0001)
Creating the intervals \((L_i,R_i],\; i=1, \dots, n\)
len= 0.12
taumat= matrix(0,n,10)
taumat[,10]= 9000000000 # a large value used to denote infinity for right-censored observations;
# if you're changing this, change rcpos below\\
taumat[,2]= cen
for(i1 in 3:9)
taumat[,i1]= taumat[,2]+(i1-1)*len
right1= rep(0,n)
left= rep(0,n)
for(i2 in 1:n)
{
lenleft= length(which(taumat[i2,2:9]<ttime[i2]))
leftpot= rep(0,8)
leftpot[1:lenleft]= 1
missvec1= c(rbinom(4,1,0.7),rbinom(4,1,0.5))
left[i2]= max(leftpot*missvec1*taumat[i2,2:9])
if(left[i2]==0)
left[i2]= taumat[i2,2]
lenright= length(which(taumat[i2,2:10]>ttime[i2]))
rightpot= rep(0,9)
rightpot[(9-lenright+1):9]= 1
missvec2= c(rbinom(4,1,0.7),rbinom(4,1,0.5),1)
right1temp= rightpot*missvec2*taumat[i2,2:10]
right1[i2]= min(right1temp[right1temp!=0])
}
Identifying the positions of the right-censored observations:
rcpos= which(right1==9000000000)
lrcpos= length(rcpos)
notrcpos= (1:n)[-rcpos]
deltatemp= rep(1,n) # del=1 are uncensored observations
deltatemp[rcpos]= 0
k= sum(deltatemp) # number of data points that are not right censored
Generating measurement error from a skewed distribution:
umat= matrix((rgamma(n*nrep,shape=2,scale=2)-4)*0.70711/sqrt(8),n,nrep)
wmat= x+umat
Creating the data matrix and specifying the number of \(T\) and \(X\) imputations
datamat= cbind(left,right1,deltatemp,z2)
ntimp= m
nximp= 20
The final step is feeding the arguments into the function:
datamat: the datamatrix with columns ‘left time’, ‘right time’, ‘right censoring indicator’ and ‘error-free covariate’ strictly in that order
wmat: a matrix with of replicated surrogate measurements of the error-prone covariate
rfix: a real number to specify a particular model from the general class of models included within the linear transformation models (for example: rfix=0 denotes Cox PH model and rfix=1 denotes Proportional Odds model)
gridlen: distance between gridpoints in \(T\) imputation (smaller values indicate more grid points between the left and right ends of the interval)
ntimp: number of \(T\) imputations
nximp: number of \(X\) imputations
sigma2u: a known value for the measurement error variance
head(datamat) # Just checking the input objects
## left right1 deltatemp z2
## [1,] 0.9600175 9e+09 0 0
## [2,] 0.9600031 9e+09 0 0
## [3,] 0.9600308 9e+09 0 0
## [4,] 0.6000660 9e+09 0 1
## [5,] 0.8400124 9e+09 0 1
## [6,] 0.9600351 9e+09 0 0
head(wmat)
## [,1]
## [1,] -0.6308733
## [2,] 1.5712982
## [3,] 0.1825023
## [4,] -0.5859009
## [5,] -1.6817932
## [6,] -0.2043142
rfix
## [1] 0
gridlen
## [1] 0.1
ntimp
## [1] 10
nximp
## [1] 20
sigma2u
## [1] 0.5
out.im1.ph=imw1(datamat, wmat, rfix, gridlen, ntimp, nximp, sigma2u)
out.im1.ph
## $beta1.est
## [1] -1.004976
##
## $beta2.est
## [1] 0.3118699
##
## $beta1.sd
## [1] 0.7496049
##
## $beta2.sd
## [1] 0.5665249
The imw1() function returns the following estimates and standard errors:
IM estimate of \(\beta_1\) = -1.004
IM estimate of \(\beta_2\) = 0.3118
Estimated standard error of \(\beta_1\) = 0.7496
Estimated standard error of \(\beta_2\) = 0.5665
Next we use the rcw1() and nv() functions to calculate the regression calibration and naive estimates respectively.
out.rc1.ph=rcw1(datamat, wmat, rfix, gridlen, ntimp, sigma2u)
out.rc1.ph
## $beta1.est
## [1] -0.7239568
##
## $beta2.est
## [1] 0.3386628
##
## $beta1.sd
## [1] 0.3906076
##
## $beta2.sd
## [1] 0.5500546
The rcw1() function returns the following estimates and standard errors:
RC estimate of \(\beta_1\) = -0.7239
RC estimate of \(\beta_2\) = 0.3386
Estimated standard error of \(\beta_1\) = 0.3906
Estimated standard error of \(\beta_2\) = 0.5500
out.nv.ph=nv(datamat, wmat, rfix, gridlen, ntimp)
out.nv.ph
## $beta1.est
## [1] -0.5086121
##
## $beta2.est
## [1] 0.3419563
##
## $beta1.sd
## [1] 0.2734373
##
## $beta2.sd
## [1] 0.5485249
The nv() function returns the following estimates and standard errors:
NV estimate of \(\beta_1\) = -0.5086
NV estimate of \(\beta_2\) = 0.3419
Estimated standard error of \(\beta_1\) = 0.2734
Estimated standard error of \(\beta_2\) = 0.5485