The function me.test() provides a nonparametric test of equality of two distributions when the observed, contaminated data follow the classical additive measurement error model. We want to test the hypothesis \(H_0: F_x=F_y\) when the observed data are \(D_w=\{\mathbf{W}_1, \dots, \mathbf{W}_{n_x}\}\) and \(D_v=\{\mathbf{V}_1, \dots, \mathbf{V}_{n_y}\}\), where \(\mathbf{W}^T_j=(W_{j1}, \dots, W_{jm_x})\) and \(\mathbf{V}^T_k=(V_{k1}, \dots, V_{km_y})\) for \(j=1, \dots, n_x\) and \(k=1, \dots, n_y\), and \(F_x\) and \(F_y\) are the cumulative distribution functions (CDFs) of \(X\) and \(Y\). The observed \(W\)'s and \(V\)'s are surrogates of the unobserved \(X\)'s and \(Y\)'s, and we assume that \(m_x\geq 2\) and \(m_y\geq 2\), that is, we have multiple replicates for each unobservable true value. By the classical additive measurement error model we mean \[ W_{jl} = X_j + U_{x, jl}, \qquad V_{kl'} = Y_k + U_{y, kl'}, \] for \(j=1, \dots, n_x\), \(l=1, \dots, m_x\), \(k=1, \dots, n_y\), and \(l'=1, \dots, m_y\). The measurement errors \(U_{x, jl}\) are assumed to be iid, independent of \(X_j\), and to follow a distribution \(F_{u_x}\) that is symmetric around \(0\). Similarly, the measurement errors \(U_{y, kl'}\) are assumed to be iid, independent of \(Y_k\), and to follow a distribution \(F_{u_y}\) that is symmetric around \(0\). Further, \(X\), \(Y\), \(U_x\) and \(U_y\) are assumed to be mutually independent. It is important to note that \(\{X_1, \dots, X_{n_x}\}\) and \(\{Y_1, \dots, Y_{n_y}\}\) are never observed.
We will simulate true values and measurement errors from the following design: \[ X \sim {\rm Normal}(0, 1), Y \sim {\rm Normal}(0.2, 1) \mbox{ and } U_x, U_y \sim DE(0, 0.35), \] where \(DE(a, b)\) stands for the double exponential (Laplace) distribution with mean \(a\) and variance \(2b^2\). Note that the standard deviation of the measurement errors \(U_x\) and \(U_y\) is \(\sqrt{2}\times 0.35 \approx 0.495\), roughly half the standard deviation of the true signals.
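To see the claimed noise level directly (a quick check in base R, no extra packages): the variance of \(DE(0, b)\) is \(2b^2\), so its standard deviation is \(\sqrt{2}\,b\).

```r
## Standard deviation of DE(0, b) is sqrt(2) * b,
## since the variance of the double exponential is 2 * b^2.
b <- 0.35
sd_error  <- sqrt(2) * b   # ~0.495
sd_signal <- 1             # sd of the Normal(0, 1) true signals
sd_error / sd_signal       # noise-to-signal ratio, roughly 0.5
```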
library(smoothmest) ## to generate double exponential random variables
## Loading required package: MASS
set.seed(1234)
n <- 200
mx <- my <- 2
X <- rnorm(n, mean = 0, sd = 1)
Y <- rnorm(n, mean = 0.2, sd = 1)
Ux <- matrix(rdoublex(n*mx, mu = 0, lambda = 0.35), ncol = mx)
Uy <- matrix(rdoublex(n*my, mu = 0, lambda = 0.35), ncol = my)
W <- X + Ux
V <- Y + Uy
In this example, \(n_x = n_y = 200\) and \(m_x = m_y = 2\).
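The replicates are what make the error distribution identifiable: since \(W_{j1}-W_{j2} = U_{x,j1}-U_{x,j2}\), half the variance of the within-subject differences estimates \({\rm var}(U_x)\). A minimal sketch in base R (the sign-flipped exponential below is just one way to draw double exponential variates; variable names are chosen not to clash with the simulation above):

```r
set.seed(1)
nn <- 1e5; b <- 0.35
## Draw DE(0, b) variates as a random sign times an Exponential with mean b.
rlaplace <- function(n, b) sample(c(-1, 1), n, replace = TRUE) * rexp(n, rate = 1/b)
x.true <- rnorm(nn)                                   # true signals (never observed)
w.rep  <- cbind(x.true + rlaplace(nn, b),             # two replicates per subject
                x.true + rlaplace(nn, b))
## var(W1 - W2) = 2 * var(U), so halve it to recover var(U) = 2 * b^2 = 0.245
var(w.rep[, 1] - w.rep[, 2]) / 2
```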
par(mfrow = c(1,2))
boxplot(W, main = "W")
boxplot(V, main = "V")
Now we load the required packages into the current workspace:
library(statmod)
library(MEtest) # This package conducts the hypothesis test.
The package statmod is necessary for the numerical integration. The function me.test() in MEtest has the following arguments:
W: an \(n_x\) by \(m_x\) matrix of observations, with \(m_x \geq 2\) replicates per row (one row per subject).
V: an \(n_y\) by \(m_y\) matrix of observations, with \(m_y \geq 2\) replicates per row.
B: the number of bootstrap samples. Default is 1000.
wt: type of the weight function; must be one of "Uniform" or "Normal".
wt.bd: lower and upper bounds of the weight function. If wt.bd is not specified, the bounds are computed from the deconvoluted distribution function.
wt.prob: the probability used to compute the lower and upper bounds. Default is 0.99; ignored if wt.bd is provided.
nGL: the number of nodes for Gaussian quadrature. Default is 32.
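If you prefer to fix the integration region yourself rather than rely on the deconvolution-based default, wt.bd takes a length-two numeric vector. One hypothetical choice (an illustration, not the package's default rule) is a pair of extreme pooled-sample quantiles; stand-in data are used here so the vignette's W and V are untouched:

```r
## Illustration only: explicit weight-function bounds from the
## 0.5% and 99.5% quantiles of the pooled observed data.
set.seed(99)
W0 <- matrix(rnorm(400), ncol = 2)              # stand-in observed data
V0 <- matrix(rnorm(600, mean = 0.2), ncol = 3)
bd <- as.numeric(quantile(c(W0, V0), probs = c(0.005, 0.995)))
bd                                              # length-two vector for wt.bd
## then: me.test(W0, V0, wt = "Uniform", wt.bd = bd)
```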
Once we have W and V, it is straightforward to compute the \(p\)-value for \(H_0\).
# Let's view the observed data
head(W)
## [,1] [,2]
## [1,] -0.3544622 -1.7772733
## [2,] 0.5700091 -0.2169628
## [3,] 0.7204120 1.6900299
## [4,] -2.2969413 -2.3486976
## [5,] -0.2219204 0.5253668
## [6,] 0.8506322 0.2758073
head(V)
## [,1] [,2]
## [1,] 0.49508086 -0.08912287
## [2,] 0.77135540 0.92884779
## [3,] -0.05501657 0.46879317
## [4,] 1.75113131 1.57563234
## [5,] 0.67080298 0.10767074
## [6,] 0.98045858 0.67460554
me.test(W, V, wt = "Uniform")
##
## Homogeneity test under Measurement Error with Uniform weight
##
## data: W and V
## T (Uniform) = 56.606, p-value = 0.006
## alternative hypothesis: True signals come from different distributions
me.test(W, V, wt = "Normal")
##
## Homogeneity test under Measurement Error with Normal weight
##
## data: W and V
## T (Normal) = 18.232, p-value < 2.2e-16
## alternative hypothesis: True signals come from different distributions
me.test(W, V, wt = "Uniform", wt.prob = 0.95)
##
## Homogeneity test under Measurement Error with Uniform weight
##
## data: W and V
## T (Uniform) = 46.34, p-value < 2.2e-16
## alternative hypothesis: True signals come from different distributions
The output of me.test() contains the test statistic, the \(p\)-value, and information about the weight function.
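The printed output follows the layout of base R's htest objects. Assuming me.test() likewise returns such a list (an assumption worth checking with str() on the result), its components can be extracted the same way as for base tests such as t.test():

```r
## Sketch using a base-R htest object; an me.test() result that prints
## in the same format should expose analogous fields.
ht <- t.test(rnorm(30), rnorm(30, mean = 0.2))
names(ht)        # includes "statistic", "p.value", ...
ht$statistic     # test statistic
ht$p.value       # p-value, usable programmatically
```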
In this example, we set \(n_x = 200\), \(n_y = 300\), \(m_x = 2\), and \(m_y = 3\). The true signals now have identical distributions, so \(H_0\) holds: \[ X \sim {\rm Normal}(0, 1), Y \sim {\rm Normal}(0, 1) \mbox{ and } U_x, U_y \sim DE(0, 0.35). \]
set.seed(12345)
nx <- 200; ny <- 300
mx <- 2; my <- 3
X <- rnorm(nx, mean = 0, sd = 1)
Y <- rnorm(ny, mean = 0, sd = 1)
Ux <- matrix(rdoublex(nx*mx, mu = 0, lambda = 0.35), ncol = mx)
Uy <- matrix(rdoublex(ny*my, mu = 0, lambda = 0.35), ncol = my)
W <- X + Ux
V <- Y + Uy
## Observed data are W and V
head(W)
## [,1] [,2]
## [1,] -0.22771326 0.91312650
## [2,] 0.46886646 -0.09680206
## [3,] 1.67410510 0.06787347
## [4,] -0.70181978 0.80573355
## [5,] -0.01687766 -0.41074833
## [6,] -1.96293518 -1.65193027
head(V)
## [,1] [,2] [,3]
## [1,] -0.94299160 -1.1055595 -1.8662883
## [2,] -0.11082930 -0.5856071 -0.7795955
## [3,] -0.09263611 0.8909308 -0.7878444
## [4,] 1.18335963 1.4697445 1.5616781
## [5,] 1.62380283 2.1007738 1.1184517
## [6,] 0.10053012 0.3129272 0.6286603
me.test(W, V, wt = "Uniform")
##
## Homogeneity test under Measurement Error with Uniform weight
##
## data: W and V
## T (Uniform) = 13.165, p-value = 0.199
## alternative hypothesis: True signals come from different distributions
me.test(W, V, wt = "Normal")
##
## Homogeneity test under Measurement Error with Normal weight
##
## data: W and V
## T (Normal) = 3.4635, p-value = 0.209
## alternative hypothesis: True signals come from different distributions
me.test(W, V, wt = "Uniform", wt.prob = 0.95)
##
## Homogeneity test under Measurement Error with Uniform weight
##
## data: W and V
## T (Uniform) = 9.6725, p-value = 0.141
## alternative hypothesis: True signals come from different distributions
In this example, we again set \(n_x = 200\), \(n_y = 300\), \(m_x = 2\), and \(m_y = 3\), but scale both the signals and the errors up by a factor of 10; \(H_0\) still holds: \[ X \sim {\rm Normal}(0, 10^2), Y \sim {\rm Normal}(0, 10^2) \mbox{ and } U_x, U_y \sim DE(0, 3.5). \]
set.seed(12345)
nx <- 200; ny <- 300
mx <- 2; my <- 3
X <- rnorm(nx, mean = 0, sd = 10)
Y <- rnorm(ny, mean = 0, sd = 10)
Ux <- matrix(rdoublex(nx*mx, mu = 0, lambda = 3.5), ncol = mx)
Uy <- matrix(rdoublex(ny*my, mu = 0, lambda = 3.5), ncol = my)
W <- X + Ux
V <- Y + Uy
## Observed data are W and V
head(W)
## [,1] [,2]
## [1,] -2.2771326 9.1312650
## [2,] 4.6886646 -0.9680206
## [3,] 16.7410510 0.6787347
## [4,] -7.0181978 8.0573355
## [5,] -0.1687766 -4.1074833
## [6,] -19.6293518 -16.5193027
head(V)
## [,1] [,2] [,3]
## [1,] -9.4299160 -11.055595 -18.662883
## [2,] -1.1082930 -5.856071 -7.795955
## [3,] -0.9263611 8.909308 -7.878444
## [4,] 11.8335963 14.697445 15.616781
## [5,] 16.2380283 21.007738 11.184517
## [6,] 1.0053012 3.129272 6.286603
me.test(W, V, wt = "Uniform")
##
## Homogeneity test under Measurement Error with Uniform weight
##
## data: W and V
## T (Uniform) = 642770905, p-value = 0.713
## alternative hypothesis: True signals come from different distributions
me.test(W, V, wt = "Normal")
##
## Homogeneity test under Measurement Error with Normal weight
##
## data: W and V
## T (Normal) = 15933, p-value = 0.767
## alternative hypothesis: True signals come from different distributions
me.test(W, V, wt = "Uniform", wt.prob = 0.95)
##
## Homogeneity test under Measurement Error with Uniform weight
##
## data: W and V
## T (Uniform) = 2.27e+09, p-value = 0.517
## alternative hypothesis: True signals come from different distributions
Lee, D., Lahiri, S. and Sinha, S. A test of homogeneity of distributions when observations are subject to measurement errors. Under revision.