Functional data analysis is commonly used to analyze high-dimensional data that appear as functions or images [1] [2] [3]. Functional data analysis can be used to analyzed data collected continuously or frequently over a given time period that appear as complex high dimensional functions of time. When assessing how both functional and scalar-valued covariates influence scalar-valued outcomes, scalar-on-function regression models may be used. Self-reported measures, such as dietary intake, are well known to be prone to measurement error[4], and recent studies have indicated that even the relatively more objective data obtained from wearable devices is prone to measurement error [5] [6]. It has been demonstrated that similar to scalar-valued covariates prone to measurement error, the failure to correct for biases due to measurement error associated with functional data also leads to biased estimates
We develop an R package MECfda for solving scalar-on-function regression models including multi-level generalized scalar-on-function regression models and functional quantile linear regression models measurement error corrections using various bias reduction techniques in these models.
The general form of scalar-on-function linear regression model is $$T(F_{Y_i|X_i,Z_i}) = \sum_{l=1}^{L} \int_{\Omega_l} \beta_l(t) X_{li}(t) dt + (1,Z_i^T)\gamma$$ where Yi represents the scalar-valued response variable, Xli(t) represents the l-th function-valued covariate (l = 1, …, L), Ωl is the domain of the l-th function-valued covariate, βl is a function-valued parameter, βl and Xli (t ∈ Ωl) are in L2(Ωl), Zi = (Z1i, …, Zqi)T represents the scalar valued covariates, γ = (γ0, γ1, …, γq)T represents scalar-valued parameter, FYi|Xi, Zi represents the cumulative distribution function of Yi given Xi and Zi, and T(⋅) is a statistical functional. A statistical functional is defined as a mapping from probability distribution to real numbers. Statistical functionals represent quantifiable properties of probability distributions, such as expectation and variance. The following gives forms of the statistical functionals in the two types of scalar-on-function linear regression models that can be solved using the algorithms included in MECfda package.
In multi-level generalized scalar-on-function generalized linear regression models, T(FYi|Xi, Zi) = g{∫ℝydFYi|Xi, Zi(y)} = g(E[Yi|Xi, Zi]), where g(⋅) is a link function as in generalized linear model. The only restriction of g(⋅) is that it should be strictly monotonic.
In scalar-on-function quantile linear regression models, T(FYi|Xi, Zi) = QYi|Xi, Zi(τ) = FYi|Xi, Zi−1(τ), where FYi|Xi, Zi−1(⋅) is the inverse of FYi|Xi, Zi(⋅), τ ∈ (0, 1).
Function-valued variables (functional variables) are usually recorded as the value of the functions at certain (time) points in its domain. The data of a function-valued variable are often presented in the form of a matrix, (xij)n × m, where xij = fi(tj), represents the value of fi(tj), each row represents an observation (subject), and each column corresponds to a measurement (time) point.
functional_variable
In the MECfda package, we have an s4 class,
functional_variable
, that represents the data of a
function-valued variable in a matrix form. The class has four slots. The
slot X
, represents matrix (xij)n × m.
The slots t_0
and period
represent the left
end and length of the domain of the function-valued variable,
respectively. The slot t_points
represents the (time)
points that the functional variable is measured. A method,
dim
, is provided to return the number of subjects and
measurement (time) points for a functional_variable
object.
{ρk}k = 1∞ is a basis of L2(Ω). For an arbitrary function-valued parameter β(⋅) ∈ L2(Ω), there exists a number sequence {ck}k = 1∞ such that $$\beta(t) = \sum_{k=1}^\infty c_{k}\rho_{k}(t)$$ and for a function-valued variable, Xi(t), $$\int_\Omega\beta(t) X_i(t) dt = \int_\Omega X_i(t) \sum_{k=1}^\infty c_{k}\rho_{k}(t) dt = \sum_{k=1}^\infty c_{k} \int_\Omega \rho_{k}(t) X_i(t) dt $$ We perform a truncation for the infinite basis sequence, then $$\int_\Omega\beta(t) X_i(t) dt \approx \sum_{k=1}^p c_{k} \int_\Omega X_i(t) \rho_{k}(t) dt$$ where p < ∞. For statistical models with parts in the form of ∫Ωβ(t)Xi(t)dt, we use ck∫ΩXi(t)ρk(t)dt to approximate ∫Ωβ(t)Xi(t)dt and treat ∫Ωρk(t)Xi(t)dt as the variable. In practice, we usually choose the number of basis, p based on the sample size, n, and the p is in a form of p(n). The scalar-on-function linear model is converted to an ordinary scalar-on-scalar linear model, and the problem of estimating β(t) is converted to estimating ck, k = 1, …p.
In practice, we may not necessarily use the truncated complete basis of L2 function space. Instead, we can just use a finite sequence of linearly independent functions as the basis function.
Performing basis expansion methods on function-valued variable data in matrix form as we mentioned is to compute (bik)n × p, where bik = ∫Ωf(t)ρk(t)dt.
Usually the domain Ω of a function-valued variable {X(t), t ∈ Ω} is an interval. The commonly used basis sequence types includes the Fourier basis, b-splines basis, and eigen function basis.
The Fourier basis of L2([t0, t0 + T]) consists of $$\frac{1}{2},\ \cos(\frac{2\pi}{T}k[x-t_0]),\ \sin (\frac{2\pi}{T}k[x-t_0])$$ where k = 1, …, ∞.
Fourier_series
In the MECfda package, we have an s4 class,
Fourier_series
that represents the linear combination of
Fourier basis functions $$\frac{a_0}{2} +
\sum_{k=1}^{p_a} a_k \cos{(\frac{2\pi}{T}k(x-t_0))} +
\sum_{k=1}^{p_b} b_k \sin{(\frac{2\pi}{T}k(x-t_0))},\qquad
x\in[t_0,t_0+T].$$ The slot double_constant
is the
value of a0. The
slot cos
contains the coefficient value of cos waves, ak. The slot
sin
contains the coefficient value of sin waves, bk. The slot
k_cos
contains the values of k that correspond to the
coefficients of cos waves. The slot
k_sin
contains the values of k that correspond to the
coefficients of sin waves. The slot
t_0
is the left end of the domain interval t0. The slot
period
is length of the domain interval T.
The method plot
for class Fourier_series
is
provided to generate a visual representation of the summation function.
The method FourierSeries2fun
is provided to compute the
value of the summation function. The argument object
should
be an object of Fourier_series
class. The argument
x
is the value of independent variable x. The method
extractCoef
is provided to extract the Fourier coefficients
of Fourier_series
class object.
fsc = Fourier_series(
double_constant = 3,
cos = c(0,2/3),
sin = c(1,7/5),
k_cos = 1:2,
k_sin = 1:2,
t_0 = 0,
period = 1
)
plot(fsc)
FourierSeries2fun(fsc,seq(0,1,0.05))
#> [1] 2.1666667 3.1712610 3.6252757 3.4344848 2.7346112 1.8333333
#> [7] 1.0888125 0.7715265 0.9623175 1.5254623 2.1666667 2.5532270
#> [13] 2.4497052 1.8164508 0.8324982 -0.1666667 -0.8133005 -0.8465074
#> [19] -0.2132530 0.9074283 2.1666667
extractCoef(fsc)
#> $a_0
#> 0
#> 3
#>
#> $a_k
#> 1 2
#> 0.0000000 0.6666667
#>
#> $b_k
#> 1 2
#> 1.0 1.4
The object fsc
represents the summation $$\frac32 + \frac23 \cos(2\pi\cdot2x) + \sin(2\pi
x) + \frac75\sin(2\pi\cdot2x).$$
A b-spline basis {Bi, p(x)}i = −pk
on the interval [t0, tk + 1]
is defined as $$B_{i,0}(x) = \left\{
\begin{aligned}
&I_{(t_i,t_{i+1}]}(x), & i = 0,1,\dots,k\\
&0, &i<0\ or\ i>k
\end{aligned}
\right.$$ $$B_{i,r}(x) = \frac{x -
t_{i}}{t_{i+r}-t_{i}} B_{i,r-1}(x) + \frac{t_{i+r+1} - x}
{t_{i+r+1} - t_{i+1}}B_{i+1,r-1}(x)$$ For all discontinuity
points of Bi, r
(r > 0) in the interval
(t0, tk),
let the value equals its limit, which means Bi, r(x) = limt → xBi, r(t).
The slot Boundary.knots
is the boundary of the spline
domain (start and end), which is t0 and tk + 1; the
default is [0, 1]. The slot
knots
represents the spline knots, which is (t1, …, tk),
an equally spaced sequence is chosen by the function automatically
($t_j = t_0 +
j\cdot\frac{t_{k+1}-t_0}{k+1}$) when not assigned. The slot
intercept
is used to define whether an intercept is
included in the basis; the default value is TRUE, and must be TRUE. The
slot df
is the degrees of freedom of the basis, which is
the number of the splines, equal to p + k + 1. By default k = 0 and df = p + 1. The slot
degree
is the degree of the splines, which is the degree of
piecewise polynomials, p; the
default value is 3. See degree in bs.
bspline_basis
and
bspline_series
In the MECfda package, we have an s4 class,
bspline_basis
that represents a b-spline basis, {Bi, p(x)}i = −pk,
on the interval [t0, tk + 1].
We also have an s4 class, bspline_series
, that represents
the summation $\sum_{i=0}^{k}b_i
B_{i,p}(x)$. The slot coef
contains the b-spline
coefficients, bi. (i = 0, …, k) The slot
bspline_basis
is a bspline_basis
object, that
represents the b-spline basis used, {Bi, p(x)}i = −pk.
The method plot
for class bspline_basis
is
provided to generate a visual representation of the summation function.
The method bsplineSeries2fun
is provided to compute the
value of the summation function. The argument object
should
be an object of the bspline_series
class. The argument
x
is the value of the independent variable x.
bsb = bspline_basis(
Boundary.knots = c(0,24),
df = 7,
degree = 3
)
bss = bspline_series(
coef = c(2,1,3/4,2/3,7/8,5/2,19/10),
bspline_basis = bsb
)
plot(bss)
bsplineSeries2fun(bss,seq(0,24,0.5))
#> [1] 2.0000000 1.7677509 1.5690908 1.4011502 1.2610597 1.1459499 1.0529514
#> [8] 0.9791948 0.9218107 0.8779297 0.8446824 0.8191993 0.7986111 0.7805266
#> [15] 0.7644676 0.7504340 0.7384259 0.7284433 0.7204861 0.7145544 0.7106481
#> [22] 0.7087674 0.7089120 0.7110822 0.7152778 0.7216857 0.7312404 0.7450629
#> [29] 0.7642747 0.7899969 0.8233507 0.8654574 0.9174383 0.9804145 1.0555073
#> [36] 1.1438380 1.2465278 1.3634786 1.4897151 1.6190430 1.7452675 1.8621942
#> [43] 1.9636285 2.0433759 2.0952418 2.1130317 2.0905511 2.0216053 1.9000000
The object bsb
represents {Bi, 3(x)}i = −30,
and the object bss
represents
$$2B_{i,-3}(x)+B_{i,-2}(x)+\frac34B_{i,-1}(x)+\frac23B_{i,0}(x)
+ \frac78B_{i,1}(x) +
\frac52B_{i,2}(x) +\frac{19}{10}B_{i,3}(x),$$ where x ∈ [t0, t4]
and t0 = 0, tk = tk − 1 + 6
,k = 1, 2, 3, 4.
A generic function, basis2fun
, is provided for the class
Fourier_series
and bspline_series
. When
applied to a Fourier_series
object, it is equivalent to
method FourierSeries2fun
. When applied to a
bspline_series
object, it is equivalent to method
bsplineSeries2fun
.
basis2fun(fsc,seq(0,1,0.05))
#> [1] 2.1666667 3.1712610 3.6252757 3.4344848 2.7346112 1.8333333
#> [7] 1.0888125 0.7715265 0.9623175 1.5254623 2.1666667 2.5532270
#> [13] 2.4497052 1.8164508 0.8324982 -0.1666667 -0.8133005 -0.8465074
#> [19] -0.2132530 0.9074283 2.1666667
basis2fun(bss,seq(0,24,0.5))
#> [1] 2.0000000 1.7677509 1.5690908 1.4011502 1.2610597 1.1459499 1.0529514
#> [8] 0.9791948 0.9218107 0.8779297 0.8446824 0.8191993 0.7986111 0.7805266
#> [15] 0.7644676 0.7504340 0.7384259 0.7284433 0.7204861 0.7145544 0.7106481
#> [22] 0.7087674 0.7089120 0.7110822 0.7152778 0.7216857 0.7312404 0.7450629
#> [29] 0.7642747 0.7899969 0.8233507 0.8654574 0.9174383 0.9804145 1.0555073
#> [36] 1.1438380 1.2465278 1.3634786 1.4897151 1.6190430 1.7452675 1.8621942
#> [43] 1.9636285 2.0433759 2.0952418 2.1130317 2.0905511 2.0216053 1.9000000
Suppose K(s, t) ∈ L2(Ω × Ω), f(t) ∈ L2(Ω). Then K induces an linear operator 𝒦 by (𝒦f)(x) = ∫ΩK(t, x)f(t)dt If ξ(⋅) ∈ L2(Ω) such that 𝒦ξ = λξ where λ ∈ C, we call ξ an eigenfunction/eigenvector of 𝒦, and λ an eigenvalue associated with ξ.
All the eigenfunctions of 𝒦 make a basis of L2(Ω). We call the basis induced by K(s, t) = Cov(X(t), X(s)) a functional principal component (FPC) basis, where {X(t), t ∈ Ω} is a stochastic process.
functional_variable
classThe MECfda pakcage provide the methods
fourier_basis_expansion
and
bspline_basis_expansion
for the class
functional_variable
, which perform basis expansion using
Fourier basis and b-spline basis respectively.
We use $$\frac{1}{|T|}\sum_{t\in T} \rho_{k}(t) X_{i}(t)$$ to compute the integral ∫Ωρk(t)Xi(t)dt where T is the measurement (time) points of Xi(t), and |T| represents the cardinal number of T.
fcRegression(Y, FC, Z, formula.Z, family = gaussian(link = "identity"),
basis.type = c("Fourier", "Bspline"), basis.order = 6L,
bs_degree = 3)
The MECfda package provides a function,
fcRegression
, to fit generalized linear mixed effect
models, including ordinary linear models, and generalized linear models
with fixed and random effects, using basis expansion to discretize the
function-valued covariates. The function fcRegression
can
solve a linear model in the following form: $$g(E(Y_i|X_i,Z_i)) = \sum_{l=1}^{L}
\int_{\Omega_l} \beta_l(t) X_{li}(t) dt + (1,Z_i^T)\gamma.$$ The
function can allow one or multiple function-valued covariates to be used
as fixed effects, and zero, one, or multiple scalar-valued covariates to
be used as fixed or random effects. Response variable, function-valued
covariates, and scalar-valued covariates are input separately as three
different arguments, Y
, FC
, and
Z
, respectively. The format of the input data can be very
flexible. For response variable, the input format can be an atomic
vector, a one-column matrix or data frame. The recommended form is a
one-column data frame or matrix with a column name, because in this
case, the name of response variable is specified. For function-valued
covariates, a functional_variable
object or a matrix or a
data frame or a list of these objects can be accepted as input data.
When one functional_variable
object, matrix or, data frame
is input as argument FC
, only one function-valued covariate
is included in the model. When a list of these objects is input as
argument FC
, the model can have multiple function-valued
covariates, with each element of the list corresponding to a different
function-valued covariate. For of scalar-valued covariates, a matrix,
data frame, atomic vector, NULL
, or, no input can be
accepted as input data. When argument Z
is not assigned, no
scalar-valued covariate is included in the model and argument
formula.Z
should also be NULL
, or no input.
When an atomic vector is input as argument Z
, only one
scalar-valued is included in the model, and, the name of the
scalar-valued covariate is not specified. So even if only one
scalar-valued covariate is included in the model, a matrix or data frame
with column name is recommended as the input for argument
Z
. The argument formula.Z
is used to specify
which part of the argument Z
is used and whether to treat
scalar-valued covariates as fixed effects or random effects. The
argument family
can specify the distribution type of the
response variable and link function to be used in the regression. The
argument basis.type
indicate the type of basis function to
be used in basis expansion process. Available options are ‘Fourier’ and
‘Bspline’, representing the Fourier basis and the b-spline basis,
respectively. The argument basis.order
indicates the number
of function basis to be used. When using the Fourier basis $\frac{1}{2},\sin k t, \cos k t, k =
1,\dots,p_f$, basis.order
is the number pf. When using
the b-splines basis {Bi, p(x)}i = −pk,
basis.order
is the number of splines, equal to k + p + 1. The argument
bs_degree
specifies the degree of the piecewise polynomials
of the b-spline basis function if using the b-splines basis, and is only
needed when using the b-spline basis.
The function fcRegression
returns an object of s3 class
fcRegression
. as a list that containing the following
elements.
We can use the method predict
to get a predicted value
from the model. We can use the method fc.beta
to get the
values of function-valued linear coefficient parameters, βl(t).
data(MECfda.data.sim.0.0)
res = fcRegression(FC = MECfda.data.sim.0.0$FC,
Y=MECfda.data.sim.0.0$Y,
Z=MECfda.data.sim.0.0$Z,
family = gaussian(link = "identity"),
basis.order = 5, basis.type = c('Bspline'),
formula.Z = ~ Z_1 + (1|Z_2))
t = (0:100)/100
plot(x = t, y = fc.beta(res,1,t), ylab = expression(beta[1](t)))
fcQR(Y, FC, Z, formula.Z, tau = 0.5, basis.type = c("Fourier", "Bspline"),
basis.order = 6L, bs_degree = 3)
The MECfda package provides a function,
fcQR
, for fitting quantile linear regression models. The
method for addressing function-valued covariates is discretization using
basis expansion. The function fcQR
can solve a linear model
in the following form: $$Q_{Y_i|X_i,Z_i}(\tau) =
\sum_{l=1}^L\int_{\Omega_l} \beta_l(\tau,t) X_{li}(t) dt
+ (1,Z_i^T)\gamma.$$ The function allows one or multiple
function-valued covariates, and zero, one, or multiple scalar-valued
covariates. Data input occurs as described for the function
fcRegression
. The treatment of the scalar-valued covariates
is specified by the argument formula.Z
, similar to the
function fcRegression
. The only difference between function
fcQR and function fcRegression is that the quantile linear regression
model does not include a random effect. The quantile τ is specified by the argument
tau
. The type and parameters of the basis function are
specified bythe argument basis.type
,
basis.order
, and bs_degree
, as described for
the function fcRegression
.
The function fcQR
returns an object of s3 class
fcQR
, as a list that contains containing the following
elements.
We can use the method predict
to obtain a predicted
value from the model and use the method fc.beta
to obtain
the values of function-valued linear coefficient parameters βl(t).
Data collected in real- world settings often include measurement error, especially function-valued data. Measurement error in a data set may result in estimation bias. The *MECfda** package also provides bias correction estimation method functions for certain linear regression models for use with data containing measurement error.
ME.fcRegression_MEM(
data.Y,
data.W,
data.Z,
method = c("UP_MEM", "MP_MEM", "average"),
t_interval = c(0, 1),
t_points = NULL,
d = 3,
family.W = c("gaussian", "poisson"),
family.Y = "gaussian",
formula.Z,
basis.type = c("Fourier", "Bspline"),
basis.order = NULL,
bs_degree = 3,
smooth = FALSE,
silent = TRUE
)
Wearable monitoring devices permit the continuous monitoring of
biological processes, such as blood glucose metabolism, and behaviors,
such as sleep quality and physical activity.
Continuous monitoring often collect data in 60-second epochs over
multiple days, resulting in high-dimensional multi-level longitudinal
curves that are best described and analyzed as multi-level functional
data.
Although researchers have previously addressed measurement error in
scalar covariates prone to error, less work has been done for correcting
measurement error in multi-level high dimensional curves prone to
heteroscedastic measurement error. Therefore, Luan et. al. proposed
mixed effects-based-model-based (MEM) methods for bias correction due to
measurement error in multi-level functional data from the exponential
family of distributions that are prone to complex heteroscedastic
measurement error.
They first developed MEM methods to adjust for biases due to the
presence of measurement error in multi-level generalized functional
regression models.
They assumed that the distributions of the function-valued covariates
prone to measurement error belong to the exponential family. This
assumption allows for a more general specification of the distributions
of error-prone functional covariates compared to current approaches that
often entail normality assumptions for these observed measures. The
approach adopted by Luan et al. allows a nonlinear association between
the true measurement and the observed measurement prone to measurement
error.
Second, they treated the random errors in the observed measures as
complex heteroscedastic errors from the Gaussian distribution with
covariance error functions. Third, these methods can be used to evaluate
relationships between multi-level function-valued covariates with
complex measurement error structures and scalar outcomes with
distributions in the exponential family.
Fourth, they treat the function-valued covariate as an observed measure
of the true function-valued unobserved latent covariate.
Additionally, these methods employ a point-wise method for fitting the
multi-level functional MEM approach, avoiding the need to compute
complex and intractable integrals that would be required in traditional
approaches for reducing biases due to measurement error in multi-level
functional data [7].
Their statistical model is as follows: where the response variable Yi and scalar-valued covariates Zi are measured without error, function-valued covariate Xi(t) is repeatedly measured with error as Wij(t). The model also includes the following assumptions:
They proposed a MEM estimation method to correct for bias caused by the presence of measurement error in the function-valued covariate. This method allows for the investigation of a nonlinear measurement error model, in which the relationship between the true and observed measurements is not constrained to be linear, and the distribution assumption for the observed measurement is relaxed to encompass the exponential family rather than being limited to a Gaussian distribution.
The MEM approach is a two stage method that employs functional
mixed-effects models. The first stage of the MEM approach involves using
a functional mixed-effects model
to approximate the true measure Xi(t)
with the repeated observed measurement Wij(t).
The MEM approach is primarily based on the assumptions that h[E{Wij(t)|Xi(t)}] = Xi(t)
and Xi(t) = μx(t) + εxi(t).
That is, in the functional mixed-effects model containing one fixed
intercept and one random intercept, the random intercept is assumed to
to be the subject-wise deviation of Xi(t)
from the mean process μx(t),
and the fixed intercept is assumed to represent μx(t).
The second stage involves replacing Xi(t)
in the regression model with the resulting approximation of Xi(t)
from the first stage. The MEM approach employs point-wise (UP_MEM) and
multi-point-wise (MP_MEM) estimation procedures to avoid potential
computational complexities caused by analyses of multi-level functional
data and computations of potentially intractable and complex
integrals.
The MECfda package provides a function
ME.fcRegression_MEM
for application to the bias correction
estimation method. The response variable, function-valued covariates,
and scalar-valued covariates are input separately as three different
arguments. The argument data.Y
is the response variable.
Inputs can be an atomic vector, a one-column matrix or data frame,
recommended form is a one-column data frame with column name. The
argument data.W
is the data for W, the measurement of X, in the statistical model. The
input should be a 3-dimensional array. in which each row represents a
subject, each column represent a measurement (time) point, each layer
represents an observation. The argument data.Z
is the data
for the scalar covariates, It can be no input or NULL
if
the model does not include a scalar covariate, input in the form of an
atomic vector when the model includes only one scalar covariate, or
input as a matrix or data frame, with the recommended form being a data
frame with column names. The argument method
specifies the
method for constructing the substitution X. Available options includes
'UP_MEM'
, 'MP_MEM'
, and
'average'
. The argument t_interval
specifies
the domain of the function-valued covariate and should be a 2-element
vector, representing an interval. The default is c(0,1)
,
represent interval [0, 1]. The argument
t_points
is the sequence of measurement (time) points, and
the default is NULL
. The argument d
is the
number of measurement (time) points involved for MP_MEM
(the default value is 3, which is also the minimum value). The argument
family.W
is the distribution of W given X, and available options are
'gaussian'
and 'poisson'
. The argument
family.Y
is a description of the error distribution and
link function to be used in the model. See stats::family
.
The argument formula.Z
is used to specify which part of the
argument Z
is used and whether to treat the scalar-valued
covariates, as fixed effects or random effects. The argument
basis.type
indicates the type of basis function to be used
in the basis expansion process. Available options are
'Fourier'
and 'Bspline'
, representing Fourier
basis and b-spline basis respectively. The argument
basis.order
indicates number of the function basis to be
used. When using Fourier basis $\frac{1}{2},\sin k t, \cos k t, k =
1,\dots,p_f$, basis.order
is the number pf. When using
b-splines basis, {Bi, p(x)}i = −pk,
basis.order
is the number of splines, equal to k + p + 1. The argument
bs_degree
specifies the degree of the piecewise polynomials
of b-spline basis function if use b-splines basis. This argument is need
only when using b-spline basis. The argument smooth
specifies whether to smooth the substitution of X, and the default is FALSE. The
argument silent
specifies whether not to show the statuse
of the running of the function, and the default is TRUE.
The function ME.fcRegression_MEM returns a fcRegression object.
ME.fcQR_IV.SIMEX(
data.Y,
data.W,
data.Z,
data.M,
tau = 0.5,
t_interval = c(0, 1),
t_points = NULL,
formula.Z,
basis.type = c("Fourier", "Bspline"),
basis.order = NULL,
bs_degree = 3
)
Health a major concerns for many people, and as technology advances,
wearable devices have become the mainstream method for monitoring and
evaluating individual physical activity levels. Despite personal
preferences in brands and feature design, the accuracy of the data
presented is what makes the device a great product. These functional
data collected from devices are generally considered more accurate and
subjective compared to other objective methods like questionnaires and
self-reports. Because physical activity levels are not directly
observable, algorithms are used generate corresponding summary reports
of different activity level using complex data (e.g. steps, heart
rate).
However, these results may differ depending on which device is used. And
aside from variation in hardware, data collected could also vary by the
combinations between how the device is worn and the activity of
interest. Although current devices may be sufficiently accurate to
monitor general physical activity levels, more precise data may enable
additional functions such as detecting irregular heart rhythms or
radiation exposures that would contribute toward improving the health of
the general public and elevating the overall well-being of society.
Quantile regression is a tool that was developed to meet the need for modeling complex relationships between a set of covariates and quantiles of an outcome. In obesity studies, the effects of interventions or covariates on body mass index (BMI) are most commonly estimated using ordinary least square methods. However, mean regression approaches are unable to address these effects for individuals whose BMI values fall in the upper quantile of the distribution. Compared with traditional linear regression approaches, quantile regression approaches make no assumptions regarding the distribution of residuals and are robust to outlying observations.
Tekwe et. al. proposed a simulation extrapolation (SIMEX) estimation method that uses an instrumental variable to correct for bias due to measurement error in scalar-on-function quantile linear regression. They demonstrated the usefulness of this method by evaluating the association between physical activity and obesity using the National Health and Nutrition Examination Survey (NHANES) dataset [8]. Their statistical model is as follows: where the response variable Yi and scalar-valued covariates Zi are measured without error, the function-valued covariate Xi(t) is measured with error as Wi(t), and Mi(t) is a measured instrumental variable. The model also includes the following assumptions:
for ∀t, s ∈ [0, 1] and i = 1, …, n.
Their bias correction estimation method uses a two-stage strategy to correct for measurement error in a function-valued covariate and then fits a linear quantile regression model. In the first stage, an instrumental variable is used to estimate the covariance matrix associated with the measurement error. In the second stage, SIMEX is used to correct for measurement error in the function-valued covariate.
The MECfda package provides a function
ME.fcQR_IV.SIMEX
for the application of their bias
correction estimation method. The argument data.Y
is the
response variable. Input can be an atomic vector, a one-column matrix or
data frame, recommended form is a one-column data frame with column
name. The argument data.W
is the data for W, the measurement of X, in the statistical model. The
input should be a dataframe or matrix, in which each row represents a
subject, each column represent a measurement (time) point. The argument
data.Z
is the data for scalar covariates, This argument can
be input as no input or NULL
when no scalar covariate is
included in the model, an atomic vector when only one scalar covariate
is included in the model, or as a matrix or data frame. The recommended
form is a data frame with column names. The argument data.M
is the data of M, the
instrumental variable. Input should be a dataframe or matrix, in which
each row represents a subject. each column represent a measurement
(time) point. The argument tau
is the quantile τ ∈ (0, 1), default is 0.5. The
argument t_interval
specifies the domain of the
function-valued covariate, which should be a 2-element vector,
represents an interval. The default is c(0,1)
, represent
interval [0, 1]. The argument
t_points
is the sequence of the measurement (time) points,
default is NULL
. The argument formula.Z
is
used to specify which part of the argument Z
is used and
how to treat the scalar-valued covariates. The argument
basis.type
indicates the type of basis function to be used
in the basis expansion process. Available options are
'Fourier'
and 'Bspline'
, representing the
Fourier basis and the b-spline basis, respectively. The argument
basis.order
indicates number of the function basis to be
used. When using the Fourier basis $\frac{1}{2},\sin k t, \cos k t, k =
1,\dots,p_f$, basis.order
is the number pf. When using
the b-splines basis {Bi, p(x)}i = −pk,
basis.order
is the number of splines, equal to k + p + 1. The argument
bs_degree
specifies the degrees of piecewise polynomials of
the b-spline basis function if using the b-splines basis. This argument
is only needed when using b-spline basis.
The function ME.fcQR_IV.SIMEX
returns an
ME.fcQR_IV.SIMEX class object as a list that containing the following
elements.
ME.fcQR_CLS(
data.Y,
data.W,
data.Z,
tau = 0.5,
t_interval = c(0, 1),
t_points = NULL,
grid_k,
grid_h,
degree = 45,
observed_X = NULL
)
Zhang et. al. proposed a corrected loss score estimation method for scalar-on-function quantile linear regression to correct for bias due to measurement error [9]. Their statistical model is as follows: where the response variable Yi and the scalar-valued covariates Zi are measured without error, the function-valued covariate Xi(t) is measured with error as Wi(t).
Zhang et al. proposed a new corrected loss function for a partially functional linear quantile model with functional measurement error in this manuscript. They established a corrected quantile objective function for an observed measurement, which is an unbiased estimator of the quantile objective function that would be obtained if the true measurements were available. The estimators of the regression parameters are obtained by optimizing the resulting corrected loss function. The resulting estimator of the regression parameters is shown to be consistent.
The MECfda package provides a function,
ME.fcQR_CLS
, for the application of their bias correction
estimation method. The argument data.Y
is the response
variable. Input can be an atomic vector, a one-column matrix or data
frame, with the recommended form being a one-column data frame with
column name. The argument data.W
is the data for W, the measurement of X, in the statistical model. The
input should be a 3-dimensional array, in which each row represents a
subject, each column represent a measurement (time) point, each layer
represents an observation. The argument data.Z
is the data
for scalar covariates, This argument can be no input or
NULL
when no scalar covariate is included in the model, as
an atomic vector when only one scalar covariate is included in the
model, or as a matrix or data frame. The recommended form is a data
frame with column names. The argument tau
is the quantile
τ ∈ (0, 1), with the default
value 0.5. The argument t_interval
specifies the domain of
the function-valued covariate, and should be a 2-element vector,
representing an interval. The default is c(0,1)
,
representing interval [0, 1]. The
argument t_points
is the sequence of measurement (time)
points, default is NULL
. The argument grid_k
is an atomic vector, in which each element is a candidate basis
number.
The argument grid_h
is a non-zero-value atomic vector, in
which each element is a candidate value for the tuning parameter. The
argument degree
is used in to compute the derivative and
integral, and the default value is 45, which is large enough for most
scenario. The argument observed_X
is for estimating
parametric variance, and the default value is NULL
.
The function returns an ME.fcQR_CLS class object. as a list that containing the following elements.
NULL
.Tekwe et. al. proposed a bias correction estimation method for scalar-on-function linear regression model with measurement error using an instrumental variable [6]. Their statistical model is as follows: where the response variable Yi and are measured without error, the function-valued covariate Xi(t) is measured with error as Wi(t), and Mi(t) is an measured instrumental variable. They included the following additional assumptions:
for ∀t, s ∈ [0, 1] and i = 1, …, n.
The MECfda package provides a function
ME.fcLR_IV
for the application of their bias correction
estimation method. The argument data.Y
is the response
variable, which can be input as an atomic vector, a one-column matrix or
data frame, with the recommended form being a one-column data frame with
column name. The argument data.W
is the data for W, the measurement of X, in the statistical model. The
input should be a dataframe or matrix, in which each row represents a
subject. each column represent a measurement (time) point. The argument
data.M
is the data for M, the instrumental variable. Input
should be a dataframe or matrix. Each row represents a subject. Each
column represent a measurement (time) point. The argument
t_interval
specifies the domain of the function-valued
covariate, and should be a 2-element vector, represents an interval. The
default value is c(0,1)
, represent interval [0, 1]. The argument t_points
is
the sequence of the measurement (time) points, and the default value is
NULL
. The argument CI.bootstrap
specifies
whether to return the confidence jinterval using the bootstrap method,
and the default is FALSE.
The function returns a ME.fcLR_IV class object as a list that containing the following elements.