This function fits nonparametric models using local polynomial kernel smoothers or splines. The models may or may not include factor-by-curve interactions. A parametric (allometric) model can also be estimated.
Usage
frfast(
formula,
data,
na.action = "na.omit",
model = "np",
smooth = "kernel",
h0 = -1,
h = -1,
nh = 30,
weights = NULL,
kernel = "epanech",
p = 3,
kbin = 100,
nboot = 500,
rankl = NULL,
ranku = NULL,
seed = NULL,
cluster = TRUE,
ncores = NULL,
...
)
Arguments
- formula
An object of class formula: a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.
- data
An optional data frame, matrix or list required by the formula. If not found in data, the variables are taken from environment(formula), typically the environment from which frfast is called.
- na.action
A function which indicates what should happen when the data contain 'NA's. The default is 'na.omit'.
- model
Type of model used: model = "np" for a nonparametric regression model, model = "allo" for an allometric model. See 'Details'.
- smooth
Type of smoother used: smooth = "kernel" for local polynomial kernel smoothers and smooth = "splines" for splines using the mgcv package.
- h0
The kernel bandwidth smoothing parameter for the global effect (see the references for details on the estimation). Large values of the bandwidth lead to oversmoothed estimates; small values lead to undersmoothed estimates. By default, cross-validation is used to obtain the bandwidth.
- h
The kernel bandwidth smoothing parameter for the partial effects.
- nh
Integer number of equally spaced bandwidths into which h is discretised, to speed up computation in the kernel-based regression.
- weights
Prior weights on the data.
- kernel
A character string specifying the desired kernel. Defaults to kernel = "epanech", which uses the Epanechnikov density function as the kernel. Triangular and Gaussian density functions are also available, with "triang" and "gaussian", respectively.
- p
Polynomial degree to be used in the kernel-based regression. Its value must be the order of the highest derivative to be estimated plus 1. The default value is 3, which returns the estimate and its first and second derivatives.
- kbin
Number of binning nodes over which the function is to be estimated.
- nboot
Number of bootstrap repeats. Defaults to 500. The wild bootstrap is used when model = "np" and the simple bootstrap when model = "allo".
- rankl
Number or vector specifying the minimum value of the interval in which to search for the x value that maximizes the estimate or its first or second derivative (for each level). The default is the minimum data value.
- ranku
Number or vector specifying the maximum value of the interval in which to search for the x value that maximizes the estimate or its first or second derivative (for each level). The default is the maximum data value.
- seed
Seed to be used in the bootstrap procedure.
- cluster
A logical value. If TRUE (default), the bootstrap procedure is parallelized (only for smooth = "splines"). Note that in some cases (e.g., a low number of bootstrap repetitions) serial computation is faster: R needs time to distribute tasks across the processors and to bind the results together afterwards. If this overhead exceeds the time needed for single-threaded computation, parallelization is not worthwhile.
- ncores
An integer value specifying the number of cores to be used in the parallelized procedure. If NULL (default), the number of cores used is the number of cores of the machine minus 1.
- ...
Other options.
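To make the roles of kernel, h and p concrete, the following is a minimal sketch of a local polynomial fit at a single point, written in base R. It is not the npregfast implementation; the kernel formulas are the usual textbook forms, and the names epanech, triang, gauss and local_poly are hypothetical helpers introduced here for illustration only.

```r
# Standard textbook kernels (assumed forms; not copied from npregfast).
epanech <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
triang  <- function(u) ifelse(abs(u) <= 1, 1 - abs(u), 0)
gauss   <- function(u) dnorm(u)

# Hypothetical helper: local polynomial fit of degree p (p >= 2) at x0
# with bandwidth h. Returns the estimate and its first two derivatives.
local_poly <- function(x, y, x0, h, p = 3, kernel = epanech) {
  w <- kernel((x - x0) / h)          # kernel weights around x0
  keep <- w > 0                      # drop points outside the window
  d <- x[keep] - x0
  X <- outer(d, 0:p, `^`)            # columns: 1, d, d^2, ..., d^p
  beta <- unname(lm.wfit(X, y[keep], w[keep])$coefficients)
  # beta[j + 1] * factorial(j) estimates the j-th derivative at x0
  c(estimate = beta[1], first = beta[2], second = 2 * beta[3])
}

# Noise-free quadratic, so the true values at x0 = 1 are 1, 2 and 2.
x <- seq(0, 2, length.out = 201)
y <- x^2
res <- local_poly(x, y, x0 = 1, h = 0.5)
res
```

With p = 3 the fit has enough terms to return the estimate together with its first and second derivatives, which matches the default described for the p argument.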
Value
An object is returned with the following elements:
- x
Vector of values of the grid points at which the model is estimated.
- p
Matrix with the estimate and its first and second derivatives at the grid points.
- pl
Lower limits of the 95% confidence interval for the estimate and its first and second derivatives.
- pu
Upper limits of the 95% confidence interval for the estimate and its first and second derivatives.
- diff
Differences between the estimates for a pair of levels (i.e. level 2 - level 1). The same is done for their first and second derivatives.
- diffl
Lower limits of the 95% confidence interval for the differences between the estimates for a pair of levels, and likewise for their first and second derivatives.
- diffu
Upper limits of the 95% confidence interval for the differences between the estimates for a pair of levels, and likewise for their first and second derivatives.
- nboot
Number of bootstrap repeats.
- n
Sample size.
- dp
Degree of polynomial to be used.
- h0
The kernel bandwidth smoothing parameter for the global effect.
- h
The kernel bandwidth smoothing parameter for the partial effects.
- fmod
Factor level of each observation.
- xdata
Original x values.
- ydata
Original y values.
- w
Weights on the data.
- kbin
Number of binning nodes over which the function is to be estimated.
- nf
Number of levels.
- max
Value of the covariate x that maximizes the estimate or its first or second derivative.
- maxu
Upper limit of the 95% confidence interval for max.
- maxl
Lower limit of the 95% confidence interval for max.
- diffmax
Differences between the estimates of max for a pair of levels (i.e. level 2 - level 1). The same is done for their first and second derivatives.
- diffmaxu
Upper limit of the 95% confidence interval for diffmax.
- diffmaxl
Lower limit of the 95% confidence interval for diffmax.
- repboot
Matrix with the estimate and its first and second derivatives at the grid points, for each bootstrap repeat.
- rankl
Minimum value of the interval in which to search for the x value that maximizes the estimate or its first or second derivative (for each level). The default is the minimum data value.
- ranku
Maximum value of the interval in which to search for the x value that maximizes the estimate or its first or second derivative (for each level). The default is the maximum data value.
- nmodel
Type of model used: nmodel = 1 for the nonparametric model, nmodel = 2 for the allometric model.
- label
Labels of the variables in the model.
- numlabel
Number of labels.
- kernel
A character string specifying the desired kernel.
- a
Estimated coefficient in the case of fitting an allometric model.
- al
Lower limit of the 95% confidence interval for a.
- au
Upper limit of the 95% confidence interval for a.
- b
Estimated coefficient in the case of fitting an allometric model.
- bl
Lower limit of the 95% confidence interval for b.
- bu
Upper limit of the 95% confidence interval for b.
- name
Name of the variables in the model.
- formula
A symbolic description of the model to be fitted.
- nh
Integer number of equally spaced bandwidths into which h is discretised.
- r2
Coefficient of determination (in the case of the allometric model).
- smooth
Type smoother used.
- cluster
Whether the procedure is parallelized (for spline smoothers).
- ncores
Number of cores used in the parallelized procedure (for spline smoothers).
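The interval elements (pl, pu, maxl, maxu and so on) are 95% bootstrap confidence limits. This page does not state how they are constructed; purely as a generic illustration, the sketch below computes percentile limits from a matrix of bootstrap replicates in base R. The matrix boot and its layout (grid points in rows, replicates in columns) are assumptions made for this example and need not match the layout of repboot.

```r
set.seed(123)
kbin  <- 5     # number of grid points (illustrative)
nboot <- 200   # number of bootstrap replicates (illustrative)

# Hypothetical replicates: the curve 1, 2, ..., kbin plus Gaussian noise.
# Each column is one replicate of the whole curve.
boot <- matrix(rep(1:kbin, nboot) + rnorm(kbin * nboot, sd = 0.1),
               nrow = kbin, ncol = nboot)

# Percentile limits per grid point: 2.5% and 97.5% quantiles across replicates.
limits <- apply(boot, 1, quantile, probs = c(0.025, 0.975))
lower <- limits[1, ]
upper <- limits[2, ]
cbind(lower, upper)
```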
Details
The models fitted by the frfast function are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a predictor specified symbolically by model. The possible terms consist of a variable name, or a variable name and a factor name separated by the : operator. Such a term is interpreted as the interaction of the continuous variable and the factor. However, if smooth = "splines", the formula is based on the function formula.gam of the mgcv package.
According to the model argument, if model = "np" the estimated regression model will be of the type
$$Y = m(X) + e$$
where \(m\) is a smooth, unknown function and \(e\) is the regression error with zero mean. If model = "allo", users can estimate the classical allometric model (Huxley, 1924) with regression curve
$$m(X) = a X^b$$
where \(a\) and \(b\) are the parameters of the model.
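Taking logarithms turns the allometric curve into a straight line, log m(X) = log a + b log X, so a and b can be recovered from an ordinary linear fit on the log-log scale. The sketch below illustrates this with simulated data in base R. It is one common way to fit such a model and is not claimed to be the procedure frfast uses internally; the simulated values and variable names are illustrative assumptions.

```r
set.seed(1)
a_true <- 0.0003   # illustrative parameter values
b_true <- 2.9

# Simulate data from Y = a * X^b with multiplicative (log-normal) error.
x <- runif(200, min = 5, max = 25)
y <- a_true * x^b_true * exp(rnorm(200, sd = 0.1))

# Linear fit on the log-log scale: log(y) = log(a) + b * log(x).
fit_lm <- lm(log(y) ~ log(x))
a_hat <- exp(coef(fit_lm)[[1]])   # back-transform the intercept to get a
b_hat <- coef(fit_lm)[[2]]        # the slope is b

c(a = a_hat, b = b_hat)
```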
References
Huxley, J. S. (1924). Constant differential growth-ratios and their significance. Nature, 114, 895-896.
Sestelo, M. (2013). Development and computational implementation of estimation and inference methods in flexible regression models. Applications in Biology, Engineering and Environment. PhD Thesis, Department of Statistics and O.R. University of Vigo.
Sestelo, M., Villanueva, N.M., Meira-Machado, L., Roca-Pardinas, J. (2017). npregfast: An R Package for Nonparametric Estimation and Inference in Life Sciences. Journal of Statistical Software, 82(12), 1-27.
Examples
library(npregfast)
data(barnacle)
# Nonparametric regression without interactions
fit <- frfast(DW ~ RC, data = barnacle, nboot = 100, smooth = "kernel")
fit
#>
#> Call:
#> frfast(formula = DW ~ RC, data = barnacle, smooth = "kernel",
#> nboot = 100)
#>
#> *********************************************
#> Nonparametric Model
#> *********************************************
#>
#> Number of Observations: 2000
#>
#> Number of Bootstrap Repeats: 100
#>
#> Type of Nonparametric Smoother: kernel
#>
#> Bandwidth: 0.28
#>
#> Kernel Function: Epanechnikov
summary(fit)
#>
#> Call:
#> frfast(formula = DW ~ RC, data = barnacle, smooth = "kernel",
#> nboot = 100)
#>
#> *********************************************
#> Nonparametric Model
#> *********************************************
#>
#> Type of nonparametric smoother: kernel
#> Kernel: Epanechnikov
#> Bandwidth: 0.28
#> Polynomial degree: 3
#> Number of bootstrap repeats: 100
#> Number of binning nodes 100
#>
#>
#> The number of data is: 2000
#>
#> Summaries for the response variable:
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>  0.0000  0.1300  0.4200  0.5705  0.8900  2.5800
# using splines
#fit <- frfast(DW ~ s(RC), data = barnacle, nboot = 100,
#smooth = "splines", cluster = TRUE, ncores = 2)
#fit
#summary(fit)
# Change the number of binning nodes and bootstrap replicates
fit <- frfast(DW ~ RC, data = barnacle, kbin = 200,
              nboot = 100, smooth = "kernel")
# Nonparametric regression with interactions
fit2 <- frfast(DW ~ RC : F, data = barnacle, nboot = 100)
fit2
#>
#> Call:
#> frfast(formula = DW ~ RC:F, data = barnacle, nboot = 100)
#>
#> *********************************************
#> Nonparametric Model
#> *********************************************
#>
#> Number of Observations: 2000
#>
#> Number of Factors: 2
#>
#> Number of Bootstrap Repeats: 100
#>
#> Type of Nonparametric Smoother: kernel
#>
#> Bandwidth: 0.28 0.52 1.00
#>
#> Kernel Function: Epanechnikov
summary(fit2)
#>
#> Call:
#> frfast(formula = DW ~ RC:F, data = barnacle, nboot = 100)
#>
#> *********************************************
#> Nonparametric Model
#> *********************************************
#>
#> Type of nonparametric smoother: kernel
#> Kernel: Epanechnikov
#> Bandwidth: 0.28 0.52 1.00
#> Polynomial degree: 3
#> Number of bootstrap repeats: 100
#> Number of binning nodes 100
#>
#>
#> The number of data is: 2000
#> The factor's levels are: barca lens
#> The number of data for the level barca is: 1000
#> The number of data for the level lens is: 1000
#>
#> Summaries for the response variable (for each level):
#> Level barca :
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>  0.0000  0.1300  0.4100  0.5437  0.8425  2.2500
#>
#> Level lens :
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>  0.0000  0.1400  0.4350  0.5974  0.9500  2.5800
# using splines
#fit2 <- frfast(DW ~ s(RC, by = F), data = barnacle,
# nboot = 100, smooth = "splines", cluster = TRUE, ncores = 2)
#fit2
#summary(fit2)
# Allometric model
fit3 <- frfast(DW ~ RC, data = barnacle, model = "allo", nboot = 100)
summary(fit3)
#>
#> Call:
#> frfast(formula = DW ~ RC, data = barnacle, model = "allo", nboot = 100)
#>
#> *********************************************
#> Allometric Model
#> *********************************************
#>
#> Coefficients:
#>
#>               2.5 %   97.5 %
#> a 0.000269 0.000247 0.000286
#> b 2.908623 2.883949 2.940579
#>
#> Adjusted R-squared: 0.9446261
#>
#> *********************************************
#>
#>
#> Number of bootstrap repeats: 100
#> Number of binning nodes 100
#>
#>
#> The number of data is: 2000
#> The factor's levels are: 1
#>
#>
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>  0.0000  0.1300  0.4200  0.5705  0.8900  2.5800
# fit4 <- frfast(DW ~ RC : F, data = barnacle, model = "allo", nboot = 100)
# summary(fit4)