boostmtree {boostmtree}  R Documentation 
Multivariate extension of Friedman's gradient descent boosting method for modeling continuous or binary longitudinal response using multivariate tree base learners (Pande et al., 2017). Covariate-time interactions are modeled using penalized B-splines (P-splines) with an adaptively estimated smoothing parameter.
boostmtree(x, tm, id, y,
  family = c("Continuous", "Binary", "Nominal", "Ordinal"),
  y_reference = NULL, M = 200, nu = 0.05,
  na.action = c("na.omit", "na.impute")[2], K = 5, mtry = NULL,
  nknots = 10, d = 3, pen.ord = 3, lambda, rho, lambda.max = 1e6,
  lambda.iter = 2, svd.tol = 1e-6, forest.tol = 1e-3, verbose = TRUE,
  cv.flag = FALSE, eps = 1e-5, mod.grad = TRUE, NR.iter = 3, ...)
x 
Data frame (or matrix) containing the x-values. Rows must be duplicated to match the number of time points for an individual. That is, if individual i has n[i] outcome y-values, then there must be n[i] duplicate rows of i's x-value. 
tm 
Vector of time values, one entry for each row in x. 
id 
Unique subject identifier, one entry for each row in x. 
y 
Observed y-value, one entry for each row in x. 
family 
Family of the response variable 
y_reference 
Set this value, among the unique y-values, as the reference class. Applies only when family == "Nominal". 
M 
Number of boosting iterations 
nu 
Boosting regularization parameter. A value in (0,1]. 
na.action 
Remove missing values (casewise) or impute them. The default is to impute missing values. 
K 
Number of terminal nodes used for the multivariate tree learner. 
mtry 
Number of x-variables randomly sampled as candidates for splitting a node. 
nknots 
Number of knots used for the B-spline for modeling the time-interaction effect. 
d 
Degree of the piecewise B-spline polynomial (no time effect is fit when d < 1). 
pen.ord 
Differencing order used to define the penalty, with increasing values implying greater smoothness. 
lambda 
Smoothing (penalty) parameter used for the B-splines, with increasing values associated with increasing smoothness/penalization. If missing, or nonpositive, the value is estimated adaptively using a mixed-models approach. 
rho 
If missing, rho is estimated; otherwise, the supplied value is used. 
lambda.max 
Upper bound used to cap the adaptively estimated lambda. For experts only. 
lambda.iter 
Number of iterations used to estimate lambda (only applies when lambda is not supplied and adaptive smoothing is employed). 
svd.tol 
Tolerance value used in the SVD calculation of the penalty matrix. For experts only. 
forest.tol 
Tolerance used for forest weighted least squares solution. Experimental and for experts only. 
verbose 
Should verbose output be printed? 
cv.flag 
Should in-sample cross-validation (CV) be used to determine optimal stopping using out-of-bag data? 
eps 
Tolerance value used for determining the optimal number of boosting iterations Mopt. Applies only when cv.flag = TRUE. 
mod.grad 
Use a modified gradient? See details below. 
NR.iter 
Number of Newton-Raphson iterations. Applied only for the noncontinuous families (family = "Binary", "Nominal", or "Ordinal"). 
... 
Further arguments passed to or from other methods. 
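The penalty named by pen.ord above can be sketched in a few lines of base R. This is an illustration only, not the package internals; the basis dimension nknots + d is an assumption made for the sketch:

```r
## Minimal P-spline penalty sketch (illustration only, not the package code).
## A difference matrix of order pen.ord applied to the B-spline coefficients
## defines the penalty matrix P = t(D) %*% D; larger pen.ord implies
## smoother fitted time profiles.
nknots  <- 10   # default argument
d       <- 3    # cubic B-splines (default)
pen.ord <- 3    # default differencing order
n.basis <- nknots + d                            # assumed basis dimension
D <- diff(diag(n.basis), differences = pen.ord)  # pen.ord-th order differences
P <- crossprod(D)                                # penalty used as lambda * P
```

The penalized fit then minimizes the usual least-squares criterion plus lambda * t(beta) %*% P %*% beta, which is why larger lambda (and larger pen.ord) give smoother curves.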
Each individual has observed y-values, over possibly different time points, with possibly differing numbers of time points. Given y, the time points, and x, the conditional mean time profile of y is estimated using gradient boosting in which the gradient is derived from a criterion function involving a working variance matrix for y specified as an equicorrelation matrix with parameter rho multiplied by a variance parameter phi. Multivariate trees are used for base learners and weighted least squares is used for solving the terminal node optimization problem. This provides solutions to the core parameters of the algorithm. For ancillary parameters, a mixed-model formulation is used to estimate the smoothing parameter associated with the B-splines used for the time-interaction effect, although the user can manually set the smoothing parameter as well. Ancillary parameters rho and phi are estimated using GLS (generalized least squares).
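The working covariance described above has a simple closed form. A base-R sketch (for illustration only) of the equicorrelation matrix for one subject with n measurements, variance parameter phi, and correlation parameter rho:

```r
## Equicorrelation working covariance for one subject (illustrative sketch):
## V = phi * ((1 - rho) * I + rho * J), where J is the n x n all-ones matrix,
## so V has phi on the diagonal and phi * rho off the diagonal.
equicorr <- function(n, rho, phi) {
  phi * ((1 - rho) * diag(n) + rho * matrix(1, n, n))
}
V <- equicorr(4, rho = 0.8, phi = 2)
V[1, 1]  # 2   (phi on the diagonal)
V[1, 2]  # 1.6 (phi * rho off the diagonal)
```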
In the original boostmtree algorithm (Pande et al., 2017), the equicorrelation parameter rho is used in two places in the algorithm: (1) for growing trees using the gradient, which depends upon rho; and (2) for solving the terminal node optimization problem, which also uses the gradient. However, Pande (2017) observed that setting rho to zero in the gradient used for growing trees improved performance of the algorithm, especially in high dimensions. For this reason the default setting used in this algorithm is to set rho to zero in the gradient for (1). The rho in the gradient for (2) is not touched. The option mod.grad specifies whether a modified gradient is used in the tree-growing process and is TRUE by default.
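Assuming a squared-error GLS working criterion (an assumption made for illustration; this is not the package code), the effect of the modified gradient can be seen directly: with rho = 0 the gradient collapses to the ordinary residual scaled by phi.

```r
## Gradient sketch under a GLS squared-error criterion (illustration only).
## For V = phi * ((1 - rho) * I + rho * J), the negative gradient for a
## subject is solve(V, y - mu); with rho = 0 it reduces to (y - mu) / phi.
gls.gradient <- function(y, mu, rho, phi) {
  n <- length(y)
  V <- phi * ((1 - rho) * diag(n) + rho * matrix(1, n, n))
  solve(V, y - mu)
}
y  <- c(1.2, 0.7, 1.9)
mu <- c(1.0, 1.0, 1.0)
g.rho  <- gls.gradient(y, mu, rho = 0.8, phi = 1)  # gradient used in step (2)
g.zero <- gls.gradient(y, mu, rho = 0,   phi = 1)  # modified gradient for tree growing
all.equal(g.zero, y - mu)  # TRUE: rho = 0 recovers the plain residual
```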
By default, trees are grown from a bootstrap sample of the data; thus the boosting method employed here is a modified example of stochastic gradient descent boosting (Friedman, 2002). Stochastic descent often improves performance and has the added advantage that out-of-sample data (out-of-bag, OOB) can be used to calculate variable importance (VIMP).
The package implements R-side parallel processing by replacing the R function lapply with mclapply found in the parallel package. You can set the number of cores accessed by mclapply by issuing the command options(mc.cores = x), where x is the number of cores. The options command can also be placed in the user's .Rprofile file for convenience. Alternatively, you can initialize the environment variable MC_CORES in your shell environment.
As an example, issuing the following options command uses all available cores for R-side parallel processing:
options(mc.cores=detectCores())
However, be cautious when setting mc.cores. This can create not only high CPU usage but also high RAM usage, especially when using the functions partialPlot and predict.
The method can impute missing observations in x (the covariates) using on-the-fly imputation. Details regarding imputation can be found in the randomForestSRC package. If missing values are present in tm, id, or y, the user should either impute or delete these values before executing the function.
Finally, note that cv.flag can be used for an in-sample cross-validated estimate of prediction error. This is used to determine the optimized number of boosting iterations Mopt. The final mu predictor is evaluated at this value and is cross-validated. The prediction error returned via err.rate is standardized by the overall standard deviation of y.
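The standardization applied to err.rate amounts to dividing the raw error by sd(y). A hedged base-R sketch (illustration only, assuming RMSE computed on pooled residuals):

```r
## Standardized RMSE sketch (illustration only): dividing the raw RMSE by
## the overall standard deviation of y puts the error on a unitless scale,
## where values near 1 indicate performance no better than the overall mean.
y    <- c(2.1, 3.4, 1.8, 4.0, 2.9)   # observed values (toy data)
yhat <- c(2.0, 3.0, 2.0, 3.5, 3.0)   # predicted values (toy data)
rmse     <- sqrt(mean((y - yhat)^2))
std.rmse <- rmse / sd(y)
```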
An object of class (boostmtree, grow) with the following components:
x 
The x-values, but with only one row per individual (i.e. duplicated rows are removed). Values sorted on id. 
xvar.names 
X-variable names. 
time 
List with each component containing the time points for a given individual. Values sorted on id. 
id 
Sorted subject identifier. 
y 
List with each component containing the observed y-values for a given individual. Values sorted on id. 
Yorg 
For family == "Nominal" or family == "Ordinal", this provides the response in list format, where each element converts the response into a binary response. 
family 
Family of y. 
ymean 
Overall mean of y-values for all individuals. If family is "Binary", "Nominal", or "Ordinal", ymean = 0. 
ysd 
Overall standard deviation of y-values for all individuals. If family is "Binary", "Nominal", or "Ordinal", ysd = 1. 
na.action 
Whether missing values were removed or imputed. 
n 
Total number of subjects. 
ni 
Number of repeated measures for each subject. 
n.Q 
Number of class labels for noncontinuous response. 
Q_set 
Class labels for the noncontinuous response. 
y.unq 
Unique y-values for the noncontinuous response. 
y_reference 
Reference value for family == "Nominal". 
tm.unq 
Unique time points. 
gamma 
List of length M, with each component containing the boosted tree fitted values. 
mu 
List with each component containing the estimated mean values for an individual. That is, each component contains the estimated time-profile for an individual. When in-sample cross-validation is requested using cv.flag = TRUE, the cross-validated estimate evaluated at Mopt is returned. 
Prob_class 
For family == "Ordinal", this provides individual probabilities rather than cumulative probabilities. 
lambda 
Smoothing parameter. Results provided in vector or matrix form, depending on whether family == c("Continuous","Binary") or family == c("Nominal", "Ordinal"). 
phi 
Variance parameter. Results provided in vector or matrix form, depending on whether family == c("Continuous","Binary") or family == c("Nominal", "Ordinal"). 
rho 
Correlation parameter. Results provided in vector or matrix form, depending on whether family == c("Continuous","Binary") or family == c("Nominal", "Ordinal"). 
baselearner 
List of length M containing the base learners. 
membership 
List of length M, with each component containing the terminal node membership for a given boosting iteration. 
X.tm 
Design matrix for all the unique time points. 
D 
Design matrix for each subject. 
d 
Degree of the piecewise B-spline polynomial. 
pen.ord 
Penalization difference order. 
K 
Number of terminal nodes. 
M 
Number of boosting iterations. 
nu 
Boosting regularization parameter. 
ntree 
Number of trees. 
cv.flag 
Whether in-sample CV was used or not. 
err.rate 
In-sample standardized estimate of the l1-error and RMSE. 
rmse 
In-sample standardized RMSE at the optimized Mopt. 
Mopt 
The optimized number of boosting iterations M (when cv.flag = TRUE). 
gamma.i.list 
Estimate of gamma obtained from in-sample CV when cv.flag = TRUE. 
forest.tol 
Forest tolerance value (needed for prediction). 
Hemant Ishwaran, Amol Pande and Udaya B. Kogalur
Friedman J.H. (2001). Greedy function approximation: a gradient boosting machine. Ann. Statist., 29(5):1189–1232.
Friedman J.H. (2002). Stochastic gradient boosting. Comp. Statist. Data Anal., 38(4):367–378.
Pande A., Li L., Rajeswaran J., Ehrlinger J., Kogalur U.B., Blackstone E.H., Ishwaran H. (2017). Boosted multivariate trees for longitudinal data, Machine Learning, 106(2): 277–305.
Pande A. (2017). Boosting for longitudinal data. Ph.D. Dissertation, Miller School of Medicine, University of Miami.
marginalPlot, partialPlot, plot.boostmtree, predict.boostmtree, print.boostmtree, simLong, vimpPlot
##
## synthetic example (Response y is continuous)
## 0.8 correlation, quadratic time with quadratic interaction
##

# simulate the data (use a small sample size for illustration)
dta <- simLong(n = 50, N = 5, rho = .80, model = 2, family = "Continuous")$dtaL

# basic boosting call (M set to a small value for illustration)
boost.grow <- boostmtree(dta$features, dta$time, dta$id, dta$y,
                         family = "Continuous", M = 20)

# print results
print(boost.grow)

# plot results
plot(boost.grow)

##
## synthetic example (Response y is binary)
## 0.8 correlation, quadratic time with quadratic interaction
##

# simulate the data (use a small sample size for illustration)
dta <- simLong(n = 50, N = 5, rho = .80, model = 2, family = "Binary")$dtaL

# basic boosting call (M set to a small value for illustration)
boost.grow <- boostmtree(dta$features, dta$time, dta$id, dta$y,
                         family = "Binary", M = 20)

# print results
print(boost.grow)

# plot results
plot(boost.grow)

## Not run: 
##
## same synthetic example as above with continuous response,
## but with in-sample cross-validation estimate for RMSE
##

dta <- simLong(n = 50, N = 5, rho = .80, model = 2, family = "Continuous")$dtaL
boost.cv.grow <- boostmtree(dta$features, dta$time, dta$id, dta$y,
                            family = "Continuous", M = 300, cv.flag = TRUE)
plot(boost.cv.grow)
print(boost.cv.grow)

##
## spirometry data (Response is continuous)
##

data(spirometry, package = "boostmtree")

# boosting call: cubic B-splines with 15 knots
spr.obj <- boostmtree(spirometry$features, spirometry$time, spirometry$id,
                      spirometry$y, family = "Continuous", M = 100,
                      nu = .025, nknots = 15)
plot(spr.obj)

##
## Atrial Fibrillation data (Response is binary)
##

data(AF, package = "boostmtree")

# boosting call: cubic B-splines with 15 knots
AF.obj <- boostmtree(AF$feature, AF$time, AF$id, AF$y,
                     family = "Binary", M = 100, nu = .025, nknots = 15)
plot(AF.obj)

##
## sneaky way to use boostmtree for (univariate) regression: boston housing
##

if (library("mlbench", logical.return = TRUE)) {

  ## assemble the data
  data(BostonHousing)
  x <- BostonHousing
  x$medv <- NULL
  y <- BostonHousing$medv
  trn <- sample(1:nrow(x), size = nrow(x) * (2 / 3), replace = FALSE)

  ## run boosting in univariate mode
  o <- boostmtree(x = x[trn, ], y = y[trn], family = "Continuous")
  o.p <- predict(o, x = x[-trn, ], y = y[-trn])
  print(o)
  plot(o.p)

  ## run boosting in univariate mode to obtain RMSE and vimp
  o.cv <- boostmtree(x = x, y = y, M = 100, family = "Continuous",
                     cv.flag = TRUE)
  print(o.cv)
  plot(o.cv)

}
## End(Not run)