The computational load of function `pre` can be heavy. This vignette shows some ways to reduce it.

There are two main steps in fitting a rule ensemble: 1) Rule generation and 2) Estimation of the final ensemble. Both can be adjusted to reduce computational load.

By default, `pre` uses the conditional inference tree algorithm of Hothorn, Hornik, & Zeileis (2006), as implemented in function `ctree` of `R` package `partykit` (Hothorn & Zeileis, 2015), for rule induction. The main reason is that it does not suffer from a selection bias towards variables with a greater number of possible cut points. However, use of `ctree` brings a relatively heavy computational load:

```
airq <- airquality[complete.cases(airquality), ]
airq$Month <- factor(airq$Month)
library("pre")
set.seed(42)
system.time(airq.ens <- pre(Ozone ~ ., data = airq))
```

```
## user system elapsed
## 2.74 0.01 2.75
```

```
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 3.229132
## number of terms = 13
## mean cv error (se) = 336.2188 (95.24635)
##
## cv error type : Mean-Squared Error
```

Computational load can be substantially reduced by employing the CART algorithm of Breiman, Friedman, Olshen, & Stone (1984), as implemented in function `rpart` from the package of the same name (Therneau & Atkinson, 2022). This can be specified through the `tree.unbiased` argument:
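The code chunk that produced the timing below is not shown above; a call along the following lines (the object name `airq.ens.cart` is illustrative) would produce it:

```
set.seed(42)
system.time(airq.ens.cart <- pre(Ozone ~ ., data = airq, tree.unbiased = FALSE))
```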

```
## user system elapsed
## 1.67 0.03 1.70
```

```
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 1.935814
## number of terms = 23
## mean cv error (se) = 287.7377 (86.43107)
##
## cv error type : Mean-Squared Error
```

Note, however, that the variable selection bias induced by fitting CART trees may also be present in the final ensemble. Furthermore, use of CART trees will tend to result in more complex ensembles, because the default stopping criterion in `rpart` is considerably less conservative than that of `ctree`. As a result, many more rules may be created and retained.

Reducing tree (and thereby rule) depth will reduce the computational load of both rule fitting and estimation of the final ensemble:
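The corresponding code chunk is not shown above; tree depth can be restricted through the `maxdepth` argument, so a sketch along these lines (the depth value and object name are assumptions) would yield the timing below:

```
set.seed(42)
system.time(airq.ens.md <- pre(Ozone ~ ., data = airq, maxdepth = 1L))
```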

```
## user system elapsed
## 2.26 0.05 2.33
```

```
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 1.676698
## number of terms = 14
## mean cv error (se) = 424.1761 (126.9922)
##
## cv error type : Mean-Squared Error
```

Reducing the maximum depth will likely improve interpretability, but may decrease predictive accuracy of the final ensemble.

By default, 500 trees are generated. Computation time can be reduced substantially by reducing the number of trees. This may of course negatively impact predictive accuracy. When using a smaller number of trees, it is likely beneficial to increase the learning rate (`learnrate = .01` by default) accordingly:

```
set.seed(42)
system.time(airq.ens.nt <- pre(Ozone ~ ., data = airq, ntrees = 100L, learnrate = .05))
```

```
## user system elapsed
## 0.83 0.01 0.85
```

```
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 1.935814
## number of terms = 11
## mean cv error (se) = 276.9282 (69.1872)
##
## cv error type : Mean-Squared Error
```

Function `cv.glmnet` from package `glmnet` is used for fitting the final model. The relevant arguments of function `cv.glmnet` can be passed directly to function `pre`.

For example, parallel computation can be employed by specifying `par.final = TRUE` in the call to `pre` (and registering a parallel backend beforehand, e.g., using package `doMC`). Parallel computation will not affect performance of the final ensemble. Note that parallel computation will only reduce computation time for datasets with a (very) large number of observations. For smaller datasets, use of parallel computation may even increase computation time.
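A minimal sketch of such a call (the core count and object name are illustrative; `doMC` relies on forking and is not available on Windows):

```
library("doMC")
registerDoMC(cores = 2L)  # register a parallel backend first
set.seed(42)
airq.ens.par <- pre(Ozone ~ ., data = airq, par.final = TRUE)
```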

The number of cross-validation folds can also be reduced, but this will likely affect predictive performance of the final ensemble:
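The corresponding code chunk is not shown above; a sketch using the `nfolds` argument (the fold count and object name are assumptions; the default is 10) would yield the timing below:

```
set.seed(42)
system.time(airq.ens.cv <- pre(Ozone ~ ., data = airq, nfolds = 5L))
```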

```
## user system elapsed
## 2.67 0.00 2.68
```

```
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 2.559032
## number of terms = 13
## mean cv error (se) = 307.2755 (105.2568)
##
## cv error type : Mean-Squared Error
```

Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984).
*Classification and regression trees*. New York: Routledge. https://doi.org/10.1201/9781315139470

Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive
partitioning: A conditional inference framework. *Journal of
Computational and Graphical Statistics*, *15*(3), 651–674. https://doi.org/10.1198/106186006X133933

Hothorn, T., & Zeileis, A. (2015). partykit: A modular toolkit for recursive partytioning in R. *Journal of Machine Learning Research*, *16*(1), 3905–3909. Retrieved from https://www.jmlr.org/papers/v16/hothorn15a.html

Therneau, T., & Atkinson, B. (2022). *rpart: Recursive partitioning and regression trees*. Retrieved from https://CRAN.R-project.org/package=rpart