% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rf_evaluate.R
\name{rf_evaluate}
\alias{rf_evaluate}
\title{Evaluates random forest models with spatial cross-validation}
\usage{
rf_evaluate(
  model = NULL,
  xy = NULL,
  repetitions = 30,
  training.fraction = 0.75,
  metrics = c("r.squared", "pseudo.r.squared", "rmse", "nrmse", "auc"),
  distance.step = NULL,
  distance.step.x = NULL,
  distance.step.y = NULL,
  grow.testing.folds = FALSE,
  seed = 1,
  verbose = TRUE,
  n.cores = parallel::detectCores() - 1,
  cluster = NULL
)
}
\arguments{
\item{model}{Model fitted with \code{\link[=rf]{rf()}}, \code{\link[=rf_repeat]{rf_repeat()}}, or \code{\link[=rf_spatial]{rf_spatial()}}.}

\item{xy}{Data frame or matrix with two columns named "x" and "y" containing the coordinates of the records. Required: if \code{NULL}, the function throws an error. Default: \code{NULL}}

\item{repetitions}{Integer, number of spatial folds to use during cross-validation. Must be lower than the total number of rows available in the model's data. Default: \code{30}}

\item{training.fraction}{Number between 0.5 and 0.9 indicating the proportion of records used as the training set during spatial cross-validation. Default: \code{0.75}}

\item{metrics}{Character vector, names of the performance metrics selected. The possible values are: "r.squared" (\code{cor(obs, pred) ^ 2}), "pseudo.r.squared" (\code{cor(obs, pred)}), "rmse" (\code{sqrt(sum((obs - pred)^2)/length(obs))}), "nrmse" (\code{rmse/(quantile(obs, 0.75) - quantile(obs, 0.25))}), and "auc" (only for binary responses with values 1 and 0). Default: \code{c("r.squared", "pseudo.r.squared", "rmse", "nrmse", "auc")}}

\item{distance.step}{Numeric, argument \code{distance.step} of \code{\link[=thinning_til_n]{thinning_til_n()}}. Distance step used during the selection of the centers of the training folds. These fold centers are selected by thinning the data until the number of remaining records is equal to or lower than \code{repetitions}. Default: \code{NULL} (1/1000th the maximum distance between records in \code{xy}). Reduce it if the number of training folds is lower than expected.}

\item{distance.step.x}{Numeric, argument \code{distance.step.x} of \code{\link[=make_spatial_folds]{make_spatial_folds()}}. Distance step used during the growth in the x axis of the buffers defining the training folds. Default: \code{NULL} (1/1000th the range of the x coordinates).}

\item{distance.step.y}{Numeric, argument \code{distance.step.y} of \code{\link[=make_spatial_folds]{make_spatial_folds()}}. Distance step used during the growth in the y axis of the buffers defining the training folds. Default: \code{NULL} (1/1000th the range of the y coordinates).}

\item{grow.testing.folds}{Logical. By default, this function grows contiguous training folds to keep the spatial structure of the data as intact as possible. However, when setting \code{grow.testing.folds = TRUE}, the argument \code{training.fraction} is set to \code{1 - training.fraction}, and the training and testing folds are switched. This option might be useful when the training data has a spatial structure that does not match well with the default behavior of the function. Default: \code{FALSE}}

\item{seed}{Integer, random seed to facilitate reproducibility. If set to a given number, the results of the function are always the same. Default: \code{1}.}

\item{verbose}{Logical. If \code{TRUE}, messages and plots generated during the execution of the function are displayed. Default: \code{TRUE}}

\item{n.cores}{Integer, number of cores to use for parallel execution. Creates a socket cluster with \code{parallel::makeCluster()}, runs operations in parallel with \code{foreach} and \verb{\%dopar\%}, and stops the cluster with \code{parallel::stopCluster()} when the job is done. Default: \code{parallel::detectCores() - 1}}

\item{cluster}{A cluster definition generated with \code{parallel::makeCluster()}. If provided, overrides \code{n.cores}. When \code{cluster = NULL} (default value), and \code{model} is provided, the cluster in \code{model}, if any, is used instead. If this cluster is \code{NULL}, then the function uses \code{n.cores} instead. The function does not stop a provided cluster, so it should be stopped with \code{parallel::stopCluster()} afterwards. The cluster definition is stored in the output list under the name "cluster" so it can be passed to other functions via the \code{model} argument, or using the \verb{\%>\%} pipe. Default: \code{NULL}}
}
\value{
A model of the class "rf_evaluate" with a new slot named "evaluation", which is a list with the following slots:
\itemize{
\item \code{training.fraction}: Value of the argument \code{training.fraction}.
\item \code{spatial.folds}: Result of applying \code{\link[=make_spatial_folds]{make_spatial_folds()}} on the data coordinates. It is a list with one slot per repetition. Each of these slots contains two slots named "training" and "testing", which hold the indices of the cases used to train and test the model, respectively.
\item \code{per.fold}: Data frame with the evaluation results per spatial fold (or repetition). It contains the ID of each fold, its central coordinates, the number of training and testing cases, and the training and testing performance measures: R squared, pseudo R squared (cor(observed, predicted)), rmse, and normalized rmse.
\item \code{per.model}: Same data as above, but organized per fold and model ("Training", "Testing", and "Full").
\item \code{aggregated}: Same data, but aggregated by model and performance measure.
}
}
\description{
Evaluates the performance of a random forest model on unseen data over independent spatial folds.
}
\details{
The evaluation algorithm works as follows. The number of \code{repetitions} and the input dataset (stored in \code{model$ranger.arguments$data}) are passed to \code{\link[=thinning_til_n]{thinning_til_n()}}, which applies \code{\link[=thinning]{thinning()}} to the input data until only as many records as \code{repetitions} remain, as far apart from each other as possible. Each remaining record is used as a "fold center". A spatial fold then grows around each center until the proportion of records within it is equal (or close) to \code{training.fraction}. The indices of the records within the grown fold are stored as "training" in the output list, and the indices of the remaining records as "testing". Then, for each spatial fold, a "training model" is fitted on the training records and used to predict over the testing records. These predictions on "unseen" data are compared with the observations to compute the performance measures (R squared, pseudo R squared, RMSE, and NRMSE).
}
\examples{

if(interactive()){

data(
  plants_rf,
  plants_xy
)

plants_rf <- rf_evaluate(
  model = plants_rf,
  xy = plants_xy,
  repetitions = 5,
  n.cores = 1
)

plot_evaluation(plants_rf, notch = FALSE)

print_evaluation(plants_rf)

get_evaluation(plants_rf)
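
# Hedged sketch (not the package's internal code): the formulas listed
# for the `metrics` argument, computed with base R on toy vectors.
# `obs` and `pred` are hypothetical values made up for illustration.
obs <- c(1, 2, 3, 4, 5)
pred <- c(1.1, 1.9, 3.2, 3.8, 5.3)

pseudo.r.squared <- cor(obs, pred)
r.squared <- pseudo.r.squared^2
rmse <- sqrt(sum((obs - pred)^2) / length(obs))
nrmse <- rmse / (quantile(obs, 0.75) - quantile(obs, 0.25))

# all four metrics in a single named vector
round(
  c(
    r.squared = r.squared,
    pseudo.r.squared = pseudo.r.squared,
    rmse = rmse,
    nrmse = unname(nrmse)
  ),
  3
)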

}
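
# Hedged sketch of the fold-growing idea described in Details. This is a
# simplified illustration with simulated coordinates, not the code used
# by make_spatial_folds(): `toy_xy`, `center`, and the loop below are
# assumptions made for this example only.
set.seed(1)
toy_xy <- data.frame(x = runif(100), y = runif(100))
training.fraction <- 0.75

# distance steps: 1/1000th of the range of each coordinate
step.x <- diff(range(toy_xy$x)) / 1000
step.y <- diff(range(toy_xy$y)) / 1000

# hypothetical fold center: the first record
center <- toy_xy[1, ]

# grow a rectangular buffer around the center until it holds at least
# the requested fraction of records
half.x <- 0
half.y <- 0
in.buffer <- rep(FALSE, nrow(toy_xy))
while (mean(in.buffer) < training.fraction) {
  half.x <- half.x + step.x
  half.y <- half.y + step.y
  in.buffer <- abs(toy_xy$x - center$x) <= half.x &
    abs(toy_xy$y - center$y) <= half.y
}

# records inside the buffer are "training", the rest are "testing"
toy_fold <- list(
  training = which(in.buffer),
  testing = which(!in.buffer)
)

# fraction of records that ended up in the training fold
length(toy_fold$training) / nrow(toy_xy)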

}
\seealso{
Other model_workflow: 
\code{\link{rf_compare}()},
\code{\link{rf_importance}()},
\code{\link{rf_repeat}()},
\code{\link{rf_tuning}()}
}
\concept{model_workflow}
