Removes unnecessary test of the pROC ROC curve
contents. Thanks @xrobin.
Fixes test for compatibility with pROC 1.19.0.1.
Thanks @xrobin.
plot_confusion_matrix(): Fixes deprecation warning
when enabling add_sums=TRUE without specifying non-default
sums_settings. Thanks @lucasxteixeira.
plot_confusion_matrix():
Adds arrow_color argument for choosing between black
and white arrow icons.
Deprecates font_color argument in
sum_tile_settings() in favor of
font_counts_color and
font_normalized_color.
Adds dynamic_font_colors argument and the associated
dynamic_font_color_settings() helper function for
specifying colors below and above a given value threshold. This, for
instance, allows changing the color of higher numbers when the tile
background color is very dark. It also allows inverting colors of arrow
icons below/above the threshold. The argument is also added to
sum_tile_settings().
Most arguments in font() (and the
font_*_color arguments in sum_tile_settings())
now also accept a function that decides the setting based on the values
(counts, normalized, etc.).
plot_confusion_matrix():
Adds option to set intensities of main tiles by
row_percentages or col_percentages. Sum tiles
cannot be set by these. Idea from Elliot.
Fixes error when the input has been filtered (for asymmetrical
plotting) and add_sums=TRUE and/or
diag_percentages_only=TRUE. Thanks @jwang-lilly for
reporting.
nnet::multinom coefficients after
change in parameters. Thanks @strengejacke for
reporting the issue.
ggnewscale. Thanks @eliocamp for the
PR.
Removes defunct tests after ggplot2 update.
plot_confusion_matrix() now shows total count when
add_normalized=FALSE. Thanks @JianWang2016 for
reporting the issue.
Makes it clearer in the documentation that the
Balanced Accuracy metric in multiclass classification is
the macro-averaged metric, not the average recall metric that
is sometimes used.
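As a rough illustration of the distinction, the following base R sketch computes both quantities from one-vs-all counts. The toy data and variable names are made up for this example, and the code is not taken from the cvms internals.

```r
# Toy targets and predictions for a 3-class problem (made-up data)
targets     <- c("a", "a", "a", "b", "b", "c")
predictions <- c("a", "a", "b", "b", "c", "c")
classes <- sort(unique(targets))

# One-vs-all sensitivity and specificity for each class
per_class <- sapply(classes, function(cl) {
  tp <- sum(targets == cl & predictions == cl)
  fn <- sum(targets == cl & predictions != cl)
  tn <- sum(targets != cl & predictions != cl)
  fp <- sum(targets != cl & predictions == cl)
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
})

# Macro-averaged Balanced Accuracy: average the per-class balanced accuracies
macro_balanced_accuracy <- mean(
  (per_class["sensitivity", ] + per_class["specificity", ]) / 2
)

# Average recall: average the per-class sensitivities only
average_recall <- mean(per_class["sensitivity", ])

print(c(macro_balanced_accuracy = macro_balanced_accuracy,
        average_recall = average_recall))
```

The two numbers differ because the macro-averaged version also takes the one-vs-all specificity of each class into account.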
plot_confusion_matrix():
Breaking: Adds slight 3D tile effect to help separate tiles with the same count. Not tested in all many-class scenarios.
Fixes image sizing (arrows and zero-shading) when there are different numbers of unique classes in targets and predictions.
Fixes bug with class_order argument when there are
different numbers of unique classes in targets and predictions.
plot_confusion_matrix():
NEW: We created a Plot
Confusion Matrix web application! It allows using
plot_confusion_matrix() without code. Select from multiple
design templates or make your own.
For (palette=, sums_settings(palette=)) arguments,
tile color palettes can now be a custom gradient. Simply supply a named
list with hex colors for “low” and “high”
(e.g. list("low"="#B1F9E8", "high"="#239895")).
Adds intensity_by, intensity_lims, and
intensity_beyond_lims arguments to
sum_tile_settings() to allow setting them separately for
sum tiles.
Adds intensity_lims argument which allows setting a
custom range for the tile color intensities. Makes it easier to compare
plots for different prediction sets.
Adds intensity_beyond_lims for specifying how to
handle counts / percentages outside the specified
intensity_lims. Default is to truncate the
intensities.
Fixes bug where arrow size was not taking add_sums
into account.
plot_confusion_matrix():
Adds option to set intensity_by to a log/arcsinh
transformed version of the counts. This adds the options
"log counts", "log2 counts",
"log10 counts", "arcsinh counts" to the
intensity_by argument.
Fixes bug when add_sums = TRUE and
counts_on_top = TRUE.
Raises error for negative counts.
Fixes zero-division when all counts are 0.
Sets palette colors to lowest value when all counts are 0.
In plot_confusion_matrix(), adds
sub_col argument for passing in text to replace the bottom
text (counts by default).
In plot_confusion_matrix(), fixes direction of
arrows when class_order is specified.
In update_hyperparameters(), allows
hyperparameters argument to be NULL. Thanks @ggrothendieck
for reporting the issue.
In relevant contexts: Informs the user once about the
positive argument in evaluate() and
cross_validate*() not affecting the interpretation of
probabilities. I, myself, had forgotten about this in a project, so
it seems useful to remind us all about it :-)
Fixes usage of the "all" name in
set_metrics() after purrr v1.0.0
update.
Makes testing conditional on the availability of
xpectr.
Fixes tidyselect-related warnings.
parameters 0.19.0. Thanks to @strengejacke.
Fixes tests for CRAN.
Adds merDeriv as suggested package.
parameters 0.15.0. Thanks to @strengejacke.
checkmate 2.1.0.
ggplot2 functions. Now
compatible with ggplot2 3.3.4.
In order to reduce dependencies, model coefficients are now
tidied with the parameters package instead of
broom and broom.mixed. Thanks to @IndrajeetPatil for
the contributions.
In cross_validate() and
cross_validate_fn(), fold columns can now have a varying
number of folds in repeated cross-validation. Struggling to choose a
number of folds? Average over multiple settings.
In the Class Level Results in multinomial
evaluations, the nested Confusion Matrix and
Results tibbles are now named with their class to ease
extraction and further work with these tibbles. The Results
tibble further gets a Class column. This information might
be redundant, but could make life easier.
Adds vignette:
Multiple-k: Picking the number of folds for cross-validation.
plot_confusion_matrix(), where tiles with
a count > 0 but a rounded percentage of 0 did not have the percentage
text. Only tiles with a count of 0 should now be without text.
Breaking change: In plot_confusion_matrix(), the
targets_col and predictions_col arguments have
been renamed to target_col and prediction_col
to be consistent with evaluate().
Breaking change: In evaluate_residuals(), the
targets_col and predictions_col arguments have
been renamed to target_col and prediction_col
to be consistent with evaluate().
Breaking change: In
process_info_gaussian/binomial/multinomial(), the
targets_col argument has been renamed to
target_col to be consistent with
evaluate().
In binomial most_challenging(), the
probabilities are now properly of the second class
alphabetically.
In plot_confusion_matrix(), adds argument
class_order for manually setting the order of the classes
in the facets.
In plot_confusion_matrix(), tiles with a count of
0 no longer have text in the tile by default. This adds the
rm_zero_percentages (for column/row percentage) and
rm_zero_text (for counts and normalized)
arguments.
In plot_confusion_matrix(), adds optional sum tiles.
Enabling this (add_sums = TRUE) adds an extra column and an
extra row with the sums. The corner tile contains the total count. This
adds the add_sums and sums_settings arguments.
A sum_tile_settings() function has been added to control
the appearance of these tiles. Thanks to @MaraAlexeev for
the idea.
In plot_confusion_matrix(), adds option
(intensity_by) to set the color intensity of the tiles to
the overall percentages (normalized).
In plot_confusion_matrix(), adds option to only have
row and column percentages in the diagonal tiles. Thanks to @xgirouxb for the
idea.
Adds Process information to the output with the settings
used. Adds transparency. It has a custom print method, making it easy to
read. Underneath, it is a list, so all information is available using
$ or similar. In most cases, the Family
information has been moved into the Process object. Thanks
to @daviddalpiaz for
notifying me of the need for more transparency.
In outputs, the Family information is (in most
cases) moved into the new Process object.
In binomial evaluate() and
baseline(), Accuracy is now enabled by
default. It is still disabled in cross_validate*()
functions to guide users away from using it as the main criterion for
model selection (it is well known to many but can be quite misleading in
cases with imbalanced datasets).
Fixes: In binomial evaluation, the probabilities are now properly of the second class alphabetically. When the target column was a factor where the levels were not in alphabetical order, the second level in that order was used. The levels are now sorted before extraction. Thanks to @daviddalpiaz for finding the bug.
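The difference between the two behaviors can be seen in a small base R sketch. The toy factor below is an assumption for illustration; it only mimics the level lookup and is not the cvms code itself.

```r
# Toy target factor whose levels are NOT in alphabetical order (made-up data)
targets <- factor(c("dog", "cat", "dog", "cat"), levels = c("dog", "cat"))

# Second level in the stored level order (what the old behavior relied on)
second_in_level_order <- levels(targets)[2]

# Second class after sorting the levels alphabetically (the fixed behavior)
second_alphabetically <- sort(levels(targets))[2]

print(c(level_order = second_in_level_order,
        alphabetical = second_alphabetically))
```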
Fixes: In grouped multinomial evaluation, when predictions are classes and there are different sets of classes per group, only the classes in the subset are used.
Fixes: Bug where the ROC direction parameter was set
incorrectly when positive is numeric. In regression tests, the
AUC scores were not impacted.
Fixes: 2-class multinomial evaluation returns all
expected metrics.
In multinomial evaluation, the Class Level Results
are sorted by the Class.
Imports broom.mixed to allow tidying of coefficients
from lme4::lmer models.
Exports process_info_binomial(),
process_info_multinomial(),
process_info_gaussian() constructors to ensure the various
methods are available. They are not necessarily intended for external
use.
dplyr version 1.0.0.
NOTE: this version of dplyr slows down some functions in
cvms significantly, so it might be beneficial not to
update before version 1.1.0, which is supposed to tackle
this problem.
rsvg and ggimage are now only
suggested and plot_confusion_matrix() throws a
warning if either is not installed.
Additional input checks for evaluate().
In cross_validate() and validate(), the
models argument is renamed to formulas. This
is a more meaningful name that was recently introduced in
cross_validate_fn(). For now, the models
argument is deprecated, will be used instead of formulas if
specified, and will throw a warning.
In cross_validate() and validate(), the
model_verbose argument is renamed to verbose.
This is a more meaningful name that was recently introduced in
cross_validate_fn(). For now, the
model_verbose argument is deprecated, will be used instead
of verbose if specified, and will throw a warning.
In cross_validate() and validate(), the
link argument is removed. Consider using
cross_validate_fn() or validate_fn() instead,
where you have full control over the prediction type fed to the
evaluation.
In cross_validate_fn(), the
predict_type argument is removed. You now have to pass a
predict function as that is safer and more transparent.
In functions with family/type argument,
this argument no longer has a default, forcing the user to specify the
family/type of the task. This also means that arguments have been
reordered. In general, it is safer to name arguments when passing values
to them.
In evaluate(), apply_softmax now
defaults to FALSE. Throws error if probabilities do not add
up to 1 row-wise (tolerance of 5 decimals) when type is
multinomial.
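A minimal base R sketch of what such a row-wise check and a manual softmax could look like. Reading the tolerance as rounding to 5 decimals is an assumption, and both helper names are hypothetical rather than cvms functions.

```r
# Hypothetical helper: do the values in each row sum to 1
# when rounded to 5 decimals? (assumed reading of the tolerance)
rows_sum_to_one <- function(probabilities, digits = 5) {
  all(round(rowSums(probabilities), digits) == 1)
}

# Hypothetical helper: numerically stable softmax applied to each row
softmax_rows <- function(scores) {
  t(apply(scores, 1, function(x) {
    e <- exp(x - max(x))
    e / sum(e)
  }))
}

# Raw scores that are not probabilities yet (made-up data)
scores <- matrix(c(2, 1, 0,
                   0, 0, 3), nrow = 2, byrow = TRUE)

raw_scores_pass <- rows_sum_to_one(scores)

# Apply softmax manually before evaluating
probabilities <- softmax_rows(scores)
probabilities_pass <- rows_sum_to_one(probabilities)

print(c(raw_scores_pass = raw_scores_pass,
        probabilities_pass = probabilities_pass))
```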
multinomial MCC is now the proper
multiclass generalization. Previous versions used
macro MCC. Removes MCC from the class level
results. Removes the option to enable
Weighted MCC.
multinomial AUC is calculated with
pROC::multiclass.roc() instead of in the one-vs-all
evaluations. This removes AUC, Lower CI, and
Upper CI from the Class Level Results and
removes Lower CI and Upper CI from the main
output tibble. Also removes option to enable “Weighted AUC”, “Weighted
Lower CI”, and “Weighted Upper CI”.
multinomial AUC is disabled by default,
as it can take a long time to calculate for a large set of
classes.
ROC columns now return the ROC objects
instead of the extracted sensitivities and
specificities, both of which can be extracted from the
objects.
In evaluate(), it’s no longer possible to pass model
objects. It now only evaluates the predictions. This removes the
AIC, AICc, BIC, r2m,
and r2c metrics.
In cross_validate() and validate(), the
r2m and r2c metrics are now disabled by
default in gaussian. The r-squared metrics are
non-predictive and should not be used for model selection. They can be
enabled with
metrics = list("r2m" = TRUE, "r2c" = TRUE).
In cross_validate_fn(), the AIC,
AICc, BIC, r2m, and
r2c metrics are now disabled by default in
gaussian. Only some model types will allow the computation
of those metrics, and it is preferable that the user actively makes a
choice to include them.
In baseline(), the AIC,
AICc, BIC, r2m, and
r2c metrics are now disabled by default in
gaussian. It can be unclear whether the IC metrics
(computed on the lm()/lmer() model objects)
can be compared to those calculated for a given other model function. To
avoid such confusion, it is preferable that the user actively makes a
choice to include the metrics. The r-squared metrics will only be
non-zero when random effects are passed. Given that we shouldn’t use the
r-squared metrics for model selection, it makes sense to not have them
enabled by default.
validate() now returns a tibble with the model
objects nested in the Model column. Previously, it returned
a list with the results and models. This allows for easier use in
magrittr pipelines (%>%).
In multinomial baseline(), the aggregation approach
is changed. The summarized results now properly describe the random
evaluations tibble, except for the four new measures
CL_Max, CL_Min, CL_NAs, and
CL_INFs, which describe the class level results.
Previously, NAs were removed before aggregating the
one-vs-all evaluations, meaning that some metric summaries could become
inflated if small classes had NAs. It was also
non-transparent that the NAs and INFs were
counted in the class level results instead of being a count of random
evaluations with NAs or INFs.
cv_plot() is removed. It wasn’t very useful and has
never been developed properly. We aim to provide specialized plotting
functions instead.
validate_fn() is added. Validate your custom model
function on a test set.
confusion_matrix() is added. Create a confusion
matrix and calculate associated metrics from your targets and
predictions.
evaluate_residuals() is added. Calculate common
metrics from regression residuals.
summarize_metrics() is added. Use it to summarize the
numeric columns in your dataset with a set of common descriptors. Counts
the NAs and Infs. Used by
baseline().
select_definitions() is added. Select the columns
that define the models, such as Dependent,
Fixed, Random, and the (unnested)
hyperparameters.
model_functions() is added. Contains simple
model_fn examples that can be used in
cross_validate_fn() and validate_fn() or as
starting points.
predict_functions() is added. Contains simple
predict_fn examples that can be used in
cross_validate_fn() and validate_fn() or as
starting points.
preprocess_functions() is added. Contains simple
preprocess_fn examples that can be used in
cross_validate_fn() and validate_fn() or as
starting points.
update_hyperparameters() is added. For managing
hyperparameters when writing custom model functions.
most_challenging() is added. Finds the data points
that were the most difficult to predict.
plot_confusion_matrix() is added. Creates a
ggplot representing a given confusion matrix. Thanks to
Malte Lau Petersen (@maltelau), Maris Sala (@marissala) and Kenneth
Enevoldsen (@KennethEnevoldsen) for
feedback.
plot_metric_density() is added. Creates a ggplot
density plot for a metric column.
font() is added. Utility for setting font settings
(size, color, etc.) in plotting functions.
simplify_formula() is added. Converts a formula with
inline functions to a simple formula where all variables are added
together (e.g. y ~ x*z + log(a) + (1|b) ->
y ~ x + z + a + b). This is useful when passing a formula
to recipes::recipe(), which doesn’t allow the inline
functions.
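The idea can be approximated in base R with all.vars() and reformulate(). This is only a sketch of the concept under that assumption, not how simplify_formula() is implemented, and it ignores edge cases such as formulas without a left-hand side.

```r
# Formula with an interaction, an inline function, and a random effect term
f <- y ~ x * z + log(a) + (1 | b)

# all.vars() returns every variable name, with the response first
vars <- all.vars(f)

# Rebuild a formula where all predictors are simply added together
simple_f <- reformulate(termlabels = vars[-1], response = vars[1])

print(simple_f)
```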
gaussian_metrics(), binomial_metrics(),
and multinomial_metrics() are added. Can be used to select
metrics for the metrics argument in many cvms
functions.
baseline_gaussian(),
baseline_binomial(), baseline_multinomial()
are added. Simple wrappers for baseline() that are easier
to use and have simpler help files. baseline() has a lot of
arguments that are specific to a family, which can be a bit
confusing.
wines dataset is added. Contains a list of wine
varieties in an approximately Zipfian distribution.
musicians dataset is added. This has been
generated for multiclass classification
examples.
predicted.musicians dataset is added. This contains
cross-validated predictions of the musicians dataset by
three algorithms. Can be used to demonstrate working with predictions
from repeated 5-fold stratified cross-validation.
Adds NRMSE(RNG), NRMSE(IQR),
NRMSE(STD), NRMSE(AVG) metrics to
gaussian evaluations. The RMSE is normalized
by either target range (RNG), target interquartile range (IQR), target
standard deviation (STD), or target mean (AVG). Only
NRMSE(IQR) is enabled by default.
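For intuition, the four normalizations can be written out in base R. The numbers are toy values, and details such as the quantile type behind IQR() or the denominator used by sd() are assumptions that may differ from the cvms implementation.

```r
# Toy regression targets and predictions (made-up data)
targets     <- c(3, 5, 8, 10, 14)
predictions <- c(2.5, 5.5, 7, 11, 13)

# Root mean square error
rmse <- sqrt(mean((targets - predictions)^2))

# Normalize the RMSE by four different summaries of the targets
nrmse <- c(
  "NRMSE(RNG)" = rmse / diff(range(targets)), # target range
  "NRMSE(IQR)" = rmse / IQR(targets),         # target interquartile range
  "NRMSE(STD)" = rmse / sd(targets),          # target standard deviation
  "NRMSE(AVG)" = rmse / mean(targets)         # target mean
)

print(round(nrmse, 4))
```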
Adds RMSLE, RAE, RSE,
RRSE, MALE, MAPE,
MSE, TAE and TSE metrics to
gaussian evaluations. RMSLE, RAE,
and RRSE are enabled by default.
Adds Information Criterion metrics (AIC,
AICc, BIC) to the binomial and
multinomial output of some functions (disabled by default).
These are based on the fitted model objects and will only work for some
types of models.
Adds Positive Class column to binomial
evaluations.
Adds optional hyperparameter argument to
cross_validate_fn(). Pass a list of hyperparameters and
every combination of these will be cross-validated.
Adds optional preprocess_fn argument to
cross_validate_fn(). This can, for instance, be used to
standardize the training and test sets within the function. E.g., by
extracting the scaling and centering parameters from the training set
and applying them to both the training set and the test fold.
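A base R sketch of that standardization step is shown here. The function name, its argument names, and the shape of the returned list are assumptions for illustration; check the cross_validate_fn() documentation for the exact interface that preprocess_fn must follow.

```r
# Hypothetical preprocessing function: standardizes numeric predictors
# using the mean and sd of the TRAINING set only
standardize_fn <- function(train_data, test_data, formula, hyperparameters) {
  # Leave the dependent variable untouched
  response <- all.vars(stats::as.formula(formula))[1]
  is_num <- vapply(train_data, is.numeric, logical(1))
  num_cols <- setdiff(names(train_data)[is_num], response)

  # Centering and scaling parameters come from the training set
  means <- vapply(train_data[num_cols], mean, numeric(1))
  sds <- vapply(train_data[num_cols], sd, numeric(1))

  # Apply the same parameters to both the training set and the test fold
  for (col in num_cols) {
    train_data[[col]] <- (train_data[[col]] - means[[col]]) / sds[[col]]
    test_data[[col]] <- (test_data[[col]] - means[[col]]) / sds[[col]]
  }

  list(train = train_data, test = test_data,
       parameters = list(mean = means, sd = sds))
}

# Toy split (made-up data)
train <- data.frame(y = c(1, 2, 3, 4), x = c(10, 20, 30, 40))
test <- data.frame(y = 5, x = 50)
out <- standardize_fn(train, test, y ~ x, hyperparameters = NULL)
print(out$parameters)
```

Using the training parameters for the test fold avoids leaking information from the test data into the preprocessing.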
Adds Preprocess column to output when
preprocess_fn is passed. Contains returned parameters
(e.g. mean, sd) used in preprocessing.
Adds preprocess_once argument to
cross_validate_fn(). When preprocessing does not depend on
the current formula or hyperparameters, we might as well perform it on
each train/test split once, instead of for every model.
Adds metrics argument to baseline().
Enable the non-default metrics you want a baseline evaluation
for.
Adds preprocessing argument to
cross_validate() and validate(). Currently
allows “standardize”, “scale”, “center”, and “range”. Results will
likely not be affected noticeably by the preprocessing.
Adds add_targets and
add_predicted_classes arguments to
multiclass_probability_tibble().
Adds Observation column in the nested predictions
tibble in cross_validate(),
cross_validate_fn(), validate(), and
validate_fn(). These indices can be used to identify which
observations are difficult to predict.
Adds SD column in the nested predictions tibble in
evaluate() when performing ID aggregated evaluation with
id_method = 'mean'. This is the standard deviation of the
predictions for the ID.
Adds vignette:
Cross-validating custom model functions with cvms
Adds vignette:
Creating a confusion matrix with cvms
Adds vignette:
The available metrics in cvms
Adds vignette: Evaluate by ID/group
The metrics argument now allows setting a boolean
for "all" inside the list to enable or disable all the
metrics. For instance, the following would disable all the metrics
except RMSE:
metrics = list("all" = FALSE, "RMSE" = TRUE).
multinomial evaluation results now contain the
Results tibble with the results for each fold column. The
main metrics are now averages of these fold column results. Previously,
they were not aggregated by fold column first. In the unit tests, this
has not altered the results, but it is a more correct approach.
The prediction column(s) in evaluate() must be
either numeric or character, depending on the format chosen.
In binomial evaluate(), it’s now
possible to pass predicted classes instead of probabilities.
Probabilities still carry more information though. Both the prediction
and target columns must have type character in this format.
Changes the required arguments in the predict_fn
function passed to cross_validate_fn().
Changes the required arguments in the model_fn
function passed to cross_validate_fn().
Warnings and messages from preprocess_fn are caught
and added to Warnings and Messages. Warnings are counted in
Other Warnings.
Nesting is now done with dplyr::group_nest instead
of tidyr::nest_legacy for speed improvements.
caret, mltools, and
ModelMetrics are no longer dependencies. The confusion
matrix metrics have instead been implemented in cvms (see
confusion_matrix()).
select_metrics() now works with a wider range of
inputs as it no longer depends on a Family column.
The Fixed column in some of the output tibbles has
been moved to make it clearer which model was evaluated.
Better handling of inline functions in formulas.
evaluate(), when used on a grouped data
frame. The row order in the output was not guaranteed to fit the
grouping keys.
Fixes documentation in cross_validate_fn(). The
examples section contained an unreasonable number of mistakes
:-)
In cross_validate_fn(), warnings and messages from
the predict function are now included in
Warnings and Messages. The warnings are counted in
Other Warnings.
Breaking change: In evaluate(), when
type is multinomial, the output is now a
single tibble. The Class Level Results are included as a
nested tibble.
Breaking change: In baseline(), lmer
models are now fitted with REML = FALSE by
default.
Adds REML argument to
baseline().
cross_validate_fn() is added. Cross-validate custom
model functions.
Bug fix: the control argument in
cross_validate() was not being used. Now it is.
In cross_validate(), the model is no longer fitted
twice when a warning is thrown during fitting.
Adds metrics argument to
cross_validate() and validate(). Allows
enabling the regular Accuracy metric in
binomial or to disable metrics (will currently still be
computed but not included in the output).
AICc is now computed with the MuMIn
package instead of the AICcmodavg package, which is no
longer a dependency.
Adds lifecycle badges to the function
documentation.
evaluate() is added. Evaluate your model’s
predictions with the same metrics as used in
cross_validate().
Adds 'multinomial' family/type to
baseline() and evaluate().
Adds multiclass_probability_tibble() for generating
a random probability tibble.
Adds random_effects argument to
baseline() for adding random effects to the Gaussian
baseline model.
Adds Zenodo DOI for easier citation.
In nested confusion matrices, the Reference column is renamed to Target, to use the same naming scheme as in the nested predictions.
Bug fix: p-values are correctly added to the nested coefficients tibble. Adds tests of this table as well.
Adds extra unit tests to increase code coverage.
When argument "model_verbose" is TRUE,
the used model function is now messaged instead of printed.
Adds badges to README, including travis-ci status, AppVeyor status, Codecov, min. required R version, CRAN version and monthly CRAN downloads. Note: Zenodo badge will be added post release.
R v. 3.5
Adds optional parallelization.
Results now contain a count of singular fit messages. See
?lme4::isSingular for more information.
Argument "positive" changes default value to 2. Now
takes either 1 or 2 (previously 0 and 1). If your dependent variable has
values 0 and 1, 1 is now the positive class by default.
AUC calculation has changed. Now explicitly sets the direction in
pROC::roc.
Unit tests have been updated for the new random sampling
generator in R 3.6.0. They will NOT run on previous versions
of R.
Adds baseline() for creating baseline
evaluations.
Adds reconstruct_formulas() for reconstructing
formulas based on model definition columns in the results
tibble.
Adds combine_predictors() for generating model
formulas from a set of fixed effects.
Adds select_metrics() for quickly selecting the
metrics and model definition columns.
Breaking change: Metrics have been rearranged and a few metrics have been added.
Breaking change: Renamed argument folds_col to
fold_cols to better fit the new repeated cross-validation
option.
New: repeated cross-validation.
Created package :)