
library(micer)


Intro to micer

The goal of this simple R package is to allow for the calculation of map image classification efficacy (MICE) and associated metrics. MICE was originally proposed in the following paper:

Shao, G., Tang, L. and Zhang, H., 2021. Introducing image classification efficacies. IEEE Access, 9, pp.134809-134816.

It was further explored in the following paper:

Tang, L., Shao, J., Pang, S., Wang, Y., Maxwell, A., Hu, X., Gao, Z., Lan, T. and Shao, G., 2024. Bolstering Performance Evaluation of Image Segmentation Models with Efficacy Metrics in the Absence of a Gold Standard. IEEE Transactions on Geoscience and Remote Sensing.

MICE adjusts the overall accuracy relative to a random classification baseline. Only the class proportions from the reference labels are considered, as opposed to the proportions from both the reference labels and the predictions, as is the case for the Kappa statistic. Due to documented issues with the Kappa statistic, its use in remote sensing and thematic map accuracy assessment has been discouraged; MICE offers an alternative. This package calculates MICE along with adjusted versions of the class-level user's (i.e., precision) and producer's (i.e., recall) accuracies and F1-scores. Class-level metrics are aggregated using macro-averaging, in which each class contributes equally. Functions are also provided to estimate confidence intervals (CIs) using bootstrapping and to statistically compare two classification results.
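
To make the adjustment concrete, the short base R sketch below computes an efficacy-style value from a pair of hypothetical label vectors. The chance-agreement term is derived from the squared reference class proportions only. This is an illustration of the idea, not the package implementation; use mice() for actual assessments.

ref  <- c("A", "A", "A", "A", "B", "B", "C", "C") # hypothetical reference labels
pred <- c("A", "A", "A", "B", "B", "B", "C", "A") # hypothetical predicted labels

oa <- mean(ref == pred)                   # overall accuracy
p <- as.numeric(table(ref)) / length(ref) # reference class proportions only
chance <- sum(p^2)                        # random classification baseline
(oa - chance) / (1 - chance)              # accuracy adjusted relative to chance
#> [1] 0.6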

This article demonstrates the functions made available by micer.

Calculate assessment metrics

The metrics calculated depend on whether the problem is framed as a multiclass or binary classification. Multiclass mode must be used when more than two classes are differentiated. When only two classes are differentiated, binary mode should be used if there is a clear positive, presence, or foreground class and a clear negative, absence, or background class; if there is no such distinction, multiclass mode is more meaningful.

The mice() function calculates a set of metrics from vectors, or dataframe columns, of reference and predicted class labels. Alternatively, the miceCM() function calculates the same metrics from a confusion matrix table in which the columns represent the reference labels and the rows represent the predicted labels.

Results are returned as a list object. For a multiclass classification, the following objects are returned:

- Mappings: class names
- confusionMatrix: confusion matrix as a table object
- referenceCounts: number of samples per class in the reference labels
- predictionCounts: number of samples per class in the predictions
- overallAccuracy: overall accuracy
- MICE: map image classification efficacy
- usersAccuracies: class-level user's accuracies (precision)
- CTBICEs: class-level adjusted user's accuracies
- producersAccuracies: class-level producer's accuracies (recall)
- RTBICEs: class-level adjusted producer's accuracies
- f1Scores: class-level F1-scores
- f1Efficacies: class-level adjusted F1-scores
- macroPA, macroRTBUCE, macroUA, macroCTBICE, macroF1, macroF1Efficacy: macro-averages of the class-level metrics, in which each class contributes equally

The mcData.rda data included with the package represents a multiclass problem in which the following classes are differentiated (counts are relative to the reference labels): “Barren” (n=163), “Forest” (n=20,807), “Impervious” (n=426), “Low Vegetation” (n=3,182), “Mixed Dev” (n=520), and “Water” (n=200). There are a total of 25,298 samples. The code example below shows how to derive assessment metrics for these data using both the mice() and miceCM() functions. To perform multiclass assessment, the multiclass argument must be set to TRUE. The mappings parameter allows the user to provide names for each class. If no mappings are provided, the default factor level names are used.

data(mcData)

miceResultMC <- mice(mcData$ref,
                     mcData$pred,
                     mappings=c("Barren", 
                                "Forest", 
                                "Impervious", 
                                "Low Vegetation", 
                                "Mixed Dev", 
                                "Water"),
                     multiclass=TRUE)


# confusion matrix with predicted labels as rows and reference labels as columns
cmMC <- table(mcData$pred, mcData$ref)
miceResultMC <- miceCM(cmMC,
                       mappings=c("Barren", 
                                  "Forest", 
                                  "Impervious", 
                                  "Low Vegetation", 
                                  "Mixed Dev", 
                                  "Water"),
                       multiclass=TRUE)

print(miceResultMC)
#> $Mappings
#> [1] "Barren"         "Forest"         "Impervious"     "Low Vegetation"
#> [5] "Mixed Dev"      "Water"         
#> 
#> $confusionMatrix
#>                 Reference
#> Predicted        Barren Forest Impervious Low Vegetation Mixed Dev Water
#>   Barren             75      7         59             46         1     6
#>   Forest             13  20585         62            617       142    21
#>   Impervious         10      8        196             33        22    12
#>   Low Vegetation     63    138         34           2413        84     1
#>   Mixed Dev           1     64         75             72       270     2
#>   Water               1      5          0              1         1   158
#> 
#> $referenceCounts
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>            163          20807            426           3182            520 
#>          Water 
#>            200 
#> 
#> $predictionCounts
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>            194          21440            281           2733            484 
#>          Water 
#>            166 
#> 
#> $overallAccuracy
#> [1] 0.9367144
#> 
#> $MICE
#> [1] 0.7937788
#> 
#> $usersAccuracies
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>      0.3865979      0.9601213      0.6975089      0.8829125      0.5578512 
#>          Water 
#>      0.9518072 
#> 
#> $CTBICEs
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>      0.3826138      0.7753487      0.6923248      0.8660647      0.5485675 
#>          Water 
#>      0.9514226 
#> 
#> $producersAccuracies
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>      0.4601227      0.9893305      0.4600939      0.7583281      0.5192308 
#>          Water 
#>      0.7900000 
#> 
#> $RTBICEs
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>      0.4566161      0.9398949      0.4508410      0.7235537      0.5091362 
#>          Water 
#>      0.7883244 
#> 
#> $f1Scores
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>      0.4201680      0.9745071      0.5544554      0.8158918      0.5378486 
#>          Water 
#>      0.8633879 
#> 
#> $f1Efficacies
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>      0.4163522      0.8497292      0.5460772      0.7884211      0.5281168 
#>          Water 
#>      0.8622284 
#> 
#> $macroPA
#> [1] 0.662851
#> 
#> $macroRTBUCE
#> [1] 0.6447277
#> 
#> $macroUA
#> [1] 0.7394665
#> 
#> $macroCTBICE
#> [1] 0.7027237
#> 
#> $macroF1
#> [1] 0.6990658
#> 
#> $macroF1Efficacy
#> [1] 0.6724776
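
Since the results are returned as a standard R list, individual components can be extracted by name. As a small illustration, the snippet below pulls out the overall MICE value and gathers the class-level metrics into a dataframe (classMetrics is just an example name; the component names match the printed output above).

miceResultMC$MICE
#> [1] 0.7937788

classMetrics <- data.frame(class=miceResultMC$Mappings,
                           usersAccuracy=miceResultMC$usersAccuracies,
                           producersAccuracy=miceResultMC$producersAccuracies,
                           f1Score=miceResultMC$f1Scores)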


For a binary classification, the following objects are returned within a list object:

- Mappings: class names
- confusionMatrix: confusion matrix as a table object
- referenceCounts: number of samples per class in the reference labels
- predictionCounts: number of samples per class in the predictions
- positiveCase: name of the positive class
- overallAccuracy: overall accuracy
- mice: map image classification efficacy
- Precision and precisionEfficacy: precision and its adjusted version
- NPV and npvEfficacy: negative predictive value and its adjusted version
- Recall and recallEfficacy: recall and its adjusted version
- Specificity and specificityEfficicacy: specificity and its adjusted version
- f1Score and f1ScoreEfficacy: F1-score and its adjusted version

The biData.rda file included with the package represents results for a binary classification in which “Mined” is the positive case and “Not Mined” is the background class. There are 178 samples from the “Mined” class and 4,822 samples from the “Not Mined” class, for a total of 5,000 samples. Class proportions are based on landscape proportions and are relative to the reference labels. The example code below demonstrates calculating binary assessment results using mice() and miceCM(). When performing assessment for a binary classification, the multiclass parameter must be set to FALSE and the index associated with the positive case must be provided. Here, “Mined” has an index of 1 while “Not Mined” has an index of 2.

data(biData)

miceResultBI <- mice(biData$ref,
                     biData$pred,
                     mappings = c("Mined", 
                                  "Not Mined"),
                     multiclass=FALSE,
                     positiveIndex=1)

# confusion matrix for the binary results (predictions as rows)
cmB <- table(biData$pred, biData$ref)
miceResultBI <- miceCM(cmB,
                       mappings=c("Mined", 
                                  "Not Mined"),
                       multiclass=FALSE,
                       positiveIndex=1)

print(miceResultBI)
#> $Mappings
#> [1] "Mined"     "Not Mined"
#> 
#> $confusionMatrix
#>            Reference
#> Predicted   Mined Not Mined
#>   Mined       158         2
#>   Not Mined    20      4820
#> 
#> $referenceCounts
#>     Mined Not Mined 
#>       178      4822 
#> 
#> $predictionCounts
#>     Mined Not Mined 
#>       160      4840 
#> 
#> $positiveCase
#> [1] "Mined"
#> 
#> $overallAccuracy
#> [1] 0.9956
#> 
#> $mice
#> [1] 0.9359024
#> 
#> $Precision
#> [1] 0.9874999
#> 
#> $precisionEfficacy
#> [1] 0.9870384
#> 
#> $NPV
#> [1] 0.9958678
#> 
#> $npvEfficacy
#> [1] 0.8838934
#> 
#> $Recall
#> [1] 0.8876404
#> 
#> $recallEfficacy
#> [1] 0.8834915
#> 
#> $Specificity
#> [1] 0.9995852
#> 
#> $specificityEfficicacy
#> [1] 0.9883459
#> 
#> $f1Score
#> [1] 0.9349112
#> 
#> $f1ScoreEfficacy
#> [1] 0.9323989
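
As a sanity check, the binary metrics follow the standard confusion matrix definitions and can be reproduced by hand from the cells of cmB, matching the printed values up to rounding. Recall that rows hold the predictions, columns hold the reference labels, and the positive case, “Mined”, is at index 1.

tp <- cmB[1, 1] # true positives (158)
fp <- cmB[1, 2] # false positives (2)
fn <- cmB[2, 1] # false negatives (20)
tn <- cmB[2, 2] # true negatives (4820)

tp/(tp + fp) # precision (user's accuracy for the positive class)
tp/(tp + fn) # recall (producer's accuracy for the positive class)
tn/(tn + fp) # specificity
tn/(tn + fn) # negative predictive value (NPV)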


The miceCI() function calculates confidence intervals for all aggregated metrics. This is accomplished by calculating the metrics for a large number of subsets drawn from the entire dataset, with subsets generated using bootstrapping. In the examples below, 1,000 replicates are used, with each replicate including 70% of the available samples. The lowPercentile and highPercentile arguments define the confidence interval to use; here, they are configured for a 95% confidence interval. The result is a dataframe object that includes the mean, median, and lower and upper confidence bounds for all aggregated metrics.

data(mcData)

ciResultsMC <- miceCI(rep=1000,
                      frac=.7,
                      mcData$ref,
                      mcData$pred,
                      lowPercentile=0.025,
                      highPercentile=0.975,
                      mappings=c("Barren", 
                                 "Forest", 
                                 "Impervious", 
                                 "Low Vegetation", 
                                 "Mixed Dev", 
                                 "Water"),
                      multiclass=TRUE)

print(ciResultsMC)
#>            metric      mean    median    low.ci   high.ci
#> 1 overallAccuracy 0.9366679 0.9366989 0.9330284 0.9401434
#> 2            MICE 0.7934428 0.7934450 0.7829382 0.8033249
#> 3         macroPA 0.6625287 0.6624393 0.6394932 0.6848644
#> 4     macroRTBUCE 0.6443700 0.6444234 0.6206823 0.6672057
#> 5         macroUA 0.7391670 0.7390207 0.7198489 0.7587803
#> 6     macroCTBICE 0.7024025 0.7023108 0.6822655 0.7228386
#> 7         macroF1 0.6987213 0.6986863 0.6790950 0.7178079
#> 8 macroF1Efficacy 0.6721069 0.6720102 0.6513987 0.6917839
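
For intuition, the interval construction can be approximated with a bare-bones bootstrap loop, as sketched below. The exact resampling details used internally by miceCI() (for example, whether sampling is performed with replacement) are assumptions here, so treat this as an illustration rather than the package's internal code.

set.seed(42)
n <- nrow(mcData)
miceBoot <- replicate(1000, {
  idx <- sample(n, size=round(0.7*n), replace=TRUE) # replacement assumed
  mice(mcData$ref[idx], mcData$pred[idx], multiclass=TRUE)$MICE
})
quantile(miceBoot, probs=c(0.025, 0.975)) # 95% percentile interval

The same miceCI() workflow applies to the binary classification example: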


data(biData)

ciResultsBi <- miceCI(rep=1000,
                      frac=.7,
                      biData$ref,
                      biData$pred,
                      lowPercentile=0.025,
                      highPercentile=0.975,
                      mappings = c("Mined", 
                                   "Not Mined"),
                      multiclass=FALSE,
                      positiveIndex=1)

print(ciResultsBi)
#>                   metric      mean    median    low.ci   high.ci
#> 1        overallAccuracy 0.9956274 0.9957143 0.9934286 0.9974286
#> 2                   MICE 0.9361483 0.9363576 0.9047585 0.9637775
#> 3              Precision 0.9870046 0.9903845 0.9629629 0.9999999
#> 4      precisionEfficacy 0.9865279 0.9900386 0.9616807 0.9999999
#> 5                    NPV 0.9870046 0.9903845 0.9629629 0.9999999
#> 6            npvEfficacy 0.9865279 0.9900386 0.9616807 0.9999999
#> 7                 Recall 0.8886629 0.8897348 0.8305004 0.9366276
#> 8         recallEfficacy 0.8845593 0.8854809 0.8242343 0.9341545
#> 9            Specificity 0.8886629 0.8897348 0.8305004 0.9366276
#> 10 specificityEfficicacy 0.8845593 0.8854809 0.8242343 0.9341545
#> 11               f1Score 0.9350095 0.9355530 0.9004728 0.9641469
#> 12       f1ScoreEfficacy 0.9324995 0.9330913 0.8969650 0.9627475


Lastly, MICE metrics can be compared between two models. This requires the reference labels and the predictions from two separate models. The comparison is performed with a paired t-test over a large number of bootstrap replicates: for each bootstrap sample, the metric is calculated for both models and the pairwise differences are compared. In the provided example, which makes use of the compareData.rda data provided with micer, the mean difference in MICE between a random forest and a single decision tree model is 0.107, and the two models are suggested to be statistically different at a 95% confidence level.

data(compareData)

set.seed(42)
compareResult <- miceCompare(ref=compareData$ref,
                             result1=compareData$rfPred,
                             result2=compareData$dtPred,
                             reps=1000,
                             frac=.7)

print(compareResult)
#> 
#>  Paired t-test
#> 
#> data:  resultsDF$mice1 and resultsDF$mice2
#> t = 262.52, df = 999, p-value < 2.2e-16
#> alternative hypothesis: true mean difference is not equal to 0
#> 95 percent confidence interval:
#>  0.1063643 0.1079664
#> sample estimates:
#> mean difference 
#>       0.1071654
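
The pairing can be illustrated with a similar sketch: in each replicate, both sets of predictions are scored on the same bootstrap sample, and the per-replicate MICE values are then compared with a paired t-test. The multiclass=TRUE setting and the resampling details are assumptions about compareData, so this mirrors the idea rather than the internal implementation of miceCompare().

set.seed(42)
n <- nrow(compareData)
miceBoth <- replicate(1000, {
  idx <- sample(n, size=round(0.7*n), replace=TRUE) # same sample for both models
  c(mice(compareData$ref[idx], compareData$rfPred[idx], multiclass=TRUE)$MICE,
    mice(compareData$ref[idx], compareData$dtPred[idx], multiclass=TRUE)$MICE)
})
t.test(miceBoth[1, ], miceBoth[2, ], paired=TRUE)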
