
library(micer)


Intro to micer

The goal of this simple R package is to allow for the calculation of map image classification efficacy (MICE) and associated metrics. MICE was originally proposed in the following paper:

Shao, G., Tang, L. and Zhang, H., 2021. Introducing image classification efficacies. IEEE Access, 9, pp.134809-134816.

It was further explored in the following paper:

Tang, L., Shao, J., Pang, S., Wang, Y., Maxwell, A., Hu, X., Gao, Z., Lan, T. and Shao, G., 2024. Bolstering Performance Evaluation of Image Segmentation Models with Efficacy Metrics in the Absence of a Gold Standard. IEEE Transactions on Geoscience and Remote Sensing.

MICE adjusts the overall accuracy relative to a random classification baseline. Only the class proportions from the reference labels are considered, as opposed to the proportions from both the reference labels and the predictions, as is the case for the Kappa statistic. Due to documented issues with the Kappa statistic, its use in remote sensing and thematic map accuracy assessment has been discouraged; MICE offers an alternative. This package calculates MICE along with adjusted versions of the class-level user's (i.e., precision) and producer's (i.e., recall) accuracies and F1-scores. Class-level metrics are aggregated using macro-averaging, in which each class contributes equally. Functions are also provided to estimate confidence intervals (CIs) using bootstrapping and to statistically compare two classification results.
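
To make the adjustment concrete, the short base R sketch below computes an efficacy-style value from a pair of hypothetical label vectors. The chance-agreement term is derived from the squared reference class proportions only. This is an illustration of the idea, not the package implementation; use mice() for actual assessments.

ref  <- c("A", "A", "A", "A", "B", "B", "C", "C") # hypothetical reference labels
pred <- c("A", "A", "A", "B", "B", "B", "C", "A") # hypothetical predicted labels

oa <- mean(ref == pred)                   # overall accuracy
p <- as.numeric(table(ref)) / length(ref) # reference class proportions only
chance <- sum(p^2)                        # random classification baseline
(oa - chance) / (1 - chance)              # accuracy adjusted relative to chance
#> [1] 0.6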

This article demonstrates the functions made available by micer.

Calculate assessment metrics

The metrics calculated depend on whether the problem is framed as a multiclass or binary classification. Multiclass mode must be used when more than two classes are differentiated. When only two classes are differentiated, binary mode should be used if there is a clear positive, presence, or foreground class and a clear negative, absence, or background class; if there is no such distinction, multiclass mode is more meaningful.

The mice() function calculates a set of metrics from vectors, or dataframe columns, of reference and predicted class labels. Alternatively, the miceCM() function calculates the same metrics from a confusion matrix table in which the columns represent the reference labels and the rows represent the predicted labels.

Results are returned as a list object. For a multiclass classification, the following objects are returned:

- Mappings: class names
- confusionMatrix: confusion matrix as a table object
- referenceCounts: number of samples per class in the reference labels
- predictionCounts: number of samples per class in the predictions
- overallAccuracy: overall accuracy
- MICE: map image classification efficacy
- usersAccuracies: class-level user's accuracies (precision)
- CTBICEs: class-level adjusted user's accuracies
- producersAccuracies: class-level producer's accuracies (recall)
- RTBICEs: class-level adjusted producer's accuracies
- f1Scores: class-level F1-scores
- f1Efficacies: class-level adjusted F1-scores
- macroPA, macroRTBUCE, macroUA, macroCTBICE, macroF1, macroF1Efficacy: macro-averages of the class-level metrics, in which each class contributes equally

The mcData.rda data included with the package represents a multiclass problem in which the following classes are differentiated (counts are relative to the reference labels): “Barren” (n=163), “Forest” (n=20,807), “Impervious” (n=426), “Low Vegetation” (n=3,182), “Mixed Dev” (n=520), and “Water” (n=200). There are a total of 25,298 samples. The code example below shows how to derive assessment metrics for these data using both the mice() and miceCM() functions. To perform multiclass assessment, the multiclass argument must be set to TRUE. The mappings parameter allows the user to provide names for each class. If no mappings are provided, the default factor level names are used.

data(mcData)

miceResultMC <- mice(mcData$ref,
                     mcData$pred,
                     mappings=c("Barren", 
                                "Forest", 
                                "Impervious", 
                                "Low Vegetation", 
                                "Mixed Dev", 
                                "Water"),
                     multiclass=TRUE)


# confusion matrix with predicted labels as rows and reference labels as columns
cmMC <- table(mcData$pred, mcData$ref)
miceResultMC <- miceCM(cmMC,
                       mappings=c("Barren", 
                                  "Forest", 
                                  "Impervious", 
                                  "Low Vegetation", 
                                  "Mixed Dev", 
                                  "Water"),
                       multiclass=TRUE)

print(miceResultMC)
#> $Mappings
#> [1] "Barren"         "Forest"         "Impervious"     "Low Vegetation"
#> [5] "Mixed Dev"      "Water"         
#> 
#> $confusionMatrix
#>                 Reference
#> Predicted        Barren Forest Impervious Low Vegetation Mixed Dev Water
#>   Barren             75      7         59             46         1     6
#>   Forest             13  20585         62            617       142    21
#>   Impervious         10      8        196             33        22    12
#>   Low Vegetation     63    138         34           2413        84     1
#>   Mixed Dev           1     64         75             72       270     2
#>   Water               1      5          0              1         1   158
#> 
#> $referenceCounts
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>            163          20807            426           3182            520 
#>          Water 
#>            200 
#> 
#> $predictionCounts
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>            194          21440            281           2733            484 
#>          Water 
#>            166 
#> 
#> $overallAccuracy
#> [1] 0.9367144
#> 
#> $MICE
#> [1] 0.7937788
#> 
#> $usersAccuracies
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>      0.3865979      0.9601213      0.6975089      0.8829125      0.5578512 
#>          Water 
#>      0.9518072 
#> 
#> $CTBICEs
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>      0.3826138      0.7753487      0.6923248      0.8660647      0.5485675 
#>          Water 
#>      0.9514226 
#> 
#> $producersAccuracies
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>      0.4601227      0.9893305      0.4600939      0.7583281      0.5192308 
#>          Water 
#>      0.7900000 
#> 
#> $RTBICEs
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>      0.4566161      0.9398949      0.4508410      0.7235537      0.5091362 
#>          Water 
#>      0.7883244 
#> 
#> $f1Scores
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>      0.4201680      0.9745071      0.5544554      0.8158918      0.5378486 
#>          Water 
#>      0.8633879 
#> 
#> $f1Efficacies
#>         Barren         Forest     Impervious Low Vegetation      Mixed Dev 
#>      0.4163522      0.8497292      0.5460772      0.7884211      0.5281168 
#>          Water 
#>      0.8622284 
#> 
#> $macroPA
#> [1] 0.662851
#> 
#> $macroRTBUCE
#> [1] 0.6447277
#> 
#> $macroUA
#> [1] 0.7394665
#> 
#> $macroCTBICE
#> [1] 0.7027237
#> 
#> $macroF1
#> [1] 0.6990658
#> 
#> $macroF1Efficacy
#> [1] 0.6724776
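
Since the results are returned as a standard R list, individual components can be extracted by name. As a small illustration, the snippet below pulls out the overall MICE value and gathers the class-level metrics into a dataframe (classMetrics is just an example name; the component names match the printed output above).

miceResultMC$MICE
#> [1] 0.7937788

classMetrics <- data.frame(class=miceResultMC$Mappings,
                           usersAccuracy=miceResultMC$usersAccuracies,
                           producersAccuracy=miceResultMC$producersAccuracies,
                           f1Score=miceResultMC$f1Scores)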


For a binary classification, the following objects are returned within a list object:

- Mappings: class names
- confusionMatrix: confusion matrix as a table object
- referenceCounts: number of samples per class in the reference labels
- predictionCounts: number of samples per class in the predictions
- positiveCase: name of the positive class
- overallAccuracy: overall accuracy
- mice: map image classification efficacy
- Precision and precisionEfficacy: precision and its adjusted version
- NPV and npvEfficacy: negative predictive value and its adjusted version
- Recall and recallEfficacy: recall and its adjusted version
- Specificity and specificityEfficicacy: specificity and its adjusted version
- f1Score and f1ScoreEfficacy: F1-score and its adjusted version

The biData.rda file included with the package represents results for a binary classification in which “Mined” is the positive case and “Not Mined” is the background class. There are 178 samples from the “Mined” class and 4,822 samples from the “Not Mined” class, for a total of 5,000 samples. Class proportions are based on landscape proportions and are relative to the reference labels. The example code below demonstrates calculating binary assessment results using mice() and miceCM(). When performing assessment for a binary classification, the multiclass parameter must be set to FALSE and the index associated with the positive case must be provided. Here, “Mined” has an index of 1 while “Not Mined” has an index of 2.

data(biData)

miceResultBI <- mice(biData$ref,
                     biData$pred,
                     mappings = c("Mined", 
                                  "Not Mined"),
                     multiclass=FALSE,
                     positiveIndex=1)

# confusion matrix for the binary results (predictions as rows)
cmB <- table(biData$pred, biData$ref)
miceResultBI <- miceCM(cmB,
                       mappings=c("Mined", 
                                  "Not Mined"),
                       multiclass=FALSE,
                       positiveIndex=1)

print(miceResultBI)
#> $Mappings
#> [1] "Mined"     "Not Mined"
#> 
#> $confusionMatrix
#>            Reference
#> Predicted   Mined Not Mined
#>   Mined       158         2
#>   Not Mined    20      4820
#> 
#> $referenceCounts
#>     Mined Not Mined 
#>       178      4822 
#> 
#> $predictionCounts
#>     Mined Not Mined 
#>       160      4840 
#> 
#> $positiveCase
#> [1] "Mined"
#> 
#> $overallAccuracy
#> [1] 0.9956
#> 
#> $mice
#> [1] 0.9359024
#> 
#> $Precision
#> [1] 0.9874999
#> 
#> $precisionEfficacy
#> [1] 0.9870384
#> 
#> $NPV
#> [1] 0.9958678
#> 
#> $npvEfficacy
#> [1] 0.8838934
#> 
#> $Recall
#> [1] 0.8876404
#> 
#> $recallEfficacy
#> [1] 0.8834915
#> 
#> $Specificity
#> [1] 0.9995852
#> 
#> $specificityEfficicacy
#> [1] 0.9883459
#> 
#> $f1Score
#> [1] 0.9349112
#> 
#> $f1ScoreEfficacy
#> [1] 0.9323989
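
As a sanity check, the binary metrics follow the standard confusion matrix definitions and can be reproduced by hand from the cells of cmB, matching the printed values up to rounding. Recall that rows hold the predictions, columns hold the reference labels, and the positive case, “Mined”, is at index 1.

tp <- cmB[1, 1] # true positives (158)
fp <- cmB[1, 2] # false positives (2)
fn <- cmB[2, 1] # false negatives (20)
tn <- cmB[2, 2] # true negatives (4820)

tp/(tp + fp) # precision (user's accuracy for the positive class)
tp/(tp + fn) # recall (producer's accuracy for the positive class)
tn/(tn + fp) # specificity
tn/(tn + fn) # negative predictive value (NPV)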


The miceCI() function calculates confidence intervals for all aggregated metrics. This is accomplished by calculating the metrics for a large number of subsets drawn from the entire dataset, with subsets generated using bootstrapping. In the examples below, 1,000 replicates are used, with each replicate including 70% of the available samples. The lowPercentile and highPercentile arguments define the confidence interval to use; here, they are configured for a 95% confidence interval. The result is a dataframe object that includes the mean, median, and lower and upper confidence bounds for all aggregated metrics.

data(mcData)

ciResultsMC <- miceCI(rep=1000,
                      frac=.7,
                      mcData$ref,
                      mcData$pred,
                      lowPercentile=0.025,
                      highPercentile=0.975,
                      mappings=c("Barren", 
                                 "Forest", 
                                 "Impervious", 
                                 "Low Vegetation", 
                                 "Mixed Dev", 
                                 "Water"),
                      multiclass=TRUE)

print(ciResultsMC)
#>            metric      mean    median    low.ci   high.ci
#> 1 overallAccuracy 0.9366679 0.9366989 0.9330284 0.9401434
#> 2            MICE 0.7934428 0.7934450 0.7829382 0.8033249
#> 3         macroPA 0.6625287 0.6624393 0.6394932 0.6848644
#> 4     macroRTBUCE 0.6443700 0.6444234 0.6206823 0.6672057
#> 5         macroUA 0.7391670 0.7390207 0.7198489 0.7587803
#> 6     macroCTBICE 0.7024025 0.7023108 0.6822655 0.7228386
#> 7         macroF1 0.6987213 0.6986863 0.6790950 0.7178079
#> 8 macroF1Efficacy 0.6721069 0.6720102 0.6513987 0.6917839
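
For intuition, the interval construction can be approximated with a bare-bones bootstrap loop, as sketched below. The exact resampling details used internally by miceCI() (for example, whether sampling is performed with replacement) are assumptions here, so treat this as an illustration rather than the package's internal code.

set.seed(42)
n <- nrow(mcData)
miceBoot <- replicate(1000, {
  idx <- sample(n, size=round(0.7*n), replace=TRUE) # replacement assumed
  mice(mcData$ref[idx], mcData$pred[idx], multiclass=TRUE)$MICE
})
quantile(miceBoot, probs=c(0.025, 0.975)) # 95% percentile interval

The same miceCI() workflow applies to the binary classification example: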


data(biData)

ciResultsBi <- miceCI(rep=1000,
                      frac=.7,
                      biData$ref,
                      biData$pred,
                      lowPercentile=0.025,
                      highPercentile=0.975,
                      mappings = c("Mined", 
                                   "Not Mined"),
                      multiclass=FALSE,
                      positiveIndex=1)

print(ciResultsBi)
#>                   metric      mean    median    low.ci   high.ci
#> 1        overallAccuracy 0.9956274 0.9957143 0.9934286 0.9974286
#> 2                   MICE 0.9361483 0.9363576 0.9047585 0.9637775
#> 3              Precision 0.9870046 0.9903845 0.9629629 0.9999999
#> 4      precisionEfficacy 0.9865279 0.9900386 0.9616807 0.9999999
#> 5                    NPV 0.9870046 0.9903845 0.9629629 0.9999999
#> 6            npvEfficacy 0.9865279 0.9900386 0.9616807 0.9999999
#> 7                 Recall 0.8886629 0.8897348 0.8305004 0.9366276
#> 8         recallEfficacy 0.8845593 0.8854809 0.8242343 0.9341545
#> 9            Specificity 0.8886629 0.8897348 0.8305004 0.9366276
#> 10 specificityEfficicacy 0.8845593 0.8854809 0.8242343 0.9341545
#> 11               f1Score 0.9350095 0.9355530 0.9004728 0.9641469
#> 12       f1ScoreEfficacy 0.9324995 0.9330913 0.8969650 0.9627475


Lastly, MICE metrics can be compared between two models. This requires the reference labels and the predictions from two separate models. The comparison is performed with a paired t-test over a large number of bootstrap replicates: for each bootstrap sample, the metric is calculated for both models and the pairwise differences are compared. In the provided example, which makes use of the compareData.rda data provided with micer, the mean difference in MICE between a random forest and a single decision tree model is 0.107, and the two models are suggested to be statistically different at a 95% confidence level.

data(compareData)

set.seed(42)
compareResult <- miceCompare(ref=compareData$ref,
                             result1=compareData$rfPred,
                             result2=compareData$dtPred,
                             reps=1000,
                             frac=.7)

print(compareResult)
#> 
#>  Paired t-test
#> 
#> data:  resultsDF$mice1 and resultsDF$mice2
#> t = 262.52, df = 999, p-value < 2.2e-16
#> alternative hypothesis: true mean difference is not equal to 0
#> 95 percent confidence interval:
#>  0.1063643 0.1079664
#> sample estimates:
#> mean difference 
#>       0.1071654
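
The pairing can be illustrated with a similar sketch: in each replicate, both sets of predictions are scored on the same bootstrap sample, and the per-replicate MICE values are then compared with a paired t-test. The multiclass=TRUE setting and the resampling details are assumptions about compareData, so this mirrors the idea rather than the internal implementation of miceCompare().

set.seed(42)
n <- nrow(compareData)
miceBoth <- replicate(1000, {
  idx <- sample(n, size=round(0.7*n), replace=TRUE) # same sample for both models
  c(mice(compareData$ref[idx], compareData$rfPred[idx], multiclass=TRUE)$MICE,
    mice(compareData$ref[idx], compareData$dtPred[idx], multiclass=TRUE)$MICE)
})
t.test(miceBoth[1, ], miceBoth[2, ], paired=TRUE)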
