library(micer)
The goal of this simple R package is to allow for the calculation of map image classification efficacy (MICE) and associated metrics. MICE was originally proposed in the following paper:
Shao, G., Tang, L. and Zhang, H., 2021. Introducing image classification efficacies. IEEE Access, 9, pp.134809-134816.
It was further explored in the following paper:
Tang, L., Shao, J., Pang, S., Wang, Y., Maxwell, A., Hu, X., Gao, Z., Lan, T. and Shao, G., 2024. Bolstering Performance Evaluation of Image Segmentation Models with Efficacy Metrics in the Absence of a Gold Standard. IEEE Transactions on Geoscience and Remote Sensing.
MICE adjusts the accuracy rate relative to a random classification baseline. Only the proportions of the reference labels are used to define this baseline, whereas the Kappa statistic uses the proportions of both the reference and the predicted labels. Because of documented issues with the Kappa statistic, its use in remote sensing and thematic map accuracy assessment is being discouraged, and MICE offers an alternative. This package calculates MICE along with adjusted versions of class-level user's (i.e., precision) and producer's (i.e., recall) accuracies and F1-scores. Class-level metrics are aggregated using macro-averaging, in which each class contributes equally. Functions are also provided to estimate confidence intervals (CIs) using bootstrapping and to statistically compare two classification results.
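To make the difference between the two baselines concrete, the base-R sketch below computes overall accuracy, MICE, and Kappa from the binary confusion matrix reported later in this article. It assumes MICE takes the form (OA - sum(p^2)) / (1 - sum(p^2)), where p holds the reference-class proportions (the accuracy expected from a random classification that follows the reference proportions), while Kappa uses sum(p * q), with q the predicted proportions. This is an illustration, not micer's internal code, although it agrees with the MICE value printed by mice() to four decimal places.

```r
# Binary confusion matrix from the example later in this article
# (rows = predicted labels, columns = reference labels)
lv <- c("Mined", "Not Mined")
cm <- matrix(c(158,    2,
                20, 4820),
             nrow = 2, byrow = TRUE,
             dimnames = list(Predicted = lv, Reference = lv))

n  <- sum(cm)
oa <- sum(diag(cm)) / n      # overall accuracy
p  <- colSums(cm) / n        # reference (column) proportions
q  <- rowSums(cm) / n        # prediction (row) proportions

# Assumed MICE baseline: random classification following reference proportions only
miceVal <- (oa - sum(p^2)) / (1 - sum(p^2))

# Kappa baseline: chance agreement from reference AND prediction proportions
kappaVal <- (oa - sum(p * q)) / (1 - sum(p * q))

round(c(OA = oa, MICE = miceVal, Kappa = kappaVal), 4)
```

With these counts MICE is about 0.9359 and Kappa about 0.933; they differ only because the classifier predicts slightly fewer "Mined" samples (160) than the reference data contain (178), which changes the Kappa baseline but not the MICE baseline.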
This article demonstrates the functions made available by micer.
The metrics calculated depend on whether the problem is framed as a multiclass or binary classification. Multiclass must be used when more than two classes are differentiated. If two classes are differentiated, binary should be used if there is a clear positive, presence, or foreground class and a clear negative, absence, or background class. If this is not the case, multiclass mode is more meaningful.
The mice() function calculates a set of metrics from vectors, or dataframe columns, of reference and predicted class labels. Alternatively, the miceCM() function calculates the metrics from a confusion matrix table in which the columns represent the reference (correct) labels and the rows represent the predicted labels.
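Because miceCM() expects predicted labels in the rows and reference labels in the columns, the argument order passed to table() matters. The short sketch below uses hypothetical labels to show that table(pred, ref) produces that orientation, so column sums give the reference counts and row sums give the prediction counts. Setting the factor levels explicitly is a precaution we assume is sensible (not a documented micer requirement) so that a class missing from one vector still gets its own row or column.

```r
# Hypothetical reference and predicted labels (illustrative only)
lv   <- c("A", "B", "C")
ref  <- factor(c("A", "A", "B", "B", "B", "C"), levels = lv)
pred <- factor(c("A", "B", "B", "B", "C", "C"), levels = lv)

# First argument -> rows (predicted), second argument -> columns (reference)
cm <- table(pred, ref)
print(cm)

refCounts  <- colSums(cm)        # counts per reference class
predCounts <- rowSums(cm)        # counts per predicted class
correct    <- sum(diag(cm))      # correctly classified samples
```

A table built this way has the layout described above and could then be passed to miceCM() as its first argument, as in the examples below.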
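Results are returned as a list object. For a multiclass classification, the list contains the class mappings, the confusion matrix, the reference and prediction counts, overall accuracy, MICE, the class-level user's accuracies and their efficacy-adjusted versions (CTBICEs), the class-level producer's accuracies and their efficacy-adjusted versions (RTBICEs), the class-level F1-scores and F1 efficacies, and the macro-averaged metrics (macroPA, macroRTBUCE, macroUA, macroCTBICE, macroF1, and macroF1Efficacy).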
The mcData.rda data included with the package represents a multiclass problem in which the following classes are differentiated (counts are relative to the reference labels): “Barren” (n=163), “Forest” (n=20,807), “Impervious” (n=426), “Low Vegetation” (n=3,182), “Mixed Dev” (n=520), and “Water” (n=200). There are a total of 25,298 samples. The code example below shows how to derive assessment metrics for these data using both the mice() and miceCM() functions. To perform multiclass assessment, the multiclass argument must be set to TRUE. The mappings parameter allows the user to provide names for each class. If no mappings are provided, the default factor level names are used.
data(mcData)
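# Calculate the metrics from vectors of reference and predicted labels
miceResultMC <- mice(mcData$ref,
                     mcData$pred,
                     mappings=c("Barren",
                                "Forest",
                                "Impervious",
                                "Low Vegetation",
                                "Mixed Dev",
                                "Water"),
                     multiclass=TRUE)
# Alternatively, calculate the same metrics from a confusion matrix
# (rows = predicted labels, columns = reference labels)
cmMC <- table(mcData$pred, mcData$ref)
miceResultMC <- miceCM(cmMC,
                       mappings=c("Barren",
                                  "Forest",
                                  "Impervious",
                                  "Low Vegetation",
                                  "Mixed Dev",
                                  "Water"),
                       multiclass=TRUE)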
print(miceResultMC)
#> $Mappings
#> [1] "Barren" "Forest" "Impervious" "Low Vegetation"
#> [5] "Mixed Dev" "Water"
#>
#> $confusionMatrix
#> Reference
#> Predicted Barren Forest Impervious Low Vegetation Mixed Dev Water
#> Barren 75 7 59 46 1 6
#> Forest 13 20585 62 617 142 21
#> Impervious 10 8 196 33 22 12
#> Low Vegetation 63 138 34 2413 84 1
#> Mixed Dev 1 64 75 72 270 2
#> Water 1 5 0 1 1 158
#>
#> $referenceCounts
#> Barren Forest Impervious Low Vegetation Mixed Dev
#> 163 20807 426 3182 520
#> Water
#> 200
#>
#> $predictionCounts
#> Barren Forest Impervious Low Vegetation Mixed Dev
#> 194 21440 281 2733 484
#> Water
#> 166
#>
#> $overallAccuracy
#> [1] 0.9367144
#>
#> $MICE
#> [1] 0.7937788
#>
#> $usersAccuracies
#> Barren Forest Impervious Low Vegetation Mixed Dev
#> 0.3865979 0.9601213 0.6975089 0.8829125 0.5578512
#> Water
#> 0.9518072
#>
#> $CTBICEs
#> Barren Forest Impervious Low Vegetation Mixed Dev
#> 0.3826138 0.7753487 0.6923248 0.8660647 0.5485675
#> Water
#> 0.9514226
#>
#> $producersAccuracies
#> Barren Forest Impervious Low Vegetation Mixed Dev
#> 0.4601227 0.9893305 0.4600939 0.7583281 0.5192308
#> Water
#> 0.7900000
#>
#> $RTBICEs
#> Barren Forest Impervious Low Vegetation Mixed Dev
#> 0.4566161 0.9398949 0.4508410 0.7235537 0.5091362
#> Water
#> 0.7883244
#>
#> $f1Scores
#> Barren Forest Impervious Low Vegetation Mixed Dev
#> 0.4201680 0.9745071 0.5544554 0.8158918 0.5378486
#> Water
#> 0.8633879
#>
#> $f1Efficacies
#> Barren Forest Impervious Low Vegetation Mixed Dev
#> 0.4163522 0.8497292 0.5460772 0.7884211 0.5281168
#> Water
#> 0.8622284
#>
#> $macroPA
#> [1] 0.662851
#>
#> $macroRTBUCE
#> [1] 0.6447277
#>
#> $macroUA
#> [1] 0.7394665
#>
#> $macroCTBICE
#> [1] 0.7027237
#>
#> $macroF1
#> [1] 0.6990658
#>
#> $macroF1Efficacy
#> [1] 0.6724776
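The class-level and macro-averaged values above can be approximated directly from the printed confusion matrix. The base-R sketch below assumes the user's-accuracy efficacy takes the form (UA - p) / (1 - p), with p the reference proportion of the class; this reproduces the printed CTBICEs to within about 1e-4 but is not micer's code. It also shows that the printed macroF1 (0.6990658) matches the harmonic mean of macroUA and macroPA, rather than the mean of the class-level F1-scores (about 0.694).

```r
# Multiclass confusion matrix copied from the printed output above
# (rows = predicted labels, columns = reference labels)
classes <- c("Barren", "Forest", "Impervious",
             "Low Vegetation", "Mixed Dev", "Water")
cm <- matrix(c(75,     7,  59,   46,   1,   6,
               13, 20585,  62,  617, 142,  21,
               10,     8, 196,   33,  22,  12,
               63,   138,  34, 2413,  84,   1,
                1,    64,  75,   72, 270,   2,
                1,     5,   0,    1,   1, 158),
             nrow = 6, byrow = TRUE,
             dimnames = list(Predicted = classes, Reference = classes))

tp <- diag(cm)
ua <- tp / rowSums(cm)          # user's accuracy (precision) per class
pa <- tp / colSums(cm)          # producer's accuracy (recall) per class
p  <- colSums(cm) / sum(cm)     # reference proportion of each class

# Assumed efficacy adjustment of user's accuracy against a random baseline
uaEff <- (ua - p) / (1 - p)

# Macro-averaging: every class contributes equally, regardless of its size
macroUA    <- mean(ua)
macroPA    <- mean(pa)
macroUAEff <- mean(uaEff)

# Harmonic mean of the two macro averages (compare with the printed macroF1)
macroF1 <- 2 * macroUA * macroPA / (macroUA + macroPA)

round(c(macroUA = macroUA, macroPA = macroPA,
        macroUAEff = macroUAEff, macroF1 = macroF1), 4)
```

Macro-averaging gives the small "Barren" class (163 samples) the same weight as "Forest" (20,807 samples), which is why the macro averages are well below the overall accuracy of 0.937.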
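For a binary classification, the returned list contains the class mappings, the confusion matrix, the reference and prediction counts, the positive case, overall accuracy, MICE, and the precision, negative predictive value (NPV), recall, specificity, and F1-score, each paired with its efficacy-adjusted version.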
The biData.rda file included with the package represents results for a binary classification. “Mined” is the positive case and “Not Mined” is the background class. There are 178 samples from the “Mined” class and 4,822 samples from the “Not Mined” class, for a total of 5,000 samples; these class proportions reflect landscape proportions and are relative to the reference labels. The example code below demonstrates calculating binary assessment results using mice() and miceCM(). When performing assessment for a binary classification, the multiclass parameter must be set to FALSE and the index of the positive case must be provided with positiveIndex. Here, “Mined” has an index of 1 while “Not Mined” has an index of 2.
data(biData)
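# Calculate the metrics from vectors of reference and predicted labels
miceResultBI <- mice(biData$ref,
                     biData$pred,
                     mappings = c("Mined",
                                  "Not Mined"),
                     multiclass=FALSE,
                     positiveIndex=1)
# Alternatively, calculate the same metrics from a confusion matrix
# (rows = predicted labels, columns = reference labels)
cmB <- table(biData$pred, biData$ref)
miceResultBI <- miceCM(cmB,
                       mappings=c("Mined",
                                  "Not Mined"),
                       multiclass=FALSE,
                       positiveIndex=1)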
print(miceResultBI)
#> $Mappings
#> [1] "Mined" "Not Mined"
#>
#> $confusionMatrix
#> Reference
#> Predicted Mined Not Mined
#> Mined 158 2
#> Not Mined 20 4820
#>
#> $referenceCounts
#> Mined Not Mined
#> 178 4822
#>
#> $predictionCounts
#> Mined Not Mined
#> 160 4840
#>
#> $positiveCase
#> [1] "Mined"
#>
#> $overallAccuracy
#> [1] 0.9956
#>
#> $mice
#> [1] 0.9359024
#>
#> $Precision
#> [1] 0.9874999
#>
#> $precisionEfficacy
#> [1] 0.9870384
#>
#> $NPV
#> [1] 0.9958678
#>
#> $npvEfficacy
#> [1] 0.8838934
#>
#> $Recall
#> [1] 0.8876404
#>
#> $recallEfficacy
#> [1] 0.8834915
#>
#> $Specificity
#> [1] 0.9995852
#>
#> $specificityEfficicacy
#> [1] 0.9883459
#>
#> $f1Score
#> [1] 0.9349112
#>
#> $f1ScoreEfficacy
#> [1] 0.9323989
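The binary metrics can be sketched the same way. The hypothetical helper binaryEfficacies() below takes a 2 x 2 confusion matrix (rows predicted, columns reference) and a positive-class index, mirroring the role of positiveIndex. It assumes each efficacy has the form (metric - base) / (1 - base), using the reference proportion of the positive class as the base for precision and recall and that of the negative class for NPV and specificity. These assumed forms reproduce the printed efficacies to within about 1e-4, but they are not taken from micer's source.

```r
# Binary confusion matrix from the printed output above
# (rows = predicted labels, columns = reference labels)
lv <- c("Mined", "Not Mined")
cm <- matrix(c(158,    2,
                20, 4820),
             nrow = 2, byrow = TRUE,
             dimnames = list(Predicted = lv, Reference = lv))

# Hypothetical helper: binary metrics plus assumed efficacy adjustments
binaryEfficacies <- function(cm, positiveIndex = 1) {
  pos <- positiveIndex
  neg <- 3 - positiveIndex                 # index of the other class
  n    <- sum(cm)
  pPos <- sum(cm[, pos]) / n               # reference proportion, positive class
  pNeg <- sum(cm[, neg]) / n               # reference proportion, negative class
  tp <- cm[pos, pos]; fp <- cm[pos, neg]
  fn <- cm[neg, pos]; tn <- cm[neg, neg]
  precision   <- tp / (tp + fp)
  npv         <- tn / (tn + fn)
  recall      <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  eff <- function(x, base) (x - base) / (1 - base)
  c(precision = precision,     precisionEff   = eff(precision, pPos),
    npv = npv,                 npvEff         = eff(npv, pNeg),
    recall = recall,           recallEff      = eff(recall, pPos),
    specificity = specificity, specificityEff = eff(specificity, pNeg))
}

round(binaryEfficacies(cm, positiveIndex = 1), 4)
```

Calling binaryEfficacies(cm, positiveIndex = 2) would instead treat “Not Mined” as the positive case, swapping the roles of precision and NPV and of recall and specificity.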
The miceCI() function calculates confidence intervals for all aggregated metrics. It does so by recalculating the metrics for a large number of subsets of the full dataset, generated using bootstrapping. In the examples below, 1,000 replicates are used (rep), with each replicate including 70% of the available samples (frac). The lowPercentile and highPercentile arguments define the confidence interval; here they are set to 0.025 and 0.975 for a 95% confidence interval. The result is a dataframe that reports the mean, median, and lower and upper confidence bounds for each aggregated metric.
data(mcData)
ciResultsMC <- miceCI(rep=1000,
                      frac=.7,
                      mcData$ref,
                      mcData$pred,
                      lowPercentile=0.025,
                      highPercentile=0.975,
                      mappings=c("Barren",
                                 "Forest",
                                 "Impervious",
                                 "Low Vegetation",
                                 "Mixed Dev",
                                 "Water"),
                      multiclass=TRUE)
print(ciResultsMC)
#> metric mean median low.ci high.ci
#> 1 overallAccuracy 0.9366679 0.9366989 0.9330284 0.9401434
#> 2 MICE 0.7934428 0.7934450 0.7829382 0.8033249
#> 3 macroPA 0.6625287 0.6624393 0.6394932 0.6848644
#> 4 macroRTBUCE 0.6443700 0.6444234 0.6206823 0.6672057
#> 5 macroUA 0.7391670 0.7390207 0.7198489 0.7587803
#> 6 macroCTBICE 0.7024025 0.7023108 0.6822655 0.7228386
#> 7 macroF1 0.6987213 0.6986863 0.6790950 0.7178079
#> 8 macroF1Efficacy 0.6721069 0.6720102 0.6513987 0.6917839
data(biData)
ciResultsBi <- miceCI(rep=1000,
                      frac=.7,
                      biData$ref,
                      biData$pred,
                      lowPercentile=0.025,
                      highPercentile=0.975,
                      mappings = c("Mined",
                                   "Not Mined"),
                      multiclass=FALSE,
                      positiveIndex=1)
print(ciResultsBi)
#> metric mean median low.ci high.ci
#> 1 overallAccuracy 0.9956274 0.9957143 0.9934286 0.9974286
#> 2 MICE 0.9361483 0.9363576 0.9047585 0.9637775
#> 3 Precision 0.9870046 0.9903845 0.9629629 0.9999999
#> 4 precisionEfficacy 0.9865279 0.9900386 0.9616807 0.9999999
#> 5 NPV 0.9870046 0.9903845 0.9629629 0.9999999
#> 6 npvEfficacy 0.9865279 0.9900386 0.9616807 0.9999999
#> 7 Recall 0.8886629 0.8897348 0.8305004 0.9366276
#> 8 recallEfficacy 0.8845593 0.8854809 0.8242343 0.9341545
#> 9 Specificity 0.8886629 0.8897348 0.8305004 0.9366276
#> 10 specificityEfficicacy 0.8845593 0.8854809 0.8242343 0.9341545
#> 11 f1Score 0.9350095 0.9355530 0.9004728 0.9641469
#> 12 f1ScoreEfficacy 0.9324995 0.9330913 0.8969650 0.9627475
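The percentile bootstrap behind these intervals can be illustrated in base R. The sketch below rebuilds sample-level labels from the binary confusion matrix printed earlier, draws 1,000 replicates of 70% of the samples, recalculates MICE on each, and takes the 2.5th and 97.5th percentiles. Sampling with replacement, the hypothetical helper miceFromLabels(), and the MICE formula it uses are assumptions for illustration; micer's own resampling scheme may differ, so the bounds will not match the printed table exactly.

```r
# Rebuild sample-level labels from the binary confusion matrix printed earlier:
# 158 (pred Mined, ref Mined), 2 (pred Mined, ref Not Mined),
# 20 (pred Not Mined, ref Mined), 4820 (pred Not Mined, ref Not Mined)
lv   <- c("Mined", "Not Mined")
pred <- factor(rep(c("Mined", "Mined", "Not Mined", "Not Mined"),
                   c(158, 2, 20, 4820)), levels = lv)
ref  <- factor(rep(c("Mined", "Not Mined", "Mined", "Not Mined"),
                   c(158, 2, 20, 4820)), levels = lv)

# Hypothetical helper: MICE from label vectors (assumed form, reference proportions only)
miceFromLabels <- function(ref, pred) {
  cm <- table(pred, ref)
  p  <- colSums(cm) / sum(cm)
  oa <- sum(diag(cm)) / sum(cm)
  (oa - sum(p^2)) / (1 - sum(p^2))
}

set.seed(42)
nRep <- 1000   # number of bootstrap replicates (rep in miceCI())
frac <- 0.7    # fraction of samples per replicate (frac in miceCI())
n    <- length(ref)
bootMICE <- replicate(nRep, {
  idx <- sample.int(n, size = round(frac * n), replace = TRUE)
  miceFromLabels(ref[idx], pred[idx])
})

# Percentile interval, analogous to lowPercentile = 0.025 / highPercentile = 0.975
ci <- quantile(bootMICE, probs = c(0.025, 0.975))
c(mean = mean(bootMICE), median = median(bootMICE),
  low.ci = ci[[1]], high.ci = ci[[2]])
```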
Lastly, MICE can be compared between two models. This requires the reference labels and the predictions from two separate models. A large number of bootstrap samples are drawn, MICE is calculated for both models on each sample, and a paired t-test is applied to the pairwise differences. In the provided example, which uses the compareData.rda data provided with micer, the mean difference in MICE between a random forest and a single decision tree model is 0.107, and the 95% confidence interval for that difference (0.106 to 0.108) excludes zero, suggesting that the two models are statistically different.
data(compareData)
set.seed(42)
compareResult <- miceCompare(ref=compareData$ref,
                             result1=compareData$rfPred,
                             result2=compareData$dtPred,
                             reps=1000,
                             frac=.7)
print(compareResult)
#>
#> Paired t-test
#>
#> data: resultsDF$mice1 and resultsDF$mice2
#> t = 262.52, df = 999, p-value < 2.2e-16
#> alternative hypothesis: true mean difference is not equal to 0
#> 95 percent confidence interval:
#> 0.1063643 0.1079664
#> sample estimates:
#> mean difference
#> 0.1071654
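A similar base-R sketch illustrates the paired comparison. Because compareData is not reproduced here, the example simulates hypothetical reference labels and two classifiers with different error rates (via a hypothetical corrupt() helper), evaluates both on the same bootstrap samples, and passes the paired MICE values to t.test() with paired = TRUE. The simulated data, the helper names, sampling with replacement, and the smaller number of replicates (200, to keep it fast) are all assumptions; only the overall procedure follows the description above.

```r
set.seed(1)

# Hypothetical reference labels for three classes with unequal proportions
n   <- 2000
lv  <- c("A", "B", "C")
ref <- factor(sample(lv, n, replace = TRUE, prob = c(0.6, 0.3, 0.1)), levels = lv)

# Hypothetical helper: replace a fraction of labels with randomly drawn ones
corrupt <- function(labels, errRate) {
  out  <- labels
  flip <- runif(length(labels)) < errRate
  out[flip] <- sample(lv, sum(flip), replace = TRUE)
  out
}
pred1 <- corrupt(ref, 0.10)   # stands in for a stronger model
pred2 <- corrupt(ref, 0.30)   # stands in for a weaker model

# Hypothetical helper: MICE from label vectors (assumed form, reference proportions only)
miceFromLabels <- function(ref, pred) {
  cm <- table(pred, ref)
  p  <- colSums(cm) / sum(cm)
  oa <- sum(diag(cm)) / sum(cm)
  (oa - sum(p^2)) / (1 - sum(p^2))
}

# Evaluate both models on the SAME bootstrap indices so the values are paired
reps <- 200
frac <- 0.7
boot <- replicate(reps, {
  idx <- sample.int(n, size = round(frac * n), replace = TRUE)
  c(mice1 = miceFromLabels(ref[idx], pred1[idx]),
    mice2 = miceFromLabels(ref[idx], pred2[idx]))
})

# Paired t-test on the per-replicate MICE values (boot is a 2 x reps matrix)
compareSketch <- t.test(boot["mice1", ], boot["mice2", ], paired = TRUE)
print(compareSketch)
```

Pairing matters here: both models are scored on identical bootstrap samples, so sample-to-sample variation in the reference data largely cancels in the differences.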