---
title: "Adjustments and Comparisons"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Adjustments and Comparisons}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

One of the most powerful features of COINr is the possibility to copy, adjust and compare coins. A coin is a structured list that represents a composite indicator. Since it is an R object like any other, it can be copied and modified, and alternative versions can be easily compared. This generally requires four steps:

1. Make a copy of the coin
2. Adjust the coin
3. Regenerate the coin
4. Compare coins

These will be explained in the following sections.

# Regeneration

The first three points on the list above will be addressed here. We must begin by explaining the "Log" of a coin. In COINr, some functions are distinguished as "building functions". These functions start with a capital letter (with one exception, `new_coin()`), and have the following defining features:

1. When a building function is run, it creates a new data set in `.$Data`.
2. When a building function is run, it records its function arguments in `.$Log`.

Building functions are the following:

Function            Description
------------------  ---------------------------------------------------------------
`new_coin()`        Initialise a coin object given indicator data and metadata
`Screen()`          Screen units based on data availability rules
`Denominate()`      Denominate/scale indicators by other indicators
`Impute()`          Impute missing data
`Treat()`           Treat outliers and skewed distributions
`Normalise()`       Normalise indicators onto a common scale
`Aggregate()`       Aggregate indicators using weighted mean

Let's explain the concept of the "Log" now with an example.
We will build the example coin manually, then look inside the coin's Log list:

```{r}
library(COINr)

# create new coin by calling new_coin()
coin <- new_coin(ASEM_iData, ASEM_iMeta,
                 level_names = c("Indicator", "Pillar", "Sub-index", "Index"))

# look in log
str(coin$Log, max.level = 2)
```

Looking in the log, we can see that it is a list with an entry "new_coin", which contains exactly the arguments that we passed to `new_coin()`: `iData`, `iMeta`, the level names, and two other arguments which are the default values of the function. There is also another logical variable called `can_regen` which is for internal use only.

This demonstrates that when we call a building function, its arguments are stored in the coin. To show another example, if we apply the `Normalise()` function:

```{r}
# normalise
coin <- Normalise(coin, dset = "Raw")

# view log
str(coin$Log, max.level = 2)
```

Now we additionally have a "Normalise" entry, with all the function arguments that we specified, plus defaults.

The reason that building functions write to the log is that it allows coins to be *regenerated*, which means automatically re-running the building functions that were used to create the coin and its data sets. This is done with a function called `Regen()`:

```{r}
# regenerate the coin
coin <- Regen(coin, quietly = FALSE)
```

When `Regen()` is called, it runs the building functions *in the order that they are found in the log*. This is an important point because if you iteratively re-run building functions, you might end up with an order that is not what you expect. You can check the log if you have any doubts (in any case, you would probably encounter an error if the order is incorrect). Also, each building function can only be run once in a regeneration.

So why regenerate coins - aren't the results exactly the same? Yes, unless you modify something first. And this brings us to the copying and modifying points.
Let us take an example: first, we'll build the full example coin, then we'll make a copy of our existing coin:

```{r}
# build full example coin
coin <- build_example_coin(quietly = TRUE)

# copy coin
coin2 <- coin
```

At this point, the coins are identical. What if we want to test an alternative methodology, for example a different normalisation method? This can be done by editing the Log of the coin, then regenerating. Here, we will change the normalisation method to percentile ranks, and regenerate. To make this change it is necessary to target the right argument. Let's first see what is already in the Log for `Normalise()`:

```{r}
str(coin2$Log$Normalise)
```

At the moment, the normalisation is min-max onto the interval of 0 to 100. We will change this to the new function `n_prank()`:

```{r}
# change to prank function (percentile ranks)
# we don't need to specify any additional parameters (f_n_para) here
coin2$Log$Normalise$global_specs <- list(f_n = "n_prank")

# regenerate
coin2 <- Regen(coin2)
```

And that's it. In summary, we copied the coin, edited its log to use a different normalisation methodology, and then regenerated the results. Now what remains is to compare the results, and this is dealt with in the next section.

Before that, let's consider what kind of things we can change in a coin. Anything in the Log can be changed, but of course it is up to you to change it to something valid. As long as you carefully follow the function help pages, this shouldn't be any more difficult than using the functions directly. You can also change anything else about the coin, including the input data, by targeting the log of `new_coin()`.

Changing anything outside of the Log will not generally have an effect, because the coin is recreated by `new_coin()` during regeneration and any such changes are overwritten. The exception is if you use the `from` argument of `Regen()`: in this case the regeneration will only begin from the function name that you pass to it.
This partial regeneration can also be useful to speed up computation time.

# Adding/removing indicators

One adjustment that may be of interest is to add and remove indicators. This needs to be done with care because removing an indicator requires that it is removed from both `iData` and `iMeta` when building the coin with `new_coin()`. It is not possible to remove indicators after the coin is assembled without completely regenerating the coin.

One way to add or remove indicators is to edit the `iData` and `iMeta` data frames by hand and then rebuild the coin. Another way is to regenerate the coin, but use the `exclude` argument of `new_coin()`.

A shortcut function, `change_ind()`, can also be used to quickly add or remove indicators from the framework and regenerate the coin, all in one command.

```{r}
# copy base coin
coin_remove <- coin

# remove two indicators and regenerate the coin
coin_remove <- change_ind(coin, drop = c("LPI", "Forest"), regen = TRUE)

coin_remove
```

The `drop` argument is used to specify which indicators to remove. The `add` argument adds indicators, although any indicators specified by `add` must be available in the original `iData` and `iMeta` that were passed to `new_coin()`. This means that `add` can only be used if you have previously excluded some of the indicators.

In general, if you want to test the effect of different indicators, you should include all candidate indicators in `iData` and `iMeta` and use `exclude` from `new_coin()` and/or `change_ind()` to select subsets. The advantage of doing it this way is that different subsets can be tested as part of a sensitivity analysis, for example.

In fact `change_ind()` simply edits the `exclude` argument of `new_coin()`, but is a quick way of doing this. Moreover it is safer, because it performs a few checks on the indicator codes to add or remove.

It is also possible to effectively remove indicators by setting weights to zero.
This is similar to the above approach but not necessarily identical: weights only come into play at the aggregation step, which is usually the last operation. If you perform unit screening, or imputation, the presence of zero-weighted indicators could still influence the results, depending on the settings.

The effects of removing indicators and aggregates can also be tested using the `remove_elements()` function, which removes all indicators or aggregates in a specified level and calculates the impact.

# Comparison

Comparing coins is helped by two dedicated functions, `compare_coins()` and `compare_coins_multi()`. The former compares exactly two coins, whereas the latter compares more than two coins. Let's start by comparing the two coins we have: the default example coin, and the same coin but with a percentile rank normalisation method:

```{r}
# compare index, sort by absolute rank difference
compare_coins(coin, coin2, dset = "Aggregated", iCode = "Index",
              sort_by = "Abs.diff", decreasing = TRUE)
```

This shows that for the overall index, the maximum rank change is 10 places for Portugal. We can compare ranks or scores, for any indicator or aggregate in the index. This also works if the number of units changes. At the moment, the coin has an imputation step which fills in all `NA`s. We could alternatively filter out any units with less than 90% data availability and remove the imputation step.

```{r}
# copy original coin
coin90 <- coin

# remove imputation entry completely (function will not be run)
coin90$Log$Impute <- NULL

# set data availability threshold to 90%
coin90$Log$Screen$dat_thresh <- 0.9

# we also need to tell Screen() to use the denominated dset now
coin90$Log$Screen$dset <- "Denominated"

# regenerate
coin90 <- Regen(coin90)

# summarise coin
coin90
```

We can see that we are down to 46 units after the screening step.
Now let's compare with the original coin:

```{r}
# compare index, sort by absolute rank difference
compare_coins(coin, coin90, dset = "Aggregated", iCode = "Index",
              sort_by = "Abs.diff", decreasing = TRUE)
```

The removed units are marked as `NA` in the second coin.

Finally, to demonstrate comparing multiple coins, we can call the `compare_coins_multi()` function:

```{r}
compare_coins_multi(list(Nominal = coin, Prank = coin2, NoLPIFor = coin_remove,
                         Screen90 = coin90), dset = "Aggregated", iCode = "Index")
```

This simply shows the ranks of each of the four coins side by side. We can also choose to compare scores, and to display rank changes or absolute rank changes. A requirement is that the coins must all have some units in common, and must all have `iCode` and `dset` available within.
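
The score and rank-change options mentioned above can be sketched as follows. This is a minimal illustration rather than a definitive recipe: it assumes the `compare_by` and `tabtype` arguments of `compare_coins_multi()` behave as described on its help page, and it rebuilds two coins under new names (`coin_base`, `coin_prank`, `coin_list`, which are illustrative only) so that it can be run on its own.

```{r}
library(COINr)

# rebuild the nominal coin and a percentile-rank variant
# (new object names, so nothing from the chunks above is overwritten)
coin_base <- build_example_coin(quietly = TRUE)
coin_prank <- coin_base
coin_prank$Log$Normalise$global_specs <- list(f_n = "n_prank")
coin_prank <- Regen(coin_prank)

coin_list <- list(Nominal = coin_base, Prank = coin_prank)

# compare index scores instead of ranks
compare_coins_multi(coin_list, dset = "Aggregated", iCode = "Index",
                    compare_by = "Scores")

# display absolute rank changes instead of the rank values themselves
# (see ?compare_coins_multi for which coin is used as the reference)
compare_coins_multi(coin_list, dset = "Aggregated", iCode = "Index",
                    tabtype = "AbsDiffs")
```

Naming the list elements is worthwhile because those names become the column headers of the comparison table.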