Using the discAUC package

Jonathan E. Friedel

2023 March 17

Introduction

The discAUC package was written to be a generic method to calculate various forms of area under the curve (AUC) for discounting data. The original formulation of AUC for discounting data was described by Myerson et al. (2001), and the formulations for AUClog and AUCord are described in Borges et al. (2016). The point of this package is not to provide a better way to calculate AUC; rather, it is to standardize the calculations. The goals of this package are to:

  1. Have a single function (AUC) to calculate AUC.

  2. Have a function to easily convert probability discounting data for AUC.

    • As is the standard practice, AUC for probability discounting requires converting the likelihood of receiving the outcome to the odds against receiving the outcome.
  3. Have a function to easily prepare discounting data to calculate AUCord.

  4. Have a function to easily prepare discounting data to calculate AUClog.

  5. Have a function that will impute missing indifference points when delay/social distance/odds against receiving the outcome/etc. are equal to zero.

Loading the Package and Setting up Data

#Load discounting AUC package
library(discAUC)

#Load dplyr, which aids in data manipulation
library(dplyr)

The one major assumption of this package is that discounting data are tidy. The principles of tidy data are outlined by Wickham (2014). To oversimplify, the discounting data should be organized in a long format in which there is only one indifference point per row of data. All of the pertinent information for that indifference point (e.g., subject/participant identifier, condition, delay, etc.) should be organized in different columns of the data. Two example data sets are included with the package that demonstrate the expected data format and can be used to test the package.

#Example Tidy Delay Discounting Data
examp_DD
#> # A tibble: 360 × 4
#>    subject delay_months outcome       prop_indiff
#>      <dbl>        <dbl> <chr>               <dbl>
#>  1     103       0.0333 alcohol             0.878
#>  2     103       0.25   alcohol             0.749
#>  3     103       0.5    alcohol             0.747
#>  4     103       1      alcohol             0.741
#>  5     103       6      alcohol             0.249
#>  6     103      60      alcohol             0.241
#>  7     103       0.0333 entertainment       0.987
#>  8     103       0.25   entertainment       0.933
#>  9     103       0.5    entertainment       0.929
#> 10     103       1      entertainment       0.935
#> # … with 350 more rows

The indifference points are the last column in this example data set (“prop_indiff”). For each indifference point, the subject number, the delay associated with that indifference point, and the type of outcome being discounted are all labeled in different columns. The tidy format allows for easier data manipulation and organization. For example, if you only wanted to calculate AUC for a specific subject or for a specific set of conditions, you can filter the data based on the variables of interest.

#Filter example DD data by subject (relies on dplyr library)
examp_DD %>%
  filter(subject == 103)
#> # A tibble: 24 × 4
#>    subject delay_months outcome       prop_indiff
#>      <dbl>        <dbl> <chr>               <dbl>
#>  1     103       0.0333 alcohol             0.878
#>  2     103       0.25   alcohol             0.749
#>  3     103       0.5    alcohol             0.747
#>  4     103       1      alcohol             0.741
#>  5     103       6      alcohol             0.249
#>  6     103      60      alcohol             0.241
#>  7     103       0.0333 entertainment       0.987
#>  8     103       0.25   entertainment       0.933
#>  9     103       0.5    entertainment       0.929
#> 10     103       1      entertainment       0.935
#> # … with 14 more rows

#Filter example DD data by outcome type
examp_DD %>%
  filter(outcome=="alcohol")
#> # A tibble: 90 × 4
#>    subject delay_months outcome prop_indiff
#>      <dbl>        <dbl> <chr>         <dbl>
#>  1     103       0.0333 alcohol       0.878
#>  2     103       0.25   alcohol       0.749
#>  3     103       0.5    alcohol       0.747
#>  4     103       1      alcohol       0.741
#>  5     103       6      alcohol       0.249
#>  6     103      60      alcohol       0.241
#>  7     108       0.0333 alcohol       0.298
#>  8     108       0.25   alcohol       0.298
#>  9     108       0.5    alcohol       0.298
#> 10     108       1      alcohol       0.298
#> # … with 80 more rows

Calculating AUC

For the initial demonstrations, we will use the median indifference points for money to show the calculations for AUC. We will obtain the median indifference points for both delay discounting (DD) and probability discounting (PD).

#Subject -987.987 contains the precalculated median indifference points for each outcome.
DD_med_indiff = examp_DD %>%
                filter(subject == -987.987,
                outcome == "$100 Gain")

PD_med_indiff = examp_PD %>%
                filter(subject == -987.987,
                outcome == "$100 Gain")

#Note that the median indifference point subject number (-987.987) is truncated in the output.
DD_med_indiff
#> # A tibble: 6 × 4
#>   subject delay_months outcome   prop_indiff
#>     <dbl>        <dbl> <chr>           <dbl>
#> 1   -988.       0.0333 $100 Gain       0.957
#> 2   -988.       0.25   $100 Gain       0.862
#> 3   -988.       0.5    $100 Gain       0.771
#> 4   -988.       1      $100 Gain       0.750
#> 5   -988.       6      $100 Gain       0.456
#> 6   -988.      60      $100 Gain       0.200

PD_med_indiff
#> # A tibble: 6 × 4
#>   subject  prob outcome   prop_indiff
#>     <dbl> <dbl> <chr>           <dbl>
#> 1   -988.  0.95 $100 Gain       0.957
#> 2   -988.  0.9  $100 Gain       0.862
#> 3   -988.  0.7  $100 Gain       0.771
#> 4   -988.  0.5  $100 Gain       0.750
#> 5   -988.  0.3  $100 Gain       0.456
#> 6   -988.  0.05 $100 Gain       0.200

If the data are in a tidy format, AUC can be calculated simply by supplying the data to the AUC function and entering the necessary parameters. Several arguments of the function expect the names of the columns that hold the relevant data; these names should be entered within quotes ("prop_indiff" instead of prop_indiff). The x_axis is the delay/social distance/likelihood of receiving the outcome. The amount must be set manually because it is possible that an indifference point with the maximum value (amount of the larger outcome) does not exist in the data. The grouping factor is designed to aid in calculating AUC across a whole data set at once (see below). At this time a grouping factor must be included, even if there is only one set of indifference points being used to calculate AUC.

AUC(dat = DD_med_indiff,
    indiff = "prop_indiff",
    x_axis = "delay_months",
    amount = 1,
    grouping = "subject")
#> # A tibble: 1 × 2
#>   subject   AUC
#>     <dbl> <dbl>
#> 1   -988. 0.359

To calculate AUC for probability discounting, set the prob_disc = TRUE flag. The function will convert all of the likelihoods (prob) to odds against receiving the outcome. AUC will then be calculated based on the odds against receiving the outcome. Note that the output will not clearly indicate that you calculated AUC for probability discounting data because the function only returns AUC values.

AUC(dat = PD_med_indiff,
    indiff = "prop_indiff",
    x_axis = "prob",
    amount = 1,
    grouping = "subject",
    prob_disc = TRUE)
#> # A tibble: 1 × 2
#>   subject   AUC
#>     <dbl> <dbl>
#> 1   -988. 0.372

Different Versions of AUC

Borges et al. (2016) outlined two alternative methods for calculating AUC. AUClog log transforms the x_axis values and calculates AUC based on the transformed values. AUCord transforms the x_axis values into their ordinal positions and calculates AUC based on the ordinal positions.

Calculating AUCord

We will describe calculating AUCord first because it can be done simply by changing the type of AUC in the function. Specifically, type = "ordinal".

#Ordinal AUC for DD data
AUC(dat = DD_med_indiff,
    indiff = "prop_indiff",
    x_axis = "delay_months",
    amount = 1,
    grouping = "subject",
    type = "ordinal")
#> # A tibble: 1 × 2
#>   subject   AUC
#>     <dbl> <dbl>
#> 1   -988. 0.359

#Ordinal AUC for PD data
AUC(dat = PD_med_indiff,
    indiff = "prop_indiff",
    x_axis = "prob",
    amount = 1,
    groupings = "subject",
    prob_disc = TRUE,
    type = "ordinal")
#> # A tibble: 1 × 2
#>   subject   AUC
#>     <dbl> <dbl>
#> 1   -988. 0.372

Note on the order of operations: When calculating AUCord for probability discounting data, the AUC function will obtain the odds against and then calculate the ordinal values based on the odds against.

Calculating AUClog

Calculating AUClog with the AUC function is also relatively simple. However, in addition to setting type = "log", you can also specify the base of the logarithm you wish to use. By default, the function uses a base of 2. The AUC function uses an adjustment procedure (based on the average distance between log transformed values) to account for when the x_axis value equals 0. See below for a full description of the correction types.

#Log AUC for DD data
AUC(dat = DD_med_indiff,
    indiff = "prop_indiff",
    x_axis = "delay_months",
    amount = 1,
    grouping = "subject",
    type = "log")
#> # A tibble: 1 × 2
#>   subject   AUC
#>     <dbl> <dbl>
#> 1   -988. 0.693

#Log AUC for PD data with log base 10
AUC(dat = PD_med_indiff,
    indiff = "prop_indiff",
    x_axis = "prob",
    amount = 1,
    groupings = "subject",
    prob_disc = TRUE,
    type = "log",
    log_base = 10)
#> # A tibble: 1 × 2
#>   subject   AUC
#>     <dbl> <dbl>
#> 1   -988. 0.676

Calculating AUC for larger data sets

A single data set might have several sets of indifference points. The example data sets include one set of indifference points per outcome for each participant. Instead of calculating AUC one-by-one for each specific set, AUC can be calculated en masse for all of the sets of indifference points. To use the en masse AUC calculations, you need to set the variables that can be used to identify each specific set of indifference points in the groupings factor. The column names should be entered within quotes.

#For demonstration, filter for median indifference points for all outcomes.
DD_med_outcomes = examp_DD %>%
              filter(subject == -987.987)

#Simple AUC
AUC(dat = DD_med_outcomes,
    indiff = "prop_indiff",
    x_axis = "delay_months",
    amount = 1,
    grouping = "outcome")
#> # A tibble: 4 × 2
#>   outcome          AUC
#>   <chr>          <dbl>
#> 1 $100 Gain     0.359 
#> 2 alcohol       0.0953
#> 3 entertainment 0.405 
#> 4 food          0.158

For this example, we simplified the example data to the median indifference points for each outcome. We added grouping = "outcome" to indicate that the function should return an AUC value for each outcome in the median indifference point data set. The AUC function returns a different AUC value for each level in the “grouping” factor.

The grouping argument can handle more than one variable at a time by using the combine function c() on the right-hand side of grouping = in the AUC function. The columns that identify the independent sets of indifference points should be entered as c("variable1", "variable2", "variable3", etc.). In the following example, AUC is calculated for each outcome by subject.

#AUC by outcome and subject
AUC(dat = examp_DD,
    indiff = "prop_indiff",
    x_axis = "delay_months",
    amount = 1,
    grouping = c("outcome", "subject"))
#> # A tibble: 60 × 3
#> # Groups:   outcome [4]
#>    outcome   subject      AUC
#>    <chr>       <dbl>    <dbl>
#>  1 $100 Gain   -988. 0.359   
#>  2 $100 Gain     -2  0.000278
#>  3 $100 Gain     -1  1       
#>  4 $100 Gain    103  0.789   
#>  5 $100 Gain    108  0.768   
#>  6 $100 Gain    119  0.669   
#>  7 $100 Gain    122  0.462   
#>  8 $100 Gain    142  0.0642  
#>  9 $100 Gain    169  0.0673  
#> 10 $100 Gain    175  0.465   
#> # … with 50 more rows

The order of the grouping variable does not change the resulting AUC data, only the order in which the columns are returned.

#AUC by outcome and subject
AUC(dat = examp_DD,
    indiff = "prop_indiff",
    x_axis = "delay_months",
    amount = 1,
    grouping = c("outcome", "subject"))
#> # A tibble: 60 × 3
#> # Groups:   outcome [4]
#>    outcome   subject      AUC
#>    <chr>       <dbl>    <dbl>
#>  1 $100 Gain   -988. 0.359   
#>  2 $100 Gain     -2  0.000278
#>  3 $100 Gain     -1  1       
#>  4 $100 Gain    103  0.789   
#>  5 $100 Gain    108  0.768   
#>  6 $100 Gain    119  0.669   
#>  7 $100 Gain    122  0.462   
#>  8 $100 Gain    142  0.0642  
#>  9 $100 Gain    169  0.0673  
#> 10 $100 Gain    175  0.465   
#> # … with 50 more rows

#AUC by subject and outcome
AUC(dat = examp_DD,
    indiff = "prop_indiff",
    x_axis = "delay_months",
    amount = 1,
    grouping = c("subject", "outcome"))
#> # A tibble: 60 × 3
#> # Groups:   subject [15]
#>    subject outcome            AUC
#>      <dbl> <chr>            <dbl>
#>  1   -988. $100 Gain     0.359   
#>  2   -988. alcohol       0.0953  
#>  3   -988. entertainment 0.405   
#>  4   -988. food          0.158   
#>  5     -2  $100 Gain     0.000278
#>  6     -2  alcohol       0.000278
#>  7     -2  entertainment 0.000278
#>  8     -2  food          0.000278
#>  9     -1  $100 Gain     1       
#> 10     -1  alcohol       1       
#> # … with 50 more rows

Finer Control of AUC calculations

If you are calculating AUC for probability discounting, AUClog, or AUCord, the AUC() function does not itself transform your x_axis values. The AUC() function only uses the default AUC equation as described by Myerson et al. (2001). That is, AUC is the sum of the areas of all the trapezoids between successive indifference points. A single trapezoid has an area of,

\[ (x_2 -x_1)*\frac{y_1+y_2}{2} \]

where the x values are the successive delays, social distances, odds against, etc. and the y values are the successive indifference points. The x values are all standardized by the maximum delay, social distance, odds against, etc. The y values are all standardized by the amount of the larger outcome.
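As a check on the arithmetic, the trapezoid formula can be applied by hand in base R to the median delay discounting indifference points shown above (after adding an indifference point of 1 at delay 0 and standardizing the delays by the maximum delay). This is a manual sketch, not a function from the package:

```r
# Median DD indifference points from above, with the x = 0 point added
delays <- c(0, 0.0333, 0.25, 0.5, 1, 6, 60)
indiff <- c(1, 0.957, 0.862, 0.771, 0.750, 0.456, 0.200)

# Standardize the x axis by the maximum delay
x <- delays / max(delays)

# Sum the trapezoid areas between successive indifference points
auc <- sum(diff(x) * (head(indiff, -1) + tail(indiff, -1)) / 2)
round(auc, 3)
#> [1] 0.359
```

The result matches the value returned by AUC() for DD_med_indiff above.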

Within the discAUC package are separate functions that calculate the transformations on the indifference points supplied to the AUC() function. The AUC() function calls those specific transformation functions and then uses the transformed data in the above formula. In other words, there are no special versions of the above formula like,

\[ [log(x_2) -log(x_1)]*\frac{y_1+y_2}{2} \]

rather, each transformation is conducted and then fed back into the original formulation of AUC.

Each of the specific transformation functions can be called on their own. Using the functions prior to the AUC function is useful if you would like fine control of all of the pieces of the AUC calculation or you would like to inspect the transformed data prior to the AUC calculation.

AUC_zeros()

Myerson et al. (2001) call for including an indifference point when the x_axis value = 0. If an experimental assessment of the indifference point at X = 0 was not obtained, then the value should be added. If indifference points need to be added, the value of each added indifference point should be equal to the larger outcome (i.e., A from a discounting model).

The AUC_zeros function will add zeros to a tidy set of indifference points. The function will only add the indifference points at X = 0 if they did not already exist in the data. Additionally, a new column will be added to the data file that will indicate which indifference points were originally included and which indifference points were added by the function.

#Examp_DD data did not include indifference points when delay = 0
AUC_zeros(dat = examp_DD,
          indiff = "prop_indiff",
          x_axis = "delay_months",
          amount = 1,
          groupings = c("subject","outcome"))
#> # A tibble: 420 × 5
#>    subject outcome       delay_months prop_indiff orig 
#>      <dbl> <chr>                <dbl>       <dbl> <lgl>
#>  1     103 alcohol             0            1     FALSE
#>  2     103 alcohol             0.0333       0.878 TRUE 
#>  3     103 alcohol             0.25         0.749 TRUE 
#>  4     103 alcohol             0.5          0.747 TRUE 
#>  5     103 alcohol             1            0.741 TRUE 
#>  6     103 alcohol             6            0.249 TRUE 
#>  7     103 alcohol            60            0.241 TRUE 
#>  8     103 entertainment       0            1     FALSE
#>  9     103 entertainment       0.0333       0.987 TRUE 
#> 10     103 entertainment       0.25         0.933 TRUE 
#> # … with 410 more rows

With probability discounting, the odds against values will be in the opposite order of the likelihoods of receiving the outcome. For that reason, if the data are probability discounting data you must indicate that prob_disc = TRUE in the function call. This will impute an indifference point with a likelihood of 100%. With a likelihood of 100%, the odds against receiving that outcome will be 0.

#Examp_PD data did not include indifference points when prob = 1
AUC_zeros(dat = examp_PD,
          indiff = "prop_indiff",
          x_axis = "prob",
          amount = 1,
          groupings = c("subject","outcome"),
          prob_disc = TRUE
          )
#> # A tibble: 420 × 5
#>    subject outcome        prob prop_indiff orig 
#>      <dbl> <chr>         <dbl>       <dbl> <lgl>
#>  1     103 alcohol        1          1     FALSE
#>  2     103 alcohol        0.95       0.878 TRUE 
#>  3     103 alcohol        0.9        0.749 TRUE 
#>  4     103 alcohol        0.7        0.747 TRUE 
#>  5     103 alcohol        0.5        0.741 TRUE 
#>  6     103 alcohol        0.3        0.249 TRUE 
#>  7     103 alcohol        0.05       0.241 TRUE 
#>  8     103 entertainment  1          1     FALSE
#>  9     103 entertainment  0.95       0.987 TRUE 
#> 10     103 entertainment  0.9        0.933 TRUE 
#> # … with 410 more rows

prep_odds_against()

With the prep_odds_against() function, you can indicate the x_axis variable and the function will calculate the odds against receiving the outcome. Odds against is the typical form for displaying probability discounting because the value of the outcome will decrease as the odds against receiving the outcome increase. The formula for calculating the odds against is

\[ \frac{1-p}{p} \]

where p is the probability of receiving the outcome.


#Make sure to indicate groupings, if necessary
prep_odds_against(dat = examp_PD,
                  x_axis = "prob",
                  groupings = c("subject","outcome"))
#> # A tibble: 360 × 5
#> # Groups:   subject, outcome [60]
#>    subject  prob outcome   prop_indiff prob_against
#>      <dbl> <dbl> <chr>           <dbl>        <dbl>
#>  1   -988.  0.95 $100 Gain       0.957       0.0526
#>  2   -988.  0.9  $100 Gain       0.862       0.111 
#>  3   -988.  0.7  $100 Gain       0.771       0.429 
#>  4   -988.  0.5  $100 Gain       0.750       1     
#>  5   -988.  0.3  $100 Gain       0.456       2.33  
#>  6   -988.  0.05 $100 Gain       0.200      19     
#>  7   -988.  0.95 alcohol         0.569       0.0526
#>  8   -988.  0.9  alcohol         0.273       0.111 
#>  9   -988.  0.7  alcohol         0.329       0.429 
#> 10   -988.  0.5  alcohol         0.189       1     
#> # … with 350 more rows

The odds against will be added as a new variable in the data set. The name of the variable will be the name of the original probability variable (“prob” in this case) with “_against” added afterwards. For the examp_PD data set, the odds against will be included as prob_against. Indicating the indifference points in this function is unnecessary, because it only transforms the probability of receiving the outcomes.

prep_ordinal()

The prep_ordinal() function behaves very similarly to prep_odds_against(). prep_ordinal() transforms the x_axis values into the ordinal position. Note that if there are indifference points for when the x_axis = 0, the ordinal value will be calculated as 0.


#Groupings must be specified, if necessary
prep_ordinal(dat = examp_DD,
             x_axis = "delay_months",
             groupings = c("subject","outcome"))
#> # A tibble: 360 × 5
#> # Groups:   subject, outcome [60]
#>    subject delay_months outcome   prop_indiff delay_months_ord
#>      <dbl>        <dbl> <chr>           <dbl>            <int>
#>  1   -988.       0.0333 $100 Gain       0.957                1
#>  2   -988.       0.25   $100 Gain       0.862                2
#>  3   -988.       0.5    $100 Gain       0.771                3
#>  4   -988.       1      $100 Gain       0.750                4
#>  5   -988.       6      $100 Gain       0.456                5
#>  6   -988.      60      $100 Gain       0.200                6
#>  7   -988.       0.0333 alcohol         0.569                1
#>  8   -988.       0.25   alcohol         0.273                2
#>  9   -988.       0.5    alcohol         0.329                3
#> 10   -988.       1      alcohol         0.189                4
#> # … with 350 more rows

The ordinal values will be added as a new variable in the data set. The name of the variable will be the name of the original x_axis variable (“delay_months” in this case) with “_ord” added afterwards. For the examp_DD data set, the ordinal values will be included as delay_months_ord.

If ordinal values are desired for probability discounting data, then the prob_disc = TRUE flag must be set. For probability discounting, the ordinal values will be calculated in reverse order to match taking ordinal values of the odds against receiving the outcome. In other words, for delay discounting the ordinal position of a delay increases as the delay increases, but for probability discounting we typically want the ordinal value to increase as the probability decreases.


prep_ordinal(dat = examp_PD,
             x_axis = "prob",
             groupings = c("subject","outcome"),
             prob_disc = TRUE)
#> # A tibble: 360 × 5
#> # Groups:   subject, outcome [60]
#>    subject  prob outcome   prop_indiff prob_ord
#>      <dbl> <dbl> <chr>           <dbl>    <int>
#>  1   -988.  0.95 $100 Gain       0.957        1
#>  2   -988.  0.9  $100 Gain       0.862        2
#>  3   -988.  0.7  $100 Gain       0.771        3
#>  4   -988.  0.5  $100 Gain       0.750        4
#>  5   -988.  0.3  $100 Gain       0.456        5
#>  6   -988.  0.05 $100 Gain       0.200        6
#>  7   -988.  0.95 alcohol         0.569        1
#>  8   -988.  0.9  alcohol         0.273        2
#>  9   -988.  0.7  alcohol         0.329        3
#> 10   -988.  0.5  alcohol         0.189        4
#> # … with 350 more rows

prep_ordinal_all()

prep_ordinal_all() is currently included to account for situations in which participants experience different x_axis values. For example, indifference points might be obtained for participant 1 at 1 week, 1 month, and 6 months and for participant 2 at 1 week, 3 months, and 1 year. In such a case, it would not be appropriate to label the ordinals as 1, 2, and 3 for the respective subjects. Young (2016) has proposed using a Halton sequence to measure discounting, but to my knowledge this technique has not been used. In a Halton sequence, a vast number of x_axis values would be used to provide a more accurate picture of the global discounting curve.

To account for this sort of situation, the prep_ordinal_all() function calculates the ordinal value for each x_axis value based on its position in the list of all x_axis values, not just the ordinal position of the values for a specific subject.

As there is currently limited utility for this technique, an example data set is not included in the package and must be created.


#Create data based on values included in above example
examp_ord_all = 
  tibble(
    sub = c(1, 1, 1, 2, 2, 2),
    delay_weeks = c(1, 4, 26, 1, 13, 52)
    )

#Groupings are not necessary
prep_ordinal_all(dat = examp_ord_all,
                 x_axis = "delay_weeks")
#> # A tibble: 6 × 3
#>     sub delay_weeks delay_weeks_ord
#>   <dbl>       <dbl>           <int>
#> 1     1           1               1
#> 2     1           4               2
#> 3     1          26               4
#> 4     2           1               1
#> 5     2          13               3
#> 6     2          52               5

As with prep_ordinal(), if the data are from probability discounting then setting prob_disc = TRUE will cause prep_ordinal_all() to reverse the order of the ordinal values.

prep_log()

Accounting for log(0)

When log transforming any values, a common concern is how to account for log(0) which is undefined.

Adding a correction factor

A common recommendation is to add a constant value (e.g., 1) to all values in the data set and transform the “corrected” values. One limitation is that a correction factor can change the relative distances between the log transformed x_axis values (those relative distances being the whole point of the log transformation in the first place).
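A quick hand-rolled illustration of this limitation (not part of the package): delays of 0.25 and 0.5 are exactly one log2 unit apart, but after adding a correction of 1 they are much closer together:

```r
# Relative distance between 0.25 and 0.5 on a log2 scale
diff(log(c(0.25, 0.5), base = 2))
#> [1] 1

# After adding a correction factor of 1, the distance shrinks
diff(log(c(0.25, 0.5) + 1, base = 2))
#> [1] 0.2630344
```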


prep_log_AUC(dat = examp_DD,
             x_axis = "delay_months",
             type = "corr")
#> # A tibble: 360 × 5
#>    subject delay_months outcome       prop_indiff log_delay_months
#>      <dbl>        <dbl> <chr>               <dbl>            <dbl>
#>  1     103       0.0333 alcohol             0.878           0.0473
#>  2     103       0.25   alcohol             0.749           0.322 
#>  3     103       0.5    alcohol             0.747           0.585 
#>  4     103       1      alcohol             0.741           1     
#>  5     103       6      alcohol             0.249           2.81  
#>  6     103      60      alcohol             0.241           5.93  
#>  7     103       0.0333 entertainment       0.987           0.0473
#>  8     103       0.25   entertainment       0.933           0.322 
#>  9     103       0.5    entertainment       0.929           0.585 
#> 10     103       1      entertainment       0.935           1     
#> # … with 350 more rows

The correction factor can be specified by setting correction to the desired value.


prep_log_AUC(dat = examp_DD,
             x_axis = "delay_months",
             type = "corr",
             correction = .25)
#> # A tibble: 360 × 5
#>    subject delay_months outcome       prop_indiff log_delay_months
#>      <dbl>        <dbl> <chr>               <dbl>            <dbl>
#>  1     103       0.0333 alcohol             0.878           -1.82 
#>  2     103       0.25   alcohol             0.749           -1    
#>  3     103       0.5    alcohol             0.747           -0.415
#>  4     103       1      alcohol             0.741            0.322
#>  5     103       6      alcohol             0.249            2.64 
#>  6     103      60      alcohol             0.241            5.91 
#>  7     103       0.0333 entertainment       0.987           -1.82 
#>  8     103       0.25   entertainment       0.933           -1    
#>  9     103       0.5    entertainment       0.929           -0.415
#> 10     103       1      entertainment       0.935            0.322
#> # … with 350 more rows

An additional problem that will crop up when calculating log transformed x_axis values is that you will have negative log(X) values, which is problematic for the standard method for calculating AUC. The following method for correcting log transformed x_axis values accounts for the negative log(X) values.

Adjusting log transformed values

Included in this package is a method for correcting log(0) values that is referred to as “adjust.” The adjust method calculates the mean distance between each successive log transformed x_axis value, forces log(0) = 0, and then adds the mean log distance to the remaining log transformed x_axis values. Essentially, the adjust procedure uses a correction factor that shifts the x_axis values upwards without changing the underlying distribution of the data. Specifically,

\[ \kappa = \frac{\sum_{n=1}^{N-1} (\log(X_{n+1}) - \log(X_{n}))}{N-1} \]

where \(X_1, \ldots, X_N\) are the nonzero x_axis values in ascending order,

and

\[ X_{logtrans} = \log(X_{old}) + \kappa \]

The following code block demonstrates the calculations for adjustment correction.


#Initial vector with delays (in weeks)
delays = c(0, 0.25, 1, 4, 26)

#Log transform delays
log_delays = log(delays, base = 2)

#Display values
log_delays
#> [1]     -Inf -2.00000  0.00000  2.00000  4.70044

#Eliminate log(0) = -Inf
non_zero_log = log_delays[-1]

#Calculate the differences between successive non-zero log delays
log_diff = diff(non_zero_log)

#Print log_diff
log_diff
#> [1] 2.00000 2.00000 2.70044

#Adjustment factor
adjustment = mean(log_diff)

#Adjust log delays
new_log_delays = log_delays + adjustment

#Set log(0) = 0
new_log_delays[1] = 0

#Print adjusted log delays
new_log_delays
#> [1] 0.0000000 0.2334799 2.2334799 4.2334799 6.9339196

Additionally, the adjust method includes an automated correction for x_axis values that are between 0 and 1. The AUC formula assumes that the lowest x_axis value is 0. However, if an x_axis value is between 0 and 1 (for example, 1 week will be expressed as 0.25 if the delays are expressed in months) then the log transformed x_axis value will be negative. The negative log transformed x_axis values will introduce errors to the AUC calculations. One solution is to express all indifference points in the lowest possible unit (weeks instead of months, days instead of weeks, etc.). The solution this package uses is to increase all of the log transformed x_axis values by the absolute value of the minimum log transformed x_axis value. Specifically,

\[ X_{logtrans} = \log(X_{old}) + |\log(\min(X))| \]

is the formula for correcting for negative log transformed x_axis values.

In total, the final log transformed X values can be found with the formula,

\[ X_{logtrans} = \log(X_{old}) + \kappa + |\log(\min(X))| \]
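The examp_DD delays contain no zeros, so judging from the type = "adjust" output below, only the decimal offset applies to that data set (no κ term is involved). As a hand-worked sketch, the offset can be reproduced directly:

```r
# Decimal offset for the examp_DD delays (no zero delays, so no kappa here)
delays <- c(0.0333, 0.25, 0.5, 1, 6, 60)
log_delays <- log(delays, base = 2)

# Shift by the absolute value of the minimum log transformed value
adjusted <- log_delays + abs(min(log_delays))
round(adjusted, 2)
#> [1]  0.00  2.91  3.91  4.91  7.49 10.82
```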

This adjustment procedure is the default method the AUC function uses for log transformed values.

The following code block gives an example of the "adjust" method for log AUC.


#Default adjust method
prep_log_AUC(dat = examp_DD,
             x_axis = "delay_months",
             type = "adjust")
#> # A tibble: 360 × 5
#>    subject delay_months outcome       prop_indiff log_delay_months
#>      <dbl>        <dbl> <chr>               <dbl>            <dbl>
#>  1     103       0.0333 alcohol             0.878             0   
#>  2     103       0.25   alcohol             0.749             2.91
#>  3     103       0.5    alcohol             0.747             3.91
#>  4     103       1      alcohol             0.741             4.91
#>  5     103       6      alcohol             0.249             7.49
#>  6     103      60      alcohol             0.241            10.8 
#>  7     103       0.0333 entertainment       0.987             0   
#>  8     103       0.25   entertainment       0.933             2.91
#>  9     103       0.5    entertainment       0.929             3.91
#> 10     103       1      entertainment       0.935             4.91
#> # … with 350 more rows

If the adjust method is still desired without the decimal correction, set dec_offset = FALSE.


#Log adjust method with no decimal correction
prep_log_AUC(dat = examp_DD,
             x_axis = "delay_months",
             type = "adjust",
             dec_offset = FALSE)
#> # A tibble: 360 × 5
#>    subject delay_months outcome       prop_indiff log_delay_months
#>      <dbl>        <dbl> <chr>               <dbl>            <dbl>
#>  1     103       0.0333 alcohol             0.878            -4.91
#>  2     103       0.25   alcohol             0.749            -2   
#>  3     103       0.5    alcohol             0.747            -1   
#>  4     103       1      alcohol             0.741             0   
#>  5     103       6      alcohol             0.249             2.58
#>  6     103      60      alcohol             0.241             5.91
#>  7     103       0.0333 entertainment       0.987            -4.91
#>  8     103       0.25   entertainment       0.933            -2   
#>  9     103       0.5    entertainment       0.929            -1   
#> 10     103       1      entertainment       0.935             0   
#> # … with 350 more rows

Inverse Hyperbolic Sine Transformation (IHS)

The final method for log transformation is not strictly a log transformation. Gilroy et al. (2021) proposed using the IHS transformation to aid in conducting demand analyses. Specifically, IHS was proposed to handle consumption values obtained from hypothetical purchase tasks and the associated problems of log transforming zero consumption values. I highly recommend reading their description of the IHS transformation. The first main benefit of the IHS transformation is that \(\mathrm{IHS}(0) = 0\). The second benefit of IHS is that it produces values that are generally similar to log transformed values (i.e., a high degree of correlation).

Specifically,

\[ \mathrm{IHS}(X) = \sinh^{-1}(X) = \operatorname{arsinh}(X) \]

and the specific calculations for a single value are

\[ \mathrm{IHS}(X) = \ln{(X + \sqrt{X^2 + 1})} \]
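In base R, this transformation is available directly as asinh(); the short sketch below checks that it agrees with the explicit log form given above for the delays in the example data.

```r
#IHS transformation via base R's inverse hyperbolic sine
x <- c(0.0333, 0.25, 1, 6, 60)
asinh(x)

#Equivalent explicit log form: ln(x + sqrt(x^2 + 1))
log(x + sqrt(x^2 + 1))
```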


#IHS transformation
prep_log_AUC(dat = examp_DD,
             x_axis = "delay_months",
             type = "IHS")
#> # A tibble: 360 × 5
#>    subject delay_months outcome       prop_indiff log_delay_months
#>      <dbl>        <dbl> <chr>               <dbl>            <dbl>
#>  1     103       0.0333 alcohol             0.878           0.0333
#>  2     103       0.25   alcohol             0.749           0.247 
#>  3     103       0.5    alcohol             0.747           0.481 
#>  4     103       1      alcohol             0.741           0.881 
#>  5     103       6      alcohol             0.249           2.49  
#>  6     103      60      alcohol             0.241           4.79  
#>  7     103       0.0333 entertainment       0.987           0.0333
#>  8     103       0.25   entertainment       0.933           0.247 
#>  9     103       0.5    entertainment       0.929           0.481 
#> 10     103       1      entertainment       0.935           0.881 
#> # … with 350 more rows

Combining Finer Control Functions

The functions described above can be used in succession. For example, when obtaining AUClog for probability discounting: first, the probabilities of receiving an outcome are converted to odds against; second, the odds against values are log transformed.
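As a reminder of the first step, the odds against receiving an outcome follow the standard formula \((1 - p)/p\); the standalone sketch below (not using the package) produces values consistent with the prob_against column printed by prep_odds_against further on.

```r
#Odds against receiving the outcome: (1 - p) / p
probs <- c(0.95, 0.9, 0.7, 0.5, 0.3, 0.05)
(1 - probs) / probs
```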

The main factor to account for when feeding the results from one function to the next is that the x_axis variable of interest changes after the data have been transformed. Following the above example, we transform probability values into odds against values. For the log transformation, we then want to treat the odds against values as the x_axis of interest.


#Calculate odds against
examp_PD_odds = prep_odds_against(dat = examp_PD,
                  x_axis = "prob",
                  groupings = c("subject","outcome"))

#Print odds against
examp_PD_odds
#> # A tibble: 360 × 5
#> # Groups:   subject, outcome [60]
#>    subject  prob outcome   prop_indiff prob_against
#>      <dbl> <dbl> <chr>           <dbl>        <dbl>
#>  1   -988.  0.95 $100 Gain       0.957       0.0526
#>  2   -988.  0.9  $100 Gain       0.862       0.111 
#>  3   -988.  0.7  $100 Gain       0.771       0.429 
#>  4   -988.  0.5  $100 Gain       0.750       1     
#>  5   -988.  0.3  $100 Gain       0.456       2.33  
#>  6   -988.  0.05 $100 Gain       0.200      19     
#>  7   -988.  0.95 alcohol         0.569       0.0526
#>  8   -988.  0.9  alcohol         0.273       0.111 
#>  9   -988.  0.7  alcohol         0.329       0.429 
#> 10   -988.  0.5  alcohol         0.189       1     
#> # … with 350 more rows

#Add zeros, but already converted to odds against so prob_disc = FALSE
#Note the x_axis value was changed to "prob_against" which was the newly added column
examp_PD_odds = AUC_zeros(dat = examp_PD_odds,
          x_axis = "prob_against",
          indiff = "prop_indiff",
          amount = 1,
          groupings = c("subject","outcome"),
          prob_disc = FALSE)

#Print odds against with zeros
examp_PD_odds
#> # A tibble: 420 × 6
#>    subject outcome   prob_against prop_indiff orig   prob
#>      <dbl> <chr>            <dbl>       <dbl> <lgl> <dbl>
#>  1   -988. $100 Gain       0            1     FALSE NA   
#>  2   -988. $100 Gain       0.0526       0.957 TRUE   0.95
#>  3   -988. $100 Gain       0.111        0.862 TRUE   0.9 
#>  4   -988. $100 Gain       0.429        0.771 TRUE   0.7 
#>  5   -988. $100 Gain       1            0.750 TRUE   0.5 
#>  6   -988. $100 Gain       2.33         0.456 TRUE   0.3 
#>  7   -988. $100 Gain      19            0.200 TRUE   0.05
#>  8   -988. alcohol         0            1     FALSE NA   
#>  9   -988. alcohol         0.0526       0.569 TRUE   0.95
#> 10   -988. alcohol         0.111        0.273 TRUE   0.9 
#> # … with 410 more rows

#Odds against to log_odds against.
#Note the x_axis value was changed to "prob_against" which was the newly added column
examp_PD_log_odds = prep_log_AUC(dat = examp_PD_odds,
                                 x_axis = "prob_against",
                                 type = "adjust")

examp_PD_log_odds
#> # A tibble: 420 × 7
#>    subject outcome   prob_against prop_indiff orig   prob log_prob_against
#>      <dbl> <chr>            <dbl>       <dbl> <lgl> <dbl>            <dbl>
#>  1   -988. $100 Gain       0            1     FALSE NA                0   
#>  2   -988. $100 Gain       0.0526       0.957 TRUE   0.95             1.70
#>  3   -988. $100 Gain       0.111        0.862 TRUE   0.9              2.78
#>  4   -988. $100 Gain       0.429        0.771 TRUE   0.7              4.72
#>  5   -988. $100 Gain       1            0.750 TRUE   0.5              5.95
#>  6   -988. $100 Gain       2.33         0.456 TRUE   0.3              7.17
#>  7   -988. $100 Gain      19            0.200 TRUE   0.05            10.2 
#>  8   -988. alcohol         0            1     FALSE NA                0   
#>  9   -988. alcohol         0.0526       0.569 TRUE   0.95             1.70
#> 10   -988. alcohol         0.111        0.273 TRUE   0.9              2.78
#> # … with 410 more rows

Finally, when using the completely transformed data, simply supply it to the AUC function.


AUC(dat = examp_PD_log_odds,
    indiff = "prop_indiff",
    x_axis = "log_prob_against",
    amount = 1,
    groupings = c("subject","outcome"))
#> # A tibble: 60 × 3
#> # Groups:   subject [15]
#>    subject outcome          AUC
#>      <dbl> <chr>          <dbl>
#>  1   -988. $100 Gain     0.676 
#>  2   -988. alcohol       0.309 
#>  3   -988. entertainment 0.735 
#>  4   -988. food          0.379 
#>  5     -2  $100 Gain     0.0833
#>  6     -2  alcohol       0.0833
#>  7     -2  entertainment 0.0833
#>  8     -2  food          0.0833
#>  9     -1  $100 Gain     1     
#> 10     -1  alcohol       1     
#> # … with 50 more rows
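For reference, a minimal sketch of the trapezoidal AUC calculation described by Myerson et al. (2001) is shown below. This is an illustration, not the package's implementation; it assumes the x values are sorted in ascending order and the indifference points are already expressed as proportions of the amount.

```r
#Trapezoidal AUC (Myerson et al., 2001): normalize the x axis to [0, 1],
#then sum the areas of the trapezoids between adjacent indifference points
trap_AUC <- function(x, y) {
  x <- x / max(x)
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

#Constant indifference points of 1 yield the maximum AUC of 1
trap_AUC(c(0, 1, 6, 60), c(1, 1, 1, 1))
#> [1] 1
```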