---
title: "Advanced climwin"
author: "Liam D. Bailey and Martijn van de Pol"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Advanced climwin}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

In our introductory vignette we described the basic features available in the `climwin` package. Below, we will look in more detail at more advanced features available to users. We will cover:

* Fitting more complex baseline models (e.g. mixed effects models)

* Testing for climate thresholds.

* Dealing with data grouped over multiple years.

* Carrying out spatial replication.

* Fitting more complex curves to climate window data using `weightwin`.

******************************

## More complex baseline models

In our previous vignette we fitted a simple linear model to be used as our baseline and studied the linear effect of temperature; however, in many cases users may want to design more complex baseline models. We will discuss a number of examples here.

### Adding interactions with climate

Until now, we have assumed that the effects of climate do not interact with any other variables. This may not always be the case. As of v1.1.0, users can build a custom model structure with the ability to include interactions between climate and other variables. In our example using the Mass dataset, we might expect the effect of temperature on chick mass to vary with the average age of the mothers. 

Firstly, it is necessary to create a variable called 'climate' in the biological dataset. The variable 'climate' doesn't need to contain any information, it is simply a placeholder to specify where climate data will be included in the model.

*NOTE: It is necessary to call the new object 'climate' so that `climwin` can distinguish this variable from others. Other variable names will not work.*

```{r, eval = F}

Mass$climate <- 1

```

Now that we have our 'climate' variable, we can add it into our baseline model. We can then specify how we want our climatic data to interact with other variables in the model.

`baseline = lm(Mass ~ climate*Age, data = Mass)`

In this example, climatic data from each tested climate window will be included in the model interacting with parent age.    

```{r, eval = F}

Interaction <- slidingwin(xvar = list(Temp = MassClimate$Temp),
                          cdate = MassClimate$Date,
                          bdate = Mass$Date,
                          baseline = lm(Mass ~ climate*Age, data = Mass),
                          cinterval = "day",
                          range = c(150, 0),
                          type = "absolute", refday = c(20, 05),
                          stat = "mean",
                          func = "lin")

```

If we look at the best model from this analysis, we can see the interaction between temperature and age included in the model coefficients.

```{r, eval = F}

summary(Interaction[[1]]$BestModel)

```

```{r, eval = F}

Call:
lm(formula = yvar ~ climate + Age + climate:Age, data = modeldat)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.6266 -1.5716  0.2878  1.6086  4.7510 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 170.2628     7.1678  23.754  < 2e-16 ***
climate      -5.5466     0.9200  -6.029 3.32e-07 ***
Age          -2.6046     2.6603  -0.979    0.333    
climate:Age   0.4024     0.3395   1.185    0.242    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.449 on 43 degrees of freedom
Multiple R-squared:  0.7778,	Adjusted R-squared:  0.7623 
F-statistic: 50.17 on 3 and 43 DF,  p-value: 4.267e-14

```

### Mixed effects models

It will often be important to fit models with random effects terms (e.g. Individual ID). `climwin` includes functionality for mixed effects models using both `lme4` and `nlme`. Random effects terms can easily be added to the baseline model. We can use the 'Offspring' dataset in R as an example.

`baseline = lmer(Offspring ~ 1 + (1|BirdID), data = Offspring, REML = F)`

*NOTE: Mixed effects models should be fitted using a maximum likelihood approach (i.e. REML = FALSE). If this is not specified in the baseline model, *climwin* will make this adjustment internally.*

In a more specific scenario, we may want to include our climatic variable as a random effect in our model. As above, this can be done by creating a 'climate' variable in our dataset and including it in the baseline model.

`baseline = lmer(Mass ~ 1 + (1|climate), data = Mass, REML = F)`

******************************

## Climate thresholds

Many studies, may be interested in testing climate windows using climatic thresholds. When testing climatic thresholds, we assume that a biological response is driven by days that surpass a particular climatic value. For example, seed germination may be influenced by the number of days over 30 degrees. Alternatively, temperature may only influence organism survival when it falls below freezing. A common example of such a climate threshold is growing degree days often used in the study of plant systems.

These can be achieved in `slidingwin` using the three parameters 'upper', 'lower' and, 'binary'. When a value is provided for the parameter 'upper', `slidingwin` will create a new climate dataset where all values equal to or below this threshold are set at 0. Similarly, when a value is set for 'lower', all values equal to or above this threshold will be set at 0. When values are provided for both 'upper' and 'lower', all values that fall *outside* these two threshold will be set at 0.

`upper = 30`

| Date        | Original temperature | Threshold temperature|
|:-----------:|:--------------------:|:--------------------:|
| 01/06/2015  |    25.9              | 0                    |
| 02/06/2015  |    24.0              | 0                    |
| 03/06/2015  |    32.5              | 32.5                 |
| 04/06/2014  |    28.1              | 0                    |
| 05/06/2014  |    30.5              | 30.5                 |
| 06/06/2014  |    30.0              | 0                    |
| 07/06/2014  |    31.2              | 31.2                 |
| 08/06/2014  |    27.0              | 0                    |
| ...         | ...                  | ...                  |

`upper = 30`
`lower = 25`

| Date        | Original temperature | Threshold temperature|
|:-----------:|:--------------------:|:--------------------:|
| 01/06/2015  |    25.9              | 25.9                 |
| 02/06/2015  |    24.0              | 0                    |
| 03/06/2015  |    32.5              | 0                    |
| 04/06/2014  |    28.1              | 28.1                 |
| 05/06/2014  |    30.5              | 0                    |
| 06/06/2014  |    30.0              | 0                    |
| 07/06/2014  |    31.2              | 0                    |
| 08/06/2014  |    27.0              | 27.0                 |
| ...         | ...                  | ...                  |


In some circumstances we may assume that all values past the climatic threshold will have an equally large impact on the biological response. In this case, we would set 'binary' to TRUE so that all non-zero values are set at 1. By default however, 'binary' will be set at FALSE, so that all values past the climatic threshold keep their original value.

`upper = 30`
`binary = TRUE`

| Date        | Original temperature | Threshold temperature |
|:-----------:|:--------------------:|:---------------------:|
| 01/06/2015  |    25.9              | 0                     |
| 02/06/2015  |    24.0              | 0                     |
| 03/06/2015  |    32.5              | 1                     |
| 04/06/2014  |    28.1              | 0                     |
| 05/06/2014  |    30.5              | 1                     |
| 06/06/2014  |    30.0              | 0                     |
| 07/06/2014  |    31.2              | 1                     |
| 08/06/2014  |    27.0              | 0                     |
| ...         | ...                  | ...                   |

### A worked example

Below we will provide a worked example using the `Mass` and `MassClimate` dataset. In this example, lets imagine we are interested in testing the impact of the number of days above freezing on our mass response variable. To do this we would set both our 'upper' and 'binary' parameters.

`upper = 0`
`binary = TRUE`

As we are interested in measuring the *number* of days above freezing, we set our stat parameter to 'sum'. Otherwise, we used model parameter values identical to our earlier vignette.

```{r message = FALSE}

library(climwin)

```

```{r, eval = FALSE}

MassWin <- slidingwin(xvar = list(Temp = MassClimate$Temp),
                      cdate = MassClimate$Date,
                      bdate = Mass$Date,
                      baseline = lm(Mass ~ 1, data = Mass),
                      cinterval = "day",
                      range = c(150, 0),
                      upper = 0, binary = TRUE,
                      type = "absolute", refday = c(20, 05),
                      stat = "sum",
                      func = "lin")
                       
```

When we examine the best model data, we can see that our climate data is now count data.

```{r, eval = FALSE}

head(MassWin[[1]]$BestModelData)

```

|Yvar  |climate |
|:----:|:------:|
|140   |0       |
|138   |0       |
|136   |1       |
|135   |2       |
|134   |0       |
|134   |0       |

******************************

## Biological measurements grouped across two years

Many long-term datasets that will be suitable for `climwin` are likely to be measured during Northern hemisphere spring/summer (e.g. breeding data). These biological records are measured during the middle of the year, meaning that biological records can be easily grouped by year. Yet in other circumstances biological measurements will fall across two years, particularly in Southern hemisphere species where spring/summer falls across the new year period [e.g., [1](https://doi.org/10.1086/659101)].

This can cause issues when fitting 'absolute' climate windows. As described in the introductory vignette, 'absolute' windows will use a set reference day for all biological records. Where biological measurements cross two years however, measurements from the same breeding season can be split up. In the table below, where a reference day of November 1st is used, all those measurements taken at the start of the breeding season are given a date of November 1st 2014 while all values following the new year are set at November 1st 2015. This is obviously unrealistic, as biological measurements in January 2015 cannot be impacted by climatic conditions that will occur 11 months in the future!

|Date      |Reference Date|
|:--------:|:------------:|
|05/11/2014|01/11/2014    |
|10/11/2014|01/11/2014    |
|01/12/2014|01/11/2014    |
|12/12/2014|01/11/2014    |
|01/01/2015|01/11/2015    |
|07/01/2015|01/11/2015    |

As a solution, `climwin` includes a 'cohort' parameter that allows users to specify which biological measurements should be grouped together (e.g. when they are from same breeding season). Each biological record should be given a cohort level (see below), which is taken into account when setting the reference day for climate window analyses.

|Date      |Cohort|Reference Date|
|:--------:|:----:|:------------:|
|05/11/2014|2014  |01/11/2014    |
|10/11/2014|2014  |01/11/2014    |
|01/12/2014|2014  |01/11/2014    |
|12/12/2014|2014  |01/11/2014    |
|01/01/2015|2014  |01/11/2014    |
|07/01/2015|2014  |01/11/2014    |

Some example code showing how the 'cohort' argument can be used is provided below. In this example we use data from a southern hemisphere breeding bird.

```{r, eval = FALSE}

SizeWin <- slidingwin(xvar = list(Temp = SizeClimate$Temperature),
                      cdate = SizeClimate$Date,
                      bdate = Size$Date,
                      baseline = lm(Size ~ 1, data = Size),
                      cohort = Size$Cohort,
                      cinterval = "day",
                      range = c(150, 0),
                      type = "absolute", refday = c(01, 10),
                      stat = "mean",
                      func = "lin")
                       
```

******************************

## Spatial replication

To detect climate signals using `climwin` can often require large amounts of data, particularly if the relationship between climate and biological response is weak [[2](https://doi.org/10.1111/2041-210X.12590)]. To obtain the required data through temporal replication requires the collection of data over multiple years, often decades; however, spatial replication may also allow users to expand their sample size over a shorter period by collecting data from multiple sites/populations.

Using spatial replication assumes that the relationship between the biological response and climatic predictor is consistent across the different measured populations. Where this assumption is valid, spatial replication can help expand the amount of data available for `climwin` analyses.

### A worked example

Spatially replicated data can be analysed using the `slidingwin` function with the addition of the 'spatial' parameter. As with regular `slidingwin` analysis, analysis with spatial replication requires a separate biological and climate dataset. However, these datasets should now contain an additional variable which specifies the site at which biological and climate data were collected. Below, we have called this parameter 'SiteID'.


| Date        | Mass (g) | SiteID |
|:-----------:|:--------:|:------:|
| 04/06/2015  |    120   |A       |
| 05/06/2015  |    123   |A       |
| 07/06/2015  |    110   |B       |
| 07/06/2015  |    140   |A       |
| 06/06/2015  |    138   |B       |
| ...         | ...      | ...    |


| Date        | Temperature | SiteID |
|:-----------:|:-----------:|:------:|
| 01/06/2015  |    15       |A       |
| 02/06/2015  |    16       |A       |
| 03/06/2015  |    12       |A       |
| 04/06/2015  |    18       |A       |
| 05/06/2015  |    20       |A       |
| 06/06/2015  |    23       |A       |
| 07/06/2015  |    21       |A       |
| 01/06/2015  |    10       |B       |
| 02/06/2015  |    12       |B       |
| 03/06/2015  |    9        |B       |
| 04/06/2015  |    5        |B       |
| 05/06/2015  |    13       |B       |
| 06/06/2015  |    10       |B       |
| 07/06/2015  |    11       |B       |
| ...         | ...         | ...    |

*NOTE: The climate dataset for spatially replicated `climwin` analysis will often include duplicate dates. In a regular `climwin` analysis this will lead to errors.*

With these new datasets, we can carry out a `slidingwin` analysis with the addition of a 'spatial' parameter.

```{r, eval = FALSE}

MassWin <- slidingwin(xvar = list(Temp = Climate$Temp),
                      cdate = Climate$Date,
                      bdate = Biol$Date,
                      baseline = lm(Mass ~ 1, data = Biol),
                      cinterval = "day",
                      range = c(150, 0),
                      type = "absolute", refday = c(20, 05),
                      stat = "mean",
                      func = "lin", spatial = list(Biol$SiteID, Climate$SiteID))
                       
```

The 'spatial' parameter is a list item that includes the SiteID variable for the biological and climate datasets respectively. When `slidingwin` fits individual climate windows, each biological record will be matched with the climate data from the corresponding site.

******************************

## `weightwin` function

When we run regular `slidingwin` analyses we assume that all days within the climate window are evenly weighted (see figure below). While this is often a convenient assumption, it can be considered biologically unrealistic.

```{r, echo = FALSE, fig.width = 5, fig.height = 5}

Unweight <- data.frame(Time = seq(0, 100, 1), Weight = c(rep(0, times = 25), rep(1, times = 50), rep(0, 26)))
Unweight$Weight <- Unweight$Weight/sum(Unweight$Weight)

par(mar = c(5, 4.25, 4, 2) + 0.1)
plot(x = Unweight$Time, y = Unweight$Weight, type = "l", ylab = "Weight", xlab = "Time", ylim = c(0, 0.05),
     yaxt = "n", xaxt = "n",
     lwd = 2, 
     cex.lab = 1.25, cex.axis = 1.25, cex = 1.5)
axis(2, cex.axis = 1.25, cex.lab = 1.25, yaxp = c(0, 0.05, 2))
axis(1, cex.axis = 1.5, cex.lab = 1.25, xaxp = c(0, 100, 2),
     mgp = c(2, 1.5, 0))

```

This assumption can be a problem when we are considering more complex hypotheses. For example, we may be interested in looking for climate windows where the importance of climate decays slowly over time. The function `weightwin` allows users to fit either Weibull (below left) and generalised extreme value (GEV; below right) weight distributions to climate data as an alternative to a uniform weight distribution, thus removing the assumption of uniformity.

```{r, echo = FALSE, fig.width = 8, fig.height = 4}

par(mfrow = c(1, 2))
duration <- 365
j        <- seq(1:duration) / duration
k        <- seq(-10, 10, by = (2 * 10 / duration))
weight   <- 3 / 0.2 * ((j[1:duration] - 0) / 0.2) ^ (3 - 1) * exp( - ((j[1:duration] - 0) / 0.2) ^ 3)
plot((weight / sum(weight)), type = "l", ylab = "Weight", xlab = "Day", cex.lab = 1.5, cex.axis = 1.5, main = "Weibull distribution")

weight <- evd::dgev(k[1:duration], loc = 1, scale = 2, shape = -1, log = FALSE)
plot((weight / sum(weight)), type = "l", ylab = "Weight", xlab = "Day", cex.lab = 1.5, cex.axis = 1.5, main = "GEV distribution")

```

Instead of varying the start and end date of climate windows like `slidingwin`, `weightwin` instead uses an optimisation function to vary the shape, scale and location of either of these weight functions. Each weight function is then applied to the climate data, which is then used to produce a climate model and $\Delta AICc$ value. Therefore, although the method of optimising climate data is different, the ultimate output (i.e. $\Delta AICc$ of a climate window compared to a null model) is the same.

### A worked example
`weightwin` can often be useful to use with climate data where we have already identified a climate window using `slidingwin`. Here, we will use `weightwin` to further investigate the `Mass` and `MassClimate` data included with the `climwin` package.

The basic parameters in `weightwin` are the same as `slidingwin` (though note the absence of the 'stat' parameter). In addition however, we must designate which weight distribution we want to use. In this case we consider a Weibull function.

`weightfunc = "W"`

Next, we must set the location, scale and shape values for the starting distribution that will be used to begin the optimisation procedure. The default values (3, 0.2, 0) are often appropriate for fitting Weibull distributions. However, you can explore different parameter values using the `explore` function.

The optimisation function used in `weightwin` has the potential to get 'stuck' during the optimisation routine. With the argument 'n' we can run our `weightwin` analysis multiple times to ensure we are finding the true optimal weight distribution.

`n = 5`
 
```{r, eval = FALSE}

set.seed(100)

weight <- weightwin(n = 5, xvar = list(Temp = MassClimate$Temp), cdate = MassClimate$Date, 
                    bdate = Mass$Date, 
                    baseline = lm(Mass ~ 1, data = Mass), 
                    range = c(150, 0), 
                    func = "lin", type = "absolute", 
                    refday = c(20, 5), 
                    weightfunc = "W", cinterval = "day",
                    par = c(3, 0.2, 0))

```

If we look at the iterations object in the `weightwin` output, we can see the outcome of all 5 iterations.

```{r, eval = F}

weight$iterations

```

| start_shape | start_scale | start_location |deltaAICc  |
|:-----------:|:-----------:|:--------------:|:---------:|
| 3.0         |    0.2      |0.0             |-68.21592  |
| 3.1         |    2.7      |-4.5            |-68.21592  |
| 0.7         |    2.7      |-5.1            |-68.21592  |
| 8.1         |    3.8      |-4.5            |-68.21592  |
| 1.8         |    6.3      |-1.2            |-35.34690  |

start_shape, start_scale and start_location show the starting parameters used for each iteration of `weightwin`. In the deltaAICc column we can see the outcomes of the different iterations. While 4 of the iterations reached the same peak, the 5th got stuck during the optimisation routine. This is a useful demonstration of why multiple iterations are important.

It seems that there is a clear peak that our optimisation routine has been able to identify, next we can visualise that peak using the `explore` function.

We can use the shape, scale and location parameters of the first iteration to fit our plot.

```{r, eval = F}

weight[[1]]$WeightedOutput

```

|deltaAICc    | start_shape | start_scale    | start_location | 
|:-----------:|:-----------:|:--------------:|:--------------:|
| -68.21592   |    2.17     |0.35            |0.00            |

```{r, echo = FALSE, fig.width = 5, fig.height = 5}

explore(weightfunc = "W", shape = 2.17, scale = 0.35, loc = 0)

```

In the above plot, we can see that the importance of temperature declines rapidly either side of the peak.

If we compare this value to the delta AICc obtained from the `slidingwin` function we can see that the `weightwin` function is better able to explain variation in our mass parameter as we have removed the assumption of a uniform weight distribution. 

|`slidingwin`|`weightwin`|
|:----------:|:---------:|
|-64.81      |-68.22     |

### Pros and cons

`weightwin` can provide greater detail on the relationship between climate and the biological response, such as the occurrence of exponential functions. Additionally, by using more diverse weight distributions, `weightwin` will often generate models with better $\Delta AICc$ values, which may be especially important when users are most interested in achieving high explanatory power. Furthermore, by using an optimisation routine `weightwin` often tests far fewer models than `slidingwin`, allowing for more rapid analysis.

Despite these benefits, `weightwin` will not always be the most appropriate function for all scenarios. Firstly, the nature of the fitted weight distributions means that `weightwin` can only detect single climate signals, which forces users to detect and compare potential climate signals with separate analyses. This is not an issue using `slidingwin`.

Secondly, `weightwin` can be a more technically complex process. While the above example works easily, in other scenarios the optimisation procedures may often get stuck on false optima or fail to converge. In these cases, users may be required to test different starting parameters and adjust optimisation characteristics such as step size. Often this procedure can be inhibitive for users with less technical knowledge.

Finally, `weightwin` can only be used for testing mean climate, with no capacity to consider other aggregate statistics. Therefore, whether one chooses to use `weightwin` or `slidingwin` will largely depend on the summary statistic used, the level of detail desired, and the user's technical knowledge.

******************************

To ask additional questions or report bugs please e-mail:

liam.bailey@anu.edu.au

or visit our [google group](https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!forum/climwin-r-group)