---
title: "Scanning parameter grid to understand circadian clock model"
author: "Ye Yuan"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Parameter grid scan to understand circadian clock model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "80%",
  dpi = 300
)
```

This vignette documents the `grid_scan()` function provided in this package 
which allows high-performance repeated simulation runs over a grid of parameters 
and/or initial states. Here, we take the classical LG circadian clock model 
and demonstrate the following:

1. Under a wide range of initial states, LG model converges to a determined 
limit cycle attractor reflecting the circadian clock.
2. Such stability holds even when model parameters changes (albeit shape/period 
of the attractor will change).

Workflow shown here shall be generally useful for interpretation of wet-lab 
experiment results as study of clock mutants typically ends up with hypotheses 
where certain parameter(s) of the circadian clock is/are affected.

`grid_scan()` is generally useful for cases where you have an Odin model too; 
this function is decoupled from specifics of circadian clock models.

```{r setup}
library(clockSim)
library(matrixStats)
library(dplyr)
```

## Baseline: a limit cycle oscillator

Under wildtype condition the circadian clock is a limit cycle oscillator with \~24hr period. The LG model reproduces such behavior. To see this we run simulation with default setting:

```{r}
model_gen <- getOdinGen()$continuous_LG
model <- model_gen$new()

sim_hours <- seq(from = 0, to = 2400, by = 1)
res <- model$run(sim_hours) |> as.data.frame()
res$time <- res$t
plot(plot_phase(res, M_T, C_N))
plot(plot_timeSeries(res, 0, 240, 1, 6, M_T, C_N))
print(compute_period(res$M_T |> tail(n = 240), method = "lomb"))
```

As shown above:

1.  A stable limit cycle attractor is observed.
2.  First 10 days show stabilization against the all-0 initial state.
3.  Last 10 days show a 23.9hr period reflecting a circadian clock.

While promising, however, the LG model is a complex 10-state system. We next explore different properties of the system:

1.  Would different initial states always lead to the clock limit cycle?
2.  Would increase/decrease of parameters disrupt the attractor?

## Initial state grid scan

10-variable makes it impossible to do a extensive grid on initial states. 
To see how fast our simulation runs:

```{r}
run_eta(model, sim_hours)
```

With each simulation at \~10ms, we should be able to do 1) a very coarse grid on all 10 states with \~3 for each state, or 2) an extensive grid on selected states (say TIM/PER RNA):

$$
3^{10}*10\mbox{ms}\sim 250^{2}*10\mbox{ms}\sim600\mbox{sec}
$$

With the CRAN build time limitation, in this vignette we will not run the \~10min grid above but instead deal with a smaller grid on TIM/PER RNA.

### Grid generation

First, we define the grid. We compute summary of all states in the baseline simulation and use the min/mean/max to define the grid. For each state, we scan N multiples of mean+N\*spread of the baseline simulation, where spread = (max-min)/2 and N = 2,3, ..., Nmax. For the CRAN grid Nmax=5.

```{r}
# Compute summary
summary <- res |>
  select(-t, -time) |>
  apply(2, summary)
# Only keep min/mean/max
summary <- summary[c(1,4,6),]
# Add on mean+N*spread, N=2,3,...,N_max
N_max <- 3 # For larger scan increase this. CRAN=3
get_multiples <- function(s, k) {
  # Extract components
  min <- s[1, ]
  mean <- s[2, ]
  delta <- s[3, ] - min
  
  # Create new rows using vectorized operations
  multiples <- outer(k, delta) |> sweep(2, mean, "+")
  attr(multiples, "original") <- s
  
  # Return
  multiples
}
summary <- get_multiples(summary, 2:N_max)
summary <- summary[,c("M_T", "M_P")] # Only RNA states
# Create grid
grid <- expand.grid(
  summary |> as.data.frame(), KEEP.OUT.ATTRS = FALSE)
#   User variables for initial state start with setUserInitial_
names(grid) <- paste0("setUserInitial_",names(grid))
```

### Running `grid_scan()`

Now perform grid scan of initial states. `grid_scan()` is designed for 
efficient repeated model execution on parameter grid. Refer to its help for 
more details, and here I summarize its design concept:

1. Allow scanning arbitrary grid generated by, e.g., `expand.grid()`.
2. Parallel computing based on base R `parallel`.
3. Return raw run data or allow user to apply a summary function (to reduce memory load).

We want the following statistics:

1.  Did the simulation converge (i.e., successful integration)?
2.  Did the simulation fall into the "default" attractor (i.e., the solution when initial states are all-zero default)? For this, we compute max NRMSE (among states) and min cosine similarity (among simulation time) to capture the largest deviation. For explanation of these metrics, refer to `?compute_rmse` and `?compute_cosine`.

These statistics mean we apply the following function to each run data:

```{r}
default_attractor <- model_gen$new()$run(sim_hours)
default_attractor <- 
  default_attractor[(length(sim_hours)-240):length(sim_hours),]
stat.fn <- function(raw_run, reference = default_attractor){
  # Return code == 2 means successful integration (at least for lsoda)
  succ <- attr(raw_run, "istate")[1] == 2
  # Subset only the last 240 time points - should be stabilized
  raw_run <- raw_run[(nrow(raw_run)-240):nrow(raw_run),]
  # Compute normalized RMSE
  nrmse <- compute_rmse(raw_run, reference, normalize = "range")
  nrmse <- max(nrmse)
  # Compute cosine similarity
  cos <- compute_cosine(raw_run, reference)
  cos <- min(cos)
  # Return
  c(converged = succ, nrmse = nrmse, cos = cos)
}
```

Before running, let's see how this statistics function will slow down simulation. As shown, we add about \~10% time and \~40% memory footprint, which is minimal.

```{r}
print(bench::mark(stat.fn(model$run(sim_hours))))
print(bench::mark(model$run(sim_hours)))
```

Now let's run the grid scan of initial states. We run `grid_scan`, and then wrangle the results into a data frame. As below, all converged to the same attractor.

```{r}
scan <- 
  grid_scan(model_gen, grid, apply.fn = stat.fn, 
            n.core = 1, custom.export = "default_attractor",
            sim_hours)

process_scan <- function(){
  .scanDF <- scan |> unlist(use.names = FALSE) |> matrix(ncol = 3, byrow=TRUE)
  colnames(.scanDF) <- names(scan[[1]])
  result <- cbind(grid, .scanDF |> as.data.frame())
  summary(
    result |> select(converged, nrmse, cos)
    )
}

process_scan()
```

Plot two phase diagrams as an example (note that I rerun `grid_scan()` with 
default `apply_fn = identity` to get the raw data for plotting):

```{r,fig.show="hold",out.width="30%"}
# Rerun scan
scan <- 
  grid_scan(model_gen, grid, apply.fn = identity,
            n.core = 1, custom.export = "default_attractor",
            sim_hours)
# Show first and last of grid
first <- grid[1,]
last <- grid[nrow(grid),]
print(first)
print(last)
plot(plot_phase(scan[[1]] |> as.data.frame(), M_T, C_N))
plot(plot_phase(scan[[nrow(grid)]] |> as.data.frame(), M_T, C_N))
```


How about modifying one parameter and see if that affects initial state stability with the same grid? Just add that parameter new value as a constant to the grid!

Remember that our goal is to see whether parameter change causes convergence property change - so our reference attractor should be computed with the new parameter too. 
As below, just as an example, doubling translation rate of TIM mRNA does not 
affect the behavior - all initial states still converged to the clock attractor.

```{r}
new_grid <- grid
new_grid$k_sT <- 1.8 # 2X of original, check model$content()
new_model <- model_gen$new()
new_model$set_user(k_sT = 1.8)
default_attractor <- new_model$run(sim_hours)
default_attractor <- 
  default_attractor[(length(sim_hours)-240):length(sim_hours),]
# Then, repeat the same code above.
scan <- 
  grid_scan(model_gen, new_grid, apply.fn = stat.fn, 
            n.core = 1, custom.export = "default_attractor",
            sim_hours)

process_scan()
```