---
title: "TPLS_example2"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{TPLS_example2}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Hello, from Arthur

This vignette shows how to use T-PLS to estimate out-of-sample performance with cross-validation.
To see how to build a predictor with T-PLS, see TPLS_example1.



## Loading library and tutorial data

```{r setup}
library(TPLSr)
attach(TPLSdat)
```

* X contains the single-trial betas. It has 3714 columns, each of which corresponds to a voxel.
* Y is the binary variable to be predicted. Here, it codes whether the participant pressed the left or the right button, so we would expect a whole-brain predictor to highlight the left and right motor areas.
* subj is a numeric variable giving the subject number that each observation belongs to. There are only 3 subjects in this dataset.
* run is a numeric variable giving the scanner run that each observation belongs to. Each of the 3 subjects had 8 scan runs.

## Cross Validation

There are only 3 subjects in this dataset, so we will do 3-fold (leave-one-subject-out) CV.
This entails repeating the following steps 3 times, once for each left-out subject:

1. Divide the data into training and testing sets. In this case, 2 subjects are used for training and 1 subject for testing.
2. Using only the training data (i.e., 2 subjects), run a secondary (nested) cross-validation to choose the best tuning parameters.
3. Using the best tuning parameters, fit a whole-brain predictor to all of the training data (2 subjects).
4. Assess how well the left-out subject is predicted.
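
The outer split in the steps above can be sketched in base R on a toy subject vector. This is only an illustration: `loso_folds` and `toy_subj` are hypothetical names that are not part of TPLSr, and the toy data are deliberately kept separate from the tutorial's `subj`.

```r
# Hypothetical helper (not part of TPLSr): build leave-one-subject-out folds.
loso_folds <- function(subject_id) {
  lapply(sort(unique(subject_id)), function(s) {
    test_idx <- subject_id == s          # rows of the held-out subject
    list(held_out = s,
         train = which(!test_idx),       # row indices used for training
         test  = which(test_idx))        # row indices used for testing
  })
}

# Made-up layout for illustration: 3 subjects, 4 observations each
toy_subj <- rep(1:3, each = 4)
folds <- loso_folds(toy_subj)

for (f in folds) {
  cat(sprintf("held-out subject %d: %d training rows, %d testing rows\n",
              f$held_out, length(f$train), length(f$test)))
}
```

Each subject is held out exactly once, and no row appears in both the training and testing sets of the same fold. The nested cross-validation in step 2 applies the same idea again, but only to the training rows.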

```{r}
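ACCstorage <- rep(NA, 3) # out-of-sample accuracy for each left-out subject
for (i in 1:3) { # primary cross-validation fold
  test <- subj == i # observations of the left-out subject
  train <- !test    # observations of the other 2 subjects

  # perform nested cross-validation within training data, using subject as the fold
  cvmdl <- TPLS_cv(X[train, ], Y[train], subj[train])
  # evaluate 1 to 25 components and thresholds from 0 to 1 by Pearson correlation,
  # with performance computed within each scanner run
  cvstats <- evalTuningParam(cvmdl, "Pearson", X[train, ], Y[train], 1:25, seq(0, 1, 0.05), run[train])

  # fit T-PLS model using all training data based on best tuning parameter
  mdl <- TPLS(X[train, ], Y[train])

  # predict the testing subject
  score <- TPLSpredict(mdl, cvstats$compval_best, cvstats$threshval_best, X[test, ])
  prediction <- 1 * (score > 0.5) # binarize the continuous score at 0.5

  # assess performance of prediction
  ACCstorage[i] <- mean(prediction == Y[test])
}

mean(ACCstorage) # out-of-sample CV performance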
```
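
The scoring step inside the loop (thresholding the continuous score at 0.5 and comparing it with the true labels) can be checked in isolation on made-up numbers. The sketch below assumes labels coded as 0/1, as the comparison with `Y[test]` implies. `binary_accuracy`, `toy_score`, and `toy_truth` are hypothetical names used only for illustration.

```r
# Hypothetical helper (not part of TPLSr): accuracy of thresholded scores.
binary_accuracy <- function(score, truth, cutoff = 0.5) {
  prediction <- as.integer(score > cutoff) # 1 if the score exceeds the cutoff, else 0
  mean(prediction == truth)                # proportion of correctly predicted labels
}

# Made-up scores and 0/1 labels, for illustration only
toy_score <- c(0.9, 0.2, 0.6, 0.4)
toy_truth <- c(1, 0, 0, 0)

binary_accuracy(toy_score, toy_truth) # 3 of 4 labels match, i.e. 0.75
```

A score exactly equal to the cutoff is assigned to class 0, because the comparison is strict, matching `score > 0.5` in the loop.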