hJAM Vignette

Lai Jiang

2020-02-11

library(hJAM)

Overview

hJAM is a package developed to implement the hJAM model, which is designed to estimate the associations between multiple intermediates and outcome in a Mendelian Randomization or Transcriptome analysis.

Mendelian randomization (MR) and transcriptome-wide association studies (TWAS) can be viewed as the same approach within the instrumental variable analysis framework using genetic variants. They differ in their intermediates: MR focuses on modifible risk factors while TWAS focuses on gene expressions. We can use a two-stage hierarchical model to unify the framework of MR and TWAS. Details are described in our paper.

Implementation

We have two methods in our pacakge:

General input

The input of the two hJAM model includes:

Get conditional \(\hat{A}\) matrix

To generate a conditional estimate \(\hat{A}\) matrix from a marginal estimate \(\hat{A}\) matrix, users can use the get_cond_A (if number of intermediates > 1) or get_cond_alpha (if number of intermediate = 1) functions in hJAM package. Examples are given in next section.

For MR questions, the intermediates are modifiable risk factors. The marginal \(\hat{A}\) can be extracted from different GWAS whose the outcomes are the risk factors of interests. For example, for intermediate as body mass index, the marginal \(\hat{\alpha}\) vector can be extracted from the GIANT consortium.

For TWAS questions, the intermediates are gene expressions. There are two ways to obtain the elements in \(\hat{A}\) matrix.

  1. GTEx portal: the GTEx project provides marginal summary statistics between SNPs and gene expressions in different tissues.

  2. PredictDB: the PredictDB is developed by the PrediXcan group. It uses elastic net on individual level data from the GTEx project.


Implementation with caution:


Example

In our package, we prepared a data example which we have described in detail in our paper. In this data example, we focus on the conditional effects of body mass index (BMI) and type 2 diabetes (T2D) on myocardial infarction (MI).

We identified 75 and 136 significantly BMI- and T2D-associated SNPs from GIANT consortium and DIAGRAM+GERA+UKB, respectively. In this set of SNPs, there was one overlapping SNP in both the instrument sets for BMI and T2D. In total, we have 210 SNPs identified. The association estimates between the 210 SNPs and MI were collected from UK Biobank.

Data exploration

A quick look at the data in the example -

data("conditional_A")
data("betas.Gy")
data("SNPs_info")
conditional_A[1:10, ]
##                bmi          t2d
##  [1,]  0.019531085  0.072587211
##  [2,]  0.025262061  0.013586392
##  [3,] -0.005147363  0.089673178
##  [4,]  0.046302578  0.041313103
##  [5,]  0.016849395 -0.004564683
##  [6,]  0.012345062  0.036207829
##  [7,]  0.042065546  0.013786703
##  [8,]  0.073772221  0.118575569
##  [9,] -0.004559686  0.051510367
## [10,] -0.012843666  0.091788493
betas.Gy[1:10]
##  [1]  0.0197298348  0.0133151413  0.0008717583  0.0213550014 -0.0031514278
##  [6] -0.0062721407  0.0108936262 -0.0109635708  0.0003266145  0.0077205399
SNPs_info[1:10, ]
##           SNP Major_A    ref_frq
## 1   rs2296173       G 0.12079449
## 2    rs657452       A 0.54742602
## 3  rs12088739       A 0.87859749
## 4   rs3101336       C 0.67815160
## 5  rs12566985       G 0.67936765
## 6  rs12401738       A 0.15707337
## 7  rs11165643       T 0.50081070
## 8  rs17024393       C 0.03080665
## 9   rs1127655       C 0.42298338
## 10  rs2493394       G 0.19092015

In this package, we embed two fucntions for the users to check the SNPs they use in the analysis visually:

scatter_plot_p = SNPs_scatter_plot(A = conditional_A, betas.Gy = betas.Gy, num_X = 2)
heatmap_p = SNPs_heatmap(Gl)

Conversion of \(\hat{A}\) matrix

You could use function get_cond_A function to run JAM on the marginal estimates \(\hat{A}\) matrix and convert it into a conditional estimates \(\hat{A}\) matrix.

data("marginal_A")
cond_A = get_cond_A(marginal_A = marginal_A, Gl = Gl, N.Gx = 339224, ridgeTerm = T)
cond_A[1:10, ]
##                bmi          t2d
##  [1,]  0.019531085  0.072587211
##  [2,]  0.025262061  0.013586392
##  [3,] -0.005147363  0.089673178
##  [4,]  0.046302578  0.041313103
##  [5,]  0.016849395 -0.004564683
##  [6,]  0.012345062  0.036207829
##  [7,]  0.042065546  0.013786703
##  [8,]  0.073772221  0.118575569
##  [9,] -0.004559686  0.051510367
## [10,] -0.012843666  0.091788493

hJAM

The default version of hJAM restricts the intercept to be zero.

hJAM::hJAM_lnreg(betas.Gy = betas.Gy, N.Gy = 459324, A = conditional_A, Gl = Gl, ridgeTerm = TRUE) # 459324 is the sample size of the UK Biobank GWAS of MI
## ------------------------------------------------------ 
##                    hJAM output                         
## ------------------------------------------------------ 
## Number of SNPs used in model: 210 
## 
##     Estimate StdErr         95% CI       Pvalue
## bmi    0.322  0.061 (0.222, 0.422) 1.268210e-07
## t2d    0.119  0.017 (0.091, 0.147) 3.176604e-12
## ------------------------------------------------------

Another method in this package is hJAM with Egger regression, which is analogus to MR egger. It allows the intercept to be non-zero.

hJAM::hJAM_egger(betas.Gy = betas.Gy, N.Gy = 459324, A = conditional_A, Gl = Gl, ridgeTerm = TRUE) # 459324 is the sample size of the UK Biobank GWAS of MI
## ------------------------------------------------------ 
##                    hJAM egger output                   
## ------------------------------------------------------ 
## Number of SNPs used in model: 210 
## 
##     Estimate StdErr         95% CI       Pvalue
## bmi    0.302  0.070 (0.186, 0.417) 1.841101e-05
## t2d    0.107  0.027 (0.063, 0.151) 5.839645e-05
## 
## Intercept
##      Est.Int StdErr.Int 95% CI.Int        Pvalue.Int
## [1,] "0.453" "0.787"    "(-0.841, 1.748)" "0.565"   
## ------------------------------------------------------

Conclusion

We presented the main usage of hJAM package. For more details about each function, please go check the package documentation. If you would like to give us feedback or report issue, please tell us on Github.