Introduction to the SNPannotator package

Investigating the Functional Characteristics of Selected SNPs and Their Vicinity Genomic Region

Overview

SNPannotator is a bioinformatics tool designed to annotate genetic variants detected in genome-wide association studies (GWAS). A GWAS identifies statistical associations between genetic variants and phenotypic traits, but it does not explain the biological mechanisms underlying these associations. Post-GWAS analysis fills this gap by determining the functional impact of the associated variants on human biology. This manual provides a step-by-step guide to using the SNPannotator package for post-GWAS analysis, helping researchers better understand the genetic architecture of complex traits and diseases.


Installation

The easiest way to install the SNPannotator package is to get it from the Comprehensive R Archive Network (CRAN). Required dependencies will be downloaded automatically:

    # this will automatically download and install the dependencies.
    install.packages("SNPannotator")

Input Preparation

SNPannotator requires two input files:

  1. GWAS top hit variants:

A text file containing a list of the top independent GWAS variants. Each line of this file corresponds to a single variant rsID, which represents a genetic locus associated with the studied trait.

  2. Configuration file:

A text-based configuration file that users must edit before running the analysis. The configuration file can be obtained using the following command:

    # first, load the library
    library(SNPannotator)

    # save the configuration file to a specific folder
    getConfigFile('/home/user/project1/postGWAS')
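The first input, the variants file, can be prepared with base R alone. The following is a minimal sketch that writes one rsID per line and checks the format before the file is used. The file name `variants.txt`, the helper `is_rsid`, and the two example rsIDs are assumptions made for this illustration; the package does not prescribe them.

```r
# Hypothetical helper: TRUE for strings that look like dbSNP rsIDs (e.g. "rs7412").
is_rsid <- function(x) grepl("^rs[0-9]+$", x)

# Example identifiers only; replace with the lead variants of your own GWAS.
top_hits <- c("rs7412", "rs429358")
stopifnot(all(is_rsid(top_hits)))

# Write one rsID per line. A temporary folder is used so the sketch runs
# anywhere; in practice this would be the project folder.
variants_file <- file.path(tempdir(), "variants.txt")
writeLines(top_hits, variants_file)

# Read the file back, trimming whitespace and dropping blanks and duplicates.
variants <- unique(trimws(readLines(variants_file)))
variants <- variants[nzchar(variants)]
```

Checking the identifiers up front catches stray headers, blank lines, or positional IDs before they reach the annotation step.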

Step-by-step guide to running the package

1- Create an empty folder to save the output result files. Ensure that the current user and the R environment have the necessary read/write permissions for this folder.

2- Prepare the variants input file. Each line of the file should correspond to a single variant rsID. It is recommended to place this file in the newly created folder for easy access.

3- Obtain the configuration file from the package using the command getConfigFile('/path/to/folder'). It is recommended to copy this file into the newly created folder for easy access. Edit the parameters of this file as needed.

4- Run the annotation pipeline by calling run_annotation('/path/to/config.ini').
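The folder setup in step 1 can be checked from R before anything else is run. The sketch below creates a project folder and verifies read and write access with base R; the folder name `postGWAS` under a temporary directory is an assumption for illustration. The package calls for steps 3 and 4 are shown only as comments, because they need SNPannotator installed and an edited configuration file, and the configuration path is the placeholder used in the steps above.

```r
# Step 1: create the output folder (a temporary location in this sketch).
project_dir <- file.path(tempdir(), "postGWAS")
dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)

# file.access() returns 0 when the requested permission is granted;
# mode 2 tests write access and mode 4 tests read access.
can_write <- unname(file.access(project_dir, mode = 2)) == 0
can_read  <- unname(file.access(project_dir, mode = 4)) == 0
stopifnot(dir.exists(project_dir), can_write, can_read)

# Steps 3 and 4, using the functions named in this guide:
# library(SNPannotator)
# getConfigFile(project_dir)
# run_annotation('/path/to/config.ini')
```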


Annotation steps of the top GWAS variants

The steps included in this pipeline are:


Package functions:

Please refer to the package manual for detailed information and examples.

demo_annotation()

This function provides a quick way to test the package by generating a sample output for one variant. Report files are saved in the current working directory.

Example:


getConfigFile()

Copies a sample configuration file to the specified folder.

Function parameters:

Example:


run_annotation()

This is the main package function. It receives the path to a configuration file (.ini) and runs the annotation pipeline with the parameters defined in that file.

Function parameters:

Example:


findProxy()

This function finds variants that are in high linkage disequilibrium (LD) with a list of selected variants.

Function parameters:

Example:


findPairwiseLD()

This function computes the linkage disequilibrium (LD) between the selected variants using data from the Ensembl website.

Function parameters:

Example:


findGenomicPos()

This function retrieves variant information from the GTEx portal using either an rsID or a variant ID formatted as CHR_POS_REF_ALT. If an rsID is provided, the function returns the corresponding genomic positions in both GRCh37 and GRCh38 builds. When searching for an rsID based on genomic position, the position parameter should be specified according to the GRCh38 reference genome.
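Because the function accepts two kinds of identifier, it can help to classify and validate a query before the lookup. The sketch below is base R only and does not call the package. The helpers `parse_variant_id` and `query_type` are hypothetical, the coordinates are made up, and whether the chromosome field needs a prefix such as "chr" is not stated here, so the code leaves that field untouched.

```r
# Hypothetical helper: split an ID of the form CHR_POS_REF_ALT into fields.
parse_variant_id <- function(id) {
  parts <- strsplit(id, "_", fixed = TRUE)[[1]]
  if (length(parts) != 4) {
    stop("expected CHR_POS_REF_ALT, got: ", id)
  }
  pos <- suppressWarnings(as.integer(parts[2]))
  if (is.na(pos) || pos < 1) {
    stop("position must be a positive integer: ", id)
  }
  list(chr = parts[1], pos = pos, ref = parts[3], alt = parts[4])
}

# Classify a query as an rsID or a positional ID.
query_type <- function(x) {
  if (grepl("^rs[0-9]+$", x)) "rsid" else "position"
}

# Made-up coordinates, for illustration only.
v <- parse_variant_id("1_12345_A_G")
```

As noted above, a positional query should use GRCh38 coordinates, so any position taken from a GRCh37-based GWAS needs converting first.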

Function parameters:

Example:


findRSID()

This function retrieves variant information from Ensembl based on the specified genomic position. It takes the chromosome number, start position, and end position as input parameters and searches for variants within this window, using the specified genomic build. If only the start position is provided, the function automatically sets the end position equal to the start position. This is particularly relevant for SNP variants, where the start and end positions are the same. The function returns all variants found within the defined window.
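The rule that the end position falls back to the start position can be sketched with an R default argument. The helper `make_window` and its parameter names are hypothetical and are not the signature of findRSID(); the coordinates are made up for illustration.

```r
# Hypothetical helper mirroring the described behaviour: `end` falls back
# to `start` when it is not supplied.
make_window <- function(chr, start, end = start) {
  stopifnot(start >= 1, end >= start)
  list(chr = as.character(chr), start = start, end = end)
}

snp_window   <- make_window(1, 12345)         # single-base window for a SNP
range_window <- make_window(1, 12345, 12845)  # wider search window

# Number of bases covered by a window (inclusive coordinates).
window_width <- function(w) w$end - w$start + 1
```

A single-base window is the natural query for a SNP, while a wider window returns every variant that overlaps the interval.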

Function parameters:

Example:


stringdb_annotation()

This function takes a vector of gene symbols, retrieves their interaction partners from STRING DB, and performs functional enrichment analysis.

Function parameters:

Example: