Help for package SSHAARP

Version:

2.0.8

Date:

2025-06-17

Title:

Searching Shared HLA Amino Acid Residue Prevalence

Maintainer:

Livia Tran <livia.tran@ucsf.edu>

Depends:

R (≥ 3.6)

Description:

Processes amino acid alignments produced by the 'IPD-IMGT/HLA (Immuno Polymorphism-ImMunoGeneTics/Human Leukocyte Antigen) Database' to identify user-defined amino acid residue motifs shared across HLA alleles, individual HLA alleles, or HLA haplotypes, and calculates their frequencies based on HLA allele frequency data. 'SSHAARP' (Searching Shared HLA Amino Acid Residue Prevalence) uses 'Generic Mapping Tools (GMT)' software and the 'GMT' R package to generate global frequency heat maps that illustrate the distribution of each user-defined motif, allele, or haplotype around the globe. 'SSHAARP' analyzes the allele frequency data described by Solberg et al. (2008) <doi:10.1016/j.humimm.2008.05.001>, a global set of 497 population samples from 185 published datasets, representing 66,800 individuals in total. Users may also specify their own datasets, but file conventions must follow those of the prebundled Solberg dataset or the mock haplotype dataset.

License:

GPL (≥ 3)

Encoding:

UTF-8

LazyData:

true

Imports:

data.table, stringr, gtools, gmt, DescTools, dplyr, utils, filesstrings, purrr, stringi, HLAtools

Suggests:

knitr, rmarkdown

VignetteBuilder:

knitr

SystemRequirements:

GMT (5 or 6), Ghostscript (>=9.6)

RoxygenNote:

7.3.2

NeedsCompilation:

Packaged:

2025-06-17 21:34:59 UTC; tran

Author:

Livia Tran [aut, cre], Steven Mack [aut], Josh Bredeweg [ctb], Dale Steinhardt [ctb]

Repository:

CRAN

Date/Publication:

2025-06-18 00:40:06 UTC

Population Allele Locating Mapmaker

Description

Produces a frequency heatmap for a specified allele, amino-acid motif, or haplotype based on the allele frequency data in the Solberg dataset.

Usage

PALM(
  variant,
  variantType,
  filename,
  mask = FALSE,
  color = TRUE,
  filterMigrant = TRUE,
  mapScale = TRUE,
  direct = getwd(),
  AFND = TRUE,
  generateLowFreq = TRUE,
  resolution = 500
)

Arguments

variant

An allele or an amino acid motif in the following format: Locus*##$~##$~##$, where ## identifies a peptide position, and $ identifies an amino acid residue. Motifs can include any number of amino acids. Haplotypes must contain alleles that follow the aforementioned format, and must be delimited by "~".

variantType

Specifies whether the variant is an allele, motif, or haplotype.

filename

The full file path of the user specified dataset if the user wishes to use their own file, or the pre-bundled Solberg dataset or mock haplotype dataset. User provided datasets must be a .dat, .txt, or .csv file, and must conform to the structure and format of the datasets bundled with the package. Allele and motif datasets should follow the Solberg dataset format, and haplotype datasets should follow the SSHAARP mock haplotype dataset format.

mask

A logical parameter that determines if areas with little to no population coverage should be masked. The default value is set to FALSE.

color

A logical parameter that identifies if the heat maps should be made in color (TRUE) or gray scale (FALSE). The default value is TRUE.

filterMigrant

A logical parameter that determines if admixed populations (OTH) and migrant populations (i.e. any complexities with the 'mig') should be excluded from the dataset. The default value is TRUE.

mapScale

A logical parameter that determines if the max frequency value of the map scale should be 1 (FALSE), or if it should represent the maximum frequency of the chosen motif, allele, or haplotype (TRUE). The default value is TRUE.

direct

The directory into which the map produced is written. The default directory is the user's working directory.

AFND

A logical parameter that determines whether the user specified dataset is data from AFND. This parameter is only relevant if haplotype maps are being made. Default is TRUE.

generateLowFreq

A logical parameter that determines whether maps should be generated for a variant whose maximum frequency is low. Low frequency populations are defined as those with a frequency of 0.000 (three zeros after the decimal point). The default value is TRUE.

resolution

An integer for raster resolution in dpi for the final map output. It is not recommended to go below 400. Default is set to 500.
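
As a rough illustration of the mapScale and generateLowFreq arguments, the sketch below computes the upper bound of a map scale and flags a low-frequency variant from a vector of population frequencies. The helper names scale_upper_bound and is_low_frequency are hypothetical and are not SSHAARP functions. Treating "0.000" as "prints as 0.000 at three decimal places" is an assumption about the definition given for generateLowFreq.

```r
# Hypothetical helpers for illustration; not part of SSHAARP.

# Upper bound of the map scale: the observed maximum when mapScale is TRUE,
# otherwise a fixed maximum of 1.
scale_upper_bound <- function(freqs, mapScale = TRUE) {
  if (mapScale) max(freqs) else 1
}

# Assumption: a variant is "low frequency" when its maximum frequency
# prints as 0.000 at three decimal places.
is_low_frequency <- function(freqs) {
  sprintf("%.3f", max(freqs)) == "0.000"
}

freqs <- c(0.012, 0.134, 0.0004)
scale_upper_bound(freqs, mapScale = TRUE)   # 0.134
scale_upper_bound(freqs, mapScale = FALSE)  # 1
is_low_frequency(freqs)                     # FALSE
is_low_frequency(c(0.0001, 0.0004))         # TRUE
```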

Value

The specified motif and the directory into which the heat map was written are returned in an invisible character vector. Otherwise, a warning message is returned.

Note

IMGT protein alignments will be generated for the locus of the specified variant the first time PALM is executed for a given locus. The alignments will be saved to the temp directory and referenced by PALM. PALM checks if the locus specific alignment is present in the temp directory; if it is not, a protein alignment object will be built for the locus. Restarting the R session will remove existing alignments.
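
The caching behaviour described in this note can be pictured with the following minimal sketch. It is not the PALM implementation: get_alignment, build_alignment, and the file naming scheme are hypothetical, and the builder is a stand-in that returns a placeholder data frame rather than a real protein alignment.

```r
# Hypothetical stand-in for an expensive alignment build.
build_alignment <- function(locus) {
  data.frame(locus = locus, built_at = as.character(Sys.time()))
}

# Return the alignment for a locus, building it only if no cached copy
# exists in the session's temporary directory.
get_alignment <- function(locus, cache_dir = tempdir()) {
  cache_file <- file.path(cache_dir, paste0(locus, "_alignment.rds"))
  if (!file.exists(cache_file)) {
    saveRDS(build_alignment(locus), cache_file)
  }
  readRDS(cache_file)
}

first  <- get_alignment("DRB1")  # built and cached on the first call
second <- get_alignment("DRB1")  # read back from the cache
identical(first, second)
```

Because tempdir() is specific to the running R session and is cleaned up when the session ends, a cache of this kind does not survive a restart, which matches the behaviour stated above.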

The produced frequency heatmap is generated by using the Generic Mapping Tools (GMT) R Package, which is an interface between R and the GMT map making software.

The Solberg dataset is the tab-delimited ‘1-locus-alleles.dat’ text file in the results.zip archive at http://pypop.org/popdata/.

The Solberg dataset is also prepackaged into SSHAARP as 'solberg_dataset'.

A mock haplotype dataset modeled after the AFND network's haplotype dataset structure is available for usage under "mock_haplotype_dataset".

While the map legend identifies the highest frequency value, values in this range may not be represented on the map due to frequency averaging over neighboring populations.

References

Solberg et al. (2008) <doi:10.1016/j.humimm.2008.05.001>

Examples

#Example to produce a motif color map where migrant populations are filtered out, mask is off
## Not run: PALM("DRB1*26F~28E~30Y",
variantType="motif",
mask = FALSE,
filterMigrant=TRUE,
filename = SSHAARP::solberg_dataset)
## End(Not run)

#Example to produce an allele greyscale map where migrant populations are not filtered, mask is on
## Not run: PALM("DRB1*01:01",
variantType="allele",
mask = TRUE, color=FALSE,
filterMigrant=FALSE,
filename = SSHAARP::solberg_dataset)
## End(Not run)

#Example to produce a color allele map with mapScale T and the allele has more than 2 fields
## Not run: PALM("DRB1*01:01:01",
variantType="allele",
filterMigrant=FALSE,
mapScale=TRUE,
filename = SSHAARP::solberg_dataset)
## End(Not run)

#Example to produce a color haplotype map with default parameters with the mock haplotype dataset
## Not run: PALM("DRB1*01:01~A*01:01",
variantType = "haplotype",
filename = SSHAARP::mock_haplotype_dataset)
## End(Not run)

Check allele syntax

Description

Checks if allele syntax is valid.

Usage

checkAlleleSyntax(allele, filename)

Arguments

allele

An allele name written in the IPD-IMGT/HLA Database format.

filename

The full file path of the user specified dataset if the user wishes to use their own file, or the pre-bundled Solberg dataset. User provided datasets must be a .dat, .txt, or .csv file, and must conform to the structure and format of the Solberg dataset.

Value

TRUE if allele syntax is correct. Otherwise, a vector containing FALSE and an error message is returned.

Note

For internal SSHAARP use only.

Check haplotype syntax

Description

Checks if alleles in a haplotype have correct syntax and an appropriate number of fields.

Usage

checkHaplotypeSyntax(haplotype, filename)

Arguments

haplotype

A haplotype where allele names are written in the IPD-IMGT/HLA Database format, and have 1-4 fields. Alleles in haplotypes must be delimited by "~".

filename

The full file path of the user specified dataset if the user wishes to use their own file, or the pre-bundled mock haplotype dataset. User provided datasets must be a .dat, .txt, or .csv file, and must conform to the structure and format of the mock haplotype dataset bundled with the package.

Value

TRUE if all alleles in entered haplotype have correct syntax and appropriate number of fields. Otherwise, a vector containing FALSE and an error message is returned.

Note

For internal SSHAARP use only.
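
To make the haplotype format concrete, the sketch below splits a haplotype on "~" and counts the colon-delimited fields of each allele, then checks that every count falls in the 1-4 range. The helper count_haplotype_fields is hypothetical and is not how checkHaplotypeSyntax is implemented.

```r
# Hypothetical helper for illustration; not the SSHAARP implementation.
# Splits a haplotype on "~" and counts the colon-delimited fields per allele.
count_haplotype_fields <- function(haplotype) {
  alleles <- strsplit(haplotype, "~", fixed = TRUE)[[1]]
  # Drop the "Locus*" prefix, then count the fields that remain.
  names_only <- sub("^[^*]+\\*", "", alleles)
  n_fields <- lengths(strsplit(names_only, ":", fixed = TRUE))
  stats::setNames(n_fields, alleles)
}

fields <- count_haplotype_fields("DRB1*01:01~A*01:01:01")
fields                          # 2 fields for DRB1*01:01, 3 for A*01:01:01
all(fields >= 1 & fields <= 4)  # TRUE
```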

Check locus validity and if the locus is present in the user specified dataset

Description

Checks if the locus in the entered variant is a protein-coding gene annotated by the IPD-IMGT/HLA Database, and if it is in the user specified dataset.

Usage

checkLocusDataset(variant, filename)

Arguments

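variant

An allele or an amino acid motif, in the format described for the variant argument of PALM.

filename

The full file path of the user specified dataset, or one of the pre-bundled datasets, as described for the filename argument of PALM.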

Value

TRUE if locus is a protein-coding gene and is in the specified dataset. Otherwise, a vector with FALSE and an error message is returned.

Note

For internal SSHAARP use only.

Check motif syntax

Description

Checks if motif syntax is valid.

Usage

checkMotifSyntax(motif, filename)

Arguments

motif

An amino acid motif in the following format: Locus*##$~##$~##$, where ## identifies a peptide position, and $ identifies an amino acid residue. Motifs can include any number of amino acids.

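filename

The full file path of the user specified dataset, or one of the pre-bundled datasets, as described for the filename argument of PALM.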

Value

TRUE if the motif syntax is valid. Otherwise, a vector containing FALSE and an error message is returned.

Note

For internal SSHAARP use only.

Examples


#Example with correct motif syntax where user specified dataset is the Solberg dataset
checkMotifSyntax("DRB1*26F~28E~30Y", filename=SSHAARP::solberg_dataset)

#Example with incorrect motif syntax where user specified dataset is the Solberg dataset
checkMotifSyntax("DRB1****26F~28E", filename=SSHAARP::solberg_dataset)
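
The motif format Locus*##$~##$~##$ can also be approximated with a regular expression, as in the sketch below. The pattern is an illustration only and is not the check that checkMotifSyntax performs. It assumes positions are written as unsigned digits and residues as single uppercase letters.

```r
# Assumed pattern for "Locus*##$~##$~##$": a locus name, an asterisk, then one
# or more position/residue pairs (digits followed by one uppercase letter)
# separated by "~". This is an illustration, not the package's own check.
motif_pattern <- "^[A-Za-z0-9]+\\*[0-9]+[A-Z](~[0-9]+[A-Z])*$"

grepl(motif_pattern, "DRB1*26F~28E~30Y")  # TRUE
grepl(motif_pattern, "DRB1****26F~28E")   # FALSE
```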

Checks if amino acid positions in motif exist

Description

Checks if amino acid positions in the entered motif exist in IMGTprotalignments.

Usage

checkPosition(motif, filename, alignments)

Arguments

motif

An amino acid motif in the following format: Locus*##$~##$~##$, where ## identifies a peptide position, and $ identifies an amino acid residue. Motifs can include any number of amino acids. This function ONLY checks if the entered amino acid positions exist in IMGTprotalignments.

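filename

The full file path of the user specified dataset, or one of the pre-bundled datasets, as described for the filename argument of PALM.

alignments

A list object of sub-lists of data frames of protein alignments for the HLA and HLA-region genes supported in the ANHIG/IMGTHLA GitHub Repository. Alignments always correspond to the most recent IPD-IMGT/HLA Database version.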

Value

TRUE if all of the amino acid positions in a motif exist. Otherwise, a vector with FALSE and an error message is returned.

Note

For internal SSHAARP use only.

Dataset manipulation for motifs and alleles

Description

Returns a modified version of the user selected dataset that includes a column of locus*allele names, is sorted by population name, and is reduced to the specified locus. Cardinal coordinates are converted to their Cartesian equivalents (e.g. 50S is converted to -50).

Usage

dataSubset(variant, filename)

Arguments

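variant

An allele or an amino acid motif, in the format described for the variant argument of PALM.

filename

The full file path of the user specified dataset, or one of the pre-bundled datasets, as described for the filename argument of PALM.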

Value

A data frame containing a reformatted version of the user selected dataset, with rows ordered by population name, Cartesian coordinates in the latit and longit columns, and limited to populations with data for the specified locus. Otherwise, a vector containing FALSE and an error message is returned.

Note

For internal SSHAARP use only.

The Solberg dataset is the tab-delimited ‘1-locus-alleles.dat’ text file in the results.zip archive at http://pypop.org/popdata/.

The Solberg dataset is also prepackaged into SSHAARP as 'solberg_dataset'.
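
The coordinate conversion mentioned in the description (50S becoming -50) can be sketched as follows. The helper cardinal_to_signed is hypothetical and is not the code used by dataSubset. It assumes each coordinate is a number followed by a single trailing N, S, E, or W.

```r
# Hypothetical helper for illustration; not the SSHAARP implementation.
# Converts coordinates such as "50S" or "120.5W" to signed numbers:
# S and W become negative, N and E stay positive.
cardinal_to_signed <- function(x) {
  value <- as.numeric(sub("[NSEW]$", "", x))
  ifelse(grepl("[SW]$", x), -value, value)
}

cardinal_to_signed(c("50S", "50N", "120.5W", "30E"))
# returns -50, 50, -120.5, 30
```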

Dataset manipulation for haplotypes

Description

Returns the user input dataset that contains the selected haplotype.

Usage

dataSubsetHaplo(haplotype, filename, AFND, alignments)

Arguments

haplotype

A haplotype where allele names are written in the IPD-IMGT/HLA Database format, and have 1-4 fields. Alleles in haplotypes must be delimited by "~".

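filename

The full file path of the user specified dataset, or the pre-bundled mock haplotype dataset, as described for the filename argument of checkHaplotypeSyntax.

AFND

A logical parameter that determines whether the user specified dataset is data from AFND. This parameter is only relevant if haplotype maps are being made.

alignments

A list object of sub-lists of data frames of protein alignments, as described for the alignments argument of checkPosition.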

Value

A two element list with 1) a subset data frame containing only haplotypes with alleles present in the user input haplotype, and 2) a data frame of the full dataset. Alleles with two fields will be evaluated with their three and four field allele equivalents, and alleles with three fields will be evaluated with their four field allele equivalent. Otherwise, a vector containing FALSE and an error message is returned.

Note

For internal SSHAARP use only.
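
The rule that a shorter allele name is evaluated together with its longer-field equivalents can be pictured as prefix matching on field boundaries. The sketch below is an assumption about how such matching could work, not the dataSubsetHaplo implementation. The helper matches_allele is hypothetical, and the allele names are illustrative strings only.

```r
# Hypothetical helper for illustration; not the SSHAARP implementation.
# A candidate matches the query if it is the query itself or extends it
# with further colon-delimited fields.
matches_allele <- function(query, candidates) {
  candidates == query | startsWith(candidates, paste0(query, ":"))
}

# Illustrative allele names only.
candidates <- c("DRB1*01:01", "DRB1*01:01:01", "DRB1*01:01:01:01",
                "DRB1*01:010", "DRB1*01:02")
candidates[matches_allele("DRB1*01:01", candidates)]
# keeps the two-, three-, and four-field forms of DRB1*01:01
```

Appending ":" to the query before matching prevents "DRB1*01:01" from also matching a different allele whose second field merely starts with the same digits.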

Returns an alignment data frame of alleles that share a specific amino acid motif

Description

Returns an alignment data frame of alleles that share a specific amino acid motif.

Usage

findMotif(motif, filename, alignments)

Arguments

motif

An amino acid motif in the following format: Locus*##$~##$~##$, where ## identifies a peptide position, and $ identifies an amino acid residue. Motifs can include any number of amino acids.

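filename

The full file path of the user specified dataset, or one of the pre-bundled datasets, as described for the filename argument of PALM.

alignments

A list object of sub-lists of data frames of protein alignments, as described for the alignments argument of checkPosition.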

Value

An amino acid alignment dataframe of alleles that share the specified motif. Otherwise, a vector containing FALSE and an error message is returned.

Note

For internal SSHAARP use only.

Locus and allele or motif extraction for motif, allele, or haplotype mapping

Description

Extracts locus and allele or motif information from a variant for alleles or motifs.

Usage

getVariantInfo(variant)

Arguments

variant

An amino acid motif or allele. Amino acid motifs must be in the following format: Locus*##$~##$~##$, where ## identifies a peptide position, and $ identifies an amino acid residue. Alleles must have 1-4 fields.

Value

A list object with loci and allele or motif information.

Note

For internal SSHAARP use only.
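
As a rough picture of the kind of extraction getVariantInfo performs, the sketch below splits a variant at its first asterisk into a locus and a name, and then splits a motif name into its position/residue pairs. The helpers split_variant and split_motif are hypothetical. The structure of the list actually returned by getVariantInfo is not documented here and may differ.

```r
# Hypothetical helpers for illustration; not the SSHAARP implementation.

# Splits "Locus*name" at the first asterisk into its locus and name parts.
split_variant <- function(variant) {
  list(
    locus = sub("\\*.*$", "", variant),
    name  = sub("^[^*]*\\*", "", variant)
  )
}

# Splits a motif name such as "26F~28E~30Y" into its position/residue pairs.
split_motif <- function(name) {
  strsplit(name, "~", fixed = TRUE)[[1]]
}

info <- split_variant("DRB1*26F~28E~30Y")
info$locus              # "DRB1"
split_motif(info$name)  # "26F" "28E" "30Y"
split_variant("DRB1*01:01")$name  # "01:01"
```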

Checks if name portion of entered variant is present

Description

Checks if the name portion of the entered variant is present. Names consist of information following the locus and asterisk of the entered variant.

Usage

isNamePresent(variant, variantType)

Arguments

variant

An allele or an amino acid motif, in the format described for the variant argument of PALM.

variantType

Identifies whether the variant is an allele or motif.

Value

TRUE if name is present. Otherwise, a vector with FALSE and an error message is returned.

Note

For internal SSHAARP use only.

Mock Haplotype Dataset

Description

A dataframe containing mock haplotype data modeled after the Allele Frequency Network Database (AFND) haplotype data structure.

Usage

mock_haplotype_dataset

Format

An object of class data.frame with 24989 rows and 7 columns.

Designates dataset by either reading in file the user has provided, or using the Solberg dataset or mock haplotype dataset

Description

Returns the user specified dataset, either by reading in the file the user has provided, or by using the Solberg dataset or mock haplotype dataset. If the user provides a dataset and the filename is not found, an error will be returned. If the user provided dataset does not have the same number of columns or the same column names as the reference datasets, an error message will be returned.

Usage

readFilename(filename, variant)

Arguments

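filename

The full file path of the user specified dataset, or one of the pre-bundled datasets, as described for the filename argument of PALM.

variant

An allele, amino acid motif, or haplotype, in the format described for the variant argument of PALM.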

Value

A dataframe of the user specified dataset.

Note

For internal SSHAARP use only.
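
The column checks described for readFilename can be sketched as below. This is not the package's code: check_columns is a hypothetical helper, the column names popname, allele, and freq are made up for the example, and requiring the names to appear in the same order is an assumption.

```r
# Hypothetical helper for illustration; not the SSHAARP implementation.
# Returns TRUE when a user dataset has the same column names, in the same
# order, as a reference dataset. Otherwise returns FALSE plus a message,
# mirroring the "vector containing FALSE and an error message" convention.
check_columns <- function(user_data, reference) {
  if (ncol(user_data) != ncol(reference)) {
    return(c(FALSE, "The dataset does not have the expected number of columns."))
  }
  if (!identical(names(user_data), names(reference))) {
    return(c(FALSE, "The dataset column names do not match the reference."))
  }
  TRUE
}

# Made-up column names, used only for this example.
reference <- data.frame(popname = character(), allele = character(), freq = numeric())
good <- data.frame(popname = "A", allele = "01:01", freq = 0.1)
bad  <- data.frame(popname = "A", freq = 0.1)

check_columns(good, reference)  # TRUE
check_columns(bad, reference)   # "FALSE" plus the column-count message
```

Note that in R, c(FALSE, "text") is a character vector, so the first element of such a return value is the string "FALSE" rather than a logical.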

Solberg Dataset

Description

A dataframe of the original Solberg dataset, which is a global dataset of 497 population samples from 185 published datasets, representing 66,800 individuals. For more information on the Solberg dataset, please see the vignette.

Usage

solberg_dataset

Format

A dataframe with 20163 rows and 13 columns.

Source

results.zip file from http://pypop.org/popdata/

References

Solberg et al. "Balancing selection and heterogeneity across the classical human leukocyte antigen loci: A meta-analytic review of 497 population studies". Human Immunology (2008) 69, 443–464

validateVariantType

Description

Validates the variantType parameter in PALM().

Usage

validateVariantType(vT)

Arguments

vT

Specifies whether the variant is an allele, motif, or haplotype.

Value

TRUE if the entered variantType is accepted ('motif', 'allele', or 'haplotype'). FALSE for any other provided entry.

Note

For internal SSHAARP use only.
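
A check of this kind can be sketched in a few lines. The function is_valid_variant_type below is a hypothetical re-implementation for illustration, not the SSHAARP source. Case-sensitive matching is an assumption.

```r
# Hypothetical re-implementation for illustration; not the SSHAARP source.
# Accepts a single character string equal to one of the three variant types.
is_valid_variant_type <- function(vT) {
  is.character(vT) && length(vT) == 1 &&
    vT %in% c("motif", "allele", "haplotype")
}

is_valid_variant_type("motif")     # TRUE
is_valid_variant_type("genotype")  # FALSE
```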

Verifies the allele entered is present in the IMGT protein alignments

Description

Verifies the allele entered is present in IMGT protein alignments.

Usage

verifyAlleleANHIG(allele, filename, alignments)

Arguments

allele

An allele name written in the IPD-IMGT/HLA Database format.

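filename

The full file path of the user specified dataset, or one of the pre-bundled datasets, as described for the filename argument of PALM.

alignments

A list object of sub-lists of data frames of protein alignments, as described for the alignments argument of checkPosition.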

Value

TRUE if allele is present in the IMGTprotalignment object. Otherwise, a vector containing FALSE and an error message is returned.

Note

For internal SSHAARP use only.

Verifies the alleles in entered haplotype are present in IMGTprotalignments

Description

Verifies the alleles in entered haplotype are present in IMGTprotalignments.

Usage

verifyAlleleANHIGHaplo(haplotype, filename, alignments)

Arguments

haplotype

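A haplotype where allele names are written in the IPD-IMGT/HLA Database format, and have 1-4 fields. Alleles in haplotypes must be delimited by "~".

filename

The full file path of the user specified dataset, or the pre-bundled mock haplotype dataset, as described for the filename argument of checkHaplotypeSyntax.

alignments

A list object of sub-lists of data frames of protein alignments, as described for the alignments argument of checkPosition.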

Value

TRUE if all alleles in a haplotype are present in the IMGTprotalignment object. Otherwise, a vector containing FALSE and an error message is returned.

Note

For internal SSHAARP use only.

Verifies the allele entered is present in specified dataset

Description

Verifies the allele entered is present in the specified dataset.

Usage

verifyAlleleDataset(allele, filename, alignments)

Arguments

allele

An allele name written in the IPD-IMGT/HLA Database format.

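filename

The full file path of the user specified dataset, or one of the pre-bundled datasets, as described for the filename argument of PALM.

alignments

A list object of sub-lists of data frames of protein alignments, as described for the alignments argument of checkPosition.

Value

TRUE if the allele is present in the specified dataset and in the filtered allele dataset. If a user enters an allele with more than two fields and has selected the Solberg dataset as the data source, a message informing the user that the allele has been truncated is appended to the output. If an allele entered is valid, but is not present in the user provided dataset, a warning message is returned.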

Note

For internal SSHAARP use only.
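
The truncation mentioned in the Value section can be illustrated with the sketch below, which keeps at most the first two colon-delimited fields of an allele name. The helper truncate_to_two_fields is hypothetical. Truncating to exactly two fields is an assumption inferred from the "more than two fields" wording, and this is not the code used by verifyAlleleDataset.

```r
# Hypothetical helper for illustration; not the SSHAARP implementation.
# Keeps at most the first two colon-delimited fields of an allele name.
truncate_to_two_fields <- function(allele) {
  sub("^([^:]+:[^:]+).*$", "\\1", allele)
}

truncate_to_two_fields("DRB1*01:01:01")  # "DRB1*01:01"
truncate_to_two_fields("DRB1*01:01")     # "DRB1*01:01"
truncate_to_two_fields("DRB1*01")        # "DRB1*01"
```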