Title: | Haplotype-Aware CNV Analysis from scRNA-Seq |
URL: | https://github.com/kharchenkolab/numbat/, https://kharchenkolab.github.io/numbat/ |
Version: | 1.4.2 |
Description: | A computational method that infers copy number variations (CNVs) in cancer scRNA-seq data and reconstructs the tumor phylogeny. 'numbat' integrates signals from gene expression, allelic ratio, and population haplotype structures to accurately infer allele-specific CNVs in single cells and reconstruct their lineage relationship. 'numbat' can be used to: 1. detect allele-specific copy number variations from single cells; 2. differentiate tumor versus normal cells in the tumor microenvironment; 3. infer the clonal architecture and evolutionary history of profiled tumors. 'numbat' does not require tumor/normal-paired DNA or genotype data, but operates solely on the donor scRNA-seq data (for example, 10x Cell Ranger output). Additional examples and documentation are available at https://kharchenkolab.github.io/numbat/. For details on the method, please see Gao et al. Nature Biotechnology (2022) <doi:10.1038/s41587-022-01468-y>. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | R (≥ 4.1.0), Matrix |
Imports: | ape, caTools, data.table, dendextend, dplyr (≥ 1.1.1), GenomicRanges, ggplot2, ggraph, ggtree, glue, hahmmr, igraph, IRanges, logger, magrittr, methods, optparse, parallel, parallelDist, patchwork, pryr, purrr, Rcpp, RhpcBLASctl, R.utils, scales, scistreer (≥ 1.1.0), stats4, stringr, tibble, tidygraph, tidyr (≥ 1.3.0), vcfR, zoo |
Suggests: | ggrastr, ggrepel, knitr, matrixStats, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
LinkingTo: | Rcpp, RcppArmadillo, roptim |
NeedsCompilation: | yes |
SystemRequirements: | GNU make |
Author: | Teng Gao [cre, aut], Ruslan Soldatov [aut], Hirak Sarkar [aut], Evan Biederstedt [aut], Peter Kharchenko [aut] |
Maintainer: | Teng Gao <tgaoteng@gmail.com> |
RoxygenNote: | 7.2.3 |
Packaged: | 2024-09-19 20:45:19 UTC; tenggao |
Repository: | CRAN |
Date/Publication: | 2024-09-20 12:20:07 UTC |
Get the modes of a vector
Description
Get the modes of a vector
Usage
Modes(x)
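The base-R sketch below shows one way to compute the modes of a vector. `modes_sketch` is a hypothetical name; whether numbat's Modes() returns all tied values, and in which type, is not stated here, so returning every tied value as character is an assumption.

```r
# Hypothetical helper, not numbat's own Modes(): returns every value that
# occurs with the highest frequency (ties are all returned, as character).
modes_sketch <- function(x) {
  counts <- table(x)
  names(counts)[counts == max(counts)]
}

modes_sketch(c(1, 2, 2, 3, 3))
```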
Numbat R6 class
Description
Used to allow users to plot results
Value
a new 'Numbat' object
Public fields
label
character Sample name
gtf
dataframe Transcript annotation
joint_post
dataframe Joint posterior
exp_post
dataframe Expression posterior
allele_post
dataframe Allele posterior
bulk_subtrees
dataframe Bulk profiles of lineage subtrees
bulk_clones
dataframe Bulk profiles of clones
segs_consensus
dataframe Consensus segments
tree_post
list Tree posterior
mut_graph
igraph Mutation history graph
gtree
tbl_graph Single-cell phylogeny
clone_post
dataframe Clone posteriors
gexp_roll_wide
matrix Smoothed expression of single cells
P
matrix Genotype probability matrix
treeML
matrix Maximum likelihood tree as phylo object
hc
hclust Initial hierarchical clustering
Methods
Public methods
Method new()
initialize Numbat class
Usage
Numbat$new(out_dir, i = 2, gtf = gtf_hg38, verbose = TRUE)
Arguments
out_dir
character string Output directory
i
integer Iteration from which to load results (default=2)
gtf
dataframe Transcript gtf (default=gtf_hg38)
verbose
logical Whether to output verbose results (default=TRUE)
Returns
a new 'Numbat' object
Method plot_phylo_heatmap()
Plot the single-cell CNV calls in a heatmap and the corresponding phylogeny
Usage
Numbat$plot_phylo_heatmap(...)
Arguments
...
additional parameters passed to plot_phylo_heatmap()
Method plot_exp_roll()
Plot window-smoothed expression profiles
Usage
Numbat$plot_exp_roll(k = 3, n_sample = 300, ...)
Arguments
k
integer Number of clusters
n_sample
integer Number of cells to subsample
...
additional parameters passed to plot_exp_roll()
Method plot_mut_history()
Plot the mutation history of the tumor
Usage
Numbat$plot_mut_history(...)
Arguments
...
additional parameters passed to plot_mut_history()
Method plot_sc_tree()
Plot the single cell phylogeny
Usage
Numbat$plot_sc_tree(...)
Arguments
...
additional parameters passed to plot_sc_tree()
Method plot_consensus()
Plot consensus segments
Usage
Numbat$plot_consensus(...)
Arguments
...
additional parameters passed to plot_consensus()
Method plot_clone_profile()
Plot clone CNV profiles
Usage
Numbat$plot_clone_profile(...)
Arguments
...
additional parameters passed to plot_clone_profile()
Method cutree()
Re-define subclones on the phylogeny.
Usage
Numbat$cutree(max_cost = 0, n_cut = 0)
Arguments
max_cost
numeric Likelihood threshold to collapse internal branches
n_cut
integer Number of cuts on the phylogeny to define subclones
Method clone()
The objects of this class are cloneable with this method.
Usage
Numbat$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
centromere regions (hg19)
Description
centromere regions (hg19)
Usage
acen_hg19
Format
An object of class tbl_df (inherits from tbl, data.frame) with 22 rows and 3 columns.
centromere regions (hg38)
Description
centromere regions (hg38)
Usage
acen_hg38
Format
An object of class tbl_df (inherits from tbl, data.frame) with 22 rows and 3 columns.
Utility function to make reference gene expression profiles
Description
Utility function to make reference gene expression profiles
Usage
aggregate_counts(count_mat, annot, normalized = TRUE, verbose = TRUE)
Arguments
count_mat | matrix/dgCMatrix Gene expression counts |
annot | dataframe Cell annotation with columns "cell" and "group" |
normalized | logical Whether to return normalized expression values |
verbose | logical Verbosity |
Value
matrix Reference gene expression levels
Examples
ref_custom = aggregate_counts(count_mat_ref, annot_ref, verbose = FALSE)
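As an illustration of what a pseudobulk reference involves, the base-R sketch below sums counts over the cells of each group and, when normalized, scales each group's column to sum to 1. It is a hypothetical re-implementation on a dense matrix (`aggregate_counts_sketch` is not part of numbat), and the exact normalization numbat applies is an assumption.

```r
# Hypothetical sketch of pseudobulk reference construction (not numbat's code):
# sum counts over cells in each group, then divide each group's column by its
# total so that every reference profile sums to 1.
aggregate_counts_sketch <- function(count_mat, annot, normalized = TRUE) {
  groups <- annot$group[match(colnames(count_mat), annot$cell)]
  # rowsum() sums rows by group, so transpose to sum cells (columns)
  sums <- t(rowsum(t(count_mat), group = groups))
  if (normalized) {
    sums <- sweep(sums, 2, colSums(sums), "/")
  }
  sums
}

counts <- matrix(c(1, 3, 0, 2, 4, 4), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("c1", "c2", "c3")))
annot <- data.frame(cell = c("c1", "c2", "c3"), group = c("T", "T", "B"))
aggregate_counts_sketch(counts, annot)
```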
Call CNVs in a pseudobulk profile using the Numbat joint HMM
Description
Call CNVs in a pseudobulk profile using the Numbat joint HMM
Usage
analyze_bulk(
bulk,
t = 1e-05,
gamma = 20,
theta_min = 0.08,
logphi_min = 0.25,
nu = 1,
min_genes = 10,
exp_only = FALSE,
allele_only = FALSE,
bal_cnv = TRUE,
retest = TRUE,
find_diploid = TRUE,
diploid_chroms = NULL,
classify_allele = FALSE,
run_hmm = TRUE,
prior = NULL,
exclude_neu = TRUE,
phasing = TRUE,
verbose = TRUE
)
Arguments
bulk | dataframe Pseudobulk profile |
t | numeric Transition probability |
gamma | numeric Dispersion parameter for the Beta-Binomial allele model |
theta_min | numeric Minimum imbalance threshold |
logphi_min | numeric Minimum log expression deviation threshold |
nu | numeric Phase switch rate |
min_genes | integer Minimum number of genes to call an event |
exp_only | logical Whether to run expression-only HMM |
allele_only | logical Whether to run allele-only HMM |
bal_cnv | logical Whether to call balanced amplifications/deletions |
retest | logical Whether to retest CNVs after Viterbi decoding |
find_diploid | logical Whether to run the diploid region identification routine |
diploid_chroms | character vector User-given chromosomes that are known to be in diploid state |
classify_allele | logical Whether to only classify allele (internal use only) |
run_hmm | logical Whether to run HMM (internal use only) |
prior | numeric vector Prior probabilities of states (internal use only) |
exclude_neu | logical Whether to exclude neutral segments from retesting (internal use only) |
phasing | logical Whether to use phasing information (internal use only) |
verbose | logical Verbosity |
Value
a pseudobulk profile dataframe with called CNV information
Examples
bulk_analyzed = analyze_bulk(bulk_example, t = 1e-5, find_diploid = FALSE, retest = FALSE)
Annotate consensus segments on a pseudobulk dataframe
Description
Annotate consensus segments on a pseudobulk dataframe
Usage
annot_consensus(bulk, segs_consensus, join_mode = "inner")
Arguments
bulk | dataframe Pseudobulk profile |
segs_consensus | dataframe Consensus segment dataframe |
Value
dataframe Pseudobulk profile
Annotate haplotype segments after HMM decoding
Description
Annotate haplotype segments after HMM decoding
Usage
annot_haplo_segs(bulk)
example reference cell annotation
Description
example reference cell annotation
Usage
annot_ref
Format
An object of class data.frame with 50 rows and 2 columns.
Annotate copy number segments after HMM decoding
Description
Annotate copy number segments after HMM decoding
Usage
annot_segs(bulk, var = "cnv_state")
Arguments
bulk | dataframe Pseudobulk profile |
Value
a pseudobulk dataframe
Annotate the theta parameter for each segment
Description
Annotate the theta parameter for each segment
Usage
annot_theta_mle(bulk)
Arguments
bulk | dataframe Pseudobulk profile |
Value
dataframe Pseudobulk profile
Annotate rolling estimate of imbalance level theta
Description
Annotate rolling estimate of imbalance level theta
Usage
annot_theta_roll(bulk)
Arguments
bulk | a pseudobulk dataframe |
Value
a pseudobulk dataframe
Annotate genes on allele dataframe
Description
Annotate genes on allele dataframe
Usage
annotate_genes(df, gtf)
Arguments
df | dataframe Allele count dataframe |
gtf | dataframe Gene gtf |
Value
dataframe Allele dataframe with gene column
Laplace approximation of the posterior of expression fold change phi
Description
Laplace approximation of the posterior of expression fold change phi
Usage
approx_phi_post(
Y_obs,
lambda_ref,
d,
alpha = NULL,
beta = NULL,
mu = NULL,
sig = NULL,
lower = 0.2,
upper = 10,
start = 1
)
Arguments
Y_obs | numeric vector Gene expression counts |
lambda_ref | numeric vector Reference expression levels |
d | numeric Total library size |
alpha | numeric Shape parameter of the gamma distribution |
beta | numeric Rate parameter of the gamma distribution |
mu | numeric Mean of the normal distribution |
sig | numeric Standard deviation of the normal distribution |
lower | numeric Lower bound of phi |
upper | numeric Upper bound of phi |
start | numeric Starting value of phi |
Value
numeric MLE of phi and its standard deviation
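The idea of a Laplace approximation can be sketched in base R: find the mode of the log-likelihood, then use the inverse curvature at the mode as the variance of a Gaussian. This sketch swaps in a plain Poisson model, Y_obs ~ Poisson(phi * lambda_ref * d), instead of numbat's Poisson-lognormal model, so the mu/sig/alpha/beta arguments are dropped. `approx_phi_post_sketch` is a hypothetical name. Under this simplified model the answer has a closed form, phi = sum(Y) / (d * sum(lambda)) with standard deviation phi / sqrt(sum(Y)), which makes the sketch easy to check.

```r
# Simplified sketch: plain Poisson model Y_obs ~ Poisson(phi * lambda_ref * d).
# numbat itself uses a Poisson-lognormal model; this swap is an assumption.
approx_phi_post_sketch <- function(Y_obs, lambda_ref, d,
                                   lower = 0.2, upper = 10, start = 1) {
  negloglik <- function(phi) {
    -sum(dpois(Y_obs, phi * lambda_ref * d, log = TRUE))
  }
  fit <- optim(start, negloglik, method = "L-BFGS-B",
               lower = lower, upper = upper, hessian = TRUE)
  # Laplace approximation: Gaussian centred at the mode, with variance equal
  # to the inverse of the observed information (Hessian of -loglik)
  c(phi = fit$par, sd = sqrt(1 / fit$hessian[1, 1]))
}

Y <- c(30, 12, 55, 8)
lambda <- c(0.003, 0.001, 0.005, 0.001)
approx_phi_post_sketch(Y, lambda, d = 5000)
```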
Laplace approximation of the posterior of allelic imbalance theta
Description
Laplace approximation of the posterior of allelic imbalance theta
Usage
approx_theta_post(
pAD,
DP,
p_s,
lower = 0.001,
upper = 0.499,
start = 0.25,
gamma = 20
)
Arguments
pAD | numeric vector Variant allele depth |
DP | numeric vector Total allele depth |
p_s | numeric vector Variant allele frequency |
lower | numeric Lower bound of theta |
upper | numeric Upper bound of theta |
start | numeric Starting value of theta |
gamma | numeric Gamma parameter of the beta-binomial distribution |
calculate entropy for a binary variable
Description
calculate entropy for a binary variable
Usage
binary_entropy(p)
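For reference, the binary entropy of a probability p is H(p) = -p log p - (1 - p) log(1 - p). The base-R sketch below assumes log base 2, because the base used by numbat's binary_entropy() is not stated here. `binary_entropy_sketch` is a hypothetical name.

```r
# Hypothetical sketch of binary entropy (log base 2 is an assumption).
binary_entropy_sketch <- function(p) {
  h <- -(p * log2(p) + (1 - p) * log2(1 - p))
  # 0 * log(0) evaluates to NaN in R; by convention it contributes 0
  h[p == 0 | p == 1] <- 0
  h
}

binary_entropy_sketch(c(0, 0.5, 0.9, 1))
```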
example pseudobulk dataframe
Description
example pseudobulk dataframe
Usage
bulk_example
Format
An object of class tbl_df (inherits from tbl, data.frame) with 3935 rows and 83 columns.
Calculate LLR for an allele HMM
Description
Calculate LLR for an allele HMM
Usage
calc_allele_LLR(pAD, DP, p_s, theta_mle, theta_0 = 0, gamma = 20)
Arguments
pAD | numeric vector Phased allele depth |
DP | numeric vector Total allele depth |
p_s | numeric vector Phase switch probabilities |
theta_mle | numeric MLE of imbalance level theta (alternative hypothesis) |
theta_0 | numeric Imbalance level in the null hypothesis |
gamma | numeric Dispersion parameter for the Beta-Binomial allele model |
Value
numeric Log-likelihood ratio
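The log-likelihood ratio compares the allele counts under an imbalance theta_mle against the null imbalance theta_0. The base-R sketch below assumes the phase is known, which ignores the phase-switch probabilities p_s that numbat's HMM uses. It also assumes a Beta-Binomial parameterization with alpha = (0.5 + theta) * gamma and beta = (0.5 - theta) * gamma. All function names are hypothetical.

```r
# Log-pmf of the beta-binomial distribution, using base R only.
dbbinom_log <- function(k, n, a, b) {
  lchoose(n, k) + lbeta(k + a, n - k + b) - lbeta(a, b)
}

# Hypothetical sketch: allele log-likelihood when the phase is assumed known
# and the haplotype frequency is 0.5 + theta (phase switching is ignored).
allele_loglik_sketch <- function(pAD, DP, theta, gamma = 20) {
  p <- 0.5 + theta
  sum(dbbinom_log(pAD, DP, a = p * gamma, b = (1 - p) * gamma))
}

# Log-likelihood ratio of the alternative (theta_mle) against the null (theta_0)
allele_llr_sketch <- function(pAD, DP, theta_mle, theta_0 = 0, gamma = 20) {
  allele_loglik_sketch(pAD, DP, theta_mle, gamma) -
    allele_loglik_sketch(pAD, DP, theta_0, gamma)
}

allele_llr_sketch(pAD = c(18, 9, 27), DP = c(20, 10, 30), theta_mle = 0.4)
```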
Calculate allele likelihoods
Description
Calculate allele likelihoods
Usage
calc_allele_lik(pAD, DP, p_s, theta, gamma = 20)
Arguments
pAD | integer vector Paternal allele counts |
DP | integer vector Total allele counts |
p_s | numeric vector Phase switch probabilities |
theta | numeric Haplotype imbalance |
gamma | numeric Overdispersion in the allele-specific expression |
Calculate expression distance matrix between cell populations
Description
Calculate expression distance matrix between cell populations
Usage
calc_cluster_dist(count_mat, cell_annot)
Arguments
count_mat | dgCMatrix Gene expression counts |
cell_annot | dataframe specifying the cell ID and cluster memberships |
Value
a distance matrix
Calculate LLR for an expression HMM
Description
Calculate LLR for an expression HMM
Usage
calc_exp_LLR(
Y_obs,
lambda_ref,
d,
phi_mle,
mu = NULL,
sig = NULL,
alpha = NULL,
beta = NULL
)
Arguments
Y_obs | numeric vector Gene expression counts |
lambda_ref | numeric vector Reference expression levels |
d | numeric vector Total library size |
phi_mle | numeric MLE of expression fold change phi (alternative hypothesis) |
mu | numeric Mean parameter for the PLN expression model |
sig | numeric Dispersion parameter for the PLN expression model |
alpha | numeric Hyperparameter for the gamma poisson model (not used) |
beta | numeric Hyperparameter for the gamma poisson model (not used) |
Value
numeric Log-likelihood ratio
Calculate the MLE of expression fold change phi
Description
Calculate the MLE of expression fold change phi
Usage
calc_phi_mle_lnpois(Y_obs, lambda_ref, d, mu, sig, lower = 0.1, upper = 10)
Check the format of an allele dataframe
Description
Check the format of an allele dataframe
Usage
check_allele_df(df)
Arguments
df | dataframe Allele dataframe |
Value
dataframe Allele dataframe
check inter-individual contamination
Description
check inter-individual contamination
Usage
check_contam(bulk)
Arguments
bulk | dataframe Pseudobulk profile |
check noise level
Description
check noise level
Usage
check_exp_noise(bulk)
Arguments
bulk | dataframe Pseudobulk profile |
check the format of lambdas_ref
Description
check the format of lambdas_ref
Usage
check_exp_ref(lambdas_ref)
Arguments
lambdas_ref | matrix Expression reference profile |
Value
matrix Expression reference profile
Check the format of a count matrix
Description
Check the format of a count matrix
Usage
check_matrix(count_mat)
Arguments
count_mat | matrix Count matrix |
Value
matrix Count matrix
check the format of a given consensus segment dataframe
Description
check the format of a given consensus segment dataframe
Usage
check_segs_fix(segs_consensus_fix)
Arguments
segs_consensus_fix | dataframe Consensus segment dataframe |
Value
dataframe Consensus segment dataframe
Check the format of a given clonal LOH segment dataframe
Description
Check the format of a given clonal LOH segment dataframe
Usage
check_segs_loh(segs_loh)
Arguments
segs_loh | dataframe Clonal LOH segment dataframe |
Value
dataframe Clonal LOH segment dataframe
Choose the best reference for each cell based on correlation
Description
Choose the best reference for each cell based on correlation
Usage
choose_ref_cor(count_mat, lambdas_ref, gtf)
Arguments
count_mat | dgCMatrix Gene expression counts |
lambdas_ref | matrix Reference expression profiles |
gtf | dataframe Transcript gtf |
Value
named vector Best references for each cell
chromosome sizes (hg19)
Description
chromosome sizes (hg19)
Usage
chrom_sizes_hg19
Format
An object of class data.table (inherits from data.frame) with 22 rows and 2 columns.
chromosome sizes (hg38)
Description
chromosome sizes (hg38)
Usage
chrom_sizes_hg38
Format
An object of class data.table (inherits from data.frame) with 22 rows and 2 columns.
classify alleles using viterbi and forward-backward
Description
classify alleles using viterbi and forward-backward
Usage
classify_alleles(bulk)
Arguments
bulk | dataframe Pseudobulk profile |
Value
dataframe Pseudobulk profile
Plot CNV heatmap
Description
Plot CNV heatmap
Usage
cnv_heatmap(
segs,
var = "group",
label_group = TRUE,
legend = TRUE,
exclude_gap = TRUE,
genome = "hg38"
)
Arguments
segs | dataframe Segments to plot. Must contain columns "seg_start", "seg_end", "cnv_state" |
var | character Column to facet by |
label_group | logical Label the groups |
legend | logical Display the legend |
exclude_gap | logical Whether to mark gap regions |
genome | character Genome build, either 'hg38' or 'hg19' |
Value
ggplot Heatmap of CNVs along the genome
Examples
p = cnv_heatmap(segs_example)
Combine allele and expression pseudobulks
Description
Combine allele and expression pseudobulks
Usage
combine_bulk(allele_bulk, exp_bulk)
Arguments
allele_bulk | dataframe Bulk allele profile |
exp_bulk | dataframe Bulk expression profile |
Value
dataframe Pseudobulk allele and expression profile
Do bayesian averaging to get posteriors
Description
Do bayesian averaging to get posteriors
Usage
compute_posterior(PL)
Arguments
PL | dataframe Likelihoods and priors |
Value
dataframe Posteriors
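Bayesian averaging over candidate states combines each state's log-likelihood with its prior probability and then normalizes. The base-R sketch below works on a single cell and segment and uses the log-sum-exp trick so that very small likelihoods do not underflow. `posterior_sketch` and the state names `neu`/`amp` are hypothetical. The column layout of numbat's PL dataframe is not reproduced here.

```r
# Hypothetical sketch of Bayesian averaging over candidate states: combine
# log-likelihoods with prior probabilities and normalise on the log scale
# (log-sum-exp) so that very small likelihoods do not underflow.
posterior_sketch <- function(loglik, prior) {
  log_post <- loglik + log(prior)
  log_post <- log_post - max(log_post)
  exp(log_post) / sum(exp(log_post))
}

posterior_sketch(loglik = c(neu = 0, amp = log(3)), prior = c(0.5, 0.5))
```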
Merge adjacent set of nodes
Description
Merge adjacent set of nodes
Usage
contract_nodes(G, vset, node_tar = NULL, debug = FALSE)
Arguments
G | igraph Mutation graph |
vset | vector Set of adjacent vertices to merge |
Value
igraph Mutation graph
example gene expression count matrix
Description
example gene expression count matrix
Usage
count_mat_example
Format
An object of class dgCMatrix with 1024 rows and 173 columns.
example reference count matrix
Description
example reference count matrix
Usage
count_mat_ref
Format
An object of class dgCMatrix with 1000 rows and 50 columns.
Call clonal LOH using SNP density. Recommended for cell lines or tumor samples with no normal cells.
Description
Call clonal LOH using SNP density. Recommended for cell lines or tumor samples with no normal cells.
Usage
detect_clonal_loh(bulk, t = 1e-05, snp_rate_loh = 5, min_depth = 0)
Arguments
bulk | dataframe Pseudobulk profile |
t | numeric Transition probability |
snp_rate_loh | numeric The assumed SNP density in clonal LOH regions |
min_depth | integer Minimum coverage to filter SNPs |
Value
dataframe LOH segments
Examples
segs_loh = detect_clonal_loh(bulk_example)
example allele count dataframe
Description
example allele count dataframe
Usage
df_allele_example
Format
An object of class data.frame with 41167 rows and 11 columns.
Run smoothed expression-based hclust
Description
Run smoothed expression-based hclust
Usage
exp_hclust(
count_mat,
lambdas_ref,
gtf,
sc_refs = NULL,
window = 101,
ncores = 1,
verbose = TRUE
)
Arguments
count_mat | dgCMatrix Gene counts |
lambdas_ref | matrix Reference expression profiles |
gtf | dataframe Transcript GTF |
sc_refs | named list Reference choices for single cells |
window | integer Sliding window size |
ncores | integer Number of cores |
verbose | logical Verbosity |
expand multi-allelic CNVs into separate entries in the single-cell posterior dataframe
Description
expand multi-allelic CNVs into separate entries in the single-cell posterior dataframe
Usage
expand_states(sc_post, segs_consensus)
Arguments
sc_post | dataframe Single-cell posteriors |
segs_consensus | dataframe Consensus segments |
Value
dataframe Single-cell posteriors with multi-allelic CNVs split into different entries
Fill neutral regions into consensus segments
Description
Fill neutral regions into consensus segments
Usage
fill_neu_segs(segs_consensus, segs_neu)
Arguments
segs_consensus | dataframe CNV segments from multiple samples |
segs_neu | dataframe Neutral segments |
Value
dataframe Collections of neutral and aberrant segments with no gaps
filter for mutually expressed genes
Description
filter for mutually expressed genes
Usage
filter_genes(count_mat, lambdas_ref, gtf, verbose = FALSE)
Arguments
count_mat | dgCMatrix Gene expression counts |
lambdas_ref | named numeric vector A reference expression profile |
gtf | dataframe Transcript gtf |
Value
vector Genes that are kept after filtering
Find the common diploid region in a group of pseudobulks
Description
Find the common diploid region in a group of pseudobulks
Usage
find_common_diploid(
bulks,
grouping = "clique",
gamma = 20,
theta_min = 0.08,
t = 1e-05,
fc_min = 2^0.25,
alpha = 1e-04,
min_genes = 10,
ncores = 1,
debug = FALSE,
verbose = TRUE
)
Arguments
bulks | dataframe Pseudobulk profiles (differentiated by "sample" column) |
grouping | character Whether to use cliques or components in the graph to find the diploid cluster |
gamma | numeric Dispersion parameter for the Beta-Binomial allele model |
theta_min | numeric Minimum imbalance threshold |
t | numeric Transition probability |
fc_min | numeric Minimum fold change to call quadruploid cluster |
alpha | numeric FDR cut-off for q values to determine edges |
ncores | integer Number of cores to use |
Value
list Ploidy information
fit a Beta-Binomial model by maximum likelihood
Description
fit a Beta-Binomial model by maximum likelihood
Usage
fit_bbinom(AD, DP)
Arguments
AD | numeric vector Variant allele depth |
DP | numeric vector Total allele depth |
Value
MLE of alpha and beta
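Fitting a Beta-Binomial model by maximum likelihood can be sketched with base R's optim(). Optimizing log(alpha) and log(beta) keeps both parameters positive without bound constraints. This is a hypothetical illustration (`fit_bbinom_sketch` is not numbat's implementation), run here on simulated counts.

```r
dbbinom_log <- function(k, n, a, b) {
  lchoose(n, k) + lbeta(k + a, n - k + b) - lbeta(a, b)
}

# Hypothetical sketch (not numbat's implementation): maximise the beta-binomial
# log-likelihood over (alpha, beta), optimising on the log scale so that both
# parameters stay positive.
fit_bbinom_sketch <- function(AD, DP) {
  negloglik <- function(log_par) {
    -sum(dbbinom_log(AD, DP, exp(log_par[1]), exp(log_par[2])))
  }
  fit <- optim(c(0, 0), negloglik, method = "BFGS")
  c(alpha = exp(fit$par[1]), beta = exp(fit$par[2]))
}

# Simulated allele counts from a Beta(10, 10)-Binomial(30) model
set.seed(42)
DP <- rep(30, 500)
AD <- rbinom(500, DP, rbeta(500, 10, 10))
fit_bbinom_sketch(AD, DP)
```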
fit gamma maximum likelihood
Description
fit gamma maximum likelihood
Usage
fit_gamma(AD, DP, start = 20)
Arguments
AD | numeric vector Variant allele depth |
DP | numeric vector Total allele depth |
Value
a fit
fit a PLN model by maximum likelihood
Description
fit a PLN model by maximum likelihood
Usage
fit_lnpois(Y_obs, lambda_ref, d)
Arguments
Y_obs | numeric vector Gene expression counts |
lambda_ref | numeric vector Reference expression levels |
d | numeric Total library size |
Value
numeric MLE of mu and sig
Fit a reference profile from multiple references using constrained least square
Description
Fit a reference profile from multiple references using constrained least square
Usage
fit_ref_sse(Y_obs, lambdas_ref, gtf, min_lambda = 2e-06, verbose = FALSE)
Arguments
Y_obs | vector |
lambdas_ref | named vector |
gtf | dataframe |
Value
fitted expression profile
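The following base-R sketch illustrates the idea of constrained least squares: find non-negative mixture weights that sum to one, so that the weighted combination of reference profiles best matches the observed profile. The softmax reparameterization is a swapped-in technique and not necessarily what fit_ref_sse() does. The gtf and min_lambda arguments are omitted, and `fit_ref_weights_sketch` is a hypothetical name.

```r
# Hypothetical sketch of a constrained least-squares fit (not numbat's code):
# find non-negative weights summing to one so that lambdas_ref %*% w is as
# close as possible to the observed profile. The constraint is enforced with
# a softmax reparameterisation and the squared error is minimised with BFGS.
fit_ref_weights_sketch <- function(Y_obs, lambdas_ref) {
  softmax <- function(z) exp(z - max(z)) / sum(exp(z - max(z)))
  sse <- function(z) sum((Y_obs - lambdas_ref %*% softmax(z))^2)
  fit <- optim(rep(0, ncol(lambdas_ref)), sse, method = "BFGS",
               control = list(maxit = 1000, reltol = 1e-12))
  w <- softmax(fit$par)
  names(w) <- colnames(lambdas_ref)
  w
}

refs <- cbind(ref1 = c(5, 1, 1, 2, 8),
              ref2 = c(1, 6, 2, 2, 1),
              ref3 = c(1, 1, 7, 3, 1))
Y <- drop(refs %*% c(0.5, 0.3, 0.2))
round(fit_ref_weights_sketch(Y, refs), 2)
```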
negative binomial model
Description
negative binomial model
Usage
fit_snp_rate(gene_snps, gene_length)
genome gap regions (hg19)
Description
genome gap regions (hg19)
Usage
gaps_hg19
Format
An object of class data.table (inherits from data.frame) with 28 rows and 3 columns.
genome gap regions (hg38)
Description
genome gap regions (hg38)
Usage
gaps_hg38
Format
An object of class data.table (inherits from data.frame) with 30 rows and 3 columns.
Generate alphabetical postfixes
Description
Generate alphabetical postfixes
Usage
generate_postfix(n)
Arguments
n | vector of integers |
Value
vector of alphabetical postfixes
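One common scheme for alphabetical postfixes is bijective base-26 numbering, in which 1 maps to "a", 26 to "z", 27 to "aa", and so on. The base-R sketch below implements that scheme. Whether generate_postfix() uses the same scheme beyond 26 is an assumption, and `postfix_sketch` is a hypothetical name.

```r
# Hypothetical sketch: bijective base-26 postfixes (1 -> "a", 27 -> "aa").
postfix_sketch <- function(n) {
  vapply(n, function(i) {
    out <- character(0)
    while (i > 0) {
      r <- (i - 1) %% 26              # 0-based letter index for this digit
      out <- c(letters[r + 1], out)   # prepend, most significant digit first
      i <- (i - 1) %/% 26
    }
    paste(out, collapse = "")
  }, character(1))
}

postfix_sketch(c(1, 26, 27, 52, 53))
```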
Genotyping main function
Description
Genotyping main function
Usage
genotype(label, samples, vcfs, outdir, het_only = FALSE, chr_prefix = TRUE)
Arguments
label | character Individual/sample label |
samples | vector Sample names |
vcfs | list of vcfR VCFs from cellsnp-lite pileup |
outdir | character Output directory |
het_only | logical Whether to only use heterozygous SNPs |
chr_prefix | logical Whether to add chr prefix |
Value
integer Status code
Aggregate into pseudobulk allele profile
Description
Aggregate into pseudobulk allele profile
Usage
get_allele_bulk(df_allele, nu = 1, min_depth = 0)
Arguments
df_allele | dataframe Single-cell allele counts |
nu | numeric Phase switch rate |
min_depth | integer Minimum coverage to filter SNPs |
Value
dataframe Pseudobulk allele profile
Get an allele HMM
Description
Get an allele HMM
Usage
get_allele_hmm(pAD, DP, p_s, theta, gamma = 20)
Arguments
pAD | integer vector Paternal allele counts |
DP | integer vector Total allele counts |
p_s | numeric vector Phase switch probabilities |
theta | numeric Haplotype imbalance |
gamma | numeric Overdispersion in the allele-specific expression |
Value
HMM object
get CNV allele posteriors
Description
get CNV allele posteriors
Usage
get_allele_post(df_allele, haplotypes, segs_consensus)
Arguments
df_allele | dataframe Allele counts |
haplotypes | dataframe Haplotype classification |
segs_consensus | dataframe Consensus CNV segments |
Value
dataframe Allele posteriors
Aggregate single-cell data into combined bulk expression and allele profile
Description
Aggregate single-cell data into combined bulk expression and allele profile
Usage
get_bulk(
count_mat,
lambdas_ref,
df_allele,
gtf,
subset = NULL,
min_depth = 0,
nu = 1,
segs_loh = NULL,
verbose = TRUE
)
Arguments
count_mat | dgCMatrix Gene expression counts |
lambdas_ref | matrix Reference expression profiles |
df_allele | dataframe Single-cell allele counts |
gtf | dataframe Transcript gtf |
subset | vector Subset of cells to aggregate |
min_depth | integer Minimum coverage to filter SNPs |
nu | numeric Phase switch rate |
segs_loh | dataframe Segments with clonal LOH to be excluded |
verbose | logical Verbosity |
Value
dataframe Pseudobulk gene expression and allele profile
Examples
bulk_example = get_bulk(
count_mat = count_mat_example,
lambdas_ref = ref_hca,
df_allele = df_allele_example,
gtf = gtf_hg38)
Map cells to the phylogeny (or genotypes) based on CNV posteriors
Description
Map cells to the phylogeny (or genotypes) based on CNV posteriors
Usage
get_clone_post(gtree, exp_post, allele_post)
Arguments
gtree | tbl_graph A cell lineage tree |
exp_post | dataframe Expression posteriors |
allele_post | dataframe Allele posteriors |
Value
dataframe Clone posteriors
Aggregate into bulk expression profile
Description
Aggregate into bulk expression profile
Usage
get_exp_bulk(count_mat, lambdas_ref, gtf, verbose = FALSE)
Arguments
count_mat | dgCMatrix Gene expression counts |
lambdas_ref | matrix Reference expression profiles |
gtf | dataframe Transcript gtf |
Value
dataframe Pseudobulk gene expression profile
get the single cell expression likelihoods
Description
get the single cell expression likelihoods
Usage
get_exp_likelihoods(
exp_counts,
diploid_chroms = NULL,
use_loh = FALSE,
depth_obs = NULL,
mu = NULL,
sigma = NULL
)
Arguments
exp_counts | dataframe Single-cell expression counts (CHROM, seg, cnv_state, gene, Y_obs, lambda_ref) |
diploid_chroms | character vector Known diploid chromosomes |
use_loh | logical Whether to include CNLOH regions in baseline |
Value
dataframe Single-cell CNV likelihood scores
compute single-cell expression posteriors
Description
compute single-cell expression posteriors
Usage
get_exp_post(
segs_consensus,
count_mat,
gtf,
lambdas_ref,
sc_refs = NULL,
diploid_chroms = NULL,
use_loh = NULL,
segs_loh = NULL,
ncores = 30,
verbose = TRUE,
debug = FALSE
)
Arguments
segs_consensus | dataframe Consensus segments |
count_mat | dgCMatrix gene expression count matrix |
gtf | dataframe transcript gtf |
lambdas_ref | matrix Reference expression profiles |
Value
dataframe Expression posteriors
get the single cell expression dataframe
Description
get the single cell expression dataframe
Usage
get_exp_sc(segs_consensus, count_mat, gtf, segs_loh = NULL)
Arguments
segs_consensus | dataframe Consensus segments |
count_mat | dgCMatrix gene expression count matrix |
gtf | dataframe Transcript gtf |
Value
dataframe single cell expression counts annotated with segments
Get a tidygraph tree with simplified mutational history.
Description
Specify either max_cost or n_cut. max_cost works similarly to h, and n_cut works similarly to k, in stats::cutree. The top-level normal diploid clone is always included.
Usage
get_gtree(tree, P, n_cut = 0, max_cost = 0)
Arguments
tree | phylo Single-cell phylogenetic tree |
P | matrix Genotype probability matrix |
n_cut | integer Number of cuts on the phylogeny to define subclones |
max_cost | numeric Likelihood threshold to collapse internal branches |
Value
tbl_graph Phylogeny annotated with branch lengths and mutation events
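The analogy with stats::cutree can be seen on a toy hclust tree in base R. Cutting by a number of groups (k, analogous to n_cut) and cutting by a height between the two highest merges (h, analogous to max_cost) give the same two groups here. The simulated data are for illustration only and say nothing about how get_gtree() computes costs.

```r
# Base R illustration of the two ways of cutting a tree that get_gtree()
# is compared to: by a number of groups (k) or by a height (h).
set.seed(1)
x <- matrix(c(rnorm(10, mean = 0), rnorm(10, mean = 5)), ncol = 1)
hc <- hclust(dist(x))

by_k <- cutree(hc, k = 2)
# choose a height between the two highest merges, which also yields 2 groups
h_cut <- mean(sort(hc$height, decreasing = TRUE)[1:2])
by_h <- cutree(hc, h = h_cut)

table(by_k, by_h)
```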
Get phased haplotypes
Description
Get phased haplotypes
Usage
get_haplotype_post(bulks, segs_consensus, naive = FALSE)
Arguments
bulks | dataframe Subtree pseudobulk profiles |
segs_consensus | dataframe Consensus CNV segments |
naive | logical Whether to use naive haplotype classification |
Value
dataframe Posterior haplotypes
Helper function to get inter-SNP distance
Description
Helper function to get inter-SNP distance
Usage
get_inter_cm(d)
Arguments
d | numeric vector Genetic positions in centimorgans (cM) |
Value
numeric vector Inter-SNP genetic distances
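Inter-SNP distances are the differences between consecutive genetic positions. In the base-R sketch below, the first SNP is assigned a distance of 0, which is an assumption. The sketch also shows one standard way to turn a genetic distance into a switch probability, Haldane's map function r = (1 - exp(-2x)) / 2 with x in Morgans, here scaled by the phase switch rate nu. Linking Haldane's function to numbat's phase-switch model is an illustrative assumption and not the package's stated definition. Both function names are hypothetical.

```r
# Hypothetical sketch: distance from each SNP to the previous one, in cM.
# The first SNP has no predecessor; a distance of 0 is assumed for it.
inter_cm_sketch <- function(d) {
  c(0, diff(d))
}

# Assumed conversion of genetic distance into a phase switch probability using
# Haldane's map function, r = (1 - exp(-2 * x)) / 2 with x in Morgans, and with
# the rate scaled by nu. This link is an illustration, not numbat's definition.
switch_prob_sketch <- function(d_cm, nu = 1) {
  0.5 * (1 - exp(-2 * nu * d_cm / 100))
}

pos_cm <- c(1.0, 1.5, 4.0)
switch_prob_sketch(inter_cm_sketch(pos_cm))
```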
Helper function to get the internal nodes of a dendrogram and the leaves in each subtree
Description
Helper function to get the internal nodes of a dendrogram and the leaves in each subtree
Usage
get_internal_nodes(den, node, labels)
Arguments
den | dendrogram |
node | character Node name |
labels | character vector Leaf labels |
get joint posteriors
Description
get joint posteriors
Usage
get_joint_post(exp_post, allele_post, segs_consensus)
Arguments
exp_post | dataframe Expression single-cell CNV posteriors |
allele_post | dataframe Allele single-cell CNV posteriors |
segs_consensus | dataframe Consensus CNV segments |
Value
dataframe Joint single-cell CNV posteriors
Get average reference expression profile based on single-cell reference choices
Description
Get average reference expression profile based on single-cell reference choices
Usage
get_lambdas_bar(lambdas_ref, sc_refs, verbose = TRUE)
Arguments
lambdas_ref | matrix Reference expression profiles |
sc_refs | vector Single-cell reference choices |
verbose | logical Print messages |
Get the cost of a mutation reassignment
Description
Get the cost of a mutation reassignment
Usage
get_move_cost(muts, node_ori, node_tar, l_matrix)
Arguments
muts | character Mutations delimited by comma |
node_ori | character Name of the "from" node |
node_tar | character Name of the "to" node |
Value
numeric Likelihood cost of the mutation reassignment
Get the least costly mutation reassignment
Description
Get the least costly mutation reassignment
Usage
get_move_opt(G, l_matrix)
Arguments
G | igraph Mutation graph |
l_matrix | matrix Likelihood matrix of mutation placements |
Value
numeric Likelihood cost of performing the mutation move
Get the internal nodes of a dendrogram and the leaves in each subtree
Description
Get the internal nodes of a dendrogram and the leaves in each subtree
Usage
get_nodes_celltree(hc, clusters)
Arguments
hc | hclust Clustering results |
clusters | named vector Cutree output specifying the terminal clusters |
Value
list Internal node subtrees with leaf memberships
Get ordered tips from a tree
Description
Get ordered tips from a tree
Usage
get_ordered_tips(tree)
Extract consensus CNV segments
Description
Extract consensus CNV segments
Usage
get_segs_consensus(bulks, min_LLR = 5, min_overlap = 0.45, retest = TRUE)
Arguments
bulks | dataframe Pseudobulks |
min_LLR | numeric LLR threshold to filter CNVs |
min_overlap | numeric Minimum overlap fraction for two events to be counted as overlapping |
Value
dataframe Consensus segments
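The min_overlap threshold compares an overlap fraction between two CNV events. The base-R sketch below computes such a fraction for two intervals on the same chromosome. Dividing by the length of the shorter segment is an assumption, since the denominator get_segs_consensus() uses is not stated here. `overlap_fraction_sketch` is a hypothetical name.

```r
# Hypothetical sketch: fraction of the shorter segment covered by the overlap
# of two intervals on the same chromosome (the denominator is an assumption).
overlap_fraction_sketch <- function(start1, end1, start2, end2) {
  overlap <- max(0, min(end1, end2) - max(start1, start2))
  overlap / min(end1 - start1, end2 - start2)
}

overlap_fraction_sketch(0, 100, 50, 150)            # 0.5
overlap_fraction_sketch(0, 100, 50, 150) >= 0.45    # counted as overlapping
```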
get neutral segments from multiple pseudobulks
Description
get neutral segments from multiple pseudobulks
Usage
get_segs_neu(bulks)
process VCFs into SNP dataframe
Description
process VCFs into SNP dataframe
Usage
get_snps(vcf)
Arguments
vcf | vcfR object |
Value
dataframe SNP information
Find maximum likelihood assignment of mutations on a tree
Description
Find maximum likelihood assignment of mutations on a tree
Usage
get_tree_post(tree, P)
Arguments
tree | phylo Single-cell phylogenetic tree |
P | matrix Genotype probability matrix |
Value
list Mutation
example smoothed gene expression dataframe
Description
example smoothed gene expression dataframe
Usage
gexp_roll_example
Format
An object of class data.frame with 10 rows and 2000 columns.
gene model (hg19)
Description
gene model (hg19)
Usage
gtf_hg19
Format
An object of class data.table (inherits from data.frame) with 26841 rows and 5 columns.
gene model (hg38)
Description
gene model (hg38)
Usage
gtf_hg38
Format
An object of class data.table (inherits from data.frame) with 26807 rows and 5 columns.
gene model (mm10)
Description
gene model (mm10)
Usage
gtf_mm10
Format
An object of class data.table (inherits from data.frame) with 30336 rows and 5 columns.
example hclust tree
Description
example hclust tree
Usage
hc_example
Format
An object of class hclust of length 7.
example joint single-cell cnv posterior dataframe
Description
example joint single-cell cnv posterior dataframe
Usage
joint_post_example
Format
An object of class data.table (inherits from data.frame) with 3806 rows and 71 columns.
Annotate the direct upstream or downstream mutations on the edges
Description
Annotate the direct upstream or downstream mutations on the edges
Usage
label_edges(G)
Arguments
G | igraph Mutation graph |
Value
igraph Mutation graph
Label the genotypes on a mutation graph
Description
Label the genotypes on a mutation graph
Usage
label_genotype(G)
Arguments
G |
igraph Mutation graph |
Value
igraph Mutation graph
Log memory usage
Description
Log memory usage
Usage
log_mem()
Log a message
Description
Log a message
Usage
log_message(msg, verbose = TRUE)
Arguments
msg |
string Message to log |
verbose |
boolean Whether to print message to console |
Make a group of pseudobulks
Description
Make a group of pseudobulks
Usage
make_group_bulks(
groups,
count_mat,
df_allele,
lambdas_ref,
gtf,
min_depth = 0,
nu = 1,
segs_loh = NULL,
ncores = NULL
)
Arguments
groups |
list Contains fields named "sample", "cells", "size", "members" |
count_mat |
dgCMatrix Gene counts |
df_allele |
dataframe Allele counts |
lambdas_ref |
matrix Reference expression profiles |
gtf |
dataframe Transcript GTF |
min_depth |
integer Minimum allele depth to include |
nu |
numeric Phase switch rate |
segs_loh |
dataframe Segments with clonal LOH to be excluded |
ncores |
integer Number of cores |
Value
dataframe Pseudobulk profiles
Mark the tumor lineage of a phylogeny
Description
Mark the tumor lineage of a phylogeny
Usage
mark_tumor_lineage(gtree)
Arguments
gtree |
tbl_graph Single-cell phylogeny |
Value
tbl_graph Phylogeny annotated with tumor versus normal compartment
example mutation graph
Description
example mutation graph
Usage
mut_graph_example
Format
An object of class igraph of length 5.
Rolling estimate of expression fold change phi
Description
Rolling estimate of expression fold change phi
Usage
phi_hat_roll(Y_obs, lambda_ref, d_obs, mu, sig, h)
Estimate of expression fold change phi in a segment
Description
Estimate of expression fold change phi in a segment
Usage
phi_hat_seg(Y_obs, lambda_ref, d, mu, sig)
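To make the idea of a segment-level fold change concrete, here is a minimal sketch under a deliberately simplified model: gene counts Y_i ~ Poisson(phi * lambda_i * d), with no lognormal noise. Under that assumption the maximum-likelihood phi has a closed form. The helper name phi_hat_seg_poisson is hypothetical; it ignores the mu and sig parameters of phi_hat_seg and is not numbat's estimator.

```r
# Hypothetical sketch, not numbat's phi_hat_seg. Under the simplified model
# Y_i ~ Poisson(phi * lambda_i * d), the log-likelihood
#   sum(Y_i * log(phi * lambda_i * d) - phi * lambda_i * d)
# is maximised at phi = sum(Y_i) / (d * sum(lambda_i)).
# The lognormal parameters mu and sig of the real function are ignored here.
phi_hat_seg_poisson <- function(Y_obs, lambda_ref, d) {
  stopifnot(length(Y_obs) == length(lambda_ref), d > 0)
  sum(Y_obs) / (d * sum(lambda_ref))
}

# Usage: three genes in a segment, reference rates lambda_ref, total depth d
lambda_ref <- c(1e-4, 2e-4, 5e-5)
Y_obs <- c(30, 65, 14)
phi_hat_seg_poisson(Y_obs, lambda_ref, d = 1e5)
```

A value near 1 indicates no change relative to the reference; values above or below 1 suggest a gain or loss.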
example single-cell phylogeny
Description
example single-cell phylogeny
Usage
phylogeny_example
Format
An object of class tbl_graph (inherits from igraph) of length 345.
Plot a group of pseudobulk HMM profiles
Description
Plot a group of pseudobulk HMM profiles
Usage
plot_bulks(bulks, ..., ncol = 1, title = TRUE, title_size = 8)
Arguments
bulks |
dataframe Pseudobulk profiles annotated with "sample" column |
... |
additional parameters passed to plot_psbulk() |
ncol |
integer Number of columns |
title |
logical Whether to add titles to individual plots |
title_size |
numeric Size of titles |
Value
a ggplot object
Examples
p = plot_bulks(bulk_example)
Plot consensus CNVs
Description
Plot consensus CNVs
Usage
plot_consensus(segs)
Arguments
segs |
dataframe Consensus segments |
Value
ggplot object
Examples
p = plot_consensus(segs_example)
Plot single-cell smoothed expression magnitude heatmap
Description
Plot single-cell smoothed expression magnitude heatmap
Usage
plot_exp_roll(
gexp_roll_wide,
hc,
k,
gtf,
lim = 0.8,
n_sample = 300,
reverse = TRUE,
plot_tree = TRUE
)
Arguments
gexp_roll_wide |
matrix Cell x gene smoothed expression magnitudes |
hc |
hclust Hierarchical clustering result |
k |
integer Number of clusters |
gtf |
dataframe Transcript GTF |
lim |
numeric Limit for expression magnitudes |
n_sample |
integer Number of cells to subsample |
reverse |
logical Whether to reverse the cell order |
plot_tree |
logical Whether to plot the dendrogram |
Value
ggplot A single-cell heatmap of window-smoothed expression CNV signals
Examples
p = plot_exp_roll(gexp_roll_example, gtf = gtf_hg38, hc = hc_example, k = 3)
Plot mutational history
Description
Plot mutational history
Usage
plot_mut_history(
G,
clone_post = NULL,
edge_label_size = 4,
node_label_size = 6,
node_size = 10,
arrow_size = 2,
show_clone_size = TRUE,
show_distance = TRUE,
legend = TRUE,
edge_label = TRUE,
node_label = TRUE,
horizontal = TRUE,
pal = NULL
)
Arguments
G |
igraph Mutation history graph |
clone_post |
dataframe Clone assignment posteriors |
edge_label_size |
numeric Size of edge label |
node_label_size |
numeric Size of node label |
node_size |
numeric Size of nodes |
arrow_size |
numeric Size of arrows |
show_clone_size |
logical Whether to show clone size |
show_distance |
logical Whether to show evolutionary distance between clones |
legend |
logical Whether to show legend |
edge_label |
logical Whether to label edges |
node_label |
logical Whether to label nodes |
horizontal |
logical Whether to use horizontal layout |
pal |
named vector Node colors |
Value
ggplot object
Examples
p = plot_mut_history(mut_graph_example)
Plot single-cell CNV calls along with the clonal phylogeny
Description
Plot single-cell CNV calls along with the clonal phylogeny
Usage
plot_phylo_heatmap(
gtree,
joint_post,
segs_consensus,
clone_post = NULL,
p_min = 0.9,
annot = NULL,
pal_annot = NULL,
annot_title = "Annotation",
annot_scale = NULL,
clone_dict = NULL,
clone_bar = TRUE,
clone_stack = TRUE,
pal_clone = NULL,
clone_title = "Genotype",
clone_legend = TRUE,
line_width = 0.1,
tree_height = 1,
branch_width = 0.2,
tip_length = 0.2,
annot_bar_width = 0.25,
clone_bar_width = 0.25,
bar_label_size = 7,
tvn_line = TRUE,
clone_line = FALSE,
exclude_gap = FALSE,
root_edge = TRUE,
raster = FALSE,
show_phylo = TRUE
)
Arguments
gtree |
tbl_graph The single-cell phylogeny |
joint_post |
dataframe Joint single cell CNV posteriors |
segs_consensus |
dataframe Consensus segment dataframe |
clone_post |
dataframe Clone assignment posteriors |
p_min |
numeric Probability threshold to display CNV calls |
annot |
dataframe Cell annotations, dataframe with 'cell' and additional annotation columns |
pal_annot |
named vector Colors for cell annotations |
annot_title |
character Legend title for the annotation bar |
annot_scale |
ggplot scale Color scale for the annotation bar |
clone_dict |
named vector Clone annotations, mapping from cell name to clones |
clone_bar |
logical Whether to display clone bar plot |
clone_stack |
logical Whether to plot clone assignment probabilities as a stacked bar |
pal_clone |
named vector Clone colors |
clone_title |
character Legend title for the clone bar |
clone_legend |
logical Whether to display the clone legend |
line_width |
numeric Line width for CNV heatmap |
tree_height |
numeric Relative height of the phylogeny plot |
branch_width |
numeric Line width in the phylogeny |
tip_length |
numeric Length of tips in the phylogeny |
annot_bar_width |
numeric Width of annotation bar |
clone_bar_width |
numeric Width of clone genotype bar |
bar_label_size |
numeric Size of sidebar text labels |
tvn_line |
logical Whether to draw line separating tumor and normal cells |
clone_line |
logical Whether to display borders for clones in the heatmap |
exclude_gap |
logical Whether to mark gap regions |
root_edge |
logical Whether to plot root edge |
raster |
logical Whether to rasterize images |
show_phylo |
logical Whether to display phylogeny on y axis |
Value
ggplot panel
Examples
p = plot_phylo_heatmap(
gtree = phylogeny_example,
joint_post = joint_post_example,
segs_consensus = segs_example)
Plot a pseudobulk HMM profile
Description
Plot a pseudobulk HMM profile
Usage
plot_psbulk(
bulk,
use_pos = TRUE,
allele_only = FALSE,
min_LLR = 5,
min_depth = 8,
exp_limit = 2,
phi_mle = TRUE,
theta_roll = FALSE,
dot_size = 0.8,
dot_alpha = 0.5,
legend = TRUE,
exclude_gap = TRUE,
genome = "hg38",
text_size = 10,
raster = FALSE
)
Arguments
bulk |
dataframe Pseudobulk profile |
use_pos |
logical Use marker position instead of index as x coordinate |
allele_only |
logical Only plot alleles |
min_LLR |
numeric LLR threshold for event filtering |
min_depth |
numeric Minimum coverage depth for a SNP to be plotted |
exp_limit |
numeric Expression logFC axis limit |
phi_mle |
logical Whether to plot estimates of segmental expression fold change |
theta_roll |
logical Whether to plot rolling estimates of allele imbalance |
dot_size |
numeric Size of marker dots |
dot_alpha |
numeric Transparency of the marker dots |
legend |
logical Whether to show legend |
exclude_gap |
logical Whether to mark gap regions and centromeres |
genome |
character Genome build, either 'hg38' or 'hg19' |
text_size |
numeric Size of text in the plot |
raster |
logical Whether to rasterize images |
Value
ggplot Plot of pseudobulk HMM profile
Examples
p = plot_psbulk(bulk_example)
Plot single-cell phylogeny with mutation history
Description
Plot single-cell phylogeny with mutation history
Usage
plot_sc_tree(
gtree,
label_mut = TRUE,
label_size = 3,
dot_size = 2,
branch_width = 0.5,
tip = TRUE,
tip_length = 0.5,
pal_clone = NULL
)
Arguments
gtree |
tbl_graph The single-cell phylogeny |
label_mut |
logical Whether to label mutations |
label_size |
numeric Size of mutation labels |
dot_size |
numeric Size of mutation nodes |
branch_width |
numeric Width of branches in tree |
tip |
logical Whether to plot tip point |
tip_length |
numeric Length of the tips |
pal_clone |
named vector Clone colors |
Value
ggplot A single-cell phylogeny with mutation history labeled
Examples
p = plot_sc_tree(phylogeny_example)
Get the log total probability over an interval of a normal distribution
Description
Get the log total probability over an interval of a normal distribution
Usage
pnorm.range.log(lower, upper, mu, sd)
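The name suggests the probability mass of a normal distribution between lower and upper, returned on the log scale. A minimal sketch of one numerically careful way to compute that quantity follows. The helper name pnorm_range_log_sketch is hypothetical, it assumes scalar inputs, and it is not necessarily how numbat computes it.

```r
# Hypothetical sketch (scalar inputs): log P(lower < X < upper), X ~ N(mu, sd^2).
# Working on the log-CDF scale keeps tiny tail probabilities representable.
pnorm_range_log_sketch <- function(lower, upper, mu, sd) {
  # By symmetry, move an interval lying above the mean into the lower tail,
  # where pnorm(log.p = TRUE) retains precision.
  if (lower > mu) {
    new_lower <- 2 * mu - upper
    upper <- 2 * mu - lower
    lower <- new_lower
  }
  log_upper <- pnorm(upper, mean = mu, sd = sd, log.p = TRUE)
  log_lower <- pnorm(lower, mean = mu, sd = sd, log.p = TRUE)
  # log(F(u) - F(l)) = log F(u) + log1p(-exp(log F(l) - log F(u)))
  log_upper + log1p(-exp(log_lower - log_upper))
}

pnorm_range_log_sketch(-1, 1, mu = 0, sd = 1)   # log(0.6827)
pnorm_range_log_sketch(30, 31, mu = 0, sd = 1)  # finite, although pnorm(31) - pnorm(30) is 0
```

The naive log(pnorm(upper) - pnorm(lower)) returns -Inf in the second call because both CDF values round to 1.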
HMM object for unit tests
Description
HMM object for unit tests
Usage
pre_likelihood_hmm
Format
An object of class list of length 10.
Preprocess allele data
Description
Preprocess allele data
Usage
preprocess_allele(sample, vcf_pu, vcf_phased, AD, DP, barcodes, gtf, gmap)
Arguments
sample |
character Sample label |
vcf_pu |
dataframe Pileup VCF from cell-snp-lite |
vcf_phased |
dataframe Phased VCF from eagle2 |
AD |
dgTMatrix Alt allele depth matrix from pileup |
DP |
dgTMatrix Total allele depth matrix from pileup |
barcodes |
vector List of barcodes from pileup |
gtf |
dataframe Transcript GTF |
gmap |
dataframe Genetic map |
Value
dataframe Allele counts by cell
reference expression magnitudes from HCA
Description
reference expression magnitudes from HCA
Usage
ref_hca
Format
An object of class matrix (inherits from array) with 24756 rows and 12 columns.
reference expression counts from HCA
Description
reference expression counts from HCA
Usage
ref_hca_counts
Format
An object of class matrix (inherits from array) with 24857 rows and 12 columns.
Relevel chromosome column
Description
Relevel chromosome column
Usage
relevel_chrom(df)
Arguments
df |
dataframe Dataframe with chromosome column |
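Chromosome labels stored as character strings sort lexically ("1", "10", "2", ...), which scrambles genome-ordered tables and plots. A minimal base-R sketch of the releveling idea follows, assuming autosomes 1 to 22 in a column named CHROM; the helper name relevel_chrom_sketch and the autosome-only level set are assumptions.

```r
# Hypothetical sketch: convert a CHROM column to a factor ordered 1..22 so
# that sorting and faceting follow genomic rather than lexical order.
relevel_chrom_sketch <- function(df) {
  if (is.null(df)) return(df)
  df$CHROM <- factor(df$CHROM, levels = 1:22)
  df
}

df <- data.frame(CHROM = c("10", "2", "1"), pos = c(100, 200, 300))
sort(df$CHROM)                 # lexical order: "1" "10" "2"
df <- relevel_chrom_sketch(df)
df[order(df$CHROM), "CHROM"]   # genomic order: 1, 2, 10
```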
Get unique CNVs from set of segments
Description
Get unique CNVs from set of segments
Usage
resolve_cnvs(segs_all, min_overlap = 0.5, debug = FALSE)
Arguments
segs_all |
dataframe CNV segments from multiple samples |
min_overlap |
numeric scalar Minimum overlap fraction required to count two events as overlapping |
Value
dataframe Consensus CNV segments
retest consensus segments on pseudobulks
Description
retest consensus segments on pseudobulks
Usage
retest_bulks(
bulks,
segs_consensus = NULL,
t = 1e-05,
min_genes = 10,
gamma = 20,
nu = 1,
use_loh = FALSE,
diploid_chroms = NULL,
ncores = 1,
exclude_neu = TRUE,
min_LLR = 5
)
Arguments
bulks |
dataframe Pseudobulk profiles |
segs_consensus |
dataframe Consensus segments |
use_loh |
logical Whether to use LOH regions in the baseline |
diploid_chroms |
vector User-provided diploid chromosomes |
Value
dataframe Retested pseudobulks
retest CNVs in a pseudobulk
Description
retest CNVs in a pseudobulk
Usage
retest_cnv(
bulk,
theta_min = 0.08,
logphi_min = 0.25,
gamma = 20,
allele_only = FALSE,
exclude_neu = TRUE
)
Arguments
bulk |
dataframe Pseudobulk profile |
gamma |
numeric Dispersion parameter for the Beta-Binomial allele model |
allele_only |
logical Whether to retest using allele data only |
Value
a dataframe of segments with CNV posterior information
Check the format of a given file
Description
Check the format of a given file
Usage
return_missing_columns(file, expected_colnames = NULL)
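Judging by its name, this helper reports which expected columns a file lacks. A minimal sketch of that kind of header check for a tab-delimited file follows; the helper name return_missing_columns_sketch, the tab delimiter, and the return value (a character vector of missing names) are all assumptions, not numbat's documented behaviour.

```r
# Hypothetical sketch: report which expected column names are absent from
# the header of a tab-delimited file. Reads the header plus at most one row.
return_missing_columns_sketch <- function(file, expected_colnames = NULL) {
  header <- colnames(utils::read.delim(file, nrows = 1, check.names = FALSE))
  setdiff(expected_colnames, header)
}

tmp <- tempfile(fileext = ".tsv")
writeLines(c("cell\tsnp_id\tAD", "c1\ts1\t3"), tmp)
return_missing_columns_sketch(tmp, c("cell", "snp_id", "AD", "DP"))
```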
Run multiple HMMs
Description
Run multiple HMMs
Usage
run_group_hmms(
bulks,
t = 1e-04,
gamma = 20,
alpha = 1e-04,
min_genes = 10,
nu = 1,
common_diploid = TRUE,
diploid_chroms = NULL,
allele_only = FALSE,
retest = TRUE,
run_hmm = TRUE,
exclude_neu = TRUE,
ncores = 1,
verbose = FALSE,
debug = FALSE
)
Arguments
bulks |
dataframe Pseudobulk profiles |
t |
numeric Transition probability |
gamma |
numeric Dispersion parameter for the Beta-Binomial allele model |
alpha |
numeric P value cut-off to determine segment clusters in find_diploid |
common_diploid |
logical Whether to find common diploid regions between pseudobulks |
diploid_chroms |
character vector Known diploid chromosomes to use as baseline |
allele_only |
logical Whether to use only allele data to run the HMM |
retest |
logical Whether to retest CNVs |
run_hmm |
logical Whether to run HMM segmentation or only retest |
ncores |
integer Number of cores |
Run workflow to decompose tumor subclones
Description
Run workflow to decompose tumor subclones
Usage
run_numbat(
count_mat,
lambdas_ref,
df_allele,
genome = "hg38",
out_dir = tempdir(),
max_iter = 2,
max_nni = 100,
t = 1e-05,
gamma = 20,
min_LLR = 5,
alpha = 1e-04,
eps = 1e-05,
max_entropy = 0.5,
init_k = 3,
min_cells = 50,
tau = 0.3,
nu = 1,
max_cost = ncol(count_mat) * tau,
n_cut = 0,
min_depth = 0,
common_diploid = TRUE,
min_overlap = 0.45,
ncores = 1,
ncores_nni = ncores,
random_init = FALSE,
segs_loh = NULL,
call_clonal_loh = FALSE,
verbose = TRUE,
diploid_chroms = NULL,
segs_consensus_fix = NULL,
use_loh = NULL,
min_genes = 10,
skip_nj = FALSE,
multi_allelic = TRUE,
p_multi = 1 - alpha,
plot = TRUE,
check_convergence = FALSE,
exclude_neu = TRUE
)
Arguments
count_mat |
dgCMatrix Raw count matrices where rownames are genes and column names are cells |
lambdas_ref |
matrix Either a named vector with gene names as names and normalized expression as values, or a matrix where rownames are genes and columns are pseudobulk names |
df_allele |
dataframe Allele counts per cell, produced by preprocess_allele |
genome |
character Genome version (hg38, hg19, or mm10) |
out_dir |
string Output directory |
max_iter |
integer Maximum number of iterations to run the phylogeny optimization |
max_nni |
integer Maximum number of iterations to run NNI in the ML phylogeny inference |
t |
numeric Transition probability |
gamma |
numeric Dispersion parameter for the Beta-Binomial allele model |
min_LLR |
numeric Minimum LLR to filter CNVs |
alpha |
numeric P value cutoff for diploid finding |
eps |
numeric Convergence threshold for ML tree search |
max_entropy |
numeric Entropy threshold to filter CNVs |
init_k |
integer Number of clusters in the initial clustering |
min_cells |
integer Minimum number of cells to run HMM on |
tau |
numeric Factor to determine max_cost as a function of the number of cells (0-1) |
nu |
numeric Phase switch rate |
max_cost |
numeric Likelihood threshold to collapse internal branches |
n_cut |
integer Number of cuts on the phylogeny to define subclones |
min_depth |
integer Minimum allele depth |
common_diploid |
logical Whether to find common diploid regions in a group of pseudobulks |
min_overlap |
numeric Minimum CNV overlap threshold |
ncores |
integer Number of threads to use |
ncores_nni |
integer Number of threads to use for NNI |
random_init |
logical Whether to initiate phylogeny using a random tree (internal use only) |
segs_loh |
dataframe Segments of clonal LOH to be excluded |
call_clonal_loh |
logical Whether to call segments with clonal LOH |
verbose |
logical Verbosity |
diploid_chroms |
vector Known diploid chromosomes |
segs_consensus_fix |
dataframe Pre-determined segmentation of consensus CNVs |
use_loh |
logical Whether to include LOH regions in the expression baseline |
min_genes |
integer Minimum number of genes to call a segment |
skip_nj |
logical Whether to skip NJ tree construction and only use UPGMA |
multi_allelic |
logical Whether to call multi-allelic CNVs |
p_multi |
numeric P value cutoff for calling multi-allelic CNVs |
plot |
logical Whether to plot results |
check_convergence |
logical Whether to terminate iterations based on consensus CNV convergence |
exclude_neu |
logical Whether to exclude neutral segments from CNV retesting (internal use only) |
Value
a status code
example CNV segments dataframe
Description
example CNV segments dataframe
Usage
segs_example
Format
An object of class data.table (inherits from data.frame) with 27 rows and 30 columns.
Calculate Simes' p-value
Description
Calculate Simes' p-value
Usage
simes_p(p.vals, n_dim)
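Simes' method combines m p-values into one by sorting them, p_(1) <= ... <= p_(m), and taking min_i(m * p_(i) / i). A minimal sketch follows, assuming n_dim plays the role of m (the number of tests being combined); the helper name simes_p_sketch and that reading of n_dim are assumptions.

```r
# Hypothetical sketch of Simes' combined p-value:
#   p_simes = min_i ( n_dim * p_(i) / i ),  p_(1) <= ... <= p_(m)
# n_dim is assumed to be the number of tests being combined.
simes_p_sketch <- function(p.vals, n_dim = length(p.vals)) {
  p_sorted <- sort(p.vals)
  # cap at 1 in case n_dim exceeds the number of supplied p-values
  min(1, n_dim * min(p_sorted / seq_along(p_sorted)))
}

simes_p_sketch(c(0.04, 0.01, 0.03))  # 0.03
```

Here the sorted values 0.01, 0.03, 0.04 divided by 1, 2, 3 give a minimum of 0.01, and 3 * 0.01 = 0.03.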
Simplify the mutational history based on likelihood evidence
Description
Simplify the mutational history based on likelihood evidence
Usage
simplify_history(G, l_matrix, max_cost = 150, n_cut = 0, verbose = TRUE)
Arguments
G |
igraph Mutation graph |
l_matrix |
matrix Mutation placement likelihood matrix (node by mutation) |
Value
igraph Mutation graph
filtering, normalization and capping
Description
filtering, normalization and capping
Usage
smooth_expression(count_mat, lambdas_ref, gtf, window = 101, verbose = FALSE)
Arguments
count_mat |
dgCMatrix Gene expression counts |
lambdas_ref |
matrix Reference expression profiles |
gtf |
dataframe Transcript GTF |
Value
dataframe Log(x+1) transformed normalized expression values for single cells
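The title lists filtering, normalization and capping, and the window argument implies smoothing along the genome. A minimal base-R sketch of such a pipeline follows. It assumes rows are already sorted by genomic position (the real function takes a gtf for that), and the per-cell centring, the cap of ±2 and the shrinking-window running mean are assumptions rather than numbat's exact steps. The helper name smooth_expression_sketch is hypothetical.

```r
# Hypothetical sketch of windowed expression smoothing (not numbat's code).
# count_mat: genes x cells counts, rows assumed sorted by genomic position.
# lambda_ref: reference expression proportions for the same genes (> 0).
# Assumes every cell has a non-zero library size.
smooth_expression_sketch <- function(count_mat, lambda_ref, window = 101, cap = 2) {
  # 1. normalize each cell by its library size
  norm <- sweep(count_mat, 2, colSums(count_mat), "/")
  # 2. log(x + 1) of expression relative to the reference (row i / lambda_ref[i]),
  #    then centre each cell
  rel <- log1p(norm / lambda_ref)
  rel <- sweep(rel, 2, colMeans(rel), "-")
  # 3. cap extreme values
  rel <- pmin(pmax(rel, -cap), cap)
  # 4. running mean along the genome; the window shrinks at the ends
  h <- (window - 1) %/% 2
  n <- nrow(rel)
  run_mean <- function(x) {
    vapply(seq_len(n), function(i) mean(x[max(1, i - h):min(n, i + h)]), numeric(1))
  }
  smoothed <- apply(rel, 2, run_mean)
  dimnames(smoothed) <- dimnames(count_mat)
  t(smoothed)  # cells x genes, matching the layout expected by plot_exp_roll
}
```

Because the running mean averages already-capped values, the smoothed output also stays within [-cap, cap].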
Smooth the segments after HMM decoding
Description
Smooth the segments after HMM decoding
Usage
smooth_segs(bulk, min_genes = 10)
Arguments
bulk |
dataframe Pseudobulk profile |
min_genes |
integer Minimum number of genes to call a segment |
Value
dataframe Pseudobulk profile
predict phase switch probability as a function of genetic distance
Description
predict phase switch probability as a function of genetic distance
Usage
switch_prob_cm(d, nu = 1, min_p = 1e-10)
Arguments
d |
numeric vector Genetic distance in cM |
nu |
numeric Phase switch rate |
min_p |
numeric Minimum phase switch probability |
Value
numeric vector Phase switch probability
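A standard way to turn genetic distance into a switch probability is Haldane's map function. If switch events arrive as a Poisson process with rate nu per unit of distance, the probability of an odd number of switches over distance d is (1 - exp(-2 * nu * d)) / 2. A minimal sketch follows. Treating nu as a per-unit-distance rate applied directly to d is an assumption, and the helper name switch_prob_cm_sketch is hypothetical.

```r
# Hypothetical sketch based on Haldane's map function: with switch events
# arriving as a Poisson process of rate nu along the genetic map, the chance
# of an odd number of switches over distance d is (1 - exp(-2 * nu * d)) / 2.
switch_prob_cm_sketch <- function(d, nu = 1, min_p = 1e-10) {
  if (nu == 0) return(rep(0, length(d)))  # no phase switching at all
  p <- (1 - exp(-2 * nu * d)) / 2
  pmax(p, min_p)  # floor tiny probabilities so log-transitions stay finite
}

switch_prob_cm_sketch(c(0, 0.01, 0.5, 50))
```

The probability rises from the floor min_p at zero distance towards 0.5 for distant SNPs, where phase is effectively random.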
T-test wrapper, handles error for insufficient observations
Description
T-test wrapper, handles error for insufficient observations
Usage
t_test_pval(x, y)
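stats::t.test stops with an error when a group has fewer than two observations or when the data are constant. A minimal sketch of a wrapper that turns those errors into a non-significant result follows; returning 1 as the fallback value is an assumption, and the helper name t_test_pval_sketch is hypothetical.

```r
# Hypothetical sketch: Welch t-test p-value that returns 1 (no evidence of a
# difference) instead of an error when a group has too few observations or
# the data are essentially constant. The fallback value of 1 is an assumption.
t_test_pval_sketch <- function(x, y) {
  tryCatch(stats::t.test(x, y)$p.value, error = function(e) 1)
}

t_test_pval_sketch(c(1.2, 0.8, 1.1), c(2.3, 2.9, 2.5))  # ordinary p-value
t_test_pval_sketch(1.0, c(2.3, 2.9, 2.5))               # too few x observations
```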
test for multi-allelic CNVs
Description
test for multi-allelic CNVs
Usage
test_multi_allelic(bulks, segs_consensus, min_LLR = 5, p_min = 0.999)
Arguments
bulks |
dataframe Pseudobulk profiles |
segs_consensus |
dataframe Consensus segments |
min_LLR |
numeric CNV LLR threshold to filter events |
p_min |
numeric Probability threshold to call multi-allelic events |
Value
dataframe Consensus segments annotated with multi-allelic events
Rolling estimate of imbalance level theta
Description
Rolling estimate of imbalance level theta
Usage
theta_hat_roll(major_count, minor_count, h)
Arguments
major_count |
vector Major allele counts |
minor_count |
vector Minor allele counts |
h |
integer Window size |
Value
rolling estimate of theta
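A rolling imbalance estimate can be sketched by pooling allele counts in a window around each SNP and reporting the major-allele fraction minus 0.5, so 0 means balanced alleles. The symmetric window [i - h, i + h] truncated at the ends, the pooled-count estimator, and the helper name theta_hat_roll_sketch are assumptions.

```r
# Hypothetical sketch: rolling allelic-imbalance estimate. For each SNP i,
# pool counts in the window [i - h, i + h] (truncated at the ends) and report
# theta = major / (major + minor) - 0.5, so 0 means balanced alleles.
theta_hat_roll_sketch <- function(major_count, minor_count, h) {
  n <- length(major_count)
  vapply(seq_len(n), function(i) {
    idx <- max(1, i - h):min(n, i + h)
    major <- sum(major_count[idx])
    minor <- sum(minor_count[idx])
    major / (major + minor) - 0.5
  }, numeric(1))
}

theta_hat_roll_sketch(c(5, 6, 9, 10), c(5, 4, 1, 0), h = 1)
```

With h = 1 the four windows pool 11/20, 20/30, 25/30 and 19/20 major alleles, giving estimates of 0.05, 1/6, 1/3 and 0.45.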
Estimate of imbalance level theta in a segment
Description
Estimate of imbalance level theta in a segment
Usage
theta_hat_seg(major_count, minor_count)
Arguments
major_count |
vector Major allele counts |
minor_count |
vector Minor allele counts |
Value
estimate of theta
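At the segment level, one natural estimator pools all major and minor counts before taking the ratio, which weights deeply covered SNPs more than averaging per-SNP fractions would. A minimal sketch follows; the pooled estimator and the helper name theta_hat_seg_sketch are assumptions.

```r
# Hypothetical sketch: segment-level imbalance from pooled allele counts,
# theta = sum(major) / (sum(major) + sum(minor)) - 0.5.
theta_hat_seg_sketch <- function(major_count, minor_count) {
  major <- sum(major_count)
  minor <- sum(minor_count)
  major / (major + minor) - 0.5
}

theta_hat_seg_sketch(c(8, 7, 9), c(2, 3, 1))  # 24 / 30 - 0.5 = 0.3
```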
Annotate the direct upstream or downstream node on the edges
Description
Annotate the direct upstream or downstream node on the edges
Usage
transfer_links(G)
Arguments
G |
igraph Mutation graph |
Value
igraph Mutation graph
UPGMA and WPGMA clustering
Description
UPGMA and WPGMA clustering
Usage
upgma(D, method = "average", ...)
Arguments
D |
A distance matrix. |
method |
The agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward", "single", "complete", "average", "mcquitty", "median" or "centroid". The default is "average". |
... |
Further arguments passed to or from other methods. |
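UPGMA is average-linkage agglomerative clustering, and WPGMA corresponds to McQuitty's method. Both are therefore available through base R's stats::hclust, whose method argument uses the same vocabulary as the one above. The sketch below uses only stats on a small made-up distance matrix. Converting the result to a phylo object (for example with ape, which numbat imports) is omitted to keep it self-contained.

```r
# Sketch: UPGMA is average-linkage clustering and WPGMA is McQuitty's method,
# so both can be run with base R's stats::hclust.
D <- matrix(c( 0,  2,  6, 10,
               2,  0,  6, 10,
               6,  6,  0, 10,
              10, 10, 10,  0),
            nrow = 4, dimnames = list(letters[1:4], letters[1:4]))

upgma_tree <- stats::hclust(stats::as.dist(D), method = "average")   # UPGMA
wpgma_tree <- stats::hclust(stats::as.dist(D), method = "mcquitty")  # WPGMA

upgma_tree$height  # merge heights: 2, 6, 10
```

hclust reports each merge at the full linkage distance; an ultrametric tree drawn from it places internal nodes at half these values if branch lengths are meant to sum to pairwise distances.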
example VCF header
Description
example VCF header
Usage
vcf_meta
Format
An object of class character of length 65.
Viterbi for clonal LOH detection
Description
Viterbi for clonal LOH detection
Usage
viterbi_loh(hmm, ...)
Arguments
hmm |
HMM object; expect variables x (SNP count), snp_sig (snp rate standard deviation), pm (snp density for ref and loh states), pn (gene lengths), d (total expression depth), y (expression count), lambda_star (reference expression rate), mu (global expression mean), sig (global expression standard deviation), Pi (transition prob matrix), delta (prior for each state), phi (expression fold change for each state) |