This vignette for package groupedHyperframe documents the creation
of a groupedHyperframe object, the batch processes defined
for a groupedHyperframe, and aggregation over a multi-level
grouping structure.
Package groupedHyperframe requires the
development versions of the spatstat family of
packages.
devtools::install_github('spatstat/spatstat'); packageDate('spatstat')
devtools::install_github('spatstat/spatstat.data'); packageDate('spatstat.data')
devtools::install_github('spatstat/spatstat.explore'); packageDate('spatstat.explore')
devtools::install_github('spatstat/spatstat.geom'); packageDate('spatstat.geom')
devtools::install_github('spatstat/spatstat.linnet'); packageDate('spatstat.linnet')
devtools::install_github('spatstat/spatstat.model'); packageDate('spatstat.model')
devtools::install_github('spatstat/spatstat.random'); packageDate('spatstat.random')
devtools::install_github('spatstat/spatstat.sparse'); packageDate('spatstat.sparse')
devtools::install_github('spatstat/spatstat.univar'); packageDate('spatstat.univar')
devtools::install_github('spatstat/spatstat.utils'); packageDate('spatstat.utils')

Examples in this vignette require that the search path
has
Users should remove the argument mc.cores = 1L from all
examples and use the default option, which engages all CPU cores on the
current host on macOS. The authors are forced to set
mc.cores = 1L in this vignette in order to pass
CRAN’s submission check.
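As a rough illustration of what mc.cores controls, the sketch below calls parallel::mclapply directly on a toy function that is not part of groupedHyperframe. How the package chooses its default number of cores is not shown here; parallel::detectCores() merely reports what is available on the current host.

```r
library(parallel)

# Number of CPU cores detected on the current host
detectCores()

# Toy task (an assumption for illustration): square each input.
# mc.cores = 1L forces serial evaluation, as in this vignette.
res <- mclapply(1:4, FUN = function(i) i^2, mc.cores = 1L)
unlist(res)
```

Values of mc.cores greater than 1 rely on forking, which is not available on Windows.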
A development version of package
groupedHyperframe is hosted on GitHub.
| Term / Abbreviation | Description | Reference |
|---|---|---|
| attr | Attributes | base::attr; base::attributes |
| CRAN, R | The Comprehensive R Archive Network | https://cran.r-project.org |
| data.frame | Data frame | base::data.frame |
| formula | Formula | stats::formula |
| fv, fv.object | Function value table | spatstat.explore::fv.object |
| groupedData | Grouped data frame | nlme::groupedData |
| hypercolumn | Column of hyper data frame | spatstat.geom::hyperframe |
| hyperframe | Hyper data frame | spatstat.geom::hyperframe |
| inherits | Class inheritance | base::inherits |
| kerndens | Kernel density | stats::density.default()$y |
| matrix | Matrix | base::matrix |
| mc.cores | Number of CPU cores to use | parallel::mclapply; parallel::detectCores |
| multitype | Multitype object | spatstat.geom::is.multitype |
| ppp, ppp.object | (Marked) point pattern | spatstat.geom::ppp.object |
| ~ g1/.../gm | Nested grouping structure | nlme::groupedData; nlme::lme |
| quantile | Quantile | stats::quantile |
| S3 | R’s simplest object oriented system | https://adv-r.hadley.nz/s3.html |
| search | Search path | base::search |
| Surv | Survival object | survival::Surv |
| trapz, cumtrapz | (Cumulative) trapezoidal integration | pracma::trapz; pracma::cumtrapz; https://en.wikipedia.org/wiki/Trapezoidal_rule |

groupedHyperframe Class

The S3 class groupedHyperframe
inherits from the hyperframe class, in a similar
fashion as the groupedData class inherits from the
data.frame class.
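The inheritance pattern can be sketched in base R with a toy class. The class name toyGrouped and the columns below are hypothetical and exist only for illustration; the attribute name group mirrors attr(., 'group') of a groupedHyperframe, but this is not the package's constructor.

```r
# Toy illustration only: a data.frame subclass that carries a grouping formula
df <- data.frame(
  patient = c('a', 'a', 'b'),
  image = c('i1', 'i2', 'i3'),
  y = c(1, 2, 3)
)
toy <- structure(df, group = ~ patient/image, class = c('toyGrouped', class(df)))

inherits(toy, what = 'data.frame')  # the parent class is retained
attr(toy, which = 'group')          # the grouping formula
all.vars(attr(toy, which = 'group'))  # grouping variables, highest level first
```

Because the parent class stays in the class vector, methods written for the parent remain available to the subclass.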
A groupedHyperframe object, in addition to being a
hyperframe object, has the attribute
attr(., 'group'), a formula that specifies the
grouping structure.

groupedHyperframe with ppp-hypercolumn

Function grouped_ppp() creates a
groupedHyperframe with
one-and-only-one ppp-hypercolumn.
Multiple ppp-hypercolumns will not be supported in the
foreseeable future, because we would need to check for name clashes in
$marks across the multiple ppp-hypercolumns,
which is too much trouble.
In the following example, the argument formula
specifies
- numeric mark hladr and multitype mark phenotype, on the left-hand side
- OS, gender and age, before the | separator on the right-hand side
- image_id nested in patient_id, after the | separator on the right-hand side

(s = grouped_ppp(formula = hladr + phenotype ~ OS + gender + age | patient_id/image_id,
  data = wrobel_lung, mc.cores = 1L))
#>
#> Grouped Hyperframe: ~patient_id/image_id
#>
#> 25 image_id nested in
#> 5 patient_id
#>
#> OS gender age patient_id image_id ppp.
#> 1 3488+ F 85 #01 0-889-121 [40864,18015].im3 (ppp)
#> 2 3488+ F 85 #01 0-889-121 [42689,19214].im3 (ppp)
#> 3 3488+ F 85 #01 0-889-121 [42806,16718].im3 (ppp)
#> 4 3488+ F 85 #01 0-889-121 [44311,17766].im3 (ppp)
#> 5 3488+ F 85 #01 0-889-121 [45366,16647].im3 (ppp)
#> 6 1605 M 66 #02 1-037-393 [56576,16907].im3 (ppp)
#> 7 1605 M 66 #02 1-037-393 [56583,15235].im3 (ppp)
#> 8 1605 M 66 #02 1-037-393 [57130,16082].im3 (ppp)
#> 9 1605 M 66 #02 1-037-393 [57396,17896].im3 (ppp)
#> 10 1605 M 66 #02 1-037-393 [57403,16934].im3 (ppp)

Function grouped_ppp() has parameter coords,
which specifies the column names of the \(x\)- and \(y\)-coordinates in the input
data. The default coords = ~ x + y indicates the
use of data$x and data$y for the \(x\)- and \(y\)-coordinates, respectively. Users may
use coords = FALSE for data without \(x\)- and \(y\)-coordinates. In this case, the
coordinates are filled with randomly generated numbers, and the returned
groupedHyperframe has a
pseudo.ppp-hypercolumn.
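Both formula and coords are ordinary R formulas. The sketch below shows one way to take such formulas apart with base R; it is an assumption for illustration and not necessarily how grouped_ppp() processes them internally.

```r
# Two-sided formula with a | separator on the right-hand side
f <- hladr + phenotype ~ OS + gender + age | patient_id/image_id

lhs <- f[[2L]]  # left-hand side: the marks
rhs <- f[[3L]]  # right-hand side: a call to `|`

all.vars(lhs)         # "hladr" "phenotype"
all.vars(rhs[[2L]])   # before |: "OS" "gender" "age"
all.vars(rhs[[3L]])   # after |: "patient_id" "image_id"

# One-sided formula naming the coordinate columns
all.vars(~ x + y)     # "x" "y"
```

This works because ~ has lower precedence than | in R, so the whole right-hand side is parsed as a single call to |.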
(s_a = grouped_ppp(Ki67 ~ Surv(recfreesurv_mon, recurrence) + race + age | patientID/tissueID,
data = Ki67, coords = FALSE, mc.cores = 1L))
#>
#> Grouped Hyperframe: ~patientID/tissueID
#>
#> 207 tissueID nested in
#> 200 patientID
#>
#> recfreesurv_mon recurrence race age patientID tissueID ppp.
#> 1 100 0 White 66 PT00037 TJUe_I17 (pseudo.ppp)
#> 2 22 1 Black 42 PT00039 TJUe_G17 (pseudo.ppp)
#> 3 99 0 White 60 PT00040 TJUe_F17 (pseudo.ppp)
#> 4 99 0 White 53 PT00042 TJUe_D17 (pseudo.ppp)
#> 5 112 1 White 52 PT00054 TJUe_J18 (pseudo.ppp)
#> 6 12 1 Black 51 PT00059 TJUe_N17 (pseudo.ppp)
#> 7 64 0 Asian 50 PT00062 TJUe_J17 (pseudo.ppp)
#> 8 56 0 White 37 PT00068 TJUe_F19 (pseudo.ppp)
#> 9 79 0 White 68 PT00082 TJUe_P19 (pseudo.ppp)
#> 10 26 1 Black 55 PT00084 TJUe_O19 (pseudo.ppp)

ppp-Hypercolumn

In this section, we outline the batch processes of spatial point
pattern analysis applicable to the ppp-hypercolumn of a
hyperframe.
Note that these spatial point pattern analyses should
not be applied to a
pseudo.ppp-hypercolumn, as the \(x\)- and \(y\)-coordinates are randomly generated
pseudo numbers.
Batch processes that add a fv-hypercolumn to the input
hyperframe include
| Function | Workhorse | Applicable To |
|---|---|---|
| Emark_() | spatstat.explore::Emark | numeric marks (e.g., hladr) in ppp-hypercolumn |
| Vmark_() | spatstat.explore::Vmark | numeric marks |
| markcorr_() | spatstat.explore::markcorr | numeric marks |
| markvario_() | spatstat.explore::markvario | numeric marks |
| Gcross_() | spatstat.explore::Gcross | multitype marks (e.g., phenotype) |
| Kcross_() | spatstat.explore::Kcross | multitype marks |
| Jcross_() | spatstat.explore::Jcross | multitype marks |
Batch processes that add a numeric-hypercolumn to the
input hyperframe include
| Function | Workhorse | Applicable To |
|---|---|---|
| nncross_() | spatstat.geom::nncross.ppp(., what = 'dist') | multitype marks (e.g., phenotype) |
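To make the numeric result of the workhorse concrete: spatstat.geom::nncross with what = 'dist' returns, for each point of one pattern, the distance to its nearest point in another pattern. The base-R sketch below computes the same quantity by brute force on made-up coordinates; the two point sets stand in for points of mark i and mark j, and this is not the spatstat implementation.

```r
# Toy coordinates (assumptions for illustration)
xi <- cbind(x = c(0, 1, 2), y = c(0, 0, 0))  # points of type i
xj <- cbind(x = c(0, 3), y = c(1, 0))        # points of type j

# Pairwise Euclidean distances: rows index type-i points, columns type-j points
d <- sqrt(outer(xi[, 'x'], xj[, 'x'], FUN = '-')^2 +
          outer(xi[, 'y'], xj[, 'y'], FUN = '-')^2)

# Nearest type-j neighbour distance for each type-i point
nn <- apply(d, MARGIN = 1L, FUN = min)
nn  # 1, sqrt(2), 1
```

The result has one number per type-i point, which is why each element of the resulting hypercolumn is a numeric vector rather than a function value table.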
The following example shows that multiple batch processes may be applied
to a hyperframe (or groupedHyperframe) in a
pipeline (|>).
r = seq.int(from = 0, to = 250, by = 10)
out = s |>
  Emark_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # Vmark_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # markcorr_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # markvario_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  Gcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'best', mc.cores = 1L) |> # fast
  # Kcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'best', mc.cores = 1L) |> # fast
  nncross_(i = 'CK+.CD8-', j = 'CK-.CD8+', correction = 'best', mc.cores = 1L) # fast
#>

The returned hyperframe (or groupedHyperframe) has
- fv-hypercolumn hladr.E, created by function Emark_() on numeric mark hladr
- fv-hypercolumn phenotype.G, created by function Gcross_() on multitype mark phenotype
- numeric-hypercolumn phenotype.nncross, created by function nncross_() on multitype mark phenotype

out
#>
#> Grouped Hyperframe: ~patient_id/image_id
#>
#> 25 image_id nested in
#> 5 patient_id
#>
#> OS gender age patient_id image_id ppp. hladr.E phenotype.G
#> 1 3488+ F 85 #01 0-889-121 [40864,18015].im3 (ppp) (fv) (fv)
#> 2 3488+ F 85 #01 0-889-121 [42689,19214].im3 (ppp) (fv) (fv)
#> 3 3488+ F 85 #01 0-889-121 [42806,16718].im3 (ppp) (fv) (fv)
#> 4 3488+ F 85 #01 0-889-121 [44311,17766].im3 (ppp) (fv) (fv)
#> 5 3488+ F 85 #01 0-889-121 [45366,16647].im3 (ppp) (fv) (fv)
#> 6 1605 M 66 #02 1-037-393 [56576,16907].im3 (ppp) (fv) (fv)
#> 7 1605 M 66 #02 1-037-393 [56583,15235].im3 (ppp) (fv) (fv)
#> 8 1605 M 66 #02 1-037-393 [57130,16082].im3 (ppp) (fv) (fv)
#> 9 1605 M 66 #02 1-037-393 [57396,17896].im3 (ppp) (fv) (fv)
#> 10 1605 M 66 #02 1-037-393 [57403,16934].im3 (ppp) (fv) (fv)
#> phenotype.nncross
#> 1 (numeric)
#> 2 (numeric)
#> 3 (numeric)
#> 4 (numeric)
#> 5 (numeric)
#> 6 (numeric)
#> 7 (numeric)
#> 8 (numeric)
#> 9 (numeric)
#> 10 (numeric)

When a nested grouping structure ~g1/g2/.../gm is present,
we may aggregate over the
- fv-hypercolumn(s)
- numeric-hypercolumn(s)
- numeric marks in the ppp-hypercolumn

by any one of the grouping levels ~g1,
~g2, …, or ~gm. If the lowest grouping level
~gm is specified, then no aggregation is performed.
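The idea of aggregating at a chosen grouping level can be sketched in base R. The toy data below (five images nested in two patients, one short numeric curve per image) are made up, and the point-wise mean is only one possible aggregation; this is not the package's implementation.

```r
# Toy data: 5 images nested in 2 patients; one numeric curve per image
img <- data.frame(
  patient_id = c('p1', 'p1', 'p1', 'p2', 'p2'),
  image_id = c('a', 'b', 'c', 'd', 'e')
)
curves <- list(c(1, 2, 3), c(3, 4, 5), c(5, 6, 7), c(0, 0, 0), c(2, 4, 6))

# Aggregate by ~ patient_id: point-wise mean of the curves within each patient
by_patient <- split(curves, f = img$patient_id)
aggr <- t(vapply(by_patient, FUN = function(x) colMeans(do.call(rbind, x)),
                 FUN.VALUE = numeric(3)))
aggr  # one row per patient

# At the lowest level ~ image_id each group holds a single curve,
# so there is nothing to aggregate
by_image <- split(curves, f = img$image_id)
lengths(by_image)
```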
The object returned by the aggregation functions
aggregate_fv(), aggregate_quantile() and
aggregate_kerndens() is a data.frame instead of a
hyperframe. This is because the aggregated results are
stored in matrix-columns, which the hyperframe
class does not support.
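For readers unfamiliar with matrix-columns, the sketch below stores a matrix as a single column of a data.frame. Each row of the matrix is a cumulative trapezoidal area computed by a small hand-written helper, cumtrapz_base(), which is a hypothetical name introduced here and is not pracma::cumtrapz or any function of this package.

```r
# Hand-rolled cumulative trapezoidal integration (illustration only)
cumtrapz_base <- function(x, y) {
  c(0, cumsum(diff(x) * (y[-1L] + y[-length(y)]) / 2))
}

r <- c(0, 1, 2, 3)
curves <- list(p1 = c(0, 1, 2, 3), p2 = c(1, 1, 1, 1))  # toy curves, one per patient

d <- data.frame(patient_id = names(curves))
# One row per patient, one column per value of r
d$cumtrapz <- do.call(rbind, lapply(curves, FUN = cumtrapz_base, x = r))

dim(d)           # 2 rows, 2 columns: the matrix counts as one column
dim(d$cumtrapz)  # the matrix-column itself is 2 by 4
```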

fv-hypercolumn(s)

Function aggregate_fv() aggregates the function values, and the cumulative
trapezoidal area under them, from the fv-hypercolumn(s) (see
spatstat.explore::plot.fv). In the following example, we
have
- matrix-column hladr.E.value, aggregated function value from fv-hypercolumn hladr.E
- matrix-column phenotype.G.value, aggregated function value from fv-hypercolumn phenotype.G
- matrix-column hladr.E.cumtrapz, aggregated cumulative trapezoid area from fv-hypercolumn hladr.E
- matrix-column phenotype.G.cumtrapz, aggregated cumulative trapezoid area from fv-hypercolumn phenotype.G

afv = out |>
aggregate_fv(by = ~ patient_id, f_aggr_ = 'mean', mc.cores = 1L)
#> Column(s) 'image_id' removed; as they are not identical per aggregation-group
nrow(afv) # number of patients
#> [1] 5
names(afv)
#> [1] "OS" "gender" "age"
#> [4] "patient_id" "hladr.E.value" "hladr.E.cumtrapz"
#> [7] "phenotype.G.value" "phenotype.G.cumtrapz"
dim(afv$hladr.E.cumtrapz) # N(patient) by length(r)
#> [1] 5 25

numeric-hypercolumn(s) and numeric mark(s) in ppp-hypercolumn

Function aggregate_quantile() aggregates the quantiles of
- numeric-hypercolumn(s). In the following example, we have matrix-column phenotype.nncross.quantile, aggregated quantile of numeric-hypercolumn phenotype.nncross
- numeric mark(s) in the ppp-hypercolumn. In the following example, we have matrix-column hladr.quantile, aggregated quantile of numeric mark hladr in ppp-hypercolumn

q = out |>
aggregate_quantile(by = ~ patient_id, probs = seq.int(from = 0, to = 1, by = .1), mc.cores = 1L)
#> Column(s) 'image_id' removed; as they are not identical per aggregation-group
nrow(q)
#> [1] 5
names(q)
#> [1] "OS" "gender"
#> [3] "age" "patient_id"
#> [5] "phenotype.nncross.quantile" "hladr.quantile"
dim(q$phenotype.nncross.quantile)
#> [1] 5 11
dim(q$hladr.quantile)
#> [1] 5 11

Function aggregate_kerndens() aggregates the kernel densities of
- numeric-hypercolumn(s). In the following example, we have matrix-column phenotype.nncross.kerndens, aggregated kernel density of numeric-hypercolumn phenotype.nncross
- numeric mark(s) in the ppp-hypercolumn. In the following example, we have matrix-column hladr.kerndens, aggregated kernel density of numeric mark hladr in ppp-hypercolumn

(mdist = out$phenotype.nncross |> unlist() |> max())
#> [1] 354.2968
d = out |>
aggregate_kerndens(by = ~ patient_id, from = 0, to = mdist, mc.cores = 1L)
#> Column(s) 'image_id' removed; as they are not identical per aggregation-group
nrow(d)
#> [1] 5
names(d)
#> [1] "OS" "gender"
#> [3] "age" "patient_id"
#> [5] "phenotype.nncross.kerndens" "hladr.kerndens"
dim(d$phenotype.nncross.kerndens)
#> [1] 5 512
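The column counts of the quantile and kernel density matrix-columns above (11 and 512) follow from the workhorses stats::quantile and stats::density. The sketch below reproduces them on simulated toy values for two hypothetical patients; it illustrates the shapes only and is not the package's aggregation code.

```r
set.seed(1)
x <- list(p1 = rexp(50), p2 = rexp(80))  # toy per-patient numeric values

# Quantiles at 11 probabilities give 11 columns
probs <- seq.int(from = 0, to = 1, by = .1)
q <- do.call(rbind, lapply(x, FUN = quantile, probs = probs))
dim(q)  # 2 by 11

# Kernel density on a common grid; stats::density() evaluates at n = 512 points by default
mx <- max(unlist(x))
k <- do.call(rbind, lapply(x, FUN = function(v) density(v, from = 0, to = mx)$y))
dim(k)  # 2 by 512
```

Sharing from and to across groups puts every row on the same grid, so the rows of the density matrix are directly comparable.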