--- title: "State Space Reconstruction (SSR)" author: "Wenbo Lv" date: "2025-05-16" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{SSR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Spatial State Space Reconstruction (S-SSR) Takens theory proves that for a dynamical system $\phi$, if its trajectory converges to an attractor manifold $M$—a bounded and invariant set of states—then there exists a smooth mapping between the system $\phi$ and its attractor $M$. Consequently, the time series observations of $\phi$ can be used to reconstruct the structure of $M$ through delay embedding. According to the generalized embedding theorem, for a compact $d$-dimensional manifold $M$ and a set of observation functions $\langle h_1, h_2, \ldots, h_L \rangle$, the mapping $\psi_{\phi,h}(x) = \langle h_1(x), h_2(x), \ldots, h_L(x) \rangle$ is an embedding of $M$ when $L \geq 2d + 1$. Here, *embedding* refers to a one-to-one map that resolves all singularities of the original manifold. The observation functions $h_i$ can take the form of time-lagged values from a single time series, lags from multiple time series, or even completely distinct measurements. The former two are simply special cases of the third. This embedding framework can be extended to *spatial cross-sectional data*, which lack temporal ordering but are observed over a spatial domain. In this context, the observation functions can be defined using the values of a variable at a focal spatial unit and its surrounding neighbors (known as *spatial lags* in spatial statistics). Specifically, for a spatial location $s$, the embedding can be written as: $$ \psi(x, s) = \langle h_s(x), h_{s(1)}(x), \ldots, h_{s(L-1)}(x) \rangle, $$ where $h_{s(i)}(x)$ denotes the observation function of the $i$-th order spatial lag unit relative to $s$. These spatial lags provide the necessary diversity of observations for effective manifold reconstruction. In practice, if a given spatial lag order involves multiple units, summary statistics such as the mean or directionally-weighted averages can be used as the observation function to maintain a one-to-one embedding. ## Usage examples ### An example of spatial lattice data Load the `spEDM` package and its county-level population density data: ``` r library(spEDM) popd_nb = spdep::read.gal(system.file("case/popd_nb.gal",package = "spEDM")) ## Warning in spdep::read.gal(system.file("case/popd_nb.gal", package = "spEDM")): ## neighbour object has 4 sub-graphs popd = readr::read_csv(system.file("case/popd.csv",package = "spEDM")) ## Rows: 2806 Columns: 7 ## ── Column specification ──────────────────────────────────────────────────────────────────────────── ## Delimiter: "," ## dbl (7): x, y, popd, elev, tem, pre, slope ## ## ℹ Use `spec()` to retrieve the full column specification for this data. ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. popd_sf = sf::st_as_sf(popd, coords = c("x","y"), crs = 4326) popd_sf ## Simple feature collection with 2806 features and 5 fields ## Geometry type: POINT ## Dimension: XY ## Bounding box: xmin: 74.9055 ymin: 18.2698 xmax: 134.269 ymax: 52.9346 ## Geodetic CRS: WGS 84 ## # A tibble: 2,806 × 6 ## popd elev tem pre slope geometry ## * ## 1 780. 8 17.4 1528. 0.452 (116.912 30.4879) ## 2 395. 48 17.2 1487. 0.842 (116.755 30.5877) ## 3 261. 49 16.0 1456. 3.56 (116.541 30.7548) ## 4 258. 23 17.4 1555. 0.932 (116.241 30.104) ## 5 211. 101 16.3 1494. 3.34 (116.173 30.495) ## 6 386. 10 16.6 1382. 1.65 (116.935 30.9839) ## 7 350. 23 17.5 1569. 0.346 (116.677 30.2412) ## 8 470. 22 17.1 1493. 1.88 (117.066 30.6514) ## 9 1226. 11 17.4 1526. 0.208 (117.171 30.5558) ## 10 137. 598 13.9 1458. 5.92 (116.208 30.8983) ## # ℹ 2,796 more rows ``` Embedding the variable `popd` from county-level population density: ``` r v = embedded(popd_sf,"popd",E = 9,tau = 0,trend.rm = FALSE) v[1:5,c(2,5,6)] ## [,1] [,2] [,3] ## [1,] 440.7237 962.7204 1664.756 ## [2,] 500.0166 919.6000 2408.766 ## [3,] 494.4070 1435.0165 1958.686 ## [4,] 1520.0814 1488.2727 2066.748 ## [5,] 298.5497 2326.8429 1290.188 ``` ``` r plot3D::scatter3D(v[,2], v[,5], v[,6], colvar = NULL, pch = 19, col = "red", theta = 45, phi = 10, cex = 0.45, bty = "f", clab = NA, tickmarks = FALSE) ## Warning: no DISPLAY variable so Tk is not available ``` ![**Figure 1**. The reconstructed shadow manifolds for `popd`.](../man/figures/ssr/fig1-1.png) ### An example of spatial grid data Load the `spEDM` package and its soil pollution data: ``` r library(spEDM) cu = terra::rast(system.file("case/cu.tif", package="spEDM")) cu ## class : SpatRaster ## dimensions : 131, 125, 3 (nrow, ncol, nlyr) ## resolution : 5000, 5000 (x, y) ## extent : 360000, 985000, 1555000, 2210000 (xmin, xmax, ymin, ymax) ## coord. ref. : North_American_1983_Albers ## source : cu.tif ## names : cu, industry, ntl ## min values : 1.201607, 0.000, 0 ## max values : 319.599274, 0.615, 63 ``` Embedding the variable `cu` from soil pollution data: ``` r r = spEDM::embedded(cu,"cu",E = 9,tau = 0,trend.rm = FALSE) r[1:5,c(1,5,9)] ## [,1] [,2] [,3] ## [1,] 14.80663 13.84810 15.40681 ## [2,] 14.06487 14.07629 15.67661 ## [3,] 14.07302 14.15578 15.78888 ## [4,] 13.38009 14.11540 15.63029 ## [5,] 13.81006 14.25782 15.30874 ``` ``` r plot3D::scatter3D(r[,1], r[,5], r[,9], colvar = NULL, pch = 19, col = "#e77854", theta = 45, phi = 10, cex = 0.45, bty = "f", clab = NA, tickmarks = FALSE) ``` ![**Figure 2**. The reconstructed shadow manifolds for `cu`.](../man/figures/ssr/fig2-1.png) ### Determining Optimal Embedding Dimension Using False Nearest Neighbours The false nearest neighbours (FNN) method helps identify the appropriate embedding dimension for reconstructing the state space of a time series or spatial cross-sectional. A low embedding dimension that minimizes false neighbours is considered optimal. Below are two examples applying the `fnn()` function. In both cases, `eps` is set to one-tenth of the standard deviation of the variable of interest, providing a relative threshold for identifying false neighbours. ``` r fnn(popd_sf,"popd", eps = stats::sd(popd_sf$popd) / 10) ## E:1 E:2 E:3 E:4 E:5 E:6 E:7 E:8 ## 0.9707769 0.6404134 0.4885959 0.4012830 0.3346401 0.3296507 0.3260870 0.3150392 ## E:9 ## 0.3150392 ``` ``` r fnn(cu,"cu", eps = stats::sd(terra::values(cu[["cu"]]),na.rm = TRUE) / 10) ## E:1 E:2 E:3 E:4 E:5 E:6 E:7 ## 0.98577099 0.56042748 0.13215267 0.08103817 0.07731298 0.07517557 0.06998473 ## E:8 E:9 ## 0.06632061 0.05673282 ```