FIESTA - Small Area Estimators

Small Area (SA) module overview

FIESTA’s Small Area (SA) module was set up as a platform to integrate with current Small Area Estimators available on CRAN including the JoSAE (Breidenbach 2015), sae (Molina and Marhuenda 2015), and hbsae (Boonstra 2012) packages that use unit-level and area-level models such as the Empirical Best Linear Unbiased Prediction (EBLUP) estimation strategy and the hierarchical Bayesian estimation strategy. Rao (2003) discusses the benefits of the EBLUP for balancing potential bias of synthetic estimators against the instability of a direct estimator. White et al (2021) discusses the benefits of Small Area Estimation in a hierarchical Bayesian context, especially for forestry data. The module includes functional steps for checking, compiling, and formatting FIA plot data and auxiliary spatial information for input to R packages, such as JoSAE (Breidenbach 2015), sae (Molina and Marhuenda 2015), or hbsae (Boonstra 2012) and translates integrated package output to FIESTA output format.

Functions in FIESTA used for fitting Small Area Estimators include the modSAarea function for area estimates and modSAtree for tree estimates. The modSApop function is used to get population data needed for small area estimation. Below is a description and table of contents for the sections related to these functions:

FUNCTION DESCRIPTION
modSApop Creates population data for small area estimation.
modSAarea Produces area level estimates through small area estimation.
modSAtree Produces tree level estimates through small area estimation.

Objective of tutorial

The main objective of this tutorial is to demonstrate how to use FIESTA for generating estimates using estimators from the JoSAE, sae, and hbsae R packages. The following examples are for generating estimates and estimated variances using standard FIA Evaluation data from FIA’s National database, with custom Estimation unit and Stratification information. The examples use data from three inventory years of field measurements in the state of Wyoming, from FIADB_1.7.2.00, last updated June 20, 2018, downloaded on June 25, 2018 and stored as internal data objects in FIESTA.

Example data - Wyoming (WY), Inventory Years 2011-2012

View SA Example Data
Data Frame Description
WYplt WY plot-level data
WYcond WY condition-level data
WYtree WY tree-level data
External data Description
WYbighorn_adminbnd.shp Polygon shapefile of WY Bighorn National Forest Administrative boundary*
WYbighorn_districtbnd.shp Polygon shapefile of WY Bighorn National Forest District boundaries**
WYbighorn_forest_nonforest_250m.tif GeoTIFF raster of predicted forest/nonforest (1/0) for stratification***
WYbighorn_dem_250m.img Erdas Imagine raster of elevation change, in meters****

*USDA Forest Service, Automated Lands Program (ALP). 2018. S_USA.AdministrativeForest (). Description: An area encompassing all the National Forest System lands administered by an administrative unit. The area encompasses private lands, other governmental agency lands, and may contain National Forest System lands within the proclaimed boundaries of another administrative unit. All National Forest System lands fall within one and only one Administrative Forest Area.

**USDA Forest Service, Automated Lands Program (ALP). 2018. S_USA.RangerDistrict (http://data.fs.usda.gov/geodata/edw). Description: A depiction of the boundary that encompasses a Ranger District.

***Based on MODIS-based classified map resampled from 250m to 500m resolution and reclassified from 3 to 2 classes: 1:forest; 2:nonforest. Projected in Albers Conical Equal Area, Datum NAD27 (Ruefenacht et al. 2008). Clipped to extent of WYbighorn_adminbnd.shp.

****USGS National Elevation Dataset (NED), resampled from 30m resolution to 250m. Projected in Albers Conical Equal Area, Datum NAD27 (U.S. Geological Survey 2017). Clipped to boundary of WYbighorn_adminbnd.shp.

Set up

First, you’ll need to load the FIESTA library:

library(FIESTA)

Next, you’ll need to set up an “outfolder”. This is just a file path to a folder where you’d like FIESTA to send your data output. For our purposes in this vignette, we have saved our outfolder file path as the outfolder object in a temporary directory. We also set a few default options preferred for this vignette.

outfolder <- tempdir()

Get data for examples

View Getting Data

Now that we’ve loaded FIESTA and setup our outfolder, we can retrieve the data needed to run the examples. First, we point to some external data and predictor layers stored in FIESTA and derive new predictor layers using the terra package.

# File names for external spatial data
WYbhfn <- system.file("extdata", "sp_data/WYbighorn_adminbnd.shp", package="FIESTA")
WYbhdistfn <- system.file("extdata", "sp_data/WYbighorn_districtbnd.shp", package="FIESTA")
WYbhdist.att <- "DISTRICTNA"

fornffn <- system.file("extdata", "sp_data/WYbighorn_forest_nonforest_250m.tif", package="FIESTA")
demfn <- system.file("extdata", "sp_data/WYbighorn_dem_250m.img", package="FIESTA")

# Derive new predictor layers from dem
library(terra)
dem <- rast(demfn)
slpfn <- paste0(outfolder, "/WYbh_slp.img")
slp <- terra::terrain(dem,
                      v = "slope",
                      unit = "degrees",
                      filename = slpfn, 
                      overwrite = TRUE, 
                      NAflag = -99999.0)
aspfn <- paste0(outfolder, "/WYbh_asp.img")
asp <- terra::terrain(dem,
                      v = "aspect",
                      unit = "degrees", 
                      filename = aspfn,
                      overwrite = TRUE, 
                      NAflag = -99999.0)

Next, we set up our small area domains with FIESTA::spGetSAdoms. For more information on how to use this function, please see the sp vignette included with FIESTA (link).

smallbnd <- WYbhdistfn
smallbnd.domain <- "DISTRICTNA"

Next, we can get our FIA plot data and set up our auxiliary data. We can get our FIA plot data with the spGetPlots function from FIESTA, which accesses data through FIA’s DataMart. Here, data are downloaded for all states intersecting boundary, then subset to boundary.

SApltdat <- spGetPlots(bnd = WYbhdistfn,
                       xy_datsource = "obj",
                       xy = WYplt,
                       xy_opts = list(xy.uniqueid = "CN", xvar = "LON_PUBLIC", 
                                    yvar = "LAT_PUBLIC", xy.crs = 4269),
                       datsource = "obj",
                       istree = TRUE,
                       isseed = TRUE,
                       dbTabs = list(plot_layer = WYplt, cond_layer = WYcond,
                                   tree_layer = WYtree, seed_layer = WYseed),
                       eval = "custom",
                       eval_opts = list(invyrs = 2011:2013),
                       showsteps = TRUE,
                       returnxy = TRUE,
                       savedata_opts = savedata_options(outfolder = outfolder))
output
## ================================================================================
plot

str(SApltdat, max.level = 1)
output
## List of 11
##  $ spxy       :Classes 'sf' and 'data.frame':    56 obs. of  9 variables:
##   ..- attr(*, "sf_column")= chr "geometry"
##   ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: 3 3 3 3 3 3 3 3
##   .. ..- attr(*, "names")= chr [1:8] "PLT_CN" "INVYR" "STATECD" "UNITCD" ...
##  $ tabs       :List of 4
##  $ tabIDs     :List of 4
##  $ pltids     :'data.frame': 56 obs. of  8 variables:
##  $ bnd        :Classes 'sf' and 'data.frame':    3 obs. of  5 variables:
##   ..- attr(*, "sf_column")= chr "geometry"
##   ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA
##   .. ..- attr(*, "names")= chr [1:4] "REGION" "FORESTNUMB" "DISTRICTNU" "DISTRICTNA"
##  $ puniqueid  : chr "CN"
##  $ xy.uniqueid: chr "PLT_CN"
##  $ pjoinid    : chr "CN"
##  $ states     : chr "Wyoming"
##  $ invyrs     : int [1:3] 2011 2012 2013
##  $ args       :List of 13

Finally, we must generate the dataset with predictors for small area estimation. We can do this with the spGetAuxiliary function from FIESTA. Again, see the sp vignette for further information on this function.

rastlst.cont <- c(demfn, slpfn, aspfn)
rastlst.cont.name <- c("dem", "slp", "asp")
rastlst.cat <- fornffn
rastlst.cat.name <- "fornf"

unit_layer <- WYbhdistfn
unitvar <- "DISTRICTNA"

auxdat <- spGetAuxiliary(
  xyplt = SApltdat$spxy,
  uniqueid = "PLT_CN",
  unit_layer = unit_layer,
  unitvar = "DISTRICTNA",
  rastlst.cont = rastlst.cont,
  rastlst.cont.name = rastlst.cont.name,
  rastlst.cont.stat = "mean",
  rastlst.cont.NODATA = 0,
  rastlst.cat = rastlst.cat,
  rastlst.cat.name = rastlst.cat.name,
  asptransform = TRUE,
  rast.asp = aspfn,
  keepNA = FALSE,
  showext = FALSE,
  savedata = FALSE
)
names(auxdat)
str(auxdat, max.level = 1)
output
## List of 12
##  $ unitvar       : chr "DISTRICTNA"
##  $ pltassgn      :'data.frame':  56 obs. of  15 variables:
##  $ pltassgnid    : chr "PLT_CN"
##  $ unitarea      :'data.frame':  3 obs. of  2 variables:
##  $ areavar       : chr "ACRES_GIS"
##  $ unitzonal     :'data.frame':  3 obs. of  8 variables:
##  $ inputdf       :Classes 'data.table' and 'data.frame': 4 obs. of  7 variables:
##   ..- attr(*, ".internal.selfref")=<externalptr> 
##  $ prednames     : chr [1:5] "dem" "slp" "asp_cos" "asp_sin" ...
##  $ zonalnames    : chr [1:7] "dem" "slp" "asp_cos" "asp_sin" ...
##  $ predfac       : chr "fornf"
##  $ npixelvar     : chr "npixels"
##  $ predfac.levels:List of 1

Examples

modSApop

Example 1: Creating our population dataset with modMApop

View Example

We can create our population data for model-assisted estimation. To do so, we use the modSApop function in FIESTA. We must assign our plot data with the pltdat argument, the auxiliary dataset with the auxdat argument, and set information for our small areas with the smallbnd and smallbnd.domain arguments. The spGetPlots and spGetAuxiliary functions have done much of the hard work for us so far, so we can just run a simple call to modSApop:

SApopdat <- modSApop(pltdat = SApltdat, 
                     auxdat = auxdat,
                     smallbnd = WYbhdistfn,
                         smallbnd.domain = smallbnd.domain)

Note that the modSApop function returns a list with lots of information and data for us to use. For a quick look at what this list includes we can use the str function:

str(SApopdat, max.level = 1)
output
## List of 25
##  $ module         : chr "SA"
##  $ smallbnd       :Classes 'sf' and 'data.frame':    3 obs. of  6 variables:
##   ..- attr(*, "sf_column")= chr "geometry"
##   ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA
##   .. ..- attr(*, "names")= chr [1:5] "REGION" "FORESTNUMB" "DISTRICTNU" "DISTRICTNA" ...
##  $ smallbnd.domain: chr "DISTRICTNA"
##  $ condx          :Classes 'data.table' and 'data.frame':    66 obs. of  17 variables:
##   ..- attr(*, "sorted")= chr [1:2] "PLT_CN" "CONDID"
##   ..- attr(*, ".internal.selfref")=<externalptr> 
##  $ pltcondx       :Classes 'data.table' and 'data.frame':    66 obs. of  30 variables:
##   ..- attr(*, ".internal.selfref")=<externalptr> 
##   ..- attr(*, "sorted")= chr "DSTRBCD1"
##  $ cuniqueid      : chr "PLT_CN"
##  $ condid         : chr "CONDID"
##  $ ACI.filter     : chr "COND_STATUS_CD == 1"
##  $ dunitarea      :Classes 'data.table' and 'data.frame':    3 obs. of  2 variables:
##   ..- attr(*, ".internal.selfref")=<externalptr> 
##   ..- attr(*, "sorted")= chr "DOMAIN"
##  $ areavar        : chr "ACRES_GIS"
##  $ areaunits      : chr "acres"
##  $ dunitvar       : chr "DOMAIN"
##  $ dunitlut       :Classes 'data.table' and 'data.frame':    3 obs. of  8 variables:
##   ..- attr(*, ".internal.selfref")=<externalptr> 
##   ..- attr(*, "sorted")= chr "DOMAIN"
##  $ plotsampcnt    :'data.frame': 2 obs. of  3 variables:
##  $ condsampcnt    :'data.frame': 4 obs. of  3 variables:
##  $ states         : chr "Wyoming"
##  $ invyrs         :List of 1
##  $ estvar.area    : chr "CONDPROP_ADJ"
##  $ adj            : chr "plot"
##  $ treex          :'data.frame': 1691 obs. of  22 variables:
##  $ tuniqueid      : chr "PLT_CN"
##  $ adjtree        : logi FALSE
##  $ seedx          :'data.frame': 102 obs. of  11 variables:
##  $ prednames      : chr [1:5] "dem" "slp" "asp_cos" "asp_sin" ...
##  $ predfac        : chr "fornf"

Now that we’ve created our population dataset, we can move on to estimation.

modSAarea

Example 2: Area of forest land, unit-level EBLUP

View Example

First, we can set up our predictors as a vector:

all_preds <- c("slp", "dem", "asp_cos", "asp_sin", "fornf")

Next, we fit the unit-level EBLUP with the JoSAE R package.

area1 <- modSAarea(
  SApopdatlst = SApopdat,        # pop - population calculations for WY, post-stratification
  prednames = all_preds,         # est - character vector of predictors to be used in the model
  SApackage = "JoSAE",           # est - character string of the R package to do the estimation
  SAmethod = "unit"              # est - method of small area estimation. Either "unit" or "area"
  )
output
## REML estimate of variance ratio: 0.09281 
## numerical integration of f(x):   24.85 with absolute error < 7.9e-06
## numerical integration of x*f(x): 16.63 with absolute error < 8.2e-06
## posterior mean for variance ratio: 0.6693
plot

The modSAarea function outputs a list, and we can see our estimates and estimation method.

str(area1, max.level = 1)
output
## List of 3
##  $ est    :Classes 'data.table' and 'data.frame':    3 obs. of  3 variables:
##   ..- attr(*, ".internal.selfref")=<externalptr> 
##   ..- attr(*, "sorted")= chr "DOMAIN"
##  $ raw    :List of 13
##  $ multest:'data.frame': 3 obs. of  23 variables:
area1$est
output
##                            DOMAIN Estimate Percent Sampling Error
## 1: Medicine Wheel Ranger District 224320.1                  16.68
## 2:   Powder River Ranger District 149729.4                  26.23
## 3:         Tongue Ranger District 284851.4                  13.15
area1$raw$SAmethod
output
## [1] "unit"

We can also look further into the raw list below:

str(area1$raw, max.level = 1)
output
## List of 13
##  $ dunit_totest   :'data.frame': 3 obs. of  18 variables:
##  $ domdat         :'data.frame': 66 obs. of  22 variables:
##  $ module         : chr "SA"
##  $ esttype        : chr "AREA"
##  $ SApackage      : chr "JoSAE"
##  $ SAmethod       : chr "unit"
##  $ estnm          : chr "est"
##  $ predselect.unit:'data.frame': 1 obs. of  8 variables:
##  $ predselect.area:'data.frame': 1 obs. of  8 variables:
##  $ SAobjlst       :List of 1
##  $ estvar         : chr "AREA_ADJ"
##  $ areaunits      : chr "acres"
##  $ estunits       : chr "acres"

Example 3: Area of forest land, area-level EBLUP

View Example

In this example, we fit an area level EBLUP with JoSAE, while only using slp as a predictor. We use only one predictor in the area level model because at the area level, we only have three rows in our dataset. Since we also have a random effect term, the model we fit can have a maximum of one predictor without being exactly singular.

area2 <- modSAarea(
  SApopdatlst = SApopdat,   # pop - population calculations for WY, post-stratification
  prednames = "slp",        # est - character vector of predictors to be used in the model
  SApackage = "JoSAE",      # est - character string of the R package to do the estimation
  SAmethod = "area",        # est - method of small area estimation. Either "unit" or "area"
  multest = TRUE
  )
output
## REML estimate of variance ratio: 0.09973 
## numerical integration of f(x):   23.3 with absolute error < 1.1e-05
## numerical integration of x*f(x): 16.04 with absolute error < 7.3e-06
## posterior mean for variance ratio: 0.6883
plot

We again can see our estimates. Notably, we have slightly larger percent sampling errors to the unit-level model fit in Example 2. This is likely due to only being able to incorporate one predictor’s worth of information to the model.

area2$est
output
##                            DOMAIN Estimate Percent Sampling Error
## 1: Medicine Wheel Ranger District 246496.3                  19.80
## 2:   Powder River Ranger District 141310.9                  29.94
## 3:         Tongue Ranger District 295969.3                  14.01

Since FIESTA will attempt fit all models when running modSAarea, we can look at all the different modeling approaches and their estimates with the multest object.

area2$multest
output
##                           DOMAIN LARGEBND NBRPLT       DIR     DIR.se  JU.Synth
## 1 Medicine Wheel Ranger District        1     16 0.6510417 0.11592491 0.6162340
## 2   Powder River Ranger District        1     19 0.3947368 0.11034865 0.5908995
## 3         Tongue Ranger District        1     21 0.7500000 0.09128709 0.6022887
##     JU.GREG JU.GREG.se  JU.EBLUP JU.EBLUP.se.1    hbsaeU  hbsaeU.se       JFH
## 1 0.6572931 0.11784552 0.6371466     0.1006813 0.6396496 0.10232397 0.6762099
## 2 0.4003265 0.11170013 0.4654312     0.1221846 0.4596133 0.10878798 0.4226600
## 3 0.7434288 0.08878553 0.6986757     0.1061928 0.7021458 0.09930096 0.7152836
##      JFH.se  JA.synth JA.synth.se      saeA   saeA.se hbsaeA hbsaeA.se
## 1 0.1339159 0.7295675   0.2534818 0.6762099 0.1339159     NA        NA
## 2 0.1265380 0.4879924   0.2458881 0.4226600 0.1265380     NA        NA
## 3 0.1001961 0.5965936   0.2044029 0.7152836 0.1001961     NA        NA
##   NBRPLT.gt0 AOI AREAUSED
## 1         11   1 364526.4
## 2          9   1 334337.0
## 3         17   1 413778.9

Notably, the hbsae models returned NAs with this model, likely due to computational issues with the integral they compute. Not to worry, though, we will fit models with hbsae in the next example.

Example 4: Area of forest land, hierarchical Bayesian models

View Example

FIESTA also supports the use of hierarchical Bayesian (HB) models through the hbsae package as an alternative to EBLUPs. These models use the same model specification as the EBLUP, however they fit the model using a hierarchical Bayesian framework, and get parameter estimates through numerical integration. Luckily, we do not have to take an integral ourselves to fit these models, we can just change the SApackage argument.

area3 <- modSAarea(
  SApopdatlst = SApopdat,   # pop - population calculations for WY, post-stratification
  prednames = all_preds,    # est - character vector of predictors to be used in the model
  SApackage = "hbsae",      # est - character string of the R package to do the estimation
  SAmethod = "unit",        # est - method of small area estimation. Either "unit" or "area"
  multest = TRUE
  )
output
## REML estimate of variance ratio: 0.09281 
## numerical integration of f(x):   24.85 with absolute error < 7.9e-06
## numerical integration of x*f(x): 16.63 with absolute error < 8.2e-06
## posterior mean for variance ratio: 0.6693
plot

We can again check our estimates, small area method, and small area package.

area3$est
output
##                            DOMAIN Estimate Percent Sampling Error
## 1: Medicine Wheel Ranger District 226061.8                  16.01
## 2:   Powder River Ranger District 147433.8                  23.40
## 3:         Tongue Ranger District 285934.0                  12.89
area3$raw$SAmethod
output
## [1] "unit"
area3$raw$SApackage
output
## [1] "hbsae"

Example 5: Ara of forest land, hierarchical Bayesian models, changing prior distribution

View Example

Notably, we can also set priors on the ratio of between and within area variation with hbsae. By default, FIESTA uses a weakly informative half-Cauchy prior on this parameter as suggested by White et al (2021), but in this example we will fit the same model as before, but with a flat prior.

area4 <- modSAarea(
  SApopdatlst = SApopdat,     # pop - population calculations for WY, post-stratification
  prednames = all_preds,      # est - character vector of predictors to be used in the model
  SApackage = "hbsae",        # est - character string of the R package to do the estimation
  SAmethod = "unit",          # est - method of small area estimation. Either "unit" or "area"
  prior = function(x) 1       # est - prior on ratio of between and within area variation
  )
output
## REML estimate of variance ratio: 0.09281 
## numerical integration of f(x):

Let’s check our results compared to Example 3 (same model with half-Cauchy prior)

area3$est
output
##                            DOMAIN Estimate Percent Sampling Error
## 1: Medicine Wheel Ranger District 226061.8                  16.01
## 2:   Powder River Ranger District 147433.8                  23.40
## 3:         Tongue Ranger District 285934.0                  12.89
area4$est
output
##                            DOMAIN Estimate Percent Sampling Error
## 1: Medicine Wheel Ranger District       NA                     NA
## 2:   Powder River Ranger District       NA                     NA
## 3:         Tongue Ranger District       NA                     NA

Due to rounding we do in FIESTA, we see the same result. However, the estimates are slightly different. We can see this with the model objects supplied in the output list from FIESTA:

Example 6: Area of forest land, with model variable selection, JoSAE unit level EBLUP

View Example

FIESTA supports model variable selection via the elastic net. To use model selection, we set the modelselect argument to TRUE.

area5 <- modSAarea(
  SApopdatlst = SApopdat,      # pop - population calculations for WY, post-stratification
  prednames = all_preds,       # est - character vector of predictors to be used in the model
  SApackage = "JoSAE",         # est - character string of the R package to do the estimation
  SAmethod = "unit",           # est - method of small area estimation. Either "unit" or "area"
  modelselect = TRUE           # est - elastic net variable selection
  )
output
## REML estimate of variance ratio: 0.09281 
## numerical integration of f(x):   24.85 with absolute error < 7.9e-06
## numerical integration of x*f(x): 16.63 with absolute error < 8.2e-06
## posterior mean for variance ratio: 0.6693
plot

We can now look at estimates with our subset of selected predictors and the predictors that were selected.

area5$est
output
##                            DOMAIN Estimate Percent Sampling Error
## 1: Medicine Wheel Ranger District 224320.1                  16.68
## 2:   Powder River Ranger District 149729.4                  26.23
## 3:         Tongue Ranger District 284851.4                  13.15
area5$raw$predselect.unit
output
##   LARGEBND LARGEBND TOTAL         slp           dem    asp_cos     asp_sin
## 1 SApopdat        1     1 0.004745915 -0.0001050103 0.03462268 -0.06178897
##       fornf2
## 1 -0.3936355

modSAtree

We will set our estimate variable and filter now. We set estvar to "VOLCFNET" for net cubic foot volume, and filter with estvar.filter set to "STATUSCD == 1" so we only consider live trees in our estimation.

estvar <- "VOLCFNET"
live_trees <- "STATUSCD == 1"

Example 7: Total net cubic-foot volume of live trees (at least 5 inches diameter)

View Example

Now, we can look at the total net cubic-foot volume of live trees, filtered for live trees that are at least 5 inches in diameter. We use the estvar and live_trees objects defined above to set our response variable and filter, and then compute the estimates.

tree1 <- modSAtree(
    SApopdatlst = SApopdat,      # pop - population calculations for WY, post-stratification
    prednames = all_preds,       # est - character vector of predictors to be used in the model
    SApackage = "JoSAE",         # est - character string of the R package to do the estimation
    SAmethod = "unit",           # est - method of small area estimation. Either "unit" or "area" 
    landarea = "FOREST",         # est - forest land filter
    estvar = estvar,             # est - net cubic-foot volume
    estvar.filter = live_trees   # est - live trees only
    )
output
## REML estimate of variance ratio: 0.01551 
## numerical integration of f(x):   48.34 with absolute error < 2.6e-05
## numerical integration of x*f(x): 20.58 with absolute error < 1.3e-06
## posterior mean for variance ratio: 0.4257
plot

With both modSAtree and modSAarea, FIESTA will return your requested estimates specified with the SApackage and SAmethod arguments in the est item, but will return all possible estimates in the multest item. We can see these estimates below:

tree1$est
output
##                            DOMAIN  Estimate Percent Sampling Error
## 1: Medicine Wheel Ranger District 363534113                  24.03
## 2:   Powder River Ranger District 340070520                  36.20
## 3:         Tongue Ranger District 445171568                  25.10
tree1$multest
output
##                           DOMAIN LARGEBND NBRPLT       DIR   DIR.se  JU.Synth
## 1 Medicine Wheel Ranger District        1     16 1175.1682 356.2906  967.9523
## 2   Powder River Ranger District        1     19  893.9355 315.0740 1090.1786
## 3         Tongue Ranger District        1     21 1340.8077 262.0320 1035.6947
##     JU.GREG JU.GREG.se JU.EBLUP JU.EBLUP.se.1    hbsaeU hbsaeU.se      JFH
## 1 1122.3465   297.2844  997.278      239.6664 1043.9097  252.6630 1000.776
## 2  761.5703   252.7649 1017.149      368.2086  907.4927  260.3908  905.716
## 3 1215.3733   240.0583 1075.868      270.0690 1134.7262  236.6279 1302.397
##     JFH.se JA.synth JA.synth.se     saeA  saeA.se hbsaeA hbsaeA.se NBRPLT.gt0
## 1      NaN 1000.776         NaN 1057.755 353.8882     NA        NA         10
## 2 50.58935  905.716    50.58935  963.751 419.8500     NA        NA          8
## 3      NaN 1302.397         NaN 1356.026 471.4950     NA        NA         16
##   AOI AREAUSED
## 1   1 364526.4
## 2   1 334337.0
## 3   1 413778.9

Notably, the area level models are NA in for this model, as there were more predictors than degrees of freedom in the model at the area level.

Example 8: Total net cubic-foot volume of live trees (at least 5 inches diameter), using model selection

View Example

We can bring the modelselect parameter into play with modSAtree as well as modSAarea. In the below code, we set modelselect = TRUE to use the elastic net variable selection before fitting the model.

tree2 <- modSAtree(
    SApopdatlst = SApopdat,      # pop - population calculations for WY, post-stratification
    prednames = all_preds,       # est - character vector of predictors to be used in the model
    SApackage = "JoSAE",         # est - character string of the R package to do the estimation
    SAmethod = "unit",           # est - method of small area estimation. Either "unit" or "area"  
    landarea = "FOREST",         # est - forest land filter
    estvar = estvar,             # est - net cubic-foot volume
    estvar.filter = live_trees,   # est - live trees only
    modelselect = TRUE
    )
output
## REML estimate of variance ratio: 0.01633 
## numerical integration of f(x):   47.54 with absolute error < 2e-05
## numerical integration of x*f(x): 20.28 with absolute error < 6.4e-06
## posterior mean for variance ratio: 0.4265
plot

We now can look at the selected predictors and estimates.

tree2$raw$predselect.unit
output
##   LARGEBND LARGEBND TOTAL       slp      dem asp_cos   asp_sin   fornf2
## 1 SApopdat        1     1 -63.70287 0.558082       0 -549.9468 -1176.35
tree2$est
output
##                            DOMAIN  Estimate Percent Sampling Error
## 1: Medicine Wheel Ranger District 363864644                  23.85
## 2:   Powder River Ranger District 338811410                  35.84
## 3:         Tongue Ranger District 446043897                  24.74

Example 9: Basal area of live trees (at least 5 inches diameter), unit EBLUP from JoSAE

View Example

We can also use different response variables to estimate, and in this example we chose basal area. We also returned titles by using returntitle = TRUE.

tree3 <- modSAtree(
    SApopdatlst = SApopdat,      # pop - population calculations for WY, post-stratification
    prednames = all_preds,       # est - character vector of predictors to be used in the model
    SApackage = "JoSAE",         # est - character string of the R package to do the estimation
    SAmethod = "unit",           # est - method of small area estimation. Either "unit" or "area"  
    landarea = "FOREST",         # est - forest land filter
    estvar = "BA",               # est - net cubic-foot volume
    estvar.filter = live_trees,  # est - live trees only
    returntitle = TRUE
    )
output
## REML estimate of variance ratio: 0.03877 
## numerical integration of f(x):   21.9 with absolute error < 8.2e-06
## numerical integration of x*f(x): 10.85 with absolute error < 4.8e-06
## posterior mean for variance ratio: 0.4955
plot

Now we can take a look at our estimates:

tree3$est
output
##                            DOMAIN Estimate Percent Sampling Error
## 1: Medicine Wheel Ranger District 22043124                  21.61
## 2:   Powder River Ranger District 18865170                  32.06
## 3:         Tongue Ranger District 29693976                  19.54

and see our title list since we set returntitle to TRUE.

tree3$titlelst
output
## $title.estpse
## [1] "Basal area of live trees (at least 1 inch dia), in square feet, and percent sampling error on forest land DOMAIN"
## 
## $title.yvar
## [1] "Basal area, in square feet"
## 
## $title.estvar
## [1] "Basal area of live trees (at least 1 inch dia)"
## 
## $title.unitvar
## [1] "DOMAIN"
## 
## $title.ref
## [1] "Wyoming, 2011-2013"
## 
## $outfn.estpse
## [1] "tree_BA_forestland"
## 
## $outfn.rawdat
## [1] "tree_BA_forestland_rawdata"
## 
## $title.tot
## [1] "Basal area of live trees (at least 1 inch dia), in square feet, on forest land; Wyoming, 2011-2013"
## 
## $title.unitsn
## [1] "square feet"

Example 10: Basal area of live trees (at least 5 inches diameter), area EBLUP from sae

View Example

Now, we can of course fit a different model to estimate basal area. In this case, we choose to use dem to predict basal area with an area-level EBLUP from the sae package.

tree4 <- modSAtree(
    SApopdatlst = SApopdat,      # pop - population calculations for WY, post-stratification
    prednames = "dem",       # est - character vector of predictors to be used in the model
    SApackage = "sae",         # est - character string of the R package to do the estimation
    SAmethod = "area",           # est - method of small area estimation. Either "unit" or "area"  
    landarea = "FOREST",         # est - forest land filter
    estvar = "BA",               # est - net cubic-foot volume
    estvar.filter = live_trees,  # est - live trees only
    returntitle = TRUE
    )
output
## REML estimate of variance ratio: 0.02124 
## numerical integration of f(x):   24.4 with absolute error < 6e-06
## numerical integration of x*f(x): 10.37 with absolute error < 5.6e-06
## posterior mean for variance ratio: 0.4248
plot

Now we can take a look at our estimates.

tree4$est
output
##                            DOMAIN Estimate Percent Sampling Error
## 1: Medicine Wheel Ranger District 22014434                  32.81
## 2:   Powder River Ranger District 17546600                  41.96
## 3:         Tongue Ranger District 35374035                  27.57

Example 11: Basal area of live trees (at least 5 inches diameter), saving to an outfolder

View Example

One may want to easily save FIESTA output to your computer, rather than just having it live in the R environment. FIESTA makes this easy with the use of the savedata and savedata_opts arguments. If savedata = TRUE, by default the output will be saved to your working directory, but we can set an outfolder in savedata_opts to choose where the data should be saved. There are many other savedata_opts, and these can be seen by looking at the help file for the savedata_options function (use help(savedata_options) with FIESTA loaded in your R environment).

tree5 <- modSAtree(
    SApopdatlst = SApopdat,      # pop - population calculations for WY, post-stratification
    prednames = all_preds,       # est - character vector of predictors to be used in the model
    SApackage = "JoSAE",         # est - character string of the R package to do the estimation
    SAmethod = "unit",           # est - method of small area estimation. Either "unit" or "area"  
    landarea = "FOREST",         # est - forest land filter
    estvar = "BA",               # est - net cubic-foot volume
    estvar.filter = live_trees,  # est - live trees only
    savedata = TRUE,
    savedata_opts = savedata_options(
      outfolder = outfolder
    )
    )
output
## REML estimate of variance ratio: 0.03877 
## numerical integration of f(x):   21.9 with absolute error < 8.2e-06
## numerical integration of x*f(x): 10.85 with absolute error < 4.8e-06
## posterior mean for variance ratio: 0.4955
plot

Example 12: Basal area of live trees (at least 5 inches diameter), HB model

View Example

We can also of course use a different model to predict basal area, and in this case we use a HB unit level model from hbsae. We also save to an outfolder again, this time giving a file name prefix with the outfn.pre arguement.

tree6 <- modSAtree(
    SApopdatlst = SApopdat,      # pop - population calculations for WY, post-stratification
    prednames = all_preds,       # est - character vector of predictors to be used in the model
    SApackage = "hbsae",         # est - character string of the R package to do the estimation
    SAmethod = "unit",           # est - method of small area estimation. Either "unit" or "area"  
    landarea = "FOREST",         # est - forest land filter
    estvar = "BA",               # est - net cubic-foot volume
    estvar.filter = live_trees,  # est - live trees only
    savedata = TRUE,
    savedata_opts = savedata_options(
      outfolder = outfolder,
      outfn.pre = "HB_unit"
    )
    )
output
## REML estimate of variance ratio: 0.03877 
## numerical integration of f(x):   21.9 with absolute error < 8.2e-06
## numerical integration of x*f(x): 10.85 with absolute error < 4.8e-06
## posterior mean for variance ratio: 0.4955
plot

We can see the files in the outfolder here:

list.files(outfolder, pattern = "HB")
output
## character(0)

And the estimates here:

tree6$est
output
##                            DOMAIN Estimate Percent Sampling Error
## 1: Medicine Wheel Ranger District 22549224                  21.66
## 2:   Powder River Ranger District 17699101                  26.52
## 3:         Tongue Ranger District 30499650                  17.10

References

Breidenbach J. 2018. JoSAE: Unit-Level and Area-Level Small Area Estimation.

Molina I, Marhuenda Y. 2015. sae: An R Package for Small Area Estimation. The R Journal, 7(1), 81–98. https://journal.r-project.org/archive/2015/RJ-2015-007/RJ-2015-007.pdf.

Rao, J.N.K. 2003. Small Area Estimation. Wiley, Hoboken, New Jersey.