NEWS R Documentation

## dbarts News

### CHANGES IN VERSION 0.9-23

#### NEW FEATURES

• No longer depends on gfortran.

• Uses SIMD instructions on M1 Macs.

• Added experimental callback functionality to rbart_vi.

#### USER-VISIBLE CHANGES

• Custom loss functions for xbart now require an additional weights argument.

#### BUG FIXES

• Fixed a multithreaded issue leading to inconsistent results with xbart.

• rbart_vi should now correctly use default arguments.

• rbart_vi now works with keepTrainingFits as false.

• Weighted binary responses sample latent variables from the correct distribution.

• Extracting samples from the posterior predictive distribution for weighted models now incorporates the weights into the variance.

• Weights are now taken into account by the loss functions used in crossvalidation.

### CHANGES IN VERSION 0.9-21

#### NEW FEATURES

• extract now accepts "trees" as a type, which allows for easier inspection of models fit with keepTrees as TRUE.

• print generics now exist for bart and rbart fits; implementation thanks to Emil Hvitfeldt.

• xbart now accepts a seed argument to enhance reproducibility.

• bart/bart2 (and dbarts through its tree.prior argument) accept splitprobs/split.probs, which control the prior probability that any variable is used when splitting observations.

#### USER-VISIBLE CHANGES

• fitted for rbart_vi models now uses a C++ implementation for the expected value that uses less memory and is faster.

#### BUG FIXES

• xbart for binary outcomes with log loss no longer returns NaN when some subset of the response is perfectly predicted by the covariates. Bug report thanks to Marcela Veselkova.

### CHANGES IN VERSION 0.9-20

#### NEW FEATURES

• dbarts now exposes the underlying proposal rules and their probabilities through its proposal.probs argument. bart2 responds to the same argument, while bart uses proposalprobs.

• bart, bart2, and rbart_vi accept a seed argument that will yield reproducible results, even when running with multiple threads and multiple chains.

#### USER-VISIBLE CHANGES

• The interface registered under R_RegisterCCallable has changed to reflect proper fixed hyperpriors for k.

• Samples of the end-node sensitivity parameter, k, are returned by rbart_vi when it is modeled.

• Burn-in samples of the end-node sensitivity parameter, k, are included in the results of bart, bart2, and rbart_vi.

• rbart_vi will now look for group.by and group.by.test in the data and test arguments before looking in the formula or calling environments.

#### BUG FIXES

• Fix for k mixing across chains when running multithreaded and with k being modeled. Bug report thanks to Noah Greifer.

• Fix for xbart with method = "k-fold" when the number of observations is not evenly divisible by the number of folds. Bug report thanks to Jesse (@ALEXLANGLANG on Github).

• Sampler method getLatents and the corresponding C function now add the user-supplied offset to the result.

• Saved, flattened trees now correctly partition observations on left and right.

### CHANGES IN VERSION 0.9-19

#### NEW FEATURES

• Samplers now have method sampleNodeParametersFromPrior. When used in conjunction with sampleTreesFromPrior, it allows the model to make predictions entirely from the prior distribution.

• dbartsControl (and now bart/bart2 through ...) now accepts an rngSeed argument. This can be used to generate reproducible results with multiple threads. It should only be used for testing, as the thread-specific pRNGs are seeded using sequential draws from a pRNG created with the user-supplied seed.

• C interface supports dbarts_createStateExpression and dbarts_initializeState which can be used to re-create samplers that were allocated using forked multithreading.

• C interface also supports dbarts_predict, dbarts_setControl, and dbarts_printTrees.

• Exports makeTestModelMatrix to allow package authors to create test data at a later point from training data.

#### USER-VISIBLE CHANGES

• varcount for bart fits now has dimnames set.

• residuals generic added for bart and rbart_vi fits.

#### BUG FIXES

• Parallelization for rbart now creates the correct number of chains.

• Should now compile on non-x86 architectures. Report thanks to Lars Viklund.

• Fixed hang when verbose = TRUE for multiple threads and multiple chains. Report thanks to Noah Greifer.

• Fixed potential memory access errors when recreating a sampler from saved state.

• Correctly de-serializes saved tree structure.

### CHANGES IN VERSION 0.9-18

#### NEW FEATURES

• Sampler now explicitly supports setSigma for use in hierarchical models.

• Sampler function setOffset has an additional argument, updateScale. When the response is continuous and updateScale is TRUE, the implicit scaling, affecting the node parameters' variance, is adjusted to match the range of the new data. This optionally reverts the change of version 0.9-13 and is intended to be used only during warmup when using an offset that is itself being sampled.

#### BUG-FIXES

• Removed an extraneous print line left over from debugging in 0.9-17.

• Eliminated two race conditions from multithreaded crossvalidation. Report thanks to Ignacio Martinez.

• Eliminated garbage read on construction of crossvalidation sampler, removing inconsistencies across multiple runs with the same starting seed.

• makeModelMatrixFromDataFrame now converts character vectors to factors instead of dropping them. Report thanks to Colin Carlson.

### CHANGES IN VERSION 0.9-17

#### BUG-FIXES

• Fixed a memory leak in predict when keepTrees is FALSE.

### CHANGES IN VERSION 0.9-16

#### NEW FEATURES

• Added extract and fitted generics for bart models. These respect "train" and "test" sets of observations and can return "ev" (samples from the posterior of the individual-level expected value), "bart" (the sum-of-trees component; the same as "ev" for linear models but on the probit scale for binary ones), and "ppd" (samples from the posterior predictive distribution). For consistency with fitted.glm, "response" can be used as a synonym for "ev" and "link" as a synonym for "bart".

#### USER-VISIBLE CHANGES

• predict for bart models with binary outcomes returns a result on the probability scale, not probit. The argument value is deprecated - use type instead.

• predict now also conforms to the same system of arguments as extract and fitted.

#### BUG-FIXES

• xbart with a k-hyperprior should no longer crash. Report thanks to Colin Carlson.

### CHANGES IN VERSION 0.9-14

#### NEW FEATURES

• Fits from rbart_vi now work with the generics fitted, extract, and predict. extract retrieves samples from the posterior distribution for the training and test samples, fitted averages across those samples, and predict can be used to obtain values for completely new observations.

#### USER-VISIBLE CHANGES

• predict for rbart_vi takes value "ev" instead of "post-mean" to clarify what is being returned, i.e. samples from the posterior distribution of the observation-level expected values.

#### BUG-FIXES

• save/load should work correctly. Report thanks to Jeremy Coyle.

### CHANGES IN VERSION 0.9-13

#### USER-VISIBLE CHANGES

• predict now works when trees aren't saved, for use in testing Metropolis-Hastings proposals.

• The offset slot no longer changes the relative scaling of the response. This stabilizes predictions across iterations. For a semantic where the scaling does change, use setResponse instead.

### CHANGES IN VERSION 0.9-12

#### NEW FEATURES

• Varying intercepts model for probit regression.

### CHANGES IN VERSION 0.9-10

#### NEW FEATURES

• A hyperprior for k has now been implemented. Passing k = chi(degreesOfFreedom, scale) now penalizes small values of k, encouraging more shrinkage.

#### USER-VISIBLE CHANGES

• Hyperprior of chi(1.25, Inf) is now default for bart2 with binary outcomes. The default accuracy should improve substantially.

#### BUG-FIXES

• xbart divides data correctly with random subsampling.

### CHANGES IN VERSION 0.9-9

#### NEW FEATURES

• More control over cut points has been added. It is now possible to specify the cut points for a variable once and subsequently change that predictor without also modifying the cuts using sampler$setCutPoints and sampler$setPredictor.

• sampler$getTrees implemented to get a flattened, depth-first, down-left traversal of the trees.

#### USER-VISIBLE CHANGES

• For sampler$setPredictor, an argument specifies whether to roll back or force the change if the new data would result in a leaf having 0 observations.

• pdbart and pd2bart now work with formula/data specifications, as well as taking models or samplers that have previously stored trees.

#### OPTIMIZATIONS

• Stores x as an integer matrix of the max of which cut point each observation is to the left of, by default using 16-bit integers. This limits the number of cut points to 65535, though that can be increased with some special compilation instructions.

• Uses CPU dispatch and SIMD instructions for some operations. This and the integer x make BART about 30% faster on datasets of around 10k observations.

• Saved trees are stored using significantly less memory.

### CHANGES IN VERSION 0.9-8

#### NEW FEATURES

• plot now works for fits from rbart_vi.

#### USER-VISIBLE CHANGES

• rbart_vi now reports varcount.

• bart2 now defaults to not storing trees due to the memory cost.

• bart2 now defaults to using quantile rules to decide splits.

#### BUG-FIXES

• predict for binary outcomes is now correct.

• Fix for verbose multithreading on Linux, reported by @ignacio82 on github.

• General improvements to slice sampler in rbart_vi thanks to reports from Yutao Liu.

• sampler$plotTree now handles multiple chains correctly.

• Negative log loss for xbart with binary outcomes should now be computed correctly.

### CHANGES IN VERSION 0.9-2

#### NEW FEATURES

• rbart_vi fits a simple varying intercept, random effects model.

### CHANGES IN VERSION 0.9-0

#### NEW FEATURES

• Now natively supports multiple chains running in parallel.

• Objects fit by bart can be used with the predict generic when instructed to save the trees.

• New function bart2 introduced, similar to bart but with more efficient default parameters.

#### USER-VISIBLE CHANGES

• dbartsControl has had two parameters renamed: numSamples is now defaultNumSamples and numBurnIn is now defaultNumBurnIn.

• dbartsControl supports parameters runMode, n.chains, rngKind, and rngNormalKind.

• In the C interface, a new function (setRNGState) has been added to specify the states of the random number generators, of which there is now one for every chain.

• State objects saved by the handles no longer contain the total fits, since they can be rebuilt from the tree fits. States are also now lists of objects, with one corresponding to each chain. Tree fits and strings are matrices whose dimensions correspond to the number of trees and saved samples.

### CHANGES IN VERSION 0.8-6

#### NEW FEATURES

• Random subsampling crossvalidation (xbart) has been implemented in C++. It refits the model using the current set of trees for changes in the hyperparameters n.trees, k, power, and base, and is natively parallelized.

• Rudimentary tree plotting added to sampler (sampler$plotTree).

• Exported dbartsData as a way of constructing data objects and setting the data seen by the sampler all at once. Sampler now supports sampler$setData().

#### USER-VISIBLE CHANGES

• keepevery argument to bart matches BayesTree.

• bart now has argument keepcall to suppress storing the call object.

• bart now accepts a weights argument.

• makeModelMatrixFromDataFrame is now implemented in C and supports an argument for tracking/keeping dropped values from factors.

#### BUG-FIXES

• Usage of weights was causing incorrect updates to the posterior for σ².

• Should now JIT byte compile correctly.

• Cuts derived from quantiles should now be valid.

### CHANGES IN VERSION 0.8-4

#### NEW FEATURES

• Uses a rejection sampler to simulate binary latent variables (CP Robert 2009, http://arxiv.org/pdf/0907.4010.pdf). Code thanks to Jared Murray.

• Now encapsulates its own random number generator, so that the C++ objects can safely be used in parallel. Shouldn't affect pure-R users unless their RNG has non-exported state (i.e. Box-Muller normal kind).

• Includes an offset.test vector that can be controlled independently of the offset vector, but in general inherits behavior from it. Set at creation with dbarts() or afterwards with setTestOffset or setTestPredictorAndOffset.

#### USER-VISIBLE CHANGES

• By default, no longer attempts to obtain identical results as BayesTree. To recover this behavior, compile from source with configure.args = "--enable-match-bayes-tree".

• Changing the entirety of the test matrix using setTestPredictor is no longer allowed. Use setTestPredictors instead.

• Changing the predictor can now result in failure if the covariates would leave an end-node empty. setPredictor returns a logical as to success.

• Saved dbarts objects may not be compatible and should be re-created to be sure of validity.

• Now requires R versions >= 3.1.0.

#### BUG FIXES

• Corrected the binary latent variable sampler, which no longer adds the offset multiple times (reported by Jared Murray).

• Relatively embarrassing bug related to loop-unrolling when n mod 5 != 0 fixed.

• Correct aggregation of results for multithreaded variance calculations.