Precision matrix estimation requires selecting an appropriate regularization parameter λ, which balances sparsity (number of edges) against model fit (likelihood), and a mixing parameter α, which trades off element-wise (individual-level) against block-wise (group-level) penalties.
In a Gaussian graphical model (GGM), the data matrix $X \in \mathbb{R}^{n \times d}$ consists of $n$ independent and identically distributed observations $X_1, \ldots, X_n$ drawn from $N_d(\mu, \Sigma)$. Let $\Omega = \Sigma^{-1}$ denote the precision matrix, and define the empirical covariance matrix as $S = n^{-1} \sum_{i=1}^n (X_i-\bar{X})(X_i-\bar{X})^\top$. Up to an additive constant, the negative log-likelihood (nll) for $\Omega$ simplifies to $$ \mathrm{nll}(\Omega) = \frac{n}{2}[-\log\det(\Omega) + \mathrm{tr}(S\Omega)]. $$ The edge set $E(\Omega)$ is determined by the non-zero off-diagonal entries: an edge $(i, j)$ is included if and only if $\omega_{ij} \neq 0$ for $i < j$. The number of edges is therefore given by $|E(\Omega)|$.
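The two quantities above, the negative log-likelihood and the edge count, can be computed directly; a minimal NumPy sketch follows (the function names `neg_log_likelihood` and `num_edges` and the tolerance `tol` are illustrative choices, not part of the original formulation):

```python
import numpy as np

def neg_log_likelihood(omega, S, n):
    """nll(Omega) = (n/2) * (-log det(Omega) + tr(S @ Omega)),
    up to the additive constant dropped in the text."""
    sign, logdet = np.linalg.slogdet(omega)
    if sign <= 0:
        return np.inf  # Omega must be positive definite
    return 0.5 * n * (-logdet + np.trace(S @ omega))

def num_edges(omega, tol=1e-10):
    """|E(Omega)|: non-zero off-diagonal entries omega_ij with i < j."""
    upper = np.triu(omega, k=1)  # strict upper triangle
    return int(np.sum(np.abs(upper) > tol))
```

Because estimated entries are rarely exactly zero in floating point, a small tolerance stands in for the exact zero test when counting edges.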
$$\hat{\Omega}_{\mathrm{AIC}} = \arg\min_{\Omega}\left\{2\,\mathrm{nll}(\Omega) + 2\,|E(\Omega)|\right\}.$$
$$\hat{\Omega}_{\mathrm{BIC}} = \arg\min_{\Omega}\left\{2\,\mathrm{nll}(\Omega) + \log(n)\,|E(\Omega)|\right\}.$$
$$\hat{\Omega}_{\mathrm{EBIC}} = \arg\min_{\Omega}\left\{2\,\mathrm{nll}(\Omega) + \log(n)\,|E(\Omega)| + 4\xi\log(d)\,|E(\Omega)|\right\},$$
where $\xi \in [0, 1]$ is a tuning parameter. Setting $\xi = 0$ reduces EBIC to the classic BIC.
$$\hat{\Omega}_{\mathrm{HBIC}} = \arg\min_{\Omega}\left\{2\,\mathrm{nll}(\Omega) + \log[\log(n)]\,\log(d)\,|E(\Omega)|\right\}.$$
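Given a candidate estimate, all four criteria differ only in how they penalize the edge count, so they can be evaluated together; a hedged sketch (the function name `criteria` and the default `xi=0.5` are illustrative):

```python
import numpy as np

def criteria(omega, S, n, d, xi=0.5, tol=1e-10):
    """Evaluate AIC, BIC, EBIC, and HBIC for a candidate precision
    matrix Omega; the estimate minimizing a criterion is selected."""
    _, logdet = np.linalg.slogdet(omega)
    nll = 0.5 * n * (-logdet + np.trace(S @ omega))
    e = int(np.sum(np.abs(np.triu(omega, k=1)) > tol))  # |E(Omega)|
    return {
        "AIC":  2 * nll + 2 * e,
        "BIC":  2 * nll + np.log(n) * e,
        "EBIC": 2 * nll + np.log(n) * e + 4 * xi * np.log(d) * e,
        "HBIC": 2 * nll + np.log(np.log(n)) * np.log(d) * e,
    }
```

As a sanity check, `xi = 0` makes the EBIC score coincide with the BIC score, and for an empty graph ($|E(\Omega)| = 0$) every criterion reduces to $2\,\mathrm{nll}(\Omega)$.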
Figure 1 illustrates the K-fold cross-validation procedure used to tune the parameters λ and α. The notation #λ and #α denotes the number of candidate values considered for λ and α, respectively, forming a grid of #λ × #α parameter combinations. In each of the K iterations, the negative log-likelihood loss is evaluated for every combination, yielding K performance values per combination. The optimal parameter pair is the one achieving the lowest average loss across the K iterations.
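The procedure can be sketched as follows. For brevity the grid below covers only λ; the α dimension extends the inner loop identically over the full #λ × #α grid. As a stand-in for the penalized estimator (which is not specified here), the ridge-type inverse $(S_{\text{train}} + \lambda I)^{-1}$ is used, so the function `cv_select_lambda` and that estimator choice are assumptions for illustration only:

```python
import numpy as np

def cv_select_lambda(X, lambdas, K=5, seed=0):
    """K-fold CV over a lambda grid. Held-out loss per fold is the
    (constant-free) nll: -log det(Omega) + tr(S_test @ Omega)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % K  # random, near-equal fold sizes
    losses = np.zeros((len(lambdas), K))
    for k in range(K):
        train, test = X[folds != k], X[folds == k]
        S_tr = np.cov(train, rowvar=False, bias=True)  # 1/n scaling, as S
        S_te = np.cov(test, rowvar=False, bias=True)
        for i, lam in enumerate(lambdas):
            # Stand-in fit; in practice this is the penalized
            # estimate for the (lambda, alpha) grid point.
            omega = np.linalg.inv(S_tr + lam * np.eye(d))
            _, logdet = np.linalg.slogdet(omega)
            losses[i, k] = -logdet + np.trace(S_te @ omega)
    mean_loss = losses.mean(axis=1)  # average over the K folds
    return lambdas[int(np.argmin(mean_loss))], mean_loss
```

Each grid point thus receives K loss values, and the selected parameter is the minimizer of their average, exactly as described for Figure 1.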