This vignette is intended for users who use mbreaks to estimate and test for linear regression models in the presence of multiple structural changes. The package offers a set of comprehensive tools which deal with both pure and partial structural change models. In particular, it provides the Sup F tests for 0 versus a known number of structural changes and the double maximum (UD max) tests for 0 versus an unknown number of structural changes. The sequential tests for $l$ versus $l+1$ structural changes are also available to determine the number of structural changes (Bai and Perron (1998), Bai and Perron (2003)). The package also includes methods of estimating the number of structural changes via information criteria (Yao (1988), Liu et al. (1997)) as well as a built-in function to visualize the fit of the estimated structural break model with the estimated number $m^*$ of structural changes. A comprehensive call to conduct all of the procedures contained in package is provided.

Econometric framework of `mbreaks` package

The package mbreaks provides R users with minimum but comprehensive functions to analyze multiple structural changes in linear regression models in which the regressors and errors are non-trending. The framework is based on the econometric model of the following form:

\[ y_t = x_t^{\prime}\beta + z_t^{\prime} \delta_j + u_t; \\ t=T_{j-1}+1,...,T_j, {\quad}{\text{for}}{\quad}j=1,...,m+1\] where $\underset{(p \times 1)}{x_t}$ is a vector of regressors with fixed coefficients (if any) and $\underset{(q \times 1)}{z_t}$ is a vector of regressors with coefficients subject to change. The break dates are $t=T_j$ for $j=1,...,m$ and $T_0=0$ and $T_{m+1}=T$ so that $T$ is the entire sample size. If $p=0$, the model is called a pure structural change model, and if $p>0$, the model is called a partial structural change model. The twofold goals of this package is to enable user to:

Test for the presence of structural changes by:
- supF tests for no structural change versus a known number of structural changes: dotest() (Bai and Perron (1998))
- UDmax test for no structural change versus unknown number of structural changes: dotest() (Bai and Perron (1998))
Select the number of structural changes ($m^*$) by:
- sequential tests:
  - Determine the number of structural changes by using $\sup F_T(l+1|l)$ test: dosequa() (Bai and Perron (1998),Bai and Perron (2003))
  - Estimate one break at a time by using the repartition method: dorepart() (Bai (1997))
- information criterion:
  - Bayesian Information Criterion (BIC): doorder() (Yao (1988))
  - Modified Schwarz Criterion (MSIC): doorder() (Liu et al. (1997))

If the number of structural changes is known (or specified), users can use dofix() function to estimate the model with the corresponding number of changes. There are 3 classes in the package, corresponding to 2 types of diagnostic tests and one for estimation based on the selected number of structural changes. In summary:

sbtests: S3 class returned from dotest() function. It includes summary of the supF tests and UDmax test with test statistics, critical values, and summary tables which could be viewed in the console by print() method or open in a separated tab by View()
seqtests: S3 class returned from doseqtests() function. It includes summary of the sequential tests for $l$ versus $l+1$ structural changes with test statistics, critical values and a summary table which could be viewed in the console via print() method or open in separated tab by View()
model: S3 class returned from estimation procedures above including dosequa(), doorder(), dorepart(), and dofix() of models with $m^*$ structural changes. The class model contains numerous information which is summarized comprehensively into 3 main tables: i) break dates (estimated break dates and corresponding asymptotic confidence intervals based on assumptions on the regressors and errors), ii) regime-specific coefficients (estimates in each regime and corrected standard errors based on assumptions on the regressors and errors) and iii) full-sample coefficients if $p>0$ (estimates and corrected standard errors based on assumptions on the regressors and errors). Besides the information presented in the 3 main tables, model contains fields (with majority of them are class matrix) which users can access by using operator $ for further analysis in R such as fitted values, residuals and the name of procedure used.

Usage, arguments and examples

Usage of main functions in `mbreaks`

The previous section introduced the framework on which mbreaks package is built and summarizes the classes of procedures available to users. In this section, we will illustrate the syntax of high-level functions and cover the arguments that users might want to customize to match their model with data and empirical strategy.

The mbreaks package designs high-level functions to have identical arguments with default values recommended by the literature to save users the burden. Users can use mdl(), a comprehensive function that invokes all high-level functions explained in previous section ¹:

#the data set for the example is real.Rda
data(real)
#carry out all testing and estimating procedures via a single call
rate = mdl('rate',data=real,eps1=0.15)
#display the results from the procedures
rate
-----------------------------------------------------
The number of breaks is estimated by KT 
Pure change model with 2 estimated breaks.
Minimum SSR = 455.950 

Estimated date:
        Break1  Break2
Date        47      79
95% CI (37,50) (77,82)
90% CI (40,49) (78,81)

Estimated regime-specific coefficients:
           Regime 1       Regime 2      Regime 3
 (SE) 1.355 (0.155) -1.796 (0.511) 5.643 (0.603)

No full sample regressors
-----------------------------------------------------
a) SupF tests against a fixed number of breaks

        1 break 2 breaks 3 breaks 4 breaks 5 breaks
Sup F    57.906   43.014   33.323   24.771   18.326
10% CV    7.040    6.280    5.210    4.410    3.470
5% CV     8.580    7.220    5.960    4.990    3.910
2.5% CV  10.180    8.140    6.720    5.510    4.340
1% CV    12.290    9.360    7.600    6.190    4.910

b) UDmax tests against an unknown number of breaks

   UDMax 10% CV  5% CV 2.5% CV  1% CV
1 57.906  7.460  8.880  10.390 12.370
-----------------------------------------------------
supF(l+1|l) tests using global optimizers under the null

         supF(1|0) supF(2|1) supF(3|2) supF(4|3) supF(5|4)
Seq supF    57.906    33.927    14.725     0.033     0.000
10% CV       7.040     8.510     9.410    10.040    10.580
5% CV        8.580    10.130    11.140    11.830    12.250
2.5% CV     10.180    11.860    12.660    13.400    13.890
1% CV       12.290    13.890    14.800    15.280    15.760
-----------------------------------------------------
To access additional information about specific procedures
  (not included above), type stored variable name + '$' + procedure name

Users should find the syntax minimal similar to lm() in stats package. It is required to specify the name of dependent variable $y$ followed by the two types of the regressors $z$ and $x$ from the data frame. Note that $z$ automatically includes a constant term. If the model is a pure structural change model, no $x$ is specified. If none of $z$ and $x$ are specified, the program assumes that this is a mean shift model (because a constant term is included in $z$ by default). The names of regressors must match the names used in the data frame, otherwise errors will be displayed and execution halted. As we will explain in the following section, the package prepares various options to be specified by users. These are set at default value if not specified in mdl() or any high-level functions of the procedures: dotest(), dosequa(), doorder(), dorepart(), and dofix()etc.

Options for high-level functions

The model requires a trimming value $\epsilon$ as a small interval of length $h = \lfloor \epsilon T\rfloor$ is required between any two adjacent segments. Users can modify $\epsilon$ in arguments passing to all high-level functions ² by setting $\epsilon{\quad} {\in}$ {0,0.05, 0.10, 0.15, 0.25}. ³. If the user’s input value for eps1 is invalid, it will be set to default value eps1=0.15. Also, if eps1=0, users can directly (and are required to explicitly specify) hin parameter for estimation. This option is only available to model selection via information criteria like doorder(), model estimation via dofix(). All procedures based on hypothesis testing such as dorepart(), dosequa(), dotest() and doseqtests() will not work with 0 trimming level eps1.
Argument m specifies the maximum number of breaks considered in the model. This argument is automatically matched with eps1 argument. If the program finds that m is invalid (non-positive or larger than that allowed by the sample size given the trimming value eps1), it will be set automatically to maximal breaks allowed by the sample size and the specified trimming level eps1. The default value is m=5.
The following options related to assumptions on the structure of long-run covariance matrix of ${z_tu_t}$ and ${x_tu_t}$ (if any):
- robust: Allow for heteroskedasticity and autocorrelation in $u_t$. The default value is robust=1. If set to robust=0, the errors are assumed to be a martingale difference sequence.
- hetvar: Allow the variance of the errors $u_t$ to be different across segments. This option is not allowed to be 0 when robust=1.
- prewhit: Set to 1 if users want to prewhiten the residuals with an AR(1) process prior to estimating the long-run covariance matrix. The default value is prewhit=1.
The following options related to the second moments of the regressors $z_t$ and $x_t$ (if any):
- hetdat: Set to 1 to allow the second moment matrices of $z_t$ and $x_t$ (if any) to be different across segments. Set to 0 otherwise. It is recommended to set hetdat=1 for $p>0$. The default value is hetdat=1
- hetq: Set to 1 to allow the second moment matrices of $z_t$ and $x_t$ (if any) to be different across segments. This is used in construction of the confidence intervals for the break dates. If hetq=0, the second moment matrices of the regressors are assumed to be identical across segments. The default value is hetq=1.
- hetomega: Set to 1 to allow the long-run covariance matrix of $z_t u_t$ to be different across segments. Set to 0 otherwise. This is used in construction of the confidence interval for the break dates. The default value is hetomega=1.
Additional options specific to a partial structural change model: ⁴
- maxi: Maximum number of iterations if no convergence attained when running the iterative procedure to estimate partial structural change model. The default is maxi=20
- eps: Criterion for convergence of the iterative procedure. The default value is eps=0.0001
- fixb: Set to 1 if users intend to provide initial values for $\beta$, where the initial values are supplied as matrix betaini of size $(p \times 1)$. If betaini is invalid, the program will throw an error and stop.

There are two options available to all high-level functions:

const: Set to 1 to include constant in $z$. The default value is const=1. Users can turn off the constant in $z$ by setting const=0.
printd: Set to 1 to print intermediate outputs from estimation procedures of the program to console. The default value is printd=0

Additional specific options:

doorder(): option ic. Users can specify which information criterion used for selecting number of breaks. Available information criteria are modified BIC 'KT' following Kurozumi and Tuvaandorj (2011), 'BIC' following Yao (1988) and modified SIC 'LWZ' following Liu et al. (1997)
dosequa() and dorepart(): option signif. Option to specify significant level used in the sequential tests or the repartition method to determine the number of structural changes. The default value is signif=2 corresponding to the 5% significance level. Other values are signif=1 for the 10%, signif=3 for the 2.5%, and signif=4 for the 1% significance levels, respectively.⁵

Empirical examples

The vignette replicates 2 empirical exercises: i) US real interest rate and ii) New Keynesian Phillips curve.

US real interest rate

Garcia and Perron (1996),Bai and Perron (2003) considered a mean shift model:

\[y_t = \mu_j + u_t, \quad\text{for } t = T_{j-1}+1,...,T_{j}\quad\text{and } j=1,...,m.\]

for the US real interest rate series from 1961Q1 to 1986Q3. We allow heteroskedasticity and serial correlation in the errors $u_t$ by using the heteroskedasticity and autocorrelation consistent (HAC) long-run covariance estimate using the default setting (robust=1) with the prewhitened residuals also by the default setting (prewhit=1). Here, instead of invoking mdl(), we demonstrate the specific syntax to obtain the model with the number of structural changes $m^*$ selected by modified BIC information 'KT'

#estimating the mean-shift model with BIC (the default option is ic=`'KT'`, which use modified BIC as criterion)
rate_mBIC = doorder('rate',data=real)
#NOTE: equivalent to rate$KT; type rate$KT to compare with new result


# visualization of estimated model with modified BIC (in the argument, we can replace rate$KT with rate_mBIC for exact same graph; recall that `$` is the operator to refer to field BIC in return list from mdl())
plot_model(rate$KT, title = 'US Exchange rate')

The plot_model() function takes any estimated structural break model of class model and makes a graph with the following contents:

The observed $y$, fitted $\hat{y}_{m^*}$ values from a model with $m^*$ breaks (estimated or pre-specified depending on function called) and fitted $\hat{y}_0$ from a no break model for a direct comparison
The estimated break dates with labels in chronological order and confidence interval depends on CI argument (which is either 0.90 or 0.95) for respective dates.
The confidence interval (also depends on argument CI) for fitted values $\hat{y}_{m^*}$⁶

To show flexibility of class model in the package, we can reproduce a similar graph using information returned from stored variable rate_BIC.

  #collect model information
  m = rate_mBIC$nbreak           #number of breaks
  y = rate_mBIC$y                #vector of dependent var
  zreg = rate_mBIC$z             #matrix of regressors with changing coefs
  date = rate_mBIC$date          #estimated date
  fity = rate_mBIC$fitted.values #fitted values of model
  bigT = length(y)
  #compute the null model
  fixb = solve((t(zreg) %*% zreg)) %*% t(zreg) %*% y
  fity_fix = zreg%*%fixb    #fitted values of null model
  
  
  #plots the model
  tx = seq(1,bigT,1)
  range_y = max(y)-min(y);
  plot(tx,y,type='l',col="black", xlab='time',ylab="y", 
       ylim=c(min(y)-range_y/10,max(y)+range_y/10),lty=1)
  #plot fitted values series for break model
  lines(tx, fity,type='l', col="blue",lty=2)
  #plot fitted values series for null model
  lines(tx, fity_fix,type='l', col="dark red",lty=2)
  
  #plot estimated dates + CIs
  for (i in 1:m){
    abline(v=date[i,1],lty=2)
    if (rate_mBIC$CI[i,1] < 0){rate_mBIC$CI[i,1] = 0}
    if(rate_mBIC$CI[i,2]>bigT){ rate_mBIC$CI[i,2]=bigT}
    segments(rate_mBIC$CI[i,1],min(y)*(12+i/m)/10,rate_mBIC$CI[i,2],min(y)*(12+i/m)/10,lty=1,col='red')
  }
  
  legend(0,max(y)+range_y/10,legend=c("observed y",paste(m,'break y'),"0 break y"),
        lty=c(1,2,2), col=c("black","blue","red"), ncol=1)

New Keynesian Phillips Curve

Perron and Yamamoto (2015) investigates the stability of New Keynesian Phillips curve model proposed by Galı and Gertler (1999) via linear model:

\[\pi_t = \mu + \gamma \pi_{t-1} + \kappa x_t + \beta E_t \pi_{t+1} + u_t\]

where $\pi_t$ is inflation rate at time t, $E_t$ is an expectation operator conditional on information available up to $t$, and $x_t$ is a real determinant of inflation. In this example, we will reproduce specific results of the paper with ready-to-use dataset:

data(nkpc)
#x_t is GDP gap
  z_name = c('inflag','ygap','inffut')
  #we can invoke each test separately by using dotest() and doseqtests()
  supF_ygap = dotest('inf',z_name,data=nkpc,prewhit = 0, eps1 = 0.1,m=1)
  #z regressors' names are passed in the argument as an array, which equivalent to above argument call with z_name
  seqF_ygap = doseqtests('inf',c('inflag','ygap','inffut'),data=nkpc,prewhit = 0, eps1=0.1)
  #see test results
  supF_ygap

a) SupF tests against a fixed number of breaks

        1 break
Sup F    22.220
10% CV   14.810
5% CV    16.760
2.5% CV  18.620
1% CV    20.750

b) UDmax tests against an unknown number of breaks

   UDMax 10% CV  5% CV 2.5% CV  1% CV
1 22.220 15.230 17.000  18.750 20.750
  seqF_ygap

supF(l+1|l) tests using global optimizers under the null

         supF(1|0) supF(2|1) supF(3|2) supF(4|3) supF(5|4)
Seq supF    22.220    12.559    22.294    13.565    18.463
10% CV      14.810    16.700    17.840    18.510    19.130
5% CV       16.760    18.560    19.530    20.240    20.720
2.5% CV     18.620    20.300    21.180    21.860    22.400
1% CV       20.750    22.400    23.550    24.130    24.540
  
  
#x_t is labor income share 
  #or invoke all tests using mdl() 
  nkpc_lbs = mdl('inf',c('inflag','lbs','inffut'),data=nkpc,prewhit = 0, eps1=0.1, m=5)

There are no breaks selected by BIC and estimation is skipped

There are no breaks selected by LWZ and estimation is skipped

There are no breaks selected by KT and estimation is skipped
  nkpc_lbs$sbtests

a) SupF tests against a fixed number of breaks

        1 break 2 breaks 3 breaks 4 breaks 5 breaks
Sup F    30.592   69.630   37.894   32.765   30.607
10% CV   14.810   13.560   12.360   11.430   10.610
5% CV    16.760   14.720   13.300   12.250   11.290
2.5% CV  18.620   15.880   14.220   12.960   11.940
1% CV    20.750   17.240   15.300   13.930   12.780

b) UDmax tests against an unknown number of breaks

   UDMax 10% CV  5% CV 2.5% CV  1% CV
1 69.630 15.230 17.000  18.750 20.750
  nkpc_lbs$seqtests

supF(l+1|l) tests using global optimizers under the null

         supF(1|0) supF(2|1) supF(3|2) supF(4|3) supF(5|4)
Seq supF    30.592    11.408    11.595    12.564    26.418
10% CV      14.810    16.700    17.840    18.510    19.130
5% CV       16.760    18.560    19.530    20.240    20.720
2.5% CV     18.620    20.300    21.180    21.860    22.400
1% CV       20.750    22.400    23.550    24.130    24.540

To replicate the results, we turn off prewhit option.

The values of SupF 30.6 and F(2|1) 30.6 test statistics are equivalent to 30.6 and 11.4 and the values 22.2 and 22.2 are equivalent to 22.2 and 12.6 in table VI of Perron and Yamamoto (2015) . Given the Sup F(2|1) statistics in both regressions is smaller than the 10% critical values 14.81 and both Sup F test statistic of 0 versus 1 break is larger than the 1% critical values 20.75, we conclude there is only 1 break detected.

The estimated break dates 1:Q1991 also match 1991:Q1 in the reported table. The package exactly replicates results presented in Perron and Yamamoto (2015). Given the estimated date from the sequential approach, we could split the sample into two subsamples and conduct the two stage least squares (2SLS) in each subsample as suggested by Perron and Yamamoto (2015):

Using the matrix formula for 2SLS estimator below in IV regression, we are able to obtain close estimates for first subsample as reported in table VII (Perron and Yamamoto (2015)). \[ \hat{\beta}_{IV} = (X'P_ZX)^{-1}X'P_Zy \\ P_Z = Z(Z'Z)^{-1}Z' \] where $X = [X_1 \;\; X_2]$ is matrix of both endogenous regressor $X_1 = \pi_{t+1}$ and exogenous regressors $X_2 = [1,\pi_{t-1},x_t]$ and $Z = [Z_1 \;\; Z_2]$ is matrix of instruments including excluded instruments $Z_1$ and included instruments $Z_2 = [1,\pi_{t-1}]$. In total, the instruments used in first stage regression are lags of inflation, labor income share, output gap, interest spread, wage inflation and commodity price (6 instruments).

The IV estimates $\hat{\theta}_{IV}=(\hat{\mu},\hat{\gamma},\hat{\kappa},\hat{\beta})$ for 1960:Q1-1991:Q1 subsample and 1991:Q2-1 are

	$\mu$	$\gamma(\pi_{t-1})$	$\kappa(x_t)$	$\beta(E_t\pi_{t+1})$
1960:Q1-1991:Q1	0.001	0.338	-0.002	0.632
$SE_1$	0.002	0.115	0.009	0.130
1991:Q2-1997:Q4	0.010	0.589	-0.024	-0.866
$SE_2$	0.007	0.225	0.045	0.309

The table above replicates Perron and Yamamoto (2015) table VII exactly

Reference

Bai, J., 1997. Estimating multiple breaks one at a time. Econometric Theory 13, 315–352.

Bai, J., Perron, P., 2003. Computation and analysis of multiple structural change models. Journal of Applied Econometrics 18, 1–22.

Bai, J., Perron, P., 1998. Estimating and testing linear models with multiple structural changes. Econometrica 47–78.

Galı, J., Gertler, M., 1999. Inflation dynamics: A structural econometric analysis. Journal of Monetary Economics 44, 195–222.

Garcia, R., Perron, P., 1996. An analysis of the real interest rate under regime shifts. Review of Economics and Statistics 111–125.

Kurozumi, E., Tuvaandorj, P., 2011. Model selection criteria in multivariate models with multiple structural changes. Journal of Econometrics 164, 218–238.

Liu, J., Wu, S., Zidek, J.V., 1997. On segmented multivariate regression. Statistica Sinica 497–525.

Perron, P., Yamamoto, Y., 2015. Using OLS to estimate and test for structural changes in models with endogenous regressors. Journal of Applied Econometrics 30, 119–144.

Yao, Y.-C., 1988. Estimating the number of change-points via schwarz’criterion. Statistics & Probability Letters 6, 181–189.

Users can call independent functions to carry out specific procedures as outlined above instead of conducting all 6 main procedures provided by package via mdl()↩︎
The high level functions are mdl(), dofix(), dosequa(), dorepart(), doorder(), dotest(), doseqtests()↩︎
This argument is different from eps where eps sets the convergence criterion for iterative scheme when estimating partial change model.↩︎
The results of pure structural change models are not affected by these options↩︎
The above options are used extensively in all high-level functions of the program to specify required assumptions on structural break model. Additional options relating to formatting output in plot_model() function will not be explained in this section. For more information type ?plot_model or ?mdl() to understand the distinctions between two types of options↩︎
The option CI for plot_model() is used to specify the confidence interval around estimates of break dates and fitted values. For fitted values, it is computed as: \[ (\hat{y}_t^{m^*-},\hat{y}_t^{m^*+}) = (x_t'(\hat{\beta}-Z_{CI} \hat{s.e.}(\hat{\beta}))+z_t'(\hat{\delta}-Z_{CI}\hat{s.e.}(\hat{\delta})),x_t'(\hat{\beta}+Z_{CI} \hat{s.e.}(\hat{\beta}))+z_t'(\hat{\delta}+Z_{CI}\hat{s.e.}(\hat{\delta})))\] where \[ Z_{CI} = \begin{cases} 1.96 & CI=0.95 \\ 1.65 & CI=0.90 \end{cases} \] For break dates, the confidence interval is obtained via limiting distribution of $\hat{T}_j$ (Bai and Perron (1998)) ↩︎

Estimating and testing for multiple structural changes with `mbreaks` package

Linh Nguyen, Pierre Perron, Yohei Yamamoto

Econometric framework of `mbreaks` package

Usage, arguments and examples

Usage of main functions in `mbreaks`

Options for high-level functions

Empirical examples

US real interest rate

New Keynesian Phillips Curve

Reference

Estimating and testing for multiple structural changes with mbreaks package

Linh Nguyen, Pierre Perron, Yohei Yamamoto

Econometric framework of mbreaks package

Usage, arguments and examples

Usage of main functions in mbreaks

Options for high-level functions

Empirical examples

US real interest rate

New Keynesian Phillips Curve

Reference

Estimating and testing for multiple structural changes with `mbreaks` package

Econometric framework of `mbreaks` package

Usage of main functions in `mbreaks`