@q  file: errhdling.w @>
@q%   Copyright Dave Bone 1998 - 2015@>
@q% /*@>
@q%    This Source Code Form is subject to the terms of the Mozilla Public@>
@q%    License, v. 2.0. If a copy of the MPL was not distributed with this@>
@q%    file, You can obtain one at http://mozilla.org/MPL/2.0/.@>
@q% */@>

@** Error detection and handling.\fbreak
Let's review how this can be done.
Within a grammar's production there are points where  an invalid symbol
could  arrive.
If one does not program for it, the parser will go kaput.
So what are the options open to a grammar writer?
First there is a ``{\bf failed}'' directive in the ``fsm'' construct
that will field aborted parses.
It is the last chance to deal with errors in a rather 
insensitive way.
If  there are many contexts within the grammar that could go
wrong then this approach is too insensitive to be specific 
about the context's  
error point.
Though  the errant current token is available
to report on, what was the inappropriate context that threw it?
Well u could try to figure it out from the remnants on the parse stack.

To deal with specific error points, the \QUEshift{}, \ALLshift{}, 
and \INVshift{} symbols can catch
errant tokens, or one can be very specific in specifying the errant T to catch.
This last option can be very daunting when one has 500+ T to deal with
and, let's be honest, not really appropriate.
This was why i introduced the meta-terminals \QUEshift{} and \ALLshift.
To catch a rogue and associate  syntax directed code to handle the situation,
these symbols MUST be within prefix subrules where they are the 
last symbol in the subrule's symbol string.
What does this mean?
Having a string of symbols where these catch T symbols are buried
 within a larger symbol string
means the subrule containing these symbols 
will not be executed as its sentence has not been
completely recognized.
For example:\fbreak
\INDENT{.4in}{\subrule{} a \QUEshift{} b --- will not handle the error
at the \QUEshift{} point}
\INDENT{.4in}{\subrule{} a Rqueshift b --- will catch the problem}
\INDENT{.7in}{Rule Rqueshift \subrule{} \QUEshift{}...}
\INDENT{.9in}{will catch the error with appropriate syntax directed code directive }
\fbreak
Caution: The ranking of meta-terminal shifts: 1 and a 
2 and a 3 ---\QUEshift{}, \INVshift{}, \ALLshift{}\fbreak
The \QUEshift{} symbol is checked first for its presence within the current parse state
followed by the \INVshift{} symbol as it is normally used to
get out of a quasi-ambiguous parse. The \ALLshift{} aka wild shifter 
is the last to be checked in the parse state.
It is their presence within the parse state that activates their use.
The \QUEshift{} is an error statement and was my reason to put it at the head of the 
conditional shifts.
So watch your shifts as this could catch u like me.
When both \ALLshift{} and \INVshift{} sit in the same state, remove 1 of the 2 competing
shift symbols. For the moment i have not issued an error message on this situation.
@^ To do error message conditional shift ranking both \ALLshift{} and \INVshift{} in state@> 
@^ To do error message conditional shift \INVshift{} takes precedence over \ALLshift{}@> 
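
The ranking above can be sketched as a small toy model. It is not \O2's own
code: |ToyState|, |Shift| and |choose_shift| are names invented for this
illustration, and the rule that a T with its own shift entry wins before any
meta-terminal is an assumption, not something stated above.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model only: these names are illustrative and not part of the O2 library.
enum class Shift { Explicit, Que, Inv, All, None };

// A parse state reduced to the shift symbols it holds.
struct ToyState {
    std::set<std::string> explicit_ts;  // T symbols having their own shift
    bool has_que = false;               // que-shift meta-terminal in the state
    bool has_inv = false;               // inv-shift meta-terminal in the state
    bool has_all = false;               // all-shift meta-terminal in the state
};

// Assumption: a T with its own shift entry wins before any meta-terminal.
// After that the ranking from the text applies: que, then inv, then all.
Shift choose_shift(const ToyState& s, const std::string& current_t) {
    assert(!current_t.empty());  // a token must have a name
    if (s.explicit_ts.count(current_t) > 0) return Shift::Explicit;
    if (s.has_que) return Shift::Que;
    if (s.has_inv) return Shift::Inv;
    if (s.has_all) return Shift::All;
    return Shift::None;  // nothing catches the token: the parse would abort
}
```

With both |has_inv| and |has_all| set, an unexpected token always lands on
|Shift::Inv|, which is the competing-shift trap described above.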
\fbreak
\fbreak 
Dictate no 1: Last symbol in subrule's symbol string must be the catcher in the Error\fbreak
Make sure your error catch point has \QUEshift{} or \ALLshift{} as its 
last symbol within the symbol string and let your syntax directed code
decree the error escape route to be taken.
Yeah that's fine but what if the symbol string to be recognized 
contains many catch points?
Just make each symbol string segment a separate rule with the error code catch point
being the last symbol in the string competing with its legitimate accepted T symbols and
use these rules  within another rule's subrule as part of its symbol string to be recognized!
The lr algorithm is a collection of various symbol string configurations
per state
in various accepted T points along their parsing.
So by transitive closure these prefix rules get included in the state to
be  
recognized along with the other similar prefix symbols.
When the prefix rule's ``rhs'' boundary is recognized, depending
on the error catcher used, the reduce will fire either in good form or
as an error.
\fbreak
\fbreak
What to do when an error is detected?\fbreak
For now i have not thought out error correction strategies though i am marginally
aware of the backtracking techniques.
I will now discuss current programming options open to the grammar writer.
Depending on the context, the thread could abort which is the most drastic.
This takes place when no error catching is programmed
and \O2 issues a runtime message on the aborted grammar with its run stack goodies.
This might be okay to get things going but isn't too appropriate within a 
production environment.
Well the catch points have 2 programming options available:\fbreak
\INDENT{.4in}{1) return an error token back to the calling grammar
and stop parsing of the active grammar}
\INDENT{.4in}{2) abort the parse and field it using the ``failed'' 
directive to  return an error T}
Point 1 should be your main course of action.
That is, both macros |RSVP| and |RSVP_FSM|  return a T back to the calling
grammar through the accept queue facility
as if the parse was successful. This is what point 2 does using the |RSVP_FSM| macro
as its execution is within the ``fsm'' context of the grammar and not the reducing rule.
The calling grammar can then field this returned T specifically
or use one of the two meta-terminals \QUEshift{} or \ALLshift{} to deal with it.
They are allowed in any subrule symbol string  context: 
thread calls where its returned T can be one of these symbols, and the
regular subrule symbol string.
\fbreak
\fbreak
Pinpointing where the error occurred in the source file\fbreak
Built into \O2 is the facility to tag each T with its appropriate source file's GPS: 
filename, line number, and character position. These co-ordinates
are used to print out the errant source line
 with an arrow underlining the errant source token.
So when an error T is created, use of the |set_rc| and variants allows one to pinpoint the 
error T against the GPS's source file T. 
Have a read on ``Abstract symbol class for all symbols'' --- |CAbs_lr1_sym|.
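
How the co-ordinates turn into a marked source line can be sketched as follows.
This is a minimal stand-alone illustration and not \O2's reporting code:
|ToyGps| and |mark_errant_token| are hypothetical names, the 1-based
numbering is an assumption, and tabs in the source line are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the co-ordinates a T carries; not O2's real class.
struct ToyGps {
    std::string file;
    std::size_t line_no;   // assumed 1-based
    std::size_t char_pos;  // assumed 1-based column of the token's first character
};

// Build a report: a location header, the source line, then an arrow line
// whose caret sits under the first character of the errant token.
std::string mark_errant_token(const std::string& source_line,
                              const ToyGps& gps, std::size_t token_len) {
    assert(gps.char_pos >= 1);
    assert(token_len >= 1);
    std::string marker(gps.char_pos - 1, ' ');  // pad up to the token
    marker += '^';                              // arrow head
    marker.append(token_len - 1, '~');          // underline the rest of the token
    return gps.file + ":" + std::to_string(gps.line_no) + ":" +
           std::to_string(gps.char_pos) + "\n" + source_line + "\n" + marker + "\n";
}
```

For a token of length 1 at character position 5 the arrow line is 4 blanks
followed by the caret.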
\fbreak
\fbreak
Some subtleties on making the errant T fire off the error
catching syntax directed code.\fbreak
Let me pose a question: What happens when the errant T is not in the lookahead set
to reduce that subrule? Well it will not get executed! Ugh. This is just 
not acceptable Dave.
Well to the rescue is the \QUEshift{} symbol.
It is not in the token stream but represents an errant situation.
So where is this errant T placed?
When one enters the subrule's syntax directed code segment, 
all its subrule's elements have been shifted onto the parse stack
where this last errant symbol is  represented by \QUEshift{}.
But the \QUEshift{} symbol {\bf  does not advance past the errant T}
 as in regular parsing. 
So what does this mean?
The current errant T is also the lookahead symbol for the reduction.
But wait, what if this T is not in the lookahead set to reduce this subrule?
Well i made this type of reduce an lr(0) context: no lookahead symbol required
to reduce the subrule.

To get at the current elements on the parse stack,
\O2 emits within each subrule's c++ code the stack frame, with each 
symbol of the subrule's symbol string assigned to ``|sf->pxx__|'' where xx is the symbol's
position in the string.
The lr(0) reduce is the difference to \ALLshift: \ALLshift{} depends on the lookahead set to reduce.
Now what then is the advantage to using \ALLshift?
One can test its under-its-hood T's enumeration value and then take error action, or
stop use of the \ALLshift{} facility, which allows the grammar
to continue parsing up to the ``start rule''.
As it's a wild symbol shifter, it really lowers the grammar's parse table sizes
and eases the grammar writer's typing.
\fbreak
\fbreak
Dictate no 2: Games on returning the new lookahead T back to the calling grammar\fbreak
U can play games with resetting the new lookahead T that is passed back
 with its |RSVP| T companion within the accept queue.
This is what happens when 
just 1 T is returned: the lookahead T gives the parse stream's continue point,
and its contents set the calling parser's current token to continue with. 
As an aside why use the returned lookahead's T  contents
instead of just resetting the continue T from the token stream's
container 
using the lookahead token position?
Well u could also remap the current token into another T type
due to, say, a symbol table remapping, like Pascal and its ``const-id'', ``function-id''
as  described in the railroad diagrams of ``The Pascal Reference Manual''.
The remapping facility is open for use via the ``Table lookup functor'' facility.
The following methods adjust the parser's token stream:\fbreak
\INDENT{.4in}{|override_current_token_pos(symbol,position)|}
\INDENT{.4in}{|override_current_token(symbol)|}
\INDENT{.4in}{|reset_current_token(position)|}

In a dual competing threads situation
where each grammar has accepted its 
parse and is returning its booty to the calling grammar,
the calling grammar must use arbitration to select the T gift and
set its parse stream accordingly; the balance in the ``accept queue''
is so-to-speak thrown away. 
Of course the {\bf arbitration} facility 
is programmed by the compiler writer when 2 or more successful threads are
returning their booty back to the calling grammar.
Normally this does not occur as there is just one thread that will
report its findings but this city is built on rock and nondeterminism.
So a subset / superset competition, or an accept and error combo
 is quite acceptable and for the arbitrator's
choosing. Forgotten arbitration code  will be regurgitated by the \O2 library
in message form
for your fixing.
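
One possible arbitration policy, as a hedged sketch: |ToyAccept| and
|arbitrate| are invented for this illustration and do not mirror \O2's accept
queue types. The policy itself (a good T beats an error T, and between equals
the one that read further into the stream wins) is only an assumption about
what a grammar writer might choose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accept-queue entry; field names are invented for this sketch.
struct ToyAccept {
    int thread_id;       // which thread reported
    bool is_error;       // true when the returned T is an error T
    std::size_t la_pos;  // lookahead position: where the caller would continue
};

// Assumed policy: a good T beats an error T, and between two entries of the
// same kind the one that consumed more of the stream (the superset) wins.
// Returns the index of the chosen entry; the caller discards the rest.
std::size_t arbitrate(const std::vector<ToyAccept>& queue) {
    assert(!queue.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < queue.size(); ++i) {
        const ToyAccept& a = queue[i];
        const ToyAccept& b = queue[best];
        if (!a.is_error && b.is_error) {  // accept beats error
            best = i;
            continue;
        }
        if (a.is_error == b.is_error && a.la_pos > b.la_pos) {
            best = i;  // same kind: the longer parse wins
        }
    }
    return best;
}
```

On a tie the earlier entry is kept, so the order in which threads report
matters under this policy.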

The one caveat to watch for is: What is the current 
token and its position in the parse stream 
when it enters the subrule's syntax directed code?
\QUEshift{} still has the errant T as its current T and to reset back
to the previous T u only subtract 1 from the current token position.
\ALLshift{} demands 2 be subtracted as the current T is the new lookahead T.
So u've been warned.
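
The subtract-1 versus subtract-2 rule can be checked with a little toy
arithmetic. This is bookkeeping only, with invented names (|Catcher|,
|current_pos_in_code|, |previous_t_pos|); the real positions live inside the
parser, and the result is merely the sort of value one might hand to
|reset_current_token(position)|.

```cpp
#include <cassert>
#include <cstddef>

// Toy bookkeeping only; the real parser keeps this state inside the O2 library.
enum class Catcher { QueShift, AllShift };

// Current token position seen by the syntax directed code, given the
// position of the errant T in the token stream.
std::size_t current_pos_in_code(Catcher c, std::size_t errant_pos) {
    // que-shift does not advance past the errant T;
    // all-shift consumes it, so the current T is the new lookahead.
    return c == Catcher::QueShift ? errant_pos : errant_pos + 1;
}

// Position of the T just before the errant one, per the rule in the text:
// subtract 1 for que-shift, 2 for all-shift.
std::size_t previous_t_pos(Catcher c, std::size_t current_pos) {
    const std::size_t back = (c == Catcher::QueShift) ? 1 : 2;
    assert(current_pos >= back);  // cannot step back past the stream's start
    return current_pos - back;
}
```

Both catchers end up at the same previous T; only the starting point differs.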
\fbreak
\fbreak
Some comments on stopping a parse by syntax directed code:\fbreak
Apart from the don't do anything approach, the grammar writer can talk to the parser
and dictate
his intentions. The 2 methods open are abort-the-parse or stop-parsing.
The abort-the-parse action allows the thread to stop without any T returned to 
the caller grammar
or  
 use the {\bf failed} directive to last-chance return an  error T back to the caller.
The stop-parsing approach returns a T back to the user but does not want
to continue the complete parse through to its ``start rule''.
It just short-circuits the overall grammar's parsing action.
Remember that if the parse has been successful ``why complete the parsing 
thru to start-rule?''.
Depending on your local grammar logic this might be the most expedient way to program.
Here are the 2 methods to do this:\fbreak
\INDENT{.4in}{|set_abort_parse(true)|}
\INDENT{.4in}{|set_stop_parse(true)|}
What about the reducing of this subrule?
Well it occurs: entry into the syntax directed code that
contains the grammar writer's code to execute these  statements 
means the reducing conditions are kosher.
So why the ``abort-parse'' versus ``stop-parse'' difference?
``stop-parse'' should contain the |RSVP| macro that
enters the returned T into the calling grammar's ``accept-queue''.
The ``abort-parse'' normally does not contain this action.
\fbreak
\fbreak
Warning no 3: if \ALLshift{} is being used, don't forget to turn it off.\fbreak
This symbol is voracious: eats and  eats everything in its path.
So u can arrive at trying to eat the ``end-of-the-parse-stream'' ``|eog|'' symbol
forever...
\O2 guards against this but is rather abrupt in its message to the grammar writer
and stops
the parse immediately. 
So u'll see in some of the suggested grammars the |set_use_all_shift_off| method being 
called to get out of this perpetual motion and possibly continue up the parse chain to the 
``start rule''.
Here is  a list of some \O2 grammars having error handling and premature
 stopping of a parse to learn from.
\INDENT{.4in}{1) |o2_lcl_opts.lex| and called thread |o2_lcl_opt.lex| --- command line parser}
\INDENT{.4in}{2) |la_express.lex| --- |set_abort_parse(true)| thread's la expression parser}
\INDENT{.4in}{3) |c_string.lex| --- semantic example stopping a parse and programmed fsa}
Point 1 gives an example of how the ``failed'' directive in the called thread |o2_lcl_opt.lex|
is programmed  and ``|set_stop_parse(true)|'' use in the calling grammar |o2_lcl_opts.lex|
of a monolithic grammar. |pass3.lex| and point 2 give more examples on monolithic use
of aborting. Point 3 also shows programming use of the ``|set_abort_parse(true)|''.
For the really curious, why not use the find/grep/xargs combo to settle your appetite against
\O2's grammars.
\fbreak
\fbreak
The last word, amen and  happy parsing.\fbreak
Remember that the normal flow of errors should be placed into the ``error queue'' and
then post processed to report its findings. 
|ADD_TOKEN_TO_ERROR_QUEUE| and its variant |FSM_ADD_TOKEN_TO_ERROR_QUEUE| 
allow u to do this. |pass3.lex| gives lots of examples and \O2's program shows its
way of post-verbing the troubles. 
And with all this error stutter, each grammar does a post-execution  grammar  
cleanup on current parsing
for the next round of their calling. Again what does this mean?
A semi-abort was done just to stop its execution leaving the grammar in an abort state. 
But
each grammar does a resetting to a clean slate for its next round of
calling either by ``procedure call'' if no nested call of itself is occurring
or by the heavy thread call.
Hygiene is important so the cat washes itself for the next eating.