@q file: intro.w@> @q% Copyright Dave Bone 1998 - 2015@> @q% /*@> @q% This Source Code Form is subject to the terms of the Mozilla Public@> @q% License, v. 2.0. If a copy of the MPL was not distributed with this@> @q% file, You can obtain one at http://mozilla.org/MPL/2.0/.@> @q% */@> @** Summary of Yacco2's Linker --- threads and their bit maps.\fbreak \Olinker produces ``thread bit maps'' for each terminal and a global table of to-run threads, and a linker document describing all the processed grammars extracted from their grammar's ``fsm-comments'' and their called threads. Standalone grammars: monolithic y, are also traversed to add their thread booty to the global bit maps. As Linker is a companion program of Yacco2, please see Yacco2's documentation for appropriate development of data structures that are germain to both. Linker receives its input either interactively or as a command line. The file to compile contains a list of grammar first set control files generated by Yacco2. Each grammar's ``first set'' file generated by Yacco2 uses a naming convention of the ``thread name'' supplied by the grammar's |fsm-filename| with an extension of ``fsc'' indicating a ``first set'' control file. At present, Linker's inputted file is handcrafted by the grammar writer due to Yacco2 not having a generation database of compiled grammars. The below diagram shows \O2{}Linker's manufacturing assembly line for the ``bit maps'' object file within an Unix context where the ``ld'' linker program resolves the globals described in the following section. The ``pdf'' generated document provides an overview description of all the grammars in use. ``CC'' is the c++ compiler on my Sun Solaris work station. \fbreak \fbreak \convertMPtoPDF{"/usr/local/yacco2/diagrams/o2linker_overview.1"}{1}{1} \fbreak \fbreak The ``O2linker\_doc.pdf'' document provides an index of all grammars compiled with comments about their claim-to-fame. It is a nice overview document of your language software when debugging or just getting a feel for all those defined grammars. It demarks the threaded grammars from their monolithic brethren. @*2 Globals --- those unresolved static variables used by Yacco2's library.\fbreak The following variables are made global for Yacco2's parse library usuage and placed within |yacco2| namespace:\fbreak \ptindent{1) |THDS_STABLE__| - vector of threads for the running} \ptindent{2) |T_ARRAY_HAVING_THD_IDS__| - terminals' thread sets} \ptindent{3) |BIT_MAPS_FOR_SALE__| - block of 32 bit words for bit map manufacture} \ptindent{4) |BIT_MAP_IDX__| - index used to dispense bit maps} \ptindent{5) |TOTAL_NO_BIT_WORDS__| - total number of words for sale} To speed things up and not be involved with the |malloc| issues of new / delete, |BIT_MAPS_FOR_SALE__| is a constant size of bit map words. Its size is controlled by the |TOTAL_NO_BIT_WORDS| macro and indicated at run-time by the |TOTAL_NO_BIT_WORDS__| global. Why not use c++ template containers? Depending on the context, they are not very efficient and in some cases they are not robust depending on whose compiler one uses. So ``roll your own'' is my mode of operandi. At least I can bark at myself instead of dealing with charades of good-citizenship a la Microsoft and their bug reporting - fixes: their stated policy of 24 hours response after submitting the bug report and now its 2 weeks and I'm still waiting for an acknowledgement. Have you seen their latest website for bug reporting? No direct button to send the report, only the platitude of how wastefull it is at their cost to deal with bug reports so make sure that you review their stated digest of bugs before submitting. How does democracy come into play when bugs are bugs: Fix them! Ranking bugs by developer votes is just (you substitute one of the following adjectives: stupid, senseless, witless, dull, brainless, weak-brained, muddle-headed, beef-witted, unentertaining). You get my sentiment. Each global will be developed in their appropriate sections. @*2 Some definitions within Linker's context.\fbreak What is a ``first set'' within the linker's context?\fbreak It is the terminals within the Start state configuration (aka ``Closure only'' state) of an automaton. There is some subtlety to this statement as an epsilon rule (empty right-hand-side), special terminals ${\vert+\vert}$ and ${\vert.\vert}$, and threads being called from within the ``Closure only'' state require special treatment as their terminals are outside of this state. These situations require going into other state configurations to determine the terminals. Possibly an epsilon rule and special terminals but definitely threads presents a more difficult problem as the grammar being parsed does not have access to the called thread's first set. Why? A called thread is another grammar requiring compilation. This is why Linker is required to post process all grammars in a global context to calculate the first sets for grammars calling threads out of their ``Closure only'' state. \fbreak \fbreak Why first sets anyway?\fbreak Originally this was not programmed. I wanted to prove my concepts first before optimizing for speed. Reality demands that a good idea be also practical. What good is parsing with threads when they are orders of magnitude slower than the current crop of parsing paradigns? So to overcome slowness, first sets provide the potential of whether a thread should run. How? If the current token to be parsed is in the first set of the thread then launch it from the running parse context. This optimization prevents false thread starts. \fbreak \fbreak Why the name \Olinker instead of for example ``first setter'' or jet?\fbreak I checked to the current name of tools required to prepare a program as an executable. In a sense Linker in my mind's eye was a concept that brought together the loose ends of the grammars and made it ready to parse. First setter is also appropriate. \fbreak \fbreak How are first sets generated?\fbreak The basic problem solved by Linker is the following: within a grammar, threads can be called. If threads called are within the Start state of a grammar, the grammar's first set now has a dependence on another thread's first set. Of course this call sequence is transitive: thread A can call thread B can call thread C. Both threads B and C can have a call thread dependency within their first sets. It is this thread dependency that the Linker resolves by applying the transitive closure operation across a graph of thread nodes to outcome the explicit first set of terminals per dependent thread. The output from Linker is a global vector of threads to be run, and their global first set bit maps optimized against all terminals. That is, what threads can be run by this current terminal. @*2 Overview of \Olinker's generated components.\fbreak Threads are just your normal procedures that get launched under the guidance of the threading facility native to the operating system instead of being called directly. Yacco2's parse library documentation goes into the detail between each run-time advantages threads versus procedure calls: After experimenting with a stop watch, the winner is the threads approach due to c++'s overhead of object birth-run-destructor cycle. Now the issue becomes ``how to determine and provide a thread id?'' that can be referenced globally and locally. Why the distinction between global and local contexts? Local contexts are the individual grammars that can reference other threads who are defined globally outside of its domain. As raised before, Linker's raison-d'\^etre is post-processing in a global context all the compiled threads. As the \o2 compiler / compiler processes the grammar in a local parsing context, references to threads within the grammar must be delayed thru indirection in its generated tables. This indirection comes from objects whose references are globally defined via the ``extern'' c(++) language facility and resolved by the appropriate language linker. An entente between the global and local contexts was signed in 1066 whereby the charter of thread rights defined the |Thread_entry| structure. Each |Thread_entry| uses a naming convention of: Ixxx where xxx is the thread name without its namespace drawn from the grammar. \o2 generates references to these thread entries whilst Linker creates the actual thread entry entities. The |Thread_entry| contains a pointer to the literal thread name used in \o2's tracings, the address to its called procedure, and its thread id that is a key to appropriate tables and thread maps. Thread ids are positive whole numbers starting from 0. The thread names are sorted in ascending sequence. Each thread's id is its relative position within this sorted sequence. @*2 Thread bit maps.\fbreak To give speed and economy of space, 32 bit words are used whereby each bit within a word represents the presence or absence of a specific thread. Why 32 bit words? This falls nicely in the current crop of CPUs. As 64 bit envy becomes mainstream, the number of bits per word can be changed by |cweb|'s macro facility. The mapping of a thread id into its bit map, uses modulo 32 on the thread id. The quotient indicates which word to access and the remainder indicates which specific bit within the word represents the thread. Due to the open number of threads being defined, the total number of words making up a thread map is specific to each parsing environment. For example, the current number of threads for \o2 is under 50. So, each thread map is 2 words long. Use of bit maps allows set processing with its operators like intersection to take place. Due to other optimizations to be discussed later, the number of words per bit map for \Yacco2's parsing library is calculated at runtime from the global |THDS_STABLE__| structure that carries the total number of threads being supported for the current parse environment. The globals |TOTAL_NO_OF_BIT_WORDS__|, |BIT_MAPS_FOR_SALE__|, and |BIT_MAP_IDX__| are used to manufacture bit maps. Why not manufacture bit maps at time of \o2 running? Neither the local grammar or \o2 has any notion to what its or other threads ids are. Also, I did not want to impose on the grammar writer an unique thread id assignment within each grammar nor an approximation of the total number of threads being developed. Now, \Yacco2's runtime library has a just-in-time thread map manufacture process. Each parsing thread's state keeps a list of threads to possibly run but list processing to launch threads is too slow. So when a parsing thread arrives at a state configuration needing thread bit maps, \Yacco2's library will manufacture it using the globals |BIT_MAPS_FOR_SALE__| and |BIT_MAP_IDX__|. |BIT_MAP_IDX__| controls the current index into |BIT_MAPS_FOR_SALE__| that houses the words for the bit maps. This newly minted thread map is now available throughout the balance of the parsing of the thread by storing the map address in the state configuration. The below diagram depicts a partial bit map configuration:\fbreak \fbreak \fbreak \convertMPtoPDF{"/usr/local/yacco2/diagrams/linker.2"}{1}{1} \fbreak \fbreak The word 0 above is in ``big endian'' sequence composed of 4 bytes: byte 0 is the first covering the most significant bit 31 thru to bit 24 while byte 3 contains the least significant bit 0 where its bits range from 7 thru to 0. It illustrates threads 31, 23, 15, 7, and 0 as being available for the running as their bits are ``turned on'' represented by ``1''. Word 1 would have the thread id range of 63 thru to 32. Each thread id in a word is just a replay of the modulo: Q is the word map no * 32 + the R the remainder indicating the specific bit from 0..31. @*2 Linker's languages.\fbreak There are 3 separate languages to be parsed by Linker:\fbreak \ptindent{1) Terminal alphabet --- literal names of each terminal} \ptindent{2) Each grammar's first set control file generated by \o2 --- xxx.fsc} \ptindent{3) Linker's input control file} These languages are simplistic but requires proper parsing that, of course, will use Yacco2's parse library. @*2 Terminal alphabet.\fbreak This is your terminal vocabulary defined by the 4 classes of terminals: LR constants, raw characters, errors, and ``user created'' terminals. Its content provides the literal description per terminal used for annotated comments in the emitted file. Their input order is also their enumeration. This file is outputted by \o2 when one requests the terminals to be compiled using one of its gen options: ``-t'' for user terminals or ``-err'' for errors. It uses the filename provided by the grammar's T-enumeration's file-name construct with a ``fsc'' extension (aka first set control). For the \o2 grammars, it looks like this:\fbreak \listing{"/usr/local/yacco2/diagrams/intro1.txt"} \fbreak The outputted vocabulary file is ``|yacco2_T_enumeration.fsc|'' from this above example. The language uses a bracketing construct ``T-alphabet'' ... ``end-T-alphabet'' to contain the list of terminals by their name defined by the grammar's ``terminals'' constructs: Lrk, raw characters, errors, and terminals. Below is an example:\fbreak \listing{"/usr/local/yacco2/diagrams/intro2.txt"} @*3 Terminal enumeration.\fbreak As the inputted terminals are by name, to be more efficient, an enumeration scheme is used whereby each terminal's relative position within this list starting at 0 becomes their assigned number working out along the positive whole numbers. From the example above, the |LR1_questionable_shift_operator| terminal is assigned 0, |LR1_eog| terminal is assigned 1 and so on. In \Yacco2's documentation you'll find ``how it applies'' this enumeration to the shifting and reducing sets within the automaton tables. The storage for these sets favors space economy over speed. @*3 First set declarations.\fbreak \o2 outputs per grammar their first set files having a file naming convention of the grammar name with an extension of ``fsc''. Below is ``subrule\_def.fsc'' first set control file outputted by the subrule\_def.lex grammar.\fbreak \let\setuplistinghook = \linenumberedlisting \listing{"/usr/local/yacco2/diagrams/intro3.txt"} This is a summary of what was found within the grammar. I included line numbers each delimited by a ``:'' to the left of the language's individual source lines as reference points for the discussion that follows. Lines 5 -- 11 above are the prelude declarations of the compiled grammar. The 2 items of interest within the prelude are the keywords ``transitive'' and ``monolithic''. Both use a boolean declaration of ``y'' or ``n''. The ``transitive'' keyword indicates whether this grammar needs transitive processing. The ``monolithic'' keyword indicates whether the grammar is stand alone or threaded. The 3 components following the prelude indicate the intent by their names. Lines 12 -- 13 houses the ``list-of-native-first-set-terminals'' component: the grammar's explicitly used terminals within its first set where there this example has none. The ``list-of-transitive-threads'' component: lines 14 -- 16 indicates the directly called threads from their first set. This list is recursively called by assessing their called threads's ``fsc'' files: the transitive adjective is a memory jog to recursion. Please see |first_set_for_threads| grammar on how the thread's first set is calculated as there is a subtle difference caused by the \INVshift in its first set calculation versus the regular first set calcaluation of a rule or lr state. Lines 17 -- 28 is a cross reference list of used threads throughout the grammar and not just the first set. Lines 29 -- 30 is the extracted fsm-comments from the grammar. It is used in generating the Linker's document indexing all the grammars with their comments, and their called thread graph with highlighting of nested grammars. @*2 Linker's control language.\fbreak There is not much to this control file. It provides the terminal vocabulary file, the file to output by Linker, a preamble that allows the grammar writer to insert code ahead of the ``to be emitted'' code, and finally the list of grammars' ``fsc'' files to process. Here is a sample of a handcrafted Linker control file drawn from the \o2 parsing environment.\fbreak \let\setuplistinghook = \relax \listing{"/usr/local/yacco2/diagrams/intro4.txt"} \fbreak Basicly, Linker builds a graph by grammar and does its transitive moves to produce concrete first sets for each unresolved grammar or thread. To improve performance, a global thread map per terminal will be created indicating the potential threads that can be run. @*2 Catalogue of Linker's files.\fbreak Linker's Input files to |cweb|:\fbreak \ptindent{1) |o2linker.w| --- master Linker file that starts things off} \ptindent{2) |intro.w| --- inroduction } \ptindent{3) |prog.w| --- linker cweb code } \ptindent{4) |o2linker_externs.w| --- external procedures } \ptindent{5) |pms.w| --- running comments of life} \ptindent{6) |bugs.w| --- yep i do them} \ptindent{7) |testsuites.w| --- self proof of code} \ptindent{8) |sampleoutput.w|} \ptindent{9) |types.w| --- provides a way around burping output formats} \ptindent{10) |includes.w| --- C++ include code} \ptindent{11) /usr/local/yacco2/externals/common\_defs.w --- external procedures } \ptindent{12) |o2linker_defs.w| --- external procedures } |cweb| generated files:\fbreak \ptindent{1) |o2linker.h| --- linker definitions} \ptindent{2) |o2linker.cpp| --- linker program} \ptindent{3) |xxx.cpp| --- first sets for grammars provided by the input file} \Yacco2 library memorabilia:\fbreak \ptindent{1) yacco2 --- library namespace} \ptindent{2) ``\Whereyacco2/library'' --- \yacco2's library directory} \ptindent{3) ${< yacco2.h >}$ --- Yacco2's library header file} \ptindent{4) ``\Whereyacco2/library/lib/xxxx'' - xxxx is the Debug or Release of the object library} \ptindent{5) ``yacco2build.a'' - static \yacco2's library} Dependency files from other Yacco2 sub-systems:\fbreak \ptindent{|yacco2.h| - basic definitions used by Yacco2} \ptindent{|yacco2_T_enumeration.h| - terminal enumeration for \Yacco2's terminal grammar alphabet} \ptindent{|yacco2_err_symbols.h| - error terminal definitions from \Yacco2's grammar alphabet} \ptindent{|yacco2_characters.h| - raw character definitions from \Yacco2's grammar alphabet} \ptindent{|yacco2_k_symbols.h| - constant terminal definitions from \Yacco2's grammar alphabet} \ptindent{|yacco2_terminals.h| - regular terminal definitions from \Yacco2's grammar alphabet} \ptindent{|*.h| - assorted grammar definitions from \Yacco2 to parse} \ptindent{|o2linker_externs.h| - external support routines common for \olinker} Dictionaries\fbreak \ptindent{|T_DICTIONARY| --- terminals} \ptindent{|GRAMMAR_DICTIONARY|} \ptindent{|YACCO2_STBL| --- symbol table} \ptindent{|USED_THREADS_LIST|} \ptindent{|THREAD_ID_FIRST_SET|} \ptindent{|T_THREAD_ID_LIST|} Grammars\fbreak \ptindent{|T_alphabet.lex|} \ptindent{|linker_pass3.lex| --- control file parser } \ptindent{|link_cleanser.lex| --- lexical grammar stripping off comments etc} \ptindent{|fsc_file.lex| --- syntactic grammar of fsc file} Document\fbreak \ptindent{|o2linker_doc.w| --- overall index describing the grammars processed: cweave / pdftex} @*2 Global macro definitions.\fbreak @d SPECULATIVE_NO_BIT_WORDS 20 @d LR1_QUESTIONABLE_SHIFT_OPERATOR 0 @d LR1_EOG 1 @d LR1_EOLR 2 @d LR1_PARALLEL_OPERATOR 3 @d LR1_REDUCE_OPERATOR 4 @d LR1_PROCEDURE_CALL_OPERATOR 7 @d LR1_INVISIBLE_SHIFT_OPERATOR 5 @d LR1_ALL_SHIFT_OPERATOR 6 @d LR1_FSET_TRANSIENCE_OPERATOR 7 @d SMALL_BUFFER_4K 1024*4 @d BITS_PER_WORD 32 @d TOTAL_NO_BIT_WORDS 1024*2*100