Date: 5 November 91
Message No: 035

To: TeX implementors and distributors
From: Barbara Beeton
Subject: Messages from DEK, part 4

Fourth and last installment of DEK's September comments.

Also included in DEK's package were some general comments to Peter
Breitenlohner, who is, with DEK's encouragement, updating PATGEN to
accommodate the new features of TeX 3. This is still a work in
progress; however, anyone else who has examined PATGEN with this
extension in mind might want to get in touch with Peter directly.
(Sorry, Peter, for not warning you ahead of time.)

########################################################################

TeX -- incompatibilities between \input and \openin

This report is my own.

When updating our user documentation for AMS-TeX, AMS-LaTeX, et al.,
I try to keep the files that will be distributed together in one
directory, and run (La)TeX from another, to segregate the files that
are created by the run from the originals. There is no problem with
files based on Plain, but LaTeX first checks for the existence of some
files with \openin and, if they are there, then applies \input .
The problem is that (at least in the VMS and some PC implementations)
\openin checks only the connected directory, not the path specified
by TeXinputs: .

Discussions with other users and implementors have uncovered the fact
that some implementors have added the logical path to the \openin
procedure, while others have not. (The DEC-20 implementation did
check TeXinputs: for both \input and \openin, so I had rashly assumed
that was what was supposed to happen.)

I understand that the WEB code for the two procedures is different,
but I believe it's not clear whether or why \openin should *not*
check the same input path as \input , and that means the implementors
are free to make their own interpretation. A clear statement of what
the behavior should be would be very helpful.

[ dek:
I had some correspondence about this a few months ago, but I forgot
what I said.
The difference in code between \input and \openin is actually to allow
reading files from a system area under \read without requiring a full
path name, but not under \openin. However, lots of operating systems
make it nicer to define environmental variables for sequences of
places to try, and such implementations naturally make use of the more
general paths on \openin as well as \read. Clearly LaTeX is important
enough that the implementors should make LaTeX as easy to use as
possible.

I need the feature also with my use of Plain TeX: I have put ".." on
my standard input path list, so that I can go to a subdirectory to
make a DVI file and partial cross reference files that won't disturb
anything on the parent directory.

I recommend therefore that implementors use environmental variables
for directory path lists (or a default one if the programmer hasn't
set it up) whenever the operating system allows it.

My favored conventions on implementation questions in general are
expressed by the change files I have contributed to the distribution
[under `local' directory] ... these are for Pascal, _not_ C, versions
of TeX and MF but I do use them heavily.

Incidentally I dislike several aspects of WEB-to-C versions on
Stanford computers, especially the treatment of command lines -- they
don't check .fmt files for garbage but I guess that hasn't been a
problem.
]

************************************************************************

WEB system -- dealing with repeating code in .WEB files

Date: Tue 26 Mar 91 20:09:38-EST
From: bbeeton
To: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk
Subject: possible bugs, requesting verification

[ this was in response to my saying that any large .web file is
likely to have repeating blocks of code, and thus a line number is
advisable when listing differences. where he says "web" he clearly
means "tangle". ]

Date: Sat, 23 Mar 1991 20:22 PST
From: Don Hosek
Subject: Re: Updates to TeX.WEB, MF.WEB, GFtoDVI, et al.

Incidentally, the repeating code problem is particularly nasty when
writing change files. WEB only checks to see if the first line matches
rather than the whole of the text in the @x..@y block which caused me
a great deal of grief when I wrote the CMS change file for MFT; there
are a lot of repeating first lines of a block of text.

-dh
-------

Date: Wed, 27 Mar 91 18:33:18 GMT
From: Chris Thompson
Subject: Re: [possible bugs, requesting verification]

...

As regards Don Hosek's complaint about Web (and it applies to both
TANGLE and WEAVE, of course, which is presumably why he confusingly
says "Web"): I agree, it's a right pain. I understand why it got like
that, though: it means that the programs never have to buffer more
than one line from each input file at a time. I am still paranoid
about storage consumption as well, in an age where there is little
sympathy for this!

The worst cases are when one *removes* a change (maybe one had an
ahead-of-base-source bug fix, for example) and suddenly later changes
start failing to match (or even worse, succeed in matching) the wrong
part of the file. One has to increase the context, often into the TeX
parts of the modules, to ensure uniqueness.

I would certainly suggest that you send Don Hosek's remarks to Don
Knuth, but I fear he probably won't want to change anything in this
area now.

Chris Thompson
-------

Date: Thu, 28 Mar 91 17:03:50 GMT
From: Chris Thompson
Subject: Re: [possible bugs, requesting verification]

...

It is all documented: WEBMAN.DVI page 10, section 11 says:

  "Whenever the first ``old line'' of a change is found to match a
  line in the web_file, all other lines in that change must match too."

There is no bug; it is just a rather painful spec to live with, as Don
Hosek says. It is a bit painful, as well, that it just reflects the
@y, and doesn't tell you *which* lines mismatched!

Chris
-------

[ dek:
WEB was never intended to be the "last word"; I expect second
generation systems to do all kinds of things with much greater
generality. I stopped when I had something good enough to get on with
what needed to be done. There is something painful about every system,
but really I have lots more problems with all the other software I
have to live with!
]

########################################################################

Date: Tue, 30 APR 91 21:31:17 BST
Reply-to: Brian {Hamilton Kelly}
From: TEX@rmcs.cranfield.ac.uk
To: BNB <@nsfnet-relay.ac.uk:BNB@MATH.AMS.com>
Subject: Possible bug in TeX V3.1 ???

Dear barbara,

I'm loth to say this, because I know how rarely anything is genuinely
a bug, but I'm a bit suspicious about TeX's unpacking and repacking of
ligatures before and after hyphenation.

I have a font that, like DEK's example in the New TeX and MF
announcement, has a variant of "s" for the ends of words (it's actually
an updating of my Greek fonts). So the ligature program reads as
follows:

  %
  % Ligatures for sigma at end of "word"
  %
  boundarychar := 255;
  ligtable "s": 255 =:| "c", "." =:| "c", "," =:| "c", ":" =:| "c",
                ";" =:| "c", "!" =:| "c", "?" =:| "c", ")" =:| "c",
                "/" =:| "c", "]" =:| "c", "*" =:| "c";
  %
  % Note that s is not ligatured with apostrophe, so that one can write things
  % like s''ena sp'iti ('' in this font produces an apostrophe, since '
  % on its own is an acute accent.)
  %

Now this works beautifully, most of the time. However, if TeX decides
that it should attempt hyphenation, and if the word being hyphenated
ends in punctuation, such as ".", then \S898 of TeX.web gives up on
taking apart the ligatures when it meets the period, since it's a
non-letter (has lc_code=0) --- so the word "xomol'ogysys." (where 'o
is a ligature defined in the font, and for which "'" has a non-zero
lccode) gets passed to the hyphenation procedure as the 12 characters
"x o m o l ' o g y s y s" in hc[1:hn].
After hyphenation has been considered, the period is no longer hanging
around for reconstitute to put back together. (Incidentally, when I
read the code, I'd convinced myself that the final "s" wasn't going to
be present in hc[hn] for hyphenate to consider, but the VAX-Pascal
debugger shows that it _is_ there!)

I've managed to effect a workaround, by setting the \lccodes for all
the punctuation that enters into the ligature program for sigma; but
then the hyphenation algorithm is given the _13_ characters
"x o m o l ' o g y s y s .", which I'm sure it's unlikely to be
completely happy with (although it does seem to find the same breaks);
surely, it will no longer recognize any explicit end-of-word marks in
the \patterns?

[ dek: that is true but perhaps there aren't so many patterns in
Greek. (Of course I am not happy with this workaround either) ]

Perhaps, instead of setting \lccode`\.=`\., I should perhaps set it as
\lccode`\.=256, so that it's non-zero, but doesn't pass the character
{\it per se\/} into the hyphenation algorithm.

Perhaps you could ask Don what he advises, or whether perhaps \S898
should _complete_ its dismantling of the ligature, and only afterwards
exclude the non-letter characters, noting the whole sequence for
reconstitute's benefit.

Brian
-------

Date: Thu, 02 May 91 00:49:54 BST
From: Chris Thompson
To: bbeeton
Cc: Brian {Hamilton Kelly}
Subject: Re: [[TEX@rmcs.cranfield.ac.uk: Possible bug in TeX V3.1 ???]]

Barbara,

There does indeed appear to be something murky going on. I am not at
all familiar with the code for reconstituting new-style ligatures, so
a full report will take a little while...

Setting the \lccode's for punctuation to pretend that they are letters
is a terrible way to have to work round the problem, and B{HK} is of
course right that
> it will no
> longer recognize any explicit end-of-word marks in the \patterns

On the other hand
> I should perhaps set it as
> \lccode`\.=256
certainly won't work: \lccode values are restricted to 0..255. I am not
sure what he is trying to say here.

Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk
-------

Date: Thu, 2 MAY 91 22:11:57 BST
Reply-to: Brian {Hamilton Kelly}
From: TEX@rmcs.cranfield.ac.uk
To: BNB <@nsfnet-relay.ac.uk:BNB@MATH.AMS.com>
Subject: Re: [[TEX@rmcs.cranfield.ac.uk: Possible bug in TeX V3.1 ???]]

Chris,

In message of Thu, 02 May 91 00:49:54 BST, Chris Thompson wrote:

> There does indeed appear to be something murky going on. I am not at all
> familiar with the code for reconstituting new-style ligatures, so a full
> report will take a little while...

Thanks! I thought I was going nuts at first!

> Setting the \lccode's for punctuation to pretend that they are letters
> is a terrible way to have to work round the problem, and B{HK} is of
> course right that
> > it will no
> > longer recognize any explicit end-of-word marks in the \patterns
>
> On the other hand
> > I should perhaps set it as
> > \lccode`\.=256
> certainly won't work: \lccode values are restricted to 0..255. I am not
> sure what he is trying to say here.

Sorry, I hadn't tried this; in fact, only thought of it when composing
the message. By analogy with \hyphenchar, I was hoping that I could
set an \lccode to a non-character value, and thus ensure (perhaps?)
that the hyphenation algorithm wouldn't consider the punctuation
characters, since they'd be excluded in |hc|; but I see now that
|lc_code| is a |equiv| and thus in a |quarter_word|, so cannot be set
outside range 0..255 (as TeX tells me when I try!)

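The 12-versus-13 character effect described in this thread can be
illustrated with a minimal sketch. It is not TeX's code: the function
name `collect_word` and the dictionary-based \lccode table are
hypothetical, and the sketch assumes the ligatures have already been
taken apart, ignoring fonts, length limits and everything else that
the real pre-hyphenation code checks. It only models the rule that
collection stops at the first character whose \lccode is zero.

```python
import string


def collect_word(chars, lccode):
    """Collect the leading run of characters with a nonzero lccode.

    Simplified model (an assumption, not TeX's actual procedure):
    scanning stops at the first character whose lccode is 0, and the
    lccode value itself is what gets stored for the hyphenator.
    """
    hc = []
    for ch in chars:
        code = lccode.get(ch, 0)
        if code == 0:
            break  # a non-letter ends the word
        hc.append(chr(code))
    return hc


# Letters map to themselves; the accent character ' also has a
# nonzero lccode, as in the font described in the message.
base = {c: ord(c) for c in string.ascii_lowercase}
base["'"] = ord("'")

# The workaround: give the period a nonzero lccode as well.
workaround = dict(base)
workaround["."] = ord(".")

word = "xomol'ogysys."
print(len(collect_word(word, base)))        # 12
print(len(collect_word(word, workaround)))  # 13
```

Under the workaround the period travels into the hyphenation routine
as if it were a letter, which is why end-of-word patterns can no
longer line up with the real end of the word.
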
One thought: Perhaps when unravelling ligature nodes in the pre-hyphenation
phase, TeX should take cognizance of whether the ligature program was
one that used =:| or |=:| (I'm not sure of |=:) and still stop at the
[ dek:                                       ^^ is this a smily face? ]
punctuation character, corresponding to the "retained" character, but
remember that it "belonged" and thus still be able to reconstitute it
correctly afterwards.

Best regards,

Brian
-------

Date: Fri, 03 May 91 17:17:13 BST
From: Chris Thompson
To: Barbara Beeton
Cc: Brian {Hamilton Kelly}
Subject: Re: [[TEX@rmcs.cranfield.ac.uk: Possible bug in TeX V3.1 ???]]

Barbara, & Brian,

I have been looking at the pre- and post- hyphenation code, and I have
come to the conclusion that Brian's problem is probably a bug, rather
than a feature.

The horizontal list for either "...ogysys " or "...ogysys. " contains
a ligature node with |lig_char| = "c" and component character list
containing just "s"; the "." is a separate character node in the
second case. The difference is that in the first case the ligature
node's |sub_type| is 1 ("formed from a right boundary character")
while in the second it is 0.

If automatic hyphenation is invoked, section 898 reconstructs the
original character list up to "s", and this is what is intended. The
difference in the cases is that section 903 sets |bchar:=255| in the
first case (|hb| is the ligature node described above) but not in the
second case. This allows |reconstitute| to rebuild the ligature node,
but only in the first case.

The reason I think this is probably a bug is the asymmetry of
treatment of the left-hand and right-hand edges of the word in section
903. There |ha| is examined in great detail in order to decide what to
put into |hu[0]| and |init_lig|, in particular if the word is preceded
by punctuation characters that alter the first letter of the word
(e.g. by |=:> ligatures) then such effects will be recreated by
|reconstitute|.
On the other hand, |bchar| is set only to |non_char| or
|font_bchar[hf]|; any following punctuation characters (at
|q=link(hb)|) are never examined.

Certainly I think this ought to be brought to Don's attention, if he
has any to spare for TeX at the moment. I think it may actually be a
matter of some urgency, in that otherwise people like Brian trying to
use right-boundary effects in fonts may be forced into using
inappropriate work-rounds; maybe even ones that would not survive a
proper fix.

Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk
-------

[ dek:
Chris is absolutely right; I should have provided better right context
to the reconstitution algorithm. [Why did I get into this?!]
Draft changes are being put into version 3.14$\alpha$.

In the new version the effect of \noboundary between a word and
punctuation will be lost (for example
   ...ogysys\noboundary.
_will_ now convert the s to a c ) but that minor problem is much worse
than the present alternative. An explicit kern
   ...ogysys\noboundary\kern0pt.
will preserve such noboundary status if necessary.

The going rate for bugs in the 1989 code is $10.24, so Brian gets
credit for this one!
]

########################################################################

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Character code reference
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Upper case letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ
% Lower case letters: abcdefghijklmnopqrstuvwxyz
% Digits: 0123456789
% Square, curly, angle braces, parentheses: [] {} <> ()
% Backslash, slash, vertical bar: \ / |
% Punctuation: . ? ! , : ;
% Underscore, hyphen, equals sign: _ - =
% Quotes--right left double: ' ` "
%"at", "number" "dollar", "percent", "and": @ # $ % &
% "hat", "star", "plus", "tilde": ^ * + ~
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[ end of message 035 ]
-------
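
The change-file matching rule quoted from WEBMAN in the WEB section of
this message can be sketched roughly as follows. This is a simplified
model, not TANGLE's or WEAVE's code: the names `apply_changes` and
`ChangeMismatch`, the list-of-pairs representation of a change file,
and the sample lines are all hypothetical. It assumes exact string
comparison and shows only why a repeated first line makes a change
fail, and why adding unique context fixes it.

```python
class ChangeMismatch(Exception):
    """A change's first old line matched, but the change did not apply."""


def apply_changes(web_lines, changes):
    """Apply (old_lines, new_lines) pairs in order, in one pass.

    Simplified model: a change is located solely by its first old
    line; once that line matches, every remaining old line must
    match too, otherwise the run fails.
    """
    out = []
    pending = list(changes)
    i = 0
    while i < len(web_lines):
        if pending and web_lines[i] == pending[0][0][0]:
            old, new = pending.pop(0)
            if web_lines[i:i + len(old)] != old:
                raise ChangeMismatch(f"change did not match at line {i + 1}")
            out.extend(new)
            i += len(old)
        else:
            out.append(web_lines[i])
            i += 1
    if pending:
        raise ChangeMismatch("some changes never matched")
    return out


web = [
    "@ First block.",
    "@p begin",
    "  x := 1;",
    "end;",
    "@ Second block.",
    "@p begin",
    "  y := 2;",
    "end;",
]

# Meant for the second block, but "@p begin" also starts the first
# block, so the match is attempted there and fails.
ambiguous = (["@p begin", "  y := 2;"], ["@p begin", "  y := 3;"])

# Extending the context into the module's text makes the first old
# line unique, so the change lands where intended.
with_context = (
    ["@ Second block.", "@p begin", "  y := 2;"],
    ["@ Second block.", "@p begin", "  y := 3;"],
)

patched = apply_changes(web, [with_context])
print(patched[6])
```

Calling `apply_changes(web, [ambiguous])` raises `ChangeMismatch` in
this model, which corresponds to the situation Don Hosek and Chris
Thompson describe; the single-line lookahead is what keeps the
buffering requirement to one line per input file.
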