MAF : Appendices : (C) How MAF works

Appendix C: How MAF works

This description is how MAF works is intended to give a flavour of how MAF looks for an automatic structure or a confluent rewriting system, not as a complete description. If you are at all interested you are recommended to refer to the C++ code, which the author has gone to considerable effort to document thoroughly with extensive comments.

This explanation disregards the relatively trivial stages of initially reading the input file, and finally writing the output files. Otherwise the processing of MAF in automata and kbprog falls into three distinct areas, two of which are fairly definitely ordered in time, but all three of which are essentially co-routines of each other.

MAF's three parts - the simplified version

  1. In its initial phase (which may be suppressed altogether) MAF uses the Knuth-Bendix procedure to discover new equations. MAF will continue to use this procedure until either the process terminates because the rewriting system has become confluent, or until it decides that it should attempt to build the automata for an automatic structure, or some other halting criterion holds and MAF exits with only provisional results. Roughly speaking during this phase of operation MAF is behaving like the KBMAG implementation of kbprog. Let us call this the Knuth-Bendix phase. This phase is managed by a committee of C++ objects, the most important of which is called Rewriter Machine, which is responsible for most high-level decisions about this phase. Rewriter Machine relies on several other important objects: Node Manager (which manages almost all the data stored in memory, and can be viewed both as an index automaton, a partial coset table, and a rewriting system: the equations in the rewriting system exist only as paths between nodes in this object), Job Manager (which prioritises and actually carries out much of the work of maintaining the data in Node Manager), and Equation Manager (which decides what to do with equations that these and other objects ask to construct). Even if the Knuth-Bendix phase is suppressed entirely these are still very important objects, because all changes to the rewriting system are carried out by them. Also every putative element of the object in which other parts of the code may be interested, notably every word-difference, is represented as a node in Node Manager. A less powerful object called the pool (an instance of a classed named Equation DB) is also important for understanding the detailed operation of the Knuth-Bendix phase.
  2. Assuming it does decide to build an automatic structure, at the point where MAF decides to do so, it starts behaving like the KBMAG implementation of gpmakesfa. Let us call this the automatic structure phase.

    In fact a better term for this phase would be the output phase since its code is in fact responsible for creating all the output files that automata may generate - it is for example responsible for turning MAF's very different internal data structures that represent a confluent rewriting system into the corresponding .kbprog and .reduce files). Word-difference machines are also turned from an internal representation into their output form via this code.

  3. The third area of processing is the maintenance of a word-difference machine. This processing is usually only performed if MAF is going to build an automatic structure. It is never used if the input file describes a monoid rather than a group, or if the user has specifically requested that such processing should not be carried out. This part of the processing happens more or less equally in both phases, except that if MAF has been told to expect the rewriting system to become confluent it does not begin until it has. This area of processing will be referred to as difference tracking, and it is managed by a C++ object called Difference Tracker.

MAF's three parts - the not so simplified version

In fact the distinction between these three areas of processing, though helpful in terms of understanding how MAF works, and for relating MAF to KBMAG, is much less clear in practice than the previous section suggests.

Difference Tracker interactions with Rewriter Machine

For example, Difference Tracker may quite strongly affects how the Knuth-Bendix phase proceeds, since MAF uses information that this phase returns to help decide which equations should be considered next. Difference Tracker is also quite capable of itself discovering new equations (because it enforces various consistency requirements in the word-difference machine), and Difference Tracker will ask Node Manager to insert these just like equations discovered in the course of Knuth-Bendix completion (though both Job Manager and Equation Manager will get a say as to when and if this gets done). Conversely, Node Manager may detect errors in the word-difference machine, which Job Manager will then schedule for correction, and at the appropriate time it will inform Difference Tracker of the details of the error to be corrected, and ask it to perform the correction. In the event of a major collapse Rewriter Machine may decide to delete Difference Tracker altogether, and then recreate it when the rewriting system has stabilised again.

Interactions between automatic structure phase and Knuth-Bendix phase

Again, even the distinction between the Knuth-Bendix phase and the automatic structure phase is not clear-cut. For example, if it is discovered during the latter phase that the rewriting system should be confluent, the automatic structure phase will temporarily abort itself and the Knuth-Bendix phase is resumed until confluence really has been achieved. Also "corrections" discovered by the various checks in the automatic structure phase are again fed into the Rewriter Machine, and not just into the word-difference machine (as they would be in KBMAG), and this can result in the discovery of many further equations and word-differences. And, in principle at least, MAF can decide that it made a poor decision when it switched to the automatic structure phase, go back to to Knuth-Bendix phase, and then try the automatic structure phase again later. In fact, this is rarely occurs, because the "correction" capabilities of the automatic structure phase are usually strong enough that they can succeed in building the automatic structure even if the Knuth-Bendix phase is omitted entirely. In fact for many "easy" infinite groups this is actually the fastest way of proceeding. The reason for keeping the Knuth-Bendix phase is two-fold:

  1. For groups with many word-differences, especially largish finite groups, MAF's Knuth-Bendix phase is far superior to its automatic structure phase at finding equations quickly, and for such groups attempting to build an automatic structure, which in any case is not really needed since we can find a confluent rewriting system, would fail because the automata would be too big prior to minimisation.

  2. Even for groups where Knuth-Bendix can be omitted (and this includes all of the most difficult examples supplied with KBMAG) doing so may waste time, especially when the group is finite.

Additional processing in the Knuth-Bendix phase

The "Knuth-Bendix" phase is far from being a straight-forward Knuth-Bendix implementation:

The procedure for building automatic structure

The simplified version

In essence MAF follows a similar scheme to KBMAG for building the automatic structure:

  1. Construct a candidate word-acceptor from the current word-difference machine
  2. Construct a candidate multiplier
  3. Check that that the multiplier includes an accept state (u,v) for every accepted word u, and multiplier x. If not then u*x is reducible and has an unknown word-difference. Update the difference machine for all such u, until either the validation of the multiplier closes, or some abort condition is fulfilled. Then return to step 1.

In this outline the algorithm appears simpler than KBMAG's which once it has a word-acceptor at all only builds a new word-acceptor if step 2 fails. In MAF step 2 usually cannot fail because the word-difference machine is maintained in a way that guarantees that if u*x = v is recognised as a reduction, then so also is v*X = u. Building a word-acceptor from a difference machine without this property is the the only reason that building the multiplier can fail, provided the word-acceptor being used has been built from the strongest word-difference machine currently available. MAF will only keep word-acceptors from one iteration to the next if the word-acceptor appears to be stable, and it takes longer to build than the multiplier.

Although this outline omits steps which we shall come onto later, it would be the path taken through the code for "easy" input files.

Strong and weak word-difference machines

MAF does usually use the strongest word-difference machine when building the word-acceptor, which contains all possible transitions between all known word differences. However, if building the word-acceptor takes a long time MAF may try to use a weak word-difference machine, which only contains the transitions that have actually been proved to be necessary through their being needed to read the equation using the word-difference machine. In this case building the multiplier can fail. MAF will try to check for this in advance by looking for words that were rejected by the last word-acceptor built from a strong word-difference machine, that are now accepted by the acceptor just built from the weak word-difference machine. Because of all the extra complication this entails MAF will usually only try this once, and decide that using the strong word-difference machine is the best policy, but there are some groups for which building the acceptor from a weak word-difference machine does prove better. For example, for examples/coxeter/5335 trial word-acceptors built from the weak word-difference machine are built far more quickly, and have far fewer states than acceptors built from the strong word-difference machine, and so MAF uses these for most iterations of the automatic structure phase for this group.

The handling of strong and weak word-difference machines is somewhat analogous to KBMAG's use of .diff2 and .diff1 word-difference machines, but the weak word-difference machine usually has many more states and transitions between states than the .diff1 machine would, since it contains all known word-differences and all transitions that have been seen in an equation, whether or not the equation is primary.

Non shortlex automatic structures

In the case where MAF is building an automatic structure for a word-ordering other than shortlex a certain amount of pre-processing is performed against strong word-difference machines to eliminate transitions which ought not to be present in the final correct word-difference machine. This usually improves the chances of building a word-acceptor successfully even though it may sometimes cause the word-acceptor to accept words that could in fact have been rejected.

A fuller description of the automatic structure phase

  1. Using the current word-difference machine check that none of the equations that have contributed to the word-differences have a reducible prefix or right hand side. Also check that no words of the form x*wd*y where x and y are generators or IdWord, and wd is a word-difference, have previously unknown reductions. Perform consistency checks on the word-difference machine. Rebuild it if need be.
  2. Construct a candidate word-acceptor from the current word-difference machine. MAF uses a much simpler set of states for the word-acceptor than KBMAG, which the author believes recognises more reducible words in the case where some word-differences are missing, and which he also believes is less inclined to exponential behaviour. Put simply, the word-acceptor accepts reductions where there are a padding characters in the interior of the RHS word, which KBMAG goes to a lot of trouble to exclude, and for no obvious reason, since such reductions certainly prove a word reducible if they occur, and if they do not at least will tend to increase the similarity between earlier and later states, which will lead to more chances of the acceptor closing early.
  3. Estimate the number of states that will be present in the unminimised multiplier. If it is very large, because the word-acceptor is very large, then do one or more of the following:
    1. If this is the first time we have tried to build the multiplier, and the word-acceptor is improbably large speculatively create all the equations present in the defining states of the word-acceptor and the L1-acceptor. Assuming this finds some missing word-differences return to step 1.
    2. If the word-acceptor is so large that the multiplier will be difficult to calculate, but we did not perform the action just described in i.), then compute the L1-acceptor and the "primary recogniser". The primary recogniser is an FSA similar to a multiplier which accepts a word pair (u,v) if u is minimally reducible, v is accepted and u=v. Then calculate the "exists" FSA of this acceptor and use the "and not" of the L1-acceptor and this FSA to find some words that are known to be reducible, but for which we do not have a complete set of word-differences. If the "primary recogniser" will itself be too big to construct MAF will simply explore the L1-acceptor looking for equations with new word-differences.
    3. If ii) does not succeed in finding any new differences (which means that the word-acceptor may be correct and that we do now know all the primary differences) we perform the same checks that would be made when checking the multiplier, but without actually constructing it. Assuming this finds missing word-differences then return to step 1. This step is new to MAF. It is expensive, but less expensive than failing altogether to construct a multiplier because it is too big. Also this validation can detect errors earlier in the tree than can the multiplier checks, because it can detect pairs of words v,w, for which there exists a word u such that u,v and u,w give rise to the same difference. If such words are found then v=w and the word-acceptor is wrong after all.
    4. If iii) finds errors but they none of them could be corrected return to the Knuth-Bendix phase. If we aborted the checks before fully completing them MAF may decide to explore the word-acceptor to look for secondary word-differences at points where these is a partial reduction. In practice neither of these things happens. In any case step iii) is only carried out for the most difficult groups and coset systems.
  4. Construct a candidate multiplier. Generally MAF uses the strongest word-difference machine available, but if the estimate of the number of states in the multiplier is very high it uses the weak word-difference machine, which only contains transitions actually seen in confirmed equations. This will usually reduce the number of states in the multiplier a lot, but failure is more or less certain. However we do not mind that because we think the word-difference machine and acceptor are both wrong.
  5. Check that that the multiplier includes an accept state (u,v) for every accepted word u, and multiplier x.