MAF : Appendix C : How MAF works

MAF : Appendices : (C) How MAF works

Appendix C: How MAF works

This description is how MAF works is intended to give a flavour of how MAF looks for an automatic structure or a confluent rewriting system, not as a complete description. If you are at all interested you are recommended to refer to the C++ code, which the author has gone to considerable effort to document thoroughly with extensive comments.

This explanation disregards the relatively trivial stages of initially reading the input file, and finally writing the output files. Otherwise the processing of MAF in automata and kbprog falls into three distinct areas, two of which are fairly definitely ordered in time, but all three of which are essentially co-routines of each other.

MAF's three parts - the simplified version

In its initial phase (which may be suppressed altogether) MAF uses the Knuth-Bendix procedure to discover new equations. MAF will continue to use this procedure until either the process terminates because the rewriting system has become confluent, or until it decides that it should attempt to build the automata for an automatic structure, or some other halting criterion holds and MAF exits with only provisional results. Roughly speaking during this phase of operation MAF is behaving like the KBMAG implementation of kbprog. Let us call this the Knuth-Bendix phase. This phase is managed by a committee of C++ objects, the most important of which is called Rewriter Machine, which is responsible for most high-level decisions about this phase. Rewriter Machine relies on several other important objects: Node Manager (which manages almost all the data stored in memory, and can be viewed both as an index automaton, a partial coset table, and a rewriting system: the equations in the rewriting system exist only as paths between nodes in this object), Job Manager (which prioritises and actually carries out much of the work of maintaining the data in Node Manager), and Equation Manager (which decides what to do with equations that these and other objects ask to construct). Even if the Knuth-Bendix phase is suppressed entirely these are still very important objects, because all changes to the rewriting system are carried out by them. Also every putative element of the object in which other parts of the code may be interested, notably every word-difference, is represented as a node in Node Manager. A less powerful object called the pool (an instance of a classed named Equation DB) is also important for understanding the detailed operation of the Knuth-Bendix phase.
Assuming it does decide to build an automatic structure, at the point where MAF decides to do so, it starts behaving like the KBMAG implementation of gpmakesfa. Let us call this the automatic structure phase.
In fact a better term for this phase would be the output phase since its code is in fact responsible for creating all the output files that automata may generate - it is for example responsible for turning MAF's very different internal data structures that represent a confluent rewriting system into the corresponding .kbprog and .reduce files). Word-difference machines are also turned from an internal representation into their output form via this code.
The third area of processing is the maintenance of a word-difference machine. This processing is usually only performed if MAF is going to build an automatic structure. It is never used if the input file describes a monoid rather than a group, or if the user has specifically requested that such processing should not be carried out. This part of the processing happens more or less equally in both phases, except that if MAF has been told to expect the rewriting system to become confluent it does not begin until it has. This area of processing will be referred to as difference tracking, and it is managed by a C++ object called Difference Tracker.

MAF's three parts - the not so simplified version

In fact the distinction between these three areas of processing, though helpful in terms of understanding how MAF works, and for relating MAF to KBMAG, is much less clear in practice than the previous section suggests.

Difference Tracker interactions with Rewriter Machine

For example, Difference Tracker may quite strongly affects how the Knuth-Bendix phase proceeds, since MAF uses information that this phase returns to help decide which equations should be considered next. Difference Tracker is also quite capable of itself discovering new equations (because it enforces various consistency requirements in the word-difference machine), and Difference Tracker will ask Node Manager to insert these just like equations discovered in the course of Knuth-Bendix completion (though both Job Manager and Equation Manager will get a say as to when and if this gets done). Conversely, Node Manager may detect errors in the word-difference machine, which Job Manager will then schedule for correction, and at the appropriate time it will inform Difference Tracker of the details of the error to be corrected, and ask it to perform the correction. In the event of a major collapse Rewriter Machine may decide to delete Difference Tracker altogether, and then recreate it when the rewriting system has stabilised again.

Interactions between automatic structure phase and Knuth-Bendix phase

Again, even the distinction between the Knuth-Bendix phase and the automatic structure phase is not clear-cut. For example, if it is discovered during the latter phase that the rewriting system should be confluent, the automatic structure phase will temporarily abort itself and the Knuth-Bendix phase is resumed until confluence really has been achieved. Also "corrections" discovered by the various checks in the automatic structure phase are again fed into the Rewriter Machine, and not just into the word-difference machine (as they would be in KBMAG), and this can result in the discovery of many further equations and word-differences. And, in principle at least, MAF can decide that it made a poor decision when it switched to the automatic structure phase, go back to to Knuth-Bendix phase, and then try the automatic structure phase again later. In fact, this is rarely occurs, because the "correction" capabilities of the automatic structure phase are usually strong enough that they can succeed in building the automatic structure even if the Knuth-Bendix phase is omitted entirely. In fact for many "easy" infinite groups this is actually the fastest way of proceeding. The reason for keeping the Knuth-Bendix phase is two-fold:

For groups with many word-differences, especially largish finite groups, MAF's Knuth-Bendix phase is far superior to its automatic structure phase at finding equations quickly, and for such groups attempting to build an automatic structure, which in any case is not really needed since we can find a confluent rewriting system, would fail because the automata would be too big prior to minimisation.
Even for groups where Knuth-Bendix can be omitted (and this includes all of the most difficult examples supplied with KBMAG) doing so may waste time, especially when the group is finite.

Additional processing in the Knuth-Bendix phase

The "Knuth-Bendix" phase is far from being a straight-forward Knuth-Bendix implementation:

Fast overlap detection

The Knuth-Bendix implementation does not need to laboriously compare every possible pair of equations. The Node Manager object allows it to quickly create a list of which equations possibly could overlap and where, so that overlaps can be discovered very quickly. Unless you ask MAF to look for non-minimal overlaps, this list will only include the minimal overlaps (an overlap is minimal if no subword that does not include either the first or the last symbol in the overlap word is reducible). In fact MAF can even afford to build the list of overlaps for any particular equation, and consider these overlaps, multiple times, and it may well choose to do so, because the equations from many overlaps will usually be discarded initially, because they have properties that the "committee" regards as being undesirable.
Conjugation
MAF will create (through Job Manager) the right conjugate of each equation that is created (i.e. it will turn u*g = v into v*G = u), which is analogous to how transitions are inserted into a coset table during coset enumeration. Later on Job Manager may form other conjugates, and when relevant, ask Difference Tracker to learn all the new equations. In MAF conjugation may also include consideration of all overlaps of the form x*L and L*x where L is the original LHS and x is a generator.
Special overlaps
Whenever a new equation is inserted, MAF notes, and schedules for later consideration, any particularly promising overlaps (detection of at least some overlaps is an inevitable by-product of MAF's mode of operation, but not all the overlaps detected in this way are promising). Any such overlaps are considered outside the regular schedule of Knuth-Bendix completion and are called "special overlaps". If a collapse occurs there will often be very many of these. The amount of "special overlap" processing is configurable, and it can be disabled altogether if desired.
Partial reductions
MAF notes points where the Node Manager contains a partial reduction, that is a transition to an LHS which is less than the full length of the word that has just been read. If the word read is u*g, then MAF will consider the non-minimal overlap u*g*g^-1 (where this is an overlap), as this may well prove that u is reducible. This is very similar to scanning a coset in a coset table with a relator. This processing may also be turned off, but it is generally beneficial, sometimes very much so.
Fast word reduction

Node Manager functions as a fast index automaton, which means that when a word is being reduced there is never any need to backtrack except when a rewrite is done. In KBMAG word reduction during the Knuth Bendix phase has to backtrack whenever the index automaton gets to a "no LHS here state", but there are no such states in Node Manager, because "prefix change" transitions are constantly maintained.

As each equation is created it immediately forces all equations in which its LHS occurs as a subword of the LHS or RHS to be either eliminated (if the LHS occurs as a subword in the prefix of the LHS), demoted (if the LHS occurs as a trailing subword of the original LHS), or scheduled for correction (if the LHS occurs as a subword of the RHS). When an equation is demoted MAF needs to consider the overlap now formed by the old and new LHS. A new primary equation may result, and MAF may or may not decide to keep the old LHS, perhaps after updating its RHS. All this processing may result in many new equations.

Doing either of these things has been dismissed as impractical in most other Knuth-Bendix implementations, due to the overhead it entails. In theory MAF might need to examine every node multiple times for every equation it inserts to make fast reduction possible. However MAF maintains data which usually allows it to exclude large parts of Node Manager's data from consideration when this is done. In practice MAF's current approach seems to work very well the majority of the time; in fact it is largely only doing the work it would have to do anyway at the time it wanted to find the overlaps of the new equation were the data not maintained as it is.

The updating is quite a delicate operation because Node Manager is modifying itself, and deleting parts of a structure to which there are potentially many outstanding pointers: for example the Difference Tracker holds pointers to word-differences, and these are stored as states within Node Manager. So there is a reference counting scheme to keep such data alive until all the pointers are released: the data is removed from the tree but knows enough about where it came from to be able to reconstruct its former context.
Word-difference corrections

When word-differences become reducible, or when equations which "own" word-differences are eliminated, MAF will tell the word-difference machine to correct itself. The first of these may processes result in the discovery of new equations: the relevant code is very similar to coincidence processing in a coset enumerator, because two word-differences may merge into one, which can have dramatic consequences.
If MAF finds a word for which the reduction of the inverse of the reduction of the inverse is not the original word, it creates an equation. This only happens when word-differences are being computed using "half differences", because otherwise MAF has no reason to look at inverses.
Deductions

When MAF finds two LHS with a common cancellable final symbol and an identical RHS it creates a new equation (All equations are grouped by RHS so this is obvious to it). (A symbol would be cancellable on the right if there is an equation with an empty RHS in which it is the first symbol of the LHS). MAF is, of course, very careful not to assume cancellation applies generally - it is a program that works with monoids, not just groups). A similar check is performed for LHS with a common prefix, but only if both equations are primary.

Equations created through this and other similar checks, are called deductions (equations discovered by Difference Tracker are also treated as deductions, and right conjugation is handled through a deduction as well). Though all these equations would eventually be discovered by Knuth-Bendix completion, it is beneficial to discover them early as there is very little overhead in doing so.
Spurious word-difference elimination

From time to time MAF may use the current state of the word-difference machine (if it exists) to look for words that ought to be reducible but which Rewriter Machine/Node Manager does not yet know are reducible. Such words often lead to the creation of spurious word differences, so eliminating them helps MAF determine when word-differences are stabilising.
The pool

MAF limits the amount of data in Node Manager by putting "uninteresting" long new equations into the pool, or even discarding them altogether. Periodically the pool is checked, and more equations from it are put into Node Manager. This is very important for the good performance of MAF with "pathological" presentations. MAF does discard some equations, but only if they will be reconstructed later on. In each phase of the Knuth-Bendix procedure the various objects that make up Rewriter Machine have to abide by a "contract". Initially the contract is flexible, and allows the "committee" to throw away many equations and ignore unpromising overlaps. Gradually the contract becomes progressively more onerous, forcing MAF to retain more and more data in each phase, so that if there is a finite confluent rewriting system, it will be discovered. This mode of operation makes MAF function rather like a Felsch style coset enumerator, because it will try to discover as many short equations as possible before creating long ones. On the other hand it does consider long overlaps early, so that a short equation that arises from a long overlap will be discovered as soon as possible. This style of processing probably accounts for MAF's outstanding performance with the degen4c presentation

The procedure for building automatic structure

The simplified version

In essence MAF follows a similar scheme to KBMAG for building the automatic structure:

Construct a candidate word-acceptor from the current word-difference machine
Construct a candidate multiplier
Check that that the multiplier includes an accept state (u,v) for every accepted word u, and multiplier x. If not then u*x is reducible and has an unknown word-difference. Update the difference machine for all such u, until either the validation of the multiplier closes, or some abort condition is fulfilled. Then return to step 1.

In this outline the algorithm appears simpler than KBMAG's which once it has a word-acceptor at all only builds a new word-acceptor if step 2 fails. In MAF step 2 usually cannot fail because the word-difference machine is maintained in a way that guarantees that if u*x = v is recognised as a reduction, then so also is v*X = u. Building a word-acceptor from a difference machine without this property is the the only reason that building the multiplier can fail, provided the word-acceptor being used has been built from the strongest word-difference machine currently available. MAF will only keep word-acceptors from one iteration to the next if the word-acceptor appears to be stable, and it takes longer to build than the multiplier.

Although this outline omits steps which we shall come onto later, it would be the path taken through the code for "easy" input files.

Strong and weak word-difference machines

MAF does usually use the strongest word-difference machine when building the word-acceptor, which contains all possible transitions between all known word differences. However, if building the word-acceptor takes a long time MAF may try to use a weak word-difference machine, which only contains the transitions that have actually been proved to be necessary through their being needed to read the equation using the word-difference machine. In this case building the multiplier can fail. MAF will try to check for this in advance by looking for words that were rejected by the last word-acceptor built from a strong word-difference machine, that are now accepted by the acceptor just built from the weak word-difference machine. Because of all the extra complication this entails MAF will usually only try this once, and decide that using the strong word-difference machine is the best policy, but there are some groups for which building the acceptor from a weak word-difference machine does prove better. For example, for examples/coxeter/5335 trial word-acceptors built from the weak word-difference machine are built far more quickly, and have far fewer states than acceptors built from the strong word-difference machine, and so MAF uses these for most iterations of the automatic structure phase for this group.

The handling of strong and weak word-difference machines is somewhat analogous to KBMAG's use of .diff2 and .diff1 word-difference machines, but the weak word-difference machine usually has many more states and transitions between states than the .diff1 machine would, since it contains all known word-differences and all transitions that have been seen in an equation, whether or not the equation is primary.

Non shortlex automatic structures

In the case where MAF is building an automatic structure for a word-ordering other than shortlex a certain amount of pre-processing is performed against strong word-difference machines to eliminate transitions which ought not to be present in the final correct word-difference machine. This usually improves the chances of building a word-acceptor successfully even though it may sometimes cause the word-acceptor to accept words that could in fact have been rejected.

A fuller description of the automatic structure phase

Using the current word-difference machine check that none of the equations that have contributed to the word-differences have a reducible prefix or right hand side. Also check that no words of the form x*wd*y where x and y are generators or IdWord, and wd is a word-difference, have previously unknown reductions. Perform consistency checks on the word-difference machine. Rebuild it if need be.
Construct a candidate word-acceptor from the current word-difference machine. MAF uses a much simpler set of states for the word-acceptor than KBMAG, which the author believes recognises more reducible words in the case where some word-differences are missing, and which he also believes is less inclined to exponential behaviour. Put simply, the word-acceptor accepts reductions where there are a padding characters in the interior of the RHS word, which KBMAG goes to a lot of trouble to exclude, and for no obvious reason, since such reductions certainly prove a word reducible if they occur, and if they do not at least will tend to increase the similarity between earlier and later states, which will lead to more chances of the acceptor closing early.
Estimate the number of states that will be present in the unminimised multiplier. If it is very large, because the word-acceptor is very large, then do one or more of the following:
1. If this is the first time we have tried to build the multiplier, and the word-acceptor is improbably large speculatively create all the equations present in the defining states of the word-acceptor and the L1-acceptor. Assuming this finds some missing word-differences return to step 1.
2. If the word-acceptor is so large that the multiplier will be difficult to calculate, but we did not perform the action just described in i.), then compute the L1-acceptor and the "primary recogniser". The primary recogniser is an FSA similar to a multiplier which accepts a word pair (u,v) if u is minimally reducible, v is accepted and u=v. Then calculate the "exists" FSA of this acceptor and use the "and not" of the L1-acceptor and this FSA to find some words that are known to be reducible, but for which we do not have a complete set of word-differences. If the "primary recogniser" will itself be too big to construct MAF will simply explore the L1-acceptor looking for equations with new word-differences.
3. If ii) does not succeed in finding any new differences (which means that the word-acceptor may be correct and that we do now know all the primary differences) we perform the same checks that would be made when checking the multiplier, but without actually constructing it. Assuming this finds missing word-differences then return to step 1. This step is new to MAF. It is expensive, but less expensive than failing altogether to construct a multiplier because it is too big. Also this validation can detect errors earlier in the tree than can the multiplier checks, because it can detect pairs of words v,w, for which there exists a word u such that u,v and u,w give rise to the same difference. If such words are found then v=w and the word-acceptor is wrong after all.
4. If iii) finds errors but they none of them could be corrected return to the Knuth-Bendix phase. If we aborted the checks before fully completing them MAF may decide to explore the word-acceptor to look for secondary word-differences at points where these is a partial reduction. In practice neither of these things happens. In any case step iii) is only carried out for the most difficult groups and coset systems.
Construct a candidate multiplier. Generally MAF uses the strongest word-difference machine available, but if the estimate of the number of states in the multiplier is very high it uses the weak word-difference machine, which only contains transitions actually seen in confirmed equations. This will usually reduce the number of states in the multiplier a lot, but failure is more or less certain. However we do not mind that because we think the word-difference machine and acceptor are both wrong.
Check that that the multiplier includes an accept state (u,v) for every accepted word u, and multiplier x.

Appendix C: How MAF works

MAF's three parts - the simplified version

MAF's three parts - the not so simplified version

Difference Tracker interactions with Rewriter Machine

Interactions between automatic structure phase and Knuth-Bendix phase

Additional processing in the Knuth-Bendix phase

Fast overlap detection

Conjugation

Special overlaps

Partial reductions

Fast word reduction

Word-difference corrections

Deductions

Spurious word-difference elimination

The pool

The procedure for building automatic structure

The simplified version

Strong and weak word-difference machines

Non shortlex automatic structures

A fuller description of the automatic structure phase