An Outline of MAF algorithms
This description is how MAF works is only intended to give a flavour of how MAF works, and is not really a very accurate description. Internally MAF is, in the author's opinion, a very complex though, he hopes, elegant piece of code. If you are at all interested you are recommended to refer to the C++ code, which the author has gone to considerable effort to document thoroughly with extensive comments.
This explanation disregards the relatively trivial stages of initially reading the input file, and finally writing the output files. Otherwise the processing of MAF in automata and these other programs falls into three distinct areas, two of which are fairly definitely ordered in time, but all three are essentially co-routines of each other.
- In its initial phase (which may be suppressed altogether) MAF uses a Knuth-Bendix based procedure to discover new equations. MAF will continue to use this procedure until either the process terminates because the rewriting system has become confluent, or until it decides that it should attempt to build the automata for an automatic structure, or some other halting criterion holds and MAF exits with only provisional results. Roughly speaking during this phase of operation MAF is behaving like the KBMAG implementation of kbprog. Let us call this the Knuth Bendix phase.
- Assuming it does decide to build an automatic-structure, at the point where MAF decides to do so, it starts behaving like the KBMAG implementation of gpmakesfa. Let us call this the Automatic Structure phase.
(Although the automatic structure phase is only properly enabled for automatic groups and for automatic coset systems, the "automatic" code is in fact responsible for building all the FSAs that automata builds - it is for example responsible for turning MAF's very different internal data structures that represent a confluent rewriting system into the corresponding .kbprog and .reduce files). Even difference machines are turned from an internal representation into their final form via this code.
- The third area of processing is the maintenance of a word-difference machine. This processing is only performed if MAF is going to build an automatic structure. So it is never used if the input file describes a monoid rather than a group, or if the user has specifically requested that such processing should not be carried out. This part of the processing happens more or less equally in both phases, except that if MAF has been told to expect the rewriting system to become confluent it does not begin until it has. I call this area of processing "Difference Tracking".
In fact the distinction between areas of processing, though helpful in terms of understanding how MAF works, and for relating MAF to KBMAG, is much less clear in practice than this explanation suggests. For example, the maintenance of the word-difference machine may quite strongly affects how the Knuth-Bendix phase proceeds, since MAF uses information that this phase returns to help decide which equations should be considered next. The Difference Tracker is also quite capable of itself discovering new equations (because it enforces various consistency requirements in the difference machine), and these are inserted into the first phase's reduction FSAs just like equations discovered in the ordinary way.
Again, even the distinction between the Knuth-Bendix phase and the Automatic Structure phase is not clear-cut. For example if it is discovered during the latter phase that the rewriting system should be confluent, this phase will termporarily abort itself and Knuth Bendix phase is resumed until confluence really has been achieved. Also "corrections" discovered by the various checks in the Automatic Structure phase are again fed into the reduction FSA, and not just the difference machine, and this can result in the discovery of many further equations and word differences. And, in principle, MAF can decide for other reasons that it made a poor decision when it switched to the Automatic Structure phase, go back to to Knuth Bendix phase, and then try its Automatic Structure phase again later. In fact this is very rare, because during the development of MAF it has become the case that the "correction" capabilities of the Automatic Structure phase have become strong enough that generally speaking they can succeed in building the structure even if the Knuth Bendix phase is omitted entirely. In fact for many "easy" infinite groups this is actually the fastest way of proceeding. The reason for keeping the Knuth Bendix phase is two-fold:
For groups with many word differences, especially largish finite groups, MAF's Knuth Bendix is far superior to its Automata building phase in finding equations quickly, and for such groups attempting to build an automatic structure, which in any case is not really needed since we can find a confluent rewriting system, would fail because the automata would be too big prior to minimisation.
Even for groups where Knuth Bendix can be omitted (and this includes all of the most difficult examples
supplied with KBMAG) doing so may waste time, especially when the group is finite.
It is perhaps also worth mentioning that the first phase is far from being a straight-forward Knuth-Bendix implementation:
- The Knuth Bendix implementation does not need to laboriously compare every possible pair of equations. It has a data structure which allows it to know which equations possibly could overlap and where, and in fact this data structure is the same as the reduction FSA, which is not implemented as an FSA during this phase even though it can function like one. This data structure is called Rewriter_Machine in the MAF source code.
- MAF will create conjugates of each equation as it is created, and when relevant ask the word difference machine to learn all of them. In MAF conjugation may also include consideration of all overlaps of the form xL and Ly where L is the original LHS.
- Whenever a new equation is inserted, MAF notes, and schedules for later consideration, any particularly
promising overlaps. These are considered outside the regular schedule of KB expansion and are called "special overlaps". If a collapse occurs there will often be very many of these. The amount of "special overlap" processing is configurable.
- As each equation is created it immediately forces all equations in which its LHS occurs as a subword of the LHS or RHS to be either eliminated and/or scheduled for correction. This frequently results in many new equations. (This is quite a delicate operation because the Rewriter_Machine is modifying itself, and deleting parts of a structure to which there are potentially many outstanding pointers: for example the Difference Tracker holds pointers to word differences, and these are stored as states within Rewriter_Machine. So there is a reference counting scheme to keep such data alive until all the pointers are released: the data is removed from the tree but knows enough about where it came from to be able to reconstruct its former context)
- When word differences become reducible, or when equations which "own" word-differences are eliminated, MAF will tell the word difference machine to correct itself. This may also result in the discovery of new equations.
- If MAF finds a word for which the reduction of the inverse of the reduction of the inverse is not the word it first thought of, it creates an equation. Normally this only happens when word differences are being computed, because otherwise MAF has no reason to look at inverses. By default, inverses are only computed when word differences are required, because computation of inverses usually results in a big increase in the number of nodes, since typically every subword of a node needs to be created in this case. In fact for some word orderings they are not calculated even in that case, and word differences are calculated so far as possible without using inverses.
- When MAF finds two LHSes with a common cancellable final symbol and an identical RHS it creates a new equation (All equations are grouped by RHS so this is obvious to it). (A symbol would be cancellable on the right if there is an equation with an empty RHS in which it is the first symbol of the LHS). MAF is, of course, very careful not to assume cancellation applies generally - it is a program that works with monoids, not just groups). A similar check is performed for LHS's with a common prefix, but only if both equations are primary.
- If MAF notices that a word that has a supposedly irreducible prefix can in fact be reduced more than once, and that the final word is "too short" it creates an equation.
- From time to time MAF may use the current state of the difference machine (if it exists) to look for words that ought to be reducible but which the reduction FSA does not know are reducible.
- MAF limits the size of the reduction FSA may putting "uninteresting" long new equations in a "pool". Periodically the pool is checked, and more equations are put into the reduction FSA. This is very important for the good performance of MAF with "pathological" presentations. MAF does discard some equations, but only
if they can be reconstructed later on.
- In fact the volume of equations that are produced other than by the main Knuth Bendix procedure often exceeds by a wide margin those found by it, and there is a "Job Manager", which decides which of the various pending tasks to do next.
The procedure for building automatic structure
In essence MAF follows a similar scheme to KBMAG for building the automatic structure:
- Construct a candidate word acceptor from the current word difference machine
- Construct a candidate multiplier
- Check that that the multiplier includes an accept state (u,v) for every accepted word u, and multiplier x. If not then ux is reducible and has an unknown word difference. Update the difference machine for all such u, until either the validation of the multiplier closes, or some abort condition is fulfilled. Then return to step 1.
In this outline the algorithm appears simpler than KBMAG's which once it has a word acceptor at all only builds a new word acceptor if step 2 fails. In MAF step 2 usually cannot fail because the word difference machine is maintained in a way that guarantees that if ux=v is recognised as a reduction, then so also is vX=u. Building a word-acceptor from a difference machine without this property is the the only reason that building the multiplier can fail, provided the word acceptor has been built from the strongest word difference machine currently available. MAF will only keep word acceptors from one iteration to the next if the word acceptor appears to be stable, and it takes longer to build than the multiplier. MAF does usually use the strongest word difference machine when building the word acceptor. However, if building the word acceptor takes a long time it may try to use the weaker word difference machine that only contains the transitions that have actually been proved. In this case building the multiplier can fail. MAF will try to check for this in advance by looking for words that were rejected by the last word acceptor built from a strongest difference machine that are accepted by the acceptor built from the weaker word difference machine. Because of all the extra complication this entails MAF will usually only try this once, and decide that using the strongest word difference machine is the best policy, but there are some groups for which building the acceptor from a weaker word difference machine does prove better. For example, for examples\coxeter\5335 trial word acceptors built from the weak word difference machine are built far more quickly, and have far fewer states than acceptors built from the strong word difference machine. In the case where MAF is building an automatic structure for a word-ordering other than shortlex a certain amount of pre-processing is performed against strong word difference machines to eliminate transitions which ought not to be present in the final correct word difference machine. This usually improves the chances of building a word acceptor successfully even though it may sometimes cause the word acceptor to accept words that could in fact have been rejected.
There are several other differences from KBMAG, which are actually extra stages in the outline algorithm shown above, which is actually more like this:
- Using the current word difference machine check that none of the equations that have contributed to the word differences have a reducible prefix or right hand side. Also check that no words of the form x wd y where x,y, are generators or IdWord, and wd is a word difference have previously unknown reductions. Perform consistency checks on the word difference machine. Rebuild it if need be.
- Construct a candidate word acceptor from the current word difference machine. MAF uses a much simpler set of states for the word-acceptor than KBMAG, which the author believes recognises more reducible words in the case where some word differences are missing, and which he also believes is less inclined to exponential behaviour. Put simply, the word acceptor accepts reductions where there are a padding characters in the interior of the RHS word, which KBMAG goes to a lot of trouble to exclude, and for no good reason, since such reductions cetainly prove a word reducible if they occur, and if they do not at least will tend to increase the similarity between earlier and later states, which will lead to more chances of the acceptor closing early.
- Estimate the number of states that will be present in the unminimised multiplier. If it is very large, because the word acceptor is very large, then do one or more of the following:
- If this is the first time we have tried to build the multiplier, and the word acceptor is improbably large speculatively create all the equations present in the defining states of the word acceptor and the L1 acceptor. Assuming this finds some missing word differences return to step 1.
- If the word acceptor is so large that the multiplier will be difficult to calculate, but we did not perform the action just described in i.), then compute the L1 acceptor and the "primary recogniser". The primary recogniser is an FSA similar to a multiplier which accepts a word pair (u,v) if u is minimally reducible, v is accepted and u==v. Then calculate the "exists" FSA of this acceptor and use the "andnot" of the L1 acceptor and this FSA to find some words that are known to be reducible, but for which we do not have a complete set of word differences. If the "primary recogniser" will itself be too big to construct MAF will simply explore the L1 acceptor looking for equations with new word differences.
- If ii) does not succeed in finding any new differences (which means that the word acceptor is probably correct and that we knew all the primary differences) we perform the same checks that would be made when checking the multiplier, but without actually constructing it. Assuming this finds missing word difference then return to step 1. This step is new to MAF. It is expensive, but less expensive than failing altogether to construct a multiplier because it is too big. Also this validation can detect errors earlier in the tree than can the multiplier checks, because it can detect pairs of words v,w, for which there exists a word u such that u,v and u,w give rise to the same difference. If such words are found then v=w and the word acceptor is wrong.
- Construct a candidate multiplier. Generally MAF uses the strongest word difference machine available, but if the estimate of the number of states in the multiplier is very high it uses the smaller difference machine which is the one that contains all transitions actually seen in confirmed equations. This will usually reduce the number of states in the multiplier a lot, but failure is more or less certain. However we do not mind that because we think the word-difference machine and acceptor are both wrong.
- Check that that the multiplier includes an accept state (u,v) for every accepted word u, and multiplier x.
- After a round of multiplier checks has failed MAF will consider whether to return to Knuth Bendix phase, and in any case may well take a look at its pool of equations to see if any have now become interesting.