In this section, we describe the utilities that are provided for manipulating finite state automata. You may like to note that here and elsewhere the author has used the abbreviation FSA to refer to a single finite state automaton, and the abbreviation FSAs for the plural, despite his classical education.
fsaand [loglevel] [format] [-first] filename1 filename2 [output_file]
Two finite state automata, which must have the same alphabet, are read in from the files filename1 and filename2. An automaton that accepts a word w in the alphabet if and only if both of the input automata accept w is computed, minimised, and output.
The [-first] option modifies the language of the output automaton in the following way: The output automaton accepts a word w if both input automata accept it, and no prefix of w was accepted by both automata (so that any accepting state is also final).
fsaandnot [loglevel] [format] [-first] filename1 filename2 [output_file]
Two finite state automata, which must have the same alphabet, are read in from the files filename1 and filename2. A new automtaon is computed and output. The output automaton accepts a word w if and only if the first input automaton accepts w but the second input automaton does not.
The [-first] option modifies the language of the output automaton in the following way: The output automaton accepts a word w if the first accepts w but the second does not, and no prefix of w was accepted by the first and rejected by the second.
"and not" automata are very often useful where one is trying to construct an automaton whose language should be closed under some operation. The general method of proceeding is to first construct a candidate automaton, then to construct a candidate two-variable automaton which should accept (w,op(w)), then form the "exists" automaton of this and then combine it with the candidate automaton using and_not. The aim is for the "and not" automaton to have an empty language. If not then one submits some of its accepted words to some error correcting code. Indeed, this is more or less how automata constructs the word acceptor and general multiplier for an automatic group. In fact the internal equivalent of the [-first] option is used to limit the size of the output language.
A finite state automaton is read in, and then its states are permuted into bfs-order (bfs = breadth-first-search), and it is printed out again. This means that the states are numbered 1,2, ..., n, and if one examines the transition-table, in order of increasing states, then the states first occur in sequence. fsamin and fsabfs used together can be used to check whether two deterministic automata with the same alphabet have the same language. First apply fsamin and then fsabfs. If they have the same language, then the resulting automata should be identical. In fact this is exactly what fsalequal does.
The default output filename is input_file.bfs.
Two finite state automata are read in, which should both be single-variable automata using the same alphabet. A new finite state automaton is computed, minimised, and output. The output automaton is a two variable FSA that accepts a word pair (u,v) if and only if the first automaton accepts u and the second automaton accepts v. The accepted language is therefore the cartesian product of the two input languages. If the input automata were word acceptors for two groups, the output automaton is in effect the word acceptor for the direct product of the two groups. Output is to file3 if three filenames are specified and to stdout otherwise
Two finite state automata are read in, which should both be two-variable automata using the same alphabet. A new finite state automaton is computed, minimised, and output. The output automaton is a two variable FSA that accepts a word pair (u,v) if and only if there is some w such that the first automaton accepts (u,w) and and the second automaton accepts (w,v). If the input automata were the multipliers for words w1 and w2 respectively in some group or coset system, then the output automaton is the multiplier for the word w1w2. Presumably because of this, and bcause this is almost the only practical use made of this operation, the KBMAG version of this utility is called gpcomp, but the method of construction is in principle applicable to any two two- variable FSA, not just multipliers, and could, for example, be used to help verify whether some FSA that purported to encode a total order of the words in the alphabet actually did so. Output is to file3 if three filenames are specified and to stdout otherwise
fsaconcat [loglevel] [format] filename1 filename2 output_file
Two finite state automata, which must have the same alphabet, are read in from the files filename1 and filename2. An new automaton with the same alphabet is computed and output. The new automaton accepts a word if and only if is identical to a concatenation of two words w1w2, where w1 is accepted by the first input automaton and w2 by the second.
fsacount [loglevel] [-is n] -i | input_file [start_word1] start_word2
A finite state automaton is read in, and the size of the accepted language is counted. Unless you specify the -silent the result of the calculation is then printed as part of a message. If you do specify -silent then the only output sent to stdout is a number representing the answer (which may of course be infinite). If the size of the accepted language is infinite the number output is -1. If the size of the input language is not infinite, but is greater than or equal to 2^32-2, the number output is -2. Otherwise the output is the normal decimal representation of the size of the language.
If the FSA is one-variable and start_word1 is specified then the size of the accepted language from that point is computed: i.e. the size of the intersection between the accepted language of the FSA and the set consisting of words having start_word1 as a prefix and start_word1 itself. If the FSA is two-variable then the size of the language from the pair [word1,word2] is similarly computed.
If you specify the -is n option, then the initial_state of the automaton is first changed to n, and the resulting automaton is minimised. This is included for compatibility with KBMAG. The purpose of this option was to facilitate the computation of the index of a subgroup. In MAF the same result will be achieved in a less obscure manner by using the start_word1,start_word2 facility.
fsacut [loglevel] [format] -i | [input_file] [-o | output_file]
A finite state automaton is read in, and a modified automaton computed and then output. The modified automaton has the same states and transitions as the original automaton, except that any transitions which create a loop are removed. Therefore the accepted language is only changed if the original FSA had an infinite accepted language, and the new language is the largest finite subset of this that can be recognised by an FSA with the same number of states as the original FSA.
fsadiagonal [loglevel] [format] -i | [input_file] [-o | output_file]
A finite state automaton, which must be two-variable, is read in, and a single-variable automaton is computed and then output. The computed automaton accepts a word u if and only if the input automaton accepted (u,u). This utility, together with gpmult could be used to extract the identity multiplier from a general multiplier, and then to convert it to the word acceptor.
fsaenumerate [-bfs | -dfs] min max -i | [input_file] [-o | output_file] [-l] [-s] [-equation] [-is n] [-sw w1] [-rw w2]
A finite state automaton is read in. min and max should be non-negative integers with min ≤ max. The words in the accepted language having lengths at least min and at most max are enumerated, and output as a GAP list of words.
If the option -dfs is called (depth-first search - the default), then the words in the list will be in lexicographical order, whereas with [-bfs] (breadth-first-search), they will be in order of increasing length, and in lexicographical order for each individual length (i.e. in shortlex order). Depth-first-search is marginally quicker.
If the -l option is specified the number of the label of the accepting state is printed with each accepted word or word pair. If the-s option is specified then the number of the accepting state is printed with each accepted word or word pair. If the -equation option is specified and the input FSA is a multiplier, then the accepted word pairs are printed out in the form of equations.
If you are enumerating words to feed into a C++ application it is better to link MAF directly into your program and use MAF's FSA::Word_Iterator class, because in this case you can use a flexible scheme to decide how far to explore each FSA. Or you can simply write your own code to load and run the FSA if you prefer.
The default output filename is input_file.enumerate.
fsaexists [loglevel] [format] [-sticky] [-swapped] -i | input_file output_file
A two-variable finite state automaton is read in, and a one-variable automaton, which accepts a word u if and only if the input automaton accepts (u,v) for some word v. If the -sticky option is specified then the output automaton accepts a word u provided (p(u),v) is accepted where p(u) is some prefix of u and v is any word. In the case where the input file is a word difference machine then the accepted language is the same with and without -sticky, and the -sticky automaton is much easier to calculate.
If -swapped is specified then the roles of u,v are interchanged, so that the output automaton accepts u if and only if (v,u) is accepted for some v.
The default output filename is input_file.exists, or if -swapped is specified input_file.swapped.exists.
fsakernel [loglevel] [format] [-accept_all | -a] [-no_minimise] [-infinite] [-midfa] -i | [input_file] [output_file]
A finite state automaton is read in, and a modified automaton computed and then output. The modified automaton has the same states and transitions as the original automaton and accepts the same language, except that states to which the automaton cannot return after entering them are made non accepting. In other words, when reading a word, the output automaton will reach its failure state, unless there were infinitely many accepted words having that word as a prefix in the original automaton, and it will not be in an accepting state if the original automaton was in a state than can only occur a finite number of times. So fsakernel is only of any interest when an automaton has an infinite accepted language.
The -accept_all or -a option makes all states that remain after this pruning accept states. The -infinite option extends the set of states that are allowed to remain accepting to include any state that can be reached infinitely often. So in this case the only states that are made not accepting are the states that only occur a finite number of times.
If the -midfa option is specified all the states which can recur are made initial states, and all the other states are removed entirely.
The -no_minimise option prevents the output FSA from being minimised or trimmed, so that it has the same states as the original FSA (unless you used the -midfa option).
fsakernel and fsaprune are similar utilities, and both identify the set of states that can recur. They differ in how they treat the states that can only occur finitely often. fsakernel makes all these states non-accepting, fsaprune only makes them non accepting if they are also states from which the accepted language is finite.
The suffix used with the "." filename is ".kernel".
fsalequal [loglevel] filename1 filename2
Two finite state automata are read in from the files filename1 and filename2. This program tests whether they have the same language. The exit code is 0 if they do, and a non-zero value of 4 or more if not. The various possible non-zero exit code values are:
Exit Code | Meaning |
---|---|
1 | Usage error |
3 | I/O error |
4 | The alphabets are not the same size |
5 | The alphabets are not the same |
6 | The canonical FSAs for the languages have a different number of states |
7 | The canonical FSAS for the languages have different transition tables |
A finite state automaton is read in. An FSA with as few states as possible but with the same accepted language is generated and printed out. If the input FSA has labelled states (including the case where each state has its own unique label), then normally minimisation is subject to the additional condition that the label of the state that is reached by reading the same input is the same in both the input and the output FSA, except that if a word is not accepted and is not a prefix of an accepted word then it will have reached the failure state in the second FSA, but may not have done in the first FSA. So in the case of an FSA with labelled states fsamin may not strictly minimise the FSA, because states which would be combined by a strict minimisation are kept apart because they have different labels. If the -nolabels option is specified, then any labels on the input FSA are removed prior to minimisation. Alternatively, the -mergelabels option, keeps the labels, but ignores them when deciding which states to combine, but afterwards the labels are changed, so that each state in the new FSA has all the labels from the original states. Thus fsamin without -nolabels is roughly equivalent to KBMAG's fsalabmin (except in the case where the input FSA has labels attached to states that are failing), and fsamin -nolabels or fsamin -mergelabels is equivalent to KBMAG's fsamin. The -mergeinitiallabels option is similar to -mergelabels, but does not combine so many states: it keeps apart accept states with different labels, but not other states. Minimising a MIDFA coset multiplier or a determinised multiplier with -mergelabels will invalidate it if it contains more than one multiplier, but it will still be valid if the -mergeinitiallabels option is used.
The default output filename is input_file.min.
A finite state automaton is read in, and an automaton with the same alphabet is computed and output. The output automaton accepts a word w if and only if the input automaton does not accept w.
If the [-first] option is used then the language of the output automaton is modified in the following way: the output automaton accepts a word w if and only if the input automaton does not accept wand the input automaton had not already reached its failure state before reading the last symbol of w. The -first language is usually the more interesting. For example if the input automaton is a word acceptor, then the output with -first is the automaton which accepts just those reducible words which have an irreducible prefix. (MAF refers to this language as the L2 language).
The default output filename is input_file.min, or input_file.firstnot if -first is used.
fsaor [-op d/s] [loglevel] filename1 filename2 output_file
Two finite state automata, which must have the same alphabet, are read in from the files filename1 and filename2. A new automaton with the same alphabet is computed and output. The output automaton accepts a word w in the alphabet if and only if at least one of the two input automata accept w.
fsaprint [loglevel] [format] -i | input_file [-o | output_file]
This merely reads in a finite state automaton and prints it out again. Its main use is for changing the format of an automaton from dense to sparse or vice-versa, or to annotate the FSA so that you can understand it more easily.
The KBMAG equivalent of this utility is called fsafilter. The name has been changed, because, if a user is interested in questions of compatibility between MAF and KBMAG he or she may well want sometimes use MAF's fsaprint to convert an FSA created by KBMAG into the format used by MAF, and sometimes use KBMAG's fsafilter to perform the reverse operation. MAF uses quite a different style of output from KBMAG, which reflects the author's rather different preferences. Perhaps at some future date, extra formatting options will be added, to make fsaprint more flexible.
The default output filename is input_file.print.
fsaprune [loglevel] [format] [-accept_all | -a] -i | [input_file] [-o | output_file]
A finite state automaton is read in, and a modified automaton computed and then output. The modified automaton has the same states and transitions as the original automaton and accepts the same language, except that states from which the accepted language was finite have been removed. In other words, when reading a word, the output automaton will reach its failure state, unless there were infinitely many accepted words having that word as a prefix in the original automaton. In particular, words from the original language that were accepted, but were inextensible, or only finitely extensible, are now rejected, so fsaprune will always produce a completely empty automaton if the original automaton had a finite language. Another, possibly better, name for this program might have been "fsaspine", though "prune" is quite suggestive too - the states that are removed, though not yet dead, are terminally ill. The -accept_all or -a option makes all states that remain after pruning accept states. This is not quite the same as KBMAG's -a option, which makes all states of the input FSA accepting before pruning it. In this case MAF will still correctly remove cycles of failing states from a non-trim input FSA, but KBMAG will not.
It is the author's belief that MAF's fsaprune creates the same automaton that KBMAG's fsaprune is intended to produce. However, the KBMAG version of fsaprune frequently only produces this automaton if its -a option is used, and then only if the input automaton was trim. In other cases KBMAG uses a much more complicated algorithm than seems necessary and produces an automaton which does not seem at all consistent with KBMAG's definition of what fsaprune should do, and for which the present author can see no discernible use.
There are some peculiarities in this program compared with most other MAF utilities: in particular this more or less the only program that outputs an automaton that may be neither "trim" nor "minimised". Frequently the output language is in fact empty unless you use the -accept_all option. When this happens the line accepting := [] will appear in the output. It should also be noted that KBMAG has a -i option for this program, which means something quite different and is not supported. The automaton produced by KBMAG's fsaprune -i also seems to be very obscure, since there is no obvious manner of relating its states to those of the original. However MAF's fsakernel utility does something similar.
The default output filename is input_file.prune.
fsareverse [loglevel] [format] [-s] [-midfa] -i | input_file [-o | output_file]
A finite state automaton is read in, and an automaton with the same alphabet, which accepts the same words, but read in the opposite direction, is computed and output.
If -s is used, then the states of the output automaton will be labelled as lists of integers, which specify the corresponding subsets of the states of the input automaton. This option is not recommended.
If -midfa is used, then the output automaton will be a MIDFA, and the initial states will correspond to the accepting states of the input automaton.
The default output filename is input_file.reverse, or, if with the -midfa option, input_file.mireverse.
The MAF version of fsareverse is much faster than the KBMAG one on large FSAs.
This program is useful if you want to investigate what accepted words a particular word occurs as a subword of. It is also interesting to compare the accepted languages of an automaton and its reverse as this shows how far from being "time symmetric" the language is. fsareverse also shows up the difference between the accepted language of a group and a coset word acceptor: although both languages are prefix closed, when reversed usually only the group word acceptor will still be prefix closed.
fsan [loglevel] [format] -all | [-exact] n -i | input_file [-o | output_file]
This program can create a number of simple automata that are sometimes useful as one of the input automata for utilities that combine more than one automaton. The created automaton always uses the same alphabet as the input automaton, but no other use is made of the input automaton.
If -all is specified then the automaton accepts all words in the alphabet. Otherwise you must specify a number between 0 and 65533, possibly prefixed by -exact. The output automaton accepts all words of length n or less, or of length exactly n, according to whether or not -exact is specified. In principle you could use the combination of fsan and fsaand, to create a truncated version of a language. However, this is unlikely to be a good idea, as the truncated automaton is quite likely to be larger than the original unless the original already had a finite language.
fsapad [loglevel] [format] -all | [-exact] n -i | input_file [-o | output_file]
This program creates the automaton that accepts the language of all correctly padded pairs of words (for the alphabet of the input FSA). In other words it accepts all pairs of words (u,v) which do not contain an interior padding character. It is sometimes useful to "and" this automaton with a word difference machine since the latter normally do not reject input contain interior padding, whereas multipliers do.
fsaproduct [loglevel] [format] file1 file2 file3 [file4]
This program reads three input automata and computes a fourth, which is minimised and output to file4, or to stdout. The first input automaton must be two-variable, the other two must both be single-variable. The computed automaton accepts a word pair (u,v) if and only if the first automaton accepts (u,v) the second accepts u and the third accepts v. In essence this is how the general multiplier is constructed, and it really is how closely related automata such as the primary equation recogniser and reduction recogniser are created.
fsashortlex [loglevel] [format] [-strict] [-lt] -i | input_file [-o | output_file]
This program creates an FSA which decides the shortlex order of two words in the alphabet of the input automaton. The output automaton accepts a pair of words (u,v) if and only if u > v in shortlex order. If the -lt option is specified the automaton instead accepts a pair of words (u,v) if and only if u < v in shortlex order. Usually the output automaton allows interior padding on the side which must contain the lesser word. However, if you specify the -strict option, interior padding is disallowed on both sides.
The default output filename is input_file.shortlex.gt, or, if with the -lt option, input_file.shortlex.lt.
fsastar [loglevel] [format] -i | input_file [-o | output_file]
A finite state automaton is read in and a new automaton with the same alphabet is computed and output. The output automaton accepts a word w if and only if it is the concatenation of 0 or more words accepted by the input automaton.
The default output filename is input_file.star.
fsaswapcoords [format] [loglevel] input_file [-o | output_file]
A two-variable finite state automaton is read in from input_filename. A new automaton that accepts a pair of words (u,v) if and only if the original automaton accepts (v,u) is output to output_file. In KBMAG this functionality is needed for quantifying over the second variable of a two variable automaton with fsaexists, but fsaexists supports this directly in MAF through its -swapped option. The FSA produced by running fsaswapcoords against the the multiplier for a word w is necessarily the multiplier for w^-1, except that the labels also need to be inverted. MAF uses this fact to reduce the number of composite multipliers it needs to build when checking axioms or performing other tasks that require the computation of composite multipliers.
The default output filename is input_file.swapcoords.
A finite state automaton is read in. States which cannot be reached from an initial state are removed, as are states from which no accepting state can be reached. The first category of removed state is called inaccessible, so that all remaining states are accessible. States from which an accepting state can be reached are sometimes called co-accessible. An fsa with no states in either of the offending categories is called trim. The trim operation does not change the language of the fsa. Generally speaking, the fsamin utility should be preferred to fsatrim; fsamin will often reduce the size of the automaton significantly more than can fsatrim. However, sometimes fsatrim runs much faster, and on certain automata, such as "and not first" automata fsamin may take a very long time to run and may not be able to improve the automaton significantly.
The default output filename is input_file.trim.
midfadeterminize [loglevel] [format] [-identical] [-equal] [-nolabels] -i | [input_file] [-o | output_file]
A MIDFA (multiple initial state determinstic finite state automaton) is read in, and a minimized deterministic automaton that accepts the same language is output. Unless the -nolabels option is specified, or the labels are not words, the output FSA is given labels by combining all the labels from merged states into a list of labels.
The -identical and -equal options are useful with composite MIDFA multipliers which you do not want to fully determinise. In such multipliers it is commonly the case that more than one initial state has the same label (for example because composition will involve multiplication on the left and right by the identity). There is no good reason to keep such states separate. The -identical option combines initial states with the same label, but keeps other states apart. The -equal option performs a stronger kind of merging: initial states are merged if any one of the labels attached to the state are equal. This is necessarily done transitively, so that two states each with a label in common with a third state will both be merged with that state. Internally MAF uses the -equal option when building composite MIDFA multipliers. As a result MIDFA multipliers usually have many fewer states when built with MAF as compared with KBMAG. When determinised the multipliers accept the same language, so they are equivalent.
The default output filename is input_file.midfadeterminize.