MAF : Background : Alphabets, words and identifiers

Alphabets, Words, Identifiers

Alphabet

An alphabet is any finite set, and a letter is any element of an alphabet. In MAF, an alphabet usually consists of a small finite set of identifiers; the MAF library also supports the use of alphabets consisting of a subset of the native character set of the computer.

When, as is usually the case in MAF, an alphabet consists of a set of elements of a group or monoid that generate that group or monoid, the alphabet is usually denoted by the letter X. When H is a subgroup of a group G an alphabet that is a generating set for H is usually denoted by Y.

Words

If X is an alphabet, a finite sequence of letters from X is called a string (on X). The term word is synonymous with string, and is preferred in the MAF documentation, since it is the usual term in the context of group theory, although string is more common in the context of automata theory.

A word is usually written by concatenating the symbols from the alphabet (so that the word 123 represents the sequence (1,2,3). However, in order to avoid possible ambiguity that arises when the underlying alphabet consists of identifiers, the individual symbols in words may be separated with the * symbol, or some other character which cannot be confused with any symbol, or part of a symbol, from the alphabet. This is the notation used for words in MAF when the alphabet consists of identifiers.

If a symbol a is repeated 2 or more times consecutively in a word, the repeated symbols are often abbreviated to an or a^n, where n is the number of consecutive appearances of a. Thus the word aaaa is often represented as a4 or a^4.

The length of a word w is defined in the obvious way, by counting its symbols, i.e. it is the value n for which w is an n-tuple. It is usually denoted by l(w). For example, l(a5b3) = 8

The empty word, the unique 0-tuple, is also considered to be a word, and requires exceptional notation. The symbol ε is often used. In MAF, the empty word is always represented by the string IdWord, which may not be used as an identifier.

The set of all finite words on an alphabet X is denoted by X* and is called the star of X.

If a word w is equal to pst for some possibly empty words p, s, t then: p is called a prefix, s a subword, and t a suffix or trailing subword of w.

If a word u is a prefix of a word w we shall say that w is an extension of u. More generally, if u is any subword of w we shall say that w contains u.

Identifier

An identifier is a non-empty word on some subset of a character set used by a computer. It must satisfy syntactic requirements that enable a particular computer program to recognise the identifier as a significant entity within a stream of characters.

In MAF, an identifier may only contain the characters 'a'..'z', 'A'..'Z', '0'..'9', '_' and '.' (period). An identifier must not begin with a digit or a period, and must not consist of a single underscore. (Here the exceptional treatment accorded to the sequence ^-1 is being neglected. See Input files for the gory details).