How does \expandafter work: From basic principles to exploring TeX's source code

Part 1 Part 2 Part 3 Part 4 Part 5 Part 6

Introduction

We have now covered the background topics necessary for a full exploration of \expandafter:

the basics of TeX tokens and how they are calculated;
principles behind TeX’s expansion process;
TeX’s use/creation of temporary token lists during document processing;
how TeX uses and “juggles” multiple input sources (including temporary token lists).

In this article we’ll bring these topics/concepts together to explain the mechanisms underlying TeX’s \expandafter command: in short, how it works.

And so, to `\expandafter`

The idea behind \expandafter is to force expansion of a command (token) before TeX would normally do so. Given two tokens, $\mathrm{T_1}$ and $\mathrm{T_2}$, the action of \expandafter $\mathrm{T_1T_2}$ results in TeX processing $\mathrm{T_1}\text{<}$expansion of $\mathrm{T_2}\text{>}$, where $\text{<}\dots\text{>}$ indicates a list of tokens. TeX expands $\mathrm{T_2}$ ahead of time so that token $\mathrm{T_1}$ (e.g., a primitive or a macro) gets to see, or can act upon, the tokens arising from expansion of $\mathrm{T_2}$. If the token $\mathrm{T_2}$ represents a non-expandable item, such as a non-active character or most primitives, \expandafter changes nothing: TeX continues to process tokens $\mathrm{T_1T_2}$ in the normal way.
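As a minimal plain TeX sketch of this behaviour, consider the following file. The macro names \wrap and \word are made up for illustration and are not part of TeX.

```tex
% Hypothetical macros, for illustration only.
\def\wrap#1{[#1]}   % puts brackets around its single argument
\def\word{xyz}      % expands to three character tokens

\wrap\word              % \wrap grabs the token \word: typesets [xyz]

\expandafter\wrap\word  % \word is expanded first, so \wrap sees
                        % "xyz" and grabs only "x": typesets [x]yz

\expandafter\wrap\relax % \relax is not expandable: behaves exactly
                        % like \wrap\relax
\bye
```

The second line differs from the first only in the order of events: \word has already been replaced by its expansion by the time \wrap looks for its argument.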

Introduction to using `\expandafter`

If you’ve not used \expandafter, here is an example using it with the primitive \uppercase{...}. Suppose that we wanted to typeset the name of our main .tex input file but in upper-case letters. We might know the following TeX primitive commands:

\uppercase: does as its name suggests, converts character tokens to their upper-case equivalent (where that exists);
\jobname: as we have seen, expands to give the name of the main .tex file.

Assuming our TeX file is called mycode.tex we might reasonably expect \uppercase{\jobname} to typeset MYCODE. But no, it typesets mycode as lower-case. What went “wrong”?

If we write the general use of \uppercase as

\uppercase{<token list>}

we can say that \uppercase looks through <token list> and will only operate (change case) on character tokens it detects within the <token list>: all non-character tokens are ignored because \uppercase will not “look into” (expand) non-character tokens to see what they contain or represent. Because a token is simply an integer value, all \uppercase has to do is look through the token list to check if the numeric value of each token in <token list> falls within the range of values indicating a character token. Incidentally, \uppercase will also change the case of active characters to create an upper-case active character which, because it is still active, will also need to have been defined, otherwise TeX will generate an error: Undefined control sequence, but we digress...

For example, even if we define a macro that is just text

\def\foo{some lower-case text}

then \uppercase{\foo} still typesets some lower-case text and not SOME LOWER-CASE TEXT as we’d hope, simply because the action of \uppercase does not try to determine what \foo represents: it sees \foo as a command token and ignores it, as it did with \jobname.

How can we fix this? `\expandafter` to the rescue

To typeset an upper-case version of the file name, we need to modify \uppercase{\jobname} by forcing TeX to replace \jobname with its expansion (a sequence of character tokens) before \uppercase gets to work. Once again, expansion is being used to remove the \jobname token (command) and replace it with the result of its expansion (a token list containing character tokens). So, if we write

\uppercase\expandafter{\jobname}

then it works: MYCODE would be typeset. What happens is that TeX starts to process \uppercase and immediately checks for the mandatory opening brace character ({); however, TeX detects an \expandafter command which causes it to temporarily “divert its attention” to processing \expandafter{\jobname}.
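The cases discussed so far can be collected into one small plain TeX file. This is a sketch that assumes the file is saved as mycode.tex, the file name used in this article; with a different name, the \jobname lines typeset that name instead.

```tex
% Save as mycode.tex (the file name assumed in the article).
\def\foo{some lower-case text}

\uppercase{\jobname}             % typesets: mycode

\uppercase\expandafter{\jobname} % typesets: MYCODE

\uppercase{\foo}                 % typesets: some lower-case text

\uppercase\expandafter{\foo}     % typesets: SOME LOWER-CASE TEXT
\bye
```

The hyphen in the \foo text is left alone because it has no upper-case equivalent.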

If we compare

\expandafter $\mathrm{T_1T_2}$

with our example

\expandafter{\jobname}

we can see

$\mathrm{T_1} =$ {_token
$\mathrm{T_2} =$ \jobname_token

Here, {_token and \jobname_token refer to the token values calculated by TeX; the _token suffix is used to remind ourselves that TeX works in the world of integer tokens.

Writing \uppercase\expandafter{\jobname} works because, in outline (details to follow), \expandafter causes TeX to perform the following tasks:

read and save the opening {_token;
read the next token: \jobname_token. TeX recognizes that \jobname_token represents an expandable command and expands it. \jobname_token is replaced with its expansion—a series of character tokens;
after expanding the \jobname command, TeX puts the {_token “back into the input” ahead of the token list arising from expansion of \jobname, so that TeX will read \uppercase {_token <expansion of \jobname>, where <expansion of \jobname> is a token list of characters, and this produces our desired result.

The following diagram shows how TeX processes \uppercase\expandafter{\jobname}—read the graphic from the bottom and work upwards to follow the process flow.

[Figure: How \expandafter works]

The following notes explain the various stages of processing.

1. TeX starts to process \uppercase and checks for the mandatory opening brace character ({) but detects an \expandafter command.
2. If we compare \expandafter $\mathrm{T_1T_2}$ to our input of \expandafter{\jobname} we can see $\mathrm{T_1} =$ {_token and $\mathrm{T_2} =$ \jobname_token. Note that here we use the suffix _token to indicate that TeX is processing integer token values.
3. \expandafter reads, then temporarily saves, the {_token by storing that integer token value in an internal variable. Later, after processing the \jobname command, TeX will re-insert that token into the input.
4. \expandafter reads the next token, \jobname_token, and expands the \jobname command.
5. The expansion of \jobname creates a temporary token list which contains a sequence of character tokens representing the .tex file name. Note that all character tokens generated by \jobname are calculated using category code 12.
6. Once \jobname has been expanded, TeX puts the token saved in step 3 ({_token) back into the input. TeX does that by creating another token list containing the single {_token.
7. TeX has now finished processing \expandafter, resulting in two token lists ready to be used as sources of TeX input. TeX now reverts to processing \uppercase but has configured its input such that the two token lists created by \expandafter become the source of tokens for \uppercase, which now sees {_token followed by <expansion of \jobname> (a token list of characters) and can produce our desired result.
8. After reading all the character tokens produced by \jobname, TeX reverts to obtaining tokens from its previous input source (our .tex file), from where it will read the next token: the closing } required to terminate the list of tokens to be processed by \uppercase.
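The category code 12 mentioned in the notes above can be observed from within TeX. The following sketch again assumes the file is saved as mycode.tex; the macro names \typed, \job and \strip are hypothetical helpers introduced only for this example.

```tex
% Assumes this file is saved as mycode.tex.
\def\typed{mycode}   % letters typed in the file: category code 11
\edef\job{\jobname}  % characters from \jobname: category code 12

\message{\ifx\typed\job same\else different\fi} % prints: different

% \meaning also yields category-12 characters ("macro:->mycode");
% the hypothetical helper \strip discards everything up to ">".
\def\strip#1>{}
\edef\typed{\expandafter\strip\meaning\typed}

\message{\ifx\typed\job same\else different\fi} % prints: same
\bye
```

The first test fails even though both macros contain the characters mycode, because \ifx compares token values and a letter with category code 11 is a different token from the same letter with category code 12. Note that the repair itself relies on \expandafter: \meaning\typed has to be expanded before \strip looks for its delimiter.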

`\expandafter` and internal token lists

Temporary token lists are a vital element of \expandafter’s processing behaviour: understanding the use and existence of those token lists can help to clarify how \expandafter achieves its results, particularly when trying to write, or understand, macros which make use of multiple consecutive \expandafter commands to achieve more complex forms of token processing: \expandafter\expandafter\expandafter...

Another key element of \expandafter’s behaviour, especially with multiple consecutive \expandafter commands, is the use of recursion (inside the TeX software itself)—a topic we will consider later in this article.

To further assist our understanding of temporary token lists we’ll look at one more example of \expandafter, this time with the \the command.

`\expandafter` and internal token lists: example 2

In this example we’ll see how \expandafter can be used to influence tokens stored in a token register via the \toks command. Here are the TeX primitives we’ll be using:

\count register=number: a TeX primitive used to store the integer value number in the \count register whose number is register;
\toks register={token list}: a TeX primitive used to store token list in the token register whose number is register, saving a sequence of tokens for later use;
\the token: an expandable TeX primitive command which processes token, although the exact results depend on the nature of the token being processed. \the has a number of uses: among those is typesetting the value stored in a TeX parameter or variable (e.g., a register). Other uses of \the include inserting a copy of tokens stored in a token register. Here, we’ll use \the to typeset the value stored in a \count register.

We’ll start with the following TeX code to store the value 12345 in TeX’s \count register 99:

\count99=12345

If we want to typeset the value stored in \count99 we can use \the\count99 (or \number\count99).

Next we’ll use the \toks command to store some tokens in token register 99:

\toks99={\the\count99 }

The list of tokens stored in token register 99 would contain the following:

| TeX token value | Item represented |
|---|---|
| 5382 | `\the` |
| 7885 | `\count` |
| 3129 | `9` (character code 57 with category code 12), resulting in a token value of $256 \times 12 + 57 = 3129$ |
| 3129 | `9` (character code 57 with category code 12), resulting in a token value of $256 \times 12 + 57 = 3129$ |
| 2592 | `<space>` (character code 32 with category code 10), resulting in a token value of $256 \times 10 + 32 = 2592$ |
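The character-token arithmetic in the table can be reproduced with TeX’s own integer registers. This is a small sketch using the formula quoted above (256 × category code + character code); the names \tokval and \showtokval are hypothetical helpers, not TeX primitives.

```tex
% Token value for a character token, using the formula from the table:
%   256 * (category code) + (character code)
\newcount\tokval  % hypothetical scratch register
\def\showtokval#1#2{% #1 = category code, #2 = character code
  \tokval=#1 \multiply\tokval by 256 \advance\tokval by #2
  \message{[\the\tokval]}}

\showtokval{12}{`9}   % digit 9, category code 12: prints [3129]
\showtokval{10}{`\ }  % space, category code 10:   prints [2592]
\bye
```

The values for \the and \count cannot be computed this way: they are not character tokens, so the formula does not apply to them.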

Note that the token list created by \toks99 does not contain the actual data value stored in \count99 because the \toks command does not perform expansion: it simply creates tokens and stores them. In our example, \the is not expanded so it doesn’t process \count99; here \the is merely turned into a token (value 5382) and stored in the token list.

If we want the \toks99 token list to contain tokens representing data stored in \count99 we will need some way to create those tokens (make them available) so that the \toks command can access them. And of course \expandafter can do this for us. If we write:

        \toks99=\expandafter{\the\count99 }

the action/processing of the \toks command will be “put on hold” whilst \expandafter causes (forces) expansion of \the which, in turn, acts on \count to generate a temporary token list containing character tokens representing data stored in \count99. A small but important point is the <space> character after the digits 99: that <space> character acts to terminate TeX’s scanning process when it is searching for a numeric quantity.

Here, the action of \expandafter is very similar to the \jobname example.

Read and save the opening {_token.
Read the next token, \the_token, which represents an expandable command, so TeX expands it. \expandafter forces expansion of \the which then operates on \count99 to convert data stored in \count register 99 (the number 12345) into a temporary token list. That list will contain character tokens representing the digits 1, 2, 3, 4 and 5—character tokens with category code 12.
After expanding and processing \the, TeX puts the {_token “back into the input” ahead of the token list arising from \the\count99, so that TeX will read \toks99= {_token <expansion of \the\count99>, where <expansion of \the\count99> is a token list of characters, and this produces our desired result.

This sequence of events is summarized in the following diagram—read the graphic from the bottom and work upwards to follow the process flow.

[Figure: How \expandafter works]

1. TeX starts to process \toks; it sees the optional = sign, then checks for the mandatory opening brace character ({, or any character with category code 1) used to indicate the start of a token list. However, TeX detects an \expandafter command and proceeds to execute that instead.
2. If we compare \expandafter $\mathrm{T_1T_2}$ to our input of \expandafter{\the\count99 } we can see $\mathrm{T_1} =$ {_token and $\mathrm{T_2} =$ \the_token.
3. \expandafter reads, then temporarily saves, the {_token (TeX temporarily stores that integer token value in an internal variable). Later, after processing \the, TeX will re-insert that token into the input.
4. \expandafter reads the next token, \the_token, and expands it.
5. The expansion of \the creates a temporary token list from processing \count99: that token list contains a sequence of character tokens which represent the data value stored in \count register 99.
6. Once \the has been expanded, TeX puts the token saved in step 3 ({_token) back into the input. TeX does that by creating another token list containing the single token {_token.
7. TeX has now finished processing \expandafter and produced two token lists ready to be used as the next sources of input. TeX reverts to processing \toks99= but has configured its input so that the two token lists created by \expandafter become the source of tokens for \toks, which now sees {_token followed by <expansion of \the\count99> (a token list of characters). \toks can now access, and store, the sequence of 5 character tokens which represent the data value (12345) stored in \count99: our desired result.
8. After reading all the character tokens produced by \the\count99, TeX reverts to obtaining tokens from its previous input source (our .tex file), from where it will read the next token: the closing } required to terminate the list of tokens to be saved by \toks99={...}.
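The difference between the two \toks assignments can be checked with a short plain TeX file. This is a sketch: the second register, \toks98, is an assumption introduced here so that both versions can be compared side by side.

```tex
\count99=12345

\toks99={\the\count99 }             % stores \the, \count, 9, 9, <space>
\message{[\the\toks99]}             % prints: [\the \count 99 ]

\toks98=\expandafter{\the\count99 } % stores the characters 1 2 3 4 5
\message{[\the\toks98]}             % prints: [12345]

\count99=7 % change the counter after both registers were filled

\the\toks99 % typesets 7: \the\count99 is only evaluated now

\the\toks98 % typesets 12345: the digits were fixed when stored
\bye
```

The register filled without \expandafter tracks later changes to \count99, whereas the register filled with \expandafter holds a snapshot of the value at the time of the assignment.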

How `\expandafter` really works

In this section we’ll take a “low level” look inside TeX itself: exploring the source code/functions within TeX which implement the behaviour of \expandafter. Details are expressed in a pseudo-C code but should be accessible to anyone familiar with other programming languages.

The following annotated diagram explains how TeX implements \expandafter as part of a larger function called expand()—the core function which drives TeX’s expansion processing. Within the section responsible for implementing \expandafter we can see recursive behaviour where another call to the expand() function is used to process the second token read-in, $\mathrm{T_2}$, for those cases where $\mathrm{T_2}$ is expandable.

Although this code appears in Knuth’s TeX engine, the basic principles outlined by this graphic are applicable to all TeX engines.

[Figure: How \expandafter works inside TeX]

The first task of expand() is to determine if the command to be expanded is a macro or a primitive because macros have a specialized expansion process which is handled by a function called macrocall().

If the command to be expanded is a primitive, the expand() function uses the current command code value (stored in global variable curcmd) to identify which particular primitive needs to be processed. We can see these details in a more complete listing of expand():

    void expand(void)
    {
    int t; // local variable: used by \expandafter to save token T₁
    //curcmd is a global variable
    if(curcmd != macro) // curcmd < 111
    {  
      switch(curcmd)
      {
        case \expandafter: // Process the \expandafter T₁T₂ command
        {
            gettoken(); // Read token T₁
            t = curtok; // Save token T₁ in local variable t
            gettoken(); // Read token T₂
            if(curcmd > 100) // Is token T₂ expandable?
                expand();    // Yes! T₂ is expandable: 
                             // perform expansion of T₂ by
                             // making a recursive function call to expand()
            else
                backinput(); // T₂ is not expandable: put that token 
                             // back in the input to be read again (later)
    
            curtok = t ;  // Restore global variable curtok to saved value of T₁
            backinput() ; // Put token T₁ back in the input
                          // ahead of the tokens arising from expansion of T₂
        }
        break;
        
        // Code to process other expandable commands
        case “convert to text” command: // Any one of \number, \string, \romannumeral, 
                                        // \meaning, \fontname, \jobname
                                        // They share the same value of curcmd
        break;

        case \noexpand: // Suppress expansion of the next token
        ...
        break;

        case \csname:  //Manufacture a control sequence name.
        ...
        break;

        case \the: // Insert some tokens
        ....
        break;

        case “\if... test command” : // Process one of TeX’s conditionals:  
                                      // \if, \ifcat, \ifnum, \ifdim,\ifodd, \ifvmode, 
                                      // \ifhmode, \ifmmode, \ifinner, \ifvoid, 
                                      // \ifhbox, \ifvbox, \ifx, \ifeof, \iftrue, \iffalse, 
                                      // \ifcase, \ifdefined, \ifcsname, \iffontchar
        ...
        break;

        case “\fi or \else”: // Terminate the current conditional
        ...
        break;

        // etc for any other expandable primitive commands supported by
        // the TeX engine

        }
    
    }else // Not an expandable primitive: it is a macro
        {
             macrocall();
        }
        //... more code removed
    }

TeX’s love for global variables

Perhaps reflecting its age and the era in which it was designed, TeX’s source code makes extensive use of so-called global variables—in fact there are hundreds of them. By their very nature, global variables can be changed/modified from anywhere within the TeX source code—which, for Knuth’s TeX, is a single monolithic file containing over 25,000 lines of code and hundreds of functions. Understanding how TeX works is not always an easy task...

To process \expandafter, TeX reads tokens from its current input using a function called gettoken() whose action is to create a token and set the value of several key global variables used throughout TeX’s source code. Two such variables, updated by the action of gettoken(), are used in the implementation of \expandafter:

curtok: (current token) the integer value of the token just read in;
curcmd: (current command code) the command code of the command (or character) represented by the token curtok.

When processing \expandafter $\mathrm{T_1T_2}$ TeX reads token $\mathrm{T_1}$ and temporarily saves its value (an integer) in a local variable called t. TeX then reads $\mathrm{T_2}$ and checks whether that token represents an expandable command by testing if its command code (curcmd) is > 100. If so, TeX needs to expand the command represented by $\mathrm{T_2}$ and makes another call to the function expand(): this is an example of recursion because the expand() function is calling itself. An awareness of the recursive nature of expansion, especially when using \expandafter, can help with understanding how multiple consecutive \expandafter commands (i.e., \expandafter\expandafter\expandafter...) achieve their effects.

If token $\mathrm{T_2}$ is expandable, the expansion takes place and when the recursive call to expand() returns, code within the implementation of \expandafter re-inserts token $\mathrm{T_1}$ back into the input. The global variable curtok is re-assigned to the value of the saved token—stored in local variable t, which is the value of token $\mathrm{T_1}$—and a call is made to the function backinput().
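The recursion described above is what makes the familiar triple-\expandafter idiom work. The following plain TeX sketch traces it in comments; the macros \wrap, \one and \two are hypothetical and exist only for this illustration.

```tex
% Hypothetical macros, for illustration only.
\def\wrap#1{[#1]}  % puts brackets around its single argument
\def\one{\two}     % \one needs two expansion steps ...
\def\two{xyz}      % ... to reach the characters xyz

\expandafter\wrap\one
% one step: \one -> \two, so TeX reads \wrap\two: typesets [xyz]

\expandafter\expandafter\expandafter\wrap\one
% The 1st \expandafter saves the 2nd (as T1) and expands the 3rd (as T2),
% which is a recursive call to expand().
% The 3rd saves \wrap, expands \one -> \two, then puts \wrap back.
% The 1st then puts the 2nd back, so TeX now reads \expandafter\wrap\two.
% That expands \two -> xyz, and \wrap grabs only "x": typesets [x]yz
\bye
```

Each \expandafter performs a single expansion step on its second token, so reaching the characters behind \one takes two steps and therefore three \expandafter commands.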

The function `backinput()`

As its name suggests, this function puts a token “back into the input”. To do that, TeX uses the current value of the global variable curtok to create a token list which contains a single token (whose integer value is provided by curtok). TeX also arranges its input handling to ensure that single-token list will, at the appropriate time, be re-read by TeX as part of its subsequent input processing. Note carefully that the token $\mathrm{T_1}$ is re-inserted after the expansion of $\mathrm{T_2}$ has finished. Because TeX always reads from the most recently added input source first, this ensures TeX will read the re-inserted token before it reads the tokens arising from expansion of $\mathrm{T_2}$.

Processing macros: the `macrocall()` function

As previously discussed, all macros, together with some primitive commands, are expandable and all expansion processing goes through the expand() function. However, expand() is careful to use the curcmd (current command) value to distinguish between expandable primitives and macros because the macro-expansion process is handled by a dedicated function called macrocall(). Macros need a specialized expansion process because macro arguments, and delimiter tokens, have to be scanned for in a very particular and rigorous way; consequently, that process is delegated to a function designed to do that: macrocall().

Macro expansion vs. macro execution

Macro expansion is not the same process as macro execution: expansion of a macro is the pre-execution process TeX performs to get the macro ready for execution. The “execution” of a macro happens when TeX is actively reading and processing tokens contained in that macro’s definition (replacement text) and its arguments (parameters).

Macro expansion

To expand a macro TeX first checks if the macro takes arguments; if so, macrocall() very carefully scans the input looking for tokens destined to become the macro’s arguments. That process includes checking the user’s input for any delimiter tokens used in the macro’s original definition: the pattern of tokens used in a macro call must exactly match the pattern of tokens contained in the stored definition. However, tokens used as delimiters are simply discarded by TeX: they are, in effect, just a form of “punctuation” TeX uses to determine the actual tokens destined to become the macro’s arguments (i.e., the tokens the user intends for processing by the macro). For more information on delimiter tokens, see How TeX macros actually work.

For each parameter (#1, #2...#9) present in the macro’s original definition, TeX scans the actual macro call to identify which tokens provided by the user are destined for each parameter (i.e., form the macro’s arguments). That process produces one or more mini token lists: one for each macro argument.

After any macro arguments have been detected, and their token lists have been prepared, TeX retrieves the macro’s definition (replacement text) stored in its memory and arranges its input processing such that whenever TeX is ready to read/process more tokens, it will read them from the macro’s replacement text, thus executing the macro. At the appropriate point, during macro execution, token lists representing the macro arguments will be fed into the correct location within the macro’s replacement text.

Once again, expansion of a macro command means removing that macro command (token) from the input and replacing it with the token list stored as the macro’s replacement text.
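A small plain TeX sketch may help to make the role of delimiter tokens concrete. The macro \pair is hypothetical; its delimiters are the characters (, a comma and ).

```tex
% Hypothetical macro with delimiter tokens "(", "," and ")".
\def\pair(#1,#2){first: #1; second: #2.}

\pair(alpha,beta)   % typesets: first: alpha; second: beta.

% The delimiters are matched and discarded; they never reach the output.
% \meaning shows the stored parameter pattern and replacement text:
\message{\meaning\pair} % prints: macro:(#1,#2)->first: #1; second: #2.
\bye
```

Here the tokens alpha and beta become the two argument token lists, and they are fed into the replacement text at the positions marked #1 and #2.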

For an in-depth look at TeX’s macro-processing see the six-part article series How do TeX macros actually work?
