How does \expandafter work: From basic principles to exploring TeX's source code
Introduction
We have now covered the background topics necessary for a full exploration of \expandafter:
- the basics of TeX tokens and how they are calculated;
- principles behind TeX’s expansion process;
- TeX’s use/creation of temporary token lists during document processing;
- how TeX uses and “juggles” multiple input sources (including temporary token lists).
In this article we’ll bring these topics/concepts together to explain the mechanisms underlying TeX’s \expandafter command: in short, how it works.
And so, to \expandafter
The idea behind \expandafter is to force expansion of a command (token) before TeX would normally do so. Given two tokens, \(\mathrm{T_1}\) and \(\mathrm{T_2}\), the action of \expandafter\(\mathrm{T_1T_2}\) results in TeX processing \(\mathrm{T_1}\text{<}\)expansion of \(\mathrm{T_2}\text{>}\), where \(\text{<}\dots\text{>}\) indicates a list of tokens. TeX expands \(\mathrm{T_2}\) ahead of time so that token \(\mathrm{T_1}\) (e.g., a primitive or a macro) gets to see, or can act upon, the tokens arising from expansion of \(\mathrm{T_2}\). If the token \(\mathrm{T_2}\) represents a non-expandable item such as a non-active character or (most) primitives, the action of \expandafter doesn’t change anything: TeX would continue to process tokens \(\mathrm{T_1T_2}\) in the normal way.
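As a small illustration of this reordering, the following plain TeX sketch shows how \expandafter changes what \(\mathrm{T_1}\) gets to see. The macros \pair and \letters are hypothetical names invented for this example; they are not part of the article's own code.

```tex
% Hypothetical helper macros, defined only for this illustration.
\def\pair#1#2{(#1)(#2)}   % takes two undelimited arguments
\def\letters{xy}          % expands to two character tokens

% Without \expandafter: \pair grabs \letters and z as its two arguments.
\pair\letters z               % typesets (xy)(z)

% With \expandafter: \letters is expanded first, so \pair sees x y z.
\expandafter\pair\letters z   % typesets (x)(y)z

% Here T2 is a non-expandable character, so \expandafter changes nothing.
\expandafter\pair ab          % typesets (a)(b), the same as \pair ab
\bye
```

In the second call, \pair plays the role of \(\mathrm{T_1}\) and \letters the role of \(\mathrm{T_2}\): by the time \pair collects its arguments, \letters has already been replaced by the character tokens x and y.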
Introduction to using \expandafter
If you’ve not used \expandafter, here is an example using it with the primitive \uppercase{...}. Suppose that we wanted to typeset the name of our main .tex input file but in upper-case letters. We might know the following TeX primitive commands:
- \uppercase: does as its name suggests, converts character tokens to their upper-case equivalent (where that exists);
- \jobname: as we have seen, expands to give the name of the main .tex file (without the .tex extension).
Assuming our TeX file is called mycode.tex we might reasonably expect \uppercase{\jobname} to typeset MYCODE. But no, it typesets mycode as lower-case. What went “wrong”?
If we write the general use of \uppercase as
\uppercase{<token list>}
we can say that \uppercase looks through <token list> and will only operate (change case) on character tokens it detects within the <token list>: all non-character tokens are ignored because \uppercase will not “look into” (expand) non-character tokens to see what they contain or represent. Because a token is simply an integer value, all \uppercase has to do is look through the token list to check if the numeric value of each token in <token list> falls within the range of values indicating a character token. Incidentally, \uppercase will also change the case of active characters to create an upper-case active character which, because it is still active, will also need to have been defined, otherwise TeX will generate an error: Undefined control sequence, but we digress...
For example, even if we define a macro that is just text
\def\foo{some lower-case text}
then \uppercase{\foo} still typesets some lower-case text and not SOME LOWER-CASE TEXT as we’d hope, simply because the action of \uppercase does not try to determine what \foo represents: it sees \foo as a command token and ignores it, as it did with \jobname.
How can we fix this? \expandafter to the rescue
To typeset an upper-case version of the file name, we need to modify \uppercase{\jobname} by forcing TeX to replace \jobname with its expansion (a sequence of character tokens) before \uppercase gets to work. Once again, expansion is being used to remove the \jobname token (command) and replace it with the result of its expansion (a token list containing character tokens). So, if we write
\uppercase\expandafter{\jobname}
then it works: MYCODE would be typeset. What happens is that TeX starts to process \uppercase and immediately checks for the mandatory opening brace character ({); however, TeX detects an \expandafter command which causes it to temporarily “divert its attention” to processing \expandafter{\jobname}.
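The difference between the two forms can be tried directly in a small plain TeX file. This is only a sketch: the exact text typeset by the last line depends on the name of the file being processed.

```tex
\def\foo{some lower-case text}

% \uppercase does not expand \foo, so no character tokens are changed.
\uppercase{\foo}                  % typesets: some lower-case text

% \expandafter expands \foo before \uppercase reads its token list.
\uppercase\expandafter{\foo}      % typesets: SOME LOWER-CASE TEXT

% The same technique applied to \jobname.
\uppercase\expandafter{\jobname}  % typesets the job name in upper case, e.g. MYCODE for mycode.tex
\bye
```

Note that a single \expandafter performs only one step of expansion on the token after the brace; that is sufficient here because \foo and \jobname expand directly to character tokens.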
If we compare
\expandafter\(\mathrm{T_1T_2}\)
with our example
\expandafter{\jobname}
we can see
- \(\mathrm{T_1} =\space \){\(_\mathrm{token}\)
- \(\mathrm{T_2} =\space \)\jobname\(_\mathrm{token}\)
Here, {\(_\mathrm{token}\) and \jobname\(_\mathrm{token}\) refer to the token values calculated by TeX; the subscript \(_\mathrm{token}\) is used to remind ourselves that TeX works in the world of integer tokens.
Writing \uppercase\expandafter{\jobname} works because, in outline (details to follow), \expandafter causes TeX to perform the following tasks:
- read and save the opening {\(_\mathrm{token}\);
- read the next token: \jobname\(_\mathrm{token}\). TeX recognizes that \jobname\(_\mathrm{token}\) represents an expandable command and expands it. \jobname\(_\mathrm{token}\) is replaced with its expansion, a series of character tokens;
- after expanding the \jobname command, TeX puts the {\(_\mathrm{token}\) “back into the input” and uses the token list arising from expansion of \jobname so that TeX will read \uppercase{\(_\mathrm{token}\)<expansion of \jobname>\(_\mathrm{token\ list\ (characters)}\)}, and this produces our desired result.
The following diagram shows how TeX processes \uppercase\expandafter{\jobname}: read the graphic from the bottom and work upwards to follow the process flow.
The following notes explain the various stages of processing.
1. TeX starts to process \uppercase and checks for the mandatory opening brace character ({) but detects an \expandafter command.
2. If we compare \expandafter\(\mathrm{T_1T_2}\) to our input of \expandafter{\jobname} we can see \(\mathrm{T_1} =\ \){\(_\mathrm{token}\) and \(\mathrm{T_2} =\ \)\jobname\(_\mathrm{token}\). Note that here we will use the subscript \(_\mathrm{token}\) to indicate TeX is processing integer token values.
3. \expandafter reads, then temporarily saves, the {\(_\mathrm{token}\) by storing that integer token value in an internal variable. Later, TeX will re-insert that token back into the input, after processing the \jobname command.
4. \expandafter reads the next token, \jobname\(_\mathrm{token}\), and expands the \jobname command.
5. The expansion of \jobname creates a temporary token list which contains a sequence of character tokens representing the .tex file name. Note that all character tokens generated by \jobname are calculated using category code 12 (apart from any space characters, which receive category code 10).
6. Once \jobname has been expanded, TeX re-inserts the token saved in step 3 ({\(_\mathrm{token}\)) and puts it back into the input. TeX does that by creating another token list containing the single {\(_\mathrm{token}\).
7. TeX has now finished processing \expandafter, resulting in two token lists ready to be used as sources of TeX input. TeX now reverts back to processing \uppercase but has configured its input such that the two token lists created by \expandafter become the source of tokens for \uppercase, which now sees \uppercase{\(_\mathrm{token}\)<expansion of \jobname>\(_\mathrm{token\ list\ (characters)}\)}. \uppercase now sees a sequence of character tokens and can produce our desired result.
8. After reading all the character tokens produced by \jobname, TeX reverts back to obtaining tokens from its previous input source (our .tex file) from where it will read the next token: the closing } required to terminate the list of tokens to be processed by \uppercase.
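The remark about category codes can be checked with a short plain TeX experiment. This is a sketch under the assumption that the file is processed with default plain TeX category codes; the macro names \letters and \jobchars are hypothetical and chosen only for this example.

```tex
\def\letters{mycode}       % characters typed in the file: category code 11 (letter)
\edef\jobchars{\jobname}   % characters produced by \jobname: category code 12 (other)

% \ifx compares the two replacement texts token by token,
% and a token includes its category code.
\ifx\letters\jobchars
  same tokens
\else
  different tokens   % this branch is taken, even when the file is called mycode.tex
\fi
\bye
```

Although both macros may print identically, their token lists differ because a character token is calculated from both the character code and the category code.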
\expandafter and internal token lists
Temporary token lists are a vital element of \expandafter’s processing behaviour: understanding the use and existence of those token lists can help to clarify how \expandafter achieves its results, particularly when trying to write, or understand, macros which make use of multiple consecutive \expandafter commands to achieve more complex forms of token processing: \expandafter\expandafter\expandafter...
Another key element of \expandafter’s behaviour, especially with multiple consecutive \expandafter commands, is the use of recursion (inside the TeX software itself), a topic we will consider later in this article.
To further assist our understanding of temporary token lists we’ll look at one more example of \expandafter, this time with the \the command.
\expandafter and internal token lists: example 2
In this example we’ll see how \expandafter can be used to influence tokens stored in a token register via the \toks command. Here are the TeX primitives we’ll be using:
- \count register=number: a TeX primitive used to store the value number in the TeX location register;
- \toks register={token list}: a TeX primitive used to store token list into the token register location register, saving a sequence of tokens for later use;
- \the token: an expandable TeX primitive command which processes token, although the exact results depend on the nature of the token being processed. \the has a number of uses: among those is typesetting the value stored in a TeX parameter or variable (e.g., a register). Other uses of \the include inserting a copy of tokens stored in a token register. Here, we’ll use \the to typeset the value stored in a \count register.
We’ll start with the following TeX code to store the value 12345 in TeX’s \count register 99:
\count99=12345
If we want to typeset the value stored in \count99 we can use \the\count99 (or \number\count99).
Next we’ll use the \toks command to store some tokens in token register 99:
\toks99={\the\count99 }
The list of tokens stored in token register 99 would contain the following: \the, \count, 9, 9 and a space token.
Note that the token list created by \toks99 does not contain the actual data value stored in \count99 because the \toks command does not perform expansion: it simply creates tokens and stores them. In our example, \the is not expanded so it doesn’t process \count99; here \the is merely turned into a token (value 5382) and stored in the token list.
If we want the \toks99 token list to contain tokens representing data stored in \count99 we will need some way to create those tokens (make them available) so that the \toks command can access them. And of course \expandafter can do this for us. If we write:
\toks99=\expandafter{\the\count99 }
the action/processing of the \toks command will be “put on hold” whilst \expandafter causes (forces) expansion of \the which, in turn, acts on \count to generate a temporary token list containing character tokens representing data stored in \count99. A small but important point is the <space> character after the digits 99: that <space> character acts to terminate TeX’s scanning process when it is searching for a numeric quantity.
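To see the difference between the two assignments, the contents of the token register can be written to the terminal and log file. The following plain TeX sketch uses \message for that purpose; the square brackets are added only to make the start and end of each list visible.

```tex
\count99=12345

% No expansion: the register stores \the, \count, 9, 9 and a space token.
\toks99={\the\count99 }
\message{[\the\toks99]}   % shows the unexpanded tokens \the \count 99

% \the is expanded before \toks reads its token list.
\toks99=\expandafter{\the\count99 }
\message{[\the\toks99]}   % shows [12345]
\bye
```

In the second assignment the space after 99 is consumed while \the scans the register number, so it does not end up in the stored token list.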
Here, the action of \expandafter is very similar to the \jobname example.
- Read and save the opening {\(_\mathrm{token}\).
- Read the next token, \the\(_\mathrm{token}\), which represents an expandable command, so TeX expands it. \expandafter forces expansion of \the which then operates on \count99 to convert data stored in \count register 99 (the number 12345) into a temporary token list. That list will contain character tokens representing the digits 1, 2, 3, 4 and 5: character tokens with category code 12.
- After expanding and processing \the, TeX puts the {\(_\mathrm{token}\) “back into the input” and uses the token list arising from \the\count99 so that TeX will read \toks99={\(_\mathrm{token}\)<expansion of \the\count99>\(_\mathrm{token\ list\ (characters)}\)} and this produces our desired result.
This sequence of events is summarized in the following diagram: read the graphic from the bottom and work upwards to follow the process flow.
1. TeX starts to process \toks; it sees the optional = sign, then checks for the mandatory opening brace character ({, or any character with category code 1) used to indicate the start of a token list. However, TeX detects an \expandafter command and proceeds to execute that instead.
2. If we compare \expandafter\(\mathrm{T_1T_2}\) to our input of \expandafter{\the\count99 } we can see \(\mathrm{T_1} =\ \){\(_\mathrm{token}\) and \(\mathrm{T_2} =\ \)\the\(_\mathrm{token}\).
3. \expandafter reads, then temporarily saves, the {\(_\mathrm{token}\) (TeX temporarily stores that integer token value in an internal variable). Later, TeX will re-insert that token back into the input, after processing \the.
4. \expandafter reads the next token, \the\(_\mathrm{token}\), and expands it.
5. The expansion of \the creates a temporary token list from processing \count99: that token list contains a sequence of character tokens which represent the data value stored in the \count register 99.
6. Once \the has been expanded, TeX re-inserts the token saved in step 3 ({\(_\mathrm{token}\)) and puts that token back into the input. TeX does that by creating another token list containing the single token {\(_\mathrm{token}\).
7. TeX has now finished processing \expandafter and produced two token lists ready to be used as the next sources of input. TeX reverts back to processing \toks99= but now TeX has configured its input so that the two token lists created by \expandafter become the source of tokens for \toks, which now sees {\(_\mathrm{token}\)<expansion of \the\count99>\(_\mathrm{token\ list\ (characters)}\)}. \toks can now access, and store, the sequence of 5 character tokens which represent the data value (12345) stored in \count99: our desired result.
8. After reading all the character tokens produced by \the\count99, TeX reverts back to obtaining tokens from its previous input source (our .tex file) from where it will read the next token: the closing } required to terminate the list of tokens to be saved by \toks99={...}.
How \expandafter really works
In this section we’ll take a “low level” look inside TeX itself: exploring the source code/functions within TeX which implement the behaviour of \expandafter. Details are expressed in a pseudo-C code but should be accessible to anyone familiar with other programming languages.
The following annotated diagram explains how TeX implements \expandafter as part of a larger function called expand(), the core function which drives TeX’s expansion processing. Within the section responsible for implementing \expandafter we can see recursive behaviour where another call to the expand() function is used to process the second token read-in, \(\mathrm{T_2}\), for those cases where \(\mathrm{T_2}\) is expandable.
Although this code appears in Knuth’s TeX engine, the basic principles outlined by this graphic are applicable to all TeX engines.
The first task of expand() is to determine if the command to be expanded is a macro or a primitive because macros have a specialized expansion process which is handled by a function called macrocall().
If the command to be expanded is a primitive, the expand() function uses the current command code value (stored in global variable curcmd) to identify which particular primitive needs to be processed. We can see these details in a more complete listing of expand():
void expand(void)
{
    // curcmd is a global variable
    if (curcmd != macro) // curcmd < 111
    {
        switch (curcmd)
        {
        case \expandafter: // Process the \expandafter T1T2 command
        {
            gettoken();   // Read token T1
            t = curtok;   // Save token T1 in local variable t
            gettoken();   // Read token T2
            if (curcmd > 100) // Is token T2 expandable?
                expand();     // Yes! T2 is expandable:
                              // perform expansion of T2 by
                              // making a recursive function call to expand()
            else
                backinput();  // T2 is not expandable: put that token
                              // back in the input to be read again (later)
            curtok = t;   // Restore global variable curtok to saved value of T1
            backinput();  // Put token T1 back in the input
                          // ahead of the tokens arising from expansion of T2
        }
        break;

        // Code to process other expandable commands
        case “convert to text” command:
            // Any one of \number, \string, \romannumeral,
            // \meaning, \fontname, \jobname
            // They share the same value of curcmd
            break;
        case \noexpand: // Suppress expansion of the next token
            ...
            break;
        case \csname: // Manufacture a control sequence name.
            ...
            break;
        case \the: // Insert some tokens
            ....
            break;
        case “\if... test command”:
            // Process one of TeX’s conditionals:
            // \if, \ifcat, \ifnum, \ifdim, \ifodd, \ifvmode,
            // \ifhmode, \ifmmode, \ifinner, \ifvoid,
            // \ifhbox, \ifvbox, \ifx, \ifeof, \iftrue, \iffalse,
            // \ifcase, \ifdefined, \ifcsname, \iffontchar
            ...
            break;
        case “\fi or \else”: // Terminate the current conditional
            ...
            break;
        // etc for any other expandable primitive commands supported by
        // the TeX engine
        }
    }
    else // Not an expandable primitive: it is a macro
    {
        macrocall();
    }
    //... more code removed
}
TeX’s love for global variables
Perhaps reflecting its age and the era in which it was designed, TeX’s source code makes extensive use of so-called global variables—in fact there are hundreds of them. By their very nature, global variables can be changed/modified from anywhere within the TeX source code—which, for Knuth’s TeX, is a single monolithic file containing over 25,000 lines of code and hundreds of functions. Understanding how TeX works is not always an easy task...
To process \expandafter, TeX reads tokens from its current input using a function called gettoken() whose action is to create a token and set the value of several key global variables used throughout TeX’s source code. Two such variables, updated by the action of gettoken(), are used in the implementation of \expandafter:
- curtok: (current token) the integer value of the token just read in;
- curcmd: (current command code) the command code of the command (or character) represented by the token curtok.
When processing \expandafter\(\mathrm{T_1T_2}\) TeX reads token \(\mathrm{T_1}\) and temporarily saves its value (an integer) in a local variable called t. TeX then reads \(\mathrm{T_2}\) and checks to see if that token represents an expandable command by checking if its command code (curcmd) is > 100. If so, TeX needs to expand the command represented by \(\mathrm{T_2}\) and makes another call to the function expand(): this is an example of recursion because the expand() function is calling itself. An awareness of the recursive nature of expansion, especially when using \expandafter, can help with understanding how multiple consecutive \expandafter commands, i.e., \expandafter\expandafter\expandafter..., achieve their effects.
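The recursion can be observed from the user's side with three consecutive \expandafter commands. The following plain TeX sketch uses two hypothetical macros, \inner and \outer, invented for this example, and the scratch token registers \toks0 and \toks2.

```tex
\def\inner{abc}      % hypothetical macros, named only for this sketch
\def\outer{\inner}

% One \expandafter: \outer is expanded once, so the register stores \inner.
\toks0=\expandafter{\outer}

% Three \expandafter commands: the first saves the second and expands the
% third, which in turn saves { and expands \outer to \inner. TeX then reads
% the second \expandafter, which saves { and expands \inner to abc.
\toks2=\expandafter\expandafter\expandafter{\outer}

\message{[\the\toks0] [\the\toks2]}   % shows [\inner ] [abc]
\bye
```

Here the third \expandafter is processed inside the recursive call to expand() made by the first one, which is why \outer is expanded before the second \expandafter is ever read.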
If token \(\mathrm{T_2}\) is expandable, the expansion takes place and when the recursive call to expand() returns, code within the implementation of \expandafter re-inserts token \(\mathrm{T_1}\) back into the input. The global variable curtok is re-assigned to the value of the saved token (stored in local variable t, which is the value of token \(\mathrm{T_1}\)) and a call is made to the function backinput().
The function backinput()
As its name suggests, this function puts a token “back into the input”. To do that, TeX uses the current value of the global variable curtok to create a token list which contains a single token (whose integer value is provided by curtok). TeX also arranges its input handling to ensure that single-token list will, at the appropriate time, be re-read by TeX as part of its subsequent input processing. Note carefully that the token \(\mathrm{T_1}\) is re-inserted after the expansion is finished, which ensures TeX will read that re-inserted token before it reads the tokens arising from expansion of \(\mathrm{T_2}\).
Processing macros: the macrocall() function
As previously discussed, all macros, together with some primitive commands, are expandable and all expansion processing goes through the expand() function. However, expand() is careful to use the curcmd (current command) value to distinguish between expandable primitives and macros because the macro-expansion process is handled by a dedicated function called macrocall(). Macros need a specialized expansion process because macro arguments, and delimiter tokens, have to be scanned for in a very particular and rigorous way; consequently, that process is delegated to a function designed to do that: macrocall().
Macro expansion vs. macro execution
Macro expansion is not the same process as macro execution: expansion of a macro is the pre-execution process TeX performs to get the macro ready for execution. The “execution” of a macro happens when TeX is actively reading and processing tokens contained in that macro’s definition (replacement text) and its arguments (parameters).
Macro expansion
To expand a macro TeX first checks if the macro takes arguments; if so, macrocall() very carefully scans the input looking for tokens destined to become the macro’s arguments. That process includes checking the user’s input for any delimiter tokens used in the macro’s original definition: the pattern of tokens used in a macro call must exactly match the pattern of tokens contained in the stored definition. However, tokens used as delimiters are simply discarded by TeX: they are, in effect, just a form of “punctuation” TeX uses to determine the actual tokens destined to become the macro’s arguments, i.e., tokens the user intends for processing by the macro. For more information on delimiter tokens, see How TeX macros actually work.
For each parameter (#1, #2...#9) present in the macro’s original definition, TeX scans the actual macro call to identify which tokens provided by the user are destined for each parameter (i.e., form the macro’s arguments). That process produces one or more mini token lists: one for each macro argument.
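A short plain TeX sketch may help to picture delimiter matching. The macro \coords is a hypothetical example invented here; its parameter text uses an opening parenthesis, a comma and a closing parenthesis as delimiter tokens.

```tex
% Hypothetical macro: ( , and ) are delimiter tokens in the parameter text.
\def\coords(#1,#2){x=#1 and y=#2}

% The delimiters are matched and then discarded; only 3 and 4 become arguments.
\coords(3,4)      % typesets x=3 and y=4

% A delimited argument may consist of several tokens.
\coords(12,345)   % typesets x=12 and y=345
\bye
```

If the tokens following \coords do not match the stored pattern, for example \coords[3,4], TeX reports that the use of the macro doesn’t match its definition.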
After any macro arguments have been detected, and their token lists have been prepared, TeX retrieves the macro’s definition (replacement text) stored in its memory and arranges its input processing such that whenever TeX is ready to read/process more tokens, it will read them from the macro’s replacement text, thus executing the macro. At the appropriate point, during macro execution, token lists representing the macro arguments will be fed into the correct location within the macro’s replacement text.
Once again, expansion of a macro command means removing that macro command (token) from the input and replacing it with the token list stored as the macro’s replacement text.
For an in-depth look at TeX’s macro-processing see the six-part article series How do TeX macros actually work?