Parsing Expressions

Parsing Expressions
Photo by Tim Marshall / Unsplash

In the last post we defined the shape of a syntax tree — Binary, Unary, Literal, Grouping — but built every tree by hand. In this post we focus on the parser: the thing that reads tokens and builds the tree automatically.

The problem

1 + 2 * 3 can be read two ways:

(1 + 2) * 3 = 9   Incorrect
1 + (2 * 3) = 7   Correct

Nothing in the token list says which is correct. You need a rule for precedence.

The trick: who calls who first

Two functions: DoPlus() handles +, DoTimes() handles *. The entire precedence system comes down to one decision:

DoPlus() calls DoTimes() first, before doing anything else.
private Expr DoPlus()
{
    Expr left = DoTimes(); // forced to ask DoTimes first

    while (Match(TokenType.PLUS))
    {
        Token op = Previous();
        Expr right = DoTimes(); // forced again
        left = new Expr.Binary(left, op, right);
    }

    return left;
}

private Expr DoTimes()
{
    Expr left = Primary();

    while (Match(TokenType.STAR))
    {
        Token op = Previous();
        Expr right = Primary();
        left = new Expr.Binary(left, op, right);
    }

    return left;
}

Tracing 1 + 2 * 3:

  1. DoPlus asks DoTimes for its left side
  2. DoTimes looks at 1, sees no * after it, returns 1 alone
  3. DoPlus consumes the +
  4. DoPlus asks DoTimes for the right side
  5. This time DoTimes sees 2 * 3 — it glues those together into one Binary node before handing anything back
  6. DoPlus ends up combining 1 with the already-sealed Binary(2, *, 3)
      +
     / \
    1   *
       / \
      2   3

Multiplication didn't "win" by checking priority. It won because it got first access to the tokens and locked them together before addition ever got a turn.

That's the whole mechanism. The real grammar just stacks more layers the same way — equality calls comparison, which calls term (this is DoPlus), which calls factor (this is DoTimes), which calls unary, which calls primary. Each layer hands control down before doing its own work, so by the time a low-priority operator combines anything, every higher-priority operator nearby has already sealed its own piece.

This post is based on Chapter 6 of Crafting Interpreters by Robert Nystrom, one of the best programming books I've read. Highly recommended if you've ever wanted to understand how languages actually work.