From Tokens to Trees: How Your Code Gets Structure

From Tokens to Trees: How Your Code Gets Structure
Photo by A Chosen Soul / Unsplash

In my last post I walked through how a scanner takes raw source code and chops it into tokens. If you haven't read that one, start there first — this post picks up exactly where it left off.

After scanning you have this:

NUMBER(1)  PLUS  NUMBER(2)  STAR  NUMBER(3)

That's 1 + 2 * 3. But it's just a flat list. Nothing in that list tells you that 2 * 3 happens before + 1. You need structure. You need a tree.

Why a Flat List Isn't Enough

Look at these two expressions:

1 + 2 * 3     →  should be 7  (multiply first)
(1 + 2) * 3   →  should be 9  (add first)

Both produce roughly the same tokens. But they mean completely different things. A flat list can't capture that difference. A tree can.

1 + 2 * 3               (1 + 2) * 3

Binary(+)                Binary(*)
├── Literal(1)           ├── Grouping
└── Binary(*)            │   └── Binary(+)
    ├── Literal(2)       │       ├── Literal(1)
    └── Literal(3)       │       └── Literal(2)
                         └── Literal(3)

The deeper a node sits in the tree, the earlier it gets evaluated. That's how operator precedence works — it's baked right into the shape of the tree.

Before we get into the code, here's an interactive visualiser I had Claude build to go alongside this post. Pick an example and step through it to watch the tree build node by node — the (1 + 2) * 3 example is particularly good for seeing why Grouping matters, and how the same tokens in a different order produce a completely different tree.

Syntax Tree Visualiser

Syntax Tree Visualiser

Watch the syntax tree build node by node. Use Step → to go one step at a time, or ▶ Play to run automatically.

Binary
Literal
Unary
Grouping
Press ▶ Play or Step → to begin building the tree.
Speed

Part 1 — Building the Tree

The tree is just C# objects nested inside other C# objects. You make one class for each type of expression. They all inherit from a base class:

abstract class Expr { }

Empty on purpose. It's just a label that says "this thing is an expression".

Literal — a raw value

The simplest node. Just holds a value like 42, "hello", or true.

class Literal : Expr
{
    public object Value;

    public Literal(object value)
    {
        Value = value;
    }
}
new Literal(42)        // represents 42
new Literal("hello")   // represents "hello"
new Literal(true)      // represents true

In memory:

Literal
└── Value: 42

Binary — two expressions with an operator between them

Covers 1 + 2, x * y, a == b — anything with a left side, an operator, and a right side.

class Binary : Expr
{
    public Expr Left;
    public Token Operator;
    public Expr Right;

    public Binary(Expr left, Token op, Expr right)
    {
        Left     = left;
        Operator = op;
        Right    = right;
    }
}
// represents 1 + 2
new Binary(
    new Literal(1),
    plusToken,
    new Literal(2)
)

In memory:

Binary
├── Left:     Literal(1)
├── Operator: +
└── Right:    Literal(2)

Unary — an operator before a single expression

Covers -5 or !true.

class Unary : Expr
{
    public Token Operator;
    public Expr Right;

    public Unary(Token op, Expr right)
    {
        Operator = op;
        Right    = right;
    }
}
new Unary(minusToken, new Literal(5))    // represents -5
new Unary(bangToken,  new Literal(true)) // represents !true

Grouping — brackets around an expression

Covers (1 + 2).

class Grouping : Expr
{
    public Expr Expression;

    public Grouping(Expr expression)
    {
        Expression = expression;
    }
}
// represents (1 + 2)
new Grouping(
    new Binary(new Literal(1), plusToken, new Literal(2))
)

Building the full tree

Here's the key insight — the tree builds itself naturally just by nesting objects inside objects. There's no special tree-building code.

For 1 + 2 * 3:

var tree = new Binary(
    new Literal(1),
    plusToken,
    new Binary(         // ← right side is another Binary
        new Literal(2),
        starToken,
        new Literal(3)
    )
);

Which gives you:

Binary(+)
├── Literal(1)
└── Binary(*)
    ├── Literal(2)
    └── Literal(3)

The * node is deeper than the + node so it gets evaluated first. The structure of the tree IS the order of operations.

Part 2 — The Visitor Pattern

You have a tree sitting in memory. Now you need to do things with it. The obvious approach is to add methods directly to each class:

class Literal : Expr
{
    public string Print()    { return Value.ToString(); }
    public object Evaluate() { return Value; }
}

class Binary : Expr
{
    public string Print()    { ... }
    public object Evaluate() { ... }
}

This works at first. But every time you want a new operation — printing, evaluating, type checking — you have to open every single class and add another method. With four classes that's annoying. When there are many more expression types it becomes a nightmare.

The Visitor pattern solves this by keeping all operations outside the expression classes. The classes themselves never change. You just write a new class for each new operation.

Step 1 — the interface

interface IVisitor<T>
{
    T VisitLiteral(Literal expr);
    T VisitBinary(Binary expr);
    T VisitUnary(Unary expr);
    T VisitGrouping(Grouping expr);
}

This is a contract. Anyone who wants to work with the tree must implement one method per expression type. The <T> means the return type can be anything — string for a printer, object for an evaluator.

Step 2 — Accept on each class

Each expression class gets one Accept method:

class Literal : Expr
{
    public object Value;
    public Literal(object value) { Value = value; }

    public T Accept<T>(IVisitor<T> visitor)
    {
        return visitor.VisitLiteral(this);
    }
}

class Binary : Expr
{
    public Expr Left;
    public Token Operator;
    public Expr Right;

    public Binary(Expr left, Token op, Expr right)
    {
        Left = left; Operator = op; Right = right;
    }

    public T Accept<T>(IVisitor<T> visitor)
    {
        return visitor.VisitBinary(this);
    }
}

Accept just says — visitor, here I am, you deal with me. It passes itself to the visitor and gets out of the way. The node tells the visitor what it is by calling the right method automatically — no if (expr is Literal) chains needed anywhere.

Step 3 — write a visitor

In the book Crafting Interpreters, Nystrom uses an AstPrinter as its example visitor — a class that turns the tree into a readable string, useful for debugging:

class AstPrinter : IVisitor<string>
{
    public string VisitLiteral(Literal expr)
    {
        return expr.Value.ToString();
    }

    public string VisitBinary(Binary expr)
    {
        string left  = expr.Left.Accept(this);
        string right = expr.Right.Accept(this);
        return $"({left} {expr.Operator.Lexeme} {right})";
    }

    public string VisitUnary(Unary expr)
    {
        return $"({expr.Operator.Lexeme} {expr.Right.Accept(this)})";
    }

    public string VisitGrouping(Grouping expr)
    {
        return $"(group {expr.Expression.Accept(this)})";
    }

    public string Print(Expr expr)
    {
        return expr.Accept(this);
    }
}

Usage:

var tree = new Binary(
    new Literal(1),
    plusToken,
    new Binary(
        new Literal(2),
        starToken,
        new Literal(3)
    )
);

var printer = new AstPrinter();
Console.WriteLine(printer.Print(tree));
// output: (1 + (2 * 3))

How Accept actually works — step by step

This is the bit that confused me at first. Let's trace it completely for 1 + 2:

printer.Print(tree)
→ calls tree.Accept(printer)

tree is Binary(+)
→ Binary.Accept calls printer.VisitBinary(this)

inside VisitBinary:
  left = expr.Left.Accept(this)
       → Literal(1).Accept(printer)
       → printer.VisitLiteral(literal)
       → returns "1"

  right = expr.Right.Accept(this)
        → Literal(2).Accept(printer)
        → printer.VisitLiteral(literal)
        → returns "2"

  return "(1 + 2)"

It bounces back and forth between the tree and the visitor, going deeper each time until it hits a Literal at the bottom. The recursion handles any depth of nesting automatically.

Why the Visitor pattern is worth it

Right now you only have four expression types. But the pattern pays off as soon as you want to do something new with the tree. Want to evaluate it? Write an Evaluator class. Want to check types? Write a TypeChecker class. Want to generate bytecode? Write a CodeGenerator class.

Every one of those is a brand new class. The four expression classes never change. That separation is what keeps the codebase manageable as the interpreter grows.

Why This Matters

Before the tree, you had tokens — a flat list with no structure or meaning. The syntax tree is what gives your code shape. It captures the fact that 2 * 3 happens before + 1. It captures nesting. It captures the difference between these two completely different expressions that look almost identical as a flat list.

Everything the interpreter needs to understand and run your code lives in this tree. The Visitor pattern then gives you a clean way to do many different things with that tree without the codebase turning into a mess!

This post is based on Chapter 5 of Crafting Interpreters by Robert Nystrom, one of the best programming books I've read. Highly recommended if you've ever wanted to understand how languages actually work.