Guide: Expressions with precedence

This walkthrough builds an arithmetic expression parser that handles expressions like 1+2*3 or (1+2)*3 and produces a structured tree that respects operator precedence.

Along the way, we'll cover the standard recursive-descent technique for precedence, expect for clear error messages, and the fold_left combinator as a shortcut for left-associative operator chains.

Source: examples/better_errors.ml

What we're building

A parser that handles expressions like:

1+2*3 parses as 1 + (2 * 3) (multiplication binds tighter)
(1+2)*3 parses as (1 + 2) * 3 (parentheses override precedence)
1+2+3 parses as (1 + 2) + 3 (left-associative)

The result is an AST (abstract syntax tree), not a computed value.

The AST type

type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr

Num 3 is a literal number. Add (Num 1, Num 2) is 1 + 2. Mul (Add (Num 1, Num 2), Num 3) is (1 + 2) * 3.

The precedence trick

The standard technique for operator precedence in recursive descent is to split the grammar into levels, one per precedence tier. Lower-precedence operators are parsed at higher levels (closer to the entry point), and higher-precedence operators at lower levels (closer to the leaves).

For our two operators:

expr   = term   (('+' term)*)      ← lowest precedence: addition
term   = factor (('*' factor)*)    ← higher precedence: multiplication
factor = number | '(' expr ')'     ← atomic values and grouping

Each level calls the next one for its operands. This structure guarantees that * binds tighter than + without any explicit precedence tables.

The parser

(* Addition level: lowest precedence *)
let rec expr () =
  let left = term () in
  let rest =
    Parseff.many
      (fun () ->
        Parseff.skip_whitespace ();
        let _ = Parseff.expect "a '+' operator" (fun () -> Parseff.char '+') in
        Parseff.skip_whitespace ();
        term ())
      ()
  in
  List.fold_left (fun acc t -> Add (acc, t)) left rest

(* Multiplication level: higher precedence *)
and term () =
  let left = factor () in
  let rest =
    Parseff.many
      (fun () ->
        Parseff.skip_whitespace ();
        let _ = Parseff.expect "a '*' operator" (fun () -> Parseff.char '*') in
        Parseff.skip_whitespace ();
        factor ())
      ()
  in
  List.fold_left (fun acc f -> Mul (acc, f)) left rest

(* Atoms and parenthesized groups *)
and factor () =
  Parseff.skip_whitespace ();
  Parseff.or_
    (fun () ->
      let _ = Parseff.expect "an opening parenthesis" (fun () -> Parseff.char '(') in
      Parseff.skip_whitespace ();
      let e = expr () in
      Parseff.skip_whitespace ();
      let _ = Parseff.expect "a closing parenthesis" (fun () -> Parseff.char ')') in
      e)
    (fun () ->
      let d = Parseff.expect "a number" Parseff.digit in
      Num d)
    ()

Each precedence level calls the next one for its operands. expr parses term (('+' term)*), term parses factor (('*' factor)*), and factor handles numbers and parenthesized sub-expressions. Because term is called from within expr, multiplication binds tighter than addition.

many collects zero or more operator-operand pairs, and fold_left builds left-associative AST nodes: 1+2+3 becomes Add(Add(Num 1, Num 2), Num 3). or_ in factor tries the parenthesized path first; if the input doesn't start with (, it backtracks and tries a digit.

How `expect` improves errors

Without expect, a failure on the closing parenthesis would say something like expected ')'. With expect, it says expected a closing parenthesis. This matters when users see the error.

Compare the error messages on "1+":

Without expect: expected digit
With expect: expected a number

And on "(1+2":

Without expect: expected ')'
With expect: expected a closing parenthesis

Tracing a parse

Let's trace 1+2*3 through the parser:

expr() calls term()
term() calls factor(), which matches 1 → Num 1
term() tries many(* factor), but no * follows, so rest = []
term() returns Num 1 to expr()
expr() tries many(+ term) and sees +
Consumes +, calls term()
term() calls factor(), which matches 2 → Num 2
term() tries many(* factor) and sees *
Consumes *, calls factor(), which matches 3 → Num 3
term() builds Mul(Num 2, Num 3) via fold_left
expr() builds Add(Num 1, Mul(Num 2, Num 3)) via fold_left

The key insight: * is consumed inside term, so by the time expr sees the result, 2*3 is already a single Mul node. That's how precedence works. Each level "claims" its operators before the level above sees them.

An alternative: using `fold_left`

The many + fold_left pattern is so common that Parseff provides fold_left as a shortcut:

let rec expr () =
  Parseff.fold_left
    term
    (fun () ->
      Parseff.skip_whitespace ();
      let _ = Parseff.char '+' in
      Parseff.skip_whitespace ();
      fun a b -> Add (a, b))
    ()

and term () =
  Parseff.fold_left
    factor
    (fun () ->
      Parseff.skip_whitespace ();
      let _ = Parseff.char '*' in
      Parseff.skip_whitespace ();
      fun a b -> Mul (a, b))
    ()

and factor () =
  Parseff.skip_whitespace ();
  Parseff.or_
    (fun () ->
      let _ = Parseff.char '(' in
      let e = expr () in
      Parseff.skip_whitespace ();
      let _ = Parseff.char ')' in
      e)
    (fun () -> Num (Parseff.digit ()))
    ()

fold_left element op parses one or more elements separated by op, combining them left-to-right. The op parser returns the combining function. This is more concise and expresses the intent directly.

For right-associative operators (like exponentiation), use fold_right instead.

Printing the AST

For debugging, a simple to_string function:

let rec expr_to_string = function
  | Num n -> string_of_int n
  | Add (l, r) ->
      Printf.sprintf "(%s + %s)" (expr_to_string l) (expr_to_string r)
  | Mul (l, r) ->
      Printf.sprintf "(%s * %s)" (expr_to_string l) (expr_to_string r)

(* "1+2*3"   -> "(1 + (2 * 3))" *)
(* "(1+2)*3" -> "((1 + 2) * 3)" *)