Guide: Expressions with precedence
This walkthrough builds an arithmetic expression parser that handles expressions like 1+2*3 or (1+2)*3 and produces a structured tree that respects operator precedence.
Along the way, we'll cover the standard recursive-descent technique for precedence, expect for clear error messages, and the fold_left combinator as a shortcut for left-associative operator chains.
Source: examples/better_errors.ml
What we're building
A parser that handles expressions like:
1+2*3parses as1 + (2 * 3)(multiplication binds tighter)(1+2)*3parses as(1 + 2) * 3(parentheses override precedence)1+2+3parses as(1 + 2) + 3(left-associative)
The result is an AST (abstract syntax tree), not a computed value.
The AST type
type expr =
| Num of int
| Add of expr * expr
| Mul of expr * exprNum 3 is a literal number. Add (Num 1, Num 2) is 1 + 2. Mul (Add (Num 1, Num 2), Num 3) is (1 + 2) * 3.
The precedence trick
The standard technique for operator precedence in recursive descent is to split the grammar into levels, one per precedence tier. Lower-precedence operators are parsed at higher levels (closer to the entry point), and higher-precedence operators at lower levels (closer to the leaves).
For our two operators:
expr = term (('+' term)*) ← lowest precedence: addition
term = factor (('*' factor)*) ← higher precedence: multiplication
factor = number | '(' expr ')' ← atomic values and groupingEach level calls the next one for its operands. This structure guarantees that * binds tighter than + without any explicit precedence tables.
The parser
(* Addition level: lowest precedence *)
let rec expr () =
let left = term () in
let rest =
Parseff.many
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.expect "a '+' operator" (fun () -> Parseff.char '+') in
Parseff.skip_whitespace ();
term ())
()
in
List.fold_left (fun acc t -> Add (acc, t)) left rest
(* Multiplication level: higher precedence *)
and term () =
let left = factor () in
let rest =
Parseff.many
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.expect "a '*' operator" (fun () -> Parseff.char '*') in
Parseff.skip_whitespace ();
factor ())
()
in
List.fold_left (fun acc f -> Mul (acc, f)) left rest
(* Atoms and parenthesized groups *)
and factor () =
Parseff.skip_whitespace ();
Parseff.or_
(fun () ->
let _ = Parseff.expect "an opening parenthesis" (fun () -> Parseff.char '(') in
Parseff.skip_whitespace ();
let e = expr () in
Parseff.skip_whitespace ();
let _ = Parseff.expect "a closing parenthesis" (fun () -> Parseff.char ')') in
e)
(fun () ->
let d = Parseff.expect "a number" Parseff.digit in
Num d)
()Each precedence level calls the next one for its operands. expr parses term (('+' term)*), term parses factor (('*' factor)*), and factor handles numbers and parenthesized sub-expressions. Because term is called from within expr, multiplication binds tighter than addition.
many collects zero or more operator-operand pairs, and fold_left builds left-associative AST nodes: 1+2+3 becomes Add(Add(Num 1, Num 2), Num 3). or_ in factor tries the parenthesized path first; if the input doesn't start with (, it backtracks and tries a digit.
How expect improves errors
Without expect, a failure on the closing parenthesis would say something like expected ')'. With expect, it says expected a closing parenthesis. This matters when users see the error.
Compare the error messages on "1+":
- Without
expect:expected digit - With
expect:expected a number
And on "(1+2":
- Without
expect:expected ')' - With
expect:expected a closing parenthesis
Tracing a parse
Let's trace 1+2*3 through the parser:
expr()callsterm()term()callsfactor(), which matches1→Num 1term()triesmany(* factor), but no*follows, sorest = []term()returnsNum 1toexpr()expr()triesmany(+ term)and sees+- Consumes
+, callsterm() term()callsfactor(), which matches2→Num 2term()triesmany(* factor)and sees*- Consumes
*, callsfactor(), which matches3→Num 3 term()buildsMul(Num 2, Num 3)viafold_leftexpr()buildsAdd(Num 1, Mul(Num 2, Num 3))viafold_left
The key insight: * is consumed inside term, so by the time expr sees the result, 2*3 is already a single Mul node. That's how precedence works. Each level "claims" its operators before the level above sees them.
An alternative: using fold_left
The many + fold_left pattern is so common that Parseff provides fold_left as a shortcut:
let rec expr () =
Parseff.fold_left
term
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.char '+' in
Parseff.skip_whitespace ();
fun a b -> Add (a, b))
()
and term () =
Parseff.fold_left
factor
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.char '*' in
Parseff.skip_whitespace ();
fun a b -> Mul (a, b))
()
and factor () =
Parseff.skip_whitespace ();
Parseff.or_
(fun () ->
let _ = Parseff.char '(' in
let e = expr () in
Parseff.skip_whitespace ();
let _ = Parseff.char ')' in
e)
(fun () -> Num (Parseff.digit ()))
()fold_left element op parses one or more elements separated by op, combining them left-to-right. The op parser returns the combining function. This is more concise and expresses the intent directly.
For right-associative operators (like exponentiation), use fold_right instead.
Printing the AST
For debugging, a simple to_string function:
let rec expr_to_string = function
| Num n -> string_of_int n
| Add (l, r) ->
Printf.sprintf "(%s + %s)" (expr_to_string l) (expr_to_string r)
| Mul (l, r) ->
Printf.sprintf "(%s * %s)" (expr_to_string l) (expr_to_string r)(* "1+2*3" -> "(1 + (2 * 3))" *)
(* "(1+2)*3" -> "((1 + 2) * 3)" *)