Guide: A JSON parser
Build a complete recursive-descent JSON parser with depth limiting.
This walkthrough builds a full JSON parser that handles all JSON value types: null, booleans, numbers, strings, arrays, and objects. It demonstrates recursive descent parsing, mutual recursion between parsers, and depth limiting with rec_.
Source: test/test_json.ml
What we're building
A parser that takes any valid JSON string and returns a structured OCaml value. Along the way, we'll handle:
- Six different value types with alternation
- Recursive nesting (arrays inside objects inside arrays...)
- Whitespace between tokens
- Depth limiting to prevent stack overflows on malicious input
The data type
First, we define what a parsed JSON value looks like:
type json =
| Null
| Bool of bool
| Number of float
| String of string
| Array of json list
| Object of (string * json) listEvery JSON value maps to one of these constructors. This is the return type of our parser.
Primitive value parsers
Each JSON primitive type gets its own parser. These are the leaf nodes. They don't call other parsers recursively.
Null and booleans
let null_parser () =
let _ = Parseff.consume "null" in
Null
let bool_parser () =
Parseff.or_
(fun () ->
let _ = Parseff.consume "true" in
Bool true)
(fun () ->
let _ = Parseff.consume "false" in
Bool false)
()null_parser matches the literal "null" and returns the Null constructor. bool_parser uses or_ to try "true" first, backtracking to "false" if that fails.
Numbers
let number_re = Re.compile (Re.Posix.re "-?[0-9]+(\\.[0-9]+)?")
let number_parser () =
let s = Parseff.match_regex number_re in
Number (float_of_string s)We match the number as a string with a regex, then convert it. The regex handles integers, decimals, and negative numbers.
Strings
let string_content_re = Re.compile (Re.Posix.re "[^\"]*")
let string_parser () =
let _ = Parseff.char '"' in
let s = Parseff.match_regex string_content_re in
let _ = Parseff.char '"' in
String sMatch an opening quote, capture everything that isn't a quote, then match the closing quote. This simplified version doesn't handle escape sequences (backslash-quote, backslash-backslash, etc.). A production parser would need more work here.
The recursive entry point
This is where it gets interesting. A JSON value can be any of the six types, and arrays/objects contain more JSON values:
let rec json () =
Parseff.rec_ (fun () ->
Parseff.skip_whitespace ();
Parseff.one_of
[
array_parser;
object_parser;
null_parser;
bool_parser;
number_parser;
string_parser;
]
()
)Three things to notice:
rec_wraps the body. This registers a recursion entry point for depth tracking. Every timejsoncalls itself (through arrays or objects), the depth counter increments. When it exceedsmax_depth, parsing fails cleanly instead of overflowing the stack.one_oftries each parser in order. Array and object parsers come first because they start with distinctive characters (LBRACKETandLBRACE). If those fail, we fall through to the simpler alternatives.- Whitespace is consumed before the value. This means individual parsers don't need to worry about leading whitespace.
Arrays
and array_parser () =
let _ = Parseff.char '[' in
Parseff.skip_whitespace ();
let elements =
Parseff.or_
(fun () ->
let first = json () in
let rest =
Parseff.many
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.char ',' in
json ())
()
in
first :: rest)
(fun () -> [])
()
in
Parseff.skip_whitespace ();
let _ = Parseff.char ']' in
Array elementsAfter the opening LBRACKET, we use or_ to handle two cases:
- Non-empty array: Parse the first element, then
manyparses zero or more, elementpairs. This pattern avoids a trailing comma problem, since the separator always comes before an element (except the first). - Empty array: The left branch fails (no element to parse), so we backtrack and return
[].
Notice json () is called recursively here. This is where the depth tracking from rec_ matters. Without it, deeply nested arrays like [[[[[...]]]]] would blow the stack.
Objects
and object_parser () =
let _ = Parseff.char '{' in
Parseff.skip_whitespace ();
let pairs =
Parseff.or_
(fun () ->
let first = key_value () in
let rest =
Parseff.many
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.char ',' in
key_value ())
()
in
first :: rest)
(fun () -> [])
()
in
Parseff.skip_whitespace ();
let _ = Parseff.char '}' in
Object pairsThe structure mirrors the array parser exactly. The only difference is that elements are key-value pairs instead of bare values.
Key-value pairs
and key_value () =
let _ = Parseff.char '"' in
let key = Parseff.match_regex string_content_re in
let _ = Parseff.char '"' in
Parseff.skip_whitespace ();
let _ = Parseff.char ':' in
Parseff.skip_whitespace ();
let value = json () in
(key, value)Parse a string key (reusing the same regex as string_parser), a colon separator, and then recursively parse the value.
Running the parser
let () =
let input = {|{"name": "parseff", "version": 1, "tags": ["parser", "ocaml"]}|} in
match Parseff.parse input json with
| Ok result -> (* ... process the JSON value ... *)
| Error { pos; error = `Expected msg } ->
Printf.printf "Error at %d: %s\n" pos msg
| Error { pos; error = `Unexpected_end_of_input } ->
Printf.printf "Unexpected end of input at %d\n" pos
| Error { pos; error = `Depth_limit_exceeded msg } ->
Printf.printf "Depth limit exceeded at %d: %s\n" pos msg
| Error _ -> print_endline "Parse error"Depth limiting
Without rec_, a malicious input with 10,000 nested arrays would crash your program with a stack overflow. With it, you can set a limit:
Parseff.parse ~max_depth:64 input jsonWhen the limit is exceeded, parsing fails with "maximum nesting depth 64 exceeded", giving a clean error instead of a crash.
Here's what happens with deeply nested input:
(* 50 levels deep: within the default limit of 128 *)
let input = String.make 50 '[' ^ String.make 50 ']' in
Parseff.parse input json_array
(* Ok (Array (Array (Array ...))) *)
(* 256 levels deep: exceeds the limit *)
let input = String.make 256 '[' ^ String.make 256 ']' in
Parseff.parse ~max_depth:128 input json
(* Error { error = `Depth_limit_exceeded "maximum nesting depth 128 exceeded" } *)The mutual recursion pattern
This parser uses five mutually recursive functions:
json ──→ one_of ──→ array_parser ──→ json (recursive)
──→ object_parser ──→ key_value ──→ json (recursive)
──→ null_parser
──→ bool_parser
──→ number_parser
──→ string_parserIn OCaml, mutually recursive functions are defined with let rec ... and ...:
let rec json () = ...
and array_parser () = ...
and object_parser () = ...
and key_value () = ...Only the top-level entry point (json) needs rec_. The other functions participate in the recursion, but json is where a new nesting level begins.