gjq: Regular Path Queries for JSON

April 9, 2026

jq is one of those tools that once you learn it, you can't imagine working without it. But for simple tasks—finding a field buried somewhere in a JSON document, pulling out every occurrence of a key—the pipeline syntax gets verbose fast. .. | .field? // empty is the kind of incantation you look up every single time.

I built gjq to approach JSON querying from a different angle. Instead of chaining filters step-by-step, you write a single pattern that describes where you want to go. The pattern language borrows the familiar building blocks of regular expressions—alternation, wildcards, repetition—and applies them to tree traversal instead of string matching.

The Core Idea: Regex, but for Paths

Think of a JSON document as a labeled graph. Keys and array indices form the edges, and the values are the nodes. A gjq query describes which edges to follow to reach the values you want. If you've ever used ** in a glob pattern to mean "any number of directories," you already understand the basics.

The query engine parses your expression into an NFA, converts it to a DFA, then walks the document in a single pass. No backtracking, no repeated traversals. One pattern, one pass.

The mental model:

You describe the shape of the path you want. gjq walks the tree and finds every path that matches.

The Query Language

The operators compose freely, just like in regex. Here's the full set:

foo.bar.baz — Follow the exact path foo → bar → baz
foo | bar — Accept either foo or bar (alternation)
** — Zero or more field steps (recursive descent)
foo* — Repeat the preceding step zero or more times
* or [*] — Match any single key or array position
foo?.bar — The foo step is optional
[0] or [1:3] — Index or inclusive slice

Operators compose inside parentheses. foo.(bar|baz).qux expands to two valid paths: foo.bar.qux and foo.baz.qux. To descend through an arbitrary mix of objects and arrays, use (* | [*])* — so (* | [*])*.foo locates every foo field no matter how deeply it's nested.

gjq vs jq: The Practical Differences

jq is a Turing-complete language. You can define functions, do arithmetic, implement recursive transforms. gjq is not that. It is a focused tool for one job: finding things in JSON. But for that job, it's more direct.

Deep Field Lookup

# gjq — one flag, no ceremony
$ curl -s 'https://randomuser.me/api/?results=5000' \
  | gjq -F first | head -6
results.[0].name.first:
"Charles"
results.[1].name.first:
"Joel"
results.[2].name.first:
"Anthony"

# jq — recursive descent with manual null suppression
$ curl -s 'https://randomuser.me/api/?results=5000' \
  | jq '.. | .first? // empty' | head -3
"Charles"
"Joel"
"Anthony"

The -F flag treats the argument as a literal field name and searches the entire tree. No recursive descent operators, no null suppression. One flag does the work of three jq constructs.

Notice that gjq also prints the full path to each match (e.g. results.[0].name.first:), so you always know where a value came from. jq prints only the values unless you explicitly ask for paths.

Matching Multiple Keys

# gjq — alternation inside parentheses
$ curl -s 'https://randomuser.me/api/?results=5000' \
  | gjq 'results[0].(nat|email)'
results.[0].nat:
"DE"
results.[0].email:
"charles.kuhne@example.com"

# jq — enumerate each key separately
$ curl -s 'https://randomuser.me/api/?results=5000' \
  | jq '.results[0] | .nat, .email'
"DE"
"charles.kuhne@example.com"

In gjq, you describe what you want in one expression. In jq, you enumerate each key and combine them with the comma operator. Both work, but the gjq version scales better—adding a third or fourth key is just adding another pipe-separated name inside the parentheses.

Terminal-Aware Output

One detail I paid attention to: gjq adapts its output depending on whether it's writing to a terminal or a pipe—much like ripgrep's --heading behavior. In a terminal you get annotated paths. Through a pipe you get raw values, making it straightforward to chain into sort, uniq, wc, and friends.

# Values only when piped — ready for downstream processing
$ curl -s 'https://randomuser.me/api/?results=5000' \
  | gjq -F nat | sort | uniq -c | sort -rn | head -5
 265 "ES"
 263 "RS"
 260 "MX"
 257 "FR"
 253 "US"

Toggle with --with-path / --no-path if you need to override the default behavior.
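For readers curious how this kind of switch is usually wired up in Go, here is a minimal sketch. It assumes a stdlib-only check on os.Stdout (character device or not), which is a common heuristic and may not be what gjq actually does. The names stdoutIsTerminal, showPath, and formatMatch are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// stdoutIsTerminal reports whether stdout looks like a character device.
// Assumption: a stdlib-only heuristic; gjq may detect terminals differently.
func stdoutIsTerminal() bool {
	fi, err := os.Stdout.Stat()
	return err == nil && fi.Mode()&os.ModeCharDevice != 0
}

// showPath resolves the final setting: an explicit override (as from
// --with-path / --no-path) wins, otherwise follow the terminal check.
func showPath(isTerminal bool, override *bool) bool {
	if override != nil {
		return *override
	}
	return isTerminal
}

// formatMatch renders one match either annotated with its path
// or as a raw value.
func formatMatch(path, value string, withPath bool) string {
	if withPath {
		return path + ":\n" + value
	}
	return value
}

func main() {
	withPath := showPath(stdoutIsTerminal(), nil)
	fmt.Println(formatMatch("results.[0].nat", `"DE"`, withPath))
}
```

Run directly in a terminal, this prints the path line followed by the value. Piped into another command, it prints the value alone.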

Building gjq: Go, DFAs, and Single-Pass Traversal

The implementation has three stages: parse the query into an AST, compile the AST into an NFA, then convert the NFA to a DFA. The DFA walks the JSON document in a single pass, testing each edge (key or index) against the current state transitions. Every accepting state produces a match.
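The NFA-to-DFA step is the classic subset construction. A minimal sketch follows, assuming a finite alphabet of key names and an NFA written by hand for foo?.bar. The types and function names are illustrative and not taken from gjq's source, and real wildcards would need symbol classes rather than a fixed alphabet.

```go
package main

import (
	"fmt"
	"sort"
)

// nfa stores transitions as trans[state][symbol] -> next states.
// The empty symbol "" stands for an epsilon move.
type nfa struct {
	start  int
	accept map[int]bool
	trans  map[int]map[string][]int
}

// closure returns the epsilon-closure of states as a sorted slice.
func (n nfa) closure(states []int) []int {
	seen := map[int]bool{}
	stack := append([]int(nil), states...)
	for len(stack) > 0 {
		s := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[s] {
			continue
		}
		seen[s] = true
		stack = append(stack, n.trans[s][""]...)
	}
	out := make([]int, 0, len(seen))
	for s := range seen {
		out = append(out, s)
	}
	sort.Ints(out)
	return out
}

// key names a DFA state after the set of NFA states it represents.
func key(states []int) string { return fmt.Sprint(states) }

type dfa struct {
	start  string
	accept map[string]bool
	trans  map[string]map[string]string
}

// determinize runs the subset construction over the given alphabet.
func determinize(n nfa, alphabet []string) dfa {
	start := n.closure([]int{n.start})
	d := dfa{start: key(start), accept: map[string]bool{}, trans: map[string]map[string]string{}}
	queue := [][]int{start}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		k := key(cur)
		if _, done := d.trans[k]; done {
			continue
		}
		d.trans[k] = map[string]string{}
		for _, s := range cur {
			if n.accept[s] {
				d.accept[k] = true
			}
		}
		for _, sym := range alphabet {
			var moved []int
			for _, s := range cur {
				moved = append(moved, n.trans[s][sym]...)
			}
			if len(moved) == 0 {
				continue // no transition on sym: the DFA is dead here
			}
			nxt := n.closure(moved)
			d.trans[k][sym] = key(nxt)
			queue = append(queue, nxt)
		}
	}
	return d
}

// matches reports whether the DFA accepts the whole path.
func (d dfa) matches(path []string) bool {
	state := d.start
	for _, sym := range path {
		nxt, ok := d.trans[state][sym]
		if !ok {
			return false
		}
		state = nxt
	}
	return d.accept[state]
}

func main() {
	// foo?.bar: state 0 reaches state 1 via "foo" or via epsilon.
	n := nfa{
		start:  0,
		accept: map[int]bool{2: true},
		trans: map[int]map[string][]int{
			0: {"foo": {1}, "": {1}},
			1: {"bar": {2}},
		},
	}
	d := determinize(n, []string{"foo", "bar"})
	fmt.Println(d.matches([]string{"foo", "bar"})) // true
	fmt.Println(d.matches([]string{"bar"}))        // true
	fmt.Println(d.matches([]string{"foo"}))        // false
}
```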

This architecture has a nice property: the automaton is built from the pattern alone, so its size depends on the query, not on the document. The DFA has a fixed number of states regardless of how deep or wide the JSON tree is, and each node in the document is visited at most once, so traversal time grows linearly with document size.
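The single-pass walk can be sketched in a few dozen lines of Go. This version assumes a DFA written by hand for results.[*].name.first instead of one compiled from a query, and it decodes the whole document into memory for brevity. The state numbering and the helper names (next, walk, matchPaths) are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
)

// edge is one step in the tree: an object key or an array index.
type edge struct {
	key   string
	isIdx bool
}

const accept = 4 // the only accepting state of this hand-built DFA

// next is a hand-written transition function for results.[*].name.first.
// ok=false means no transition exists, so the whole subtree is skipped.
func next(state int, e edge) (int, bool) {
	switch {
	case state == 0 && !e.isIdx && e.key == "results":
		return 1, true
	case state == 1 && e.isIdx:
		return 2, true
	case state == 2 && !e.isIdx && e.key == "name":
		return 3, true
	case state == 3 && !e.isIdx && e.key == "first":
		return 4, true
	}
	return 0, false
}

// walk carries the DFA state down the tree and never revisits a node.
func walk(v any, state int, path string, matches *[]string) {
	if state == accept {
		*matches = append(*matches, path)
	}
	switch node := v.(type) {
	case map[string]any:
		keys := make([]string, 0, len(node))
		for k := range node {
			keys = append(keys, k)
		}
		sort.Strings(keys) // stable order for the example output
		for _, k := range keys {
			if s, ok := next(state, edge{key: k}); ok {
				walk(node[k], s, join(path, k), matches)
			}
		}
	case []any:
		for i, child := range node {
			if s, ok := next(state, edge{isIdx: true}); ok {
				walk(child, s, join(path, "["+strconv.Itoa(i)+"]"), matches)
			}
		}
	}
}

// join appends one step to a dotted path.
func join(prefix, step string) string {
	if prefix == "" {
		return step
	}
	return prefix + "." + step
}

// matchPaths decodes doc and returns every path the DFA accepts.
func matchPaths(doc string) ([]string, error) {
	var v any
	if err := json.Unmarshal([]byte(doc), &v); err != nil {
		return nil, err
	}
	var matches []string
	walk(v, 0, "", &matches)
	return matches, nil
}

func main() {
	doc := `{"info":{"name":{"first":"skipped"}},"results":[{"name":{"first":"Charles"}},{"name":{"first":"Joel"}}]}`
	paths, err := matchPaths(doc)
	if err != nil {
		panic(err)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
	// Prints:
	// results.[0].name.first
	// results.[1].name.first
}
```

Note how the info subtree is never entered: state 0 has no transition on that key, so the walk prunes it without any backtracking.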

I chose Go for a few reasons. The standard library's encoding/json package provides a streaming decoder that pairs naturally with single-pass traversal. Go's compilation speed keeps the development loop tight. And a single static binary with no runtime dependencies is the right shape for a CLI tool that people install with go install.

Performance

I benchmarked gjq against jq on a range of queries against a 1 MB randomuser dataset. The results are competitive:

Simple path queries (e.g. results.nat): roughly parity with jq
Recursive descent (e.g. **.first): within 3% of jq
Wildcard queries (e.g. *): 1.4x faster than jq
Nested traversals (e.g. users[*].name): 1.2x faster than jq

For a v0.1.0, I'm happy with where things stand. The single-pass DFA approach should scale well as the query language grows.

Installation and Quick Start

Requires Go 1.21+:

go install github.com/fantods/gjq@latest
gjq --version
# gjq version 0.1.0

Some patterns to get started:

# Pretty-print JSON (like jq '.')
echo '{"name":"Ada","age":36}' | gjq ''

# Find every "email" field at any depth
cat response.json | gjq -F email

# Count matches silently
cat response.json | gjq -F email --count -n

# Match multiple keys with alternation
cat response.json | gjq 'users[*].(name|email)'

# Case-insensitive deep search
cat response.json | gjq -i '**.Country'

When to Use gjq vs jq

gjq is not trying to replace jq. If you need to transform data, compute aggregates, or implement complex logic, jq is the right tool. But if your use case is "find this thing in this JSON document"—which covers a surprisingly large fraction of daily JSON wrangling—gjq gets you there with less ceremony.

The two tools also compose well: use gjq to locate the data you need, then pipe the output into jq for transformation if necessary.

What's Next

gjq is at v0.1.0. The query language and DFA engine are solid. Some things I'm thinking about for future versions:

Value predicates: Filter matches by value type or content, not just path shape
Output formatting: More control over how matches are rendered
Streaming input: Process line-delimited JSON streams without buffering the entire input
Completions: Shell completions for bash, zsh, fish (the scaffold is already there via the generate subcommand)

gjq is available at github.com/fantods/gjq. I'd love to hear what you think. Reach out at matt@emmons.club.