Wrestling JSON with jq by Arjan van der Gaag

When you deal with JSON-based APIs a lot, a Unix-y tool to filter and transform JSON streams can be a valuable tool. Jq is such a tool, and you would do well to learn its basic usage.

Jq is a command-line tool to tranform JSON streams into other JSON streams. You can use it in a Unix pipeline to collect information from one set of JSON and output another set for further processing. This post will explain what that means using an example.

Example: collect all Bitbucket repo URLs

Let’s try collecting all the URLs for Git repositories you have hosted on Bitbucket. The Bitbucket JSON API gives us information about our repositories, but no simple way to get just the URLs we need. Rather than whipping up some Ruby to parse the JSON we get back, let’s use jq.

1. Fetching our JSON document from the API

Bitbucket’s API can be used with cURL as follows:

% curl --silent https://api.bitbucket.org/2.0/repositories/avdgaag

The response will be a JSON document with information on all my public repositories.¹ In this example our response will have on repository. The output cURL spits out looks like this:

{"pagelen": 10, "values": [{"scm": "git", "has_wiki": false, "description":
"...", "links": {"watchers": {"href": "..."}, "commits": {"href": "..."},
"self": {"href": "..."}, "html": {"href": "..."}, "avatar": {"href": "..."},
"forks": {"href": "..."}, "clone": [{"href": "...", "name": "https"},
{"href": "...", "name": "ssh"}], "pullrequests": {"href": "..."}},
"fork_policy": "no_forks", "name": "...", "language": "", "created_on":
"...", "full_name": "...", "has_issues": false, "owner": {"username": "...",
"display_name": "...", "uuid": "...", "links": {"self": {"href": "..."},
"html": {"href": "..."}, "avatar": {"href": "..."}}}, "updated_on": "...",
"size": 2328936, "is_private": true, "uuid": "..."}], "page": 1, "size": 1}

A nice, unreadable blob of JSON content! So, what if we have fifty repositories and want to get the clone URLs (the SSH-variant) for each of them?

2. Filtering the JSON with jq

Assume we have piped the cURL output into a file called response. We have also installed jq on our system, for example using Homebrew on Mac OS X (see full jq installation instructions). We can simply format and print the JSON using jq '.':

% jq '.' response
{
  "pagelen": 10,
  "page": 1,
  "size": 1,
  "values": [
    {
      "scm": "git",
      "has_wiki": false,
      "description": "...",
      "links": {
        "watchers": { "href": "..." },
        "commits": { "href": "..." },
        "self": { "href": "..." },
        "html": { "href": "..." },
        "avatar": {"href": "..."},
        "forks": {"href": "..."},
        "clone": [
          {"href": "...", "name": "https"},
          {"href": "...", "name": "ssh"}
        ],
        "pullrequests": {"href": "..."}
      },
      "fork_policy": "no_forks",
      "name": "...",
      "language": "",
      "created_on": "...",
      "full_name": "...",
      "has_issues": false,
      "owner": {
        "username": "...",
        "display_name": "...",
        "uuid": "...",
        "links": {
          "self": {"href": "..."},
          "html": {"href": "..."},
          "avatar": {"href": "..."}
        }
      },
      "updated_on": "...",
      "size": 2328936,
      "is_private": true,
      "uuid": "..."}
   ]
 }

Of course, we could also have piped our cURL output straight into jq:

% curl ... | jq '.'

Either way, our JSON gets pretty-printed. Now we can start building our query.

3. Querying simple values

First, let’s zoom in on the top-level values attribute:

% jq '.values' response
[
  {
    "scm": "git",
    "has_wiki": false,
    ...
  }
]

Our JSON stream is now the value for the values attribute, which is an array of objects. To collect a particular attribute path for each entry in this array, we can expand our query as follows:

% jq '.values[].links.clone' response
[
  {
    "href": "..."
    "name": "https"
  },
  {
    "href": "...",
    "name": "ssh"
  }
]

As you can see, we can construct paths through our JSON document by joining nested keys with a ., like .values.links.clone. Our values attribute happens to hold an array, and we want to collect the rest of the path for each of the objects in the array: .values[].links.clone. We could also have fetched just the clone links for the first repository using an index with .values[0].links.clone.

4. Unwrapping the array into multiple objects

We now have an array of objects, but we don’t really want an “array” as a single value; we’re interested in all the array values. So we pull the same trick as before and add some brackets to “unwrap” the array:

% jq '.values[].links.clone[]' response
{
  "href": "..."
  "name": "https"
}
{
  "href": "...",
  "name": "ssh"
}

That gives us a “stream” of objects. Nice!

5. Filtering our stream with a condition

Here comes the tricky bit. Now we got an array of objects with href and name attributes, we want to collect the hrefs where name equals ssh. Jq has a nice select function to do that:

% jq '.values[].links.clone[] | select(.name == "ssh")' response
{
  "href": "...",
  "name": "ssh"
}

Using the | we can pass one stream of objects (namely all clone objects) to a new expression. In this case that’s the select function which makes sure our output only contains the objects that match its subexpression.

Now we can use another pipe to filter our new stream and get just the values we’re interested in:

% jq '.values[].links.clone[] | select(.name == "ssh") | .href' response
"ssh://bitbucket.org/..."

6. Outputting raw values

We’ve got our selection down to the right values. Too bad, though, that the values are formatted as JSON — including string quotes. When we want to further process this list of values using other programs, we probbly prefer raw values over formatted strings. Luckily, jq has a command-line flag to do just that:

% jq --raw-values \
  '.values[].links.clone[] | select(.name == "ssh") | .href' \
  response
ssh://bitbucket.org/...

Note you can abbreviate --raw-values to -r.

Hurrah! We’ve used jq to quickly pull some values out of a complex JSON document without having to resort to opening our editor, writing Ruby or Python, figuring out how to use their HTTP libraries exactly, and then parsing and printing the response.

Jq’s syntax takes a little getting used to, but it’s got decent documentation and a wealth of functions and constructs to use. Most of this stuff you are not going to keep in your head, but I have found the patterns in this post (collecting key paths and conditional filtering) to be simple enough to use without having to resort to the manpages.

Do refer to the jq documentation for more information and enjoy your new-found JSON-wrestling superpowers!

To also include private repositories, include a --user USERNAME:PASSWORD option. ↩