Introduction to jsonvalidate

Rich FitzJohn

2021-11-03

This package wraps is-my-json-valid using V8 to do JSON schema validation in R.

You need a JSON schema file; see json-schema.org for details on writing these. Often someone else has done the hard work of writing one for you, and you can just check that the JSON you are producing or consuming conforms to the schema.

The examples below come from the JSON schema website.

They describe a JSON-based product catalogue, where each product has an id, a name, a price, and an optional set of tags. A JSON representation of a product is:

{
    "id": 1,
    "name": "A green door",
    "price": 12.50,
    "tags": ["home", "green"]
}

The schema that they derive looks like this:

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Product",
    "description": "A product from Acme's catalog",
    "type": "object",
    "properties": {
        "id": {
            "description": "The unique identifier for a product",
            "type": "integer"
        },
        "name": {
            "description": "Name of the product",
            "type": "string"
        },
        "price": {
            "type": "number",
            "minimum": 0,
            "exclusiveMinimum": true
        },
        "tags": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "minItems": 1,
            "uniqueItems": true
        }
    },
    "required": ["id", "name", "price"]
}

This checks the types of all fields, enforces the presence of id, name and price, checks that the price is greater than zero (the minimum of 0 is exclusive), and checks that tags, if present, is a non-empty list of unique strings.

There are two ways of passing the schema in to R: as a string or as a filename. If you have a large schema, loading it from a file will generally be easiest. Here’s a string representing the schema (watch out for escaping quotes):

schema <- '{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Product",
    "description": "A product from Acme\'s catalog",
    "type": "object",
    "properties": {
        "id": {
            "description": "The unique identifier for a product",
            "type": "integer"
        },
        "name": {
            "description": "Name of the product",
            "type": "string"
        },
        "price": {
            "type": "number",
            "minimum": 0,
            "exclusiveMinimum": true
        },
        "tags": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "minItems": 1,
            "uniqueItems": true
        }
    },
    "required": ["id", "name", "price"]
}'
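If escaping quotes becomes tedious, one option is R's raw string syntax, available from R 4.0.0. The sketch below uses a shortened schema purely for illustration (the names schema_raw and schema_escaped are not part of the package) and checks that both spellings produce the same string:

```r
# Raw string literal (requires R >= 4.0.0): nothing inside r"( ... )" needs escaping
schema_raw <- r"({
    "title": "Product",
    "description": "A product from Acme's catalog",
    "type": "object"
})"

# The same text as an ordinary single-quoted string with an escaped apostrophe
schema_escaped <- '{
    "title": "Product",
    "description": "A product from Acme\'s catalog",
    "type": "object"
}'

# Both forms should hold exactly the same characters
same <- identical(schema_raw, schema_escaped)
print(same)
```

Either string can then be passed to the validator in the same way as the schema above.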

Create a validator:

v <- jsonvalidate::json_validator(schema)

If we’d saved the schema to a file, this would work too:

path <- tempfile()
writeLines(schema, path)
v <- jsonvalidate::json_validator(path)

The returned object is a function that takes as its first argument a json string, or the filename of a json file. An empty object will fail validation because it does not contain any of the required fields:

v("{}")
## [1] FALSE
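As a minimal sketch of the filename form, the following writes a document to a temporary file and validates it by path. The abbreviated schema schema_min and the names v_min and json_path are assumptions made for this illustration; only the "required" rule from the product schema is kept.

```r
library(jsonvalidate)

# Abbreviated schema for illustration: only the "required" rule is kept
schema_min <- '{
    "type": "object",
    "required": ["id", "name", "price"]
}'
v_min <- json_validator(schema_min)

doc <- '{"id": 1, "name": "A green door", "price": 12.50}'

# Write the JSON document to a temporary file and validate it by filename
json_path <- tempfile(fileext = ".json")
writeLines(doc, json_path)
ok_file <- v_min(json_path)

# Validate the same document passed directly as a string
ok_string <- v_min(doc)
```

Both calls should give the same answer, since the file holds the same document as the string.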

To get more information on why the validation fails, add verbose = TRUE:

v("{}", verbose = TRUE)
## [1] FALSE
## attr(,"errors")
##        field     message
## 1    data.id is required
## 2  data.name is required
## 3 data.price is required

The attribute “errors” is a data.frame and is present only when the json fails validation. The error messages come straight from is-my-json-valid and they may not always be that informative.
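Because the errors are returned as a data.frame, they can be filtered like any other table. The sketch below assumes the column names field and message shown in the output above, and again uses an abbreviated schema (schema_min) invented for this example:

```r
library(jsonvalidate)

# Abbreviated schema for illustration: only the "required" rule is kept
schema_min <- '{
    "type": "object",
    "required": ["id", "name", "price"]
}'
v_min <- json_validator(schema_min)

res <- v_min('{"id": 1}', verbose = TRUE)

# The attribute is absent (NULL) when validation succeeds
errs <- attr(res, "errors")

if (!isTRUE(res)) {
  # Pick out the fields that were reported as missing
  missing_fields <- errs$field[errs$message == "is required"]
  print(missing_fields)
}
```

Matching on the message text is fragile, since the wording comes from is-my-json-valid and could change between versions.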

Alternatively, to throw an error if the json does not validate, add error = TRUE to the call:

v("{}", error = TRUE)
## Error: 3 errors validating json:
##  - data.id: is required
##  - data.name: is required
##  - data.price: is required
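In a script you may want to catch that error rather than let it stop execution. A minimal sketch using base R's tryCatch, with an abbreviated schema (schema_min) that is an assumption of this example:

```r
library(jsonvalidate)

# Abbreviated schema for illustration: only the "required" rule is kept
schema_min <- '{
    "type": "object",
    "required": ["id", "name", "price"]
}'
v_min <- json_validator(schema_min)

# Turn a validation error into a status string instead of stopping
status <- tryCatch({
  v_min("{}", error = TRUE)
  "valid"
}, error = function(e) {
  paste("invalid:", conditionMessage(e))
})

print(status)
```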

And to continue validating after the first error, pass greedy = TRUE:

v("{}", verbose = TRUE, greedy = TRUE)
## [1] FALSE
## attr(,"errors")
##        field     message
## 1    data.id is required
## 2  data.name is required
## 3 data.price is required

This will sometimes show more errors, though for this example the output is unchanged.

The JSON from the opening example works:

v('{
    "id": 1,
    "name": "A green door",
    "price": 12.50,
    "tags": ["home", "green"]
}')
## [1] TRUE

But if we tried to enter a negative price it would fail:

v('{
    "id": 1,
    "name": "A green door",
    "price": -1,
    "tags": ["home", "green"]
}', verbose = TRUE)
## [1] FALSE
## attr(,"errors")
##        field              message
## 1 data.price is less than minimum

…or duplicate tags:

v('{
    "id": 1,
    "name": "A green door",
    "price": 12.50,
    "tags": ["home", "home"]
}', verbose = TRUE)
## [1] FALSE
## attr(,"errors")
##       field        message
## 1 data.tags must be unique

…or with just about everything wrong:

v('{
    "id": "identifier",
    "name": 1,
    "price": -1,
    "tags": ["home", "home", 1]
}', verbose = TRUE)
## [1] FALSE
## attr(,"errors")
##         field              message
## 1     data.id    is the wrong type
## 2   data.name    is the wrong type
## 3  data.price is less than minimum
## 4   data.tags       must be unique
## 5 data.tags.2    is the wrong type

The data.tags.2 name comes from within the is-my-json-valid source, and may be annoying to work with programmatically.
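One way to work with these names is to split them into path components. The helper below, parse_field, is hypothetical and not part of jsonvalidate. It assumes field names look like the ones in the output above: a leading "data" root, dot-separated components, no dots inside keys, and all-digit components that are zero-based array indices (so data.tags.2 is the third tag).

```r
# Hypothetical helper: split a field name such as "data.tags.2" into a list of
# path components, converting zero-based array indices to R's one-based indices.
# Assumes keys contain no dots and that all-digit components are array indices.
parse_field <- function(field) {
  parts <- strsplit(field, ".", fixed = TRUE)[[1]]
  parts <- parts[-1]  # drop the leading "data" root
  lapply(parts, function(p) {
    if (grepl("^[0-9]+$", p)) as.integer(p) + 1L else p
  })
}

field_path <- parse_field("data.tags.2")
str(field_path)
```

A key that is itself made of digits, or that contains a dot, would be misread by this sketch, so treat it as a starting point only.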

There is also a simpler interface where you pass the json and the schema at the same time:

json <- '{
    "id": 1,
    "name": "A green door",
    "price": 12.50,
    "tags": ["home", "green"]
}'
jsonvalidate::json_validate(json, schema)
## [1] TRUE
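The same one-step call can check JSON that you generate from R before sending it elsewhere. The sketch below assumes the jsonlite package is installed and uses a reduced schema (schema_min) written for this example. The auto_unbox = TRUE argument makes jsonlite write length-one vectors as scalars, which the "integer" type here requires.

```r
library(jsonvalidate)

# Reduced schema for illustration: a subset of the product schema
schema_min <- '{
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["id", "name", "price"]
}'

product <- list(id = 1L, name = "A green door", price = 12.5,
                tags = c("home", "green"))

# Serialise the R list; as.character() drops jsonlite's "json" class
json_out <- as.character(jsonlite::toJSON(product, auto_unbox = TRUE))

ok <- json_validate(json_out, schema_min, verbose = TRUE)
print(ok)
```

Note that with auto_unbox = TRUE a single tag would also be written as a scalar rather than an array, which this schema would reject.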