Intro #
JsonSchema allows us to create schemas written in JSON. We can use these schemas to validate data and documents. JsonSchema was specifically built for JSON, but we can also use it to validate other formats like YAML.
This blog post is about defining a basic JsonSchema and extending it with custom validation. You can find the full code referenced in this post here.
TLDR #
Extending the validation of JsonSchema is pretty straightforward.
With Python and the jsonschema module, it’s a matter of creating a new validator function and then extending the desired validator class using jsonschema.validators.extend.
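A validator function takes four arguments: validator, properties, instance and schema. Here, properties is the value assigned to the custom keyword in the schema, and instance is the part of the data that is being validated.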
In our function we can access the data we need in order to validate our requirement and yield a ValidationError if the data is invalid.
Scenario #
We have an automation that takes a config file as input. Our automation takes a list of hosts and then processes the configured machines based on the provided data.
Config File #
This configuration file is written in YAML format and defines two hosts:
config.yml
hosts:
  - hostname: Host-01
    cpu: 4
    memory: 8
    storage: 100
    os: linux
  - hostname: Host-02
    cpu: 8
    memory: 16
    storage: 250
    os: windows
Schema #
To make sure the config file actually contains all the data that we need, we can create a simple JsonSchema to validate it:
schemas/demo.schema.json
Note: This is a very basic schema definition.
In a real schema, consider adding patterns for strings and annotations via the title, description and $comment properties.
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "demo.schema.json",
  "title": "Demo Configuration Schema",
  "description": "Configuration of hosts for demonstration purposes",
  "type": "object",
  "properties": {
    "hosts": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "properties": {
          "hostname": {
            "type": "string",
            "format": "hostname"
          },
          "cpu": {
            "type": "integer",
            "minimum": 1
          },
          "memory": {
            "type": "integer",
            "minimum": 1
          },
          "storage": {
            "type": "integer",
            "minimum": 50
          },
          "os": {
            "enum": ["linux", "windows", "macos"]
          }
        },
        "required": ["hostname", "cpu", "memory", "storage", "os"]
      }
    }
  },
  "required": ["hosts"]
}
With this schema, we define that our config needs to contain at least one host.
Each host needs to have a hostname, cpu, memory, storage and os property.
Validator #
Now we need a tool to actually validate the config file against the schema. Many tools implement this, but we are going to write our own Python script for it. Later on, we will extend this script with our own validators.
For now, we have a simple script that parses the YAML config and validates it using the JsonSchema we defined:
validate.py
import json
import sys

import yaml
from jsonschema import ValidationError
from jsonschema.validators import Draft202012Validator

# Load the schema (JSON) and the config file to validate (YAML).
with open("./schemas/demo.schema.json", "r") as f:
    schema = json.load(f)

with open("./config.yml", "r") as f:
    data = yaml.safe_load(f)

# Without a format checker, "format" keywords such as "hostname" are not enforced.
validator = Draft202012Validator(
    schema=schema, format_checker=Draft202012Validator.FORMAT_CHECKER
)

try:
    validator.validate(data)
    print("Configuration file is valid")
except ValidationError as e:
    print("Configuration file is invalid")
    print(e.message)
    print(f"Path: {list(e.path)}")
    sys.exit(1)
When we run this script, we see that the config is indeed valid:
$ python validate.py
Configuration file is valid
If we update the config and remove cpu from the first host, the validation fails accordingly:
$ python validate.py
Configuration file is invalid
'cpu' is a required property
Path: ['hosts', 0]
Custom Validation #
So far we have basic validation. If we forget to define a property or assign incorrect values, our validator makes us aware of that.
But what happens if we create a third host with the same hostname as the second? Right now, there is no validation in place to warn us about this.
So let’s add the third host with the same hostname as Host-02:
hosts:
  - hostname: Host-01
    cpu: 4
    memory: 8
    storage: 100
    os: linux
  - hostname: Host-02
    cpu: 8
    memory: 16
    storage: 250
    os: windows
  - hostname: Host-02
    cpu: 6
    memory: 32
    storage: 100
    os: linux
When we validate the config, the file is still considered valid:
$ python validate.py
Configuration file is valid
Now, let’s add a custom validator to our Python script to make sure a given property is unique across all objects in a list:
validate.py
import json
import sys

import yaml
from jsonschema import ValidationError
from jsonschema.validators import Draft202012Validator, extend


def items_unique_properties(validator, properties, instance, schema):
    # "properties" is the keyword's value in the schema (a list of property
    # names); "instance" is the array the keyword is applied to.
    for name in properties:
        seen = []
        for item in instance:
            item_value = item.get(name)
            # Items that do not define the property are skipped.
            if item_value is None:
                continue
            if item_value not in seen:
                seen.append(item_value)
            else:
                yield ValidationError(
                    f"Duplicate property {name} with value {item_value}"
                )


with open("./schemas/demo.schema.json", "r") as f:
    schema = json.load(f)

with open("./config.yml", "r") as f:
    data = yaml.safe_load(f)

# Create a new validator class that also understands our custom keyword.
CustomValidator = extend(
    Draft202012Validator, validators={"itemsUniqueProperties": items_unique_properties}
)

validator = CustomValidator(
    schema=schema, format_checker=Draft202012Validator.FORMAT_CHECKER
)

try:
    validator.validate(data)
    print("Configuration file is valid")
except ValidationError as e:
    print("Configuration file is invalid")
    print(e.message)
    print(f"Path: {list(e.path)}")
    sys.exit(1)
With this, we have defined a new validation keyword called itemsUniqueProperties. We can now use this keyword in our schema:
...
"properties": {
  "hosts": {
    "type": "array",
    "minItems": 1,
    "itemsUniqueProperties": ["hostname"],
    "items": {
      ...
    }
  }
}
...
Now, when we run the validator again, the validation will inform us about the duplicate hostname:
$ python validate.py
Configuration file is invalid
Duplicate property hostname with value Host-02
Path: ['hosts']