Skip to content

Why Omnist

The thesis

JSON, YAML, TOML, and XML are markup languages that got a schema system bolted on afterward, if at all -- JSON Schema, XSD, or (for YAML and TOML) nothing. Because the schema wasn't designed together with the data model, it can only ever check a document against a shape; it has no way to relate two shapes to each other.

Omnist designs the Document and the Schema as one formalism from the start (see the model spec): a Document is a canonical edge list, and a Schema is a closed, exactly-typed grammar over those same edges. Because every field has exactly one type -- never an enum, a union, or an open/Any escape hatch -- two schemas can be compared structurally, not just used to check data. That's what makes operations like compatible_with (is every document valid under schema A also valid under schema B?), equivalent, and infer decidable: there's always exactly one answer, never "it depends which branch of the union matched."

This is a falsifiable claim. It would be false if a mainstream JSON Schema library already had a clean way to ask "is this schema change backward compatible," or if YAML/TOML had any schema system at all. Both are checked below, against the real libraries -- not assumed.

Capability matrix

Verified against the actual library/spec for each column (see the worked comparison below for the compatible_with row, and the non-goals section for the XML row).

Capability JSON + jsonschema YAML (PyYAML) TOML (tomllib) XML + XSD Omnist
Schema validation yes no (no schema concept) no (no schema concept) yes yes
Backward-compat checking no (no API; see below) no no no (no standard API; XSD has no compat operation) yes (compatible_with)
Schema-from-examples (inference) no (no standard API in jsonschema) no no no (no standard tool ships with XSD) yes (infer)
Type-exact schema-directed deserialization partial (validates types but doesn't convert/upgrade values) no (PyYAML's loader infers types itself, not from a schema) no (tomllib likewise infers types itself, not schema-directed) partial (XSD types validate; bindings to native types need a separate codegen tool, e.g. xmlschema/xsdata) yes (schema= upgrades leaves only when value-exact)
Native lossless round-trip yes (within JSON's own type system) yes (within YAML's own type system) yes (within TOML's own type system) no (attributes/namespaces/mixed content not modeled by Omnist; see non-goals) yes (OML only -- JSON/YAML/TOML/XML round-trip within their own type systems, same as the other columns)
Multi-format read/write (one model, many formats) no (jsonschema only ever speaks JSON) no (PyYAML only speaks YAML) no (tomllib only speaks TOML, and is read-only) no (XML tooling only speaks XML) yes (read_*/write_* for JSON/YAML/TOML/XML/OML all target one Document model)

Notes on cells double-checked while writing this table:

  • PyYAML and tomllib have no schema concept whatsoever -- confirmed by inspecting their public APIs (yaml.safe_load/dump and tomllib.load/loads); there is nothing resembling validation, comparison, or inference to even attempt.
  • jsonschema 4.26.0's public API (validate, the DraftNValidator classes, FormatChecker) is entirely about checking one document against one schema. There is no method, function, or companion package shipped with it for comparing two schemas to each other.
  • XSD has a validate-only contract in every mainstream tool we're aware of (e.g. lxml.etree.XMLSchema); there's no standard "is this XSD change backward compatible" operation, and no inference tool ships with the spec.

Worked comparison: compatible_with

Omnist:

from omnist import parse_schema

v1 = parse_schema('record R { "host": string }\nroot R')
v2 = parse_schema('record R { "host": string, "port" [0,1]: integer }\nroot R')

v1.compatible_with(v2)   # True  -- every v1 document is still valid under v2
v2.compatible_with(v1)   # False -- a v2 document with a port isn't valid under v1

One method call. The answer is decidable because every Omnist field has exactly one type (Section 5 of the model spec) -- there's no union or enum branch that could make "is A's document set a subset of B's" ambiguous.

jsonschema (the most common Python JSON Schema validator, version 4.26.0, checked directly in this repo's venv): there is no compatible_with, is_subset, or any comparison API at all. The library's entire public surface (validate, the DraftNValidator family, FormatChecker) is shaped around checking one document against one schema -- never two schemas against each other:

import jsonschema

v1 = {
    "type": "object",
    "properties": {"host": {"type": "string"}},
    "required": ["host"],
    "additionalProperties": False,
}
v2 = {
    "type": "object",
    "properties": {"host": {"type": "string"}, "port": {"type": "integer"}},
    "required": ["host"],
    "additionalProperties": False,
}

# jsonschema.validate(instance, schema) only checks a document against
# ONE schema. To approximate "is v1 backward compatible with v2" you'd
# have to hand-write your own diff over the two schema dicts yourself --
# walking properties, required, additionalProperties, and (in the
# general case) oneOf/anyOf/$ref branches, recursively, with no
# library support for any of it. jsonschema gives you no starting point.
jsonschema.validate({"host": "x"}, v1)  # only proves a fact about ONE document

This isn't a contrived gap: jsonschema's own docs and API surface confirm it's a validator, not a schema-algebra library. Searching the package index for the natural alternative (pip index versions jsonschema, and pip list for anything diff/compat-shaped already installed) turned up no companion package that adds this. The claim survives unsoftened: a project that wants schema-version compatibility checking with jsonschema has to build that logic itself, by hand, over the raw schema dicts -- recursing through oneOf/anyOf/$ref in the general case -- where Omnist ships it as one method that's correct by construction because the type system has no branches to get wrong.

Non-goals

Omnist is not trying to be a bigger hammer than it is. Specifically:

  • No value-domain constraints. Omnist has no regex/pattern matching, no numeric ranges, no custom predicates (JSON Schema's pattern, minimum, format: email, and friends). A field's type is exactly one of seven scalar kinds (optionally nullable) -- never a refinement of a kind. If you need "a string matching this regex," Omnist isn't the tool.
  • No enums or unions. "Either an integer or the string unlimited" or "one of a, b, c" can't be expressed. This is deliberate (see the model spec, Section 2): a value that matches more than one candidate has no principled Python type to materialize to. If your data needs literal-value validation, you'll need something else for that part.
  • Not a replacement for XML with attributes, mixed content, or namespaces. This was checked directly against this repo's read_xml/ check_xml, not assumed:
from omnist import read_xml, check_xml

read_xml('<a x="1"><b>hi</b></a>')
# [('a', [('b', 'hi')])]   -- the x="1" attribute is silently gone

check_xml('<a x="1"><b>hi</b></a>')
# no adjustments -- check_xml reports nothing about the dropped attribute

Namespace prefixes are stripped the same way: an element like foo:b under an xmlns:foo binding reads as plain b -- the prefix and its namespace binding both vanish, with no warning. Omnist's Document model has no edge shape for attributes, mixed text+element content, or namespace-qualified names, so none of the three survive a round trip. If your XML relies on attributes, namespaces, or interleaved text and elements as meaningful data (not just structure), Omnist will silently lose it -- use a real XML library (lxml, xml.etree) for that data instead. This is a genuine, named limitation, not a hedge: nothing in Omnist currently detects or warns about it. (A separate issue may exist to track making this loss visible at read/check time; this page only documents the current behavior.) - No structureless escape hatches, by design, not by oversight. There's no Any type and no open/wildcard record. This isn't a missing feature on the roadmap -- it's the property that makes compatible_with, equivalent, normalize, and infer well-defined in the first place (see the thesis above). If you need a schema that accepts arbitrary, unstructured data, Omnist's schema model isn't going to do that for you.

See also

  • Model spec -- the formal Document and Schema definitions this page's claims are checked against.
  • Schema model & DSL -- compatible_with, equivalent, normalize, and infer in full.
  • Formats -- the per-format mapping and caveats, including XML's in more detail.