Formats

Omnist reads JSON, YAML, TOML, XML, and its own native OML (Omnist Markup Language) into one canonical Document, and writes that Document back out to any of them. Because every format shares one model, converting means reading one format and writing another:

from omnist import Doc

# Read JSON into a Document, then write the same Document out as TOML.
Doc.from_json('{"name": "Ann", "tags": ["x", "y"]}').to_toml()

How a format becomes a Document

A Document is an ordered list of labeled edges (see the model spec). The mapping is the same idea for every format:

  • An object / mapping / table becomes a list of edges.
  • A key whose value is a list becomes a repeated label — that's how an array appears. {"tag": ["x", "y"]} is the label tag twice, not a field pointing to a list.
  • A scalar is a leaf value.

So the same data reads into the same Document regardless of its source format, which is why examples written in different formats can all validate against one schema.

| Source | Document |
| --- | --- |
| JSON object {"a":1,"b":2} | [(a,1),(b,2)] |
| JSON keyed list {"m":[A,B]} | [(m,A),(m,B)] |
| YAML mapping / sequence | as JSON |
| TOML table / array-of-tables | as JSON |
| XML elements (incl. interleaved) | [(tag,…),…], order preserved |
| OML edges a: 1\nb: 2 (incl. interleaved) | [(a,1),(b,2)], order preserved; OML is this model |
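
The mapping in the table above can be sketched in plain Python. This is an illustration only, not omnist's implementation: to_edges is a hypothetical helper, and it assumes the input is an already-parsed JSON-like value (for example, the result of json.loads) in which lists appear only as the values of keys.

```python
import json


def to_edges(value):
    """Turn a parsed JSON-like value into an ordered list of (label, value) edges.

    Hypothetical helper for illustration; not part of omnist.
    """
    if not isinstance(value, dict):
        return value  # a scalar is a leaf value
    edges = []
    for label, child in value.items():
        # A list under a key becomes the same label repeated, once per item.
        items = child if isinstance(child, list) else [child]
        for item in items:
            edges.append((label, to_edges(item)))
    return edges


print(to_edges(json.loads('{"a": 1, "b": 2}')))   # [('a', 1), ('b', 2)]
print(to_edges(json.loads('{"m": ["A", "B"]}')))  # [('m', 'A'), ('m', 'B')]
```

In this sketch an empty list produces no edges at all, and a list nested directly inside another list is passed through untouched; how omnist treats those cases is not covered here.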

Reading and writing

The read_*(text) functions parse text into a Document node; the Doc.from_* constructors do the same and wrap the result in a Doc; Doc.to_* and write_* project the Document back out to a format, grouping same-label edges into a list.

from omnist import read_yaml, Doc

# read_yaml returns a Document node; Doc(...) wraps it so it can be written out.
d = Doc(read_yaml("name: Ann\ntags: [x, y]\n"))
d.to_json()
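
The grouping step mentioned above (same-label edges become a list on the way out) might look like the following. from_edges and is_edge_list are hypothetical helpers written for this page, not omnist's writer; they assume an edge is a (label, value) tuple and a nested Document node is a Python list of such tuples.

```python
def is_edge_list(value):
    """True if value looks like a list of (label, value) edges."""
    return isinstance(value, list) and all(
        isinstance(edge, tuple) and len(edge) == 2 for edge in value
    )


def from_edges(node):
    """Project an edge list to a dict, grouping repeated labels into lists.

    Hypothetical helper for illustration; not part of omnist.
    """
    if not is_edge_list(node):
        return node  # leaf value
    out = {}
    counts = {}
    for label, child in node:
        value = from_edges(child)
        counts[label] = counts.get(label, 0) + 1
        if counts[label] == 1:
            out[label] = value                # first occurrence: plain value
        elif counts[label] == 2:
            out[label] = [out[label], value]  # second occurrence: start a list
        else:
            out[label].append(value)          # later occurrences: extend it
    return out


print(from_edges([("name", "Ann"), ("tags", "x"), ("tags", "y")]))
# {'name': 'Ann', 'tags': ['x', 'y']}
```

Two properties of this sketch are worth noting: a label that occurs once comes out as a plain value rather than a one-element list, and interleaving such as [(m, 1), (x, 2), (m, 3)] is flattened to {"m": [1, 3], "x": 2}, because a dict cannot record the original order across labels.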

Per-format pages

| Format | Notes |
| --- | --- |
| OML | Omnist's own format; the only one with zero adjustments, so every Document shape round-trips exactly |
| JSON | the baseline; no dependencies |
| YAML | the JSON-compatible core; needs pyyaml |
| TOML | native dates, no null, top-level must be a table |
| XML | single document element, repeated-element arrays, untyped text |

One model, many formats — every reader converges on the same Document, and every writer diverges back out from it:

flowchart LR
    JSON1["JSON"] --> Doc(("Document"))
    YAML1["YAML"] --> Doc
    TOML1["TOML"] --> Doc
    XML1["XML"] --> Doc
    OML1["OML"] --> Doc
    Doc --> JSON2["JSON"]
    Doc --> YAML2["YAML"]
    Doc --> TOML2["TOML"]
    Doc --> XML2["XML"]
    Doc --> OML2["OML"]

Special features, mapped to OML

Beyond the basic shape mapping above, each format has its own quirky or special-cased features. This table is about those — what each format does that's distinctive, and concretely how OML / the Document model handles it (or doesn't):

| Feature | Format | What happens |
| --- | --- | --- |
| No native date/time type | JSON | read_json never produces date/time/datetime values on its own; a date-looking string stays a plain str unless schema= upgrades it. write_json always stringifies a temporal leaf to ISO-8601 text (temporal.stringified). OML has the same gap: it has no native temporal literal either. A Document holds temporal values as typed Python objects in memory, but textual formats without a temporal type (JSON and OML among them) carry them as strings unless a schema restores the type. |
| No NaN/Infinity | JSON | write_json does not raise; it emits the literal tokens NaN/Infinity, which are not valid under the JSON spec (though Python's own json.loads is lenient enough to read them back). check_json reports this as float.special, an error-severity adjustment, so a caller who checks the report learns about it without write_json itself failing. |
| Anchors/aliases (&x / *x) | YAML | read_yaml resolves aliases at parse time (via PyYAML's safe_load), so no shared-object identity is preserved: a: &x foo / b: *x reads as the fully expanded [('a', 'foo'), ('b', 'foo')], two independent edges with equal values. OML has no anchor/alias syntax at all; a Document is always the fully expanded edge list, which is exactly what every YAML alias collapses to anyway. |
| Native date/datetime recognition (no schema) | YAML | PyYAML's safe_load resolves unquoted ISO-8601-looking scalars straight into datetime.date/datetime.datetime with no schema involved; for example, d: 2024-01-01 reads as [('d', datetime.date(2024, 1, 1))]. OML has no such resolver: its grammar has no native date literal, so a Document built from OML text needs schema= to get the same typed value. |
| Bare time-of-day as sexagesimal int | YAML | The YAML 1.1 types that PyYAML implements include no standalone "time" type, so a bare 12:00:00 resolves as a base-60 (sexagesimal) integer, 43200 (12*3600 + 0*60 + 0), not as a datetime.time: read_yaml('a: 12:00:00') returns [('a', 43200)]. This is PyYAML's own resolver behavior, not an omnist choice, and no schema can fix it on the read side, because the value is already an int by the time omnist sees it. OML sidesteps this because it has no bare-colon time literal to misparse in the first place. |
| Native date/time/datetime literals | TOML | tomllib parses TOML's date/time/datetime grammar directly into the matching Python types with no schema needed; this holds for all three kinds (d = 2024-01-01, t = 12:00:00, dt = 2024-01-01T12:00:00). Writing round-trips the same way: TOML is the one format with no temporal.stringified adjustment in either direction. OML still has no native temporal literal, so a round trip through OML text stringifies temporal values where a TOML round trip would not. This is the one case where TOML is more capable than OML on a specific scalar kind. |
| Array-of-tables ([[x]]) | TOML | The idiomatic way to write a repeated record; [[items]] … [[items]] … maps directly onto a repeated items label. read_toml('[[x]]\nname="a"\n[[x]]\nname="b"\n') returns [('x', [('name', 'a')]), ('x', [('name', 'b')])], the same repeated-edge shape OML uses natively for any repeated label, OML-block or not. |
| Repeated / interleaved elements | XML | The Document's defining feature and OML's native strength: an ordered edge list preserves interleaving (<m/><x/><m/> reads as [(m,…),(x,…),(m,…)]) that a dict-of-arrays can't represent. OML represents this the same way XML does, as repeated labels in original order, because OML's edge list is the Document model, not a projection of it. |
| Attributes | XML | Silently dropped, on both sides. read_xml('<a x="1"><b>hi</b></a>') returns [('a', [('b', 'hi')])]: the x="1" attribute is gone, with no trace in the Document. check_xml on the same input reports "no adjustments", so the drop isn't even flagged as lossy. write_xml never produces an attribute on the way back out either; there is no path from a Document edge to an XML attribute. This is a real, current limitation of the XML profile, and neither OML nor the Document model offers an equivalent: OML has no attribute concept to lose, and XML's own attributes are simply never read. |
| Namespace prefixes | XML | Stripped on read: a prefixed tag like <ns:b> reads as the local name b, dropping the prefix and any namespace binding (see _local() in omnist/canonical/formats.py). read_xml('<a xmlns:ns="http://x"><ns:b>hi</ns:b></a>') returns [('a', [('b', 'hi')])], with no namespace information anywhere in the Document. As with attributes, check_xml reports no adjustment for this. A genuine, current limitation; OML has no namespace concept to map this onto. |
| Everything else | OML | The "always" row: every Document shape (any combination of nesting, repeated labels, and interleaving) round-trips through OML with zero adjustments, because OML's edge-list syntax is the Document model rather than a format that has to be mapped onto it. |

One thing to know: single-rooted for XML

An XML document has exactly one top-level element, so its Document has exactly one top-level edge. To share a Document between XML and the other formats, wrap your data under a single top-level key (e.g. {"order": {…}} corresponds to <order>…</order>). JSON, YAML, and TOML happily carry multiple top-level keys; XML does not. See XML.
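
One way to apply this rule before writing XML is to count the top-level edges and wrap only when there is not exactly one. The helpers below are a hypothetical sketch over plain dicts, not an omnist API; the default root name "order" simply echoes the example above.

```python
def top_level_edges(data):
    """Count the top-level edges a dict would produce.

    A list value counts once per item, because a keyed list becomes a repeated label.
    """
    return sum(len(v) if isinstance(v, list) else 1 for v in data.values())


def wrap_for_xml(data, root="order"):
    """Return data unchanged if it is already single-rooted, else wrap it.

    Hypothetical helper for illustration; not part of omnist.
    """
    if top_level_edges(data) == 1:
        return data
    return {root: data}


print(wrap_for_xml({"order": {"id": 1}}))  # {'order': {'id': 1}}
print(wrap_for_xml({"a": 1, "b": 2}))      # {'order': {'a': 1, 'b': 2}}
```

Note that a single key holding a list, such as {"item": [1, 2]}, still counts as two top-level edges here and gets wrapped.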