Skip to content

API reference

Everything importable from import omnist. Types: a Document is held by a Doc; a Schema is a root reference plus named Record definitions, where a field's type is always exactly one Scalar or one Ref. See the user guide for narrative and the model spec for the formal definitions.

import omnist
omnist.__version__        # "0.1.3"

Documents

doc(value) -> Doc

Build a Doc from a plain Python value. A dict becomes an edge list; a key whose value is a list expands into one edge per item (a repeated label). A scalar becomes a leaf. A bare list, an array-of-arrays, a non-string key, a cycle, or nesting past the depth limit raises DocumentError.

class Doc

A guarded handle on a Document node — either a leaf (a scalar value) or an internal node (an ordered list of (label, child) edges).

Construction

Doc.of(value) same as doc(value)
Doc.from_oml(text) read OML, Omnist's own format (see the guide)
Doc.from_json(text) / from_yaml / from_toml / from_xml read a format string
Doc.from_format(name, text) read by format name ("json", "oml", …)

Shape & navigation

.is_leaf (property) True for a scalar leaf
.value (property) the scalar of a leaf (raises on an internal node)
.edges() -> list[(str, Doc)] the ordered (label, child) edges
.labels() -> list[str] distinct labels, in first-seen order
.get(label) -> list[Doc] all children under label (a list — labels may repeat)
.get_one(label) -> Doc the single child under label (raises unless exactly one)
.count(label) -> int how many edges carry label
.child(label) -> Doc a cursor to the single child (editable if it's a node)

Editing (mutates the edge list; returns self for chaining)

.add(label, value) append an edge — a repeated label is how an array grows
.set(label, value) replace the single child under label, or add it
.remove(label) drop every edge under label

Export

.to_data() the canonical Python form — a scalar, or a list of (label, …) tuples
.to_grouped() a JSON-shaped projection: same-label edges grouped into a list
.to_oml(**opts) serialize to OML — the only format with zero adjustments
.to_json(**opts) / .to_yaml() / .to_toml() / .to_xml() serialize to a format
.to_format(name, **opts) serialize by format name
.check_oml() -> WriteReport always empty — see OML
.check_json() / .check_yaml() / .check_toml() / .check_xml() -> WriteReport simulate the matching to_*, no output
.check_format(name) -> WriteReport simulate to_format(name), no output (needs the format's check)
.validate(schema) -> ValidationResult shorthand for schema.validate(self)

Doc also supports == (compares the underlying data, against a Doc or a plain value).


Schemas

parse_schema(text) -> Schema

Parse DSL text (record / root) into a Schema. Raises SchemaError on malformed text or an undefined reference. See the DSL section of the guide.

to_dsl(schema) -> str

Serialize a Schema back to DSL text. parse_schema(to_dsl(s)) is equivalent to s.

infer(samples, root_name="Root") -> Schema

Draft a schema from example Documents (Docs or plain values). Cardinality follows observed counts (present in every sample → required; sometimes absent → optional; seen more than once → array); object children become nested named records.

A scalar field's Scalar is determined from the kinds of its observed values: integer and number collapse to number (the one subset relation between scalars); any other mix of kinds for the same field (e.g. an integer and a string) raises SchemaError — a field infers to exactly one scalar, never a composition. The field is nullable iff any sample's value was null, independent of which kind(s) were observed; if a field occurred but every observed value was null, infer defaults to a nullable string. The full algorithm, with the exact collapse and default rules, is model.md §11.

The Python builder

Function Builds
record(*fields) -> Record a closed record from Fields
field(label, type, min=1, max=1) -> Field one field; type is a Scalar (e.g. t.string) or a Ref; max=None is unbounded
nullable(scalar) -> Scalar a copy of scalar that also accepts null (the ? form)
ref(name) -> Ref a reference to a named record
schema(root, **env) -> Schema assemble a Schema (root is a Ref or a name string)
t the scalar namespace: t.string, t.integer, t.number, t.boolean, t.date, t.time, t.datetime — ready-to-use Scalar instances, passed as-is as a field's type
from omnist import schema, record, field, ref, nullable, t
s = schema(ref("User"),
           User=record(field("name", t.string),
                       field("note", nullable(t.string), min=0, max=1),
                       field("tags", t.string, min=0, max=None)))

class Schema

Schema(root: Ref, env: dict[str, Record] = None) — a root reference plus named record definitions. Raises SchemaError if root isn't a Ref, if any env entry isn't a Record, or if a Ref (the root or one inside a field) names an entry not present in env.

Method
.validate(doc) -> ValidationResult check a Doc against this schema
.accepts(doc) -> bool validate(doc).ok
.compatible_with(other) -> bool every document this accepts, other also accepts (backward-compat)
.equivalent(other) -> bool both accept exactly the same documents
.normalize() -> Schema merge structurally-identical named definitions
.to_dsl() -> str serialize back to DSL
.root, .env the root Ref and the name→record map
.resolve(t) -> Record follow a Ref chain to a Record

Definition & type classes

These are produced by the DSL and builder; you can also construct them directly.

  • Record(fields: list[Field]) — a closed record. .fields; .field(label) -> Field | None.
  • Field(label, type, min=1, max=1) — one labeled edge rule. .label, .type (a Scalar or a Ref), .min, .max (None = unbounded).
  • Scalar(name, nullable=False) — one of the seven fixed value types, optionally nullable; never composed with another kind or a literal value. .name (one of "string", "integer", "number", "boolean", "date", "time", "datetime"), .nullable (bool).
  • Ref(name) — a reference to a named record in the schema's env.
  • Ready-to-use instances: STRING, INTEGER, NUMBER, BOOLEAN, DATE, TIME, DATETIME (also under t.*).

Validation results

class ValidationResult

Returned by Schema.validate.

.ok (property) True if the document conforms
bool(result) same as .ok
.errors -> list[Error] every failure
str(result) a readable multi-line summary

class Error

A named tuple Error(path, message) — unpacks as (path, message) and exposes .path (e.g. "$.order.items") and .message.

r = s.validate(doc({"id": "x"}))
if not r.ok:
    for e in r.errors:
        print(e.path, e.message)

Reading & writing formats

Low-level codecs over the canonical node form (a scalar, or a list of (label, node) edges). Most code uses Doc.from_* / Doc.to_* instead.

read_oml(text) / read_json / read_yaml / read_toml / read_xml parse → a node
write_oml(node, *, indent=2) a node → OML, losslessly — no strict/report needed (see below)
write_json(node, *, strict=False, report=None, indent=None) a node → JSON (groups same-label edges)
write_yaml(node, *, strict=False, report=None) a node → YAML
write_toml(node, *, strict=False, report=None) a node → TOML
write_xml(node, *, strict=False, report=None) a node → XML
check_oml(node) always an empty WriteReport — OML holds every node shape exactly
check_json(node) / check_yaml / check_toml / check_xml simulate a write; return a WriteReport, no output

read_yaml/write_yaml need pyyaml; write_toml needs tomli_w; read_xml recommends defusedxml (else an UnsafeXMLWarning). See Formats for per-format mapping and caveats.

Schema-directed deserialization

Pass schema= to any reader (or Doc.from_json / Doc.from_yaml / Doc.from_toml / Doc.from_xml) to upgrade each leaf to match what the schema declares, whenever the conversion is value-exact, raising ParseError when it isn't. See Schema-directed deserialization for the full explanation, the conversion rules, and materialize.

read_oml(text, schema=...) / read_json / read_yaml / read_toml / read_xml parse → a node, upgrading leaves to match schema
materialize(node, schema) -> node apply the same upgrade directly to an already-parsed node

Adjustment reports (lossy writes)

Writing to a format that can't hold every value (TOML has no null; JSON/XML have no date type) is lenient by default: the writer adjusts the value and records it. Doc.to_* and write_* accept the same two options:

strict=True raise WriteError (carrying the report) if anything was adjusted
report=a_WriteReport collect the adjustments into it, without raising
from omnist import doc, WriteReport, WriteError

d = doc({"a": 1, "b": None})
d.to_toml()                          # 'a = 1\n' -- 'b' dropped, silently

rep = WriteReport()
d.to_toml(report=rep)
[(a.code, a.severity) for a in rep]  # [('null.omitted', 'warning')]

d.to_toml(strict=True)               # raises WriteError

class WriteReport

Every adjustment a writer made. .warnings / .errors (lists of Adjustment); bool(report) is True when there are no "error"-severity entries (warnings are fine) — if check_toml(node): ... reads as "safe to write." Iterable; str(report) is a readable multi-line summary.

class Adjustment

A named tuple Adjustment(path, code, message, severity)severity is "warning" or "error". Stable codes: null.omitted (TOML/XML), temporal.stringified (JSON/YAML/XML), float.special (JSON NaN/Infinity), key.sanitized (XML), string.ambiguous (XML — a string value that looks like another type, e.g. a digit string or "true", and would read back as that type), shape.empty_ambiguous (XML — an empty internal node, i.e. zero edges, is written as <tag /> and reads back as the empty-string leaf "", not []), string.illegal_xml_char (XML, "error" — a string contains a character XML 1.0 cannot represent, e.g. a C0 control other than tab/LF/CR, or a surrogate; write_xml replaces it with U+FFFD so the output is always well-formed), string.cr_normalized (XML — a string contains \r, which is legal XML but normalizes to \n on parse per the XML spec, so it doesn't round-trip byte-for-byte), and string.line-break-char (YAML — a label or value containing U+0085 NEL, which YAML's line-break rules would otherwise normalize to a space; written double-quoted to round-trip correctly).


Format registry

Formats are plugins. The four built-ins register themselves on import.

register_format(Format(name, read, write, check=None)) add a format, usable via Doc.from_format / Doc.to_format / Doc.check_format
get_format(name) -> Format look one up by name (raises OmnistError if unknown)
formats() -> list[str] every registered name, sorted
from omnist import Format, register_format, Doc

register_format(Format(
    name="lines",
    read=lambda text: [("n", int(x)) for x in text.split()],
    write=lambda node, **opts: " ".join(str(v) for _, v in node),
))
Doc.from_format("lines", "1 2 3").to_format("lines")    # '1 2 3'

class Format

A named tuple Format(name, read, write, check=None)read(text) -> node, write(node, **opts) -> str, and an optional check(node) -> WriteReport for simulating a write without producing output. The four built-ins all provide check; a plugin that omits it can still be used with from_format/ to_format, but Doc.check_format raises DocumentError for it.


Exceptions & warnings

Raised when
OmnistError base class for all Omnist errors
SchemaError invalid schema text or structure (bad DSL, undefined Ref, bad cardinality)
ParseError a document couldn't be read from its format
DocumentError a value isn't a legal Document, or an invalid Doc operation
WriteError a Document can't be represented in the target format (e.g. multi-rooted XML)
DetachedNode (DocumentError subclass) a cursor used after its node was removed
UnsafeXMLWarning read_xml fell back to the stdlib parser because defusedxml is missing

See also