API reference¶

Everything importable from omnist (import omnist). The core types: a Document is held by a Doc; a Schema is a root reference plus named Record definitions, where a field's type is always exactly one Scalar or one Ref. See the user guide for the narrative and the model spec for the formal definitions.

import omnist
omnist.__version__        # "0.1.3"

Documents¶

`doc(value) -> Doc`¶

Build a Doc from a plain Python value. A dict becomes an edge list; a key whose value is a list expands into one edge per item (a repeated label). A scalar becomes a leaf. A bare list, an array-of-arrays, a non-string key, a cycle, or nesting past the depth limit raises DocumentError.
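The expansion rules above can be pictured with a small stand-alone sketch. It is not Omnist's implementation: `to_edges` is a hypothetical helper written for this illustration, it raises a plain `ValueError` where the library raises DocumentError, and it leaves out the cycle and depth-limit checks.

```python
def to_edges(value):
    """Hypothetical stand-in for the dict -> edge-list expansion."""
    if isinstance(value, list):
        raise ValueError("a bare list is not a Document")
    if not isinstance(value, dict):
        return value  # a scalar becomes a leaf
    edges = []
    for key, item in value.items():
        if not isinstance(key, str):
            raise ValueError(f"non-string key: {key!r}")
        # A list value expands into one edge per item (a repeated label).
        items = item if isinstance(item, list) else [item]
        for one in items:
            if isinstance(one, list):
                raise ValueError(f"array-of-arrays under {key!r}")
            edges.append((key, to_edges(one)))
    return edges


print(to_edges({"name": "Ada", "tags": ["a", "b"], "home": {"city": "Paris"}}))
# [('name', 'Ada'), ('tags', 'a'), ('tags', 'b'), ('home', [('city', 'Paris')])]
```

The printed structure has the same shape as the canonical node form described under Export: a scalar, or a list of (label, node) tuples.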

`class Doc`¶

A guarded handle on a Document node — either a leaf (a scalar value) or an internal node (an ordered list of (label, child) edges).

Construction


`Doc.of(value)`	same as `doc(value)`
`Doc.from_oml(text)`	read OML, Omnist's own format (see the guide)
`Doc.from_json(text)` / `from_yaml` / `from_toml` / `from_xml`	read a format string
`Doc.from_format(name, text)`	read by format name (`"json"`, `"oml"`, …)

Shape & navigation


`.is_leaf` (property)	`True` for a scalar leaf
`.value` (property)	the scalar of a leaf (raises on an internal node)
`.edges() -> list[(str, Doc)]`	the ordered `(label, child)` edges
`.labels() -> list[str]`	distinct labels, in first-seen order
`.get(label) -> list[Doc]`	all children under `label` (a list — labels may repeat)
`.get_one(label) -> Doc`	the single child under `label` (raises unless exactly one)
`.count(label) -> int`	how many edges carry `label`
`.child(label) -> Doc`	a cursor to the single child (editable if it's a node)
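Because labels may repeat, the lookups in this table differ mainly in how they treat duplicates. The sketch below works on a bare edge list rather than a Doc; the function names mirror the methods but are stand-ins, and `LookupError` is an assumed placeholder for whatever exception the library raises.

```python
edges = [("id", 7), ("tag", "a"), ("tag", "b")]


def labels(edges):
    """Distinct labels, in first-seen order."""
    return list(dict.fromkeys(label for label, _ in edges))


def get(edges, label):
    """All children under `label`, in edge order."""
    return [child for lab, child in edges if lab == label]


def get_one(edges, label):
    """The single child under `label`; zero or several is an error."""
    found = get(edges, label)
    if len(found) != 1:
        raise LookupError(f"expected exactly one {label!r}, found {len(found)}")
    return found[0]


print(labels(edges))            # ['id', 'tag']
print(get(edges, "tag"))        # ['a', 'b']
print(get_one(edges, "id"))     # 7
print(len(get(edges, "tag")))   # 2, the number .count("tag") would report
```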

Editing (mutates the edge list; returns self for chaining)


`.add(label, value)`	append an edge — a repeated label is how an array grows
`.set(label, value)`	replace the single child under `label`, or add it
`.remove(label)`	drop every edge under `label`
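The three edits can be pictured as operations on a plain edge list. This is a sketch under assumptions, not the library's code: the helper names are hypothetical, and the reference does not say what `.set` does when a label already occurs more than once, so the sketch simply raises in that case.

```python
def add(edges, label, value):
    """Append an edge; repeating a label is how an array grows."""
    edges.append((label, value))
    return edges


def set_one(edges, label, value):
    """Replace the single child under `label` in place, or append it."""
    hits = [i for i, (lab, _) in enumerate(edges) if lab == label]
    if len(hits) > 1:
        # Assumption: behavior for a repeated label is not documented here.
        raise LookupError(f"{label!r} occurs {len(hits)} times")
    if hits:
        edges[hits[0]] = (label, value)
    else:
        edges.append((label, value))
    return edges


def remove(edges, label):
    """Drop every edge under `label`."""
    edges[:] = [edge for edge in edges if edge[0] != label]
    return edges


edges = [("id", 1)]
add(add(edges, "tag", "a"), "tag", "b")   # returning the list allows chaining
set_one(edges, "id", 2)
print(edges)    # [('id', 2), ('tag', 'a'), ('tag', 'b')]
remove(edges, "tag")
print(edges)    # [('id', 2)]
```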

Export


`.to_data()`	the canonical Python form — a scalar, or a list of `(label, …)` tuples
`.to_grouped()`	a JSON-shaped projection: same-label edges grouped into a list
`.to_oml(**opts)`	serialize to OML — the only format with zero adjustments
`.to_json(**opts)` / `.to_yaml()` / `.to_toml()` / `.to_xml()`	serialize to a format
`.to_format(name, **opts)`	serialize by format name
`.check_oml() -> WriteReport`	always empty — see OML
`.check_json()` / `.check_yaml()` / `.check_toml()` / `.check_xml() -> WriteReport`	simulate the matching `to_*`, no output
`.check_format(name) -> WriteReport`	simulate `to_format(name)`, no output (needs the format's `check`)
`.validate(schema) -> ValidationResult`	shorthand for `schema.validate(self)`

Doc also supports == (compares the underlying data, against a Doc or a plain value).
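To make the difference between the canonical form and the grouped projection concrete, here is a minimal sketch of grouping same-label edges. `grouped` is a hypothetical helper, and it assumes that a label occurring once maps to its single value rather than a one-item list; the reference above does not spell that detail out.

```python
def grouped(node):
    """Project an edge list into a JSON-shaped dict (illustrative only)."""
    if not isinstance(node, list):
        return node  # a leaf stays a scalar
    buckets = {}
    for label, child in node:
        buckets.setdefault(label, []).append(grouped(child))
    # Assumption: a label seen once collapses to its single value.
    return {k: v[0] if len(v) == 1 else v for k, v in buckets.items()}


canonical = [("id", 7), ("tag", "a"), ("tag", "b"), ("home", [("city", "Paris")])]
print(grouped(canonical))
# {'id': 7, 'tag': ['a', 'b'], 'home': {'city': 'Paris'}}
```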

Schemas¶

`parse_schema(text) -> Schema`¶

Parse DSL text (record / root) into a Schema. Raises SchemaError on malformed text or an undefined reference. See the DSL section of the guide.

`to_dsl(schema) -> str`¶

Serialize a Schema back to DSL text. parse_schema(to_dsl(s)) is equivalent to s.

`infer(samples, root_name="Root") -> Schema`¶

Draft a schema from example Documents (Docs or plain values). Cardinality follows observed counts (present in every sample → required; sometimes absent → optional; seen more than once → array); object children become nested named records.

A scalar field's Scalar is determined from the kinds of its observed values: integer and number collapse to number (the one subset relation between scalars); any other mix of kinds for the same field (e.g. an integer and a string) raises SchemaError — a field infers to exactly one scalar, never a composition. The field is nullable iff any sample's value was null, independent of which kind(s) were observed; if a field occurred but every observed value was null, infer defaults to a nullable string. The full algorithm, with the exact collapse and default rules, is model.md §11.
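The cardinality and scalar rules in the two paragraphs above can be sketched in a few lines. This is an illustration, not the algorithm from model.md §11: the function names are hypothetical, only four of the seven scalar kinds are modeled, `ValueError` stands in for SchemaError, and the `min` chosen for an array field is an assumption.

```python
def infer_cardinality(counts):
    """counts: occurrences of one label in each sample -> (min, max)."""
    lo = 1 if all(c >= 1 for c in counts) else 0   # absent somewhere -> optional
    hi = None if any(c > 1 for c in counts) else 1  # repeated -> array (unbounded)
    return lo, hi


def kind_of(value):
    """Map a Python value to a scalar kind name (temporal kinds omitted)."""
    if isinstance(value, bool):   # must come before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


def infer_scalar(values):
    """Return (scalar_name, nullable) for one field's observed values."""
    nullable = any(v is None for v in values)
    kinds = {kind_of(v) for v in values if v is not None}
    if not kinds:
        return "string", True            # only nulls seen: nullable string
    if kinds == {"integer", "number"}:
        kinds = {"number"}               # the one subset relation
    if len(kinds) > 1:
        raise ValueError(f"mixed kinds: {sorted(kinds)}")
    return kinds.pop(), nullable


print(infer_cardinality([1, 0, 3]))   # (0, None)
print(infer_scalar([1, 2.5, None]))   # ('number', True)
print(infer_scalar([None, None]))     # ('string', True)
```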

The Python builder¶

Function	Builds
`record(*fields) -> Record`	a closed record from `Field`s
`field(label, type, min=1, max=1) -> Field`	one field; `type` is a `Scalar` (e.g. `t.string`) or a `Ref`; `max=None` is unbounded
`nullable(scalar) -> Scalar`	a copy of `scalar` that also accepts `null` (the `?` form)
`ref(name) -> Ref`	a reference to a named record
`schema(root, **env) -> Schema`	assemble a `Schema` (`root` is a `Ref` or a name string)
`t`	the scalar namespace: `t.string`, `t.integer`, `t.number`, `t.boolean`, `t.date`, `t.time`, `t.datetime` — ready-to-use `Scalar` instances, passed as-is as a field's type

from omnist import schema, record, field, ref, nullable, t
s = schema(ref("User"),
           User=record(field("name", t.string),
                       field("note", nullable(t.string), min=0, max=1),
                       field("tags", t.string, min=0, max=None)))

`class Schema`¶

Schema(root: Ref, env: dict[str, Record] = None) — a root reference plus named record definitions. Raises SchemaError if root isn't a Ref, if any env entry isn't a Record, or if a Ref (the root or one inside a field) names an entry not present in env.

Method	Description
`.validate(doc) -> ValidationResult`	check a `Doc` against this schema
`.accepts(doc) -> bool`	`validate(doc).ok`
`.compatible_with(other) -> bool`	every document this accepts, `other` also accepts (backward-compat)
`.equivalent(other) -> bool`	both accept exactly the same documents
`.normalize() -> Schema`	merge structurally-identical named definitions
`.to_dsl() -> str`	serialize back to DSL
`.root`, `.env`	the root `Ref` and the name→record map
`.resolve(t) -> Record`	follow a `Ref` chain to a `Record`

Definition & type classes¶

These are produced by the DSL and builder; you can also construct them directly.

Record(fields: list[Field]) — a closed record. .fields; .field(label) -> Field | None.
Field(label, type, min=1, max=1) — one labeled edge rule. .label, .type (a Scalar or a Ref), .min, .max (None = unbounded).
Scalar(name, nullable=False) — one of the seven fixed value types, optionally nullable; never composed with another kind or a literal value. .name (one of "string", "integer", "number", "boolean", "date", "time", "datetime"), .nullable (bool).
Ref(name) — a reference to a named record in the schema's env.
Ready-to-use instances: STRING, INTEGER, NUMBER, BOOLEAN, DATE, TIME, DATETIME (also under t.*).

Validation results¶

`class ValidationResult`¶

Returned by Schema.validate.


`.ok` (property)	`True` if the document conforms
`bool(result)`	same as `.ok`
`.errors -> list[Error]`	every failure
`str(result)`	a readable multi-line summary

`class Error`¶

A named tuple Error(path, message) — unpacks as (path, message) and exposes .path (e.g. "$.order.items") and .message.

from omnist import doc

# s is the User schema built in "The Python builder" above
r = s.validate(doc({"id": "x"}))
if not r.ok:
    for e in r.errors:
        print(e.path, e.message)
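To show where (path, message) pairs like these can come from, here is a minimal closed-record check over a bare edge list. Everything in it is a stand-in: the local `Error` and `Field` tuples only imitate the classes documented here, a Python type plays the role of a Scalar, the message wording is invented, and treating an undeclared label as an error is this sketch's reading of "closed record".

```python
from typing import NamedTuple, Optional


class Error(NamedTuple):
    path: str
    message: str


class Field(NamedTuple):
    label: str
    kind: type                 # stand-in for a Scalar
    min: int = 1
    max: Optional[int] = 1     # None = unbounded


def check_record(edges, fields, path="$"):
    """Collect every failure instead of stopping at the first one."""
    errors = []
    for f in fields:
        children = [c for lab, c in edges if lab == f.label]
        n = len(children)
        if n < f.min or (f.max is not None and n > f.max):
            upper = "*" if f.max is None else f.max
            errors.append(Error(f"{path}.{f.label}",
                                f"expected {f.min}..{upper} occurrences, found {n}"))
        for c in children:
            if not isinstance(c, f.kind):
                errors.append(Error(f"{path}.{f.label}",
                                    f"expected {f.kind.__name__}"))
    known = {f.label for f in fields}
    for lab, _ in edges:
        if lab not in known:   # assumption: closed means no undeclared labels
            errors.append(Error(f"{path}.{lab}", "unexpected label"))
    return errors


fields = [Field("name", str), Field("tags", str, min=0, max=None)]
for path, message in check_record([("id", "x")], fields):
    print(path, message)
```

Run against the edge list `[("id", "x")]`, the sketch reports two failures: the missing `name` and the undeclared `id`.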

Reading & writing formats¶

Low-level codecs over the canonical node form (a scalar, or a list of (label, node) edges). Most code uses Doc.from_* / Doc.to_* instead.


`read_oml(text)` / `read_json` / `read_yaml` / `read_toml` / `read_xml`	parse → a node
`write_oml(node, *, indent=2)`	a node → OML, losslessly — no `strict`/`report` needed (see below)
`write_json(node, *, strict=False, report=None, indent=None)`	a node → JSON (groups same-label edges)
`write_yaml(node, *, strict=False, report=None)`	a node → YAML
`write_toml(node, *, strict=False, report=None)`	a node → TOML
`write_xml(node, *, strict=False, report=None)`	a node → XML
`check_oml(node)`	always an empty `WriteReport` — OML holds every node shape exactly
`check_json(node)` / `check_yaml` / `check_toml` / `check_xml`	simulate a write; return a `WriteReport`, no output

read_yaml/write_yaml need pyyaml; write_toml needs tomli_w. read_xml recommends defusedxml; without it, read_xml falls back to the stdlib parser and emits an UnsafeXMLWarning. See Formats for per-format mapping and caveats.

Schema-directed deserialization¶

Pass schema= to any reader (or to Doc.from_json / Doc.from_yaml / Doc.from_toml / Doc.from_xml) to upgrade each leaf to the type the schema declares. A leaf is upgraded only when the conversion is value-exact; when it isn't, the reader raises ParseError. See Schema-directed deserialization for the full explanation, the conversion rules, and materialize.


`read_oml(text, schema=...)` / `read_json` / `read_yaml` / `read_toml` / `read_xml`	parse → a node, upgrading leaves to match `schema`
`materialize(node, schema) -> node`	apply the same upgrade directly to an already-parsed node
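"Value-exact" is easiest to see on a single leaf. The sketch below is a guess at the idea, not the library's conversion rules (those are in the guide): `upgrade` is a hypothetical helper, the two conversions shown (ISO string to date, whole float to integer) are assumed examples, and `ValueError` stands in for ParseError.

```python
from datetime import date


def upgrade(value, target):
    """Convert one leaf to `target` only if no information changes."""
    if target == "date" and isinstance(value, str):
        try:
            return date.fromisoformat(value)
        except ValueError:
            raise ValueError(f"{value!r} is not an exact date") from None
    if target == "integer" and isinstance(value, float):
        if value.is_integer():
            return int(value)
        raise ValueError(f"{value!r} has a fractional part")
    return value  # already the declared kind (or not modeled in this sketch)


print(upgrade("2024-01-05", "date"))   # 2024-01-05
print(upgrade(2.0, "integer"))         # 2
```

A value such as `2.5` for an integer field, or `"tomorrow"` for a date field, has no exact counterpart, so the sketch raises instead of guessing.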

Adjustment reports (lossy writes)¶

Writing to a format that can't hold every value (TOML has no null; JSON/XML have no date type) is lenient by default: the writer adjusts the value and records it. Doc.to_* and write_* accept the same two options:


`strict=True`	raise `WriteError` (carrying the report) if anything was adjusted
`report=a_WriteReport`	collect the adjustments into it, without raising

from omnist import doc, WriteReport, WriteError

d = doc({"a": 1, "b": None})
d.to_toml()                          # 'a = 1\n' -- 'b' dropped, silently

rep = WriteReport()
d.to_toml(report=rep)
[(a.code, a.severity) for a in rep]  # [('null.omitted', 'warning')]

d.to_toml(strict=True)               # raises WriteError

`class WriteReport`¶

Every adjustment a writer made. .warnings / .errors (lists of Adjustment); bool(report) is True when there are no "error"-severity entries (warnings are fine) — if check_toml(node): ... reads as "safe to write." Iterable; str(report) is a readable multi-line summary.
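The truthiness rule is the part most likely to surprise: a report that contains warnings is still truthy. A minimal stand-in class shows the behavior described above. `Report` and the local `Adjustment` tuple are illustrative only, and the paths and messages are made up for the example.

```python
from typing import NamedTuple


class Adjustment(NamedTuple):
    path: str
    code: str
    message: str
    severity: str   # "warning" or "error"


class Report:
    """Hypothetical stand-in for WriteReport."""

    def __init__(self):
        self._items = []

    def add(self, adjustment):
        self._items.append(adjustment)

    @property
    def warnings(self):
        return [a for a in self._items if a.severity == "warning"]

    @property
    def errors(self):
        return [a for a in self._items if a.severity == "error"]

    def __iter__(self):
        return iter(self._items)

    def __bool__(self):
        # Truthy means "safe to write": no error-severity entries.
        return not self.errors


rep = Report()
rep.add(Adjustment("$.b", "null.omitted", "key dropped", "warning"))
print(bool(rep))    # True: warnings alone do not make the report falsy
rep.add(Adjustment("$.s", "string.illegal_xml_char", "replaced", "error"))
print(bool(rep))    # False
```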

`class Adjustment`¶

A named tuple Adjustment(path, code, message, severity); severity is "warning" or "error". The stable codes:

Code	Format and trigger
`null.omitted`	TOML, XML
`temporal.stringified`	JSON, YAML, XML
`float.special`	JSON: NaN / Infinity
`key.sanitized`	XML
`string.ambiguous`	XML: a string value that looks like another type (e.g. a digit string or `"true"`) and would read back as that type
`shape.empty_ambiguous`	XML: an empty internal node (zero edges) is written as `<tag />` and reads back as the empty-string leaf `""`, not `[]`
`string.illegal_xml_char`	XML, severity `"error"`: a string contains a character XML 1.0 cannot represent (e.g. a C0 control other than tab/LF/CR, or a surrogate); `write_xml` replaces it with U+FFFD so the output is always well-formed
`string.cr_normalized`	XML: a string contains `\r`, which is legal XML but normalizes to `\n` on parse per the XML spec, so it doesn't round-trip byte-for-byte
`string.line-break-char`	YAML: a label or value containing U+0085 NEL, which YAML's line-break rules would otherwise normalize to a space; written double-quoted to round-trip correctly
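The `string.cr_normalized` case comes from standard XML line-end handling, not from anything specific to Omnist, and it can be reproduced with the standard library alone. The snippet uses `xml.etree.ElementTree` directly, not Omnist's `read_xml`.

```python
import xml.etree.ElementTree as ET

original = "line1\rline2"
xml_text = f"<v>{original}</v>"          # a literal carriage return in the text

parsed = ET.fromstring(xml_text).text    # the parser normalizes \r to \n
print(repr(parsed))                      # 'line1\nline2'
print(parsed == original)                # False: no byte-for-byte round trip
```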

Format registry¶

Formats are plugins. The built-in formats register themselves on import.


`register_format(Format(name, read, write, check=None))`	add a format, usable via `Doc.from_format` / `Doc.to_format` / `Doc.check_format`
`get_format(name) -> Format`	look one up by name (raises `OmnistError` if unknown)
`formats() -> list[str]`	every registered name, sorted

from omnist import Format, register_format, Doc

register_format(Format(
    name="lines",
    read=lambda text: [("n", int(x)) for x in text.split()],
    write=lambda node, **opts: " ".join(str(v) for _, v in node),
))
Doc.from_format("lines", "1 2 3").to_format("lines")    # '1 2 3'

`class Format`¶

A named tuple Format(name, read, write, check=None): read(text) -> node, write(node, **opts) -> str, and an optional check(node) -> WriteReport for simulating a write without producing output. The built-ins all provide check; a plugin that omits it can still be used with from_format / to_format, but Doc.check_format raises DocumentError for it.
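The registry behavior described in this section fits in a small dictionary-backed sketch. It is a stand-alone model, not Omnist's code: the module-level `_REGISTRY`, the free-standing `check_format` function, and the exception types (`LookupError` for an unknown name, `TypeError` for a missing `check`) are assumptions standing in for OmnistError and DocumentError.

```python
from typing import Callable, NamedTuple, Optional


class Format(NamedTuple):
    name: str
    read: Callable
    write: Callable
    check: Optional[Callable] = None   # optional write simulation


_REGISTRY = {}


def register_format(fmt):
    _REGISTRY[fmt.name] = fmt


def get_format(name):
    try:
        return _REGISTRY[name]
    except KeyError:
        raise LookupError(f"unknown format: {name!r}") from None


def formats():
    return sorted(_REGISTRY)


def check_format(name, node):
    fmt = get_format(name)
    if fmt.check is None:
        raise TypeError(f"format {name!r} provides no check")
    return fmt.check(node)


register_format(Format(
    name="lines",
    read=lambda text: [("n", int(x)) for x in text.split()],
    write=lambda node, **opts: " ".join(str(v) for _, v in node),
))

fmt = get_format("lines")
print(fmt.write(fmt.read("1 2 3")))   # 1 2 3
print(formats())                      # ['lines']
```

Keeping `check` optional means a minimal plugin needs only `read` and `write`; the cost is that callers asking for a simulated write have to handle the case where no `check` was supplied.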

Exceptions & warnings¶

Name	Raised when
`OmnistError`	base class for all Omnist errors
`SchemaError`	invalid schema text or structure (bad DSL, undefined `Ref`, bad cardinality)
`ParseError`	a document couldn't be read from its format
`DocumentError`	a value isn't a legal Document, or an invalid `Doc` operation
`WriteError`	a Document can't be represented in the target format (e.g. multi-rooted XML)
`DetachedNode`	(`DocumentError` subclass) a cursor used after its node was removed
`UnsafeXMLWarning`	`read_xml` fell back to the stdlib parser because `defusedxml` is missing
