Glossary
One definition per term, grouped by concept area. Each entry links to where the term is defined or used in depth. Where two terms look similar but mean different things (the most common source of confusion in this codebase), the entry says so explicitly.
Document model terms
These describe the data — the canonical tree every supported format reads into and writes out of. Defined formally in the model spec, §4.
- node: the canonical Python value of the Document model, either a scalar value (a leaf) or an ordered list of labeled edges (an internal node). `node` is the model term for the shape itself, independent of any wrapper class. See model spec §4.
- Document: the data model as a whole, summarized as "a Document is a tree of ordered, labeled edges." Names the concept, not a specific Python class; a node is a Document in the sense that any node is an instance of the Document model. See model spec §4 and the guide.
- `Doc`: the guarded Python wrapper class around a node, with navigation and editing helpers (`.edges()`, `.get()`, `.add()`, ...). Where "Document" names the model/concept, `Doc` is the specific class that holds one. See the API reference.
- tree: informal, non-technical synonym for the Document's overall shape (an ordered, nested structure of edges). Not a distinct technical term from "node"; used in prose to evoke the shape, e.g. "a Document is a tree: a node is either a scalar value or an ordered list of labeled edges" (guide).
- edge: a `(label, node)` pair, the Document-model unit of structure. "Many" is a repeated label, i.e. several edges sharing one label, not a single edge pointing at an array. Contrast with field, the Schema-model term for the corresponding named, cardinality-bound slot a record declares. See model spec §4.
- label: the string key half of an edge `(label, node)`. The same word is used on the Schema side for a field's name (`Field.label`). This is by design, since a field's label is exactly the edge label it constrains. See model spec §4 and §5.
- leaf: a node holding a scalar value rather than a list of edges. `Doc.is_leaf` is `True` exactly for this case. See the API reference.
- scalar value: a Python value that can sit at a Document leaf, namely `str`, `int`, `float`, `bool`, `datetime.date`/`time`/`datetime`, or `None`. Contrast with Scalar (capitalized), the Schema-side type object described below: a scalar value is data; a `Scalar` is a type constraint on what scalar values are allowed. See model spec §4.
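The node, edge, and leaf entries above can be pictured with plain Python values. The following is a minimal sketch only: it assumes edges are represented as `(label, node)` tuples in a list, and the helpers `is_leaf` and `nodes_for` are hypothetical names written for this illustration, not part of the `Doc` API.

```python
import datetime

# Scalar value types listed in the "scalar value" entry.
SCALAR_TYPES = (
    str, int, float, bool,
    datetime.date, datetime.time, datetime.datetime,
    type(None),
)


def is_leaf(node):
    """Hypothetical helper: a leaf holds a scalar value, not a list of edges."""
    return isinstance(node, SCALAR_TYPES)


def nodes_for(node, label):
    """Hypothetical helper: child nodes of every edge carrying `label`, in order."""
    if is_leaf(node):
        return []
    return [child for edge_label, child in node if edge_label == label]


# An internal node: an ordered list of (label, node) edges.
member = [
    ("name", "Ada"),
    ("email", "ada@example.org"),
    ("email", "ada@example.com"),       # "many" is the same label repeated
    ("address", [("city", "London")]),  # a nested internal node
]

emails = nodes_for(member, "email")
```

Note that the two `email` values are two separate edges sharing one label, which is what the edge entry means by "many"; there is no single `email` edge pointing at an array.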
Schema model terms
These describe the constraint — the shape a Document must have to be valid. Defined formally in the model spec, §5.
- `Scalar`: the Schema-side type object, one of the seven fixed kinds (`string`, `integer`, `number`, `boolean`, `date`, `time`, `datetime`), optionally nullable, used as a field's type (e.g. `Scalar("string")`, or the ready-made `STRING`/`t.string`). A `Scalar` is a type; it constrains which scalar values are valid and is not itself a value. See `omnist/canonical/schema.py` and the schema doc.
- kind / `value_kind()`: `kind` is the plain string name (`"string"`, `"integer"`, ...) classifying a Python value. It is what `value_kind(v)` returns, and it is used for inference and error messages. It is the same vocabulary as a `Scalar`'s `.name`, but `kind`/`value_kind()` answers "what kind of value is this, at runtime?" while `Scalar` answers "what type does this field declare?" `matches_kind(value, name)` is the boolean predicate form, with a wider match set (e.g. an `integer` value also matches a `"number"` check). See `omnist/canonical/schema.py` and model spec §10.
- data type: not used as a distinct term in this codebase. The intentional vocabulary is kind (a value's runtime classification), `Scalar` (a field's declared type), and Python type (the actual `str`/`int`/`date`/... class a value materializes as); see model spec §10 for how the three relate. Avoid "data type" in new docs/comments; use whichever of the three is meant.
- field: the Schema-model term for a named, cardinality-bound slot of a `Record`, written `Field(label, type, min, max)`. A field is what a `Record` declares; an edge (Document-model term, above) is what a `Doc` actually contains. A field's `type` is a `Scalar` or a `Ref`, never both or a composition. See `omnist/canonical/schema.py` and the schema doc.
- `Field`: the Python class implementing a field (capitalized; contrast with lowercase field, the general concept, and with the `field()` builder function below).
- record (lowercase): the DSL keyword that introduces a record definition (`record Member { ... }`), and the general concept of a closed set of named fields. See the schema doc.
- `Record`: the Python class, a closed set of `Field`s, constructed by the lowercase `record(*fields)` builder function. Lowercase `record` is the keyword/concept; `Record` is the class. The two are distinguished by case and code formatting throughout the docs. See `omnist/canonical/schema.py`.
- `Ref`: a pointer into the schema's named environment (`env`), resolving to a `Record`. Used for reuse and recursion; a field's type is a `Scalar` or a `Ref`, never an inline/anonymous record. See model spec §5.
- cardinality: the `[min, max]` range on a `Field`, the single mechanism for required/optional/array (`max=None` is unbounded). There is no separate array type; an "array" is a field with `max > 1` or `max=None`. See model spec §5 and the schema doc.
- `Schema`: the root object, a `root` `Ref` plus an `env` dict mapping names to `Record` definitions. See `omnist/canonical/schema.py` and the API reference.
- schema (lowercase, general use): the constraint as a concept, or an instance of `Schema`; also used loosely for "the DSL text describing one." Distinguished from the DSL (below) by context: "a schema" is the parsed/constructed object or the idea; "the DSL" is the text syntax used to write one.
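The cardinality entry above says one `[min, max]` range covers required, optional, and array fields. A small standalone sketch of that idea follows. It does not use the real `Field` class: `within_cardinality` is a hypothetical helper, and the three named ranges are assumed readings of "required", "optional", and "array" rather than constants taken from the library.

```python
from typing import Optional


def within_cardinality(count: int, lo: int, hi: Optional[int]) -> bool:
    """True if `count` edges with one label fit the [lo, hi] range.

    hi=None means the range is unbounded above.
    """
    return count >= lo and (hi is None or count <= hi)


# Assumed interpretations of the three common cases (illustrative only).
REQUIRED = (1, 1)     # exactly one edge with this label
OPTIONAL = (0, 1)     # zero or one edge
ARRAY = (0, None)     # any number of edges, unbounded

checks = {
    "required, present": within_cardinality(1, *REQUIRED),
    "required, missing": within_cardinality(0, *REQUIRED),
    "optional, repeated": within_cardinality(2, *OPTIONAL),
    "array, many": within_cardinality(500, *ARRAY),
}
```

The count being checked is the number of edges in a node that share the field's label, which is why no separate array type is needed.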
OML / format terms
- DSL: the small text language (`record`/`root` syntax) for writing a `Schema`, parsed by `parse_schema()` and produced by `to_dsl()`. See the schema doc.
- OML (Omnist Markup Language): Omnist's own native format, designed so every Document shape round-trips through it with zero adjustments. Distinct from "the DSL": OML is a data format (like JSON/YAML/TOML/XML), while the DSL is a schema text syntax. They look superficially similar (both use `label: value`-ish syntax) but describe different things (data vs. constraints). See the OML format page.
- codec: a `Format`'s `read`/`write` (and optional `check`) functions, bundled together and registered by name (`register_format`), letting `Doc.from_format`/`to_format`/`check_format` use it like a built-in format. See the API reference.
- round-trip: reading a value in one format and writing it back out (same format or another) without losing information. OML round-trips every Document shape losslessly; other formats may need an adjustment (below) to round-trip. See the OML format page and Formats.
- adjustment: a recorded change a writer made because the target format cannot hold a value losslessly (e.g. TOML dropping `null`, JSON stringifying a date). Collected in a `WriteReport`; `strict=True` raises a `WriteError` instead of adjusting silently. See the API reference.
- deserialization / materialize: converting a freshly-read node's leaf values to match a `Schema`'s declared `Scalar` kinds (e.g. an ISO-8601 string to a real `datetime.date`), as distinct from validation, which only checks a match without converting anything. See model spec §10.
- inference: drafting a `Schema` from example Documents (`infer()`), determining each field's `Scalar`, nullability, and cardinality from observed samples. See model spec §11.
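The difference between validation and materialization described above can be illustrated for a single `date` leaf. This is a sketch using only the standard library; `matches_date` and `materialize_date` are hypothetical names for this example and are not the library's functions, whose actual behavior is defined in model spec §10.

```python
import datetime


def matches_date(value):
    """Validation-style check: reports whether the value is a date, converts nothing."""
    # datetime.datetime subclasses datetime.date, so exclude it explicitly.
    return (isinstance(value, datetime.date)
            and not isinstance(value, datetime.datetime))


def materialize_date(value):
    """Materialization-style conversion: an ISO-8601 string becomes a datetime.date."""
    if isinstance(value, str):
        return datetime.date.fromisoformat(value)
    return value  # already a non-string value; left unchanged in this sketch


raw = "2024-03-01"                   # as read from a format with no date type
converted = materialize_date(raw)    # a real datetime.date
```

Checking `raw` with `matches_date` fails because it is still a string, while the converted value passes; the check itself never changes the value it inspects.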