Skip to content

Repo layout

How the repo is organized: the canonical model's modules, the docs page map, and the test file map. For anything deeper than a one-line summary, follow the link.

omnist/canonical/*.py

The implementation of the Document/Schema model described in docs/design/model.md. import omnist re-exports its public surface; this is where the logic actually lives.

  • document.py -- the Document model: Doc and the edge-list node shape ([(label, node), ...] / scalar leaf), with navigation/editing helpers.
  • schema.py -- the Schema model: Record, Scalar, Ref, Field, Schema (with validate), and the seven scalar kinds.
  • dsl.py -- the schema DSL: parse_schema / to_dsl, parsing and serializing the record ... root ... text syntax.
  • oml.py -- OML (Omnist's own format): tokenizer, parser, and writer; the only format with zero adjustments on write.
  • formats.py -- the JSON/YAML/TOML/XML codecs (read_*/write_*/ check_*), each going through the shared grouped/edge-list conversion.
  • infer.py -- infer(): draft a schema from example Documents.
  • operations.py -- schema comparison: compatible_with, equivalent, normalize.
  • deserialize.py -- materialize(): schema-directed upgrading of a freshly-read node's leaf values (e.g. ISO string -> date) when the schema is known and the conversion is value-exact.
  • report.py -- Adjustment/WriteReport: records what a lossy writer had to change, with severities, and drives lenient/inspect/strict write modes.
  • registry.py -- the format plugin registry: Format, register_format, and the built-in JSON/YAML/TOML/XML registrations.
  • __init__.py -- re-exports the package's public names from the modules above.

Outside canonical/: omnist/errors.py defines the exception hierarchy (OmnistError, DocumentError, SchemaError, ParseError, WriteError, UnsafeXMLWarning); omnist/__init__.py is the public package surface.

docs/ page map

  • README.md -- the docs index (this page is linked from it).
  • quickstart.md -- the shortest possible tour: one OML snippet, one schema, validate(), infer().
  • guide.md -- the practical, narrative tour of the whole library. Read this first if you're not in a hurry.
  • schema.md -- the Schema model and DSL on their own: record definitions, cardinality, the Python builder, the comparison/ inference operations.
  • example.md -- one order/address/line-item schema validated against a document in OML (and the other formats), plus a backward-compatibility check. The full worked example; quickstart.md is the short one.
  • api.md -- every public name importable from omnist, with signatures.
  • glossary.md -- one definition per term used across the docs and code, grouped by concept area.
  • testing.md -- the test suite layout, coverage tooling and target, the fuzzing approach, and what CI runs. The tests section below points here for depth.
  • layout.md -- this page.
  • formats/ -- one page per format, plus an overview: overview.md (how each format maps to the model), oml.md, json.md, yaml.md, toml.md, xml.md.
  • design/model.md (design/model.md) -- the formal Document and Schema model definitions; self-contained, no paper required.
  • paper/ -- the Lee & Cheung CIKM 2010 paper that inspired the model (background reading only, not required to use Omnist).

tests/ file map

Full test strategy (coverage target, fuzzing approach, CI) is in testing.md -- this is just a map of what lives where.

  • test_canonical.py -- the core suite for the Document/Schema model: Doc, the record/Ref schema, the DSL, validation, the schema operations (compatible_with/equivalent/infer), and the format codecs.
  • test_oml.py -- OML round-tripping: every scalar kind, escaping, raw/multiline strings, separators, reserved words, numeric and nesting edge cases, and schema-directed reads.
  • test_docs.py -- executes the key snippets shown in the docs (README.md, docs/guide.md, docs/schema.md, docs/quickstart.md, etc.) as assertions, so a docs change that breaks the described behavior fails CI instead of rotting silently.
  • test_examples.py -- runs every examples/*.py file as a subprocess and asserts a clean exit, since examples are documentation too.
  • test_fuzz.py -- property-based fuzzing (Hypothesis) of the Document model, codecs, and the DSL parser.

See also testing.md for coverage measurement, the fuzzing methodology, and what CI runs on every push and PR.

Other top-level files

  • mkdocs.yml + .github/workflows/docs.yml -- build this docs/ tree into a browsable site (mkdocs-material) and deploy it to GitHub Pages on every push to master that touches docs/, mkdocs.yml, or README.md. Isolated from the package: its dependencies aren't part of any pyproject.toml extra, so installing omnist never pulls in documentation-site tooling.