Repo layout
How the repo is organized: the canonical model's modules, the docs page map, and the test file map. For anything deeper than a one-line summary, follow the link.
omnist/canonical/*.py
The implementation of the Document/Schema model described in
docs/design/model.md. Importing omnist re-exports its public surface; this
directory is where the logic actually lives.
- document.py -- the Document model: Doc and the edge-list node shape ([(label, node), ...] / scalar leaf), with navigation/editing helpers.
- schema.py -- the Schema model: Record, Scalar, Ref, Field, Schema (with validate), and the seven scalar kinds.
- dsl.py -- the schema DSL: parse_schema / to_dsl, parsing and serializing the record ... root ... text syntax.
- oml.py -- OML (Omnist's own format): tokenizer, parser, and writer; the only format with zero adjustments on write.
- formats.py -- the JSON/YAML/TOML/XML codecs (read_* / write_* / check_*), each going through the shared grouped/edge-list conversion.
- infer.py -- infer(): draft a schema from example Documents.
- operations.py -- schema comparison: compatible_with, equivalent, normalize.
- deserialize.py -- materialize(): schema-directed upgrading of a freshly-read node's leaf values (e.g. ISO string -> date) when the schema is known and the conversion is value-exact.
- report.py -- Adjustment / WriteReport: records what a lossy writer had to change, with severities, and drives lenient/inspect/strict write modes.
- registry.py -- the format plugin registry: Format, register_format, and the built-in JSON/YAML/TOML/XML registrations.
- __init__.py -- re-exports the package's public names from the modules above.
Outside canonical/: omnist/errors.py defines the exception hierarchy
(OmnistError, DocumentError, SchemaError, ParseError, WriteError,
UnsafeXMLWarning); omnist/__init__.py is the public package surface.
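The edge-list node shape that document.py is built around can be illustrated in plain Python. This is only a sketch of the shape named above ([(label, node), ...] or a scalar leaf); it does not use Omnist's API, group_edges is a hypothetical helper, and the "grouped" form shown here is an assumption about what a label-keyed view might look like, not the conversion formats.py actually performs.

```python
# Assumption: a node is either a scalar leaf (str, int, ...) or a list of
# (label, child) pairs. Repeated labels are allowed and order is preserved.

def group_edges(node):
    """Hypothetical helper: collapse an edge list into {label: [children, ...]}."""
    if not isinstance(node, list):
        return node  # scalar leaf: nothing to group
    grouped = {}
    for label, child in node:
        # Children that share a label are collected in document order.
        grouped.setdefault(label, []).append(group_edges(child))
    return grouped


# An order with one id and two line items, written as an edge list.
order = [
    ("id", 42),
    ("item", [("sku", "A1"), ("qty", 2)]),
    ("item", [("sku", "B7"), ("qty", 1)]),
]

grouped = group_edges(order)
```

The edge list keeps repeated labels ("item" twice) without any special casing, which is what a label-keyed dictionary cannot do on its own; the grouped view has to wrap every value in a list to stay lossless.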
docs/ page map
- README.md -- the docs index (this page is linked from it).
- quickstart.md -- the shortest possible tour: one OML snippet, one schema, validate(), infer().
- guide.md -- the practical, narrative tour of the whole library. Read this first if you're not in a hurry.
- schema.md -- the Schema model and DSL on their own: record definitions, cardinality, the Python builder, the comparison/inference operations.
- example.md -- one order/address/line-item schema validated against a document in OML (and the other formats), plus a backward-compatibility check. The full worked example; quickstart.md is the short one.
- api.md -- every public name importable from omnist, with signatures.
- glossary.md -- one definition per term used across the docs and code, grouped by concept area.
- testing.md -- the test suite layout, coverage tooling and target, the fuzzing approach, and what CI runs. The tests section below points here for depth.
- layout.md -- this page.
- formats/ -- one page per format, plus an overview: overview.md (how each format maps to the model), oml.md, json.md, yaml.md, toml.md, xml.md.
- design/model.md -- the formal Document and Schema model definitions; self-contained, no paper required.
- paper/ -- the Lee & Cheung CIKM 2010 paper that inspired the model (background reading only, not required to use Omnist).
tests/ file map
Full test strategy (coverage target, fuzzing approach, CI) is in testing.md -- this is just a map of what lives where.
- test_canonical.py -- the core suite for the Document/Schema model: Doc, the record/Ref schema, the DSL, validation, the schema operations (compatible_with / equivalent / infer), and the format codecs.
- test_oml.py -- OML round-tripping: every scalar kind, escaping, raw/multiline strings, separators, reserved words, numeric and nesting edge cases, and schema-directed reads.
- test_docs.py -- executes the key snippets shown in the docs (README.md, docs/guide.md, docs/schema.md, docs/quickstart.md, etc.) as assertions, so a docs change that breaks the described behavior fails CI instead of rotting silently.
- test_examples.py -- runs every examples/*.py file as a subprocess and asserts a clean exit, since examples are documentation too.
- test_fuzz.py -- property-based fuzzing (Hypothesis) of the Document model, codecs, and the DSL parser.
See also testing.md for coverage measurement, the fuzzing methodology, and what CI runs on every push and PR.
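The approach described for test_examples.py (run each example as a subprocess, expect a clean exit) might look roughly like the sketch below. It is not the repository's actual test: run_examples is a hypothetical name, and the real suite may parametrize, capture output, or set timeouts differently.

```python
import subprocess
import sys
from pathlib import Path


def run_examples(examples_dir):
    """Run every *.py file in examples_dir; return {filename: exit code}.

    Hypothetical helper: each script runs in its own interpreter process,
    so a crash or sys.exit() in one example cannot affect the others.
    """
    results = {}
    for script in sorted(Path(examples_dir).glob("*.py")):
        proc = subprocess.run(
            [sys.executable, str(script)],  # same interpreter as the test run
            capture_output=True,
            text=True,
        )
        results[script.name] = proc.returncode
    return results
```

A test built on this would assert that every value in the returned mapping is 0, for example all(code == 0 for code in run_examples("examples").values()).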
Other top-level files
- mkdocs.yml + .github/workflows/docs.yml -- build this docs/ tree into a browsable site (mkdocs-material) and deploy it to GitHub Pages on every push to master that touches docs/, mkdocs.yml, or README.md. Isolated from the package: its dependencies aren't part of any pyproject.toml extra, so installing omnist never pulls in documentation-site tooling.