Here’s a concrete, low‑lift way to boost Stella Ops’s visibility and prove your “deterministic, replayable” moat: publish a **sanitized subset of reachability graphs** as a public benchmark that others can run and score identically.

### What this is (plain English)

* You release a small, carefully scrubbed set of **packages + SBOMs + VEX + call‑graphs** (source & binaries) with **ground‑truth reachability labels** for a curated list of CVEs.
* You also ship a **deterministic scoring harness** (container + manifest) so anyone can reproduce the exact scores, byte‑for‑byte.

### Why it helps

* **Proof of determinism:** identical inputs → identical graphs → identical scores.
* **Research magnet:** gives labs and tool vendors a neutral yardstick; you become “the” benchmark steward.
* **Biz impact:** easy demo for buyers; lets you publish leaderboards and whitepapers.

### Scope (MVP dataset)

* **Languages:** PHP, JS, Python, plus **binary** (ELF/PE/Mach‑O) mini‑cases.
* **Units:** 20–30 packages total; 3–6 CVEs per language; 4–6 binary cases (statically & dynamically linked).
* **Artifacts per unit:**
  * Package tarball(s) or container image digest
  * SBOM (CycloneDX 1.6 + SPDX 3.0.1)
  * VEX (known‑exploited, not‑affected, under‑investigation)
  * **Call graph** (normalized JSON)
  * **Ground truth:** list of vulnerable entrypoints/edges considered *reachable*
  * **Determinism manifest:** feed URLs + rule hashes + container digests + tool versions

### Data model (keep it simple)

* `dataset.json`: index of cases with content‑addressed URIs (sha256)
* `sbom/`, `vex/`, `graphs/`, `truth/` folders mirroring the index
* `manifest.lock.json`: DSSE‑signed record of:
  * feeder rules, lattice policies, normalizers (name + version + hash)
  * container image digests for each step (scanner/cartographer/normalizer)
  * timestamp + signer (Stella Ops Authority)

### Scoring harness (deterministic)

* One Docker image: `stellaops/benchmark-harness:<version>`
* Inputs: dataset root + `manifest.lock.json`
* Outputs:
  * `scores.json` (precision/recall/F1, per‑case and macro)
  * `replay-proof.txt` (hashes of every artifact used)
* **No‑network** mode (offline‑first). Fails closed if any hash mismatches.

### Metrics (clear + auditable)

* Per case: TP/FP/FN for **reachable** functions (or edges), plus optional **sink‑reach** verification.
* Aggregates: micro/macro F1; “Determinism Index” (stddev of repeated runs must be 0).
* **Repro test:** the harness re‑runs N=3 and asserts identical outputs (hash compare); a minimal sketch follows the repository layout below.

### Sanitization & legal

* Strip any proprietary code/data; prefer OSS with permissive licenses.
* Replace real package registries with **local mirrors** and pin digests.
* Publish under **CC‑BY‑4.0** (data) + **Apache‑2.0** (harness). Add a simple **contributor license agreement** for external case submissions.

### Baselines to include (neutral + useful)

* “Naïve reachable” (all functions in the package)
* “Imports‑only” (entrypoints that match the import graph)
* “Call‑depth‑2” (bounded traversal)
* **Your** graph engine run with **frozen rules** from the manifest (as a reference, not a claim of SOTA)

### Repository layout (public)

```
stellaops-reachability-benchmark/
  dataset/
    dataset.json
    sbom/...
    vex/...
    graphs/...
    truth/...
    manifest.lock.json   (DSSE-signed)
  harness/
    Dockerfile
    runner.py            (CLI)
    schema/              (JSON Schemas for graphs, truth, scores)
  docs/
    HOWTO.md             (5-min run)
    CONTRIBUTING.md
    SANITIZATION.md
  LICENSES/
```
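To make the “Repro test” bullet concrete, here is a minimal sketch of the hash‑compare idea. `score_fn` stands in for whatever produces the `scores.json` payload; it and the canonicalization choices are assumptions for illustration, not part of the spec.

```python
import hashlib
import json
from typing import Callable

def repro_check(score_fn: Callable[[str], dict], dataset_root: str, runs: int = 3) -> bool:
    """Re-run scoring N times and require byte-identical canonicalized output."""
    digests = set()
    for _ in range(runs):
        scores = score_fn(dataset_root)
        # Canonical form: sorted keys, no whitespace, UTF-8 bytes.
        canonical = json.dumps(scores, sort_keys=True, separators=(",", ":")).encode("utf-8")
        digests.add(hashlib.sha256(canonical).hexdigest())
    # Exactly one distinct digest corresponds to a Determinism Index of 0.
    return len(digests) == 1
```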
### Docs your team can ship in a day

* **HOWTO.md:** `docker run -v $PWD/dataset:/d -v $PWD/out:/o stellaops/benchmark-harness score /d /o`
* **SCHEMA.md:** JSON Schemas for graph and truth (keep fields minimal: `nodes`, `edges`, `purls`, `sinks`, `evidence`).
* **REPRODUCIBILITY.md:** explains DSSE signatures, the lockfile, and offline runs.
* **LIMITATIONS.md:** clarifies scope (no dynamic runtime traces in v1, etc.).

### Governance (lightweight)

* **Versioned releases:** `v0.1`, `v0.2` with changelogs.
* **Submission gate:** PR template + CI that:
  * validates schemas
  * checks that hashes match the lockfile
  * re‑scores and compares to the contributor’s score
* **Leaderboard cadence:** monthly markdown table regenerated by CI.

### Launch plan (2‑week sprint)

* **Day 1–2:** pick cases; finalize schemas; write SANITIZATION.md.
* **Day 3–5:** build the harness image; implement the deterministic runner; freeze `manifest.lock.json`.
* **Day 6–8:** produce ground truth; run baselines; generate initial scores.
* **Day 9–10:** docs + website README; record a 2‑minute demo GIF.
* **Day 11–12:** legal review + licenses; create issue labels (“good first case”).
* **Day 13–14:** publish, post on GitHub + LinkedIn; invite Semgrep/Snyk/OSS‑Fuzz folks to submit cases.

### Nice‑to‑have (but easy)

* **JSON Schema** for ground‑truth edges so academics can auto‑ingest.
* **Small “unknowns” registry** example to show how you annotate unresolved symbols without breaking determinism.
* **Binary mini‑lab:** stripped vs. non‑stripped ELF pair to show your patch‑oracle technique in action (truth labels reflect the oracle result).

If you want, I can draft the repo skeleton (folders, placeholder JSON Schemas, a sample `manifest.lock.json`, and a minimal `runner.py` CLI) so you can drop it straight into GitHub.

---

Got you — let’s turn that high‑level idea into something your devs can actually pick up and ship. Below is a **concrete implementation plan** for the *StellaOps Reachability Benchmark* repo: directory structure, components, tasks, and acceptance criteria. You can drop this straight into a ticketing system as epics → stories.

---

## 0. Tech assumptions (adjust if needed)

To be specific, I’ll assume:

* **Repo**: `stellaops-reachability-benchmark`
* **Harness language**: Python 3.11+
* **Packaging**: Docker image for the harness
* **Schemas**: JSON Schema (Draft 2020-12)
* **CI**: GitHub Actions

If your stack differs, you can still reuse the structure and acceptance criteria.

---

## 1. Repo skeleton & project bootstrap

**Goal:** Create a minimal but fully wired repo.

### Tasks

1. **Create skeleton**

   * Structure:

   ```text
   stellaops-reachability-benchmark/
     dataset/
       dataset.json
       sbom/
       vex/
       graphs/
       truth/
       packages/
       manifest.lock.json      # initially a stub
     harness/
       reachbench/
         __init__.py
         cli.py
         dataset_loader.py
         schemas/
           graph.schema.json
           truth.schema.json
           dataset.schema.json
           scores.schema.json
       tests/
     docs/
       HOWTO.md
       SCHEMA.md
       REPRODUCIBILITY.md
       LIMITATIONS.md
       SANITIZATION.md
     .github/
       workflows/
         ci.yml
     pyproject.toml
     README.md
     LICENSE
     Dockerfile
   ```

2. **Bootstrap Python project**

   * `pyproject.toml` with:
     * `reachbench` package
     * deps: `jsonschema`, `click` or `typer`, `pyyaml`, `pytest`
   * `harness/tests/` with a dummy test to ensure CI is green.

3. **Dockerfile**

   * Minimal, pinned versions:

   ```Dockerfile
   FROM python:3.11-slim
   WORKDIR /app
   COPY . .
   RUN pip install --no-cache-dir .
   ENTRYPOINT ["reachbench"]
   ```

4. **CI basic pipeline (`.github/workflows/ci.yml`)**

   * Jobs:
     * `lint` (e.g., `ruff` or `flake8`, if you want)
     * `test` (pytest)
     * `build-docker` (just to ensure the Dockerfile stays valid)

### Acceptance criteria

* `pip install .` works locally.
* `reachbench --help` prints CLI help (even if commands are stubs).
* CI passes on the main branch.

---

## 2. Dataset & schema definitions

**Goal:** Define all JSON formats and enforce them.

### 2.1 Define dataset index format (`dataset/dataset.json`)

**File:** `dataset/dataset.json`

**Example:**

```jsonc
{
  "version": "0.1.0",
  "cases": [
    {
      "id": "php-wordpress-5.8-cve-2023-12345",
      "language": "php",
      "kind": "source",              // "source" | "binary" | "container"
      "cves": ["CVE-2023-12345"],
      "artifacts": {
        "package": {
          "path": "packages/php/wordpress-5.8.tar.gz",
          "sha256": "…"
        },
        "sbom": {
          "path": "sbom/php/wordpress-5.8.cdx.json",
          "format": "cyclonedx-1.6",
          "sha256": "…"
        },
        "vex": {
          "path": "vex/php/wordpress-5.8.vex.json",
          "format": "csaf-2.0",
          "sha256": "…"
        },
        "graph": {
          "path": "graphs/php/wordpress-5.8.graph.json",
          "schema": "graph.schema.json",
          "sha256": "…"
        },
        "truth": {
          "path": "truth/php/wordpress-5.8.truth.json",
          "schema": "truth.schema.json",
          "sha256": "…"
        }
      }
    }
  ]
}
```

### 2.2 Define **truth schema** (`harness/reachbench/schemas/truth.schema.json`)

**Model (conceptual):**

```jsonc
{
  "case_id": "php-wordpress-5.8-cve-2023-12345",
  "vulnerable_components": [
    {
      "cve": "CVE-2023-12345",
      "symbol": "wp_ajax_nopriv_some_vuln",
      "symbol_kind": "function",     // "function" | "method" | "binary_symbol"
      "status": "reachable",         // "reachable" | "not_reachable"
      "reachable_from": [
        {
          "entrypoint_id": "web:GET:/foo",
          "notes": "HTTP route /foo"
        }
      ],
      "evidence": "manual-analysis"  // or "unit-test", "patch-oracle"
    }
  ],
  "non_vulnerable_components": [
    {
      "symbol": "wp_safe_function",
      "symbol_kind": "function",
      "status": "not_reachable",
      "evidence": "manual-analysis"
    }
  ]
}
```

**Tasks**

* Implement a JSON Schema capturing:
  * required fields: `case_id`, `vulnerable_components`
  * allowed enums for `symbol_kind`, `status`, `evidence`
* Add unit tests that:
  * validate a valid truth file
  * fail on various broken ones (missing `case_id`, unknown `status`, etc.)

### 2.3 Define **graph schema** (`harness/reachbench/schemas/graph.schema.json`)

**Model (conceptual):**

```jsonc
{
  "case_id": "php-wordpress-5.8-cve-2023-12345",
  "language": "php",
  "nodes": [
    {
      "id": "func:wp_ajax_nopriv_some_vuln",
      "symbol": "wp_ajax_nopriv_some_vuln",
      "kind": "function",
      "purl": "pkg:composer/wordpress/wordpress@5.8"
    }
  ],
  "edges": [
    {
      "from": "func:wp_ajax_nopriv_some_vuln",
      "to": "func:wpdb_query",
      "kind": "call"
    }
  ],
  "entrypoints": [
    {
      "id": "web:GET:/foo",
      "symbol": "some_controller",
      "kind": "http_route"
    }
  ]
}
```

**Tasks**

* JSON Schema with:
  * `nodes[]` (id, symbol, kind, optional purl)
  * `edges[]` (`from`, `to`, `kind`)
  * `entrypoints[]` (id, symbol, kind)
* Tests: verify that a valid graph passes and that invalid ones (missing `id`, unknown `kind`) are rejected.

### 2.4 Dataset index schema (`dataset.schema.json`)

* JSON Schema describing `dataset.json` (version string, cases array).
* Tests: validate the example dataset file.

### Acceptance criteria

* Running a simple script (which will become `reachbench validate-dataset`) validates all JSON files in `dataset/` against the schemas without errors.
* CI fails if any dataset JSON is invalid.

---

## 3. Lockfile & determinism manifest

**Goal:** Implement `manifest.lock.json` generation and verification.
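Before the detailed structure, here is a minimal sketch of the canonical hashing rules that 3.2 below spells out. Function names and the per‑file grouping are illustrative assumptions; the real `lockfile.py` groups hashes per case rather than per file.

```python
import hashlib
import json
from pathlib import Path

def canonical_sha256(path: Path) -> str:
    """Hash one artifact: JSON is re-serialized with sorted keys, no whitespace,
    UTF-8 encoding; everything else (package tarballs, binaries) is hashed raw."""
    data = path.read_bytes()
    if path.suffix == ".json":
        obj = json.loads(data.decode("utf-8"))
        data = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                          ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def compute_dataset_hashes(dataset_root: Path) -> dict:
    """Walk dataset/ in sorted order and derive a root digest from the per-file
    digests. The lockfile itself is excluded so it can be regenerated stably."""
    files: dict[str, str] = {}
    root = hashlib.sha256()
    for p in sorted(dataset_root.rglob("*")):
        if p.is_file() and p.name != "manifest.lock.json":
            digest = canonical_sha256(p)
            rel = str(p.relative_to(dataset_root))
            files[rel] = digest
            root.update(f"{rel}:{digest}\n".encode("utf-8"))
    return {"files": files, "root_sha256": root.hexdigest()}
```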
### 3.1 Lockfile structure

**File:** `dataset/manifest.lock.json`

**Example:**

```jsonc
{
  "version": "0.1.0",
  "created_at": "2025-01-15T12:00:00Z",
  "dataset": {
    "root": "dataset/",
    "sha256": "…",
    "cases": {
      "php-wordpress-5.8-cve-2023-12345": {
        "sha256": "…"
      }
    }
  },
  "tools": {
    "graph_normalizer": {
      "name": "stellaops-graph-normalizer",
      "version": "1.2.3",
      "sha256": "…"
    }
  },
  "containers": {
    "scanner_image": "ghcr.io/stellaops/scanner@sha256:…",
    "normalizer_image": "ghcr.io/stellaops/normalizer@sha256:…"
  },
  "signatures": [
    {
      "type": "dsse",
      "key_id": "stellaops-benchmark-key-1",
      "signature": "base64-encoded-blob"
    }
  ]
}
```

*(Signatures can be optional in v1, but the structure should be there.)*

### 3.2 `lockfile.py` module

**File:** `harness/reachbench/lockfile.py`

**Responsibilities**

* Compute a deterministic SHA‑256 digest of:
  * each case’s artifacts (path → hash from `dataset.json`)
  * the entire `dataset/` tree (sorted traversal)
* Generate a new `manifest.lock.json`:
  * `version` (hard‑coded constant)
  * `created_at` (UTC ISO 8601)
  * `dataset` section with case hashes
* Verification:
  * `verify_lockfile(dataset_root, lockfile_path)`:
    * recompute hashes
    * compare to `lockfile.dataset`
    * return a boolean + list of mismatches

**Tasks**

1. Implement canonical hashing:
   * For JSON text files, normalize with:
     * sorted keys
     * no whitespace
     * UTF‑8 encoding
   * For binaries (packages): raw bytes.
2. Implement `compute_dataset_hashes(dataset_root)`:
   * Returns `{"cases": {...}, "root_sha256": "…"}`.
3. Implement `write_lockfile(...)` and `verify_lockfile(...)`.
4. Tests:
   * Two calls with the same dataset produce identical lockfiles (order of `cases` keys normalized).
   * Changing any artifact file changes the root hash and causes verification to fail.

### 3.3 CLI commands

Add to `cli.py`:

* `reachbench compute-lockfile --dataset-root ./dataset --out ./dataset/manifest.lock.json`
* `reachbench verify-lockfile --dataset-root ./dataset --lockfile ./dataset/manifest.lock.json`

### Acceptance criteria

* `reachbench compute-lockfile` generates a stable file (byte‑for‑byte identical across runs).
* `reachbench verify-lockfile` exits with:
  * code 0 if everything matches
  * non‑zero on mismatch (plus a human‑readable diff).

---

## 4. Scoring harness CLI

**Goal:** Deterministically score participant results against the ground truth.

### 4.1 Result format (participant output)

**Expectation:** Participants provide a `results/` directory with one JSON file per case:

```text
results/
  php-wordpress-5.8-cve-2023-12345.json
  js-express-4.17-cve-2022-9999.json
```

**Result file example:**

```jsonc
{
  "case_id": "php-wordpress-5.8-cve-2023-12345",
  "tool_name": "my-reachability-analyzer",
  "tool_version": "1.0.0",
  "predictions": [
    {
      "cve": "CVE-2023-12345",
      "symbol": "wp_ajax_nopriv_some_vuln",
      "symbol_kind": "function",
      "status": "reachable"
    },
    {
      "cve": "CVE-2023-12345",
      "symbol": "wp_safe_function",
      "symbol_kind": "function",
      "status": "not_reachable"
    }
  ]
}
```

### 4.2 Scoring model

* Treat scoring as classification over `(cve, symbol)` pairs.
* For each case:
  * Truth positives: all `vulnerable_components` with `status == "reachable"`.
  * Truth negatives: everything marked `not_reachable` (optional in v1).
  * Predictions: all entries with `status == "reachable"`.
* Compute:
  * `TP`: predicted reachable & truth reachable.
  * `FP`: predicted reachable but truth says not reachable / unknown.
  * `FN`: truth reachable but not predicted reachable.
* Metrics:
  * Precision, recall, F1 per case.
  * Macro‑averaged metrics across all cases (a minimal computation sketch follows below).
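Here is a minimal sketch of that classification logic, assuming `truth` and `preds` are plain dicts parsed from the truth and result JSON above; the final `compute_case_metrics` in 4.3 may differ in shape.

```python
def compute_case_metrics(truth: dict, preds: dict) -> dict:
    """Score one case as classification over (cve, symbol) pairs (sketch only)."""
    truth_reachable = {
        (c.get("cve"), c["symbol"])
        for c in truth.get("vulnerable_components", [])
        if c["status"] == "reachable"
    }
    predicted_reachable = {
        (p.get("cve"), p["symbol"])
        for p in preds.get("predictions", [])
        if p["status"] == "reachable"
    }
    tp = len(truth_reachable & predicted_reachable)
    fp = len(predicted_reachable - truth_reachable)
    fn = len(truth_reachable - predicted_reachable)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"case_id": truth["case_id"], "tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```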
### 4.3 Implementation (`scoring.py`)

**File:** `harness/reachbench/scoring.py`

**Functions:**

* `load_truth(case_truth_path) -> TruthModel`
* `load_predictions(predictions_path) -> PredictionModel`
* `compute_case_metrics(truth, preds) -> dict`
  * returns:

    ```python
    {
        "case_id": str,
        "tp": int,
        "fp": int,
        "fn": int,
        "precision": float,
        "recall": float,
        "f1": float
    }
    ```
* `aggregate_metrics(case_metrics_list) -> dict`
  * `macro_precision`, `macro_recall`, `macro_f1`, `num_cases`.

### 4.4 CLI: `score`

**Signature:**

```bash
reachbench score \
  --dataset-root ./dataset \
  --results-root ./results \
  --lockfile ./dataset/manifest.lock.json \
  --out ./out/scores.json \
  [--cases php-*] \
  [--repeat 3]
```

**Behavior:**

1. **Verify the lockfile** (fail closed on mismatch).
2. Load `dataset.json`; filter cases if `--cases` is set (glob).
3. For each case:
   * Load the truth file (and validate its schema).
   * Locate the results file (`<case_id>.json`) under `results-root`:
     * If missing, treat all truth positives as FN (or mark the case as “no submission”).
   * Load and validate the predictions (include a JSON Schema: `results.schema.json`).
   * Compute per‑case metrics.
4. Aggregate metrics.
5. Write `scores.json`:

   ```jsonc
   {
     "version": "0.1.0",
     "dataset_version": "0.1.0",
     "generated_at": "2025-01-15T12:34:56Z",
     "macro_precision": 0.92,
     "macro_recall": 0.88,
     "macro_f1": 0.90,
     "cases": [
       {
         "case_id": "php-wordpress-5.8-cve-2023-12345",
         "tp": 10,
         "fp": 1,
         "fn": 2,
         "precision": 0.91,
         "recall": 0.83,
         "f1": 0.87
       }
     ]
   }
   ```

6. **Determinism check:**
   * If `--repeat N` is given:
     * Re‑run scoring in memory N times.
     * Compare the resulting JSON strings (canonicalized via sorted keys).
     * If any differ, exit non‑zero with a message (“non‑deterministic scoring detected”).

### 4.5 Offline‑only mode

* In `cli.py`, perform an early check:

  ```python
  if os.getenv("REACHBENCH_OFFLINE_ONLY", "1") == "1":
      # Verify no outbound network: by policy, simply ensure we never call any
      # networking libraries. (In v1, just avoid adding any such calls.)
      pass
  ```

* Document that the harness must not reach out to the internet.

### Acceptance criteria

* Given a small artificial dataset with 2–3 cases and handcrafted results, `reachbench score` produces the expected metrics (asserted via tests).
* Running `reachbench score --repeat 3` produces identical `scores.json` output across runs.
* Missing results files are handled gracefully (and clearly documented).

---

## 5. Baseline implementations

**Goal:** Provide in‑repo baselines that use only the provided graphs (no extra tooling).

### 5.1 Baseline types

1. **Naïve reachable:** all symbols in the vulnerable package are considered reachable.
2. **Imports‑only:** reachable = any symbol that:
   * appears in the graph AND
   * is reachable from any entrypoint by a single edge OR a name match.
3. **Call‑depth‑2:**
   * From each entrypoint, traverse up to depth 2 along `call` edges.
   * Anything at depth ≤ 2 is considered reachable (a traversal sketch follows 5.3).

### 5.2 Implementation

**File:** `harness/reachbench/baselines.py`

* `baseline_naive(graph, truth) -> PredictionModel`
* `baseline_imports_only(graph, truth) -> PredictionModel`
* `baseline_call_depth_2(graph, truth) -> PredictionModel`

**CLI:**

```bash
reachbench run-baseline \
  --dataset-root ./dataset \
  --baseline naive|imports|depth2 \
  --out ./results-baseline-<name>/
```

Behavior:

* For each case:
  * Load the graph.
  * Generate predictions for the chosen baseline.
  * Write a result file `results-baseline-<name>/<case_id>.json`.

### 5.3 Tests

* Tiny synthetic dataset in `harness/tests/data/`:
  * 1–2 cases with simple graphs.
  * Known expectations for each baseline (TP/FP/FN counts).
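For illustration, a minimal sketch of the depth‑2 traversal over the graph JSON from 2.3 (a bounded BFS). Matching entrypoints to nodes by symbol name is an assumption of this sketch, and the real `baseline_call_depth_2` may differ.

```python
from collections import deque

def reachable_within_depth(graph: dict, max_depth: int = 2) -> set[str]:
    """Node ids reachable from any entrypoint within `max_depth` call edges.

    `graph` follows graph.schema.json (nodes / edges / entrypoints)."""
    adjacency: dict[str, list[str]] = {}
    for edge in graph.get("edges", []):
        if edge.get("kind") == "call":
            adjacency.setdefault(edge["from"], []).append(edge["to"])

    # Entrypoints are matched to graph nodes by symbol name in this sketch.
    symbol_to_node = {n["symbol"]: n["id"] for n in graph.get("nodes", [])}
    frontier = deque(
        (symbol_to_node[e["symbol"]], 0)
        for e in graph.get("entrypoints", [])
        if e["symbol"] in symbol_to_node
    )

    seen: set[str] = {node_id for node_id, _ in frontier}
    while frontier:
        node_id, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        for nxt in adjacency.get(node_id, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen
```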
### Acceptance criteria

* `reachbench run-baseline --baseline naive` runs end‑to‑end and outputs result files.
* `reachbench score` on the baseline results produces stable scores.
* Tests validate baseline behavior on the synthetic cases.

---

## 6. Dataset validation & tooling

**Goal:** One command to validate everything (schemas, hashes, internal consistency).

### CLI: `validate-dataset`

```bash
reachbench validate-dataset \
  --dataset-root ./dataset \
  [--lockfile ./dataset/manifest.lock.json]
```

**Checks:**

1. `dataset.json` conforms to `dataset.schema.json`.
2. For each case:
   * all artifact paths exist
   * the `graph` file passes `graph.schema.json`
   * the `truth` file passes `truth.schema.json`
3. Optional: verify the lockfile if one is provided.

**Implementation:**

* `dataset_loader.py`:
  * `load_dataset_index(path) -> DatasetIndex`
  * `iter_cases(dataset_index)` yields case objects.
  * `validate_case(case, dataset_root) -> list[str]` (list of error messages).

**Acceptance criteria**

* Broken paths / invalid JSON produce a clear error message and a non‑zero exit code.
* A CI job calls `reachbench validate-dataset` on every push.

---

## 7. Documentation

**Goal:** Make it trivial for outsiders to use the benchmark.

### 7.1 `README.md`

* Overview:
  * What the benchmark is.
  * What it measures (reachability precision/recall).
* Quickstart:

  ```bash
  git clone ...
  cd stellaops-reachability-benchmark

  # Validate dataset
  reachbench validate-dataset --dataset-root ./dataset

  # Run baselines
  reachbench run-baseline --baseline naive --dataset-root ./dataset --out ./results-naive

  # Score baselines
  reachbench score --dataset-root ./dataset --results-root ./results-naive --out ./out/naive-scores.json
  ```

### 7.2 `docs/HOWTO.md`

* Step‑by‑step:
  * Installing the harness.
  * Running your own tool on the dataset.
  * Formatting your `results/`.
  * Running `reachbench score`.
  * Interpreting `scores.json`.

### 7.3 `docs/SCHEMA.md`

* Human‑readable description of:
  * `graph` JSON
  * `truth` JSON
  * `results` JSON
  * `scores` JSON
* Links to the actual JSON Schemas.

### 7.4 `docs/REPRODUCIBILITY.md`

* Explains:
  * the lockfile design
  * the hashing rules
  * deterministic scoring and the `--repeat` flag
  * how to verify you’re using the exact same dataset.

### 7.5 `docs/SANITIZATION.md`

* Rules for adding new cases:
  * Only use OSS or properly licensed code.
  * Strip secrets / proprietary paths / user data.
  * How to confirm nothing sensitive is in the package tarballs.

### Acceptance criteria

* A new engineer (or external user) can go from zero to “I ran the baseline and got scores” by following the docs alone.
* All example commands work as written.

---

## 8. CI/CD details

**Goal:** Keep the repo healthy and ensure determinism.

### CI jobs (GitHub Actions)

1. **`lint`**
   * Run `ruff` / `flake8` (your choice).
2. **`test`**
   * Run `pytest`.
3. **`validate-dataset`**
   * Run `reachbench validate-dataset --dataset-root ./dataset`.
4. **`determinism`**
   * Small workflow step:
     * Run `reachbench score` on a tiny test dataset with `--repeat 3`.
     * Assert success.
5. **`docker-build`**
   * `docker build` the harness image.

### Acceptance criteria

* All jobs green on main.
* PRs show a failing status if schemas or determinism break.

---

## 9. Rough “epics → stories” breakdown

You can paste roughly the following into Jira/Linear:

1. **Epic: Repo bootstrap & CI**
   * Story: Create repo skeleton & Python project
   * Story: Add Dockerfile & basic CI (lint + tests)
2. **Epic: Schemas & dataset plumbing**
   * Story: Implement `truth.schema.json` + tests
   * Story: Implement `graph.schema.json` + tests
   * Story: Implement `dataset.schema.json` + tests
   * Story: Implement `validate-dataset` CLI
3. **Epic: Lockfile & determinism**
   * Story: Implement lockfile computation + verification
   * Story: Add `compute-lockfile` & `verify-lockfile` CLI
   * Story: Add determinism checks in CI
4. **Epic: Scoring harness**
   * Story: Define results format + `results.schema.json`
   * Story: Implement scoring logic (`scoring.py`)
   * Story: Implement `score` CLI with `--repeat`
   * Story: Add unit tests for metrics
5. **Epic: Baselines**
   * Story: Implement naive baseline
   * Story: Implement imports‑only baseline
   * Story: Implement depth‑2 baseline
   * Story: Add `run-baseline` CLI + tests
6. **Epic: Documentation & polish**
   * Story: Write README + HOWTO
   * Story: Write SCHEMA / REPRODUCIBILITY / SANITIZATION docs
   * Story: Final repo cleanup & examples

---

If you tell me your preferred language and CI, I can also rewrite this into exact tickets and even starter code for `cli.py` and a couple of schemas.
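In the meantime, here is a minimal sketch of what that `cli.py` starter could look like, assuming Typer (one of the two CLI options listed in the deps; Click would work equally well). Command bodies are stubs, and the option names simply mirror the commands described in sections 3–6.

```python
# Sketch of harness/reachbench/cli.py (stubs only, not the final implementation).
import typer

app = typer.Typer(help="StellaOps Reachability Benchmark harness")

@app.command("validate-dataset")
def validate_dataset(dataset_root: str = "./dataset", lockfile: str = ""):
    """Validate dataset.json, graphs, and truth files against the schemas."""
    typer.echo(f"validating {dataset_root} ... (stub)")

@app.command("compute-lockfile")
def compute_lockfile(dataset_root: str = "./dataset",
                     out: str = "./dataset/manifest.lock.json"):
    """Write a deterministic manifest.lock.json for the dataset."""
    typer.echo(f"writing lockfile to {out} ... (stub)")

@app.command("verify-lockfile")
def verify_lockfile(dataset_root: str = "./dataset",
                    lockfile: str = "./dataset/manifest.lock.json"):
    """Recompute hashes and compare against the lockfile; fail closed on mismatch."""
    typer.echo(f"verifying {lockfile} ... (stub)")

@app.command("run-baseline")
def run_baseline(dataset_root: str = "./dataset", baseline: str = "naive",
                 out: str = "./results-baseline"):
    """Run one of the in-repo baselines (naive | imports | depth2)."""
    typer.echo(f"running baseline {baseline} ... (stub)")

@app.command("score")
def score(dataset_root: str = "./dataset", results_root: str = "./results",
          lockfile: str = "./dataset/manifest.lock.json",
          out: str = "./out/scores.json", repeat: int = 1):
    """Score participant results against ground truth, deterministically."""
    typer.echo(f"scoring {results_root} ({repeat} repeat(s)) ... (stub)")

if __name__ == "__main__":
    app()
```

With a `[project.scripts]` entry pointing `reachbench` at `reachbench.cli:app`, `reachbench --help` would list these commands, which is enough to satisfy the section 1 acceptance criteria while the real logic lands.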