up
StellaOps Bot
2025-12-01 21:16:22 +02:00
parent c11d87d252
commit 909d9b6220
208 changed files with 860954 additions and 832 deletions


@@ -25,10 +25,18 @@
2) Write NDJSON with stable ordering; compute SHA-256 for each file; write manifest.
3) Run validation script to assert counts, schema shape, and hash reproducibility.
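The count and schema-shape checks in step 3 could look roughly like the sketch below. It is only an illustration: the key sets are assumed from the interim fixtures (`id, kind, name, version, tenant` for nodes, plus `source`/`target` for edges), and `check_ndjson` and `validate_fixture` are hypothetical names, not an existing script.

```python
import json
from pathlib import Path

# Assumed field sets, taken from the interim fixtures; the canonical schema may differ.
NODE_KEYS = {"id", "kind", "name", "version", "tenant"}
EDGE_KEYS = {"id", "kind", "source", "target", "tenant"}


def check_ndjson(path, required_keys):
    """Return the record count; raise ValueError on a row with unexpected keys."""
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            record = json.loads(line)  # JSONDecodeError is a ValueError subclass
            if set(record) != required_keys:
                raise ValueError(f"{path}:{line_no}: unexpected keys {sorted(record)}")
            count += 1
    return count


def validate_fixture(fixture_dir):
    """Compare actual record counts with the counts recorded in manifest.json."""
    fixture_dir = Path(fixture_dir)
    manifest = json.loads((fixture_dir / "manifest.json").read_text(encoding="utf-8"))
    counts = {
        "nodes": check_ndjson(fixture_dir / "nodes.ndjson", NODE_KEYS),
        "edges": check_ndjson(fixture_dir / "edges.ndjson", EDGE_KEYS),
    }
    if counts != manifest["counts"]:
        raise ValueError(f"count mismatch: manifest={manifest['counts']} actual={counts}")
    return counts
```

The files are streamed line by line, so memory use stays flat even for the 100k-node fixture.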
## Open items (to resolve before data generation)
## Interim fixtures (delivered 2025-12-01)
- Synthetic deterministic graphs generated under `samples/graph/interim/`:
  - `graph-50k` (50k nodes, ~200k edges)
  - `graph-100k` (100k nodes, ~500k edges)
- Minimal schema (`id, kind, name, version, tenant`), seeded RNG, stable ordering, manifests with hashes.
- Purpose: unblock BENCH-GRAPH-21-001/002 while overlay format is finalized. Overlays not included yet.
## Open items (to resolve before canonical data generation)
- Confirm overlay field set and file naming (Graph Guild, due 2025-11-22).
- Confirm allowed mock SBOM source list and artifact naming (Graph Guild / SBOM Service Guild).
- Provide expected node/edge cardinality breakdown (packages vs files vs relationships) to guide generation.
## Next steps
- Blocked pending overlay/schema confirmation; revisit after 2025-11-22 checkpoint.
- Keep SAMPLES-GRAPH-24-003 blocked until overlay/schema confirmation, but interim fixtures are available for benches.
- Once the overlay schema is final, extend the generator to emit overlays + CAS manifests and promote the result to the official fixture.


@@ -0,0 +1,27 @@
# Interim Graph Fixtures (synthetic)
Generated by `samples/graph/interim/generate.py` to unblock BENCH-GRAPH-21-001/002 while SAMPLES-GRAPH-24-003 remains blocked.
## Contents
- `graph-50k/`
  - `nodes.ndjson` (50,000 package nodes)
  - `edges.ndjson` (199,988 depends_on edges)
  - `manifest.json` (hashes/counts)
- `graph-100k/`
  - `nodes.ndjson` (100,000 package nodes)
  - `edges.ndjson` (499,980 depends_on edges)
  - `manifest.json` (hashes/counts)
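The edge counts follow from how `generate.py` picks its fanout. As a rough check, assuming the generator's loop gives every node `fanout = int(log10(node_count))` edges except the last `fanout` nodes, which stop after one edge each, the totals can be recomputed as below. `expected_edge_count` is a hypothetical helper written for this note, not part of the generator.

```python
import math


def expected_edge_count(node_count):
    """Edge total implied by the generator's fanout rule (valid for node_count > fanout)."""
    fanout = max(1, int(math.log10(node_count)))
    full_nodes = node_count - fanout  # nodes that receive `fanout` edges each
    tail_nodes = fanout               # trailing nodes that receive a single edge
    return full_nodes * fanout + tail_nodes


for n in (50_000, 100_000):
    print(n, expected_edge_count(n))
```

For 50,000 nodes the fanout is 4 and for 100,000 nodes it is 5, which gives the `edges` counts recorded in the two `manifest.json` files.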
## Determinism
- Seeded RNG (`random.Random(42)`) for edge target selection; fanout is fixed at `int(log10(node_count))`.
- Stable ordering, UTF-8, sorted keys.
- Hashes in `manifest.json` for verification.
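A minimal sketch of that verification, assuming the manifest layout written by `generate.py` (a `hashes` object mapping file name to SHA-256 hex digest). The name `hash_mismatches` is made up for this example.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size=8192):
    """Hash a file in chunks so large NDJSON files are never fully loaded."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_mismatches(fixture_dir):
    """Return {file_name: (expected, actual)} for every file whose hash differs."""
    fixture_dir = Path(fixture_dir)
    manifest = json.loads((fixture_dir / "manifest.json").read_text(encoding="utf-8"))
    mismatches = {}
    for name, expected in sorted(manifest["hashes"].items()):
        actual = sha256_file(fixture_dir / name)
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches
```

An empty result means the files on disk still match the manifest, for example after regenerating the fixtures on another machine.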
## How to regenerate
```bash
python samples/graph/interim/generate.py
```
## Notes
- Schema is minimal (`id, kind, name, version, tenant`). Overlay format still pending; add overlays once Graph Guild finalizes fields.
- Use these fixtures for throughput/latency benches and UI scripting; swap to canonical SAMPLES-GRAPH-24-003 once available.


@@ -0,0 +1,125 @@
#!/usr/bin/env python3
"""
Deterministic interim graph fixture generator.
Produces two fixtures (50k and 100k nodes) with simple package/version nodes
and dependency edges. Output shape is NDJSON with stable ordering.
"""
from __future__ import annotations
import hashlib
import json
import math
import random
from pathlib import Path
from typing import Iterable, List
ROOT = Path(__file__).resolve().parent
OUT_DIR = ROOT
TENANT = "demo-tenant"

def chunked(seq: Iterable, size: int):
    chunk = []
    for item in seq:
        chunk.append(item)
        if len(chunk) >= size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def make_nodes(count: int) -> List[dict]:
    nodes: List[dict] = []
    for i in range(1, count + 1):
        nodes.append(
            {
                "id": f"pkg-{i:06d}",
                "kind": "package",
                "name": f"package-{i:06d}",
                "version": f"1.{(i % 10)}.{(i % 7)}",
                "tenant": TENANT,
            }
        )
    return nodes

def make_edges(nodes: List[dict], fanout: int) -> List[dict]:
    edges: List[dict] = []
    rng = random.Random(42)
    n = len(nodes)
    for idx, node in enumerate(nodes):
        # Pick up to `fanout` distinct targets from the 1-based id range [idx + 1, n].
        # `idx` is 0-based while node ids are 1-based, so idx + 1 is this node's own
        # id: the range includes the node itself and a self-loop can be emitted.
        targets = set()
        while len(targets) < fanout:
            t = rng.randint(idx + 1, n)
            if t <= n:  # always true: randint is inclusive on both ends
                targets.add(t)
            # The last `fanout` nodes stop after a single target.
            if idx + fanout >= n:
                break
        for t in sorted(targets):
            edges.append(
                {
                    "id": f"edge-{node['id']}-{t:06d}",
                    "kind": "depends_on",
                    "source": node["id"],
                    "target": f"pkg-{t:06d}",
                    "tenant": TENANT,
                }
            )
    return edges

def write_ndjson(path: Path, records: Iterable[dict]):
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, separators=(",", ":"), sort_keys=True))
            f.write("\n")

def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def generate_fixture(name: str, node_count: int):
    fixture_dir = OUT_DIR / name
    fixture_dir.mkdir(parents=True, exist_ok=True)
    print(f"Generating {name} with {node_count} nodes…")
    nodes = make_nodes(node_count)
    # keep fanout small to limit edges and file size
    fanout = max(1, int(math.log10(node_count)))
    edges = make_edges(nodes, fanout=fanout)
    nodes_path = fixture_dir / "nodes.ndjson"
    edges_path = fixture_dir / "edges.ndjson"
    manifest_path = fixture_dir / "manifest.json"
    write_ndjson(nodes_path, nodes)
    write_ndjson(edges_path, edges)
    manifest = {
        "version": "1.0.0",
        "tenant": TENANT,
        "counts": {"nodes": len(nodes), "edges": len(edges)},
        "hashes": {
            "nodes.ndjson": sha256_file(nodes_path),
            "edges.ndjson": sha256_file(edges_path),
        },
    }
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    print(f"Wrote manifest {manifest_path}")

def main():
    generate_fixture("graph-50k", 50_000)
    generate_fixture("graph-100k", 100_000)


if __name__ == "__main__":
    main()

File diff suppressed because it is too large.


@@ -0,0 +1,12 @@
{
"counts": {
"edges": 499980,
"nodes": 100000
},
"hashes": {
"edges.ndjson": "4f09d36e908b4cc5136ef74fdb716f657156765c7ccf5f5fb4f46a744a2681ff",
"nodes.ndjson": "74723965607ae70dbc34c658d75cc7f5491f3e27780cb8c5a2e1eb25620165b2"
},
"tenant": "demo-tenant",
"version": "1.0.0"
}

File diff suppressed because it is too large.

File diff suppressed because it is too large.


@@ -0,0 +1,12 @@
{
"counts": {
"edges": 199988,
"nodes": 50000
},
"hashes": {
"edges.ndjson": "811fc5e34399191e8c8ce2139f418b9ae3b151527ddfe853a8d39fc079179042",
"nodes.ndjson": "8583293ef89d6ef60815d15060f92ffdcefafdfef135b1171e1512438522f447"
},
"tenant": "demo-tenant",
"version": "1.0.0"
}

File diff suppressed because it is too large.


@@ -1,18 +1,24 @@
# Generation driver (stub) — SAMPLES-GRAPH-24-003
# Interim & final fixture generation — SAMPLES-GRAPH-24-003
> Blocked: overlay schema + mock SBOM bundle list pending. Script outline only.
## Current status
- Interim synthetic fixtures (50k/100k) are generated via `samples/graph/interim/generate.py` (deterministic, hashes in manifest). Use these for BENCH-GRAPH-21-001/002 until overlay schema is finalized.
- Canonical fixture remains blocked on overlay field confirmation from Graph Guild.
## Outline
1) Input bundle(s): scanner surface mock bundle v1 (or real caches when available).
2) Deterministic seeding: `RANDOM_SEED=424242`; time source frozen at `2025-11-22T00:00:00Z`.
3) Steps (once unblocked):
   - Parse SBOM mock bundle, expand to node/edge sets following Graph schema.
   - Generate policy overlay snapshot with placeholder verdicts until final fields confirmed.
## Plan for canonical fixture
1) **Inputs:** scanner surface mock bundle v1 (or real caches when cleared), overlay schema from Graph Guild, tenant `demo-tenant`.
2) **Determinism:** `RANDOM_SEED=424242`, timestamps frozen to `2025-11-22T00:00:00Z`, UTF-8, sorted keys/rows.
3) **Generation steps (once unblocked):**
   - Parse mock SBOM bundle → node/edge sets per Graph schema.
   - Generate policy overlay snapshot using final overlay fields; include verdict, ruleId, severity, provenance hash.
   - Write NDJSON (`nodes.ndjson`, `edges.ndjson`, `overlays/policy.ndjson`) sorted by `id`.
   - Emit `manifest.json` with SHA-256, counts, timestamps.
   - Add `verify.sh` to recompute hashes and validate counts.
   - Emit `manifest.json` with SHA-256, counts, timestamps; DSSE-sign manifest for offline kits.
   - Add `verify.sh` to recompute hashes and validate counts/overlay fields.
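The write-and-manifest steps might be sketched as follows. This is an assumption-laden outline, not the planned generator: `write_sorted_ndjson` and `build_manifest` are hypothetical names, the manifest keys `generatedAt` and `seed` are invented for illustration, and DSSE signing and the overlay field mapping are left out because neither is defined yet.

```python
import hashlib
import json
from pathlib import Path

RANDOM_SEED = 424242                  # value from the determinism plan
FROZEN_TIME = "2025-11-22T00:00:00Z"  # frozen timestamp from the determinism plan


def write_sorted_ndjson(path, records):
    """Write records sorted by `id` as compact, key-sorted UTF-8; return the SHA-256."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8", newline="\n") as handle:
        for record in sorted(records, key=lambda r: r["id"]):
            handle.write(json.dumps(record, sort_keys=True, separators=(",", ":")))
            handle.write("\n")
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(out_dir, files):
    """`files` maps a relative file name to its list of records."""
    out_dir = Path(out_dir)
    hashes, counts = {}, {}
    for name, records in sorted(files.items()):
        hashes[name] = write_sorted_ndjson(out_dir / name, records)
        counts[name] = len(records)
    manifest = {
        "generatedAt": FROZEN_TIME,  # hypothetical key; never the wall clock
        "seed": RANDOM_SEED,         # hypothetical key
        "counts": counts,
        "hashes": hashes,
    }
    text = json.dumps(manifest, indent=2, sort_keys=True) + "\n"
    (out_dir / "manifest.json").write_text(text, encoding="utf-8")
    return manifest
```

Because rows are sorted by `id` and the timestamp is a constant, two runs over the same input in any order should produce byte-identical files and manifests.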
## TODO when unblocked
- Fill overlay field mapping once Graph Guild confirms schema (checkpoint 2025-11-22).
- Confirm allowed mock SBOM source list with SBOM / Graph guilds.
- Implement generator script in Python or C# (deterministic ordering, no network access).
## TODO to unblock
- Receive overlay field mapping + file naming from Graph Guild (was due 2025-11-22).
- Confirm allowed mock SBOM source list and artifact naming (Graph Guild / SBOM Service Guild).
- Provide expected node/edge cardinality breakdown to guide generation.
## Scripts
- Interim: `samples/graph/interim/generate.py`
- Canonical (to write): `samples/graph/scripts/generate-canonical.py` + `verify.sh` (DSSE + hash check), once schema confirmed.