# Graph Index Canonical Schema

> Ownership: Graph Indexer Guild • Version 2025-11-03 (Sprint 140)\
> Scope: Canonical node and edge schemas, attribute dictionary, identity rules, and fixture references for the Graph Indexer foundations (GRAPH-INDEX-28-001).

## 1. Purpose

- Provide a deterministic schema contract for graph indexing pipelines.
- Document the attribute dictionary consumed by SBOM, Advisory, VEX, Policy, and Runtime signal feeds.
- Define the identity rules that guarantee stable node and edge identifiers across rebuilds.
- Point implementers and QA to the seed fixtures used in unit/integration tests.

## 2. Node taxonomy

| Node kind | Identity tuple (ordered) | Required attributes | Primary sources |
|-----------|--------------------------|---------------------|-----------------|
| `artifact` | `tenant`, `artifact_digest`, `sbom_digest` | `display_name`, `artifact_digest`, `sbom_digest`, `environment`, `labels[]`, `origin_registry`, `supply_chain_stage` | Scanner WebService, SBOM Service |
| `component` | `tenant`, `purl`, `source_type` | `purl`, `version`, `ecosystem`, `scope`, `license_spdx`, `usage` | SBOM Service analyzers |
| `file` | `tenant`, `artifact_digest`, `normalized_path`, `content_sha256` | `normalized_path`, `content_sha256`, `language_hint`, `size_bytes`, `scope` | SBOM layer analyzers |
| `license` | `tenant`, `license_spdx`, `source_digest` | `license_spdx`, `name`, `classification`, `notice_uri` | SBOM Service, Concelier |
| `advisory` | `tenant`, `advisory_source`, `advisory_id`, `content_hash` | `advisory_source`, `advisory_id`, `severity`, `published_at`, `content_hash`, `linkset_digest` | Concelier |
| `vex_statement` | `tenant`, `vex_source`, `statement_id`, `content_hash` | `status`, `statement_id`, `justification`, `issued_at`, `expires_at`, `content_hash` | Excititor |
| `policy_version` | `tenant`, `policy_pack_digest`, `effective_from` | `policy_pack_digest`, `policy_name`, `effective_from`, `expires_at`, `explain_hash` | Policy Engine |
| `runtime_context` | `tenant`, `runtime_fingerprint`, `collector`, `observed_at` | `runtime_fingerprint`, `collector`, `observed_at`, `cluster`, `namespace`, `workload_kind`, `runtime_state` | Signals, Zastava |

## 3. Edge taxonomy

| Edge kind | Source → Target | Identity tuple (ordered) | Required attributes | Default validity |
|-----------|-----------------|--------------------------|---------------------|------------------|
| `CONTAINS` | `artifact` → `component` | `tenant`, `artifact_node_id`, `component_node_id`, `sbom_digest` | `detected_by`, `layer_digest`, `scope`, `evidence_digest` | `valid_from = sbom_collected_at`, `valid_to = null` |
| `DEPENDS_ON` | `component` → `component` | `tenant`, `component_node_id`, `dependency_purl`, `sbom_digest` | `dependency_purl`, `dependency_version`, `relationship`, `evidence_digest` | Derived from SBOM dependency graph |
| `DECLARED_IN` | `component` → `file` | `tenant`, `component_node_id`, `file_node_id`, `sbom_digest` | `detected_by`, `scope`, `evidence_digest` | Mirrors SBOM declaration |
| `BUILT_FROM` | `artifact` → `artifact` | `tenant`, `parent_artifact_node_id`, `child_artifact_digest` | `build_type`, `builder_id`, `attestation_digest` | Derived from provenance attestations |
| `AFFECTED_BY` | `component` → `advisory` | `tenant`, `component_node_id`, `advisory_node_id`, `linkset_digest` | `evidence_digest`, `matched_versions`, `cvss`, `confidence` | Concelier overlays |
| `VEX_EXEMPTS` | `component` → `vex_statement` | `tenant`, `component_node_id`, `vex_node_id`, `statement_hash` | `status`, `justification`, `impact_statement`, `evidence_digest` | Excititor overlays |
| `GOVERNS_WITH` | `policy_version` → `component` | `tenant`, `policy_node_id`, `component_node_id`, `finding_explain_hash` | `verdict`, `explain_hash`, `policy_rule_id`, `evaluation_timestamp` | Policy Engine overlays |
| `OBSERVED_RUNTIME` | `runtime_context` → `component` | `tenant`, `runtime_node_id`, `component_node_id`, `runtime_fingerprint` | `process_name`, `entrypoint_kind`, `runtime_evidence_digest`, `confidence` | Signals/Zastava ingestion |

## 4. Attribute dictionary

| Attribute | Type | Applies to | Description |
|-----------|------|------------|-------------|
| `tenant` | `string` | nodes, edges | Tenant identifier (enforced on storage and query). |
| `kind` | `string` | nodes, edges | One of the values listed in the taxonomy tables. |
| `canonical_key` | `object` | nodes | Ordered tuple persisted as a JSON object matching the identity tuple components. |
| `id` | `string` | nodes, edges | Deterministic identifier (`gn:` or `ge:` prefix + Base32-encoded SHA-256). |
| `hash` | `string` | nodes, edges | SHA-256 of the canonical JSON representation (normalized by sorted keys). |
| `attributes` | `object` | nodes, edges | Domain-specific attributes (all dictionary keys kebab-case). |
| `provenance` | `object` | nodes, edges | Includes `source`, `collected_at`, `sbom_digest`, `attestation_digest`, `event_offset`. |
| `valid_from` | `string (ISO-8601)` | nodes, edges | Inclusive timestamp describing when the record became effective. |
| `valid_to` | `string (ISO-8601 or null)` | nodes, edges | Exclusive timestamp; `null` means open-ended. |
| `scope` | `string` | nodes, edges | Scope label (e.g., `runtime`, `build`, `dev-dependency`). |
| `labels` | `array[string]` | nodes | Free-form but deterministic ordering (ASCII sort). |
| `confidence` | `number` | edges | 0-1 numeric confidence score for overlay-derived edges. |
| `evidence_digest` | `string` | edges | SHA-256 digest referencing the immutable evidence payload. |
| `linkset_digest` | `string` | nodes, edges | SHA-256 digest to Concelier linkset documents. |
| `explain_hash` | `string` | nodes, edges | Hash of Policy Engine explain trace payload. |
| `runtime_state` | `string` | `runtime_context` nodes | Aggregated runtime state (e.g., `Running`, `Terminated`). |

## 5. Identity rules

1. **Node IDs (`gn:` prefix).** `id = "gn:" + tenant + ":" + kind + ":" + base32(sha256(identity_tuple))`\
   `identity_tuple` concatenates tuple components with `|` (no escaping) and lower-cases both keys and values unless the component is a hash or digest.
2. **Edge IDs (`ge:` prefix).** `id = "ge:" + tenant + ":" + kind + ":" + base32(sha256(identity_tuple))`\
   Edge tuples must include the resolved node IDs rather than only the canonical keys to ensure immutability under re-key events.
3. **Hashes.** `hash` is computed by serializing the canonical document with:
   - UTF-8 JSON
   - Object keys sorted lexicographically
   - Arrays sorted where semantics allow (e.g., `labels`, `matched_versions`)
   - Timestamps normalized to UTC ISO-8601 (`YYYY-MM-DDTHH:MM:SSZ`)
4. **Deterministic provenance.** `provenance.source` is a dotted string (`scanner.sbom.v1`, `concelier.linkset.v1`) and `provenance.event_offset` is a monotonic integer for replay.

## 6. Validity window semantics

- `valid_from` equals the upstream event timestamp at ingestion time (SBOM collected timestamp, advisory published timestamp, policy evaluation timestamp, runtime observation timestamp).
- `valid_to` stays `null` until a newer version supersedes the record. Superseding records carry a `supersedes` reference in `attributes`.
- Snapshots freeze the set of nodes/edges with `valid_from <= snapshot_at < coalesce(valid_to, +∞)`.

## 7. Fixtures & verification

- Seed fixtures live under `tests/Graph/StellaOps.Graph.Indexer.Tests/Fixtures/v1/`.
- Fixture files:
  - `nodes.json`: canonical node samples (per node kind).
  - `edges.json`: canonical edge samples including overlay references.
  - `schema-matrix.json`: lists attribute coverage per node/edge kind for regression tests.
- Unit tests assert:
  - Identifier determinism (`GraphIdentityTests.NodeIds_are_stable`).
  - Hash determinism under property ordering variations.
  - Attribute coverage against `schema-matrix.json`.
- Fixtures follow the attribute dictionary above; new attributes require dictionary updates and fixture refresh.

## 8. Change control

- Increment schema version in fixture folder (`v1`, `v2`, …) when making breaking changes.
- Update this document and the JSON fixtures together; do not ship mismatched versions.
- Notify SBOM Service, Concelier, Excititor, Policy, Signals, and Zastava owners before promoting changes to DOING/DONE state.

## 9. References

- `docs/modules/graph/architecture.md`: high-level architecture.
- `docs/modules/platform/architecture-overview.md`: platform context.
- `src/Graph/StellaOps.Graph.Indexer/TASKS.md`: task tracking.
- `seed-data/`: additional sample payloads for offline kit packaging (future work).
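
As an illustration of the identity rules in section 5, the following Python sketch derives a node ID and a document hash. It is not the Graph Indexer implementation, and it rests on several assumptions that the rules above leave open: only tuple values (not keys) are joined with `|`; a component counts as "a hash or digest" when its key ends in `_digest`, `_hash`, or `_sha256`; Base32 is the RFC 4648 alphabet with padding stripped; and the canonical JSON uses compact separators. The function names and the sample tenant and purl are hypothetical.

```python
import base64
import hashlib
import json

# Assumption: keys ending with these suffixes mark hash/digest components,
# whose values keep their original case.
CASE_PRESERVED_SUFFIXES = ("_digest", "_hash", "_sha256")


def identity_string(identity_tuple):
    """Join ordered (key, value) pairs with '|', lower-casing non-digest values.

    Assumption: only values are concatenated; keys are used solely to decide
    whether a value is a hash or digest.
    """
    parts = []
    for key, value in identity_tuple:
        text = str(value)
        if not key.lower().endswith(CASE_PRESERVED_SUFFIXES):
            text = text.lower()
        parts.append(text)
    return "|".join(parts)


def node_id(tenant, kind, identity_tuple):
    """Build "gn:" + tenant + ":" + kind + ":" + base32(sha256(identity_tuple))."""
    digest = hashlib.sha256(identity_string(identity_tuple).encode("utf-8")).digest()
    # Assumption: RFC 4648 Base32 with '=' padding removed.
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=")
    return f"gn:{tenant}:{kind}:{encoded}"


def canonical_hash(document):
    """SHA-256 over UTF-8 JSON with lexicographically sorted object keys.

    Arrays and timestamps are hashed as given; the caller is expected to sort
    arrays such as `labels` and normalize timestamps to UTC beforehand.
    """
    payload = json.dumps(
        document, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical component node: identity tuple is tenant, purl, source_type.
component_tuple = [
    ("tenant", "tenant-a"),
    ("purl", "pkg:npm/Left-Pad@1.3.0"),
    ("source_type", "Inventory"),
]
example_id = node_id("tenant-a", "component", component_tuple)
```

Because non-digest values are lower-cased before hashing, the same component reported with different casing yields the same `gn:` identifier under these assumptions, while a change in the case of a digest component produces a different one.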
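
The snapshot predicate from section 6, `valid_from <= snapshot_at < coalesce(valid_to, +∞)`, can be sketched in Python as follows. This is a minimal illustration that assumes records are dictionaries carrying `valid_from` and `valid_to` in the normalized UTC form from section 5; the helper names and sample records are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(timestamp):
    """Parse a normalized UTC timestamp such as 2025-11-03T00:00:00Z."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def in_snapshot(record, snapshot_at):
    """True when valid_from <= snapshot_at < valid_to (null valid_to is open-ended)."""
    if snapshot_at < parse_utc(record["valid_from"]):
        return False
    valid_to = record.get("valid_to")
    # valid_to is exclusive; None stands in for +infinity.
    return valid_to is None or snapshot_at < parse_utc(valid_to)


def freeze_snapshot(records, snapshot_at):
    """Return the records visible at snapshot_at, preserving input order."""
    return [record for record in records if in_snapshot(record, snapshot_at)]


# Hypothetical records: "old" was superseded by "new" on 2025-11-02.
records = [
    {"id": "old", "valid_from": "2025-11-01T00:00:00Z", "valid_to": "2025-11-02T00:00:00Z"},
    {"id": "new", "valid_from": "2025-11-02T00:00:00Z", "valid_to": None},
]
snapshot = freeze_snapshot(records, parse_utc("2025-11-02T00:00:00Z"))
visible_ids = [record["id"] for record in snapshot]
```

Since `valid_from` is inclusive and `valid_to` is exclusive, a snapshot taken exactly at the supersession instant sees only the newer record, so the two versions never overlap.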