Graph Index Canonical Schema
Ownership: Graph Indexer Guild • Version 2025-11-03 (Sprint 140)
Scope: Canonical node and edge schemas, attribute dictionary, identity rules, and fixture references for the Graph Indexer foundations (GRAPH-INDEX-28-001).
1. Purpose
- Provide a deterministic schema contract for graph indexing pipelines.
- Document the attribute dictionary consumed by SBOM, Advisory, VEX, Policy, and Runtime signal feeds.
- Define the identity rules that guarantee stable node and edge identifiers across rebuilds.
- Point implementers and QA to the seed fixtures used in unit/integration tests.
2. Node taxonomy
| Node kind | Identity tuple (ordered) | Required attributes | Primary sources |
|---|---|---|---|
| `artifact` | `tenant`, `artifact_digest`, `sbom_digest` | `display_name`, `artifact_digest`, `sbom_digest`, `environment`, `labels[]`, `origin_registry`, `supply_chain_stage` | Scanner WebService, SBOM Service |
| `component` | `tenant`, `purl`, `source_type` | `purl`, `version`, `ecosystem`, `scope`, `license_spdx`, `usage` | SBOM Service analyzers |
| `file` | `tenant`, `artifact_digest`, `normalized_path`, `content_sha256` | `normalized_path`, `content_sha256`, `language_hint`, `size_bytes`, `scope` | SBOM layer analyzers |
| `license` | `tenant`, `license_spdx`, `source_digest` | `license_spdx`, `name`, `classification`, `notice_uri` | SBOM Service, Concelier |
| `advisory` | `tenant`, `advisory_source`, `advisory_id`, `content_hash` | `advisory_source`, `advisory_id`, `severity`, `published_at`, `content_hash`, `linkset_digest` | Concelier |
| `vex_statement` | `tenant`, `vex_source`, `statement_id`, `content_hash` | `status`, `statement_id`, `justification`, `issued_at`, `expires_at`, `content_hash` | Excititor |
| `policy_version` | `tenant`, `policy_pack_digest`, `effective_from` | `policy_pack_digest`, `policy_name`, `effective_from`, `expires_at`, `explain_hash` | Policy Engine |
| `runtime_context` | `tenant`, `runtime_fingerprint`, `collector`, `observed_at` | `runtime_fingerprint`, `collector`, `observed_at`, `cluster`, `namespace`, `workload_kind`, `runtime_state` | Signals, Zastava |
3. Edge taxonomy
| Edge kind | Source → Target | Identity tuple (ordered) | Required attributes | Default validity |
|---|---|---|---|---|
| `CONTAINS` | `artifact` → `component` | `tenant`, `artifact_node_id`, `component_node_id`, `sbom_digest` | `detected_by`, `layer_digest`, `scope`, `evidence_digest` | `valid_from = sbom_collected_at`, `valid_to = null` |
| `DEPENDS_ON` | `component` → `component` | `tenant`, `component_node_id`, `dependency_purl`, `sbom_digest` | `dependency_purl`, `dependency_version`, `relationship`, `evidence_digest` | Derived from SBOM dependency graph |
| `DECLARED_IN` | `component` → `file` | `tenant`, `component_node_id`, `file_node_id`, `sbom_digest` | `detected_by`, `scope`, `evidence_digest` | Mirrors SBOM declaration |
| `BUILT_FROM` | `artifact` → `artifact` | `tenant`, `parent_artifact_node_id`, `child_artifact_digest` | `build_type`, `builder_id`, `attestation_digest` | Derived from provenance attestations |
| `AFFECTED_BY` | `component` → `advisory` | `tenant`, `component_node_id`, `advisory_node_id`, `linkset_digest` | `evidence_digest`, `matched_versions`, `cvss`, `confidence` | Concelier overlays |
| `VEX_EXEMPTS` | `component` → `vex_statement` | `tenant`, `component_node_id`, `vex_node_id`, `statement_hash` | `status`, `justification`, `impact_statement`, `evidence_digest` | Excititor overlays |
| `GOVERNS_WITH` | `policy_version` → `component` | `tenant`, `policy_node_id`, `component_node_id`, `finding_explain_hash` | `verdict`, `explain_hash`, `policy_rule_id`, `evaluation_timestamp` | Policy Engine overlays |
| `OBSERVED_RUNTIME` | `runtime_context` → `component` | `tenant`, `runtime_node_id`, `component_node_id`, `runtime_fingerprint` | `process_name`, `entrypoint_kind`, `runtime_evidence_digest`, `confidence` | Signals/Zastava ingestion |
4. Attribute dictionary
| Attribute | Type | Applies to | Description |
|---|---|---|---|
| `tenant` | string | nodes, edges | Tenant identifier (enforced on storage and query). |
| `kind` | string | nodes, edges | One of the values listed in the taxonomy tables. |
| `canonical_key` | object | nodes | Ordered tuple persisted as a JSON object matching the identity tuple components. |
| `id` | string | nodes, edges | Deterministic identifier (`gn:` or `ge:` prefix + Base32-encoded SHA-256). |
| `hash` | string | nodes, edges | SHA-256 of the canonical JSON representation (normalized by sorted keys). |
| `attributes` | object | nodes, edges | Domain-specific attributes (all dictionary keys kebab-case). |
| `provenance` | object | nodes, edges | Includes `source`, `collected_at`, `sbom_digest`, `attestation_digest`, `event_offset`. |
| `valid_from` | string (ISO-8601) | nodes, edges | Inclusive timestamp describing when the record became effective. |
| `valid_to` | string (ISO-8601 or null) | nodes, edges | Exclusive timestamp; `null` means open-ended. |
| `scope` | string | nodes, edges | Scope label (e.g., `runtime`, `build`, `dev-dependency`). |
| `labels` | array[string] | nodes | Free-form but deterministic ordering (ASCII sort). |
| `confidence` | number | edges | 0-1 numeric confidence score for overlay-derived edges. |
| `evidence_digest` | string | edges | SHA-256 digest referencing the immutable evidence payload. |
| `linkset_digest` | string | nodes, edges | SHA-256 digest to Concelier linkset documents. |
| `explain_hash` | string | nodes, edges | Hash of Policy Engine explain trace payload. |
| `runtime_state` | string | `runtime_context` nodes | Aggregated runtime state (e.g., `Running`, `Terminated`). |
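
As an illustration of how `canonical_key` relates to the identity tuples in the node taxonomy, the following Python sketch builds the ordered object for a node kind. The tuple orders are copied from the node taxonomy table; the function name `canonical_key`, the lookup name `NODE_IDENTITY_TUPLES`, and the sample values are hypothetical and are not taken from the Graph Indexer code base.

```python
# Identity tuple order per node kind, copied from the node taxonomy table.
NODE_IDENTITY_TUPLES = {
    "artifact": ("tenant", "artifact_digest", "sbom_digest"),
    "component": ("tenant", "purl", "source_type"),
    "file": ("tenant", "artifact_digest", "normalized_path", "content_sha256"),
    "license": ("tenant", "license_spdx", "source_digest"),
    "advisory": ("tenant", "advisory_source", "advisory_id", "content_hash"),
    "vex_statement": ("tenant", "vex_source", "statement_id", "content_hash"),
    "policy_version": ("tenant", "policy_pack_digest", "effective_from"),
    "runtime_context": ("tenant", "runtime_fingerprint", "collector", "observed_at"),
}


def canonical_key(kind, values):
    """Return the identity components for `kind` in tuple order.

    Raises ValueError when a required identity component is absent.
    Keys in `values` that are not part of the identity tuple are ignored.
    """
    order = NODE_IDENTITY_TUPLES[kind]
    missing = [name for name in order if name not in values]
    if missing:
        raise ValueError(f"{kind} node is missing identity components: {missing}")
    # Python dicts preserve insertion order, so the result follows the tuple order.
    return {name: values[name] for name in order}


# Hypothetical sample values, for illustration only.
sample = canonical_key(
    "component",
    {"purl": "pkg:npm/left-pad@1.3.0", "tenant": "acme", "source_type": "inventory"},
)
```

Keeping the tuple order in one lookup means the same ordering can feed both the persisted `canonical_key` object and the identifier derivation.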
5. Identity rules
- Node IDs (`gn:` prefix). `id = "gn:" + tenant + ":" + kind + ":" + base32(sha256(identity_tuple))`. `identity_tuple` concatenates tuple components with `|` (no escaping) and lower-cases both keys and values unless the component is a hash or digest.
- Edge IDs (`ge:` prefix). `id = "ge:" + tenant + ":" + kind + ":" + base32(sha256(identity_tuple))`. Edge tuples must include the resolved node IDs rather than only the canonical keys to ensure immutability under re-key events.
- Hashes. `hash` is computed by serializing the canonical document with:
  - UTF-8 JSON
  - Object keys sorted lexicographically
  - Arrays sorted where semantics allow (e.g., `labels`, `matched_versions`)
  - Timestamps normalized to UTC ISO-8601 (`YYYY-MM-DDTHH:MM:SSZ`)
- Deterministic provenance. `provenance.source` is a dotted string (`scanner.sbom.v1`, `concelier.linkset.v1`) and `provenance.event_offset` is a monotonic integer for replay.
6. Validity window semantics
- `valid_from` equals the upstream event timestamp at ingestion time (SBOM collected timestamp, advisory published timestamp, policy evaluation timestamp, runtime observation timestamp).
- `valid_to` stays `null` until a newer version supersedes the record. Superseding records carry a `supersedes` reference in `attributes`.
- Snapshots freeze the set of nodes/edges with `valid_from <= snapshot_at < coalesce(valid_to, +∞)`.
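
The snapshot predicate can be sketched as a filter over records. This is an illustrative Python sketch that assumes each record is a dictionary carrying `valid_from` and `valid_to` as `YYYY-MM-DDTHH:MM:SSZ` strings (or `None` for an open-ended `valid_to`); the helper names `parse_ts`, `in_snapshot`, and `freeze_snapshot` are hypothetical.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse 'YYYY-MM-DDTHH:MM:SSZ' into a timezone-aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def in_snapshot(record, snapshot_at):
    """True when valid_from <= snapshot_at < coalesce(valid_to, +infinity)."""
    if snapshot_at < parse_ts(record["valid_from"]):
        return False
    valid_to = record.get("valid_to")
    # A null valid_to means the record is still open-ended.
    return valid_to is None or snapshot_at < parse_ts(valid_to)


def freeze_snapshot(records, snapshot_at):
    """Return the records that are effective at `snapshot_at`."""
    return [record for record in records if in_snapshot(record, snapshot_at)]
```

Because `valid_from` is inclusive and `valid_to` is exclusive, a record superseded exactly at `snapshot_at` is left out while its successor, whose `valid_from` equals that instant, is included.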
7. Fixtures & verification
- Seed fixtures live under `tests/Graph/StellaOps.Graph.Indexer.Tests/Fixtures/v1/`.
- Fixture files:
  - `nodes.json`: canonical node samples (per node kind).
  - `edges.json`: canonical edge samples including overlay references.
  - `schema-matrix.json`: lists attribute coverage per node/edge kind for regression tests.
- Unit tests assert:
  - Identifier determinism (`GraphIdentityTests.NodeIds_are_stable`).
  - Hash determinism under property ordering variations.
  - Attribute coverage against `schema-matrix.json`.
- Fixtures follow the attribute dictionary above; new attributes require dictionary updates and fixture refresh.
8. Change control
- Increment schema version in fixture folder (`v1`, `v2`, …) when making breaking changes.
- Update this document and the JSON fixtures together; do not ship mismatched versions.
- Notify SBOM Service, Concelier, Excititor, Policy, Signals, and Zastava owners before promoting changes to DOING/DONE state.
9. References
- `docs/modules/graph/architecture.md`: high-level architecture.
- `docs/modules/platform/architecture-overview.md`: platform context.
- `src/Graph/StellaOps.Graph.Indexer/TASKS.md`: task tracking.
- `seed-data/`: additional sample payloads for offline kit packaging (future work).