Here’s a quick win for making your vuln paths auditor-friendly without retraining any models: **add a plain-language `reason` to every graph edge** (why this edge exists). Think “introduced via dynamic import” or “symbol relocation via `ld`”, not jargon soup.

![A simple vulnerability path showing edges labeled with reasons like "imported at runtime" and "linked via ld".](https://images.unsplash.com/photo-1515879218367-8466d910aaa4?ixlib=rb-4.0.3\&q=80\&fm=jpg\&fit=crop\&w=1600\&h=900)

# Why this helps

* **Explains reachability** at a glance (auditors & devs can follow the story).
* **Reduces false-positive fights** (every hop justifies itself).
* **Stable across languages** (no model changes, just metadata).

# Minimal schema change

Add three fields (`reason`, `evidence`, `provenance`) under `via` on every edge in your call/dep graph (SBOM→Reachability→Fix plan):

```json
{
  "from": "pkg:pypi/requests@2.32.3#requests.sessions.Session.request",
  "to": "pkg:pypi/urllib3@2.2.3#urllib3.connectionpool.HTTPConnectionPool.urlopen",
  "via": {
    "reason": "declared_dependency",
    "evidence": [
      "import urllib3 in requests/adapters.py:12",
      "pip freeze: urllib3==2.2.3"
    ],
    "provenance": {
      "detector": "StellaOps.Scanner.WebService@1.4.2",
      "rule_id": "PY-IMPORT-001",
      "confidence": "high"
    }
  }
}
```

### Standard reason glossary (use as enum)

* `declared_dependency` (manifest lock/SBOM edge)
* `static_call` (direct call site with symbol ref)
* `dynamic_import` (e.g., `__import__`, `importlib`, `require(...)`)
* `reflection_call` (C# `MethodInfo.Invoke`, Java reflection)
* `plugin_discovery` (entry points, ServiceLoader, MEF)
* `symbol_relocation` (ELF/PE/Mach-O relocation binds)
* `plt_got_resolution` (ELF PLT/GOT jump to symbol)
* `ld_preload_injection` (runtime injected .so/.dll)
* `env_config_path` (path read from env/config enables load)
* `taint_propagation` (user input reaches sink)
* `vendor_patch_alias` (function moved/aliased across versions)

# Emission rules (keep it deterministic)

* **One reason per edge**,
  short, lowercase snake_case from the glossary.
* **Up to 3 evidence strings** (file:line or binary section + symbol).
* **Confidence**: `high|medium|low` with a single, stable rubric:

  * high = exact symbol/call site or relocation
  * medium = heuristic import/loader path
  * low = inferred from naming or optional plugin

# UI/Report snippet

Render paths like:

```
app → requests → urllib3 → OpenSSL EVP_PKEY_new_raw_private_key
  • declared_dependency (poetry.lock)
  • static_call (requests.adapters:345)
  • symbol_relocation (ELF .rela.plt: _EVP_PKEY_new_raw_private_key)
```

# C# drop-in (for your .NET 10 code)

Edge builder with reason/evidence:

```csharp
using System.Collections.Generic;

public sealed record EdgeId(string From, string To);

public sealed record EdgeEvidence(
    string Reason,                    // enum string from glossary
    IReadOnlyList<string> Evidence,   // file:line, symbol, section
    string Confidence,                // high|medium|low
    string Detector,                  // component@version
    string RuleId                     // stable rule key
);

public sealed record GraphEdge(EdgeId Id, EdgeEvidence Via);

public static class EdgeFactory
{
    // Edge backed by a manifest/lockfile entry.
    public static GraphEdge DeclaredDependency(string from, string to, string manifestPath)
        => new(new EdgeId(from, to),
               new EdgeEvidence(
                   Reason: "declared_dependency",
                   Evidence: new[] { $"manifest:{manifestPath}" },
                   Confidence: "high",
                   Detector: "StellaOps.Scanner.WebService@1.0.0",
                   RuleId: "DEP-LOCK-001"));

    // Edge backed by a relocation entry in a binary.
    public static GraphEdge SymbolRelocation(string from, string to, string objPath, string section, string symbol)
        => new(new EdgeId(from, to),
               new EdgeEvidence(
                   Reason: "symbol_relocation",
                   Evidence: new[] { $"{objPath}::{section}:{symbol}" },
                   Confidence: "high",
                   Detector: "StellaOps.Scanner.WebService@1.0.0",
                   RuleId: "BIN-RELOC-101"));
}
```

# Integration checklist (fast path)

* Emit `via.reason/evidence/provenance` for **all** edges (SBOM, source, binary).
* Validate `reason` against glossary; reject free-text.
* Add a “**Why this edge exists**” column in your path tables.
* In JSON/CSV exports, keep columns: `from,to,reason,confidence,evidence0..2,rule_id`.
* In the console, collapse evidence by default; expand on click.

If you want, I’ll plug this into your Stella Ops graph contracts (Concelier/Cartographer) and produce the enum + validators and a tiny renderer for your docs.

Cool, let’s turn this into a concrete, dev-friendly implementation plan you can actually hand to teams.

I’ll structure it by phases and by component (schema, producers, APIs, UI, testing, rollout) so you can slice into tickets easily.

---

## 0. Recap of what we’re building

**Goal:** Every edge in your vuln path graph (SBOM → Reachability → Fix plan) carries **machine-readable, auditor-friendly metadata**:

```jsonc
{
  "from": "pkg:pypi/requests@2.32.3#requests.sessions.Session.request",
  "to": "pkg:pypi/urllib3@2.2.3#urllib3.connectionpool.HTTPConnectionPool.urlopen",
  "via": {
    "reason": "declared_dependency",      // from a controlled enum
    "evidence": [
      "manifest:requirements.txt:3",      // up to 3 short evidence strings
      "pip freeze: urllib3==2.2.3"
    ],
    "provenance": {
      "detector": "StellaOps.Scanner.WebService@1.4.2",
      "rule_id": "PY-IMPORT-001",
      "confidence": "high"
    }
  }
}
```

Standard **reason glossary** (enum):

* `declared_dependency`
* `static_call`
* `dynamic_import`
* `reflection_call`
* `plugin_discovery`
* `symbol_relocation`
* `plt_got_resolution`
* `ld_preload_injection`
* `env_config_path`
* `taint_propagation`
* `vendor_patch_alias`
* `unknown` (fallback only when you truly can’t do better)

---

## 1. Design & contracts (shared work for backend & frontend)

### 1.1 Define the canonical edge metadata types

**Owner:** Platform / shared lib team

**Tasks:**

1. 
   In your shared C# library (used by scanners + API), define:

   ```csharp
   using System.Collections.Generic;

   public enum EdgeReason
   {
       Unknown = 0,
       DeclaredDependency,
       StaticCall,
       DynamicImport,
       ReflectionCall,
       PluginDiscovery,
       SymbolRelocation,
       PltGotResolution,
       LdPreloadInjection,
       EnvConfigPath,
       TaintPropagation,
       VendorPatchAlias
   }

   public enum EdgeConfidence
   {
       Low = 0,
       Medium,
       High
   }

   public sealed record EdgeProvenance(
       string Detector,          // e.g., "StellaOps.Scanner.WebService@1.4.2"
       string RuleId,            // e.g., "PY-IMPORT-001"
       EdgeConfidence Confidence
   );

   public sealed record EdgeVia(
       EdgeReason Reason,
       IReadOnlyList<string> Evidence,
       EdgeProvenance Provenance
   );

   public sealed record EdgeId(string From, string To);

   public sealed record GraphEdge(
       EdgeId Id,
       EdgeVia Via
   );
   ```

2. Enforce **max 3 evidence strings** via a small helper to avoid accidental spam:

   ```csharp
   using System.Collections.Generic;
   using System.Linq;

   public static class EdgeViaFactory
   {
       private const int MaxEvidence = 3;

       public static EdgeVia Create(
           EdgeReason reason,
           IEnumerable<string> evidence,
           string detector,
           string ruleId,
           EdgeConfidence confidence
       )
       {
           // Drop blank entries, then keep only the first MaxEvidence strings.
           var ev = evidence
               .Where(s => !string.IsNullOrWhiteSpace(s))
               .Take(MaxEvidence)
               .ToArray();

           return new EdgeVia(
               Reason: reason,
               Evidence: ev,
               Provenance: new EdgeProvenance(detector, ruleId, confidence)
           );
       }
   }
   ```

**Acceptance criteria:**

* [ ] `EdgeReason` enum defined and shared in a reusable package.
* [ ] `EdgeVia` and `EdgeProvenance` types exist and are serializable to JSON.
* [ ] Evidence is capped to 3 entries and cannot be null (empty list allowed).

---

### 1.2 API / JSON contract

**Owner:** API team

**Tasks:**

1. 
   Extend your existing graph edge DTO to include `via`:

   ```csharp
   using System;

   public sealed record GraphEdgeDto
   {
       public string From { get; init; } = default!;
       public string To { get; init; } = default!;
       public EdgeViaDto Via { get; init; } = default!;
   }

   public sealed record EdgeViaDto
   {
       public string Reason { get; init; } = default!;       // enum as string
       public string[] Evidence { get; init; } = Array.Empty<string>();
       public EdgeProvenanceDto Provenance { get; init; } = default!;
   }

   public sealed record EdgeProvenanceDto
   {
       public string Detector { get; init; } = default!;
       public string RuleId { get; init; } = default!;
       public string Confidence { get; init; } = default!;   // "high|medium|low"
   }
   ```

2. Ensure JSON is **additive** (backward compatible):

   * `via` is **non-nullable** in responses from the new API version.
   * If you must keep a legacy endpoint, add **v2** endpoints that guarantee `via`.

3. Update OpenAPI spec:

   * Document `via.reason` as enum string, including allowed values.
   * Document `via.provenance.detector`, `rule_id`, `confidence`.

**Acceptance criteria:**

* [ ] OpenAPI / Swagger shows `via.reason` as a string enum + description.
* [ ] New clients can deserialize edges with `via` without custom hacks.
* [ ] Old clients remain unaffected (either keep old endpoint or allow them to ignore `via`).

---

## 2. Producers: add reasons & evidence where edges are created

You likely have 3 main edge producers:

* SBOM / manifest / lockfile analyzers
* Source analyzers (call graph, taint analysis)
* Binary analyzers (ELF/PE/Mach-O, containers)

Treat each as a mini-project with identical patterns.

---

### 2.1 SBOM / manifest edges

**Owner:** SBOM / dep graph team

**Tasks:**

1. Identify all code paths that create “declared dependency” edges:

   * Manifest → Package
   * Root module → Imported package (if you store these explicitly)

2. 
   Replace plain edge construction with factory calls:

   ```csharp
   using System.Collections.Generic;

   public static class EdgeFactory
   {
       private const string DetectorName = "StellaOps.Scanner.Sbom@1.0.0";

       public static GraphEdge DeclaredDependency(
           string from,
           string to,
           string manifestPath,
           string? dependencySpecLine
       )
       {
           // The manifest path is always recorded; the spec line is optional.
           var evidence = new List<string>
           {
               $"manifest:{manifestPath}"
           };

           if (!string.IsNullOrWhiteSpace(dependencySpecLine))
               evidence.Add($"spec:{dependencySpecLine}");

           var via = EdgeViaFactory.Create(
               EdgeReason.DeclaredDependency,
               evidence,
               DetectorName,
               "DEP-LOCK-001",
               EdgeConfidence.High
           );

           return new GraphEdge(new EdgeId(from, to), via);
       }
   }
   ```

3. Make sure each SBOM/manifest edge sets:

   * `reason = declared_dependency`
   * `confidence = high`
   * Evidence includes at least `manifest:` and, if possible, line or spec snippet.

**Acceptance criteria:**

* [ ] Any SBOM-generated edge returns with `via.reason == declared_dependency`.
* [ ] Evidence contains manifest path for ≥ 99% of SBOM edges.
* [ ] Unit tests cover at least: normal manifest, multiple manifests, malformed manifest.

---

### 2.2 Source code call graph edges

**Owner:** Static analysis / call graph team

**Tasks:**

1. Map current edge types → reasons:

   * Direct function/method calls → `static_call`
   * Reflection (Java/C#) → `reflection_call`
   * Dynamic imports (`__import__`, `importlib`, `require(...)`) → `dynamic_import`
   * Plugin systems (entry points, ServiceLoader, MEF) → `plugin_discovery`
   * Taint / dataflow edges (user input → sink) → `taint_propagation`

2. 
   Implement helper factories:

   ```csharp
   public static class SourceEdgeFactory
   {
       private const string DetectorName = "StellaOps.Scanner.Source@1.0.0";

       public static GraphEdge StaticCall(
           string fromSymbol,
           string toSymbol,
           string filePath,
           int lineNumber
       )
       {
           var evidence = new[]
           {
               $"callsite:{filePath}:{lineNumber}"
           };

           var via = EdgeViaFactory.Create(
               EdgeReason.StaticCall,
               evidence,
               DetectorName,
               "SRC-CALL-001",
               EdgeConfidence.High
           );

           return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
       }

       public static GraphEdge DynamicImport(
           string fromSymbol,
           string toSymbol,
           string filePath,
           int lineNumber
       )
       {
           var via = EdgeViaFactory.Create(
               EdgeReason.DynamicImport,
               new[] { $"importsite:{filePath}:{lineNumber}" },
               DetectorName,
               "SRC-DYNIMPORT-001",
               EdgeConfidence.Medium
           );

           return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
       }

       // Similar for ReflectionCall, PluginDiscovery, TaintPropagation...
   }
   ```

3. Replace all direct `new GraphEdge(...)` calls in source analyzers with these factories.

**Acceptance criteria:**

* [ ] Direct call edges produce `reason = static_call` with file:line evidence.
* [ ] Reflection/dynamic import edges use correct reasons and mark `confidence = medium` (or high where you’re certain).
* [ ] Unit tests check that for a known source file, the resulting edges contain expected `reason`, `evidence`, and `rule_id`.

---

### 2.3 Binary / container analyzers

**Owner:** Binary analysis / SCA team

**Tasks:**

1. Map binary features to reasons:

   * Symbol relocations + PLT/GOT edges → `symbol_relocation` or `plt_got_resolution`
   * LD_PRELOAD or injection edges → `ld_preload_injection`

2. 
   Implement factory:

   ```csharp
   public static class BinaryEdgeFactory
   {
       private const string DetectorName = "StellaOps.Scanner.Binary@1.0.0";

       public static GraphEdge SymbolRelocation(
           string fromSymbol,
           string toSymbol,
           string binaryPath,
           string section,
           string relocationName
       )
       {
           var evidence = new[]
           {
               $"{binaryPath}::{section}:{relocationName}"
           };

           var via = EdgeViaFactory.Create(
               EdgeReason.SymbolRelocation,
               evidence,
               DetectorName,
               "BIN-RELOC-101",
               EdgeConfidence.High
           );

           return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
       }
   }
   ```

3. Wire up all binary edge creation to use this.

**Acceptance criteria:**

* [ ] For a test binary with a known relocation, edges include `reason = symbol_relocation` and section/symbol in evidence.
* [ ] No binary edge is created without `via`.

---

## 3. Storage & migrations

This depends on your backing store, but the pattern is similar.

### 3.1 Relational (SQL) example

**Owner:** Data / infra team

**Tasks:**

1. Add columns:

   ```sql
   ALTER TABLE graph_edges
       ADD COLUMN via_reason      VARCHAR(64)  NOT NULL DEFAULT 'unknown',
       ADD COLUMN via_evidence    JSONB        NOT NULL DEFAULT '[]'::jsonb,
       ADD COLUMN via_detector    VARCHAR(255) NOT NULL DEFAULT 'unknown',
       ADD COLUMN via_rule_id     VARCHAR(128) NOT NULL DEFAULT 'unknown',
       ADD COLUMN via_confidence  VARCHAR(16)  NOT NULL DEFAULT 'low';
   ```

2. Update ORM model:

   ```csharp
   using System;

   public class EdgeEntity
   {
       public string From { get; set; } = default!;
       public string To { get; set; } = default!;

       public string ViaReason { get; set; } = "unknown";
       public string[] ViaEvidence { get; set; } = Array.Empty<string>();
       public string ViaDetector { get; set; } = "unknown";
       public string ViaRuleId { get; set; } = "unknown";
       public string ViaConfidence { get; set; } = "low";
   }
   ```

3. Add mapping to domain `GraphEdge`:

   ```csharp
   using System;

   // Extension methods must live in a static class; the class name is illustrative.
   public static class EdgeEntityMappings
   {
       public static GraphEdge ToDomain(this EdgeEntity e)
       {
           // Assumes via_reason holds the snake_case wire form ("static_call").
           // Underscores are dropped so a case-insensitive parse matches the
           // PascalCase enum names; anything unrecognized falls back to Unknown.
           var via = new EdgeVia(
               Reason: Enum.TryParse<EdgeReason>(e.ViaReason.Replace("_", ""), true, out var r)
                   ? r
                   : EdgeReason.Unknown,
               Evidence: e.ViaEvidence,
               Provenance: new EdgeProvenance(
                   Detector: e.ViaDetector,
                   RuleId: e.ViaRuleId,
                   Confidence: Enum.TryParse<EdgeConfidence>(e.ViaConfidence, true, out var c)
                       ? c
                       : EdgeConfidence.Low
               )
           );

           return new GraphEdge(new EdgeId(e.From, e.To), via);
       }
   }
   ```

4. **Backfill existing data** (optional but recommended):

   * For edges with a known “type” column, map to best-fit `reason`.
   * If you can’t infer: set `reason = unknown`, `confidence = low`, `detector = "backfill@"`.

**Acceptance criteria:**

* [ ] DB migration runs cleanly in staging and prod.
* [ ] No existing reader breaks: default values keep queries functioning.
* [ ] Edge round-trip (domain → DB → API JSON) retains `via` fields correctly.

---

## 4. API & service layer

**Owner:** API / service team

**Tasks:**

1. Wire domain model → DTOs:

   ```csharp
   using System.Linq;

   // Extension methods must live in a static class; the class name is illustrative.
   public static class GraphEdgeMappings
   {
       public static GraphEdgeDto ToDto(this GraphEdge edge)
       {
           return new GraphEdgeDto
           {
               From = edge.Id.From,
               To = edge.Id.To,
               Via = new EdgeViaDto
               {
                   // ToSnakeCaseLower() is assumed to be a project-local string
                   // extension (it is not part of .NET), e.g. "StaticCall" -> "static_call".
                   Reason = edge.Via.Reason.ToString().ToSnakeCaseLower(),
                   Evidence = edge.Via.Evidence.ToArray(),
                   Provenance = new EdgeProvenanceDto
                   {
                       Detector = edge.Via.Provenance.Detector,
                       RuleId = edge.Via.Provenance.RuleId,
                       Confidence = edge.Via.Provenance.Confidence.ToString().ToLowerInvariant()
                   }
               }
           };
       }
   }
   ```

2. If you accept edges via API (internal services), validate:

   * `reason` must be one of the known values; otherwise reject or coerce to `unknown`.
   * `evidence` length ≤ 3.
   * Trim whitespace and limit each evidence string length (e.g. 256 chars).

3. Versioning:

   * Introduce `/v2/graph/paths` (or similar) that guarantees `via`.
   * Keep `/v1/...` unchanged or mark deprecated.

**Acceptance criteria:**

* [ ] Path API returns `via.reason` and `via.evidence` for all edges in new endpoints.
* [ ] Invalid reason strings are rejected or converted to `unknown` with a log.
* [ ] Integration tests cover full flow: repo → scanner → DB → API → JSON.

---

## 5. UI: make paths auditor-friendly

**Owner:** Frontend team

**Tasks:**

1. **Path details UI**: For each edge in the vulnerability path table:

   * Show a **“Reason” column** with a small pill:

     * `static_call` → “Static call”
     * `declared_dependency` → “Declared dependency”
     * etc.
   * Below or on hover, show **primary evidence** (first evidence string).

2. **Edge details panel** (drawer/modal): When user clicks an edge:

   * Show:

     * From → To (symbols/packages)
     * Reason (with friendly description per enum)
     * Evidence list (each on its own line)
     * Detector, rule id, confidence

3. **Filtering & sorting (optional but powerful)**:

   * Filter edges by `reason` (multi-select).
   * Filter by `confidence` (e.g. show only high/medium).
   * This helps auditors quickly isolate more speculative edges.

4. **UX text / glossary**:

   * Add a small “?” tooltip that links to a glossary explaining each reason type in human language.

**Acceptance criteria:**

* [ ] For a given vulnerability, the path view shows a “Reason” column per edge.
* [ ] Clicking an edge reveals all evidence and provenance information.
* [ ] UX has a glossary/tooltip explaining what each reason means in plain English.

---

## 6. Testing strategy

**Owner:** QA + each feature team

### 6.1 Unit tests

* **Factories**: verify correct mapping from input to `EdgeVia`:

  * Reason set correctly.
  * Evidence trimmed, max 3.
  * Confidence matches rubric (high for relocations, medium for heuristic imports, etc.).
* **Serialization**: `EdgeVia` → JSON and back.

### 6.2 Integration tests

Set up **small fixtures**:

1. **Simple dependency project**:

   * Example: Python project with `requirements.txt` → `requests` → `urllib3`.
   * Expected edges:

     * App → requests: `declared_dependency`, evidence includes `requirements.txt`.
     * requests → urllib3: `declared_dependency`, plus static call edges.

2. **Dynamic import case**:

   * A module using `importlib.import_module("mod")`.
   * Ensure edge is `dynamic_import` with `confidence = medium`.

3. 
   **Binary edge case**:

   * Test ELF with known symbol relocation.
   * Ensure an edge with `reason = symbol_relocation` exists.

### 6.3 End-to-end tests

* Run full scan on a sample repo and:

  * Hit path API.
  * Assert every edge has non-null `via` fields.
  * Spot check a few known edges for exact `reason` and evidence.

**Acceptance criteria:**

* [ ] Automated tests fail if any edge is emitted without `via`.
* [ ] Coverage includes at least one example for each `EdgeReason` you support.

---

## 7. Observability, guardrails & rollout

### 7.1 Metrics & logging

**Owner:** Observability / platform

**Tasks:**

* Emit metrics:

  * `% edges with reason != unknown`
  * Count by `reason` and `confidence`
* Log warnings when:

  * Edge is emitted with `reason = unknown`.
  * Evidence is empty for a non-unknown reason.

**Acceptance criteria:**

* [ ] Dashboards showing distribution of edge reasons over time.
* [ ] Alerts if `unknown` reason edges exceed a threshold (e.g. >5%).

---

### 7.2 Rollout plan

**Owner:** PM + tech leads

**Steps:**

1. **Phase 1 – Dark-launch metadata:**

   * Start generating & storing `via` for new scans.
   * Keep UI unchanged.
   * Monitor metrics, unknown ratio, and storage overhead.

2. **Phase 2 – Enable for internal users:**

   * Toggle UI on (feature flag for internal / beta users).
   * Collect feedback from security engineers and auditors.

3. **Phase 3 – General availability:**

   * Enable UI for all.
   * Update customer-facing documentation & audit guides.

---

### 7.3 Documentation

**Owner:** Docs / PM

* Short **“Why this edge exists”** section in:

  * Product docs (for customers).
  * Internal runbooks (for support & SEs).
* Include:

  * Table of reasons → human descriptions.
  * Examples of path explanations (e.g., “This edge exists because `app` declares `urllib3` in `requirements.txt` and calls it in `client.py:42`”).

---

## 8. Ready-to-use ticket breakdown

You can almost copy-paste these into your tracker:

1. 
   **Shared**: Define `EdgeReason`, `EdgeVia` & `EdgeProvenance` in shared library, plus `EdgeViaFactory`.
2. **SBOM**: Use `EdgeFactory.DeclaredDependency` for all manifest-generated edges.
3. **Source**: Wire all callgraph edges to `SourceEdgeFactory` (`static_call`, `dynamic_import`, `reflection_call`, `plugin_discovery`, `taint_propagation`).
4. **Binary**: Wire relocations/PLT/GOT edges to `BinaryEdgeFactory` (`symbol_relocation`, `plt_got_resolution`, `ld_preload_injection`).
5. **Data**: Add `via_*` columns/properties to `graph_edges` storage and map to/from domain.
6. **API**: Extend graph path DTOs to include `via`, update OpenAPI, and implement `/v2` endpoints if needed.
7. **UI**: Show edge reason, evidence, and provenance in vulnerability path screens and add filters.
8. **Testing**: Add unit, integration, and end-to-end tests ensuring every edge has non-null `via`.
9. **Observability**: Add metrics and logs for edge reasons and unknown rates.
10. **Docs & rollout**: Write glossary + auditor docs and plan staged rollout.

---

If you tell me a bit about your current storage (e.g., Neo4j vs SQL) and the services’ names, I can tailor this into an even more literal set of code snippets and migrations to match your stack exactly.
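
One loose end from section 4: the DTO mapping calls `ToSnakeCaseLower()`, which is not a built-in .NET method, and the rule “`reason` must be one of the known values” needs the reverse lookup. Here is a minimal sketch of both, assuming plain ASCII PascalCase enum names. `EdgeReasonWire` and `TryParseReason` are names I made up for illustration, and the enum is repeated only so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Repeated from section 1.1 so this sketch is self-contained.
public enum EdgeReason
{
    Unknown = 0,
    DeclaredDependency,
    StaticCall,
    DynamicImport,
    ReflectionCall,
    PluginDiscovery,
    SymbolRelocation,
    PltGotResolution,
    LdPreloadInjection,
    EnvConfigPath,
    TaintPropagation,
    VendorPatchAlias
}

// Hypothetical helper class; not part of .NET or of any existing library.
public static class EdgeReasonWire
{
    // Wire name -> enum value, built once from the enum itself so the
    // glossary and the validator cannot drift apart.
    private static readonly Dictionary<string, EdgeReason> ByWireName =
        Enum.GetValues<EdgeReason>()
            .ToDictionary(r => r.ToString().ToSnakeCaseLower(), r => r, StringComparer.Ordinal);

    // "StaticCall" -> "static_call". Inserts '_' before every uppercase
    // letter except the first, then lowercases each character.
    public static string ToSnakeCaseLower(this string pascal)
    {
        var sb = new StringBuilder(pascal.Length + 4);
        for (var i = 0; i < pascal.Length; i++)
        {
            var ch = pascal[i];
            if (char.IsUpper(ch) && i > 0)
                sb.Append('_');
            sb.Append(char.ToLowerInvariant(ch));
        }
        return sb.ToString();
    }

    // Strict parse: only exact lowercase snake_case glossary values pass.
    // On failure, reason is EdgeReason.Unknown and the method returns false.
    public static bool TryParseReason(string? wire, out EdgeReason reason)
    {
        reason = EdgeReason.Unknown;
        return !string.IsNullOrWhiteSpace(wire)
            && ByWireName.TryGetValue(wire.Trim(), out reason);
    }
}
```

With this, `EdgeReason.PltGotResolution.ToString().ToSnakeCaseLower()` yields `plt_got_resolution`, and free text such as `imported via top-level module dependency` fails `TryParseReason`, so the API layer can reject it or coerce it to `unknown` with a log line. Building the lookup from the enum is a deliberate choice: adding a new reason then only requires touching the enum.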