Here’s a quick win for making your vuln paths auditor‑friendly without retraining any models: add a plain‑language reason to every graph edge (why this edge exists). Think “introduced via dynamic import” or “symbol relocation via ld”, not jargon soup.
Why this helps
- Explains reachability at a glance (auditors & devs can follow the story).
- Reduces false‑positive fights (every hop justifies itself).
- Stable across languages (no model changes, just metadata).
Minimal schema change
Add a via object with three fields (reason, evidence, provenance) to every edge in your call/dep graph (SBOM→Reachability→Fix plan):
{
  "from": "pkg:pypi/requests@2.32.3#requests.sessions.Session.request",
  "to": "pkg:pypi/urllib3@2.2.3#urllib3.connectionpool.HTTPConnectionPool.urlopen",
  "via": {
    "reason": "declared_dependency",
    "evidence": [
      "import urllib3 in requests/adapters.py:12",
      "pip freeze: urllib3==2.2.3"
    ],
    "provenance": {
      "detector": "StellaOps.Scanner.WebService@1.4.2",
      "rule_id": "PY-IMPORT-001",
      "confidence": "high"
    }
  }
}
Standard reason glossary (use as enum)
- declared_dependency (manifest lock/SBOM edge)
- static_call (direct call site with symbol ref)
- dynamic_import (e.g., __import__, importlib, require(...))
- reflection_call (C# MethodInfo.Invoke, Java reflection)
- plugin_discovery (entry points, ServiceLoader, MEF)
- symbol_relocation (ELF/PE/Mach‑O relocation binds)
- plt_got_resolution (ELF PLT/GOT jump to symbol)
- ld_preload_injection (runtime injected .so/.dll)
- env_config_path (path read from env/config enables load)
- taint_propagation (user input reaches sink)
- vendor_patch_alias (function moved/aliased across versions)
Emission rules (keep it deterministic)
- One reason per edge, short, lowercase snake_case from glossary.
- Up to 3 evidence strings (file:line or binary section + symbol).
- Confidence: high|medium|low with a single, stable rubric:
  - high = exact symbol/call site or relocation
  - medium = heuristic import/loader path
  - low = inferred from naming or optional plugin
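The rubric above can be pinned down in code so that every producer assigns confidence the same way. The following is a minimal sketch, not part of any existing Stella Ops API: the EvidenceKind enum and the ConfidenceRubric class are hypothetical names introduced only for illustration.

```csharp
using System;

// Hypothetical classification of how an edge was observed (not in the original schema).
public enum EvidenceKind
{
    ExactSymbolOrCallSite, // resolved symbol reference or concrete call site
    Relocation,            // relocation entry found in a binary
    HeuristicImportPath,   // import/loader path found by heuristics
    NamingInference,       // inferred from names only
    OptionalPlugin         // plugin that may or may not be loaded
}

public static class ConfidenceRubric
{
    // Maps an evidence kind to "high", "medium" or "low" following the rubric.
    public static string For(EvidenceKind kind) => kind switch
    {
        EvidenceKind.ExactSymbolOrCallSite => "high",
        EvidenceKind.Relocation => "high",
        EvidenceKind.HeuristicImportPath => "medium",
        EvidenceKind.NamingInference => "low",
        EvidenceKind.OptionalPlugin => "low",
        _ => throw new ArgumentOutOfRangeException(nameof(kind))
    };
}
```

Keeping the mapping in one switch means a rubric change is a single edit rather than a hunt through every detector.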
UI/Report snippet
Render paths like:
app → requests → urllib3 → OpenSSL EVP_PKEY_new_raw_private_key
• declared_dependency (poetry.lock)
• static_call (requests.adapters:345)
• symbol_relocation (ELF .rela.plt: EVP_PKEY_new_raw_private_key)
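A renderer for this layout can be very small. The sketch below assumes a simplified hop shape; PathHop and PathRenderer are hypothetical names, and a real edge would carry the full via object rather than one evidence string.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical minimal hop type for rendering; real edges carry more fields.
public sealed record PathHop(string ToLabel, string Reason, string PrimaryEvidence);

public static class PathRenderer
{
    // Renders "root → a → b" on the first line, then one bullet per hop
    // in the form "• reason (evidence)".
    public static string Render(string rootLabel, IReadOnlyList<PathHop> hops)
    {
        var sb = new StringBuilder(rootLabel);
        foreach (var hop in hops)
            sb.Append(" → ").Append(hop.ToLabel);
        foreach (var hop in hops)
            sb.Append('\n').Append("• ").Append(hop.Reason)
              .Append(" (").Append(hop.PrimaryEvidence).Append(')');
        return sb.ToString();
    }
}
```

Each bullet corresponds to one arrow in the header line, in the same order, so a reader can match hops to reasons by position.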
C# drop‑in (for your .NET 10 code)
Edge builder with reason/evidence:
using System.Collections.Generic;
public sealed record EdgeId(string From, string To);
public sealed record EdgeEvidence(
    string Reason,                  // enum string from glossary
    IReadOnlyList<string> Evidence, // file:line, symbol, section
    string Confidence,              // high|medium|low
    string Detector,                // component@version
    string RuleId                   // stable rule key
);
public sealed record GraphEdge(EdgeId Id, EdgeEvidence Via);
public static class EdgeFactory
{
    // Edge backed by a manifest or lockfile entry.
    public static GraphEdge DeclaredDependency(string from, string to, string manifestPath)
        => new(new EdgeId(from, to),
            new EdgeEvidence(
                Reason: "declared_dependency",
                Evidence: new[] { $"manifest:{manifestPath}" },
                Confidence: "high",
                Detector: "StellaOps.Scanner.WebService@1.0.0",
                RuleId: "DEP-LOCK-001"));
    // Edge backed by a relocation entry in a binary.
    public static GraphEdge SymbolRelocation(string from, string to, string objPath, string section, string symbol)
        => new(new EdgeId(from, to),
            new EdgeEvidence(
                Reason: "symbol_relocation",
                Evidence: new[] { $"{objPath}::{section}:{symbol}" },
                Confidence: "high",
                Detector: "StellaOps.Scanner.WebService@1.0.0",
                RuleId: "BIN-RELOC-101"));
}
Integration checklist (fast path)
- Emit via.reason/evidence/provenance for all edges (SBOM, source, binary).
- Validate reason against glossary; reject free‑text.
- Add a “Why this edge exists” column in your path tables.
- In JSON/CSV exports, keep columns: from, to, reason, confidence, evidence0..2, rule_id.
- In the console, collapse evidence by default; expand on click.
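For the CSV export, a fixed column layout is easiest to keep stable if the evidence list is always padded or truncated to exactly three cells. A minimal sketch, assuming RFC 4180 style quoting; EdgeCsv is a hypothetical helper, not an existing API:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EdgeCsv
{
    public const string Header =
        "from,to,reason,confidence,evidence0,evidence1,evidence2,rule_id";

    // Quotes a field when it contains a comma, double quote, or line break,
    // doubling any embedded double quotes.
    private static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    public static string Row(string from, string to, string reason, string confidence,
                             IReadOnlyList<string> evidence, string ruleId)
    {
        // Pad or truncate to exactly three evidence columns.
        var ev = evidence.Take(3).Concat(Enumerable.Repeat("", 3)).Take(3);
        var fields = new[] { from, to, reason, confidence }.Concat(ev).Append(ruleId);
        return string.Join(",", fields.Select(Escape));
    }
}
```

An edge with a single evidence string therefore still produces eight cells, with two empty evidence columns before rule_id.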
If you want, I’ll plug this into your Stella Ops graph contracts (Concelier/Cartographer) and produce the enum + validators and a tiny renderer for your docs.
Cool, let’s turn this into a concrete, dev‑friendly implementation plan you can actually hand to teams.
I’ll structure it by phases and by component (schema, producers, APIs, UI, testing, rollout) so you can slice into tickets easily.
0. Recap of what we’re building
Goal: Every edge in your vuln path graph (SBOM → Reachability → Fix plan) carries machine‑readable, auditor‑friendly metadata:
{
  "from": "pkg:pypi/requests@2.32.3#requests.sessions.Session.request",
  "to": "pkg:pypi/urllib3@2.2.3#urllib3.connectionpool.HTTPConnectionPool.urlopen",
  "via": {
    "reason": "declared_dependency", // from a controlled enum
    "evidence": [
      "manifest:requirements.txt:3", // up to 3 short evidence strings
      "pip freeze: urllib3==2.2.3"
    ],
    "provenance": {
      "detector": "StellaOps.Scanner.WebService@1.4.2",
      "rule_id": "PY-IMPORT-001",
      "confidence": "high"
    }
  }
}
Standard reason glossary (enum):
- declared_dependency
- static_call
- dynamic_import
- reflection_call
- plugin_discovery
- symbol_relocation
- plt_got_resolution
- ld_preload_injection
- env_config_path
- taint_propagation
- vendor_patch_alias
- unknown (fallback only when you truly can’t do better)
1. Design & contracts (shared work for backend & frontend)
1.1 Define the canonical edge metadata types
Owner: Platform / shared lib team
Tasks:
- In your shared C# library (used by scanners + API), define:
using System.Collections.Generic;
public enum EdgeReason
{
    Unknown = 0,
    DeclaredDependency,
    StaticCall,
    DynamicImport,
    ReflectionCall,
    PluginDiscovery,
    SymbolRelocation,
    PltGotResolution,
    LdPreloadInjection,
    EnvConfigPath,
    TaintPropagation,
    VendorPatchAlias
}
public enum EdgeConfidence
{
    Low = 0,
    Medium,
    High
}
public sealed record EdgeProvenance(
    string Detector,          // e.g., "StellaOps.Scanner.WebService@1.4.2"
    string RuleId,            // e.g., "PY-IMPORT-001"
    EdgeConfidence Confidence
);
public sealed record EdgeVia(
    EdgeReason Reason,
    IReadOnlyList<string> Evidence,
    EdgeProvenance Provenance
);
public sealed record EdgeId(string From, string To);
public sealed record GraphEdge(
    EdgeId Id,
    EdgeVia Via
);
- Enforce max 3 evidence strings via a small helper to avoid accidental spam:
using System.Collections.Generic;
using System.Linq;
public static class EdgeViaFactory
{
    private const int MaxEvidence = 3;
    public static EdgeVia Create(
        EdgeReason reason,
        IEnumerable<string> evidence,
        string detector,
        string ruleId,
        EdgeConfidence confidence
    )
    {
        // Drop blank entries, trim the rest, and keep at most MaxEvidence strings.
        var ev = evidence
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(s => s.Trim())
            .Take(MaxEvidence)
            .ToArray();
        return new EdgeVia(
            Reason: reason,
            Evidence: ev,
            Provenance: new EdgeProvenance(detector, ruleId, confidence)
        );
    }
}
Acceptance criteria:
- EdgeReason enum defined and shared in a reusable package.
- EdgeVia and EdgeProvenance types exist and are serializable to JSON.
- Evidence is capped to 3 entries and cannot be null (empty list allowed).
1.2 API / JSON contract
Owner: API team
Tasks:
- Extend your existing graph edge DTO to include via:
public sealed record GraphEdgeDto
{
    public string From { get; init; } = default!;
    public string To { get; init; } = default!;
    public EdgeViaDto Via { get; init; } = default!;
}
public sealed record EdgeViaDto
{
    public string Reason { get; init; } = default!; // enum as string
    public string[] Evidence { get; init; } = Array.Empty<string>();
    public EdgeProvenanceDto Provenance { get; init; } = default!;
}
public sealed record EdgeProvenanceDto
{
    public string Detector { get; init; } = default!;
    public string RuleId { get; init; } = default!;
    public string Confidence { get; init; } = default!; // "high|medium|low"
}
- Ensure JSON is additive (backward compatible):
  - via is non‑nullable in responses from the new API version.
  - If you must keep a legacy endpoint, add v2 endpoints that guarantee via.
- Update OpenAPI spec:
  - Document via.reason as enum string, including allowed values.
  - Document via.provenance.detector, rule_id, confidence.
Acceptance criteria:
- OpenAPI / Swagger shows via.reason as a string enum + description.
- New clients can deserialize edges with via without custom hacks.
- Old clients remain unaffected (either keep old endpoint or allow them to ignore via).
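The JSON contract uses snake_case names (rule_id, static_call) while the C# types use PascalCase, so something has to translate between the two. One option, assuming System.Text.Json on .NET 8 or later, is to let the built-in snake_case naming policy do it for both property names and enum values. This is a sketch under that assumption, not the plan's prescribed approach: EdgeJson is a hypothetical helper, and the enum is an abbreviated copy of the one defined in 1.1.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Abbreviated copies of the shared types, repeated so the example stands alone.
public enum EdgeReason { Unknown = 0, DeclaredDependency, StaticCall, DynamicImport }
public enum EdgeConfidence { Low = 0, Medium, High }
public sealed record EdgeProvenance(string Detector, string RuleId, EdgeConfidence Confidence);
public sealed record EdgeVia(EdgeReason Reason, IReadOnlyList<string> Evidence, EdgeProvenance Provenance);

public static class EdgeJson
{
    // snake_case for property names and for enum values.
    public static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower,
        Converters = { new JsonStringEnumConverter(JsonNamingPolicy.SnakeCaseLower) }
    };

    public static string Serialize(EdgeVia via) => JsonSerializer.Serialize(via, Options);

    // Converts a single enum member to its wire form, e.g. StaticCall -> "static_call".
    public static string ReasonToWire(EdgeReason reason) =>
        JsonNamingPolicy.SnakeCaseLower.ConvertName(reason.ToString());
}
```

With these options the domain types can be serialized directly, which removes one hand-written mapping layer. The trade-off is that the wire format then follows the enum member names, so renaming a member becomes a contract change.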
2. Producers: add reasons & evidence where edges are created
You likely have 3 main edge producers:
- SBOM / manifest / lockfile analyzers
- Source analyzers (call graph, taint analysis)
- Binary analyzers (ELF/PE/Mach‑O, containers)
Treat each as a mini‑project with identical patterns.
2.1 SBOM / manifest edges
Owner: SBOM / dep graph team
Tasks:
- Identify all code paths that create “declared dependency” edges:
  - Manifest → Package
  - Root module → Imported package (if you store these explicitly)
- Replace plain edge construction with factory calls:
using System.Collections.Generic;
public static class EdgeFactory
{
    private const string DetectorName = "StellaOps.Scanner.Sbom@1.0.0";
    public static GraphEdge DeclaredDependency(
        string from,
        string to,
        string manifestPath,
        string? dependencySpecLine
    )
    {
        // The manifest path is always recorded; the spec line is added when known.
        var evidence = new List<string>
        {
            $"manifest:{manifestPath}"
        };
        if (!string.IsNullOrWhiteSpace(dependencySpecLine))
            evidence.Add($"spec:{dependencySpecLine}");
        var via = EdgeViaFactory.Create(
            EdgeReason.DeclaredDependency,
            evidence,
            DetectorName,
            "DEP-LOCK-001",
            EdgeConfidence.High
        );
        return new GraphEdge(new EdgeId(from, to), via);
    }
}
- Make sure each SBOM/manifest edge sets:
  - reason = declared_dependency
  - confidence = high
  - Evidence includes at least manifest:<path> and, if possible, line or spec snippet.
Acceptance criteria:
- Any SBOM‑generated edge returns with via.reason == declared_dependency.
- Evidence contains manifest path for ≥ 99% of SBOM edges.
- Unit tests cover at least: normal manifest, multiple manifests, malformed manifest.
2.2 Source code call graph edges
Owner: Static analysis / call graph team
Tasks:
- Map current edge types → reasons:
  - Direct function/method calls → static_call
  - Reflection (Java/C#) → reflection_call
  - Dynamic imports (__import__, importlib, require(...)) → dynamic_import
  - Plugin systems (entry points, ServiceLoader, MEF) → plugin_discovery
  - Taint / dataflow edges (user input → sink) → taint_propagation
- Implement helper factories:
public static class SourceEdgeFactory
{
    private const string DetectorName = "StellaOps.Scanner.Source@1.0.0";
    public static GraphEdge StaticCall(
        string fromSymbol,
        string toSymbol,
        string filePath,
        int lineNumber
    )
    {
        var evidence = new[]
        {
            $"callsite:{filePath}:{lineNumber}"
        };
        var via = EdgeViaFactory.Create(
            EdgeReason.StaticCall,
            evidence,
            DetectorName,
            "SRC-CALL-001",
            EdgeConfidence.High
        );
        return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
    }
    public static GraphEdge DynamicImport(
        string fromSymbol,
        string toSymbol,
        string filePath,
        int lineNumber
    )
    {
        var via = EdgeViaFactory.Create(
            EdgeReason.DynamicImport,
            new[] { $"importsite:{filePath}:{lineNumber}" },
            DetectorName,
            "SRC-DYNIMPORT-001",
            EdgeConfidence.Medium
        );
        return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
    }
    // Similar for ReflectionCall, PluginDiscovery, TaintPropagation...
}
- Replace all direct new GraphEdge(...) calls in source analyzers with these factories.
Acceptance criteria:
- Direct call edges produce reason = static_call with file:line evidence.
- Reflection/dynamic import edges use correct reasons and mark confidence = medium (or high where you’re certain).
- Unit tests check that for a known source file, the resulting edges contain expected reason, evidence, and rule_id.
2.3 Binary / container analyzers
Owner: Binary analysis / SCA team
Tasks:
- Map binary features to reasons:
  - Symbol relocations + PLT/GOT edges → symbol_relocation or plt_got_resolution
  - LD_PRELOAD or injection edges → ld_preload_injection
- Implement factory:
public static class BinaryEdgeFactory
{
    private const string DetectorName = "StellaOps.Scanner.Binary@1.0.0";
    public static GraphEdge SymbolRelocation(
        string fromSymbol,
        string toSymbol,
        string binaryPath,
        string section,
        string relocationName
    )
    {
        var evidence = new[]
        {
            $"{binaryPath}::{section}:{relocationName}"
        };
        var via = EdgeViaFactory.Create(
            EdgeReason.SymbolRelocation,
            evidence,
            DetectorName,
            "BIN-RELOC-101",
            EdgeConfidence.High
        );
        return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
    }
}
- Wire up all binary edge creation to use this.
Acceptance criteria:
- For a test binary with a known relocation, edges include reason = symbol_relocation and section/symbol in evidence.
- No binary edge is created without via.
3. Storage & migrations
This depends on your backing store, but the pattern is similar.
3.1 Relational (SQL) example
Owner: Data / infra team
Tasks:
- Add columns:
ALTER TABLE graph_edges
    ADD COLUMN via_reason VARCHAR(64) NOT NULL DEFAULT 'unknown',
    ADD COLUMN via_evidence JSONB NOT NULL DEFAULT '[]'::jsonb,
    ADD COLUMN via_detector VARCHAR(255) NOT NULL DEFAULT 'unknown',
    ADD COLUMN via_rule_id VARCHAR(128) NOT NULL DEFAULT 'unknown',
    ADD COLUMN via_confidence VARCHAR(16) NOT NULL DEFAULT 'low';
- Update ORM model:
public class EdgeEntity
{
    public string From { get; set; } = default!;
    public string To { get; set; } = default!;
    public string ViaReason { get; set; } = "unknown";
    public string[] ViaEvidence { get; set; } = Array.Empty<string>();
    public string ViaDetector { get; set; } = "unknown";
    public string ViaRuleId { get; set; } = "unknown";
    public string ViaConfidence { get; set; } = "low";
}
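- Add mapping to domain GraphEdge:
public static GraphEdge ToDomain(this EdgeEntity e)
{
    // If via_reason is stored in snake_case ("static_call"), a plain case-insensitive
    // parse would not match StaticCall and every multi-word reason would silently
    // become Unknown. Stripping underscores first makes both spellings parse.
    var via = new EdgeVia(
        Reason: Enum.TryParse<EdgeReason>(e.ViaReason.Replace("_", ""), true, out var r) ? r : EdgeReason.Unknown,
        Evidence: e.ViaEvidence,
        Provenance: new EdgeProvenance(
            Detector: e.ViaDetector,
            RuleId: e.ViaRuleId,
            Confidence: Enum.TryParse<EdgeConfidence>(e.ViaConfidence, true, out var c) ? c : EdgeConfidence.Low
        )
    );
    return new GraphEdge(new EdgeId(e.From, e.To), via);
}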
- Backfill existing data (optional but recommended):
  - For edges with a known “type” column, map to best‑fit reason.
  - If you can’t infer: set reason = unknown, confidence = low, detector = "backfill@<version>".
Acceptance criteria:
- DB migration runs cleanly in staging and prod.
- No existing reader breaks: default values keep queries functioning.
- Edge round‑trip (domain → DB → API JSON) retains via fields correctly.
4. API & service layer
Owner: API / service team
Tasks:
- Wire domain model → DTOs:
public static GraphEdgeDto ToDto(this GraphEdge edge)
{
    return new GraphEdgeDto
    {
        From = edge.Id.From,
        To = edge.Id.To,
        Via = new EdgeViaDto
        {
            // ToSnakeCaseLower() is not a built-in string method; it is assumed to be
            // a project-local extension that turns "StaticCall" into "static_call".
            Reason = edge.Via.Reason.ToString().ToSnakeCaseLower(),
            Evidence = edge.Via.Evidence.ToArray(),
            Provenance = new EdgeProvenanceDto
            {
                Detector = edge.Via.Provenance.Detector,
                RuleId = edge.Via.Provenance.RuleId,
                Confidence = edge.Via.Provenance.Confidence.ToString().ToLowerInvariant()
            }
        }
    };
}
- If you accept edges via API (internal services), validate:
  - reason must be one of the known values; otherwise reject or coerce to unknown.
  - evidence length ≤ 3.
  - Trim whitespace and limit each evidence string length (e.g. 256 chars).
- Versioning:
  - Introduce /v2/graph/paths (or similar) that guarantees via.
  - Keep /v1/... unchanged or mark deprecated.
Acceptance criteria:
- Path API returns via.reason and via.evidence for all edges in new endpoints.
- Invalid reason strings are rejected or converted to unknown with a log.
- Integration tests cover full flow: repo → scanner → DB → API → JSON.
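The validation rules for incoming edges fit in one small helper. The sketch below picks one of the two options the plan allows for each rule: an unrecognized reason is coerced to unknown, and too many evidence strings cause a rejection. EdgeViaValidator is a hypothetical name, and the 256 character limit is the example value from the task list rather than a fixed requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EdgeViaValidator
{
    public const int MaxEvidence = 3;
    public const int MaxEvidenceLength = 256; // example limit, adjust as needed

    // Glossary values accepted on the wire.
    private static readonly HashSet<string> KnownReasons = new(StringComparer.Ordinal)
    {
        "declared_dependency", "static_call", "dynamic_import", "reflection_call",
        "plugin_discovery", "symbol_relocation", "plt_got_resolution",
        "ld_preload_injection", "env_config_path", "taint_propagation",
        "vendor_patch_alias", "unknown"
    };

    // Returns the reason unchanged when it is in the glossary, otherwise "unknown".
    public static string NormalizeReason(string reason) =>
        reason != null && KnownReasons.Contains(reason) ? reason : "unknown";

    // Drops blank entries, trims, truncates each string to MaxEvidenceLength,
    // and throws if more than MaxEvidence entries remain.
    public static string[] NormalizeEvidence(IEnumerable<string> evidence)
    {
        var cleaned = (evidence ?? Enumerable.Empty<string>())
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(s => s.Trim())
            .Select(s => s.Length > MaxEvidenceLength ? s.Substring(0, MaxEvidenceLength) : s)
            .ToArray();
        if (cleaned.Length > MaxEvidence)
            throw new ArgumentException(
                $"At most {MaxEvidence} evidence strings are allowed.", nameof(evidence));
        return cleaned;
    }
}
```

A caller that coerces a reason should also log the original value, so a misspelled reason from a new detector shows up in the unknown metrics instead of disappearing.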
5. UI: make paths auditor‑friendly
Owner: Frontend team
Tasks:
- Path details UI: for each edge in the vulnerability path table:
  - Show a “Reason” column with a small pill:
    - static_call → “Static call”
    - declared_dependency → “Declared dependency”
    - etc.
  - Below or on hover, show primary evidence (first evidence string).
- Edge details panel (drawer/modal): when the user clicks an edge, show:
  - From → To (symbols/packages)
  - Reason (with friendly description per enum)
  - Evidence list (each on its own line)
  - Detector, rule id, confidence
- Filtering & sorting (optional but powerful):
  - Filter edges by reason (multi‑select).
  - Filter by confidence (e.g. show only high/medium).
  - This helps auditors quickly isolate more speculative edges.
- UX text / glossary:
  - Add a small “?” tooltip that links to a glossary explaining each reason type in human language.
Acceptance criteria:
- For a given vulnerability, the path view shows a “Reason” column per edge.
- Clicking an edge reveals all evidence and provenance information.
- UX has a glossary/tooltip explaining what each reason means in plain English.
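The pill labels and glossary text can be derived from the enum strings, so the UI and the docs share one source. A minimal sketch, assuming the labels are produced server-side in C#; ReasonGlossary is a hypothetical helper, and the descriptions simply reuse the wording from the reason glossary earlier in this thread.

```csharp
using System.Collections.Generic;

public static class ReasonGlossary
{
    // Short descriptions taken from the reason glossary.
    private static readonly Dictionary<string, string> Descriptions = new()
    {
        ["declared_dependency"] = "manifest lock/SBOM edge",
        ["static_call"] = "direct call site with symbol ref",
        ["dynamic_import"] = "e.g., __import__, importlib, require(...)",
        ["reflection_call"] = "C# MethodInfo.Invoke, Java reflection",
        ["plugin_discovery"] = "entry points, ServiceLoader, MEF",
        ["symbol_relocation"] = "ELF/PE/Mach-O relocation binds",
        ["plt_got_resolution"] = "ELF PLT/GOT jump to symbol",
        ["ld_preload_injection"] = "runtime injected .so/.dll",
        ["env_config_path"] = "path read from env/config enables load",
        ["taint_propagation"] = "user input reaches sink",
        ["vendor_patch_alias"] = "function moved/aliased across versions"
    };

    // Turns "static_call" into "Static call" for the pill label.
    public static string Label(string reason)
    {
        if (string.IsNullOrEmpty(reason)) return "Unknown";
        var words = reason.Replace('_', ' ');
        return char.ToUpperInvariant(words[0]) + words.Substring(1);
    }

    // Returns the glossary text, or a fallback for unknown or unlisted reasons.
    public static string Describe(string reason) =>
        reason != null && Descriptions.TryGetValue(reason, out var text)
            ? text
            : "No description available.";
}
```

Mechanical labels read poorly for acronyms (plt_got_resolution becomes “Plt got resolution”), so a real glossary would likely override a few labels by hand.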
6. Testing strategy
Owner: QA + each feature team
6.1 Unit tests
- Factories: verify correct mapping from input to EdgeVia:
  - Reason set correctly.
  - Evidence trimmed, max 3.
  - Confidence matches rubric (high for relocations, medium for heuristic imports, etc.).
- Serialization: EdgeVia → JSON and back.
6.2 Integration tests
Set up small fixtures:
- Simple dependency project:
  - Example: Python project with requirements.txt → requests → urllib3.
  - Expected edges:
    - App → requests: declared_dependency, evidence includes requirements.txt.
    - requests → urllib3: declared_dependency, plus static call edges.
- Dynamic import case:
  - A module using importlib.import_module("mod").
  - Ensure edge is dynamic_import with confidence = medium.
- Binary edge case:
  - Test ELF with known symbol relocation.
  - Ensure an edge with reason = symbol_relocation exists.
6.3 End‑to‑end tests
- Run full scan on a sample repo and:
  - Hit path API.
  - Assert every edge has non‑null via fields.
  - Spot check a few known edges for exact reason and evidence.
Acceptance criteria:
- Automated tests fail if any edge is emitted without via.
- Coverage includes at least one example for each EdgeReason you support.
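The “every edge has via” check is the same in unit, integration, and end‑to‑end tests, so it is worth writing once. The sketch below is framework-agnostic; EdgeView and ViaGuard are hypothetical names for a flattened view of an API edge and the guard that inspects it.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened view of an edge as a test sees it after deserialization.
public sealed record EdgeView(
    string From, string To, string Reason,
    IReadOnlyList<string> Evidence, string Detector, string RuleId, string Confidence);

public static class ViaGuard
{
    // Returns one message per edge whose via fields are missing or blank.
    // An empty evidence list is allowed; a null one is not.
    public static IReadOnlyList<string> Violations(IEnumerable<EdgeView> edges) =>
        edges.Where(e => string.IsNullOrWhiteSpace(e.Reason)
                      || e.Evidence is null
                      || string.IsNullOrWhiteSpace(e.Detector)
                      || string.IsNullOrWhiteSpace(e.RuleId)
                      || string.IsNullOrWhiteSpace(e.Confidence))
             .Select(e => $"{e.From} -> {e.To}: incomplete via")
             .ToList();
}
```

A test then asserts that Violations returns an empty list and, on failure, prints the messages so the offending producer is easy to find.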
7. Observability, guardrails & rollout
7.1 Metrics & logging
Owner: Observability / platform
Tasks:
- Emit metrics:
  - % edges with reason != unknown
  - Count by reason and confidence
- Log warnings when:
  - Edge is emitted with reason = unknown.
  - Evidence is empty for a non‑unknown reason.
Acceptance criteria:
- Dashboards showing distribution of edge reasons over time.
- Alerts if unknown reason edges exceed a threshold (e.g. >5%).
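Both metrics reduce to simple aggregations over the emitted edges. The following sketch shows the arithmetic only and is independent of any metrics backend; EdgeReasonMetrics is a hypothetical helper, and the 5% default mirrors the example threshold above.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EdgeReasonMetrics
{
    // Counts edges per (reason, confidence) pair.
    public static Dictionary<(string Reason, string Confidence), int> CountByReasonAndConfidence(
        IEnumerable<(string Reason, string Confidence)> edges) =>
        edges.GroupBy(e => e).ToDictionary(g => g.Key, g => g.Count());

    // Fraction of edges whose reason is "unknown"; 0 for an empty input.
    public static double UnknownRatio(IReadOnlyCollection<string> reasons) =>
        reasons.Count == 0
            ? 0.0
            : (double)reasons.Count(r => r == "unknown") / reasons.Count;

    // True when the unknown ratio is strictly above the threshold.
    public static bool ShouldAlert(IReadOnlyCollection<string> reasons, double threshold = 0.05) =>
        UnknownRatio(reasons) > threshold;
}
```

The dashboard figure “% edges with reason != unknown” is then 100 × (1 − UnknownRatio).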
7.2 Rollout plan
Owner: PM + tech leads
Steps:
- Phase 1 – Dark‑launch metadata:
  - Start generating & storing via for new scans.
  - Keep UI unchanged.
  - Monitor metrics, unknown ratio, and storage overhead.
- Phase 2 – Enable for internal users:
  - Toggle UI on (feature flag for internal / beta users).
  - Collect feedback from security engineers and auditors.
- Phase 3 – General availability:
  - Enable UI for all.
  - Update customer‑facing documentation & audit guides.
7.3 Documentation
Owner: Docs / PM
- Short “Why this edge exists” section in:
  - Product docs (for customers).
  - Internal runbooks (for support & SEs).
- Include:
  - Table of reasons → human descriptions.
  - Examples of path explanations (e.g., “This edge exists because app declares urllib3 in requirements.txt and calls it in client.py:42”).
8. Ready‑to‑use ticket breakdown
You can almost copy‑paste these into your tracker:
- Shared: Define EdgeReason, EdgeVia & EdgeProvenance in shared library, plus EdgeViaFactory.
- SBOM: Use EdgeFactory.DeclaredDependency for all manifest‑generated edges.
- Source: Wire all callgraph edges to SourceEdgeFactory (static_call, dynamic_import, reflection_call, plugin_discovery, taint_propagation).
- Binary: Wire relocations/PLT/GOT edges to BinaryEdgeFactory (symbol_relocation, plt_got_resolution, ld_preload_injection).
- Data: Add via_* columns/properties to graph_edges storage and map to/from domain.
- API: Extend graph path DTOs to include via, update OpenAPI, and implement /v2 endpoints if needed.
- UI: Show edge reason, evidence, and provenance in vulnerability path screens and add filters.
- Testing: Add unit, integration, and end‑to‑end tests ensuring every edge has non‑null via.
- Observability: Add metrics and logs for edge reasons and unknown rates.
- Docs & rollout: Write glossary + auditor docs and plan staged rollout.
If you tell me a bit about your current storage (e.g., Neo4j vs SQL) and the services’ names, I can tailor this into an even more literal set of code snippets and migrations to match your stack exactly.