up

This commit is contained in:
master
2025-11-27 15:05:48 +02:00
parent 4831c7fcb0
commit e950474a77
278 changed files with 81498 additions and 672 deletions


@@ -0,0 +1,799 @@
Here's a quick win for making your vuln paths auditor-friendly without retraining any models: **add a plain-language `reason` to every graph edge** (why this edge exists). Think “introduced via dynamic import” or “symbol relocation via `ld`”, not jargon soup.
![A simple vulnerability path showing edges labeled with reasons like "imported at runtime" and "linked via ld".](https://images.unsplash.com/photo-1515879218367-8466d910aaa4?ixlib=rb-4.0.3\&q=80\&fm=jpg\&fit=crop\&w=1600\&h=900)
# Why this helps
* **Explains reachability** at a glance (auditors & devs can follow the story).
* **Reduces false-positive fights** (every hop justifies itself).
* **Stable across languages** (no model changes, just metadata).
# Minimal schema change
Add a `via` object with three fields (`reason`, `evidence`, `provenance`) to every edge in your call/dep graph (SBOM → Reachability → Fix plan):
```json
{
  "from": "pkg:pypi/requests@2.32.3#requests.sessions.Session.request",
  "to": "pkg:pypi/urllib3@2.2.3#urllib3.connectionpool.HTTPConnectionPool.urlopen",
  "via": {
    "reason": "declared_dependency",
    "evidence": [
      "import urllib3 in requests/adapters.py:12",
      "pip freeze: urllib3==2.2.3"
    ],
    "provenance": {
      "detector": "StellaOps.Scanner.WebService@1.4.2",
      "rule_id": "PY-IMPORT-001",
      "confidence": "high"
    }
  }
}
```
### Standard reason glossary (use as enum)
* `declared_dependency` (manifest lock/SBOM edge)
* `static_call` (direct call site with symbol ref)
* `dynamic_import` (e.g., `__import__`, `importlib`, `require(...)`)
* `reflection_call` (C# `MethodInfo.Invoke`, Java reflection)
* `plugin_discovery` (entry points, ServiceLoader, MEF)
* `symbol_relocation` (ELF/PE/Mach-O relocation binds)
* `plt_got_resolution` (ELF PLT/GOT jump to symbol)
* `ld_preload_injection` (runtime injected .so/.dll)
* `env_config_path` (path read from env/config enables load)
* `taint_propagation` (user input reaches sink)
* `vendor_patch_alias` (function moved/aliased across versions)
# Emission rules (keep it deterministic)
* **One reason per edge**, short, lowercase snake_case from glossary.
* **Up to 3 evidence strings** (file:line or binary section + symbol).
* **Confidence**: `high|medium|low` with a single, stable rubric:
  * high = exact symbol/call site or relocation
  * medium = heuristic import/loader path
  * low = inferred from naming or optional plugin
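To keep the rubric deterministic, it can help to encode it once instead of letting each detector pick a confidence ad hoc. The following is a minimal sketch under stated assumptions: the `DetectionBasis` enum and `ConfidenceRubric` class are hypothetical names invented for illustration, and only the three confidence strings come from the rules above.

```csharp
using System;

// Hypothetical classification of how an edge was detected (illustrative names only).
public enum DetectionBasis
{
    ExactSymbolOrCallSite,
    Relocation,
    HeuristicImportOrLoaderPath,
    InferredFromNaming,
    OptionalPlugin
}

public static class ConfidenceRubric
{
    // Maps a detection basis to the rubric's confidence string.
    public static string For(DetectionBasis basis) => basis switch
    {
        DetectionBasis.ExactSymbolOrCallSite => "high",
        DetectionBasis.Relocation => "high",
        DetectionBasis.HeuristicImportOrLoaderPath => "medium",
        DetectionBasis.InferredFromNaming => "low",
        DetectionBasis.OptionalPlugin => "low",
        _ => throw new ArgumentOutOfRangeException(nameof(basis), basis, "Unknown detection basis")
    };
}
```

A detector would then call something like `ConfidenceRubric.For(DetectionBasis.Relocation)` rather than hard-coding `"high"`, so a rubric change lands in one place.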
# UI/Report snippet
Render paths like:
```
app → requests → urllib3 → OpenSSL EVP_PKEY_new_raw_private_key
• declared_dependency (poetry.lock)
• static_call (requests.adapters:345)
• symbol_relocation (ELF .rela.plt: _EVP_PKEY_new_raw_private_key)
```
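A renderer for this layout can be very small. The sketch below is one possible shape, not an existing StellaOps API: `PathHop` and `PathRenderer` are hypothetical names, and each hop is assumed to carry only its target, its reason, and its first evidence string.

```csharp
using System.Collections.Generic;
using System.Text;

// Illustrative shape: one hop of a path with its reason and primary evidence.
public sealed record PathHop(string To, string Reason, string PrimaryEvidence);

public static class PathRenderer
{
    // Renders "root → a → b" on the first line, then one bullet per hop.
    public static string Render(string root, IReadOnlyList<PathHop> hops)
    {
        var sb = new StringBuilder();
        sb.Append(root);
        foreach (var hop in hops)
            sb.Append(" → ").Append(hop.To);
        sb.Append('\n');

        foreach (var hop in hops)
        {
            sb.Append("  • ").Append(hop.Reason)
              .Append(" (").Append(hop.PrimaryEvidence).Append(")\n");
        }
        return sb.ToString();
    }
}
```

Keeping the bullets in hop order means the nth bullet always explains the nth arrow, which is what lets an auditor read the path as a story.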
# C# drop-in (for your .NET 10 code)
Edge builder with reason/evidence:
```csharp
using System.Collections.Generic;

public sealed record EdgeId(string From, string To);

public sealed record EdgeEvidence(
    string Reason,                   // enum string from glossary
    IReadOnlyList<string> Evidence,  // file:line, symbol, section
    string Confidence,               // high|medium|low
    string Detector,                 // component@version
    string RuleId                    // stable rule key
);

public sealed record GraphEdge(EdgeId Id, EdgeEvidence Via);

public static class EdgeFactory
{
    // Edge backed by a manifest or lockfile entry.
    public static GraphEdge DeclaredDependency(string from, string to, string manifestPath)
        => new(new EdgeId(from, to),
            new EdgeEvidence(
                Reason: "declared_dependency",
                Evidence: new[] { $"manifest:{manifestPath}" },
                Confidence: "high",
                Detector: "StellaOps.Scanner.WebService@1.0.0",
                RuleId: "DEP-LOCK-001"));

    // Edge backed by a relocation entry in a binary.
    public static GraphEdge SymbolRelocation(string from, string to, string objPath, string section, string symbol)
        => new(new EdgeId(from, to),
            new EdgeEvidence(
                Reason: "symbol_relocation",
                Evidence: new[] { $"{objPath}::{section}:{symbol}" },
                Confidence: "high",
                Detector: "StellaOps.Scanner.WebService@1.0.0",
                RuleId: "BIN-RELOC-101"));
}
```
# Integration checklist (fast path)
* Emit `via.reason/evidence/provenance` for **all** edges (SBOM, source, binary).
* Validate `reason` against glossary; reject free-text.
* Add a “**Why this edge exists**” column in your path tables.
* In JSON/CSV exports, keep columns: `from,to,reason,confidence,evidence0..2,rule_id`.
* In the console, collapse evidence by default; expand on click.
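For the CSV export, the column list above implies a fixed-width row even when an edge has fewer than three evidence strings. A minimal sketch, assuming RFC 4180 style quoting and a hypothetical `EdgeCsv` helper (not an existing StellaOps type):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EdgeCsv
{
    public const string Header = "from,to,reason,confidence,evidence0,evidence1,evidence2,rule_id";

    // Quotes a field when it contains a comma, quote, or line break; doubles embedded quotes.
    private static string Escape(string value)
    {
        if (value.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
            return value;
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    public static string Row(string from, string to, string reason, string confidence,
                             IReadOnlyList<string> evidence, string ruleId)
    {
        // Always emit exactly three evidence columns, padding with empty strings.
        var ev = evidence.Take(3).Concat(Enumerable.Repeat("", 3)).Take(3);
        var fields = new[] { from, to, reason, confidence }.Concat(ev).Append(ruleId);
        return string.Join(",", fields.Select(Escape));
    }
}
```

Padding to three evidence columns keeps every row aligned with the header, so spreadsheet tools and diff-based audits do not need special cases for sparse evidence.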
If you want, I'll plug this into your StellaOps graph contracts (Concelier/Cartographer) and produce the enum + validators and a tiny renderer for your docs.
Cool, let's turn this into a concrete, dev-friendly implementation plan you can actually hand to teams.
I'll structure it by phases and by component (schema, producers, APIs, UI, testing, rollout) so you can slice into tickets easily.
---
## 0. Recap of what we're building
**Goal:**
Every edge in your vuln path graph (SBOM → Reachability → Fix plan) carries **machine-readable, auditor-friendly metadata**:
```jsonc
{
  "from": "pkg:pypi/requests@2.32.3#requests.sessions.Session.request",
  "to": "pkg:pypi/urllib3@2.2.3#urllib3.connectionpool.HTTPConnectionPool.urlopen",
  "via": {
    "reason": "declared_dependency", // from a controlled enum
    "evidence": [
      "manifest:requirements.txt:3", // up to 3 short evidence strings
      "pip freeze: urllib3==2.2.3"
    ],
    "provenance": {
      "detector": "StellaOps.Scanner.WebService@1.4.2",
      "rule_id": "PY-IMPORT-001",
      "confidence": "high"
    }
  }
}
```
Standard **reason glossary** (enum):
* `declared_dependency`
* `static_call`
* `dynamic_import`
* `reflection_call`
* `plugin_discovery`
* `symbol_relocation`
* `plt_got_resolution`
* `ld_preload_injection`
* `env_config_path`
* `taint_propagation`
* `vendor_patch_alias`
* `unknown` (fallback only when you truly can't do better)
---
## 1. Design & contracts (shared work for backend & frontend)
### 1.1 Define the canonical edge metadata types
**Owner:** Platform / shared lib team
**Tasks:**
1. In your shared C# library (used by scanners + API), define:
```csharp
public enum EdgeReason
{
    Unknown = 0,
    DeclaredDependency,
    StaticCall,
    DynamicImport,
    ReflectionCall,
    PluginDiscovery,
    SymbolRelocation,
    PltGotResolution,
    LdPreloadInjection,
    EnvConfigPath,
    TaintPropagation,
    VendorPatchAlias
}

public enum EdgeConfidence
{
    Low = 0,
    Medium,
    High
}

public sealed record EdgeProvenance(
    string Detector,            // e.g., "StellaOps.Scanner.WebService@1.4.2"
    string RuleId,              // e.g., "PY-IMPORT-001"
    EdgeConfidence Confidence
);

public sealed record EdgeVia(
    EdgeReason Reason,
    IReadOnlyList<string> Evidence,
    EdgeProvenance Provenance
);

public sealed record EdgeId(string From, string To);

public sealed record GraphEdge(
    EdgeId Id,
    EdgeVia Via
);
```
2. Enforce **max 3 evidence strings** via a small helper to avoid accidental spam:
```csharp
using System.Collections.Generic;
using System.Linq;

public static class EdgeViaFactory
{
    private const int MaxEvidence = 3;

    public static EdgeVia Create(
        EdgeReason reason,
        IEnumerable<string> evidence,
        string detector,
        string ruleId,
        EdgeConfidence confidence
    )
    {
        // Drop blank entries, then keep at most MaxEvidence strings.
        var ev = evidence
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Take(MaxEvidence)
            .ToArray();

        return new EdgeVia(
            Reason: reason,
            Evidence: ev,
            Provenance: new EdgeProvenance(detector, ruleId, confidence)
        );
    }
}
```
**Acceptance criteria:**
* [ ] EdgeReason enum defined and shared in a reusable package.
* [ ] EdgeVia and EdgeProvenance types exist and are serializable to JSON.
* [ ] Evidence is capped to 3 entries and cannot be null (empty list allowed).
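The "serializable to JSON" criterion is easiest to check with a concrete serializer configuration. The sketch below assumes System.Text.Json on .NET 8 or later (for `JsonNamingPolicy.SnakeCaseLower`); the `EdgeJson` helper is a hypothetical name, and the types are repeated only so the snippet compiles on its own.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public enum EdgeReason
{
    Unknown = 0, DeclaredDependency, StaticCall, DynamicImport, ReflectionCall,
    PluginDiscovery, SymbolRelocation, PltGotResolution, LdPreloadInjection,
    EnvConfigPath, TaintPropagation, VendorPatchAlias
}

public enum EdgeConfidence { Low = 0, Medium, High }

public sealed record EdgeProvenance(string Detector, string RuleId, EdgeConfidence Confidence);

public sealed record EdgeVia(EdgeReason Reason, IReadOnlyList<string> Evidence, EdgeProvenance Provenance);

public static class EdgeJson
{
    // Assumption: snake_case for both property names ("rule_id") and enum values ("static_call").
    public static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower,
        Converters = { new JsonStringEnumConverter(JsonNamingPolicy.SnakeCaseLower) }
    };

    public static string Serialize(EdgeVia via) => JsonSerializer.Serialize(via, Options);
}
```

With this configuration the enum members are written as glossary strings rather than integers, so the wire format matches the JSON shown in the recap without a hand-written mapping per member.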
---
### 1.2 API / JSON contract
**Owner:** API team
**Tasks:**
1. Extend your existing graph edge DTO to include `via`:
```csharp
using System.Text.Json.Serialization;

public sealed record GraphEdgeDto
{
    public string From { get; init; } = default!;
    public string To { get; init; } = default!;
    public EdgeViaDto Via { get; init; } = default!;
}

public sealed record EdgeViaDto
{
    public string Reason { get; init; } = default!; // enum as string
    public string[] Evidence { get; init; } = Array.Empty<string>();
    public EdgeProvenanceDto Provenance { get; init; } = default!;
}

public sealed record EdgeProvenanceDto
{
    public string Detector { get; init; } = default!;

    // Assuming System.Text.Json: the contract uses "rule_id", while a
    // camelCase naming policy alone would emit "ruleId".
    [JsonPropertyName("rule_id")]
    public string RuleId { get; init; } = default!;

    public string Confidence { get; init; } = default!; // "high|medium|low"
}
```
2. Ensure JSON is **additive** (backward compatible):
   * `via` is **non-nullable** in responses from the new API version.
   * If you must keep a legacy endpoint, add **v2** endpoints that guarantee `via`.
3. Update OpenAPI spec:
* Document `via.reason` as enum string, including allowed values.
* Document `via.provenance.detector`, `rule_id`, `confidence`.
**Acceptance criteria:**
* [ ] OpenAPI / Swagger shows `via.reason` as a string enum + description.
* [ ] New clients can deserialize edges with `via` without custom hacks.
* [ ] Old clients remain unaffected (either keep old endpoint or allow them to ignore `via`).
---
## 2. Producers: add reasons & evidence where edges are created
You likely have 3 main edge producers:
* SBOM / manifest / lockfile analyzers
* Source analyzers (call graph, taint analysis)
* Binary analyzers (ELF/PE/Mach-O, containers)
Treat each as a mini-project with identical patterns.
---
### 2.1 SBOM / manifest edges
**Owner:** SBOM / dep graph team
**Tasks:**
1. Identify all code paths that create “declared dependency” edges:
* Manifest → Package
* Root module → Imported package (if you store these explicitly)
2. Replace plain edge construction with factory calls:
```csharp
public static class EdgeFactory
{
    private const string DetectorName = "StellaOps.Scanner.Sbom@1.0.0";

    public static GraphEdge DeclaredDependency(
        string from,
        string to,
        string manifestPath,
        string? dependencySpecLine
    )
    {
        var evidence = new List<string>
        {
            $"manifest:{manifestPath}"
        };

        if (!string.IsNullOrWhiteSpace(dependencySpecLine))
            evidence.Add($"spec:{dependencySpecLine}");

        var via = EdgeViaFactory.Create(
            EdgeReason.DeclaredDependency,
            evidence,
            DetectorName,
            "DEP-LOCK-001",
            EdgeConfidence.High
        );

        return new GraphEdge(new EdgeId(from, to), via);
    }
}
```
3. Make sure each SBOM/manifest edge sets:
* `reason = declared_dependency`
* `confidence = high`
* Evidence includes at least `manifest:<path>` and, if possible, line or spec snippet.
**Acceptance criteria:**
* [ ] Any SBOM-generated edge returns with `via.reason == declared_dependency`.
* [ ] Evidence contains manifest path for ≥ 99% of SBOM edges.
* [ ] Unit tests cover at least: normal manifest, multiple manifests, malformed manifest.
---
### 2.2 Source code call graph edges
**Owner:** Static analysis / call graph team
**Tasks:**
1. Map current edge types → reasons:
* Direct function/method calls → `static_call`
* Reflection (Java/C#) → `reflection_call`
* Dynamic imports (`__import__`, `importlib`, `require(...)`) → `dynamic_import`
* Plugin systems (entry points, ServiceLoader, MEF) → `plugin_discovery`
* Taint / dataflow edges (user input → sink) → `taint_propagation`
2. Implement helper factories:
```csharp
public static class SourceEdgeFactory
{
    private const string DetectorName = "StellaOps.Scanner.Source@1.0.0";

    public static GraphEdge StaticCall(
        string fromSymbol,
        string toSymbol,
        string filePath,
        int lineNumber
    )
    {
        var evidence = new[]
        {
            $"callsite:{filePath}:{lineNumber}"
        };

        var via = EdgeViaFactory.Create(
            EdgeReason.StaticCall,
            evidence,
            DetectorName,
            "SRC-CALL-001",
            EdgeConfidence.High
        );

        return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
    }

    public static GraphEdge DynamicImport(
        string fromSymbol,
        string toSymbol,
        string filePath,
        int lineNumber
    )
    {
        var via = EdgeViaFactory.Create(
            EdgeReason.DynamicImport,
            new[] { $"importsite:{filePath}:{lineNumber}" },
            DetectorName,
            "SRC-DYNIMPORT-001",
            EdgeConfidence.Medium
        );

        return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
    }

    // Similar for ReflectionCall, PluginDiscovery, TaintPropagation...
}
```
3. Replace all direct `new GraphEdge(...)` calls in source analyzers with these factories.
**Acceptance criteria:**
* [ ] Direct call edges produce `reason = static_call` with file:line evidence.
* [ ] Reflection/dynamic import edges use correct reasons and mark `confidence = medium` (or high where you're certain).
* [ ] Unit tests check that for a known source file, the resulting edges contain expected `reason`, `evidence`, and `rule_id`.
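The remaining source reasons can follow the same factory pattern. The sketch below is an assumption-heavy illustration: the rule IDs (`SRC-REFLECT-001`, `SRC-PLUGIN-001`, `SRC-TAINT-001`), the evidence prefixes, and the class name `SourceEdgeFactoryExtras` are all hypothetical, and the types are trimmed copies of the shared ones so the snippet compiles on its own. Confidence defaults follow the rubric (medium for reflection, low for optional plugins); taint confidence is left to the caller.

```csharp
using System.Collections.Generic;

// Trimmed copies of the shared types, repeated so this sketch is self-contained.
public enum EdgeReason { Unknown = 0, ReflectionCall, PluginDiscovery, TaintPropagation }
public enum EdgeConfidence { Low = 0, Medium, High }
public sealed record EdgeProvenance(string Detector, string RuleId, EdgeConfidence Confidence);
public sealed record EdgeVia(EdgeReason Reason, IReadOnlyList<string> Evidence, EdgeProvenance Provenance);
public sealed record EdgeId(string From, string To);
public sealed record GraphEdge(EdgeId Id, EdgeVia Via);

public static class SourceEdgeFactoryExtras
{
    private const string DetectorName = "StellaOps.Scanner.Source@1.0.0";

    public static GraphEdge ReflectionCall(string fromSymbol, string toSymbol, string filePath, int lineNumber)
        => Build(fromSymbol, toSymbol, EdgeReason.ReflectionCall,
                 $"reflectsite:{filePath}:{lineNumber}", "SRC-REFLECT-001", EdgeConfidence.Medium);

    public static GraphEdge PluginDiscovery(string fromSymbol, string toSymbol, string registrationPath)
        => Build(fromSymbol, toSymbol, EdgeReason.PluginDiscovery,
                 $"plugin:{registrationPath}", "SRC-PLUGIN-001", EdgeConfidence.Low);

    public static GraphEdge TaintPropagation(string sourceSymbol, string sinkSymbol,
                                             string filePath, int lineNumber, EdgeConfidence confidence)
        => Build(sourceSymbol, sinkSymbol, EdgeReason.TaintPropagation,
                 $"sink:{filePath}:{lineNumber}", "SRC-TAINT-001", confidence);

    // Shared construction path so every edge gets reason, evidence, and provenance.
    private static GraphEdge Build(string from, string to, EdgeReason reason,
                                   string evidence, string ruleId, EdgeConfidence confidence)
        => new(new EdgeId(from, to),
               new EdgeVia(reason, new[] { evidence }, new EdgeProvenance(DetectorName, ruleId, confidence)));
}
```

Routing every factory through one private `Build` method makes it hard to emit an edge that is missing provenance, which is the property the acceptance criteria care about.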
---
### 2.3 Binary / container analyzers
**Owner:** Binary analysis / SCA team
**Tasks:**
1. Map binary features to reasons:
* Symbol relocations + PLT/GOT edges → `symbol_relocation` or `plt_got_resolution`
* LD_PRELOAD or injection edges → `ld_preload_injection`
2. Implement factory:
```csharp
public static class BinaryEdgeFactory
{
    private const string DetectorName = "StellaOps.Scanner.Binary@1.0.0";

    public static GraphEdge SymbolRelocation(
        string fromSymbol,
        string toSymbol,
        string binaryPath,
        string section,
        string relocationName
    )
    {
        var evidence = new[]
        {
            $"{binaryPath}::{section}:{relocationName}"
        };

        var via = EdgeViaFactory.Create(
            EdgeReason.SymbolRelocation,
            evidence,
            DetectorName,
            "BIN-RELOC-101",
            EdgeConfidence.High
        );

        return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
    }
}
```
3. Wire up all binary edge creation to use this.
**Acceptance criteria:**
* [ ] For a test binary with a known relocation, edges include `reason = symbol_relocation` and section/symbol in evidence.
* [ ] No binary edge is created without `via`.
---
## 3. Storage & migrations
This depends on your backing store, but the pattern is similar.
### 3.1 Relational (SQL) example
**Owner:** Data / infra team
**Tasks:**
1. Add columns (PostgreSQL syntax shown; `JSONB` and the `::jsonb` cast are PostgreSQL-specific):
```sql
ALTER TABLE graph_edges
    ADD COLUMN via_reason     VARCHAR(64)  NOT NULL DEFAULT 'unknown',
    ADD COLUMN via_evidence   JSONB        NOT NULL DEFAULT '[]'::jsonb,
    ADD COLUMN via_detector   VARCHAR(255) NOT NULL DEFAULT 'unknown',
    ADD COLUMN via_rule_id    VARCHAR(128) NOT NULL DEFAULT 'unknown',
    ADD COLUMN via_confidence VARCHAR(16)  NOT NULL DEFAULT 'low';
```
2. Update ORM model:
```csharp
public class EdgeEntity
{
    public string From { get; set; } = default!;
    public string To { get; set; } = default!;
    public string ViaReason { get; set; } = "unknown";
    public string[] ViaEvidence { get; set; } = Array.Empty<string>();
    public string ViaDetector { get; set; } = "unknown";
    public string ViaRuleId { get; set; } = "unknown";
    public string ViaConfidence { get; set; } = "low";
}
```
3. Add mapping to domain `GraphEdge`:
```csharp
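public static class EdgeEntityMappings
{
    public static GraphEdge ToDomain(this EdgeEntity e)
    {
        // Stored reasons are snake_case ("static_call"). Strip underscores so the
        // case-insensitive parse matches the PascalCase enum names ("StaticCall");
        // otherwise every multi-word reason would silently fall back to Unknown.
        var reasonName = (e.ViaReason ?? string.Empty).Replace("_", string.Empty);

        var via = new EdgeVia(
            Reason: Enum.TryParse<EdgeReason>(reasonName, true, out var r) ? r : EdgeReason.Unknown,
            Evidence: e.ViaEvidence,
            Provenance: new EdgeProvenance(
                Detector: e.ViaDetector,
                RuleId: e.ViaRuleId,
                Confidence: Enum.TryParse<EdgeConfidence>(e.ViaConfidence, true, out var c) ? c : EdgeConfidence.Low
            )
        );

        return new GraphEdge(new EdgeId(e.From, e.To), via);
    }
}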
```
4. **Backfill existing data** (optional but recommended):
   * For edges with a known “type” column, map to best-fit `reason`.
   * If you can't infer: set `reason = unknown`, `confidence = low`, `detector = "backfill@<version>"`.
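The backfill mapping can be expressed as a small lookup with the documented fallback. This is a sketch under stated assumptions: the legacy type values (`dependency`, `call`, `relocation`) are hypothetical placeholders for whatever your existing “type” column holds, and using `medium` for mapped rows is an assumption, since the plan only specifies `low` for the unknown case.

```csharp
using System;
using System.Collections.Generic;

public static class EdgeReasonBackfill
{
    // Hypothetical legacy "type" values; replace with the values your column actually holds.
    private static readonly Dictionary<string, string> LegacyTypeToReason =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["dependency"] = "declared_dependency",
            ["call"] = "static_call",
            ["relocation"] = "symbol_relocation"
        };

    public static (string Reason, string Confidence, string Detector) Map(string? legacyType, string backfillVersion)
    {
        var detector = $"backfill@{backfillVersion}";

        if (legacyType is not null && LegacyTypeToReason.TryGetValue(legacyType.Trim(), out var reason))
            return (reason, "medium", detector); // assumption: inferred rows are not "high"

        // Documented fallback when nothing can be inferred.
        return ("unknown", "low", detector);
    }
}
```

Tagging the detector as `backfill@<version>` keeps migrated rows distinguishable from edges that a scanner actually observed, which matters when an auditor asks where a reason came from.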
**Acceptance criteria:**
* [ ] DB migration runs cleanly in staging and prod.
* [ ] No existing reader breaks: default values keep queries functioning.
* [ ] Edge round-trip (domain → DB → API JSON) retains `via` fields correctly.
---
## 4. API & service layer
**Owner:** API / service team
**Tasks:**
1. Wire domain model → DTOs:
```csharp
public static class GraphEdgeMappings
{
    public static GraphEdgeDto ToDto(this GraphEdge edge)
    {
        return new GraphEdgeDto
        {
            From = edge.Id.From,
            To = edge.Id.To,
            Via = new EdgeViaDto
            {
                // ToSnakeCaseLower() is a custom string extension you need to supply;
                // it is not part of the .NET base class library.
                Reason = edge.Via.Reason.ToString().ToSnakeCaseLower(), // e.g. "static_call"
                Evidence = edge.Via.Evidence.ToArray(),
                Provenance = new EdgeProvenanceDto
                {
                    Detector = edge.Via.Provenance.Detector,
                    RuleId = edge.Via.Provenance.RuleId,
                    Confidence = edge.Via.Provenance.Confidence.ToString().ToLowerInvariant()
                }
            }
        };
    }
}
```
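The mapping above calls `ToSnakeCaseLower()`, which is not a built-in .NET string method. One possible implementation is sketched below; the class name `StringCaseExtensions` is hypothetical, and the logic assumes PascalCase enum member names without acronym runs.

```csharp
using System.Text;

public static class StringCaseExtensions
{
    // Converts PascalCase names such as "PltGotResolution" to "plt_got_resolution".
    // Inserts an underscore before every uppercase letter except the first character.
    public static string ToSnakeCaseLower(this string value)
    {
        if (string.IsNullOrEmpty(value))
            return value;

        var sb = new StringBuilder(value.Length + 4);
        for (var i = 0; i < value.Length; i++)
        {
            var c = value[i];
            if (char.IsUpper(c))
            {
                if (i > 0)
                    sb.Append('_');
                sb.Append(char.ToLowerInvariant(c));
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Because every uppercase letter starts a new word, a member named with an acronym run (for example `HTTPCall`) would come out as `h_t_t_p_call`; keeping enum members in strict PascalCase, as the `EdgeReason` enum does, avoids that.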
2. If you accept edges via API (internal services), validate:
* `reason` must be one of the known values; otherwise reject or coerce to `unknown`.
* `evidence` length ≤ 3.
* Trim whitespace and limit each evidence string length (e.g. 256 chars).
3. Versioning:
* Introduce `/v2/graph/paths` (or similar) that guarantees `via`.
* Keep `/v1/...` unchanged or mark deprecated.
**Acceptance criteria:**
* [ ] Path API returns `via.reason` and `via.evidence` for all edges in new endpoints.
* [ ] Invalid reason strings are rejected or converted to `unknown` with a log.
* [ ] Integration tests cover full flow: repo → scanner → DB → API → JSON.
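The validation rules in task 2 can be sketched as a small static helper. This is a minimal illustration that takes the "reject" branch and returns a list of errors; `EdgeViaValidator` is a hypothetical name, and coercing to `unknown` with a log entry would be an equally valid reading of the rule.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EdgeViaValidator
{
    private const int MaxEvidence = 3;
    private const int MaxEvidenceLength = 256;

    private static readonly HashSet<string> KnownReasons = new(StringComparer.Ordinal)
    {
        "declared_dependency", "static_call", "dynamic_import", "reflection_call",
        "plugin_discovery", "symbol_relocation", "plt_got_resolution",
        "ld_preload_injection", "env_config_path", "taint_propagation",
        "vendor_patch_alias", "unknown"
    };

    // Returns an empty list when the payload is acceptable.
    public static IReadOnlyList<string> Validate(string? reason, IReadOnlyList<string>? evidence)
    {
        var errors = new List<string>();

        if (reason is null || !KnownReasons.Contains(reason))
            errors.Add($"unknown reason '{reason}'");

        if (evidence is null)
            errors.Add("evidence must not be null");
        else if (evidence.Count > MaxEvidence)
            errors.Add($"evidence has {evidence.Count} entries; maximum is {MaxEvidence}");

        return errors;
    }

    // Drops blank entries, trims whitespace, and truncates overly long strings.
    public static string[] CleanEvidence(IEnumerable<string> evidence) =>
        evidence
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(s => s.Trim())
            .Select(s => s.Length <= MaxEvidenceLength ? s : s.Substring(0, MaxEvidenceLength))
            .ToArray();
}
```

Running `CleanEvidence` before `Validate` means blank strings never count against the three-entry limit.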
---
## 5. UI: make paths auditor-friendly
**Owner:** Frontend team
**Tasks:**
1. **Path details UI**:
   For each edge in the vulnerability path table:
   * Show a **“Reason” column** with a small pill:
     * `static_call` → “Static call”
     * `declared_dependency` → “Declared dependency”
     * etc.
   * Below or on hover, show **primary evidence** (first evidence string).
2. **Edge details panel** (drawer/modal):
   When user clicks an edge:
   * Show:
     * From → To (symbols/packages)
     * Reason (with friendly description per enum)
     * Evidence list (each on its own line)
     * Detector, rule id, confidence
3. **Filtering & sorting (optional but powerful)**:
   * Filter edges by `reason` (multi-select).
   * Filter by `confidence` (e.g. show only high/medium).
   * This helps auditors quickly isolate more speculative edges.
4. **UX text / glossary**:
   * Add a small “?” tooltip that links to a glossary explaining each reason type in human language.
**Acceptance criteria:**
* [ ] For a given vulnerability, the path view shows a “Reason” column per edge.
* [ ] Clicking an edge reveals all evidence and provenance information.
* [ ] UX has a glossary/tooltip explaining what each reason means in plain English.
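The pill labels and glossary text can come from a single lookup table so the UI and the docs never drift apart. The sketch below is written in C# to match the rest of this plan; a frontend would port the table to its own language. `ReasonLabels` is a hypothetical name, the descriptions paraphrase the glossary, and all labels other than “Static call” and “Declared dependency” are assumed by analogy.

```csharp
using System.Collections.Generic;

public static class ReasonLabels
{
    private static readonly Dictionary<string, (string Label, string Description)> Map = new()
    {
        ["declared_dependency"] = ("Declared dependency", "Listed in a manifest, lockfile, or SBOM"),
        ["static_call"] = ("Static call", "Direct call site with a symbol reference"),
        ["dynamic_import"] = ("Dynamic import", "Loaded at runtime, e.g. via importlib or require(...)"),
        ["reflection_call"] = ("Reflection call", "Invoked through reflection"),
        ["plugin_discovery"] = ("Plugin discovery", "Found via entry points, ServiceLoader, or MEF"),
        ["symbol_relocation"] = ("Symbol relocation", "Bound by an ELF/PE/Mach-O relocation"),
        ["plt_got_resolution"] = ("PLT/GOT resolution", "ELF PLT/GOT jump to the symbol"),
        ["ld_preload_injection"] = ("Preload injection", "Library injected at runtime"),
        ["env_config_path"] = ("Env/config path", "A path read from env or config enables the load"),
        ["taint_propagation"] = ("Taint propagation", "User input reaches a sink"),
        ["vendor_patch_alias"] = ("Vendor patch alias", "Function moved or aliased across versions")
    };

    // Falls back to a generic entry for "unknown" or unrecognized reasons.
    public static (string Label, string Description) For(string reason) =>
        Map.TryGetValue(reason, out var entry)
            ? entry
            : ("Unknown", "No more specific reason could be determined");
}
```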
---
## 6. Testing strategy
**Owner:** QA + each feature team
### 6.1 Unit tests
* **Factories**: verify correct mapping from input to `EdgeVia`:
* Reason set correctly.
* Evidence trimmed, max 3.
* Confidence matches rubric (high for relocations, medium for heuristic imports, etc.).
* **Serialization**: `EdgeVia` → JSON and back.
### 6.2 Integration tests
Set up **small fixtures**:
1. **Simple dependency project**:
* Example: Python project with `requirements.txt` → `requests` → `urllib3`.
* Expected edges:
* App → requests: `declared_dependency`, evidence includes `requirements.txt`.
* requests → urllib3: `declared_dependency`, plus static call edges.
2. **Dynamic import case**:
* A module using `importlib.import_module("mod")`.
* Ensure edge is `dynamic_import` with `confidence = medium`.
3. **Binary edge case**:
* Test ELF with known symbol relocation.
* Ensure an edge with `reason = symbol_relocation` exists.
### 6.3 End-to-end tests
* Run full scan on a sample repo and:
* Hit path API.
* Assert every edge has non-null `via` fields.
* Spot check a few known edges for exact `reason` and evidence.
**Acceptance criteria:**
* [ ] Automated tests fail if any edge is emitted without `via`.
* [ ] Coverage includes at least one example for each `EdgeReason` you support.
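The coverage criterion can be checked mechanically by comparing the reasons observed in fixture output against the enum. A minimal sketch, assuming .NET 5 or later for `Enum.GetValues<T>()`; `EdgeCoverage` is a hypothetical helper name, and the enum is repeated so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum EdgeReason
{
    Unknown = 0, DeclaredDependency, StaticCall, DynamicImport, ReflectionCall,
    PluginDiscovery, SymbolRelocation, PltGotResolution, LdPreloadInjection,
    EnvConfigPath, TaintPropagation, VendorPatchAlias
}

public static class EdgeCoverage
{
    // Returns every reason that no fixture edge exercised.
    // Unknown is excluded because it is a fallback, not a detector output to aim for.
    public static IReadOnlyList<EdgeReason> MissingReasons(IEnumerable<EdgeReason> observed)
    {
        var seen = new HashSet<EdgeReason>(observed);
        return Enum.GetValues<EdgeReason>()
            .Where(r => r != EdgeReason.Unknown && !seen.Contains(r))
            .ToList();
    }
}
```

A test in whichever framework you use can then assert that `MissingReasons(...)` is empty, so adding a new enum member without a fixture fails the build.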
---
## 7. Observability, guardrails & rollout
### 7.1 Metrics & logging
**Owner:** Observability / platform
**Tasks:**
* Emit metrics:
* `% edges with reason != unknown`
* Count by `reason` and `confidence`
* Log warnings when:
* Edge is emitted with `reason = unknown`.
* Evidence is empty for a non-unknown reason.
**Acceptance criteria:**
* [ ] Dashboards showing distribution of edge reasons over time.
* [ ] Alerts if `unknown` reason edges exceed a threshold (e.g. >5%).
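The two metrics and the alert threshold reduce to a count-by-key plus one ratio. The sketch below is illustrative only: `ReasonStats` and `EdgeReasonMetrics` are hypothetical names, reasons are assumed to arrive as glossary strings, and the 5% default mirrors the example threshold above rather than a recommended value.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed record ReasonStats(
    int Total,
    IReadOnlyDictionary<string, int> CountByReason,
    double UnknownRatio);

public static class EdgeReasonMetrics
{
    public static ReasonStats Compute(IEnumerable<string> reasons)
    {
        // Count edges per reason string.
        var counts = reasons
            .GroupBy(r => r)
            .ToDictionary(g => g.Key, g => g.Count());

        var total = counts.Values.Sum();

        // TryGetValue leaves 'unknown' at 0 when no such edges exist.
        counts.TryGetValue("unknown", out var unknown);
        var ratio = total == 0 ? 0.0 : (double)unknown / total;

        return new ReasonStats(total, counts, ratio);
    }

    // True when the share of unknown-reason edges exceeds the threshold.
    public static bool ShouldAlert(ReasonStats stats, double threshold = 0.05) =>
        stats.UnknownRatio > threshold;
}
```

The dashboard metric `% edges with reason != unknown` is then `1 - UnknownRatio`, so both views come from the same computation.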
---
### 7.2 Rollout plan
**Owner:** PM + tech leads
**Steps:**
1. **Phase 1 (dark-launch metadata):**
   * Start generating & storing `via` for new scans.
   * Keep UI unchanged.
   * Monitor metrics, unknown ratio, and storage overhead.
2. **Phase 2 (enable for internal users):**
   * Toggle UI on (feature flag for internal / beta users).
   * Collect feedback from security engineers and auditors.
3. **Phase 3 (general availability):**
   * Enable UI for all.
   * Update customer-facing documentation & audit guides.
---
### 7.3 Documentation
**Owner:** Docs / PM
* Short **“Why this edge exists”** section in:
* Product docs (for customers).
* Internal runbooks (for support & SEs).
* Include:
* Table of reasons → human descriptions.
* Examples of path explanations (e.g., “This edge exists because `app` declares `urllib3` in `requirements.txt` and calls it in `client.py:42`”).
---
## 8. Ready-to-use ticket breakdown
You can almost copy-paste these into your tracker:
1. **Shared**: Define EdgeReason, EdgeVia & EdgeProvenance in shared library, plus EdgeViaFactory.
2. **SBOM**: Use EdgeFactory.DeclaredDependency for all manifest-generated edges.
3. **Source**: Wire all call-graph edges to SourceEdgeFactory (static_call, dynamic_import, reflection_call, plugin_discovery, taint_propagation).
4. **Binary**: Wire relocations/PLT/GOT edges to BinaryEdgeFactory (symbol_relocation, plt_got_resolution, ld_preload_injection).
5. **Data**: Add via_* columns/properties to graph_edges storage and map to/from domain.
6. **API**: Extend graph path DTOs to include `via`, update OpenAPI, and implement /v2 endpoints if needed.
7. **UI**: Show edge reason, evidence, and provenance in vulnerability path screens and add filters.
8. **Testing**: Add unit, integration, and end-to-end tests ensuring every edge has non-null `via`.
9. **Observability**: Add metrics and logs for edge reasons and unknown rates.
10. **Docs & rollout**: Write glossary + auditor docs and plan staged rollout.
---
If you tell me a bit about your current storage (e.g., Neo4j vs SQL) and the service names, I can tailor this into an even more literal set of code snippets and migrations to match your stack exactly.