stella-ops.org/git.stella-ops.org

master e950474a77

2025-11-27 15:16:31 +02:00

Here’s a quick win for making your vuln paths auditor‑friendly without retraining any models: add a plain‑language reason to every graph edge (why this edge exists). Think “introduced via dynamic import” or “symbol relocation via ld”, not jargon soup.

Why this helps

Explains reachability at a glance (auditors & devs can follow the story).
Reduces false‑positive fights (every hop justifies itself).
Stable across languages (no model changes, just metadata).

Minimal schema change

Add a via object with three fields (reason, evidence, provenance) to every edge in your call/dep graph (SBOM→Reachability→Fix plan):

{
  "from": "pkg:pypi/requests@2.32.3#requests.sessions.Session.request",
  "to":   "pkg:pypi/urllib3@2.2.3#urllib3.connectionpool.HTTPConnectionPool.urlopen",
  "via": {
    "reason": "declared_dependency",
    "evidence": [
      "import urllib3 in requests/adapters.py:12",
      "pip freeze: urllib3==2.2.3"
    ],
    "provenance": {
      "detector": "StellaOps.Scanner.WebService@1.4.2",
      "rule_id": "PY-IMPORT-001",
      "confidence": "high"
    }
  }
}

Standard reason glossary (use as enum)

declared_dependency (manifest lock/SBOM edge)
static_call (direct call site with symbol ref)
dynamic_import (e.g., __import__, importlib, require(...))
reflection_call (C# MethodInfo.Invoke, Java reflection)
plugin_discovery (entry points, ServiceLoader, MEF)
symbol_relocation (ELF/PE/Mach‑O relocation binds)
plt_got_resolution (ELF PLT/GOT jump to symbol)
ld_preload_injection (runtime injected .so/.dll)
env_config_path (path read from env/config enables load)
taint_propagation (user input reaches sink)
vendor_patch_alias (function moved/aliased across versions)

Emission rules (keep it deterministic)

One reason per edge, short, lowercase snake_case from glossary.
Up to 3 evidence strings (file:line or binary section + symbol).
Confidence: high|medium|low with a single, stable rubric:
- high = exact symbol/call site or relocation
- medium = heuristic import/loader path
- low = inferred from naming or optional plugin

UI/Report snippet

Render paths like:

app → requests → urllib3 → OpenSSL EVP_PKEY_new_raw_private_key
  • declared_dependency (poetry.lock)
  • static_call (requests.adapters:345)
  • symbol_relocation (ELF .rela.plt: EVP_PKEY_new_raw_private_key)
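
A rendering like the one above could be produced by a very small helper. This is a sketch only: PathHop and PathRenderer are hypothetical names, and it assumes each hop carries the node it arrives at plus the first evidence string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape: one hop of a path and the node it arrives at.
public sealed record PathHop(string To, string Reason, string PrimaryEvidence);

public static class PathRenderer
{
    // First line is the node chain; then one bullet per hop, in order.
    public static string Render(string root, IReadOnlyList<PathHop> hops)
    {
        var nodes = new[] { root }.Concat(hops.Select(h => h.To));
        var lines = new List<string> { string.Join(" → ", nodes) };
        lines.AddRange(hops.Select(h => $"  • {h.Reason} ({h.PrimaryEvidence})"));
        return string.Join("\n", lines);
    }
}
```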

C# drop‑in (for your .NET 10 code)

Edge builder with reason/evidence:

public sealed record EdgeId(string From, string To);

public sealed record EdgeEvidence(
    string Reason,                  // enum string from glossary
    IReadOnlyList<string> Evidence, // file:line, symbol, section
    string Confidence,              // high|medium|low
    string Detector,                // component@version
    string RuleId                   // stable rule key
);

public sealed record GraphEdge(EdgeId Id, EdgeEvidence Via);

public static class EdgeFactory
{
    public static GraphEdge DeclaredDependency(string from, string to, string manifestPath)
        => new(new EdgeId(from, to),
               new EdgeEvidence(
                   Reason: "declared_dependency",
                   Evidence: new[] { $"manifest:{manifestPath}" },
                   Confidence: "high",
                   Detector: "StellaOps.Scanner.WebService@1.0.0",
                   RuleId: "DEP-LOCK-001"));

    public static GraphEdge SymbolRelocation(string from, string to, string objPath, string section, string symbol)
        => new(new EdgeId(from, to),
               new EdgeEvidence(
                   Reason: "symbol_relocation",
                   Evidence: new[] { $"{objPath}::{section}:{symbol}" },
                   Confidence: "high",
                   Detector: "StellaOps.Scanner.WebService@1.0.0",
                   RuleId: "BIN-RELOC-101"));
}

Integration checklist (fast path)

Emit via.reason/evidence/provenance for all edges (SBOM, source, binary).
Validate reason against glossary; reject free‑text.
Add a “Why this edge exists” column in your path tables.
In JSON/CSV exports, keep columns: from,to,reason,confidence,evidence0..2,rule_id.
In the console, collapse evidence by default; expand on click.
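
For the CSV export, the fixed evidence0..2 columns mean each row has to be padded to the same width. A minimal sketch, assuming a hypothetical EdgeCsv helper and RFC 4180 style quoting:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EdgeCsv
{
    public const string Header =
        "from,to,reason,confidence,evidence0,evidence1,evidence2,rule_id";

    // Quotes a field when it contains a comma, quote, or line break.
    private static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
            return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    public static string Row(
        string from, string to, string reason, string confidence,
        IReadOnlyList<string> evidence, string ruleId)
    {
        // Pad evidence to exactly three columns so every row has the same width.
        var ev = evidence.Take(3).Concat(Enumerable.Repeat("", 3)).Take(3);
        var fields = new[] { from, to, reason, confidence }
            .Concat(ev)
            .Append(ruleId);
        return string.Join(",", fields.Select(Escape));
    }
}
```

Evidence strings such as "pip freeze: urllib3==2.2.3" pass through unquoted, while anything containing a comma is quoted so column positions stay stable.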

If you want, I’ll plug this into your Stella Ops graph contracts (Concelier/Cartographer) and produce the enum + validators and a tiny renderer for your docs.

Cool, let’s turn this into a concrete, dev‑friendly implementation plan you can actually hand to teams.

I’ll structure it by phases and by component (schema, producers, APIs, UI, testing, rollout) so you can slice into tickets easily.

0. Recap of what we’re building

Goal: Every edge in your vuln path graph (SBOM → Reachability → Fix plan) carries machine‑readable, auditor‑friendly metadata. The reason comes from a controlled enum, and evidence holds up to 3 short strings:

{
  "from": "pkg:pypi/requests@2.32.3#requests.sessions.Session.request",
  "to":   "pkg:pypi/urllib3@2.2.3#urllib3.connectionpool.HTTPConnectionPool.urlopen",
  "via": {
    "reason": "declared_dependency",
    "evidence": [
      "manifest:requirements.txt:3",
      "pip freeze: urllib3==2.2.3"
    ],
    "provenance": {
      "detector": "StellaOps.Scanner.WebService@1.4.2",
      "rule_id": "PY-IMPORT-001",
      "confidence": "high"
    }
  }
}

Standard reason glossary (enum):

declared_dependency
static_call
dynamic_import
reflection_call
plugin_discovery
symbol_relocation
plt_got_resolution
ld_preload_injection
env_config_path
taint_propagation
vendor_patch_alias
unknown (fallback only when you truly can’t do better)

1. Design & contracts (shared work for backend & frontend)

1.1 Define the canonical edge metadata types

Owner: Platform / shared lib team

Tasks:

In your shared C# library (used by scanners + API), define:

public enum EdgeReason
{
    Unknown = 0,
    DeclaredDependency,
    StaticCall,
    DynamicImport,
    ReflectionCall,
    PluginDiscovery,
    SymbolRelocation,
    PltGotResolution,
    LdPreloadInjection,
    EnvConfigPath,
    TaintPropagation,
    VendorPatchAlias
}

public enum EdgeConfidence
{
    Low = 0,
    Medium,
    High
}

public sealed record EdgeProvenance(
    string Detector,   // e.g., "StellaOps.Scanner.WebService@1.4.2"
    string RuleId,     // e.g., "PY-IMPORT-001"
    EdgeConfidence Confidence
);

public sealed record EdgeVia(
    EdgeReason Reason,
    IReadOnlyList<string> Evidence,
    EdgeProvenance Provenance
);

public sealed record EdgeId(string From, string To);

public sealed record GraphEdge(
    EdgeId Id,
    EdgeVia Via
);

Enforce max 3 evidence strings via a small helper to avoid accidental spam:

public static class EdgeViaFactory
{
    private const int MaxEvidence = 3;

    public static EdgeVia Create(
        EdgeReason reason,
        IEnumerable<string> evidence,
        string detector,
        string ruleId,
        EdgeConfidence confidence
    )
    {
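        // Null input becomes an empty list; entries are trimmed before capping.
        var ev = (evidence ?? Enumerable.Empty<string>())
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(s => s.Trim())
            .Take(MaxEvidence)
            .ToArray();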

        return new EdgeVia(
            Reason: reason,
            Evidence: ev,
            Provenance: new EdgeProvenance(detector, ruleId, confidence)
        );
    }
}

Acceptance criteria:

EdgeReason enum defined and shared in a reusable package.
EdgeVia and EdgeProvenance types exist and are serializable to JSON.
Evidence is capped to 3 entries and cannot be null (empty list allowed).

1.2 API / JSON contract

Owner: API team

Tasks:

Extend your existing graph edge DTO to include via:

public sealed record GraphEdgeDto
{
    public string From { get; init; } = default!;
    public string To { get; init; } = default!;
    public EdgeViaDto Via { get; init; } = default!;
}

public sealed record EdgeViaDto
{
    public string Reason { get; init; } = default!;             // enum as string
    public string[] Evidence { get; init; } = Array.Empty<string>();
    public EdgeProvenanceDto Provenance { get; init; } = default!;
}

public sealed record EdgeProvenanceDto
{
    public string Detector { get; init; } = default!;
    public string RuleId { get; init; } = default!;
    public string Confidence { get; init; } = default!;         // "high|medium|low"
}

Ensure JSON is additive (backward compatible):

via is non‑nullable in responses from the new API version.
If existing clients depend on the current endpoint, leave it unchanged and add v2 endpoints that guarantee via.

Update OpenAPI spec:

Document via.reason as enum string, including allowed values.
Document via.provenance.detector, rule_id, confidence.

Acceptance criteria:

OpenAPI / Swagger shows via.reason as a string enum + description.
New clients can deserialize edges with via without custom hacks.
Old clients remain unaffected (either keep old endpoint or allow them to ignore via).

2. Producers: add reasons & evidence where edges are created

You likely have 3 main edge producers:

SBOM / manifest / lockfile analyzers
Source analyzers (call graph, taint analysis)
Binary analyzers (ELF/PE/Mach‑O, containers)

Treat each as a mini‑project with identical patterns.

2.1 SBOM / manifest edges

Owner: SBOM / dep graph team

Tasks:

Identify all code paths that create “declared dependency” edges:
- Manifest → Package
- Root module → Imported package (if you store these explicitly)
Replace plain edge construction with factory calls:

public static class EdgeFactory
{
    private const string DetectorName = "StellaOps.Scanner.Sbom@1.0.0";

    public static GraphEdge DeclaredDependency(
        string from,
        string to,
        string manifestPath,
        string? dependencySpecLine
    )
    {
        var evidence = new List<string>
        {
            $"manifest:{manifestPath}"
        };

        if (!string.IsNullOrWhiteSpace(dependencySpecLine))
            evidence.Add($"spec:{dependencySpecLine}");

        var via = EdgeViaFactory.Create(
            EdgeReason.DeclaredDependency,
            evidence,
            DetectorName,
            "DEP-LOCK-001",
            EdgeConfidence.High
        );

        return new GraphEdge(new EdgeId(from, to), via);
    }
}

Make sure each SBOM/manifest edge sets:

reason = declared_dependency
confidence = high
Evidence includes at least manifest:<path> and, if possible, line or spec snippet.

Acceptance criteria:

Any SBOM‑generated edge returns with via.reason == declared_dependency.
Evidence contains manifest path for ≥ 99% of SBOM edges.
Unit tests cover at least: normal manifest, multiple manifests, malformed manifest.

2.2 Source code call graph edges

Owner: Static analysis / call graph team

Tasks:

Map current edge types → reasons:

Direct function/method calls → static_call
Reflection (Java/C#) → reflection_call
Dynamic imports (__import__, importlib, require(...)) → dynamic_import
Plugin systems (entry points, ServiceLoader, MEF) → plugin_discovery
Taint / dataflow edges (user input → sink) → taint_propagation

Implement helper factories:

public static class SourceEdgeFactory
{
    private const string DetectorName = "StellaOps.Scanner.Source@1.0.0";

    public static GraphEdge StaticCall(
        string fromSymbol,
        string toSymbol,
        string filePath,
        int lineNumber
    )
    {
        var evidence = new[]
        {
            $"callsite:{filePath}:{lineNumber}"
        };

        var via = EdgeViaFactory.Create(
            EdgeReason.StaticCall,
            evidence,
            DetectorName,
            "SRC-CALL-001",
            EdgeConfidence.High
        );

        return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
    }

    public static GraphEdge DynamicImport(
        string fromSymbol,
        string toSymbol,
        string filePath,
        int lineNumber
    )
    {
        var via = EdgeViaFactory.Create(
            EdgeReason.DynamicImport,
            new[] { $"importsite:{filePath}:{lineNumber}" },
            DetectorName,
            "SRC-DYNIMPORT-001",
            EdgeConfidence.Medium
        );

        return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
    }

    // Similar for ReflectionCall, PluginDiscovery, TaintPropagation...
}

Replace all direct new GraphEdge(...) calls in source analyzers with these factories.

Acceptance criteria:

Direct call edges produce reason = static_call with file:line evidence.
Reflection/dynamic import edges use correct reasons and mark confidence = medium (or high where you’re certain).
Unit tests check that for a known source file, the resulting edges contain expected reason, evidence, and rule_id.

2.3 Binary / container analyzers

Owner: Binary analysis / SCA team

Tasks:

Map binary features to reasons:

Symbol relocations → symbol_relocation
PLT/GOT jumps → plt_got_resolution
LD_PRELOAD or injection edges → ld_preload_injection

Implement factory:

public static class BinaryEdgeFactory
{
    private const string DetectorName = "StellaOps.Scanner.Binary@1.0.0";

    public static GraphEdge SymbolRelocation(
        string fromSymbol,
        string toSymbol,
        string binaryPath,
        string section,
        string relocationName
    )
    {
        var evidence = new[]
        {
            $"{binaryPath}::{section}:{relocationName}"
        };

        var via = EdgeViaFactory.Create(
            EdgeReason.SymbolRelocation,
            evidence,
            DetectorName,
            "BIN-RELOC-101",
            EdgeConfidence.High
        );

        return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
    }
}

Wire up all binary edge creation to use this.

Acceptance criteria:

For a test binary with a known relocation, edges include reason = symbol_relocation and section/symbol in evidence.
No binary edge is created without via.

3. Storage & migrations

This depends on your backing store, but the pattern is similar.

3.1 Relational (SQL) example

Owner: Data / infra team

Tasks:

Add columns:

ALTER TABLE graph_edges
    ADD COLUMN via_reason VARCHAR(64) NOT NULL DEFAULT 'unknown',
    ADD COLUMN via_evidence JSONB NOT NULL DEFAULT '[]'::jsonb,
    ADD COLUMN via_detector VARCHAR(255) NOT NULL DEFAULT 'unknown',
    ADD COLUMN via_rule_id VARCHAR(128) NOT NULL DEFAULT 'unknown',
    ADD COLUMN via_confidence VARCHAR(16) NOT NULL DEFAULT 'low';

Update ORM model:

public class EdgeEntity
{
    public string From { get; set; } = default!;
    public string To { get; set; } = default!;

    public string ViaReason { get; set; } = "unknown";
    public string[] ViaEvidence { get; set; } = Array.Empty<string>();
    public string ViaDetector { get; set; } = "unknown";
    public string ViaRuleId { get; set; } = "unknown";
    public string ViaConfidence { get; set; } = "low";
}

Add mapping to domain GraphEdge:

public static GraphEdge ToDomain(this EdgeEntity e)
{
    var via = new EdgeVia(
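        // Stripping underscores lets both "StaticCall" and "static_call" parse;
        // without it, snake_case values would silently fall back to Unknown.
        Reason: Enum.TryParse<EdgeReason>(e.ViaReason.Replace("_", ""), true, out var r) ? r : EdgeReason.Unknown,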
        Evidence: e.ViaEvidence,
        Provenance: new EdgeProvenance(
            Detector: e.ViaDetector,
            RuleId: e.ViaRuleId,
            Confidence: Enum.TryParse<EdgeConfidence>(e.ViaConfidence, true, out var c) ? c : EdgeConfidence.Low
        )
    );

    return new GraphEdge(new EdgeId(e.From, e.To), via);
}

Backfill existing data (optional but recommended):

For edges with a known “type” column, map to best‑fit reason.
If you can’t infer: set reason = unknown, confidence = low, detector = "backfill@<version>".
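
The backfill mapping can be a plain lookup table. A sketch under stated assumptions: the legacy type values ("manifest", "call", "reloc") are invented for illustration, and using "medium" confidence for mapped rows is a choice this document does not prescribe.

```csharp
using System;
using System.Collections.Generic;

public static class BackfillReasonMapper
{
    // Assumed values of an existing "type" column; replace with your own.
    private static readonly Dictionary<string, string> LegacyTypeToReason =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["manifest"] = "declared_dependency",
            ["call"] = "static_call",
            ["reloc"] = "symbol_relocation"
        };

    // Returns the via_* values to write for one pre-existing edge row.
    public static (string Reason, string Confidence, string Detector) Map(
        string legacyType, string backfillVersion)
    {
        var detector = $"backfill@{backfillVersion}";

        if (!string.IsNullOrWhiteSpace(legacyType) &&
            LegacyTypeToReason.TryGetValue(legacyType.Trim(), out var reason))
        {
            // Assumption: mapped rows get "medium" because the original evidence is gone.
            return (reason, "medium", detector);
        }

        // Nothing to infer from: fall back exactly as described above.
        return ("unknown", "low", detector);
    }
}
```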

Acceptance criteria:

DB migration runs cleanly in staging and prod.
No existing reader breaks: default values keep queries functioning.
Edge round‑trip (domain → DB → API JSON) retains via fields correctly.

4. API & service layer

Owner: API / service team

Tasks:

Wire domain model → DTOs:

public static GraphEdgeDto ToDto(this GraphEdge edge)
{
    return new GraphEdgeDto
    {
        From = edge.Id.From,
        To = edge.Id.To,
        Via = new EdgeViaDto
        {
            Reason = edge.Via.Reason.ToString().ToSnakeCaseLower(), // e.g. "static_call"
            Evidence = edge.Via.Evidence.ToArray(),
            Provenance = new EdgeProvenanceDto
            {
                Detector = edge.Via.Provenance.Detector,
                RuleId = edge.Via.Provenance.RuleId,
                Confidence = edge.Via.Provenance.Confidence.ToString().ToLowerInvariant()
            }
        }
    };
}
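
ToSnakeCaseLower in the mapping above is not a built-in string method, so it has to come from somewhere. A minimal sketch of such an extension, assuming plain PascalCase enum names without acronym runs:

```csharp
using System;
using System.Text;

public static class StringCaseExtensions
{
    // Converts PascalCase to lowercase snake_case,
    // e.g. "PltGotResolution" -> "plt_got_resolution".
    // Acronym runs such as "HTTPCall" are not handled specially.
    public static string ToSnakeCaseLower(this string value)
    {
        var sb = new StringBuilder(value.Length + 4);
        for (var i = 0; i < value.Length; i++)
        {
            var c = value[i];
            if (char.IsUpper(c))
            {
                if (i > 0) sb.Append('_'); // word boundary
                sb.Append(char.ToLowerInvariant(c));
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

On .NET 8 or later, JsonNamingPolicy.SnakeCaseLower.ConvertName is a built-in alternative worth considering instead of a hand-written helper.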

If you accept edges via API (internal services), validate:

reason must be one of the known values; otherwise reject or coerce to unknown.
evidence length ≤ 3.
Trim whitespace and limit each evidence string length (e.g. 256 chars).
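
These checks might look roughly like the following. IncomingViaValidator is a hypothetical name, and the sketch picks "coerce to unknown" for bad reasons and "reject" for too much evidence; either policy could be swapped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IncomingViaValidator
{
    private const int MaxEvidence = 3;
    private const int MaxEvidenceLength = 256;

    private static readonly HashSet<string> KnownReasons = new(StringComparer.Ordinal)
    {
        "declared_dependency", "static_call", "dynamic_import", "reflection_call",
        "plugin_discovery", "symbol_relocation", "plt_got_resolution",
        "ld_preload_injection", "env_config_path", "taint_propagation",
        "vendor_patch_alias", "unknown"
    };

    // Coerces an unrecognised reason to "unknown"; callers should log when coerced is true.
    public static string NormalizeReason(string reason, out bool coerced)
    {
        var candidate = (reason ?? "").Trim();
        coerced = !KnownReasons.Contains(candidate);
        return coerced ? "unknown" : candidate;
    }

    // Trims entries, drops blanks, and caps each string's length.
    // Throws when more than MaxEvidence entries remain so the request can be rejected.
    public static string[] NormalizeEvidence(IEnumerable<string> evidence)
    {
        var cleaned = (evidence ?? Enumerable.Empty<string>())
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(s => s.Trim())
            .Select(s => s.Length <= MaxEvidenceLength ? s : s.Substring(0, MaxEvidenceLength))
            .ToArray();

        if (cleaned.Length > MaxEvidence)
            throw new ArgumentException(
                $"evidence has {cleaned.Length} entries, max is {MaxEvidence}", nameof(evidence));

        return cleaned;
    }
}
```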

Versioning:

Introduce /v2/graph/paths (or similar) that guarantees via.
Keep /v1/... unchanged or mark deprecated.

Acceptance criteria:

Path API returns via.reason and via.evidence for all edges in new endpoints.
Invalid reason strings are rejected or converted to unknown with a log.
Integration tests cover full flow: repo → scanner → DB → API → JSON.

5. UI: make paths auditor‑friendly

Owner: Frontend team

Tasks:

Path details UI:

For each edge in the vulnerability path table:
- Show a “Reason” column with a small pill:
  - static_call → “Static call”
  - declared_dependency → “Declared dependency”
  - etc.
- Below or on hover, show primary evidence (first evidence string).

Edge details panel (drawer/modal):

When a user clicks an edge:
- Show:
  - From → To (symbols/packages)
  - Reason (with friendly description per enum)
  - Evidence list (each on its own line)
  - Detector, rule id, confidence

Filtering & sorting (optional but powerful):
- Filter edges by reason (multi‑select).
- Filter by confidence (e.g. show only high/medium).
- This helps auditors quickly isolate more speculative edges.

UX text / glossary:
- Add a small “?” tooltip that links to a glossary explaining each reason type in human language.

Acceptance criteria:

For a given vulnerability, the path view shows a “Reason” column per edge.
Clicking an edge reveals all evidence and provenance information.
UX has a glossary/tooltip explaining what each reason means in plain English.

6. Testing strategy

Owner: QA + each feature team

6.1 Unit tests

Factories: verify correct mapping from input to EdgeVia:
- Reason set correctly.
- Evidence trimmed, max 3.
- Confidence matches rubric (high for relocations, medium for heuristic imports, etc.).
Serialization: EdgeVia → JSON and back.
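
A round-trip test needs serializer settings that emit snake_case for both property names and enum values. A sketch using System.Text.Json, assuming .NET 8 or later for JsonNamingPolicy.SnakeCaseLower; the enum is abbreviated here and EdgeViaJson is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Abbreviated copies of the shared types, just enough for the round trip.
public enum EdgeReason { Unknown = 0, DeclaredDependency, StaticCall, DynamicImport }
public enum EdgeConfidence { Low = 0, Medium, High }

public sealed record EdgeProvenance(string Detector, string RuleId, EdgeConfidence Confidence);
public sealed record EdgeVia(EdgeReason Reason, IReadOnlyList<string> Evidence, EdgeProvenance Provenance);

public static class EdgeViaJson
{
    // snake_case for property names (rule_id) and enum values (static_call).
    public static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower,
        Converters = { new JsonStringEnumConverter(JsonNamingPolicy.SnakeCaseLower) }
    };

    public static string Serialize(EdgeVia via) =>
        JsonSerializer.Serialize(via, Options);

    public static EdgeVia Deserialize(string json) =>
        JsonSerializer.Deserialize<EdgeVia>(json, Options)
        ?? throw new JsonException("via must not be null");
}
```

Records compare the Evidence list by reference, so a round-trip assertion should compare fields and evidence items rather than whole EdgeVia instances.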

6.2 Integration tests

Set up small fixtures:

Simple dependency project:
- Example: Python project with requirements.txt → requests → urllib3.
- Expected edges:
  - App → requests: declared_dependency, evidence includes requirements.txt.
  - requests → urllib3: declared_dependency, plus static call edges.

Dynamic import case:
- A module using importlib.import_module("mod").
- Ensure edge is dynamic_import with confidence = medium.

Binary edge case:
- Test ELF with known symbol relocation.
- Ensure an edge with reason = symbol_relocation exists.

6.3 End‑to‑end tests

Run full scan on a sample repo and:
- Hit path API.
- Assert every edge has non‑null via fields.
- Spot check a few known edges for exact reason and evidence.
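
The "every edge has via" assertion can run directly against the raw API response. A minimal sketch, assuming the response body has the shape { "edges": [ { "from", "to", "via": { ... } } ] }, which this document does not specify; ViaContractCheck is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class ViaContractCheck
{
    // Returns the indexes of edges that are missing any required via field.
    public static List<int> FindEdgesMissingVia(string responseJson)
    {
        using var doc = JsonDocument.Parse(responseJson);
        var missing = new List<int>();
        var index = 0;

        foreach (var edge in doc.RootElement.GetProperty("edges").EnumerateArray())
        {
            // ValueKind is checked before TryGetProperty so a null via does not throw.
            var ok =
                edge.TryGetProperty("via", out var via) &&
                via.ValueKind == JsonValueKind.Object &&
                via.TryGetProperty("reason", out var reason) &&
                reason.ValueKind == JsonValueKind.String &&
                via.TryGetProperty("evidence", out var evidence) &&
                evidence.ValueKind == JsonValueKind.Array &&
                via.TryGetProperty("provenance", out var provenance) &&
                provenance.ValueKind == JsonValueKind.Object;

            if (!ok) missing.Add(index);
            index++;
        }

        return missing;
    }
}
```

An end-to-end test would then fail whenever the returned list is non-empty.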

Acceptance criteria:

Automated tests fail if any edge is emitted without via.
Coverage includes at least one example for each EdgeReason you support.

7. Observability, guardrails & rollout

7.1 Metrics & logging

Owner: Observability / platform

Tasks:

Emit metrics:
- % edges with reason != unknown
- Count by reason and confidence
Log warnings when:
- Edge is emitted with reason = unknown.
- Evidence is empty for a non‑unknown reason.
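
Both metrics reduce to simple aggregations over a batch of edges. A sketch, assuming a hypothetical EdgeSample projection and leaving the actual metrics backend out of scope:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical projection of an edge: only the fields the metrics need.
public sealed record EdgeSample(string Reason, string Confidence);

public static class EdgeReasonMetrics
{
    // Percentage (0-100) of edges whose reason is not "unknown".
    // Assumption: an empty batch reports 100 so it never trips the alert.
    public static double KnownReasonPercent(IReadOnlyCollection<EdgeSample> edges)
    {
        if (edges.Count == 0) return 100.0;
        var known = edges.Count(e => e.Reason != "unknown");
        return 100.0 * known / edges.Count;
    }

    // Edge counts keyed by (reason, confidence), ready to emit as labelled counters.
    public static Dictionary<(string Reason, string Confidence), int> CountByReasonAndConfidence(
        IEnumerable<EdgeSample> edges) =>
        edges.GroupBy(e => (e.Reason, e.Confidence))
             .ToDictionary(g => g.Key, g => g.Count());
}
```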

Acceptance criteria:

Dashboards showing distribution of edge reasons over time.
Alerts if unknown reason edges exceed a threshold (e.g. >5%).

7.2 Rollout plan

Owner: PM + tech leads

Steps:

Phase 1 – Dark‑launch metadata:
- Start generating & storing via for new scans.
- Keep UI unchanged.
- Monitor metrics, unknown ratio, and storage overhead.

Phase 2 – Enable for internal users:
- Toggle UI on (feature flag for internal / beta users).
- Collect feedback from security engineers and auditors.

Phase 3 – General availability:
- Enable UI for all.
- Update customer‑facing documentation & audit guides.

7.3 Documentation

Owner: Docs / PM

Short “Why this edge exists” section in:
- Product docs (for customers).
- Internal runbooks (for support & SEs).
Include:
- Table of reasons → human descriptions.
- Examples of path explanations (e.g., “This edge exists because app declares urllib3 in requirements.txt and calls it in client.py:42”).

8. Ready‑to‑use ticket breakdown

You can almost copy‑paste these into your tracker:

Shared: Define EdgeReason, EdgeVia & EdgeProvenance in shared library, plus EdgeViaFactory.
SBOM: Use EdgeFactory.DeclaredDependency for all manifest‑generated edges.
Source: Wire all callgraph edges to SourceEdgeFactory (static_call, dynamic_import, reflection_call, plugin_discovery, taint_propagation).
Binary: Wire relocations/PLT/GOT edges to BinaryEdgeFactory (symbol_relocation, plt_got_resolution, ld_preload_injection).
Data: Add via_* columns/properties to graph_edges storage and map to/from domain.
API: Extend graph path DTOs to include via, update OpenAPI, and implement /v2 endpoints if needed.
UI: Show edge reason, evidence, and provenance in vulnerability path screens and add filters.
Testing: Add unit, integration, and end‑to‑end tests ensuring every edge has non‑null via.
Observability: Add metrics and logs for edge reasons and unknown rates.
Docs & rollout: Write glossary + auditor docs and plan staged rollout.

If you tell me a bit about your current storage (e.g., Neo4j vs SQL) and the services’ names, I can tailor this into an even more literal set of code snippets and migrations to match your stack exactly.
