
2025-12-14 16:23:44 +02:00


Here’s a practical blueprint for building a reachability‑first code+binary scanner that fuses static call‑graphs with runtime evidence, and scales to large monorepos/microservices.

1) Static analyzers (per language)

.NET (Roslyn / IL)

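Parse solutions with MSBuildWorkspace (namespace Microsoft.CodeAnalysis.MSBuild), collect symbols, and build the call graph by walking each method's IInvocationOperation nodes and recording containing method → IInvocationOperation.TargetMethod edges.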
Handle reflection edges by heuristics (string literals, Type.GetType, DI registrations).
IL pass: read assemblies with System.Reflection.Metadata to connect external/library calls.

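Minimal sample (needs the Microsoft.CodeAnalysis.Workspaces.MSBuild and Microsoft.Build.Locator packages):

```csharp
using System;
using System.Linq;
using Microsoft.Build.Locator;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.MSBuild;

// Register an installed MSBuild before MSBuildWorkspace loads any project.
MSBuildLocator.RegisterDefaults();

using var ws = MSBuildWorkspace.Create();
var sln = await ws.OpenSolutionAsync(@"path\to.sln");
foreach (var proj in sln.Projects)
foreach (var doc in proj.Documents)
{
    var model = await doc.GetSemanticModelAsync();
    var root = await doc.GetSyntaxRootAsync();
    if (model is null || root is null) continue; // document without syntax/semantics

    foreach (var node in root.DescendantNodes().OfType<InvocationExpressionSyntax>())
    {
        // The caller is the method (or lambda/accessor) that contains the invocation.
        var caller = model.GetEnclosingSymbol(node.SpanStart);
        if (caller is not null && model.GetSymbolInfo(node).Symbol is IMethodSymbol callee)
        {
            // Record edge caller -> callee; printed here for illustration.
            Console.WriteLine($"{caller.ToDisplayString()} -> {callee.ToDisplayString()}");
        }
    }
}
```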

Java (Soot or WALA)
- Build bytecode call graph (CHA/RTA/points‑to) and export edges.
- Seed entrypoints from public static void main, Spring Boot controllers, servlet mappings.
Node/Python
- Build AST + import graph; resolve exports (module.exports, export default, Python __all__).
- Track dynamic requires (best‑effort string eval); record web/router handlers as entrypoints.
Go/Rust
- Use build graph (Go modules, Cargo metadata) + AST to map main and handler functions.
- Honor build-time conditions (Go build tags, Cargo features, cfg attributes) so that edges into code that is compiled out are not reported.
Binary‑only (containers, closed libs)
- Recover function boundaries (Ghidra/rizin), mine strings/imports, detect candidates for entrypoints from container ENTRYPOINT/CMD, service files, and exposed ports.
- Heuristics: exported symbols, syscall usage, and common framework stubs.

2) Runtime confirmation (evidence)

Windows/.NET: ETW sampling to “mint” runtime edges (method IDs, stack samples) without heavy overhead.
Linux/containers: eBPF/usdt or perf sampling to confirm hot paths; record PID→image→build info to link evidence back to SBOM components.
Rule: static edge exists → mark probable; static+runtime match → mark proven (confidence ↑, prioritize).

3) Entrypoint discovery

Web services: framework routers (ASP.NET Core endpoints, Spring mappings, Express routes, FastAPI decorators).
Jobs/CLIs: scheduler configs (Cron, systemd timers, k8s CronJobs).
Events: message consumers (RabbitMQ/Kafka topics), gRPC service maps.

Entrypoints seed reachability: start from entry, traverse call graph, intersect with SBOM → “reachable components + reachable vulns”.

4) Scale & storage

Shard by repo/service; compute graphs independently.
Compress with SCCs (strongly connected components) to shrink graph size.
Cap cardinality using hot‑path sampling (keep top‑N edges by observed frequency).
Cache: content‑addressed graphs keyed by (SBOM hash, compiler flags, env); invalidate on source/SBOM/CFG changes or new VEX/policy.
Store edges as (caller, callee, kind: static|runtime, weight, build-id) in Postgres; keep Valkey for ephemeral reachability queries.

5) SBOM/VEX linkage

Normalize package coordinates (purl), map symbols/binaries → SBOM components.
For each CVE:
- Reachable? (entrypoint‑anchored traversal hits affected symbol/library)
- Proven at runtime? (evidence present)
- Gated by config? (feature flags, platform checks)
Emit VEX with machine‑explainable reasons (e.g., not reachable, reachable but not loaded, reachable+proven).
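A minimal Python sketch of the per-CVE decision. It assumes a hypothetical policy that maps each reachability status to an OpenVEX `status` and `justification`. The status and justification strings come from the OpenVEX specification. The mapping itself, the `gated_off`/`not_loaded` flags and the `reason` codes are illustrative assumptions, not the product's defined policy.

```python
# Hypothetical policy: reachability status -> OpenVEX status/justification + machine reason.
def vex_for_finding(status: str, gated_off: bool = False, not_loaded: bool = False) -> dict:
    """Return the VEX-relevant fields for one (CVE, purl) finding (illustrative mapping)."""
    if status == "UNREACHABLE":
        return {"status": "not_affected",
                "justification": "vulnerable_code_not_in_execute_path",
                "reason": "not_reachable"}
    if gated_off and status != "REACHABLE_PROVEN":
        # Path exists, but a feature flag / platform check disables it (assumed policy).
        return {"status": "not_affected",
                "justification": "vulnerable_code_not_in_execute_path",
                "reason": "gated_by_config"}
    if status == "POSSIBLY_REACHABLE":
        return {"status": "under_investigation", "justification": None,
                "reason": "possibly_reachable"}
    if status == "REACHABLE_STATIC":
        # Runtime evidence was collected but the artifact was never loaded: stay conservative.
        reason = "reachable_not_loaded" if not_loaded else "reachable_static"
        return {"status": "affected", "justification": None, "reason": reason}
    if status == "REACHABLE_PROVEN":
        return {"status": "affected", "justification": None, "reason": "reachable_proven"}
    raise ValueError(f"unknown reachability status: {status}")
```

Runtime proof deliberately overrides the config gate here. A path observed executing should never be downgraded by a static flag check.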

6) APIs and outputs (developer‑friendly)

CLI
- scan graph --lang dotnet --sln path.sln --out graph.scc.json
- scan runtime --target pod/myservice --duration 30s --out stacks.json
- reachability join --graph graph.scc.json --runtime stacks.json --sbom bom.cdx.json --out reach.cdxr.json
HTTP
- POST /graph (upload call graph)
- POST /runtime (upload evidence)
- POST /reachability → returns ranked, evidence‑linked findings
Artifacts
- graph.scc.json (SCC‑compressed call graph)
- reach.cdxr.json (CycloneDX extension with evidence)
- vex.json (OpenVEX/CSAF w/ “justifications”)

7) Quality gates & tests

Golden images: tiny test services where reachable/unreachable CVEs are known.
Mutation tests: toggle entrypoints, flags, and ensure reachability shifts correctly.
Drift checks: if runtime sees edges not in static graph → open “coverage debt” issue.

8) Security & perf knobs

Sampling rate caps (CPU bound), PID/image allowlists, PII‑safe symbol hashing option.
Offline mode: bundle symbols + evidence into a replayable archive (deterministic re‑evaluation).

Below is a developer-ready product + BA implementation specification for the Reachability-First Scanner described above, tailored to StellaOps (.NET 10) and your standing architecture rules (lattice algorithms run in scanner.webservice; Concelier/Excititor only preserve and prune sources; Postgres is the SoR; Valkey is ephemeral only).

StellaOps Reachability-First Scanner

Developer Implementation Specification (v1)

0) Objective and boundaries

Objective

Reduce vulnerability noise by classifying findings as Unreachable / Possibly Reachable / Reachable (Static) / Proven Reachable (Runtime) using:

Static call graph (best-effort; language-aware)
Runtime evidence (sampling, low overhead)
Entrypoint seeding (framework-aware)
Join against SBOM component mapping + vulnerability data (from Concelier) + VEX (from Excititor)

Non-goals (v1)

Perfect points-to analysis for all languages.
Full decompilation for every binary (support is “best-effort” with confidence).
Executing or fuzzing workloads.

1) Product behavior: what the user sees

1.1 Reachability statuses (canonical)

These labels must be stable across UI/CLI/API:

UNREACHABLE: no path from any discovered entrypoint to affected component/symbol.
POSSIBLY_REACHABLE: graph incomplete / dynamic behavior; heuristics indicate risk.
REACHABLE_STATIC: a static path exists from at least one entrypoint.
REACHABLE_PROVEN: runtime evidence confirms code path or library load (stronger than static).

Required explanation fields (always returned)

Every reachability classification must include:

why[]: list of structured reasons (machine-readable codes + human text)
evidence[]: references to graph paths and/or runtime samples
confidence: 0.0–1.0
scope: component-only or symbol-level (if symbol mapping exists)

1.2 Key UX outputs (pipeline-first)

CLI output for CI gates: stella scan reachability --format sarif|json
UI detail panel must show:
- Entry point(s) → path summary (k shortest paths, default k=3)
- Whether runtime proved it (samples, timestamps, container/build IDs)
- Which assumptions/heuristics were used (reflection, DI, dynamic import, etc.)

2) System architecture (StellaOps modules)

2.1 Services and responsibilities

`StellaOps.Scanner.WebService` (authoritative)

Owns the reachability pipeline and the lattice computation for reachability decisions. Responsibilities:

Ingest static graphs from language workers
Ingest runtime evidence (from collectors)
Normalize symbols → components (SBOM join)
Compute reachability results, confidence, and explanation artifacts
Expose query APIs and CI export formats
Persist everything to Postgres (SoR)
Use Valkey only as ephemeral accelerator

Language workers (stateless compute)

Examples:

StellaOps.Scanner.Worker.DotNet
StellaOps.Scanner.Worker.Java
StellaOps.Scanner.Worker.Node
StellaOps.Scanner.Worker.Python
StellaOps.Scanner.Worker.Go
StellaOps.Scanner.Worker.Rust
StellaOps.Scanner.Worker.Binary

Responsibilities:

Produce CallGraph.v1.json (+ optional Entrypoints.v1.json)
Provide node IDs that are stable across replays of unchanged code (see the hashing rules in 3.1)

Runtime collectors (agent/sidecar; optional)

Windows: ETW/EventPipe sampling for .NET
Linux: eBPF/perf sampling for native; plus runtime-specific exporters where feasible

Collectors only emit evidence events; they never compute reachability.

Concelier / Excititor integration

Concelier provides vulnerability facts (CVE ↔ component versions).
Excititor provides VEX statements. Neither computes reachability or lattice merges; they provide pruned sources only.

3) Data contracts (hard requirements)

3.1 Stable identifiers

All graph nodes must have:

nodeId: stable across replays when code is unchanged.
symbolKey: canonical string (language-specific)
artifactKey: assembly/jar/module/binary identity (prefer build ID + path + hash)
Optional: purlCandidates[] (library mapping hints)

DotNet nodeId rule (v1): nodeId = SHA256(assemblyMvid + ":" + metadataToken + ":" + genericArity + ":" + signatureShape)

If token unavailable (source-only), fallback: SHA256(projectPath + ":" + file + ":" + span + ":" + symbolDisplayString)
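A minimal Python sketch of both rules (the worker itself would be C#). The spec does not pin the exact string encoding, so the following choices are assumptions that every producer and runtime collector would have to share:

- UTF-8 input.
- Lowercase MVID.
- The metadata token rendered as 8 uppercase hex digits.
- The span rendered as `start..end`.

```python
import hashlib

def _sha256_hex(text: str) -> str:
    # Assumption: UTF-8 bytes in, lowercase hex digest out.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def dotnet_node_id(mvid: str, metadata_token: int, generic_arity: int, signature_shape: str) -> str:
    """Token rule: SHA256(assemblyMvid:metadataToken:genericArity:signatureShape)."""
    # Assumed canonical forms: lowercase GUID, token as 8 uppercase hex digits.
    return _sha256_hex(f"{mvid.lower()}:{metadata_token:08X}:{generic_arity}:{signature_shape}")

def source_fallback_node_id(project_path: str, file: str, span: tuple, display: str) -> str:
    """Source-only fallback: SHA256(projectPath:file:span:symbolDisplayString)."""
    start, end = span
    return _sha256_hex(f"{project_path}:{file}:{start}..{end}:{display}")
```

Rendering the token in hex or in decimal is an arbitrary choice. What matters is that the choice is written down once, because a mismatch silently breaks the runtime-evidence join.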

3.2 CallGraph.v1.json

Minimum required schema:

```json
{
  "schema": "stella.callgraph.v1",
  "scanKey": "uuid",
  "language": "dotnet|java|node|python|go|rust|binary",
  "artifacts": [{ "artifactKey": "…", "kind": "assembly|jar|module|binary", "sha256": "…" }],
  "nodes": [{
    "nodeId": "…",
    "artifactKey": "…",
    "symbolKey": "Namespace.Type::Method(…)",
    "visibility": "public|internal|private|unknown",
    "isEntrypointCandidate": false
  }],
  "edges": [{
    "from": "nodeId",
    "to": "nodeId",
    "kind": "static|heuristic",
    "reason": "direct_call|virtual_call|reflection_string|di_binding|dynamic_import|unknown",
    "weight": 1.0
  }],
  "entrypoints": [{
    "nodeId": "…",
    "kind": "http|grpc|cli|job|event|unknown",
    "route": "/api/orders/{id}",
    "framework": "aspnetcore|minimalapi|spring|express|unknown"
  }]
}
```
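An ingest endpoint would likely want a cheap structural check before persisting a CallGraph.v1 document. The Python sketch below is a hypothetical validator, not the product's schema implementation. It checks the schema tag and nodeId uniqueness, confirms that edges and entrypoints only reference declared nodes, and validates the edge `kind`/`reason` enums listed above.

```python
EDGE_KINDS = {"static", "heuristic"}
EDGE_REASONS = {"direct_call", "virtual_call", "reflection_string",
                "di_binding", "dynamic_import", "unknown"}

def validate_callgraph(doc: dict) -> list:
    """Return a list of human-readable errors; an empty list means the document passed."""
    errors = []
    if doc.get("schema") != "stella.callgraph.v1":
        errors.append("schema must be 'stella.callgraph.v1'")
    node_ids = set()
    for node in doc.get("nodes", []):
        node_id = node.get("nodeId")
        if not node_id:
            errors.append("node without nodeId")
            continue
        if node_id in node_ids:
            errors.append(f"duplicate nodeId {node_id}")
        node_ids.add(node_id)
    for i, edge in enumerate(doc.get("edges", [])):
        if edge.get("from") not in node_ids or edge.get("to") not in node_ids:
            errors.append(f"edges[{i}] references an unknown node")
        if edge.get("kind") not in EDGE_KINDS:
            errors.append(f"edges[{i}] has invalid kind {edge.get('kind')!r}")
        if edge.get("reason") not in EDGE_REASONS:
            errors.append(f"edges[{i}] has invalid reason {edge.get('reason')!r}")
    for i, ep in enumerate(doc.get("entrypoints", [])):
        if ep.get("nodeId") not in node_ids:
            errors.append(f"entrypoints[{i}] references an unknown node")
    return errors
```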

3.3 RuntimeEvidence.v1.json

```json
{
  "schema": "stella.runtimeevidence.v1",
  "scanKey": "uuid",
  "collectedAt": "2025-12-14T10:00:00Z",
  "environment": {
    "os": "linux|windows",
    "k8s": { "namespace": "…", "pod": "…", "container": "…" },
    "imageDigest": "sha256:…",
    "buildId": "…"
  },
  "samples": [{
    "timestamp": "…",
    "pid": 1234,
    "threadId": 77,
    "frames": ["nodeId","nodeId","nodeId"],
    "sampleWeight": 1.0
  }],
  "loadedArtifacts": [{
    "artifactKey": "…",
    "evidence": "loaded_module|mapped_file|jar_loaded"
  }]
}
```

4) Postgres schema (system of record)

4.1 Core tables

You can implement with migrations in StellaOps.Scanner.Persistence (EF Core 9).

`scan`

scan_id uuid pk
created_at timestamptz
repo_uri text null
commit_sha text null
sbom_digest text (hash of SBOM input)
policy_digest text (hash of reachability policy inputs)
status text (NEW/RUNNING/DONE/FAILED)

Indexes:

(commit_sha, sbom_digest) for caching

`artifact`

artifact_id uuid pk
scan_id uuid fk
artifact_key text unique per scan
kind text
sha256 text
build_id text null
purl text null

Index:

(scan_id, artifact_key) unique

`cg_node`

scan_id uuid fk
node_id text (hash string)
artifact_key text
symbol_key text
visibility text
flags int (bitset: entrypointCandidate, external, generated, etc.)

PK: (scan_id, node_id)

GIN index (optional):

symbol_key with gin_trgm_ops (pg_trgm extension) for substring search

`cg_edge`

scan_id uuid fk
from_node_id text
to_node_id text
kind smallint (0 static, 1 heuristic, 2 runtime_minted)
reason smallint
weight real

PK: (scan_id, from_node_id, to_node_id, kind, reason)

Indexes:

(scan_id, from_node_id)
(scan_id, to_node_id)

`entrypoint`

scan_id uuid
node_id text
kind text
framework text
route text not null default '' (empty string when there is no route; Postgres primary-key columns cannot be NULL)

PK: (scan_id, node_id, kind, framework, route)

`runtime_sample`

scan_id uuid
collected_at timestamptz
env_hash text (hash of environment identity)
sample_id bigserial pk
timestamp timestamptz
pid int
thread_id int
frames text[] (nodeIds)
weight real

Partition suggestion:

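Partition by scan_id, or by month on collected_at, depending on retention. Postgres requires a partitioned table's primary key to include the partition key, so use PK (scan_id, sample_id) or (collected_at, sample_id) instead of sample_id alone.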

`symbol_component_map`

scan_id uuid
node_id text
purl text
mapping_kind text (exact|heuristic|external)
confidence real

PK: (scan_id, node_id, purl)

`reachability_component`

scan_id uuid
purl text
status smallint (0 unreachable, 1 possible, 2 reachable_static, 3 reachable_proven)
confidence real
why jsonb
evidence jsonb

PK: (scan_id, purl)

`reachability_finding`

scan_id uuid
cve_id text
purl text
status smallint
confidence real
why jsonb
evidence jsonb

PK: (scan_id, cve_id, purl)

4.2 Valkey usage (ephemeral only)

Allowed:

Dedup keys for evidence ingest (short TTL)
Hot query cache: (scan_id, purl) → reachability result
Rate limits / nonces

Not allowed:

Authoritative queueing for scan state
Any “only copy” of results

5) Reachability computation (the actual algorithm)

5.1 Inputs

Call graph nodes/edges + entrypoints
Runtime evidence (optional)
SBOM (CycloneDX/SPDX) with purls
Concelier vulnerability facts (CVE ↔ purl/version ranges)
Excititor VEX statements (not affected / affected / under investigation)

5.2 Normalize to a graph suitable for traversal

In scanner.webservice:

Build adjacency list for cg_edge.kind in (static, heuristic)
Optionally compress SCCs:
- Compute SCCs (Tarjan/Kosaraju)
- Store SCC mapping for explanation paths (must remain explainable)
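A minimal Python sketch of SCC compression, using an iterative form of Tarjan's algorithm so that deep call chains do not hit recursion limits. `scc_members` keeps the node → SCC mapping, so explanation paths can be expanded back to real methods. Function names are illustrative.

```python
def tarjan_scc(nodes, adj):
    """Iterative Tarjan: return {node: scc_id} for every node reachable from `nodes`."""
    index, low, on_stack, stack, scc_of = {}, {}, set(), [], {}
    counter = scc_count = 0
    for root in nodes:
        if root in index:
            continue
        index[root] = low[root] = counter; counter += 1
        stack.append(root); on_stack.add(root)
        work = [(root, iter(adj.get(root, ())))]
        while work:
            v, children = work[-1]
            descended = False
            for w in children:
                if w not in index:  # tree edge: descend into w
                    index[w] = low[w] = counter; counter += 1
                    stack.append(w); on_stack.add(w)
                    work.append((w, iter(adj.get(w, ()))))
                    descended = True
                    break
                if w in on_stack:  # back edge into the current SCC
                    low[v] = min(low[v], index[w])
            if descended:
                continue
            work.pop()
            if work:  # propagate low-link to the parent
                parent = work[-1][0]
                low[parent] = min(low[parent], low[v])
            if low[v] == index[v]:  # v is the root of an SCC: pop its members
                while True:
                    w = stack.pop(); on_stack.discard(w)
                    scc_of[w] = scc_count
                    if w == v:
                        break
                scc_count += 1
    return scc_of

def condense(adj, scc_of):
    """Edges between distinct SCCs only (the condensed DAG)."""
    dag = {}
    for v, targets in adj.items():
        for w in targets:
            if scc_of[v] != scc_of[w]:
                dag.setdefault(scc_of[v], set()).add(scc_of[w])
    return dag

def scc_members(scc_of):
    """SCC id -> sorted member nodes, kept for explainable paths."""
    members = {}
    for node, scc in scc_of.items():
        members.setdefault(scc, []).append(node)
    return {scc: sorted(ns) for scc, ns in members.items()}
```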

5.3 Entrypoint seeding rules

Entrypoints come from:

Worker-reported entrypoints (preferred)
Framework discovery in worker (ASP.NET maps, Spring mappings, etc.)
Fallback: Main, exported symbols, container CMD/ENTRYPOINT

If entrypoints are empty, mark all results as POSSIBLY_REACHABLE with reason NO_ENTRYPOINTS_DISCOVERED, unless runtime evidence exists.

5.4 Traversal

For each scan:

Start from all entrypoints; traverse reachable nodes.
Track:
- firstSeenFromEntrypoint[node] (for k-shortest path reconstruction)
- pathWitness[node] (parent pointers or compressed witness)

Produce:

reachableNodesStatic set
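A minimal Python sketch of the traversal, assuming an in-memory adjacency list. It runs a multi-source BFS from all entrypoints and keeps a parent pointer (`pathWitness`) and the originating entrypoint (`firstSeenFromEntrypoint`) per node. Neighbours are visited in sorted order, so the witness is deterministic. It reconstructs one shortest witness path. Producing the k=3 paths shown in the UI would need an extension such as Yen's algorithm, which is not shown.

```python
from collections import deque

def traverse(adj, entrypoints):
    """Return (reachable set, parent pointers, first entrypoint that reached each node)."""
    parent, first_entry = {}, {}
    queue = deque()
    for ep in sorted(entrypoints):  # sorted for deterministic witnesses
        if ep not in parent:
            parent[ep] = None
            first_entry[ep] = ep
            queue.append(ep)
    while queue:
        v = queue.popleft()
        for w in sorted(adj.get(v, ())):
            if w not in parent:
                parent[w] = v
                first_entry[w] = first_entry[v]
                queue.append(w)
    return set(parent), parent, first_entry

def witness_path(parent, target):
    """Entrypoint -> ... -> target via parent pointers, or None if unreachable."""
    if target not in parent:
        return None
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]
```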

5.5 Join to components (SBOM)

Map reachable nodes to purls using symbol_component_map.

Mapping sources (priority order):

Exact binary symbol → package metadata (where available)
Assembly/jar/module to SBOM component (by hash/purl)
Heuristics: namespace prefixes, import paths, jar manifest, npm package.json, go module path

If a vulnerable purl is in SBOM but has no symbol mapping, component reachability defaults:

If artifact is loaded at runtime → REACHABLE_PROVEN (component level)
Else if referenced by static dependency graph → POSSIBLY_REACHABLE
Else → UNREACHABLE (with NO_SYMBOL_MAPPING reason)
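The fallback above is a three-way decision. The Python sketch below encodes it with the smallint values used by `reachability_component.status`. `NO_SYMBOL_MAPPING` comes from the spec. The other two reason codes are hypothetical placeholders.

```python
from enum import IntEnum

class Status(IntEnum):
    # Same numeric encoding as reachability_component.status.
    UNREACHABLE = 0
    POSSIBLY_REACHABLE = 1
    REACHABLE_STATIC = 2
    REACHABLE_PROVEN = 3

def default_component_status(loaded_at_runtime: bool, in_static_dependency_graph: bool):
    """Component status for a vulnerable purl that has no symbol mapping."""
    if loaded_at_runtime:
        return Status.REACHABLE_PROVEN, "ARTIFACT_LOADED_AT_RUNTIME"  # hypothetical code
    if in_static_dependency_graph:
        return Status.POSSIBLY_REACHABLE, "STATIC_DEPENDENCY_ONLY"   # hypothetical code
    return Status.UNREACHABLE, "NO_SYMBOL_MAPPING"
```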

5.6 Runtime evidence upgrade (“minting”)

If runtime evidence is present:

For each sample stack:
- Mark each frame node as “executed”
- Mint runtime edges: consecutive frames become cg_edge.kind=runtime_minted (optional table or derived view)
If any executed node maps to purl affected by CVE:
- Upgrade status to REACHABLE_PROVEN
If only loaded artifact exists:
- Upgrade component status to REACHABLE_PROVEN (component-only), but keep symbol-level as unknown.
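A minimal Python sketch of minting. The `frames` array in RuntimeEvidence.v1 does not state its ordering, so this sketch assumes frames[0] is the innermost (leaf) frame, as in typical stack samples. Each adjacent pair then becomes a caller → callee edge weighted by `sampleWeight`. Any affected purl with an executed node is upgraded.

```python
def mint_runtime_edges(samples, leaf_first=True):
    """Return (executed node set, {(caller, callee): summed sampleWeight})."""
    executed, minted = set(), {}
    for sample in samples:
        frames = sample["frames"]
        weight = sample.get("sampleWeight", 1.0)
        executed.update(frames)
        for i in range(len(frames) - 1):
            # Assumption: with leaf_first, frames[i + 1] called frames[i].
            caller, callee = (frames[i + 1], frames[i]) if leaf_first else (frames[i], frames[i + 1])
            minted[(caller, callee)] = minted.get((caller, callee), 0.0) + weight
    return executed, minted

def proven_purls(executed, node_to_purl, affected_purls):
    """Affected purls with at least one executed node: upgrade to REACHABLE_PROVEN."""
    return {node_to_purl[n] for n in executed
            if n in node_to_purl and node_to_purl[n] in affected_purls}
```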

5.7 Confidence scoring (deterministic)

A simple deterministic scoring function (v1) used everywhere:

Base:
- UNREACHABLE → 0.05
- POSSIBLY_REACHABLE → 0.35
- REACHABLE_STATIC → 0.70
- REACHABLE_PROVEN → 0.95
Modifiers:
- +0.10 if path uses only static edges (no heuristic)
- −0.15 if path includes reflection_string|dynamic_import
- +0.10 if runtime evidence hits a node in affected component
- −0.10 if entrypoints incomplete (NO_ENTRYPOINTS_DISCOVERED)

Clamp the result to [0, 1].

All modifiers must be recorded in why[].
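The scoring rules above translate directly into code. In this Python sketch, `path_edges` is assumed to be the (kind, reason) pairs along the witness path, and the why[] strings other than NO_ENTRYPOINTS_DISCOVERED are hypothetical codes. The result is rounded after clamping, so float noise (0.70 + 0.10 evaluates to 0.7999…) never reaches stored results or golden tests.

```python
BASE_SCORE = {
    "UNREACHABLE": 0.05,
    "POSSIBLY_REACHABLE": 0.35,
    "REACHABLE_STATIC": 0.70,
    "REACHABLE_PROVEN": 0.95,
}
DYNAMIC_REASONS = {"reflection_string", "dynamic_import"}

def confidence(status, path_edges=(), runtime_hit_affected=False, no_entrypoints=False):
    """Return (confidence, why[]); path_edges holds (kind, reason) along the witness path."""
    value = BASE_SCORE[status]
    why = [f"BASE:{status}={value:.2f}"]
    if path_edges and all(kind == "static" for kind, _ in path_edges):
        value += 0.10
        why.append("PATH_STATIC_ONLY:+0.10")
    if any(reason in DYNAMIC_REASONS for _, reason in path_edges):
        value -= 0.15
        why.append("PATH_DYNAMIC:-0.15")
    if runtime_hit_affected:
        value += 0.10
        why.append("RUNTIME_HIT_AFFECTED_COMPONENT:+0.10")
    if no_entrypoints:
        value -= 0.10
        why.append("NO_ENTRYPOINTS_DISCOVERED:-0.10")
    # Clamp to [0, 1], then round for deterministic, replay-stable output.
    return round(min(1.0, max(0.0, value)), 4), why
```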

6) Language worker specs (what each worker must do)

6.1 .NET worker (Roslyn + optional IL)

Goal (v1): produce good-enough call graph + entrypoints for ASP.NET Core and workers.

Required features

Direct invocation edges: InvocationExpressionSyntax
Object creation edges: constructors
Delegate invocation: best-effort; record heuristic edge when target unresolved
Virtual/interface dispatch:
- record virtual_call edge to declared method
- optionally add edges to known overrides within solution (static, conservative)
Async/await: treat state machine calls as implementation detail; connect logical caller → awaited method

Entrypoint discovery (.NET)

Implement these detectors:

Program.Main (classic)
ASP.NET Core:
- Controllers: [ApiController], route attributes, action methods
- Minimal APIs: MapGet/MapPost/MapMethods patterns (syntactic + semantic)
- gRPC: MapGrpcService<T>() and service methods
- Hosted services: IHostedService, BackgroundService.ExecuteAsync as job entrypoints
Message consumers (if present): known libs patterns (e.g., MassTransit consumers)

Reflection and DI heuristics

Produce heuristic edges when you see:

Type.GetType("…"), Assembly.GetType, GetMethod("…"), Invoke
services.AddTransient<IFoo,Foo>() / AddScoped / AddSingleton
- Add edge IFoo → Foo constructor as di_binding heuristic
Activator.CreateInstance, ServiceProvider.GetService patterns

Output guarantees

Must not crash on partial compilation (missing refs); produce partial graph with why=COMPILATION_PARTIAL
Provide artifact_key per assembly/project output

6.2 Java / Node / Python / Go / Rust workers

v1 expectations:

Provide import graph + framework entrypoints + best-effort call edges.
Always label uncertain resolution as heuristic with a reason code.

6.3 Binary worker

v1 expectations:

Identify artifacts, exported symbols, imported libs, and candidate entrypoints from container metadata.
Provide component-level mapping primarily; symbol-level mapping only when confident.

7) APIs (scanner.webservice)

7.1 Ingestion endpoints

POST /api/scans → creates scan record (returns scanId)
POST /api/scans/{scanId}/callgraphs → accepts CallGraph.v1.json
POST /api/scans/{scanId}/runtimeevidence → accepts RuntimeEvidence.v1.json
POST /api/scans/{scanId}/sbom → accepts CycloneDX/SPDX
POST /api/scans/{scanId}/compute-reachability → triggers computation (idempotent)

Rules:

All ingests must be idempotent via contentDigest header (store seen digests in Postgres; Valkey may accelerate dedupe).
Reject mismatched scanKey/scanId.
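The two rules combine into a short guard sequence. The Python sketch below assumes the contentDigest header carries `sha256:<hex>` of the raw request body. It uses an in-memory set as a stand-in for the Postgres table of seen digests. The HTTP status codes and result strings are hypothetical.

```python
import hashlib
import json

class IngestLedger:
    """In-memory stand-in for the Postgres table of seen (scanId, digest) pairs (hypothetical)."""

    def __init__(self):
        self._seen = set()

    def ingest(self, scan_id: str, body: bytes, content_digest: str):
        actual = "sha256:" + hashlib.sha256(body).hexdigest()
        if content_digest != actual:
            return 400, "CONTENT_DIGEST_MISMATCH"
        if json.loads(body).get("scanKey") != scan_id:
            return 400, "SCANKEY_MISMATCH"
        key = (scan_id, actual)
        if key in self._seen:
            return 200, "DUPLICATE_IGNORED"  # replayed upload: no-op
        # In the real service this would be INSERT ... ON CONFLICT DO NOTHING in the ingest transaction.
        self._seen.add(key)
        return 201, "ACCEPTED"
```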

7.2 Query endpoints

GET /api/scans/{scanId}/reachability/components?purl=...
GET /api/scans/{scanId}/reachability/findings?cve=...
GET /api/scans/{scanId}/reachability/explain?cve=...&purl=...
- returns why[] + path witness + sample refs

7.3 Export endpoints

GET /api/scans/{scanId}/exports/sarif
GET /api/scans/{scanId}/exports/cdxr (CycloneDX reachability extension)
GET /api/scans/{scanId}/exports/openvex (reachability justifications as VEX annotations)

8) Deterministic replay requirements (must-have)

Every reachability result must be reproducible from:

SBOM digest
CallGraph digests (per worker)
RuntimeEvidence digests (optional)
Concelier feed snapshot digest
Excititor VEX snapshot digest
Policy digest (confidence scoring + gating rules)

Implement ReplayManifest.json:

```json
{
  "schema": "stella.replaymanifest.v1",
  "scanId": "uuid",
  "inputs": {
    "sbomDigest": "sha256:…",
    "callGraphs": [{"language":"dotnet","digest":"sha256:…"}],
    "runtimeEvidence": [{"digest":"sha256:…"}],
    "concelierSnapshot": "sha256:…",
    "excititorSnapshot": "sha256:…",
    "policyDigest": "sha256:…"
  }
}
```
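To compare or sign replay manifests, the manifest itself needs a stable digest. The Python sketch below does this in three steps:

- It serializes with sorted keys and no insignificant whitespace. This is simpler than RFC 8785 (JCS), which also pins number formatting.
- It treats the order of the `callGraphs` and `runtimeEvidence` arrays as meaningless and sorts them. That is an assumption about the manifest's semantics.
- It hashes the result.

```python
import copy
import hashlib
import json

def canonical_bytes(obj) -> bytes:
    # Sorted keys, compact separators, UTF-8 (not full RFC 8785 canonicalization).
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")

def replay_manifest_digest(manifest: dict) -> str:
    m = copy.deepcopy(manifest)
    inputs = m["inputs"]
    # Assumption: input arrays are sets, so sort them for an order-independent digest.
    inputs["callGraphs"] = sorted(inputs.get("callGraphs", []), key=lambda c: (c["language"], c["digest"]))
    inputs["runtimeEvidence"] = sorted(inputs.get("runtimeEvidence", []), key=lambda r: r["digest"])
    return "sha256:" + hashlib.sha256(canonical_bytes(m)).hexdigest()
```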

9) Quality gates and acceptance criteria

9.1 Golden corpus (mandatory)

Create /tests/Reachability.Golden/ with:

Minimal ASP.NET controller app with known reachable endpoint → vulnerable lib call
Minimal app with vulnerable lib present but never called → unreachable
Reflection-based activation case → “possible” unless runtime proves
BackgroundService job case

Acceptance:

Each golden test asserts:
- Reachability status
- At least one why[] reason
- Deterministic confidence within ±0.01

9.2 Drift detection (mandatory)

If the share of runtime-minted edges missing from the static graph exceeds a threshold:

Emit COVERAGE_DRIFT warning with top missing edges
Store drift report in Postgres (reachability_drift table or JSONB field)
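The spec leaves the threshold metric open. This Python sketch assumes one: the fraction of runtime sample weight carried by minted edges that the static graph lacks. The default threshold of 0.05, the report field names, and top-N ordering by weight are all assumptions.

```python
def coverage_drift(static_edges, runtime_edges, threshold=0.05, top_n=10):
    """Return a COVERAGE_DRIFT report, or None when drift is at or below `threshold`.

    static_edges: set of (caller, callee); runtime_edges: {(caller, callee): observed weight}.
    """
    total = sum(runtime_edges.values())
    if total == 0:
        return None
    missing = {edge: w for edge, w in runtime_edges.items() if edge not in static_edges}
    ratio = sum(missing.values()) / total
    if ratio <= threshold:
        return None
    # Heaviest missing edges first; ties broken by edge for determinism.
    top = sorted(missing.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return {
        "code": "COVERAGE_DRIFT",
        "missingWeightRatio": round(ratio, 4),
        "topMissingEdges": [{"from": a, "to": b, "weight": w} for (a, b), w in top],
    }
```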

9.3 Performance SLOs (v1 targets)

Static graph for one medium service (100k LOC .NET): < 2 minutes on a CI-runner-class machine
Reachability compute: < 30 seconds
Query GET finding: < 200ms p95 (use Postgres indexes + optional Valkey cache)

10) Implementation plan (developer checklist)

Milestone A — Data plumbing (1)

Implement Postgres tables + migrations
Implement ingestion endpoints
Persist call graphs and runtime evidence
Implement digesting + idempotency

Done when: you can ingest a call graph + SBOM and query stored nodes/edges.

Milestone B — DotNet worker MVP (2)

Roslyn graph extraction (direct calls + constructors)
ASP.NET Core entrypoints (controllers + minimal APIs)
Output CallGraph.v1.json

Done when: golden corpus “reachable/unreachable” works on static only.

Milestone C — Reachability compute (3)

Entry point seeding
SCC compression optional
Traversal + path witness generation
Join to SBOM purls
Generate reachability_component + reachability_finding

Done when: component-level reachability returns stable answers.

Milestone D — Runtime evidence upgrade (4)

Ingest runtime evidence
Upgrade statuses to REACHABLE_PROVEN
Drift report

Done when: runtime samples can flip a “possible” to “proven”.

Milestone E — Exports (5)

SARIF export
CycloneDX reachability extension export
OpenVEX annotations export

Done when: CI pipeline can gate on proven/static reachability.

Below is a concrete, implementable architecture + technology stack for StellaOps.Scanner.Worker.DotNet (the .NET static graph worker), written as if you're handing it directly to an engineer to build.

1) Worker’s purpose in the StellaOps scanner system

StellaOps.Scanner.Worker.DotNet is a stateless “graph producer.” It does not compute reachability, confidence, or any lattice logic (that lives in scanner.webservice). The worker:

Discovers entrypoints (ASP.NET Core controllers, minimal APIs, gRPC, hosted services, etc.)
Extracts a static call graph (method → method edges)
Adds heuristic edges for DI/reflection/dynamic patterns
Emits CallGraph.v1.json and optionally uploads it to scanner.webservice

Key constraint: node IDs must be compatible with runtime evidence (EventPipe/ETW) mapping. That’s why we build node IDs from (Module MVID + metadata token) whenever possible.

2) Deployment model

2.1 Container image choice

You have two legitimate modes; implement both:

Mode A — “Artifacts-first” (preferred for security)

Input: already-built assemblies from CI (bin/Release/.../*.dll + associated files)
Worker does no dotnet build
Worker performs IL/metadata scanning + optional Roslyn source parsing for entrypoints/heuristics

Mode B — “Build-and-scan” (convenience; higher risk)

Input: repo checkout with .sln
Worker runs dotnet restore/dotnet build inside a sandboxed container, then scans outputs

Because .NET build can execute MSBuild tasks, analyzers, and source generators (code execution risk), the product-default should be Mode A in any untrusted scenario.

2.2 Runtime requirements

Base runtime: .NET 10 (LTS). At the time of writing, Microsoft's support policy lists .NET 10 as LTS with original release Nov 11, 2025 and latest patch 10.0.1 (Dec 9, 2025). (Microsoft)
If you use Mode B, the image must include .NET 10 SDK (not just runtime). (Microsoft)

2.3 Sandbox controls (Mode B)

If you allow building:

Run with no outbound network (or allowlist only internal NuGet proxy).
Read-only root FS; writable temp only.
Drop Linux capabilities; use seccomp/apparmor defaults.
Mount repo read-only; write outputs to a dedicated volume.
Disable telemetry: DOTNET_CLI_TELEMETRY_OPTOUT=1.

3) Core architecture (pipeline)

Implement the worker as a single executable (CLI) with internal pipeline stages:

```
┌─────────────────────────────────────────────────────────────────┐
│ Worker.DotNet CLI                                               │
│  Inputs: --sln / --assemblies / --repo, --scanKey, --out        │
└───────────────┬─────────────────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────────────────┐
│ Stage 0: Discovery                                              │
│  - Find solutions/projects or assemblies                        │
│  - Determine configuration/TFM                                  │
└───────────────┬─────────────────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────────────────┐
│ Stage 1: Build (optional)                                       │
│  - dotnet restore/build OR skip                                 │
│  - Collect output assembly paths                                │
└───────────────┬─────────────────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────────────────┐
│ Stage 2: Reference Indexer                                      │
│  - Build mapping: (AssemblyName, Version) -> artifactKey        │
│  - Compute sha256 per referenced dll                            │
└───────────────┬─────────────────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────────────────┐
│ Stage 3: IL Call Graph Extractor                                │
│  - Parse each project assembly                                  │
│  - Create method nodes (nodeId = hash(MVID:token))              │
│  - Parse IL & add static edges (call/callvirt/newobj/ldftn...)  │
│  - Emit external nodes for member refs                          │
└───────────────┬─────────────────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────────────────┐
│ Stage 4: Roslyn Entrypoints + Heuristics                        │
│  - Controllers/minimal APIs/gRPC/HostedService entrypoints      │
│  - DI binding edges (AddTransient/AddScoped/AddSingleton etc.)  │
│  - Reflection edges (Type.GetType/GetMethod/Invoke etc.)        │
│  - Resolve Roslyn symbols -> nodeIds via symbolKey dictionary   │
└───────────────┬─────────────────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────────────────┐
│ Stage 5: Merge + Emit                                           │
│  - Merge nodes/edges/entrypoints                                │
│  - Output CallGraph.v1.json                                     │
│  - Optional POST to scanner.webservice                          │
└─────────────────────────────────────────────────────────────────┘
```

Why IL-first? Because you want metadata token + MVID node IDs that correlate naturally with runtime stacks. Deterministic builds make MVID stable for identical compilation inputs. (Microsoft Learn)

4) Technology stack (NuGet + platform APIs)

4.1 Roslyn / MSBuild loading

Use Roslyn MSBuild workspace packages:

Microsoft.CodeAnalysis.Workspaces.MSBuild (MSBuildWorkspace support) (NuGet)
Microsoft.CodeAnalysis.CSharp.Workspaces (C# semantic model / operations API)
Optional: Microsoft.CodeAnalysis meta-package (superset) (NuGet)
Microsoft.Build.Locator (register MSBuild instances for workspace loading)

Roslyn packages are actively published by RoslynTeam (latest shown as 5.0.0 as of Nov 2025). (NuGet)

4.2 IL + metadata scanning

Prefer BCL APIs (no extra dependencies):

System.Reflection.Metadata
System.Reflection.PortableExecutable
System.Reflection.Emit.OpCodes for IL decoding (opcode values and operand sizes), which lets you implement a compact IL parser without Cecil.

Optional alternative (faster development, more deps):

Mono.Cecil (makes IL traversal trivial) (NuGet)

4.3 CLI + logging + JSON

System.CommandLine (recommended)
Microsoft.Extensions.Logging (+ Console logger)
System.Text.Json (source-generated serializers strongly recommended)

4.4 Runtime alignment note

Runtime collectors commonly rely on EventPipe/ETW; the .NET diagnostics client library (Microsoft.Diagnostics.NETCore.Client) is the standard managed API for EventPipe sessions. (Microsoft Learn) The worker itself doesn’t collect runtime evidence, but the nodeId algorithm must match what runtime collectors can compute (hence MVID+token).

5) Internal module decomposition

Implement these internal components as classes/services. Keep them testable (pure functions where possible).

5.1 `WorkerOptions`

Holds CLI options:

ScanKey (uuid)
RepoRoot, SolutionPath OR AssembliesPath[]
Configuration (default Release)
TargetFramework (optional)
BuildMode = Artifacts | Build
OutFile
UploadUrl + ApiKey (optional)
MaxEdgesPerNode (optional throttle)
IncludeExternalNodes (bool)
Concurrency (int)

5.2 `BuildOrchestrator` (Mode B only)

Responsibilities:

Run dotnet restore and dotnet build
Capture output logs and surface them as structured diagnostics
Return discovered output assemblies (dll paths)

Hard requirements:

Support --no-restore and --no-build toggles (or equivalent)
Support ContinuousIntegrationBuild=true to improve determinism when available
If build fails, still attempt to scan any assemblies that exist, but mark output with why=BUILD_FAILED_PARTIAL.

5.3 `MsbuildWorkspaceLoader` (Roslyn)

Responsibilities:

Register MSBuild with MSBuildLocator
Load .sln via MSBuildWorkspace
Provide:
- Solution object
- Project list (C# only for v1)
- Compilation(s) when needed (for semantic analysis)

MSBuildWorkspace is the canonical Roslyn path for analyzing MSBuild solutions. (NuGet)

5.4 `ReferenceIndexer`

Responsibilities:

Build a map from referenced assemblies to artifactKey
For each PortableExecutableReference with a file path:
- compute sha256
- read assembly identity (name, version)
- create artifactKey
- add to:
  - AssemblyIdentity -> artifactKey
  - artifactKey -> sha256/path/version

This index is used by IL extractor to attribute external nodes to correct artifacts.

5.5 `IlCallGraphExtractor`

Responsibilities:

For each “root” assembly (project output):
- open PE
- get module MVID
- enumerate MethodDefinition rows
- create nodes for all methods
- parse IL bodies and emit edges

IL parsing scope (v1)

You only need to recognize these opcodes as “calls”:

call
callvirt
newobj
jmp
ldftn
ldvirtftn
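A minimal Python sketch of the call-site scan. It walks the raw IL bytes of one method body, which in .NET is what `MethodBodyBlock.GetILBytes()` returns after the method header. It reads the 4-byte little-endian metadata token that follows each call opcode. The opcode values are from ECMA-335 Partition III, but the operand-size table covers only a small subset and raises on anything else. A real extractor would derive sizes for every opcode, for example from `System.Reflection.Emit.OpCodes`.

```python
import struct

# (mnemonic, operand size in bytes) for a SMALL subset of one-byte opcodes.
ONE_BYTE = {
    0x00: ("nop", 0), 0x02: ("ldarg.0", 0), 0x03: ("ldarg.1", 0), 0x14: ("ldnull", 0),
    0x16: ("ldc.i4.0", 0), 0x1F: ("ldc.i4.s", 1), 0x20: ("ldc.i4", 4), 0x26: ("pop", 0),
    0x27: ("jmp", 4), 0x28: ("call", 4), 0x2A: ("ret", 0), 0x2B: ("br.s", 1),
    0x6F: ("callvirt", 4), 0x72: ("ldstr", 4), 0x73: ("newobj", 4),
}
TWO_BYTE = {0x06: ("ldftn", 4), 0x07: ("ldvirtftn", 4)}  # second byte after the 0xFE prefix
CALL_OPCODES = {"call", "callvirt", "newobj", "jmp", "ldftn", "ldvirtftn"}
TOKEN_TABLES = {0x06: "MethodDef", 0x0A: "MemberRef", 0x2B: "MethodSpec"}  # token >> 24

def scan_call_sites(il: bytes):
    """Return [(mnemonic, metadata_token)] for call-like opcodes in an IL byte stream."""
    sites, pos = [], 0
    while pos < len(il):
        start = pos
        if il[pos] == 0xFE and pos + 1 < len(il):
            name, size = TWO_BYTE.get(il[pos + 1], (None, 0))
            pos += 2
        else:
            name, size = ONE_BYTE.get(il[pos], (None, 0))
            pos += 1
        if name is None:
            raise ValueError(f"unsupported opcode at IL offset {start}")
        if name in CALL_OPCODES:
            (token,) = struct.unpack_from("<I", il, pos)  # little-endian metadata token
            sites.append((name, token))
        pos += size
    return sites
```

The token's high byte identifies the table. MethodDef tokens resolve to internal nodes. MemberRef and MethodSpec tokens may point into another assembly and become external nodes.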

Node identity

Internal method nodeId:
- nodeId = SHA256( MVID + ":" + metadataToken + ":" + arity + ":" + signatureShape )
- Minimal acceptable: SHA256(MVID + ":" + metadataToken), provided the worker and all runtime collectors use the same variant

This is intentionally compatible with how runtime stacks identify methods (module + token).

External method nodes

If a call operand is a MemberRef/MethodSpec that targets another assembly:

Create an “external node” with:
- symbolKey computed from metadata signature
- artifactKey resolved via ReferenceIndexer (assembly identity match)
- nodeId = SHA256("ext:" + artifactKey + ":" + symbolKey) (runtime-proof not required)

Set flags |= External.

5.6 `RoslynEntrypointExtractor`

Responsibilities:

Produce entrypoints[] records pointing to nodeIds.

Must support (v1)

ASP.NET Core MVC controllers

Type has [ApiController] or derives from ControllerBase
Action methods: public instance methods with routing attributes [HttpGet], [HttpPost], [Route], etc.
Route template:
- combine controller + action route attributes (best effort)
entrypoint.kind = http, framework=aspnetcore
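A minimal Python sketch of the best-effort route combination. It follows ASP.NET Core attribute-routing behaviour in three respects:

- Action templates starting with `/` or `~/` are not combined with the controller route.
- `[controller]` becomes the class name without its `Controller` suffix.
- `[action]` becomes the action name.

It ignores `[area]`, route constraints and multiple route attributes per action.

```python
def combine_routes(controller_template, action_template, controller_class, action_name):
    """Best-effort combination of a controller [Route] with an action route template."""
    suffix = "Controller"
    name = controller_class[:-len(suffix)] if controller_class.endswith(suffix) else controller_class
    action = action_template or ""
    if action.startswith("~/") or action.startswith("/"):
        # Absolute action template: the controller route is not prepended.
        route = "/" + action.lstrip("~/")
    else:
        parts = [p.strip("/") for p in (controller_template or "", action) if p.strip("/")]
        route = "/" + "/".join(parts)
    return route.replace("[controller]", name).replace("[action]", action_name)
```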

Minimal APIs

Detect invocation of MapGet, MapPost, MapPut, MapDelete, MapMethods
Extract route string literal when available
Handler target:
- lambda: map to the compiler-generated method that holds the lambda body (best effort; generated names are compiler-specific)
- method group => resolve to method symbolKey => nodeId

gRPC

Detect MapGrpcService<T>() (endpoint registration)
Entry points: service methods on generated base types (best effort)

Background jobs

Types implementing IHostedService
BackgroundService.ExecuteAsync override
entrypoint.kind = job

Mapping Roslyn → nodeId

Do not attempt to compute metadata tokens from Roslyn symbols directly.

Instead:

Generate the same canonical symbolKey for Roslyn symbols
Resolve symbolKey -> nodeId using a dictionary built from IL nodes

If not resolvable, emit an entrypoint with a synthetic “unresolved” node:

nodeId = SHA256("unresolved:" + symbolKey)
flags |= Unresolved
why += ENTRYPOINT_SYMBOL_UNRESOLVED
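The lookup and its unresolved fallback fit in a few lines. This sketch assumes the IL stage exposes a dictionary from symbolKey to nodeId. The `NodeFlags` bit values, the record shape and the hex encoding of the synthetic ID are assumptions, not part of the spec.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

[Flags]
public enum NodeFlags { None = 0, External = 1, Unresolved = 2 } // bit values are assumptions

public sealed record ResolvedEntrypoint(string NodeId, NodeFlags Flags, IReadOnlyList<string> Why);

public static class EntrypointResolver
{
    public static ResolvedEntrypoint Resolve(string symbolKey, IReadOnlyDictionary<string, string> nodeIdBySymbolKey)
    {
        // Happy path: the Roslyn symbolKey matches a key the IL extractor produced.
        if (nodeIdBySymbolKey.TryGetValue(symbolKey, out var nodeId))
            return new ResolvedEntrypoint(nodeId, NodeFlags.None, Array.Empty<string>());

        // Fallback: synthetic node, nodeId = SHA256("unresolved:" + symbolKey).
        string synthetic = Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes("unresolved:" + symbolKey))).ToLowerInvariant();
        return new ResolvedEntrypoint(synthetic, NodeFlags.Unresolved, new[] { "ENTRYPOINT_SYMBOL_UNRESOLVED" });
    }
}
```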

5.7 `RoslynHeuristicEdgeExtractor`

Responsibilities:

Add heuristic edges that IL won’t reliably capture.

DI bindings (must-have)

Detect common DI registration patterns:

services.AddTransient<IFoo, Foo>()
AddScoped and AddSingleton registrations (same shape)

Emit a heuristic edge:
from: ideally the interface’s method set (v1: simplify to a type-level edge to the constructor)
to: the Foo..ctor(...) node
reason = di_binding

Practical v1 implementation:

Either create edges from a synthetic “DI container” node per assembly to the implementation constructors, or create edges from the registration-site method to the constructor. Choose one and apply it consistently.
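Under the first option (one synthetic container node per assembly), edge emission is a small deterministic loop once registrations have been extracted, for example from the type arguments of `AddTransient<IFoo, Foo>()`. In this sketch, the `di-container:` node ID prefix, the `heuristic` edge kind and the `ctorNodeIdsByType` lookup are hypothetical. Registrations whose implementation type has no known constructor are simply skipped.

```csharp
using System.Collections.Generic;

public sealed record Edge(string From, string To, string Kind, string Reason);

// One extracted registration, e.g. services.AddTransient<IFoo, Foo>() in assembly "app".
public sealed record DiRegistration(string AssemblyKey, string ServiceType, string ImplementationType);

public static class DiEdgeEmitter
{
    public static List<Edge> Emit(
        IEnumerable<DiRegistration> registrations,
        IReadOnlyDictionary<string, string[]> ctorNodeIdsByType)
    {
        var edges = new List<Edge>();
        var seen = new HashSet<Edge>(); // records compare by value, so duplicate edges collapse
        foreach (var reg in registrations)
        {
            // Implementation not among known types: skip (a real extractor would count it as unresolved).
            if (!ctorNodeIdsByType.TryGetValue(reg.ImplementationType, out var ctors))
                continue;

            string containerNodeId = "di-container:" + reg.AssemblyKey; // hypothetical synthetic node ID
            foreach (var ctor in ctors)
            {
                var edge = new Edge(containerNodeId, ctor, "heuristic", "di_binding");
                if (seen.Add(edge))
                    edges.Add(edge);
            }
        }
        return edges;
    }
}
```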

Reflection (must-have)

Emit heuristic edges with lower confidence:

Type.GetType("Namespace.Type, Assembly")
Assembly.Load(...), GetMethod("X"), Invoke
Activator.CreateInstance(...)

If the string literal resolves to a type or method in the solution, create an edge:

from: caller method
to: target method/ctor
reason = reflection_string

If not resolvable, record a why=REFLECTION_UNRESOLVED_STRING diagnostic; do not crash.
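Assembly-qualified names need some care before lookup, because generic arguments such as ``List`1[[System.Int32, System.Private.CoreLib]]`` contain commas of their own. The sketch below splits on the first top-level comma and resolves the type name against a map of known types. It returns either an edge or the diagnostic code. The map, the record shape and the `heuristic` edge kind are assumptions; `ReflectionEdges` is a hypothetical name.

```csharp
using System.Collections.Generic;

public sealed record Edge(string From, string To, string Kind, string Reason);

public static class ReflectionEdges
{
    // Splits "Namespace.Type, Assembly, Version=..." at the first comma outside [...] brackets.
    public static (string TypeName, string? AssemblyName) Split(string qualified)
    {
        int depth = 0;
        for (int i = 0; i < qualified.Length; i++)
        {
            char c = qualified[i];
            if (c == '[') depth++;
            else if (c == ']') depth--;
            else if (c == ',' && depth == 0)
            {
                string asm = qualified[(i + 1)..].Trim();
                int comma = asm.IndexOf(','); // drop Version=, Culture=, PublicKeyToken=
                return (qualified[..i].Trim(), comma >= 0 ? asm[..comma].Trim() : asm);
            }
        }
        return (qualified.Trim(), null);
    }

    // Resolves a Type.GetType / Activator.CreateInstance literal against known types (full name -> ctor nodeId).
    public static (Edge? Edge, string? Why) Resolve(
        string callerNodeId, string literal, IReadOnlyDictionary<string, string> ctorNodeIdByTypeName)
    {
        var (typeName, _) = Split(literal);
        if (ctorNodeIdByTypeName.TryGetValue(typeName, out var ctor))
            return (new Edge(callerNodeId, ctor, "heuristic", "reflection_string"), null);
        return (null, "REFLECTION_UNRESOLVED_STRING"); // diagnostic instead of an exception
    }
}
```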

5.8 `GraphMerger`

Responsibilities:

Merge nodes/edges/entrypoints from IL and Roslyn stages
De-duplicate edges by (from,to,kind,reason)
Apply optional throttles:
- cap edges per node
- drop low-weight heuristics if too many
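De-duplication and throttling can be expressed as two LINQ passes. This sketch assumes each edge carries a numeric weight, for example 1.0 for IL edges and lower values for heuristics. The `Weight` field and the tie-breaking order are assumptions. Sorting with ordinal comparers keeps the output byte-stable across runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record Edge(string From, string To, string Kind, string Reason, double Weight);

public static class GraphMergerSketch
{
    public static List<Edge> Merge(IEnumerable<IEnumerable<Edge>> stages, int maxEdgesPerNode)
    {
        // De-duplicate by (from, to, kind, reason), keeping the highest weight seen.
        var unique = stages.SelectMany(s => s)
            .GroupBy(e => (e.From, e.To, e.Kind, e.Reason))
            .Select(g => g.OrderByDescending(e => e.Weight).First());

        // Throttle: per source node keep the strongest edges, with ordinal tie-breaks for stable output.
        return unique
            .GroupBy(e => e.From)
            .OrderBy(g => g.Key, StringComparer.Ordinal)
            .SelectMany(g => g
                .OrderByDescending(e => e.Weight)
                .ThenBy(e => e.To, StringComparer.Ordinal)
                .ThenBy(e => e.Kind, StringComparer.Ordinal)
                .ThenBy(e => e.Reason, StringComparer.Ordinal)
                .Take(maxEdgesPerNode))
            .ToList();
    }
}
```

With a cap of 2, a node with two IL edges and one low-weight reflection edge keeps only the IL edges.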

5.9 `CallGraphWriter`

Responsibilities:

Serialize CallGraph.v1.json exactly to spec
Include:
- artifacts[] (project outputs + references)
- nodes[], edges[]
- entrypoints[]
- language = "dotnet"
- scanKey

6) Canonical symbolKey format (critical for merges)

Pick one canonical form and use it everywhere.

Recommended v1 symbolKey shape:

{Namespace}.{TypeName}[`Arity][+Nested]::{MethodName}[`Arity]({ParamType1},{ParamType2},...)

Rules:

Use full System.* names for BCL types (System.String, not string)
Use + for nested types (metadata style)
Use backtick arity for generic type/method definitions
For arrays: System.String[]
For byref: System.String&
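These rules are mechanical enough that one formatter can be shared by both extractors. Each extractor only has to decompose a method into namespace, nesting chain, arities and canonical parameter types. This sketch shows only that final assembly step. `SymbolKeys.Build` is a hypothetical helper, and producing canonical parameter type strings from metadata or Roslyn symbols is left to the caller.

```csharp
using System.Collections.Generic;
using System.Text;

public static class SymbolKeys
{
    // typeChain lists the declaring type outermost-first: [(Outer, 1), (Inner, 0)] -> Outer`1+Inner.
    // Parameter types are expected to be canonical already (System.String, System.Int32[], System.String&).
    public static string Build(string? ns, IReadOnlyList<(string Name, int Arity)> typeChain,
                               string methodName, int methodArity, IReadOnlyList<string> parameterTypes)
    {
        var sb = new StringBuilder();
        if (!string.IsNullOrEmpty(ns)) sb.Append(ns).Append('.');
        for (int i = 0; i < typeChain.Count; i++)
        {
            if (i > 0) sb.Append('+'); // metadata-style nesting separator
            sb.Append(typeChain[i].Name);
            if (typeChain[i].Arity > 0) sb.Append('`').Append(typeChain[i].Arity);
        }
        sb.Append("::").Append(methodName);
        if (methodArity > 0) sb.Append('`').Append(methodArity);
        return sb.Append('(').Append(string.Join(",", parameterTypes)).Append(')').ToString();
    }
}
```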

Implementation detail:

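IL extractor can build this from metadata signatures.
Roslyn extractor can build this with a controlled SymbolDisplayFormat for namespace-qualified names, plus a small custom step for the metadata-style parts. The public SymbolDisplayFormat options render generics as <T> and join nested types with '.'. Backtick arity (INamedTypeSymbol.MetadataName already includes it for types) and '+' nesting therefore have to be emitted by your own formatter.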

If you get this right, Roslyn → IL mapping becomes reliable.

7) CLI surface (what developers will actually run)

Minimum viable commands:

Artifacts-first scan

stella-worker-dotnet scan \
  --scanKey 00000000-0000-0000-0000-000000000000 \
  --assemblies ./artifacts/bin/Release \
  --out ./callgraph.json

Build-and-scan (trusted internal code only)

stella-worker-dotnet scan \
  --scanKey ... \
  --sln ./src/MySolution.sln \
  --configuration Release \
  --tfm net10.0 \
  --buildMode build \
  --out ./callgraph.json

Upload to scanner.webservice

stella-worker-dotnet scan \
  --scanKey ... \
  --assemblies ./artifacts/bin/Release \
  --upload https://scanner/api/scans/{scanId}/callgraphs \
  --apiKey $STELLA_API_KEY

8) Observability and failure behavior

8.1 Structured diagnostics

Always emit:

counts: nodes/edges/entrypoints
build outcome: success/failed/partial
list of projects scanned/skipped
unresolved symbol counts (entrypoints + heuristic edges)

8.2 Hard failure vs partial output

If at least one assembly was scanned, output a graph even if others fail.
Mark diagnostics in the output:
- add why/notes fields (if you extend the schema), or log to stderr and let the webservice record the warning on ingest.
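The partial-output rule reduces to a small classification over per-assembly results. This sketch makes several assumptions: that the success/failed/partial outcome maps onto per-assembly scan results, the shape of the `AssemblyResult` record, and the warning text format.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum BuildOutcome { Success, Partial, Failed }

public sealed record AssemblyResult(string Path, bool Scanned, string? Error);

public static class ScanOutcome
{
    // Success: all scanned; Partial: some scanned, some failed; Failed: none scanned.
    public static BuildOutcome Classify(IReadOnlyCollection<AssemblyResult> results)
    {
        int scanned = results.Count(r => r.Scanned);
        if (scanned == 0) return BuildOutcome.Failed;
        return scanned == results.Count ? BuildOutcome.Success : BuildOutcome.Partial;
    }

    // A graph is written whenever at least one assembly was scanned.
    public static bool ShouldWriteGraph(BuildOutcome outcome) => outcome != BuildOutcome.Failed;

    // One stderr-friendly line per skipped assembly, so the webservice can record it on ingest.
    public static IEnumerable<string> Warnings(IEnumerable<AssemblyResult> results) =>
        results.Where(r => !r.Scanned).Select(r => $"warning: skipped {r.Path}: {r.Error ?? "unknown error"}");
}
```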

9) Why this architecture works for reachability

IL extraction provides the most faithful call edges and stable node IDs.
Roslyn adds what IL can’t: framework entrypoints, DI and reflection heuristics.
Node IDs based on MVID + token align with deterministic compilation and with how runtime tooling identifies methods (module + token). Deterministic compilation replaces the timestamp and MVID with values derived from the compilation inputs, so identical inputs yield identical node IDs, while any change to a module changes its MVID and therefore every node ID in it. (Microsoft Learn)
Roslyn MSBuildWorkspace is the canonical way to load solutions/projects with correct references and compilation options. (NuGet)

10) Implementation “must-do” checklist for the developer

Define canonical symbolKey and implement it in:
- IL extractor
- Roslyn symbol formatter
Implement ReferenceIndexer to map assembly identity → artifactKey.
Implement IL extractor:
- nodes for method defs
- edges for call opcodes
Implement entrypoint detectors (controllers + minimal APIs + hosted service).
Implement DI + reflection heuristic edges.
Merge and output CallGraph.v1.json matching schema.
Add golden tests (small ASP.NET apps) verifying:
- entrypoint detection
- at least one static path exists
- DI/reflection edges appear with correct reason codes
