git.stella-ops.org/docs/product-advisories/13-Dec-2025 - Designing the Call‑Stack Reachability Engine.md
2025-12-14 16:23:44 +02:00


Here's a practical blueprint for building a reachability-first code+binary scanner that fuses static call graphs with runtime evidence and scales to large monorepos/microservices.


1) Static analyzers (per language)

  • .NET (Roslyn / IL)

    • Parse solutions with Microsoft.CodeAnalysis.MSBuild, collect symbols, build call graph from ISymbol / IInvocationOperation.

    • Handle reflection edges by heuristics (string literals, Type.GetType, DI registrations).

    • IL pass: read assemblies with System.Reflection.Metadata to connect external/library calls.

    • Minimal sample:

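      using System.Linq;
      using Microsoft.Build.Locator;
      using Microsoft.CodeAnalysis;
      using Microsoft.CodeAnalysis.CSharp.Syntax;
      using Microsoft.CodeAnalysis.MSBuild;
      
      // An MSBuild instance must be registered before the workspace loads projects.
      MSBuildLocator.RegisterDefaults();
      
      using var ws = MSBuildWorkspace.Create();
      var sln = await ws.OpenSolutionAsync(@"path\to.sln");
      foreach (var proj in sln.Projects)
      foreach (var doc in proj.Documents)
      {
          var model = await doc.GetSemanticModelAsync();
          var root = await doc.GetSyntaxRootAsync();
          if (model is null || root is null) continue; // document has no syntax/semantics
      
          foreach (var node in root.DescendantNodes().OfType<InvocationExpressionSyntax>())
          {
              // Unresolved or ambiguous calls yield a null Symbol and are skipped here.
              if (model.GetSymbolInfo(node).Symbol is IMethodSymbol sym)
              {
                  // The enclosing symbol is the caller (method, lambda owner, etc.).
                  var caller = model.GetEnclosingSymbol(node.SpanStart);
                  // record edge: caller -> sym.ContainingType.Name + "." + sym.Name
              }
          }
      }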
      
  • Java (Soot or WALA)

    • Build bytecode call graph (CHA/RTA/points-to) and export edges.
    • Seed entrypoints from public static void main, Spring Boot controllers, servlet mappings.
  • Node/Python

    • Build AST + import graph; resolve exports (module.exports, export default, Python __all__).
    • Track dynamic requires (best-effort string eval); record web/router handlers as entrypoints.
  • Go/Rust

    • Use build graph (Go modules, Cargo metadata) + AST to map main and handler functions.
    • Include linker-time features/conditions to avoid dead edges.
  • Binary-only (containers, closed libs)

    • Recover function boundaries (Ghidra/rizin), mine strings/imports, detect candidates for entrypoints from container ENTRYPOINT/CMD, service files, and exposed ports.
    • Heuristics: exported symbols, syscall usage, and common framework stubs.

2) Runtime confirmation (evidence)

  • Windows/.NET: ETW sampling to “mint” runtime edges (method IDs, stack samples) without heavy overhead.
  • Linux/containers: eBPF/USDT or perf sampling to confirm hot paths; record PID→image→build info to link evidence back to SBOM components.
  • Rule: static edge exists → mark probable; static+runtime match → mark proven (confidence ↑, prioritize).

3) Entrypoint discovery

  • Web services: framework routers (ASP.NET Core endpoints, Spring mappings, Express routes, FastAPI decorators).
  • Jobs/CLIs: scheduler configs (Cron, systemd timers, k8s CronJobs).
  • Events: message consumers (RabbitMQ/Kafka topics), gRPC service maps.

Entrypoints seed reachability: start from entry, traverse call graph, intersect with SBOM → “reachable components + reachable vulns”.


4) Scale & storage

  • Shard by repo/service; compute graphs independently.
  • Compress with SCCs (strongly connected components) to shrink graph size.
  • Cap cardinality using hot-path sampling (keep top-N edges by observed frequency).
  • Cache: content-addressed graphs keyed by (SBOM hash, compiler flags, env); invalidate on source/SBOM/CFG changes or new VEX/policy.
  • Store edges as (caller, callee, kind: static|runtime, weight, build-id) in Postgres; keep Valkey for ephemeral reachability queries.

5) SBOM/VEX linkage

  • Normalize package coordinates (purl), map symbols/binaries → SBOM components.

  • For each CVE:

    • Reachable? (entrypoint-anchored traversal hits affected symbol/library)
    • Proven at runtime? (evidence present)
    • Gated by config? (feature flags, platform checks)
  • Emit VEX with machine-explainable reasons (e.g., not reachable, reachable but not loaded, reachable+proven).


6) APIs and outputs (developer-friendly)

  • CLI

    • scan graph --lang dotnet --sln path.sln --out graph.scc.json
    • scan runtime --target pod/myservice --duration 30s --out stacks.json
    • reachability join --graph graph.scc.json --runtime stacks.json --sbom bom.cdx.json --out reach.cdxr.json
  • HTTP

    • POST /graph (upload call graph)
    • POST /runtime (upload evidence)
    • POST /reachability → returns ranked, evidence-linked findings
  • Artifacts

    • graph.scc.json (SCC-compressed call graph)
    • reach.cdxr.json (CycloneDX extension with evidence)
    • vex.json (OpenVEX/CSAF w/ “justifications”)

7) Quality gates & tests

  • Golden images: tiny test services where reachable/unreachable CVEs are known.
  • Mutation tests: toggle entrypoints, flags, and ensure reachability shifts correctly.
  • Drift checks: if runtime sees edges not in static graph → open “coverage debt” issue.

8) Security & perf knobs

  • Sampling rate caps (CPU bound), PID/image allowlists, PII-safe symbol hashing option.
  • Offline mode: bundle symbols + evidence into a replayable archive (deterministic re-evaluation).

If you want, I can generate a starter repo layout (Roslyn worker, Java WALA worker, eBPF sampler, joiner, and a Postgres schema) tailored to your .NET 10 + microservices stack.

Below is a developer-ready product + BA implementation specification for the Reachability-First Scanner described earlier, tailored to StellaOps (.NET 10) and your standing architecture rules (lattice algorithms run in scanner.webservice; Concelier/Excititor preserve prune source; Postgres is SoR; Valkey is ephemeral only).


StellaOps Reachability-First Scanner

Developer Implementation Specification (v1)

0) Objective and boundaries

Objective

Reduce vulnerability noise by classifying findings as Unreachable / Possibly Reachable / Reachable (Static) / Proven Reachable (Runtime) using:

  1. Static call graph (best-effort; language-aware)
  2. Runtime evidence (sampling, low overhead)
  3. Entrypoint seeding (framework-aware)
  4. Join against SBOM component mapping + vulnerability data (from Concelier) + VEX (from Excititor)

Non-goals (v1)

  • Perfect points-to analysis for all languages.
  • Full decompilation for every binary (support is “best-effort” with confidence).
  • Executing or fuzzing workloads.

1) Product behavior: what the user sees

1.1 Reachability statuses (canonical)

These labels must be stable across UI/CLI/API:

  • UNREACHABLE: no path from any discovered entrypoint to affected component/symbol.
  • POSSIBLY_REACHABLE: graph incomplete / dynamic behavior; heuristics indicate risk.
  • REACHABLE_STATIC: a static path exists from at least one entrypoint.
  • REACHABLE_PROVEN: runtime evidence confirms code path or library load (stronger than static).

Required explanation fields (always returned)

Every reachability classification must include:

  • why[]: list of structured reasons (machine-readable codes + human text)
  • evidence[]: references to graph paths and/or runtime samples
  • confidence: 0.0-1.0
  • scope: component-only or symbol-level (if symbol mapping exists)

1.2 Key UX outputs (pipeline-first)

  • CLI output for CI gates: stella scan reachability --format sarif|json

  • UI detail panel must show:

    • Entry point(s) → path summary (k shortest paths, default k=3)
    • Whether runtime proved it (samples, timestamps, container/build IDs)
    • Which assumptions/heuristics were used (reflection, DI, dynamic import, etc.)

2) System architecture (StellaOps modules)

2.1 Services and responsibilities

StellaOps.Scanner.WebService (authoritative)

Owns the reachability pipeline and the lattice computation for reachability decisions. Responsibilities:

  • Ingest static graphs from language workers
  • Ingest runtime evidence (from collectors)
  • Normalize symbols → components (SBOM join)
  • Compute reachability results, confidence, and explanation artifacts
  • Expose query APIs and CI export formats
  • Persist everything to Postgres (SoR)
  • Use Valkey only as ephemeral accelerator

Language workers (stateless compute)

Examples:

  • StellaOps.Scanner.Worker.DotNet
  • StellaOps.Scanner.Worker.Java
  • StellaOps.Scanner.Worker.Node
  • StellaOps.Scanner.Worker.Python
  • StellaOps.Scanner.Worker.Go
  • StellaOps.Scanner.Worker.Rust
  • StellaOps.Scanner.Worker.Binary

Responsibilities:

  • Produce CallGraph.v1.json (+ optional Entrypoints.v1.json)
  • Provide symbol IDs stable within a scan (see hashing rules)

Runtime collectors (agent/sidecar; optional)

  • Windows: ETW/EventPipe sampling for .NET
  • Linux: eBPF/perf sampling for native; plus runtime-specific exporters where feasible

Collectors only emit evidence events; they never compute reachability.

Concelier / Excititor integration

  • Concelier provides vulnerability facts (CVE ↔ component versions).
  • Excititor provides VEX statements.

Neither computes reachability or lattice merges; they provide pruned sources only.

3) Data contracts (hard requirements)

3.1 Stable identifiers

All graph nodes must have:

  • nodeId: stable across replays when code is unchanged.
  • symbolKey: canonical string (language-specific)
  • artifactKey: assembly/jar/module/binary identity (prefer build ID + path + hash)
  • Optional: purlCandidates[] (library mapping hints)

DotNet nodeId rule (v1): nodeId = SHA256(assemblyMvid + ":" + metadataToken + ":" + genericArity + ":" + signatureShape)

  • If token unavailable (source-only), fallback: SHA256(projectPath + ":" + file + ":" + span + ":" + symbolDisplayString)
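
The nodeId rule above could be sketched as follows. This is a minimal illustration, not the definitive implementation: the NodeIds class name, the "D" GUID format, the 8-digit hex token, and the lowercase-hex output are assumptions; the spec only fixes which fields are hashed and in what order.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class NodeIds
{
    // Hashes a UTF-8 string with SHA-256 and returns lowercase hex (64 chars).
    private static string Hash(string input) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(input))).ToLowerInvariant();

    // Token-based rule: SHA256(assemblyMvid:metadataToken:genericArity:signatureShape).
    // The textual encoding of each field is an assumption of this sketch.
    public static string FromMetadata(Guid mvid, int metadataToken, int genericArity, string signatureShape) =>
        Hash($"{mvid:D}:{metadataToken:x8}:{genericArity}:{signatureShape}");

    // Source-only fallback used when no metadata token is available.
    public static string FromSource(string projectPath, string file, string span, string symbolDisplayString) =>
        Hash($"{projectPath}:{file}:{span}:{symbolDisplayString}");
}
```

Whatever encoding is chosen, workers and runtime collectors have to agree on it byte for byte, otherwise runtime frames will not resolve to static nodes.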

3.2 CallGraph.v1.json

Minimum required schema:

{
  "schema": "stella.callgraph.v1",
  "scanKey": "uuid",
  "language": "dotnet|java|node|python|go|rust|binary",
  "artifacts": [{ "artifactKey": "…", "kind": "assembly|jar|module|binary", "sha256": "…" }],
  "nodes": [{
    "nodeId": "…",
    "artifactKey": "…",
    "symbolKey": "Namespace.Type::Method(…)",
    "visibility": "public|internal|private|unknown",
    "isEntrypointCandidate": false
  }],
  "edges": [{
    "from": "nodeId",
    "to": "nodeId",
    "kind": "static|heuristic",
    "reason": "direct_call|virtual_call|reflection_string|di_binding|dynamic_import|unknown",
    "weight": 1.0
  }],
  "entrypoints": [{
    "nodeId": "…",
    "kind": "http|grpc|cli|job|event|unknown",
    "route": "/api/orders/{id}",
    "framework": "aspnetcore|minimalapi|spring|express|unknown"
  }]
}

3.3 RuntimeEvidence.v1.json

{
  "schema": "stella.runtimeevidence.v1",
  "scanKey": "uuid",
  "collectedAt": "2025-12-14T10:00:00Z",
  "environment": {
    "os": "linux|windows",
    "k8s": { "namespace": "…", "pod": "…", "container": "…" },
    "imageDigest": "sha256:…",
    "buildId": "…"
  },
  "samples": [{
    "timestamp": "…",
    "pid": 1234,
    "threadId": 77,
    "frames": ["nodeId","nodeId","nodeId"],
    "sampleWeight": 1.0
  }],
  "loadedArtifacts": [{
    "artifactKey": "…",
    "evidence": "loaded_module|mapped_file|jar_loaded"
  }]
}

4) Postgres schema (system of record)

4.1 Core tables

You can implement with migrations in StellaOps.Scanner.Persistence (EF Core 9).

scan

  • scan_id uuid pk
  • created_at timestamptz
  • repo_uri text null
  • commit_sha text null
  • sbom_digest text (hash of SBOM input)
  • policy_digest text (hash of reachability policy inputs)
  • status text (NEW/RUNNING/DONE/FAILED)

Indexes:

  • (commit_sha, sbom_digest) for caching

artifact

  • artifact_id uuid pk
  • scan_id uuid fk
  • artifact_key text unique per scan
  • kind text
  • sha256 text
  • build_id text null
  • purl text null

Index:

  • (scan_id, artifact_key) unique

cg_node

  • scan_id uuid fk
  • node_id text (hash string)
  • artifact_key text
  • symbol_key text
  • visibility text
  • flags int (bitset: entrypointCandidate, external, generated, etc.)

PK: (scan_id, node_id)

GIN index:

  • symbol_key trigram for search (optional)

cg_edge

  • scan_id uuid fk
  • from_node_id text
  • to_node_id text
  • kind smallint (0 static, 1 heuristic, 2 runtime_minted)
  • reason smallint
  • weight real

PK: (scan_id, from_node_id, to_node_id, kind, reason)

Indexes:

  • (scan_id, from_node_id)
  • (scan_id, to_node_id)

entrypoint

  • scan_id uuid
  • node_id text
  • kind text
  • framework text
  • route text not null default '' (empty string when absent; Postgres primary-key columns cannot be null)

PK: (scan_id, node_id, kind, framework, route)

runtime_sample

  • scan_id uuid
  • collected_at timestamptz
  • env_hash text (hash of environment identity)
  • sample_id bigserial pk
  • timestamp timestamptz
  • pid int
  • thread_id int
  • frames text[] (nodeIds)
  • weight real

Partition suggestion:

  • Partition by scan_id or by month depending on retention.

symbol_component_map

  • scan_id uuid
  • node_id text
  • purl text
  • mapping_kind text (exact|heuristic|external)
  • confidence real

PK: (scan_id, node_id, purl)

reachability_component

  • scan_id uuid
  • purl text
  • status smallint (0 unreachable, 1 possible, 2 reachable_static, 3 reachable_proven)
  • confidence real
  • why jsonb
  • evidence jsonb

PK: (scan_id, purl)

reachability_finding

  • scan_id uuid
  • cve_id text
  • purl text
  • status smallint
  • confidence real
  • why jsonb
  • evidence jsonb

PK: (scan_id, cve_id, purl)

4.2 Valkey usage (ephemeral only)

Allowed:

  • Dedup keys for evidence ingest (short TTL)
  • Hot query cache: (scan_id, purl) → reachability result
  • Rate limits / nonces

Not allowed:

  • Authoritative queueing for scan state
  • Any “only copy” of results

5) Reachability computation (the actual algorithm)

5.1 Inputs

  • Call graph nodes/edges + entrypoints
  • Runtime evidence (optional)
  • SBOM (CycloneDX/SPDX) with purls
  • Concelier vulnerability facts (CVE ↔ purl/version ranges)
  • Excititor VEX statements (not affected / affected / under investigation)

5.2 Normalize to a graph suitable for traversal

In scanner.webservice:

  1. Build adjacency list for cg_edge.kind in (static, heuristic)

  2. Optionally compress SCCs:

    • Compute SCCs (Tarjan/Kosaraju)
    • Store SCC mapping for explanation paths (must remain explainable)
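
As an illustration of the optional SCC step, the sketch below uses Tarjan's algorithm to map each nodeId to a component index. The SccCompressor name and the adjacency-dictionary shape are assumptions of this example. It is recursive for brevity, so a production version would likely need an iterative variant for very deep graphs.

```csharp
using System;
using System.Collections.Generic;

public static class SccCompressor
{
    // Returns nodeId -> component index. Nodes that only appear as call targets
    // are treated as leaves with no outgoing edges.
    public static Dictionary<string, int> Compute(IReadOnlyDictionary<string, List<string>> adjacency)
    {
        var index = new Dictionary<string, int>();
        var low = new Dictionary<string, int>();
        var onStack = new HashSet<string>();
        var stack = new Stack<string>();
        var component = new Dictionary<string, int>();
        int next = 0, componentCount = 0;

        void Visit(string v)
        {
            index[v] = low[v] = next++;
            stack.Push(v);
            onStack.Add(v);

            if (adjacency.TryGetValue(v, out var targets))
            {
                foreach (var w in targets)
                {
                    if (!index.ContainsKey(w))
                    {
                        Visit(w);
                        low[v] = Math.Min(low[v], low[w]);
                    }
                    else if (onStack.Contains(w))
                    {
                        low[v] = Math.Min(low[v], index[w]);
                    }
                }
            }

            if (low[v] != index[v]) return;

            // v is the root of a component: pop its members off the stack.
            string member;
            do
            {
                member = stack.Pop();
                onStack.Remove(member);
                component[member] = componentCount;
            } while (member != v);
            componentCount++;
        }

        foreach (var v in adjacency.Keys)
            if (!index.ContainsKey(v)) Visit(v);

        return component;
    }
}
```

Keeping the returned node-to-component map alongside the condensed graph is what lets explanation paths be expanded back to real methods.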

5.3 Entrypoint seeding rules

Entrypoints come from:

  • Worker-reported entrypoints (preferred)
  • Framework discovery in worker (ASP.NET maps, Spring mappings, etc.)
  • Fallback: Main, exported symbols, container CMD/ENTRYPOINT

If entrypoints are empty, mark all results as POSSIBLY_REACHABLE with reason NO_ENTRYPOINTS_DISCOVERED, unless runtime evidence exists.

5.4 Traversal

For each scan:

  • Start from all entrypoints; traverse reachable nodes.

  • Track:

    • firstSeenFromEntrypoint[node] (for k-shortest path reconstruction)
    • pathWitness[node] (parent pointers or compressed witness)

Produce:

  • reachableNodesStatic set
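
The traversal can be sketched as a multi-source breadth-first search that records one parent pointer per node. This is a simplified illustration: StaticTraversal and its method names are hypothetical, and it reconstructs only a single shortest witness per node, not the k shortest paths the UI calls for.

```csharp
using System.Collections.Generic;

public static class StaticTraversal
{
    // Breadth-first search from all entrypoints at once.
    // parent[node] is the predecessor on one shortest path; entrypoints map to null.
    // The keys of the result form reachableNodesStatic.
    public static Dictionary<string, string?> Reach(
        IReadOnlyDictionary<string, List<string>> adjacency,
        IEnumerable<string> entrypoints)
    {
        var parent = new Dictionary<string, string?>();
        var queue = new Queue<string>();
        foreach (var entry in entrypoints)
            if (parent.TryAdd(entry, null)) queue.Enqueue(entry);

        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            if (!adjacency.TryGetValue(node, out var targets)) continue;
            foreach (var target in targets)
                if (parent.TryAdd(target, node)) queue.Enqueue(target);
        }
        return parent;
    }

    // Rebuilds entrypoint -> ... -> target from the parent pointers.
    // Returns an empty list when the target was never reached.
    public static List<string> Witness(IReadOnlyDictionary<string, string?> parent, string target)
    {
        var path = new List<string>();
        if (!parent.ContainsKey(target)) return path;
        for (string? n = target; n is not null; n = parent[n]) path.Add(n);
        path.Reverse();
        return path;
    }
}
```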

5.5 Join to components (SBOM)

Map reachable nodes to purls using symbol_component_map.

Mapping sources (priority order):

  1. Exact binary symbol → package metadata (where available)
  2. Assembly/jar/module to SBOM component (by hash/purl)
  3. Heuristics: namespace prefixes, import paths, jar manifest, npm package.json, go module path

If a vulnerable purl is in SBOM but has no symbol mapping, component reachability defaults:

  • If artifact is loaded at runtime → at least REACHABLE_PROVEN (component level)
  • Else if referenced by static dependency graph → POSSIBLY_REACHABLE
  • Else → UNREACHABLE (with NO_SYMBOL_MAPPING reason)
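
The three default rules above amount to a small decision function. In this sketch the enum values mirror the status codes in the reachability_component table; the ARTIFACT_LOADED and STATIC_DEPENDENCY_ONLY reason codes are hypothetical placeholders, and only NO_SYMBOL_MAPPING comes from the spec.

```csharp
public enum ReachabilityStatus
{
    Unreachable = 0,
    PossiblyReachable = 1,
    ReachableStatic = 2,
    ReachableProven = 3,
}

public static class ComponentDefaults
{
    // Default status for a vulnerable purl that is in the SBOM but has no symbol mapping.
    public static (ReachabilityStatus Status, string Reason) ForUnmappedComponent(
        bool loadedAtRuntime, bool inStaticDependencyGraph)
    {
        if (loadedAtRuntime)
            return (ReachabilityStatus.ReachableProven, "ARTIFACT_LOADED");           // hypothetical code
        if (inStaticDependencyGraph)
            return (ReachabilityStatus.PossiblyReachable, "STATIC_DEPENDENCY_ONLY");  // hypothetical code
        return (ReachabilityStatus.Unreachable, "NO_SYMBOL_MAPPING");
    }
}
```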

5.6 Runtime evidence upgrade (“minting”)

If runtime evidence is present:

  • For each sample stack:

    • Mark each frame node as “executed”
    • Mint runtime edges: consecutive frames become cg_edge.kind=runtime_minted (optional table or derived view)
  • If any executed node maps to purl affected by CVE:

    • Upgrade status to REACHABLE_PROVEN
  • If only loaded artifact exists:

    • Upgrade component status to REACHABLE_PROVEN (component-only), but keep symbol-level as unknown.
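
A minimal sketch of the minting step, under one important assumption: frames are ordered outermost caller first (root to leaf). RuntimeEvidence.v1.json does not state the frame order, so a real collector contract would need to pin it down. The RuntimeMinting name and the weight accumulation are also illustrative.

```csharp
using System.Collections.Generic;

public static class RuntimeMinting
{
    // Marks every frame as executed and turns each consecutive frame pair
    // into a runtime edge, summing sample weights per edge.
    // Assumes Frames[0] is the outermost caller.
    public static (HashSet<string> Executed, Dictionary<(string From, string To), double> Edges) Mint(
        IEnumerable<(IReadOnlyList<string> Frames, double Weight)> samples)
    {
        var executed = new HashSet<string>();
        var edges = new Dictionary<(string From, string To), double>();

        foreach (var (frames, weight) in samples)
        {
            for (int i = 0; i < frames.Count; i++)
            {
                executed.Add(frames[i]);
                if (i == 0) continue;

                var key = (frames[i - 1], frames[i]);
                edges[key] = edges.GetValueOrDefault(key) + weight;
            }
        }
        return (executed, edges);
    }
}
```

Any executed node that maps to a purl affected by a CVE would then drive the upgrade to REACHABLE_PROVEN.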

5.7 Confidence scoring (deterministic)

A simple deterministic scoring function (v1) used everywhere:

  • Base:

    • UNREACHABLE → 0.05
    • POSSIBLY_REACHABLE → 0.35
    • REACHABLE_STATIC → 0.70
    • REACHABLE_PROVEN → 0.95
  • Modifiers:

    • +0.10 if path uses only static edges (no heuristic)
    • -0.15 if path includes reflection_string|dynamic_import
    • +0.10 if runtime evidence hits a node in affected component
    • -0.10 if entrypoints incomplete (NO_ENTRYPOINTS_DISCOVERED)

Clamp to [0, 1].

All modifiers must be recorded in why[].
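
The scoring rules can be sketched as a pure function. This example uses decimal so the additions stay exact, which is a choice made for the sketch, since the schema stores confidence as real. The PathFacts record and every why code except NO_ENTRYPOINTS_DISCOVERED are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public enum ReachabilityStatus { Unreachable, PossiblyReachable, ReachableStatic, ReachableProven }

// Facts about the best witness path and the scan; field names are illustrative.
public sealed record PathFacts(
    bool HasPath,
    bool UsesHeuristicEdge,
    bool UsesReflectionOrDynamicImport,
    bool RuntimeHitInComponent,
    bool NoEntrypointsDiscovered);

public static class ConfidenceScorer
{
    public static (decimal Confidence, IReadOnlyList<string> Why) Score(ReachabilityStatus status, PathFacts facts)
    {
        var why = new List<string>();
        decimal score = status switch
        {
            ReachabilityStatus.Unreachable => 0.05m,
            ReachabilityStatus.PossiblyReachable => 0.35m,
            ReachabilityStatus.ReachableStatic => 0.70m,
            ReachabilityStatus.ReachableProven => 0.95m,
            _ => throw new ArgumentOutOfRangeException(nameof(status)),
        };
        why.Add($"BASE_{status}");

        // Applies one modifier and records it so the result stays explainable.
        void Apply(bool condition, decimal delta, string code)
        {
            if (!condition) return;
            score += delta;
            why.Add(code);
        }

        Apply(facts.HasPath && !facts.UsesHeuristicEdge, +0.10m, "PATH_STATIC_ONLY");
        Apply(facts.UsesReflectionOrDynamicImport, -0.15m, "PATH_REFLECTION_OR_DYNAMIC_IMPORT");
        Apply(facts.RuntimeHitInComponent, +0.10m, "RUNTIME_HIT_IN_COMPONENT");
        Apply(facts.NoEntrypointsDiscovered, -0.10m, "NO_ENTRYPOINTS_DISCOVERED");

        return (Math.Clamp(score, 0m, 1m), why);
    }
}
```

For instance, a REACHABLE_STATIC finding whose path uses only static edges scores 0.70 + 0.10 = 0.80, while a proven finding with a runtime hit exceeds 1.0 before clamping and is reported as 1.0.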


6) Language worker specs (what each worker must do)

6.1 .NET worker (Roslyn + optional IL)

Goal (v1): produce good-enough call graph + entrypoints for ASP.NET Core and workers.

Required features

  • Direct invocation edges: InvocationExpressionSyntax

  • Object creation edges: constructors

  • Delegate invocation: best-effort; record heuristic edge when target unresolved

  • Virtual/interface dispatch:

    • record virtual_call edge to declared method
    • optionally add edges to known overrides within solution (static, conservative)
  • Async/await: treat state machine calls as implementation detail; connect logical caller → awaited method

Entrypoint discovery (.NET)

Implement these detectors:

  • Program.Main (classic)

  • ASP.NET Core:

    • Controllers: [ApiController], route attributes, action methods
    • Minimal APIs: MapGet/MapPost/MapMethods patterns (syntactic + semantic)
    • gRPC: MapGrpcService<T>() and service methods
    • Hosted services: IHostedService, BackgroundService.ExecuteAsync as job entrypoints
  • Message consumers (if present): known libs patterns (e.g., MassTransit consumers)

Reflection and DI heuristics

Produce heuristic edges when you see:

  • Type.GetType("…"), Assembly.GetType, GetMethod("…"), Invoke

  • services.AddTransient<IFoo,Foo>() / AddScoped / AddSingleton

    • Add edge IFoo → Foo constructor as di_binding heuristic
  • Activator.CreateInstance, ServiceProvider.GetService patterns

Output guarantees

  • Must not crash on partial compilation (missing refs); produce partial graph with why=COMPILATION_PARTIAL
  • Provide artifact_key per assembly/project output

6.2 Java / Node / Python / Go / Rust workers

v1 expectations:

  • Provide import graph + framework entrypoints + best-effort call edges.
  • Always label uncertain resolution as heuristic with a reason code.

6.3 Binary worker

v1 expectations:

  • Identify artifacts, exported symbols, imported libs, and candidate entrypoints from container metadata.
  • Provide component-level mapping primarily; symbol-level mapping only when confident.

7) APIs (scanner.webservice)

7.1 Ingestion endpoints

  • POST /api/scans → creates scan record (returns scanId)
  • POST /api/scans/{scanId}/callgraphs → accepts CallGraph.v1.json
  • POST /api/scans/{scanId}/runtimeevidence → accepts RuntimeEvidence.v1.json
  • POST /api/scans/{scanId}/sbom → accepts CycloneDX/SPDX
  • POST /api/scans/{scanId}/compute-reachability → triggers computation (idempotent)

Rules:

  • All ingests must be idempotent via contentDigest header (store seen digests in Postgres; Valkey may accelerate dedupe).
  • Reject mismatched scanKey/scanId.
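
The two ingest rules might look like this in outline. The sketch keeps seen digests in memory purely for illustration, whereas the spec requires Postgres as the store of record. The IngestDeduplicator name, the "sha256:<hex>" digest format, and the check that the header matches the payload are all assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;

public sealed class IngestDeduplicator
{
    // In-memory stand-in for the Postgres table of seen digests (sketch only).
    private readonly ConcurrentDictionary<(Guid ScanId, string Digest), bool> _seen = new();

    // Computes "sha256:<lowercase hex>" over the raw payload bytes.
    public static string Digest(ReadOnlySpan<byte> payload) =>
        "sha256:" + Convert.ToHexString(SHA256.HashData(payload)).ToLowerInvariant();

    // Returns true when the payload is new and should be processed,
    // false when the same digest was already accepted for this scan.
    public bool TryAccept(Guid scanId, Guid payloadScanKey, byte[] payload, string contentDigest)
    {
        if (scanId != payloadScanKey)
            throw new ArgumentException("scanKey in payload does not match scanId in route.");

        if (!string.Equals(Digest(payload), contentDigest, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("contentDigest header does not match payload.");

        return _seen.TryAdd((scanId, contentDigest.ToLowerInvariant()), true);
    }
}
```

An endpoint would typically answer a replay with the same success response as the first upload, so that retries from workers stay harmless.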

7.2 Query endpoints

  • GET /api/scans/{scanId}/reachability/components?purl=...

  • GET /api/scans/{scanId}/reachability/findings?cve=...

  • GET /api/scans/{scanId}/reachability/explain?cve=...&purl=...

    • returns why[] + path witness + sample refs

7.3 Export endpoints

  • GET /api/scans/{scanId}/exports/sarif
  • GET /api/scans/{scanId}/exports/cdxr (CycloneDX reachability extension)
  • GET /api/scans/{scanId}/exports/openvex (reachability justifications as VEX annotations)

8) Deterministic replay requirements (must-have)

Every reachability result must be reproducible from:

  • SBOM digest
  • CallGraph digests (per worker)
  • RuntimeEvidence digests (optional)
  • Concelier feed snapshot digest
  • Excititor VEX snapshot digest
  • Policy digest (confidence scoring + gating rules)

Implement ReplayManifest.json:

{
  "schema": "stella.replaymanifest.v1",
  "scanId": "uuid",
  "inputs": {
    "sbomDigest": "sha256:…",
    "callGraphs": [{"language":"dotnet","digest":"sha256:…"}],
    "runtimeEvidence": [{"digest":"sha256:…"}],
    "concelierSnapshot": "sha256:…",
    "excititorSnapshot": "sha256:…",
    "policyDigest": "sha256:…"
  }
}

9) Quality gates and acceptance criteria

9.1 Golden corpus (mandatory)

Create /tests/Reachability.Golden/ with:

  • Minimal ASP.NET controller app with known reachable endpoint → vulnerable lib call
  • Minimal app with vulnerable lib present but never called → unreachable
  • Reflection-based activation case → “possible” unless runtime proves
  • BackgroundService job case

Acceptance:

  • Each golden test asserts:

    • Reachability status
    • At least one why[] reason
    • Deterministic confidence within ±0.01

9.2 Drift detection (mandatory)

If runtime minted edges not present in static graph above a threshold:

  • Emit COVERAGE_DRIFT warning with top missing edges
  • Store drift report in Postgres (reachability_drift table or JSONB field)
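
A sketch of the drift check, comparing runtime-minted edges against the static edge set. The spec does not define the threshold or how it is measured, so the weighted missing ratio, the 0.05 default, and the DriftDetector/DriftReport names are assumptions of this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DriftDetector
{
    public sealed record DriftReport(
        double MissingRatio,
        IReadOnlyList<(string From, string To)> TopMissing,
        bool IsDrift);

    // Flags drift when the weighted share of runtime edges that are absent
    // from the static graph exceeds the threshold.
    public static DriftReport Check(
        ISet<(string From, string To)> staticEdges,
        IReadOnlyDictionary<(string From, string To), double> runtimeEdges,
        double threshold = 0.05,
        int top = 10)
    {
        // Deterministic ordering: heaviest first, then ordinal by endpoints.
        var missing = runtimeEdges
            .Where(e => !staticEdges.Contains(e.Key))
            .OrderByDescending(e => e.Value)
            .ThenBy(e => e.Key.From, StringComparer.Ordinal)
            .ThenBy(e => e.Key.To, StringComparer.Ordinal)
            .ToList();

        double total = runtimeEdges.Values.Sum();
        double ratio = total == 0 ? 0 : missing.Sum(e => e.Value) / total;

        return new DriftReport(ratio, missing.Take(top).Select(e => e.Key).ToList(), ratio > threshold);
    }
}
```

When IsDrift is true, the service would emit COVERAGE_DRIFT with TopMissing as the list of edges the static analysis failed to find.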

9.3 Performance SLOs (v1 targets)

  • 1 medium service (100k LOC .NET) static graph: < 2 minutes on CI runner class machine
  • Reachability compute: < 30 seconds
  • Query GET finding: < 200ms p95 (use Postgres indexes + optional Valkey cache)

10) Implementation plan (developer checklist)

Milestone A — Data plumbing (1)

  • Implement Postgres tables + migrations
  • Implement ingestion endpoints
  • Persist call graphs and runtime evidence
  • Implement digesting + idempotency

Done when: you can ingest a call graph + SBOM and query stored nodes/edges.

Milestone B — DotNet worker MVP (2)

  • Roslyn graph extraction (direct calls + constructors)
  • ASP.NET Core entrypoints (controllers + minimal APIs)
  • Output CallGraph.v1.json

Done when: golden corpus “reachable/unreachable” works on static only.

Milestone C — Reachability compute (3)

  • Entry point seeding
  • SCC compression optional
  • Traversal + path witness generation
  • Join to SBOM purls
  • Generate reachability_component + reachability_finding

Done when: component-level reachability returns stable answers.

Milestone D — Runtime evidence upgrade (4)

  • Ingest runtime evidence
  • Upgrade statuses to REACHABLE_PROVEN
  • Drift report

Done when: runtime samples can flip a “possible” to “proven”.

Milestone E — Exports (5)

  • SARIF export
  • CycloneDX reachability extension export
  • OpenVEX annotations export

Done when: CI pipeline can gate on proven/static reachability.


If you want this converted into repo-facing artifacts (e.g., docs/modules/scanner/reachability-spec.md, EF Core entity classes skeletons, and OpenAPI snippets for the endpoints), tell me the exact module paths you use under /src/ and I'll emit drop-in files matching your StellaOps naming conventions.

Below is a concrete, implementable architecture + technology stack for StellaOps.Scanner.Worker.DotNet (the .NET static graph worker), written as if you're handing it directly to an engineer to build.


1) Worker's purpose in the StellaOps scanner system

StellaOps.Scanner.Worker.DotNet is a stateless “graph producer.” It does not compute reachability, confidence, or any lattice logic (that lives in scanner.webservice). The worker:

  1. Discovers entrypoints (ASP.NET Core controllers, minimal APIs, gRPC, hosted services, etc.)
  2. Extracts a static call graph (method → method edges)
  3. Adds heuristic edges for DI/reflection/dynamic patterns
  4. Emits CallGraph.v1.json and optionally uploads it to scanner.webservice

Key constraint: node IDs must be compatible with runtime evidence (EventPipe/ETW) mapping. That's why we build node IDs from (Module MVID + metadata token) whenever possible.


2) Deployment model

2.1 Container image choice

You have two legitimate modes; implement both:

Mode A — “Artifacts-first” (preferred for security)

  • Input: already-built assemblies from CI (bin/Release/.../*.dll + associated files)
  • Worker does no dotnet build
  • Worker performs IL/metadata scanning + optional Roslyn source parsing for entrypoints/heuristics

Mode B — “Build-and-scan” (convenience; higher risk)

  • Input: repo checkout with .sln
  • Worker runs dotnet restore/dotnet build inside a sandboxed container, then scans outputs

Because .NET build can execute MSBuild tasks, analyzers, and source generators (code execution risk), the product-default should be Mode A in any untrusted scenario.

2.2 Runtime requirements

  • Base runtime: .NET 10 (LTS). Microsoft's support policy lists .NET 10 as LTS with original release Nov 11, 2025 and latest patch 10.0.1 (Dec 9, 2025). (Microsoft)
  • If you use Mode B, the image must include .NET 10 SDK (not just runtime). (Microsoft)

2.3 Sandbox controls (Mode B)

If you allow building:

  • Run with no outbound network (or allowlist only internal NuGet proxy).
  • Read-only root FS; writable temp only.
  • Drop Linux capabilities; use seccomp/apparmor defaults.
  • Mount repo read-only; write outputs to a dedicated volume.
  • Disable telemetry: DOTNET_CLI_TELEMETRY_OPTOUT=1.

3) Core architecture (pipeline)

Implement the worker as a single executable (CLI) with internal pipeline stages:

┌───────────────────────────────────────────────────────────────┐
│ Worker.DotNet CLI                                              │
│  Inputs: --sln / --assemblies / --repo, --scanKey, --out       │
└───────────────┬───────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────┐
│ Stage 0: Discovery                                              │
│  - Find solutions/projects or assemblies                         │
│  - Determine configuration/TFM                                   │
└───────────────┬───────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────┐
│ Stage 1: Build (optional)                                       │
│  - dotnet restore/build OR skip                                 │
│  - Collect output assembly paths                                │
└───────────────┬───────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────┐
│ Stage 2: Reference Indexer                                      │
│  - Build mapping: (AssemblyName, Version) -> artifactKey        │
│  - Compute sha256 per referenced dll                            │
└───────────────┬───────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────┐
│ Stage 3: IL Call Graph Extractor                                │
│  - Parse each project assembly                                  │
│  - Create method nodes (nodeId = hash(MVID:token))              │
│  - Parse IL & add static edges (call/callvirt/newobj/ldftn...)  │
│  - Emit external nodes for member refs                           │
└───────────────┬───────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────┐
│ Stage 4: Roslyn Entrypoints + Heuristics                        │
│  - Controllers/minimal APIs/gRPC/HostedService entrypoints      │
│  - DI binding edges (AddTransient/AddScoped/AddSingleton etc.)  │
│  - Reflection edges (Type.GetType/GetMethod/Invoke etc.)        │
│  - Resolve Roslyn symbols -> nodeIds via symbolKey dictionary    │
└───────────────┬───────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────┐
│ Stage 5: Merge + Emit                                           │
│  - Merge nodes/edges/entrypoints                                │
│  - Output CallGraph.v1.json                                     │
│  - Optional POST to scanner.webservice                           │
└───────────────────────────────────────────────────────────────┘

Why IL-first? Because you want metadata token + MVID node IDs that correlate naturally with runtime stacks. Deterministic builds make MVID stable for identical compilation inputs. (Microsoft Learn)


4) Technology stack (NuGet + platform APIs)

4.1 Roslyn / MSBuild loading

Use Roslyn MSBuild workspace packages:

  • Microsoft.CodeAnalysis.Workspaces.MSBuild (MSBuildWorkspace support) (NuGet)
  • Microsoft.CodeAnalysis.CSharp.Workspaces (C# semantic model / operations API)
  • Optional: Microsoft.CodeAnalysis meta-package (superset) (NuGet)
  • Microsoft.Build.Locator (register MSBuild instances for workspace loading)

Roslyn packages are actively published by RoslynTeam (latest shown as 5.0.0 as of Nov 2025). (NuGet)

4.2 IL + metadata scanning

Prefer BCL APIs (no extra dependencies):

  • System.Reflection.Metadata
  • System.Reflection.PortableExecutable
  • System.Reflection.Emit.OpCodes for IL decoding (operand sizes)

This lets you implement a compact IL parser without Cecil.

Optional alternative (faster development, more deps):

  • Mono.Cecil (makes IL traversal trivial) (NuGet)

4.3 CLI + logging + JSON

  • System.CommandLine (recommended)
  • Microsoft.Extensions.Logging (+ Console logger)
  • System.Text.Json (source-generated serializers strongly recommended)

4.4 Runtime alignment note

Runtime collectors commonly rely on EventPipe/ETW; the .NET diagnostics client library (Microsoft.Diagnostics.NETCore.Client) is the standard managed API for EventPipe sessions. (Microsoft Learn) The worker itself doesn't collect runtime evidence, but the nodeId algorithm must match what runtime collectors can compute (hence MVID+token).


5) Internal module decomposition

Implement these internal components as classes/services. Keep them testable (pure functions where possible).

5.1 WorkerOptions

Holds CLI options:

  • ScanKey (uuid)
  • RepoRoot, SolutionPath OR AssembliesPath[]
  • Configuration (default Release)
  • TargetFramework (optional)
  • BuildMode = Artifacts | Build
  • OutFile
  • UploadUrl + ApiKey (optional)
  • MaxEdgesPerNode (optional throttle)
  • IncludeExternalNodes (bool)
  • Concurrency (int)

5.2 BuildOrchestrator (Mode B only)

Responsibilities:

  • Run dotnet restore and dotnet build
  • Capture output logs and surface them as structured diagnostics
  • Return discovered output assemblies (dll paths)

Hard requirements:

  • Support --no-restore and --no-build toggles (or equivalent)
  • Support ContinuousIntegrationBuild=true to improve determinism when available
  • If build fails, still attempt to scan any assemblies that exist, but mark output with why=BUILD_FAILED_PARTIAL.

5.3 MsbuildWorkspaceLoader (Roslyn)

Responsibilities:

  • Register MSBuild with MSBuildLocator

  • Load .sln via MSBuildWorkspace

  • Provide:

    • Solution object
    • Project list (C# only for v1)
    • Compilation(s) when needed (for semantic analysis)

MSBuildWorkspace is the canonical Roslyn path for analyzing MSBuild solutions. (NuGet)

5.4 ReferenceIndexer

Responsibilities:

  • Build a map from referenced assemblies to artifactKey

  • For each PortableExecutableReference with a file path:

    • compute sha256

    • read assembly identity (name, version)

    • create artifactKey

    • add to:

      • AssemblyIdentity -> artifactKey
      • artifactKey -> sha256/path/version

This index is used by IL extractor to attribute external nodes to correct artifacts.

5.5 IlCallGraphExtractor

Responsibilities:

  • For each “root” assembly (project output):

    • open PE
    • get module MVID
    • enumerate MethodDefinition rows
    • create nodes for all methods
    • parse IL bodies and emit edges

IL parsing scope (v1)

You only need to recognize these opcodes as “calls”:

  • call
  • callvirt
  • newobj
  • jmp
  • ldftn
  • ldvirtftn
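
One possible way to pull only these opcodes out of a method body is sketched below. It assumes the raw IL bytes are already in hand (for example from `MethodBodyBlock.GetILBytes()` in System.Reflection.Metadata) and borrows operand sizes from `System.Reflection.Emit.OpCodes` instead of hand-writing a table. `ScanCallTokens` is a hypothetical name, and the sample byte array is hand-assembled for illustration.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Reflection.Emit;

// Every IL opcode keyed by its encoded value (two-byte opcodes are 0xFEXX).
Dictionary<ushort, OpCode> table = typeof(OpCodes)
    .GetFields(BindingFlags.Public | BindingFlags.Static)
    .Where(f => f.FieldType == typeof(OpCode))
    .Select(f => (OpCode)f.GetValue(null)!)
    .ToDictionary(op => unchecked((ushort)op.Value));

var callLike = new HashSet<string> { "call", "callvirt", "newobj", "jmp", "ldftn", "ldvirtftn" };

List<(string OpCode, int Token)> ScanCallTokens(byte[] il)
{
    var found = new List<(string OpCode, int Token)>();
    int pos = 0;
    while (pos < il.Length)
    {
        ushort key = il[pos++];
        if (key == 0xFE) key = (ushort)(0xFE00 | il[pos++]); // two-byte opcode
        OpCode op = table[key];
        int size = op.OperandType switch
        {
            OperandType.InlineNone => 0,
            OperandType.ShortInlineBrTarget or OperandType.ShortInlineI or OperandType.ShortInlineVar => 1,
            OperandType.InlineVar => 2,
            OperandType.InlineI8 or OperandType.InlineR => 8,
            // switch: 4-byte case count followed by that many 4-byte targets
            OperandType.InlineSwitch => 4 + 4 * BinaryPrimitives.ReadInt32LittleEndian(il.AsSpan(pos)),
            _ => 4, // tokens, 32-bit immediates, long branch targets, float32
        };
        if (callLike.Contains(op.Name!))
            found.Add((op.Name!, BinaryPrimitives.ReadInt32LittleEndian(il.AsSpan(pos))));
        pos += size;
    }
    return found;
}

// Hand-assembled sample: nop; call 0x0A000001; ret
byte[] sample = { 0x00, 0x28, 0x01, 0x00, 0x00, 0x0A, 0x2A };
foreach (var (opName, token) in ScanCallTokens(sample))
    Console.WriteLine($"{opName} 0x{token:X8}");
```

The high byte of each token identifies the metadata table (0x06 MethodDef, 0x0A MemberRef, 0x2B MethodSpec), which is what decides whether the edge targets an internal or an external node.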

Node identity

  • Internal method nodeId:

    • nodeId = SHA256( MVID + ":" + metadataToken + ":" + arity + ":" + signatureShape )
    • Minimal acceptable: SHA256(MVID + ":" + metadataToken)

This is intentionally compatible with how runtime stacks identify methods (module + token).

External method nodes

If a call operand is a MemberRef/MethodSpec that targets another assembly:

  • Create an “external node” with:

    • symbolKey computed from metadata signature
    • artifactKey resolved via ReferenceIndexer (assembly identity match)
    • nodeId = SHA256("ext:" + artifactKey + ":" + symbolKey) (runtime-proof not required)

Set flags |= External.
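
The two nodeId forms above can be sketched as follows. The string encodings are assumptions made for illustration (MVID as a lowercase "D"-format GUID, token as eight hex digits, UTF-8 input, lowercase hex output). Whatever encoding is chosen must be identical in the worker and in the runtime collector. The artifactKey and symbolKey values in the usage lines are made up.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

string Sha256Hex(string input) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(input))).ToLowerInvariant();

// Internal node, minimal acceptable form: SHA256(MVID + ":" + metadataToken).
string InternalNodeId(Guid mvid, int metadataToken) =>
    Sha256Hex($"{mvid:D}:{metadataToken:x8}");

// External node: SHA256("ext:" + artifactKey + ":" + symbolKey).
string ExternalNodeId(string artifactKey, string symbolKey) =>
    Sha256Hex($"ext:{artifactKey}:{symbolKey}");

var sampleMvid = Guid.Parse("11111111-2222-3333-4444-555555555555");
Console.WriteLine(InternalNodeId(sampleMvid, 0x06000001));
Console.WriteLine(ExternalNodeId("Contoso.Lib@1.0.0.0", "Contoso.Lib.Client::Send(System.String)"));
```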

5.6 RoslynEntrypointExtractor

Responsibilities:

  • Produce entrypoints[] records pointing to nodeIds.

Must support (v1)

ASP.NET Core MVC controllers

  • Type has [ApiController] or derives from ControllerBase

  • Action methods: public instance methods with routing attributes [HttpGet], [HttpPost], [Route], etc.

  • Route template:

    • combine controller + action route attributes (best effort)
  • entrypoint.kind = http, framework=aspnetcore

Minimal APIs

  • Detect invocation of MapGet, MapPost, MapPut, MapDelete, MapMethods

  • Extract route string literal when available

  • Handler target:

    • lambda => map to the compiler-generated method that holds the lambda body (best effort)
    • method group => resolve to method symbolKey => nodeId

gRPC

  • Detect MapGrpcService<T>() (endpoint registration)
  • Entry points: service methods on generated base types (best effort)

Background jobs

  • Types implementing IHostedService
  • BackgroundService.ExecuteAsync override
  • entrypoint.kind = job

Mapping Roslyn → nodeId

Do not attempt to compute metadata tokens from Roslyn symbols directly.

Instead:

  • Generate the same canonical symbolKey for Roslyn symbols
  • Resolve symbolKey -> nodeId using a dictionary built from IL nodes

If not resolvable, emit an entrypoint with a synthetic “unresolved” node:

  • nodeId = SHA256("unresolved:" + symbolKey)
  • flags |= Unresolved
  • why += ENTRYPOINT_SYMBOL_UNRESOLVED
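
A minimal sketch of that lookup-with-fallback, assuming the IL stage has already produced a symbolKey -> nodeId dictionary. `ResolveEntrypoint` and the tuple shape are hypothetical, and the symbolKeys in the usage lines are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

(string NodeId, bool Unresolved, string? Why) ResolveEntrypoint(
    string symbolKey, IReadOnlyDictionary<string, string> ilNodesBySymbolKey)
{
    // Normal path: the Roslyn-side symbolKey matches a node emitted by the IL extractor.
    if (ilNodesBySymbolKey.TryGetValue(symbolKey, out var nodeId))
        return (nodeId, false, null);

    // Fallback: synthetic node so the entrypoint is still reported.
    byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes("unresolved:" + symbolKey));
    return (Convert.ToHexString(hash).ToLowerInvariant(), true, "ENTRYPOINT_SYMBOL_UNRESOLVED");
}

var ilNodes = new Dictionary<string, string>
{
    ["My.App.HomeController::Index()"] = "node-1", // placeholder nodeId for the example
};
Console.WriteLine(ResolveEntrypoint("My.App.HomeController::Index()", ilNodes));
Console.WriteLine(ResolveEntrypoint("My.App.Missing::Run()", ilNodes));
```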

5.7 RoslynHeuristicEdgeExtractor

Responsibilities:

  • Add heuristic edges that IL wont reliably capture.

DI bindings (must-have)

Detect common DI registration patterns:

  • services.AddTransient<IFoo, Foo>()
  • AddScoped, AddSingleton

Emit heuristic edge:

  • from: interface method set? (v1 simplify to type-level constructor edge)
  • to: Foo..ctor(...) node
  • reason = di_binding

Practical v1 implementation:

  • Create edge from a synthetic “DI container” node per assembly to implementation constructors.
  • Or create edges from the registration site method to the constructor. (Choose one and keep consistent.)

Reflection (must-have)

Emit heuristic edges with lower confidence:

  • Type.GetType("Namespace.Type, Assembly")
  • Assembly.Load(...), GetMethod("X"), Invoke
  • Activator.CreateInstance(...)

If string literal resolves to a type/method in the solution, create edge:

  • from: caller method
  • to: target method/ctor
  • reason = reflection_string

If not resolvable, record a why=REFLECTION_UNRESOLVED_STRING diagnostic; do not crash.
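
The resolve-or-diagnose rule might look like the sketch below. It assumes the Roslyn pass has already extracted the string literal and the caller's nodeId, and that a type-name -> constructor-nodeId map is available. All helper names and sample values are hypothetical. The splitter handles only the simple "Type, Assembly[, Version=...]" shape and ignores commas inside brackets.

```csharp
using System;
using System.Collections.Generic;

var edges = new List<(string From, string To, string Reason)>();
var diagnostics = new List<string>();

// Split an assembly-qualified name at the first comma that is not inside [...].
(string TypeName, string? AssemblyName) SplitTypeString(string s)
{
    int depth = 0;
    for (int i = 0; i < s.Length; i++)
    {
        if (s[i] == '[') depth++;
        else if (s[i] == ']') depth--;
        else if (s[i] == ',' && depth == 0)
        {
            string rest = s[(i + 1)..];
            int next = rest.IndexOf(','); // drop Version=, Culture=, ... if present
            string asm = (next < 0 ? rest : rest[..next]).Trim();
            return (s[..i].Trim(), asm);
        }
    }
    return (s.Trim(), null);
}

void RecordReflection(string callerNodeId, string literal,
                      IReadOnlyDictionary<string, string> ctorNodeByType)
{
    var (typeName, _) = SplitTypeString(literal);
    if (ctorNodeByType.TryGetValue(typeName, out var target))
        edges.Add((callerNodeId, target, "reflection_string"));
    else
        diagnostics.Add("REFLECTION_UNRESOLVED_STRING: " + literal); // record, never throw
}

var knownCtors = new Dictionary<string, string> { ["My.App.Plugin"] = "ctor-node-1" };
RecordReflection("caller-1", "My.App.Plugin, My.App", knownCtors);
RecordReflection("caller-1", "Other.Lib.Thing, Other.Lib", knownCtors);
Console.WriteLine($"{edges.Count} edge(s), {diagnostics.Count} diagnostic(s)");
```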

5.8 GraphMerger

Responsibilities:

  • Merge nodes/edges/entrypoints from IL and Roslyn stages

  • De-duplicate edges by (from,to,kind,reason)

  • Apply optional throttles:

    • cap edges per node
    • drop low-weight heuristics if too many
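
De-duplication and the per-node cap could be sketched as below. Edges are modelled as plain tuples, and the kind value "heuristic" used to decide what gets dropped first is an assumption, since the real kind vocabulary comes from the schema. `MergeEdges` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<(string From, string To, string Kind, string Reason)> MergeEdges(
    IEnumerable<(string From, string To, string Kind, string Reason)> ilEdges,
    IEnumerable<(string From, string To, string Kind, string Reason)> roslynEdges,
    int? maxEdgesPerNode)
{
    // Value tuples compare structurally, so Distinct() de-duplicates by (from,to,kind,reason).
    var merged = ilEdges.Concat(roslynEdges).Distinct();

    if (maxEdgesPerNode is int cap)
    {
        merged = merged
            .GroupBy(e => e.From)
            // Sort heuristic edges last within each source node so they are cut first.
            .SelectMany(g => g.OrderBy(e => e.Kind == "heuristic" ? 1 : 0).Take(cap));
    }
    return merged.ToList();
}

var fromIl = new[] { ("a", "b", "call", "il"), ("a", "b", "call", "il"), ("a", "c", "call", "il") };
var fromRoslyn = new[] { ("a", "d", "heuristic", "di_binding") };
Console.WriteLine(MergeEdges(fromIl, fromRoslyn, null).Count);
Console.WriteLine(MergeEdges(fromIl, fromRoslyn, 2).Count);
```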

5.9 CallGraphWriter

Responsibilities:

  • Serialize CallGraph.v1.json exactly to spec

  • Include:

    • artifacts[] (project outputs + references)
    • nodes[], edges[]
    • entrypoints[]
    • language = "dotnet"
    • scanKey

6) Canonical symbolKey format (critical for merges)

Pick one canonical form and use it everywhere.

Recommended v1 symbolKey shape:

{Namespace}.{TypeName}[`Arity][+Nested]::{MethodName}[`Arity]({ParamType1},{ParamType2},...)

Rules:

  • Use fully qualified System.* names for BCL types (System.String, not the C# keyword string)
  • Use + for nested types (metadata style)
  • Use backtick arity for generic type/method definitions
  • For arrays: System.String[]
  • For byref: System.String&

Implementation detail:

  • IL extractor can build this from metadata signatures.
  • Roslyn extractor can build this using a controlled SymbolDisplayFormat.

If you get this right, Roslyn → IL mapping becomes reliable.
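
To make the shape concrete, here is a small sketch that assembles a symbolKey from already-canonical parts. `BuildSymbolKey` is a hypothetical helper. It assumes the caller supplies type names that already carry their backtick arity and parameter type strings that already follow the rules above. The IL and Roslyn extractors would each feed it from their own metadata.

```csharp
using System;
using System.Collections.Generic;

string BuildSymbolKey(string ns, IReadOnlyList<string> typeChain, string method,
                      int methodArity, IReadOnlyList<string> paramTypes)
{
    // Nested types are joined with '+', metadata style.
    string type = string.Join("+", typeChain);
    string qualified = string.IsNullOrEmpty(ns) ? type : ns + "." + type;
    // Generic methods get a backtick arity suffix; non-generic methods get none.
    string arity = methodArity > 0 ? "`" + methodArity : "";
    string parameters = string.Join(",", paramTypes);
    return $"{qualified}::{method}{arity}({parameters})";
}

// Prints: My.App.Outer`1+Inner::Run`1(System.String[],System.Int32&)
Console.WriteLine(BuildSymbolKey(
    "My.App", new[] { "Outer`1", "Inner" }, "Run", 1,
    new[] { "System.String[]", "System.Int32&" }));
```

Routing both extractors through one shared formatter like this keeps the string comparison exact, which is what the Roslyn → IL dictionary lookup depends on.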


7) CLI surface (what developers will actually run)

Minimum viable commands:

Artifacts-first scan

stella-worker-dotnet scan \
  --scanKey 00000000-0000-0000-0000-000000000000 \
  --assemblies ./artifacts/bin/Release \
  --out ./callgraph.json

Build-and-scan (internal trusted only)

stella-worker-dotnet scan \
  --scanKey ... \
  --sln ./src/MySolution.sln \
  --configuration Release \
  --tfm net10.0 \
  --buildMode build \
  --out ./callgraph.json

Upload to scanner.webservice

stella-worker-dotnet scan \
  --scanKey ... \
  --assemblies ./artifacts/bin/Release \
  --upload https://scanner/api/scans/{scanId}/callgraphs \
  --apiKey $STELLA_API_KEY

8) Observability and failure behavior

8.1 Structured diagnostics

Always emit:

  • counts: nodes/edges/entrypoints
  • build outcome: success/failed/partial
  • list of projects scanned/skipped
  • unresolved symbol counts (entrypoints + heuristic edges)
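
One way to emit these fields is a single JSON line on stderr, sketched below with System.Text.Json. The property names are illustrative assumptions, not part of the CallGraph schema, and `BuildDiagnosticsJson` is a hypothetical helper.

```csharp
using System;
using System.Text.Json;

string BuildDiagnosticsJson(int nodes, int edges, int entrypoints, string buildOutcome,
                            string[] scanned, string[] skipped,
                            int unresolvedEntrypoints, int unresolvedHeuristicEdges)
{
    // Anonymous types keep the sketch short; a real worker would likely use a named DTO.
    var payload = new
    {
        counts = new { nodes, edges, entrypoints },
        buildOutcome,
        projects = new { scanned, skipped },
        unresolved = new { entrypoints = unresolvedEntrypoints, heuristicEdges = unresolvedHeuristicEdges }
    };
    return JsonSerializer.Serialize(payload);
}

Console.Error.WriteLine(BuildDiagnosticsJson(
    1200, 5400, 17, "partial",
    new[] { "Api.csproj" }, new[] { "Legacy.csproj" }, 2, 5));
```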

8.2 Hard failure vs partial output

  • If at least one assembly was scanned, output a graph even if others fail.

  • Mark diagnostics in output:

    • add why/notes (if you extend schema) OR log to stderr and let webservice record the warning on ingest.

9) Why this architecture works for reachability

  • IL extraction provides the most faithful call edges and stable node IDs.
  • Roslyn adds what IL cant: framework entrypoints, DI and reflection heuristics.
  • Node IDs based on MVID+token align with deterministic compilation behavior and runtime tooling patterns. Deterministic compilation replaces timestamp/MVID with values derived from compilation inputs, enabling stable identity under stable inputs. (Microsoft Learn) Any change to those inputs yields a new MVID, so nodeIds are comparable only between a scan and a runtime trace of the same build.
  • Roslyn MSBuildWorkspace is the canonical way to load solutions/projects with correct references and compilation options. (NuGet)

10) Implementation “must-do” checklist for the developer

  1. Define canonical symbolKey and implement it in:

    • IL extractor
    • Roslyn symbol formatter
  2. Implement ReferenceIndexer to map assembly identity → artifactKey.

  3. Implement IL extractor:

    • nodes for method defs
    • edges for call opcodes
  4. Implement entrypoint detectors (controllers + minimal APIs + hosted service).

  5. Implement DI + reflection heuristic edges.

  6. Merge and output CallGraph.v1.json matching schema.

  7. Add golden tests (small ASP.NET apps) verifying:

    • entrypoint detection
    • at least one static path exists
    • DI/reflection edges appear with correct reason codes

If you want, I can also provide:

  • a concrete folder layout (/src/StellaOps.Scanner.Worker.DotNet/...) with project files,
  • the exact SymbolDisplayFormat you should use for symbolKey stability,
  • and an IL opcode decoder snippet that only records call-like opcodes (fast, minimal).