# Scanner Module — Score Proofs & Reachability Implementation Guide

**Module**: Scanner (Scanner.WebService + Scanner.Worker)
**Sprint**: SPRINT_3500_0002_0001 through SPRINT_3500_0004_0004
**Target**: Agents implementing deterministic score proofs and binary reachability

---

## Purpose

This guide provides step-by-step implementation instructions for agents working on:

1. **Epic A**: Deterministic Score Proofs + Unknowns Registry
2. **Epic B**: Binary Reachability v1 (.NET + Java)

**Role**: You are an implementer agent. Your job is to write code, tests, and migrations following the specifications in the sprint files. Do NOT make architectural decisions or ask clarifying questions. If ambiguity exists, mark the task as BLOCKED in the delivery tracker.

---

## Module Structure

```
src/Scanner/
├── __Libraries/
│   ├── StellaOps.Scanner.Core/           # Shared models, proof bundle writer
│   ├── StellaOps.Scanner.Storage/        # EF Core, repositories, migrations
│   └── StellaOps.Scanner.Reachability/   # Reachability algorithms (BFS, path search)
├── StellaOps.Scanner.WebService/         # API endpoints, orchestration
├── StellaOps.Scanner.Worker/             # Background workers (call-graph, scoring)
└── __Tests/
    ├── StellaOps.Scanner.Core.Tests/
    ├── StellaOps.Scanner.Storage.Tests/
    └── StellaOps.Scanner.Integration.Tests/
```

**Existing Code to Reference**:
- `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/Gates/CompositeGateDetector.cs` — Gate detection patterns
- `src/Scanner/__Libraries/StellaOps.Scanner.Storage/Postgres/Migrations/` — Migration examples
- `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/` — DSSE signing, Merkle trees

---

## Epic A: Score Proofs Implementation

### Phase 1: Foundations (Sprint 3500.0002.0001)

**Working Directory**: `src/__Libraries/`

#### Task 1.1: Canonical JSON Library

**File**: `src/__Libraries/StellaOps.Canonical.Json/CanonJson.cs`

**Implementation**:
1. Create a new project: `dotnet new classlib -n StellaOps.Canonical.Json -f net10.0`
2. Dependencies: `System.Text.Json` and `System.Security.Cryptography`. Both ship with the .NET shared framework, so no extra package reference is needed for `net10.0`.
3. Implement `CanonJson.Canonicalize(obj)`:
   - Serialize to JSON using `JsonSerializer.SerializeToUtf8Bytes`
   - Parse with `JsonDocument`
   - Write with recursive key sorting (ordinal comparison)
   - Return `byte[]`
4. Implement `CanonJson.Sha256Hex(bytes)`:
   - Use `SHA256.HashData(bytes)`
   - Convert to lowercase hex: `Convert.ToHexString(...).ToLowerInvariant()`

**Tests** (`src/__Libraries/StellaOps.Canonical.Json.Tests/CanonJsonTests.cs`):
- `Canonicalize_SameInput_ProducesSameHash` — Bit-identical replay
- `Canonicalize_SortsKeysAlphabetically` — Verify `{z,a,m}` → `{a,m,z}` (ordinal order)
- `Canonicalize_HandlesNestedObjects` — Recursive sorting
- `Sha256Hex_ProducesLowercaseHex` — Verify regex `^[0-9a-f]{64}$`

**Acceptance Criteria**:
- [ ] All tests pass
- [ ] Coverage ≥90%
- [ ] Benchmark: canonicalize 1 MB of JSON in <50 ms (p95)

---

#### Task 1.2: Scan Manifest Model

**File**: `src/__Libraries/StellaOps.Scanner.Core/Models/ScanManifest.cs`

**Implementation**:
1. Add to the existing `StellaOps.Scanner.Core` project (or create it if missing)
2. Define `record ScanManifest` with properties per the sprint spec (lines 545-559 of the advisory)
3. Use `[JsonPropertyName]` attributes for camelCase serialization
4. Add a `ComputeHash()` method:
   ```csharp
   public string ComputeHash()
   {
       // Hash the canonical (key-sorted) JSON form so the result is independent of property order
       var canonical = CanonJson.Canonicalize(this);
       return "sha256:" + CanonJson.Sha256Hex(canonical);
   }
   ```

**Tests** (`src/__Libraries/StellaOps.Scanner.Core.Tests/Models/ScanManifestTests.cs`):
- `ComputeHash_SameManifest_ProducesSameHash`
- `ComputeHash_DifferentSeed_ProducesDifferentHash`
- `Serialization_RoundTrip_PreservesAllFields`

**Acceptance Criteria**:
- [ ] All tests pass
- [ ] JSON serialization uses camelCase
- [ ] Hash format: `sha256:[0-9a-f]{64}`

---

#### Task 1.3: DSSE Envelope Implementation

**File**: `src/__Libraries/StellaOps.Attestor.Dsse/` (new library)

**Implementation**:
1. Create the project: `dotnet new classlib -n StellaOps.Attestor.Dsse -f net10.0`
2. Add models: `DsseEnvelope` and `DsseSignature` (records with `[JsonPropertyName]`)
3. Add the interface `IContentSigner` (`KeyId`, `Sign`, `Verify`)
4. Implement `Dsse.PAE(payloadType, payload)`:
   - Format: `"DSSEv1 " + len(payloadType) + " " + payloadType + " " + len(payload) + " " + payload`, where `len` is the length in bytes written as an ASCII decimal
   - Use `MemoryStream` for efficient concatenation
5. Implement `Dsse.SignJson(payloadType, obj, signer)`:
   - Canonicalize the payload with `CanonJson.Canonicalize`
   - Compute the PAE
   - Sign with `signer.Sign(pae)`
   - Return a `DsseEnvelope`
6. Implement `EcdsaP256Signer` (`IContentSigner`):
   - Wrap `ECDsa` from `System.Security.Cryptography`
   - Use `SHA256` for hashing
   - Implement `IDisposable`

**Tests** (`src/__Libraries/StellaOps.Attestor.Dsse.Tests/DsseTests.cs`):
- `SignJson_AndVerify_Succeeds`
- `VerifyEnvelope_WrongKey_Fails`
- `PAE_Encoding_MatchesSpec` — Verify the format string

**Acceptance Criteria**:
- [ ] All tests pass
- [ ] DSSE signature verifies with the same key
- [ ] Cross-key verification fails

---

#### Task 1.4: ProofLedger Implementation

**File**: `src/__Libraries/StellaOps.Policy.Scoring/ProofLedger.cs`

**Implementation**:
1. Add to the existing `StellaOps.Policy.Scoring` project
2. Define `enum ProofNodeKind { Input, Transform, Delta, Score }`
3. Define `record ProofNode` with properties per the sprint spec
4. Implement `ProofHashing.WithHash(node)`:
   - Canonicalize the node, excluding the `NodeHash` field to avoid circularity
   - Compute SHA-256: `"sha256:" + CanonJson.Sha256Hex(...)`
5. Implement `ProofHashing.ComputeRootHash(nodes)`:
   - Extract all node hashes into an array
   - Canonicalize the array
   - Compute SHA-256 of the canonical array
6. Implement `ProofLedger.Append(node)`:
   - Call `ProofHashing.WithHash(node)` to compute the hash
   - Add the result to the internal list
7. Implement `ProofLedger.RootHash()`:
   - Return `ProofHashing.ComputeRootHash(_nodes)`

**Tests** (`src/__Libraries/StellaOps.Policy.Scoring.Tests/ProofLedgerTests.cs`):
- `Append_ComputesNodeHash`
- `RootHash_SameNodes_ProducesSameHash`
- `RootHash_DifferentOrder_ProducesDifferentHash`

**Acceptance Criteria**:
- [ ] All tests pass
- [ ] Node hash excludes the `NodeHash` field
- [ ] Root hash changes if node order changes

---

#### Task 1.5: Database Schema Migration

**File**: `src/Scanner/__Libraries/StellaOps.Scanner.Storage/Postgres/Migrations/010_scanner_schema.sql`

**Implementation**:
1. Copy the migration template from the sprint spec (SPRINT_3500_0002_0001, Task T5)
2. Use the advisory lock pattern:
   ```sql
   -- Serialize concurrent migration runs on a single lock key
   SELECT pg_advisory_lock(hashtext('scanner'));
   -- DDL statements
   SELECT pg_advisory_unlock(hashtext('scanner'));
   ```
3. Create the `scanner` schema if it does not exist
4. Create tables: `scan_manifest`, `proof_bundle`
5. Create indexes per the spec
6. Add a verification `DO $$ ... END $$` block

**EF Core Entities** (`src/Scanner/__Libraries/StellaOps.Scanner.Storage/Entities/`):
- `ScanManifestRow.cs` — Maps to `scanner.scan_manifest`
- `ProofBundleRow.cs` — Maps to `scanner.proof_bundle`

**DbContext** (`src/Scanner/__Libraries/StellaOps.Scanner.Storage/ScannerDbContext.cs`):
- Add `DbSet<ScanManifestRow>` and `DbSet<ProofBundleRow>`
- Override `OnModelCreating`:
  - Set the default schema: `b.HasDefaultSchema("scanner")`
  - Map entities to tables
  - Configure column names (snake_case)
  - Configure indexes

**Testing**:
1. Run the migration on a clean Postgres instance
2. Verify tables were created: `SELECT * FROM pg_tables WHERE schemaname = 'scanner'`
3. Verify indexes: `SELECT * FROM pg_indexes WHERE schemaname = 'scanner'`

**Acceptance Criteria**:
- [ ] Migration runs without errors
- [ ] Tables and indexes created
- [ ] EF Core can query entities

---

#### Task 1.6: Proof Bundle Writer

**File**: `src/__Libraries/StellaOps.Scanner.Core/ProofBundleWriter.cs`

**Implementation**:
1. Add to the `StellaOps.Scanner.Core` project
2. Use `System.IO.Compression` (`ZipArchive`), which ships with the .NET shared framework, so no NuGet package is needed for `net10.0`
3. Implement `ProofBundleWriter.WriteAsync`:
   - Create the base directory if it does not exist
   - Canonicalize the manifest and ledger
   - Compute the root hash over `{manifestHash, scoreProofHash, scoreRootHash}`
   - Sign the root descriptor with DSSE
   - Create a zip archive with `ZipArchive(stream, ZipArchiveMode.Create)`
   - Add entries: `manifest.json`, `manifest.dsse.json`, `score_proof.json`, `proof_root.dsse.json`, `meta.json`
   - Return `(rootHash, bundlePath)`

**Tests** (`src/__Libraries/StellaOps.Scanner.Core.Tests/ProofBundleWriterTests.cs`):
- `WriteAsync_CreatesValidBundle` — Verify the zip contains the expected files
- `WriteAsync_SameInputs_ProducesSameRootHash` — Determinism check

**Acceptance Criteria**:
- [ ] Bundle is a valid zip archive
- [ ] All expected files present
- [ ] Same inputs → same root hash

---

### Phase 2: API Integration (Sprint 3500.0002.0003)

**Working Directory**: `src/Scanner/StellaOps.Scanner.WebService/`

#### Task 2.1: POST /api/v1/scanner/scans Endpoint

**File**: `src/Scanner/StellaOps.Scanner.WebService/Controllers/ScansController.cs`

**Implementation**:
1. Add the endpoint `POST /api/v1/scanner/scans`
2. Bind the request body to the `CreateScanRequest` DTO
3. Validate manifest fields (all required fields present)
4. Check idempotency: compute the `Content-Digest` and query for an existing scan
5. If the scan exists, return it (200 OK)
6. If it does not exist:
   - Generate a scan ID (`Guid`)
   - Create a `ScanManifest` record
   - Compute the manifest hash
   - Sign the manifest with DSSE (`IContentSigner` from DI)
   - Persist to `scanner.scan_manifest` via `ScannerDbContext`
   - Return 201 Created with a `Location` header

**Request DTO**:
```csharp
public sealed record CreateScanRequest(
    string ArtifactDigest,
    string? ArtifactPurl,
    string ScannerVersion,
    string WorkerVersion,
    string ConcelierSnapshotHash,
    string ExcititorSnapshotHash,
    string LatticePolicyHash,
    bool Deterministic,
    string Seed, // base64
    Dictionary<string, string>? Knobs
);
```

**Response DTO**:
```csharp
public sealed record CreateScanResponse(
    string ScanId,
    string ManifestHash,
    DateTimeOffset CreatedAt,
    ScanLinks Links
);

public sealed record ScanLinks(
    string Self,
    string Manifest
);
```

**Tests** (`src/Scanner/__Tests/StellaOps.Scanner.WebService.Tests/Controllers/ScansControllerTests.cs`):
- `CreateScan_ValidRequest_Returns201`
- `CreateScan_IdempotentRequest_Returns200`
- `CreateScan_InvalidManifest_Returns400`

**Acceptance Criteria**:
- [ ] Endpoint returns 201 Created for a new scan
- [ ] Idempotent requests return 200 OK
- [ ] Manifest persisted to the database
- [ ] DSSE signature included in the response

---

#### Task 2.2: POST /api/v1/scanner/scans/{id}/score/replay Endpoint

**File**: `src/Scanner/StellaOps.Scanner.WebService/Controllers/ScansController.cs`

**Implementation**:
1. Add the endpoint `POST /api/v1/scanner/scans/{scanId}/score/replay`
2. Retrieve the scan manifest from the database
3. Apply overrides (new Concelier/Excititor/Policy snapshot hashes, if provided)
4. Load findings from the SBOM + vulnerabilities
5. Call `RiskScoring.Score(inputs, ...)` to compute the score proof
6. Call `ProofBundleWriter.WriteAsync` to create the bundle
7. Persist a `ProofBundleRow` to the database
8. Return the score proof + bundle URI

**Request DTO**:
```csharp
public sealed record ReplayScoreRequest(
    ReplayOverrides? Overrides
);

public sealed record ReplayOverrides(
    string? ConcelierSnapshotHash,
    string? ExcititorSnapshotHash,
    string? LatticePolicyHash
);
```

**Response DTO**:
```csharp
public sealed record ReplayScoreResponse(
    string ScanId,
    DateTimeOffset ReplayedAt,
    ScoreProof ScoreProof,
    string ProofBundleUri,
    ProofLinks Links
);

public sealed record ScoreProof(
    string RootHash,
    IReadOnlyList<ProofNode> Nodes
);
```

**Tests**:
- `ReplayScore_ValidScan_Returns200`
- `ReplayScore_WithOverrides_UsesNewSnapshots`
- `ReplayScore_ScanNotFound_Returns404`

**Acceptance Criteria**:
- [ ] Endpoint computes the score proof
- [ ] Proof bundle created and persisted
- [ ] Overrides applied correctly

---

## Epic B: Reachability Implementation

### Phase 1: .NET Call-Graph Extraction (Sprint 3500.0003.0001)

**Working Directory**: `src/Scanner/StellaOps.Scanner.Worker/`

#### Task 3.1: Roslyn-Based Call-Graph Extractor

**File**: `src/Scanner/StellaOps.Scanner.Worker/CallGraph/DotNetCallGraphExtractor.cs`

**Implementation**:
1. Add NuGet packages:
   - `Microsoft.CodeAnalysis.Workspaces.MSBuild`
   - `Microsoft.CodeAnalysis.CSharp.Workspaces`
   - `Microsoft.Build.Locator`
2. Implement `DotNetCallGraphExtractor.ExtractAsync(slnPath)`:
   - Register MSBuild: `MSBuildLocator.RegisterDefaults()` (once per process, before any MSBuild types are loaded)
   - Open the solution: `MSBuildWorkspace.Create().OpenSolutionAsync(slnPath)`
   - For each project, for each document:
     - Get the semantic model: `doc.GetSemanticModelAsync()`
     - Get the syntax root: `doc.GetSyntaxRootAsync()`
     - Find all `InvocationExpressionSyntax` nodes
     - Resolve the symbol: `model.GetSymbolInfo(node).Symbol`
     - Create a `CgNode` for the caller and the callee
     - Create a `CgEdge` with `kind=static`, `reason=direct_call`
3. Detect entrypoints:
   - ASP.NET Core controllers: `[ApiController]` attribute
   - Minimal APIs: `MapGet`/`MapPost` patterns (regex-based scan)
   - Background services: `IHostedService`, `BackgroundService`
4. Output `CallGraph.v1.json` per the schema

**Schema** (`CallGraph.v1.json`):
```json
{
  "schema": "stella.callgraph.v1",
  "scanKey": "uuid",
  "language": "dotnet",
  "artifacts": [...],
  "nodes": [...],
  "edges": [...],
  "entrypoints": [...]
}
```

**Node ID Computation**:
```csharp
using System.Security.Cryptography;
using System.Text;
using Microsoft.CodeAnalysis;

public static string ComputeNodeId(IMethodSymbol method)
{
    // The MVID lives on the module metadata (AssemblyMetadata has no MVID accessor).
    // GetMetadata() returns null for modules compiled from source rather than loaded from a PE file.
    var mvid = method.ContainingModule.GetMetadata()?.GetModuleVersionId();
    var token = method.MetadataToken;          // 0 for symbols not loaded from metadata
    var arity = method.Arity;
    var sigShape = method.GetSignatureShape(); // Simplified signature (project helper, not a Roslyn API)
    var input = $"{mvid}:{token}:{arity}:{sigShape}";
    var hash = SHA256.HashData(Encoding.UTF8.GetBytes(input));
    return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
}
```

**Tests** (`src/Scanner/__Tests/StellaOps.Scanner.Worker.Tests/CallGraph/DotNetCallGraphExtractorTests.cs`):
- `ExtractAsync_SimpleSolution_ProducesCallGraph`
- `ExtractAsync_DetectsAspNetCoreEntrypoints`
- `ExtractAsync_HandlesReflection` — Heuristic edges

**Acceptance Criteria**:
- [ ] Extracts the call-graph from a .sln file
- [ ] Detects HTTP entrypoints (ASP.NET Core)
- [ ] Produces valid `CallGraph.v1.json`

---

#### Task 3.2: Reachability BFS Algorithm

**File**: `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/ReachabilityAnalyzer.cs`

**Implementation**:
1. Add to the existing `StellaOps.Scanner.Reachability` project (it already contains `Gates/CompositeGateDetector.cs`; create it only if missing)
2. Implement `ReachabilityAnalyzer.Analyze(callGraph, sbom, vulns)`:
   - Build an adjacency list from `cg_edge` where `kind='static'`
   - Seed the BFS from entrypoints
   - Traverse the graph (bounded depth: 100 hops)
   - Track visited nodes and paths
   - Map reachable nodes to PURLs via `symbol_component_map`
   - For each vulnerability:
     - Check whether the affected PURL's symbols are reachable
     - Assign a status: `REACHABLE_STATIC`, `UNREACHABLE`, `POSSIBLY_REACHABLE`
     - Compute a confidence score
3. Output `ReachabilityFinding[]`

**Algorithm**:
```csharp
public static ReachabilityFinding[] Analyze(CallGraph cg, Sbom sbom, Vulnerability[] vulns)
{
    const int MaxDepth = 100;
    var adj = BuildAdjacencyList(cg.Edges.Where(e => e.Kind == "static"));
    var visited = new HashSet<string>();
    var parent = new Dictionary<string, string>(); // child -> node that first reached it
    var queue = new Queue<(string nodeId, int depth)>();

    foreach (var entry in cg.Entrypoints)
    {
        if (visited.Add(entry.NodeId)) // skip duplicate entrypoints
            queue.Enqueue((entry.NodeId, 0));
    }

    while (queue.Count > 0)
    {
        var (cur, depth) = queue.Dequeue();
        if (depth >= MaxDepth) continue; // Max depth

        // Leaf nodes have no adjacency entry; indexing adj[cur] directly would throw.
        if (!adj.TryGetValue(cur, out var successors)) continue;

        foreach (var next in successors)
        {
            if (visited.Add(next))
            {
                parent[next] = cur;
                queue.Enqueue((next, depth + 1));
            }
        }
    }

    // Map visited nodes to PURLs
    var reachablePurls = MapNodesToPurls(visited, sbom);

    // Classify vulnerabilities
    var findings = new List<ReachabilityFinding>();
    foreach (var vuln in vulns)
    {
        var status = reachablePurls.Contains(vuln.Purl)
            ? ReachabilityStatus.REACHABLE_STATIC
            : ReachabilityStatus.UNREACHABLE;

        findings.Add(new ReachabilityFinding(
            CveId: vuln.CveId,
            Purl: vuln.Purl,
            Status: status,
            Confidence: status == ReachabilityStatus.REACHABLE_STATIC ? 0.70 : 0.05,
            Path: status == ReachabilityStatus.REACHABLE_STATIC
                ? ReconstructPath(parent, FindNodeForPurl(vuln.Purl))
                : null
        ));
    }

    return findings.ToArray();
}
```

**Tests** (`src/Scanner/__Tests/StellaOps.Scanner.Reachability.Tests/ReachabilityAnalyzerTests.cs`):
- `Analyze_ReachableVuln_ReturnsReachableStatic`
- `Analyze_UnreachableVuln_ReturnsUnreachable`
- `Analyze_MaxDepthExceeded_StopsSearch`

**Acceptance Criteria**:
- [ ] BFS traverses the call-graph
- [ ] Correctly classifies reachable/unreachable
- [ ] Confidence scores computed

---

## Testing Strategy

### Unit Tests

**Coverage Target**: ≥85% for all new code

**Key Test Suites**:
- `CanonJsonTests` — JSON canonicalization
- `DsseEnvelopeTests` — Signature verification
- `ProofLedgerTests` — Node hashing, root hash
- `ScanManifestTests` — Manifest hash computation
- `ProofBundleWriterTests` — Bundle creation
- `DotNetCallGraphExtractorTests` — Call-graph extraction
- `ReachabilityAnalyzerTests` — BFS algorithm

**Running Tests**:
```bash
cd src/Scanner
dotnet test --filter "Category=Unit"
```

### Integration Tests

**Location**: `src/__Tests/StellaOps.Integration.Tests/`

**Required Scenarios**:
1. Full pipeline: Scan → Manifest → Proof Bundle → Replay
2. Call-graph → Reachability → Findings
3. API endpoints: POST /scans → GET /manifest → POST /score/replay

**Setup**:
- Use Testcontainers for Postgres
- Seed the database with migrations
- Use an in-memory DSSE signer for tests

**Running Integration Tests**:
```bash
dotnet test --filter "Category=Integration"
```

### Golden Corpus Tests

**Location**: `/offline/corpus/ground-truth-v1/`

**Test Cases**:
1. ASP.NET controller → reachable vuln
2. Vulnerable lib never called → unreachable
3. Reflection-based activation → possibly_reachable

**Format**:
```
corpus/
├── 001_reachable_vuln/
│   ├── app.sln
│   ├── expected.json   # Expected reachability verdict
│   └── README.md
├── 002_unreachable_vuln/
└── ...
```

**Running Corpus Tests**:
```bash
stella test corpus --path /offline/corpus/ground-truth-v1/
```

---

## Debugging Tips

### Common Issues

**Issue**: Canonical JSON hashes don't match across runs
**Solution**:
- Check for floating-point precision differences
- Verify that no environment-dependent values (e.g., environment variables) leak into serialization
- Ensure stable key ordering (ordinal comparison)

**Issue**: DSSE signature verification fails
**Solution**:
- Check that the PAE encoding matches the spec
- Verify that the same key is used for signing and verification
- Inspect base64 encoding/decoding

**Issue**: Reachability BFS misses paths
**Solution**:
- Verify the adjacency list is built correctly
- Check the max depth limit (100 hops)
- Inspect edge filtering (`kind='static'` only)

**Issue**: EF Core migration fails
**Solution**:
- Check that the advisory lock was acquired
- Verify that no concurrent migrations are running
- Inspect the Postgres logs for errors

---

## Code Review Checklist

Before submitting a PR:
- [ ] All unit tests pass (≥85% coverage)
- [ ] Integration tests pass
- [ ] Code follows .NET naming conventions
- [ ] SOLID principles applied
- [ ] No hard-coded secrets or credentials
- [ ] Logging added for key operations
- [ ] XML doc comments on public APIs
- [ ] No TODOs or FIXMEs in code
- [ ] Migration tested on a clean Postgres instance
- [ ] API returns RFC 7807 errors

---

## Deployment Checklist

Before deploying to production:
- [ ] Database migrations tested on staging
- [ ] API rate limits configured
- [ ] DSSE signing keys rotated
- [ ] Rekor endpoints configured
- [ ] Metrics dashboards created
- [ ] Alerts configured (table growth, index bloat)
- [ ] Runbook updated with new endpoints
- [ ] Documentation published

---

## References

**Sprint Files**:
- `SPRINT_3500_0002_0001_score_proofs_foundations.md`
- `SPRINT_3500_0002_0003_proof_replay_api.md`
- `SPRINT_3500_0003_0001_reachability_dotnet_foundations.md`

**Documentation**:
- `docs/07_HIGH_LEVEL_ARCHITECTURE.md`
- `docs/db/schemas/scanner_schema_specification.md`
- `docs/api/scanner-score-proofs-api.md`
- `docs/product-advisories/14-Dec-2025 - Reachability Analysis Technical Reference.md`

**Existing Code**:
- `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/` — DSSE examples
- `src/Policy/__Tests/StellaOps.Policy.Scoring.Tests/DeterminismScoringIntegrationTests.cs`

---

**Last Updated**: 2025-12-17
**Agents**: Read this file BEFORE starting any task
**Questions**: Mark the task as BLOCKED in the delivery tracker if anything is unclear
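**Sketch for Task 1.1 (Canonical JSON).** The following is a minimal, single-file sketch of the canonicalization steps in Task 1.1: serialize, re-parse with `JsonDocument`, re-emit with object keys sorted ordinally, then hash to lowercase SHA-256 hex. It is written as top-level C# with local functions so it runs on its own. `Canonicalize` and `Sha256Hex` mirror the task's names, but this is an illustration, not the project's `CanonJson` class. It assumes the default `Utf8JsonWriter` escaping is acceptable and does not normalize numbers; numbers are copied verbatim from the serializer output.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Serialize, re-parse, and re-emit compact JSON with object keys in ordinal order.
static byte[] Canonicalize<T>(T value)
{
    byte[] raw = JsonSerializer.SerializeToUtf8Bytes(value);
    using JsonDocument doc = JsonDocument.Parse(raw);
    using var buffer = new MemoryStream();
    using (var writer = new Utf8JsonWriter(buffer)) // not indented; Dispose flushes into buffer
    {
        WriteSorted(doc.RootElement, writer);
    }
    return buffer.ToArray();
}

// Recursively copies an element, sorting object properties with StringComparer.Ordinal.
static void WriteSorted(JsonElement element, Utf8JsonWriter writer)
{
    switch (element.ValueKind)
    {
        case JsonValueKind.Object:
            writer.WriteStartObject();
            foreach (JsonProperty prop in element.EnumerateObject().OrderBy(p => p.Name, StringComparer.Ordinal))
            {
                writer.WritePropertyName(prop.Name);
                WriteSorted(prop.Value, writer);
            }
            writer.WriteEndObject();
            break;
        case JsonValueKind.Array:
            writer.WriteStartArray(); // array order is significant and is preserved
            foreach (JsonElement item in element.EnumerateArray())
                WriteSorted(item, writer);
            writer.WriteEndArray();
            break;
        default:
            element.WriteTo(writer); // strings, numbers, booleans, null
            break;
    }
}

// Lowercase hex SHA-256, matching ^[0-9a-f]{64}$.
static string Sha256Hex(byte[] bytes) =>
    Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
```

The sort uses `StringComparer.Ordinal`, so the output does not depend on the current culture. For example, serializing `{ z, a = { y, b }, m }` yields keys in the order `a` (with `b` before `y`), `m`, `z`.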
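**Sketch for Task 1.3 (DSSE).** The following sketches the DSSE pre-authentication encoding and a P-256 sign/verify round trip, using only `ECDsa` and `HashAlgorithmName.SHA256` from `System.Security.Cryptography`. The helper names `Pae`, `SignPae` and `VerifyPae` are hypothetical stand-ins for `Dsse.PAE` and `EcdsaP256Signer`. Key management, key IDs and envelope (base64) encoding are deliberately left out.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Appends the ASCII bytes of a string to the stream.
static void AppendAscii(MemoryStream ms, string s)
{
    byte[] bytes = Encoding.ASCII.GetBytes(s);
    ms.Write(bytes, 0, bytes.Length);
}

// PAE = "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
// where LEN is the byte length written as an ASCII decimal.
static byte[] Pae(string payloadType, byte[] payload)
{
    byte[] typeBytes = Encoding.UTF8.GetBytes(payloadType);
    using var ms = new MemoryStream();
    AppendAscii(ms, "DSSEv1 ");
    AppendAscii(ms, typeBytes.Length.ToString(CultureInfo.InvariantCulture));
    AppendAscii(ms, " ");
    ms.Write(typeBytes, 0, typeBytes.Length);
    AppendAscii(ms, " ");
    AppendAscii(ms, payload.Length.ToString(CultureInfo.InvariantCulture));
    AppendAscii(ms, " ");
    ms.Write(payload, 0, payload.Length);
    return ms.ToArray();
}

// The signature covers the PAE bytes, not the raw payload, so the payload type is authenticated too.
static byte[] SignPae(ECDsa key, string payloadType, byte[] payload) =>
    key.SignData(Pae(payloadType, payload), HashAlgorithmName.SHA256);

static bool VerifyPae(ECDsa key, string payloadType, byte[] payload, byte[] signature) =>
    key.VerifyData(Pae(payloadType, payload), signature, HashAlgorithmName.SHA256);
```

Because the payload type is part of the signed bytes, changing it invalidates the signature even when the payload itself is unchanged. That is the property `VerifyEnvelope_WrongKey_Fails`-style tests can be extended to cover.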
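**Sketch for Task 3.2 (bounded BFS).** The following is a self-contained reduction of the bounded BFS from Task 3.2. It assumes nodes are plain string IDs and static edges are `(Caller, Callee)` tuples, so it runs without the project's `CallGraph`, `Sbom` or `ReachabilityFinding` types. The function names `Bfs` and `ReconstructPath` and their signatures are illustrative assumptions. PURL mapping and confidence assignment are left out.

```csharp
using System;
using System.Collections.Generic;

// Multi-source BFS from the entrypoints, bounded to maxDepth hops.
// Returns the reached node set and a parent map (child -> node that first reached it).
static (HashSet<string> Visited, Dictionary<string, string> Parent) Bfs(
    IEnumerable<(string Caller, string Callee)> staticEdges,
    IEnumerable<string> entrypoints,
    int maxDepth = 100)
{
    // Adjacency list; nodes without outgoing edges have no entry.
    var adj = new Dictionary<string, List<string>>(StringComparer.Ordinal);
    foreach (var (caller, callee) in staticEdges)
    {
        if (!adj.TryGetValue(caller, out var list))
            adj[caller] = list = new List<string>();
        list.Add(callee);
    }

    var visited = new HashSet<string>(StringComparer.Ordinal);
    var parent = new Dictionary<string, string>(StringComparer.Ordinal);
    var queue = new Queue<(string Node, int Depth)>();
    foreach (var entry in entrypoints)
        if (visited.Add(entry))
            queue.Enqueue((entry, 0));

    while (queue.Count > 0)
    {
        var (cur, depth) = queue.Dequeue();
        if (depth >= maxDepth || !adj.TryGetValue(cur, out var successors))
            continue; // depth bound reached, or a leaf node
        foreach (var next in successors)
        {
            if (visited.Add(next))
            {
                parent[next] = cur;
                queue.Enqueue((next, depth + 1));
            }
        }
    }
    return (visited, parent);
}

// Follows parent links back to an entrypoint; returns null if the target was never reached.
static List<string>? ReconstructPath(HashSet<string> visited, Dictionary<string, string> parent, string target)
{
    if (!visited.Contains(target)) return null;
    var path = new List<string> { target };
    while (parent.TryGetValue(path[^1], out var prev))
        path.Add(prev);
    path.Reverse(); // entrypoint first
    return path;
}
```

BFS records the first node that reaches each callee. As a result, `ReconstructPath` yields a shortest path (in hops) from the nearest entrypoint, which is a convenient default witness for a `REACHABLE_STATIC` verdict.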