# StellaOps.Scanner.EntryTrace — Agent Charter

## Mission
Resolve container `ENTRYPOINT`/`CMD` chains into deterministic call graphs that fuel usage-aware SBOMs, policy explainability, and runtime drift detection. Implement the EntryTrace analyzers and expose them as restart-time plug-ins for the Scanner Worker.

## Scope
- Parse POSIX/Bourne shell constructs (exec, command, case, if, source/run-parts) with deterministic AST output.
- Walk layered root filesystems to resolve PATH lookups and interpreter hand-offs (Python/Node/Java), and record evidence.
- Surface explainable diagnostics for unresolved branches (env indirection, missing files, unsupported syntax) and emit metrics.
- Package analyzers as signed plug-ins under `plugins/scanner/entrytrace/`, guarded by restart-only policy.
- **Semantic analysis**: Classify entrypoints by application intent (ApiEndpoint, Worker, CronJob, etc.), capability class (NetworkListener, FileSystemAccess, etc.), and threat vectors.
- **Temporal tracking**: Track entrypoint evolution across image versions, detecting drift categories (intent changes, capability expansion, attack surface growth).
- **Mesh analysis**: Parse multi-container orchestration manifests (K8s, Docker Compose) to build cross-container reachability graphs and identify vulnerable paths.

## Out of Scope
- SBOM emission/diffing (owned by `Scanner.Emit`/`Scanner.Diff`).
- Runtime enforcement or live drift reconciliation (owned by Zastava).
- Registry/network fetchers beyond file lookups inside extracted layers.

## Interfaces & Contracts

### Core EntryTrace
- Primary entry point: `IEntryTraceAnalyzer.ResolveAsync` returning a deterministic `EntryTraceGraph`.
- Graph nodes must include file path, line span, interpreter classification, and evidence source, and must follow `Scanner.Core` timestamp/ID helpers when emitting events.
- Diagnostics must enumerate unknown reasons from a fixed enum; metrics are tagged `entrytrace.*`.
- Plug-ins register via `IEntryTraceAnalyzerFactory` and must validate against `IPluginCatalogGuard`.

### Semantic Entrypoints (Sprint 0411)
Located in `Semantic/`:
- `SemanticEntrypoint`: Classifies entrypoints with intent, capabilities, threat vectors, and confidence scores.
- `ApplicationIntent`: Enum for high-level purpose (ApiEndpoint, Worker, CronJob, CliTool, etc.).
- `CapabilityClass`: Enum for functional capabilities (NetworkListener, FileSystemAccess, ProcessSpawner, etc.).
- `ThreatVector`: Enum for security-relevant classifications (NetworkExposure, FilePathTraversal, CommandInjection, etc.).
- `DataFlowBoundary`: Record for trust boundaries in data flow.
- `SemanticConfidence`: Confidence scores for classification results.

### Temporal Entrypoints (Sprint 0412)
Located in `Temporal/`:
- `TemporalEntrypointGraph`: Tracks entrypoints across image versions with snapshots and deltas.
- `EntrypointSnapshot`: Point-in-time entrypoint state with content hash for comparison.
- `EntrypointDelta`: Version-to-version changes (added/removed/modified entrypoints).
- `EntrypointDrift`: Flags enum for drift categories (IntentChanged, CapabilitiesExpanded, AttackSurfaceGrew, PrivilegeEscalation, PortsAdded, etc.).
- `ITemporalEntrypointStore`: Interface for storing and querying temporal graphs.
- `InMemoryTemporalEntrypointStore`: Reference implementation with delta computation.

### Mesh Entrypoints (Sprint 0412)
Located in `Mesh/`:
- `MeshEntrypointGraph`: Multi-container service mesh with services, edges, and ingress paths.
- `ServiceNode`: Container in the mesh with entrypoints, exposed ports, and labels.
- `CrossContainerEdge`: Inter-service communication link.
- `CrossContainerPath`: Reachability path across services with vulnerability tracking.
- `IngressPath`: External exposure via ingress/load balancer.
- `IManifestParser`: Interface for parsing orchestration manifests.
- `KubernetesManifestParser`: Parser for K8s Deployment, Service, Ingress, StatefulSet, DaemonSet, Pod.
- `DockerComposeParser`: Parser for Docker Compose v2/v3 files.
- `MeshEntrypointAnalyzer`: Orchestrator for mesh analysis with security metrics and blast radius analysis.

### Speculative Execution (Sprint 0413)
Located in `Speculative/`:
- `SymbolicValue`: Algebraic type for symbolic values (Concrete, Symbolic, Unknown, Composite).
- `SymbolicState`: Execution state with variable bindings, path constraints, and terminal commands.
- `PathConstraint`: Branch predicate constraint with kind classification and env dependency tracking.
- `ExecutionPath`: Complete execution path with constraints, commands, and reachability confidence.
- `ExecutionTree`: All paths from symbolic execution with branch coverage metrics.
- `BranchPoint`: Decision point in the script with coverage statistics.
- `BranchCoverage`: Coverage metrics (total, covered, infeasible, env-dependent branches).
- `ISymbolicExecutor`: Interface for symbolic execution of shell scripts.
- `ShellSymbolicExecutor`: Implementation that explores all if/elif/else and case branches.
- `IConstraintEvaluator`: Interface for path feasibility evaluation.
- `PatternConstraintEvaluator`: Pattern-based evaluator for common shell conditionals.
- `PathEnumerator`: Systematic path exploration with grouping by terminal command.
- `PathConfidenceScorer`: Confidence scoring with multi-factor analysis.

### Binary Intelligence (Sprint 0414)
Located in `Binary/`:
- `CodeFingerprint`: Record for binary function fingerprinting with algorithm, hash, and metrics.
- `FingerprintAlgorithm`: Enum for fingerprint types (BasicBlockHash, ControlFlowGraph, StringReferences, ImportReferences, Combined).
- `FunctionSignature`: Record for extracted binary function metadata (name, offset, size, calling convention, basic blocks, references).
- `BasicBlock`: Record for control flow basic block with offset, size, and instruction count.
- `SymbolInfo`: Record for recovered symbol information with confidence and match method.
- `SymbolMatchMethod`: Enum for how symbols were recovered (DebugInfo, ExactFingerprint, FuzzyFingerprint, PatternMatch, etc.).
- `AlternativeMatch`: Record for secondary symbol match candidates.
- `SourceCorrelation`: Record for mapping binary code to source packages/files.
- `CorrelationEvidence`: Flags enum for evidence types (FingerprintMatch, SymbolName, StringPattern, ImportReference, SourcePath, ExactMatch).
- `BinaryAnalysisResult`: Aggregate result with functions, recovered symbols, source correlations, and vulnerable matches.
- `BinaryArchitecture`: Enum for CPU architectures (X86, X64, ARM, ARM64, RISCV32, RISCV64, WASM, Unknown).
- `BinaryFormat`: Enum for binary formats (ELF, PE, MachO, WASM, Raw, Unknown).
- `BinaryAnalysisMetrics`: Metrics for analysis coverage and timing.
- `VulnerableFunctionMatch`: Match of a binary function to a known-vulnerable OSS function.
- `VulnerabilitySeverity`: Enum for vulnerability severity levels.
- `IFingerprintGenerator`: Interface for generating fingerprints from function signatures.
- `BasicBlockFingerprintGenerator`, `ControlFlowFingerprintGenerator`, `CombinedFingerprintGenerator`: Implementations.
- `FingerprintGeneratorFactory`: Factory for creating fingerprint generators.
- `IFingerprintIndex`: Interface for fingerprint lookup with exact and similarity matching.
- `InMemoryFingerprintIndex`: O(1) exact match, O(n) similarity search implementation.
- `VulnerableFingerprintIndex`: Extends index with vulnerability tracking.
- `FingerprintMatch`: Result record with source package, version, vulnerability associations, and similarity score.
- `FingerprintIndexStatistics`: Statistics about the fingerprint index.
- `ISymbolRecovery`: Interface for recovering symbol names from stripped binaries.
- `PatternBasedSymbolRecovery`: Heuristic-based recovery using known patterns.
- `FunctionPattern`: Record for function signature patterns (malloc, strlen, OpenSSL, zlib, etc.).
- `BinaryIntelligenceAnalyzer`: Orchestrator coordinating fingerprinting, symbol recovery, source correlation, and vulnerability matching.
- `BinaryIntelligenceOptions`: Configuration for analysis (algorithm, thresholds, parallelism).
- `VulnerableFunctionMatcher`: Matches binary functions against known-vulnerable function corpus.
- `VulnerableFunctionMatcherOptions`: Configuration for matching thresholds.
- `FingerprintCorpusBuilder`: Builds fingerprint corpus from known OSS packages for later matching.

### Predictive Risk Scoring (Sprint 0415)
Located in `Risk/`:
- `RiskScore`: Record with OverallScore, Category, Confidence, Level, Factors, and ComputedAt.
- `RiskCategory`: Enum for risk dimensions (Exploitability, Exposure, Privilege, DataSensitivity, BlastRadius, DriftVelocity, SupplyChain, Unknown).
- `RiskLevel`: Enum for severity classification (Negligible, Low, Medium, High, Critical).
- `RiskFactor`: Record for individual contributing factors with name, category, score, weight, evidence, and source ID.
- `BusinessContext`: Record with environment, IsInternetFacing, DataClassification, CriticalityTier, ComplianceRegimes, and RiskMultiplier.
- `DataClassification`: Enum for data sensitivity (Public, Internal, Confidential, Restricted, Unknown).
- `SubjectType`: Enum for risk subject types (Image, Container, Service, Fleet).
- `RiskAssessment`: Aggregate record with subject, scores, factors, context, recommendations, and timestamps.
- `RiskTrend`: Record for tracking risk over time with snapshots and trend direction.
- `RiskSnapshot`: Point-in-time risk score for trend analysis.
- `TrendDirection`: Enum (Improving, Stable, Worsening, Volatile, Insufficient).
- `IRiskScorer`: Interface for computing risk scores from entrypoint intelligence.
- `IRiskContributor`: Interface for individual risk contributors (semantic, temporal, mesh, binary, vulnerability).
- `RiskContext`: Record aggregating all signal sources for risk computation.
- `VulnerabilityReference`: Record for known vulnerabilities with severity, CVSS, exploit status.
- `SemanticRiskContributor`: Risk from capabilities and threat vectors.
- `TemporalRiskContributor`: Risk from drift patterns and rapid changes.
- `MeshRiskContributor`: Risk from exposure, blast radius, and vulnerable paths.
- `BinaryRiskContributor`: Risk from vulnerable function usage in binaries.
- `VulnerabilityRiskContributor`: Risk from known CVEs and exploitability.
- `CompositeRiskScorer`: Combines all contributors with weighted scoring and business context adjustment.
- `CompositeRiskScorerOptions`: Configuration for weights and thresholds.
- `RiskExplainer`: Generates human-readable risk explanations with recommendations.
- `RiskReport`: Record with assessment, explanation, and recommendations.
- `RiskAggregator`: Fleet-level risk aggregation and trending.
- `FleetRiskSummary`: Summary statistics across fleet (count by level, top risks, trend).
- `RiskSummaryItem`: Individual subject summary for fleet views.
- `EntrypointRiskReport`: Complete report combining entrypoint graph with risk assessment.

## Observability & Security
- No dynamic assembly loading beyond restart-time plug-in catalog.
- Structured logs include `scanId`, `imageDigest`, `layerDigest`, `command`, `reason`.
- Metrics counters: `entrytrace_resolutions_total{result}`, `entrytrace_unresolved_total{reason}`.
- Deny `source` directives outside image root; sandbox file IO via provided `IRootFileSystem`.

## Testing
- Unit tests live in `../StellaOps.Scanner.EntryTrace.Tests` with golden fixtures under `Fixtures/`.
- Determinism harness: same inputs produce byte-identical serialized graphs.
- Parser fuzz seeds captured for regression; interpreter tracers validated with sample scripts for Python, Node, Java launchers.
- **Temporal tests**: `Temporal/TemporalEntrypointGraphTests.cs`, `Temporal/InMemoryTemporalEntrypointStoreTests.cs`.
- **Mesh tests**: `Mesh/MeshEntrypointGraphTests.cs`, `Mesh/KubernetesManifestParserTests.cs`, `Mesh/DockerComposeParserTests.cs`, `Mesh/MeshEntrypointAnalyzerTests.cs`.
- **Speculative tests**: `Speculative/SymbolicStateTests.cs`, `Speculative/ShellSymbolicExecutorTests.cs`, `Speculative/PathEnumeratorTests.cs`, `Speculative/PathConfidenceScorerTests.cs`.
- **Binary tests**: `Binary/CodeFingerprintTests.cs`, `Binary/FingerprintIndexTests.cs`, `Binary/SymbolRecoveryTests.cs`, `Binary/BinaryIntelligenceIntegrationTests.cs`.
- **Risk tests** (TODO): `Risk/RiskScoreTests.cs`, `Risk/RiskContributorTests.cs`, `Risk/CompositeRiskScorerTests.cs`.

## Required Reading
- `docs/modules/scanner/architecture.md`
- `docs/modules/platform/architecture-overview.md`
- `docs/modules/scanner/operations/entrypoint-problem.md`
- `docs/reachability/function-level-evidence.md`

## Working Agreement
- 1. Update task status to `DOING`/`DONE` in both the corresponding sprint file `/docs/implplan/SPRINT_*.md` and the local `TASKS.md` when you start or finish work.
- 2. Review this charter and the Required Reading documents before coding; confirm prerequisites are met.
- 3. Keep changes deterministic (stable ordering, timestamps, hashes) and align with offline/air-gap expectations.
- 4. Coordinate doc updates, tests, and cross-guild communication whenever contracts or workflows change.
- 5. Revert to `TODO` if you pause the task without shipping changes; leave notes in commit/PR descriptions for context.
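
The determinism requirement in item 3 and the determinism harness under Testing both come down to one check: the same logical graph must serialize to byte-identical output regardless of discovery order. The following is a minimal, language-neutral sketch of that check, written in Python only for brevity (the module's own tests are `.cs` files). The node and edge field names (`id`, `from`, `to`, `kind`, `path`), the JSON encoding, and the helper names `serialize_graph` and `graph_digest` are illustrative assumptions, not the actual `EntryTraceGraph` contract.

```python
import hashlib
import json


def serialize_graph(nodes, edges):
    """Serialize a graph to canonical bytes.

    Hypothetical shape: nodes are dicts with an "id" key; edges are dicts
    with "from", "to", and "kind" keys. Ordering is made stable by sorting
    collections and object keys, and by using fixed separators.
    """
    canonical = {
        "nodes": sorted(nodes, key=lambda n: n["id"]),
        "edges": sorted(edges, key=lambda e: (e["from"], e["to"], e["kind"])),
    }
    text = json.dumps(canonical, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8")


def graph_digest(nodes, edges):
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(serialize_graph(nodes, edges)).hexdigest()


if __name__ == "__main__":
    # Same logical graph, discovered in a different order with different key order.
    run_a_nodes = [
        {"id": "n2", "path": "/usr/bin/python3"},
        {"id": "n1", "path": "/entrypoint.sh"},
    ]
    run_b_nodes = [
        {"path": "/entrypoint.sh", "id": "n1"},
        {"path": "/usr/bin/python3", "id": "n2"},
    ]
    edges = [{"from": "n1", "to": "n2", "kind": "exec"}]

    # A determinism check compares raw bytes (or their digests), not parsed objects.
    assert serialize_graph(run_a_nodes, edges) == serialize_graph(run_b_nodes, edges)
    print(graph_digest(run_a_nodes, edges))
```

Comparing bytes rather than parsed structures is the point of the sketch: it catches unstable ordering, and would also catch embedded wall-clock timestamps, which a structural comparison could mask.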