Add comprehensive tests for PathConfidenceScorer, PathEnumerator, ShellSymbolicExecutor, and SymbolicState

- Implemented unit tests for PathConfidenceScorer to evaluate path scoring under various conditions, including empty constraints, known and unknown constraints, environmental dependencies, and custom weights.
- Developed tests for PathEnumerator to ensure correct path enumeration from simple scripts, handling known environments, and respecting maximum paths and depth limits.
- Created tests for ShellSymbolicExecutor to validate execution of shell scripts, including handling of commands, branching, and environment tracking.
- Added tests for SymbolicState to verify state management, variable handling, constraint addition, and environment dependency collection.
StellaOps Bot
2025-12-20 14:03:31 +02:00
parent 0ada1b583f
commit ce8cdcd23d
71 changed files with 12438 additions and 3349 deletions


@@ -55,6 +55,87 @@ Located in `Mesh/`:
- `DockerComposeParser`: Parser for Docker Compose v2/v3 files.
- `MeshEntrypointAnalyzer`: Orchestrator for mesh analysis with security metrics and blast radius analysis.
### Speculative Execution (Sprint 0413)
Located in `Speculative/`:
- `SymbolicValue`: Algebraic type for symbolic values (Concrete, Symbolic, Unknown, Composite).
- `SymbolicState`: Execution state with variable bindings, path constraints, and terminal commands.
- `PathConstraint`: Branch predicate constraint with kind classification and env dependency tracking.
- `ExecutionPath`: Complete execution path with constraints, commands, and reachability confidence.
- `ExecutionTree`: All paths from symbolic execution with branch coverage metrics.
- `BranchPoint`: Decision point in the script with coverage statistics.
- `BranchCoverage`: Coverage metrics (total, covered, infeasible, env-dependent branches).
- `ISymbolicExecutor`: Interface for symbolic execution of shell scripts.
- `ShellSymbolicExecutor`: Implementation that explores all if/elif/else and case branches (see the sketch after this list).
- `IConstraintEvaluator`: Interface for path feasibility evaluation.
- `PatternConstraintEvaluator`: Pattern-based evaluator for common shell conditionals.
- `PathEnumerator`: Systematic path exploration with grouping by terminal command.
- `PathConfidenceScorer`: Confidence scoring with multi-factor analysis.
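A minimal sketch of how these pieces compose; the member names used here (`ExecuteAsync`, `Paths`, `TerminalCommand`, `Confidence`) are illustrative assumptions, since this doc lists only the types:

```csharp
// Hypothetical wiring; see Speculative/ for the actual signatures.
var executor = new ShellSymbolicExecutor();
var tree = await executor.ExecuteAsync(
    "if [ \"$DEBUG\" = \"1\" ]; then exec /app/debug; else exec /app/server; fi");
foreach (var path in tree.Paths)
{
    // Each path carries its constraints, terminal command, and reachability confidence.
    Console.WriteLine($"{path.TerminalCommand}: {path.Confidence:P0}");
}
```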
### Binary Intelligence (Sprint 0414)
Located in `Binary/`:
- `CodeFingerprint`: Record for binary function fingerprinting with algorithm, hash, and metrics.
- `FingerprintAlgorithm`: Enum for fingerprint types (BasicBlockHash, ControlFlowGraph, StringReferences, ImportReferences, Combined).
- `FunctionSignature`: Record for extracted binary function metadata (name, offset, size, calling convention, basic blocks, references).
- `BasicBlock`: Record for control flow basic block with offset, size, and instruction count.
- `SymbolInfo`: Record for recovered symbol information with confidence and match method.
- `SymbolMatchMethod`: Enum for how symbols were recovered (DebugInfo, ExactFingerprint, FuzzyFingerprint, PatternMatch, etc.).
- `AlternativeMatch`: Record for secondary symbol match candidates.
- `SourceCorrelation`: Record for mapping binary code to source packages/files.
- `CorrelationEvidence`: Flags enum for evidence types (FingerprintMatch, SymbolName, StringPattern, ImportReference, SourcePath, ExactMatch).
- `BinaryAnalysisResult`: Aggregate result with functions, recovered symbols, source correlations, and vulnerable matches.
- `BinaryArchitecture`: Enum for CPU architectures (Unknown, X86, X64, ARM, ARM64, RISCV64, WASM, MIPS, MIPS64, PPC64, S390X).
- `BinaryFormat`: Enum for binary formats (ELF, PE, MachO, WASM, Raw, Unknown).
- `BinaryAnalysisMetrics`: Metrics for analysis coverage and timing.
- `VulnerableFunctionMatch`: Match of a binary function to a known-vulnerable OSS function.
- `VulnerabilitySeverity`: Enum for vulnerability severity levels.
- `IFingerprintGenerator`: Interface for generating fingerprints from function signatures.
- `BasicBlockFingerprintGenerator`, `ControlFlowFingerprintGenerator`, `CombinedFingerprintGenerator`: Implementations.
- `FingerprintGeneratorFactory`: Factory for creating fingerprint generators.
- `IFingerprintIndex`: Interface for fingerprint lookup with exact and similarity matching.
- `InMemoryFingerprintIndex`: O(1) exact match, O(n) similarity search implementation.
- `VulnerableFingerprintIndex`: Extends index with vulnerability tracking.
- `FingerprintMatch`: Result record with source package, version, vulnerability associations, and similarity score.
- `FingerprintIndexStatistics`: Statistics about the fingerprint index.
- `ISymbolRecovery`: Interface for recovering symbol names from stripped binaries.
- `PatternBasedSymbolRecovery`: Heuristic-based recovery using known patterns.
- `FunctionPattern`: Record for function signature patterns (malloc, strlen, OpenSSL, zlib, etc.).
- `BinaryIntelligenceAnalyzer`: Orchestrator coordinating fingerprinting, symbol recovery, source correlation, and vulnerability matching.
- `BinaryIntelligenceOptions`: Configuration for analysis (algorithm, thresholds, parallelism).
- `VulnerableFunctionMatcher`: Matches binary functions against known-vulnerable function corpus.
- `VulnerableFunctionMatcherOptions`: Configuration for matching thresholds.
- `FingerprintCorpusBuilder`: Builds fingerprint corpus from known OSS packages for later matching.
### Predictive Risk Scoring (Sprint 0415)
Located in `Risk/`:
- `RiskScore`: Record with OverallScore, Category, Confidence, Level, Factors, and ComputedAt.
- `RiskCategory`: Enum for risk dimensions (Exploitability, Exposure, Privilege, DataSensitivity, BlastRadius, DriftVelocity, SupplyChain, Unknown).
- `RiskLevel`: Enum for severity classification (Negligible, Low, Medium, High, Critical).
- `RiskFactor`: Record for individual contributing factors with name, category, score, weight, evidence, and source ID.
- `BusinessContext`: Record with environment, IsInternetFacing, DataClassification, CriticalityTier, ComplianceRegimes, and RiskMultiplier.
- `DataClassification`: Enum for data sensitivity (Public, Internal, Confidential, Restricted, Unknown).
- `SubjectType`: Enum for risk subject types (Image, Container, Service, Fleet).
- `RiskAssessment`: Aggregate record with subject, scores, factors, context, recommendations, and timestamps.
- `RiskTrend`: Record for tracking risk over time with snapshots and trend direction.
- `RiskSnapshot`: Point-in-time risk score for trend analysis.
- `TrendDirection`: Enum (Improving, Stable, Worsening, Volatile, Insufficient).
- `IRiskScorer`: Interface for computing risk scores from entrypoint intelligence.
- `IRiskContributor`: Interface for individual risk contributors (semantic, temporal, mesh, binary, vulnerability).
- `RiskContext`: Record aggregating all signal sources for risk computation.
- `VulnerabilityReference`: Record for known vulnerabilities with severity, CVSS, exploit status.
- `SemanticRiskContributor`: Risk from capabilities and threat vectors.
- `TemporalRiskContributor`: Risk from drift patterns and rapid changes.
- `MeshRiskContributor`: Risk from exposure, blast radius, and vulnerable paths.
- `BinaryRiskContributor`: Risk from vulnerable function usage in binaries.
- `VulnerabilityRiskContributor`: Risk from known CVEs and exploitability.
- `CompositeRiskScorer`: Combines all contributors with weighted scoring and business context adjustment (see the sketch after this list).
- `CompositeRiskScorerOptions`: Configuration for weights and thresholds.
- `RiskExplainer`: Generates human-readable risk explanations with recommendations.
- `RiskReport`: Record with assessment, explanation, and recommendations.
- `RiskAggregator`: Fleet-level risk aggregation and trending.
- `FleetRiskSummary`: Summary statistics across fleet (count by level, top risks, trend).
- `RiskSummaryItem`: Individual subject summary for fleet views.
- `EntrypointRiskReport`: Complete report combining entrypoint graph with risk assessment.
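A minimal sketch of the weighted aggregation implied by `CompositeRiskScorer`; `Score`, `Weight`, and `RiskMultiplier` follow the record descriptions above, while the method shape itself is an assumption:

```csharp
// Illustrative only; the real scorer also applies per-category weights and thresholds.
static double Aggregate(IReadOnlyList<RiskFactor> factors, BusinessContext context)
{
    var totalWeight = factors.Sum(f => f.Weight);
    if (totalWeight <= 0)
    {
        return 0.0;
    }
    var weighted = factors.Sum(f => f.Score * f.Weight) / totalWeight;
    return Math.Clamp(weighted * context.RiskMultiplier, 0.0, 1.0);
}
```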
## Observability & Security
- No dynamic assembly loading beyond restart-time plug-in catalog.
- Structured logs include `scanId`, `imageDigest`, `layerDigest`, `command`, `reason`.
@@ -67,6 +148,9 @@ Located in `Mesh/`:
- Parser fuzz seeds captured for regression; interpreter tracers validated with sample scripts for Python, Node, Java launchers.
- **Temporal tests**: `Temporal/TemporalEntrypointGraphTests.cs`, `Temporal/InMemoryTemporalEntrypointStoreTests.cs`.
- **Mesh tests**: `Mesh/MeshEntrypointGraphTests.cs`, `Mesh/KubernetesManifestParserTests.cs`, `Mesh/DockerComposeParserTests.cs`, `Mesh/MeshEntrypointAnalyzerTests.cs`.
- **Speculative tests**: `Speculative/SymbolicStateTests.cs`, `Speculative/ShellSymbolicExecutorTests.cs`, `Speculative/PathEnumeratorTests.cs`, `Speculative/PathConfidenceScorerTests.cs`.
- **Binary tests**: `Binary/CodeFingerprintTests.cs`, `Binary/FingerprintIndexTests.cs`, `Binary/SymbolRecoveryTests.cs`, `Binary/BinaryIntelligenceIntegrationTests.cs`.
- **Risk tests** (TODO): `Risk/RiskScoreTests.cs`, `Risk/RiskContributorTests.cs`, `Risk/CompositeRiskScorerTests.cs`.
## Required Reading
- `docs/modules/scanner/architecture.md`


@@ -0,0 +1,406 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
namespace StellaOps.Scanner.EntryTrace.Binary;
/// <summary>
/// Complete result of binary analysis including fingerprints, symbols, and correlations.
/// </summary>
/// <param name="BinaryPath">Path to the analyzed binary.</param>
/// <param name="BinaryHash">SHA256 hash of the binary.</param>
/// <param name="Architecture">Target architecture.</param>
/// <param name="Format">Binary format (ELF, PE, Mach-O).</param>
/// <param name="Functions">Extracted functions with fingerprints.</param>
/// <param name="RecoveredSymbols">Symbol recovery results.</param>
/// <param name="SourceCorrelations">Source code correlations.</param>
/// <param name="VulnerableMatches">Functions matching known vulnerabilities.</param>
/// <param name="Metrics">Analysis metrics.</param>
/// <param name="AnalyzedAt">When the analysis was performed.</param>
public sealed record BinaryAnalysisResult(
string BinaryPath,
string BinaryHash,
BinaryArchitecture Architecture,
BinaryFormat Format,
ImmutableArray<FunctionSignature> Functions,
ImmutableDictionary<long, SymbolInfo> RecoveredSymbols,
ImmutableArray<SourceCorrelation> SourceCorrelations,
ImmutableArray<VulnerableFunctionMatch> VulnerableMatches,
BinaryAnalysisMetrics Metrics,
DateTimeOffset AnalyzedAt)
{
/// <summary>
/// Number of functions discovered.
/// </summary>
public int FunctionCount => Functions.Length;
/// <summary>
/// Number of functions with recovered symbols.
/// </summary>
public int RecoveredSymbolCount => RecoveredSymbols.Count(kv => kv.Value.RecoveredName is not null);
/// <summary>
/// Number of functions correlated to source.
/// </summary>
public int CorrelatedCount => SourceCorrelations.Length;
/// <summary>
/// Number of vulnerable function matches.
/// </summary>
public int VulnerableCount => VulnerableMatches.Length;
/// <summary>
/// Creates an empty result for a binary.
/// </summary>
public static BinaryAnalysisResult Empty(
string binaryPath,
string binaryHash,
BinaryArchitecture architecture = BinaryArchitecture.Unknown,
BinaryFormat format = BinaryFormat.Unknown) => new(
binaryPath,
binaryHash,
architecture,
format,
ImmutableArray<FunctionSignature>.Empty,
ImmutableDictionary<long, SymbolInfo>.Empty,
ImmutableArray<SourceCorrelation>.Empty,
ImmutableArray<VulnerableFunctionMatch>.Empty,
BinaryAnalysisMetrics.Empty,
DateTimeOffset.UtcNow);
/// <summary>
/// Gets source correlations with high confidence.
/// </summary>
public IEnumerable<SourceCorrelation> GetHighConfidenceCorrelations()
=> SourceCorrelations.Where(c => c.IsHighConfidence);
/// <summary>
/// Gets the source correlation for a function offset.
/// </summary>
public SourceCorrelation? GetCorrelation(long offset)
=> SourceCorrelations.FirstOrDefault(c =>
offset >= c.BinaryOffset && offset < c.BinaryOffset + c.BinarySize);
/// <summary>
/// Gets symbol info for a function.
/// </summary>
public SymbolInfo? GetSymbol(long offset)
=> RecoveredSymbols.TryGetValue(offset, out var info) ? info : null;
}
/// <summary>
/// Binary file architecture.
/// </summary>
public enum BinaryArchitecture
{
/// <summary>
/// Unknown architecture.
/// </summary>
Unknown,
/// <summary>
/// x86 32-bit.
/// </summary>
X86,
/// <summary>
/// x86-64 / AMD64.
/// </summary>
X64,
/// <summary>
/// ARM 32-bit.
/// </summary>
ARM,
/// <summary>
/// ARM 64-bit (AArch64).
/// </summary>
ARM64,
/// <summary>
/// RISC-V 64-bit.
/// </summary>
RISCV64,
/// <summary>
/// WebAssembly.
/// </summary>
WASM,
/// <summary>
/// MIPS 32-bit.
/// </summary>
MIPS,
/// <summary>
/// MIPS 64-bit.
/// </summary>
MIPS64,
/// <summary>
/// PowerPC 64-bit.
/// </summary>
PPC64,
/// <summary>
/// s390x (IBM Z).
/// </summary>
S390X
}
/// <summary>
/// Binary file format.
/// </summary>
public enum BinaryFormat
{
/// <summary>
/// Unknown format.
/// </summary>
Unknown,
/// <summary>
/// ELF (Linux, BSD, etc.).
/// </summary>
ELF,
/// <summary>
/// PE/COFF (Windows).
/// </summary>
PE,
/// <summary>
/// Mach-O (macOS, iOS).
/// </summary>
MachO,
/// <summary>
/// WebAssembly binary.
/// </summary>
WASM,
/// <summary>
/// Raw binary.
/// </summary>
Raw
}
/// <summary>
/// Metrics from binary analysis.
/// </summary>
/// <param name="TotalFunctions">Total functions discovered.</param>
/// <param name="FunctionsWithSymbols">Functions with original symbols.</param>
/// <param name="FunctionsRecovered">Functions with recovered symbols.</param>
/// <param name="FunctionsCorrelated">Functions correlated to source.</param>
/// <param name="TotalBasicBlocks">Total basic blocks analyzed.</param>
/// <param name="TotalInstructions">Total instructions analyzed.</param>
/// <param name="FingerprintCollisions">Fingerprint collision count.</param>
/// <param name="AnalysisDuration">Time spent analyzing.</param>
public sealed record BinaryAnalysisMetrics(
int TotalFunctions,
int FunctionsWithSymbols,
int FunctionsRecovered,
int FunctionsCorrelated,
int TotalBasicBlocks,
int TotalInstructions,
int FingerprintCollisions,
TimeSpan AnalysisDuration)
{
/// <summary>
/// Empty metrics.
/// </summary>
public static BinaryAnalysisMetrics Empty => new(0, 0, 0, 0, 0, 0, 0, TimeSpan.Zero);
/// <summary>
/// Symbol recovery rate.
/// </summary>
public float RecoveryRate => TotalFunctions > 0
? (float)(FunctionsWithSymbols + FunctionsRecovered) / TotalFunctions
: 0.0f;
/// <summary>
/// Source correlation rate.
/// </summary>
public float CorrelationRate => TotalFunctions > 0
? (float)FunctionsCorrelated / TotalFunctions
: 0.0f;
/// <summary>
/// Average basic blocks per function.
/// </summary>
public float AvgBasicBlocksPerFunction => TotalFunctions > 0
? (float)TotalBasicBlocks / TotalFunctions
: 0.0f;
/// <summary>
/// Gets a human-readable summary.
/// </summary>
public string GetSummary()
=> $"Functions: {TotalFunctions} ({FunctionsWithSymbols} with symbols, {FunctionsRecovered} recovered, " +
$"{FunctionsCorrelated} correlated), Recovery: {RecoveryRate:P0}, Duration: {AnalysisDuration.TotalSeconds:F1}s";
}
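// Worked example (sketch): with 120 functions, 40 keeping symbols, 35 recovered, and
// 30 correlated, RecoveryRate = (40 + 35) / 120 = 0.625 and CorrelationRate = 30 / 120 = 0.25.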
/// <summary>
/// A match indicating a binary function corresponds to a known vulnerable function.
/// </summary>
/// <param name="FunctionOffset">Offset of the matched function.</param>
/// <param name="FunctionName">Name of the matched function.</param>
/// <param name="VulnerabilityId">CVE or vulnerability ID.</param>
/// <param name="SourcePackage">PURL of the vulnerable package.</param>
/// <param name="VulnerableVersions">Affected version range.</param>
/// <param name="VulnerableFunctionName">Name of the vulnerable function.</param>
/// <param name="MatchConfidence">Confidence of the match (0.0-1.0).</param>
/// <param name="MatchEvidence">Evidence supporting the match.</param>
/// <param name="Severity">Vulnerability severity.</param>
public sealed record VulnerableFunctionMatch(
long FunctionOffset,
string? FunctionName,
string VulnerabilityId,
string SourcePackage,
string VulnerableVersions,
string VulnerableFunctionName,
float MatchConfidence,
CorrelationEvidence MatchEvidence,
VulnerabilitySeverity Severity)
{
/// <summary>
/// Whether this is a high-confidence match.
/// </summary>
public bool IsHighConfidence => MatchConfidence >= 0.9f;
/// <summary>
/// Whether this is a critical or high severity match.
/// </summary>
public bool IsCriticalOrHigh => Severity is VulnerabilitySeverity.Critical or VulnerabilitySeverity.High;
/// <summary>
/// Gets a summary for reporting.
/// </summary>
public string GetSummary()
=> $"{VulnerabilityId} in {VulnerableFunctionName} ({Severity}, {MatchConfidence:P0} confidence)";
}
/// <summary>
/// Vulnerability severity levels.
/// </summary>
public enum VulnerabilitySeverity
{
/// <summary>
/// Unknown severity.
/// </summary>
Unknown,
/// <summary>
/// Low severity.
/// </summary>
Low,
/// <summary>
/// Medium severity.
/// </summary>
Medium,
/// <summary>
/// High severity.
/// </summary>
High,
/// <summary>
/// Critical severity.
/// </summary>
Critical
}
/// <summary>
/// Builder for constructing BinaryAnalysisResult incrementally.
/// </summary>
public sealed class BinaryAnalysisResultBuilder
{
private readonly string _binaryPath;
private readonly string _binaryHash;
private readonly BinaryArchitecture _architecture;
private readonly BinaryFormat _format;
private readonly List<FunctionSignature> _functions = new();
private readonly Dictionary<long, SymbolInfo> _symbols = new();
private readonly List<SourceCorrelation> _correlations = new();
private readonly List<VulnerableFunctionMatch> _vulnerableMatches = new();
private readonly DateTimeOffset _startTime = DateTimeOffset.UtcNow;
public BinaryAnalysisResultBuilder(
string binaryPath,
string binaryHash,
BinaryArchitecture architecture = BinaryArchitecture.Unknown,
BinaryFormat format = BinaryFormat.Unknown)
{
_binaryPath = binaryPath;
_binaryHash = binaryHash;
_architecture = architecture;
_format = format;
}
/// <summary>
/// Adds a function signature.
/// </summary>
public BinaryAnalysisResultBuilder AddFunction(FunctionSignature function)
{
_functions.Add(function);
return this;
}
/// <summary>
/// Adds a recovered symbol.
/// </summary>
public BinaryAnalysisResultBuilder AddSymbol(long offset, SymbolInfo symbol)
{
_symbols[offset] = symbol;
return this;
}
/// <summary>
/// Adds a source correlation.
/// </summary>
public BinaryAnalysisResultBuilder AddCorrelation(SourceCorrelation correlation)
{
_correlations.Add(correlation);
return this;
}
/// <summary>
/// Adds a vulnerable function match.
/// </summary>
public BinaryAnalysisResultBuilder AddVulnerableMatch(VulnerableFunctionMatch match)
{
_vulnerableMatches.Add(match);
return this;
}
/// <summary>
/// Builds the final result.
/// </summary>
public BinaryAnalysisResult Build()
{
var duration = DateTimeOffset.UtcNow - _startTime;
var metrics = new BinaryAnalysisMetrics(
TotalFunctions: _functions.Count,
FunctionsWithSymbols: _functions.Count(f => f.HasSymbols),
FunctionsRecovered: _symbols.Count(kv => kv.Value.RecoveredName is not null),
FunctionsCorrelated: _correlations.Count,
TotalBasicBlocks: _functions.Sum(f => f.BasicBlockCount),
TotalInstructions: _functions.Sum(f => f.InstructionCount),
FingerprintCollisions: 0, // TODO: detect collisions
AnalysisDuration: duration);
return new BinaryAnalysisResult(
_binaryPath,
_binaryHash,
_architecture,
_format,
_functions.OrderBy(f => f.Offset).ToImmutableArray(),
_symbols.ToImmutableDictionary(),
_correlations.OrderBy(c => c.BinaryOffset).ToImmutableArray(),
_vulnerableMatches.OrderByDescending(m => m.Severity).ToImmutableArray(),
metrics,
DateTimeOffset.UtcNow);
}
}
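// Usage sketch: incremental construction of a BinaryAnalysisResult. The path, hash,
// architecture, and format below are placeholders; callers supply extracted functions.
internal static class BinaryAnalysisResultBuilderUsage
{
internal static BinaryAnalysisResult Build(FunctionSignature function, SymbolInfo recovered)
=> new BinaryAnalysisResultBuilder("/usr/bin/app", "sha256:placeholder", BinaryArchitecture.X64, BinaryFormat.ELF)
.AddFunction(function)
.AddSymbol(function.Offset, recovered)
.Build();
}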


@@ -0,0 +1,249 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
namespace StellaOps.Scanner.EntryTrace.Binary;
/// <summary>
/// Orchestrator for binary intelligence analysis.
/// Coordinates fingerprinting, symbol recovery, source correlation, and vulnerability matching.
/// </summary>
public sealed class BinaryIntelligenceAnalyzer
{
private readonly IFingerprintGenerator _fingerprintGenerator;
private readonly IFingerprintIndex _fingerprintIndex;
private readonly ISymbolRecovery _symbolRecovery;
private readonly VulnerableFunctionMatcher _vulnerabilityMatcher;
private readonly BinaryIntelligenceOptions _options;
/// <summary>
/// Creates a new binary intelligence analyzer.
/// </summary>
public BinaryIntelligenceAnalyzer(
IFingerprintGenerator? fingerprintGenerator = null,
IFingerprintIndex? fingerprintIndex = null,
ISymbolRecovery? symbolRecovery = null,
VulnerableFunctionMatcher? vulnerabilityMatcher = null,
BinaryIntelligenceOptions? options = null)
{
_fingerprintGenerator = fingerprintGenerator ?? new CombinedFingerprintGenerator();
_fingerprintIndex = fingerprintIndex ?? new InMemoryFingerprintIndex();
_symbolRecovery = symbolRecovery ?? new PatternBasedSymbolRecovery();
_vulnerabilityMatcher = vulnerabilityMatcher ?? new VulnerableFunctionMatcher(_fingerprintIndex);
_options = options ?? BinaryIntelligenceOptions.Default;
}
/// <summary>
/// Analyzes a binary and returns comprehensive intelligence.
/// </summary>
/// <param name="binaryPath">Path to the binary.</param>
/// <param name="binaryHash">Content hash of the binary.</param>
/// <param name="functions">Pre-extracted functions from the binary.</param>
/// <param name="architecture">Binary architecture.</param>
/// <param name="format">Binary format.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Complete binary analysis result.</returns>
public async Task<BinaryAnalysisResult> AnalyzeAsync(
string binaryPath,
string binaryHash,
IReadOnlyList<FunctionSignature> functions,
BinaryArchitecture architecture = BinaryArchitecture.Unknown,
BinaryFormat format = BinaryFormat.Unknown,
CancellationToken cancellationToken = default)
{
var builder = new BinaryAnalysisResultBuilder(binaryPath, binaryHash, architecture, format);
// Phase 1: Generate fingerprints for all functions
var fingerprints = new Dictionary<long, CodeFingerprint>();
foreach (var function in functions)
{
cancellationToken.ThrowIfCancellationRequested();
if (function.Size < _options.MinFunctionSize || function.Size > _options.MaxFunctionSize)
{
continue;
}
var fingerprint = await _fingerprintGenerator.GenerateAsync(
function,
new FingerprintOptions(Algorithm: _options.FingerprintAlgorithm),
cancellationToken);
if (fingerprint.Id != "empty")
{
fingerprints[function.Offset] = fingerprint;
}
builder.AddFunction(function);
}
// Phase 2: Recover symbols for stripped functions
if (_options.EnableSymbolRecovery)
{
var strippedFunctions = functions.Where(f => !f.HasSymbols).ToList();
var recoveredSymbols = await _symbolRecovery.RecoverBatchAsync(
strippedFunctions,
_fingerprintIndex,
cancellationToken);
foreach (var (offset, symbol) in recoveredSymbols)
{
builder.AddSymbol(offset, symbol);
}
}
// Phase 3: Build source correlations
if (_options.EnableSourceCorrelation)
{
foreach (var (offset, fingerprint) in fingerprints)
{
cancellationToken.ThrowIfCancellationRequested();
var matches = await _fingerprintIndex.LookupAsync(fingerprint, cancellationToken);
if (matches.Length > 0)
{
var bestMatch = matches[0];
var function = functions.FirstOrDefault(f => f.Offset == offset);
if (function is not null && bestMatch.Similarity >= _options.MinCorrelationConfidence)
{
var correlation = new SourceCorrelation(
BinaryOffset: offset,
BinarySize: function.Size,
FunctionName: function.Name ?? bestMatch.FunctionName,
SourcePackage: bestMatch.SourcePackage,
SourceVersion: bestMatch.SourceVersion,
SourceFile: bestMatch.SourceFile ?? "unknown",
SourceFunction: bestMatch.FunctionName,
SourceLineStart: bestMatch.SourceLine ?? 0,
SourceLineEnd: bestMatch.SourceLine ?? 0,
Confidence: bestMatch.Similarity,
Evidence: CorrelationEvidence.FingerprintMatch);
builder.AddCorrelation(correlation);
}
}
}
}
// Phase 4: Match vulnerable functions
if (_options.EnableVulnerabilityMatching)
{
var vulnerableMatches = await _vulnerabilityMatcher.MatchAsync(
functions,
fingerprints,
cancellationToken);
foreach (var match in vulnerableMatches)
{
builder.AddVulnerableMatch(match);
}
}
return builder.Build();
}
/// <summary>
/// Indexes functions from a known package for later matching.
/// </summary>
public async Task<int> IndexPackageAsync(
string sourcePackage,
string sourceVersion,
IReadOnlyList<FunctionSignature> functions,
IReadOnlyList<string>? vulnerabilityIds = null,
CancellationToken cancellationToken = default)
{
var indexedCount = 0;
foreach (var function in functions)
{
cancellationToken.ThrowIfCancellationRequested();
if (function.Size < _options.MinFunctionSize)
{
continue;
}
var fingerprint = await _fingerprintGenerator.GenerateAsync(
function,
new FingerprintOptions(Algorithm: _options.FingerprintAlgorithm),
cancellationToken);
if (fingerprint.Id == "empty")
{
continue;
}
var entry = new FingerprintMatch(
Fingerprint: fingerprint,
FunctionName: function.Name ?? $"sub_{function.Offset:x}",
SourcePackage: sourcePackage,
SourceVersion: sourceVersion,
SourceFile: null,
SourceLine: null,
VulnerabilityIds: vulnerabilityIds?.ToImmutableArray() ?? ImmutableArray<string>.Empty,
Similarity: 1.0f,
MatchedAt: DateTimeOffset.UtcNow);
if (await _fingerprintIndex.AddAsync(entry, cancellationToken))
{
indexedCount++;
}
}
return indexedCount;
}
/// <summary>
/// Gets statistics about the fingerprint index.
/// </summary>
public FingerprintIndexStatistics GetIndexStatistics() => _fingerprintIndex.GetStatistics();
}
/// <summary>
/// Options for binary intelligence analysis.
/// </summary>
/// <param name="FingerprintAlgorithm">Algorithm to use for fingerprinting.</param>
/// <param name="MinFunctionSize">Minimum function size to analyze.</param>
/// <param name="MaxFunctionSize">Maximum function size to analyze.</param>
/// <param name="MinCorrelationConfidence">Minimum confidence for source correlation.</param>
/// <param name="EnableSymbolRecovery">Whether to attempt symbol recovery.</param>
/// <param name="EnableSourceCorrelation">Whether to correlate with source.</param>
/// <param name="EnableVulnerabilityMatching">Whether to match vulnerable functions.</param>
/// <param name="MaxParallelism">Maximum parallel operations.</param>
public sealed record BinaryIntelligenceOptions(
FingerprintAlgorithm FingerprintAlgorithm = FingerprintAlgorithm.Combined,
int MinFunctionSize = 16,
int MaxFunctionSize = 1_000_000,
float MinCorrelationConfidence = 0.85f,
bool EnableSymbolRecovery = true,
bool EnableSourceCorrelation = true,
bool EnableVulnerabilityMatching = true,
int MaxParallelism = 4)
{
/// <summary>
/// Default options.
/// </summary>
public static BinaryIntelligenceOptions Default => new();
/// <summary>
/// Fast options for quick scanning (lower confidence thresholds).
/// </summary>
public static BinaryIntelligenceOptions Fast => new(
FingerprintAlgorithm: FingerprintAlgorithm.BasicBlockHash,
MinCorrelationConfidence: 0.75f,
EnableSymbolRecovery: false);
/// <summary>
/// Thorough options for detailed analysis.
/// </summary>
public static BinaryIntelligenceOptions Thorough => new(
FingerprintAlgorithm: FingerprintAlgorithm.Combined,
MinCorrelationConfidence: 0.90f,
EnableSymbolRecovery: true,
EnableSourceCorrelation: true,
EnableVulnerabilityMatching: true);
}
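// Usage sketch: a quick scan with the Fast preset; the path and hash are placeholders
// and the functions come from a separate extraction step not shown here.
internal static class BinaryIntelligenceAnalyzerUsage
{
internal static async Task<string> SummarizeAsync(IReadOnlyList<FunctionSignature> functions)
{
var analyzer = new BinaryIntelligenceAnalyzer(options: BinaryIntelligenceOptions.Fast);
var result = await analyzer.AnalyzeAsync(
"/usr/bin/app",
"sha256:placeholder",
functions,
BinaryArchitecture.X64,
BinaryFormat.ELF);
return result.Metrics.GetSummary();
}
}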


@@ -0,0 +1,299 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
using System.Numerics;
using System.Security.Cryptography;
namespace StellaOps.Scanner.EntryTrace.Binary;
/// <summary>
/// Fingerprint of a binary function for identification and matching.
/// Fingerprints are deterministic and can identify functions across different builds.
/// </summary>
/// <param name="Id">Deterministic fingerprint identifier.</param>
/// <param name="Algorithm">Algorithm used to generate this fingerprint.</param>
/// <param name="Hash">The fingerprint hash bytes.</param>
/// <param name="FunctionSize">Size of the function in bytes.</param>
/// <param name="BasicBlockCount">Number of basic blocks in the function.</param>
/// <param name="InstructionCount">Number of instructions in the function.</param>
/// <param name="Metadata">Additional metadata about the fingerprint.</param>
public sealed record CodeFingerprint(
string Id,
FingerprintAlgorithm Algorithm,
ImmutableArray<byte> Hash,
int FunctionSize,
int BasicBlockCount,
int InstructionCount,
ImmutableDictionary<string, string> Metadata)
{
/// <summary>
/// Creates a fingerprint ID from a hash.
/// </summary>
public static string ComputeId(FingerprintAlgorithm algorithm, ReadOnlySpan<byte> hash)
{
var prefix = algorithm switch
{
FingerprintAlgorithm.BasicBlockHash => "bb",
FingerprintAlgorithm.ControlFlowGraph => "cfg",
FingerprintAlgorithm.StringReferences => "str",
FingerprintAlgorithm.ImportReferences => "imp",
FingerprintAlgorithm.Combined => "cmb",
_ => "unk"
};
return $"{prefix}-{Convert.ToHexString(hash[..Math.Min(16, hash.Length)]).ToLowerInvariant()}";
}
/// <summary>
/// Computes similarity with another fingerprint (0.0-1.0).
/// </summary>
public float ComputeSimilarity(CodeFingerprint other)
{
if (Algorithm != other.Algorithm)
{
return 0.0f;
}
// Hamming distance for hash comparison
var minLen = Math.Min(Hash.Length, other.Hash.Length);
if (minLen == 0)
{
return 0.0f;
}
var matchingBits = 0;
var totalBits = minLen * 8;
for (var i = 0; i < minLen; i++)
{
var xor = (byte)(Hash[i] ^ other.Hash[i]);
matchingBits += 8 - BitOperations.PopCount(xor);
}
return (float)matchingBits / totalBits;
}
/// <summary>
/// Gets the hash as a hex string.
/// </summary>
public string HashHex => Convert.ToHexString(Hash.AsSpan()).ToLowerInvariant();
/// <summary>
/// Creates an empty fingerprint.
/// </summary>
public static CodeFingerprint Empty => new(
"empty",
FingerprintAlgorithm.BasicBlockHash,
ImmutableArray<byte>.Empty,
0, 0, 0,
ImmutableDictionary<string, string>.Empty);
}
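// Worked example (sketch): two one-byte hashes 0xFF and 0xFE differ in a single bit,
// so ComputeSimilarity counts 7 matching bits out of 8 and returns 0.875f; fingerprints
// from different algorithms always score 0.0f.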
/// <summary>
/// Algorithm used for generating binary function fingerprints.
/// </summary>
public enum FingerprintAlgorithm
{
/// <summary>
/// Hash of normalized basic block sequence.
/// Good for exact function matching.
/// </summary>
BasicBlockHash,
/// <summary>
/// Hash of control flow graph structure.
/// Resistant to instruction reordering within blocks.
/// </summary>
ControlFlowGraph,
/// <summary>
/// Hash based on referenced string constants.
/// Useful for functions with unique strings.
/// </summary>
StringReferences,
/// <summary>
/// Hash based on imported function references.
/// Useful for wrapper/stub functions.
/// </summary>
ImportReferences,
/// <summary>
/// Combined multi-feature fingerprint.
/// Most robust but larger.
/// </summary>
Combined
}
/// <summary>
/// Options for fingerprint generation.
/// </summary>
/// <param name="Algorithm">Which algorithm(s) to use.</param>
/// <param name="NormalizeRegisters">Whether to normalize register names.</param>
/// <param name="NormalizeConstants">Whether to normalize constant values.</param>
/// <param name="IncludeStrings">Whether to include string references.</param>
/// <param name="MinFunctionSize">Minimum function size to fingerprint.</param>
/// <param name="MaxFunctionSize">Maximum function size to fingerprint.</param>
public sealed record FingerprintOptions(
FingerprintAlgorithm Algorithm = FingerprintAlgorithm.BasicBlockHash,
bool NormalizeRegisters = true,
bool NormalizeConstants = true,
bool IncludeStrings = true,
int MinFunctionSize = 16,
int MaxFunctionSize = 1_000_000)
{
/// <summary>
/// Default fingerprint options.
/// </summary>
public static FingerprintOptions Default => new();
/// <summary>
/// Options optimized for stripped binaries.
/// </summary>
public static FingerprintOptions ForStripped => new(
Algorithm: FingerprintAlgorithm.Combined,
NormalizeRegisters: true,
NormalizeConstants: true,
IncludeStrings: true,
MinFunctionSize: 32);
}
/// <summary>
/// A basic block in a function's control flow graph.
/// </summary>
/// <param name="Id">Block identifier within the function.</param>
/// <param name="Offset">Offset from function start.</param>
/// <param name="Size">Size in bytes.</param>
/// <param name="InstructionCount">Number of instructions.</param>
/// <param name="Successors">IDs of successor blocks.</param>
/// <param name="Predecessors">IDs of predecessor blocks.</param>
/// <param name="NormalizedBytes">Normalized instruction bytes for hashing.</param>
public sealed record BasicBlock(
int Id,
int Offset,
int Size,
int InstructionCount,
ImmutableArray<int> Successors,
ImmutableArray<int> Predecessors,
ImmutableArray<byte> NormalizedBytes)
{
/// <summary>
/// Computes a hash of this basic block.
/// </summary>
public ImmutableArray<byte> ComputeHash()
{
if (NormalizedBytes.IsEmpty)
{
return ImmutableArray<byte>.Empty;
}
var hash = SHA256.HashData(NormalizedBytes.AsSpan());
return ImmutableArray.Create(hash);
}
/// <summary>
/// Whether this is a function entry block.
/// </summary>
public bool IsEntry => Offset == 0;
/// <summary>
/// Whether this is a function exit block.
/// </summary>
public bool IsExit => Successors.IsEmpty;
}
/// <summary>
/// Represents a function extracted from a binary.
/// </summary>
/// <param name="Name">Function name (if available from symbols).</param>
/// <param name="Offset">Offset in the binary file.</param>
/// <param name="Size">Function size in bytes.</param>
/// <param name="CallingConvention">Detected calling convention.</param>
/// <param name="ParameterCount">Inferred parameter count.</param>
/// <param name="ReturnType">Inferred return type.</param>
/// <param name="Fingerprint">The function's fingerprint.</param>
/// <param name="BasicBlocks">Basic blocks in the function.</param>
/// <param name="StringReferences">String constants referenced.</param>
/// <param name="ImportReferences">Imported functions called.</param>
public sealed record FunctionSignature(
string? Name,
long Offset,
int Size,
CallingConvention CallingConvention,
int? ParameterCount,
string? ReturnType,
CodeFingerprint Fingerprint,
ImmutableArray<BasicBlock> BasicBlocks,
ImmutableArray<string> StringReferences,
ImmutableArray<string> ImportReferences)
{
/// <summary>
/// Whether this function has debug symbols.
/// </summary>
public bool HasSymbols => !string.IsNullOrEmpty(Name);
/// <summary>
/// Gets a display name (symbol name or offset-based).
/// </summary>
public string DisplayName => Name ?? $"sub_{Offset:x}";
/// <summary>
/// Number of basic blocks.
/// </summary>
public int BasicBlockCount => BasicBlocks.Length;
/// <summary>
/// Total instruction count across all blocks.
/// </summary>
public int InstructionCount => BasicBlocks.Sum(b => b.InstructionCount);
}
/// <summary>
/// Calling conventions for binary functions.
/// </summary>
public enum CallingConvention
{
/// <summary>
/// Unknown or undetected calling convention.
/// </summary>
Unknown,
/// <summary>
/// C calling convention (cdecl).
/// </summary>
Cdecl,
/// <summary>
/// Standard call (stdcall).
/// </summary>
Stdcall,
/// <summary>
/// Fast call (fastcall).
/// </summary>
Fastcall,
/// <summary>
/// This call for C++ methods.
/// </summary>
Thiscall,
/// <summary>
/// System V AMD64 ABI.
/// </summary>
SysV64,
/// <summary>
/// Microsoft x64 calling convention.
/// </summary>
Win64,
/// <summary>
/// ARM AAPCS calling convention.
/// </summary>
ARM,
/// <summary>
/// ARM64 calling convention.
/// </summary>
ARM64
}


@@ -0,0 +1,358 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
using System.Text.Json;
namespace StellaOps.Scanner.EntryTrace.Binary;
/// <summary>
/// Builds and manages a corpus of fingerprints from OSS packages.
/// Used to populate the fingerprint index for symbol recovery and vulnerability matching.
/// </summary>
public sealed class FingerprintCorpusBuilder
{
private readonly IFingerprintGenerator _fingerprintGenerator;
private readonly IFingerprintIndex _targetIndex;
private readonly FingerprintCorpusOptions _options;
private readonly List<CorpusBuildRecord> _buildHistory = new();
/// <summary>
/// Creates a new corpus builder.
/// </summary>
public FingerprintCorpusBuilder(
IFingerprintIndex targetIndex,
IFingerprintGenerator? fingerprintGenerator = null,
FingerprintCorpusOptions? options = null)
{
_targetIndex = targetIndex;
_fingerprintGenerator = fingerprintGenerator ?? new CombinedFingerprintGenerator();
_options = options ?? FingerprintCorpusOptions.Default;
}
/// <summary>
/// Indexes functions from a package into the corpus.
/// </summary>
/// <param name="package">Package metadata.</param>
/// <param name="functions">Functions extracted from the package binary.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Number of functions indexed.</returns>
public async Task<CorpusBuildResult> IndexPackageAsync(
PackageInfo package,
IReadOnlyList<FunctionSignature> functions,
CancellationToken cancellationToken = default)
{
var startTime = DateTimeOffset.UtcNow;
var indexed = 0;
var skipped = 0;
var duplicates = 0;
var errors = new List<string>();
foreach (var function in functions)
{
cancellationToken.ThrowIfCancellationRequested();
// Skip functions that don't meet criteria
if (function.Size < _options.MinFunctionSize)
{
skipped++;
continue;
}
if (function.Size > _options.MaxFunctionSize)
{
skipped++;
continue;
}
// Skip functions without names unless configured otherwise
if (!function.HasSymbols && !_options.IndexUnnamedFunctions)
{
skipped++;
continue;
}
try
{
var fingerprint = await _fingerprintGenerator.GenerateAsync(
function,
new FingerprintOptions(Algorithm: _options.FingerprintAlgorithm),
cancellationToken);
if (fingerprint.Id == "empty")
{
skipped++;
continue;
}
var entry = new FingerprintMatch(
Fingerprint: fingerprint,
FunctionName: function.Name ?? $"sub_{function.Offset:x}",
SourcePackage: package.Purl,
SourceVersion: package.Version,
SourceFile: package.SourceFile,
SourceLine: null,
VulnerabilityIds: package.VulnerabilityIds.IsDefault ? ImmutableArray<string>.Empty : package.VulnerabilityIds,
Similarity: 1.0f,
MatchedAt: DateTimeOffset.UtcNow);
var added = await _targetIndex.AddAsync(entry, cancellationToken);
if (added)
{
indexed++;
}
else
{
duplicates++;
}
}
catch (Exception ex)
{
errors.Add($"Function at 0x{function.Offset:x}: {ex.Message}");
}
}
var result = new CorpusBuildResult(
Package: package,
TotalFunctions: functions.Count,
Indexed: indexed,
Skipped: skipped,
Duplicates: duplicates,
Errors: errors.ToImmutableArray(),
Duration: DateTimeOffset.UtcNow - startTime);
_buildHistory.Add(new CorpusBuildRecord(package.Purl, package.Version, result, DateTimeOffset.UtcNow));
return result;
}
/// <summary>
/// Indexes multiple packages in batch.
/// </summary>
public async Task<ImmutableArray<CorpusBuildResult>> IndexPackagesBatchAsync(
IEnumerable<(PackageInfo Package, IReadOnlyList<FunctionSignature> Functions)> packages,
CancellationToken cancellationToken = default)
{
var results = new List<CorpusBuildResult>();
foreach (var (package, functions) in packages)
{
cancellationToken.ThrowIfCancellationRequested();
var result = await IndexPackageAsync(package, functions, cancellationToken);
results.Add(result);
}
return results.ToImmutableArray();
}
/// <summary>
/// Imports corpus data from a JSON file.
/// </summary>
public async Task<int> ImportFromJsonAsync(
Stream jsonStream,
CancellationToken cancellationToken = default)
{
var data = await JsonSerializer.DeserializeAsync<CorpusExportData>(
jsonStream,
cancellationToken: cancellationToken);
if (data?.Entries is null)
{
return 0;
}
var imported = 0;
foreach (var entry in data.Entries)
{
cancellationToken.ThrowIfCancellationRequested();
var fingerprint = new CodeFingerprint(
entry.FingerprintId,
Enum.Parse<FingerprintAlgorithm>(entry.Algorithm),
Convert.FromHexString(entry.HashHex).ToImmutableArray(),
entry.FunctionSize,
entry.BasicBlockCount,
entry.InstructionCount,
entry.Metadata?.ToImmutableDictionary() ?? ImmutableDictionary<string, string>.Empty);
var match = new FingerprintMatch(
Fingerprint: fingerprint,
FunctionName: entry.FunctionName,
SourcePackage: entry.SourcePackage,
SourceVersion: entry.SourceVersion,
SourceFile: entry.SourceFile,
SourceLine: entry.SourceLine,
VulnerabilityIds: entry.VulnerabilityIds?.ToImmutableArray() ?? ImmutableArray<string>.Empty,
Similarity: 1.0f,
MatchedAt: entry.IndexedAt);
if (await _targetIndex.AddAsync(match, cancellationToken))
{
imported++;
}
}
return imported;
}
/// <summary>
/// Exports the corpus to a JSON stream.
/// </summary>
public async Task ExportToJsonAsync(
Stream outputStream,
CancellationToken cancellationToken = default)
{
// Note: full entry export would require index enumeration support.
// For now, export only index statistics as a summary.
var data = new CorpusExportData
{
ExportedAt = DateTimeOffset.UtcNow,
Statistics = _targetIndex.GetStatistics(),
Entries = Array.Empty<CorpusEntryData>() // Full export would need index enumeration
};
await JsonSerializer.SerializeAsync(outputStream, data, cancellationToken: cancellationToken);
}
/// <summary>
/// Gets build history.
/// </summary>
public ImmutableArray<CorpusBuildRecord> GetBuildHistory() => _buildHistory.ToImmutableArray();
/// <summary>
/// Gets corpus statistics.
/// </summary>
public FingerprintIndexStatistics GetStatistics() => _targetIndex.GetStatistics();
}
/// <summary>
/// Options for corpus building.
/// </summary>
/// <param name="FingerprintAlgorithm">Algorithm to use.</param>
/// <param name="MinFunctionSize">Minimum function size to index.</param>
/// <param name="MaxFunctionSize">Maximum function size to index.</param>
/// <param name="IndexUnnamedFunctions">Whether to index functions without symbols.</param>
/// <param name="BatchSize">Batch size for parallel processing.</param>
public sealed record FingerprintCorpusOptions(
FingerprintAlgorithm FingerprintAlgorithm = FingerprintAlgorithm.Combined,
int MinFunctionSize = 16,
int MaxFunctionSize = 100_000,
bool IndexUnnamedFunctions = false,
int BatchSize = 100)
{
/// <summary>
/// Default options.
/// </summary>
public static FingerprintCorpusOptions Default => new();
/// <summary>
/// Options for comprehensive indexing.
/// </summary>
public static FingerprintCorpusOptions Comprehensive => new(
FingerprintAlgorithm: FingerprintAlgorithm.Combined,
MinFunctionSize: 8,
IndexUnnamedFunctions: true);
}
/// <summary>
/// Information about a package being indexed.
/// </summary>
/// <param name="Purl">Package URL (PURL).</param>
/// <param name="Version">Package version.</param>
/// <param name="SourceFile">Source file path (if known).</param>
/// <param name="VulnerabilityIds">Known vulnerability IDs for this package.</param>
/// <param name="Tags">Additional metadata tags.</param>
public sealed record PackageInfo(
string Purl,
string Version,
string? SourceFile = null,
ImmutableArray<string> VulnerabilityIds = default,
ImmutableDictionary<string, string>? Tags = null)
{
/// <summary>
/// Creates package info without vulnerabilities.
/// </summary>
public static PackageInfo Create(string purl, string version, string? sourceFile = null)
=> new(purl, version, sourceFile, ImmutableArray<string>.Empty, null);
/// <summary>
/// Creates package info with vulnerabilities.
/// </summary>
public static PackageInfo CreateVulnerable(string purl, string version, params string[] vulnIds)
=> new(purl, version, null, vulnIds.ToImmutableArray(), null);
}
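// Usage sketch: indexing one vulnerable package into an in-memory index. The PURL,
// version, and CVE id are illustrative placeholders.
internal static class FingerprintCorpusBuilderUsage
{
internal static async Task<CorpusBuildResult> IndexAsync(IReadOnlyList<FunctionSignature> functions)
{
var builder = new FingerprintCorpusBuilder(new InMemoryFingerprintIndex());
var package = PackageInfo.CreateVulnerable("pkg:generic/zlib", "1.2.11", "CVE-2018-25032");
return await builder.IndexPackageAsync(package, functions);
}
}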
/// <summary>
/// Result of indexing a package.
/// </summary>
/// <param name="Package">The package that was indexed.</param>
/// <param name="TotalFunctions">Total functions in the package.</param>
/// <param name="Indexed">Functions successfully indexed.</param>
/// <param name="Skipped">Functions skipped (too small, no symbols, etc.).</param>
/// <param name="Duplicates">Functions already in index.</param>
/// <param name="Errors">Error messages.</param>
/// <param name="Duration">Time taken.</param>
public sealed record CorpusBuildResult(
PackageInfo Package,
int TotalFunctions,
int Indexed,
int Skipped,
int Duplicates,
ImmutableArray<string> Errors,
TimeSpan Duration)
{
/// <summary>
/// Whether the build was successful (indexed some functions).
/// </summary>
public bool IsSuccess => Indexed > 0 && Errors.IsEmpty;
/// <summary>
/// Index rate as a percentage.
/// </summary>
public float IndexRate => TotalFunctions > 0 ? (float)Indexed / TotalFunctions : 0.0f;
}
/// <summary>
/// Record of a corpus build operation.
/// </summary>
/// <param name="PackagePurl">Package that was indexed.</param>
/// <param name="Version">Version indexed.</param>
/// <param name="Result">Build result.</param>
/// <param name="BuildTime">When the build occurred.</param>
public sealed record CorpusBuildRecord(
string PackagePurl,
string Version,
CorpusBuildResult Result,
DateTimeOffset BuildTime);
/// <summary>
/// Data structure for corpus export/import.
/// </summary>
public sealed class CorpusExportData
{
public DateTimeOffset ExportedAt { get; set; }
public FingerprintIndexStatistics? Statistics { get; set; }
public CorpusEntryData[]? Entries { get; set; }
}
/// <summary>
/// Single entry in exported corpus data.
/// </summary>
public sealed class CorpusEntryData
{
public required string FingerprintId { get; set; }
public required string Algorithm { get; set; }
public required string HashHex { get; set; }
public required int FunctionSize { get; set; }
public required int BasicBlockCount { get; set; }
public required int InstructionCount { get; set; }
public required string FunctionName { get; set; }
public required string SourcePackage { get; set; }
public required string SourceVersion { get; set; }
public string? SourceFile { get; set; }
public int? SourceLine { get; set; }
public string[]? VulnerabilityIds { get; set; }
public Dictionary<string, string>? Metadata { get; set; }
public DateTimeOffset IndexedAt { get; set; }
}


@@ -0,0 +1,312 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
using System.Security.Cryptography;
namespace StellaOps.Scanner.EntryTrace.Binary;
/// <summary>
/// Interface for generating fingerprints from binary functions.
/// </summary>
public interface IFingerprintGenerator
{
/// <summary>
/// Generates a fingerprint for a function.
/// </summary>
/// <param name="function">The function to fingerprint.</param>
/// <param name="options">Fingerprint options.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>The generated fingerprint.</returns>
Task<CodeFingerprint> GenerateAsync(
FunctionSignature function,
FingerprintOptions? options = null,
CancellationToken cancellationToken = default);
/// <summary>
/// Generates fingerprints for multiple functions.
/// </summary>
Task<ImmutableArray<CodeFingerprint>> GenerateBatchAsync(
IEnumerable<FunctionSignature> functions,
FingerprintOptions? options = null,
CancellationToken cancellationToken = default);
/// <summary>
/// The algorithm this generator produces.
/// </summary>
FingerprintAlgorithm Algorithm { get; }
}
/// <summary>
/// Generates fingerprints based on basic block hashes.
/// </summary>
public sealed class BasicBlockFingerprintGenerator : IFingerprintGenerator
{
/// <inheritdoc/>
public FingerprintAlgorithm Algorithm => FingerprintAlgorithm.BasicBlockHash;
/// <inheritdoc/>
public Task<CodeFingerprint> GenerateAsync(
FunctionSignature function,
FingerprintOptions? options = null,
CancellationToken cancellationToken = default)
{
options ??= FingerprintOptions.Default;
if (function.BasicBlocks.IsEmpty || function.Size < options.MinFunctionSize)
{
return Task.FromResult(CodeFingerprint.Empty);
}
// Concatenate normalized basic block bytes
var combinedBytes = new List<byte>();
foreach (var block in function.BasicBlocks.OrderBy(b => b.Offset))
{
cancellationToken.ThrowIfCancellationRequested();
combinedBytes.AddRange(block.NormalizedBytes);
}
if (combinedBytes.Count == 0)
{
return Task.FromResult(CodeFingerprint.Empty);
}
// Generate hash
var hash = SHA256.HashData(combinedBytes.ToArray());
var id = CodeFingerprint.ComputeId(Algorithm, hash);
var metadata = ImmutableDictionary<string, string>.Empty
.Add("generator", nameof(BasicBlockFingerprintGenerator))
.Add("version", "1.0");
if (!string.IsNullOrEmpty(function.Name))
{
metadata = metadata.Add("originalName", function.Name);
}
var fingerprint = new CodeFingerprint(
id,
Algorithm,
ImmutableArray.Create(hash),
function.Size,
function.BasicBlockCount,
function.InstructionCount,
metadata);
return Task.FromResult(fingerprint);
}
/// <inheritdoc/>
public async Task<ImmutableArray<CodeFingerprint>> GenerateBatchAsync(
IEnumerable<FunctionSignature> functions,
FingerprintOptions? options = null,
CancellationToken cancellationToken = default)
{
var results = new List<CodeFingerprint>();
foreach (var function in functions)
{
cancellationToken.ThrowIfCancellationRequested();
var fingerprint = await GenerateAsync(function, options, cancellationToken);
results.Add(fingerprint);
}
return results.ToImmutableArray();
}
}
/// <summary>
/// Generates fingerprints based on control flow graph structure.
/// </summary>
public sealed class ControlFlowFingerprintGenerator : IFingerprintGenerator
{
/// <inheritdoc/>
public FingerprintAlgorithm Algorithm => FingerprintAlgorithm.ControlFlowGraph;
/// <inheritdoc/>
public Task<CodeFingerprint> GenerateAsync(
FunctionSignature function,
FingerprintOptions? options = null,
CancellationToken cancellationToken = default)
{
options ??= FingerprintOptions.Default;
if (function.BasicBlocks.IsEmpty || function.Size < options.MinFunctionSize)
{
return Task.FromResult(CodeFingerprint.Empty);
}
// Build CFG signature: encode block sizes and edge patterns
var cfgBytes = new List<byte>();
foreach (var block in function.BasicBlocks.OrderBy(b => b.Id))
{
cancellationToken.ThrowIfCancellationRequested();
// Encode block properties
cfgBytes.AddRange(BitConverter.GetBytes(block.InstructionCount));
cfgBytes.AddRange(BitConverter.GetBytes(block.Successors.Length));
cfgBytes.AddRange(BitConverter.GetBytes(block.Predecessors.Length));
// Encode successor pattern
foreach (var succ in block.Successors.OrderBy(s => s))
{
cfgBytes.AddRange(BitConverter.GetBytes(succ));
}
}
if (cfgBytes.Count == 0)
{
return Task.FromResult(CodeFingerprint.Empty);
}
var hash = SHA256.HashData(cfgBytes.ToArray());
var id = CodeFingerprint.ComputeId(Algorithm, hash);
var metadata = ImmutableDictionary<string, string>.Empty
.Add("generator", nameof(ControlFlowFingerprintGenerator))
.Add("version", "1.0")
.Add("blockCount", function.BasicBlockCount.ToString());
var fingerprint = new CodeFingerprint(
id,
Algorithm,
ImmutableArray.Create(hash),
function.Size,
function.BasicBlockCount,
function.InstructionCount,
metadata);
return Task.FromResult(fingerprint);
}
/// <inheritdoc/>
public async Task<ImmutableArray<CodeFingerprint>> GenerateBatchAsync(
IEnumerable<FunctionSignature> functions,
FingerprintOptions? options = null,
CancellationToken cancellationToken = default)
{
var results = new List<CodeFingerprint>();
foreach (var function in functions)
{
cancellationToken.ThrowIfCancellationRequested();
var fingerprint = await GenerateAsync(function, options, cancellationToken);
results.Add(fingerprint);
}
return results.ToImmutableArray();
}
}
/// <summary>
/// Generates combined multi-feature fingerprints.
/// </summary>
public sealed class CombinedFingerprintGenerator : IFingerprintGenerator
{
private readonly BasicBlockFingerprintGenerator _basicBlockGenerator = new();
private readonly ControlFlowFingerprintGenerator _cfgGenerator = new();
/// <inheritdoc/>
public FingerprintAlgorithm Algorithm => FingerprintAlgorithm.Combined;
/// <inheritdoc/>
public async Task<CodeFingerprint> GenerateAsync(
FunctionSignature function,
FingerprintOptions? options = null,
CancellationToken cancellationToken = default)
{
options ??= FingerprintOptions.Default;
if (function.BasicBlocks.IsEmpty || function.Size < options.MinFunctionSize)
{
return CodeFingerprint.Empty;
}
// Generate component fingerprints
var bbFingerprint = await _basicBlockGenerator.GenerateAsync(function, options, cancellationToken);
var cfgFingerprint = await _cfgGenerator.GenerateAsync(function, options, cancellationToken);
// Combine hashes
var combinedBytes = new List<byte>();
combinedBytes.AddRange(bbFingerprint.Hash);
combinedBytes.AddRange(cfgFingerprint.Hash);
// Add string references if requested
if (options.IncludeStrings && !function.StringReferences.IsEmpty)
{
foreach (var str in function.StringReferences.OrderBy(s => s))
{
combinedBytes.AddRange(System.Text.Encoding.UTF8.GetBytes(str));
}
}
// Add import references
if (!function.ImportReferences.IsEmpty)
{
foreach (var import in function.ImportReferences.OrderBy(i => i))
{
combinedBytes.AddRange(System.Text.Encoding.UTF8.GetBytes(import));
}
}
var hash = SHA256.HashData(combinedBytes.ToArray());
var id = CodeFingerprint.ComputeId(Algorithm, hash);
var metadata = ImmutableDictionary<string, string>.Empty
.Add("generator", nameof(CombinedFingerprintGenerator))
.Add("version", "1.0")
.Add("bbHash", bbFingerprint.HashHex[..16])
.Add("cfgHash", cfgFingerprint.HashHex[..16])
.Add("stringCount", function.StringReferences.Length.ToString())
.Add("importCount", function.ImportReferences.Length.ToString());
var fingerprint = new CodeFingerprint(
id,
Algorithm,
ImmutableArray.Create(hash),
function.Size,
function.BasicBlockCount,
function.InstructionCount,
metadata);
return fingerprint;
}
/// <inheritdoc/>
public async Task<ImmutableArray<CodeFingerprint>> GenerateBatchAsync(
IEnumerable<FunctionSignature> functions,
FingerprintOptions? options = null,
CancellationToken cancellationToken = default)
{
var results = new List<CodeFingerprint>();
foreach (var function in functions)
{
cancellationToken.ThrowIfCancellationRequested();
var fingerprint = await GenerateAsync(function, options, cancellationToken);
results.Add(fingerprint);
}
return results.ToImmutableArray();
}
}
/// <summary>
/// Factory for creating fingerprint generators.
/// </summary>
public static class FingerprintGeneratorFactory
{
/// <summary>
/// Creates a fingerprint generator for the specified algorithm.
/// </summary>
public static IFingerprintGenerator Create(FingerprintAlgorithm algorithm)
{
return algorithm switch
{
FingerprintAlgorithm.BasicBlockHash => new BasicBlockFingerprintGenerator(),
FingerprintAlgorithm.ControlFlowGraph => new ControlFlowFingerprintGenerator(),
FingerprintAlgorithm.Combined => new CombinedFingerprintGenerator(),
_ => new BasicBlockFingerprintGenerator()
};
}
}
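// Usage sketch: pick a generator via the factory and fingerprint a batch; unknown
// algorithms fall back to the basic-block generator, per the switch above.
internal static class FingerprintGeneratorFactoryUsage
{
internal static Task<ImmutableArray<CodeFingerprint>> FingerprintAsync(IEnumerable<FunctionSignature> functions)
=> FingerprintGeneratorFactory
.Create(FingerprintAlgorithm.Combined)
.GenerateBatchAsync(functions, FingerprintOptions.ForStripped);
}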


@@ -0,0 +1,451 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Concurrent;
using System.Collections.Immutable;
namespace StellaOps.Scanner.EntryTrace.Binary;
/// <summary>
/// Interface for an index of fingerprints enabling fast lookup.
/// </summary>
public interface IFingerprintIndex
{
/// <summary>
/// Adds a fingerprint to the index.
/// </summary>
/// <param name="fingerprint">The fingerprint to add.</param>
/// <param name="sourcePackage">Source package PURL.</param>
/// <param name="functionName">Function name.</param>
/// <param name="sourceFile">Source file path.</param>
/// <param name="cancellationToken">Cancellation token.</param>
Task AddAsync(
CodeFingerprint fingerprint,
string sourcePackage,
string functionName,
string? sourceFile = null,
CancellationToken cancellationToken = default);
/// <summary>
/// Adds a fingerprint match to the index.
/// </summary>
/// <param name="match">The fingerprint match to add.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>True if added, false if duplicate.</returns>
Task<bool> AddAsync(FingerprintMatch match, CancellationToken cancellationToken = default);
/// <summary>
/// Looks up a fingerprint and returns matching entries.
/// </summary>
/// <param name="fingerprint">The fingerprint to look up.</param>
/// <param name="cancellationToken">Cancellation token.</param>
Task<ImmutableArray<FingerprintMatch>> LookupAsync(
CodeFingerprint fingerprint,
CancellationToken cancellationToken = default);
/// <summary>
/// Looks up a fingerprint with additional options.
/// </summary>
Task<ImmutableArray<FingerprintMatch>> LookupAsync(
CodeFingerprint fingerprint,
float minSimilarity,
int maxResults,
CancellationToken cancellationToken = default);
/// <summary>
/// Looks up an exact fingerprint match.
/// </summary>
Task<FingerprintMatch?> LookupExactAsync(
CodeFingerprint fingerprint,
CancellationToken cancellationToken = default);
/// <summary>
/// Gets the number of fingerprints in the index.
/// </summary>
int Count { get; }
/// <summary>
/// Gets all packages indexed.
/// </summary>
ImmutableHashSet<string> IndexedPackages { get; }
/// <summary>
/// Clears the index.
/// </summary>
Task ClearAsync(CancellationToken cancellationToken = default);
/// <summary>
/// Gets statistics about the index.
/// </summary>
FingerprintIndexStatistics GetStatistics();
}
/// <summary>
/// Statistics about a fingerprint index.
/// </summary>
/// <param name="TotalFingerprints">Total fingerprints in the index.</param>
/// <param name="TotalPackages">Total unique packages indexed.</param>
/// <param name="TotalVulnerabilities">Total vulnerability associations.</param>
/// <param name="IndexedAt">When the index was last updated.</param>
public sealed record FingerprintIndexStatistics(
int TotalFingerprints,
int TotalPackages,
int TotalVulnerabilities,
DateTimeOffset IndexedAt);
/// <summary>
/// Result of a fingerprint lookup.
/// </summary>
/// <param name="Fingerprint">The matched fingerprint.</param>
/// <param name="FunctionName">Name of the function.</param>
/// <param name="SourcePackage">PURL of the source package.</param>
/// <param name="SourceVersion">Version of the source package.</param>
/// <param name="SourceFile">Source file path.</param>
/// <param name="SourceLine">Source line number.</param>
/// <param name="VulnerabilityIds">Associated vulnerability IDs.</param>
/// <param name="Similarity">Similarity score (0.0-1.0).</param>
/// <param name="MatchedAt">When the match was found.</param>
public sealed record FingerprintMatch(
CodeFingerprint Fingerprint,
string FunctionName,
string SourcePackage,
string? SourceVersion,
string? SourceFile,
int? SourceLine,
ImmutableArray<string> VulnerabilityIds,
float Similarity,
DateTimeOffset MatchedAt)
{
/// <summary>
/// Whether this is an exact match.
/// </summary>
public bool IsExactMatch => Similarity >= 0.999f;
/// <summary>
/// Whether this is a high-confidence match.
/// </summary>
public bool IsHighConfidence => Similarity >= 0.95f;
/// <summary>
/// Whether this match has associated vulnerabilities.
/// </summary>
public bool HasVulnerabilities => !VulnerabilityIds.IsEmpty;
}
/// <summary>
/// In-memory fingerprint index for fast lookups.
/// </summary>
public sealed class InMemoryFingerprintIndex : IFingerprintIndex
{
private readonly ConcurrentDictionary<string, FingerprintMatch> _exactIndex = new();
private readonly ConcurrentDictionary<FingerprintAlgorithm, List<FingerprintMatch>> _algorithmIndex = new();
private readonly HashSet<string> _packages = new();
private readonly object _packagesLock = new();
private DateTimeOffset _lastUpdated = DateTimeOffset.UtcNow;
/// <inheritdoc/>
public int Count => _exactIndex.Count;
/// <inheritdoc/>
public ImmutableHashSet<string> IndexedPackages
{
get
{
lock (_packagesLock)
{
return _packages.ToImmutableHashSet();
}
}
}
/// <inheritdoc/>
public Task<bool> AddAsync(FingerprintMatch match, CancellationToken cancellationToken = default)
{
cancellationToken.ThrowIfCancellationRequested();
var added = _exactIndex.TryAdd(match.Fingerprint.Id, match);
if (added)
{
// Add to algorithm-specific index for similarity search
var algorithmList = _algorithmIndex.GetOrAdd(
match.Fingerprint.Algorithm,
_ => new List<FingerprintMatch>());
lock (algorithmList)
{
algorithmList.Add(match);
}
// Track packages
lock (_packagesLock)
{
_packages.Add(match.SourcePackage);
}
_lastUpdated = DateTimeOffset.UtcNow;
}
return Task.FromResult(added);
}
/// <inheritdoc/>
public Task<FingerprintMatch?> LookupExactAsync(
CodeFingerprint fingerprint,
CancellationToken cancellationToken = default)
{
cancellationToken.ThrowIfCancellationRequested();
if (_exactIndex.TryGetValue(fingerprint.Id, out var match))
{
return Task.FromResult<FingerprintMatch?>(match);
}
return Task.FromResult<FingerprintMatch?>(null);
}
/// <inheritdoc/>
public Task<ImmutableArray<FingerprintMatch>> LookupAsync(
CodeFingerprint fingerprint,
CancellationToken cancellationToken = default)
=> LookupAsync(fingerprint, 0.95f, 10, cancellationToken);
/// <inheritdoc/>
public Task<ImmutableArray<FingerprintMatch>> LookupAsync(
CodeFingerprint fingerprint,
float minSimilarity,
int maxResults,
CancellationToken cancellationToken = default)
{
cancellationToken.ThrowIfCancellationRequested();
// First try exact match
if (_exactIndex.TryGetValue(fingerprint.Id, out var exactMatch))
{
return Task.FromResult(ImmutableArray.Create(exactMatch));
}
// Search for similar fingerprints
if (!_algorithmIndex.TryGetValue(fingerprint.Algorithm, out var algorithmList))
{
return Task.FromResult(ImmutableArray<FingerprintMatch>.Empty);
}
var matches = new List<(FingerprintMatch Match, float Similarity)>();
lock (algorithmList)
{
foreach (var entry in algorithmList)
{
cancellationToken.ThrowIfCancellationRequested();
var similarity = fingerprint.ComputeSimilarity(entry.Fingerprint);
if (similarity >= minSimilarity)
{
matches.Add((entry, similarity));
}
}
}
var result = matches
.OrderByDescending(m => m.Similarity)
.Take(maxResults)
.Select(m => m.Match with { Similarity = m.Similarity })
.ToImmutableArray();
return Task.FromResult(result);
}
/// <inheritdoc/>
public Task ClearAsync(CancellationToken cancellationToken = default)
{
_exactIndex.Clear();
_algorithmIndex.Clear();
lock (_packagesLock)
{
_packages.Clear();
}
return Task.CompletedTask;
}
/// <inheritdoc/>
public FingerprintIndexStatistics GetStatistics()
{
// _exactIndex is a ConcurrentDictionary; enumerating its Values snapshot is
// safe without _packagesLock, which guards only _packages.
var vulnCount = _exactIndex.Values.Sum(m => m.VulnerabilityIds.Length);
return new FingerprintIndexStatistics(
TotalFingerprints: Count,
TotalPackages: IndexedPackages.Count,
TotalVulnerabilities: vulnCount,
IndexedAt: _lastUpdated);
}
/// <inheritdoc/>
public Task AddAsync(
CodeFingerprint fingerprint,
string sourcePackage,
string functionName,
string? sourceFile = null,
CancellationToken cancellationToken = default)
{
var match = new FingerprintMatch(
Fingerprint: fingerprint,
FunctionName: functionName,
SourcePackage: sourcePackage,
SourceVersion: null,
SourceFile: sourceFile,
SourceLine: null,
VulnerabilityIds: ImmutableArray<string>.Empty,
Similarity: 1.0f,
MatchedAt: DateTimeOffset.UtcNow);
// Task<bool> derives from Task; return it directly rather than via ContinueWith,
// which would run even on fault and silently swallow the exception.
return AddAsync(match, cancellationToken);
}
}
/// <summary>
/// Vulnerability-aware fingerprint index that tracks known-vulnerable functions.
/// </summary>
public sealed class VulnerableFingerprintIndex : IFingerprintIndex
{
private readonly InMemoryFingerprintIndex _baseIndex = new();
private readonly ConcurrentDictionary<string, VulnerabilityInfo> _vulnerabilities = new();
/// <inheritdoc/>
public int Count => _baseIndex.Count;
/// <inheritdoc/>
public ImmutableHashSet<string> IndexedPackages => _baseIndex.IndexedPackages;
/// <summary>
/// Adds a fingerprint with associated vulnerability information.
/// </summary>
public async Task<bool> AddVulnerableAsync(
CodeFingerprint fingerprint,
string sourcePackage,
string functionName,
string vulnerabilityId,
string vulnerableVersions,
VulnerabilitySeverity severity,
string? sourceFile = null,
CancellationToken cancellationToken = default)
{
var match = new FingerprintMatch(
Fingerprint: fingerprint,
FunctionName: functionName,
SourcePackage: sourcePackage,
SourceVersion: null,
SourceFile: sourceFile,
SourceLine: null,
VulnerabilityIds: ImmutableArray.Create(vulnerabilityId),
Similarity: 1.0f,
MatchedAt: DateTimeOffset.UtcNow);
var added = await _baseIndex.AddAsync(match, cancellationToken);
if (added)
{
_vulnerabilities[fingerprint.Id] = new VulnerabilityInfo(
vulnerabilityId,
vulnerableVersions,
severity);
}
return added;
}
/// <inheritdoc/>
public Task<bool> AddAsync(FingerprintMatch match, CancellationToken cancellationToken = default)
=> _baseIndex.AddAsync(match, cancellationToken);
/// <inheritdoc/>
public Task AddAsync(
CodeFingerprint fingerprint,
string sourcePackage,
string functionName,
string? sourceFile = null,
CancellationToken cancellationToken = default)
=> _baseIndex.AddAsync(fingerprint, sourcePackage, functionName, sourceFile, cancellationToken);
/// <inheritdoc/>
public Task<FingerprintMatch?> LookupExactAsync(
CodeFingerprint fingerprint,
CancellationToken cancellationToken = default)
=> _baseIndex.LookupExactAsync(fingerprint, cancellationToken);
/// <inheritdoc/>
public Task<ImmutableArray<FingerprintMatch>> LookupAsync(
CodeFingerprint fingerprint,
CancellationToken cancellationToken = default)
=> _baseIndex.LookupAsync(fingerprint, cancellationToken);
/// <inheritdoc/>
public Task<ImmutableArray<FingerprintMatch>> LookupAsync(
CodeFingerprint fingerprint,
float minSimilarity,
int maxResults,
CancellationToken cancellationToken = default)
=> _baseIndex.LookupAsync(fingerprint, minSimilarity, maxResults, cancellationToken);
/// <summary>
/// Looks up vulnerability information for a fingerprint.
/// </summary>
public VulnerabilityInfo? GetVulnerability(string fingerprintId)
=> _vulnerabilities.TryGetValue(fingerprintId, out var info) ? info : null;
/// <summary>
/// Checks if a fingerprint matches a known-vulnerable function.
/// </summary>
public async Task<VulnerableFunctionMatch?> CheckVulnerableAsync(
CodeFingerprint fingerprint,
long functionOffset,
CancellationToken cancellationToken = default)
{
var matches = await LookupAsync(fingerprint, 0.95f, 1, cancellationToken);
if (matches.IsEmpty)
{
return null;
}
var match = matches[0];
var vulnInfo = GetVulnerability(match.Fingerprint.Id);
if (vulnInfo is null)
{
return null;
}
return new VulnerableFunctionMatch(
FunctionOffset: functionOffset,
FunctionName: null, // binary-side name is unknown here; only the offset was supplied
VulnerabilityId: vulnInfo.VulnerabilityId,
SourcePackage: match.SourcePackage,
VulnerableVersions: vulnInfo.VulnerableVersions,
VulnerableFunctionName: match.FunctionName,
MatchConfidence: match.Similarity,
MatchEvidence: CorrelationEvidence.FingerprintMatch,
Severity: vulnInfo.Severity);
}
/// <inheritdoc/>
public async Task ClearAsync(CancellationToken cancellationToken = default)
{
await _baseIndex.ClearAsync(cancellationToken);
_vulnerabilities.Clear();
}
/// <inheritdoc/>
public FingerprintIndexStatistics GetStatistics() => _baseIndex.GetStatistics();
/// <summary>
/// Vulnerability information associated with a fingerprint.
/// </summary>
public sealed record VulnerabilityInfo(
string VulnerabilityId,
string VulnerableVersions,
VulnerabilitySeverity Severity);
}
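// --- Illustrative usage sketch (not part of this commit) ---
// Indexes one known-vulnerable fingerprint, then checks a candidate against it.
// The PURL, function name, CVE id, version range, and offset are example values.
internal static class VulnerableIndexExample
{
    public static async Task<VulnerableFunctionMatch?> CheckAsync(
        CodeFingerprint knownBad,
        CodeFingerprint candidate)
    {
        var index = new VulnerableFingerprintIndex();
        await index.AddVulnerableAsync(
            knownBad,
            sourcePackage: "pkg:generic/openssl",
            functionName: "EVP_EncryptInit",
            vulnerabilityId: "CVE-2016-2107",
            vulnerableVersions: "<1.0.2h",
            severity: VulnerabilitySeverity.High);
        // Returns null unless the candidate matches at >= 0.95 similarity.
        return await index.CheckVulnerableAsync(candidate, functionOffset: 0x401000);
    }
}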


@@ -0,0 +1,379 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
using System.Text.RegularExpressions;
namespace StellaOps.Scanner.EntryTrace.Binary;
/// <summary>
/// Interface for recovering symbol information from stripped binaries.
/// </summary>
public interface ISymbolRecovery
{
/// <summary>
/// Attempts to recover symbol information for a function.
/// </summary>
/// <param name="function">The function to analyze.</param>
/// <param name="index">Optional fingerprint index for matching.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Recovered symbol information.</returns>
Task<SymbolInfo> RecoverAsync(
FunctionSignature function,
IFingerprintIndex? index = null,
CancellationToken cancellationToken = default);
/// <summary>
/// Recovers symbols for multiple functions in batch.
/// </summary>
Task<ImmutableDictionary<long, SymbolInfo>> RecoverBatchAsync(
IEnumerable<FunctionSignature> functions,
IFingerprintIndex? index = null,
CancellationToken cancellationToken = default);
/// <summary>
/// The recovery methods this implementation supports.
/// </summary>
ImmutableArray<SymbolMatchMethod> SupportedMethods { get; }
}
/// <summary>
/// Pattern-based symbol recovery using known code patterns.
/// </summary>
public sealed class PatternBasedSymbolRecovery : ISymbolRecovery
{
private readonly IFingerprintGenerator _fingerprintGenerator;
private readonly ImmutableArray<FunctionPattern> _patterns;
/// <summary>
/// Creates a new pattern-based symbol recovery instance.
/// </summary>
public PatternBasedSymbolRecovery(
IFingerprintGenerator? fingerprintGenerator = null,
IEnumerable<FunctionPattern>? patterns = null)
{
_fingerprintGenerator = fingerprintGenerator ?? new CombinedFingerprintGenerator();
_patterns = patterns?.ToImmutableArray() ?? GetDefaultPatterns();
}
/// <inheritdoc/>
public ImmutableArray<SymbolMatchMethod> SupportedMethods =>
ImmutableArray.Create(
SymbolMatchMethod.PatternMatch,
SymbolMatchMethod.StringAnalysis,
SymbolMatchMethod.FingerprintMatch,
SymbolMatchMethod.Inferred);
/// <inheritdoc/>
public async Task<SymbolInfo> RecoverAsync(
FunctionSignature function,
IFingerprintIndex? index = null,
CancellationToken cancellationToken = default)
{
// If function already has symbols, return them
if (function.HasSymbols)
{
return SymbolInfo.FromDebugSymbols(function.Name!);
}
// Try fingerprint matching first (highest confidence)
if (index is not null)
{
var fingerprint = await _fingerprintGenerator.GenerateAsync(function, cancellationToken: cancellationToken);
var matches = await index.LookupAsync(fingerprint, cancellationToken);
if (matches.Length > 0)
{
var bestMatch = matches[0];
return new SymbolInfo(
OriginalName: null,
RecoveredName: bestMatch.FunctionName,
Confidence: bestMatch.Similarity,
SourcePackage: bestMatch.SourcePackage,
SourceVersion: bestMatch.SourceVersion,
SourceFile: bestMatch.SourceFile,
SourceLine: bestMatch.SourceLine,
MatchMethod: SymbolMatchMethod.FingerprintMatch,
AlternativeMatches: matches.Skip(1)
.Take(3)
.Select(m => new AlternativeMatch(m.FunctionName, m.SourcePackage, m.Similarity))
.ToImmutableArray());
}
}
// Try pattern matching
var patternMatch = TryMatchPattern(function);
if (patternMatch is not null)
{
return patternMatch;
}
// Try string analysis
var stringMatch = TryStringAnalysis(function);
if (stringMatch is not null)
{
return stringMatch;
}
// Heuristic inference based on function characteristics
var inferred = TryInferFromCharacteristics(function);
if (inferred is not null)
{
return inferred;
}
// No match found
return SymbolInfo.Unmatched();
}
/// <inheritdoc/>
public async Task<ImmutableDictionary<long, SymbolInfo>> RecoverBatchAsync(
IEnumerable<FunctionSignature> functions,
IFingerprintIndex? index = null,
CancellationToken cancellationToken = default)
{
var results = ImmutableDictionary.CreateBuilder<long, SymbolInfo>();
foreach (var function in functions)
{
cancellationToken.ThrowIfCancellationRequested();
var symbol = await RecoverAsync(function, index, cancellationToken);
results[function.Offset] = symbol;
}
return results.ToImmutable();
}
private SymbolInfo? TryMatchPattern(FunctionSignature function)
{
foreach (var pattern in _patterns)
{
if (pattern.Matches(function))
{
return new SymbolInfo(
OriginalName: null,
RecoveredName: pattern.InferredName,
Confidence: pattern.Confidence,
SourcePackage: pattern.SourcePackage,
SourceVersion: null,
SourceFile: null,
SourceLine: null,
MatchMethod: SymbolMatchMethod.PatternMatch,
AlternativeMatches: ImmutableArray<AlternativeMatch>.Empty);
}
}
return null;
}
private SymbolInfo? TryStringAnalysis(FunctionSignature function)
{
if (function.StringReferences.IsEmpty)
{
return null;
}
// Look for common patterns in string references
foreach (var str in function.StringReferences)
{
// Error message patterns often contain function names
var errorMatch = Regex.Match(str, @"^(?:error|warning|fatal|assert)\s+in\s+(\w+)", RegexOptions.IgnoreCase);
if (errorMatch.Success)
{
return new SymbolInfo(
OriginalName: null,
RecoveredName: errorMatch.Groups[1].Value,
Confidence: 0.7f,
SourcePackage: null,
SourceVersion: null,
SourceFile: null,
SourceLine: null,
MatchMethod: SymbolMatchMethod.StringAnalysis,
AlternativeMatches: ImmutableArray<AlternativeMatch>.Empty);
}
// Debug format strings often contain function names
var debugMatch = Regex.Match(str, @"^\[(\w+)\]", RegexOptions.None);
if (debugMatch.Success && debugMatch.Groups[1].Length >= 3)
{
return new SymbolInfo(
OriginalName: null,
RecoveredName: debugMatch.Groups[1].Value,
Confidence: 0.5f,
SourcePackage: null,
SourceVersion: null,
SourceFile: null,
SourceLine: null,
MatchMethod: SymbolMatchMethod.StringAnalysis,
AlternativeMatches: ImmutableArray<AlternativeMatch>.Empty);
}
}
return null;
}
private SymbolInfo? TryInferFromCharacteristics(FunctionSignature function)
{
// Very short functions are often stubs/wrappers
if (function.Size < 32 && function.BasicBlockCount == 1)
{
if (!function.ImportReferences.IsEmpty)
{
// Likely a wrapper for the first import
var import = function.ImportReferences[0];
return new SymbolInfo(
OriginalName: null,
RecoveredName: $"wrapper_{import}",
Confidence: 0.3f,
SourcePackage: null,
SourceVersion: null,
SourceFile: null,
SourceLine: null,
MatchMethod: SymbolMatchMethod.Inferred,
AlternativeMatches: ImmutableArray<AlternativeMatch>.Empty);
}
}
// Functions with many string references are often print/log functions
if (function.StringReferences.Length > 5)
{
return new SymbolInfo(
OriginalName: null,
RecoveredName: "log_or_print_function",
Confidence: 0.2f,
SourcePackage: null,
SourceVersion: null,
SourceFile: null,
SourceLine: null,
MatchMethod: SymbolMatchMethod.Inferred,
AlternativeMatches: ImmutableArray<AlternativeMatch>.Empty);
}
return null;
}
private static ImmutableArray<FunctionPattern> GetDefaultPatterns()
{
return ImmutableArray.Create(
// Common C runtime patterns
new FunctionPattern(
Name: "malloc",
MinSize: 32, MaxSize: 256,
RequiredImports: new[] { "sbrk", "mmap" },
InferredName: "malloc",
Confidence: 0.85f),
new FunctionPattern(
Name: "free",
MinSize: 16, MaxSize: 128,
RequiredImports: new[] { "munmap" },
InferredName: "free",
Confidence: 0.80f),
new FunctionPattern(
Name: "memcpy",
MinSize: 8, MaxSize: 64,
RequiredImports: Array.Empty<string>(),
MinBasicBlocks: 1, MaxBasicBlocks: 3,
InferredName: "memcpy",
Confidence: 0.75f),
new FunctionPattern(
Name: "strlen",
MinSize: 8, MaxSize: 48,
RequiredImports: Array.Empty<string>(),
MinBasicBlocks: 1, MaxBasicBlocks: 2,
InferredName: "strlen",
Confidence: 0.70f),
// OpenSSL patterns
new FunctionPattern(
Name: "EVP_EncryptInit",
MinSize: 128, MaxSize: 512,
RequiredImports: new[] { "EVP_CIPHER_CTX_new", "EVP_CIPHER_CTX_init" },
InferredName: "EVP_EncryptInit",
Confidence: 0.90f,
SourcePackage: "pkg:generic/openssl"),
// zlib patterns
new FunctionPattern(
Name: "inflate",
MinSize: 256, MaxSize: 2048,
RequiredImports: Array.Empty<string>(),
InferredName: "inflate",
Confidence: 0.85f,
RequiredStrings: new[] { "invalid block type", "incorrect data check" },
SourcePackage: "pkg:generic/zlib")
);
}
}
/// <summary>
/// Pattern for matching known function signatures.
/// </summary>
/// <param name="Name">Pattern name for identification.</param>
/// <param name="MinSize">Minimum function size.</param>
/// <param name="MaxSize">Maximum function size.</param>
/// <param name="RequiredImports">Imports that must be present.</param>
/// <param name="RequiredStrings">Strings that must be referenced.</param>
/// <param name="MinBasicBlocks">Minimum basic block count.</param>
/// <param name="MaxBasicBlocks">Maximum basic block count.</param>
/// <param name="InferredName">Name to infer if pattern matches.</param>
/// <param name="SourcePackage">Source package PURL.</param>
/// <param name="Confidence">Confidence level for this pattern.</param>
public sealed record FunctionPattern(
string Name,
int MinSize,
int MaxSize,
string[] RequiredImports,
string InferredName,
float Confidence,
string[]? RequiredStrings = null,
int? MinBasicBlocks = null,
int? MaxBasicBlocks = null,
string? SourcePackage = null)
{
/// <summary>
/// Checks if a function matches this pattern.
/// </summary>
public bool Matches(FunctionSignature function)
{
// Check size bounds
if (function.Size < MinSize || function.Size > MaxSize)
{
return false;
}
// Check basic block count
if (MinBasicBlocks.HasValue && function.BasicBlockCount < MinBasicBlocks.Value)
{
return false;
}
if (MaxBasicBlocks.HasValue && function.BasicBlockCount > MaxBasicBlocks.Value)
{
return false;
}
// Check required imports
if (RequiredImports.Length > 0)
{
var functionImports = function.ImportReferences.ToHashSet(StringComparer.OrdinalIgnoreCase);
if (!RequiredImports.All(r => functionImports.Contains(r)))
{
return false;
}
}
// Check required strings
if (RequiredStrings is { Length: > 0 })
{
var functionStrings = function.StringReferences.ToHashSet(StringComparer.OrdinalIgnoreCase);
if (!RequiredStrings.All(s => functionStrings.Any(fs => fs.Contains(s, StringComparison.OrdinalIgnoreCase))))
{
return false;
}
}
return true;
}
}
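// --- Illustrative usage sketch (not part of this commit) ---
// Recovers a name for a stripped function, optionally consulting a fingerprint
// index built elsewhere; otherwise falls back to pattern, string, and heuristic matching.
internal static class SymbolRecoveryExample
{
    public static async Task<string?> RecoverNameAsync(
        FunctionSignature strippedFunction,
        IFingerprintIndex? corpusIndex = null)
    {
        var recovery = new PatternBasedSymbolRecovery();
        SymbolInfo info = await recovery.RecoverAsync(strippedFunction, corpusIndex);
        // BestName prefers the original symbol and falls back to the recovered one.
        return info.BestName;
    }
}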


@@ -0,0 +1,276 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
using System.Numerics;
namespace StellaOps.Scanner.EntryTrace.Binary;
/// <summary>
/// Recovered symbol information for a binary function.
/// </summary>
/// <param name="OriginalName">Original symbol name (if available).</param>
/// <param name="RecoveredName">Name recovered via matching.</param>
/// <param name="Confidence">Match confidence (0.0-1.0).</param>
/// <param name="SourcePackage">PURL of the source package.</param>
/// <param name="SourceVersion">Version of the source package.</param>
/// <param name="SourceFile">Original source file path.</param>
/// <param name="SourceLine">Original source line number.</param>
/// <param name="MatchMethod">How the symbol was recovered.</param>
/// <param name="AlternativeMatches">Other possible matches.</param>
public sealed record SymbolInfo(
string? OriginalName,
string? RecoveredName,
float Confidence,
string? SourcePackage,
string? SourceVersion,
string? SourceFile,
int? SourceLine,
SymbolMatchMethod MatchMethod,
ImmutableArray<AlternativeMatch> AlternativeMatches)
{
/// <summary>
/// Gets the best available name.
/// </summary>
public string? BestName => OriginalName ?? RecoveredName;
/// <summary>
/// Whether we have high confidence in this match.
/// </summary>
public bool IsHighConfidence => Confidence >= 0.9f;
/// <summary>
/// Whether we have source location information.
/// </summary>
public bool HasSourceLocation => !string.IsNullOrEmpty(SourceFile);
/// <summary>
/// Creates an unmatched symbol info.
/// </summary>
public static SymbolInfo Unmatched(string? originalName = null) => new(
originalName,
RecoveredName: null,
Confidence: 0.0f,
SourcePackage: null,
SourceVersion: null,
SourceFile: null,
SourceLine: null,
SymbolMatchMethod.None,
ImmutableArray<AlternativeMatch>.Empty);
/// <summary>
/// Creates a symbol info from debug symbols.
/// </summary>
public static SymbolInfo FromDebugSymbols(
string name,
string? sourceFile = null,
int? sourceLine = null) => new(
name,
RecoveredName: null,
Confidence: 1.0f,
SourcePackage: null,
SourceVersion: null,
sourceFile,
sourceLine,
SymbolMatchMethod.DebugSymbols,
ImmutableArray<AlternativeMatch>.Empty);
}
/// <summary>
/// How a symbol was recovered/matched.
/// </summary>
public enum SymbolMatchMethod
{
/// <summary>
/// No match found.
/// </summary>
None,
/// <summary>
/// From debug information (DWARF, PDB, etc.).
/// </summary>
DebugSymbols,
/// <summary>
/// From export table.
/// </summary>
ExportTable,
/// <summary>
/// From import table.
/// </summary>
ImportTable,
/// <summary>
/// Matched via fingerprint against corpus.
/// </summary>
FingerprintMatch,
/// <summary>
/// Matched via known code patterns.
/// </summary>
PatternMatch,
/// <summary>
/// Matched via string reference analysis.
/// </summary>
StringAnalysis,
/// <summary>
/// Heuristic inference.
/// </summary>
Inferred,
/// <summary>
/// Multiple methods combined.
/// </summary>
Combined
}
/// <summary>
/// An alternative possible match for a symbol.
/// </summary>
/// <param name="Name">Alternative function name.</param>
/// <param name="SourcePackage">PURL of the alternative source.</param>
/// <param name="Confidence">Confidence for this alternative.</param>
public sealed record AlternativeMatch(
string Name,
string? SourcePackage,
float Confidence);
/// <summary>
/// Correlation between binary code and source code.
/// </summary>
/// <param name="BinaryOffset">Offset in the binary file.</param>
/// <param name="BinarySize">Size of the binary region.</param>
/// <param name="FunctionName">Function name (if known).</param>
/// <param name="SourcePackage">PURL of the source package.</param>
/// <param name="SourceVersion">Version of the source package.</param>
/// <param name="SourceFile">Original source file path.</param>
/// <param name="SourceFunction">Original function name in source.</param>
/// <param name="SourceLineStart">Start line in source.</param>
/// <param name="SourceLineEnd">End line in source.</param>
/// <param name="Confidence">Correlation confidence (0.0-1.0).</param>
/// <param name="Evidence">Evidence supporting the correlation.</param>
public sealed record SourceCorrelation(
long BinaryOffset,
int BinarySize,
string? FunctionName,
string SourcePackage,
string SourceVersion,
string SourceFile,
string SourceFunction,
int SourceLineStart,
int SourceLineEnd,
float Confidence,
CorrelationEvidence Evidence)
{
/// <summary>
/// Number of source lines covered.
/// </summary>
public int SourceLineCount => SourceLineEnd - SourceLineStart + 1;
/// <summary>
/// Whether this is a high-confidence correlation.
/// </summary>
public bool IsHighConfidence => Confidence >= 0.9f;
/// <summary>
/// Gets a source location string.
/// </summary>
public string SourceLocation => $"{SourceFile}:{SourceLineStart}-{SourceLineEnd}";
}
/// <summary>
/// Evidence types supporting source correlation.
/// </summary>
[Flags]
public enum CorrelationEvidence
{
/// <summary>
/// No evidence.
/// </summary>
None = 0,
/// <summary>
/// Matched via fingerprint.
/// </summary>
FingerprintMatch = 1 << 0,
/// <summary>
/// Matched via string constants.
/// </summary>
StringMatch = 1 << 1,
/// <summary>
/// Matched via symbol names.
/// </summary>
SymbolMatch = 1 << 2,
/// <summary>
/// Matched via build ID/debug link.
/// </summary>
BuildIdMatch = 1 << 3,
/// <summary>
/// Matched via source path in debug info.
/// </summary>
DebugPathMatch = 1 << 4,
/// <summary>
/// Matched via import/export correlation.
/// </summary>
ImportExportMatch = 1 << 5,
/// <summary>
/// Matched via structural similarity.
/// </summary>
StructuralMatch = 1 << 6
}
/// <summary>
/// Extension methods for CorrelationEvidence.
/// </summary>
public static class CorrelationEvidenceExtensions
{
/// <summary>
/// Gets a human-readable description of the evidence.
/// </summary>
public static string ToDescription(this CorrelationEvidence evidence)
{
if (evidence == CorrelationEvidence.None)
{
return "No evidence";
}
var parts = new List<string>();
if (evidence.HasFlag(CorrelationEvidence.FingerprintMatch))
parts.Add("fingerprint");
if (evidence.HasFlag(CorrelationEvidence.StringMatch))
parts.Add("strings");
if (evidence.HasFlag(CorrelationEvidence.SymbolMatch))
parts.Add("symbols");
if (evidence.HasFlag(CorrelationEvidence.BuildIdMatch))
parts.Add("build-id");
if (evidence.HasFlag(CorrelationEvidence.DebugPathMatch))
parts.Add("debug-path");
if (evidence.HasFlag(CorrelationEvidence.ImportExportMatch))
parts.Add("imports");
if (evidence.HasFlag(CorrelationEvidence.StructuralMatch))
parts.Add("structure");
return string.Join(", ", parts);
}
/// <summary>
/// Counts the number of evidence types present.
/// </summary>
public static int EvidenceCount(this CorrelationEvidence evidence)
=> BitOperations.PopCount((uint)evidence);
/// <summary>
/// Whether multiple evidence types are present.
/// </summary>
public static bool HasMultipleEvidence(this CorrelationEvidence evidence)
=> evidence.EvidenceCount() > 1;
}
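// --- Illustrative usage sketch (not part of this commit) ---
// Combines evidence flags and renders them for a report line.
internal static class CorrelationEvidenceExample
{
    public static string Describe()
    {
        var evidence = CorrelationEvidence.FingerprintMatch | CorrelationEvidence.SymbolMatch;
        // EvidenceCount() == 2 here, so HasMultipleEvidence() is true.
        return evidence.HasMultipleEvidence()
            ? $"corroborated by: {evidence.ToDescription()}" // "fingerprint, symbols"
            : evidence.ToDescription();
    }
}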


@@ -0,0 +1,227 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
namespace StellaOps.Scanner.EntryTrace.Binary;
/// <summary>
/// Matches binary functions against known-vulnerable function signatures.
/// </summary>
public sealed class VulnerableFunctionMatcher
{
private readonly IFingerprintIndex _index;
private readonly VulnerableMatcherOptions _options;
/// <summary>
/// Creates a new vulnerable function matcher.
/// </summary>
public VulnerableFunctionMatcher(
IFingerprintIndex index,
VulnerableMatcherOptions? options = null)
{
_index = index;
_options = options ?? VulnerableMatcherOptions.Default;
}
/// <summary>
/// Matches functions against known vulnerabilities.
/// </summary>
/// <param name="functions">Functions to check.</param>
/// <param name="fingerprints">Pre-computed fingerprints by offset.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Vulnerable function matches.</returns>
public async Task<ImmutableArray<VulnerableFunctionMatch>> MatchAsync(
IReadOnlyList<FunctionSignature> functions,
IDictionary<long, CodeFingerprint> fingerprints,
CancellationToken cancellationToken = default)
{
var matches = new List<VulnerableFunctionMatch>();
foreach (var function in functions)
{
cancellationToken.ThrowIfCancellationRequested();
if (!fingerprints.TryGetValue(function.Offset, out var fingerprint))
{
continue;
}
var indexMatches = await _index.LookupAsync(fingerprint, cancellationToken);
foreach (var indexMatch in indexMatches)
{
// Only process matches with vulnerabilities
if (!indexMatch.HasVulnerabilities)
{
continue;
}
// Check confidence threshold
if (indexMatch.Similarity < _options.MinMatchConfidence)
{
continue;
}
// Create a match for each vulnerability
foreach (var vulnId in indexMatch.VulnerabilityIds)
{
var severity = InferSeverity(vulnId);
// Filter by severity; unknown severity is gated by its own option rather than
// the ordinal comparison, so IncludeUnknownSeverity is actually honored.
if (severity == VulnerabilitySeverity.Unknown)
{
if (!_options.IncludeUnknownSeverity)
{
continue;
}
}
else if (severity < _options.MinSeverity)
{
continue;
}
var match = new VulnerableFunctionMatch(
FunctionOffset: function.Offset,
FunctionName: function.Name,
VulnerabilityId: vulnId,
SourcePackage: indexMatch.SourcePackage,
VulnerableVersions: indexMatch.SourceVersion,
VulnerableFunctionName: indexMatch.FunctionName,
MatchConfidence: indexMatch.Similarity,
MatchEvidence: CorrelationEvidence.FingerprintMatch,
Severity: severity);
matches.Add(match);
}
}
}
// Deduplicate and sort by severity
return matches
.GroupBy(m => (m.FunctionOffset, m.VulnerabilityId))
.Select(g => g.OrderByDescending(m => m.MatchConfidence).First())
.OrderByDescending(m => m.Severity)
.ThenByDescending(m => m.MatchConfidence)
.ToImmutableArray();
}
/// <summary>
/// Matches a single function against known vulnerabilities.
/// </summary>
public async Task<ImmutableArray<VulnerableFunctionMatch>> MatchSingleAsync(
FunctionSignature function,
CodeFingerprint fingerprint,
CancellationToken cancellationToken = default)
{
var fingerprints = new Dictionary<long, CodeFingerprint>
{
[function.Offset] = fingerprint
};
return await MatchAsync(new[] { function }, fingerprints, cancellationToken);
}
/// <summary>
/// Infers severity from vulnerability ID patterns.
/// </summary>
private static VulnerabilitySeverity InferSeverity(string vulnerabilityId)
{
// This is a simplified heuristic - in production, query the vulnerability database
var upper = vulnerabilityId.ToUpperInvariant();
// Known critical vulnerabilities
if (upper.Contains("LOG4J") || upper.Contains("HEARTBLEED") || upper.Contains("SHELLSHOCK"))
{
return VulnerabilitySeverity.Critical;
}
// CVE prefix - would normally look up CVSS score
if (upper.StartsWith("CVE-"))
{
// Default to Medium for unknown CVEs
return VulnerabilitySeverity.Medium;
}
// GHSA prefix (GitHub Security Advisory)
if (upper.StartsWith("GHSA-"))
{
return VulnerabilitySeverity.Medium;
}
return VulnerabilitySeverity.Unknown;
}
/// <summary>
/// Registers a vulnerable function in the index.
/// </summary>
public async Task<bool> RegisterVulnerableAsync(
CodeFingerprint fingerprint,
string functionName,
string sourcePackage,
string sourceVersion,
string vulnerabilityId,
VulnerabilitySeverity severity,
CancellationToken cancellationToken = default)
{
var entry = new FingerprintMatch(
Fingerprint: fingerprint,
FunctionName: functionName,
SourcePackage: sourcePackage,
SourceVersion: sourceVersion,
SourceFile: null,
SourceLine: null,
VulnerabilityIds: ImmutableArray.Create(vulnerabilityId),
Similarity: 1.0f,
MatchedAt: DateTimeOffset.UtcNow);
return await _index.AddAsync(entry, cancellationToken);
}
/// <summary>
/// Bulk registers vulnerable functions.
/// </summary>
public async Task<int> RegisterVulnerableBatchAsync(
IEnumerable<(CodeFingerprint Fingerprint, string FunctionName, string Package, string Version, string VulnId)> entries,
CancellationToken cancellationToken = default)
{
var count = 0;
foreach (var (fingerprint, functionName, package, version, vulnId) in entries)
{
cancellationToken.ThrowIfCancellationRequested();
if (await RegisterVulnerableAsync(fingerprint, functionName, package, version, vulnId,
VulnerabilitySeverity.Unknown, cancellationToken))
{
count++;
}
}
return count;
}
}
/// <summary>
/// Options for vulnerable function matching.
/// </summary>
/// <param name="MinMatchConfidence">Minimum fingerprint match confidence.</param>
/// <param name="MinSeverity">Minimum severity to report.</param>
/// <param name="IncludeUnknownSeverity">Whether to include unknown severity matches.</param>
public sealed record VulnerableMatcherOptions(
float MinMatchConfidence = 0.85f,
VulnerabilitySeverity MinSeverity = VulnerabilitySeverity.Low,
bool IncludeUnknownSeverity = true)
{
/// <summary>
/// Default options.
/// </summary>
public static VulnerableMatcherOptions Default => new();
/// <summary>
/// High-confidence only options.
/// </summary>
public static VulnerableMatcherOptions HighConfidence => new(
MinMatchConfidence: 0.95f,
MinSeverity: VulnerabilitySeverity.Medium);
/// <summary>
/// Critical-only options.
/// </summary>
public static VulnerableMatcherOptions CriticalOnly => new(
MinMatchConfidence: 0.90f,
MinSeverity: VulnerabilitySeverity.Critical,
IncludeUnknownSeverity: false);
}
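// --- Illustrative usage sketch (not part of this commit) ---
// Registers one vulnerable fingerprint, then scans functions whose fingerprints
// were pre-computed. Package, version range, and CVE id are example values.
internal static class VulnerableMatcherExample
{
    public static async Task<ImmutableArray<VulnerableFunctionMatch>> ScanAsync(
        IReadOnlyList<FunctionSignature> functions,
        IDictionary<long, CodeFingerprint> fingerprints,
        CodeFingerprint knownBad)
    {
        var matcher = new VulnerableFunctionMatcher(
            new InMemoryFingerprintIndex(),
            VulnerableMatcherOptions.HighConfidence);
        await matcher.RegisterVulnerableAsync(
            knownBad, "inflate", "pkg:generic/zlib", "<1.2.12",
            "CVE-2018-25032", VulnerabilitySeverity.High);
        return await matcher.MatchAsync(functions, fingerprints);
    }
}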

View File

@@ -0,0 +1,430 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
namespace StellaOps.Scanner.EntryTrace.Risk;
/// <summary>
/// Composite risk scorer that combines multiple contributors.
/// </summary>
public sealed class CompositeRiskScorer : IRiskScorer
{
private readonly ImmutableArray<IRiskContributor> _contributors;
private readonly CompositeRiskScorerOptions _options;
/// <summary>
/// Creates a composite scorer with default contributors.
/// </summary>
public CompositeRiskScorer(CompositeRiskScorerOptions? options = null)
: this(GetDefaultContributors(), options)
{
}
/// <summary>
/// Creates a composite scorer with custom contributors.
/// </summary>
public CompositeRiskScorer(
IEnumerable<IRiskContributor> contributors,
CompositeRiskScorerOptions? options = null)
{
_contributors = contributors.ToImmutableArray();
_options = options ?? CompositeRiskScorerOptions.Default;
}
/// <inheritdoc/>
public ImmutableArray<string> ContributedFactors => _contributors
.Select(c => c.Name)
.ToImmutableArray();
/// <inheritdoc/>
public async Task<RiskAssessment> AssessAsync(
RiskContext context,
BusinessContext? businessContext = null,
CancellationToken cancellationToken = default)
{
var allFactors = new List<RiskFactor>();
// Collect factors from all contributors
foreach (var contributor in _contributors)
{
cancellationToken.ThrowIfCancellationRequested();
var factors = await contributor.ComputeFactorsAsync(context, cancellationToken);
allFactors.AddRange(factors);
}
// Compute overall score
var overallScore = ComputeOverallScore(allFactors, businessContext);
// Generate recommendations
var recommendations = GenerateRecommendations(allFactors, overallScore);
return new RiskAssessment(
SubjectId: context.SubjectId,
SubjectType: context.SubjectType,
OverallScore: overallScore,
Factors: allFactors.ToImmutableArray(),
BusinessContext: businessContext,
Recommendations: recommendations,
AssessedAt: DateTimeOffset.UtcNow);
}
private RiskScore ComputeOverallScore(
IReadOnlyList<RiskFactor> factors,
BusinessContext? businessContext)
{
if (factors.Count == 0)
{
return RiskScore.Zero;
}
// Weighted average of factor contributions
var totalWeight = factors.Sum(f => f.Weight);
var weightedSum = factors.Sum(f => f.Contribution);
var baseScore = totalWeight > 0 ? weightedSum / totalWeight : 0;
// Apply business context multiplier
if (businessContext is not null)
{
baseScore *= businessContext.RiskMultiplier;
}
// Clamp to [0, 1]
baseScore = Math.Clamp(baseScore, 0, 1);
// Determine primary category
var primaryCategory = factors
.GroupBy(f => f.Category)
.OrderByDescending(g => g.Sum(f => f.Contribution))
.FirstOrDefault()?.Key ?? RiskCategory.Unknown;
// Compute confidence based on data availability
var confidence = ComputeConfidence(factors);
return new RiskScore(
OverallScore: baseScore,
Category: primaryCategory,
Confidence: confidence,
ComputedAt: DateTimeOffset.UtcNow);
}
private float ComputeConfidence(IReadOnlyList<RiskFactor> factors)
{
if (factors.Count == 0)
{
return 0.1f; // Very low confidence with no data
}
// More factors = more confidence (up to a point)
var factorBonus = Math.Min(factors.Count / 20.0f, 0.3f);
// Multiple categories = more comprehensive view
var categoryCount = factors.Select(f => f.Category).Distinct().Count();
var categoryBonus = Math.Min(categoryCount / 5.0f, 0.2f);
// High-weight factors boost confidence
var highWeightCount = factors.Count(f => f.Weight >= 0.3f);
var weightBonus = Math.Min(highWeightCount / 10.0f, 0.2f);
return Math.Min(0.3f + factorBonus + categoryBonus + weightBonus, 1.0f);
}
private ImmutableArray<string> GenerateRecommendations(
IReadOnlyList<RiskFactor> factors,
RiskScore score)
{
var recommendations = new List<string>();
// Get top contributing factors
var topFactors = factors
.OrderByDescending(f => f.Contribution)
.Take(5)
.ToList();
foreach (var factor in topFactors)
{
var recommendation = factor.Category switch
{
RiskCategory.Exploitability when factor.SourceId?.StartsWith("CVE", StringComparison.Ordinal) == true
=> $"Patch or mitigate {factor.SourceId} - {factor.Evidence}",
RiskCategory.Exposure
=> $"Review network exposure - {factor.Evidence}",
RiskCategory.Privilege
=> $"Review privilege levels - {factor.Evidence}",
RiskCategory.BlastRadius
=> $"Consider service isolation - {factor.Evidence}",
RiskCategory.DriftVelocity
=> $"Investigate recent changes - {factor.Evidence}",
RiskCategory.SupplyChain
=> $"Verify supply chain integrity - {factor.Evidence}",
_ => null
};
if (recommendation is not null && !recommendations.Contains(recommendation))
{
recommendations.Add(recommendation);
}
}
// Add general recommendations based on score level
if (score.Level >= RiskLevel.Critical)
{
recommendations.Insert(0, "CRITICAL: Immediate action required - consider taking service offline");
}
else if (score.Level >= RiskLevel.High)
{
recommendations.Insert(0, "HIGH PRIORITY: Schedule remediation within 24-48 hours");
}
return recommendations.Take(_options.MaxRecommendations).ToImmutableArray();
}
private static ImmutableArray<IRiskContributor> GetDefaultContributors()
{
return ImmutableArray.Create<IRiskContributor>(
new VulnerabilityRiskContributor(),
new BinaryRiskContributor(),
new MeshRiskContributor(),
new SemanticRiskContributor(),
new TemporalRiskContributor());
}
}
/// <summary>
/// Options for composite risk scoring.
/// </summary>
/// <param name="MaxRecommendations">Maximum recommendations to generate.</param>
/// <param name="MinFactorContribution">Minimum contribution to include a factor.</param>
public sealed record CompositeRiskScorerOptions(
int MaxRecommendations = 10,
float MinFactorContribution = 0.01f)
{
/// <summary>
/// Default options.
/// </summary>
public static CompositeRiskScorerOptions Default => new();
}
/// <summary>
/// Generates human-readable risk explanations.
/// </summary>
public sealed class RiskExplainer
{
/// <summary>
/// Generates a summary explanation for a risk assessment.
/// </summary>
public string ExplainSummary(RiskAssessment assessment)
{
var level = assessment.OverallScore.Level;
var category = assessment.OverallScore.Category;
var confidence = assessment.OverallScore.Confidence;
var subject = assessment.SubjectType.ToString().ToLowerInvariant();
var summary = level switch
{
RiskLevel.Critical => $"CRITICAL RISK: This {subject} requires immediate attention.",
RiskLevel.High => $"HIGH RISK: This {subject} should be prioritized for remediation.",
RiskLevel.Medium => $"MEDIUM RISK: This {subject} has elevated risk that should be addressed.",
RiskLevel.Low => $"LOW RISK: This {subject} has minimal risk but should be monitored.",
_ => $"NEGLIGIBLE RISK: This {subject} appears safe."
};
summary += $" Primary concern: {CategoryToString(category)}.";
if (confidence < 0.5f)
{
summary += " Note: Assessment confidence is low due to limited data.";
}
return summary;
}
/// <summary>
/// Generates detailed factor explanations.
/// </summary>
public ImmutableArray<string> ExplainFactors(RiskAssessment assessment)
{
return assessment.TopFactors
.Select(f => $"[{f.Category}] {f.Evidence} (contribution: {f.Contribution:P0})")
.ToImmutableArray();
}
/// <summary>
/// Generates a structured report.
/// </summary>
public RiskReport GenerateReport(RiskAssessment assessment)
{
return new RiskReport(
SubjectId: assessment.SubjectId,
Summary: ExplainSummary(assessment),
Level: assessment.OverallScore.Level,
Score: assessment.OverallScore.OverallScore,
Confidence: assessment.OverallScore.Confidence,
TopFactors: ExplainFactors(assessment),
Recommendations: assessment.Recommendations,
GeneratedAt: DateTimeOffset.UtcNow);
}
private static string CategoryToString(RiskCategory category) => category switch
{
RiskCategory.Exploitability => "known vulnerability exploitation",
RiskCategory.Exposure => "network exposure",
RiskCategory.Privilege => "elevated privileges",
RiskCategory.DataSensitivity => "data sensitivity",
RiskCategory.BlastRadius => "potential blast radius",
RiskCategory.DriftVelocity => "rapid configuration changes",
RiskCategory.Misconfiguration => "misconfiguration",
RiskCategory.SupplyChain => "supply chain concerns",
RiskCategory.CryptoWeakness => "cryptographic weakness",
RiskCategory.AuthWeakness => "authentication weakness",
_ => "unknown factors"
};
}
/// <summary>
/// Human-readable risk report.
/// </summary>
/// <param name="SubjectId">Subject identifier.</param>
/// <param name="Summary">Executive summary.</param>
/// <param name="Level">Risk level.</param>
/// <param name="Score">Numeric score.</param>
/// <param name="Confidence">Confidence level.</param>
/// <param name="TopFactors">Key contributing factors.</param>
/// <param name="Recommendations">Actionable recommendations.</param>
/// <param name="GeneratedAt">Report generation time.</param>
public sealed record RiskReport(
string SubjectId,
string Summary,
RiskLevel Level,
float Score,
float Confidence,
ImmutableArray<string> TopFactors,
ImmutableArray<string> Recommendations,
DateTimeOffset GeneratedAt);
/// <summary>
/// Aggregates risk across multiple subjects for fleet-level views.
/// </summary>
public sealed class RiskAggregator
{
/// <summary>
/// Aggregates assessments for a fleet-level view.
/// </summary>
public FleetRiskSummary Aggregate(IEnumerable<RiskAssessment> assessments)
{
var assessmentList = assessments.ToList();
if (assessmentList.Count == 0)
{
return FleetRiskSummary.Empty;
}
var distribution = assessmentList
.GroupBy(a => a.OverallScore.Level)
.ToDictionary(g => g.Key, g => g.Count());
var categoryBreakdown = assessmentList
.GroupBy(a => a.OverallScore.Category)
.ToDictionary(g => g.Key, g => g.Count());
var topRisks = assessmentList
.OrderByDescending(a => a.OverallScore.OverallScore)
.Take(10)
.Select(a => new RiskSummaryItem(a.SubjectId, a.OverallScore.OverallScore, a.OverallScore.Level))
.ToImmutableArray();
var avgScore = assessmentList.Average(a => a.OverallScore.OverallScore);
var avgConfidence = assessmentList.Average(a => a.OverallScore.Confidence);
return new FleetRiskSummary(
TotalSubjects: assessmentList.Count,
AverageScore: avgScore,
AverageConfidence: avgConfidence,
Distribution: distribution.ToImmutableDictionary(),
CategoryBreakdown: categoryBreakdown.ToImmutableDictionary(),
TopRisks: topRisks,
AggregatedAt: DateTimeOffset.UtcNow);
}
}
/// <summary>
/// Fleet-level risk summary.
/// </summary>
/// <param name="TotalSubjects">Total subjects assessed.</param>
/// <param name="AverageScore">Average risk score.</param>
/// <param name="AverageConfidence">Average confidence.</param>
/// <param name="Distribution">Distribution by risk level.</param>
/// <param name="CategoryBreakdown">Breakdown by category.</param>
/// <param name="TopRisks">Highest risk subjects.</param>
/// <param name="AggregatedAt">Aggregation time.</param>
public sealed record FleetRiskSummary(
int TotalSubjects,
float AverageScore,
float AverageConfidence,
ImmutableDictionary<RiskLevel, int> Distribution,
ImmutableDictionary<RiskCategory, int> CategoryBreakdown,
ImmutableArray<RiskSummaryItem> TopRisks,
DateTimeOffset AggregatedAt)
{
/// <summary>
/// Empty summary.
/// </summary>
public static FleetRiskSummary Empty => new(
TotalSubjects: 0,
AverageScore: 0,
AverageConfidence: 0,
Distribution: ImmutableDictionary<RiskLevel, int>.Empty,
CategoryBreakdown: ImmutableDictionary<RiskCategory, int>.Empty,
TopRisks: ImmutableArray<RiskSummaryItem>.Empty,
AggregatedAt: DateTimeOffset.UtcNow);
/// <summary>
/// Count of critical/high risk subjects.
/// </summary>
public int CriticalAndHighCount =>
Distribution.GetValueOrDefault(RiskLevel.Critical) +
Distribution.GetValueOrDefault(RiskLevel.High);
/// <summary>
/// Percentage of subjects at elevated risk.
/// </summary>
public float ElevatedRiskPercentage =>
TotalSubjects > 0 ? CriticalAndHighCount / (float)TotalSubjects : 0;
}
/// <summary>
/// Summary item for a single subject.
/// </summary>
/// <param name="SubjectId">Subject identifier.</param>
/// <param name="Score">Risk score.</param>
/// <param name="Level">Risk level.</param>
public sealed record RiskSummaryItem(string SubjectId, float Score, RiskLevel Level);
/// <summary>
/// Complete entrypoint risk report combining all intelligence.
/// </summary>
/// <param name="Assessment">Full risk assessment.</param>
/// <param name="Report">Human-readable report.</param>
/// <param name="Trend">Historical trend if available.</param>
/// <param name="ComparableSubjects">Similar subjects for context.</param>
public sealed record EntrypointRiskReport(
RiskAssessment Assessment,
RiskReport Report,
RiskTrend? Trend,
ImmutableArray<RiskSummaryItem> ComparableSubjects)
{
/// <summary>
/// Creates a basic report without trend or comparables.
/// </summary>
public static EntrypointRiskReport Basic(RiskAssessment assessment, RiskExplainer explainer) => new(
Assessment: assessment,
Report: explainer.GenerateReport(assessment),
Trend: null,
ComparableSubjects: ImmutableArray<RiskSummaryItem>.Empty);
}
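// --- Illustrative usage sketch (not part of this commit) ---
// Scores a subject with the default contributors and renders a basic report.
// SubjectType.ContainerImage is an assumed enum value; substitute as appropriate.
internal static class RiskScoringExample
{
    public static async Task<EntrypointRiskReport> AssessAsync(string subjectId)
    {
        var scorer = new CompositeRiskScorer();
        var context = RiskContext.Empty(subjectId, SubjectType.ContainerImage);
        RiskAssessment assessment = await scorer.AssessAsync(context);
        return EntrypointRiskReport.Basic(assessment, new RiskExplainer());
    }
}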


@@ -0,0 +1,484 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
using StellaOps.Scanner.EntryTrace.Binary;
using StellaOps.Scanner.EntryTrace.Mesh;
using StellaOps.Scanner.EntryTrace.Semantic;
using StellaOps.Scanner.EntryTrace.Temporal;
namespace StellaOps.Scanner.EntryTrace.Risk;
/// <summary>
/// Interface for computing risk scores.
/// </summary>
public interface IRiskScorer
{
/// <summary>
/// Computes a risk assessment for the given subject.
/// </summary>
/// <param name="context">Risk context with all available intelligence.</param>
/// <param name="businessContext">Optional business context for weighting.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Complete risk assessment.</returns>
Task<RiskAssessment> AssessAsync(
RiskContext context,
BusinessContext? businessContext = null,
CancellationToken cancellationToken = default);
/// <summary>
/// Gets the factors this scorer contributes.
/// </summary>
ImmutableArray<string> ContributedFactors { get; }
}
/// <summary>
/// Interface for a risk contributor that provides specific factors.
/// </summary>
public interface IRiskContributor
{
/// <summary>
/// Computes risk factors from the context.
/// </summary>
/// <param name="context">Risk context.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Contributing factors.</returns>
Task<ImmutableArray<RiskFactor>> ComputeFactorsAsync(
RiskContext context,
CancellationToken cancellationToken = default);
/// <summary>
/// Name of this contributor.
/// </summary>
string Name { get; }
/// <summary>
/// Default weight for factors from this contributor.
/// </summary>
float DefaultWeight { get; }
}
/// <summary>
/// Context for risk assessment containing all available intelligence.
/// </summary>
/// <param name="SubjectId">Subject identifier.</param>
/// <param name="SubjectType">Type of subject.</param>
/// <param name="SemanticEntrypoints">Semantic entrypoint data.</param>
/// <param name="TemporalGraph">Temporal drift data.</param>
/// <param name="MeshGraph">Service mesh data.</param>
/// <param name="BinaryAnalysis">Binary intelligence data.</param>
/// <param name="KnownVulnerabilities">Known CVEs affecting the subject.</param>
public sealed record RiskContext(
string SubjectId,
SubjectType SubjectType,
ImmutableArray<SemanticEntrypoint> SemanticEntrypoints,
TemporalEntrypointGraph? TemporalGraph,
MeshEntrypointGraph? MeshGraph,
BinaryAnalysisResult? BinaryAnalysis,
ImmutableArray<VulnerabilityReference> KnownVulnerabilities)
{
/// <summary>
/// Creates an empty context.
/// </summary>
public static RiskContext Empty(string subjectId, SubjectType subjectType) => new(
SubjectId: subjectId,
SubjectType: subjectType,
SemanticEntrypoints: ImmutableArray<SemanticEntrypoint>.Empty,
TemporalGraph: null,
MeshGraph: null,
BinaryAnalysis: null,
KnownVulnerabilities: ImmutableArray<VulnerabilityReference>.Empty);
/// <summary>
/// Whether semantic data is available.
/// </summary>
public bool HasSemanticData => !SemanticEntrypoints.IsEmpty;
/// <summary>
/// Whether temporal data is available.
/// </summary>
public bool HasTemporalData => TemporalGraph is not null;
/// <summary>
/// Whether mesh data is available.
/// </summary>
public bool HasMeshData => MeshGraph is not null;
/// <summary>
/// Whether binary data is available.
/// </summary>
public bool HasBinaryData => BinaryAnalysis is not null;
/// <summary>
/// Whether vulnerability data is available.
/// </summary>
public bool HasVulnerabilityData => !KnownVulnerabilities.IsEmpty;
}
/// <summary>
/// Reference to a known vulnerability.
/// </summary>
/// <param name="VulnerabilityId">CVE or advisory ID.</param>
/// <param name="Severity">CVSS-based severity.</param>
/// <param name="CvssScore">CVSS score if known.</param>
/// <param name="ExploitAvailable">Whether an exploit is publicly available.</param>
/// <param name="AffectedPackage">PURL of affected package.</param>
/// <param name="FixedVersion">Version where fix is available.</param>
public sealed record VulnerabilityReference(
string VulnerabilityId,
VulnerabilitySeverity Severity,
float? CvssScore,
bool ExploitAvailable,
string AffectedPackage,
string? FixedVersion)
{
/// <summary>
/// Whether a fix is available.
/// </summary>
public bool HasFix => FixedVersion is not null;
/// <summary>
/// Whether this is a critical vulnerability.
/// </summary>
public bool IsCritical => Severity == VulnerabilitySeverity.Critical;
/// <summary>
/// Whether this is actively exploitable.
/// </summary>
public bool IsActivelyExploitable => ExploitAvailable && Severity >= VulnerabilitySeverity.High;
}
/// <summary>
/// Semantic risk contributor based on entrypoint intent and capabilities.
/// </summary>
public sealed class SemanticRiskContributor : IRiskContributor
{
/// <inheritdoc/>
public string Name => "Semantic";
/// <inheritdoc/>
public float DefaultWeight => 0.2f;
/// <inheritdoc/>
public Task<ImmutableArray<RiskFactor>> ComputeFactorsAsync(
RiskContext context,
CancellationToken cancellationToken = default)
{
if (!context.HasSemanticData)
{
return Task.FromResult(ImmutableArray<RiskFactor>.Empty);
}
var factors = new List<RiskFactor>();
foreach (var entrypoint in context.SemanticEntrypoints)
{
var entrypointPath = entrypoint.Specification.Entrypoint.FirstOrDefault() ?? entrypoint.Id;
// Network exposure
if (entrypoint.Capabilities.HasFlag(CapabilityClass.NetworkListen))
{
factors.Add(new RiskFactor(
Name: "NetworkListen",
Category: RiskCategory.Exposure,
Score: 0.6f,
Weight: DefaultWeight,
Evidence: $"Entrypoint {entrypointPath} listens on network",
SourceId: entrypointPath));
}
// Privilege concerns
if (entrypoint.Capabilities.HasFlag(CapabilityClass.ProcessSpawn) &&
entrypoint.Capabilities.HasFlag(CapabilityClass.FileWrite))
{
factors.Add(new RiskFactor(
Name: "ProcessSpawnWithFileWrite",
Category: RiskCategory.Privilege,
Score: 0.7f,
Weight: DefaultWeight,
Evidence: $"Entrypoint {entrypointPath} can spawn processes and write files",
SourceId: entrypointPath));
}
// Threat vectors
foreach (var threat in entrypoint.AttackSurface)
{
var score = threat.Type switch
{
ThreatVectorType.CommandInjection => 0.9f,
ThreatVectorType.Rce => 0.85f,
ThreatVectorType.PathTraversal => 0.7f,
ThreatVectorType.Ssrf => 0.6f,
ThreatVectorType.InformationDisclosure => 0.5f,
_ => 0.5f
};
factors.Add(new RiskFactor(
Name: $"ThreatVector_{threat.Type}",
Category: RiskCategory.Exploitability,
Score: score * (float)threat.Confidence,
Weight: DefaultWeight,
Evidence: $"Threat vector {threat.Type} identified in {entrypointPath}",
SourceId: entrypointPath));
}
}
return Task.FromResult(factors.ToImmutableArray());
}
}
/// <summary>
/// Temporal risk contributor based on drift patterns.
/// </summary>
public sealed class TemporalRiskContributor : IRiskContributor
{
/// <inheritdoc/>
public string Name => "Temporal";
/// <inheritdoc/>
public float DefaultWeight => 0.15f;
/// <inheritdoc/>
public Task<ImmutableArray<RiskFactor>> ComputeFactorsAsync(
RiskContext context,
CancellationToken cancellationToken = default)
{
if (!context.HasTemporalData)
{
return Task.FromResult(ImmutableArray<RiskFactor>.Empty);
}
var graph = context.TemporalGraph!;
var factors = new List<RiskFactor>();
// Check current delta for concerning drift
var delta = graph.Delta;
if (delta is not null)
{
foreach (var drift in delta.DriftCategories)
{
if (drift.HasFlag(EntrypointDrift.AttackSurfaceGrew))
{
factors.Add(new RiskFactor(
Name: "AttackSurfaceGrowth",
Category: RiskCategory.DriftVelocity,
Score: 0.7f,
Weight: DefaultWeight,
Evidence: $"Attack surface grew between versions {graph.PreviousVersion} and {graph.CurrentVersion}",
SourceId: graph.CurrentVersion));
}
if (drift.HasFlag(EntrypointDrift.PrivilegeEscalation))
{
factors.Add(new RiskFactor(
Name: "PrivilegeEscalation",
Category: RiskCategory.Privilege,
Score: 0.85f,
Weight: DefaultWeight,
Evidence: $"Privilege escalation detected between versions {graph.PreviousVersion} and {graph.CurrentVersion}",
SourceId: graph.CurrentVersion));
}
if (drift.HasFlag(EntrypointDrift.CapabilitiesExpanded))
{
factors.Add(new RiskFactor(
Name: "CapabilitiesExpanded",
Category: RiskCategory.DriftVelocity,
Score: 0.5f,
Weight: DefaultWeight,
Evidence: $"Capabilities expanded between versions {graph.PreviousVersion} and {graph.CurrentVersion}",
SourceId: graph.CurrentVersion));
}
}
}
return Task.FromResult(factors.ToImmutableArray());
}
}
/// <summary>
/// Mesh risk contributor based on service exposure and blast radius.
/// </summary>
public sealed class MeshRiskContributor : IRiskContributor
{
/// <inheritdoc/>
public string Name => "Mesh";
/// <inheritdoc/>
public float DefaultWeight => 0.25f;
/// <inheritdoc/>
public Task<ImmutableArray<RiskFactor>> ComputeFactorsAsync(
RiskContext context,
CancellationToken cancellationToken = default)
{
if (!context.HasMeshData)
{
return Task.FromResult(ImmutableArray<RiskFactor>.Empty);
}
var graph = context.MeshGraph!;
var factors = new List<RiskFactor>();
// Internet exposure via ingress
if (!graph.IngressPaths.IsEmpty)
{
factors.Add(new RiskFactor(
Name: "InternetExposure",
Category: RiskCategory.Exposure,
Score: Math.Min(0.5f + (graph.IngressPaths.Length * 0.1f), 0.95f),
Weight: DefaultWeight,
Evidence: $"{graph.IngressPaths.Length} ingress paths expose services to internet",
SourceId: null));
}
// Blast radius analysis: total outbound dependency edges across all services
var blastRadius = graph.Services
.Sum(s => graph.Edges.Count(e => e.FromServiceId == s.ServiceId));
if (blastRadius > 5)
{
factors.Add(new RiskFactor(
Name: "HighBlastRadius",
Category: RiskCategory.BlastRadius,
Score: Math.Min(0.4f + (blastRadius * 0.05f), 0.9f),
Weight: DefaultWeight,
Evidence: $"Service has {blastRadius} downstream dependencies",
SourceId: null));
}
// Services with vulnerable components
var vulnServices = graph.Services.Count(s => !s.VulnerableComponents.IsEmpty);
if (vulnServices > 0)
{
var maxVulns = graph.Services.Max(s => s.VulnerableComponents.Length);
factors.Add(new RiskFactor(
Name: "VulnerableServices",
Category: RiskCategory.Exploitability,
Score: Math.Min(0.5f + (maxVulns * 0.1f), 0.95f),
Weight: DefaultWeight,
Evidence: $"{vulnServices} services have vulnerable components (max {maxVulns} per service)",
SourceId: null));
}
return Task.FromResult(factors.ToImmutableArray());
}
}
/// <summary>
/// Binary risk contributor based on vulnerable function matches.
/// </summary>
public sealed class BinaryRiskContributor : IRiskContributor
{
/// <inheritdoc/>
public string Name => "Binary";
/// <inheritdoc/>
public float DefaultWeight => 0.3f;
/// <inheritdoc/>
public Task<ImmutableArray<RiskFactor>> ComputeFactorsAsync(
RiskContext context,
CancellationToken cancellationToken = default)
{
if (!context.HasBinaryData)
{
return Task.FromResult(ImmutableArray<RiskFactor>.Empty);
}
var analysis = context.BinaryAnalysis!;
var factors = new List<RiskFactor>();
// Vulnerable function matches
foreach (var match in analysis.VulnerableMatches)
{
var score = match.Severity switch
{
VulnerabilitySeverity.Critical => 0.95f,
VulnerabilitySeverity.High => 0.8f,
VulnerabilitySeverity.Medium => 0.5f,
VulnerabilitySeverity.Low => 0.3f,
_ => 0.4f
};
factors.Add(new RiskFactor(
Name: $"VulnerableFunction_{match.VulnerabilityId}",
Category: RiskCategory.Exploitability,
Score: score * match.MatchConfidence,
Weight: DefaultWeight,
Evidence: $"Binary contains function {match.VulnerableFunctionName} vulnerable to {match.VulnerabilityId}",
SourceId: match.VulnerabilityId));
}
// High proportion of stripped/unrecovered symbols is suspicious
var strippedRatio = analysis.Functions.Count(f => !f.HasSymbols) / (float)Math.Max(1, analysis.Functions.Length);
if (strippedRatio > 0.8f && analysis.Functions.Length > 20)
{
factors.Add(new RiskFactor(
Name: "HighlyStrippedBinary",
Category: RiskCategory.SupplyChain,
Score: 0.3f,
Weight: DefaultWeight * 0.5f,
Evidence: $"{strippedRatio:P0} of functions are stripped (may indicate tampering or obfuscation)",
SourceId: null));
}
return Task.FromResult(factors.ToImmutableArray());
}
}
/// <summary>
/// Vulnerability-based risk contributor.
/// </summary>
public sealed class VulnerabilityRiskContributor : IRiskContributor
{
/// <inheritdoc/>
public string Name => "Vulnerability";
/// <inheritdoc/>
public float DefaultWeight => 0.4f;
/// <inheritdoc/>
public Task<ImmutableArray<RiskFactor>> ComputeFactorsAsync(
RiskContext context,
CancellationToken cancellationToken = default)
{
if (!context.HasVulnerabilityData)
{
return Task.FromResult(ImmutableArray<RiskFactor>.Empty);
}
var factors = new List<RiskFactor>();
foreach (var vuln in context.KnownVulnerabilities)
{
var score = vuln.CvssScore.HasValue
? vuln.CvssScore.Value / 10.0f
: vuln.Severity switch
{
VulnerabilitySeverity.Critical => 0.95f,
VulnerabilitySeverity.High => 0.75f,
VulnerabilitySeverity.Medium => 0.5f,
VulnerabilitySeverity.Low => 0.25f,
_ => 0.4f
};
// Boost score if exploit is available
if (vuln.ExploitAvailable)
{
score = Math.Min(score * 1.3f, 1.0f);
}
factors.Add(new RiskFactor(
Name: $"CVE_{vuln.VulnerabilityId}",
Category: RiskCategory.Exploitability,
Score: score,
Weight: DefaultWeight,
Evidence: vuln.ExploitAvailable
? $"CVE {vuln.VulnerabilityId} in {vuln.AffectedPackage} with known exploit"
: $"CVE {vuln.VulnerabilityId} in {vuln.AffectedPackage}",
SourceId: vuln.VulnerabilityId));
}
return Task.FromResult(factors.ToImmutableArray());
}
}
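// Illustrative only: a minimal sketch of how one contributor's factors might be
// folded into a weighted score. `context` is assumed to be a populated RiskContext
// from the surrounding engine; the normalization below is an example, not the
// engine's canonical synthesis step.
//
// var contributor = new VulnerabilityRiskContributor();
// var factors = await contributor.ComputeFactorsAsync(context);
// var totalWeight = factors.Sum(f => f.Weight);
// var score = totalWeight > 0 ? factors.Sum(f => f.Contribution) / totalWeight : 0.0f;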

View File

@@ -0,0 +1,448 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
namespace StellaOps.Scanner.EntryTrace.Risk;
/// <summary>
/// Multi-dimensional risk score with category and confidence.
/// </summary>
/// <param name="OverallScore">Normalized risk score (0.0-1.0).</param>
/// <param name="Category">Primary risk category.</param>
/// <param name="Confidence">Confidence in the assessment (0.0-1.0).</param>
/// <param name="ComputedAt">When the score was computed.</param>
public sealed record RiskScore(
float OverallScore,
RiskCategory Category,
float Confidence,
DateTimeOffset ComputedAt)
{
/// <summary>
/// Creates a zero risk score.
/// </summary>
public static RiskScore Zero => new(0.0f, RiskCategory.Unknown, 1.0f, DateTimeOffset.UtcNow);
/// <summary>
/// Creates a critical risk score.
/// </summary>
public static RiskScore Critical(RiskCategory category, float confidence = 0.9f)
=> new(1.0f, category, confidence, DateTimeOffset.UtcNow);
/// <summary>
/// Creates a high risk score.
/// </summary>
public static RiskScore High(RiskCategory category, float confidence = 0.85f)
=> new(0.85f, category, confidence, DateTimeOffset.UtcNow);
/// <summary>
/// Creates a medium risk score.
/// </summary>
public static RiskScore Medium(RiskCategory category, float confidence = 0.8f)
=> new(0.5f, category, confidence, DateTimeOffset.UtcNow);
/// <summary>
/// Creates a low risk score.
/// </summary>
public static RiskScore Low(RiskCategory category, float confidence = 0.75f)
=> new(0.2f, category, confidence, DateTimeOffset.UtcNow);
/// <summary>
/// Descriptive risk level based on score.
/// </summary>
public RiskLevel Level => OverallScore switch
{
>= 0.9f => RiskLevel.Critical,
>= 0.7f => RiskLevel.High,
>= 0.4f => RiskLevel.Medium,
>= 0.1f => RiskLevel.Low,
_ => RiskLevel.Negligible
};
/// <summary>
/// Whether this score represents elevated risk.
/// </summary>
public bool IsElevated => OverallScore >= 0.4f;
/// <summary>
/// Whether the score has high confidence.
/// </summary>
public bool IsHighConfidence => Confidence >= 0.8f;
}
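// Worked example (illustrative): a score of 0.72 classifies as High (>= 0.7),
// counts as elevated (>= 0.4), and with confidence 0.85 is high-confidence.
//
// var score = new RiskScore(0.72f, RiskCategory.Exposure, 0.85f, DateTimeOffset.UtcNow);
// // score.Level == RiskLevel.High; score.IsElevated && score.IsHighConfidence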
/// <summary>
/// Risk categories for classification.
/// </summary>
public enum RiskCategory
{
/// <summary>Insufficient data to categorize.</summary>
Unknown = 0,
/// <summary>Known CVE with exploit available.</summary>
Exploitability = 1,
/// <summary>Internet-facing, publicly reachable.</summary>
Exposure = 2,
/// <summary>Runs with elevated privileges.</summary>
Privilege = 3,
/// <summary>Accesses sensitive data.</summary>
DataSensitivity = 4,
/// <summary>Can affect many other services.</summary>
BlastRadius = 5,
/// <summary>Rapid changes indicate instability.</summary>
DriftVelocity = 6,
/// <summary>Configuration weakness.</summary>
Misconfiguration = 7,
/// <summary>Supply chain risk.</summary>
SupplyChain = 8,
/// <summary>Cryptographic weakness.</summary>
CryptoWeakness = 9,
/// <summary>Authentication/authorization issue.</summary>
AuthWeakness = 10,
}
/// <summary>
/// Human-readable risk level.
/// </summary>
public enum RiskLevel
{
/// <summary>Negligible risk, no action needed.</summary>
Negligible = 0,
/// <summary>Low risk, monitor but no immediate action.</summary>
Low = 1,
/// <summary>Medium risk, should be addressed in normal maintenance.</summary>
Medium = 2,
/// <summary>High risk, prioritize remediation.</summary>
High = 3,
/// <summary>Critical risk, immediate action required.</summary>
Critical = 4,
}
/// <summary>
/// Individual contributing factor to risk.
/// </summary>
/// <param name="Name">Factor identifier.</param>
/// <param name="Category">Risk category.</param>
/// <param name="Score">Factor-specific score (0.0-1.0).</param>
/// <param name="Weight">Weight in overall score (0.0-1.0).</param>
/// <param name="Evidence">Human-readable evidence.</param>
/// <param name="SourceId">Link to source data (CVE, drift, etc.).</param>
public sealed record RiskFactor(
string Name,
RiskCategory Category,
float Score,
float Weight,
string Evidence,
string? SourceId = null)
{
/// <summary>
/// Weighted contribution to overall score.
/// </summary>
public float Contribution => Score * Weight;
/// <summary>
/// Whether this is a significant contributor.
/// </summary>
public bool IsSignificant => Contribution >= 0.1f;
}
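// Worked example (illustrative): a factor scored 0.8 with weight 0.25 contributes
// 0.8 * 0.25 = 0.2, which clears the 0.1 significance threshold.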
/// <summary>
/// Business context for risk weighting.
/// </summary>
/// <param name="Environment">Deployment environment (production, staging, dev).</param>
/// <param name="IsInternetFacing">Whether exposed to the internet.</param>
/// <param name="DataClassification">Data sensitivity level.</param>
/// <param name="CriticalityTier">Criticality tier (1=mission-critical, 3=best-effort).</param>
/// <param name="ComplianceRegimes">Applicable compliance regimes.</param>
/// <param name="TeamOwner">Team responsible for the service.</param>
public sealed record BusinessContext(
string Environment,
bool IsInternetFacing,
DataClassification DataClassification,
int CriticalityTier,
ImmutableArray<string> ComplianceRegimes,
string? TeamOwner = null)
{
/// <summary>
/// Default context for unknown business criticality.
/// </summary>
public static BusinessContext Unknown => new(
Environment: "unknown",
IsInternetFacing: false,
DataClassification: DataClassification.Unknown,
CriticalityTier: 3,
ComplianceRegimes: ImmutableArray<string>.Empty);
/// <summary>
/// Production internet-facing context.
/// </summary>
public static BusinessContext ProductionInternetFacing => new(
Environment: "production",
IsInternetFacing: true,
DataClassification: DataClassification.Internal,
CriticalityTier: 1,
ComplianceRegimes: ImmutableArray<string>.Empty);
/// <summary>
/// Development context with minimal risk weight.
/// </summary>
public static BusinessContext Development => new(
Environment: "development",
IsInternetFacing: false,
DataClassification: DataClassification.Public,
CriticalityTier: 3,
ComplianceRegimes: ImmutableArray<string>.Empty);
/// <summary>
/// Whether this is a production environment.
/// </summary>
public bool IsProduction => Environment.Equals("production", StringComparison.OrdinalIgnoreCase);
/// <summary>
/// Whether this context has compliance requirements.
/// </summary>
public bool HasComplianceRequirements => !ComplianceRegimes.IsEmpty;
/// <summary>
/// Weight multiplier based on business context.
/// </summary>
public float RiskMultiplier
{
get
{
var multiplier = 1.0f;
// Environment weight
multiplier *= Environment.ToLowerInvariant() switch
{
"production" => 1.5f,
"staging" => 1.2f,
"qa" or "test" => 1.0f,
"development" or "dev" => 0.5f,
_ => 1.0f
};
// Internet exposure
if (IsInternetFacing)
{
multiplier *= 1.5f;
}
// Data classification
multiplier *= DataClassification switch
{
DataClassification.Restricted => 2.0f,
DataClassification.Confidential => 1.5f,
DataClassification.Internal => 1.2f,
DataClassification.Public => 1.0f,
_ => 1.0f
};
// Criticality
multiplier *= CriticalityTier switch
{
1 => 1.5f,
2 => 1.2f,
_ => 1.0f
};
// Compliance
if (HasComplianceRequirements)
{
multiplier *= 1.2f;
}
return Math.Min(multiplier, 5.0f); // Cap at 5x
}
}
}
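// Worked example (illustrative): a tier-1, internet-facing production service
// holding confidential data under one compliance regime multiplies out as
// 1.5 (production) * 1.5 (internet) * 1.5 (confidential) * 1.5 (tier 1) * 1.2
// (compliance) = 6.075, which the cap clamps to 5.0.
//
// var ctx = BusinessContext.ProductionInternetFacing with
// {
//     DataClassification = DataClassification.Confidential,
//     ComplianceRegimes = ImmutableArray.Create("PCI-DSS"),
// };
// // ctx.RiskMultiplier == 5.0f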
/// <summary>
/// Data classification levels.
/// </summary>
public enum DataClassification
{
/// <summary>Classification unknown.</summary>
Unknown = 0,
/// <summary>Public data, no sensitivity.</summary>
Public = 1,
/// <summary>Internal use only.</summary>
Internal = 2,
/// <summary>Confidential, limited access.</summary>
Confidential = 3,
/// <summary>Restricted, maximum protection.</summary>
Restricted = 4,
}
/// <summary>
/// Subject type for risk assessment.
/// </summary>
public enum SubjectType
{
/// <summary>Container image.</summary>
Image = 0,
/// <summary>Running container.</summary>
Container = 1,
/// <summary>Service (group of containers).</summary>
Service = 2,
/// <summary>Namespace or deployment.</summary>
Namespace = 3,
/// <summary>Entire cluster.</summary>
Cluster = 4,
}
/// <summary>
/// Complete risk assessment for an image/container.
/// </summary>
/// <param name="SubjectId">Image digest or container ID.</param>
/// <param name="SubjectType">Type of subject.</param>
/// <param name="OverallScore">Synthesized risk score.</param>
/// <param name="Factors">All contributing factors.</param>
/// <param name="BusinessContext">Business context for weighting.</param>
/// <param name="Recommendations">Actionable recommendations.</param>
/// <param name="AssessedAt">When the assessment was performed.</param>
public sealed record RiskAssessment(
string SubjectId,
SubjectType SubjectType,
RiskScore OverallScore,
ImmutableArray<RiskFactor> Factors,
BusinessContext? BusinessContext,
ImmutableArray<string> Recommendations,
DateTimeOffset AssessedAt)
{
/// <summary>
/// Top contributing factors.
/// </summary>
public IEnumerable<RiskFactor> TopFactors => Factors
.OrderByDescending(f => f.Contribution)
.Take(5);
/// <summary>
/// Whether the assessment requires immediate attention.
/// </summary>
public bool RequiresImmediateAction => OverallScore.Level >= RiskLevel.Critical;
/// <summary>
/// Whether the assessment is actionable (has recommendations).
/// </summary>
public bool IsActionable => !Recommendations.IsEmpty;
/// <summary>
/// Creates an empty assessment for a subject with no risk data.
/// </summary>
public static RiskAssessment Empty(string subjectId, SubjectType subjectType) => new(
SubjectId: subjectId,
SubjectType: subjectType,
OverallScore: RiskScore.Zero,
Factors: ImmutableArray<RiskFactor>.Empty,
BusinessContext: null,
Recommendations: ImmutableArray<string>.Empty,
AssessedAt: DateTimeOffset.UtcNow);
}
/// <summary>
/// Risk trend over time.
/// </summary>
/// <param name="SubjectId">Subject being tracked.</param>
/// <param name="Snapshots">Historical score snapshots.</param>
/// <param name="TrendDirection">Overall trend direction.</param>
/// <param name="VelocityPerDay">Rate of change per day.</param>
public sealed record RiskTrend(
string SubjectId,
ImmutableArray<RiskSnapshot> Snapshots,
TrendDirection TrendDirection,
float VelocityPerDay)
{
/// <summary>
/// Whether risk is increasing.
/// </summary>
public bool IsIncreasing => TrendDirection == TrendDirection.Increasing;
/// <summary>
/// Whether risk is decreasing.
/// </summary>
public bool IsDecreasing => TrendDirection == TrendDirection.Decreasing;
/// <summary>
/// Whether risk is accelerating.
/// </summary>
public bool IsAccelerating => Math.Abs(VelocityPerDay) > 0.1f;
/// <summary>
/// Creates a trend from a series of assessments.
/// </summary>
public static RiskTrend FromAssessments(string subjectId, IEnumerable<RiskAssessment> assessments)
{
var snapshots = assessments
.OrderBy(a => a.AssessedAt)
.Select(a => new RiskSnapshot(a.OverallScore.OverallScore, a.AssessedAt))
.ToImmutableArray();
if (snapshots.Length < 2)
{
return new RiskTrend(subjectId, snapshots, TrendDirection.Stable, 0.0f);
}
var first = snapshots[0];
var last = snapshots[^1];
var daysDiff = (float)(last.Timestamp - first.Timestamp).TotalDays;
if (daysDiff < 0.01f)
{
return new RiskTrend(subjectId, snapshots, TrendDirection.Stable, 0.0f);
}
var scoreDiff = last.Score - first.Score;
var velocity = scoreDiff / daysDiff;
var direction = scoreDiff switch
{
> 0.05f => TrendDirection.Increasing,
< -0.05f => TrendDirection.Decreasing,
_ => TrendDirection.Stable
};
return new RiskTrend(subjectId, snapshots, direction, velocity);
}
}
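// Worked example (illustrative): given two assessments ten days apart with
// overall scores 0.20 and 0.50, scoreDiff = 0.30 > 0.05 yields Increasing and
// VelocityPerDay = 0.30 / 10 = 0.03 (below the 0.1 acceleration threshold, so
// IsAccelerating is false). `dayZero` and `dayTen` are assumed RiskAssessment
// instances built elsewhere.
//
// var trend = RiskTrend.FromAssessments("sha256:example", new[] { dayZero, dayTen });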
/// <summary>
/// Point-in-time risk score snapshot.
/// </summary>
/// <param name="Score">Risk score at this time.</param>
/// <param name="Timestamp">When the score was recorded.</param>
public sealed record RiskSnapshot(float Score, DateTimeOffset Timestamp);
/// <summary>
/// Direction of risk trend.
/// </summary>
public enum TrendDirection
{
/// <summary>Risk is stable.</summary>
Stable = 0,
/// <summary>Risk is decreasing.</summary>
Decreasing = 1,
/// <summary>Risk is increasing.</summary>
Increasing = 2,
}

View File

@@ -0,0 +1,393 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
using System.Security.Cryptography;
using System.Text;
using StellaOps.Scanner.EntryTrace.Parsing;
namespace StellaOps.Scanner.EntryTrace.Speculative;
/// <summary>
/// Represents a complete execution path through a shell script.
/// </summary>
/// <param name="PathId">Unique deterministic identifier for this path.</param>
/// <param name="Constraints">All path constraints accumulated along this path.</param>
/// <param name="TerminalCommands">Terminal commands reachable on this path.</param>
/// <param name="BranchHistory">Sequence of branch decisions taken.</param>
/// <param name="IsFeasible">True if the path constraints are satisfiable.</param>
/// <param name="ReachabilityConfidence">Confidence score for this path being reachable (0.0-1.0).</param>
/// <param name="EnvDependencies">Environment variables this path depends on.</param>
public sealed record ExecutionPath(
string PathId,
ImmutableArray<PathConstraint> Constraints,
ImmutableArray<TerminalCommand> TerminalCommands,
ImmutableArray<BranchDecision> BranchHistory,
bool IsFeasible,
float ReachabilityConfidence,
ImmutableHashSet<string> EnvDependencies)
{
/// <summary>
/// Creates an execution path from a symbolic state.
/// </summary>
public static ExecutionPath FromState(SymbolicState state, bool isFeasible, float confidence)
{
var envDeps = new HashSet<string>();
foreach (var c in state.PathConstraints)
{
envDeps.UnionWith(c.DependsOnEnv);
}
return new ExecutionPath(
ComputePathId(state.BranchHistory),
state.PathConstraints,
state.TerminalCommands,
state.BranchHistory,
isFeasible,
confidence,
envDeps.ToImmutableHashSet());
}
/// <summary>
/// Computes a deterministic path ID from the branch history.
/// </summary>
private static string ComputePathId(ImmutableArray<BranchDecision> history)
{
if (history.IsEmpty)
{
return "path-root";
}
var canonical = new StringBuilder();
foreach (var decision in history)
{
canonical.Append($"{decision.BranchKind}:{decision.BranchIndex}/{decision.TotalBranches};");
}
var hashBytes = SHA256.HashData(Encoding.UTF8.GetBytes(canonical.ToString()));
return $"path-{Convert.ToHexString(hashBytes)[..16].ToLowerInvariant()}";
}
/// <summary>
/// Whether this path depends on environment variables.
/// </summary>
public bool IsEnvDependent => !EnvDependencies.IsEmpty;
/// <summary>
/// Number of branches in the path.
/// </summary>
public int BranchCount => BranchHistory.Length;
/// <summary>
/// Gets all concrete terminal commands on this path.
/// </summary>
public IEnumerable<TerminalCommand> GetConcreteCommands()
=> TerminalCommands.Where(c => c.IsConcrete);
/// <summary>
/// Gets a human-readable summary of this path.
/// </summary>
public string GetSummary()
{
var sb = new StringBuilder();
sb.Append($"Path {PathId[..Math.Min(12, PathId.Length)]}");
sb.Append($" ({BranchCount} branches, {TerminalCommands.Length} commands)");
if (!IsFeasible)
{
sb.Append(" [INFEASIBLE]");
}
else if (IsEnvDependent)
{
sb.Append($" [ENV: {string.Join(", ", EnvDependencies)}]");
}
return sb.ToString();
}
}
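// Worked example (illustrative): the path ID depends only on the branch
// decisions. Assuming the enum members render as "If" and "Case", a history of
// (If, branch 0 of 2) then (Case, branch 1 of 3) canonicalizes to
// "If:0/2;Case:1/3;", and the ID is "path-" plus the first 16 lowercase hex
// characters of that string's SHA-256 digest, so repeated runs over the same
// script produce identical IDs.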
/// <summary>
/// Represents a branch point in the execution tree.
/// </summary>
/// <param name="Location">Source location of the branch.</param>
/// <param name="BranchKind">Type of branch construct.</param>
/// <param name="Predicate">The predicate expression (null for case/else).</param>
/// <param name="TotalBranches">Total number of branches at this point.</param>
/// <param name="TakenBranches">Number of branches that lead to feasible paths.</param>
/// <param name="EnvDependentBranches">Number of branches that depend on environment.</param>
/// <param name="InfeasibleBranches">Number of branches proven infeasible.</param>
public sealed record BranchPoint(
ShellSpan Location,
BranchKind BranchKind,
string? Predicate,
int TotalBranches,
int TakenBranches,
int EnvDependentBranches,
int InfeasibleBranches)
{
/// <summary>
/// Coverage ratio for this branch point.
/// </summary>
public float Coverage => TotalBranches > 0
? (float)TakenBranches / TotalBranches
: 1.0f;
/// <summary>
/// Whether all branches at this point were explored.
/// </summary>
public bool IsFullyCovered => TakenBranches == TotalBranches;
/// <summary>
/// Whether any branch depends on environment variables.
/// </summary>
public bool HasEnvDependence => EnvDependentBranches > 0;
}
/// <summary>
/// Represents the complete execution tree from symbolic execution.
/// </summary>
/// <param name="ScriptPath">Path to the analyzed script.</param>
/// <param name="AllPaths">All discovered execution paths.</param>
/// <param name="BranchPoints">All branch points in the script.</param>
/// <param name="Coverage">Branch coverage metrics.</param>
/// <param name="AnalysisDepthLimit">Maximum depth used during analysis.</param>
/// <param name="DepthLimitReached">True if any path hit the depth limit.</param>
public sealed record ExecutionTree(
string ScriptPath,
ImmutableArray<ExecutionPath> AllPaths,
ImmutableArray<BranchPoint> BranchPoints,
BranchCoverage Coverage,
int AnalysisDepthLimit,
bool DepthLimitReached)
{
/// <summary>
/// Creates an empty execution tree.
/// </summary>
public static ExecutionTree Empty(string scriptPath, int depthLimit) => new(
scriptPath,
ImmutableArray<ExecutionPath>.Empty,
ImmutableArray<BranchPoint>.Empty,
BranchCoverage.Empty,
depthLimit,
DepthLimitReached: false);
/// <summary>
/// Gets all feasible paths.
/// </summary>
public IEnumerable<ExecutionPath> FeasiblePaths
=> AllPaths.Where(p => p.IsFeasible);
/// <summary>
/// Gets all environment-dependent paths.
/// </summary>
public IEnumerable<ExecutionPath> EnvDependentPaths
=> AllPaths.Where(p => p.IsEnvDependent);
/// <summary>
/// Gets all unique terminal commands across all feasible paths.
/// </summary>
public ImmutableHashSet<string> GetAllConcreteCommands()
{
var commands = new HashSet<string>();
foreach (var path in FeasiblePaths)
{
foreach (var cmd in path.GetConcreteCommands())
{
if (cmd.GetConcreteCommand() is { } concrete)
{
commands.Add(concrete);
}
}
}
return commands.ToImmutableHashSet();
}
/// <summary>
/// Gets all environment variables that affect execution paths.
/// </summary>
public ImmutableHashSet<string> GetAllEnvDependencies()
{
var deps = new HashSet<string>();
foreach (var path in AllPaths)
{
deps.UnionWith(path.EnvDependencies);
}
return deps.ToImmutableHashSet();
}
}
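// Illustrative only: common queries over a completed tree.
//
// foreach (var path in tree.FeasiblePaths)
//     Console.WriteLine(path.GetSummary());
// var envVars = tree.GetAllEnvDependencies(); // e.g. { "DEBUG", "MODE" }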
/// <summary>
/// Branch coverage metrics for speculative execution.
/// </summary>
/// <param name="TotalBranches">Total number of branches discovered.</param>
/// <param name="CoveredBranches">Branches that lead to feasible paths.</param>
/// <param name="InfeasibleBranches">Branches proven unreachable.</param>
/// <param name="EnvDependentBranches">Branches depending on environment.</param>
/// <param name="DepthLimitedBranches">Branches not fully explored due to depth limit.</param>
public sealed record BranchCoverage(
int TotalBranches,
int CoveredBranches,
int InfeasibleBranches,
int EnvDependentBranches,
int DepthLimitedBranches)
{
/// <summary>
/// Empty coverage metrics.
/// </summary>
public static BranchCoverage Empty => new(0, 0, 0, 0, 0);
/// <summary>
/// Coverage ratio (0.0-1.0).
/// </summary>
public float CoverageRatio => TotalBranches > 0
? (float)CoveredBranches / TotalBranches
: 1.0f;
/// <summary>
/// Percentage of branches that are environment-dependent.
/// </summary>
public float EnvDependentRatio => TotalBranches > 0
? (float)EnvDependentBranches / TotalBranches
: 0.0f;
/// <summary>
/// Creates coverage metrics from a collection of branch points.
/// </summary>
public static BranchCoverage FromBranchPoints(
IEnumerable<BranchPoint> branchPoints,
int depthLimitedCount = 0)
{
var points = branchPoints.ToList();
return new BranchCoverage(
TotalBranches: points.Sum(p => p.TotalBranches),
CoveredBranches: points.Sum(p => p.TakenBranches),
InfeasibleBranches: points.Sum(p => p.InfeasibleBranches),
EnvDependentBranches: points.Sum(p => p.EnvDependentBranches),
DepthLimitedBranches: depthLimitedCount);
}
/// <summary>
/// Gets a human-readable summary.
/// </summary>
public string GetSummary()
=> $"Coverage: {CoverageRatio:P1} ({CoveredBranches}/{TotalBranches} branches), " +
$"Infeasible: {InfeasibleBranches}, Env-dependent: {EnvDependentBranches}";
}
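// Worked example (illustrative): two branch points totalling 5 branches, of
// which 4 were taken, 1 proved infeasible, and 2 depend on the environment:
//
// var coverage = new BranchCoverage(5, 4, 1, 2, 0);
// // coverage.CoverageRatio == 0.8f; coverage.EnvDependentRatio == 0.4f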
/// <summary>
/// Builder for constructing execution trees incrementally.
/// </summary>
public sealed class ExecutionTreeBuilder
{
private readonly string _scriptPath;
private readonly int _depthLimit;
private readonly List<ExecutionPath> _paths = new();
private readonly Dictionary<string, BranchPointBuilder> _branchPoints = new();
private bool _depthLimitReached;
public ExecutionTreeBuilder(string scriptPath, int depthLimit)
{
_scriptPath = scriptPath;
_depthLimit = depthLimit;
}
/// <summary>
/// Adds a completed execution path.
/// </summary>
public void AddPath(ExecutionPath path)
{
_paths.Add(path);
}
/// <summary>
/// Records a branch point visit.
/// </summary>
public void RecordBranchPoint(
ShellSpan location,
BranchKind kind,
string? predicate,
int totalBranches,
int branchIndex,
bool isEnvDependent,
bool isFeasible)
{
var key = $"{location.StartLine}:{location.StartColumn}";
if (!_branchPoints.TryGetValue(key, out var builder))
{
builder = new BranchPointBuilder(location, kind, predicate, totalBranches);
_branchPoints[key] = builder;
}
builder.RecordBranch(branchIndex, isEnvDependent, !isFeasible);
}
/// <summary>
/// Marks that the depth limit was reached.
/// </summary>
public void MarkDepthLimitReached()
{
_depthLimitReached = true;
}
/// <summary>
/// Builds the final execution tree.
/// </summary>
public ExecutionTree Build()
{
var branchPoints = _branchPoints.Values
.Select(b => b.Build())
.OrderBy(bp => bp.Location.StartLine)
.ThenBy(bp => bp.Location.StartColumn)
.ToImmutableArray();
var coverage = BranchCoverage.FromBranchPoints(
branchPoints,
_depthLimitReached ? 1 : 0); // the limit is tracked as a flag, so at most one depth-limited branch is reported
return new ExecutionTree(
_scriptPath,
_paths.OrderBy(p => p.PathId).ToImmutableArray(),
branchPoints,
coverage,
_depthLimit,
_depthLimitReached);
}
private sealed class BranchPointBuilder
{
private readonly ShellSpan _location;
private readonly BranchKind _kind;
private readonly string? _predicate;
private readonly int _totalBranches;
private readonly HashSet<int> _takenBranches = new();
private int _envDependentCount;
private int _infeasibleCount;
public BranchPointBuilder(
ShellSpan location,
BranchKind kind,
string? predicate,
int totalBranches)
{
_location = location;
_kind = kind;
_predicate = predicate;
_totalBranches = totalBranches;
}
public void RecordBranch(int branchIndex, bool isEnvDependent, bool isInfeasible)
{
_takenBranches.Add(branchIndex);
if (isEnvDependent) _envDependentCount++;
if (isInfeasible) _infeasibleCount++;
}
public BranchPoint Build() => new(
_location,
_kind,
_predicate,
_totalBranches,
_takenBranches.Count,
_envDependentCount,
_infeasibleCount);
}
}
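// Illustrative only: typical builder usage inside an executor. The ShellSpan,
// BranchKind member, and path instance are assumed to come from the
// surrounding analysis.
//
// var builder = new ExecutionTreeBuilder("/entrypoint.sh", depthLimit: 100);
// builder.RecordBranchPoint(span, BranchKind.If, "[ -n \"$DEBUG\" ]",
//     totalBranches: 2, branchIndex: 0, isEnvDependent: true, isFeasible: true);
// builder.AddPath(path);
// var tree = builder.Build(); // paths ordered by PathId for determinism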

View File

@@ -0,0 +1,299 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
using StellaOps.Scanner.EntryTrace.Parsing;
namespace StellaOps.Scanner.EntryTrace.Speculative;
/// <summary>
/// Interface for symbolic execution of shell scripts and similar constructs.
/// </summary>
public interface ISymbolicExecutor
{
/// <summary>
/// Executes symbolic analysis on a parsed shell script.
/// </summary>
/// <param name="script">The parsed shell script AST.</param>
/// <param name="options">Execution options.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>The execution tree containing all discovered paths.</returns>
Task<ExecutionTree> ExecuteAsync(
ShellScript script,
SymbolicExecutionOptions options,
CancellationToken cancellationToken = default);
/// <summary>
/// Executes symbolic analysis on shell source code.
/// </summary>
/// <param name="source">The shell script source code.</param>
/// <param name="scriptPath">Path to the script (for reporting).</param>
/// <param name="options">Execution options.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>The execution tree containing all discovered paths.</returns>
Task<ExecutionTree> ExecuteAsync(
string source,
string scriptPath,
SymbolicExecutionOptions? options = null,
CancellationToken cancellationToken = default);
}
/// <summary>
/// Options for symbolic execution.
/// </summary>
/// <param name="MaxDepth">Maximum depth for path exploration.</param>
/// <param name="MaxPaths">Maximum number of paths to explore.</param>
/// <param name="InitialEnvironment">Known environment variables.</param>
/// <param name="ConstraintEvaluator">Evaluator for path feasibility.</param>
/// <param name="TrackAllCommands">Whether to track all commands or just terminal ones.</param>
/// <param name="PruneInfeasiblePaths">Whether to prune paths with unsatisfiable constraints.</param>
public sealed record SymbolicExecutionOptions(
int MaxDepth = 100,
int MaxPaths = 1000,
IReadOnlyDictionary<string, string>? InitialEnvironment = null,
IConstraintEvaluator? ConstraintEvaluator = null,
bool TrackAllCommands = false,
bool PruneInfeasiblePaths = true)
{
/// <summary>
/// Default options with reasonable limits.
/// </summary>
public static SymbolicExecutionOptions Default => new();
}
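// Illustrative only: tightening the defaults for a constrained run; any
// IReadOnlyDictionary satisfies InitialEnvironment.
//
// var opts = SymbolicExecutionOptions.Default with
// {
//     MaxPaths = 256,
//     InitialEnvironment = new Dictionary<string, string> { ["MODE"] = "prod" },
// };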
/// <summary>
/// Interface for evaluating path constraint feasibility.
/// </summary>
public interface IConstraintEvaluator
{
/// <summary>
/// Evaluates whether a set of constraints is satisfiable.
/// </summary>
/// <param name="constraints">The constraints to evaluate.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>The evaluation result.</returns>
Task<ConstraintResult> EvaluateAsync(
ImmutableArray<PathConstraint> constraints,
CancellationToken cancellationToken = default);
/// <summary>
/// Attempts to simplify a set of constraints.
/// </summary>
/// <param name="constraints">The constraints to simplify.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Simplified constraints.</returns>
Task<ImmutableArray<PathConstraint>> SimplifyAsync(
ImmutableArray<PathConstraint> constraints,
CancellationToken cancellationToken = default);
/// <summary>
/// Computes a confidence score for path reachability.
/// </summary>
/// <param name="constraints">The path constraints.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Confidence score between 0.0 and 1.0.</returns>
Task<float> ComputeConfidenceAsync(
ImmutableArray<PathConstraint> constraints,
CancellationToken cancellationToken = default);
}
/// <summary>
/// Result of constraint evaluation.
/// </summary>
public enum ConstraintResult
{
/// <summary>
/// Constraints are satisfiable.
/// </summary>
Satisfiable,
/// <summary>
/// Constraints are provably unsatisfiable.
/// </summary>
Unsatisfiable,
/// <summary>
/// Satisfiability cannot be determined statically.
/// </summary>
Unknown
}
/// <summary>
/// Pattern-based constraint evaluator for common shell conditionals.
/// </summary>
public sealed class PatternConstraintEvaluator : IConstraintEvaluator
{
/// <summary>
/// Singleton instance.
/// </summary>
public static PatternConstraintEvaluator Instance { get; } = new();
/// <inheritdoc/>
public Task<ConstraintResult> EvaluateAsync(
ImmutableArray<PathConstraint> constraints,
CancellationToken cancellationToken = default)
{
if (constraints.IsEmpty)
{
return Task.FromResult(ConstraintResult.Satisfiable);
}
// Check for direct contradictions
var seenConstraints = new Dictionary<string, bool>();
foreach (var constraint in constraints)
{
cancellationToken.ThrowIfCancellationRequested();
// Normalize the constraint expression
var key = constraint.Expression.Trim();
var isPositive = !constraint.IsNegated;
if (seenConstraints.TryGetValue(key, out var existingValue))
{
// If we've seen the same constraint with opposite polarity, it's unsatisfiable
if (existingValue != isPositive)
{
return Task.FromResult(ConstraintResult.Unsatisfiable);
}
}
else
{
seenConstraints[key] = isPositive;
}
}
// Check for string equality contradictions
var equalityConstraints = constraints
.Where(c => c.Kind == ConstraintKind.StringEquality)
.ToList();
foreach (var group in equalityConstraints.GroupBy(c => ExtractVariable(c.Expression)))
{
var values = group.ToList();
if (values.Count > 1)
{
// Multiple equality constraints on same variable
var positiveValues = values
.Where(c => !c.IsNegated)
.Select(c => ExtractValue(c.Expression))
.Distinct()
.ToList();
if (positiveValues.Count > 1)
{
// Variable must equal multiple different values - unsatisfiable
return Task.FromResult(ConstraintResult.Unsatisfiable);
}
}
}
// If we have environment-dependent constraints, we can't fully determine
if (constraints.Any(c => c.IsEnvDependent))
{
return Task.FromResult(ConstraintResult.Unknown);
}
// Default to satisfiable (conservative)
return Task.FromResult(ConstraintResult.Satisfiable);
}
/// <inheritdoc/>
public Task<ImmutableArray<PathConstraint>> SimplifyAsync(
ImmutableArray<PathConstraint> constraints,
CancellationToken cancellationToken = default)
{
if (constraints.Length <= 1)
{
return Task.FromResult(constraints);
}
// Remove duplicate constraints
var seen = new HashSet<string>();
var simplified = new List<PathConstraint>();
foreach (var constraint in constraints)
{
cancellationToken.ThrowIfCancellationRequested();
var canonical = constraint.ToCanonical();
if (seen.Add(canonical))
{
simplified.Add(constraint);
}
}
return Task.FromResult(simplified.ToImmutableArray());
}
/// <inheritdoc/>
public Task<float> ComputeConfidenceAsync(
ImmutableArray<PathConstraint> constraints,
CancellationToken cancellationToken = default)
{
if (constraints.IsEmpty)
{
return Task.FromResult(1.0f);
}
// Base confidence starts at 1.0
var confidence = 1.0f;
foreach (var constraint in constraints)
{
cancellationToken.ThrowIfCancellationRequested();
// Reduce confidence for each constraint
switch (constraint.Kind)
{
case ConstraintKind.Unknown:
// Unknown constraints reduce confidence significantly
confidence *= 0.5f;
break;
case ConstraintKind.FileExists:
case ConstraintKind.DirectoryExists:
case ConstraintKind.IsExecutable:
case ConstraintKind.IsReadable:
case ConstraintKind.IsWritable:
// File system constraints moderately reduce confidence
confidence *= 0.7f;
break;
case ConstraintKind.StringEmpty:
case ConstraintKind.StringEquality:
case ConstraintKind.StringInequality:
case ConstraintKind.NumericComparison:
case ConstraintKind.PatternMatch:
// Value constraints slightly reduce confidence
confidence *= 0.9f;
break;
}
// Environment-dependent constraints reduce confidence
if (constraint.IsEnvDependent)
{
confidence *= 0.8f;
}
}
return Task.FromResult(Math.Max(0.01f, confidence));
}
private static string ExtractVariable(string expression)
{
// Simple extraction of variable name from expressions like "$VAR" = "value"
var match = System.Text.RegularExpressions.Regex.Match(
expression,
@"\$\{?(\w+)\}?");
return match.Success ? match.Groups[1].Value : expression;
}
private static string ExtractValue(string expression)
{
// Simple extraction of value from expressions like "$VAR" = "value"
var match = System.Text.RegularExpressions.Regex.Match(
expression,
@"=\s*""?([^""]+)""?");
return match.Success ? match.Groups[1].Value : expression;
}
}
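// Worked example (illustrative): asserting the same predicate with opposite
// polarity trips the contradiction check, and confidence degrades
// multiplicatively by constraint kind. `c` and `notC` are assumed pre-built
// PathConstraint instances over "[ -f /etc/app.conf ]" with IsNegated false
// and true respectively.
//
// var result = await PatternConstraintEvaluator.Instance
//     .EvaluateAsync(ImmutableArray.Create(c, notC)); // Unsatisfiable
//
// A single FileExists constraint that is also env-dependent scores
// 1.0 * 0.7 (file-system kind) * 0.8 (env-dependent) = 0.56 confidence.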

View File

@@ -0,0 +1,313 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
namespace StellaOps.Scanner.EntryTrace.Speculative;
/// <summary>
/// Computes confidence scores for execution path reachability.
/// </summary>
public sealed class PathConfidenceScorer
{
private readonly IConstraintEvaluator _constraintEvaluator;
/// <summary>
/// Default weights for confidence factors.
/// </summary>
public static PathConfidenceWeights DefaultWeights { get; } = new(
ConstraintComplexityWeight: 0.3f,
EnvDependencyWeight: 0.25f,
BranchDepthWeight: 0.2f,
ConstraintTypeWeight: 0.15f,
FeasibilityWeight: 0.1f);
/// <summary>
/// Creates a new confidence scorer.
/// </summary>
public PathConfidenceScorer(IConstraintEvaluator? constraintEvaluator = null)
{
_constraintEvaluator = constraintEvaluator ?? PatternConstraintEvaluator.Instance;
}
/// <summary>
/// Computes a confidence score for a single execution path.
/// </summary>
/// <param name="path">The execution path to score.</param>
/// <param name="weights">Custom weights (optional).</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Detailed confidence analysis.</returns>
public async Task<PathConfidenceAnalysis> ScorePathAsync(
ExecutionPath path,
PathConfidenceWeights? weights = null,
CancellationToken cancellationToken = default)
{
weights ??= DefaultWeights;
var factors = new List<ConfidenceFactor>();
// Factor 1: Constraint complexity
var complexityScore = ComputeComplexityScore(path.Constraints);
factors.Add(new ConfidenceFactor(
"ConstraintComplexity",
complexityScore,
weights.ConstraintComplexityWeight,
$"{path.Constraints.Length} constraints"));
// Factor 2: Environment dependency
var envScore = ComputeEnvDependencyScore(path);
factors.Add(new ConfidenceFactor(
"EnvironmentDependency",
envScore,
weights.EnvDependencyWeight,
$"{path.EnvDependencies.Count} env vars"));
// Factor 3: Branch depth
var depthScore = ComputeBranchDepthScore(path);
factors.Add(new ConfidenceFactor(
"BranchDepth",
depthScore,
weights.BranchDepthWeight,
$"{path.BranchCount} branches"));
// Factor 4: Constraint type distribution
var typeScore = ComputeConstraintTypeScore(path.Constraints);
factors.Add(new ConfidenceFactor(
"ConstraintType",
typeScore,
weights.ConstraintTypeWeight,
GetConstraintTypeSummary(path.Constraints)));
// Factor 5: Feasibility
var feasibilityScore = path.IsFeasible ? 1.0f : 0.0f;
factors.Add(new ConfidenceFactor(
"Feasibility",
feasibilityScore,
weights.FeasibilityWeight,
path.IsFeasible ? "feasible" : "infeasible"));
// Compute weighted average
var totalWeight = factors.Sum(f => f.Weight);
var weightedSum = factors.Sum(f => f.Score * f.Weight);
var overallConfidence = totalWeight > 0 ? weightedSum / totalWeight : 0.0f;
// Get base confidence from constraint evaluator
var baseConfidence = await _constraintEvaluator.ComputeConfidenceAsync(
path.Constraints, cancellationToken);
// Combine with computed confidence
var finalConfidence = (overallConfidence + baseConfidence) / 2.0f;
return new PathConfidenceAnalysis(
path.PathId,
finalConfidence,
factors.ToImmutableArray(),
ClassifyConfidence(finalConfidence));
}
/// <summary>
/// Computes confidence scores for all paths in an execution tree.
/// </summary>
public async Task<ExecutionTreeConfidenceAnalysis> ScoreTreeAsync(
ExecutionTree tree,
PathConfidenceWeights? weights = null,
CancellationToken cancellationToken = default)
{
var pathAnalyses = new List<PathConfidenceAnalysis>();
foreach (var path in tree.AllPaths)
{
cancellationToken.ThrowIfCancellationRequested();
var analysis = await ScorePathAsync(path, weights, cancellationToken);
pathAnalyses.Add(analysis);
}
var overallConfidence = pathAnalyses.Count > 0
? pathAnalyses.Average(a => a.Confidence)
: 1.0f;
var highConfidencePaths = pathAnalyses.Count(a => a.Level == ConfidenceLevel.High);
var mediumConfidencePaths = pathAnalyses.Count(a => a.Level == ConfidenceLevel.Medium);
var lowConfidencePaths = pathAnalyses.Count(a => a.Level == ConfidenceLevel.Low);
return new ExecutionTreeConfidenceAnalysis(
tree.ScriptPath,
overallConfidence,
pathAnalyses.ToImmutableArray(),
highConfidencePaths,
mediumConfidencePaths,
lowConfidencePaths,
ClassifyConfidence(overallConfidence));
}
private static float ComputeComplexityScore(ImmutableArray<PathConstraint> constraints)
{
if (constraints.IsEmpty)
{
return 1.0f; // No constraints = high confidence
}
// More constraints = lower confidence
// 0 constraints = 1.0, 10+ constraints = ~0.3
return Math.Max(0.3f, 1.0f - (constraints.Length * 0.07f));
}
private static float ComputeEnvDependencyScore(ExecutionPath path)
{
if (path.EnvDependencies.IsEmpty)
{
return 1.0f; // No env dependencies = high confidence
}
// More env dependencies = lower confidence
// 0 deps = 1.0, 5+ deps = ~0.4
return Math.Max(0.4f, 1.0f - (path.EnvDependencies.Count * 0.12f));
}
private static float ComputeBranchDepthScore(ExecutionPath path)
{
if (path.BranchCount == 0)
{
return 1.0f; // Straight-line path = high confidence
}
// More branches = lower confidence
// 0 branches = 1.0, 20+ branches = ~0.4
return Math.Max(0.4f, 1.0f - (path.BranchCount * 0.03f));
}
private static float ComputeConstraintTypeScore(ImmutableArray<PathConstraint> constraints)
{
if (constraints.IsEmpty)
{
return 1.0f;
}
var knownTypeCount = constraints.Count(c => c.Kind != ConstraintKind.Unknown);
var knownRatio = (float)knownTypeCount / constraints.Length;
// Higher ratio of known constraint types = higher confidence
return 0.4f + (knownRatio * 0.6f);
}
private static string GetConstraintTypeSummary(ImmutableArray<PathConstraint> constraints)
{
if (constraints.IsEmpty)
{
return "none";
}
var typeCounts = constraints
.GroupBy(c => c.Kind)
.OrderByDescending(g => g.Count())
.Take(3)
.Select(g => $"{g.Key}:{g.Count()}");
return string.Join(", ", typeCounts);
}
private static ConfidenceLevel ClassifyConfidence(float confidence)
{
return confidence switch
{
>= 0.7f => ConfidenceLevel.High,
>= 0.4f => ConfidenceLevel.Medium,
_ => ConfidenceLevel.Low
};
}
}
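// Illustrative only: scoring one path with the default weights. The path is
// assumed to come from a prior symbolic execution run; the final confidence
// averages the weighted factor blend with the evaluator's own estimate.
//
// var scorer = new PathConfidenceScorer();
// var analysis = await scorer.ScorePathAsync(path);
// Console.WriteLine(analysis.GetSummary()); // e.g. "Path path-1a2b3c4: 72% (High)"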
/// <summary>
/// Weights for confidence scoring factors.
/// </summary>
/// <param name="ConstraintComplexityWeight">Weight for constraint complexity.</param>
/// <param name="EnvDependencyWeight">Weight for environment dependency.</param>
/// <param name="BranchDepthWeight">Weight for branch depth.</param>
/// <param name="ConstraintTypeWeight">Weight for constraint type distribution.</param>
/// <param name="FeasibilityWeight">Weight for feasibility.</param>
public sealed record PathConfidenceWeights(
float ConstraintComplexityWeight,
float EnvDependencyWeight,
float BranchDepthWeight,
float ConstraintTypeWeight,
float FeasibilityWeight);
/// <summary>
/// Confidence analysis for a single execution path.
/// </summary>
/// <param name="PathId">The path identifier.</param>
/// <param name="Confidence">Overall confidence score (0.0-1.0).</param>
/// <param name="Factors">Individual contributing factors.</param>
/// <param name="Level">Classified confidence level.</param>
public sealed record PathConfidenceAnalysis(
string PathId,
float Confidence,
ImmutableArray<ConfidenceFactor> Factors,
ConfidenceLevel Level)
{
/// <summary>
/// Gets a human-readable summary.
/// </summary>
public string GetSummary()
=> $"Path {PathId[..Math.Min(12, PathId.Length)]}: {Confidence:P0} ({Level})";
}
/// <summary>
/// A single factor contributing to confidence score.
/// </summary>
/// <param name="Name">Factor name.</param>
/// <param name="Score">Factor score (0.0-1.0).</param>
/// <param name="Weight">Factor weight.</param>
/// <param name="Description">Human-readable description.</param>
public sealed record ConfidenceFactor(
string Name,
float Score,
float Weight,
string Description);
/// <summary>
/// Confidence level classification.
/// </summary>
public enum ConfidenceLevel
{
/// <summary>
/// High confidence (≥70%).
/// </summary>
High,
/// <summary>
/// Medium confidence (40-70%).
/// </summary>
Medium,
/// <summary>
/// Low confidence (&lt;40%).
/// </summary>
Low
}
/// <summary>
/// Confidence analysis for an entire execution tree.
/// </summary>
/// <param name="ScriptPath">Path to the analyzed script.</param>
/// <param name="OverallConfidence">Average confidence across all paths.</param>
/// <param name="PathAnalyses">Individual path analyses.</param>
/// <param name="HighConfidencePaths">Count of high-confidence paths.</param>
/// <param name="MediumConfidencePaths">Count of medium-confidence paths.</param>
/// <param name="LowConfidencePaths">Count of low-confidence paths.</param>
/// <param name="OverallLevel">Overall confidence level.</param>
public sealed record ExecutionTreeConfidenceAnalysis(
string ScriptPath,
float OverallConfidence,
ImmutableArray<PathConfidenceAnalysis> PathAnalyses,
int HighConfidencePaths,
int MediumConfidencePaths,
int LowConfidencePaths,
ConfidenceLevel OverallLevel)
{
/// <summary>
/// Gets a human-readable summary.
/// </summary>
public string GetSummary()
=> $"Script {ScriptPath}: {OverallConfidence:P0} ({OverallLevel}), " +
$"Paths: {HighConfidencePaths} high, {MediumConfidencePaths} medium, {LowConfidencePaths} low";
}

View File

@@ -0,0 +1,301 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
namespace StellaOps.Scanner.EntryTrace.Speculative;
/// <summary>
/// Enumerates all execution paths in a shell script systematically.
/// </summary>
public sealed class PathEnumerator
{
private readonly ISymbolicExecutor _executor;
private readonly IConstraintEvaluator _constraintEvaluator;
/// <summary>
/// Creates a new path enumerator.
/// </summary>
/// <param name="executor">The symbolic executor to use.</param>
/// <param name="constraintEvaluator">The constraint evaluator for feasibility checking.</param>
public PathEnumerator(
ISymbolicExecutor? executor = null,
IConstraintEvaluator? constraintEvaluator = null)
{
_executor = executor ?? new ShellSymbolicExecutor();
_constraintEvaluator = constraintEvaluator ?? PatternConstraintEvaluator.Instance;
}
/// <summary>
/// Enumerates all paths in a shell script.
/// </summary>
/// <param name="source">Shell script source code.</param>
/// <param name="scriptPath">Path to the script (for reporting).</param>
/// <param name="options">Enumeration options.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Result containing all enumerated paths.</returns>
public async Task<PathEnumerationResult> EnumerateAsync(
string source,
string scriptPath,
PathEnumerationOptions? options = null,
CancellationToken cancellationToken = default)
{
options ??= PathEnumerationOptions.Default;
var execOptions = new SymbolicExecutionOptions(
MaxDepth: options.MaxDepth,
MaxPaths: options.MaxPaths,
InitialEnvironment: options.KnownEnvironment,
ConstraintEvaluator: _constraintEvaluator,
TrackAllCommands: options.TrackAllCommands,
PruneInfeasiblePaths: options.PruneInfeasible);
var tree = await _executor.ExecuteAsync(source, scriptPath, execOptions, cancellationToken);
return new PathEnumerationResult(
tree,
ComputeMetrics(tree, options),
options.GroupByTerminalCommand
? GroupByTerminalCommand(tree)
: ImmutableDictionary<string, ImmutableArray<ExecutionPath>>.Empty);
}
/// <summary>
/// Finds all paths that lead to a specific command.
/// </summary>
/// <param name="source">Shell script source code.</param>
/// <param name="scriptPath">Path to the script.</param>
/// <param name="targetCommand">The command to find paths to.</param>
/// <param name="options">Enumeration options.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Paths that lead to the target command.</returns>
public async Task<ImmutableArray<ExecutionPath>> FindPathsToCommandAsync(
string source,
string scriptPath,
string targetCommand,
PathEnumerationOptions? options = null,
CancellationToken cancellationToken = default)
{
var result = await EnumerateAsync(source, scriptPath, options, cancellationToken);
return result.Tree.AllPaths
.Where(p => p.TerminalCommands.Any(c =>
c.GetConcreteCommand()?.Equals(targetCommand, StringComparison.OrdinalIgnoreCase) == true))
.ToImmutableArray();
}
/// <summary>
/// Finds all paths that are environment-dependent.
/// </summary>
public async Task<ImmutableArray<ExecutionPath>> FindEnvDependentPathsAsync(
string source,
string scriptPath,
PathEnumerationOptions? options = null,
CancellationToken cancellationToken = default)
{
var result = await EnumerateAsync(source, scriptPath, options, cancellationToken);
return result.Tree.AllPaths
.Where(p => p.IsEnvDependent)
.ToImmutableArray();
}
/// <summary>
/// Computes which environment variables affect execution paths.
/// </summary>
public async Task<EnvironmentImpactAnalysis> AnalyzeEnvironmentImpactAsync(
string source,
string scriptPath,
PathEnumerationOptions? options = null,
CancellationToken cancellationToken = default)
{
var result = await EnumerateAsync(source, scriptPath, options, cancellationToken);
var varImpact = new Dictionary<string, EnvironmentVariableImpact>();
foreach (var path in result.Tree.AllPaths)
{
foreach (var envVar in path.EnvDependencies)
{
if (!varImpact.TryGetValue(envVar, out var impact))
{
impact = new EnvironmentVariableImpact(envVar, 0, new List<string>());
varImpact[envVar] = impact;
}
impact.AffectedPaths.Add(path.PathId);
}
}
// Calculate impact scores
var totalPaths = result.Tree.AllPaths.Length;
var impacts = varImpact.Values
.Select(v => v with
{
ImpactScore = totalPaths > 0
? (float)v.AffectedPaths.Count / totalPaths
: 0
})
.OrderByDescending(v => v.ImpactScore)
.ToImmutableArray();
return new EnvironmentImpactAnalysis(
result.Tree.GetAllEnvDependencies(),
impacts,
result.Tree.AllPaths.Count(p => p.IsEnvDependent),
totalPaths);
}
private static PathEnumerationMetrics ComputeMetrics(
ExecutionTree tree,
PathEnumerationOptions options)
{
var feasiblePaths = tree.AllPaths.Count(p => p.IsFeasible);
var infeasiblePaths = tree.AllPaths.Count(p => !p.IsFeasible);
var envDependentPaths = tree.AllPaths.Count(p => p.IsEnvDependent);
var avgConfidence = tree.AllPaths.Length > 0
? tree.AllPaths.Average(p => p.ReachabilityConfidence)
: 1.0f;
var maxBranchDepth = tree.AllPaths.Length > 0
? tree.AllPaths.Max(p => p.BranchCount)
: 0;
var uniqueCommands = tree.GetAllConcreteCommands().Count;
return new PathEnumerationMetrics(
TotalPaths: tree.AllPaths.Length,
FeasiblePaths: feasiblePaths,
InfeasiblePaths: infeasiblePaths,
EnvDependentPaths: envDependentPaths,
AverageConfidence: avgConfidence,
MaxBranchDepth: maxBranchDepth,
UniqueTerminalCommands: uniqueCommands,
BranchCoverage: tree.Coverage,
DepthLimitReached: tree.DepthLimitReached,
PathLimitReached: tree.AllPaths.Length >= options.MaxPaths);
}
private static ImmutableDictionary<string, ImmutableArray<ExecutionPath>> GroupByTerminalCommand(
ExecutionTree tree)
{
var groups = new Dictionary<string, List<ExecutionPath>>();
foreach (var path in tree.FeasiblePaths)
{
foreach (var cmd in path.GetConcreteCommands())
{
var command = cmd.GetConcreteCommand();
if (command is null) continue;
if (!groups.TryGetValue(command, out var list))
{
list = new List<ExecutionPath>();
groups[command] = list;
}
list.Add(path);
}
}
return groups.ToImmutableDictionary(
kv => kv.Key,
kv => kv.Value.ToImmutableArray());
}
}
/// <summary>
/// Options for path enumeration.
/// </summary>
/// <param name="MaxDepth">Maximum depth for path exploration.</param>
/// <param name="MaxPaths">Maximum number of paths to enumerate.</param>
/// <param name="KnownEnvironment">Known environment variable values.</param>
/// <param name="PruneInfeasible">Whether to prune infeasible paths.</param>
/// <param name="TrackAllCommands">Whether to track all commands or just terminal ones.</param>
/// <param name="GroupByTerminalCommand">Whether to group paths by terminal command.</param>
public sealed record PathEnumerationOptions(
int MaxDepth = 100,
int MaxPaths = 1000,
IReadOnlyDictionary<string, string>? KnownEnvironment = null,
bool PruneInfeasible = true,
bool TrackAllCommands = false,
bool GroupByTerminalCommand = true)
{
/// <summary>
/// Default options.
/// </summary>
public static PathEnumerationOptions Default => new();
}
/// <summary>
/// Result of path enumeration.
/// </summary>
/// <param name="Tree">The complete execution tree.</param>
/// <param name="Metrics">Enumeration metrics.</param>
/// <param name="PathsByCommand">Paths grouped by terminal command (if requested).</param>
public sealed record PathEnumerationResult(
ExecutionTree Tree,
PathEnumerationMetrics Metrics,
ImmutableDictionary<string, ImmutableArray<ExecutionPath>> PathsByCommand);
/// <summary>
/// Metrics from path enumeration.
/// </summary>
public sealed record PathEnumerationMetrics(
int TotalPaths,
int FeasiblePaths,
int InfeasiblePaths,
int EnvDependentPaths,
float AverageConfidence,
int MaxBranchDepth,
int UniqueTerminalCommands,
BranchCoverage BranchCoverage,
bool DepthLimitReached,
bool PathLimitReached)
{
/// <summary>
/// Gets a human-readable summary.
/// </summary>
public string GetSummary()
=> $"Paths: {TotalPaths} ({FeasiblePaths} feasible, {InfeasiblePaths} infeasible, " +
$"{EnvDependentPaths} env-dependent), Commands: {UniqueTerminalCommands}, " +
$"Avg confidence: {AverageConfidence:P0}";
}
/// <summary>
/// Analysis of environment variable impact on execution paths.
/// </summary>
/// <param name="AllDependencies">All environment variables that affect paths.</param>
/// <param name="ImpactsByVariable">Impact analysis per variable.</param>
/// <param name="EnvDependentPathCount">Number of paths depending on environment.</param>
/// <param name="TotalPathCount">Total number of paths.</param>
public sealed record EnvironmentImpactAnalysis(
ImmutableHashSet<string> AllDependencies,
ImmutableArray<EnvironmentVariableImpact> ImpactsByVariable,
int EnvDependentPathCount,
int TotalPathCount)
{
/// <summary>
/// Ratio of paths that depend on environment.
/// </summary>
public float EnvDependentRatio => TotalPathCount > 0
? (float)EnvDependentPathCount / TotalPathCount
: 0;
}
/// <summary>
/// Impact of a single environment variable.
/// </summary>
/// <param name="VariableName">The environment variable name.</param>
/// <param name="ImpactScore">Score indicating importance (0.0-1.0).</param>
/// <param name="AffectedPaths">Path IDs affected by this variable.</param>
public sealed record EnvironmentVariableImpact(
string VariableName,
float ImpactScore,
List<string> AffectedPaths)
{
/// <summary>
/// Number of paths affected.
/// </summary>
public int AffectedPathCount => AffectedPaths.Count;
}
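// Illustrative only: a minimal end-to-end sketch over an inline script. The
// exact grouping keys depend on how the parser and executor resolve the
// conditional's concrete commands.
//
// var enumerator = new PathEnumerator();
// var result = await enumerator.EnumerateAsync(
//     "if [ -n \"$DEBUG\" ]; then exec app --verbose; else exec app; fi",
//     "/entrypoint.sh");
// Console.WriteLine(result.Metrics.GetSummary());
// foreach (var (command, paths) in result.PathsByCommand)
//     Console.WriteLine($"{command}: {paths.Length} path(s)");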

View File

@@ -0,0 +1,589 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
using System.Text.RegularExpressions;
using StellaOps.Scanner.EntryTrace.Parsing;
namespace StellaOps.Scanner.EntryTrace.Speculative;
/// <summary>
/// Symbolic executor for shell scripts that explores all execution paths.
/// </summary>
public sealed class ShellSymbolicExecutor : ISymbolicExecutor
{
private static readonly Regex EnvVarPattern = new(
@"\$\{?(\w+)\}?",
RegexOptions.Compiled);
private static readonly Regex TestEmptyPattern = new(
@"\[\s*-z\s+""?\$\{?(\w+)\}?""?\s*\]",
RegexOptions.Compiled);
private static readonly Regex TestNonEmptyPattern = new(
@"\[\s*-n\s+""?\$\{?(\w+)\}?""?\s*\]",
RegexOptions.Compiled);
private static readonly Regex TestEqualityPattern = new(
@"\[\s*""?\$\{?(\w+)\}?""?\s*=\s*""?([^""\]]+)""?\s*\]",
RegexOptions.Compiled);
private static readonly Regex TestFileExistsPattern = new(
@"\[\s*-[fe]\s+""?([^""\]]+)""?\s*\]",
RegexOptions.Compiled);
private static readonly Regex TestDirExistsPattern = new(
@"\[\s*-d\s+""?([^""\]]+)""?\s*\]",
RegexOptions.Compiled);
private static readonly Regex TestExecutablePattern = new(
@"\[\s*-x\s+""?([^""\]]+)""?\s*\]",
RegexOptions.Compiled);
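// Examples of predicates the patterns above recognize (illustrative):
//   [ -z "$VAR" ]          -> TestEmptyPattern (empty check on VAR)
//   [ -n "${MODE}" ]       -> TestNonEmptyPattern
//   [ "$MODE" = "prod" ]   -> TestEqualityPattern (variable MODE, value prod)
//   [ -f /etc/app.conf ]   -> TestFileExistsPattern (also matches -e)
//   [ -d /var/lib/app ]    -> TestDirExistsPattern
//   [ -x /usr/bin/app ]    -> TestExecutablePattern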
/// <inheritdoc/>
public Task<ExecutionTree> ExecuteAsync(
string source,
string scriptPath,
SymbolicExecutionOptions? options = null,
CancellationToken cancellationToken = default)
{
var script = ShellParser.Parse(source);
return ExecuteAsync(script, options ?? SymbolicExecutionOptions.Default, cancellationToken);
}
/// <inheritdoc/>
public async Task<ExecutionTree> ExecuteAsync(
ShellScript script,
SymbolicExecutionOptions options,
CancellationToken cancellationToken = default)
{
var builder = new ExecutionTreeBuilder("script", options.MaxDepth);
var constraintEvaluator = options.ConstraintEvaluator ?? PatternConstraintEvaluator.Instance;
var initialState = options.InitialEnvironment is { } env
? SymbolicState.WithEnvironment(env)
: SymbolicState.Initial();
var pathCount = 0;
var workList = new Stack<(SymbolicState State, int NodeIndex)>();
workList.Push((initialState, 0));
while (workList.Count > 0 && pathCount < options.MaxPaths)
{
cancellationToken.ThrowIfCancellationRequested();
var (state, nodeIndex) = workList.Pop();
// Check depth limit
if (state.Depth > options.MaxDepth)
{
builder.MarkDepthLimitReached();
var path = await CreatePathAsync(state, constraintEvaluator, cancellationToken);
builder.AddPath(path);
pathCount++;
continue;
}
// If we've processed all nodes, this is a complete path
if (nodeIndex >= script.Nodes.Length)
{
var path = await CreatePathAsync(state, constraintEvaluator, cancellationToken);
builder.AddPath(path);
pathCount++;
continue;
}
var node = script.Nodes[nodeIndex];
var nextIndex = nodeIndex + 1;
switch (node)
{
case ShellCommandNode cmd:
var cmdState = ProcessCommand(state, cmd);
workList.Push((cmdState, nextIndex));
break;
case ShellExecNode exec:
var execState = ProcessExec(state, exec);
// exec replaces the shell, so this path terminates
var execPath = await CreatePathAsync(execState, constraintEvaluator, cancellationToken);
builder.AddPath(execPath);
pathCount++;
break;
case ShellIfNode ifNode:
var ifStates = await ProcessIfAsync(
state, ifNode, builder, constraintEvaluator, options, cancellationToken);
foreach (var (branchState, branchNodes) in ifStates)
{
// Process the if body, then continue to next statement
var combinedState = await ProcessNodesAsync(
branchState, branchNodes, constraintEvaluator, options, cancellationToken);
workList.Push((combinedState, nextIndex));
}
break;
case ShellCaseNode caseNode:
var caseStates = await ProcessCaseAsync(
state, caseNode, builder, constraintEvaluator, options, cancellationToken);
foreach (var (branchState, branchNodes) in caseStates)
{
var combinedState = await ProcessNodesAsync(
branchState, branchNodes, constraintEvaluator, options, cancellationToken);
workList.Push((combinedState, nextIndex));
}
break;
case ShellIncludeNode:
case ShellRunPartsNode:
// Source includes and run-parts add unknown commands
var includeState = state.AddTerminalCommand(
new TerminalCommand(
SymbolicValue.Unknown(UnknownValueReason.ExternalInput, "source/run-parts"),
ImmutableArray<SymbolicValue>.Empty,
node.Span,
IsExec: false,
ImmutableDictionary<string, SymbolicValue>.Empty));
workList.Push((includeState, nextIndex));
break;
default:
workList.Push((state.IncrementDepth(), nextIndex));
break;
}
}
return builder.Build();
}
private SymbolicState ProcessCommand(SymbolicState state, ShellCommandNode cmd)
{
// Check for variable assignment (VAR=value); require the name before '='
// to be a valid shell identifier so commands whose first token merely
// contains '=' are not misclassified as assignments.
var eqIndex = cmd.Command.IndexOf('=');
if (eqIndex > 0
&& (char.IsLetter(cmd.Command[0]) || cmd.Command[0] == '_')
&& cmd.Command[..eqIndex].All(c => char.IsLetterOrDigit(c) || c == '_'))
{
var varName = cmd.Command[..eqIndex];
var varValue = cmd.Command[(eqIndex + 1)..];
return state.SetVariable(varName, ParseValue(varValue, state));
}
// Regular command - add as terminal command
var commandValue = ParseValue(cmd.Command, state);
var arguments = cmd.Arguments
.Where(t => t.Kind == ShellTokenKind.Word)
.Select(t => ParseValue(t.Value, state))
.ToImmutableArray();
var terminalCmd = new TerminalCommand(
commandValue,
arguments,
cmd.Span,
IsExec: false,
ImmutableDictionary<string, SymbolicValue>.Empty);
return state.AddTerminalCommand(terminalCmd);
}
private SymbolicState ProcessExec(SymbolicState state, ShellExecNode exec)
{
// Find the actual command: skip 'exec' and any leading flags, but keep
// dash-prefixed tokens after the command, since those are arguments to
// the target (e.g. `exec java -jar app.jar`). Flags that take a value
// (e.g. `exec -a name cmd`) are not modeled.
var words = exec.Arguments
.Where(t => t.Kind == ShellTokenKind.Word && t.Value != "exec")
.ToList();
var commandIndex = words.FindIndex(t => !t.Value.StartsWith('-'));
if (commandIndex < 0)
{
return state;
}
var command = ParseValue(words[commandIndex].Value, state);
var cmdArgs = words.Skip(commandIndex + 1)
.Select(t => ParseValue(t.Value, state))
.ToImmutableArray();
var terminalCmd = new TerminalCommand(
command,
cmdArgs,
exec.Span,
IsExec: true,
ImmutableDictionary<string, SymbolicValue>.Empty);
return state.AddTerminalCommand(terminalCmd);
}
private async Task<List<(SymbolicState State, ImmutableArray<ShellNode> Nodes)>> ProcessIfAsync(
SymbolicState state,
ShellIfNode ifNode,
ExecutionTreeBuilder builder,
IConstraintEvaluator constraintEvaluator,
SymbolicExecutionOptions options,
CancellationToken cancellationToken)
{
var results = new List<(SymbolicState, ImmutableArray<ShellNode>)>();
var hasElse = ifNode.Branches.Any(b => b.Kind == ShellConditionalKind.Else);
var totalBranches = ifNode.Branches.Length + (hasElse ? 0 : 1); // +1 for implicit fall-through if no else
for (var i = 0; i < ifNode.Branches.Length; i++)
{
cancellationToken.ThrowIfCancellationRequested();
var branch = ifNode.Branches[i];
var predicate = branch.PredicateSummary ?? "";
// Create constraint for taking this branch
var constraint = CreateConstraint(predicate, branch.Span, isNegated: false);
// For if/elif, we need to negate all previous predicates
var branchState = state;
for (var j = 0; j < i; j++)
{
var prevBranch = ifNode.Branches[j];
if (prevBranch.Kind != ShellConditionalKind.Else)
{
var negatedConstraint = CreateConstraint(
prevBranch.PredicateSummary ?? "",
prevBranch.Span,
isNegated: true);
branchState = branchState.AddConstraint(negatedConstraint);
}
}
// Add the current branch constraint (positive for if/elif, none for else)
if (branch.Kind != ShellConditionalKind.Else)
{
branchState = branchState.AddConstraint(constraint);
}
// Check feasibility
var feasibility = await constraintEvaluator.EvaluateAsync(
branchState.PathConstraints, cancellationToken);
if (feasibility == ConstraintResult.Unsatisfiable && options.PruneInfeasiblePaths)
{
continue; // Skip this branch
}
// Fork the state for this branch
var decision = new BranchDecision(
branch.Span,
branch.Kind switch
{
ShellConditionalKind.If => BranchKind.If,
ShellConditionalKind.Elif => BranchKind.Elif,
ShellConditionalKind.Else => BranchKind.Else,
_ => BranchKind.If
},
i,
totalBranches,
predicate);
var forkedState = branchState.Fork(decision, $"if-{i}");
// Record branch point for coverage
builder.RecordBranchPoint(
branch.Span,
decision.BranchKind,
predicate,
totalBranches,
i,
constraint.IsEnvDependent,
feasibility != ConstraintResult.Unsatisfiable);
results.Add((forkedState, branch.Body));
}
// If no else branch, add fall-through path
if (!hasElse)
{
var fallThroughState = state;
for (var j = 0; j < ifNode.Branches.Length; j++)
{
var branch = ifNode.Branches[j];
if (branch.Kind != ShellConditionalKind.Else)
{
var negatedConstraint = CreateConstraint(
branch.PredicateSummary ?? "",
branch.Span,
isNegated: true);
fallThroughState = fallThroughState.AddConstraint(negatedConstraint);
}
}
var feasibility = await constraintEvaluator.EvaluateAsync(
fallThroughState.PathConstraints, cancellationToken);
if (feasibility != ConstraintResult.Unsatisfiable || !options.PruneInfeasiblePaths)
{
var decision = new BranchDecision(
ifNode.Span,
BranchKind.FallThrough,
ifNode.Branches.Length,
totalBranches,
null);
var forkedState = fallThroughState.Fork(decision, "if-fallthrough");
results.Add((forkedState, ImmutableArray<ShellNode>.Empty));
}
}
return results;
}
private async Task<List<(SymbolicState State, ImmutableArray<ShellNode> Nodes)>> ProcessCaseAsync(
SymbolicState state,
ShellCaseNode caseNode,
ExecutionTreeBuilder builder,
IConstraintEvaluator constraintEvaluator,
SymbolicExecutionOptions options,
CancellationToken cancellationToken)
{
var results = new List<(SymbolicState, ImmutableArray<ShellNode>)>();
var totalBranches = caseNode.Arms.Length + 1; // +1 for fall-through
for (var i = 0; i < caseNode.Arms.Length; i++)
{
cancellationToken.ThrowIfCancellationRequested();
var arm = caseNode.Arms[i];
var pattern = string.Join("|", arm.Patterns);
var constraint = new PathConstraint(
pattern,
IsNegated: false,
arm.Span,
ConstraintKind.PatternMatch,
ExtractEnvVars(pattern));
var branchState = state.AddConstraint(constraint);
var feasibility = await constraintEvaluator.EvaluateAsync(
branchState.PathConstraints, cancellationToken);
if (feasibility == ConstraintResult.Unsatisfiable && options.PruneInfeasiblePaths)
{
continue;
}
var decision = new BranchDecision(
arm.Span,
BranchKind.Case,
i,
totalBranches,
pattern);
var forkedState = branchState.Fork(decision, $"case-{i}");
builder.RecordBranchPoint(
arm.Span,
BranchKind.Case,
pattern,
totalBranches,
i,
constraint.IsEnvDependent,
feasibility != ConstraintResult.Unsatisfiable);
results.Add((forkedState, arm.Body));
}
// Add fall-through for no match
var fallThroughState = state;
foreach (var arm in caseNode.Arms)
{
var pattern = string.Join("|", arm.Patterns);
var negatedConstraint = new PathConstraint(
pattern,
IsNegated: true,
arm.Span,
ConstraintKind.PatternMatch,
ExtractEnvVars(pattern));
fallThroughState = fallThroughState.AddConstraint(negatedConstraint);
}
var fallThroughFeasibility = await constraintEvaluator.EvaluateAsync(
fallThroughState.PathConstraints, cancellationToken);
if (fallThroughFeasibility != ConstraintResult.Unsatisfiable || !options.PruneInfeasiblePaths)
{
var decision = new BranchDecision(
caseNode.Span,
BranchKind.FallThrough,
caseNode.Arms.Length,
totalBranches,
null);
var forkedState = fallThroughState.Fork(decision, "case-fallthrough");
results.Add((forkedState, ImmutableArray<ShellNode>.Empty));
}
return results;
}
private async Task<SymbolicState> ProcessNodesAsync(
SymbolicState state,
ImmutableArray<ShellNode> nodes,
IConstraintEvaluator constraintEvaluator,
SymbolicExecutionOptions options,
CancellationToken cancellationToken)
{
var currentState = state;
foreach (var node in nodes)
{
cancellationToken.ThrowIfCancellationRequested();
if (currentState.Depth > options.MaxDepth)
{
break;
}
switch (node)
{
case ShellCommandNode cmd:
currentState = ProcessCommand(currentState, cmd);
break;
case ShellExecNode exec:
return ProcessExec(currentState, exec);
case ShellIfNode ifNode:
// Simplification: for nested conditionals only the first feasible branch
// is explored, so nested alternatives are under-approximated.
var ifStates = await ProcessIfAsync(
currentState, ifNode,
new ExecutionTreeBuilder("nested", options.MaxDepth),
constraintEvaluator, options, cancellationToken);
if (ifStates.Count > 0)
{
currentState = await ProcessNodesAsync(
ifStates[0].State, ifStates[0].Nodes,
constraintEvaluator, options, cancellationToken);
}
break;
case ShellCaseNode caseNode:
var caseStates = await ProcessCaseAsync(
currentState, caseNode,
new ExecutionTreeBuilder("nested", options.MaxDepth),
constraintEvaluator, options, cancellationToken);
if (caseStates.Count > 0)
{
currentState = await ProcessNodesAsync(
caseStates[0].State, caseStates[0].Nodes,
constraintEvaluator, options, cancellationToken);
}
break;
}
currentState = currentState.IncrementDepth();
}
return currentState;
}
private async Task<ExecutionPath> CreatePathAsync(
SymbolicState state,
IConstraintEvaluator constraintEvaluator,
CancellationToken cancellationToken)
{
var feasibility = await constraintEvaluator.EvaluateAsync(
state.PathConstraints, cancellationToken);
var confidence = await constraintEvaluator.ComputeConfidenceAsync(
state.PathConstraints, cancellationToken);
return ExecutionPath.FromState(
state,
feasibility != ConstraintResult.Unsatisfiable,
confidence);
}
private PathConstraint CreateConstraint(string predicate, ShellSpan span, bool isNegated)
{
var kind = ClassifyPredicate(predicate);
var envVars = ExtractEnvVars(predicate);
return new PathConstraint(predicate, isNegated, span, kind, envVars);
}
private ConstraintKind ClassifyPredicate(string predicate)
{
// Both -z and -n classify as emptiness checks (ConstraintKind.StringEmpty).
if (TestEmptyPattern.IsMatch(predicate) || TestNonEmptyPattern.IsMatch(predicate))
return ConstraintKind.StringEmpty;
if (TestEqualityPattern.IsMatch(predicate))
return ConstraintKind.StringEquality;
if (TestFileExistsPattern.IsMatch(predicate))
return ConstraintKind.FileExists;
if (TestDirExistsPattern.IsMatch(predicate))
return ConstraintKind.DirectoryExists;
if (TestExecutablePattern.IsMatch(predicate))
return ConstraintKind.IsExecutable;
return ConstraintKind.Unknown;
}
private ImmutableArray<string> ExtractEnvVars(string expression)
{
var matches = EnvVarPattern.Matches(expression);
if (matches.Count == 0)
{
return ImmutableArray<string>.Empty;
}
return matches
.Select(m => m.Groups[1].Value)
.Distinct()
.ToImmutableArray();
}
private SymbolicValue ParseValue(string token, SymbolicState state)
{
if (!token.Contains('$'))
{
return SymbolicValue.Concrete(token);
}
// Check for command substitution
if (token.Contains("$(") || token.Contains('`'))
{
return SymbolicValue.Unknown(UnknownValueReason.CommandSubstitution);
}
// Extract variable references
var matches = EnvVarPattern.Matches(token);
if (matches.Count == 0)
{
return SymbolicValue.Concrete(token);
}
if (matches.Count == 1 && matches[0].Value == token)
{
// Entire token is a single variable reference
var varName = matches[0].Groups[1].Value;
return state.GetVariable(varName);
}
// Mixed content - create composite
var parts = new List<SymbolicValue>();
var lastEnd = 0;
foreach (Match match in matches)
{
if (match.Index > lastEnd)
{
parts.Add(SymbolicValue.Concrete(token[lastEnd..match.Index]));
}
var varName = match.Groups[1].Value;
parts.Add(state.GetVariable(varName));
lastEnd = match.Index + match.Length;
}
if (lastEnd < token.Length)
{
parts.Add(SymbolicValue.Concrete(token[lastEnd..]));
}
return SymbolicValue.Composite(parts.ToImmutableArray());
}
}
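// --- Usage sketch (illustrative only; not part of the committed file) ---
// Shows how a caller might drive the executor over an entrypoint script.
// The members read off the result (Paths, IsFeasible, Confidence, PathId)
// are assumptions for illustration; only ExecuteAsync itself appears above.
internal static class ShellSymbolicExecutorUsageSketch
{
internal static async Task PrintPathsAsync(CancellationToken cancellationToken)
{
const string source = @"
if [ -z ""$APP_MODE"" ]; then
exec /usr/bin/app --default
else
exec /usr/bin/app --mode ""$APP_MODE""
fi";
ISymbolicExecutor executor = new ShellSymbolicExecutor();
var tree = await executor.ExecuteAsync(source, "/entrypoint.sh", options: null, cancellationToken);
foreach (var path in tree.Paths)
{
Console.WriteLine($"{path.PathId}: feasible={path.IsFeasible}, confidence={path.Confidence:F2}");
}
}
}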


@@ -0,0 +1,226 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
using StellaOps.Scanner.EntryTrace.Parsing;
namespace StellaOps.Scanner.EntryTrace.Speculative;
/// <summary>
/// Represents the complete state during symbolic execution of a shell script.
/// Immutable to support forking at branch points.
/// </summary>
/// <param name="Variables">Current variable bindings (name → symbolic value).</param>
/// <param name="PathConstraints">Accumulated constraints from branches taken.</param>
/// <param name="TerminalCommands">Terminal commands encountered on this path.</param>
/// <param name="Depth">Current depth in the execution tree.</param>
/// <param name="PathId">Unique identifier for this execution path.</param>
/// <param name="BranchHistory">History of branches taken (for deterministic path IDs).</param>
public sealed record SymbolicState(
ImmutableDictionary<string, SymbolicValue> Variables,
ImmutableArray<PathConstraint> PathConstraints,
ImmutableArray<TerminalCommand> TerminalCommands,
int Depth,
string PathId,
ImmutableArray<BranchDecision> BranchHistory)
{
/// <summary>
/// Creates an initial empty state.
/// </summary>
public static SymbolicState Initial() => new(
ImmutableDictionary<string, SymbolicValue>.Empty,
ImmutableArray<PathConstraint>.Empty,
ImmutableArray<TerminalCommand>.Empty,
Depth: 0,
PathId: "root",
ImmutableArray<BranchDecision>.Empty);
/// <summary>
/// Creates an initial state with predefined environment variables.
/// </summary>
public static SymbolicState WithEnvironment(
IReadOnlyDictionary<string, string> environment)
{
var variables = environment
.ToImmutableDictionary(
kv => kv.Key,
kv => (SymbolicValue)new ConcreteValue(kv.Value));
return new SymbolicState(
variables,
ImmutableArray<PathConstraint>.Empty,
ImmutableArray<TerminalCommand>.Empty,
Depth: 0,
PathId: "root",
ImmutableArray<BranchDecision>.Empty);
}
/// <summary>
/// Sets a variable to a new value.
/// </summary>
public SymbolicState SetVariable(string name, SymbolicValue value)
=> this with { Variables = Variables.SetItem(name, value) };
/// <summary>
/// Gets a variable's value, returning a symbolic reference if not found.
/// </summary>
public SymbolicValue GetVariable(string name)
=> Variables.TryGetValue(name, out var value)
? value
: SymbolicValue.Symbolic(name);
/// <summary>
/// Adds a constraint from taking a branch.
/// </summary>
public SymbolicState AddConstraint(PathConstraint constraint)
=> this with { PathConstraints = PathConstraints.Add(constraint) };
/// <summary>
/// Records a terminal command executed on this path.
/// </summary>
public SymbolicState AddTerminalCommand(TerminalCommand command)
=> this with { TerminalCommands = TerminalCommands.Add(command) };
/// <summary>
/// Increments the depth counter.
/// </summary>
public SymbolicState IncrementDepth()
=> this with { Depth = Depth + 1 };
/// <summary>
/// Forks this state for a new branch, recording the decision.
/// </summary>
public SymbolicState Fork(BranchDecision decision, string branchSuffix)
=> this with
{
PathId = $"{PathId}/{branchSuffix}",
BranchHistory = BranchHistory.Add(decision),
Depth = Depth + 1
};
/// <summary>
/// Gets all environment variable names this state depends on.
/// </summary>
public ImmutableHashSet<string> GetEnvDependencies()
{
var deps = new HashSet<string>();
foreach (var constraint in PathConstraints)
{
deps.UnionWith(constraint.DependsOnEnv);
}
foreach (var (_, value) in Variables)
{
deps.UnionWith(value.GetDependentVariables());
}
return deps.ToImmutableHashSet();
}
}
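// --- Illustrative sketch (not part of the committed file) ---
// Demonstrates the fork-on-branch model: the child extends the parent's
// PathId and constraint set while the parent state stays untouched.
// The `default` ShellSpan values are placeholders for brevity.
internal static class SymbolicStateForkSketch
{
internal static void Demo()
{
var root = SymbolicState.Initial()
.SetVariable("APP_MODE", SymbolicValue.Concrete("debug"));
var constraint = new PathConstraint(
"[ -n \"$APP_MODE\" ]",
IsNegated: false,
default,
ConstraintKind.StringEmpty,
ImmutableArray.Create("APP_MODE"));
var decision = new BranchDecision(default, BranchKind.If, 0, 2, constraint.Expression);
var child = root.AddConstraint(constraint).Fork(decision, "if-0");
// root keeps PathId "root" and an empty constraint list;
// child has PathId "root/if-0", one constraint, and Depth 1.
_ = child;
}
}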
/// <summary>
/// Records a branch decision made during symbolic execution.
/// </summary>
/// <param name="Location">Source location of the branch.</param>
/// <param name="BranchKind">Type of branch (If, Elif, Else, Case).</param>
/// <param name="BranchIndex">Index of the branch taken (0-based).</param>
/// <param name="TotalBranches">Total number of branches at this point.</param>
/// <param name="Predicate">The predicate expression (if applicable).</param>
public sealed record BranchDecision(
ShellSpan Location,
BranchKind BranchKind,
int BranchIndex,
int TotalBranches,
string? Predicate);
/// <summary>
/// Classification of branch types in shell scripts.
/// </summary>
public enum BranchKind
{
/// <summary>
/// An if branch.
/// </summary>
If,
/// <summary>
/// An elif branch.
/// </summary>
Elif,
/// <summary>
/// An else branch (no predicate).
/// </summary>
Else,
/// <summary>
/// A case arm.
/// </summary>
Case,
/// <summary>
/// A loop (for, while, until).
/// </summary>
Loop,
/// <summary>
/// An implicit fall-through (no matching branch).
/// </summary>
FallThrough
}
/// <summary>
/// Represents a terminal command discovered during symbolic execution.
/// </summary>
/// <param name="Command">The command name or path.</param>
/// <param name="Arguments">Command arguments (may contain symbolic values).</param>
/// <param name="Location">Source location in the script.</param>
/// <param name="IsExec">True if this is an exec (replaces shell process).</param>
/// <param name="EnvironmentOverrides">Environment variables set for this command.</param>
public sealed record TerminalCommand(
SymbolicValue Command,
ImmutableArray<SymbolicValue> Arguments,
ShellSpan Location,
bool IsExec,
ImmutableDictionary<string, SymbolicValue> EnvironmentOverrides)
{
/// <summary>
/// Whether the command is fully concrete (can be resolved statically).
/// </summary>
public bool IsConcrete => Command.IsConcrete && Arguments.All(a => a.IsConcrete);
/// <summary>
/// Gets the concrete command string if available.
/// </summary>
public string? GetConcreteCommand()
=> Command.TryGetConcrete(out var cmd) ? cmd : null;
/// <summary>
/// Gets all environment variables this command depends on.
/// </summary>
public ImmutableArray<string> GetDependentVariables()
{
var deps = new HashSet<string>();
deps.UnionWith(Command.GetDependentVariables());
foreach (var arg in Arguments)
{
deps.UnionWith(arg.GetDependentVariables());
}
// Sort for deterministic output; HashSet enumeration order is unspecified.
return deps.OrderBy(d => d, StringComparer.Ordinal).ToImmutableArray();
}
/// <summary>
/// Creates a concrete terminal command.
/// </summary>
public static TerminalCommand Concrete(
string command,
IEnumerable<string> arguments,
ShellSpan location,
bool isExec = false)
=> new(
new ConcreteValue(command),
arguments.Select(a => (SymbolicValue)new ConcreteValue(a)).ToImmutableArray(),
location,
isExec,
ImmutableDictionary<string, SymbolicValue>.Empty);
}
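// --- Illustrative sketch (not part of the committed file) ---
// Contrasts a fully concrete terminal command with one that carries a
// symbolic argument. The `default` ShellSpan values are placeholders.
internal static class TerminalCommandSketch
{
internal static void Demo()
{
var concrete = TerminalCommand.Concrete("/usr/bin/app", new[] { "--default" }, default, isExec: true);
// concrete.IsConcrete == true; concrete.GetConcreteCommand() == "/usr/bin/app"
var withSymbolicArg = new TerminalCommand(
SymbolicValue.Concrete("/usr/bin/app"),
ImmutableArray.Create(SymbolicValue.Symbolic("APP_MODE")),
default,
IsExec: true,
ImmutableDictionary<string, SymbolicValue>.Empty);
// withSymbolicArg.IsConcrete == false;
// withSymbolicArg.GetDependentVariables() yields ["APP_MODE"].
_ = (concrete, withSymbolicArg);
}
}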


@@ -0,0 +1,295 @@
// Licensed to StellaOps under the AGPL-3.0-or-later license.
using System.Collections.Immutable;
using StellaOps.Scanner.EntryTrace.Parsing;
namespace StellaOps.Scanner.EntryTrace.Speculative;
/// <summary>
/// Represents a symbolic value during speculative execution.
/// Values can be concrete (known), symbolic (constrained), unknown, or composite.
/// </summary>
public abstract record SymbolicValue
{
/// <summary>
/// Creates a concrete value with a known string representation.
/// </summary>
public static SymbolicValue Concrete(string value) => new ConcreteValue(value);
/// <summary>
/// Creates a symbolic value representing an unknown variable.
/// </summary>
public static SymbolicValue Symbolic(string name, ImmutableArray<PathConstraint> constraints = default)
=> new SymbolicVariable(name, constraints.IsDefault ? ImmutableArray<PathConstraint>.Empty : constraints);
/// <summary>
/// Creates an unknown value with a reason.
/// </summary>
public static SymbolicValue Unknown(UnknownValueReason reason, string? description = null)
=> new UnknownValue(reason, description);
/// <summary>
/// Creates a composite value from multiple parts.
/// </summary>
public static SymbolicValue Composite(ImmutableArray<SymbolicValue> parts)
=> new CompositeValue(parts);
/// <summary>
/// Whether this value is fully concrete (known at analysis time).
/// </summary>
public abstract bool IsConcrete { get; }
/// <summary>
/// Attempts to get the concrete string value if known.
/// </summary>
public abstract bool TryGetConcrete(out string? value);
/// <summary>
/// Gets all environment variable names this value depends on.
/// </summary>
public abstract ImmutableArray<string> GetDependentVariables();
}
/// <summary>
/// A concrete (fully known) value.
/// </summary>
public sealed record ConcreteValue(string Value) : SymbolicValue
{
public override bool IsConcrete => true;
public override bool TryGetConcrete(out string? value)
{
value = Value;
return true;
}
public override ImmutableArray<string> GetDependentVariables()
=> ImmutableArray<string>.Empty;
public override string ToString() => $"Concrete(\"{Value}\")";
}
/// <summary>
/// A symbolic variable with optional constraints.
/// </summary>
public sealed record SymbolicVariable(
string Name,
ImmutableArray<PathConstraint> Constraints) : SymbolicValue
{
public override bool IsConcrete => false;
public override bool TryGetConcrete(out string? value)
{
value = null;
return false;
}
public override ImmutableArray<string> GetDependentVariables()
=> ImmutableArray.Create(Name);
public override string ToString() => $"Symbolic({Name})";
}
/// <summary>
/// An unknown value with a reason for being unknown.
/// </summary>
public sealed record UnknownValue(
UnknownValueReason Reason,
string? Description) : SymbolicValue
{
public override bool IsConcrete => false;
public override bool TryGetConcrete(out string? value)
{
value = null;
return false;
}
public override ImmutableArray<string> GetDependentVariables()
=> ImmutableArray<string>.Empty;
public override string ToString() => $"Unknown({Reason})";
}
/// <summary>
/// A composite value built from multiple parts (e.g., string concatenation).
/// </summary>
public sealed record CompositeValue(ImmutableArray<SymbolicValue> Parts) : SymbolicValue
{
public override bool IsConcrete => Parts.All(p => p.IsConcrete);
public override bool TryGetConcrete(out string? value)
{
if (!IsConcrete)
{
value = null;
return false;
}
var builder = new System.Text.StringBuilder();
foreach (var part in Parts)
{
if (part.TryGetConcrete(out var partValue))
{
builder.Append(partValue);
}
}
value = builder.ToString();
return true;
}
public override ImmutableArray<string> GetDependentVariables()
=> Parts.SelectMany(p => p.GetDependentVariables()).Distinct().ToImmutableArray();
public override string ToString()
=> $"Composite([{string.Join(", ", Parts)}])";
}
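// --- Illustrative sketch (not part of the committed file) ---
// A composite concretizes only when every part is concrete; a single
// symbolic part keeps the whole value symbolic.
internal static class SymbolicValueCompositionSketch
{
internal static void Demo()
{
var allConcrete = SymbolicValue.Composite(ImmutableArray.Create(
SymbolicValue.Concrete("/opt/"),
SymbolicValue.Concrete("app")));
allConcrete.TryGetConcrete(out var path); // true; path == "/opt/app"
var mixed = SymbolicValue.Composite(ImmutableArray.Create(
SymbolicValue.Concrete("/opt/"),
SymbolicValue.Symbolic("APP_DIR")));
mixed.TryGetConcrete(out _); // false: one symbolic part taints the whole
var deps = mixed.GetDependentVariables(); // ["APP_DIR"]
_ = (path, deps);
}
}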
/// <summary>
/// Reasons why a value cannot be determined statically.
/// </summary>
public enum UnknownValueReason
{
/// <summary>
/// Value comes from command substitution (e.g., $(command)).
/// </summary>
CommandSubstitution,
/// <summary>
/// Value comes from process substitution.
/// </summary>
ProcessSubstitution,
/// <summary>
/// Value requires runtime evaluation.
/// </summary>
RuntimeEvaluation,
/// <summary>
/// Value comes from external input (stdin, file).
/// </summary>
ExternalInput,
/// <summary>
/// Arithmetic expression that couldn't be evaluated.
/// </summary>
ArithmeticExpression,
/// <summary>
/// Dynamic variable name (indirect reference).
/// </summary>
IndirectReference,
/// <summary>
/// Array expansion with unknown indices.
/// </summary>
ArrayExpansion,
/// <summary>
/// Glob pattern expansion.
/// </summary>
GlobExpansion,
/// <summary>
/// Analysis depth limit reached.
/// </summary>
DepthLimitReached,
/// <summary>
/// Unsupported shell construct.
/// </summary>
UnsupportedConstruct
}
/// <summary>
/// A constraint on an execution path derived from a conditional branch.
/// </summary>
/// <param name="Expression">The original predicate expression text.</param>
/// <param name="IsNegated">True if we took the else/false branch.</param>
/// <param name="Source">Source location of the branch.</param>
/// <param name="Kind">The type of constraint.</param>
/// <param name="DependsOnEnv">Environment variables this constraint depends on.</param>
public sealed record PathConstraint(
string Expression,
bool IsNegated,
ShellSpan Source,
ConstraintKind Kind,
ImmutableArray<string> DependsOnEnv)
{
/// <summary>
/// Creates the negation of this constraint.
/// </summary>
public PathConstraint Negate() => this with { IsNegated = !IsNegated };
/// <summary>
/// Whether this constraint depends on environment variables.
/// </summary>
public bool IsEnvDependent => !DependsOnEnv.IsEmpty;
/// <summary>
/// Gets a deterministic string representation for hashing.
/// </summary>
public string ToCanonical()
=> $"{(IsNegated ? "!" : "")}{Expression}@{Source.StartLine}:{Source.StartColumn}";
}
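// --- Illustrative sketch (not part of the committed file) ---
// Mirrors how the executor builds a case fall-through path: each arm's
// pattern constraint is negated, and Negate flips only IsNegated.
// The `default` ShellSpan is a placeholder.
internal static class PathConstraintSketch
{
internal static void Demo()
{
var armTaken = new PathConstraint(
"start|run",
IsNegated: false,
default,
ConstraintKind.PatternMatch,
ImmutableArray<string>.Empty);
var armSkipped = armTaken.Negate();
// armSkipped.IsNegated == true; armTaken.IsEnvDependent == false here
// because the pattern references no environment variables.
_ = armSkipped;
}
}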
/// <summary>
/// Classification of constraint types for pattern-based evaluation.
/// </summary>
public enum ConstraintKind
{
/// <summary>
/// Variable existence/emptiness: [ -z "$VAR" ] or [ -n "$VAR" ]
/// </summary>
StringEmpty,
/// <summary>
/// String equality: [ "$VAR" = "value" ]
/// </summary>
StringEquality,
/// <summary>
/// String inequality: [ "$VAR" != "value" ]
/// </summary>
StringInequality,
/// <summary>
/// File existence: [ -f "$PATH" ] or [ -e "$PATH" ]
/// </summary>
FileExists,
/// <summary>
/// Directory existence: [ -d "$PATH" ]
/// </summary>
DirectoryExists,
/// <summary>
/// Executable check: [ -x "$PATH" ]
/// </summary>
IsExecutable,
/// <summary>
/// Readable check: [ -r "$PATH" ]
/// </summary>
IsReadable,
/// <summary>
/// Writable check: [ -w "$PATH" ]
/// </summary>
IsWritable,
/// <summary>
/// Numeric comparison: [ "$A" -eq "$B" ]
/// </summary>
NumericComparison,
/// <summary>
/// Case pattern match.
/// </summary>
PatternMatch,
/// <summary>
/// Complex or unknown constraint type.
/// </summary>
Unknown
}


@@ -2,14 +2,14 @@
| Task ID | Sprint | Status | Notes |
| --- | --- | --- | --- |
-| `PROOFSPINE-3100-DB` | `docs/implplan/SPRINT_3100_0001_0001_proof_spine_system.md` | DOING | Add Postgres migrations and repository for ProofSpine persistence (`proof_spines`, `proof_segments`, `proof_spine_history`). |
+| `PROOFSPINE-3100-DB` | `docs/implplan/archived/SPRINT_3100_0001_0001_proof_spine_system.md` | DONE | Postgres migrations and repository for ProofSpine implemented (`proof_spines`, `proof_segments`, `proof_spine_history`). |
| `SCAN-API-3103-004` | `docs/implplan/SPRINT_3103_0001_0001_scanner_api_ingestion_completion.md` | DONE | Fix scanner storage connection/schema issues surfaced by Scanner WebService ingestion tests. |
| `DRIFT-3600-DB` | `docs/implplan/SPRINT_3600_0003_0001_drift_detection_engine.md` | DONE | Add drift tables migration + code change/drift result repositories + DI wiring. |
-| `EPSS-3410-001` | `docs/implplan/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DONE | Added EPSS schema migration `Postgres/Migrations/008_epss_integration.sql` and wired via `MigrationIds.cs`. |
-| `EPSS-3410-002` | `docs/implplan/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DOING | Implement `EpssScoreRow` + ingestion models. |
-| `EPSS-3410-003` | `docs/implplan/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DOING | Implement `IEpssSource` interface (online vs bundle). |
-| `EPSS-3410-004` | `docs/implplan/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DOING | Implement `EpssOnlineSource` (download to temp; hash provenance). |
-| `EPSS-3410-005` | `docs/implplan/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DOING | Implement `EpssBundleSource` (air-gap file input). |
-| `EPSS-3410-006` | `docs/implplan/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DOING | Implement streaming `EpssCsvStreamParser` (validation + header comment extraction). |
-| `EPSS-3410-007` | `docs/implplan/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DOING | Implement Postgres `IEpssRepository` (runs + scores/current/changes). |
-| `EPSS-3410-008` | `docs/implplan/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DOING | Implement change detection + flags (`compute_epss_change_flags` + delta join). |
+| `EPSS-3410-001` | `docs/implplan/archived/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DONE | Added EPSS schema migration `Postgres/Migrations/008_epss_integration.sql` and wired via `MigrationIds.cs`. |
+| `EPSS-3410-002` | `docs/implplan/archived/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DONE | `EpssScoreRow` + ingestion models implemented. |
+| `EPSS-3410-003` | `docs/implplan/archived/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DONE | `IEpssSource` interface implemented (online vs bundle). |
+| `EPSS-3410-004` | `docs/implplan/archived/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DONE | `EpssOnlineSource` implemented (download to temp; hash provenance). |
+| `EPSS-3410-005` | `docs/implplan/archived/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DONE | `EpssBundleSource` implemented (air-gap file input). |
+| `EPSS-3410-006` | `docs/implplan/archived/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DONE | Streaming `EpssCsvStreamParser` implemented (validation + header comment extraction). |
+| `EPSS-3410-007` | `docs/implplan/archived/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DONE | Postgres `IEpssRepository` implemented (runs + scores/current/changes). |
+| `EPSS-3410-008` | `docs/implplan/archived/SPRINT_3410_0001_0001_epss_ingestion_storage.md` | DONE | Change detection + flags implemented (`EpssChangeDetector` + delta join). |