Refactor code structure for improved readability and maintainability; optimize performance in key functions.

This commit is contained in:
master
2025-12-22 19:06:31 +02:00
parent dfaa2079aa
commit 0536a4f7d4
1443 changed files with 109671 additions and 7840 deletions


@@ -1,295 +1,142 @@
# CVE→Symbol Mapping
# CVE-to-Symbol Mapping
_Last updated: 2025-12-22. Owner: Scanner Guild + Concelier Guild._
This document describes how Stella Ops maps CVE identifiers to specific binary symbols/functions for precise reachability analysis.
This document describes how StellaOps maps CVE identifiers to specific binary symbols/functions for reachability slices.
---
## 1. Overview
To determine if a vulnerability is reachable, we need to know which specific functions are affected. The **CVE→Symbol Mapping** service bridges:
To determine if a vulnerability is reachable, StellaOps resolves:
- **CVE identifiers** (e.g., `CVE-2024-1234`)
- **Package coordinates** (e.g., `pkg:npm/lodash@4.17.21`)
- **Affected symbols** (e.g., `lodash.template`, `openssl:EVP_PKEY_decrypt`)
`SliceExtractor` uses the mapping to target the right symbols, and downstream VEX decisions consume it as well.
---
## 2. Data Sources
### 2.1 Patch Diff Analysis
### 2.1 Patch Diff Surfaces (Preferred)
The highest-fidelity source: analyze git commits that fix vulnerabilities.
Highest-fidelity source: compute method-level diffs between vulnerable and fixed versions.
```
CVE-2024-1234 fixed in commit abc123
  → Diff shows changes to:
    - src/crypto.c: EVP_PKEY_decrypt() [modified]
    - src/crypto.c: decrypt_internal() [added guard]
  → Affected symbols: EVP_PKEY_decrypt, decrypt_internal
```
**Implementation**: `StellaOps.Scanner.VulnSurfaces`
**Implementation**: `StellaOps.Scanner.VulnSurfaces.PatchDiffAnalyzer`
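One way to picture a method-level diff is to reduce each version to a map from method key to a hash of the method body, then compare the two maps. The sketch below assumes that reduction has already happened. `MethodDiff`, `MethodChange`, and the `"modified"`/`"added"`/`"removed"` labels are hypothetical and are not the `PatchDiffAnalyzer` API; renames are not detected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical change record; not the StellaOps PatchDiffAnalyzer API.
public sealed record MethodChange(string MethodKey, string ChangeType);

public static class MethodDiff
{
    // Compares per-method body hashes from the vulnerable and fixed versions.
    // Present in both with different hashes -> "modified";
    // only in the fixed version -> "added"; only in the vulnerable version -> "removed".
    public static IReadOnlyList<MethodChange> Classify(
        IReadOnlyDictionary<string, string> vulnerable,
        IReadOnlyDictionary<string, string> fixedVersion)
    {
        var changes = new List<MethodChange>();

        foreach (var (key, hash) in vulnerable)
        {
            if (!fixedVersion.TryGetValue(key, out var fixedHash))
                changes.Add(new MethodChange(key, "removed"));
            else if (!string.Equals(hash, fixedHash, StringComparison.Ordinal))
                changes.Add(new MethodChange(key, "modified"));
        }

        foreach (var key in fixedVersion.Keys)
        {
            if (!vulnerable.ContainsKey(key))
                changes.Add(new MethodChange(key, "added"));
        }

        // Ordinal sort keeps the output deterministic regardless of dictionary order.
        return changes.OrderBy(c => c.MethodKey, StringComparer.Ordinal).ToList();
    }
}
```

In the `EVP_PKEY_decrypt` example, both functions would surface as `"modified"` because their bodies changed between the two versions.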
### 2.2 Advisory Linksets (Concelier)
### 2.2 Advisory Metadata
Scanner queries Concelier's LNM linksets for package coordinates and optional symbol hints.
Structured advisories with function-level detail:
**Implementation**: `StellaOps.Scanner.Advisory` -> Concelier `/v1/lnm/linksets/{cveId}` or `/v1/lnm/linksets/search`
- **OSV** (`affected[].ranges[].events[].introduced/fixed`)
- **NVD CPE** with CWE → typical affected patterns
- **Vendor advisories** (GitHub, npm, PyPI security advisories)
### 2.3 Offline Bundles
**Implementation**: `StellaOps.Concelier.Connectors.*`
For air-gapped environments, precomputed bundles map CVEs to packages and symbols.
### 2.3 Heuristic Inference
When precise mappings are unavailable:
1. **All public exports** of affected package version
2. **CWE-based patterns** (e.g., CWE-79 XSS → output functions)
3. **Function name patterns** (e.g., `*_decrypt*`, `*_parse*`)
**Implementation**: `StellaOps.Scanner.VulnSurfaces.HeuristicMapper`
**Implementation**: `FileAdvisoryBundleStore`
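The function-name heuristic listed under 2.3 Heuristic Inference can be sketched as glob matching over a package's exported symbols. `NamePatternHeuristic` is a hypothetical helper (not the `HeuristicMapper` API), and treating `*` as the only wildcard is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; not the StellaOps HeuristicMapper API.
public static class NamePatternHeuristic
{
    // Converts a simple glob ('*' = any run of characters) into an anchored regex.
    private static Regex ToRegex(string glob) =>
        new Regex("^" + Regex.Escape(glob).Replace(@"\*", ".*") + "$",
                  RegexOptions.CultureInvariant);

    // Returns exported symbols matching any pattern, de-duplicated and sorted
    // ordinally so the result is deterministic.
    public static IReadOnlyList<string> Match(
        IEnumerable<string> exportedSymbols, IEnumerable<string> patterns)
    {
        var regexes = patterns.Select(ToRegex).ToList();
        return exportedSymbols
            .Where(name => regexes.Any(r => r.IsMatch(name)))
            .Distinct(StringComparer.Ordinal)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```

Matching is case-sensitive here, so `*_decrypt*` does not match `RSA_DECRYPT`; whether the real mapper folds case is not stated.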
---
## 3. Mapping Confidence Tiers
## 3. Service Contracts
| Tier | Source | Confidence | Example |
|------|--------|------------|---------|
| **Confirmed** | Patch diff analysis | 0.95-1.0 | Exact function from git diff |
| **Likely** | Advisory with function names | 0.7-0.9 | OSV with `affected.functions[]` |
| **Inferred** | CWE/pattern heuristics | 0.4-0.6 | All exports of vulnerable version |
| **Unknown** | No data available | 0.0-0.3 | Package-level only |
### 3.1 CVE -> Package/Symbol Mapping
---
```csharp
public interface IAdvisoryClient
{
    Task<AdvisorySymbolMapping?> GetCveSymbolsAsync(string cveId, CancellationToken ct = default);
}
## 4. Query Interface
public sealed record AdvisorySymbolMapping
{
    public required string CveId { get; init; }
    public ImmutableArray<AdvisoryPackageSymbols> Packages { get; init; }
    public required string Source { get; init; } // "concelier" | "bundle"
}
### 4.1 Service Contract
public sealed record AdvisoryPackageSymbols
{
    public required string Purl { get; init; }
    public ImmutableArray<string> Symbols { get; init; }
}
```
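For tests or offline runs, the `IAdvisoryClient` contract above can be satisfied by a simple in-memory implementation. The contract and records are repeated so the sketch compiles on its own; `InMemoryAdvisoryClient` and its case-insensitive CVE lookup are assumptions, not part of StellaOps.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Threading;
using System.Threading.Tasks;

// Contract and records repeated from above so this sketch compiles on its own.
public interface IAdvisoryClient
{
    Task<AdvisorySymbolMapping?> GetCveSymbolsAsync(string cveId, CancellationToken ct = default);
}

public sealed record AdvisorySymbolMapping
{
    public required string CveId { get; init; }
    public ImmutableArray<AdvisoryPackageSymbols> Packages { get; init; }
    public required string Source { get; init; } // "concelier" | "bundle"
}

public sealed record AdvisoryPackageSymbols
{
    public required string Purl { get; init; }
    public ImmutableArray<string> Symbols { get; init; }
}

// Hypothetical in-memory client; not part of StellaOps.
public sealed class InMemoryAdvisoryClient : IAdvisoryClient
{
    // Assumption: CVE IDs match case-insensitively ("cve-2024-1234" == "CVE-2024-1234").
    private readonly Dictionary<string, AdvisorySymbolMapping> _byCve =
        new(StringComparer.OrdinalIgnoreCase);

    public void Add(AdvisorySymbolMapping mapping) => _byCve[mapping.CveId] = mapping;

    public Task<AdvisorySymbolMapping?> GetCveSymbolsAsync(string cveId, CancellationToken ct = default)
    {
        ct.ThrowIfCancellationRequested();
        // null (not an empty mapping) signals "no advisory data", matching the nullable contract.
        return Task.FromResult<AdvisorySymbolMapping?>(
            _byCve.TryGetValue(cveId, out var mapping) ? mapping : null);
    }
}
```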
### 3.2 CVE + PURL -> Affected Symbols
```csharp
public interface IVulnSurfaceService
{
    /// <summary>
    /// Get symbols affected by a CVE for a specific package.
    /// </summary>
    Task<VulnSurfaceResult> GetAffectedSymbolsAsync(
        string cveId,
        string purl,
        VulnSurfaceOptions? options = null,
        CancellationToken ct = default);

    /// <summary>
    /// Batch query for multiple CVE+PURL pairs.
    /// </summary>
    Task<IReadOnlyList<VulnSurfaceResult>> GetAffectedSymbolsBatchAsync(
        IEnumerable<(string CveId, string Purl)> queries,
        CancellationToken ct = default);
}
```
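The batch method could be layered on the single-query method with bounded parallelism. This sketch uses a hypothetical generic helper, `BatchQuery.RunAsync`, with a `SemaphoreSlim` to cap concurrent calls, and writes each result into the slot matching its input so output order follows input order. The concurrency limit is a caller-chosen assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: fans out single queries with bounded parallelism,
// preserving input order in the results.
public static class BatchQuery
{
    public static async Task<IReadOnlyList<TResult>> RunAsync<TQuery, TResult>(
        IEnumerable<TQuery> queries,
        Func<TQuery, CancellationToken, Task<TResult>> runOne,
        int maxConcurrency,
        CancellationToken ct = default)
    {
        if (maxConcurrency < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        var items = queries.ToList();
        var results = new TResult[items.Count];
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = items.Select(async (query, index) =>
        {
            await gate.WaitAsync(ct).ConfigureAwait(false);
            try
            {
                // Writing into the input's slot keeps output order equal to input order.
                results[index] = await runOne(query, ct).ConfigureAwait(false);
            }
            finally
            {
                gate.Release();
            }
        }).ToList();

        // All tasks finish before the semaphore is disposed.
        await Task.WhenAll(tasks).ConfigureAwait(false);
        return results;
    }
}
```

A `GetAffectedSymbolsBatchAsync` implementation might call `BatchQuery.RunAsync(queries, (q, ct) => GetAffectedSymbolsAsync(q.CveId, q.Purl, null, ct), 8, ct)`; the limit of 8 is illustrative only.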
### 4.2 Result Model
```csharp
public sealed record VulnSurfaceResult
{
    public required string CveId { get; init; }
    public required string Purl { get; init; }
    public required ImmutableArray<AffectedSymbol> Symbols { get; init; }
    public required VulnSurfaceSource Source { get; init; }
    public required string Source { get; init; } // "surface" | "package-symbols" | "heuristic"
    public required double Confidence { get; init; }
    public DateTimeOffset? CachedAt { get; init; }
}
public sealed record AffectedSymbol
{
    public required string Name { get; init; }
    public required string SymbolId { get; init; }
    public string? File { get; init; }
    public int? Line { get; init; }
    public string? Signature { get; init; }
    public SymbolChangeType ChangeType { get; init; }
}
public enum VulnSurfaceSource
{
    PatchDiff,
    Advisory,
    Heuristic,
    Unknown
}
public enum SymbolChangeType
{
    Modified, // Function code changed
    Added,    // New guard/check added
    Removed,  // Vulnerable code removed
    Renamed   // Function renamed
    public string? MethodKey { get; init; }
    public string? DisplayName { get; init; }
    public string? ChangeType { get; init; }
    public double Confidence { get; init; }
}
```
---
## 5. Integration with Concelier
## 4. Caching Strategy
The CVE→Symbol mapping service integrates with Concelier's advisory feed:
```
┌─────────────────┐     ┌──────────────────┐     ┌───────────────────┐
│ Scanner         │────►│ VulnSurface      │────►│ Concelier         │
│ (Query)         │     │ Service          │     │ Advisory API      │
└─────────────────┘     └──────────────────┘     └───────────────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │ Patch Diff       │
                        │ Analyzer         │
                        └──────────────────┘
```
### 5.1 Advisory Client
```csharp
public interface IAdvisoryClient
{
    Task<Advisory?> GetAdvisoryAsync(string cveId, CancellationToken ct);
    Task<IReadOnlyList<AffectedPackage>> GetAffectedPackagesAsync(
        string cveId,
        CancellationToken ct);
}
```
### 5.2 Caching Strategy
| Data | TTL | Invalidation |
|------|-----|--------------|
| Advisory metadata | 1 hour | On feed update |
| Patch diff results | 24 hours | On new CVE revision |
| Heuristic mappings | 15 minutes | On query |
| Data | TTL | Notes |
|------|-----|------|
| Advisory linksets | 1 hour | In-memory cache; configurable TTL |
| Offline bundles | Process lifetime | Loaded once from file |
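A minimal sketch of an in-memory cache with a configurable TTL, assuming a hypothetical `TtlCache<TKey, TValue>` type and an injectable clock so expiry can be tested without waiting. Concurrent misses for the same key may each invoke the factory; a production cache would likely deduplicate those calls.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical TTL cache; StellaOps' actual cache type and options are not shown here.
public sealed class TtlCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, (TValue Value, DateTimeOffset ExpiresAt)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now; // injectable clock for deterministic tests

    public TtlCache(TimeSpan ttl, Func<DateTimeOffset>? now = null)
    {
        if (ttl <= TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(ttl));
        _ttl = ttl;
        _now = now ?? (() => DateTimeOffset.UtcNow);
    }

    // Returns the cached value if present and unexpired; otherwise calls the
    // factory and stores the result with a fresh expiry.
    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        var now = _now();
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresAt > now)
            return entry.Value;

        var value = factory(key);
        _entries[key] = (value, now + _ttl);
        return value;
    }
}
```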
---
## 6. Offline Support
For air-gapped environments:
### 6.1 Pre-computed Bundles
```
offline-bundles/
  vuln-surfaces/
    cve-2024-*.json       # Pre-computed mappings
    ecosystem-npm.json    # NPM ecosystem mappings
    ecosystem-pypi.json   # PyPI ecosystem mappings
```
### 6.2 Bundle Format
## 5. Offline Bundle Format
```json
{
  "version": "1.0.0",
  "generatedAt": "2025-12-22T00:00:00Z",
  "mappings": {
    "CVE-2024-1234": {
      "pkg:npm/lodash@4.17.21": {
        "symbols": ["template", "templateSettings"],
        "source": "patch_diff",
        "confidence": 0.95
      }
    }
  }
}
```
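Reading the bundle format above needs only `System.Text.Json`. The sketch below uses hypothetical `OfflineBundle.Find` and `BundleEntry` names (not the `FileAdvisoryBundleStore` API); it resolves `mappings[cveId][purl]` and returns `null` when either key is missing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical reader for the bundle format above; not the FileAdvisoryBundleStore API.
public sealed record BundleEntry(IReadOnlyList<string> Symbols, string Source, double Confidence);

public static class OfflineBundle
{
    // Looks up mappings[cveId][purl]; returns null when either key is absent.
    public static BundleEntry? Find(string bundleJson, string cveId, string purl)
    {
        using var doc = JsonDocument.Parse(bundleJson);
        if (!doc.RootElement.TryGetProperty("mappings", out var mappings) ||
            !mappings.TryGetProperty(cveId, out var byPurl) ||
            !byPurl.TryGetProperty(purl, out var entry))
        {
            return null;
        }

        // Materialize values before the JsonDocument is disposed.
        var symbols = entry.GetProperty("symbols")
            .EnumerateArray()
            .Select(s => s.GetString() ?? string.Empty)
            .ToList();

        return new BundleEntry(
            symbols,
            entry.GetProperty("source").GetString() ?? "unknown",
            entry.GetProperty("confidence").GetDouble());
    }
}
```

`TryGetProperty` is case-sensitive, so CVE IDs and PURLs must match the bundle keys exactly.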
---
## 7. Fallback Behavior
When no mapping is available:
1. **Ecosystem-specific defaults**:
   - npm: All `exports` from package.json
   - PyPI: All public functions (`__all__`)
   - Native: All exported symbols (`.dynsym`)
2. **Conservative approach**:
   - Mark all public APIs as potentially affected
   - Set confidence = 0.3 (Inferred tier)
   - Include explanation in verdict reasons
3. **Manual override**:
   - Allow user-provided symbol lists via policy
   - Support suppression rules for known false positives
---
## 8. Performance Considerations
| Metric | Target | Notes |
|--------|--------|-------|
| Cache hit rate | >90% | Most queries hit cache |
| Cold query latency | <500ms | Concelier API call |
| Batch throughput | >100 queries/sec | Parallel execution |
---
## 9. Example Queries
### Simple Query
```http
POST /api/vuln-surfaces/query
Content-Type: application/json

{
  "cveId": "CVE-2024-1234",
  "purl": "pkg:npm/lodash@4.17.21"
}
```
Response:
```json
{
  "cveId": "CVE-2024-1234",
  "purl": "pkg:npm/lodash@4.17.21",
  "symbols": [
  "items": [
    {
      "name": "template",
      "symbolId": "js:lodash/template",
      "file": "lodash.js",
      "line": 14850,
      "changeType": "modified"
      "cveId": "CVE-2024-1234",
      "source": "bundle",
      "packages": [
        {
          "purl": "pkg:npm/lodash@4.17.21",
          "symbols": ["template", "templateSettings"]
        }
      ]
    }
  ],
  "source": "patch_diff",
  "confidence": 0.95
}
```
### Batch Query
```http
POST /api/vuln-surfaces/batch
Content-Type: application/json

{
  "queries": [
    {"cveId": "CVE-2024-1234", "purl": "pkg:npm/lodash@4.17.21"},
    {"cveId": "CVE-2024-5678", "purl": "pkg:pypi/requests@2.28.0"}
  ]
}
```
---
## 10. Related Documentation
## 6. Fallback Behavior
When no surface or advisory mapping is available, the service returns an empty symbol list with low confidence and `Source = "heuristic"`. Callers may inject an `IPackageSymbolProvider` to supply public-symbol fallbacks.
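A sketch of that fallback under stated assumptions: the document names `IPackageSymbolProvider` but not its signature, so the `GetPublicSymbols(string purl)` method below is hypothetical, as are `SurfaceFallback`, `FallbackResult`, and the `0.3` value used for "low confidence".

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

// Hypothetical shape: the real IPackageSymbolProvider signature is not documented here.
public interface IPackageSymbolProvider
{
    IReadOnlyList<string> GetPublicSymbols(string purl);
}

public sealed record FallbackResult(
    string CveId, string Purl, ImmutableArray<string> Symbols, string Source, double Confidence);

public static class SurfaceFallback
{
    // Assumed value; the text only says "low confidence".
    public const double LowConfidence = 0.3;

    public static FallbackResult Resolve(string cveId, string purl, IPackageSymbolProvider? provider = null)
    {
        // No provider -> empty symbol list; with a provider -> its public symbols.
        var symbols = provider?.GetPublicSymbols(purl) ?? Array.Empty<string>();
        return new FallbackResult(
            cveId,
            purl,
            symbols.Distinct(StringComparer.Ordinal).OrderBy(s => s, StringComparer.Ordinal).ToImmutableArray(),
            Source: "heuristic",
            Confidence: LowConfidence);
    }
}
```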
---
## 7. Related Documentation
- [Slice Schema](./slice-schema.md)
- [Patch Oracles](./patch-oracles.md)
- [Concelier Architecture](../modules/concelier/architecture.md)
- [Vulnerability Surfaces](../modules/scanner/vuln-surfaces.md)
---


@@ -1,190 +1,179 @@
# Reachability Slice Schema
_Last updated: 2025-12-22. Owner: Scanner Guild._
This document defines the **Reachability Slice** schema: a minimal, attestable proof unit that answers whether a vulnerable symbol is reachable from application entrypoints.
This document defines the **Reachability Slice** schema - a minimal, attestable proof unit that answers whether a vulnerable symbol is reachable from application entrypoints.
---
## 1. Overview
A **slice** is a focused subgraph extracted from a full reachability graph, containing only the nodes and edges relevant to answering a specific reachability query (e.g., "Is CVE-2024-1234's vulnerable function reachable?").
A **slice** is a focused subgraph extracted from a full reachability graph, containing only the nodes and edges relevant to answering a specific reachability query (for example, "Is CVE-2024-1234's vulnerable function reachable?").
### Key Properties
| Property | Description |
|----------|-------------|
| **Minimal** | Contains only nodes/edges on paths between entrypoints and targets |
| **Attestable** | DSSE-signed with in-toto predicate format |
| **Reproducible** | Same inputs → same bytes (deterministic) |
| **Attestable** | DSSE-signed with a dedicated slice predicate |
| **Reproducible** | Same inputs -> same bytes (deterministic) |
| **Content-addressed** | Retrieved by BLAKE3 digest |
---
## 2. Schema Definition
## 2. Predicate Type & Schema
### 2.1 DSSE Predicate Type
- Predicate type: `stellaops.dev/predicates/reachability-slice@v1`
- JSON schema: `https://stellaops.dev/schemas/stellaops-slice.v1.schema.json`
- DSSE payload type: `application/vnd.stellaops.slice.v1+json`
```
https://stellaops.dev/predicates/reachability-slice/v1
```
---
### 2.2 Full Schema
## 3. Schema Structure
```json
### 3.1 ReachabilitySlice
```csharp
public sealed record ReachabilitySlice
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://stellaops.dev/schemas/reachability-slice.v1.schema.json",
  "title": "Reachability Slice",
  "type": "object",
  "required": ["_type", "inputs", "query", "subgraph", "verdict", "manifest"],
  "properties": {
    "_type": {
      "const": "https://stellaops.dev/predicates/reachability-slice/v1"
    },
    "inputs": { "$ref": "#/$defs/SliceInputs" },
    "query": { "$ref": "#/$defs/SliceQuery" },
    "subgraph": { "$ref": "#/$defs/SliceSubgraph" },
    "verdict": { "$ref": "#/$defs/SliceVerdict" },
    "manifest": { "$ref": "#/$defs/ScanManifest" }
  },
  "$defs": {
    "SliceInputs": {
      "type": "object",
      "required": ["graphDigest", "binaryDigests"],
      "properties": {
        "graphDigest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" },
        "binaryDigests": {
          "type": "array",
          "items": { "type": "string", "pattern": "^sha256:[a-f0-9]{64}$" }
        },
        "sbomDigest": { "type": "string" },
        "layerDigests": { "type": "array", "items": { "type": "string" } }
      }
    },
    "SliceQuery": {
      "type": "object",
      "properties": {
        "cveId": { "type": "string", "pattern": "^CVE-\\d{4}-\\d+$" },
        "targetSymbols": { "type": "array", "items": { "type": "string" } },
        "entrypoints": { "type": "array", "items": { "type": "string" } },
        "policyHash": { "type": "string" }
      }
    },
    "SliceSubgraph": {
      "type": "object",
      "required": ["nodes", "edges"],
      "properties": {
        "nodes": {
          "type": "array",
          "items": { "$ref": "#/$defs/SliceNode" }
        },
        "edges": {
          "type": "array",
          "items": { "$ref": "#/$defs/SliceEdge" }
        }
      }
    },
    "SliceNode": {
      "type": "object",
      "required": ["id", "symbol", "kind"],
      "properties": {
        "id": { "type": "string" },
        "symbol": { "type": "string" },
        "kind": { "enum": ["entrypoint", "intermediate", "target", "unknown"] },
        "file": { "type": "string" },
        "line": { "type": "integer" },
        "purl": { "type": "string" },
        "attributes": { "type": "object" }
      }
    },
    "SliceEdge": {
      "type": "object",
      "required": ["from", "to", "confidence"],
      "properties": {
        "from": { "type": "string" },
        "to": { "type": "string" },
        "kind": { "enum": ["direct", "plt", "iat", "dynamic", "unknown"] },
        "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
        "evidence": { "type": "string" },
        "gate": { "$ref": "#/$defs/GateInfo" },
        "observed": { "$ref": "#/$defs/ObservedInfo" }
      }
    },
    "GateInfo": {
      "type": "object",
      "properties": {
        "type": { "enum": ["feature_flag", "auth", "config", "admin_only"] },
        "condition": { "type": "string" },
        "satisfied": { "type": "boolean" }
      }
    },
    "ObservedInfo": {
      "type": "object",
      "properties": {
        "firstObserved": { "type": "string", "format": "date-time" },
        "lastObserved": { "type": "string", "format": "date-time" },
        "count": { "type": "integer" }
      }
    },
    "SliceVerdict": {
      "type": "object",
      "required": ["status", "confidence"],
      "properties": {
        "status": { "enum": ["reachable", "unreachable", "unknown", "gated"] },
        "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
        "reasons": { "type": "array", "items": { "type": "string" } },
        "pathWitnesses": { "type": "array", "items": { "type": "string" } },
        "unknownCount": { "type": "integer" },
        "gatedPaths": { "type": "array", "items": { "$ref": "#/$defs/GateInfo" } }
      }
    },
    "ScanManifest": {
      "type": "object",
      "required": ["analyzerVersion", "createdAt"],
      "properties": {
        "analyzerVersion": { "type": "string" },
        "rulesetHash": { "type": "string" },
        "feedVersions": { "type": "object" },
        "createdAt": { "type": "string", "format": "date-time" },
        "toolchain": { "type": "string" }
      }
    }
  }
    [JsonPropertyName("_type")]
    public string Type { get; init; } = "stellaops.dev/predicates/reachability-slice@v1";
    [JsonPropertyName("inputs")]
    public required SliceInputs Inputs { get; init; }
    [JsonPropertyName("query")]
    public required SliceQuery Query { get; init; }
    [JsonPropertyName("subgraph")]
    public required SliceSubgraph Subgraph { get; init; }
    [JsonPropertyName("verdict")]
    public required SliceVerdict Verdict { get; init; }
    [JsonPropertyName("manifest")]
    public required ScanManifest Manifest { get; init; }
}
```
---
## 3. Verdict Status Definitions
| Status | Meaning | Confidence Range |
|--------|---------|------------------|
| `reachable` | Path exists from entrypoint to target | ≥0.7 |
| `unreachable` | No path found, no unknowns | ≥0.9 |
| `unknown` | Unknowns present on potential paths | 0.3-0.7 |
| `gated` | Path exists but gated by feature flag/auth | 0.5-0.8 |
### Verdict Computation Rules
### 3.2 SliceInputs
```csharp
public sealed record SliceInputs
{
    public required string GraphDigest { get; init; }
    public ImmutableArray<string> BinaryDigests { get; init; }
    public string? SbomDigest { get; init; }
    public ImmutableArray<string> LayerDigests { get; init; }
}
```
reachable := path_exists AND min(path_confidence) ≥ 0.7 AND unknown_edges = 0
unreachable := NOT path_exists AND unknown_edges = 0
gated := path_exists AND all_paths_gated AND gates_not_satisfied
unknown := unknown_edges > 0 OR min(path_confidence) < 0.5
### 3.3 SliceQuery
```csharp
public sealed record SliceQuery
{
    public string? CveId { get; init; }
    public ImmutableArray<string> TargetSymbols { get; init; }
    public ImmutableArray<string> Entrypoints { get; init; }
    public string? PolicyHash { get; init; }
}
```
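The v1 JSON schema earlier in this diff constrains `graphDigest`, `binaryDigests`, and `cveId` with regular expressions. The sketch below applies those same patterns to `SliceInputs`/`SliceQuery` values; `SliceInputValidation` is hypothetical, and whether the C# models enforce these patterns is not stated here.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Patterns copied from the v1 JSON schema; an illustrative check, not the StellaOps validator.
// Note: .NET's '$' also matches before a single trailing '\n'.
public static class SliceInputValidation
{
    private static readonly Regex Blake3Digest = new("^blake3:[a-f0-9]{64}$");
    private static readonly Regex Sha256Digest = new("^sha256:[a-f0-9]{64}$");
    private static readonly Regex CveId = new(@"^CVE-\d{4}-\d+$");

    // Returns one message per violation; an empty list means the values pass.
    public static IReadOnlyList<string> Validate(
        string graphDigest, IEnumerable<string> binaryDigests, string? cveId)
    {
        var errors = new List<string>();
        if (!Blake3Digest.IsMatch(graphDigest))
            errors.Add($"graphDigest is not blake3:<64 hex>: {graphDigest}");

        foreach (var digest in binaryDigests)
        {
            if (!Sha256Digest.IsMatch(digest))
                errors.Add($"binaryDigest is not sha256:<64 hex>: {digest}");
        }

        if (cveId is not null && !CveId.IsMatch(cveId))
            errors.Add($"cveId is malformed: {cveId}");

        return errors;
    }
}
```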
### 3.4 SliceSubgraph, Nodes, Edges
```csharp
public sealed record SliceSubgraph
{
    public ImmutableArray<SliceNode> Nodes { get; init; }
    public ImmutableArray<SliceEdge> Edges { get; init; }
}

public sealed record SliceNode
{
    public required string Id { get; init; }
    public required string Symbol { get; init; }
    public required SliceNodeKind Kind { get; init; } // entrypoint | intermediate | target | unknown
    public string? File { get; init; }
    public int? Line { get; init; }
    public string? Purl { get; init; }
    public IReadOnlyDictionary<string, string>? Attributes { get; init; }
}

public sealed record SliceEdge
{
    public required string From { get; init; }
    public required string To { get; init; }
    public SliceEdgeKind Kind { get; init; } // direct | plt | iat | dynamic | unknown
    public double Confidence { get; init; }
    public string? Evidence { get; init; }
    public SliceGateInfo? Gate { get; init; }
    public ObservedEdgeMetadata? Observed { get; init; }
}
```
### 3.5 SliceVerdict
```csharp
public sealed record SliceVerdict
{
    public required SliceVerdictStatus Status { get; init; }
    public required double Confidence { get; init; }
    public ImmutableArray<string> Reasons { get; init; }
    public ImmutableArray<string> PathWitnesses { get; init; }
    public int UnknownCount { get; init; }
    public ImmutableArray<GatedPath> GatedPaths { get; init; }
}
```
`SliceVerdictStatus` values (snake_case):
- `reachable`
- `unreachable`
- `unknown`
- `gated`
- `observed_reachable`
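With `System.Text.Json`, the snake_case wire values can be derived from the enum member names through a naming policy. This assumes the C# members are `Reachable`, `Unreachable`, `Unknown`, `Gated`, and `ObservedReachable`; `JsonNamingPolicy.SnakeCaseLower` requires .NET 8 or later, and `SliceJson` is a hypothetical holder for the options.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Assumed member names; they map to the snake_case values listed above.
public enum SliceVerdictStatus
{
    Reachable,
    Unreachable,
    Unknown,
    Gated,
    ObservedReachable
}

public static class SliceJson
{
    // JsonNamingPolicy.SnakeCaseLower requires .NET 8 or later.
    public static readonly JsonSerializerOptions Options = new()
    {
        Converters = { new JsonStringEnumConverter(JsonNamingPolicy.SnakeCaseLower) }
    };
}
```

Serializing `SliceVerdictStatus.ObservedReachable` with these options writes `"observed_reachable"`.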
### 3.6 ScanManifest
`ScanManifest` is imported from `StellaOps.Scanner.Core` and includes required fields for reproducibility:
- `scanId`
- `createdAtUtc`
- `artifactDigest`
- `scannerVersion`
- `workerVersion`
- `concelierSnapshotHash`
- `excititorSnapshotHash`
- `latticePolicyHash`
- `deterministic`
- `seed` (base64-encoded 32-byte seed)
- `knobs` (string map)
`artifactPurl` is optional.
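A sketch of a structural check for a few of these fields, including the 32-byte seed. `ManifestChecks.Validate` is hypothetical; it takes individual values rather than the real `ScanManifest` type, whose full C# shape is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical check over a few ScanManifest fields; not the StellaOps.Scanner.Core implementation.
public static class ManifestChecks
{
    public static IReadOnlyList<string> Validate(
        string scanId, DateTimeOffset createdAtUtc, string artifactDigest, string seedBase64)
    {
        var errors = new List<string>();
        if (string.IsNullOrWhiteSpace(scanId)) errors.Add("scanId is required");
        if (createdAtUtc.Offset != TimeSpan.Zero) errors.Add("createdAtUtc must be UTC");
        if (string.IsNullOrWhiteSpace(artifactDigest)) errors.Add("artifactDigest is required");

        // The seed must be valid base64 that decodes to exactly 32 bytes.
        Span<byte> buffer = stackalloc byte[64];
        if (!Convert.TryFromBase64String(seedBase64, buffer, out var written) || written != 32)
            errors.Add("seed must be the base64 encoding of 32 bytes");

        return errors;
    }
}
```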
---
## 4. Example Slice
## 4. Verdict Computation Rules
```
reachable   := path_exists AND min(path_confidence) > 0.7 AND unknown_edges == 0
unreachable := NOT path_exists AND unknown_edges == 0
unknown     := otherwise
```
`gated` and `observed_reachable` are reserved for feature-gated and runtime-observed paths (see Sprints 3830 and 3840).
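The three rules can be encoded directly. The sketch also derives `path_exists` and `min(path_confidence)` from slice edges under two assumptions: a path's confidence is its weakest edge, and the verdict uses the strongest such path (a max-min or "widest path" search). Counting every edge of kind `unknown` as an unknown edge is likewise an assumption, and `VerdictRules`, `Edge`, and `Verdict` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Verdict { Reachable, Unreachable, Unknown }

// Hypothetical edge shape; Kind "unknown" marks an unresolved edge (assumption).
public sealed record Edge(string From, string To, string Kind, double Confidence);

public static class VerdictRules
{
    // Direct encoding of the rules: reachable, unreachable, otherwise unknown.
    public static Verdict Decide(bool pathExists, double minPathConfidence, int unknownEdges)
    {
        if (pathExists && minPathConfidence > 0.7 && unknownEdges == 0) return Verdict.Reachable;
        if (!pathExists && unknownEdges == 0) return Verdict.Unreachable;
        return Verdict.Unknown;
    }

    // Widest-path search: best[n] is the highest weakest-edge confidence over known
    // paths from any entrypoint to n. Returns null when no target is reached.
    public static double? BestPathConfidence(
        IReadOnlyList<Edge> edges, IEnumerable<string> entrypoints, ISet<string> targets)
    {
        var best = new Dictionary<string, double>(StringComparer.Ordinal);
        foreach (var entry in entrypoints) best[entry] = 1.0;

        // Relax until stable. Values only increase and come from {1.0} plus the
        // edge confidences, so the loop terminates.
        var changed = true;
        while (changed)
        {
            changed = false;
            foreach (var edge in edges)
            {
                if (!best.TryGetValue(edge.From, out var fromConf)) continue;
                var candidate = Math.Min(fromConf, edge.Confidence);
                if (!best.TryGetValue(edge.To, out var toConf) || candidate > toConf)
                {
                    best[edge.To] = candidate;
                    changed = true;
                }
            }
        }

        var reached = targets.Where(best.ContainsKey).Select(t => best[t]).ToList();
        return reached.Count == 0 ? (double?)null : reached.Max();
    }

    public static Verdict Evaluate(
        IReadOnlyList<Edge> edges, IEnumerable<string> entrypoints, ISet<string> targets)
    {
        var confidence = BestPathConfidence(edges, entrypoints, targets);
        var unknownEdges = edges.Count(e => e.Kind == "unknown");
        return Decide(confidence.HasValue, confidence ?? 0.0, unknownEdges);
    }
}
```

For the `main -> process_request -> decrypt_data -> EVP_PKEY_decrypt` witness with edge confidences 0.95, 0.9, and 0.85 and no unknown edges, the weakest edge is 0.85, which exceeds 0.7, so the verdict is `Reachable`.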
---
## 5. Example Slice
```json
{
  "_type": "https://stellaops.dev/predicates/reachability-slice/v1",
  "_type": "stellaops.dev/predicates/reachability-slice@v1",
  "inputs": {
    "graphDigest": "blake3:a1b2c3d4e5f6789012345678901234567890123456789012345678901234abcd",
    "binaryDigests": ["sha256:deadbeef..."],
    "sbomDigest": "sha256:cafebabe..."
    "binaryDigests": ["sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef"],
    "sbomDigest": "sha256:cafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabe"
  },
  "query": {
    "cveId": "CVE-2024-1234",
@@ -207,75 +196,42 @@ unknown := unknown_edges > 0 OR min(path_confidence) < 0.5
  "verdict": {
    "status": "reachable",
    "confidence": 0.9,
    "reasons": ["Direct call path from main() to EVP_PKEY_decrypt()"],
    "pathWitnesses": ["main → process_request → decrypt_data → EVP_PKEY_decrypt"]
    "reasons": ["path_exists_high_confidence"],
    "pathWitnesses": ["main -> process_request -> decrypt_data -> EVP_PKEY_decrypt"],
    "unknownCount": 0
  },
  "manifest": {
    "analyzerVersion": "scanner.native:1.2.0",
    "rulesetHash": "sha256:...",
    "createdAt": "2025-12-22T10:00:00Z",
    "toolchain": "iced-x86:1.21.0"
    "scanId": "scan-1234",
    "createdAtUtc": "2025-12-22T10:00:00Z",
    "artifactDigest": "sha256:00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff",
    "artifactPurl": "pkg:generic/app@1.0.0",
    "scannerVersion": "scanner.native:1.2.0",
    "workerVersion": "scanner.worker:1.2.0",
    "concelierSnapshotHash": "sha256:1111222233334444555566667777888899990000aaaabbbbccccddddeeeeffff",
    "excititorSnapshotHash": "sha256:2222333344445555666677778888999900001111aaaabbbbccccddddeeeeffff",
    "latticePolicyHash": "sha256:3333444455556666777788889999000011112222aaaabbbbccccddddeeeeffff",
    "deterministic": true,
    "seed": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=",
    "knobs": { "maxDepth": "20" }
  }
}
```
---
## 5. DSSE Envelope Format
Slices are wrapped in DSSE envelopes for attestation:
```json
{
  "payloadType": "application/vnd.in-toto+json",
  "payload": "<base64-encoded slice JSON>",
  "signatures": [
    {
      "keyid": "sha256:abc123...",
      "sig": "<base64-encoded signature>"
    }
  ]
}
```
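DSSE signatures cover the pre-authentication encoding (PAE) of the payload type and payload, not the raw payload. Below is a sketch of PAE as defined by the DSSE v1 specification, plus an ECDSA P-256 signer and verifier. The key type is an assumption and `Dsse` is a hypothetical helper. The payload type passed in would be `application/vnd.in-toto+json` for the envelope above, or `application/vnd.stellaops.slice.v1+json` per the predicate section of the new schema.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class Dsse
{
    // DSSE v1 PAE: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
    // where LEN is the byte length in ASCII decimal and type is UTF-8 encoded.
    public static byte[] PreAuthEncode(string payloadType, byte[] payload)
    {
        var typeBytes = Encoding.UTF8.GetBytes(payloadType);
        var header = Encoding.UTF8.GetBytes(
            "DSSEv1 " + typeBytes.Length.ToString(CultureInfo.InvariantCulture) + " " +
            payloadType + " " + payload.Length.ToString(CultureInfo.InvariantCulture) + " ");
        var result = new byte[header.Length + payload.Length];
        header.CopyTo(result, 0);
        payload.CopyTo(result, header.Length);
        return result;
    }

    // Signs the PAE bytes with ECDSA / SHA-256 and returns the base64 value for "sig".
    public static string SignBase64(ECDsa key, string payloadType, byte[] payload) =>
        Convert.ToBase64String(key.SignData(PreAuthEncode(payloadType, payload), HashAlgorithmName.SHA256));

    // Recomputes PAE from payloadType and the decoded payload before checking the signature.
    public static bool Verify(ECDsa key, string payloadType, byte[] payload, string sigBase64) =>
        key.VerifyData(PreAuthEncode(payloadType, payload), Convert.FromBase64String(sigBase64), HashAlgorithmName.SHA256);
}
```

Because `payloadType` is bound into the signed bytes, a signature made for one payload type does not verify under another.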
---
## 6. Storage & Retrieval
### CAS URI Format
```
cas://slices/blake3:<digest>
```
### OCI Artifact Format
```json
{
  "mediaType": "application/vnd.stellaops.slice.v1+json",
  "digest": "sha256:...",
  "annotations": {
    "org.stellaops.slice.cve": "CVE-2024-1234",
    "org.stellaops.slice.verdict": "reachable"
  }
}
```
---
## 7. Determinism Requirements
## 6. Determinism Requirements
For reproducible slices:
1. **Node ordering**: Sort by `id` lexicographically
2. **Edge ordering**: Sort by `(from, to)` tuple
3. **Timestamps**: Use UTC ISO-8601 with Z suffix
4. **Floating point**: Round to 6 decimal places
5. **JSON serialization**: No whitespace, sorted keys
1. **Node ordering**: Sort by `id` (ordinal).
2. **Edge ordering**: Sort by `from`, then `to`, then `kind`.
3. **Strings**: Trim and de-duplicate lists (`targetSymbols`, `entrypoints`, `reasons`).
4. **Timestamps**: Use UTC ISO-8601 with `Z` suffix.
5. **JSON serialization**: Canonical JSON (sorted keys, no whitespace).
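The rules above can be sketched with `System.Text.Json.Nodes`. Object keys are sorted by ordinal (UTF-16 code unit) comparison, and the output has no insignificant whitespace. This is not a full RFC 8785 (JCS) implementation: number formatting and string escaping follow `System.Text.Json` defaults. `SliceDeterminism`, `NodeRef`, and `EdgeRef` are hypothetical, and dropping empty strings after trimming is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical minimal shapes for ordering; the real SliceNode/SliceEdge carry more fields.
public sealed record NodeRef(string Id);
public sealed record EdgeRef(string From, string To, string Kind);

public static class SliceDeterminism
{
    // Rule 1: nodes sorted by id (ordinal).
    public static List<NodeRef> OrderNodes(IEnumerable<NodeRef> nodes) =>
        nodes.OrderBy(n => n.Id, StringComparer.Ordinal).ToList();

    // Rule 2: edges sorted by from, then to, then kind (ordinal).
    public static List<EdgeRef> OrderEdges(IEnumerable<EdgeRef> edges) =>
        edges.OrderBy(e => e.From, StringComparer.Ordinal)
             .ThenBy(e => e.To, StringComparer.Ordinal)
             .ThenBy(e => e.Kind, StringComparer.Ordinal)
             .ToList();

    // Rule 3: trim entries and drop duplicates, keeping first-occurrence order.
    public static List<string> TrimDistinct(IEnumerable<string> items)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var result = new List<string>();
        foreach (var item in items)
        {
            var trimmed = item.Trim();
            if (trimmed.Length > 0 && seen.Add(trimmed)) result.Add(trimmed);
        }
        return result;
    }

    // Rule 5: rebuild the tree with object keys sorted ordinally; array order is preserved.
    public static JsonNode? Canonicalize(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(
            obj.OrderBy(p => p.Key, StringComparer.Ordinal)
               .Select(p => KeyValuePair.Create(p.Key, Canonicalize(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(Canonicalize).ToArray()),
        // Values are re-parsed so the new tree shares no nodes with the input.
        JsonValue val => JsonNode.Parse(val.ToJsonString()),
        _ => null
    };

    // ToJsonString() with default options emits compact JSON (no indentation).
    public static string Serialize(JsonNode? node) =>
        Canonicalize(node)?.ToJsonString() ?? "null";
}
```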
---
## 8. Related Documentation
## 7. Related Documentation
- [Binary Reachability Schema](./binary-reachability-schema.md)
- [RichGraph Contract](../contracts/richgraph-v1.md)