devops folders consolidate
@@ -305,11 +305,33 @@ public interface IFeedConnector {
|
||||
### 5.2 Linkset correlation
|
||||
|
||||
1. **Queue** — observation deltas enqueue correlation jobs keyed by `(tenant, vulnerabilityId, productKey)` candidates derived from identifiers + alias graph.
|
||||
2. **Canonical grouping** — builder resolves aliases using Concelier’s alias store and deterministic heuristics (vendor > distro > cert), deriving normalized product keys (purl preferred) and confidence scores.
|
||||
2. **Canonical grouping** — builder resolves aliases using Concelier's alias store and deterministic heuristics (vendor > distro > cert), deriving normalized product keys (purl preferred) and confidence scores.
|
||||
3. **Linkset materialization** — `advisory_linksets` documents store sorted observation references, alias sets, product keys, range metadata, and conflict payloads. Writes are idempotent; unchanged hashes skip updates.
|
||||
4. **Conflict detection** — builder emits structured conflicts (`severity-mismatch`, `affected-range-divergence`, `reference-clash`, `alias-inconsistency`, `metadata-gap`). Conflicts carry per-observation values for explainability.
|
||||
4. **Conflict detection** — builder emits structured conflicts with typed severities (Hard/Soft/Info). Conflicts carry per-observation values for explainability.
|
||||
5. **Event emission** — `advisory.linkset.updated@1` summarizes deltas (`added`, `removed`, `changed` observation IDs, conflict updates, confidence changes) and includes a canonical hash for replay validation.
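The idempotent write in step 3 and the canonical hash in step 5 can be illustrated with a minimal sketch. It assumes the linkset is a JSON-serializable dictionary whose lists are already sorted; `canonical_linkset_hash`, `upsert_linkset`, and the in-memory `store` are hypothetical names, not Concelier APIs.

```python
import hashlib
import json


def canonical_linkset_hash(linkset: dict) -> str:
    """Hash a linkset with sorted keys so equal content yields an equal hash."""
    payload = json.dumps(linkset, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def upsert_linkset(store: dict, linkset_id: str, linkset: dict) -> bool:
    """Write only when the canonical hash changed; return True if a write happened."""
    new_hash = canonical_linkset_hash(linkset)
    existing = store.get(linkset_id)
    if existing is not None and existing["hash"] == new_hash:
        return False  # unchanged hash: skip the update
    store[linkset_id] = {"hash": new_hash, "document": linkset}
    return True
```

Because keys are sorted before hashing, two builders that assemble the same linkset in a different field order would produce the same hash, which is what makes replay validation possible in this sketch.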
|
||||
|
||||
#### Correlation Algorithm (v2)
|
||||
|
||||
The v2 correlation algorithm (see `linkset-correlation-v2.md`) replaces intersection-based scoring with graph-based connectivity and adds new signals:
|
||||
|
||||
| Signal | Weight | Description |
|--------|--------|-------------|
| Alias connectivity | 0.30 | LCC ratio from bipartite graph (transitive bridging) |
| Alias authority | 0.10 | Scope hierarchy (CVE > GHSA > VND > DST) |
| Package coverage | 0.20 | Pairwise + IDF-weighted overlap |
| Version compatibility | 0.10 | Equivalent/Overlapping/Disjoint classification |
| CPE match | 0.10 | Exact or vendor/product overlap |
| Patch lineage | 0.10 | Shared commit SHA from fix references |
| Reference overlap | 0.05 | Positive-only URL matching |
| Freshness | 0.05 | Fetch timestamp spread |
|
||||
|
||||
Conflict penalties are typed:
|
||||
- **Hard** (`distinct-cves`, `disjoint-version-ranges`): -0.30 to -0.40
- **Soft** (`alias-inconsistency`, `affected-range-divergence`, `severity-mismatch`): -0.05 to -0.10
- **Info** (`metadata-gap`, `reference-clash` on simple disjoint sets): no penalty
|
||||
|
||||
Configure via `concelier:correlation:version` (v1 or v2) and optional weight overrides.
|
||||
|
||||
### 5.3 Event contract
|
||||
|
||||
| Event | Schema | Notes |
|
||||
@@ -317,7 +339,7 @@ public interface IFeedConnector {
|
||||
| `advisory.observation.updated@1` | `events/advisory.observation.updated@1.json` | Fired on new or superseded observations. Includes `observationId`, source metadata, `linksetSummary` (aliases/purls), supersedes pointer (if any), SHA-256 hash, and `traceId`. |
|
||||
| `advisory.linkset.updated@1` | `events/advisory.linkset.updated@1.json` | Fired when correlation changes. Includes `linksetId`, `key{vulnerabilityId, productKey, confidence}`, observation deltas, conflicts, `updatedAt`, and canonical hash. |
|
||||
|
||||
Events are emitted via NATS (primary) and Valkey Stream (fallback). Consumers acknowledge idempotently using the hash; duplicates are safe. Offline Kit captures both topics during bundle creation for air-gapped replay.
|
||||
Events are emitted via Valkey Streams. Consumers acknowledge idempotently using the hash; duplicates are safe. Offline Kit captures event streams during bundle creation for air-gapped replay.
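A minimal sketch of hash-based idempotent acknowledgement follows. It assumes each event carries `linksetId` and a canonical hash field, here called `hash`; that field name, the `IdempotentConsumer` class, and the in-memory set are illustrative assumptions, not the actual consumer contract.

```python
class IdempotentConsumer:
    """Applies each (linksetId, hash) pair at most once; duplicates are acknowledged and ignored."""

    def __init__(self) -> None:
        self.seen = set()
        self.applied = []

    def handle(self, event: dict) -> bool:
        key = (event["linksetId"], event["hash"])
        if key in self.seen:
            return False  # duplicate delivery: safe to acknowledge without reapplying
        self.seen.add(key)
        self.applied.append(event)
        return True
```

A production consumer would persist the seen keys rather than hold them in memory, so that redelivery after a restart is also harmless.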
|
||||
|
||||
---
|
||||
|
||||
|
||||
379
docs/modules/concelier/linkset-correlation-v2.md
Normal file
@@ -0,0 +1,379 @@
|
||||
# CONCELIER-LNM-26-001 · Linkset Correlation Rules (v2)
|
||||
|
||||
> Supersedes `linkset-correlation-21-002.md` for new linkset builds.
|
||||
> V1 linksets remain valid; a migration job recomputes their confidence using the v2 algorithm.
|
||||
|
||||
Purpose: Address critical failure modes in v1 correlation (intersection transitivity, false conflict emission) and introduce more discriminative signals (patch lineage, version compatibility, IDF-weighted package matching).
|
||||
|
||||
---
|
||||
|
||||
## Scope
|
||||
|
||||
- Applies to linksets produced from `advisory_observations` (LNM v2).
|
||||
- Correlation is aggregation-only: no value synthesis or merge; emit conflicts instead of collapsing fields.
|
||||
- Output persists in `advisory_linksets` and drives `advisory.linkset.updated@1` events.
|
||||
- Maintains determinism, offline posture, and LNM/AOC contracts.
|
||||
|
||||
---
|
||||
|
||||
## Key Changes from v1
|
||||
|
||||
| Aspect | v1 Behavior | v2 Behavior |
|--------|-------------|-------------|
| Alias matching | Intersection across all inputs | Graph connectivity (LCC ratio) |
| PURL matching | Intersection across all inputs | Pairwise coverage + IDF weighting |
| Reference clash | Emitted on zero overlap | Only on true URL contradictions |
| Conflict penalty | Single -0.1 for any conflict | Typed severities with per-reason penalties |
| Patch lineage | Not used | New high-confidence signal (weight 0.10; any pairwise SHA match scores 1.0) |
| Version ranges | Divergence noted only | Classified (Equivalent/Overlapping/Disjoint) |
|
||||
|
||||
---
|
||||
|
||||
## Deterministic Confidence Calculation (0-1)
|
||||
|
||||
### Signal Weights
|
||||
|
||||
```
weighted_sum =
    0.30 * alias_connectivity +
    0.10 * alias_authority +
    0.20 * package_coverage +
    0.10 * version_compatibility +
    0.10 * cpe_match +
    0.10 * patch_lineage +
    0.05 * reference_overlap +
    0.05 * freshness_score

# Floor of 0.1 applies when any evidence exists (see Penalty Calculation).
confidence = max(0.1, clamp(weighted_sum, 0, 1) - typed_penalty)
```
|
||||
|
||||
### Signal Definitions
|
||||
|
||||
#### `alias_connectivity` (weight: 0.30)
|
||||
|
||||
**Graph-based scoring** replacing intersection-across-all.
|
||||
|
||||
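1. Build a bipartite graph: observation nodes ↔ alias nodes.
2. Connect each observation to its aliases, so observations that share any alias land in the same component (transitive bridging).
3. Compute the LCC (largest connected component) ratio `|LCC| / N`, where `|LCC|` counts the observations in the largest component and `N` is the total number of observations.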
|
||||
|
||||
| Scenario | Score |
|----------|-------|
| All observations in single connected component | 1.0 |
| 80% of observations connected | 0.8 |
| No alias overlap at all | 0.0 |
|
||||
|
||||
**Why this matters**: Sources A (CVE-X), B (CVE-X + GHSA-Y), C (GHSA-Y) now correctly correlate via transitive bridging, whereas v1 produced score = 0.
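The transitive-bridging case can be sketched with a small union-find over observation and alias nodes. This is an illustration under stated assumptions, not the `LinksetCorrelationV2` code: the function name is hypothetical, a single observation is scored 1.0, and a largest component of size one is scored 0.0 to match the "no alias overlap" row above.

```python
def alias_connectivity(observations: dict) -> float:
    """observations maps an observation id to its set of alias strings."""
    if not observations:
        return 0.0
    parent = {}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Bipartite edges: each observation is linked to every alias it mentions.
    for obs_id, aliases in observations.items():
        obs_node = ("obs", obs_id)
        parent.setdefault(obs_node, obs_node)
        for alias in aliases:
            alias_node = ("alias", alias)
            parent.setdefault(alias_node, alias_node)
            union(obs_node, alias_node)

    # Count observation nodes (not alias nodes) per component.
    sizes = {}
    for obs_id in observations:
        root = find(("obs", obs_id))
        sizes[root] = sizes.get(root, 0) + 1
    largest = max(sizes.values())
    if largest == 1 and len(observations) > 1:
        return 0.0  # assumption: fully isolated observations mean no alias overlap
    return largest / len(observations)
```

With sources A (`CVE-X`), B (`CVE-X`, `GHSA-Y`), and C (`GHSA-Y`), all three observations end up in one component even though A and C share no alias directly.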
|
||||
|
||||
#### `alias_authority` (weight: 0.10)
|
||||
|
||||
Scope-based weighting using existing canonical key prefixes:
|
||||
|
||||
| Alias Type | Authority Score |
|------------|-----------------|
| CVE-* (global) | 1.0 |
| GHSA-* (ecosystem) | 0.8 |
| Vendor IDs (RHSA, MSRC, CISCO, VMSA) | 0.6 |
| Distribution IDs (DSA, USN, SUSE) | 0.4 |
| Unknown scheme | 0.2 |
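The table can be read as a prefix lookup. The sketch below assumes aliases start with the scheme name followed by a hyphen and that the cluster score is the maximum over its aliases (as `maxAuthorityScore` in the pseudocode suggests); the regular expressions and function names are illustrative, and real identifier formats may need broader patterns.

```python
import re

# (pattern, score) pairs in descending authority; the prefixes mirror the table above.
AUTHORITY_RULES = [
    (re.compile(r"^CVE-", re.IGNORECASE), 1.0),
    (re.compile(r"^GHSA-", re.IGNORECASE), 0.8),
    (re.compile(r"^(RHSA|MSRC|CISCO|VMSA)-", re.IGNORECASE), 0.6),
    (re.compile(r"^(DSA|USN|SUSE)-", re.IGNORECASE), 0.4),
]
UNKNOWN_SCHEME_SCORE = 0.2


def authority_score(alias: str) -> float:
    """Return the authority score of one alias based on its prefix."""
    for pattern, score in AUTHORITY_RULES:
        if pattern.match(alias):
            return score
    return UNKNOWN_SCHEME_SCORE


def max_authority(aliases) -> float:
    """Score a cluster by its most authoritative alias; 0.0 if there are none (assumption)."""
    return max((authority_score(alias) for alias in aliases), default=0.0)
```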
|
||||
|
||||
#### `package_coverage` (weight: 0.20)
|
||||
|
||||
**Pairwise + IDF weighting** replacing intersection-across-all.
|
||||
|
||||
1. Extract package keys (PURL without version) from each observation
|
||||
2. For each package key, compute IDF weight: `log(N / (1 + df))` where N = corpus size, df = observations containing package
|
||||
3. Score = weighted overlap ratio across pairs
|
||||
|
||||
| Scenario | Score |
|----------|-------|
| All sources share same rare package | ~1.0 |
| All sources share common package (lodash) | ~0.6 |
| One "thin" source with no packages | Other sources still score > 0 |
| No package overlap | 0.0 |
|
||||
|
||||
**IDF fallback**: When cache unavailable, uniform weights (1.0) are used.
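A rough sketch of the pairwise, IDF-weighted overlap follows. It makes several assumptions that the text above does not settle: the pairwise ratio is a weighted Jaccard, sources without packages are skipped rather than counted as mismatches, negative IDF values are clamped to zero, and the damping that yields about 0.6 for very common packages is not reproduced. All names are hypothetical.

```python
import math
from itertools import combinations


def idf_weight(corpus_size: int, doc_freq: int) -> float:
    """IDF as given above, log(N / (1 + df)), clamped at zero (assumption)."""
    return max(0.0, math.log(corpus_size / (1 + doc_freq)))


def package_coverage(package_sets, idf=None) -> float:
    """Average weighted overlap across pairs of sources that list packages.

    package_sets: one set of package keys (PURL without version) per observation.
    idf: optional mapping of package key to weight; uniform 1.0 when None (cache fallback).
    """
    def weight(key):
        return 1.0 if idf is None else idf.get(key, 1.0)

    populated = [s for s in package_sets if s]  # "thin" sources do not drag the score down
    if len(populated) < 2:
        return 0.0
    ratios = []
    for a, b in combinations(populated, 2):
        union_weight = sum(weight(k) for k in a | b)
        shared_weight = sum(weight(k) for k in a & b)
        ratios.append(shared_weight / union_weight if union_weight > 0 else 0.0)
    return sum(ratios) / len(ratios)
```

Weighting by IDF means a shared rare package dominates the ratio, while a shared ubiquitous package contributes little when other packages differ.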
|
||||
|
||||
#### `version_compatibility` (weight: 0.10)
|
||||
|
||||
Classifies version relationships per shared package:
|
||||
|
||||
| Relation | Score | Conflict |
|----------|-------|----------|
| **Equivalent**: ranges normalize identically | 1.0 | None |
| **Overlapping**: non-empty intersection | 0.6 | Soft (`affected-range-divergence`) |
| **Disjoint**: no intersection | 0.0 | Hard (`disjoint-version-ranges`) |
| **Unknown**: parse failure | 0.5 | None |
|
||||
|
||||
Uses `SemanticVersionRangeResolver` for semver; delegates to ecosystem-specific comparers for rpm EVR, dpkg, apk.
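The classification can be sketched for the simplest case: numeric dotted versions and half-open ranges `[low, high)`. This is not `SemanticVersionRangeResolver`; the parser, the range representation, and the names below are simplifying assumptions, and pre-release tags or rpm/dpkg/apk ordering are out of scope.

```python
from enum import Enum


class Relation(Enum):
    EQUIVALENT = "Equivalent"
    OVERLAPPING = "Overlapping"
    DISJOINT = "Disjoint"
    UNKNOWN = "Unknown"


RELATION_SCORES = {
    Relation.EQUIVALENT: 1.0,
    Relation.OVERLAPPING: 0.6,
    Relation.DISJOINT: 0.0,
    Relation.UNKNOWN: 0.5,
}


def parse_version(text: str) -> tuple:
    """Parse '4.17.21' into (4, 17, 21); raises ValueError on anything non-numeric."""
    return tuple(int(part) for part in text.split("."))


def classify(range_a, range_b) -> Relation:
    """Each range is a (low, high) pair of version strings meaning low <= v < high."""
    try:
        low_a, high_a = (parse_version(v) for v in range_a)
        low_b, high_b = (parse_version(v) for v in range_b)
    except ValueError:
        return Relation.UNKNOWN  # parse failure
    if (low_a, high_a) == (low_b, high_b):
        return Relation.EQUIVALENT
    if max(low_a, low_b) < min(high_a, high_b):
        return Relation.OVERLAPPING  # non-empty intersection
    return Relation.DISJOINT
```

For example, `>=4.0.0,<4.17.21` against `>=4.0.0,<4.18.0` classifies as Overlapping, which scores 0.6 and raises the soft `affected-range-divergence` conflict.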
|
||||
|
||||
#### `cpe_match` (weight: 0.10)
|
||||
|
||||
Unchanged from v1:
|
||||
- Exact CPE overlap: 1.0
|
||||
- Same vendor/product: 0.5
|
||||
- No match: 0.0
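As an illustration of the three tiers, the sketch below assumes CPE 2.3 formatted strings and treats "overlap" as any pair of sources sharing a CPE or a vendor/product pair. It splits on `:` without handling escaped colons, and the function names are hypothetical.

```python
from itertools import combinations


def vendor_product(cpe: str):
    """Return (vendor, product) from a CPE 2.3 string, or None if it does not parse."""
    parts = cpe.lower().split(":")
    if len(parts) >= 5 and parts[0] == "cpe" and parts[1] == "2.3":
        return parts[3], parts[4]
    return None


def cpe_match(cpe_sets) -> float:
    """Score 1.0 for an exact shared CPE, 0.5 for a shared vendor/product, else 0.0."""
    populated = [{c.lower() for c in s} for s in cpe_sets if s]
    best = 0.0
    for a, b in combinations(populated, 2):
        if a & b:
            return 1.0
        pairs_a = {vendor_product(c) for c in a} - {None}
        pairs_b = {vendor_product(c) for c in b} - {None}
        if pairs_a & pairs_b:
            best = 0.5
    return best
```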
|
||||
|
||||
#### `patch_lineage` (weight: 0.10)
|
||||
|
||||
**New signal**: correlation via shared fix commits.
|
||||
|
||||
1. Extract patch references from observation references (type: `patch`, `fix`, `commit`)
|
||||
2. Normalize to commit SHAs using `PatchLineageNormalizer`
|
||||
3. Any pairwise SHA match: 1.0; otherwise 0.0
|
||||
|
||||
**Why this matters**: "These advisories fix the same code" is high-confidence evidence most platforms lack.
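The three steps above can be sketched as follows. The regular expression stands in for `PatchLineageNormalizer`, whose actual rules are not described here; this version assumes only full 40-character hexadecimal SHAs embedded in reference URLs count, and the function names are hypothetical.

```python
import re
from itertools import combinations

# Full 40-character hex SHA, not embedded in a longer hex run.
SHA_PATTERN = re.compile(r"(?<![0-9a-f])[0-9a-f]{40}(?![0-9a-f])")


def commit_shas(reference_urls) -> set:
    """Collect lowercase commit SHAs found in a source's patch/fix/commit references."""
    return {sha for url in reference_urls for sha in SHA_PATTERN.findall(url.lower())}


def patch_lineage(references_per_observation) -> float:
    """Return 1.0 if any two observations share a commit SHA, otherwise 0.0."""
    sha_sets = [commit_shas(urls) for urls in references_per_observation]
    return 1.0 if any(a & b for a, b in combinations(sha_sets, 2)) else 0.0
```

Two advisories that link the same commit through different hosts (for example a forge URL and a mirror) still match, because only the SHA is compared.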
|
||||
|
||||
#### `reference_overlap` (weight: 0.05)
|
||||
|
||||
**Positive-only** (no conflict on zero overlap):
|
||||
|
||||
1. Normalize URLs (lowercase, strip tracking params, https://)
|
||||
2. Compute max pairwise overlap ratio
|
||||
3. Map to score: `0.5 + (overlap * 0.5)`
|
||||
|
||||
| Scenario | Score |
|----------|-------|
| 100% URL overlap | 1.0 |
| 50% URL overlap | 0.75 |
| Zero URL overlap | 0.5 (neutral) |
|
||||
|
||||
**No `reference-clash` emission** for simple disjoint sets.
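A sketch of the normalization and scoring steps is below. The overlap ratio is assumed to be Jaccard similarity and the tracking parameters are assumed to be the `utm_*` family; neither detail is stated above, and the function names are illustrative.

```python
from itertools import combinations
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)  # assumption: which query parameters count as tracking


def normalize_url(url: str) -> str:
    """Lowercase, drop tracking parameters and fragments, and force the https scheme."""
    parts = urlsplit(url.strip().lower())
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(("https", parts.netloc, parts.path.rstrip("/"), urlencode(query), ""))


def reference_overlap(url_sets) -> float:
    """Map the best pairwise overlap onto [0.5, 1.0]; zero overlap stays neutral at 0.5."""
    normalized = [{normalize_url(u) for u in urls} for urls in url_sets if urls]
    best = 0.0
    for a, b in combinations(normalized, 2):
        best = max(best, len(a & b) / len(a | b))
    return 0.5 + best * 0.5
```

Because the score never drops below 0.5, disjoint reference lists lower confidence only slightly and never produce a conflict in this sketch.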
|
||||
|
||||
#### `freshness_score` (weight: 0.05)
|
||||
|
||||
Unchanged from v1:
|
||||
- Spread ≤ 48h: 1.0
|
||||
- Spread ≥ 14d: 0.0
|
||||
- Linear decay between
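The decay can be written out directly. The sketch assumes "spread" means the gap between the earliest and latest fetch timestamps and that a single observation scores 1.0; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

FULL_SCORE_SPREAD = timedelta(hours=48)
ZERO_SCORE_SPREAD = timedelta(days=14)


def freshness_score(fetched_at) -> float:
    """1.0 up to 48h of spread, 0.0 from 14d, linear in between."""
    if len(fetched_at) < 2:
        return 1.0  # assumption: a single observation has no spread
    spread = max(fetched_at) - min(fetched_at)
    if spread <= FULL_SCORE_SPREAD:
        return 1.0
    if spread >= ZERO_SCORE_SPREAD:
        return 0.0
    return 1.0 - (spread - FULL_SCORE_SPREAD) / (ZERO_SCORE_SPREAD - FULL_SCORE_SPREAD)
```

A spread of 8 days sits halfway between the two thresholds (2 days and 14 days), so it scores 0.5.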
|
||||
|
||||
---
|
||||
|
||||
## Conflict Emission (Typed Severities)
|
||||
|
||||
### Severity Levels
|
||||
|
||||
| Severity | Penalty Range | Meaning |
|----------|---------------|---------|
| **Hard** | 0.30 - 0.40 | Significant disagreement; likely prevents high-confidence linking |
| **Soft** | 0.05 - 0.10 | Minor disagreement; link with reduced confidence |
| **Info** | 0.00 | Informational; no penalty |
|
||||
|
||||
### Conflict Types and Penalties
|
||||
|
||||
| Conflict Reason | Severity | Penalty | Trigger Condition |
|-----------------|----------|---------|-------------------|
| `distinct-cves` | Hard | -0.40 | Two different CVE-* identifiers in cluster |
| `disjoint-version-ranges` | Hard | -0.30 | Same package key, ranges have no intersection |
| `alias-inconsistency` | Soft | -0.10 | Disconnected alias graph (but no CVE conflict) |
| `affected-range-divergence` | Soft | -0.05 | Ranges overlap but differ |
| `severity-mismatch` | Soft | -0.05 | CVSS base score delta > 1.0 |
| `reference-clash` | Info | 0.00 | Reserved for true contradictions only |
| `metadata-gap` | Info | 0.00 | Required provenance missing |
|
||||
|
||||
### Penalty Calculation
|
||||
|
||||
```
typed_penalty = min(0.6, sum(penalty_per_conflict))
```
|
||||
|
||||
Saturates at 0.6 to prevent complete collapse; minimum confidence = 0.1 when any evidence exists.
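To make the saturation and the floor concrete, here is a small worked sketch using the default weights and per-reason penalties from this document. The dictionary layout and function names are illustrative, and penalties are summed per emitted conflict before deduplication, following the pseudocode later in this document.

```python
WEIGHTS = {
    "aliasConnectivity": 0.30, "aliasAuthority": 0.10, "packageCoverage": 0.20,
    "versionCompatibility": 0.10, "cpeMatch": 0.10, "patchLineage": 0.10,
    "referenceOverlap": 0.05, "freshness": 0.05,
}
PENALTIES = {
    "distinct-cves": 0.40, "disjoint-version-ranges": 0.30,
    "alias-inconsistency": 0.10, "affected-range-divergence": 0.05,
    "severity-mismatch": 0.05, "reference-clash": 0.00, "metadata-gap": 0.00,
}
PENALTY_CAP = 0.6
CONFIDENCE_FLOOR = 0.1


def typed_penalty(reasons) -> float:
    """Sum the per-conflict penalties and saturate at the cap."""
    return min(PENALTY_CAP, sum(PENALTIES[reason] for reason in reasons))


def final_confidence(signals: dict, reasons) -> float:
    """Weighted sum of signal scores minus the typed penalty, floored at 0.1."""
    base = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    return max(CONFIDENCE_FLOOR, base - typed_penalty(reasons))
```

Two hard conflicts would sum to 0.70 but are capped at 0.60, so a cluster with otherwise perfect signals still lands near 0.40 instead of 0.30.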
|
||||
|
||||
### Conflict Record Shape
|
||||
|
||||
```json
{
  "field": "aliases",
  "reason": "distinct-cves",
  "severity": "Hard",
  "values": ["nvd:CVE-2025-1234", "ghsa:CVE-2025-5678"],
  "sourceIds": ["nvd", "ghsa"]
}
```
|
||||
|
||||
---
|
||||
|
||||
## Linkset Output Shape
|
||||
|
||||
Additions relative to v1:
|
||||
|
||||
```json
{
  "key": {
    "vulnerabilityId": "CVE-2025-1234",
    "productKey": "pkg:npm/lodash",
    "confidence": 0.85
  },
  "conflicts": [
    {
      "field": "affected.versions[pkg:npm/lodash]",
      "reason": "affected-range-divergence",
      "severity": "Soft",
      "values": ["nvd:>=4.0.0,<4.17.21", "ghsa:>=4.0.0,<4.18.0"],
      "sourceIds": ["nvd", "ghsa"]
    }
  ],
  "signalScores": {
    "aliasConnectivity": 1.0,
    "aliasAuthority": 1.0,
    "packageCoverage": 0.85,
    "versionCompatibility": 0.6,
    "cpeMatch": 0.5,
    "patchLineage": 1.0,
    "referenceOverlap": 0.75,
    "freshness": 1.0
  },
  "provenance": {
    "observationHashes": ["sha256:abc...", "sha256:def..."],
    "toolVersion": "concelier/2.0.0",
    "correlationVersion": "v2"
  }
}
```
|
||||
|
||||
---
|
||||
|
||||
## Algorithm Pseudocode
|
||||
|
||||
```
function Compute(observations):
    if observations.empty:
        return (confidence=1.0, conflicts=[])

    conflicts = []

    # 1. Alias connectivity (graph-based)
    aliasGraph = buildBipartiteGraph(observations)
    aliasConnectivity = LCC(aliasGraph) / observations.count
    if hasDistinctCVEs(aliasGraph):
        conflicts.add(HardConflict("distinct-cves"))
    elif aliasConnectivity < 1.0:
        conflicts.add(SoftConflict("alias-inconsistency"))

    # 2. Alias authority
    aliasAuthority = maxAuthorityScore(observations)

    # 3. Package coverage (pairwise + IDF)
    packageCoverage = computeIDFWeightedCoverage(observations)

    # 4. Version compatibility
    for sharedPackage in findSharedPackages(observations):
        relation = classifyVersionRelation(observations, sharedPackage)
        if relation == Disjoint:
            conflicts.add(HardConflict("disjoint-version-ranges"))
        elif relation == Overlapping:
            conflicts.add(SoftConflict("affected-range-divergence"))
    versionScore = averageRelationScore(observations)

    # 5. CPE match
    cpeScore = computeCpeOverlap(observations)

    # 6. Patch lineage
    patchScore = 1.0 if anyPairSharesCommitSHA(observations) else 0.0

    # 7. Reference overlap (positive-only)
    referenceScore = 0.5 + (maxPairwiseURLOverlap(observations) * 0.5)

    # 8. Freshness
    freshnessScore = computeFreshness(observations)

    # Calculate weighted sum
    baseConfidence = (
        0.30 * aliasConnectivity +
        0.10 * aliasAuthority +
        0.20 * packageCoverage +
        0.10 * versionScore +
        0.10 * cpeScore +
        0.10 * patchScore +
        0.05 * referenceScore +
        0.05 * freshnessScore
    )

    # Apply typed penalties
    penalty = min(0.6, sum(conflict.penalty for conflict in conflicts))
    finalConfidence = max(0.1, baseConfidence - penalty)

    return (confidence=finalConfidence, conflicts=dedupe(conflicts))
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation
|
||||
|
||||
### Code Locations
|
||||
|
||||
| Component | Path |
|-----------|------|
| V2 Algorithm | `src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/LinksetCorrelationV2.cs` |
| Conflict Model | `src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/AdvisoryLinkset.cs` |
| Patch Normalizer | `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Identity/Normalizers/PatchLineageNormalizer.cs` |
| Version Resolver | `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Comparers/SemanticVersionRangeResolver.cs` |
|
||||
|
||||
### Configuration
|
||||
|
||||
```yaml
concelier:
  correlation:
    version: v2  # v1 | v2
    weights:
      aliasConnectivity: 0.30
      aliasAuthority: 0.10
      packageCoverage: 0.20
      versionCompatibility: 0.10
      cpeMatch: 0.10
      patchLineage: 0.10
      referenceOverlap: 0.05
      freshness: 0.05
    idf:
      enabled: true
      cacheKey: "concelier:package:idf"
      refreshIntervalMinutes: 60
    textSimilarity:
      enabled: false  # Phase 3
```
|
||||
|
||||
---
|
||||
|
||||
## Telemetry
|
||||
|
||||
| Instrument | Type | Tags | Purpose |
|------------|------|------|---------|
| `concelier.linkset.confidence` | Histogram | `version` | Confidence score distribution |
| `concelier.linkset.conflicts_total` | Counter | `reason`, `severity` | Conflict counts by type |
| `concelier.linkset.signal_score` | Histogram | `signal` | Per-signal score distribution |
| `concelier.linkset.patch_lineage_hits` | Counter | - | Patch SHA matches found |
| `concelier.linkset.idf_cache_hit` | Counter | `hit` | IDF cache effectiveness |
|
||||
|
||||
---
|
||||
|
||||
## Migration
|
||||
|
||||
### Recompute Job
|
||||
|
||||
```bash
stella db linksets recompute --correlation-version v2 --batch-size 1000
```
|
||||
|
||||
Recomputes confidence for existing linksets using v2 algorithm. Does not modify observation data.
|
||||
|
||||
### Rollback
|
||||
|
||||
Set `concelier:correlation:version: v1` to revert to intersection-based scoring.
|
||||
|
||||
---
|
||||
|
||||
## Fixtures
|
||||
|
||||
- `docs/modules/concelier/samples/linkset-v2-transitive-bridge.json`: Three-source transitive bridging (A↔B↔C) demonstrating graph connectivity.
|
||||
- `docs/modules/concelier/samples/linkset-v2-patch-match.json`: Two-source correlation via shared commit SHA.
|
||||
- `docs/modules/concelier/samples/linkset-v2-hard-conflict.json`: Distinct CVEs in cluster triggering hard penalty.
|
||||
|
||||
All fixtures use ASCII ordering and ISO-8601 UTC timestamps.
|
||||
|
||||
---
|
||||
|
||||
## Change Control
|
||||
|
||||
- V2 is add-only relative to v1 output schema.
|
||||
- Signal weight adjustments require sprint note but not schema version bump.
|
||||
- New conflict reasons require `advisory.linkset.updated@2` event schema and doc update.
|
||||
- Removal of a signal requires deprecation period and migration guidance.
|
||||
@@ -81,6 +81,26 @@ Expect all logs at `Information`. Ensure OTEL exporters include the scope `Stell
|
||||
|
||||
## 5. Conflict Classification Matrix
|
||||
|
||||
### 5.1 Linkset Conflicts (v2 Correlation)
|
||||
|
||||
Linkset conflicts now carry typed severities that affect confidence scoring:
|
||||
|
||||
| Severity | Penalty | Conflicts | Triage Priority |
|----------|---------|-----------|-----------------|
| **Hard** | -0.30 to -0.40 | `distinct-cves`, `disjoint-version-ranges` | High - investigate immediately |
| **Soft** | -0.05 to -0.10 | `affected-range-divergence`, `severity-mismatch`, `alias-inconsistency` | Medium - review in batch |
| **Info** | 0.00 | `metadata-gap`, `reference-clash` (disjoint only) | Low - informational |
|
||||
|
||||
| Conflict Reason | Severity | Likely Cause | Immediate Action |
|-----------------|----------|--------------|------------------|
| `distinct-cves` | Hard | Two different CVE-* IDs in same linkset cluster | Investigate alias mappings; likely compound advisory or incorrect aliasing |
| `disjoint-version-ranges` | Hard | Same package, no version overlap between sources | Check if distro backport; verify connector range parsing |
| `affected-range-divergence` | Soft | Ranges overlap but differ | Often benign (distro vs upstream versioning); monitor trends |
| `severity-mismatch` | Soft | CVSS scores differ by > 1.0 | Normal for cross-source; freshest source typically wins |
| `alias-inconsistency` | Soft | Disconnected alias graph (no shared CVE) | Review alias extraction; may indicate unrelated advisories grouped |
|
||||
|
||||
### 5.2 Merge Conflicts (Legacy)
|
||||
|
||||
| Signal | Likely Cause | Immediate Action |
|
||||
|--------|--------------|------------------|
|
||||
| `reason="mismatch"` with `type="severity"` | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. |
|
||||
|
||||
@@ -16,7 +16,7 @@ authn, CDN fronting, and the recurring sync pipeline that keeps mirror bundles c
|
||||
that hold `concelier` JSON bundles and `excititor` VEX exports.
|
||||
- **Persistent volumes** – storage for Concelier job metadata and mirror export trees.
|
||||
For Helm, provision PVCs (`concelier-mirror-jobs`, `concelier-mirror-exports`,
|
||||
`excititor-mirror-exports`, `mirror-mongo-data`, `mirror-minio-data`) before rollout.
|
||||
`excititor-mirror-exports`) before rollout.
|
||||
|
||||
### 1.1 Service configuration quick reference
|
||||
|
||||
|
||||
@@ -2,8 +2,8 @@
|
||||
|
||||
> Aligned with Epic 6 – Vulnerability Explorer and Epic 10 – Export Center.
|
||||
|
||||
> **Scope.** Implementation‑ready architecture for the **Scanner** subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per‑layer caching, three‑way diffs, artifact catalog (RustFS default + PostgreSQL, S3-compatible fallback), attestation hand‑off, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).
|
||||
> **Related:** `docs/modules/scanner/operations/ai-code-guard.md`
|
||||
> **Scope.** Implementation‑ready architecture for the **Scanner** subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per‑layer caching, three‑way diffs, artifact catalog (RustFS default + PostgreSQL, S3-compatible fallback), attestation hand‑off, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).
|
||||
> **Related:** `docs/modules/scanner/operations/ai-code-guard.md`
|
||||
|
||||
---
|
||||
|
||||
@@ -14,14 +14,14 @@
|
||||
**Boundaries.**
|
||||
|
||||
* Scanner **does not** produce PASS/FAIL. The backend (Policy + Excititor + Concelier) decides presentation and verdicts.
|
||||
* Scanner **does not** keep third‑party SBOM warehouses. It may **bind** to existing attestations for exact hashes.
|
||||
* Core analyzers are **deterministic** (no fuzzy identity). Optional heuristic plug‑ins (e.g., patch‑presence) run under explicit flags and never contaminate the core SBOM.
|
||||
|
||||
SBOM dependency reachability inference uses dependency graphs to reduce false positives and
|
||||
apply reachability-aware severity adjustments. See `src/Scanner/docs/sbom-reachability-filtering.md`
|
||||
for policy configuration and reporting expectations.
|
||||
|
||||
---
|
||||
* Scanner **does not** keep third‑party SBOM warehouses. It may **bind** to existing attestations for exact hashes.
|
||||
* Core analyzers are **deterministic** (no fuzzy identity). Optional heuristic plug‑ins (e.g., patch‑presence) run under explicit flags and never contaminate the core SBOM.
|
||||
|
||||
SBOM dependency reachability inference uses dependency graphs to reduce false positives and
|
||||
apply reachability-aware severity adjustments. See `src/Scanner/docs/sbom-reachability-filtering.md`
|
||||
for policy configuration and reporting expectations.
|
||||
|
||||
---
|
||||
|
||||
## 1) Solution & project layout
|
||||
|
||||
@@ -98,34 +98,27 @@ CLI usage: `stella scan --semantic <image>` enables semantic analysis in output.
|
||||
- **Hybrid attestation**: emit **graph-level DSSE** for every `richgraph-v1` (mandatory) and optional **edge-bundle DSSE** (≤512 edges) for runtime/init-root/contested edges or third-party provenance. Publish graph DSSE digests to Rekor by default; edge-bundle Rekor publish is policy-driven. CAS layout: `cas://reachability/graphs/{blake3}` for graph body, `.../{blake3}.dsse` for envelope, and `cas://reachability/edges/{graph_hash}/{bundle_id}[.dsse]` for bundles. Deterministic ordering before hashing/signing is required.
|
||||
- **Deterministic call-graph manifest**: capture analyzer versions, feed hashes, toolchain digests, and flags in a manifest stored alongside `richgraph-v1`; replaying with the same manifest MUST yield identical node/edge sets and hashes (see `docs/modules/reach-graph/guides/lead.md`).
|
||||
|
||||
### 1.1 Queue backbone (Valkey / NATS)
|
||||
### 1.1 Queue backbone (Valkey Streams)
|
||||
|
||||
`StellaOps.Scanner.Queue` exposes a transport-agnostic contract (`IScanQueue`/`IScanQueueLease`) used by the WebService producer and Worker consumers. Sprint 9 introduces two first-party transports:
|
||||
`StellaOps.Scanner.Queue` exposes a transport-agnostic contract (`IScanQueue`/`IScanQueueLease`) used by the WebService producer and Worker consumers.
|
||||
|
||||
- **Valkey Streams** (default). Uses consumer groups, deterministic idempotency keys (`scanner:jobs:idemp:*`), and supports lease claim (`XCLAIM`), renewal, exponential-backoff retries, and a `scanner:jobs:dead` stream for exhausted attempts.
|
||||
- **NATS JetStream**. Provisions the `SCANNER_JOBS` work-queue stream + durable consumer `scanner-workers`, publishes with `MsgId` for dedupe, applies backoff via `NAK` delays, and routes dead-lettered jobs to `SCANNER_JOBS_DEAD`.
|
||||
**Valkey Streams** is the standard transport. Uses consumer groups, deterministic idempotency keys (`scanner:jobs:idemp:*`), and supports lease claim (`XCLAIM`), renewal, exponential-backoff retries, and a `scanner:jobs:dead` stream for exhausted attempts.
|
||||
|
||||
Metrics are emitted via `Meter` counters (`scanner_queue_enqueued_total`, `scanner_queue_retry_total`, `scanner_queue_deadletter_total`), and `ScannerQueueHealthCheck` pings the active backend (Valkey `PING`, NATS `PING`). Configuration is bound from `scanner.queue`:
|
||||
Metrics are emitted via `Meter` counters (`scanner_queue_enqueued_total`, `scanner_queue_retry_total`, `scanner_queue_deadletter_total`), and `ScannerQueueHealthCheck` pings the Valkey backend. Configuration is bound from `scanner.queue`:
|
||||
|
||||
```yaml
|
||||
scanner:
|
||||
queue:
|
||||
kind: valkey # or nats (valkey uses redis:// protocol)
|
||||
kind: valkey
|
||||
valkey:
|
||||
connectionString: "redis://queue:6379/0"
|
||||
connectionString: "valkey://valkey:6379/0"
|
||||
streamName: "scanner:jobs"
|
||||
nats:
|
||||
url: "nats://queue:4222"
|
||||
stream: "SCANNER_JOBS"
|
||||
subject: "scanner.jobs"
|
||||
durableConsumer: "scanner-workers"
|
||||
deadLetterSubject: "scanner.jobs.dead"
|
||||
maxDeliveryAttempts: 5
|
||||
retryInitialBackoff: 00:00:05
|
||||
retryMaxBackoff: 00:02:00
|
||||
```
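For intuition about how `retryInitialBackoff`, `retryMaxBackoff`, and `maxDeliveryAttempts` interact, the sketch below computes a capped exponential schedule. The doubling factor and the absence of jitter are assumptions; the actual `StellaOps.Scanner.Queue` policy may differ, and `retry_delays` is a hypothetical helper.

```python
from datetime import timedelta


def retry_delays(initial: timedelta, maximum: timedelta, max_attempts: int) -> list:
    """Delay before each redelivery; the first delivery has no delay.

    Assumes the delay doubles after every failed attempt and is capped at `maximum`.
    """
    delays = []
    delay = initial
    for _ in range(max_attempts - 1):
        delays.append(min(delay, maximum))
        delay *= 2
    return delays
```

Under these assumptions, the values shown above (5 seconds initial, 2 minutes maximum, 5 attempts) give delays of 5, 10, 20, and 40 seconds before the job would move to the `scanner:jobs:dead` stream.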
|
||||
|
||||
The DI extension (`AddScannerQueue`) wires the selected transport, so future additions (e.g., RabbitMQ) only implement the same contract and register.
|
||||
The DI extension (`AddScannerQueue`) wires the transport.
|
||||
|
||||
**Runtime form‑factor:** two deployables
|
||||
|
||||
@@ -137,7 +130,7 @@ The DI extension (`AddScannerQueue`) wires the selected transport, so future add
|
||||
## 2) External dependencies
|
||||
|
||||
* **OCI registry** with **Referrers API** (discover attached SBOMs/signatures).
|
||||
* **RustFS** (default, offline-first) for SBOM artifacts; optional S3/MinIO compatibility retained for migration; **Object Lock** semantics emulated via retention headers; **ILM** for TTL.
|
||||
* **RustFS** (default, offline-first) for SBOM artifacts; S3-compatible interface with **Object Lock** semantics emulated via retention headers; **ILM** for TTL.
|
||||
* **PostgreSQL** for catalog, job state, diffs, ILM rules.
|
||||
* **Queue** (Valkey Streams/NATS/RabbitMQ).
|
||||
* **Authority** (on‑prem OIDC) for **OpToks** (DPoP/mTLS).
|
||||
@@ -206,9 +199,7 @@ attest/<artifactSha256>.dsse.json # DSSE bundle (cert chain + Rekor
|
||||
RustFS exposes a deterministic HTTP API (`PUT|GET|DELETE /api/v1/buckets/{bucket}/objects/{key}`).
|
||||
Scanner clients tag immutable uploads with `X-RustFS-Immutable: true` and, when retention applies,
|
||||
`X-RustFS-Retain-Seconds: <ttlSeconds>`. Additional headers can be injected via
|
||||
`scanner.artifactStore.headers` to support custom auth or proxy requirements. Legacy MinIO/S3
|
||||
deployments remain supported by setting `scanner.artifactStore.driver = "s3"` during phased
|
||||
migrations.
|
||||
`scanner.artifactStore.headers` to support custom auth or proxy requirements. RustFS provides the standard S3-compatible interface for all artifact storage.
|
||||
|
||||
---
|
||||
|
||||
@@ -378,40 +369,40 @@ public sealed record BinaryFindingEvidence
|
||||
|
||||
The emitted `buildId` metadata is preserved in component hashes, diff payloads, and `/policy/runtime` responses so operators can pivot from SBOM entries → runtime events → `debug/.build-id/<aa>/<rest>.debug` within the Offline Kit or release bundle.
|
||||
|
||||
### 5.5.1 Service security analysis (Sprint 20260119_016)
|
||||
|
||||
When an SBOM path is provided, the worker runs the `service-security` stage to parse CycloneDX services and emit a deterministic report covering:
|
||||
|
||||
- Endpoint scheme hygiene (HTTP/WS/plaintext protocol detection).
|
||||
- Authentication and trust-boundary enforcement.
|
||||
- Sensitive data flow exposure and unencrypted transfers.
|
||||
- Deprecated service versions and rate-limiting metadata gaps.
|
||||
|
||||
Inputs are passed via scan metadata (`sbom.path` or `sbomPath`, plus `sbom.format`). The report is attached as a surface observation payload (`service-security.report`) and keyed in the analysis store for downstream policy and report assembly. See `src/Scanner/docs/service-security.md` for the policy schema and output formats.
|
||||
|
||||
### 5.5.2 CBOM crypto analysis (Sprint 20260119_017)
|
||||
|
||||
When an SBOM includes CycloneDX `cryptoProperties`, the worker runs the `crypto-analysis` stage to produce a crypto inventory and compliance findings for weak algorithms, short keys, deprecated protocol versions, certificate hygiene, and post-quantum readiness. The report is attached as a surface observation payload (`crypto-analysis.report`) and keyed in the analysis store for downstream evidence workflows. See `src/Scanner/docs/crypto-analysis.md` for the policy schema and inventory export formats.
|
||||
|
||||
### 5.5.3 AI/ML supply chain security (Sprint 20260119_018)
|
||||
|
||||
When an SBOM includes CycloneDX `modelCard` or SPDX AI profile data, the worker runs the `ai-ml-security` stage to evaluate model governance readiness. The report covers model card completeness, training data provenance, bias/fairness checks, safety risk assessment coverage, and provenance verification. The report is attached as a surface observation payload (`ai-ml-security.report`) and keyed in the analysis store for policy evaluation and audit trails. See `src/Scanner/docs/ai-ml-security.md` for policy schema, CLI toggles, and binary analysis conventions.
|
||||
|
||||
### 5.5.4 Build provenance verification (Sprint 20260119_019)
|
||||
|
||||
When an SBOM includes CycloneDX formulation or SPDX build profile data, the worker runs the `build-provenance` stage to verify provenance completeness, builder trust, source integrity, hermetic build requirements, and optional reproducibility checks. The report is attached as a surface observation payload (`build-provenance.report`) and keyed in the analysis store for policy enforcement and audit evidence. See `src/Scanner/docs/build-provenance.md` for policy schema, CLI toggles, and report formats.
|
||||
|
||||
### 5.5.5 SBOM dependency reachability (Sprint 20260119_022)

When configured, the worker runs the `reachability-analysis` stage to infer dependency reachability from SBOM graphs and optionally refine it with a `richgraph-v1` call graph. Advisory matches are filtered or severity-adjusted using `VulnerabilityReachabilityFilter`, and false-positive reduction metrics are recorded for auditability. The stage attaches:

- `reachability.report` (JSON) for component and vulnerability reachability.
- `reachability.report.sarif` (SARIF 2.1.0) for toolchain export.
- `reachability.graph.dot` (GraphViz) for dependency visualization.

Configuration is documented in `src/Scanner/docs/sbom-reachability-filtering.md`, which covers the policy schema, metadata keys, and report outputs.

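The SBOM-only part of this stage can be thought of as a graph walk from the root component, followed by filtering advisory matches against the reachable set. The sketch below assumes a plain `{component: [dependencies]}` map and placeholder advisory IDs; `filter_matches` is a hypothetical stand-in and does not reproduce `VulnerabilityReachabilityFilter`, severity adjustment, or the `richgraph-v1` refinement.

```python
from collections import deque


def reachable_components(dependencies, roots):
    """Breadth-first walk over a {component: [dependsOn, ...]} map."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for dep in dependencies.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def filter_matches(matches, reachable):
    """Keep matches on reachable components and count the suppressed rest.

    'matches' is a list of (component, advisory_id) pairs.
    """
    kept = [m for m in matches if m[0] in reachable]
    suppressed = len(matches) - len(kept)
    return kept, suppressed


dependencies = {
    "app": ["web", "log"],
    "web": ["json"],
    "log": [],
    "json": [],
    "legacy-xml": ["json"],  # listed in the SBOM, but nothing under "app" depends on it
}
matches = [("json", "ADV-1"), ("legacy-xml", "ADV-2"), ("log", "ADV-3")]

reachable = reachable_components(dependencies, ["app"])
kept, suppressed = filter_matches(matches, reachable)
```

Here `suppressed` plays the role of a simple false-positive reduction metric: it records how many matches were dropped because their component is not reachable from the root.
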
### 5.6 DSSE attestation (via Signer/Attestor)

* The WebService constructs the **predicate** with `image_digest`, `stellaops_version`, `license_id`, `policy_digest?` (optional; included when emitting **final reports**), and timestamps.
* It calls the **Signer** (requires **OpTok + PoE**); the Signer verifies **entitlement + scanner image integrity** and returns a **DSSE bundle**.

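To make the two steps above concrete, the sketch below builds a predicate with the listed fields and computes the DSSE pre-authentication encoding (PAE), which is the byte string a DSSE signer signs. The `created_at` field name, the example values, and the payload type are assumptions for illustration; the real signing, entitlement checks, and bundle format belong to the Signer and are not modelled here.

```python
import json


def build_predicate(image_digest, stellaops_version, license_id, created_at, policy_digest=None):
    """Assemble the predicate; policy_digest is only set for final reports."""
    predicate = {
        "image_digest": image_digest,
        "stellaops_version": stellaops_version,
        "license_id": license_id,
        "created_at": created_at,  # hypothetical name for the timestamp field
    }
    if policy_digest is not None:
        predicate["policy_digest"] = policy_digest
    return predicate


def pae(payload_type, payload):
    """DSSE pre-authentication encoding:
    "DSSEv1" SP LEN(type) SP type SP LEN(payload) SP payload
    """
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


predicate = build_predicate(
    image_digest="sha256:" + "0" * 64,   # placeholder digest
    stellaops_version="1.0.0",           # placeholder version
    license_id="LIC-EXAMPLE",            # placeholder license
    created_at="2026-01-19T00:00:00Z",
)
# Canonical-looking JSON (sorted keys, no extra whitespace) keeps the payload stable.
payload = json.dumps(predicate, sort_keys=True, separators=(",", ":")).encode("utf-8")
to_sign = pae("application/vnd.in-toto+json", payload)  # payload type is an assumption
```

Serializing with sorted keys is a design choice in this sketch: it makes the signed bytes reproducible for the same predicate.
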