devops folders consolidate

This commit is contained in:
master
2026-01-25 23:27:41 +02:00
parent 6e687b523a
commit a743bb9a1d
613 changed files with 8611 additions and 41846 deletions

View File

@@ -127,7 +127,6 @@ docker compose -f docker-compose.dev.yaml up -d
- PostgreSQL v16+ (port 5432) - Primary database for all services
- Valkey 8.0 (port 6379) - Cache, DPoP nonces, event streams, rate limiting
- RustFS (port 8080) - S3-compatible object storage for artifacts/SBOMs
- NATS JetStream (port 4222) - Optional transport (only if configured)
- Authority (port 8440) - OAuth2/OIDC authentication
- Signer (port 8441) - Cryptographic signing
- Attestor (port 8442) - in-toto attestation generation
@@ -250,26 +249,6 @@ All services follow this configuration priority (highest to lowest):
}
```
#### NATS Queue Configuration (Optional Alternative Transport)
```json
{
"Scanner": {
"Events": {
"Driver": "nats",
"Dsn": "nats://localhost:4222"
}
},
"Scheduler": {
"Queue": {
"Kind": "Nats",
"Nats": {
"Url": "nats://localhost:4222"
}
}
}
}
```
#### RustFS Configuration (S3-Compatible Object Storage)
@@ -489,25 +468,25 @@ docker network inspect compose_stellaops
}
```
#### 3. NATS Connection Refused
#### 3. Queue Connection Refused
**Error:**
```
NATS connection error: connection refused
Connection error: connection refused
```
**Solution:**
By default, services use **Valkey** for messaging, not NATS. Ensure Valkey is running:
Services use **Valkey** for messaging. Ensure Valkey is running:
```bash
docker compose -f docker-compose.dev.yaml ps valkey
docker compose -f docker-compose.stella-ops.yml ps valkey
# Should show: State = "Up"
# Test connectivity
telnet localhost 6379
```
Update configuration to use Valkey (default):
Configuration should use Valkey:
```json
{
"Scanner": {
@@ -527,22 +506,6 @@ Update configuration to use Valkey (default):
}
```
**If you explicitly want to use NATS** (optional):
```bash
docker compose -f docker-compose.dev.yaml ps nats
# Ensure NATS is running
# Update appsettings.Development.json:
{
"Scanner": {
"Events": {
"Driver": "nats",
"Dsn": "nats://localhost:4222"
}
}
}
```
#### 4. Valkey Connection Refused
**Error:**
@@ -694,7 +657,6 @@ sudo docker compose -f docker-compose.dev.yaml up -d
- Understand PostgreSQL schema isolation (all services use PostgreSQL)
- Learn Valkey streams for event queuing and caching
- Study RustFS S3-compatible object storage
- Optional: NATS JetStream as alternative transport
2. **Week 2: Core Services**
- Deep dive into Scanner architecture (analyzers, workers, caching)
@@ -733,8 +695,8 @@ sudo docker compose -f docker-compose.dev.yaml up -d
```bash
# Start full platform
cd deploy\compose
docker compose -f docker-compose.dev.yaml up -d
cd devops\compose
docker compose -f docker-compose.stella-ops.yml up -d
# Stop a specific service for debugging
docker compose -f docker-compose.dev.yaml stop <service-name>
@@ -771,7 +733,6 @@ dotnet run
| PostgreSQL | 5432 | `localhost:5432` | Primary database (REQUIRED) |
| Valkey | 6379 | `localhost:6379` | Cache/events/queues (REQUIRED) |
| RustFS | 8080 | http://localhost:8080 | S3-compatible storage (REQUIRED) |
| NATS | 4222 | `nats://localhost:4222` | Optional alternative transport |
| **Services** |
| Authority | 8440 | https://localhost:8440 | OAuth2/OIDC auth |
| Signer | 8441 | https://localhost:8441 | Cryptographic signing |

View File

@@ -0,0 +1,95 @@
# ADR-001: Linkset Correlation Algorithm V2
**Status:** Accepted
**Date:** 2026-01-25
**Sprint:** SPRINT_20260125_001_Concelier_linkset_correlation_v2
## Context
The Concelier module's linkset correlation algorithm determines whether multiple vulnerability observations (from different sources like NVD, GitHub Advisories, vendor feeds) refer to the same underlying vulnerability. The V1 algorithm had several critical failure modes:
1. **Alias intersection transitivity failure**: Sources A (CVE-X), B (CVE-X + GHSA-Y), C (GHSA-Y) produced empty intersection despite transitive identity through shared aliases.
2. **Thin source penalty**: A source with zero packages collapsed the entire group's package score to 0, even when other sources shared packages.
3. **False reference conflicts**: Zero reference overlap was treated as a conflict rather than neutral evidence.
4. **Uniform conflict penalties**: All conflicts applied the same -0.1 penalty regardless of severity.
These issues caused both false negatives (failing to link related advisories) and false positives (emitting spurious conflicts).
## Decision
We will replace the V1 intersection-based correlation algorithm with a V2 graph-based approach that:
### 1. Graph-Based Alias Connectivity
Replace intersection-across-all with union-find graph connectivity. Build a bipartite graph (observation ↔ alias nodes) and compute Largest Connected Component (LCC) ratio.
**Rationale**: Transitive relationships are naturally captured by graph connectivity. Three sources with partial alias overlap can still achieve high correlation if they form a connected component.
### 2. Pairwise Package Coverage
Replace intersection-across-all with pairwise coverage scoring. Score is positive when any pair shares a package key, even if some sources have no packages.
**Rationale**: "Thin" sources (e.g., vendor advisories with only CVE IDs) should not penalize correlation when other sources provide package evidence.
### 3. Neutral Reference Scoring
Zero reference overlap returns 0.5 (neutral) instead of emitting a conflict. Reserve conflicts for true contradictions.
**Rationale**: Disjoint reference sets indicate lack of supporting evidence, not contradiction.
### 4. Typed Conflict Severities
Replace uniform -0.1 penalty with severity-based penalties:
| Conflict Type | Severity | Penalty |
|---------------|----------|---------|
| Distinct CVEs in cluster | Hard | -0.4 |
| Disjoint version ranges | Hard | -0.3 |
| Overlapping divergent ranges | Soft | -0.05 |
| CVSS/severity mismatch | Soft | -0.05 |
| Alias inconsistency | Soft | -0.1 |
| Zero reference overlap | None | 0 |
**Rationale**: Hard conflicts (distinct identities) should heavily penalize confidence. Soft conflicts (metadata differences) may indicate data quality issues but not identity mismatch.
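The `ConflictSeverity` enum referenced in the implementation notes below captures these levels. The following C# sketch shows one way the reason-to-penalty mapping could be encoded; the member names and the lookup helper are illustrative assumptions, not the shipped code.
```csharp
using System.Collections.Generic;

// Sketch only: the enum name matches the implementation notes below, but the member
// names and this penalty table are illustrative assumptions, not the shipped code.
public enum ConflictSeverity { Info, Soft, Hard }

public static class ConflictPenalties
{
    // Mirrors the table above; penalties are subtracted from the confidence score.
    private static readonly IReadOnlyDictionary<string, (ConflictSeverity Severity, double Penalty)> ByReason =
        new Dictionary<string, (ConflictSeverity, double)>
        {
            ["distinct-cves"]             = (ConflictSeverity.Hard, 0.40),
            ["disjoint-version-ranges"]   = (ConflictSeverity.Hard, 0.30),
            ["alias-inconsistency"]       = (ConflictSeverity.Soft, 0.10),
            ["affected-range-divergence"] = (ConflictSeverity.Soft, 0.05),
            ["severity-mismatch"]         = (ConflictSeverity.Soft, 0.05),
            ["reference-clash"]           = (ConflictSeverity.Info, 0.00),
            ["metadata-gap"]              = (ConflictSeverity.Info, 0.00),
        };

    public static (ConflictSeverity Severity, double Penalty) Lookup(string reason) =>
        ByReason.TryGetValue(reason, out var entry) ? entry : (ConflictSeverity.Info, 0.00);
}
```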
### 5. Additional Correlation Signals
Add highly discriminative signals:
- **Patch lineage** (0.10 weight): Shared commit SHA indicates same fix
- **Version compatibility** (0.10 weight): Classify range relationships
- **IDF weighting**: Rare package matches weighted higher than common packages
### 6. V1/V2 Switchable Interface
Provide `ILinksetCorrelationService` with configurable version selection to enable gradual rollout and A/B testing.
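A minimal sketch of what such a switchable service could look like; the real `ILinksetCorrelationService` lives in `StellaOps.Concelier.Core`, and its exact shape, along with the placeholder record types here, is assumed for illustration.
```csharp
using System;
using System.Collections.Generic;

// Placeholder records so the sketch is self-contained; the real types differ.
public sealed record AdvisoryObservation(string ObservationId);
public sealed record LinksetCorrelationResult(double Confidence, IReadOnlyList<string> ConflictReasons);

public interface ILinksetCorrelationService
{
    LinksetCorrelationResult Compute(IReadOnlyList<AdvisoryObservation> observations);
}

// Picks v1 or v2 based on configuration (concelier:correlation:version).
public sealed class SwitchableLinksetCorrelationService : ILinksetCorrelationService
{
    private readonly ILinksetCorrelationService _v1;
    private readonly ILinksetCorrelationService _v2;
    private readonly string _version;

    public SwitchableLinksetCorrelationService(
        ILinksetCorrelationService v1, ILinksetCorrelationService v2, string version)
        => (_v1, _v2, _version) = (v1, v2, version);

    public LinksetCorrelationResult Compute(IReadOnlyList<AdvisoryObservation> observations)
        => string.Equals(_version, "v2", StringComparison.OrdinalIgnoreCase)
            ? _v2.Compute(observations)
            : _v1.Compute(observations);
}
```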
## Consequences
### Positive
- Eliminates false negatives from transitive alias chains
- Eliminates false negatives from thin sources
- Reduces false positive conflicts from disjoint references
- Enables fine-grained conflict severity handling by downstream policy
- Adds discriminative signals (patch lineage) that differentiate from commodity linkers
### Negative
- Changes correlation weights, affecting existing linkset confidence scores
- Requires recomputation of existing linksets during migration
- Adds Valkey dependency for IDF caching (mitigated by graceful fallback)
### Neutral
- Algorithm complexity increases but remains O(n²) in observations
- Determinism preserved through fixed scorer order and tie-breakers
## Implementation
- **Core algorithm**: `LinksetCorrelationV2.cs`
- **Service interface**: `ILinksetCorrelationService.cs`
- **Service implementation**: `LinksetCorrelationService.cs`
- **Model extension**: `ConflictSeverity` enum in `AdvisoryLinkset.cs`
- **IDF caching**: `ValkeyPackageIdfService.cs`
- **Tests**: 27 V2 tests + 18 IDF tests
## References
- Sprint: `docs/implplan/SPRINT_20260125_001_Concelier_linkset_correlation_v2.md`
- Algorithm documentation: `docs/modules/concelier/linkset-correlation-v2.md`
- Architecture section: `docs/modules/concelier/architecture.md` § 5.2
- Conflict resolution runbook: `docs/modules/concelier/operations/conflict-resolution.md` § 5.1

View File

@@ -305,11 +305,33 @@ public interface IFeedConnector {
### 5.2 Linkset correlation
1. **Queue** observation deltas enqueue correlation jobs keyed by `(tenant, vulnerabilityId, productKey)` candidates derived from identifiers + alias graph.
2. **Canonical grouping** builder resolves aliases using Conceliers alias store and deterministic heuristics (vendor > distro > cert), deriving normalized product keys (purl preferred) and confidence scores.
2. **Canonical grouping** builder resolves aliases using Concelier's alias store and deterministic heuristics (vendor > distro > cert), deriving normalized product keys (purl preferred) and confidence scores.
3. **Linkset materialization** `advisory_linksets` documents store sorted observation references, alias sets, product keys, range metadata, and conflict payloads. Writes are idempotent; unchanged hashes skip updates.
4. **Conflict detection** builder emits structured conflicts (`severity-mismatch`, `affected-range-divergence`, `reference-clash`, `alias-inconsistency`, `metadata-gap`). Conflicts carry per-observation values for explainability.
4. **Conflict detection** builder emits structured conflicts with typed severities (Hard/Soft/Info). Conflicts carry per-observation values for explainability.
5. **Event emission** `advisory.linkset.updated@1` summarizes deltas (`added`, `removed`, `changed` observation IDs, conflict updates, confidence changes) and includes a canonical hash for replay validation.
#### Correlation Algorithm (v2)
The v2 correlation algorithm (see `linkset-correlation-v2.md`) replaces intersection-based scoring with graph-based connectivity and adds new signals:
| Signal | Weight | Description |
|--------|--------|-------------|
| Alias connectivity | 0.30 | LCC ratio from bipartite graph (transitive bridging) |
| Alias authority | 0.10 | Scope hierarchy (CVE > GHSA > VND > DST) |
| Package coverage | 0.20 | Pairwise + IDF-weighted overlap |
| Version compatibility | 0.10 | Equivalent/Overlapping/Disjoint classification |
| CPE match | 0.10 | Exact or vendor/product overlap |
| Patch lineage | 0.10 | Shared commit SHA from fix references |
| Reference overlap | 0.05 | Positive-only URL matching |
| Freshness | 0.05 | Fetch timestamp spread |
Conflict penalties are typed:
- **Hard** (`distinct-cves`, `disjoint-version-ranges`): -0.30 to -0.40
- **Soft** (`affected-range-divergence`, `severity-mismatch`): -0.05 to -0.10
- **Info** (`reference-clash` on simple disjoint sets): no penalty
Configure via `concelier:correlation:version` (v1 or v2) and optional weight overrides.
### 5.3 Event contract
| Event | Schema | Notes |
@@ -317,7 +339,7 @@ public interface IFeedConnector {
| `advisory.observation.updated@1` | `events/advisory.observation.updated@1.json` | Fired on new or superseded observations. Includes `observationId`, source metadata, `linksetSummary` (aliases/purls), supersedes pointer (if any), SHA-256 hash, and `traceId`. |
| `advisory.linkset.updated@1` | `events/advisory.linkset.updated@1.json` | Fired when correlation changes. Includes `linksetId`, `key{vulnerabilityId, productKey, confidence}`, observation deltas, conflicts, `updatedAt`, and canonical hash. |
Events are emitted via NATS (primary) and Valkey Stream (fallback). Consumers acknowledge idempotently using the hash; duplicates are safe. Offline Kit captures both topics during bundle creation for air-gapped replay.
Events are emitted via Valkey Streams. Consumers acknowledge idempotently using the hash; duplicates are safe. Offline Kit captures event streams during bundle creation for air-gapped replay.
---

View File

@@ -0,0 +1,379 @@
# CONCELIER-LNM-26-001 · Linkset Correlation Rules (v2)
> Supersedes `linkset-correlation-21-002.md` for new linkset builds.
> V1 linksets remain valid; the migration job will recompute confidence using the v2 algorithm.
Purpose: Address critical failure modes in v1 correlation (intersection transitivity, false conflict emission) and introduce more discriminative signals (patch lineage, version compatibility, IDF-weighted package matching).
---
## Scope
- Applies to linksets produced from `advisory_observations` (LNM v2).
- Correlation is aggregation-only: no value synthesis or merge; emit conflicts instead of collapsing fields.
- Output persists in `advisory_linksets` and drives `advisory.linkset.updated@1` events.
- Maintains determinism, offline posture, and LNM/AOC contracts.
---
## Key Changes from v1
| Aspect | v1 Behavior | v2 Behavior |
|--------|-------------|-------------|
| Alias matching | Intersection across all inputs | Graph connectivity (LCC ratio) |
| PURL matching | Intersection across all inputs | Pairwise coverage + IDF weighting |
| Reference clash | Emitted on zero overlap | Only on true URL contradictions |
| Conflict penalty | Single -0.1 for any conflict | Typed severities with per-reason penalties |
| Patch lineage | Not used | Top-tier signal (weight 0.10; score 1.0 for exact SHA match) |
| Version ranges | Divergence noted only | Classified (Equivalent/Overlapping/Disjoint) |
---
## Deterministic Confidence Calculation (0-1)
### Signal Weights
```
confidence = clamp(
    0.30 * alias_connectivity +
    0.10 * alias_authority +
    0.20 * package_coverage +
    0.10 * version_compatibility +
    0.10 * cpe_match +
    0.10 * patch_lineage +
    0.05 * reference_overlap +
    0.05 * freshness_score
) - typed_penalty
```
### Signal Definitions
#### `alias_connectivity` (weight: 0.30)
**Graph-based scoring** replacing intersection-across-all.
1. Build bipartite graph: observation nodes ↔ alias nodes
2. Connect observations that share any alias (transitive bridging)
3. Compute LCC (largest connected component) ratio: `|LCC| / N`
| Scenario | Score |
|----------|-------|
| All observations in single connected component | 1.0 |
| 80% of observations connected | 0.8 |
| No alias overlap at all | 0.0 |
**Why this matters**: Sources A (CVE-X), B (CVE-X + GHSA-Y), C (GHSA-Y) now correctly correlate via transitive bridging, whereas v1 produced score = 0.
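A compact sketch of this scoring, assuming observations are represented by their alias sets; the union-find structure and type names are illustrative, not the production `LinksetCorrelationV2` code.
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AliasConnectivityExample
{
    // Observations that share any alias are unioned; the score is |largest component| / N.
    public static double Score(IReadOnlyList<IReadOnlyCollection<string>> observationAliases)
    {
        int n = observationAliases.Count;
        if (n == 0) return 1.0;

        var parent = Enumerable.Range(0, n).ToArray();
        int Find(int x) => parent[x] == x ? x : parent[x] = Find(parent[x]);
        void Union(int a, int b) => parent[Find(a)] = Find(b);

        // Bipartite observation↔alias graph collapsed: remember the first observation
        // seen for each alias and union later observations into it.
        var firstSeen = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < n; i++)
            foreach (var alias in observationAliases[i])
                if (firstSeen.TryGetValue(alias, out var j)) Union(i, j);
                else firstSeen[alias] = i;

        int largest = Enumerable.Range(0, n).GroupBy(Find).Max(g => g.Count());
        return (double)largest / n;
    }
}
```
With A = {CVE-X}, B = {CVE-X, GHSA-Y}, C = {GHSA-Y}, all three observations collapse into one component, so the score is 1.0.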
#### `alias_authority` (weight: 0.10)
Scope-based weighting using existing canonical key prefixes:
| Alias Type | Authority Score |
|------------|-----------------|
| CVE-* (global) | 1.0 |
| GHSA-* (ecosystem) | 0.8 |
| Vendor IDs (RHSA, MSRC, CISCO, VMSA) | 0.6 |
| Distribution IDs (DSA, USN, SUSE) | 0.4 |
| Unknown scheme | 0.2 |
#### `package_coverage` (weight: 0.20)
**Pairwise + IDF weighting** replacing intersection-across-all.
1. Extract package keys (PURL without version) from each observation
2. For each package key, compute IDF weight: `log(N / (1 + df))` where N = corpus size, df = observations containing package
3. Score = weighted overlap ratio across pairs
| Scenario | Score |
|----------|-------|
| All sources share same rare package | ~1.0 |
| All sources share common package (lodash) | ~0.6 |
| One "thin" source with no packages | Other sources still score > 0 |
| No package overlap | 0.0 |
**IDF fallback**: When the IDF cache is unavailable, uniform weights (1.0) are used.
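An illustrative sketch of the coverage score, reading "weighted overlap ratio across pairs" as the best IDF-weighted Jaccard ratio over any pair; the exact aggregation in `LinksetCorrelationV2` may differ.
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PackageCoverageExample
{
    // Sources with no packages are skipped rather than zeroing the score;
    // shared keys are weighted by IDF when available, uniformly otherwise.
    public static double Score(
        IReadOnlyList<IReadOnlySet<string>> observationPackages,
        IReadOnlyDictionary<string, double>? idf = null)
    {
        var sets = observationPackages.Where(p => p.Count > 0).ToList();
        if (sets.Count < 2) return 0.0; // need at least one pair with package evidence

        double Weight(string key) => idf is not null && idf.TryGetValue(key, out var w) ? w : 1.0;

        double best = 0.0;
        for (int i = 0; i < sets.Count; i++)
            for (int j = i + 1; j < sets.Count; j++)
            {
                double shared = sets[i].Intersect(sets[j]).Sum(Weight);
                double union = sets[i].Union(sets[j]).Sum(Weight);
                if (union > 0) best = Math.Max(best, shared / union);
            }
        return best;
    }
}
```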
#### `version_compatibility` (weight: 0.10)
Classifies version relationships per shared package:
| Relation | Score | Conflict |
|----------|-------|----------|
| **Equivalent**: ranges normalize identically | 1.0 | None |
| **Overlapping**: non-empty intersection | 0.6 | Soft (`affected-range-divergence`) |
| **Disjoint**: no intersection | 0.0 | Hard (`disjoint-version-ranges`) |
| **Unknown**: parse failure | 0.5 | None |
Uses `SemanticVersionRangeResolver` for semver; delegates to ecosystem-specific comparers for rpm EVR, dpkg, apk.
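As a simplified sketch, the classification reduces to interval arithmetic once ranges have been normalized; real ranges carry multiple segments and ecosystem-specific ordering, so treat this only as the decision shape.
```csharp
using System;

public enum VersionRelation { Equivalent, Overlapping, Disjoint, Unknown }

public static class VersionRelationExample
{
    // Works on already-normalized half-open intervals [min, max); real ranges are
    // resolved by SemanticVersionRangeResolver / ecosystem comparers first.
    public static VersionRelation Classify(Version? minA, Version? maxA, Version? minB, Version? maxB)
    {
        if (minA is null || maxA is null || minB is null || maxB is null)
            return VersionRelation.Unknown;        // parse failure → score 0.5, no conflict
        if (minA == minB && maxA == maxB)
            return VersionRelation.Equivalent;     // score 1.0
        var lo = minA > minB ? minA : minB;        // intersection lower bound
        var hi = maxA < maxB ? maxA : maxB;        // intersection upper bound
        return lo < hi
            ? VersionRelation.Overlapping          // score 0.6, soft conflict
            : VersionRelation.Disjoint;            // score 0.0, hard conflict
    }
}
```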
#### `cpe_match` (weight: 0.10)
Unchanged from v1:
- Exact CPE overlap: 1.0
- Same vendor/product: 0.5
- No match: 0.0
#### `patch_lineage` (weight: 0.10)
**New signal**: correlation via shared fix commits.
1. Extract patch references from observation references (type: `patch`, `fix`, `commit`)
2. Normalize to commit SHAs using `PatchLineageNormalizer`
3. Any pairwise SHA match: 1.0; otherwise 0.0
**Why this matters**: "These advisories fix the same code" is high-confidence evidence most platforms lack.
#### `reference_overlap` (weight: 0.05)
**Positive-only** (no conflict on zero overlap):
1. Normalize URLs (lowercase host/path, strip tracking parameters, normalize scheme to `https://`)
2. Compute max pairwise overlap ratio
3. Map to score: `0.5 + (overlap * 0.5)`
| Scenario | Score |
|----------|-------|
| 100% URL overlap | 1.0 |
| 50% URL overlap | 0.75 |
| Zero URL overlap | 0.5 (neutral) |
**No `reference-clash` emission** for simple disjoint sets.
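A sketch of the positive-only scoring; the normalization shown (drop query/fragment, force `https`, lowercase host and path) is an assumption about what "strip tracking params" covers.
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReferenceOverlapExample
{
    public static string Normalize(string url)
    {
        var uri = new Uri(url, UriKind.Absolute);
        // Drop query/fragment and force a single scheme plus lowercase host/path.
        return $"https://{uri.Host.ToLowerInvariant()}{uri.AbsolutePath.TrimEnd('/').ToLowerInvariant()}";
    }

    public static double Score(IReadOnlyList<IReadOnlyCollection<string>> observationReferences)
    {
        var sets = observationReferences
            .Select(refs => refs.Select(Normalize).ToHashSet())
            .ToList();

        double maxOverlap = 0.0;
        for (int i = 0; i < sets.Count; i++)
            for (int j = i + 1; j < sets.Count; j++)
            {
                int union = sets[i].Union(sets[j]).Count();
                if (union == 0) continue;
                maxOverlap = Math.Max(maxOverlap, (double)sets[i].Intersect(sets[j]).Count() / union);
            }
        return 0.5 + (maxOverlap * 0.5); // zero overlap → 0.5 (neutral), full overlap → 1.0
    }
}
```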
#### `freshness_score` (weight: 0.05)
Unchanged from v1:
- Spread ≤ 48h: 1.0
- Spread ≥ 14d: 0.0
- Linear decay between
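A small sketch of the decay, with the single-observation case assumed to score 1.0:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FreshnessExample
{
    // Spread of fetch timestamps mapped linearly from 1.0 (≤ 48 h) down to 0.0 (≥ 14 d).
    public static double Score(IReadOnlyList<DateTimeOffset> fetchedAt)
    {
        if (fetchedAt.Count < 2) return 1.0;   // assumption: a single source has no spread
        var spread = fetchedAt.Max() - fetchedAt.Min();
        var lower = TimeSpan.FromHours(48);
        var upper = TimeSpan.FromDays(14);
        if (spread <= lower) return 1.0;
        if (spread >= upper) return 0.0;
        return 1.0 - (spread - lower).TotalHours / (upper - lower).TotalHours;
    }
}
```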
---
## Conflict Emission (Typed Severities)
### Severity Levels
| Severity | Penalty Range | Meaning |
|----------|---------------|---------|
| **Hard** | 0.30 - 0.40 | Significant disagreement; likely prevents high-confidence linking |
| **Soft** | 0.05 - 0.10 | Minor disagreement; link with reduced confidence |
| **Info** | 0.00 | Informational; no penalty |
### Conflict Types and Penalties
| Conflict Reason | Severity | Penalty | Trigger Condition |
|-----------------|----------|---------|-------------------|
| `distinct-cves` | Hard | -0.40 | Two different CVE-* identifiers in cluster |
| `disjoint-version-ranges` | Hard | -0.30 | Same package key, ranges have no intersection |
| `alias-inconsistency` | Soft | -0.10 | Disconnected alias graph (but no CVE conflict) |
| `affected-range-divergence` | Soft | -0.05 | Ranges overlap but differ |
| `severity-mismatch` | Soft | -0.05 | CVSS base score delta > 1.0 |
| `reference-clash` | Info | 0.00 | Reserved for true contradictions only |
| `metadata-gap` | Info | 0.00 | Required provenance missing |
### Penalty Calculation
```
typed_penalty = min(0.6, sum(penalty_per_conflict))
```
Saturates at 0.6 to prevent complete collapse; minimum confidence = 0.1 when any evidence exists.
### Conflict Record Shape
```json
{
  "field": "aliases",
  "reason": "distinct-cves",
  "severity": "Hard",
  "values": ["nvd:CVE-2025-1234", "ghsa:CVE-2025-5678"],
  "sourceIds": ["nvd", "ghsa"]
}
```
---
## Linkset Output Shape
Additions from v1:
```json
{
  "key": {
    "vulnerabilityId": "CVE-2025-1234",
    "productKey": "pkg:npm/lodash",
    "confidence": 0.85
  },
  "conflicts": [
    {
      "field": "affected.versions[pkg:npm/lodash]",
      "reason": "affected-range-divergence",
      "severity": "Soft",
      "values": ["nvd:>=4.0.0,<4.17.21", "ghsa:>=4.0.0,<4.18.0"],
      "sourceIds": ["nvd", "ghsa"]
    }
  ],
  "signalScores": {
    "aliasConnectivity": 1.0,
    "aliasAuthority": 1.0,
    "packageCoverage": 0.85,
    "versionCompatibility": 0.6,
    "cpeMatch": 0.5,
    "patchLineage": 1.0,
    "referenceOverlap": 0.75,
    "freshness": 1.0
  },
  "provenance": {
    "observationHashes": ["sha256:abc...", "sha256:def..."],
    "toolVersion": "concelier/2.0.0",
    "correlationVersion": "v2"
  }
}
```
---
## Algorithm Pseudocode
```
function Compute(observations):
    if observations.empty:
        return (confidence=1.0, conflicts=[])
    conflicts = []

    # 1. Alias connectivity (graph-based)
    aliasGraph = buildBipartiteGraph(observations)
    aliasConnectivity = LCC(aliasGraph) / observations.count
    if hasDistinctCVEs(aliasGraph):
        conflicts.add(HardConflict("distinct-cves"))
    elif aliasConnectivity < 1.0:
        conflicts.add(SoftConflict("alias-inconsistency"))

    # 2. Alias authority
    aliasAuthority = maxAuthorityScore(observations)

    # 3. Package coverage (pairwise + IDF)
    packageCoverage = computeIDFWeightedCoverage(observations)

    # 4. Version compatibility
    for sharedPackage in findSharedPackages(observations):
        relation = classifyVersionRelation(observations, sharedPackage)
        if relation == Disjoint:
            conflicts.add(HardConflict("disjoint-version-ranges"))
        elif relation == Overlapping:
            conflicts.add(SoftConflict("affected-range-divergence"))
    versionScore = averageRelationScore(observations)

    # 5. CPE match
    cpeScore = computeCpeOverlap(observations)

    # 6. Patch lineage
    patchScore = 1.0 if anyPairSharesCommitSHA(observations) else 0.0

    # 7. Reference overlap (positive-only)
    referenceScore = 0.5 + (maxPairwiseURLOverlap(observations) * 0.5)

    # 8. Freshness
    freshnessScore = computeFreshness(observations)

    # Calculate weighted sum
    baseConfidence = (
        0.30 * aliasConnectivity +
        0.10 * aliasAuthority +
        0.20 * packageCoverage +
        0.10 * versionScore +
        0.10 * cpeScore +
        0.10 * patchScore +
        0.05 * referenceScore +
        0.05 * freshnessScore
    )

    # Apply typed penalties
    penalty = min(0.6, sum(conflict.penalty for conflict in conflicts))
    finalConfidence = max(0.1, baseConfidence - penalty)

    return (confidence=finalConfidence, conflicts=dedupe(conflicts))
```
---
## Implementation
### Code Locations
| Component | Path |
|-----------|------|
| V2 Algorithm | `src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/LinksetCorrelationV2.cs` |
| Conflict Model | `src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/AdvisoryLinkset.cs` |
| Patch Normalizer | `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Identity/Normalizers/PatchLineageNormalizer.cs` |
| Version Resolver | `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Comparers/SemanticVersionRangeResolver.cs` |
### Configuration
```yaml
concelier:
  correlation:
    version: v2 # v1 | v2
    weights:
      aliasConnectivity: 0.30
      aliasAuthority: 0.10
      packageCoverage: 0.20
      versionCompatibility: 0.10
      cpeMatch: 0.10
      patchLineage: 0.10
      referenceOverlap: 0.05
      freshness: 0.05
    idf:
      enabled: true
      cacheKey: "concelier:package:idf"
      refreshIntervalMinutes: 60
    textSimilarity:
      enabled: false # Phase 3
```
---
## Telemetry
| Instrument | Type | Tags | Purpose |
|------------|------|------|---------|
| `concelier.linkset.confidence` | Histogram | `version` | Confidence score distribution |
| `concelier.linkset.conflicts_total` | Counter | `reason`, `severity` | Conflict counts by type |
| `concelier.linkset.signal_score` | Histogram | `signal` | Per-signal score distribution |
| `concelier.linkset.patch_lineage_hits` | Counter | - | Patch SHA matches found |
| `concelier.linkset.idf_cache_hit` | Counter | `hit` | IDF cache effectiveness |
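A sketch of how the first two instruments could be registered with `System.Diagnostics.Metrics`; the meter name and class shape are assumptions, and the remaining instruments follow the same pattern.
```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Illustrative wiring only: meter name and class shape are assumptions.
public sealed class LinksetCorrelationMetrics
{
    private readonly Histogram<double> _confidence;
    private readonly Counter<long> _conflicts;

    public LinksetCorrelationMetrics(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("StellaOps.Concelier.Linksets");
        _confidence = meter.CreateHistogram<double>("concelier.linkset.confidence");
        _conflicts = meter.CreateCounter<long>("concelier.linkset.conflicts_total");
    }

    public void RecordConfidence(double confidence, string correlationVersion) =>
        _confidence.Record(confidence, new KeyValuePair<string, object?>("version", correlationVersion));

    public void CountConflict(string reason, string severity) =>
        _conflicts.Add(1,
            new KeyValuePair<string, object?>("reason", reason),
            new KeyValuePair<string, object?>("severity", severity));
}
```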
---
## Migration
### Recompute Job
```bash
stella db linksets recompute --correlation-version v2 --batch-size 1000
```
Recomputes confidence for existing linksets using the v2 algorithm. It does not modify observation data.
### Rollback
Set `concelier:correlation:version: v1` to revert to intersection-based scoring.
---
## Fixtures
- `docs/modules/concelier/samples/linkset-v2-transitive-bridge.json`: Three-source transitive bridging (A↔B↔C) demonstrating graph connectivity.
- `docs/modules/concelier/samples/linkset-v2-patch-match.json`: Two-source correlation via shared commit SHA.
- `docs/modules/concelier/samples/linkset-v2-hard-conflict.json`: Distinct CVEs in cluster triggering hard penalty.
All fixtures use ASCII ordering and ISO-8601 UTC timestamps.
---
## Change Control
- V2 is add-only relative to v1 output schema.
- Signal weight adjustments require sprint note but not schema version bump.
- New conflict reasons require `advisory.linkset.updated@2` event schema and doc update.
- Removal of a signal requires deprecation period and migration guidance.

View File

@@ -81,6 +81,26 @@ Expect all logs at `Information`. Ensure OTEL exporters include the scope `Stell
## 5. Conflict Classification Matrix
### 5.1 Linkset Conflicts (v2 Correlation)
Linkset conflicts now carry typed severities that affect confidence scoring:
| Severity | Penalty | Conflicts | Triage Priority |
|----------|---------|-----------|-----------------|
| **Hard** | -0.30 to -0.40 | `distinct-cves`, `disjoint-version-ranges` | High - investigate immediately |
| **Soft** | -0.05 to -0.10 | `affected-range-divergence`, `severity-mismatch`, `alias-inconsistency` | Medium - review in batch |
| **Info** | 0.00 | `metadata-gap`, `reference-clash` (disjoint only) | Low - informational |
| Conflict Reason | Severity | Likely Cause | Immediate Action |
|-----------------|----------|--------------|------------------|
| `distinct-cves` | Hard | Two different CVE-* IDs in same linkset cluster | Investigate alias mappings; likely compound advisory or incorrect aliasing |
| `disjoint-version-ranges` | Hard | Same package, no version overlap between sources | Check if distro backport; verify connector range parsing |
| `affected-range-divergence` | Soft | Ranges overlap but differ | Often benign (distro vs upstream versioning); monitor trends |
| `severity-mismatch` | Soft | CVSS scores differ by > 1.0 | Normal for cross-source; freshest source typically wins |
| `alias-inconsistency` | Soft | Disconnected alias graph (no shared CVE) | Review alias extraction; may indicate unrelated advisories grouped |
### 5.2 Merge Conflicts (Legacy)
| Signal | Likely Cause | Immediate Action |
|--------|--------------|------------------|
| `reason="mismatch"` with `type="severity"` | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. |

View File

@@ -16,7 +16,7 @@ authn, CDN fronting, and the recurring sync pipeline that keeps mirror bundles c
that hold `concelier` JSON bundles and `excititor` VEX exports.
- **Persistent volumes** storage for Concelier job metadata and mirror export trees.
For Helm, provision PVCs (`concelier-mirror-jobs`, `concelier-mirror-exports`,
`excititor-mirror-exports`, `mirror-mongo-data`, `mirror-minio-data`) before rollout.
`excititor-mirror-exports`) before rollout.
### 1.1 Service configuration quick reference

View File

@@ -2,8 +2,8 @@
> Aligned with Epic 6 – Vulnerability Explorer and Epic 10 – Export Center.
> **Scope.** Implementation-ready architecture for the **Scanner** subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per-layer caching, three-way diffs, artifact catalog (RustFS default + PostgreSQL, S3-compatible fallback), attestation hand-off, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).
> **Related:** `docs/modules/scanner/operations/ai-code-guard.md`
> **Scope.** Implementation-ready architecture for the **Scanner** subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per-layer caching, three-way diffs, artifact catalog (RustFS default + PostgreSQL, S3-compatible fallback), attestation hand-off, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).
> **Related:** `docs/modules/scanner/operations/ai-code-guard.md`
---
@@ -14,14 +14,14 @@
**Boundaries.**
* Scanner **does not** produce PASS/FAIL. The backend (Policy + Excititor + Concelier) decides presentation and verdicts.
* Scanner **does not** keep third-party SBOM warehouses. It may **bind** to existing attestations for exact hashes.
* Core analyzers are **deterministic** (no fuzzy identity). Optional heuristic plug-ins (e.g., patch-presence) run under explicit flags and never contaminate the core SBOM.
SBOM dependency reachability inference uses dependency graphs to reduce false positives and
apply reachability-aware severity adjustments. See `src/Scanner/docs/sbom-reachability-filtering.md`
for policy configuration and reporting expectations.
---
* Scanner **does not** keep third-party SBOM warehouses. It may **bind** to existing attestations for exact hashes.
* Core analyzers are **deterministic** (no fuzzy identity). Optional heuristic plug-ins (e.g., patch-presence) run under explicit flags and never contaminate the core SBOM.
SBOM dependency reachability inference uses dependency graphs to reduce false positives and
apply reachability-aware severity adjustments. See `src/Scanner/docs/sbom-reachability-filtering.md`
for policy configuration and reporting expectations.
---
## 1) Solution & project layout
@@ -98,34 +98,27 @@ CLI usage: `stella scan --semantic <image>` enables semantic analysis in output.
- **Hybrid attestation**: emit **graph-level DSSE** for every `richgraph-v1` (mandatory) and optional **edge-bundle DSSE** (≤512 edges) for runtime/init-root/contested edges or third-party provenance. Publish graph DSSE digests to Rekor by default; edge-bundle Rekor publish is policy-driven. CAS layout: `cas://reachability/graphs/{blake3}` for graph body, `.../{blake3}.dsse` for envelope, and `cas://reachability/edges/{graph_hash}/{bundle_id}[.dsse]` for bundles. Deterministic ordering before hashing/signing is required.
- **Deterministic call-graph manifest**: capture analyzer versions, feed hashes, toolchain digests, and flags in a manifest stored alongside `richgraph-v1`; replaying with the same manifest MUST yield identical node/edge sets and hashes (see `docs/modules/reach-graph/guides/lead.md`).
### 1.1 Queue backbone (Valkey / NATS)
### 1.1 Queue backbone (Valkey Streams)
`StellaOps.Scanner.Queue` exposes a transport-agnostic contract (`IScanQueue`/`IScanQueueLease`) used by the WebService producer and Worker consumers. Sprint 9 introduces two first-party transports:
`StellaOps.Scanner.Queue` exposes a transport-agnostic contract (`IScanQueue`/`IScanQueueLease`) used by the WebService producer and Worker consumers.
- **Valkey Streams** (default). Uses consumer groups, deterministic idempotency keys (`scanner:jobs:idemp:*`), and supports lease claim (`XCLAIM`), renewal, exponential-backoff retries, and a `scanner:jobs:dead` stream for exhausted attempts.
- **NATS JetStream**. Provisions the `SCANNER_JOBS` work-queue stream + durable consumer `scanner-workers`, publishes with `MsgId` for dedupe, applies backoff via `NAK` delays, and routes dead-lettered jobs to `SCANNER_JOBS_DEAD`.
**Valkey Streams** is the standard transport. Uses consumer groups, deterministic idempotency keys (`scanner:jobs:idemp:*`), and supports lease claim (`XCLAIM`), renewal, exponential-backoff retries, and a `scanner:jobs:dead` stream for exhausted attempts.
Metrics are emitted via `Meter` counters (`scanner_queue_enqueued_total`, `scanner_queue_retry_total`, `scanner_queue_deadletter_total`), and `ScannerQueueHealthCheck` pings the active backend (Valkey `PING`, NATS `PING`). Configuration is bound from `scanner.queue`:
Metrics are emitted via `Meter` counters (`scanner_queue_enqueued_total`, `scanner_queue_retry_total`, `scanner_queue_deadletter_total`), and `ScannerQueueHealthCheck` pings the Valkey backend. Configuration is bound from `scanner.queue`:
```yaml
scanner:
  queue:
    kind: valkey # or nats (valkey uses redis:// protocol)
    kind: valkey
    valkey:
      connectionString: "redis://queue:6379/0"
      connectionString: "valkey://valkey:6379/0"
      streamName: "scanner:jobs"
    nats:
      url: "nats://queue:4222"
      stream: "SCANNER_JOBS"
      subject: "scanner.jobs"
      durableConsumer: "scanner-workers"
      deadLetterSubject: "scanner.jobs.dead"
    maxDeliveryAttempts: 5
    retryInitialBackoff: 00:00:05
    retryMaxBackoff: 00:02:00
```
The DI extension (`AddScannerQueue`) wires the selected transport, so future additions (e.g., RabbitMQ) only implement the same contract and register.
The DI extension (`AddScannerQueue`) wires the transport.
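For orientation, a minimal sketch of the stream pattern using StackExchange.Redis (Valkey speaks the Redis protocol); names mirror the documentation above, but the real transport also adds idempotency checks, lease renewal via `XCLAIM`, retries with backoff, and the dead-letter stream.
```csharp
// Minimal sketch only; payload fields and the connection string are illustrative.
using StackExchange.Redis;

var mux = await ConnectionMultiplexer.ConnectAsync("valkey:6379");
var db = mux.GetDatabase();

// One-time setup: create the consumer group (handle "group already exists" in real code).
await db.StreamCreateConsumerGroupAsync("scanner:jobs", "scanner-workers", StreamPosition.Beginning);

// Producer (WebService): enqueue a job as stream fields.
await db.StreamAddAsync("scanner:jobs", new NameValueEntry[]
{
    new("jobId", "scan-123"),
    new("imageDigest", "sha256:deadbeef"),
});

// Consumer (Worker): lease one new job for this consumer within the group.
var entries = await db.StreamReadGroupAsync(
    "scanner:jobs", "scanner-workers", "worker-1", StreamPosition.NewMessages, count: 1);

foreach (var entry in entries)
{
    // ... process the job ...
    await db.StreamAcknowledgeAsync("scanner:jobs", "scanner-workers", entry.Id);
}
```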
**Runtime form-factor:** two deployables
@@ -137,7 +130,7 @@ The DI extension (`AddScannerQueue`) wires the selected transport, so future add
## 2) External dependencies
* **OCI registry** with **Referrers API** (discover attached SBOMs/signatures).
* **RustFS** (default, offline-first) for SBOM artifacts; optional S3/MinIO compatibility retained for migration; **Object Lock** semantics emulated via retention headers; **ILM** for TTL.
* **RustFS** (default, offline-first) for SBOM artifacts; S3-compatible interface with **Object Lock** semantics emulated via retention headers; **ILM** for TTL.
* **PostgreSQL** for catalog, job state, diffs, ILM rules.
* **Queue** (Valkey Streams/NATS/RabbitMQ).
* **Authority** (onâ€prem OIDC) for **OpToks** (DPoP/mTLS).
@@ -206,9 +199,7 @@ attest/<artifactSha256>.dsse.json # DSSE bundle (cert chain + Rekor
RustFS exposes a deterministic HTTP API (`PUT|GET|DELETE /api/v1/buckets/{bucket}/objects/{key}`).
Scanner clients tag immutable uploads with `X-RustFS-Immutable: true` and, when retention applies,
`X-RustFS-Retain-Seconds: <ttlSeconds>`. Additional headers can be injected via
`scanner.artifactStore.headers` to support custom auth or proxy requirements. Legacy MinIO/S3
deployments remain supported by setting `scanner.artifactStore.driver = "s3"` during phased
migrations.
`scanner.artifactStore.headers` to support custom auth or proxy requirements. RustFS provides the standard S3-compatible interface for all artifact storage.
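A minimal upload sketch against that API; the bucket, object key, retention value, and base address are placeholders, and any auth headers configured under `scanner.artifactStore.headers` would be added the same way.
```csharp
using System;
using System.IO;
using System.Net.Http;

// Illustrative values only: bucket, object key, retention, and host are placeholders.
var client = new HttpClient { BaseAddress = new Uri("http://rustfs:8080") };
byte[] sbomBytes = await File.ReadAllBytesAsync("sbom.cdx.json");

using var request = new HttpRequestMessage(
    HttpMethod.Put, "/api/v1/buckets/scanner-artifacts/objects/sbom/sha256-abc.cdx.json")
{
    Content = new ByteArrayContent(sbomBytes),
};
request.Headers.Add("X-RustFS-Immutable", "true");
request.Headers.Add("X-RustFS-Retain-Seconds", "2592000"); // e.g. 30-day retention

var response = await client.SendAsync(request);
response.EnsureSuccessStatusCode();
```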
---
@@ -378,40 +369,40 @@ public sealed record BinaryFindingEvidence
The emitted `buildId` metadata is preserved in component hashes, diff payloads, and `/policy/runtime` responses so operators can pivot from SBOM entries → runtime events → `debug/.build-id/<aa>/<rest>.debug` within the Offline Kit or release bundle.
### 5.5.1 Service security analysis (Sprint 20260119_016)
When an SBOM path is provided, the worker runs the `service-security` stage to parse CycloneDX services and emit a deterministic report covering:
- Endpoint scheme hygiene (HTTP/WS/plaintext protocol detection).
- Authentication and trust-boundary enforcement.
- Sensitive data flow exposure and unencrypted transfers.
- Deprecated service versions and rate-limiting metadata gaps.
Inputs are passed via scan metadata (`sbom.path` or `sbomPath`, plus `sbom.format`). The report is attached as a surface observation payload (`service-security.report`) and keyed in the analysis store for downstream policy and report assembly. See `src/Scanner/docs/service-security.md` for the policy schema and output formats.
### 5.5.2 CBOM crypto analysis (Sprint 20260119_017)
When an SBOM includes CycloneDX `cryptoProperties`, the worker runs the `crypto-analysis` stage to produce a crypto inventory and compliance findings for weak algorithms, short keys, deprecated protocol versions, certificate hygiene, and post-quantum readiness. The report is attached as a surface observation payload (`crypto-analysis.report`) and keyed in the analysis store for downstream evidence workflows. See `src/Scanner/docs/crypto-analysis.md` for the policy schema and inventory export formats.
### 5.5.3 AI/ML supply chain security (Sprint 20260119_018)
When an SBOM includes CycloneDX `modelCard` or SPDX AI profile data, the worker runs the `ai-ml-security` stage to evaluate model governance readiness. The report covers model card completeness, training data provenance, bias/fairness checks, safety risk assessment coverage, and provenance verification. The report is attached as a surface observation payload (`ai-ml-security.report`) and keyed in the analysis store for policy evaluation and audit trails. See `src/Scanner/docs/ai-ml-security.md` for policy schema, CLI toggles, and binary analysis conventions.
### 5.5.4 Build provenance verification (Sprint 20260119_019)
When an SBOM includes CycloneDX formulation or SPDX build profile data, the worker runs the `build-provenance` stage to verify provenance completeness, builder trust, source integrity, hermetic build requirements, and optional reproducibility checks. The report is attached as a surface observation payload (`build-provenance.report`) and keyed in the analysis store for policy enforcement and audit evidence. See `src/Scanner/docs/build-provenance.md` for policy schema, CLI toggles, and report formats.
### 5.5.5 SBOM dependency reachability (Sprint 20260119_022)
When configured, the worker runs the `reachability-analysis` stage to infer dependency reachability from SBOM graphs and optionally refine it with a `richgraph-v1` call graph. Advisory matches are filtered or severity-adjusted using `VulnerabilityReachabilityFilter`, with false-positive reduction metrics recorded for auditability. The stage attaches:
- `reachability.report` (JSON) for component and vulnerability reachability.
- `reachability.report.sarif` (SARIF 2.1.0) for toolchain export.
- `reachability.graph.dot` (GraphViz) for dependency visualization.
Configuration lives in `src/Scanner/docs/sbom-reachability-filtering.md`, including policy schema, metadata keys, and report outputs.
### 5.6 DSSE attestation (via Signer/Attestor)
### 5.5.1 Service security analysis (Sprint 20260119_016)
When an SBOM path is provided, the worker runs the `service-security` stage to parse CycloneDX services and emit a deterministic report covering:
- Endpoint scheme hygiene (HTTP/WS/plaintext protocol detection).
- Authentication and trust-boundary enforcement.
- Sensitive data flow exposure and unencrypted transfers.
- Deprecated service versions and rate-limiting metadata gaps.
Inputs are passed via scan metadata (`sbom.path` or `sbomPath`, plus `sbom.format`). The report is attached as a surface observation payload (`service-security.report`) and keyed in the analysis store for downstream policy and report assembly. See `src/Scanner/docs/service-security.md` for the policy schema and output formats.
### 5.5.2 CBOM crypto analysis (Sprint 20260119_017)
When an SBOM includes CycloneDX `cryptoProperties`, the worker runs the `crypto-analysis` stage to produce a crypto inventory and compliance findings for weak algorithms, short keys, deprecated protocol versions, certificate hygiene, and post-quantum readiness. The report is attached as a surface observation payload (`crypto-analysis.report`) and keyed in the analysis store for downstream evidence workflows. See `src/Scanner/docs/crypto-analysis.md` for the policy schema and inventory export formats.
### 5.5.3 AI/ML supply chain security (Sprint 20260119_018)
When an SBOM includes CycloneDX `modelCard` or SPDX AI profile data, the worker runs the `ai-ml-security` stage to evaluate model governance readiness. The report covers model card completeness, training data provenance, bias/fairness checks, safety risk assessment coverage, and provenance verification. The report is attached as a surface observation payload (`ai-ml-security.report`) and keyed in the analysis store for policy evaluation and audit trails. See `src/Scanner/docs/ai-ml-security.md` for policy schema, CLI toggles, and binary analysis conventions.
### 5.5.4 Build provenance verification (Sprint 20260119_019)
When an SBOM includes CycloneDX formulation or SPDX build profile data, the worker runs the `build-provenance` stage to verify provenance completeness, builder trust, source integrity, hermetic build requirements, and optional reproducibility checks. The report is attached as a surface observation payload (`build-provenance.report`) and keyed in the analysis store for policy enforcement and audit evidence. See `src/Scanner/docs/build-provenance.md` for policy schema, CLI toggles, and report formats.
### 5.5.5 SBOM dependency reachability (Sprint 20260119_022)
When configured, the worker runs the `reachability-analysis` stage to infer dependency reachability from SBOM graphs and optionally refine it with a `richgraph-v1` call graph. Advisory matches are filtered or severity-adjusted using `VulnerabilityReachabilityFilter`, with false-positive reduction metrics recorded for auditability. The stage attaches:
- `reachability.report` (JSON) for component and vulnerability reachability.
- `reachability.report.sarif` (SARIF 2.1.0) for toolchain export.
- `reachability.graph.dot` (GraphViz) for dependency visualization.
Configuration lives in `src/Scanner/docs/sbom-reachability-filtering.md`, including policy schema, metadata keys, and report outputs.
### 5.6 DSSE attestation (via Signer/Attestor)
* WebService constructs **predicate** with `image_digest`, `stellaops_version`, `license_id`, `policy_digest?` (when emitting **final reports**), timestamps.
* Calls **Signer** (requires **OpTok + PoE**); Signer verifies **entitlement + scanner image integrity** and returns **DSSE bundle**.

View File

@@ -14,7 +14,7 @@ This runbook describes how to promote a new release across the supported deploym
| `stable` | `deploy/releases/2025.09-stable.yaml` | `devops/helm/stellaops/values-stage.yaml`, `devops/helm/stellaops/values-prod.yaml` | `devops/compose/docker-compose.stage.yaml`, `devops/compose/docker-compose.prod.yaml` |
| `airgap` | `deploy/releases/2025.09-airgap.yaml` | `devops/helm/stellaops/values-airgap.yaml` | `devops/compose/docker-compose.airgap.yaml` |
Infrastructure components (PostgreSQL, Valkey, MinIO, RustFS) are pinned in the release manifests and inherited by the deployment profiles. Supporting dependencies such as `nats` remain on upstream LTS tags; review `devops/compose/*.yaml` for the authoritative set.
Infrastructure components (PostgreSQL, Valkey, RustFS) are pinned in the release manifests and inherited by the deployment profiles. Review `devops/compose/*.yaml` for the authoritative set.
---

View File

@@ -255,29 +255,28 @@ The local CI uses Docker Compose to run required services.
| Service | Port | Purpose |
|---------|------|---------|
| postgres-ci | 5433 | PostgreSQL 16 for tests |
| valkey-ci | 6380 | Cache/messaging tests |
| nats-ci | 4223 | Message queue tests |
| postgres-test | 5433 | PostgreSQL 18 for tests |
| valkey-test | 6380 | Cache/messaging tests |
| rustfs-test | 8180 | S3-compatible storage |
| mock-registry | 5001 | Container registry |
| minio-ci | 9100 | S3-compatible storage |
### Manual Service Management
```bash
# Start services
docker compose -f devops/compose/docker-compose.ci.yaml up -d
docker compose -f devops/compose/docker-compose.testing.yml --profile ci up -d
# Check status
docker compose -f devops/compose/docker-compose.ci.yaml ps
docker compose -f devops/compose/docker-compose.testing.yml --profile ci ps
# View logs
docker compose -f devops/compose/docker-compose.ci.yaml logs postgres-ci
docker compose -f devops/compose/docker-compose.testing.yml logs postgres-test
# Stop services
docker compose -f devops/compose/docker-compose.ci.yaml down
docker compose -f devops/compose/docker-compose.testing.yml --profile ci down
# Stop and remove volumes
docker compose -f devops/compose/docker-compose.ci.yaml down -v
docker compose -f devops/compose/docker-compose.testing.yml --profile ci down -v
```
---
@@ -372,13 +371,13 @@ Pre-pull required CI images to avoid network dependency during tests:
```bash
# Pull CI services
docker compose -f devops/compose/docker-compose.ci.yaml pull
docker compose -f devops/compose/docker-compose.testing.yml --profile ci pull
# Build local CI image
docker build -t stellaops-ci:local -f devops/docker/Dockerfile.ci .
# Verify images are cached
docker images | grep -E "stellaops|postgres|valkey|nats"
docker images | grep -E "stellaops|postgres|valkey|rustfs"
```
### Offline-Safe Test Execution
@@ -388,7 +387,7 @@ For fully offline validation:
```bash
# 1. Ensure NuGet cache is warm (see above)
# 2. Start local CI services (pre-pulled)
docker compose -f devops/compose/docker-compose.ci.yaml up -d
docker compose -f devops/compose/docker-compose.testing.yml --profile ci up -d
# 3. Run smoke with no network dependency
./devops/scripts/local-ci.sh smoke --no-restore
@@ -423,7 +422,7 @@ find src -type d -name "Fixtures" | head -20
```bash
# Reset CI services
docker compose -f devops/compose/docker-compose.ci.yaml down -v
docker compose -f devops/compose/docker-compose.testing.yml --profile ci down -v
# Rebuild CI image
docker build --no-cache -t stellaops-ci:local -f devops/docker/Dockerfile.ci .