# CVSS and Competitive Analysis Technical Reference
**Source Advisories**:
- 29-Nov-2025 - CVSS v4.0 Momentum in Vulnerability Management
- 30-Nov-2025 - Comparative Evidence Patterns for Stella Ops
- 03-Dec-2025 - NextGen Scanner Differentiators and Evidence Moat
**Last Updated**: 2025-12-14
---
## 1. CVSS V4.0 INTEGRATION
### 1.1 Requirements
- Vulnerability data sources (NVD, GitHub, Microsoft, Snyk) are shipping CVSS v4 signals
- Receipt schemas, reporting, and UI must stay aligned with v4 semantics
### 1.2 Determinism & Offline
- Keep CVSS vector parsing deterministic
- Pin scoring library versions in receipts
- Avoid live API dependency
- Rely on mirrored NVD feeds or frozen samples
### 1.3 Schema Mapping
- Map impacts to receipt schemas
- Identify UI/reporting deltas for transparency
- Record CVSS receipt impacts in the sprint Decisions & Risks log
### 1.4 CVSS v4.0 MacroVector Scoring System
CVSS v4.0 uses a **MacroVector-based scoring system** instead of the direct formula computation used in v2/v3. The MacroVector is a 6-digit string derived from the base metrics, which maps to a precomputed score table with 486 possible combinations.
**MacroVector Structure**:
```
MacroVector = EQ1 + EQ2 + EQ3 + EQ4 + EQ5 + EQ6
Example: "001100" -> Base Score = 8.2
```
**Equivalence Classes (EQ1-EQ6)**:
| EQ | Metrics Used | Values | Meaning |
|----|--------------|--------|---------|
| EQ1 | Attack Vector + Privileges Required | 0-2 | Network reachability and auth barrier |
| EQ2 | Attack Complexity + User Interaction | 0-1 | Attack prerequisites |
| EQ3 | Vulnerable System CIA | 0-2 | Impact on vulnerable system |
| EQ4 | Subsequent System CIA | 0-2 | Impact on downstream systems |
| EQ5 | Attack Requirements | 0-1 | Preconditions needed |
| EQ6 | Combined Impact Pattern | 0-2 | Multi-impact severity |
**EQ1 (Attack Vector + Privileges Required)**:
- AV=Network + PR=None -> 0 (worst case: remote, no auth)
- AV=Network + PR=Low/High -> 1
- AV=Adjacent + PR=None -> 1
- AV=Adjacent + PR=Low/High -> 2
- AV=Local or Physical -> 2 (requires local access)
**EQ2 (Attack Complexity + User Interaction)**:
- AC=Low + UI=None -> 0 (easiest to exploit)
- AC=Low + UI=Passive/Active -> 1
- AC=High + any UI -> 1 (harder to exploit)
**EQ3 (Vulnerable System CIA)**:
- Any High in VC/VI/VA -> 0 (severe impact)
- Any Low in VC/VI/VA -> 1 (moderate impact)
- All None -> 2 (no impact)
**EQ4 (Subsequent System CIA)**:
- Any High in SC/SI/SA -> 0 (cascading impact)
- Any Low in SC/SI/SA -> 1
- All None -> 2
**EQ5 (Attack Requirements)**:
- AT=None -> 0 (no preconditions)
- AT=Present -> 1 (needs specific setup)
**EQ6 (Combined Impact Pattern)**:
- >=2 High impacts across the vulnerable or subsequent system -> 0 (severe multi-impact)
- 1 High impact -> 1
- 0 High impacts -> 2
**Scoring Algorithm**:
1. Parse base metrics from vector string
2. Compute EQ1-EQ6 from metrics
3. Build MacroVector string: "{EQ1}{EQ2}{EQ3}{EQ4}{EQ5}{EQ6}"
4. Lookup base score from MacroVectorLookup table
5. Round up to nearest 0.1 (per FIRST spec)
**Implementation**: `src/Policy/StellaOps.Policy.Scoring/Engine/CvssV4Engine.cs:262-359`
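A minimal C# sketch of steps 1-5 under the EQ rules above; all names are hypothetical and the lookup table is truncated to two entries (the shipped `CvssV4Engine` is the authoritative implementation):
```csharp
// Illustrative sketch of the EQ derivation and table lookup described above.
// Names and the truncated lookup table are hypothetical, not CvssV4Engine code.
using System;
using System.Collections.Generic;
using System.Linq;

static class MacroVectorSketch
{
    // Stand-in for the full precomputed MacroVector score table.
    static readonly Dictionary<string, double> Lookup = new()
    {
        ["000000"] = 10.0,  // worst case on every equivalence class
        ["001100"] = 8.2,   // example from the text above
    };

    static int Eq1(string av, string pr) => (av, pr) switch
    {
        ("N", "N") => 0,                 // remote, no auth
        ("N", _) or ("A", "N") => 1,
        _ => 2,                          // adjacent+privs, local, or physical
    };

    static int Eq2(string ac, string ui) => ac == "L" && ui == "N" ? 0 : 1;

    // Shared shape for EQ3 (VC/VI/VA) and EQ4 (SC/SI/SA).
    static int EqImpact(string c, string i, string a) =>
        new[] { c, i, a }.Contains("H") ? 0
        : new[] { c, i, a }.Contains("L") ? 1
        : 2;

    static int Eq5(string at) => at == "N" ? 0 : 1;

    static int Eq6(int highImpacts) =>
        highImpacts >= 2 ? 0 : highImpacts == 1 ? 1 : 2;

    public static double BaseScore(
        string av, string ac, string at, string pr, string ui,
        string vc, string vi, string va, string sc, string si, string sa)
    {
        int highs = new[] { vc, vi, va, sc, si, sa }.Count(m => m == "H");
        string macro = $"{Eq1(av, pr)}{Eq2(ac, ui)}{EqImpact(vc, vi, va)}"
                     + $"{EqImpact(sc, si, sa)}{Eq5(at)}{Eq6(highs)}";
        if (!Lookup.TryGetValue(macro, out double score))
            throw new InvalidOperationException($"MacroVector {macro} not in table");
        return Math.Ceiling(score * 10) / 10;  // round up to nearest 0.1
    }
}
```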
### 1.5 Threat Metrics and Exploit Maturity
CVSS v4.0 introduces **Threat Metrics** to adjust scores based on real-world exploit intelligence. The primary metric is **Exploit Maturity (E)**, which applies a multiplier to the base score.
**Exploit Maturity Values**:
| Value | Code | Multiplier | Description |
|-------|------|------------|-------------|
| Attacked | A | **1.00** | Active exploitation in the wild |
| Proof of Concept | P | **0.94** | Public PoC exists but no active exploitation |
| Unreported | U | **0.91** | No known exploit activity |
| Not Defined | X | 1.00 | Default (assume worst case) |
**Score Computation (CVSS-BT)**:
```
Threat Score = Base Score x Threat Multiplier
Example:
Base Score = 9.1
Exploit Maturity = Unreported (U)
Threat Score = 9.1 x 0.91 = 8.3 (rounded up)
```
**Threat Metrics in Vector String**:
```
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N/E:A
                                                                ^^^
                                                                Exploit Maturity
```
**Why Threat Metrics Matter**:
- Reduces noise: An unreported vulnerability scores ~9% lower
- Prioritizes real threats: Actively exploited vulns maintain full score
- Evidence-based: Integrates with KEV, EPSS, and internal threat feeds
**Implementation**: `src/Policy/StellaOps.Policy.Scoring/Engine/CvssV4Engine.cs:365-375`
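A sketch of the multiplier application described above; `ThreatScoreSketch` is a hypothetical name, not the engine's API:
```csharp
// Hypothetical sketch of the CVSS-BT adjustment above; the multiplier
// table mirrors the one in the text, not the shipped engine.
using System;
using System.Collections.Generic;

static class ThreatScoreSketch
{
    static readonly Dictionary<char, double> ExploitMaturityMultiplier = new()
    {
        ['A'] = 1.00,  // Attacked: active exploitation in the wild
        ['P'] = 0.94,  // Proof of Concept
        ['U'] = 0.91,  // Unreported
        ['X'] = 1.00,  // Not Defined: assume worst case
    };

    public static double ThreatScore(double baseScore, char exploitMaturity)
    {
        double raw = baseScore * ExploitMaturityMultiplier[exploitMaturity];
        return Math.Ceiling(raw * 10) / 10;  // round up to nearest 0.1
    }
}

// ThreatScoreSketch.ThreatScore(9.1, 'U') == 8.3, matching the worked example.
```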
### 1.6 Environmental Score Modifiers
**Security Requirements Multipliers**:
| Requirement | Low | Medium | High |
|-------------|-----|--------|------|
| Confidentiality (CR) | 0.5 | 1.0 | 1.5 |
| Integrity (IR) | 0.5 | 1.0 | 1.5 |
| Availability (AR) | 0.5 | 1.0 | 1.5 |
**Modified Base Metrics** (can override any base metric):
- MAV (Modified Attack Vector)
- MAC (Modified Attack Complexity)
- MAT (Modified Attack Requirements)
- MPR (Modified Privileges Required)
- MUI (Modified User Interaction)
- MVC/MVI/MVA (Modified Vulnerable System CIA)
- MSC/MSI/MSA (Modified Subsequent System CIA)
**Score Computation (CVSS-BE)**:
1. Apply modified metrics to base metrics (if defined)
2. Compute modified MacroVector
3. Lookup modified base score
4. Multiply by average of Security Requirements
5. Clamp to [0, 10]
```
Environmental Score = Modified Base x (CR + IR + AR) / 3
```
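A sketch of steps 4-5, assuming the modified base score has already been derived via the modified MacroVector (steps 1-3):
```csharp
// Hypothetical sketch of the CVSS-BE computation above. modifiedBaseScore
// stands in for steps 1-3 (re-derive the MacroVector with modified metrics).
using System;

static class EnvironmentalScoreSketch
{
    // CR/IR/AR multipliers: Low = 0.5, Medium = 1.0, High = 1.5.
    public static double EnvironmentalScore(
        double modifiedBaseScore, double cr, double ir, double ar)
    {
        double score = modifiedBaseScore * (cr + ir + ar) / 3.0;
        return Math.Clamp(score, 0.0, 10.0);  // step 5: clamp to [0, 10]
    }
}
```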
### 1.7 Supplemental Metrics (Non-Scoring)
CVSS v4.0 introduces supplemental metrics that provide context but **do not affect the score**:
| Metric | Values | Purpose |
|--------|--------|---------|
| Safety (S) | Negligible/Present | Safety impact (ICS/OT systems) |
| Automatable (AU) | No/Yes | Can attack be automated? |
| Recovery (R) | Automatic/User/Irrecoverable | System recovery difficulty |
| Value Density (V) | Diffuse/Concentrated | Target value concentration |
| Response Effort (RE) | Low/Moderate/High | Effort to respond |
| Provider Urgency (U) | Clear/Green/Amber/Red | Vendor urgency rating |
**Use Cases**:
- **Safety**: Critical for ICS/SCADA vulnerability prioritization
- **Automatable**: Indicates wormable vulnerabilities
- **Provider Urgency**: Vendor-supplied priority signal
## 2. SCANNER DISCREPANCIES ANALYSIS
### 2.1 Trivy vs Grype Comparative Study (927 images)
**Findings**:
- Tools disagreed on total vulnerability counts and specific CVE IDs
- Grype: ~603,259 vulns; Trivy: ~473,661 vulns
- Exact match in only 9.2% of cases (80 out of 865 vulnerable images)
- Even with same counts, specific vulnerability IDs differed
**Root Causes**:
- Divergent vulnerability databases
- Differing matching logic
- Incomplete visibility
### 2.2 VEX Tools Consistency Study (2025)
**Tools Tested**:
- Trivy
- Grype
- OWASP DepScan
- Docker Scout
- Snyk CLI
- OSV-Scanner
- Vexy
**Results**:
- Low consistency/similarity across container scanners
- DepScan: 18,680 vulns; Vexy: 191 vulns (2 orders of magnitude difference)
- Pairwise Jaccard indices very low (near 0)
- 4 most consistent tools shared only ~18% common vulnerabilities
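For reference, the pairwise Jaccard index over two tools' reported findings is the agreement measure cited above; a minimal sketch, assuming each set holds the CVE IDs a tool reported for the same image:
```csharp
// Jaccard index = |A ∩ B| / |A ∪ B| over two scanners' CVE ID sets.
using System.Collections.Generic;
using System.Linq;

static class JaccardSketch
{
    public static double Jaccard(HashSet<string> a, HashSet<string> b)
    {
        int union = a.Union(b).Count();
        return union == 0 ? 1.0 : (double)a.Intersect(b).Count() / union;
    }
}
```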
### 2.3 Implications for StellaOps
**Moats Needed**:
- Golden-fixture benchmarks (container images with known, audited vulnerabilities)
- Deterministic, replayable scans
- Cryptographic integrity
- VEX/SBOM proofs
**Metrics**:
- **Closure rate**: Time from flagged to confirmed exploitable
- **Proof coverage**: % of dependencies with valid SBOM/VEX proofs
- **Differential-closure**: Impact of database updates or policy changes on prior scan results
### 2.4 Deterministic Receipt System
Every CVSS scoring decision in StellaOps is captured in a **deterministic receipt** that enables audit-grade reproducibility.
**Receipt Schema**:
```json
{
"receiptId": "uuid",
"inputHash": "sha256:...",
"baseMetrics": { ... },
"threatMetrics": { ... },
"environmentalMetrics": { ... },
"supplementalMetrics": { ... },
"scores": {
"baseScore": 9.1,
"threatScore": 8.3,
"environmentalScore": null,
"fullScore": null,
"effectiveScore": 8.3,
"effectiveScoreType": "threat"
},
"policyRef": "policy/cvss-v4-default@v1.2.0",
"policyDigest": "sha256:...",
"evidence": [ ... ],
"attestationRefs": [ ... ],
"createdAt": "2025-12-14T00:00:00Z"
}
```
**InputHash Computation**:
```
inputHash = SHA256(canonicalize({
baseMetrics,
threatMetrics,
environmentalMetrics,
supplementalMetrics,
policyRef,
policyDigest
}))
```
**Determinism Guarantees**:
- Same inputs -> same `inputHash` -> same scores
- Receipts are immutable once created
- Amendments create new receipts with `supersedes` reference
- Optional DSSE signatures for cryptographic binding
**Implementation**: `src/Policy/StellaOps.Policy.Scoring/Receipts/ReceiptBuilder.cs`
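A sketch of the hash computation, assuming a simple key-sorting canonicalizer; the actual `ReceiptBuilder` may use a stricter canonical JSON profile (e.g., RFC 8785):
```csharp
// Hypothetical sketch of the inputHash computation above: sort object keys,
// serialize without insignificant whitespace, then SHA-256 the UTF-8 bytes.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

static class InputHashSketch
{
    static JsonNode? Canonicalize(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(
            obj.OrderBy(p => p.Key, StringComparer.Ordinal)
               .Select(p => KeyValuePair.Create(p.Key, Canonicalize(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(Canonicalize).ToArray()),
        null => null,
        _ => node.DeepClone(),
    };

    public static string ComputeInputHash(JsonObject scoringInputs)
    {
        // Key-sorted tree serialized compactly, so equal inputs hash equally.
        string canonical = Canonicalize(scoringInputs)!.ToJsonString();
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```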
## 3. RUNTIME REACHABILITY APPROACHES
### 3.1 Runtime-Aware Vulnerability Prioritization
**Approach**:
- Monitor container workloads at runtime to determine which vulnerable components are actually used
- Use eBPF-based monitors, dynamic tracers, or built-in profiling
- Construct runtime call graph or dependency graph
- Map vulnerabilities to code entities (functions/modules)
- If execution trace covers entity, vulnerability is "reachable"
**Findings**: ~85% of critical vulns in containers are in inactive code (Sysdig)
### 3.2 Reachability Analysis Techniques
**Static**:
- Call-graph analysis (Snyk reachability, CodeQL)
- All possible paths
**Dynamic**:
- Runtime observation (loaded modules, invoked functions)
- Actual runtime paths
**Granularity Levels**:
- Function-level (precise, but limited to certain languages such as Java and .NET)
- Package/module-level (broader coverage, coarser precision)
**Hybrid Approach**: Combine static analysis (all possible paths) with dynamic observation (actual runtime paths), as sketched below
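A minimal sketch of this classification, with all type and input names hypothetical:
```csharp
// Hybrid reachability: intersect the statically reachable symbol set with
// runtime-observed symbols to rank a vulnerable code entity.
using System.Collections.Generic;

enum Reachability { ConfirmedAtRuntime, StaticOnly, NotReachable }

static class HybridReachabilitySketch
{
    public static Reachability Classify(
        string vulnerableSymbol,
        HashSet<string> staticallyReachable,   // from call-graph analysis
        HashSet<string> observedAtRuntime)     // from eBPF/profiling traces
    {
        if (observedAtRuntime.Contains(vulnerableSymbol))
            return Reachability.ConfirmedAtRuntime;   // actual runtime path
        if (staticallyReachable.Contains(vulnerableSymbol))
            return Reachability.StaticOnly;           // possible path only
        return Reachability.NotReachable;             // deprioritize
    }
}
```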
## 4. CONTAINER PROVENANCE & SUPPLY CHAIN
### 4.1 In-Toto/DSSE Framework (NDSS 2024)
**Purpose**:
- Track chain of custody in software builds
- Signed metadata (attestations) for each step
- DSSE: Dead Simple Signing Envelope for standardized signing
### 4.2 Scudo System
**Features**:
- Combines in-toto with Uptane
- Verifies build process and final image
- Full verification on the client is inefficient; verification happens upstream and the client trusts a signed summary
- Client checks final signature + hash only
### 4.3 Supply Chain Verification
**Signers**:
- Developer key signs code commit
- CI key signs build attestation
- Scanner key signs vulnerability attestation
- Release key signs container image
**Verification Optimization**: Repository verifies in-toto attestations; client verifies final metadata only
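A sketch of that client-side check: verifying one DSSE envelope signature over its pre-authentication encoding (PAE). Key handling, signature encoding, and envelope parsing are assumptions, not the Scudo implementation:
```csharp
// Minimal DSSE verification sketch for the "final metadata only" client.
using System;
using System.Security.Cryptography;
using System.Text;

static class DsseVerifySketch
{
    // DSSE v1 PAE: "DSSEv1 <len(type)> <type> <len(body)> <body>"
    static byte[] PreAuthEncoding(string payloadType, byte[] payload)
    {
        byte[] typeBytes = Encoding.UTF8.GetBytes(payloadType);
        byte[] header = Encoding.UTF8.GetBytes(
            $"DSSEv1 {typeBytes.Length} {payloadType} {payload.Length} ");
        var pae = new byte[header.Length + payload.Length];
        header.CopyTo(pae, 0);
        payload.CopyTo(pae, header.Length);
        return pae;
    }

    public static bool Verify(
        ECDsa releaseKey, string payloadType, byte[] payload, byte[] signature)
    {
        // Assumes an IEEE P1363 signature encoding; real envelopes may differ.
        byte[] pae = PreAuthEncoding(payloadType, payload);
        return releaseKey.VerifyData(pae, signature, HashAlgorithmName.SHA256);
    }
}
```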
## 5. VENDOR EVIDENCE PATTERNS
### 5.1 Snyk
**Evidence Handling**:
- Runtime insights integration (Nov 2025)
- Evolution from static-scan noise to prioritized workflow
- Deployment context awareness
**VEX Support**:
- CycloneDX VEX format
- Reachability-aware suppression
### 5.2 GitHub Advanced Security
**Features**:
- CodeQL for static analysis
- Dependency graph
- Dependabot alerts
- Security advisories
**Evidence**:
- SARIF output
- SBOM generation (SPDX)
### 5.3 Aqua Security
**Approach**:
- Runtime protection
- Image scanning
- Kubernetes security
**Evidence**:
- Dynamic runtime traces
- Network policy violations
### 5.4 Anchore/Grype
**Features**:
- Open-source scanner
- Policy-based compliance
- SBOM generation
**Evidence**:
- CycloneDX/SPDX SBOM
- Vulnerability reports (JSON)
### 5.5 Prisma Cloud
**Features**:
- Cloud-native security
- Runtime defense
- Compliance monitoring
**Evidence**:
- Multi-cloud attestations
- Compliance reports
## 6. STELLAOPS DIFFERENTIATORS
### 6.1 Reachability-with-Evidence
**Why it Matters**:
- Snyk Container integrating runtime insights as "signal" (Nov 2025)
- Evolution from static-scan noise to prioritized, actionable workflow
- Deployment context: what's running, what's reachable, what's exploitable
**Implication**: Container security triage relies on runtime/context signals
### 6.2 Proof-First Architecture
**Advantages**:
- Every claim backed by DSSE-signed attestations
- Cryptographic integrity
- Audit trail
- Offline verification
### 6.3 Deterministic Scanning
**Advantages**:
- Reproducible results
- Bit-identical outputs given same inputs
- Replay manifests
- Golden fixture benchmarks
### 6.4 VEX-First Decisioning
**Advantages**:
- Exploitability modeled in OpenVEX
- Lattice logic for stable outcomes
- Evidence-linked justifications
### 6.5 Offline/Air-Gap First
**Advantages**:
- No hidden network dependencies
- Bundled feeds, keys, Rekor snapshots
- Verifiable without internet access
### 6.6 CVSS + KEV Risk Signal Combination
StellaOps combines CVSS scores with KEV (Known Exploited Vulnerabilities) data using a deterministic formula:
**Risk Formula**:
```
risk_score = clamp01((cvss / 10) + kevBonus)
where:
kevBonus = 0.2 if vulnerability is in CISA KEV catalog
kevBonus = 0.0 otherwise
```
**Example Calculations**:
| CVSS Score | KEV Flag | Risk Score |
|------------|----------|------------|
| 9.0 | No | 0.90 |
| 9.0 | Yes | 1.00 (clamped) |
| 7.5 | No | 0.75 |
| 7.5 | Yes | 0.95 |
| 5.0 | No | 0.50 |
| 5.0 | Yes | 0.70 |
**Rationale**:
- KEV inclusion indicates active exploitation
- 20% bonus prioritizes known-exploited over theoretical risks
- Clamping prevents scores > 1.0
- Deterministic formula enables reproducible prioritization
**Implementation**: `src/RiskEngine/StellaOps.RiskEngine/StellaOps.RiskEngine.Core/Providers/CvssKevProvider.cs`
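A sketch of the formula above (the real `CvssKevProvider` signature may differ):
```csharp
// risk_score = clamp01((cvss / 10) + kevBonus), per the formula above.
using System;

static class CvssKevRiskSketch
{
    const double KevBonus = 0.2;  // applied when the CVE is in the CISA KEV catalog

    public static double RiskScore(double cvss, bool inKevCatalog)
    {
        double bonus = inKevCatalog ? KevBonus : 0.0;
        return Math.Clamp(cvss / 10.0 + bonus, 0.0, 1.0);  // clamp01
    }
}

// RiskScore(9.0, true) == 1.00 (clamped); RiskScore(7.5, false) == 0.75.
```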
## 7. COMPETITIVE POSITIONING
### 7.1 Market Segments
| Vendor | Strength | Weakness vs StellaOps |
|--------|----------|----------------------|
| Snyk | Developer experience | Less deterministic, SaaS-only |
| Aqua | Runtime protection | Less reachability precision |
| Anchore | Open-source, SBOM | Less proof infrastructure |
| Prisma Cloud | Cloud-native breadth | Less offline/air-gap support |
| GitHub | Integration with dev workflow | Less cryptographic proof chain |
### 7.2 StellaOps Unique Value
1. **Deterministic + Provable**: Bit-identical scans with cryptographic proofs
2. **Reachability + Runtime**: Hybrid static/dynamic analysis
3. **Offline/Sovereign**: Air-gap operation with regional crypto (FIPS/GOST/eIDAS/SM)
4. **VEX-First**: Evidence-backed decisioning, not just alerting
5. **AGPL-3.0**: Self-hostable, no vendor lock-in
## 8. MOAT METRICS
### 8.1 Proof Coverage
```
proof_coverage = findings_with_valid_receipts / total_findings
Target: ≥95%
```
### 8.2 Closure Rate
```
closure_rate = time_from_flagged_to_confirmed_exploitable
Target: P95 < 24 hours
```
### 8.3 Differential-Closure Impact
```
differential_impact = findings_changed_after_db_update / total_findings
Target: <5% (non-code changes)
```
### 8.4 False Positive Reduction
```
fp_reduction = (baseline_fp_rate - stella_fp_rate) / baseline_fp_rate
Target: ≥50% vs baseline scanner
```
### 8.5 Reachability Accuracy
```
reachability_accuracy = correct_r0_r1_r2_r3_classifications / total_classifications
Target: ≥90%
```
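A sketch computing the ratio metrics above from raw counts (closure rate is a latency percentile and is omitted; field names are illustrative):
```csharp
// Moat metrics 8.1, 8.3, 8.4, and 8.5 as plain ratios over raw counts.
static class MoatMetricsSketch
{
    public static double ProofCoverage(int withValidReceipts, int totalFindings) =>
        totalFindings == 0 ? 1.0 : (double)withValidReceipts / totalFindings;

    public static double DifferentialImpact(int changedAfterDbUpdate, int totalFindings) =>
        totalFindings == 0 ? 0.0 : (double)changedAfterDbUpdate / totalFindings;

    public static double FpReduction(double baselineFpRate, double stellaFpRate) =>
        (baselineFpRate - stellaFpRate) / baselineFpRate;

    public static double ReachabilityAccuracy(int correct, int total) =>
        total == 0 ? 0.0 : (double)correct / total;
}
```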
## 9. COMPETITIVE INTELLIGENCE TRACKING
### 9.1 Feature Parity Matrix
| Feature | Snyk | Aqua | Anchore | Prisma | StellaOps |
|---------|------|------|---------|--------|-----------|
| SBOM Generation | ✓ | ✓ | ✓ | ✓ | ✓ |
| VEX Support | ✓ | ✗ | Partial | ✗ | ✓ |
| Reachability Analysis | ✓ | ✗ | ✗ | ✗ | ✓ |
| Runtime Evidence | ✓ | ✓ | ✗ | ✓ | ✓ |
| Cryptographic Proofs | ✗ | ✗ | ✗ | ✗ | ✓ |
| Deterministic Scans | ✗ | ✗ | ✗ | ✗ | ✓ |
| Offline/Air-Gap | ✗ | Partial | ✗ | ✗ | ✓ |
| Regional Crypto | ✗ | ✗ | ✗ | ✗ | ✓ |
### 9.2 Monitoring Strategy
- Track vendor release notes
- Monitor GitHub repos for feature announcements
- Participate in security conferences
- Engage with customer feedback
- Update competitive matrix quarterly
## 10. MESSAGING FRAMEWORK
### 10.1 Core Message
"StellaOps provides deterministic, proof-backed vulnerability management with reachability analysis for offline/air-gapped environments."
### 10.2 Key Differentiators (Elevator Pitch)
1. **Deterministic**: Same inputs → same outputs, every time
2. **Provable**: Cryptographic proof chains for every decision
3. **Reachable**: Static + runtime analysis, not just presence
4. **Sovereign**: Offline operation, regional crypto compliance
5. **Open**: AGPL-3.0, self-hostable, no lock-in
### 10.3 Target Personas
- **Security Engineers**: Need proof-backed decisions for audits
- **DevOps Teams**: Need deterministic scans in CI/CD
- **Compliance Officers**: Need offline/air-gap for regulated environments
- **Platform Engineers**: Need self-hostable, sovereign solution
## 11. BENCHMARKING PROTOCOL
### 11.1 Comparative Test Suite
**Images**:
- 50 representative production images
- Known vulnerabilities labeled
- Reachability ground truth established
**Metrics**:
- Precision (TP / (TP + FP))
- Recall (TP / (TP + FN))
- F1 score
- Scan time (P50, P95)
- Determinism (identical outputs over 10 runs)
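A sketch of the precision/recall/F1 computation against labeled ground truth, assuming findings are keyed by image + CVE ID:
```csharp
// Score one tool's reported findings against the labeled ground truth set.
using System.Collections.Generic;
using System.Linq;

static class BenchmarkMetricsSketch
{
    public static (double Precision, double Recall, double F1) Score(
        HashSet<string> reported, HashSet<string> groundTruth)
    {
        int tp = reported.Intersect(groundTruth).Count();
        int fp = reported.Count - tp;      // reported but not in ground truth
        int fn = groundTruth.Count - tp;   // in ground truth but missed
        double precision = tp == 0 ? 0 : (double)tp / (tp + fp);
        double recall = tp == 0 ? 0 : (double)tp / (tp + fn);
        double f1 = precision + recall == 0 ? 0
            : 2 * precision * recall / (precision + recall);
        return (precision, recall, f1);
    }
}
```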
### 11.2 Test Execution
```bash
# Run StellaOps scan
stellaops scan --image test-image:v1 --output stella-results.json
# Run competitor scans
trivy image --format json test-image:v1 > trivy-results.json
grype test-image:v1 -o json > grype-results.json
snyk container test test-image:v1 --json > snyk-results.json
# Compare results
stellaops benchmark compare \
--ground-truth ground-truth.json \
--stella stella-results.json \
--trivy trivy-results.json \
--grype grype-results.json \
--snyk snyk-results.json
```
### 11.3 Results Publication
- Publish benchmarks quarterly
- Open-source test images and ground truth
- Invite community contributions
- Document methodology transparently
---
**Document Version**: 1.0
**Target Platform**: .NET 10, PostgreSQL ≥16, Angular v17