audit work, fixed StellaOps.sln warnings/errors, fixed tests, sprints work, new advisories

This commit is contained in:
master
2026-01-07 18:49:59 +02:00
parent 04ec098046
commit 608a7f85c0
866 changed files with 56323 additions and 6231 deletions

# Contributing to API Contracts
Last updated: 2025-11-25
## Scope
Guidelines for editing service OpenAPI specs, lint rules, compatibility checks, and release artefacts across StellaOps services.
## Required tools
- Node.js 20.x + pnpm 9.x
- Spectral CLI (invoked via `pnpm api:lint` in `src/Api/StellaOps.Api.OpenApi`)
- `diff2html` (optional) for human-friendly compat reports
## Workflow (per change)
1) Edit service OAS under `src/Api/StellaOps.Api.OpenApi/services/<service>/*.yaml`.
2) Run lint + compose + compat + changelog from repo root:
```bash
pnpm --filter @stella/api-openapi api:lint # spectral
pnpm --filter @stella/api-openapi api:compose # build stella.yaml
pnpm --filter @stella/api-openapi api:compat # compare against baseline
pnpm --filter @stella/api-openapi api:changelog # generate digest/signature
```
3) Review outputs:
- `out/api/stella.yaml` (composed spec)
- `out/api/compat-report.json` (+ `.html` if generated)
- `out/api/changelog/` (digest + signature for SDK pipeline)
4) Update examples: ensure request/response examples exist for changed endpoints; add to `examples/` directories.
5) Commit changes with notes on breaking/additive results; attach compat report paths.
## Lint rules (Spectral)
- House rules live in `.spectral.yaml` under `src/Api/StellaOps.Api.OpenApi`.
- Enforce: tagged operations, error envelope shape, schema `$ref` reuse, pagination tokens, RBAC scopes, standard headers (`traceparent`, `x-correlation-id`).
## Compatibility checks
- Baseline file: `stella-baseline.yaml` (kept in repo under `out/api/` and updated per release).
- `api:compat` flags additive/breaking/unchanged deltas. Breaking changes require approval from API Governance Guild and affected service guilds.
## Examples & error envelopes
- Every operation must have at least one request + response example.
- Error responses must use the standard envelope (`error.code`, `error.message`, `trace_id`).
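A minimal response body following this envelope might look like (field values are illustrative; the authoritative envelope shape is enforced by the Spectral rules above):

```json
{
  "error": {
    "code": "POLICY_NOT_FOUND",
    "message": "No policy with id 'pol-123' exists for this tenant."
  },
  "trace_id": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
}
```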
## Offline/air-gap
- Keep `pnpm-lock.yaml` pinned; store `node_modules/.pnpm` cache in Offline Kit when needed.
- Include composed spec, compat report, and changelog artefacts in offline bundles under `offline/api/` with checksums.
## Release handoff
- Deliverables for each release:
- `out/api/stella.yaml`
- `out/api/compat-report.json` (+ `.html` if produced)
- `out/api/changelog/*` (digest, signature, manifest)
- Provide SHA256 checksums for artefacts and note `source_commit` + `timestamp` in release notes.
## Review checklist
- [ ] Lint clean (`api:lint`)
- [ ] Examples present for changed endpoints
- [ ] Compat report reviewed (additive/breaking noted)
- [ ] Changelog artefacts generated + checksums
- [ ] Offline bundle artefacts staged (if required)
- [ ] Docs/UI/SDK owners notified of breaking changes

# Canonicalization & Determinism Patterns
**Version:** 1.0
**Date:** December 2025
**Sprint:** SPRINT_20251226_007_BE_determinism_gaps (DET-GAP-20)
> **Audience:** All StellaOps contributors working on code that produces digests, attestations, or replayable outputs.
> **Goal:** Ensure byte-identical outputs for identical inputs across platforms, time, and Rust/Go/Node re-implementations.
---
## 1. Why Determinism Matters
StellaOps is built on **proof-of-state**: every verdict, attestation, and replay must be reproducible. Non-determinism breaks:
- **Signature verification:** Different serialization → different digest → invalid signature.
- **Replay guarantees:** Feed snapshots that produce different hashes cannot be replayed.
- **Audit trails:** Compliance teams require bit-exact reproduction of historical scans.
- **Cross-platform compatibility:** Windows/Linux/macOS must produce identical outputs.
---
## 2. RFC 8785 JSON Canonicalization Scheme (JCS)
All JSON that participates in digest computation **must** use RFC 8785 JCS. This includes:
- Attestation payloads (DSSE)
- Verdict JSON
- Policy evaluation results
- Feed snapshot manifests
- Proof bundles
### 2.1 The Rfc8785JsonCanonicalizer
Use the `Rfc8785JsonCanonicalizer` class for all canonical JSON operations:
```csharp
using StellaOps.Attestor.ProofChain.Json;
// Create canonicalizer (optionally with NFC normalization)
var canonicalizer = new Rfc8785JsonCanonicalizer(enableNfcNormalization: true);
// Canonicalize a JSON string
string canonical = canonicalizer.Canonicalize(jsonString);
// Or canonicalize a JsonElement (distinct variable name: same scope as above)
string canonicalFromElement = canonicalizer.Canonicalize(jsonElement);
```
### 2.2 JCS Rules Summary
RFC 8785 requires:
1. **No whitespace** between tokens.
2. **Lexicographic key ordering** within objects.
3. **Number serialization:** No leading zeros, no trailing zeros after decimal, integers without decimal point.
4. **String escaping:** Minimal escaping (only `"`, `\`, and control chars).
5. **UTF-8 encoding** without BOM.
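For intuition, rules 1, 2, and 5 can be approximated in Python for simple payloads. This is a sketch only, not a conformant RFC 8785 implementation; in particular, JCS number formatting follows ECMAScript's shortest round-trip rules, which `json.dumps` does not reproduce.

```python
import json

def approx_jcs(obj) -> str:
    # Sorted keys, no whitespace; ensure_ascii=False keeps UTF-8 intact.
    # NOTE: approximation only; real JCS number/string rules differ.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

a = {"b": 1, "a": {"d": 2, "c": 3}}
b = {"a": {"c": 3, "d": 2}, "b": 1}  # same data, different construction order

# Structurally equal objects serialize to identical bytes
assert approx_jcs(a) == approx_jcs(b) == '{"a":{"c":3,"d":2},"b":1}'
```

Byte-identical output for structurally equal inputs is exactly the property the digest pipeline depends on.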
### 2.3 Common Mistakes
**Wrong:** Using `JsonSerializer.Serialize()` directly for digest input.
```csharp
// WRONG - output is not canonical (property order, escaping, number formatting)
var json = JsonSerializer.Serialize(obj);
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(json));
```
**Correct:** Canonicalize before hashing.
```csharp
// CORRECT - deterministic
var canonicalizer = new Rfc8785JsonCanonicalizer();
// Serialize first, then canonicalize the resulting JSON text
var canonical = canonicalizer.Canonicalize(JsonSerializer.Serialize(obj));
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
```
---
## 3. Unicode NFC Normalization
Different platforms may store the same string in different Unicode normalization forms. Enable NFC normalization when:
- Processing user-supplied strings
- Aggregating data from multiple sources
- Working with file paths or identifiers from different systems
```csharp
// Enable NFC for cross-platform string stability
var canonicalizer = new Rfc8785JsonCanonicalizer(enableNfcNormalization: true);
```
When NFC is enabled, all strings are normalized via `string.Normalize(NormalizationForm.FormC)` before serialization.
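The same visible string can arrive as different code-point sequences; NFC collapses them to one form. The `string.Normalize(NormalizationForm.FormC)` call has a direct Python analogue:

```python
import unicodedata

composed = "caf\u00e9"     # "café" with precomposed é (U+00E9)
decomposed = "cafe\u0301"  # "cafe" + combining acute accent (U+0301)

assert composed != decomposed  # different code-point sequences

nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)
assert nfc_a == nfc_b          # identical after NFC normalization
```

Without this step, the two inputs would hash to different digests even though users see the same string.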
---
## 4. Resolver Boundary Pattern
**Key principle:** All data entering or leaving a "resolver" (a service that produces verdicts, attestations, or replayable state) must be canonicalized.
### 4.1 What Is a Resolver Boundary?
A resolver boundary is any point where:
- Data is **serialized** for storage, transmission, or signing
- Data is **hashed** to produce a digest
- Data is **compared** for equality in replay validation
### 4.2 Boundary Enforcement
At resolver boundaries:
1. **Canonicalize** all JSON payloads using `Rfc8785JsonCanonicalizer`.
2. **Sort** collections deterministically (alphabetically by key or ID).
3. **Normalize** timestamps to ISO 8601 UTC with `Z` suffix.
4. **Freeze** dictionaries using `FrozenDictionary` for stable iteration order.
### 4.3 Example: Feed Snapshot Coordinator
```csharp
public sealed class FeedSnapshotCoordinatorService : IFeedSnapshotCoordinator
{
private readonly FrozenDictionary<string, IFeedSourceProvider> _providers;
public FeedSnapshotCoordinatorService(IEnumerable<IFeedSourceProvider> providers, ...)
{
// Sort providers alphabetically for deterministic digest computation
_providers = providers
.OrderBy(p => p.SourceId, StringComparer.Ordinal)
                .ToFrozenDictionary(p => p.SourceId, p => p, StringComparer.Ordinal);
}
private string ComputeCompositeDigest(IReadOnlyList<SourceSnapshot> sources)
{
        // Sort defensively by SourceId even if callers pass pre-sorted sources
using var sha256 = SHA256.Create();
foreach (var source in sources.OrderBy(s => s.SourceId, StringComparer.Ordinal))
{
// Append each source digest to the hash computation
var digestBytes = Encoding.UTF8.GetBytes(source.Digest);
sha256.TransformBlock(digestBytes, 0, digestBytes.Length, null, 0);
}
sha256.TransformFinalBlock([], 0, 0);
return $"sha256:{Convert.ToHexString(sha256.Hash!).ToLowerInvariant()}";
}
}
```
---
## 5. Timestamp Handling
### 5.1 Rules
1. **Always use UTC** - never local time.
2. **ISO 8601 format** with `Z` suffix: `2025-12-27T14:30:00Z`
3. **Consistent precision** - truncate to seconds unless milliseconds are required.
4. **Use TimeProvider** for testability.
### 5.2 Example
```csharp
// CORRECT - UTC with Z suffix
var timestamp = timeProvider.GetUtcNow().ToString("yyyy-MM-ddTHH:mm:ssZ");
// WRONG - local time
var wrong = DateTime.Now.ToString("o");
// WRONG - inconsistent format
var wrong2 = DateTimeOffset.UtcNow.ToString();
```
---
## 6. Numeric Stability
### 6.1 Avoid Floating Point for Determinism
Floating-point arithmetic can produce different results on different platforms. For deterministic values:
- Use `decimal` for scores, percentages, and monetary values.
- Use `int` or `long` for counts and identifiers.
- If floating-point is unavoidable, document the acceptable epsilon and rounding rules.
### 6.2 Number Serialization
RFC 8785 requires specific number formatting:
- Integers: no decimal point (`42`, not `42.0`)
- Decimals: no trailing zeros (`3.14`, not `3.140`)
- No leading zeros (`0.5`, not `00.5`)
The `Rfc8785JsonCanonicalizer` handles this automatically.
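The three formatting rules can be sketched in Python for the common cases. Hedged: full RFC 8785 number formatting uses ECMAScript's shortest round-trip algorithm, including exponent notation for very large or small magnitudes, which this sketch does not implement.

```python
def jcs_number(value: float) -> str:
    """Format a number per the rules above (simple, in-range cases only)."""
    f = float(value)
    if f.is_integer():
        return str(int(f))  # 42.0 -> "42" (integers get no decimal point)
    return repr(f)          # repr gives the shortest round-trip form, e.g. "3.14"

assert jcs_number(42.0) == "42"
assert jcs_number(3.140) == "3.14"  # no trailing zeros
assert jcs_number(0.5) == "0.5"     # no leading zeros beyond the single 0
```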
---
## 7. Collection Ordering
### 7.1 Rule
All collections that participate in digest computation must have **deterministic order**.
### 7.2 Implementation
```csharp
// CORRECT - use FrozenDictionary for stable iteration
var orderedDict = items
.OrderBy(x => x.Key, StringComparer.Ordinal)
.ToFrozenDictionary(x => x.Key, x => x.Value);
// CORRECT - sort before iteration
foreach (var item in items.OrderBy(x => x.Id, StringComparer.Ordinal))
{
// ...
}
// WRONG - iteration order is undefined
foreach (var item in dictionary)
{
// Order may vary between runs
}
```
---
## 8. Audit Hash Logging
For debugging determinism issues, use the `AuditHashLogger`:
```csharp
using StellaOps.Attestor.ProofChain.Audit;
var auditLogger = new AuditHashLogger(logger);
// Log both raw and canonical hashes
auditLogger.LogHashAudit(
rawContent,
canonicalContent,
"sha256:abc...",
"verdict",
"scan-123",
metadata);
```
This enables post-mortem analysis of canonicalization issues.
---
## 9. Testing Determinism
### 9.1 Required Tests
Every component that produces digests must have tests verifying:
1. **Idempotency:** Same input → same digest (multiple calls).
2. **Permutation invariance:** Reordering input collections → same digest.
3. **Cross-platform:** Windows/Linux/macOS produce identical outputs.
### 9.2 Example Test
```csharp
[Fact]
public async Task CreateSnapshot_ProducesDeterministicDigest()
{
// Arrange
var sources = CreateTestSources();
// Act - create multiple snapshots with same data
var bundle1 = await coordinator.CreateSnapshotAsync();
var bundle2 = await coordinator.CreateSnapshotAsync();
// Assert - digests must be identical
Assert.Equal(bundle1.CompositeDigest, bundle2.CompositeDigest);
}
[Fact]
public async Task CreateSnapshot_OrderIndependent()
{
    // Arrange - same sources supplied in different orders
    var sources = CreateTestSources();
    var sourcesAscending = sources.OrderBy(s => s.Id);
    var sourcesDescending = sources.OrderByDescending(s => s.Id);
// Act
var bundle1 = await CreateWithSources(sourcesAscending);
var bundle2 = await CreateWithSources(sourcesDescending);
// Assert - digest must be identical regardless of input order
Assert.Equal(bundle1.CompositeDigest, bundle2.CompositeDigest);
}
```
---
## 10. Determinism Manifest Schema
All replayable artifacts must include a determinism manifest conforming to the JSON Schema at:
`docs/technical/testing/schemas/determinism-manifest.schema.json`
Key fields:
- `schemaVersion`: Must be `"1.0"`.
- `artifactType`: One of `verdict`, `attestation`, `snapshot`, `proof`, `sbom`, `vex`.
- `hashAlgorithm`: One of `sha256`, `sha384`, `sha512`.
- `ordering`: One of `alphabetical`, `timestamp`, `insertion`, `canonical`.
- `determinismGuarantee`: One of `strict`, `relaxed`, `best_effort`.
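A manifest instance using these fields might look like (illustrative values; the schema file above is authoritative):

```json
{
  "schemaVersion": "1.0",
  "artifactType": "snapshot",
  "hashAlgorithm": "sha256",
  "ordering": "alphabetical",
  "determinismGuarantee": "strict"
}
```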
---
## 11. Checklist for Contributors
Before submitting a PR that involves digests or attestations:
- [ ] JSON is canonicalized via `Rfc8785JsonCanonicalizer` before hashing.
- [ ] NFC normalization is enabled if user-supplied strings are involved.
- [ ] Collections are sorted deterministically before iteration.
- [ ] Timestamps are UTC with ISO 8601 format and `Z` suffix.
- [ ] Numeric values avoid floating-point where possible.
- [ ] Unit tests verify digest idempotency and permutation invariance.
- [ ] Determinism manifest schema is validated for new artifact types.
---
## 12. Related Documents
- [docs/technical/testing/schemas/determinism-manifest.schema.json](../technical/testing/schemas/determinism-manifest.schema.json) - JSON Schema for manifests
- [docs/modules/policy/design/policy-determinism-tests.md](../modules/policy/design/policy-determinism-tests.md) - Policy engine determinism
- [docs/technical/testing/TEST_SUITE_OVERVIEW.md](../technical/testing/TEST_SUITE_OVERVIEW.md) - Testing strategy
---
## 13. Change Log
| Version | Date | Notes |
|---------|------------|----------------------------------------------------|
| 1.0 | 2025-12-27 | Initial version per DET-GAP-20. |

# Corpus Contribution Guide
**Sprint:** SPRINT_3500_0003_0001
**Task:** CORPUS-014 - Document corpus contribution guide
## Overview
The Ground-Truth Corpus is a collection of validated test samples used to measure scanner accuracy. Each sample has a known reachability status and expected findings, enabling deterministic quality metrics.
## Corpus Structure
```
datasets/reachability/
├── corpus.json # Index of all samples
├── schemas/
│ └── corpus-sample.v1.json # JSON schema for samples
├── samples/
│ ├── gt-0001/ # Sample directory
│ │ ├── sample.json # Sample metadata
│ │ ├── expected.json # Expected findings
│ │ ├── sbom.json # Input SBOM
│ │ └── source/ # Optional source files
│ └── ...
└── baselines/
└── v1.0.0.json # Baseline metrics
```
## Sample Format
### sample.json
```json
{
"id": "gt-0001",
"name": "Python SQL Injection - Reachable",
"description": "Flask app with reachable SQL injection via user input",
"language": "python",
"ecosystem": "pypi",
"scenario": "webapi",
"entrypoints": ["app.py:main"],
"reachability_tier": "tainted_sink",
"created_at": "2025-01-15T00:00:00Z",
"author": "security-team",
"tags": ["sql-injection", "flask", "reachable"]
}
```
### expected.json
```json
{
"findings": [
{
"vuln_key": "CVE-2024-1234:pkg:pypi/sqlalchemy@1.4.0",
"tier": "tainted_sink",
"rule_key": "py.sql.injection.param_concat",
"sink_class": "sql",
"location_hint": "app.py:42"
}
]
}
```
## Contributing a Sample
### Step 1: Choose a Scenario
Select a scenario that is not well-covered in the corpus:
| Scenario | Description | Example |
|----------|-------------|---------|
| `webapi` | Web application endpoint | Flask, FastAPI, Express |
| `cli` | Command-line tool | argparse, click, commander |
| `job` | Background/scheduled job | Celery, cron script |
| `lib` | Library code | Reusable package |
### Step 2: Create Sample Directory
```bash
cd datasets/reachability/samples
mkdir gt-NNNN
cd gt-NNNN
```
Use the next available sample ID (check `corpus.json` for the highest).
### Step 3: Create Minimal Reproducible Case
**Requirements:**
- Smallest possible code to demonstrate the vulnerability
- Real or realistic vulnerability (use CVE when possible)
- Clear entrypoint definition
- Deterministic behavior (no network, no randomness)
**Example Python Sample:**
```python
# app.py - gt-0001
from flask import Flask, request
import sqlite3
app = Flask(__name__)
@app.route("/user")
def get_user():
user_id = request.args.get("id") # Taint source
conn = sqlite3.connect(":memory:")
# SQL injection: user_id flows to query without sanitization
result = conn.execute(f"SELECT * FROM users WHERE id = {user_id}") # Taint sink
return str(result.fetchall())
if __name__ == "__main__":
app.run()
```
### Step 4: Define Expected Findings
Create `expected.json` with all expected findings:
```json
{
"findings": [
{
"vuln_key": "CWE-89:pkg:pypi/flask@2.0.0",
"tier": "tainted_sink",
"rule_key": "py.sql.injection",
"sink_class": "sql",
"location_hint": "app.py:13",
"notes": "User input from request.args flows to sqlite3.execute"
}
]
}
```
### Step 5: Create SBOM
Generate or create an SBOM for the sample:
```json
{
"bomFormat": "CycloneDX",
"specVersion": "1.6",
"version": 1,
"components": [
{
"type": "library",
"name": "flask",
"version": "2.0.0",
"purl": "pkg:pypi/flask@2.0.0"
},
{
"type": "library",
"name": "sqlite3",
"version": "3.39.0",
"purl": "pkg:pypi/sqlite3@3.39.0"
}
]
}
```
### Step 6: Update Corpus Index
Add entry to `corpus.json`:
```json
{
"id": "gt-0001",
"path": "samples/gt-0001",
"language": "python",
"tier": "tainted_sink",
"scenario": "webapi",
"expected_count": 1
}
```
### Step 7: Validate Locally
```bash
# Run corpus validation
dotnet test tests/reachability/StellaOps.Reachability.FixtureTests \
--filter "FullyQualifiedName~CorpusFixtureTests"
# Run benchmark
stellaops bench corpus run --sample gt-0001 --verbose
```
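Before running the full test suite, a quick local sanity check can catch the most common mistake: an `expected_count` in `corpus.json` that disagrees with the sample's `expected.json`. The helper below is hypothetical (not part of the toolchain) and assumes `corpus.json` is a JSON array of entries shaped like the Step 6 example:

```python
import json
from pathlib import Path

def check_expected_counts(corpus_root: str) -> list[str]:
    """Cross-check corpus.json expected_count against each sample's expected.json.

    Hypothetical helper; the file layout follows the corpus structure shown above.
    """
    root = Path(corpus_root)
    index = json.loads((root / "corpus.json").read_text())  # assumed: array of entries
    mismatches = []
    for entry in index:
        expected = json.loads((root / entry["path"] / "expected.json").read_text())
        actual = len(expected.get("findings", []))
        if actual != entry["expected_count"]:
            mismatches.append(
                f"{entry['id']}: index says {entry['expected_count']}, found {actual}"
            )
    return mismatches
```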
## Tier Guidelines
### Imported Tier Samples
For `imported` tier samples:
- Vulnerability in a dependency
- No execution path to vulnerable code
- Package is in lockfile but not called
**Example:** Unused dependency with known CVE.
### Executed Tier Samples
For `executed` tier samples:
- Vulnerable code is called from entrypoint
- No user-controlled data reaches the vulnerability
- Static or coverage analysis proves execution
**Example:** Hardcoded SQL query (no injection).
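A minimal executed-tier counterpart to the gt-0001 sample shown earlier (hypothetical code, not an existing corpus entry): the sink API runs, but only with a hardcoded query, so there is no taint flow for the scanner to report:

```python
# app.py - executed tier: the sink runs, but no user input reaches it
import sqlite3

def main():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    # Executed sink with a hardcoded query: reachable, not tainted
    rows = conn.execute("SELECT * FROM users WHERE id = 1").fetchall()
    return rows

if __name__ == "__main__":
    print(main())
```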
### Tainted→Sink Tier Samples
For `tainted_sink` tier samples:
- User-controlled input reaches vulnerable code
- Clear source → sink data flow
- Include sink class taxonomy
**Example:** User input to SQL query, command execution, etc.
## Sink Classes
When contributing `tainted_sink` samples, specify the sink class:
| Sink Class | Description | Examples |
|------------|-------------|----------|
| `sql` | SQL injection | sqlite3.execute, cursor.execute |
| `command` | Command injection | os.system, subprocess.run |
| `ssrf` | Server-side request forgery | requests.get, urllib.urlopen |
| `path` | Path traversal | open(), os.path.join |
| `deser` | Deserialization | pickle.loads, yaml.load |
| `eval` | Code evaluation | eval(), exec() |
| `xxe` | XML external entity | lxml.parse, ET.parse |
| `xss` | Cross-site scripting | innerHTML, document.write |
## Quality Criteria
Samples must meet these criteria:
- [ ] **Deterministic**: Same input → same output
- [ ] **Minimal**: Smallest code to demonstrate
- [ ] **Documented**: Clear description and notes
- [ ] **Validated**: Passes local tests
- [ ] **Realistic**: Based on real vulnerability patterns
- [ ] **Self-contained**: No external network calls
## Negative Samples
Include "negative" samples where the scanner should NOT find vulnerabilities:
```json
{
"id": "gt-0050",
"name": "Python SQL - Properly Sanitized",
"tier": "imported",
"expected_count": 0,
"notes": "Uses parameterized queries, no injection possible"
}
```
## Review Process
1. Create PR with new sample(s)
2. CI runs validation tests
3. Security team reviews expected findings
4. QA team verifies determinism
5. Merge and update baseline
## Updating Baselines
After adding samples, update baseline metrics:
```bash
# Generate new baseline
stellaops bench corpus run --all --output baselines/v1.1.0.json
# Compare to previous
stellaops bench corpus compare baselines/v1.0.0.json baselines/v1.1.0.json
```
## FAQ
### How many samples should I contribute?
Start with 2-3 high-quality samples covering different aspects of the same vulnerability class.
### Can I use synthetic vulnerabilities?
Yes, but prefer real CVE patterns when possible. Synthetic samples should document the vulnerability pattern clearly.
### What if my sample has multiple findings?
Include all expected findings in `expected.json`. Multi-finding samples are valuable for testing.
### How do I test tier classification?
Run with verbose output:
```bash
stellaops bench corpus run --sample gt-NNNN --verbose --show-evidence
```
## Related Documentation
- [Tiered Precision Curves](../benchmarks/tiered-precision-curves.md)
- [Reachability Analysis](../product-advisories/14-Dec-2025%20-%20Reachability%20Analysis%20Technical%20Reference.md)
- [Corpus Index Schema](../../datasets/reachability/schemas/corpus-sample.v1.json)