feat: Initialize Zastava Webhook service with TLS and Authority authentication

- Added Program.cs to set up the web application with Serilog for logging, health check endpoints, and a placeholder admission endpoint.
- Configured Kestrel server to use TLS 1.3 and handle client certificates appropriately.
- Created StellaOps.Zastava.Webhook.csproj with necessary dependencies including Serilog and Polly.
- Documented tasks in TASKS.md for the Zastava Webhook project, outlining current work and exit criteria for each task.
2025-10-19 18:36:22 +03:00
parent 7e2fa0a42a
commit 5ce40d2eeb
966 changed files with 91038 additions and 1850 deletions

@@ -0,0 +1,114 @@
# StellaOps Scanner — Language Analyzer Implementation Plan (2025Q4)

> **Goal.** Deliver best-in-class language analyzers that outperform competitors on fidelity, determinism, and offline readiness while integrating tightly with Scanner Worker orchestration and SBOM composition.

All sprints below assume prerequisites from SP10-G2 (core scaffolding + Java analyzer) are complete. Each sprint is sized for a focused guild (≈1 to 1.5 weeks) and produces definitive gates for downstream teams (Emit, Policy, Scheduler).

---
## Sprint LA1 — Node Analyzer & Workspace Intelligence (Tasks 10-302, 10-307, 10-308, 10-309 subset) *(DOING — 2025-10-19)*
- **Scope:** Resolve hoisted `node_modules`, PNPM structures, Yarn Berry Plug'n'Play, symlinked workspaces, and detect security-sensitive scripts.
- **Deliverables:**
  - `StellaOps.Scanner.Analyzers.Lang.Node` plug-in with manifest + DI registration.
  - Deterministic walker supporting >100k modules with streaming JSON parsing.
  - Workspace graph persisted as analyzer metadata (`package.json` provenance + symlink target proofs).
- **Acceptance Metrics:**
  - 10k-module fixture scans in <1.8 s on 4 vCPU (p95).
  - Memory ceiling <220 MB (tracked via deterministic benchmark harness).
  - All symlink targets canonicalized; path traversal guarded.
- **Gate Artifacts:**
  - `Fixtures/lang/node/**` golden outputs.
  - Analyzer benchmark CSV + flamegraph (commit under `bench/Scanner.Analyzers`).
  - Worker integration sample enabling Node analyzer via manifest.
- **Progress (2025-10-19):** Module walker with package-lock/yarn/pnpm resolution, workspace attribution, integrity metadata, and deterministic fixture harness committed; Node tasks 10-302A/B marked DONE. Shared component mapper + canonical result harness landed, closing tasks 10-307/308. Script metadata & telemetry (10-302C) emit policy hints, hashed evidence, and feed `scanner_analyzer_node_scripts_total` into Worker OpenTelemetry pipeline.
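
The symlink canonicalization and path-traversal requirement in the acceptance metrics can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the plug-in's .NET code, and the helper name `resolve_within_root` is hypothetical. The only assumption is that a symlink target is accepted when its fully resolved path stays inside the scan root.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_within_root(root: str, candidate: str) -> Optional[Path]:
    """Canonicalize `candidate` (relative to `root`) and reject escapes.

    Hypothetical helper: returns the resolved path when it lies inside
    the scan root, otherwise None.
    """
    root_real = Path(os.path.realpath(root))
    # realpath follows every symlink component, so the comparison below
    # is made on canonical paths only.
    target_real = Path(os.path.realpath(os.path.join(root, candidate)))
    try:
        target_real.relative_to(root_real)
    except ValueError:
        return None  # target resolves outside the root: treat as traversal
    return target_real
```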
## Sprint LA2 — Python Analyzer & Entry Point Attribution (Tasks 10-303, 10-307, 10-308, 10-309 subset)
- **Scope:** Parse `*.dist-info`, `RECORD` hashes, entry points, and pip-installed editable packages; integrate usage hints from EntryTrace.
- **Deliverables:**
  - `StellaOps.Scanner.Analyzers.Lang.Python` plug-in.
  - RECORD hash validation with optional Zip64 support for `.whl` caches.
  - Entry-point mapping into `UsageFlags` for Emit stage.
- **Acceptance Metrics:**
  - Hash verification throughput ≥75 MB/s sustained with streaming reader.
  - False-positive rate for editable installs <1% on curated fixtures.
  - Determinism check across metadata generated by CPython 3.8 through 3.12.
- **Gate Artifacts:**
  - Golden fixtures for `site-packages`, virtualenv, and layered pip caches.
  - Usage hint propagation tests (EntryTrace → analyzer → SBOM).
  - Metrics counters (`scanner_analyzer_python_components_total`) documented.
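
As an illustration of the RECORD hash validation deliverable, the sketch below checks files against a `RECORD` manifest with a streaming reader. It is Python rather than the plug-in's .NET code, and the function names `record_digest` and `verify_record` are hypothetical. It relies on the standard `RECORD` row layout (`path,sha256=<urlsafe base64 without padding>,size`) and, as a simplifying assumption, handles `sha256` entries only.

```python
import base64
import csv
import hashlib
import io
from pathlib import Path
from typing import List


def record_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file and return its digest in RECORD notation."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    encoded = base64.urlsafe_b64encode(hasher.digest()).rstrip(b"=")
    return "sha256=" + encoded.decode("ascii")


def verify_record(install_root: Path, record_text: str) -> List[str]:
    """Return the sorted relative paths whose content does not match RECORD."""
    mismatches = []
    for row in csv.reader(io.StringIO(record_text)):
        if len(row) < 2 or not row[1].startswith("sha256="):
            continue  # unhashed rows (e.g. RECORD itself) are skipped in this sketch
        relative_path, expected = row[0], row[1]
        try:
            actual = record_digest(install_root / relative_path)
        except OSError:
            actual = None  # a missing or unreadable file counts as a mismatch
        if actual != expected:
            mismatches.append(relative_path)
    return sorted(mismatches)  # stable ordering keeps the report deterministic
```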
## Sprint LA3 — Go Analyzer & Build Info Synthesis (Tasks 10-304, 10-307, 10-308, 10-309 subset)
- **Scope:** Extract Go build metadata from `.note.go.buildid`, embedded module info, and fallback to `bin:{sha256}`; surface VCS provenance.
- **Deliverables:**
  - `StellaOps.Scanner.Analyzers.Lang.Go` plug-in.
  - DWARF-lite parser to enrich component origin (commit hash + dirty flag) when available.
  - Shared hash cache to dedupe repeated binaries across layers.
- **Acceptance Metrics:**
  - Analyzer latency ≤400 µs per binary (hot cache) / ≤2 ms (cold).
  - Provenance coverage ≥95% on representative Go fixture suite.
  - Zero allocations in happy path beyond pooled buffers (validated via BenchmarkDotNet).
- **Gate Artifacts:**
  - Benchmarks vs competitor open-source tool (Trivy or Syft) demonstrating faster metadata extraction.
  - Documentation snippet explaining VCS metadata fields for Policy team.
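
The `bin:{sha256}` fallback and the shared hash cache can be sketched together as follows. This is Python for illustration only, not the plug-in's .NET code. The `read_build_info` callback is a hypothetical stand-in for the embedded module-info parser, and the `pkg:golang/...` key used when build info is present is an assumption, not something this plan specifies.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple

BuildInfo = Tuple[str, str]  # (module path, version), hypothetical shape


def component_key(
    binary: bytes,
    read_build_info: Callable[[bytes], Optional[BuildInfo]],
    cache: Dict[str, str],
) -> str:
    """Return a component key, parsing each distinct binary only once."""
    digest = hashlib.sha256(binary).hexdigest()
    if digest not in cache:
        # Cache miss: run the (potentially expensive) metadata parser.
        info = read_build_info(binary)
        if info is not None:
            module, version = info
            cache[digest] = f"pkg:golang/{module}@{version}"  # assumed key format
        else:
            cache[digest] = f"bin:{digest}"  # fallback named in the scope above
    return cache[digest]
```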
## Sprint LA4 — .NET Analyzer & RID Variants (Tasks 10-305, 10-307, 10-308, 10-309 subset)
- **Scope:** Parse `*.deps.json`, `runtimeconfig.json`, assembly metadata, and RID-specific assets; correlate with native dependencies.
- **Deliverables:**
  - `StellaOps.Scanner.Analyzers.Lang.DotNet` plug-in.
  - Optional strong-name + Authenticode verification when an offline cert bundle is provided.
  - RID-aware component grouping with fallback to `bin:{sha256}` for self-contained apps.
- **Acceptance Metrics:**
  - Multi-target app fixture processed in <1.2 s; memory <250 MB.
  - RID variant collapse reduces component explosion by ≥40% vs naive listing.
  - All security metadata (signing publisher, timestamp) surfaced deterministically.
- **Gate Artifacts:**
  - Signed .NET sample apps (framework-dependent & self-contained) under `samples/scanner/lang/dotnet/`.
  - Tests verifying dual runtimeconfig merge logic.
  - Guidance for Policy on license propagation from NuGet metadata.
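
One possible reading of "RID variant collapse" is sketched below: libraries that appear under several targets or carry several RID-specific assets in a `*.deps.json` document are merged into a single component that lists its RIDs. The sketch is Python rather than the plug-in's .NET code, the function name `collapse_rid_variants` and the output shape are hypothetical, and the real grouping rules may differ.

```python
from typing import Dict, List, Set, Tuple


def collapse_rid_variants(deps: dict) -> List[dict]:
    """Merge per-RID library entries into one component per name/version."""
    merged: Dict[Tuple[str, str], Set[str]] = {}
    for target_name in sorted(deps.get("targets", {})):
        # A target such as ".NETCoreApp,Version=v8.0/linux-x64" carries its RID
        # after the slash; RID-agnostic targets have no slash.
        _, _, target_rid = target_name.partition("/")
        for library_key, library in deps["targets"][target_name].items():
            name, _, version = library_key.partition("/")
            rids = merged.setdefault((name, version), set())
            if target_rid:
                rids.add(target_rid)
            # RID-specific assets are listed under "runtimeTargets".
            for asset in library.get("runtimeTargets", {}).values():
                if asset.get("rid"):
                    rids.add(asset["rid"])
    # Sort components and RIDs so the output is stable across runs.
    return [
        {"name": name, "version": version, "rids": sorted(rids)}
        for (name, version), rids in sorted(merged.items())
    ]
```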
## Sprint LA5 — Rust Analyzer & Binary Fingerprinting (Tasks 10-306, 10-307, 10-308, 10-309 subset)
- **Scope:** Detect crates via metadata in `.fingerprint`, Cargo.lock fragments, or embedded `rustc` markers; robust fallback to binary hash classification.
- **Deliverables:**
  - `StellaOps.Scanner.Analyzers.Lang.Rust` plug-in.
  - Symbol table heuristics capable of attributing stripped binaries by leveraging `.comment` and section names without violating determinism.
  - Quiet-provenance flags to differentiate heuristics from hard evidence.
- **Acceptance Metrics:**
  - Accurate crate attribution ≥85% on curated Cargo workspace fixtures.
  - Heuristic fallback clearly labeled; no heuristic result is reported as certain.
  - Analyzer completes in <1 s on a 500-binary corpus.
- **Gate Artifacts:**
  - Fixtures covering Cargo workspaces and binaries whose embedded metadata has been stripped.
  - ADR documenting heuristic boundaries + risk mitigations.
## Sprint LA6 — Shared Evidence Enhancements & Worker Integration (Tasks 10-307, 10-308, 10-309 finalization)
- **Scope:** Finalize shared helpers, deterministic harness expansion, Worker/Emit wiring, and macro benchmarks.
- **Deliverables:**
  - Consolidated `LanguageComponentWriter` extensions for license, vulnerability hints, and usage propagation.
  - Worker dispatcher loading plug-ins via manifest registry + health checks.
  - Combined analyzer benchmark suite executed in CI with regression thresholds.
- **Acceptance Metrics:**
  - Worker executes mixed analyzer suite (Java+Node+Python+Go+.NET+Rust) within SLA: warm scan <6 s, cold <25 s.
  - CI determinism guard catches output drift (>0 diff tolerance) across all fixtures.
  - Telemetry coverage: each analyzer emits timing + component counters.
- **Gate Artifacts:**
  - `SPRINTS_LANG_IMPLEMENTATION_PLAN.md` progress log updated (this file).
  - `bench/Scanner.Analyzers/lang-matrix.csv` recorded + referenced in docs.
  - Ops notes for packaging plug-ins into Offline Kit.
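
A determinism guard of the kind described in the acceptance metrics could look like the sketch below: run an analyzer several times against one fixture and report a diff for every run that deviates from the golden output. It is Python for illustration only; `determinism_failures` and the `run_analyzer` callback are hypothetical names, and the actual CI guard may be structured differently.

```python
import difflib
from typing import Callable, List


def determinism_failures(
    run_analyzer: Callable[[str], str],
    fixture: str,
    golden: str,
    runs: int = 2,
) -> List[str]:
    """Return one unified diff per run whose output differs from `golden`."""
    failures = []
    for attempt in range(1, runs + 1):
        actual = run_analyzer(fixture)
        if actual != golden:  # byte-for-byte comparison, no tolerance
            diff = difflib.unified_diff(
                golden.splitlines(),
                actual.splitlines(),
                "golden",
                f"run-{attempt}",
                lineterm="",
            )
            failures.append("\n".join(diff))
    return failures
```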
---
## Cross-Sprint Considerations
- **Security:** All analyzers must enforce path canonicalization, guard against zip-slip, and expose provenance classifications (`observed`, `heuristic`, `attested`).
- **Offline-first:** No network calls; rely on cached metadata and optional offline bundles (license texts, signature roots).
- **Determinism:** Normalise timestamps to `0001-01-01T00:00:00Z` when persisting synthetic data; sort collections by stable keys.
- **Benchmarking:** Extend `bench/Scanner.Analyzers` to compare against open-source scanners (Syft/Trivy) and document performance wins.
- **Hand-offs:** Emit guild requires consistent component schemas; Policy needs license + provenance metadata; Scheduler depends on usage flags for ImpactIndex.
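
The determinism rule above (fixed timestamp for synthetic data, collections sorted by stable keys) can be sketched as a canonical serializer. This is Python rather than the analyzers' .NET code, and the `synthetic`, `key`, and `path` field names are hypothetical, chosen only to make the example concrete.

```python
import json
from typing import Iterable

SYNTHETIC_TIMESTAMP = "0001-01-01T00:00:00Z"  # value named in the determinism rule


def canonical_json(components: Iterable[dict]) -> str:
    """Serialize components so that equal inputs always yield equal bytes."""
    normalized = []
    for component in components:
        item = dict(component)  # copy so the caller's data is not mutated
        if item.get("synthetic"):
            item["timestamp"] = SYNTHETIC_TIMESTAMP
        normalized.append(item)
    # Sort by a stable key so input order never leaks into the output.
    normalized.sort(key=lambda c: (c["key"], c.get("path", "")))
    return json.dumps(normalized, sort_keys=True, separators=(",", ":"))
```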
## Tracking & Reporting
- Update `TASKS.md` per sprint (TODO → DOING → DONE) with date stamps.
- Log sprint summaries in `docs/updates/` once each sprint lands.
- Use module-specific CI pipeline to run analyzer suites nightly (determinism + perf).
---
**Next Action:** Start Sprint LA1 (Node Analyzer) — move tasks 10-302, 10-307, 10-308, 10-309 → DOING and spin up fixtures + benchmarks.