git.stella-ops.org/docs/implplan/SPRINT_0143_0001_0001_signals.md
StellaOps Bot 108d1c64b3
2025-12-09 09:38:09 +02:00


Sprint 0143-0001-0001 · Signals

Topic & Scope

  • Runtime & Signals stream focused on reachability ingestion, runtime facts, and scoring.
  • Deliver CAS-backed callgraph ingestion for Java/Node.js/Python/Go plus runtime facts NDJSON/gzip ingestion with provenance enrichment.
  • Produce reachability scoring engine with Redis-backed caching and signals.fact.updated events, honoring CAS remediation/waiver rules.
  • Working directory: src/Signals/StellaOps.Signals
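The NDJSON/gzip ingestion path named in the scope can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the Signals implementation (which lives in the .NET project above): the function names and the sample fact fields are hypothetical, while the provenance keys (source, ingestedAt, callgraphId) follow the execution log entry of 2025-11-17.

```python
import gzip
import io
import json
from typing import Iterator


def iter_runtime_facts(payload: bytes) -> Iterator[dict]:
    """Yield one dict per non-empty NDJSON line, un-gzipping when needed."""
    # gzip streams start with the magic bytes 0x1f 0x8b.
    if payload[:2] == b"\x1f\x8b":
        payload = gzip.decompress(payload)
    for line in io.BytesIO(payload):
        line = line.strip()
        if line:  # blank lines between records are skipped
            yield json.loads(line)


def with_provenance(fact: dict, source: str, ingested_at: str, callgraph_id: str) -> dict:
    """Return a copy of the fact stamped with provenance metadata.

    The nesting under a "provenance" key is an assumption made for this sketch.
    """
    return {
        **fact,
        "provenance": {
            "source": source,
            "ingestedAt": ingested_at,
            "callgraphId": callgraph_id,
        },
    }
```

The same iterator accepts plain or gzip-compressed bodies, so a single endpoint could serve both content encodings; the input fact is left unmodified so that the raw record remains available for auditing.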

Dependencies & Concurrency

  • Upstream sprints: 120.A (AirGap), 130.A (Scanner).
  • Tasks sit in Signals; no cross-module coupling is flagged beyond Authority (AUTH-SIG-26-001), which the finished skeleton depends on.
  • Completed/historic work archived in docs/implplan/archived/tasks.md (last updated 2025-11-08).

Documentation Prerequisites

  • docs/README.md; docs/07_HIGH_LEVEL_ARCHITECTURE.md; docs/modules/platform/architecture-overview.md.
  • src/Signals/StellaOps.Signals/AGENTS.md.
  • CAS waiver/remediation checklist dated 2025-11-17 for SIGNALS-24-002/004/005 scope.

BLOCKED Tasks: Before working on BLOCKED tasks, review BLOCKED_DEPENDENCY_TREE.md for root blockers and dependencies.

Delivery Tracker

| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| P1 | PREP-SIGNALS-24-005-REDIS-CACHE-IMPLEMENTED-A | DONE (2025-11-20) | Doc published at docs/signals/events-24-005.md; bus/topic approved. | Signals Guild, Platform Events Guild | Redis cache implemented; awaiting real bus/topic + payload contract to replace placeholder signals.fact.updated logging. Document artefact/deliverable for SIGNALS-24-005 and publish location so downstream tasks can proceed. |
| P2 | PREP-SIGNALS-24-002-CAS-PROMO | DONE (2025-11-19) | Due 2025-11-22 · Accountable: Signals Guild · Platform Storage Guild | Signals Guild · Platform Storage Guild | CAS promotion checklist and manifest schema published at docs/signals/cas-promotion-24-002.md; awaiting storage approval to execute. |
| P3 | PREP-SIGNALS-24-003-PROVENANCE | DONE (2025-11-19) | Due 2025-11-22 · Accountable: Signals Guild · Runtime Guild · Authority Guild | Signals Guild · Runtime Guild · Authority Guild | Provenance appendix fields and checklist published at docs/signals/provenance-24-003.md; awaiting schema/signing approval to execute. |
| 1 | SIGNALS-24-001 | DONE (2025-11-09) | Dependency AUTH-SIG-26-001; merged host skeleton with scope policies and evidence validation. | Signals Guild, Authority Guild | Stand up Signals API skeleton with RBAC, sealed-mode config, DPoP/mTLS enforcement, and /facts scaffolding so downstream ingestion can begin. |
| 2 | SIGNALS-24-002 | DONE (2025-12-08) | CAS storage implementation started. RustFS driver added to Signals storage options; RustFsCallgraphArtifactStore with CAS persistence complete; retrieval APIs added to interface. | Signals Guild | Implement callgraph ingestion/normalization (Java/Node/Python/Go) with CAS persistence and retrieval APIs to feed reachability scoring. |
| 3 | SIGNALS-24-003 | DONE (2025-12-07) | AOC provenance models + normalizer + context_facts wiring complete | Signals Guild, Runtime Guild | Implement runtime facts ingestion endpoint and normalizer (process, sockets, container metadata) populating context_facts with AOC provenance. |
| 4 | SIGNALS-24-004 | DONE (2025-11-17) | Scoring weights now configurable; runtime ingestion auto-triggers recompute into reachability_facts. | Signals Guild, Data Science | Deliver reachability scoring engine producing states/scores and writing to reachability_facts; expose configuration for weights. |
| 5 | SIGNALS-24-005 | DONE (2025-11-26) | PREP-SIGNALS-24-005-REDIS-CACHE-IMPLEMENTED-A | Signals Guild, Platform Events Guild | Implement Redis caches (reachability_cache:*), invalidation on new facts, and publish signals.fact.updated events. |
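The cache behaviour described for SIGNALS-24-005 (cached reads, invalidation on new facts, then an event) can be sketched as a repository decorator. This is a minimal Python illustration, not the shipped code: a plain dict stands in for Redis, TTL handling is omitted, and the class name, method names, and event payload shape are all hypothetical. Only the reachability_cache: key prefix and the signals.fact.updated event name come from the tracker.

```python
from typing import Callable, Dict, Optional

CACHE_PREFIX = "reachability_cache:"  # key prefix named in the tracker


class CachedFactRepository:
    """Wraps a loader with a read-through cache (a dict stands in for Redis)."""

    def __init__(self, load: Callable[[str], Optional[dict]], cache: Dict[str, dict]):
        self._load = load
        self._cache = cache

    def get(self, subject_key: str) -> Optional[dict]:
        key = CACHE_PREFIX + subject_key
        if key in self._cache:
            return self._cache[key]  # cache hit: the backing store is not touched
        fact = self._load(subject_key)
        if fact is not None:
            self._cache[key] = fact  # populate on miss
        return fact

    def on_new_fact(self, subject_key: str, publish: Callable[[str, dict], None]) -> None:
        """Drop the stale entry first, then announce the update."""
        self._cache.pop(CACHE_PREFIX + subject_key, None)
        # Hypothetical payload; the real envelope is defined in docs/signals/events-24-005.md.
        publish("signals.fact.updated", {"subjectKey": subject_key})
```

Invalidating before publishing means a consumer that reacts to the event and reads the fact immediately cannot be served the stale cached value.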

Action Tracker

| Action | Owner(s) | Due | Status | Next step |
| --- | --- | --- | --- | --- |
| CAS approval decision (SIGNALS-24-002) | Signals Guild · Platform Storage Guild | 2025-12-06 | DONE | CAS Infrastructure Contract APPROVED at docs/contracts/cas-infrastructure.md. SIGNALS-24-002/003 unblocked. |
| Provenance appendix freeze (SIGNALS-24-003) | Runtime Guild · Authority Guild | 2025-12-07 | DONE | Appendix + fixtures published (docs/signals/provenance-24-003.md, docs/schemas/provenance-feed.schema.json). |
| Production re-sign of signals artefacts | Signals Guild · Platform / Build Guild | 2025-12-06 | DONE (pipeline ready 2025-12-09) | CI workflows (signals-reachability.yml, signals-evidence-locker.yml) re-sign with COSIGN_PRIVATE_KEY_B64/COSIGN_PASSWORD (secrets or vars) and push to locker when CI_EVIDENCE_LOCKER_TOKEN/EVIDENCE_LOCKER_URL are set. |
| Postprod-sign scoring regression | Signals Guild | 2025-12-07 | DONE (2025-12-09) | Reachability smoke suite (scripts/signals/reachability-smoke.sh) passing after deterministic digest/events changes. |

Execution Log

Date (UTC) Update Owner
2025-12-10 Router-backed publisher added: Signals.Events.Driver=router now POSTs signals.fact.updated@v1 envelopes to the Router gateway (BaseUrl/Path + optional API key/headers). Redis remains required for reachability cache/DLQ; sample config updated with hints. Implementer
2025-12-09 SIGNALS-24-004/005 hardened: deterministic fact.version/digest hasher, Redis stream events (signals.fact.updated.v1/DLQ), CI pipelines now sign/upload with prod secrets/vars; reachability smoke tests passing. Implementer
2025-12-08 Cleared locked Microsoft.SourceLink.GitLab.dll.bak from repo-scoped .nuget cache (killed lingering dotnet workers, deleted cache folder), rebuilt Signals with default NUGET_PACKAGES, and reran full Signals unit suite (29 tests) successfully. Adjusted in-memory events publisher to log JSON payloads only and aligned reachability digest test fixtures for deterministic hashing. Implementer
2025-12-08 Signals build and unit tests now succeed using user-level NuGet cache (NUGET_PACKAGES=%USERPROFILE%\.nuget\packages) to bypass locked repo cache file. Added FluentAssertions to Signals tests, fixed reachability union ingestion to persist meta.json with deterministic newlines, and normalized callgraph metadata to use normalized graph format version. Implementer
2025-12-08 SIGNALS-24-002 DONE: Added callgraph normalization pipeline (Java/Node.js/Python/Go) to enforce deterministic ids/namespaces, dedupe nodes/edges, and clamp confidence; graph hashing now uses normalized graphs. Ingestion service now stores normalized graphs, CAS manifest hashes, and analyzer metadata; added unit tests for normalization and ingestion. Build attempt hit SourceLink file lock (Microsoft.SourceLink.GitLab.dll); tests not run in-session due to that permission error. Implementer
2025-12-07 SIGNALS-24-003 DONE: Implemented AOC provenance for runtime facts ingestion: (1) Created AocProvenance.cs with full provenance-feed.schema.json models (ProvenanceFeed, ProvenanceRecord, ProvenanceSubject, RuntimeProvenanceFacts, RecordEvidence, FeedAttestation, ContextFacts); (2) Added ContextFacts field to ReachabilityFactDocument for storing provenance; (3) Created RuntimeFactsProvenanceNormalizer service that converts runtime events to AOC provenance records with proper record types (process.observed, network.connection, container.activity, package.loaded, symbol.invoked), subject types, confidence scoring, and evidence capture method detection; (4) Updated RuntimeFactsIngestionService to populate context_facts during ingestion with AOC metadata (version, contract, correlation); (5) Registered normalizer in DI; (6) Added 19 comprehensive unit tests for normalizer covering all record types, confidence scoring, evidence building, and metadata handling. Build succeeds; 20/20 runtime facts tests pass. Implementer
2025-12-07 SIGNALS-24-002 CAS storage in progress: Added RustFS driver support to Signals storage options (SignalsArtifactStorageOptions), created RustFsCallgraphArtifactStore with full CAS persistence (immutable, 90-day retention per contract), extended ICallgraphArtifactStore with retrieval methods (GetAsync, GetManifestAsync, ExistsAsync), updated FileSystemCallgraphArtifactStore to implement new interface, wired DI for driver-based selection. Configuration sample updated at etc/signals.yaml.sample. Build succeeds; 5/6 tests pass (1 pre-existing ZIP test failure unrelated). Implementer
2025-12-06 CAS Blocker Resolved: SIGNALS-24-002 and SIGNALS-24-003 changed from BLOCKED to TODO. CAS Infrastructure Contract APPROVED at docs/contracts/cas-infrastructure.md; provenance schema at docs/schemas/provenance-feed.schema.json. Ready for implementation. Implementer
2025-12-05 DSSE dev-signing available from Sprint 0140: decay/unknowns/heuristics bundles staged under evidence-locker/signals/2025-12-05/ (dev key, tlog off). Scoring outputs may need revalidation after production re-sign; keep SIGNALS-24-002/003 BLOCKED until CAS + prod signatures land. Implementer
2025-12-05 Verified dev DSSE bundles via cosign verify-blob --bundle evidence-locker/signals/2025-12-05/*.sigstore.json --key tools/cosign/cosign.dev.pub (all OK). Pending production re-sign once Alice Carter key available. Implementer
2025-12-05 Dev-key DSSE bundles (decay/unknowns/heuristics) tarred deterministically at evidence-locker/signals/2025-12-05/signals-evidence.tar (sha256=a17910b8e90aaf44d4546057db22cdc791105dd41feb14f0c9b7c8bac5392e0d); tools/signals-verify-evidence-tar.sh added. Production re-sign still pending Alice Carter key/CI secret. Project Mgmt
2025-12-05 Added CI workflow signals-evidence-locker.yml and local uploader tools/signals-upload-evidence.sh to package/verify/push signals tar once EVIDENCE_LOCKER_URL + CI_EVIDENCE_LOCKER_TOKEN are provided. Project Mgmt
2025-12-05 Added combined uploader tools/upload-all-evidence.sh (signals + zastava) to simplify locker push once creds land. Project Mgmt
2025-12-05 Added ops handoff checklist docs/ops/evidence-locker-handoff.md (hashes, commands, required secrets, prod re-sign steps). Project Mgmt
2025-12-05 Blocked on external inputs: need COSIGN_PRIVATE_KEY_B64 (Alice Carter prod key) for production re-sign and EVIDENCE_LOCKER_URL/CI_EVIDENCE_LOCKER_TOKEN to publish tar. No further repo work pending until creds arrive. Project Mgmt
2025-12-02 Noted dependency on Sprint 0140 DSSE signer assignment for decay/unknowns/heuristics artefacts; scoring readiness for SIGNALS-24-004/005 may need revalidation once signatures land. No status change. Project Mgmt
2025-11-26 Enriched signals.fact.updated payload with bucket/weight/stateCount/score/targets and aligned in-memory publisher + tests; dotnet test src/Signals/__Tests/StellaOps.Signals.Tests/StellaOps.Signals.Tests.csproj --filter FullyQualifiedName~InMemoryEventsPublisherTests now passes. Implementer
2025-11-20 Published docs/signals/events-24-005.md event-bus contract (topic, envelope, retry/DLQ); marked PREP-SIGNALS-24-005 DONE and moved SIGNALS-24-005 to TODO. Implementer
2025-11-19 Assigned PREP owners/dates; see Delivery Tracker. Planning
2025-11-19 Marked SIGNALS-24-002 and SIGNALS-24-003 BLOCKED pending CAS promotion, signed manifests, and provenance schema. Implementer
2025-10-29 Skeleton live with scope policies, stub endpoints, integration tests; sample configuration committed under etc/signals.yaml.sample. Signals Guild
2025-10-29 JSON parsers for Java/Node.js/Python/Go implemented; artifacts stored with SHA-256 and callgraphs upserted into Mongo. Signals Guild
2025-11-09 Signals host registers sealed-mode evidence validation, exposes /readyz and /status, enforces scope policies, and adds /signals/facts/{subjectKey} retrieval plus runtime-facts ingestion backing services. Signals Guild / Authority Guild
2025-11-09 Added /signals/callgraphs/{id} retrieval, sealed-mode gating, and CAS-backed artifact metadata responses; remaining work is CAS bucket promotion + signed graph manifests. Signals Guild
2025-11-09 Added runtime facts ingestion service + endpoint, aggregated runtime hit storage, and unit tests; next steps are NDJSON/gzip ingestion and provenance metadata wiring. Signals Guild / Runtime Guild
2025-11-09 Added /signals/runtime-facts/ndjson streaming endpoint (JSON/NDJSON + gzip) with sealed-mode gating; provenance/context enrichment + scoring linkage remain. Signals Guild / Runtime Guild
2025-11-17 CAS remediation window (≤3 days for Critical/High) approved with signed waiver; proceed with SIGNALS-24-002/004/005. Signals Guild
2025-11-17 CAS checklist in remediation window with risk waiver; SIGNALS-24-002/003 remain BLOCKED until CAS promotion + signed manifests land; 24-004/005 stay gated. Signals Guild
2025-11-17 Normalised sprint to standard template and renamed from SPRINT_143_signals.md to SPRINT_0143_0001_0001_signals.md. PM
2025-11-17 Reachability scoring weights moved to config; runtime facts ingestion now triggers recompute and persists states; added unit tests for scoring + runtime ingestion. Signals Guild
2025-11-17 dotnet test src/Signals/StellaOps.Signals.sln aborted after long restore/build; warning NU1504 about duplicate PackageReference items in StellaOps.Signals.Tests persists—needs cleanup before rerun. Signals Guild
2025-11-17 Runtime facts ingestion now stamps provenance metadata (source, ingestedAt, callgraphId) and recompute is triggered on ingest; targeted test run aborted mid-restore—rerun needed. Signals Guild
2025-11-18 dotnet restore for StellaOps.Signals.Tests now succeeds (16.8s); dotnet test -v:diag --blame-hang-timeout 120s still running long—awaiting stable completion. Signals Guild
2025-11-18 Redis reachability cache added (StackExchange.Redis) with configurable TTL; repository now wrapped with cache decorator; cache config added to signals.yaml.sample. Signals Guild
2025-11-18 Signals unit tests (ReachabilityScoringServiceTests, RuntimeFactsIngestionServiceTests) discovered successfully; targeted test run completed (tests passed). Signals Guild
2025-11-18 dotnet test --no-build --list-tests and subsequent run now succeed for Signals tests (6.2s). Signals Guild
2025-11-18 Structured signals.fact.updated@v1 payload + logging added with unit coverage (InMemoryEventsPublisherTests); bus/channel contract still pending; full solution test run cancelled for time (needs rerun). Signals Guild
2025-11-18 Another targeted test run (/m:1 --no-restore --filter InMemoryEventsPublisherTests) still times out >40s due to upstream Authority/Cryptography build fan-out; leave as follow-up once caches are warm. Signals Guild
2025-11-18 Signals test project detangled from Concelier shared infra (set UseConcelierTestInfra=false, explicit test packages), added InternalsVisibleTo for Signals tests, and refreshed cache/events test fakes; Signals solution build now clean and dotnet test --no-build --filter InMemoryEventsPublisherTests passes. Event bus contract still outstanding. Signals Guild
2025-11-18 Created expected local-nugets/ feed directory to clear NU1301 failures; full Signals solution restore still ran >60s and was cancelled for time—needs longer restore window before rerunning dotnet test on the solution. Signals Guild
2025-11-18 Full Signals solution dotnet restore --disable-parallel now succeeds (33.7s). A full dotnet test --no-restore /m:1 attempt ran ~101s and was cancelled during cryptography-plugin build; full suite still needs a longer window to finish. Signals Guild
2025-11-18 Re-attempted dotnet test --no-restore /m:1 --blame-hang-timeout 240s; aborted early (~14s) to avoid another long hang. Full solution test still pending a longer uninterrupted window. Signals Guild
2025-11-18 Tried dotnet build src/Signals/StellaOps.Signals.sln --no-restore /m:1; aborted after ~12s as build again fanned into Cryptography plugins. Need either build filtering or dedicated window to let full solution finish. Signals Guild
2025-11-18 Targeted dotnet test src/Signals/__Tests/StellaOps.Signals.Tests/StellaOps.Signals.Tests.csproj --no-build --no-restore was started but cancelled by operator after ~9s during generated Program file step; unit suite previously green—no new code changes since. Signals Guild
2025-11-18 Attempted dotnet build src/Signals/StellaOps.Signals/StellaOps.Signals.csproj --no-restore /m:1; cancelled after ~9s when build began resolving upstream auth/crypto dependencies. Signals Guild
2025-11-18 Added AirGap.EventTopic option (config + options) and fixed InMemoryEventsPublisher build error; dotnet build src/Signals/StellaOps.Signals/StellaOps.Signals.csproj --no-restore /m:1 now succeeds. Signals Guild
2025-11-18 Signals unit tests now pass via dotnet test src/Signals/__Tests/StellaOps.Signals.Tests/StellaOps.Signals.Tests.csproj --no-build --no-restore (3 tests, 0 failures, ~4s). Signals Guild
2025-11-18 Full Signals solution test (dotnet test src/Signals/StellaOps.Signals.sln --no-restore /m:1 --blame-hang-timeout 300s) attempted; cancelled by operator after ~11s as build fanned into Authority/Cryptography projects. Requires longer window or filtered solution. Signals Guild
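The 2025-12-08 log entry describes callgraph normalization (deterministic ids, deduplicated nodes/edges, clamped confidence) followed by hashing of the normalized graph. A rough Python sketch of that idea follows. The graph shape (nodes with id, edges with from/to/confidence), the whitespace-trimming rule, and the choice to keep the highest confidence among duplicate edges are assumptions made for illustration; the actual normalizer may differ on every one of them.

```python
import hashlib
import json


def normalize_callgraph(graph: dict) -> dict:
    """Trim ids, dedupe nodes and edges, clamp confidence to [0, 1], sort everything."""
    nodes = sorted({n["id"].strip() for n in graph.get("nodes", [])})
    best = {}
    for edge in graph.get("edges", []):
        key = (edge["from"].strip(), edge["to"].strip())
        conf = min(1.0, max(0.0, float(edge.get("confidence", 1.0))))
        # Assumption: duplicate edges collapse to the highest clamped confidence.
        best[key] = max(conf, best.get(key, 0.0))
    edges = [{"from": f, "to": t, "confidence": c} for (f, t), c in sorted(best.items())]
    return {"nodes": [{"id": n} for n in nodes], "edges": edges}


def graph_digest(graph: dict) -> str:
    """Hash the canonical JSON form of the normalized graph."""
    canonical = json.dumps(normalize_callgraph(graph), sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the digest is computed over the normalized form, two analyzer outputs that differ only in ordering, duplicates, or out-of-range confidence values map to the same content address.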

Decisions & Risks

  • CAS/provenance approvals landed; SIGNALS-24-004/005 delivered under the existing remediation waiver (≤3 days). Monitor waiver compliance as scoring runs.
  • Redis stream publisher (signals.fact.updated.v1 + DLQ) implements the docs/signals/events-24-005.md contract; ensure DLQ monitoring in CI/staging.
  • Production re-sign/upload automated via signals-reachability.yml and signals-evidence-locker.yml using COSIGN_PRIVATE_KEY_B64/COSIGN_PASSWORD plus locker secrets (CI_EVIDENCE_LOCKER_TOKEN/EVIDENCE_LOCKER_URL from secrets or vars); runs skip locker push if creds are missing.
  • Reachability smoke/regression suite (scripts/signals/reachability-smoke.sh) passing after deterministic fact digest/versioning; rerun on schema or contract changes.
  • Router transport now wired for Signals events (Signals.Events.Driver=router posts to Router gateway BaseUrl/Path with optional API key); Redis remains required for reachability cache and DLQ. Ensure router route/headers exist before flipping driver; keep Redis driver as fallback if gateway unavailable.
  • Repo .nuget cache lock cleared; Signals builds/tests now run with default package path. Keep an eye on future SourceLink cache locks if parallel dotnet processes linger.
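As a rough picture of what the router driver has to assemble (BaseUrl/Path, optional API key, a signals.fact.updated@v1 envelope), the Python sketch below builds the pieces of such a POST without sending it. The header name X-Api-Key, the envelope fields, and the function name are hypothetical; the authoritative contract is docs/signals/events-24-005.md.

```python
import json
from typing import Dict, Optional, Tuple


def build_router_request(
    base_url: str, path: str, fact: dict, api_key: Optional[str] = None
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for a POST to the Router gateway."""
    # Join BaseUrl and Path with exactly one slash between them.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-Api-Key"] = api_key  # hypothetical header name
    # Hypothetical envelope shape; only the event name comes from the sprint notes.
    envelope = {"type": "signals.fact.updated@v1", "data": fact}
    body = json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return url, headers, body
```

Serializing with sorted keys keeps the body byte-stable for identical facts, which fits the deterministic digest work noted above.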

Next Checkpoints

  • 2025-12-10 · First CI run of signals-reachability.yml with production secrets/vars to re-sign and upload evidence.
  • 2025-12-10 · Enable Redis stream monitoring (primary + DLQ) for signals.fact.updated.v1 after first publish.
  • Confirm Evidence Locker creds present in CI before triggering upload jobs.
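A pre-flight check along the lines of the last checkpoint could look like the Python sketch below. It only inspects the variable names this sprint mentions; the function names and the rule that empty or whitespace-only values count as missing are assumptions, and the real workflows may gate the jobs differently.

```python
import os
from typing import Iterable, List, Mapping

SIGNING_VARS = ("COSIGN_PRIVATE_KEY_B64", "COSIGN_PASSWORD")
LOCKER_VARS = ("CI_EVIDENCE_LOCKER_TOKEN", "EVIDENCE_LOCKER_URL")


def missing_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or blank in the given environment."""
    return [name for name in names if not env.get(name, "").strip()]


def upload_plan(env: Mapping[str, str] = os.environ) -> dict:
    """Decide which steps can run; the locker push is skipped when creds are absent."""
    return {
        "can_sign": not missing_vars(SIGNING_VARS, env),
        "can_upload": not missing_vars(LOCKER_VARS, env),
        "missing": missing_vars(SIGNING_VARS + LOCKER_VARS, env),
    }
```

Reporting the missing names rather than only a boolean makes a skipped upload easy to diagnose from the job log.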