feat: Add initial implementation of Vulnerability Resolver Jobs
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
- Created project for StellaOps.Scanner.Analyzers.Native.Tests with necessary dependencies.
- Documented roles and guidelines in AGENTS.md for Scheduler module.
- Implemented IResolverJobService interface and InMemoryResolverJobService for handling resolver jobs.
- Added ResolverBacklogNotifier and ResolverBacklogService for monitoring job metrics.
- Developed API endpoints for managing resolver jobs and retrieving metrics.
- Defined models for resolver job requests and responses.
- Integrated dependency injection for resolver job services.
- Implemented ImpactIndexSnapshot for persisting impact index data.
- Introduced SignalsScoringOptions for configurable scoring weights in reachability scoring.
- Added unit tests for ReachabilityScoringService and RuntimeFactsIngestionService.
- Created dotnet-filter.sh script to handle command-line arguments for dotnet.
- Established nuget-prime project for managing package downloads.
@@ -28,9 +28,12 @@ The `stella` CLI is the operator-facing Swiss army knife for scans, exports, pol
|
||||
- ./guides/cli-reference.md
|
||||
- ./guides/policy.md
|
||||
|
||||
## Backlog references
|
||||
- DOCS-CLI-OBS-52-001 / DOCS-CLI-FORENSICS-53-001 in ../../TASKS.md.
|
||||
- CLI-CORE-41-001 epic in `src/Cli/StellaOps.Cli/TASKS.md`.
|
||||
## Backlog references
|
||||
- DOCS-CLI-OBS-52-001 / DOCS-CLI-FORENSICS-53-001 in ../../TASKS.md.
|
||||
- CLI-CORE-41-001 epic in `src/Cli/StellaOps.Cli/TASKS.md`.
|
||||
|
||||
## Current workstreams (Q4 2025)
|
||||
- Active docs sprint: `docs/implplan/SPRINT_0316_0001_0001_docs_modules_cli.md` — normalised sprint naming, doc sync, and upcoming ops/runbook refresh.
|
||||
|
||||
## Epic alignment
|
||||
- **Epic 2 – Policy Engine & Editor:** deliver deterministic policy authoring, simulation, and explain verbs.
|
||||
|
||||
@@ -4,10 +4,11 @@
|
||||
- Maintain deterministic behaviour and offline parity across releases.
|
||||
- Keep documentation, telemetry, and runbooks aligned with the latest sprint outcomes.
|
||||
|
||||
## Workstreams
|
||||
- Backlog grooming: reconcile open stories in ../../TASKS.md with this module's roadmap.
|
||||
- Implementation: collaborate with service owners to land feature work defined in SPRINTS/EPIC docs.
|
||||
- Validation: extend tests/fixtures to preserve determinism and provenance requirements.
|
||||
## Workstreams
|
||||
- Backlog grooming: reconcile open stories in ../../TASKS.md with this module's roadmap.
|
||||
- Implementation: collaborate with service owners to land feature work defined in SPRINTS/EPIC docs.
|
||||
- Validation: extend tests/fixtures to preserve determinism and provenance requirements.
|
||||
- Documentation sync: keep module docs aligned with active sprint `docs/implplan/SPRINT_0316_0001_0001_docs_modules_cli.md`.
|
||||
|
||||
## Epic milestones
|
||||
- **Epic 2 – Policy Engine & Editor:** deliver deterministic policy verbs, simulation, and explain outputs.
|
||||
|
||||
47
docs/modules/concelier/advisory-ai-api.md
Normal file
@@ -0,0 +1,47 @@
# Advisory AI API (structured chunks)

**Scope:** `/advisories/{advisoryKey}/chunks` (Concelier WebService) · aligned with Sprint 0112 canonical model.

## Response contract

```jsonc
{
  "advisoryKey": "CVE-2025-0001",
  "fingerprint": "<sha256 canonical advisory>",
  "total": 3,
  "truncated": false,
  "entries": [
    {
      "type": "workaround", // ordered by (type, observationPath, documentId)
      "chunkId": "c0ffee12", // sha256(documentId|observationPath) first 8 bytes
      "content": { /* structured field payload */ },
      "provenance": {
        "documentId": "tenant-a:chunk:newest", // Observation _id
        "observationPath": "/references/0", // JSON Pointer into observation
        "source": "nvd",
        "kind": "workaround",
        "value": "tenant-a:chunk:newest",
        "recordedAt": "2025-01-07T00:00:00Z",
        "fieldMask": ["/references/0"]
      }
    }
  ]
}
```

### Determinism & provenance

- Sort entries by `(type, observationPath, documentId)` to keep cache keys stable across nodes.
- Cache keys include the advisory `fingerprint`, chunk/observation limits, filters, and observation hashes.
- Provenance anchors must always include both `documentId` and `observationPath` for Console/Attestor deep links and offline mirrors.
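
The ordering and chunk-id rules above can be sketched in Python as follows. This is a minimal illustration, not the Concelier implementation: the helper names `chunk_id` and `sort_entries` are hypothetical, and the `|` separator, UTF-8 encoding, and the choice to keep 8 hex characters (the length of the `c0ffee12` example) are assumptions the contract does not spell out.

```python
import hashlib


def chunk_id(document_id: str, observation_path: str) -> str:
    """Derive a short, stable chunk id from the provenance anchor."""
    # Assumption: the two fields are joined with "|" and encoded as UTF-8.
    digest = hashlib.sha256(f"{document_id}|{observation_path}".encode("utf-8")).hexdigest()
    # Assumption: keep 8 hex characters, matching the length of "c0ffee12".
    return digest[:8]


def sort_entries(entries: list) -> list:
    """Order entries by (type, observationPath, documentId)."""
    # Assumption: plain ordinal string comparison, independent of locale.
    return sorted(
        entries,
        key=lambda e: (
            e["type"],
            e["provenance"]["observationPath"],
            e["provenance"]["documentId"],
        ),
    )
```

Because both functions depend only on their inputs, two nodes given the same observations should produce the same ids in the same order.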

### Query parameters

- `tenant` (required): tenant ID; must match the authorization context.
- `limit`, `observations`, `minLength`: bounded integers (see `ConcelierOptions.AdvisoryChunks`).
- `section`, `format`: comma-separated filters (case-insensitive).
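
As a rough illustration of how these parameters might be normalised, the Python sketch below parses a comma-separated, case-insensitive filter and a bounded integer. The function names are hypothetical, the bounds passed in are placeholders (the real ones live in `ConcelierOptions.AdvisoryChunks`), and rejecting out-of-range values rather than clamping them is an assumption.

```python
from typing import List, Optional


def parse_filter(raw: Optional[str]) -> List[str]:
    """Split a comma-separated filter, ignoring case, blanks, and duplicates."""
    values: List[str] = []
    for part in (raw or "").split(","):
        value = part.strip().lower()  # case-insensitive: compare in lower case
        if value and value not in values:
            values.append(value)
    return values


def parse_bounded_int(name: str, raw: Optional[str], minimum: int, maximum: int, default: int) -> int:
    """Parse an integer query parameter and enforce inclusive bounds."""
    if raw is None or raw == "":
        return default
    value = int(raw)  # raises ValueError for non-numeric input
    if not minimum <= value <= maximum:
        # Assumption: out-of-range values are rejected, not silently clamped.
        raise ValueError(f"{name} must be between {minimum} and {maximum}")
    return value
```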

### Compatibility notes

- Mirrors and offline kits rely on `fingerprint` + `chunkId` to verify chunks without re-merging observations.
- Field names mirror GHSA GraphQL and Cisco PSIRT openVuln payloads for downstream parity.
@@ -1,12 +1,15 @@
|
||||
# Link-Not-Merge (LNM) Observation & Linkset Schema
|
||||
|
||||
_Draft for approval — authored 2025-11-16 to unblock CONCELIER-LNM tracks._
|
||||
_Frozen v1 (add-only) — approved 2025-11-17 for CONCELIER-LNM-21-001/002/101._
|
||||
|
||||
## Goals
|
||||
- Immutable storage of raw advisory observations per source/tenant.
|
||||
- Deterministic linksets built from observations without merging or mutating originals.
|
||||
- Stable across online/offline deployments; replayable from raw inputs.
|
||||
|
||||
## Status
|
||||
- Frozen v1 as of 2025-11-17; further schema changes must go through ADR + sprint gating (CONCELIER-LNM-22x+).
|
||||
|
||||
## Observation document (Mongo JSON Schema excerpt)
|
||||
```json
|
||||
{
|
||||
@@ -41,6 +44,17 @@ _Draft for approval — authored 2025-11-16 to unblock CONCELIER-LNM tracks._
|
||||
}
|
||||
},
|
||||
"references": {"bsonType": "array", "items": {"bsonType":"string"}},
|
||||
"scopes": {"bsonType":"array","items":{"bsonType":"string"}},
|
||||
"relationships": {
|
||||
"bsonType": "array",
|
||||
"items": {"bsonType":"object","required":["type","source","target"],
|
||||
"properties": {
|
||||
"type":{"bsonType":"string"},
|
||||
"source":{"bsonType":"string"},
|
||||
"target":{"bsonType":"string"},
|
||||
"provenance":{"bsonType":"string"}
|
||||
}}
|
||||
},
|
||||
"weaknesses": {"bsonType":"array","items":{"bsonType":"string"}},
|
||||
"published": {"bsonType": "date"},
|
||||
"modified": {"bsonType": "date"},
|
||||
@@ -84,6 +98,14 @@ _Draft for approval — authored 2025-11-16 to unblock CONCELIER-LNM tracks._
|
||||
"severities": {"bsonType":"array","items":{"bsonType":"object"}}
|
||||
}
|
||||
},
|
||||
"confidence": {"bsonType":"double", "description":"Optional correlation confidence (0–1)"},
|
||||
"conflicts": {"bsonType":"array","items":{"bsonType":"object",
|
||||
"required":["field","reason"],
|
||||
"properties":{
|
||||
"field":{"bsonType":"string"},
|
||||
"reason":{"bsonType":"string"},
|
||||
"values":{"bsonType":"array","items":{"bsonType":"string"}}
|
||||
}}},
|
||||
"createdAt":{"bsonType":"date"},
|
||||
"builtByJobId":{"bsonType":"string"},
|
||||
"provenance": {"bsonType":"object","properties":{
|
||||
|
||||
89
docs/modules/excititor/evidence-contract.md
Normal file
@@ -0,0 +1,89 @@
# Excititor Advisory-AI Evidence Contract (v1)

Updated: 2025-11-18 · Scope: EXCITITOR-AIAI-31-004 (Phase 119)

This note defines the deterministic, aggregation-only contract that Excititor exposes to Advisory AI and Lens consumers. It covers the `/v1/vex/evidence/chunks` NDJSON stream plus the projection rules for observation IDs, signatures, and provenance metadata.

## Goals
- **Deterministic & replayable**: stable ordering, no implicit clocks, fixed schemas.
- **Aggregation-only**: no consensus/inference; raw supplier statements plus signatures and AOC (Aggregation-Only Contract) guardrails.
- **Offline-friendly**: chunked NDJSON; no cross-tenant lookups; portable enough for mirror/air-gap bundles.

## Endpoint
- `GET /v1/vex/evidence/chunks`
- **Query**:
  - `tenant` (required)
  - `vulnerabilityId` (optional, repeatable) — CVE, GHSA, etc.
  - `productKey` (optional, repeatable) — PURLish key used by Advisory AI.
  - `cursor` (optional) — stable pagination token.
  - `limit` (optional) — max records per stream chunk (default 500, max 2000).
- **Response**: `Content-Type: application/x-ndjson`
  - Each line is a single evidence record (see schema below).
  - Ordered by `(tenant, vulnerabilityId, productKey, observationId, statementId)` to stay deterministic.
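
A consumer can check the stream contract with a few lines of Python. The sketch below parses an NDJSON body and confirms the documented ordering; `parse_ndjson`, `sort_key`, and `is_ordered` are hypothetical helper names, and ordinal string comparison is an assumption about how the server sorts.

```python
import json

# Sort tuple taken from the endpoint description above.
SORT_FIELDS = ("tenant", "vulnerabilityId", "productKey", "observationId", "statementId")


def parse_ndjson(body: str) -> list:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def sort_key(record: dict) -> tuple:
    """Build the documented ordering tuple for a record."""
    return tuple(record[field] for field in SORT_FIELDS)


def is_ordered(records: list) -> bool:
    """Return True when records already follow the documented ordering."""
    keys = [sort_key(r) for r in records]
    return keys == sorted(keys)
```

Running `is_ordered(parse_ndjson(body))` on a captured response is a cheap regression check for mirrors and offline bundles.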

## Evidence record schema (NDJSON)
```json
{
  "tenant": "acme",
  "vulnerabilityId": "CVE-2024-1234",
  "productKey": "pkg:pypi/django@3.2.24",
  "observationId": "obs-3cf9d6e4-…",
  "statementId": "stmt-9c1d…",
  "source": {
    "supplier": "upstream:osv",
    "documentId": "osv:GHSA-xxxx-yyyy",
    "retrievedAt": "2025-11-10T12:34:56Z",
    "signatureStatus": "missing|unverified|verified"
  },
  "aoc": {
    "violations": [
      { "code": "EVIDENCE_SIGNATURE_MISSING", "surface": "ingest" }
    ]
  },
  "evidence": {
    "type": "vex.statement",
    "payload": { "...supplier-normalized-fields..." }
  },
  "provenance": {
    "hash": "sha256:...",
    "canonicalUri": "https://mirror.example/bundles/…",
    "bundleId": "mirror-bundle-001"
  }
}
```

### Field notes
- `observationId` is stable and maps 1:1 to internal storage; Advisory AI must cite it when emitting narratives.
- `statementId` remains unique within an observation.
- `signatureStatus` is pass-through from ingest; no interpretation beyond `missing|unverified|verified`.
- `aoc.violations` enumerates guardrail violations without blocking delivery.
- `evidence.payload` is supplier-shaped; we **do not** merge or rank.
- `provenance.hash` is the SHA-256 of the supplier document bytes; `canonicalUri` points to the mirror bundle when available.
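
To illustrate the `provenance.hash` note, the Python sketch below recomputes the digest of a supplier document and compares it with the value in a record. The helper names are hypothetical, and the lower-case hex encoding after the `sha256:` prefix is an assumption based on the example value.

```python
import hashlib
import hmac


def provenance_hash(document_bytes: bytes) -> str:
    """Hash the raw supplier document bytes in the `sha256:<hex>` form."""
    # Assumption: lower-case hex digest after the "sha256:" prefix.
    return "sha256:" + hashlib.sha256(document_bytes).hexdigest()


def verify_record(record: dict, document_bytes: bytes) -> bool:
    """Check that a record's provenance hash matches the document bytes."""
    expected = provenance_hash(document_bytes)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(record["provenance"]["hash"], expected)
```

The check is purely local, so it also works against mirror bundles in air-gapped environments.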

## Determinism rules
- Ordering: fixed sort above; pagination cursor is derived from the last emitted `(tenant, vulnerabilityId, productKey, observationId, statementId)`.
- Clocks: All timestamps are UTC ISO-8601 with `Z`.
- No server-generated randomness; record content is idempotent for identical upstream inputs.
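
One way to derive a cursor from the last emitted tuple is sketched below in Python. The contract only says the cursor is derived from that tuple; the base64url-encoded JSON format, the helper names, and the in-memory list standing in for the evidence store are all assumptions made for illustration.

```python
import base64
import json

SORT_FIELDS = ("tenant", "vulnerabilityId", "productKey", "observationId", "statementId")


def sort_key(record: dict) -> tuple:
    return tuple(record[field] for field in SORT_FIELDS)


def encode_cursor(record: dict) -> str:
    """Encode the ordering tuple of the last emitted record (assumed format)."""
    raw = json.dumps(list(sort_key(record)), separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> tuple:
    return tuple(json.loads(base64.urlsafe_b64decode(cursor.encode("ascii"))))


def next_page(records: list, cursor, limit: int) -> list:
    """Return up to `limit` records strictly after the cursor position."""
    after = decode_cursor(cursor) if cursor else None
    ordered = sorted(records, key=sort_key)
    return [r for r in ordered if after is None or sort_key(r) > after][:limit]
```

Since the cursor carries no clock or random component, replaying the same request against the same data yields the same pages.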

## AOC guardrails
- Enforced surfaces: ingest, `/v1/vex/aoc/verify`, and chunk emission.
- Violations are reported via `aoc.violations` and the metric `excititor.vex.aoc.guard_violations`.
- No statements are dropped due to AOC; consumers decide how to act.

## Telemetry (counters/logs-only until span sink arrives)
- `excititor.vex.chunks.requests` — by `tenant`, `outcome`, `truncated`.
- `excititor.vex.chunks.bytes` — histogram of NDJSON stream sizes.
- `excititor.vex.chunks.records` — histogram of records per stream.
- Existing observation metrics (`excititor.vex.observation.*`) remain unchanged.

## Error handling
- 400 for an invalid tenant or mutually exclusive filters.
- 429 with `Retry-After` when throttle budgets are exceeded.
- 503 on upstream store/transient failures; responses remain NDJSON-free on error.

## Offline / mirror readiness
- When mirror bundles are configured, `provenance.canonicalUri` points to the local bundle path; otherwise it is omitted.
- All payloads are side-effect free; no remote fetches occur while streaming.

## Versioning
- Contract version: `v1` (this document). Changes must be additive; breaking changes require a `v2` path and an updated document.
@@ -17,7 +17,10 @@ Excititor’s evidence APIs now emit first-class OpenTelemetry metrics so Lens,
|
||||
| `excititor.vex.observation.requests` | Counter | Number of `/v1/vex/observations/{vulnerabilityId}/{productKey}` requests handled. | `tenant`, `outcome` (`success`, `error`, `cancelled`), `truncated` (`true/false`) |
|
||||
| `excititor.vex.observation.statement_count` | Histogram | Distribution of statements returned per observation projection request. | `tenant`, `outcome` |
|
||||
| `excititor.vex.signature.status` | Counter | Signature status per statement (missing vs. unverified). | `tenant`, `status` (`missing`, `unverified`) |
|
||||
| `excititor.vex.aoc.guard_violations` | Counter | Aggregated count of Aggregation-Only Contract violations detected by the WebService (ingest + `/vex/aoc/verify`). | `tenant`, `surface` (`ingest`, `aoc_verify`, etc.), `code` (AOC error code) |
|
||||
| `excititor.vex.aoc.guard_violations` | Counter | Aggregated count of Aggregation-Only Contract violations detected by the WebService (ingest + `/v1/vex/aoc/verify`). | `tenant`, `surface` (`ingest`, `aoc_verify`, etc.), `code` (AOC error code) |
|
||||
| `excititor.vex.chunks.requests` | Counter | Requests to `/v1/vex/evidence/chunks` (NDJSON stream). | `tenant`, `outcome` (`success`,`error`,`cancelled`), `truncated` (`true/false`) |
|
||||
| `excititor.vex.chunks.bytes` | Histogram | Size of NDJSON chunk streams served (bytes). | `tenant`, `outcome` |
|
||||
| `excititor.vex.chunks.records` | Histogram | Count of evidence records emitted per chunk stream. | `tenant`, `outcome` |
|
||||
|
||||
> All metrics originate from the `EvidenceTelemetry` helper (`src/Excititor/StellaOps.Excititor.WebService/Telemetry/EvidenceTelemetry.cs`). When disabled (telemetry off), the helper is inert.
|
||||
|
||||
@@ -31,8 +34,8 @@ Excititor’s evidence APIs now emit first-class OpenTelemetry metrics so Lens,
|
||||
|
||||
1. **Enable telemetry**: set `Excititor:Telemetry:EnableMetrics=true`, configure OTLP endpoints/headers as described in `TelemetryExtensions`.
|
||||
2. **Add dashboards**: import panels referencing the metrics above (see Grafana JSON snippets in Ops repo once merged).
|
||||
3. **Alerting**: add rules for high guard violation rates and missing signatures. Tie alerts back to connectors via tenant metadata.
|
||||
4. **Post-deploy checks**: after each release, verify metrics emit by curling `/v1/vex/observations/...`, watching the console exporter (dev) or OTLP (prod).
|
||||
3. **Alerting**: add rules for high guard violation rates, missing signatures, and abnormal chunk bytes/record counts. Tie alerts back to connectors via tenant metadata.
|
||||
4. **Post-deploy checks**: after each release, verify metrics emit by curling `/v1/vex/observations/...` and `/v1/vex/evidence/chunks`, watching the console exporter (dev) or OTLP (prod).
|
||||
|
||||
## Related documents
|
||||
|
||||
|
||||
@@ -17,6 +17,8 @@
|
||||
| `ledger_ingest_backlog_events` | Gauge | `tenant` | Number of events buffered in the writer queue. Alert when >5 000 for 5 min. |
|
||||
| `ledger_projection_lag_seconds` | Gauge | `tenant` | Wall-clock difference between latest ledger event and projection tail. Target <30 s. |
|
||||
| `ledger_projection_rebuild_seconds` | Histogram | `tenant` | Duration of replay/rebuild operations triggered by LEDGER-29-008 harness. |
|
||||
| `ledger_projection_apply_seconds` | Histogram | `tenant`, `event_type`, `policy_version`, `evaluation_status` | Time to apply a single ledger event to projection. Target P95 <1 s. |
|
||||
| `ledger_projection_events_total` | Counter | `tenant`, `event_type`, `policy_version`, `evaluation_status` | Count of events applied to projections. |
|
||||
| `ledger_merkle_anchor_duration_seconds` | Histogram | `tenant` | Time to batch + anchor events. Target <60 s per 10k events. |
|
||||
| `ledger_merkle_anchor_failures_total` | Counter | `tenant`, `reason` (`db`, `signing`, `network`) | Alerts at >0 within 15 min. |
|
||||
| `ledger_attachments_encryption_failures_total` | Counter | `tenant`, `stage` (`encrypt`, `sign`, `upload`) | Ensures secure attachment pipeline stays healthy. |
|
||||
@@ -25,22 +27,23 @@
|
||||
|
||||
### Derived dashboards
|
||||
- **Writer health:** `ledger_write_latency_seconds` (P50/P95/P99), backlog gauge, event throughput.
|
||||
- **Projection health:** `ledger_projection_lag_seconds`, rebuild durations, conflict counts (from logs).
|
||||
- **Projection health:** `ledger_projection_lag_seconds`, `ledger_projection_apply_seconds`, projection throughput, conflict counts (from logs).
|
||||
- **Anchoring:** Anchor duration histogram, failure counter, root hash timeline.
|
||||
|
||||
## 3. Logs & traces
|
||||
- **Log structure:** Serilog JSON with fields `tenant`, `chainId`, `sequence`, `eventId`, `eventType`, `actorId`, `policyVersion`, `hash`, `merkleRoot`.
|
||||
- **Log levels:** `Information` for success summaries (sampled), `Warning` for retried operations, `Error` for failed writes/anchors.
|
||||
- **Correlation:** Each API request includes `requestId` + `traceId` logged with events. Projector logs capture `replayId` and `rebuildReason`.
|
||||
- **Timeline events:** `ledger.event.appended` and `ledger.projection.updated` are emitted as structured logs carrying `tenant`, `chainId`, `sequence`, `eventId`, `policyVersion`, `traceId`, and placeholder `evidence_ref` fields for downstream timeline consumers.
|
||||
- **Secrets:** Ensure `event_body` is never logged; log only metadata/hashes.
|
||||
|
||||
## 4. Alerts
|
||||
|
||||
| Alert | Condition | Response |
|
||||
| --- | --- | --- |
|
||||
| **LedgerWriteSLA** | `ledger_write_latency_seconds` P95 > 0.12 s for 3 intervals | Check DB contention, review queue backlog, scale writer. |
|
||||
| **LedgerWriteSLA** | `ledger_write_latency_seconds` P95 > 1 s for 3 intervals | Check DB contention, review queue backlog, scale writer. |
|
||||
| **LedgerBacklogGrowing** | `ledger_ingest_backlog_events` > 5 000 for 5 min | Inspect upstream policy runs, ensure projector keeping up. |
|
||||
| **ProjectionLag** | `ledger_projection_lag_seconds` > 60 s | Trigger rebuild, verify change streams. |
|
||||
| **ProjectionLag** | `ledger_projection_lag_seconds` > 30 s | Trigger rebuild, verify change streams. |
|
||||
| **AnchorFailure** | `ledger_merkle_anchor_failures_total` increase > 0 | Collect logs, rerun anchor, verify signing service. |
|
||||
| **AttachmentSecurityError** | `ledger_attachments_encryption_failures_total` increase > 0 | Audit attachments pipeline; check key material and storage endpoints. |
|
||||
|
||||
|
||||
@@ -38,6 +38,7 @@ Events are immutable append-only records representing every workflow change. Rec
|
||||
| `event_hash` | `char(64)` | SHA-256 over canonical payload envelope. |
|
||||
| `previous_hash` | `char(64)` | Hash of prior event in chain (all zeroes for first). |
|
||||
| `merkle_leaf_hash` | `char(64)` | Leaf hash used for Merkle anchoring (hash over `event_hash || sequence_no`). |
|
||||
| `evidence_bundle_ref` | `text` | Optional reference to evaluation/job evidence bundle (DSSE or capsule id). |
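
The `merkle_leaf_hash` column can be illustrated with a short Python sketch. The table only states that the leaf is a hash over `event_hash || sequence_no`; using SHA-256, concatenating the hex string with the decimal sequence number, and the helper name itself are assumptions. The 64-character lower-case hex check mirrors the `CHECK` constraints on the hash columns.

```python
import hashlib
import re

# All-zero previous_hash marks the first event in a chain (per the table above).
GENESIS_PREVIOUS_HASH = "0" * 64


def merkle_leaf_hash(event_hash: str, sequence_no: int) -> str:
    """Compute a leaf hash over event_hash || sequence_no (assumed encoding)."""
    if not re.fullmatch(r"[0-9a-f]{64}", event_hash):
        raise ValueError("event_hash must be 64 lower-case hex characters")
    # Assumption: hex string followed by the decimal sequence number, ASCII-encoded.
    material = f"{event_hash}{sequence_no}".encode("ascii")
    return hashlib.sha256(material).hexdigest()
```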
|
||||
|
||||
**Constraints & indexes**
|
||||
|
||||
@@ -49,6 +50,7 @@ CHECK (event_hash ~ '^[0-9a-f]{64}$');
|
||||
CHECK (previous_hash ~ '^[0-9a-f]{64}$');
|
||||
CREATE INDEX ix_ledger_events_finding ON ledger_events (tenant_id, finding_id, policy_version);
|
||||
CREATE INDEX ix_ledger_events_type ON ledger_events (tenant_id, event_type, recorded_at DESC);
|
||||
CREATE INDEX ix_ledger_events_finding_evidence_ref ON ledger_events (tenant_id, finding_id, recorded_at DESC) WHERE evidence_bundle_ref IS NOT NULL;
|
||||
```
|
||||
|
||||
Partitions: top-level partitioned by `tenant_id` (list) with a default partition. Optional sub-partition by month on `recorded_at` for large tenants. PostgreSQL requires the partition key in unique constraints; global uniqueness for `event_id` is enforced as `(tenant_id, event_id)` with application-level guards maintaining cross-tenant uniqueness.
|
||||
|
||||
@@ -16,7 +16,8 @@ Graph Indexer + Graph API build the tenant-scoped knowledge graph that powers bl
|
||||
- **Storage abstraction** — supports document + adjacency (Mongo) or pluggable graph engine; both paths enforce deterministic ordering and export manifests.
|
||||
|
||||
## Current workstreams (Q4 2025)
|
||||
- `GRAPH-SVC-30-00x` (in `src/Graph/StellaOps.Graph.Indexer/TASKS.md`) — stand up Graph Indexer pipeline, identity registry, snapshot exports.
|
||||
- `GRAPH-SVC-30-00x` (see `src/Graph/StellaOps.Graph.Indexer/TASKS.md`) — stand up Graph Indexer pipeline, identity registry, snapshot exports.
|
||||
- Active sprint: `docs/implplan/SPRINT_0141_0001_0001_graph_indexer.md` (Runtime & Signals 140.A) — clustering/centrality jobs, incremental/backfill pipeline, determinism tests, packaging.
|
||||
- `GRAPH-API-30-00x` — draft API planner/cost guard, streaming responses, and Authority scope integration.
|
||||
- `DOCS-GRAPH-24-003` & related backlog — author overview/API/query language docs; update this README again once those deliverables land.
|
||||
- Deployment/DevOps follow-ups (`DEVOPS-VEX-30-001`, `DEPLOY-VEX-30-001`) coordinate dashboards, load tests, and Helm/Compose overlays for the graph stack.
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
# Implementation plan — Graph
|
||||
|
||||
## Delivery phases
|
||||
## Delivery phases
|
||||
> Current active execution sprint: `docs/implplan/SPRINT_0141_0001_0001_graph_indexer.md` (Runtime & Signals 140.A).
|
||||
- **Phase 1 – Graph Indexer foundations**
|
||||
Stand up Graph Indexer service, node/edge schemas, ingestion from SBOM/Concelier/Excititor events, identity stability, and snapshot materialisation.
|
||||
- **Phase 2 – Graph API service**
|
||||
|
||||
@@ -2,14 +2,17 @@
|
||||
|
||||
The Orchestrator schedules, observes, and recovers ingestion and analysis jobs across the StellaOps platform.
|
||||
|
||||
## Latest updates (2025-11-01)
|
||||
- Authority added `orch:quota` and `orch:backfill` scopes for quota/backfill operations, plus token reason/ticket auditing (`docs/updates/2025-11-01-orch-admin-scope.md`). Operators must supply `quota_reason` / `quota_ticket` (or `backfill_reason` / `backfill_ticket`) when requesting elevated tokens and surface those claims in change reviews.
|
||||
## Latest updates (2025-11-18)
|
||||
- Job leasing now flows through the Task Runner bridge: allocations carry idempotency keys, lease durations, and retry hints; workers acknowledge via claim/ack and emit heartbeats.
|
||||
- Event envelopes remain interim pending ORCH-SVC-37-101; include provenance (tenant/project, job type, correlationId, task runner id) in all notifier events.
|
||||
- Authority `orch:quota` / `orch:backfill` scopes require reason/ticket audit fields; include them in runbooks and dashboard overrides.
|
||||
|
||||
## Responsibilities
|
||||
- Track job state, throughput, and errors for Concelier, Excititor, Scheduler, and export pipelines.
|
||||
- Expose dashboards and APIs for throttling, replays, and failover.
|
||||
- Enforce rate-limits, concurrency and dependency chains across queues.
|
||||
- Stream structured events and audit logs for incident response.
|
||||
- Provide Task Runner bridge semantics (claim/ack, heartbeats, progress, artifacts, backfills) for Go/Python SDKs.
|
||||
|
||||
## Key components
|
||||
- Orchestrator WebService (control plane).
|
||||
@@ -24,9 +27,9 @@ The Orchestrator schedules, observes, and recovers ingestion and analysis jobs a
|
||||
|
||||
## Operational notes
|
||||
- Job recovery runbooks and dashboard JSON as described in Epic 9.
|
||||
- Audit retention policies for job history.
|
||||
- Rate-limit reconfiguration guidelines.
|
||||
- When using the new `orch:quota` / `orch:backfill` scopes, ensure reason/ticket fields are captured in runbooks and audit checklists per the 2025-11-01 Authority update.
|
||||
- Rate-limit and lease reconfiguration guidelines; keep lease defaults aligned across runners and SDKs (Go/Python).
|
||||
- Log streaming: SSE/WS endpoints carry correlationId + tenant/project; buffer size and retention must be documented in runbooks.
|
||||
- When using `orch:quota` / `orch:backfill` scopes, capture reason/ticket fields in runbooks and audit checklists.
|
||||
|
||||
## Epic alignment
|
||||
- Epic 9: Source & Job Orchestrator Dashboard.
|
||||
|
||||
9
docs/modules/orchestrator/TASKS.md
Normal file
@@ -0,0 +1,9 @@
# Orchestrator docs task board

| Task ID | Status | Owner(s) | Notes |
| --- | --- | --- | --- |
| ORCH-DOCS-0001 | DONE | Docs Guild | README updated with leasing / task runner bridge notes and interim envelope guidance. |
| ORCH-ENG-0001 | DONE | Module Team | Sprint references normalized; notes synced to doc sprint. |
| ORCH-OPS-0001 | DONE | Ops Guild | Runbook impacts captured in README; follow-up to update ops docs. |

Status rules: mirror changes in `docs/implplan/SPRINT_0323_0001_0001_docs_modules_orchestrator.md`; use TODO → DOING → DONE/BLOCKED; add brief note if pausing.
@@ -9,13 +9,18 @@
|
||||
- **Queue abstraction.** Supports Mongo queue, Redis Streams, or NATS JetStream (pluggable). Each job carries lease metadata and retry policy.
|
||||
- **Dashboard feeds.** SSE/GraphQL endpoints supply Console UI with job timelines, throughput, error distributions, and rate-limit status.
|
||||
|
||||
## 2) Job lifecycle
|
||||
|
||||
1. **Enqueue.** Producer services (Concelier, Excititor, Scheduler, Export Center, Policy Engine) submit `JobRequest` records containing `jobType`, `tenant`, `priority`, `payloadDigest`, `dependencies`.
|
||||
2. **Scheduling.** Orchestrator applies quotas and rate limits per `{tenant, jobType}`. Jobs exceeding limits are staged in pending queue with next eligible timestamp.
|
||||
3. **Leasing.** Workers poll `LeaseJob` endpoint; Orchestrator returns job with `leaseId`, `leaseUntil`, and instrumentation tokens. Lease renewal required for long-running tasks.
|
||||
4. **Completion.** Worker reports status (`succeeded`, `failed`, `canceled`, `timed_out`). On success the job is archived; on failure Orchestrator applies retry policy (exponential backoff, max attempts). Incidents escalate to Ops if thresholds exceeded.
|
||||
5. **Replay.** Operators trigger `POST /jobs/{id}/replay` which clones job payload, sets `replayOf` pointer, and requeues with high priority while preserving determinism metadata.
|
||||
## 2) Job lifecycle
|
||||
|
||||
1. **Enqueue.** Producer services (Concelier, Excititor, Scheduler, Export Center, Policy Engine) submit `JobRequest` records containing `jobType`, `tenant`, `priority`, `payloadDigest`, `dependencies`.
|
||||
2. **Scheduling.** Orchestrator applies quotas and rate limits per `{tenant, jobType}`. Jobs exceeding limits are staged in pending queue with next eligible timestamp.
|
||||
3. **Leasing (Task Runner bridge).** Workers poll `LeaseJob` endpoint; Orchestrator returns job with `leaseId`, `leaseUntil`, `idempotencyKey`, and instrumentation tokens. Lease renewal required for long-running tasks; leases carry retry hints and provenance (`tenant`, `project`, `correlationId`, `taskRunnerId`).
|
||||
4. **Completion.** Worker reports status (`succeeded`, `failed`, `canceled`, `timed_out`). On success the job is archived; on failure Orchestrator applies retry policy (exponential backoff, max attempts). Incidents escalate to Ops if thresholds exceeded.
|
||||
5. **Replay.** Operators trigger `POST /jobs/{id}/replay` which clones job payload, sets `replayOf` pointer, and requeues with high priority while preserving determinism metadata.
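
The retry policy in the completion step can be sketched in Python as below. The lifecycle names exponential backoff and a maximum attempt count but gives no numbers, so the base delay, factor, cap, and the decision to treat only `failed` and `timed_out` as retryable are assumptions; the function names are hypothetical. Jitter is left out so the schedule stays deterministic.

```python
def retry_delay_seconds(attempt: int, base: float = 5.0, factor: float = 2.0, cap: float = 300.0) -> float:
    """Delay before the next attempt; `attempt` is 1 for the first failure."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Exponential growth, bounded by `cap` (placeholder values, not from the spec).
    return min(cap, base * factor ** (attempt - 1))


def should_retry(status: str, attempt: int, max_attempts: int) -> bool:
    """Decide whether a reported status leads to another attempt."""
    # Assumption: cancelled jobs are not retried; failures and timeouts are.
    return status in {"failed", "timed_out"} and attempt < max_attempts
```

With these placeholder defaults the first three failures wait 5, 10, and 20 seconds, and later attempts level off at the 300-second cap.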
|
||||
|
||||
### Pack-run lifecycle (phase III)
|
||||
- **Register** `pack-run` job type with task runner hints (artifacts, log channel, heartbeat cadence).
|
||||
- **Logs/Artifacts**: SSE/WS stream keyed by `packRunId` + `tenant/project`; artifacts published with content digests and URI metadata.
|
||||
- **Events**: notifier payloads include envelope provenance (tenant, project, correlationId, idempotencyKey) pending ORCH-SVC-37-101 final spec.
|
||||
|
||||
## 3) Rate-limit & quota governance
|
||||
|
||||
@@ -24,22 +29,24 @@
|
||||
- Circuit breakers automatically pause job types when failure rate > configured threshold; incidents generated via Notify and Observability stack.
|
||||
- Control plane quota updates require Authority scope `orch:quota` (issued via `Orch.Admin` role). Historical rebuilds/backfills additionally require `orch:backfill` and must supply `backfill_reason` and `backfill_ticket` alongside the operator metadata. Authority persists all four fields (`quota_reason`, `quota_ticket`, `backfill_reason`, `backfill_ticket`) for audit replay.
|
||||
|
||||
## 4) APIs
|
||||
|
||||
- `GET /api/jobs?status=` — list jobs with filters (tenant, jobType, status, time window).
|
||||
- `GET /api/jobs/{id}` — job detail (payload digest, attempts, worker, lease history, metrics).
|
||||
- `POST /api/jobs/{id}/cancel` — cancel running/pending job with audit reason.
|
||||
- `POST /api/jobs/{id}/replay` — schedule replay.
|
||||
- `POST /api/limits/throttle` — apply throttle (requires elevated scope).
|
||||
- `GET /api/dashboard/metrics` — aggregated metrics for Console dashboards.
|
||||
## 4) APIs
|
||||
|
||||
- `GET /api/jobs?status=` — list jobs with filters (tenant, jobType, status, time window).
|
||||
- `GET /api/jobs/{id}` — job detail (payload digest, attempts, worker, lease history, metrics).
|
||||
- `POST /api/jobs/{id}/cancel` — cancel running/pending job with audit reason.
|
||||
- `POST /api/jobs/{id}/replay` — schedule replay.
|
||||
- `POST /api/limits/throttle` — apply throttle (requires elevated scope).
|
||||
- `GET /api/dashboard/metrics` — aggregated metrics for Console dashboards.
|
||||
- Event envelope draft (`docs/modules/orchestrator/event-envelope.md`) defines notifier/webhook/SSE payloads with idempotency keys, provenance, and task runner metadata for job/pack-run events.
|
||||
|
||||
All responses include deterministic timestamps, job digests, and DSSE signature fields for offline reconciliation.
|
||||
|
||||
## 5) Observability
|
||||
|
||||
- Metrics: `job_queue_depth{jobType,tenant}`, `job_latency_seconds`, `job_failures_total`, `job_retry_total`, `lease_extensions_total`.
|
||||
- Logs: structured with `jobId`, `jobType`, `tenant`, `workerId`, `leaseId`, `status`. Incident logs flagged for Ops.
|
||||
- Traces: spans covering `enqueue`, `schedule`, `lease`, `worker_execute`, `complete`. Trace IDs propagate to worker spans for end-to-end correlation.
|
||||
## 5) Observability
|
||||
|
||||
- Metrics: `job_queue_depth{jobType,tenant}`, `job_latency_seconds`, `job_failures_total`, `job_retry_total`, `lease_extensions_total`.
|
||||
- Task Runner bridge adds `pack_run_logs_stream_lag_seconds`, `pack_run_heartbeats_total`, `pack_run_artifacts_total`.
|
||||
- Logs: structured with `jobId`, `jobType`, `tenant`, `workerId`, `leaseId`, `status`. Incident logs flagged for Ops.
|
||||
- Traces: spans covering `enqueue`, `schedule`, `lease`, `worker_execute`, `complete`. Trace IDs propagate to worker spans for end-to-end correlation.
|
||||
|
||||
## 6) Offline support
|
||||
|
||||
|
||||
69
docs/modules/orchestrator/event-envelope.md
Normal file
@@ -0,0 +1,69 @@
|
||||
# Orchestrator Event Envelope (draft)
|
||||
|
||||
Status: draft for ORCH-SVC-38-101 (pending ORCH-SVC-37-101 approval)
|
||||
|
||||
## Goals
|
||||
- Single, provenance-rich envelope for policy/export/job lifecycle events.
|
||||
- Idempotent across retries and transports (Notifier bus, webhooks, SSE/WS streams).
|
||||
- Tenant/project isolation and offline-friendly replays.
|
||||
|
||||
## Envelope
|
||||
```jsonc
{
  "schemaVersion": "orch.event.v1",
  "eventId": "urn:orch:event:...", // UUIDv7 or ULID
  "eventType": "job.failed|job.completed|pack_run.log|pack_run.artifact|policy.updated|export.completed",
  "occurredAt": "2025-11-19T12:34:56Z",
  "idempotencyKey": "orch-{eventType}-{jobId}-{attempt}",
  "correlationId": "corr-...", // propagated from producer
  "tenantId": "...",
  "projectId": "...", // optional but preferred
  "actor": {
    "subject": "service/worker-sdk-go", // who emitted the event
    "scopes": ["orch:quota", "orch:backfill"]
  },
  "job": {
    "id": "job_018f...",
    "type": "pack-run|ingest|export|policy-simulate",
    "runId": "run_018f...", // for pack runs / sims
    "attempt": 3,
    "leaseId": "lease_018f...",
    "taskRunnerId": "tr_018f...",
    "status": "completed|failed|running|canceled",
    "reason": "user_cancelled|retry_backoff|quota_paused",
    "payloadDigest": "sha256:...",
    "artifacts": [
      {"uri": "s3://...", "digest": "sha256:...", "mime": "application/json"}
    ]
  },
  "metrics": {
    "durationSeconds": 12.345,
    "logStreamLagSeconds": 0.8,
    "backoffSeconds": 30
  },
  "notifier": {
    "channel": "orch.jobs",
    "delivery": "dsse",
    "replay": {"ordinal": 5, "total": 12}
  }
}
```
|
||||
|
||||
## Idempotency rules
|
||||
- `eventId` is globally unique; `idempotencyKey` is the dedupe key within each channel.
|
||||
- Emit once per state transition; retries reuse the same `eventId`/`idempotencyKey`.
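
The two rules above can be sketched as a key builder plus a per-channel dedupe check. This is a minimal illustration only: `idempotency_key` follows the `orch-{eventType}-{jobId}-{attempt}` template from the envelope, while `ChannelDeduper` and its in-memory storage are hypothetical names introduced here, not part of the draft.

```python
from dataclasses import dataclass, field


def idempotency_key(event_type: str, job_id: str, attempt: int) -> str:
    """Build the key from the envelope template orch-{eventType}-{jobId}-{attempt}."""
    return f"orch-{event_type}-{job_id}-{attempt}"


@dataclass
class ChannelDeduper:
    """Hypothetical in-memory store: one set of seen keys per channel."""

    seen: dict = field(default_factory=dict)

    def accept(self, channel: str, key: str) -> bool:
        """Return True the first time a key arrives on a channel, False on redelivery."""
        keys = self.seen.setdefault(channel, set())
        if key in keys:
            return False
        keys.add(key)
        return True
```

Because a retry reuses the same `idempotencyKey`, a redelivered event is rejected on the channel that already saw it but is still accepted on a different channel.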
|
||||
|
||||
## Provenance
|
||||
- Always include `tenantId` and `projectId` (if available).
|
||||
- Carry `correlationId` from upstream producers and `taskRunnerId` from leasing bridge.
|
||||
- Include `actor.scopes` when events are triggered via elevated tokens (`orch:quota`, `orch:backfill`).
|
||||
|
||||
## Transport bindings
|
||||
- **Notifier bus**: DSSE-wrapped envelope; subject `orch.event` and `eventType`.
|
||||
- **Webhooks**: HMAC with `X-Orchestrator-Signature` (sha256), replay-safe via `idempotencyKey`.
|
||||
- **SSE/WS**: stream per `tenantId` filtered by `projectId`; client dedupe via `eventId`.
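
For the webhook binding, a receiver-side check might look like the sketch below. The draft only states HMAC with SHA-256 in `X-Orchestrator-Signature`; the hex encoding, signing the raw request body, and the shared-secret handling are assumptions made for illustration, and `sign_body` / `verify_webhook` are hypothetical helper names.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Compute an HMAC-SHA256 over the raw body (hex encoding is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Compare the X-Orchestrator-Signature value against the expected HMAC.

    hmac.compare_digest avoids leaking the mismatch position through timing.
    """
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))
```

Signature verification only proves authenticity; replay safety still depends on the subscriber deduplicating by `idempotencyKey` after the check passes.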
|
||||
|
||||
## Backlog & follow-ups
|
||||
- Align field names with ORCH-SVC-37-101 once finalized.
|
||||
- Add examples for policy/export events and pack-run log/manifest payloads.
|
||||
- Document retry/backoff semantics in Notify/Console subscribers.
|
||||
57
docs/modules/sbomservice/architecture.md
Normal file
@@ -0,0 +1,57 @@
|
||||
# SBOM Service architecture (2025Q4)
|
||||
|
||||
> Scope: canonical SBOM projections, lookup and timeline APIs, asset metadata overlays, and events feeding Advisory AI, Console, Graph, Policy, and Vuln Explorer.
|
||||
|
||||
## 1) Mission & boundaries
|
||||
- Mission: serve deterministic, tenant-scoped SBOM projections (Link-Not-Merge v1) and related metadata for downstream reasoning and overlays.
|
||||
- Boundaries:
|
||||
- Does not perform scanning; consumes Scanner outputs or supplied SPDX/CycloneDX blobs.
|
||||
- Does not author verdicts/policy; supplies evidence and projections to Policy/Concelier/Graph.
|
||||
- Append-only SBOM versions; mutations happen via new versions, never in-place edits.
|
||||
|
||||
## 2) Project layout
|
||||
- `src/SbomService/StellaOps.SbomService` — REST API + event emitters + orchestrator integration.
|
||||
- Storage: MongoDB collections (proposed)
|
||||
- `sbom_snapshots` (immutable versions; tenant + artifact + digest + createdAt)
|
||||
- `sbom_projections` (materialised views keyed by snapshotId, entrypoint/service node flags)
|
||||
- `sbom_assets` (asset metadata, criticality/owner/env/exposure; append-only history)
|
||||
- `sbom_paths` (resolved dependency paths with runtime flags, blast-radius hints)
|
||||
- `sbom_events` (outbox for event delivery + watermark/backfill tracking)
|
||||
|
||||
## 3) APIs (first wave)
|
||||
- `GET /sbom/paths?purl=...&artifact=...&scope=...&env=...` — returns ordered paths with runtime_flag/blast_radius and nearest-safe-version hint; supports `cursor` pagination.
|
||||
- `GET /sbom/versions?artifact=...` — time-ordered SBOM version timeline for Advisory AI; include provenance and source bundle hash.
|
||||
- `GET /console/sboms` — Console catalog with filters (artifact, license, scope, asset tags), cursor pagination, evaluation metadata, immutable JSON projection for drawer views.
|
||||
- `GET /components/lookup?purl=...` — component neighborhood for global search/Graph overlays; returns cache hints and enforces tenant scoping.
|
||||
- `POST /entrypoints` / `GET /entrypoints` — manage entrypoint/service node overrides feeding Cartographer relevance; deterministic defaults when unset.
|
||||
|
||||
## 4) Ingestion & orchestrator integration
|
||||
- Ingest sources: Scanner pipeline (preferred) or uploaded SPDX 3.0.1/CycloneDX 1.6 bundles.
|
||||
- Orchestrator: register SBOM ingest/index jobs; worker SDK emits artifact hash + job metadata; honor pause/throttle; report backpressure metrics; support watermark-based backfill for idempotent replays.
|
||||
- Idempotency: combine `(tenant, artifactDigest, sbomVersion)` as primary key; duplicate ingests short-circuit.
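
The idempotency rule can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions: `IngestKey` and `SnapshotStore` are hypothetical names, and a real implementation would presumably rely on a unique index in the backing store rather than a Python dict.

```python
from typing import NamedTuple


class IngestKey(NamedTuple):
    """Composite primary key: (tenant, artifactDigest, sbomVersion)."""

    tenant: str
    artifact_digest: str
    sbom_version: str


class SnapshotStore:
    """Hypothetical in-memory stand-in for the proposed sbom_snapshots collection."""

    def __init__(self) -> None:
        self._snapshots = {}

    def ingest(self, key: IngestKey, sbom: bytes) -> bool:
        """Store a new snapshot; short-circuit (return False) on a duplicate key."""
        if key in self._snapshots:
            return False
        self._snapshots[key] = sbom
        return True
```

A duplicate ingest leaves the stored snapshot untouched, which matches the append-only stance in the mission section: a change arrives as a new `sbomVersion`, never as an overwrite.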
|
||||
|
||||
## 5) Events & streaming
|
||||
- `sbom.version.created` — emitted per new SBOM snapshot; payload: tenant, artifact digest, sbomVersion, projection hash, source bundle hash, import provenance; replay/backfill via outbox with watermark.
|
||||
- `sbom.asset.updated` — emitted when asset metadata changes; idempotent payload keyed by `(tenant, assetId, version)`.
|
||||
- Inventory/resolver feeds — queue/topic delivering `(artifact, purl, version, paths, runtime_flag, scope, nearest_safe_version)` for Vuln Explorer/Findings Ledger.
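
Replay and backfill via an outbox with a watermark could work roughly as follows. The sketch assumes a monotonically increasing `sequence` per entry and a single consumer watermark; these details, along with the `Outbox` and `OutboxEntry` names, are illustrative and not taken from the design.

```python
from dataclasses import dataclass


@dataclass
class OutboxEntry:
    sequence: int      # hypothetical monotonically increasing position
    event_type: str    # e.g. "sbom.version.created"
    payload: dict


class Outbox:
    """Hypothetical in-memory outbox; the watermark is the last delivered sequence."""

    def __init__(self) -> None:
        self._entries = []
        self.watermark = 0

    def append(self, event_type: str, payload: dict) -> OutboxEntry:
        entry = OutboxEntry(len(self._entries) + 1, event_type, payload)
        self._entries.append(entry)
        return entry

    def pending(self, since=None):
        """Entries after the watermark, or after an explicit position for backfill."""
        start = self.watermark if since is None else since
        return [e for e in self._entries if e.sequence > start]

    def deliver(self, send) -> int:
        """Send pending entries in order, advancing the watermark after each send."""
        delivered = 0
        for entry in self.pending():
            send(entry)
            self.watermark = entry.sequence
            delivered += 1
        return delivered
```

Advancing the watermark only after `send` returns means a crash mid-delivery re-sends at most the in-flight entry, which is why consumers are expected to treat payloads as idempotent.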
|
||||
|
||||
## 6) Determinism & offline posture
|
||||
- Stable ordering for projections and paths; timestamps in UTC ISO-8601; hash inputs canonicalised.
|
||||
- Add-only evolution for schemas; LNM v1 fixtures published alongside API docs and replayable tests.
|
||||
- Offline-friendly: uses mirrored packages, avoids external calls during projection; exports NDJSON bundles for air-gapped replay.
|
||||
|
||||
## 7) Tenancy & security
|
||||
- All APIs require tenant context (token claims or mTLS binding); collection filters must include tenant keys.
|
||||
- Enforce least-privilege queries; avoid cross-tenant caches; log tenant IDs in structured logs.
|
||||
- Input validation: schema-validate incoming SBOMs; reject oversized/unsupported media types early.
|
||||
|
||||
## 8) Observability
|
||||
- Metrics: `sbom_projection_seconds`, `sbom_projection_size_bytes`, `sbom_paths_latency_seconds`, `sbom_paths_cache_hit_ratio`, `sbom_events_backlog`.
|
||||
- Traces: wrap ingest, projection build, and API handlers; propagate orchestrator job IDs.
|
||||
- Logs: structured, include tenant + artifact digest + sbomVersion; classify ingest failures (schema, storage, orchestrator, validation).
|
||||
- Alerts: backlog thresholds for outbox/event delivery; high latency on path/timeline endpoints.
|
||||
|
||||
## 9) Open questions / dependencies
|
||||
- Confirm orchestrator pause/backfill contract (shared with Runtime & Signals 140-series).
|
||||
- Finalise storage collection names and indexes (compound on tenant+artifactDigest+version, TTL for transient staging).
|
||||
- Publish canonical LNM v1 fixtures and JSON schemas for projections and asset metadata.
|
||||
@@ -4,6 +4,7 @@ This directory contains deep technical designs for current and upcoming analyzer
|
||||
|
||||
## Language analyzers
|
||||
- `ruby-analyzer.md` — lockfile, runtime graph, capability signals for Ruby.
|
||||
- `deno-runtime-signals.md` — runtime trace + policy signal contract for Deno analyzer.
|
||||
|
||||
## Surface & platform contracts
|
||||
- `surface-fs.md`
|
||||
|
||||
109
docs/modules/scanner/design/deno-runtime-signals.md
Normal file
@@ -0,0 +1,109 @@
|
||||
# Deno Runtime Signals & Policy Contract (v0.1-DRAFT)
|
||||
|
||||
## Purpose
|
||||
Define deterministic runtime evidence records and policy signals for Deno analyzer phase II (tasks DENO-26-009/010/011). The contract is offline-friendly, append-only, and compatible with Surface/Signals stores.
|
||||
|
||||
## Scope
|
||||
- Harnessed execution hook (`stella deno trace`) capturing module loads and permission grants during analysis.
|
||||
- Trace serialization for Worker/CLI/Offline Kit and AnalysisStore.
|
||||
- Policy signal keys consumed by Surface/Signals and Policy Engine.
|
||||
|
||||
## Event model
|
||||
- Encoding: NDJSON; each line is a UTF-8 JSON object sorted by key when written.
|
||||
- Path handling: absolute paths are converted to analyzer-relative paths; each relative path is also emitted as `path_sha256` (lowercase hex) to support proofs without leaking paths.
|
||||
- Timestamps: ISO-8601 UTC with millisecond precision; no local time.
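
The encoding rules above can be sketched with the standard library. The helper names are hypothetical, and two details are assumptions rather than part of the contract: the hash is taken over the UTF-8 bytes of the relative path, and JSON is written with compact separators.

```python
import hashlib
import json
import posixpath
from datetime import datetime, timezone


def normalize_path(absolute: str, root: str) -> str:
    """Convert an absolute path to an analyzer-relative path with forward slashes."""
    return posixpath.relpath(absolute.replace("\\", "/"), root.replace("\\", "/"))


def path_sha256(relative: str) -> str:
    """Lowercase hex SHA-256 of the relative path (UTF-8 bytes assumed)."""
    return hashlib.sha256(relative.encode("utf-8")).hexdigest()


def format_ts(moment: datetime) -> str:
    """ISO-8601 UTC with millisecond precision; expects a timezone-aware datetime."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def encode_event(event: dict) -> bytes:
    """One NDJSON line: keys sorted, UTF-8 encoded, newline-terminated."""
    text = json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return (text + "\n").encode("utf-8")
```

Sorting keys at write time means two runs over the same input should produce byte-identical lines, so the observation hash stays stable.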
|
||||
|
||||
### Event types
|
||||
```jsonc
{
  "type": "deno.module.load", // required
  "ts": "2025-11-17T12:00:00.123Z", // required
  "module": {
    "specifier": "file:///src/app/main.ts", // original
    "normalized": "app/main.ts",
    "path_sha256": "..."
  },
  "reason": "dynamic-import", // static-import | dynamic-import | npm | cache | bundle
  "permissions": ["fs", "net"], // granted at time of load
  "origin": "https://deno.land/x/std@0.208.0/http/server.ts" // optional for remote/npm
}
```
|
||||
|
||||
```jsonc
{
  "type": "deno.permission.use",
  "ts": "2025-11-17T12:00:01.234Z",
  "permission": "ffi", // fs|net|env|ffi|process|crypto|worker
  "module": {
    "normalized": "native/mod.ts",
    "path_sha256": "..."
  },
  "details": "Deno.dlopen" // short reason code
}
```
|
||||
|
||||
```jsonc
{
  "type": "deno.npm.resolution",
  "ts": "2025-11-17T12:00:02.100Z",
  "specifier": "npm:chalk@5",
  "package": "chalk",
  "version": "5.3.0",
  "resolved": "file:///cache/npm/registry.npmjs.org/chalk/5.3.0",
  "exists": true
}
```
|
||||
|
||||
```jsonc
{
  "type": "deno.wasm.load",
  "ts": "2025-11-17T12:00:03.000Z",
  "module": {
    "normalized": "pkg/module.wasm",
    "path_sha256": "..."
  },
  "importer": "app/main.ts",
  "reason": "dynamic-import"
}
```
|
||||
|
||||
## Observation envelope (AnalysisStore)
|
||||
Key: `ScanAnalysisKeys.DenoObservationPayload`
|
||||
Payload fields:
|
||||
- `analyzerId`: `deno`
|
||||
- `kind`: `deno.runtime.v1`
|
||||
- `mediaType`: `application/x-ndjson`
|
||||
- `metadata` (map):
|
||||
- `deno.runtime.event_count`
|
||||
- `deno.runtime.permission_uses`
|
||||
- `deno.runtime.module_loads`
|
||||
- `deno.runtime.remote_origins` (comma-separated, sorted)
|
||||
- `deno.runtime.permissions` (unique perms CSV)
|
||||
- `deno.runtime.npm_resolutions`
|
||||
- `deno.runtime.wasm_loads`
|
||||
- `deno.runtime.dynamic_imports`
|
||||
- `content`: gz-safe byte stream of NDJSON lines.
|
||||
|
||||
## Policy signal keys
|
||||
Emit into Surface/Signals (namespaced `surface.lang.deno.*`) derived from observation digest + static analyzer outputs:
|
||||
- `surface.lang.deno.permissions`: CSV of unique permissions seen (fs, net, env, ffi, process, crypto, worker).
|
||||
- `surface.lang.deno.remote_origins`: CSV of normalized remote origins from module loads/fetches.
|
||||
- `surface.lang.deno.npm_modules`: integer count of npm resolutions observed.
|
||||
- `surface.lang.deno.wasm_modules`: integer count of wasm loads.
|
||||
- `surface.lang.deno.dynamic_imports`: integer count of `deno.module.load` events where `reason=dynamic-import`.
|
||||
- `surface.lang.deno.capabilities`: CSV of capability reason codes from static analyzer (`builtin.*`) merged with runtime permissions.
|
||||
- `surface.lang.deno.module_loads`: integer count of module load events.
|
||||
- `surface.lang.deno.permission_uses`: integer count of permission use events.
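
As a rough illustration, the count-based keys and the permissions CSV could be derived from a runtime trace as shown below. `derive_signals` is a hypothetical helper; it assumes that permissions listed on module-load events count as "seen", and it omits `remote_origins` and `capabilities` because their normalization and static-analyzer inputs are not specified here.

```python
import json

PREFIX = "surface.lang.deno."


def derive_signals(ndjson: str) -> dict:
    """Fold NDJSON runtime events into count-based signal keys plus a permissions CSV."""
    counts = {
        "module_loads": 0,
        "permission_uses": 0,
        "npm_modules": 0,
        "wasm_modules": 0,
        "dynamic_imports": 0,
    }
    permissions = set()
    for line in ndjson.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        kind = event.get("type")
        if kind == "deno.module.load":
            counts["module_loads"] += 1
            # Assumption: grants recorded at load time also count as seen.
            permissions.update(event.get("permissions", []))
            if event.get("reason") == "dynamic-import":
                counts["dynamic_imports"] += 1
        elif kind == "deno.permission.use":
            counts["permission_uses"] += 1
            permissions.add(event["permission"])
        elif kind == "deno.npm.resolution":
            counts["npm_modules"] += 1
        elif kind == "deno.wasm.load":
            counts["wasm_modules"] += 1
    signals = {PREFIX + name: value for name, value in counts.items()}
    # Sort the CSV so the same trace always yields the same signal value.
    signals[PREFIX + "permissions"] = ",".join(sorted(permissions))
    return signals
```

Note that a `deno.wasm.load` event with `reason=dynamic-import` is not counted under `dynamic_imports`, since that key is defined over `deno.module.load` events only.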
|
||||
|
||||
## CLI / Worker contracts
|
||||
- CLI verb `stella deno trace --root <path>` writes `deno-runtime.ndjson` to output folder and prints observation hash.
|
||||
- Worker: when `DenoRuntimeCapture:true`, analyzer writes observation to AnalysisStore and links hash in layer metadata `deno.observation.hash` (already produced by static analyzer) and new `deno.runtime.hash`.
|
||||
|
||||
## Determinism and safety
|
||||
- No network fetches; trace operates on cached artifacts or harnessed execution with `--allow-all` disabled. Permissions recorded reflect requested grants; blanks treated as deny.
|
||||
- Paths always normalized to forward slashes; hashing uses full relative path bytes.
|
||||
- Redaction: no environment variable values or file contents persisted—only paths + hashes.
|
||||
|
||||
## Open follow-ups (to track in sprint)
|
||||
- Map NDJSON to AOC writer once runtime ingestion lands (LANG-11-003 analogue for Deno).
|
||||
- Add integration tests mirroring fixtures from DENO-26-008 with synthetic permission use and dynamic imports.
|
||||
@@ -1,34 +1,39 @@
|
||||
# Scheduler agent guide
|
||||
|
||||
## Mission
|
||||
Scheduler detects advisory/VEX deltas, computes impact windows, and orchestrates re-evaluations across Scanner and Policy Engine. Docs in this directory are the front-door contract for contributors.
|
||||
|
||||
## Key docs
|
||||
- [Module README](./README.md)
|
||||
- [Architecture](./architecture.md)
|
||||
- [Implementation plan](./implementation_plan.md)
|
||||
- [Task board](./TASKS.md)
|
||||
## Working directory
|
||||
- `docs/modules/scheduler` (docs-only); code changes live under `src/Scheduler/**` but must be coordinated via sprint plans.
|
||||
|
||||
## How to get started
|
||||
1. Open sprint file `/docs/implplan/SPRINT_*.md` and locate the stories referencing this module.
|
||||
2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED).
|
||||
3. Read the architecture and README for domain context before editing code or docs.
|
||||
4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan.
|
||||
## Roles & owners
|
||||
- **Docs author**: curates AGENTS/TASKS/runbooks; keeps determinism/offline guidance accurate.
|
||||
- **Scheduler engineer (Worker/WebService)**: aligns implementation notes with architecture and ensures observability/runbook updates land with code.
|
||||
- **Observability/Ops**: maintains dashboards/rules, documents operational SLOs and alert contracts.
|
||||
|
||||
## Guardrails
|
||||
- Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md).
|
||||
- Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts.
|
||||
- Keep Offline Kit parity in mind—document air-gapped workflows for any new feature.
|
||||
- Update runbooks/observability assets when operational characteristics change.
|
||||
## Required Reading
|
||||
- `docs/modules/scheduler/README.md`
|
||||
- `docs/modules/scheduler/architecture.md`
|
||||
- `docs/modules/scheduler/implementation_plan.md`
|
||||
- `docs/modules/platform/architecture-overview.md`
|
||||
|
||||
## Working Agreement
|
||||
- 1. Update task status to `DOING`/`DONE` in both the corresponding sprint file `/docs/implplan/SPRINT_*.md` and the local `TASKS.md` when you start or finish work.
|
||||
- 2. Review this charter and the Required Reading documents before coding; confirm prerequisites are met.
|
||||
- 3. Keep changes deterministic (stable ordering, timestamps, hashes) and align with offline/air-gap expectations.
|
||||
- 4. Coordinate doc updates, tests, and cross-guild communication whenever contracts or workflows change.
|
||||
- 5. Revert to `TODO` if you pause the task without shipping changes; leave notes in commit/PR descriptions for context.
|
||||
## How to work
|
||||
1. Open relevant sprint file in `docs/implplan/SPRINT_*.md` and set task status to `DOING` there and in `docs/modules/scheduler/TASKS.md` before starting.
|
||||
2. Confirm prerequisites above are read; note any missing contracts in sprint **Decisions & Risks**.
|
||||
3. Keep outputs deterministic (stable ordering, UTC ISO-8601 timestamps, sorted lists) and offline-friendly (no external fetches without mirrors).
|
||||
4. When changing behavior, update runbooks and observability assets in `./operations/`.
|
||||
5. On completion, set status to `DONE` in both the sprint file and `TASKS.md`; if paused, revert to `TODO` and add a brief note.
|
||||
|
||||
## Guardrails
|
||||
- Honour the Aggregation-Only Contract where applicable (see `../../ingestion/aggregation-only-contract.md`).
|
||||
- No undocumented schema or API contract changes; document deltas in architecture or implementation_plan.
|
||||
- Keep Offline Kit parity—document air-gapped workflows for any new feature.
|
||||
- Prefer deterministic fixtures and avoid machine-specific artefacts in examples.
|
||||
|
||||
## Testing & determinism expectations
|
||||
- Examples and snippets should be reproducible; pin sample timestamps to UTC and sort collections.
|
||||
- Observability examples must align with published metric names and labels; update `operations/worker-prometheus-rules.yaml` if alert semantics change.
|
||||
|
||||
## Status mirrors
|
||||
- Sprint tracker: `/docs/implplan/SPRINT_*.md` (source of record for Delivery Tracker).
|
||||
- Local tracker: `docs/modules/scheduler/TASKS.md` (mirrors sprint status; keep in sync).
|
||||
|
||||
14
docs/modules/scheduler/TASKS.md
Normal file
@@ -0,0 +1,14 @@
|
||||
# Scheduler module task board
|
||||
|
||||
Keep this table in sync with sprint Delivery Trackers for the Scheduler docs/process stream.
|
||||
|
||||
| Task ID | Status | Owner(s) | Notes |
| --- | --- | --- | --- |
| SCHEDULER-DOCS-0001 | DONE | Docs Guild | AGENTS charter refreshed with roles/prereqs/determinism and cross-links. |
| SCHEDULER-ENG-0001 | DONE | Module Team | TASKS.md created; status mirror rules documented. |
| SCHEDULER-OPS-0001 | DONE | Ops Guild | Outcomes synced to sprint file and tasks-all tracker. |
|
||||
|
||||
## Status rules
|
||||
- Update both this file and the relevant `docs/implplan/SPRINT_*.md` entry whenever you change a task state.
|
||||
- Use TODO → DOING → DONE/BLOCKED. If you pause work, revert to TODO and leave a short note.
|
||||
- Document contract or runbook changes in the appropriate module docs under this directory.
|
||||