Resolve Concelier/Excititor merge conflicts

This commit is contained in:
root
2025-10-20 14:19:25 +03:00
2687 changed files with 212646 additions and 85913 deletions

View File

@@ -1,8 +1,3 @@
Below is the **revised, consolidated** `high_level_architecture.md`.
It **absorbs** all content from `components.md` so you have a single, authoritative file. No separate components doc is required.
---
# HighLevel Architecture — **StellaOps** (Consolidated • 2025Q4)
> **Purpose.** A complete, implementationready map of StellaOps: product vision, all runtime components, trust boundaries, tokens/licensing, control/data flows, storage, APIs, security, scale, DevOps, and verification logic.
@@ -30,28 +25,32 @@ It **absorbs** all content from `components.md` so you have a single, authoritat
### 1.1 Runtime inventory (firstparty)
| Service / Tool | Container image | Core role | Scale pattern |
| ------------------------------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
| **Scanner.WebService** | `stellaops/scanner-web` | Control plane for scans; catalog; SBOM composition (inventory & usage); diff; exports. | Stateless; N replicas behind LB. |
| **Scanner.Worker** | `stellaops/scanner-worker` | Runs analyzers (OS, Lang: Java/Node/Python/Go/.NET/Rust, Native ELF/PE/MachO, EntryTrace); emits perlayer SBOMs and composes image SBOMs. | Horizontal; queuedriven; sharded by layer digest. |
| **Scanner.Sbomer.BuildXPlugin** | `stellaops/sbom-indexer` | BuildKit **generator** for buildtime SBOMs as OCI **referrers**. | CIside; ephemeral. |
| **Scanner.Sbomer.DockerImage** | `stellaops/scanner-cli` | CLIorchestrated scanner container for postbuild scans. | Local/CI; ephemeral. |
| **Feedser.WebService** | `stellaops/feedser-web` | Vulnerability ingest/normalize/merge/export (JSON + Trivy DB). | HA via Mongo locks. |
| **Vexer.WebService** | `stellaops/vexer-web` | VEX ingest/normalize/consensus; conflict retention; exports. | HA via Mongo locks. |
| **Policy Engine** | (in `scanner-web`) | YAML DSL evaluator (waivers, vendor preferences, KEV/EPSS, license, usagegating); produces **policy digest**. | Inprocess; cache per digest. |
| **Signer** | `stellaops/signer` | **Hard gate:** validates entitlement + release integrity; mints signing cert (Fulcio keyless) or uses KMS; signs DSSE. | Stateless; HPA by QPS. |
| **Attestor** | `stellaops/attestor` | Posts DSSE bundles to **Rekor v2**; verification endpoints. | Stateless; HPA by QPS. |
| **Authority** | `stellaops/authority` | Onprem OIDC issuing **shortlived OpToks** with DPoP/mTLS sender constraint. | HA behind LB. |
| **Zastava** (Runtime) | `stellaops/zastava` | Runtime inspector/enforcer (observer + optional Admission Webhook). | DaemonSet + Webhook. |
| **Web UI** | `stellaops/ui` | Angular app for scans, diffs, policy, VEX, runtime, reports. | Stateless. |
| **StellaOps.Cli** | `stellaops/cli` | CLI for init/scan/export/diff/policy/report/verify; Buildx helper. | Local/CI. |
| Service / Tool | Container image | Core role | Scale pattern |
| ------------------------------- | ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
| **Scanner.WebService** | `stellaops/scanner-web` | Control plane for scans; catalog; SBOM composition (inventory & usage); diff; exports; **analysisonly report runs** for Scheduler. | Stateless; N replicas behind LB. |
| **Scanner.Worker** | `stellaops/scanner-worker` | Runs analyzers (OS, Lang: Java/Node/Python/Go/.NET/Rust, Native ELF/PE/MachO, EntryTrace); emits perlayer SBOMs and composes image SBOMs. | Horizontal; queuedriven; sharded by layer digest. |
| **Scanner.Sbomer.BuildXPlugin** | `stellaops/sbom-indexer` | BuildKit **generator** for buildtime SBOMs as OCI **referrers**. | CIside; ephemeral. |
| **Scanner.Sbomer.DockerImage** | `stellaops/scanner-cli` | CLIorchestrated scanner container for postbuild scans. | Local/CI; ephemeral. |
| **Concelier.WebService** | `stellaops/concelier-web` | Vulnerability ingest/normalize/merge/export (JSON + Trivy DB). | HA via Mongo locks. |
| **Excititor.WebService** | `stellaops/excititor-web` | VEX ingest/normalize/consensus; conflict retention; exports. | HA via Mongo locks. |
| **Policy Engine** | (in `scanner-web`) | YAML DSL evaluator (waivers, vendor preferences, KEV/EPSS, license, usagegating); produces **policy digest**. | Inprocess; cache per digest. |
| **Scheduler.WebService** | `stellaops/scheduler-web` | Schedules **reevaluation** runs; consumes Concelier/Excititor deltas; selects **impacted images** via BOMIndex; orchestrates analysisonly reports. | Stateless API. |
| **Scheduler.Worker** | `stellaops/scheduler-worker` | Executes selection and enqueues batches toward Scanner; enforces rate/limits and windows; maintains impact cursors. | Horizontal; queuedriven. |
| **Notify.WebService** | `stellaops/notify-web` | Rules engine for outbound notifications; manages channels, templates, throttle/digest logic. | Stateless API. |
| **Notify.Worker** | `stellaops/notify-worker` | Delivers to Slack/Teams/Email/Webhooks; idempotent retries; digests. | Horizontal; perchannel rate limits. |
| **Signer** | `stellaops/signer` | **Hard gate:** validates entitlement + release integrity; mints signing cert (Fulcio keyless) or uses KMS; signs DSSE. | Stateless; HPA by QPS. |
| **Attestor** | `stellaops/attestor` | Posts DSSE bundles to **Rekor v2**; verification endpoints. | Stateless; HPA by QPS. |
| **Authority** | `stellaops/authority` | Onprem OIDC issuing **shortlived OpToks** with DPoP/mTLS sender constraint. | HA behind LB. |
| **Zastava** (Runtime) | `stellaops/zastava` | Runtime inspector/enforcer (observer + optional Admission Webhook). | DaemonSet + Webhook. |
| **Web UI** | `stellaops/ui` | Angular app for scans, diffs, policy, VEX, **Scheduler**, **Notify**, runtime, reports. | Stateless. |
| **StellaOps.Cli** | `stellaops/cli` | CLI for init/scan/export/diff/policy/report/verify; Buildx helper; **schedule** and **notify** verbs. | Local/CI. |
### 1.2 Thirdparty (selfhosted)
* **Fulcio** (Sigstore CA) — issues shortlived signing certs (keyless).
* **Rekor v2** (tilebacked transparency log).
* **MinIO** — S3compatible object store with lifecycle & Object Lock.
* **MongoDB** — catalog, advisories, VEX.
* **MongoDB** — catalog, advisories, VEX, scheduler, notify.
* **Queue** — Redis Streams / NATS / RabbitMQ (pluggable).
* **OCI Registry** — must support **Referrers API** (discover SBOMs/signatures).
@@ -71,8 +70,12 @@ flowchart LR
Auth[Authority (OIDC)\nOpTok (DPoP/mTLS)]
SW[Scanner.WebService]
WK[Scanner.Worker xN]
FEED[Feedser]
VEX[Vexer]
CONC[Concelier]
EXC[Excititor]
SCHW[Scheduler.Web]
SCH[Scheduler.Worker xN]
NOTW[Notify.Web]
NOT[Notify.Worker xN]
POL[Policy Engine (in Scanner.Web)]
SGN[Signer\n(entitlement + signing)]
ATT[Attestor\n(Rekor v2 submit/verify)]
@@ -93,11 +96,19 @@ flowchart LR
QUE --> WK
WK --> MIN
SW --> MGO
FEED --> MGO
VEX --> MGO
CONC --> MGO
EXC --> MGO
UI --> SW
Z --> SW
%% New event-driven loop
CONC -- export.delta --> SCHW
EXC -- export.delta --> SCHW
SCHW --> SCH
SCH --> SW
SW -- report.ready --> NOTW
Z -- admission/observe --> NOTW
SGN <--> Auth
SGN --> FUL
SGN -->|mTLS| ATT
@@ -106,7 +117,7 @@ flowchart LR
SGN <-->|verify referrers| REG
```
**Trust boundaries.** Only **Signer** can sign; only **Attestor** can write to **Rekor v2**. Scanner/UI never sign.
**Trust boundaries.** Only **Signer** can sign; only **Attestor** can write to **Rekor v2**. Scanner/UI/Scheduler/Notify never sign.
---
@@ -116,7 +127,7 @@ flowchart LR
* **License Token (LT)** — longlived JWT from **Licensing Service**; used **once** to enroll the installation; never used in hot path.
* **ProofofEntitlement (PoE)** — bound to the installation key (mTLS client cert **or** DPoPbound JWT with `cnf`); mediumlived; renewable; revocable.
* **Operational token (OpTok)** — 25min OIDC token from **Authority**, **senderconstrained** (DPoP or mTLS). Used to authenticate to **Signer**/**Scanner.WebService**.
* **Operational token (OpTok)** — 25min OIDC token from **Authority**, **senderconstrained** (DPoP or mTLS). Used to authenticate to **Signer**/**Scanner.WebService**/**Scheduler.Web**/**Notify.Web**.
**Signer enforces both:** PoE proves entitlement; OpTok proves “who is calling now”. It also **independently verifies** the **scanner image digest** is **StellaOpssigned** via **Referrers + cosign** before signing anything.
@@ -173,16 +184,21 @@ LS --> IA: PoE (mTLS client cert or JWT with cnf=K_inst), CRL/OCSP/introspect
* Buildx **generator** runs analyzers during `docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer`, attaches SBOMs as **OCI referrers**.
* Scanner.WebService can trust these (policyconfigurable) and **skip** rescan; DSSE + Rekor v2 can be done either at build time or postpush via Signer/Attestor.
### 3.5 Events / integrations
* **Out:** `report.ready` (summary + verdict + Rekor UUID) → internal bus for **Notify** & UI.
* **Expose:** imagelevel **BOMIndex** metadata for **Scheduler** impact selection.
---
## 4) Backend evaluation (decider)
### 4.1 Feedser (advisories)
### 4.1 Concelier (advisories)
* Ingests vendor, distro, OSS feeds; normalizes & merges; persists canonical advisories in Mongo; exports **deterministic JSON** and **Trivy DB**.
* Offline kit bundles for airgapped sites.
### 4.2 Vexer (VEX)
### 4.2 Excititor (VEX)
* Ingests **OpenVEX / CSAF VEX / CycloneDX VEX**; normalizes claims; retains conflicts; computes **consensus** with provider trust weights and justification gates.
@@ -194,8 +210,8 @@ LS --> IA: PoE (mTLS client cert or JWT with cnf=K_inst), CRL/OCSP/introspect
### 4.4 PASS/FAIL flow
1. SBOM (Inventory / Usage) → join with **Feedser** advisories.
2. Apply **Vexer** consensus (statuses & justifications).
1. SBOM (Inventory / Usage) → join with **Concelier** advisories.
2. Apply **Excititor** consensus (statuses & justifications).
3. Apply **Policy**; compute PASS/FAIL with waiver TTLs.
4. Sign the **final report** (DSSE via **Signer**) and log to **Rekor v2** via **Attestor**.
@@ -227,6 +243,8 @@ s3://stellaops/
* `artifacts` (type/format/sha/size/rekor/ttl/immutable/refCount/createdAt)
* `images`, `layers`, `links`, `lifecycleRules`
* **Scheduler:** `schedules`, `runs`, `locks`, `impact_cursors`
* **Notify:** `rules`, `deliveries`, `channels`, `templates`
**Retention**
@@ -239,13 +257,13 @@ s3://stellaops/
### 7.1 Scanner.WebService
```
POST /api/scans { imageRef|digest, force? } → { scanId }
GET /api/scans/{id} → { status, digests, artifacts[] }
GET /api/sboms/{imageDigest} ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage
POST /api/scans { imageRef|digest, force? } → { scanId }
GET /api/scans/{id} → { status, digests, artifacts[] }
GET /api/sboms/{imageDigest} ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage
GET /api/diff?old=<digest>&new=<digest> → { added[], removed[], changed[], byLayer[] }
POST /api/exports { imageDigest, format, view } → { artifactId, rekorUrl }
POST /api/reports { imageDigest, policyRevision? } → { reportId, rekorUrl }
GET /api/catalog/artifacts/{id} → { size, ttl, immutable, rekor, refs }
POST /api/exports { imageDigest, format, view } → { artifactId, rekorUrl }
POST /api/reports { imageDigest, policyRevision?, vexSnapshot? } → { reportId, verdict, rekorUrl }
GET /api/catalog/artifacts/{id} → { size, ttl, immutable, rekor, refs }
GET /healthz | /readyz | /metrics
```
@@ -276,6 +294,25 @@ POST /license/introspect { poe } → { active, claims, exp }
POST /attest/endorse { bundle } → endorsement bundle (optional)
```
### 7.6 Scheduler
```
POST /api/v1/scheduler/schedules {yaml|json} → { scheduleId }
GET /api/v1/scheduler/schedules → [ { id, nextRun, status, stats } ]
POST /api/v1/scheduler/run { id|selector } → { runId }
GET /api/v1/scheduler/runs/{id} → { status, counts, links }
GET /api/v1/scheduler/cursor → { lastConcelierExportId, lastExcititorExportId }
```
### 7.7 Notify
```
POST /api/v1/notify/test { channel, target } → { delivered }
POST /api/v1/notify/rules {yaml|json} → { ruleId }
GET /api/v1/notify/rules → [ { id, match, actions, enabled } ]
GET /api/v1/notify/deliveries → [ { id, eventId, channel, status, attempts } ]
```
---
## 8) Security & verifiability
@@ -283,8 +320,9 @@ POST /attest/endorse { bundle } → endorsement bundle (optio
* **Senderconstrained tokens.** All operational calls use **DPoP** (RFC9449) or **mTLSbound** tokens (RFC8705).
* **Entitlement.** **PoE** is mandatory; revocation honored online.
* **Release integrity.** **Signer** independently verifies **scanner image digest** via **Referrers + cosign** before signing.
* **Separation of duties.** Scanner/UI cannot sign; only **Signer** can sign; only **Attestor** can write to **Rekor v2**.
* **Separation of duties.** Scanner/UI/Scheduler/Notify cannot sign; only **Signer** can sign; only **Attestor** can write to **Rekor v2**.
* **Verifiers.** Anyone can verify: DSSE signature → certificate chain to **StellaOps Fulcio/KMS root****Rekor v2** inclusion.
* **RBAC.** Roles: `scanner.admin|read`, `scheduler.admin|read`, `notify.admin|read`, `zastava.admin|read`.
* **Community vs Authorized.** Free/community runs throttled with no official attestations; authorized runs full speed and produce **StellaOpsverified** bundles.
**DSSE predicate (SBOM/report)**
@@ -321,6 +359,8 @@ Binary header + purl table + roaring bitmaps; optional `usedByEntrypoint` flags
* Buildtime path P95 ≤35s on warmed bases.
* Postbuild delta scan P95 ≤10s for 200MB images.
* Policy + VEX evaluation ≤500ms for 5k components using BOMIndex.
* **Event → notification** p95 ≤ **3060s** under nominal load.
* **Export delta → reevaluation verdict** p95 ≤ **5min** for 10k impacted images.
* **Quotas:** license plan enforces QPS/concurrency/size; **Signer** throttles and can deny DSSE.
---
@@ -337,32 +377,37 @@ Binary header + purl table + roaring bitmaps; optional `usedByEntrypoint` flags
```yaml
services:
authority: { image: stellaops/authority }
fulcio: { image: sigstore/fulcio }
rekor: { image: sigstore/rekor-v2 }
minio: { image: minio/minio, command: server /data --console-address ":9001" }
mongo: { image: mongo:7 }
signer: { image: stellaops/signer, depends_on: [authority, fulcio] }
attestor: { image: stellaops/attestor, depends_on: [rekor, signer] }
scanner-web:{ image: stellaops/scanner-web, depends_on: [mongo, minio, signer, attestor] }
scanner-worker:
image: stellaops/scanner-worker
deploy: { replicas: 4 }
depends_on: [scanner-web]
feedser: { image: stellaops/feedser-web, depends_on: [mongo] }
vexer: { image: stellaops/vexer-web, depends_on: [mongo] }
ui: { image: stellaops/ui, depends_on: [scanner-web, feedser, vexer] }
authority: { image: stellaops/authority }
fulcio: { image: sigstore/fulcio }
rekor: { image: sigstore/rekor-v2 }
minio: { image: minio/minio, command: server /data --console-address ":9001" }
mongo: { image: mongo:7 }
signer: { image: stellaops/signer, depends_on: [authority, fulcio] }
attestor: { image: stellaops/attestor, depends_on: [rekor, signer] }
scanner-web: { image: stellaops/scanner-web, depends_on: [mongo, minio, signer, attestor] }
scanner-worker: { image: stellaops/scanner-worker, deploy: { replicas: 4 }, depends_on: [scanner-web] }
concelier: { image: stellaops/concelier-web, depends_on: [mongo] }
excititor: { image: stellaops/excititor-web, depends_on: [mongo] }
scheduler-web: { image: stellaops/scheduler-web, depends_on: [mongo] }
scheduler-worker:{ image: stellaops/scheduler-worker, deploy: { replicas: 2 }, depends_on: [scheduler-web] }
notify-web: { image: stellaops/notify-web, depends_on: [mongo] }
notify-worker: { image: stellaops/notify-worker, deploy: { replicas: 2 }, depends_on: [notify-web] }
ui: { image: stellaops/ui, depends_on: [scanner-web, concelier, excititor, scheduler-web, notify-web] }
```
* **Backups:** Mongo dumps; MinIO versioned buckets & replication; Rekor v2 DB snapshots; JWKS/Fulcio/KMS key rotation.
* **Ops runbooks:** Scheduler catchup after Concelier/Excititor recovery; connector key rotation (Slack/Teams/SMTP).
* **SLOs & alerts:** lag between Concelier/Excititor export and first rescan verdict; delivery failure rates by channel.
---
## 11) Observability & audit
* **Metrics:** scan latency, layer cache hit %, artifact bytes, DSSE/Rekor latency, policy evaluation time, queue depth, admission decisions (Zastava).
* **Tracing:** perstage spans; correlation IDs across Scanner→Signer→Attestor.
* **Audit logs:** every signing records `license_id`, `image_digest`, `policy_digest`, and Rekor UUID.
* **Scheduler metrics:** `scheduler.impacted_images_total`, `scheduler.jobs_enqueued_total`, `scheduler.selection_ms`, endtoend p95 (event → verdict).
* **Notify metrics:** `notify.sent_total{channel}`, `notify.dropped_total{reason}`, `notify.digest_coalesced_total`, `notify.latency_ms`.
* **Tracing:** perstage spans; correlation IDs across Scanner→Signer→Attestor and Concelier/Excititor→Scheduler→Scanner→Notify.
* **Audit logs:** every signing records `license_id`, `image_digest`, `policy_digest`, and Rekor UUID; Scheduler records who scheduled what; Notify records where, when, and why messages were sent or deduped.
* **Compliance:** MinIO **Object Lock** for immutable artifacts; reproducible outputs via policy digest + SBOM digest in predicate.
---
@@ -373,11 +418,13 @@ services:
* M2: Buildx generator certified flows; crossregistry trust policies.
* M3: PatchPresence plugin (signaturebased backport detection), optin.
* M3: Zastava Admission control GA with policy presets and dryrun→enforce stages.
* Continuous: Policy UX (waiver TTLs, vendor rules), Vexer connectors expansion.
* M3: **Scheduler GA** with exportdelta impact routing and capacityaware pacing.
* M3: **Notify GA** with digests, Slack/Teams/Email/Webhooks; **M4:** PagerDuty/Opsgenie connectors.
* Continuous: Policy UX (waiver TTLs, vendor rules), Excititor connectors expansion.
---
## 13) Canonical sequences (verification & signing)
## 13) Canonical sequences (verification, reevaluation & notify)
**Sign & log (OpTok + PoE, image verify, DSSE, Rekor).**
@@ -409,22 +456,62 @@ sequenceDiagram
end
```
**Verification (third party).**
**Eventdriven reevaluation & notify.**
```plantuml
@startuml
actor Verifier
participant "stellaops verify" as Tool
database "Fulcio/KMS root" as Root
participant "Rekor v2" as R2
Verifier -> Tool: bundle (URL/file)
Tool -> Tool: Verify DSSE signature
Tool -> Root: Verify cert chain to StellaOps root
Tool -> R2: Verify inclusion proof / query by UUID
Tool -> Verifier: OK + claims (license_id, policy_digest, version)
@enduml
```mermaid
sequenceDiagram
participant CONC as Concelier
participant EXC as Excititor
participant SCH as Scheduler
participant SC as Scanner.WebService
participant NO as Notify
CONC->>SCH: export.delta {changedProductKeys, exportId}
EXC ->>SCH: export.delta {changedProductKeys, exportId}
SCH->>SCH: Impact select via BOM-Index bitmaps
SCH->>SC: Enqueue analysis-only reports (batches)
SC-->>SCH: verdict stream (PASS/FAIL, deltas)
SCH->>NO: rescan.delta {imageDigest, newCriticals, links}
NO-->>Slack/Teams/Email/Webhook: deliver (throttle/digest rules applied)
```
---
**End of `high_level_architecture.md` (Consolidated).**
## 14) Minimal data shapes (Scheduler & Notify)
**Scheduler schedule (YAML via UI/CLI)**
```yaml
name: nightly-eu
when: "0 2 * * * Europe/Sofia"
mode: analysis-only # or content-refresh
selection:
scope: all-images # or tenant/ns/repo label selectors
onlyIf: { lastReportOlderThanDays: 7 }
notify:
onNewFindings: true
minSeverity: high
limits:
maxJobs: 5000
ratePerSecond: 50
```
**Notify rule (YAML)**
```yaml
name: high-critical-alerts
match:
eventKinds: ["report.ready","rescan.delta","zastava.admission"]
minSeverity: high
namespaces: ["prod-*"]
vex: { includeAcceptedJustifications: false }
actions:
- channel: slack
target: "#sec-alerts"
template: "concise"
throttle: "5m"
- channel: email
target: "soc@acme.org"
digest: "hourly"
enabled: true
```

View File

@@ -149,14 +149,288 @@ Client then generates SBOM **only** for the `missing` layers and reposts `/sc
---
### 2.3 Policy Endpoints
### 2.3 Policy Endpoints *(preview feature flag: `scanner.features.enablePolicyPreview`)*
| Method | Path | Purpose |
| ------ | ------------------ | ------------------------------------ |
| `GET` | `/policy/export` | Download live YAML ruleset |
| `POST` | `/policy/import` | Upload YAML or Rego; replaces active |
| `POST` | `/policy/validate` | Lint only; returns 400 on error |
| `GET` | `/policy/history` | Paginated change log (audit trail) |
All policy APIs require **`scanner.reports`** scope (or anonymous access while auth is disabled).
**Fetch schema**
```
GET /api/v1/policy/schema
Authorization: Bearer <token>
Accept: application/schema+json
```
Returns the embedded `policy-schema@1` JSON schema used by the binder.
**Run diagnostics**
```
POST /api/v1/policy/diagnostics
Content-Type: application/json
Authorization: Bearer <token>
```
```json
{
"policy": {
"format": "yaml",
"actor": "cli",
"description": "dev override",
"content": "version: \"1.0\"\nrules:\n - name: Quiet Dev\n environments: [dev]\n action:\n type: ignore\n justification: dev waiver\n"
}
}
```
**Response 200**:
```json
{
"success": false,
"version": "1.0",
"ruleCount": 1,
"errorCount": 0,
"warningCount": 1,
"generatedAt": "2025-10-19T03:25:14.112Z",
"issues": [
{ "code": "policy.rule.quiet.missing_vex", "message": "Quiet flag ignored: rule must specify requireVex justifications.", "severity": "Warning", "path": "$.rules[0]" }
],
"recommendations": [
"Review policy warnings and ensure intentional overrides are documented."
]
}
```
`success` is `false` when blocking issues remain; recommendations aggregate YAML ignore rules, VEX include/exclude hints, and vendor precedence guidance.
**Preview impact**
```
POST /api/v1/policy/preview
Authorization: Bearer <token>
Content-Type: application/json
```
```json
{
"imageDigest": "sha256:abc123",
"findings": [
{ "id": "finding-1", "severity": "Critical", "source": "NVD" }
],
"policy": {
"format": "yaml",
"content": "version: \"1.0\"\nrules:\n - name: Block Critical\n severity: [Critical]\n action: block\n"
}
}
```
**Response 200**:
```json
{
"success": true,
"policyDigest": "9c5e...",
"revisionId": "preview",
"changed": 1,
"diffs": [
{
"findingId": "finding-1",
"baseline": {"findingId": "finding-1", "status": "Pass"},
"projected": {
"findingId": "finding-1",
"status": "Blocked",
"ruleName": "Block Critical",
"ruleAction": "Block",
"score": 5.0,
"configVersion": "1.0",
"inputs": {"severityWeight": 5.0}
},
"changed": true
}
],
"issues": []
}
```
- Provide `policy` to preview staged changes; omit it to compare against the active snapshot.
- Baseline verdicts are optional; when omitted, the API synthesises pass baselines before computing diffs.
- Quieted verdicts include `quietedBy` and `quiet` flags; score inputs now surface reachability/vendor trust weights (`reachability.*`, `trustWeight.*`).
**OpenAPI**: the full API document (including these endpoints) is exposed at `/openapi/v1.json` and can be fetched for tooling or contract regeneration.
### 2.4 Scanner Queue a Scan Job *(SP9 milestone)*
```
POST /api/v1/scans
Authorization: Bearer <token with scanner.scans.enqueue>
Content-Type: application/json
```
```json
{
"image": {
"reference": "registry.example.com/acme/app:1.2.3"
},
"force": false,
"clientRequestId": "ci-build-1845",
"metadata": {
"pipeline": "github",
"trigger": "pull-request"
}
}
```
| Field | Required | Notes |
| ------------------- | -------- | ------------------------------------------------------------------------------------------------ |
| `image.reference` | no\* | Full repo/tag (`registry/repo:tag`). Provide **either** `reference` or `digest` (sha256:…). |
| `image.digest` | no\* | OCI digest (e.g. `sha256:…`). |
| `force` | no | `true` forces a re-run even if an identical scan (`scanId`) already exists. Default **false**. |
| `clientRequestId` | no | Free-form string surfaced in audit logs. |
| `metadata` | no | Optional string map stored with the job and surfaced in observability feeds. |
\* At least one of `image.reference` or `image.digest` must be supplied.
**Response 202** job accepted (idempotent):
```http
HTTP/1.1 202 Accepted
Location: /api/v1/scans/2f6c17f9b3f548e2a28b9c412f4d63f8
```
```json
{
"scanId": "2f6c17f9b3f548e2a28b9c412f4d63f8",
"status": "Pending",
"location": "/api/v1/scans/2f6c17f9b3f548e2a28b9c412f4d63f8",
"created": true
}
```
- `scanId` is deterministic resubmitting an identical payload returns the same identifier with `"created": false`.
- API is cancellation-aware; aborting the HTTP request cancels the submission attempt.
- Required scope: **`scanner.scans.enqueue`**.
**Response 400** validation problem (`Content-Type: application/problem+json`) when both `image.reference` and `image.digest` are blank.
### 2.5 Scanner Fetch Scan Status
```
GET /api/v1/scans/{scanId}
Authorization: Bearer <token with scanner.scans.read>
Accept: application/json
```
**Response 200**:
```json
{
"scanId": "2f6c17f9b3f548e2a28b9c412f4d63f8",
"status": "Pending",
"image": {
"reference": "registry.example.com/acme/app:1.2.3",
"digest": null
},
"createdAt": "2025-10-18T20:15:12.482Z",
"updatedAt": "2025-10-18T20:15:12.482Z",
"failureReason": null
}
```
Statuses: `Pending`, `Running`, `Succeeded`, `Failed`, `Cancelled`.
### 2.6 Scanner Stream Progress (SSE / JSONL)
```
GET /api/v1/scans/{scanId}/events?format=sse|jsonl
Authorization: Bearer <token with scanner.scans.read>
Accept: text/event-stream
```
When `format` is omitted the endpoint emits **Server-Sent Events** (SSE). Specify `format=jsonl` to receive newline-delimited JSON (`application/x-ndjson`). Response headers include `Cache-Control: no-store` and `X-Accel-Buffering: no` so intermediaries avoid buffering the stream.
**SSE frame** (default):
```
id: 1
event: pending
data: {"scanId":"2f6c17f9b3f548e2a28b9c412f4d63f8","sequence":1,"state":"Pending","message":"queued","timestamp":"2025-10-19T03:12:45.118Z","correlationId":"2f6c17f9b3f548e2a28b9c412f4d63f8:0001","data":{"force":false,"meta.pipeline":"github"}}
```
**JSONL frame** (`format=jsonl`):
```json
{"scanId":"2f6c17f9b3f548e2a28b9c412f4d63f8","sequence":1,"state":"Pending","message":"queued","timestamp":"2025-10-19T03:12:45.118Z","correlationId":"2f6c17f9b3f548e2a28b9c412f4d63f8:0001","data":{"force":false,"meta.pipeline":"github"}}
```
- `sequence` is monotonic starting at `1`.
- `correlationId` is deterministic (`{scanId}:{sequence:0000}`) unless a custom identifier is supplied by the publisher.
- `timestamp` is ISO8601 UTC with millisecond precision, ensuring deterministic ordering for consumers.
- The stream completes when the client disconnects or the coordinator stops publishing events.
### 2.7 Scanner Assemble Report (Signed Envelope)
```
POST /api/v1/reports
Authorization: Bearer <token with scanner.reports>
Content-Type: application/json
```
Request body mirrors policy preview inputs (image digest plus findings). The service evaluates the active policy snapshot, assembles a verdict, and signs the canonical report payload.
**Response 200**:
```json
{
"report": {
"reportId": "report-3def5f362aa475ef14b6",
"imageDigest": "sha256:deadbeef",
"verdict": "blocked",
"policy": { "revisionId": "rev-1", "digest": "27d2ec2b34feedc304fc564d252ecee1c8fa14ea581a5ff5c1ea8963313d5c8d" },
"summary": { "total": 1, "blocked": 1, "warned": 0, "ignored": 0, "quieted": 0 },
"verdicts": [
{
"findingId": "finding-1",
"status": "Blocked",
"ruleName": "Block Critical",
"ruleAction": "Block",
"score": 40.5,
"configVersion": "1.0",
"inputs": {
"reachabilityWeight": 0.45,
"baseScore": 40.5,
"severityWeight": 90,
"trustWeight": 1,
"trustWeight.NVD": 1,
"reachability.runtime": 0.45
},
"quiet": false,
"sourceTrust": "NVD",
"reachability": "runtime"
}
],
"issues": []
},
"dsse": {
"payloadType": "application/vnd.stellaops.report+json",
"payload": "<base64 canonical report>",
"signatures": [
{
"keyId": "scanner-report-signing",
"algorithm": "hs256",
"signature": "<base64 signature>"
}
]
}
}
```
- The `report` object omits null fields and is deterministic (ISO timestamps, sorted keys).
- `dsse` follows the DSSE (Dead Simple Signing Envelope) shape; `payload` is the canonical UTF-8 JSON and `signatures[0].signature` is the base64 HMAC/Ed25519 value depending on configuration.
- A runnable sample envelope is available at `samples/api/reports/report-sample.dsse.json` for tooling tests or signature verification.
**Response 404** `application/problem+json` payload with type `https://stellaops.org/problems/not-found` when the scan identifier is unknown.
> **Tip**  poll `Location` from the submission call until `status` transitions away from `Pending`/`Running`.
```yaml
# Example import payload (YAML)
@@ -181,6 +455,23 @@ Validation errors come back as:
}
```
```json
# Preview response excerpt
{
"success": true,
"policyDigest": "9c5e...",
"revisionId": "rev-12",
"changed": 1,
"diffs": [
{
"baseline": {"findingId": "finding-1", "status": "pass"},
"projected": {"findingId": "finding-1", "status": "blocked", "ruleName": "Block Critical"},
"changed": true
}
]
}
```
---
### 2.4 Attestation (Planned  Q12026)
@@ -200,7 +491,7 @@ Returns `202 Accepted` and `Location: /attest/{id}` for async verify.
## 3 StellaOps CLI (`stellaops-cli`)
The new CLI is built on **System.CommandLine2.0.0beta5** and mirrors the Feedser backend REST API.
The new CLI is built on **System.CommandLine2.0.0beta5** and mirrors the Concelier backend REST API.
Configuration follows the same precedence chain everywhere:
1. Environment variables (e.g. `API_KEY`, `STELLAOPS_BACKEND_URL`, `StellaOps:ApiKey`)
@@ -231,9 +522,12 @@ See `docs/dev/32_AUTH_CLIENT_GUIDE.md` for recommended profiles (online vs. air-
| `stellaops-cli auth revoke export` | Export the Authority revocation bundle | `--output <directory>` (defaults to CWD) | Writes `revocation-bundle.json`, `.json.jws`, and `.json.sha256`; verifies the digest locally and includes key metadata in the log summary. |
| `stellaops-cli auth revoke verify` | Validate a revocation bundle offline | `--bundle <path>` `--signature <path>` `--key <path>`<br>`--verbose` | Verifies detached JWS signatures, reports the computed SHA-256, and can fall back to cached JWKS when `--key` is omitted. |
| `stellaops-cli config show` | Display resolved configuration | — | Masks secret values; helpful for airgapped installs |
| `stellaops-cli runtime policy test` | Ask Scanner.WebService for runtime verdicts (Webhook parity) | `--image/-i <digest>` (repeatable, comma/space lists supported)<br>`--file/-f <path>`<br>`--namespace/--ns <name>`<br>`--label/-l key=value` (repeatable)<br>`--json` | Posts to `POST /api/v1/scanner/policy/runtime`, deduplicates image digests, and prints TTL/policy revision plus per-image columns for signed state, SBOM referrers, quieted-by metadata, confidence, and Rekor attestation (uuid + verified flag). Accepts newline/whitespace-delimited stdin when piped; `--json` emits the raw response without additional logging. |
When running on an interactive terminal without explicit override flags, the CLI uses Spectre.Console prompts to let you choose per-run ORAS/offline bundle behaviour.
Runtime verdict output reflects the SCANNER-RUNTIME-12-302 contract sign-off (quieted provenance, confidence band, attestation verification). CLI-RUNTIME-13-008 now mirrors those fields in both table and `--json` formats.
**Startup diagnostics**
- `stellaops-cli` now loads Authority plug-in manifests during startup (respecting `Authority:Plugins:*`) and surfaces analyzer warnings when a plug-in weakens the baseline password policy (minimum length **12** and all character classes required).
@@ -250,7 +544,7 @@ When running on an interactive terminal without explicit override flags, the CLI
- Downloads are verified against the `X-StellaOps-Digest` header (SHA-256). When `StellaOps:ScannerSignaturePublicKeyPath` points to a PEM-encoded RSA key, the optional `X-StellaOps-Signature` header is validated as well.
- Metadata for each bundle is written alongside the artefact (`*.metadata.json`) with digest, signature, source URL, and timestamps.
- Retry behaviour is controlled via `StellaOps:ScannerDownloadAttempts` (default **3** with exponential backoff).
- Successful `scan run` executions create timestamped JSON artefacts inside `ResultsDirectory` plus a `scan-run-*.json` metadata envelope documenting the runner, arguments, timing, and stdout/stderr. The artefact is posted back to Feedser automatically.
- Successful `scan run` executions create timestamped JSON artefacts inside `ResultsDirectory` plus a `scan-run-*.json` metadata envelope documenting the runner, arguments, timing, and stdout/stderr. The artefact is posted back to Concelier automatically.
#### Trivy DB export metadata (`metadata.json`)
@@ -265,18 +559,18 @@ When running on an interactive terminal without explicit override flags, the CLI
| `treeDigest` | string | Canonical SHA-256 digest of the JSON tree used to build the database. |
| `treeBytes` | number | Total bytes across exported JSON files. |
| `advisoryCount` | number | Count of advisories included in the export. |
| `exporterVersion` | string | Version stamp of `StellaOps.Feedser.Exporter.TrivyDb`. |
| `exporterVersion` | string | Version stamp of `StellaOps.Concelier.Exporter.TrivyDb`. |
| `builder` | object? | Raw metadata emitted by `trivy-db build` (version, update cadence, etc.). |
| `delta.changedFiles[]` | array | Present when `mode = delta`. Each entry lists `{ "path": "<relative json>", "length": <bytes>, "digest": "sha256:..." }`. |
| `delta.removedPaths[]` | array | Paths that existed in the previous manifest but were removed in the new run. |
When the planner opts for a delta run, the exporter copies unmodified blobs from the baseline layout identified by `baseManifestDigest`. Consumers that cache OCI blobs only need to fetch the `changedFiles` and the new manifest/metadata unless `resetBaseline` is true.
When pushing to ORAS, set `feedser:exporters:trivyDb:oras:publishFull` / `publishDelta` to control whether full or delta runs are copied to the registry. Offline bundles follow the analogous `includeFull` / `includeDelta` switches under `offlineBundle`.
When pushing to ORAS, set `concelier:exporters:trivyDb:oras:publishFull` / `publishDelta` to control whether full or delta runs are copied to the registry. Offline bundles follow the analogous `includeFull` / `includeDelta` switches under `offlineBundle`.
Example configuration (`appsettings.yaml`):
```yaml
feedser:
concelier:
exporters:
trivyDb:
oras:
@@ -293,7 +587,7 @@ feedser:
**Authentication**
- API key is sent as `Authorization: Bearer <token>` automatically when configured.
- Anonymous operation is permitted only when Feedser runs with
- Anonymous operation is permitted only when Concelier runs with
`authority.allowAnonymousFallback: true`. This flag is temporary—plan to disable
it before **2025-12-31 UTC** so bearer tokens become mandatory.
@@ -303,7 +597,7 @@ Authority-backed auth workflow:
3. Execute CLI commands as normal—the backend client injects the cached bearer token automatically and retries on transient 401/403 responses with operator guidance.
4. Inspect the cache with `stellaops-cli auth status` (shows expiry, scope, mode) or clear it via `stellaops-cli auth logout`.
5. Run `stellaops-cli auth whoami` to dump token subject, audience, issuer, scopes, and remaining lifetime (verbose mode prints additional claims).
6. Expect Feedser to emit audit logs for each `/jobs*` request showing `subject`,
6. Expect Concelier to emit audit logs for each `/jobs*` request showing `subject`,
`clientId`, `scopes`, `status`, and whether network bypass rules were applied.
Tokens live in `~/.stellaops/tokens` unless `StellaOps:Authority:TokenCacheDirectory` overrides it. Cached tokens are reused offline until they expire; the CLI surfaces clear errors if refresh fails.
@@ -314,7 +608,7 @@ Tokens live in `~/.stellaops/tokens` unless `StellaOps:Authority:TokenCacheDirec
{
"StellaOps": {
"ApiKey": "your-api-token",
"BackendUrl": "https://feedser.example.org",
"BackendUrl": "https://concelier.example.org",
"ScannerCacheDirectory": "scanners",
"ResultsDirectory": "results",
"DefaultRunner": "docker",
@@ -322,11 +616,11 @@ Tokens live in `~/.stellaops/tokens` unless `StellaOps:Authority:TokenCacheDirec
"ScannerDownloadAttempts": 3,
"Authority": {
"Url": "https://authority.example.org",
"ClientId": "feedser-cli",
"ClientId": "concelier-cli",
"ClientSecret": "REDACTED",
"Username": "",
"Password": "",
"Scope": "feedser.jobs.trigger",
"Scope": "concelier.jobs.trigger",
"TokenCacheDirectory": ""
}
}

View File

@@ -1,289 +1,289 @@
# 10 · Feedser + CLI Quickstart
This guide walks through configuring the Feedser web service and the `stellaops-cli`
tool so an operator can ingest advisories, merge them, and publish exports from a
single workstation. It focuses on deployment-facing surfaces only (configuration,
runtime wiring, CLI usage) and leaves connector/internal customization for later.
---
## 0 · Prerequisites
- .NET SDK **10.0.100-preview** (matches `global.json`)
- MongoDB instance reachable from the host (local Docker or managed)
- `trivy-db` binary on `PATH` for Trivy exports (and `oras` if publishing to OCI)
- Plugin assemblies present in `PluginBinaries/` (already included in the repo)
- Optional: Docker/Podman runtime if you plan to run scanners locally
> **Tip** air-gapped installs should preload `trivy-db` and `oras` binaries into the
> runner image since Feedser never fetches them dynamically.
---
## 1 · Configure Feedser
1. Copy the sample config to the expected location (CI/CD pipelines can stamp values
into this file during deployment—see the “Deployment automation” note below):
```bash
mkdir -p etc
cp etc/feedser.yaml.sample etc/feedser.yaml
```
2. Edit `etc/feedser.yaml` and update the MongoDB DSN (and optional database name).
The default template configures plug-in discovery to look in `PluginBinaries/`
and disables remote telemetry exporters by default.
3. (Optional) Override settings via environment variables. All keys are prefixed with
`FEEDSER_`. Example:
```bash
export FEEDSER_STORAGE__DSN="mongodb://user:pass@mongo:27017/feedser"
export FEEDSER_TELEMETRY__ENABLETRACING=false
```
4. Start the web service from the repository root:
```bash
dotnet run --project src/StellaOps.Feedser.WebService
```
On startup Feedser validates the options, boots MongoDB indexes, loads plug-ins,
and exposes:
- `GET /health` returns service status and telemetry settings
- `GET /ready` performs a MongoDB `ping`
- `GET /jobs` + `POST /jobs/{kind}` inspect and trigger connector/export jobs
> **Security note** authentication now ships via StellaOps Authority. Keep
> `authority.allowAnonymousFallback: true` only during the staged rollout and
> disable it before **2025-12-31 UTC** so tokens become mandatory.
### Authority companion configuration (preview)
1. Copy the Authority sample configuration:
```bash
cp etc/authority.yaml.sample etc/authority.yaml
```
2. Update the issuer URL, token lifetimes, and plug-in descriptors to match your
environment. Authority expects per-plugin manifests in `etc/authority.plugins/`;
sample `standard.yaml` and `ldap.yaml` files are provided as starting points.
For air-gapped installs keep the default plug-in binary directory
(`../PluginBinaries/Authority`) so packaged plug-ins load without outbound access.
3. Environment variables prefixed with `STELLAOPS_AUTHORITY_` override individual
fields. Example:
```bash
export STELLAOPS_AUTHORITY__ISSUER="https://authority.stella-ops.local"
export STELLAOPS_AUTHORITY__PLUGINDIRECTORIES__0="/srv/authority/plugins"
```
---
## 2 · Configure the CLI
The CLI reads configuration from JSON/YAML files *and* environment variables. The
defaults live in `src/StellaOps.Cli/appsettings.json` and expect overrides at runtime.
| Setting | Environment variable | Default | Purpose |
| ------- | -------------------- | ------- | ------- |
| `BackendUrl` | `STELLAOPS_BACKEND_URL` | _empty_ | Base URL of the Feedser web service |
| `ApiKey` | `API_KEY` | _empty_ | Reserved for legacy key auth; leave empty when using Authority |
| `ScannerCacheDirectory` | `STELLAOPS_SCANNER_CACHE_DIRECTORY` | `scanners` | Local cache folder |
| `ResultsDirectory` | `STELLAOPS_RESULTS_DIRECTORY` | `results` | Where scan outputs are written |
| `Authority.Url` | `STELLAOPS_AUTHORITY_URL` | _empty_ | StellaOps Authority issuer/token endpoint |
| `Authority.ClientId` | `STELLAOPS_AUTHORITY_CLIENT_ID` | _empty_ | Client identifier for the CLI |
| `Authority.ClientSecret` | `STELLAOPS_AUTHORITY_CLIENT_SECRET` | _empty_ | Client secret (omit when using username/password grant) |
| `Authority.Username` | `STELLAOPS_AUTHORITY_USERNAME` | _empty_ | Username for password grant flows |
| `Authority.Password` | `STELLAOPS_AUTHORITY_PASSWORD` | _empty_ | Password for password grant flows |
| `Authority.Scope` | `STELLAOPS_AUTHORITY_SCOPE` | `feedser.jobs.trigger` | OAuth scope requested for backend operations |
| `Authority.TokenCacheDirectory` | `STELLAOPS_AUTHORITY_TOKEN_CACHE_DIR` | `~/.stellaops/tokens` | Directory that persists cached tokens |
| `Authority.Resilience.EnableRetries` | `STELLAOPS_AUTHORITY_ENABLE_RETRIES` | `true` | Toggle Polly retry handler for Authority HTTP calls |
| `Authority.Resilience.RetryDelays` | `STELLAOPS_AUTHORITY_RETRY_DELAYS` | `1s,2s,5s` | Comma- or space-separated backoff delays (hh:mm:ss) |
| `Authority.Resilience.AllowOfflineCacheFallback` | `STELLAOPS_AUTHORITY_ALLOW_OFFLINE_CACHE_FALLBACK` | `true` | Allow CLI to reuse cached discovery/JWKS metadata when Authority is offline |
| `Authority.Resilience.OfflineCacheTolerance` | `STELLAOPS_AUTHORITY_OFFLINE_CACHE_TOLERANCE` | `00:10:00` | Additional tolerance window applied to cached metadata |
Example bootstrap:
```bash
export STELLAOPS_BACKEND_URL="http://localhost:5000"
export STELLAOPS_RESULTS_DIRECTORY="$HOME/.stellaops/results"
export STELLAOPS_AUTHORITY_URL="https://authority.local"
export STELLAOPS_AUTHORITY_CLIENT_ID="feedser-cli"
export STELLAOPS_AUTHORITY_CLIENT_SECRET="s3cr3t"
dotnet run --project src/StellaOps.Cli -- db merge
# Acquire a bearer token and confirm cache state
dotnet run --project src/StellaOps.Cli -- auth login
dotnet run --project src/StellaOps.Cli -- auth status
dotnet run --project src/StellaOps.Cli -- auth whoami
```
Refer to `docs/dev/32_AUTH_CLIENT_GUIDE.md` for deeper guidance on tuning retry/offline settings and rollout checklists.
To persist configuration, you can create `stellaops-cli.yaml` next to the binary or
rely on environment variables for ephemeral runners.
---
## 3 · Operating Workflow
1. **Trigger connector fetch stages**
```bash
dotnet run --project src/StellaOps.Cli -- db fetch --source osv --stage fetch
dotnet run --project src/StellaOps.Cli -- db fetch --source osv --stage parse
dotnet run --project src/StellaOps.Cli -- db fetch --source osv --stage map
```
Use `--mode resume` when continuing from a previous window:
```bash
dotnet run --project src/StellaOps.Cli -- db fetch --source redhat --stage fetch --mode resume
```
2. **Merge canonical advisories**
```bash
dotnet run --project src/StellaOps.Cli -- db merge
```
3. **Produce exports**
```bash
# JSON tree (vuln-list style)
dotnet run --project src/StellaOps.Cli -- db export --format json
# Trivy DB (delta example)
dotnet run --project src/StellaOps.Cli -- db export --format trivy-db --delta
```
Feedser always produces a deterministic OCI layout. The first run after a clean
bootstrap emits a **full** baseline; subsequent `--delta` runs reuse the previous
baselines blobs when only JSON manifests change. If the exporter detects that a
prior delta is still active (i.e., `LastDeltaDigest` is recorded) it automatically
upgrades the next run to a full export and resets the baseline so operators never
chain deltas indefinitely. The CLI exposes `--publish-full/--publish-delta` (for
ORAS pushes) and `--include-full/--include-delta` (for offline bundles) should you
need to override the defaults interactively.
**Smoke-check delta reuse:** after the first baseline completes, run the export a
second time with `--delta` and verify that the new directory reports `mode=delta`
while reusing the previous layer blob.
```bash
export_root=${FEEDSER_EXPORT_ROOT:-exports/trivy}
base=$(ls -1d "$export_root"/* | sort | tail -n2 | head -n1)
delta=$(ls -1d "$export_root"/* | sort | tail -n1)
jq -r '.mode,.baseExportId' "$delta/metadata.json"
base_manifest=$(jq -r '.manifests[0].digest' "$base/index.json")
delta_manifest=$(jq -r '.manifests[0].digest' "$delta/index.json")
printf 'baseline manifest: %s\ndelta manifest: %s\n' "$base_manifest" "$delta_manifest"
layer_digest=$(jq -r '.layers[0].digest' "$base/blobs/sha256/${base_manifest#sha256:}")
cmp "$base/blobs/sha256/${layer_digest#sha256:}" \
"$delta/blobs/sha256/${layer_digest#sha256:}"
```
`cmp` returning exit code `0` confirms the delta export reuses the baselines
`db.tar.gz` layer instead of rebuilding it.
4. **Manage scanners (optional)**
```bash
dotnet run --project src/StellaOps.Cli -- scanner download --channel stable
dotnet run --project src/StellaOps.Cli -- scan run --entry scanners/latest/Scanner.dll --target ./sboms
dotnet run --project src/StellaOps.Cli -- scan upload --file results/scan-001.json
```
Add `--verbose` to any command for structured console logs. All commands honour
`Ctrl+C` cancellation and exit with non-zero status codes when the backend returns
a problem document.
---
## 4 · Verification Checklist
- Feedser `/health` returns `"status":"healthy"` and Storage bootstrap is marked
complete after startup.
- CLI commands return HTTP 202 with a `Location` header (job tracking URL) when
triggering Feedser jobs.
- Export artefacts are materialised under the configured output directories and
their manifests record digests.
- MongoDB contains the expected `document`, `dto`, `advisory`, and `export_state`
collections after a run.
---
## 5 · Deployment Automation
- Treat `etc/feedser.yaml.sample` as the canonical template. CI/CD should copy it to
the deployment artifact and replace placeholders (DSN, telemetry endpoints, cron
overrides) with environment-specific secrets.
- Keep secret material (Mongo credentials, OTLP tokens) outside of the repository;
inject them via secret stores or pipeline variables at stamp time.
- When building container images, include `trivy-db` (and `oras` if used) so air-gapped
clusters do not need outbound downloads at runtime.
---
## 5 · Next Steps
- Enable authority-backed authentication in non-production first. Set
`authority.enabled: true` while keeping `authority.allowAnonymousFallback: true`
to observe logs, then flip it to `false` before 2025-12-31 UTC to enforce tokens.
- Automate the workflow above via CI/CD (compose stack or Kubernetes CronJobs).
- Pair with the Feedser connector teams when enabling additional sources so their
module-specific requirements are pulled in safely.
---
## 6 · Authority Integration
- Feedser now authenticates callers through StellaOps Authority using OAuth 2.0
resource server flows. Populate the `authority` block in `feedser.yaml`:
```yaml
authority:
enabled: true
allowAnonymousFallback: false # keep true only during the staged rollout window
issuer: "https://authority.example.org"
audiences:
- "api://feedser"
requiredScopes:
- "feedser.jobs.trigger"
clientId: "feedser-jobs"
clientSecretFile: "../secrets/feedser-jobs.secret"
clientScopes:
- "feedser.jobs.trigger"
bypassNetworks:
- "127.0.0.1/32"
- "::1/128"
```
- Store the client secret outside of source control. Either provide it via
`authority.clientSecret` (environment variable `FEEDSER_AUTHORITY__CLIENTSECRET`)
or point `authority.clientSecretFile` to a file mounted at runtime.
- Cron jobs running on the same host can keep using the API thanks to the loopback
bypass mask. Add additional CIDR ranges as needed; every bypass is logged.
- Export the same configuration to Kubernetes or systemd by setting environment
variables such as:
```bash
export FEEDSER_AUTHORITY__ENABLED=true
export FEEDSER_AUTHORITY__ALLOWANONYMOUSFALLBACK=false
export FEEDSER_AUTHORITY__ISSUER="https://authority.example.org"
export FEEDSER_AUTHORITY__CLIENTID="feedser-jobs"
export FEEDSER_AUTHORITY__CLIENTSECRETFILE="/var/run/secrets/feedser/authority-client"
```
- CLI commands already pass `Authorization` headers when credentials are supplied.
Configure the CLI with matching Authority settings (`docs/09_API_CLI_REFERENCE.md`)
so that automation can obtain tokens with the same client credentials. Feedser
logs every job request with the client ID, subject (if present), scopes, and
a `bypass` flag so operators can audit cron traffic.
# 10 · Concelier + CLI Quickstart
This guide walks through configuring the Concelier web service and the `stellaops-cli`
tool so an operator can ingest advisories, merge them, and publish exports from a
single workstation. It focuses on deployment-facing surfaces only (configuration,
runtime wiring, CLI usage) and leaves connector/internal customization for later.
---
## 0 · Prerequisites
- .NET SDK **10.0.100-preview** (matches `global.json`)
- MongoDB instance reachable from the host (local Docker or managed)
- `trivy-db` binary on `PATH` for Trivy exports (and `oras` if publishing to OCI)
- Plugin assemblies present in `StellaOps.Concelier.PluginBinaries/` (already included in the repo)
- Optional: Docker/Podman runtime if you plan to run scanners locally
> **Tip** air-gapped installs should preload `trivy-db` and `oras` binaries into the
> runner image since Concelier never fetches them dynamically.
---
## 1 · Configure Concelier
1. Copy the sample config to the expected location (CI/CD pipelines can stamp values
into this file during deployment—see the “Deployment automation” note below):
```bash
mkdir -p etc
cp etc/concelier.yaml.sample etc/concelier.yaml
```
2. Edit `etc/concelier.yaml` and update the MongoDB DSN (and optional database name).
The default template configures plug-in discovery to look in `StellaOps.Concelier.PluginBinaries/`
and disables remote telemetry exporters by default.
3. (Optional) Override settings via environment variables. All keys are prefixed with
`CONCELIER_`. Example:
```bash
export CONCELIER_STORAGE__DSN="mongodb://user:pass@mongo:27017/concelier"
export CONCELIER_TELEMETRY__ENABLETRACING=false
```
4. Start the web service from the repository root:
```bash
dotnet run --project src/StellaOps.Concelier.WebService
```
On startup Concelier validates the options, boots MongoDB indexes, loads plug-ins,
and exposes:
- `GET /health` returns service status and telemetry settings
- `GET /ready` performs a MongoDB `ping`
- `GET /jobs` + `POST /jobs/{kind}` inspect and trigger connector/export jobs
> **Security note** authentication now ships via StellaOps Authority. Keep
> `authority.allowAnonymousFallback: true` only during the staged rollout and
> disable it before **2025-12-31 UTC** so tokens become mandatory.
### Authority companion configuration (preview)
1. Copy the Authority sample configuration:
```bash
cp etc/authority.yaml.sample etc/authority.yaml
```
2. Update the issuer URL, token lifetimes, and plug-in descriptors to match your
environment. Authority expects per-plugin manifests in `etc/authority.plugins/`;
sample `standard.yaml` and `ldap.yaml` files are provided as starting points.
For air-gapped installs keep the default plug-in binary directory
(`../StellaOps.Authority.PluginBinaries`) so packaged plug-ins load without outbound access.
3. Environment variables prefixed with `STELLAOPS_AUTHORITY_` override individual
fields. Example:
```bash
export STELLAOPS_AUTHORITY__ISSUER="https://authority.stella-ops.local"
export STELLAOPS_AUTHORITY__PLUGINDIRECTORIES__0="/srv/authority/plugins"
```
---
## 2 · Configure the CLI
The CLI reads configuration from JSON/YAML files *and* environment variables. The
defaults live in `src/StellaOps.Cli/appsettings.json` and expect overrides at runtime.
| Setting | Environment variable | Default | Purpose |
| ------- | -------------------- | ------- | ------- |
| `BackendUrl` | `STELLAOPS_BACKEND_URL` | _empty_ | Base URL of the Concelier web service |
| `ApiKey` | `API_KEY` | _empty_ | Reserved for legacy key auth; leave empty when using Authority |
| `ScannerCacheDirectory` | `STELLAOPS_SCANNER_CACHE_DIRECTORY` | `scanners` | Local cache folder |
| `ResultsDirectory` | `STELLAOPS_RESULTS_DIRECTORY` | `results` | Where scan outputs are written |
| `Authority.Url` | `STELLAOPS_AUTHORITY_URL` | _empty_ | StellaOps Authority issuer/token endpoint |
| `Authority.ClientId` | `STELLAOPS_AUTHORITY_CLIENT_ID` | _empty_ | Client identifier for the CLI |
| `Authority.ClientSecret` | `STELLAOPS_AUTHORITY_CLIENT_SECRET` | _empty_ | Client secret (omit when using username/password grant) |
| `Authority.Username` | `STELLAOPS_AUTHORITY_USERNAME` | _empty_ | Username for password grant flows |
| `Authority.Password` | `STELLAOPS_AUTHORITY_PASSWORD` | _empty_ | Password for password grant flows |
| `Authority.Scope` | `STELLAOPS_AUTHORITY_SCOPE` | `concelier.jobs.trigger` | OAuth scope requested for backend operations |
| `Authority.TokenCacheDirectory` | `STELLAOPS_AUTHORITY_TOKEN_CACHE_DIR` | `~/.stellaops/tokens` | Directory that persists cached tokens |
| `Authority.Resilience.EnableRetries` | `STELLAOPS_AUTHORITY_ENABLE_RETRIES` | `true` | Toggle Polly retry handler for Authority HTTP calls |
| `Authority.Resilience.RetryDelays` | `STELLAOPS_AUTHORITY_RETRY_DELAYS` | `1s,2s,5s` | Comma- or space-separated backoff delays (hh:mm:ss) |
| `Authority.Resilience.AllowOfflineCacheFallback` | `STELLAOPS_AUTHORITY_ALLOW_OFFLINE_CACHE_FALLBACK` | `true` | Allow CLI to reuse cached discovery/JWKS metadata when Authority is offline |
| `Authority.Resilience.OfflineCacheTolerance` | `STELLAOPS_AUTHORITY_OFFLINE_CACHE_TOLERANCE` | `00:10:00` | Additional tolerance window applied to cached metadata |
Example bootstrap:
```bash
export STELLAOPS_BACKEND_URL="http://localhost:5000"
export STELLAOPS_RESULTS_DIRECTORY="$HOME/.stellaops/results"
export STELLAOPS_AUTHORITY_URL="https://authority.local"
export STELLAOPS_AUTHORITY_CLIENT_ID="concelier-cli"
export STELLAOPS_AUTHORITY_CLIENT_SECRET="s3cr3t"
dotnet run --project src/StellaOps.Cli -- db merge
# Acquire a bearer token and confirm cache state
dotnet run --project src/StellaOps.Cli -- auth login
dotnet run --project src/StellaOps.Cli -- auth status
dotnet run --project src/StellaOps.Cli -- auth whoami
```
Refer to `docs/dev/32_AUTH_CLIENT_GUIDE.md` for deeper guidance on tuning retry/offline settings and rollout checklists.
To persist configuration, you can create `stellaops-cli.yaml` next to the binary or
rely on environment variables for ephemeral runners.
---
## 3 · Operating Workflow
1. **Trigger connector fetch stages**
```bash
dotnet run --project src/StellaOps.Cli -- db fetch --source osv --stage fetch
dotnet run --project src/StellaOps.Cli -- db fetch --source osv --stage parse
dotnet run --project src/StellaOps.Cli -- db fetch --source osv --stage map
```
Use `--mode resume` when continuing from a previous window:
```bash
dotnet run --project src/StellaOps.Cli -- db fetch --source redhat --stage fetch --mode resume
```
2. **Merge canonical advisories**
```bash
dotnet run --project src/StellaOps.Cli -- db merge
```
3. **Produce exports**
```bash
# JSON tree (vuln-list style)
dotnet run --project src/StellaOps.Cli -- db export --format json
# Trivy DB (delta example)
dotnet run --project src/StellaOps.Cli -- db export --format trivy-db --delta
```
Concelier always produces a deterministic OCI layout. The first run after a clean
bootstrap emits a **full** baseline; subsequent `--delta` runs reuse the previous
baselines blobs when only JSON manifests change. If the exporter detects that a
prior delta is still active (i.e., `LastDeltaDigest` is recorded) it automatically
upgrades the next run to a full export and resets the baseline so operators never
chain deltas indefinitely. The CLI exposes `--publish-full/--publish-delta` (for
ORAS pushes) and `--include-full/--include-delta` (for offline bundles) should you
need to override the defaults interactively.
**Smoke-check delta reuse:** after the first baseline completes, run the export a
second time with `--delta` and verify that the new directory reports `mode=delta`
while reusing the previous layer blob.
```bash
export_root=${CONCELIER_EXPORT_ROOT:-exports/trivy}
base=$(ls -1d "$export_root"/* | sort | tail -n2 | head -n1)
delta=$(ls -1d "$export_root"/* | sort | tail -n1)
jq -r '.mode,.baseExportId' "$delta/metadata.json"
base_manifest=$(jq -r '.manifests[0].digest' "$base/index.json")
delta_manifest=$(jq -r '.manifests[0].digest' "$delta/index.json")
printf 'baseline manifest: %s\ndelta manifest: %s\n' "$base_manifest" "$delta_manifest"
layer_digest=$(jq -r '.layers[0].digest' "$base/blobs/sha256/${base_manifest#sha256:}")
cmp "$base/blobs/sha256/${layer_digest#sha256:}" \
"$delta/blobs/sha256/${layer_digest#sha256:}"
```
`cmp` returning exit code `0` confirms the delta export reuses the baselines
`db.tar.gz` layer instead of rebuilding it.
4. **Manage scanners (optional)**
```bash
dotnet run --project src/StellaOps.Cli -- scanner download --channel stable
dotnet run --project src/StellaOps.Cli -- scan run --entry scanners/latest/Scanner.dll --target ./sboms
dotnet run --project src/StellaOps.Cli -- scan upload --file results/scan-001.json
```
Add `--verbose` to any command for structured console logs. All commands honour
`Ctrl+C` cancellation and exit with non-zero status codes when the backend returns
a problem document.
---
## 4 · Verification Checklist
- Concelier `/health` returns `"status":"healthy"` and Storage bootstrap is marked
complete after startup.
- CLI commands return HTTP 202 with a `Location` header (job tracking URL) when
triggering Concelier jobs.
- Export artefacts are materialised under the configured output directories and
their manifests record digests.
- MongoDB contains the expected `document`, `dto`, `advisory`, and `export_state`
collections after a run.
---
## 5 · Deployment Automation
- Treat `etc/concelier.yaml.sample` as the canonical template. CI/CD should copy it to
the deployment artifact and replace placeholders (DSN, telemetry endpoints, cron
overrides) with environment-specific secrets.
- Keep secret material (Mongo credentials, OTLP tokens) outside of the repository;
inject them via secret stores or pipeline variables at stamp time.
- When building container images, include `trivy-db` (and `oras` if used) so air-gapped
clusters do not need outbound downloads at runtime.
---
## 5 · Next Steps
- Enable authority-backed authentication in non-production first. Set
`authority.enabled: true` while keeping `authority.allowAnonymousFallback: true`
to observe logs, then flip it to `false` before 2025-12-31 UTC to enforce tokens.
- Automate the workflow above via CI/CD (compose stack or Kubernetes CronJobs).
- Pair with the Concelier connector teams when enabling additional sources so their
module-specific requirements are pulled in safely.
---
## 6 · Authority Integration
- Concelier now authenticates callers through StellaOps Authority using OAuth 2.0
resource server flows. Populate the `authority` block in `concelier.yaml`:
```yaml
authority:
enabled: true
allowAnonymousFallback: false # keep true only during the staged rollout window
issuer: "https://authority.example.org"
audiences:
- "api://concelier"
requiredScopes:
- "concelier.jobs.trigger"
clientId: "concelier-jobs"
clientSecretFile: "../secrets/concelier-jobs.secret"
clientScopes:
- "concelier.jobs.trigger"
bypassNetworks:
- "127.0.0.1/32"
- "::1/128"
```
- Store the client secret outside of source control. Either provide it via
`authority.clientSecret` (environment variable `CONCELIER_AUTHORITY__CLIENTSECRET`)
or point `authority.clientSecretFile` to a file mounted at runtime.
- Cron jobs running on the same host can keep using the API thanks to the loopback
bypass mask. Add additional CIDR ranges as needed; every bypass is logged.
- Export the same configuration to Kubernetes or systemd by setting environment
variables such as:
```bash
export CONCELIER_AUTHORITY__ENABLED=true
export CONCELIER_AUTHORITY__ALLOWANONYMOUSFALLBACK=false
export CONCELIER_AUTHORITY__ISSUER="https://authority.example.org"
export CONCELIER_AUTHORITY__CLIENTID="concelier-jobs"
export CONCELIER_AUTHORITY__CLIENTSECRETFILE="/var/run/secrets/concelier/authority-client"
```
- CLI commands already pass `Authorization` headers when credentials are supplied.
Configure the CLI with matching Authority settings (`docs/09_API_CLI_REFERENCE.md`)
so that automation can obtain tokens with the same client credentials. Concelier
logs every job request with the client ID, subject (if present), scopes, and
a `bypass` flag so operators can audit cron traffic.

View File

@@ -82,28 +82,53 @@ Add this to **`MyPlugin.Schedule.csproj`** so the signed DLL + `.sig` land in th
---
##5DependencyInjection Entrypoint
Backend autodiscovers the static method below:
~~~csharp
namespace StellaOps.DependencyInjection;
public static class IoCConfigurator
{
public static IServiceCollection Configure(this IServiceCollection services,
IConfiguration cfg)
{
services.AddSingleton<IJob, MyJob>(); // schedule job
services.Configure<MyPluginOptions>(cfg.GetSection("Plugins:MyPlugin"));
return services;
}
}
~~~
---
##6Schedule Plugins
##5DependencyInjection Entrypoint
Backend autodiscovers restarttime bindings through two mechanisms:
1. **Service binding metadata** for simple contracts.
2. **`IDependencyInjectionRoutine`** implementations when you need full control.
###5.1Service binding metadata
Annotate implementations with `[ServiceBinding]` to declare their lifetime and service contract.
The loader honours scoped lifetimes and will register the service before executing any custom DI routines.
~~~csharp
using Microsoft.Extensions.DependencyInjection;
using StellaOps.DependencyInjection;
[ServiceBinding(typeof(IJob), ServiceLifetime.Scoped, RegisterAsSelf = true)]
public sealed class MyJob : IJob
{
// IJob dependencies can now use scoped services (Mongo sessions, etc.)
}
~~~
Use `RegisterAsSelf = true` when you also want to resolve the concrete type.
Set `ReplaceExisting = true` to override default descriptors if the host already provides one.
###5.2Dependency injection routines
For advanced scenarios continue to expose a routine:
~~~csharp
namespace StellaOps.DependencyInjection;
public sealed class IoCConfigurator : IDependencyInjectionRoutine
{
public IServiceCollection Register(IServiceCollection services, IConfiguration cfg)
{
services.AddSingleton<IJob, MyJob>(); // schedule job
services.Configure<MyPluginOptions>(cfg.GetSection("Plugins:MyPlugin"));
return services;
}
}
~~~
---
##6Schedule Plugins
###6.1 Minimal Job
@@ -191,4 +216,4 @@ On merge, the plugin shows up in the UI Marketplace.
| NotDetected | .sig missing | cosign sign |
| VersionGateMismatch | Backend 2.1 vs plugin 2.0 | Recompile / bump attribute |
| FileLoadException | Duplicate | StellaOps.Common Ensure PrivateAssets="all" |
| Redis | timeouts Large writes | Batch or use Mongo |
| Redis | timeouts Large writes | Batch or use Mongo |

View File

@@ -1,162 +1,176 @@
# StellaOps Authority Service
> **Status:** Drafted 2025-10-12 (CORE5B.DOC / DOC1.AUTH) aligns with Authority revocation store, JWKS rotation, and bootstrap endpoints delivered in Sprint1.
## 1. Purpose
The **StellaOps Authority** service issues OAuth2/OIDC tokens for every StellaOps module (Feedser, Backend, Agent, Zastava) and exposes the policy controls required in sovereign/offline environments. Authority is built as a minimal ASP.NET host that:
- brokers password, client-credentials, and device-code flows through pluggable identity providers;
- persists access/refresh/device tokens in MongoDB with deterministic schemas for replay analysis and air-gapped audit copies;
- distributes revocation bundles and JWKS material so downstream services can enforce lockouts without direct database access;
- offers bootstrap APIs for first-run provisioning and key rotation without redeploying binaries.
Authority is deployed alongside Feedser in air-gapped environments and never requires outbound internet access. All trusted metadata (OpenIddict discovery, JWKS, revocation bundles) is cacheable, signed, and reproducible.
## 2. Component Architecture
Authority is composed of five cooperating subsystems:
1. **Minimal API host** configures OpenIddict endpoints (`/token`, `/authorize`, `/revoke`, `/jwks`) and structured logging/telemetry. Rate limiting hooks (`AuthorityRateLimiter`) wrap every request.
2. **Plugin host** loads `StellaOps.Authority.Plugin.*.dll` assemblies, applies capability metadata, and exposes password/client provisioning surfaces through dependency injection.
3. **Mongo storage** persists tokens, revocations, bootstrap invites, and plugin state in deterministic collections indexed for offline sync (`authority_tokens`, `authority_revocations`, etc.).
4. **Cryptography layer** `StellaOps.Cryptography` abstractions manage password hashing, signing keys, JWKS export, and detached JWS generation.
5. **Offline ops APIs** internal endpoints under `/internal/*` provide administrative flows (bootstrap users/clients, revocation export) guarded by API keys and deterministic audit events.
A high-level sequence for password logins:
```
Client -> /token (password grant)
-> Rate limiter & audit hooks
-> Plugin credential store (Argon2id verification)
-> Token persistence (Mongo authority_tokens)
-> Response (access/refresh tokens + deterministic claims)
```
## 3. Token Lifecycle & Persistence
Authority persists every issued token in MongoDB so operators can audit or revoke without scanning distributed caches.
- **Collection:** `authority_tokens`
- **Key fields:**
- `tokenId`, `type` (`access_token`, `refresh_token`, `device_code`, `authorization_code`)
- `subjectId`, `clientId`, ordered `scope` array
- `status` (`valid`, `revoked`, `expired`), `createdAt`, optional `expiresAt`
- `revokedAt`, machine-readable `revokedReason`, optional `revokedReasonDescription`
- `revokedMetadata` (string dictionary for plugin-specific context)
- **Persistence flow:** `PersistTokensHandler` stamps missing JWT IDs, normalises scopes, and stores every principal emitted by OpenIddict.
- **Revocation flow:** `AuthorityTokenStore.UpdateStatusAsync` flips status, records the reason metadata, and is invoked by token revocation handlers and plugin provisioning events (e.g., disabling a user).
- **Expiry maintenance:** `AuthorityTokenStore.DeleteExpiredAsync` prunes non-revoked tokens past their `expiresAt` timestamp. Operators should schedule this in maintenance windows if large volumes of tokens are issued.
### Expectations for resource servers
Resource servers (Feedser WebService, Backend, Agent) **must not** assume in-memory caches are authoritative. They should:
- cache `/jwks` and `/revocations/export` responses within configured lifetimes;
- honour `revokedReason` metadata when shaping audit trails;
- treat `status != "valid"` or missing tokens as immediate denial conditions.
## 4. Revocation Pipeline
Authority centralises revocation in `authority_revocations` with deterministic categories:
| Category | Meaning | Required fields |
| --- | --- | --- |
| `token` | Specific OAuth token revoked early. | `revocationId` (token id), `tokenType`, optional `clientId`, `subjectId` |
| `subject` | All tokens for a subject disabled. | `revocationId` (= subject id) |
| `client` | OAuth client registration revoked. | `revocationId` (= client id) |
| `key` | Signing/JWE key withdrawn. | `revocationId` (= key id) |
`RevocationBundleBuilder` flattens Mongo documents into canonical JSON, sorts entries by (`category`, `revocationId`, `revokedAt`), and signs exports using detached JWS (RFC7797) with cosign-compatible headers.
**Export surfaces** (deterministic output, suitable for Offline Kit):
- CLI: `stella auth revoke export --output ./out` writes `revocation-bundle.json`, `.jws`, `.sha256`.
- Verification: `stella auth revoke verify --bundle <path> --signature <path> --key <path>` validates detached JWS signatures before distribution, selecting the crypto provider advertised in the detached header (see `docs/security/revocation-bundle.md`).
- API: `GET /internal/revocations/export` (requires bootstrap API key) returns the same payload.
- Verification: `stella auth revoke verify` validates schema, digest, and detached JWS using cached JWKS or offline keys, automatically preferring the hinted provider (libsodium builds honour `provider=libsodium`; other builds fall back to the managed provider).
**Consumer guidance:**
1. Mirror `revocation-bundle.json*` alongside Feedser exports. Offline agents fetch both over the existing update channel.
2. Use bundle `sequence` and `bundleId` to detect replay or monotonicity regressions. Ignore bundles with older sequence numbers unless `bundleId` changes and `issuedAt` advances.
3. Treat `revokedReason` taxonomy as machine-friendly codes (`compromised`, `rotation`, `policy`, `lifecycle`). Translating to human-readable logs is the consumers responsibility.
## 5. Signing Keys & JWKS Rotation
Authority signs revocation bundles and publishes JWKS entries via the new signing manager:
- **Configuration (`authority.yaml`):**
```yaml
signing:
enabled: true
algorithm: ES256 # Defaults to ES256
keySource: file # Loader identifier (file, vault, etc.)
provider: default # Optional preferred crypto provider
activeKeyId: authority-signing-dev
keyPath: "../certificates/authority-signing-dev.pem"
additionalKeys:
- keyId: authority-signing-dev-2024
path: "../certificates/authority-signing-dev-2024.pem"
source: "file"
```
- **Sources:** The default loader supports PEM files relative to the content root; additional loaders can be registered via `IAuthoritySigningKeySource`.
- **Providers:** Keys are registered against the `ICryptoProviderRegistry`, so alternative implementations (HSM, libsodium) can be plugged in without changing host code.
- **JWKS output:** `GET /jwks` lists every signing key with `status` metadata (`active`, `retired`). Old keys remain until operators remove them from configuration, allowing verification of historical bundles/tokens.
### Rotation SOP (no downtime)
1. Generate a new P-256 private key (PEM) on an offline workstation and place it where the Authority host can read it (e.g., `../certificates/authority-signing-2025.pem`).
2. Call the authenticated admin API:
```bash
curl -sS -X POST https://authority.example.com/internal/signing/rotate \
-H "x-stellaops-bootstrap-key: ${BOOTSTRAP_KEY}" \
-H "Content-Type: application/json" \
-d '{
"keyId": "authority-signing-2025",
"location": "../certificates/authority-signing-2025.pem",
"source": "file"
}'
```
3. Verify the response reports the previous key as retired and fetch `/jwks` to confirm the new `kid` appears with `status: "active"`.
4. Persist the old key path in `signing.additionalKeys` (the rotation API updates in-memory options; rewrite the YAML to match so restarts remain consistent).
5. If you prefer automation, trigger the `.gitea/workflows/authority-key-rotation.yml` workflow with the new `keyId`/`keyPath`; it wraps `ops/authority/key-rotation.sh` and reads environment-specific secrets. The older key will be marked `retired` and appended to `signing.additionalKeys`.
6. Re-run `stella auth revoke export` so revocation bundles are signed with the new key. Downstream caches should refresh JWKS within their configured lifetime (`StellaOpsAuthorityOptions.Signing` + client cache tolerance).
The rotation API leverages the same cryptography abstractions as revocation signing; no restart is required and the previous key is marked `retired` but kept available for verification.
## 6. Bootstrap & Administrative Endpoints
Administrative APIs live under `/internal/*` and require the bootstrap API key plus rate-limiter compliance.
| Endpoint | Method | Description |
| --- | --- | --- |
| `/internal/users` | `POST` | Provision initial administrative accounts through the registered password-capable plug-in. Emits structured audit events. |
| `/internal/clients` | `POST` | Provision OAuth clients (client credentials / device code). |
| `/internal/revocations/export` | `GET` | Export revocation bundle + detached JWS + digest. |
| `/internal/signing/rotate` | `POST` | Promote a new signing key (see SOP above). Request body accepts `keyId`, `location`, optional `source`, `algorithm`, `provider`, and metadata. |
All administrative calls emit `AuthEventRecord` entries enriched with correlation IDs, PII tags, and network metadata for offline SOC ingestion.
## 7. Configuration Reference
| Section | Key | Description | Notes |
| --- | --- | --- | --- |
| Root | `issuer` | Absolute HTTPS issuer advertised to clients. | Required. Loopback HTTP allowed only for development. |
| Tokens | `accessTokenLifetime`, `refreshTokenLifetime`, etc. | Lifetimes for each grant (access, refresh, device, authorization code, identity). | Enforced during issuance; persisted on each token document. |
| Storage | `storage.connectionString` | MongoDB connection string. | Required even for tests; offline kits ship snapshots for seeding. |
| Signing | `signing.enabled` | Enable JWKS/revocation signing. | Disable only for development. |
| Signing | `signing.algorithm` | Signing algorithm identifier. | Currently ES256; additional curves can be wired through crypto providers. |
| Signing | `signing.keySource` | Loader identifier (`file`, `vault`, custom). | Determines which `IAuthoritySigningKeySource` resolves keys. |
| Signing | `signing.keyPath` | Relative/absolute path understood by the loader. | Stored as-is; rotation request should keep it in sync with filesystem layout. |
| Signing | `signing.activeKeyId` | Active JWKS / revocation signing key id. | Exposed as `kid` in JWKS and bundles. |
| Signing | `signing.additionalKeys[].keyId` | Retired key identifier retained for verification. | Manager updates this automatically after rotation; keep YAML aligned. |
| Signing | `signing.additionalKeys[].source` | Loader identifier per retired key. | Defaults to `signing.keySource` if omitted. |
| Security | `security.rateLimiting` | Fixed-window limits for `/token`, `/authorize`, `/internal/*`. | See `docs/security/rate-limits.md` for tuning. |
| Bootstrap | `bootstrap.apiKey` | Shared secret required for `/internal/*`. | Only required when `bootstrap.enabled` is true. |
## 8. Offline & Sovereign Operation
- **No outbound dependencies:** Authority only contacts MongoDB and local plugins. Discovery and JWKS are cached by clients with offline tolerances (`AllowOfflineCacheFallback`, `OfflineCacheTolerance`). Operators should mirror these responses for air-gapped use.
- **Structured logging:** Every revocation export, signing rotation, bootstrap action, and token issuance emits structured logs with `traceId`, `client_id`, `subjectId`, and `network.remoteIp` where applicable. Mirror logs to your SIEM to retain audit trails without central connectivity.
- **Determinism:** Sorting rules in token and revocation exports guarantee byte-for-byte identical artefacts given the same datastore state. Hashes and signatures remain stable across machines.
## 9. Operational Checklist
- [ ] Protect the bootstrap API key and disable bootstrap endpoints (`bootstrap.enabled: false`) once initial setup is complete.
- [ ] Schedule `stella auth revoke export` (or `/internal/revocations/export`) at the same cadence as Feedser exports so bundles remain in lockstep.
- [ ] Rotate signing keys before expiration; keep at least one retired key until all cached bundles/tokens signed with it have expired.
- [ ] Monitor `/health` and `/ready` plus rate-limiter metrics to detect plugin outages early.
- [ ] Ensure downstream services cache JWKS and revocation bundles within tolerances; stale caches risk accepting revoked tokens.
For plug-in specific requirements, refer to **[Authority Plug-in Developer Guide](dev/31_AUTHORITY_PLUGIN_DEVELOPER_GUIDE.md)**. For revocation bundle validation workflow, see **[Authority Revocation Bundle](security/revocation-bundle.md)**.
# StellaOps Authority Service
> **Status:** Drafted 2025-10-12 (CORE5B.DOC / DOC1.AUTH) aligns with Authority revocation store, JWKS rotation, and bootstrap endpoints delivered in Sprint1.
## 1. Purpose
The **StellaOps Authority** service issues OAuth2/OIDC tokens for every StellaOps module (Concelier, Backend, Agent, Zastava) and exposes the policy controls required in sovereign/offline environments. Authority is built as a minimal ASP.NET host that:
- brokers password, client-credentials, and device-code flows through pluggable identity providers;
- persists access/refresh/device tokens in MongoDB with deterministic schemas for replay analysis and air-gapped audit copies;
- distributes revocation bundles and JWKS material so downstream services can enforce lockouts without direct database access;
- offers bootstrap APIs for first-run provisioning and key rotation without redeploying binaries.
Authority is deployed alongside Concelier in air-gapped environments and never requires outbound internet access. All trusted metadata (OpenIddict discovery, JWKS, revocation bundles) is cacheable, signed, and reproducible.
## 2. Component Architecture
Authority is composed of five cooperating subsystems:
1. **Minimal API host** configures OpenIddict endpoints (`/token`, `/authorize`, `/revoke`, `/jwks`) and structured logging/telemetry. Rate limiting hooks (`AuthorityRateLimiter`) wrap every request.
2. **Plugin host** loads `StellaOps.Authority.Plugin.*.dll` assemblies, applies capability metadata, and exposes password/client provisioning surfaces through dependency injection.
3. **Mongo storage** persists tokens, revocations, bootstrap invites, and plugin state in deterministic collections indexed for offline sync (`authority_tokens`, `authority_revocations`, etc.).
4. **Cryptography layer** `StellaOps.Cryptography` abstractions manage password hashing, signing keys, JWKS export, and detached JWS generation.
5. **Offline ops APIs** internal endpoints under `/internal/*` provide administrative flows (bootstrap users/clients, revocation export) guarded by API keys and deterministic audit events.
A high-level sequence for password logins:
```
Client -> /token (password grant)
-> Rate limiter & audit hooks
-> Plugin credential store (Argon2id verification)
-> Token persistence (Mongo authority_tokens)
-> Response (access/refresh tokens + deterministic claims)
```
## 3. Token Lifecycle & Persistence
Authority persists every issued token in MongoDB so operators can audit or revoke without scanning distributed caches.
- **Collection:** `authority_tokens`
- **Key fields:**
- `tokenId`, `type` (`access_token`, `refresh_token`, `device_code`, `authorization_code`)
- `subjectId`, `clientId`, ordered `scope` array
- `status` (`valid`, `revoked`, `expired`), `createdAt`, optional `expiresAt`
- `revokedAt`, machine-readable `revokedReason`, optional `revokedReasonDescription`
- `revokedMetadata` (string dictionary for plugin-specific context)
- **Persistence flow:** `PersistTokensHandler` stamps missing JWT IDs, normalises scopes, and stores every principal emitted by OpenIddict.
- **Revocation flow:** `AuthorityTokenStore.UpdateStatusAsync` flips status, records the reason metadata, and is invoked by token revocation handlers and plugin provisioning events (e.g., disabling a user).
- **Expiry maintenance:** `AuthorityTokenStore.DeleteExpiredAsync` prunes non-revoked tokens past their `expiresAt` timestamp. Operators should schedule this in maintenance windows if large volumes of tokens are issued.
### Expectations for resource servers
Resource servers (Concelier WebService, Backend, Agent) **must not** assume in-memory caches are authoritative. They should:
- cache `/jwks` and `/revocations/export` responses within configured lifetimes;
- honour `revokedReason` metadata when shaping audit trails;
- treat `status != "valid"` or missing tokens as immediate denial conditions.
## 4. Revocation Pipeline
Authority centralises revocation in `authority_revocations` with deterministic categories:
| Category | Meaning | Required fields |
| --- | --- | --- |
| `token` | Specific OAuth token revoked early. | `revocationId` (token id), `tokenType`, optional `clientId`, `subjectId` |
| `subject` | All tokens for a subject disabled. | `revocationId` (= subject id) |
| `client` | OAuth client registration revoked. | `revocationId` (= client id) |
| `key` | Signing/JWE key withdrawn. | `revocationId` (= key id) |
`RevocationBundleBuilder` flattens Mongo documents into canonical JSON, sorts entries by (`category`, `revocationId`, `revokedAt`), and signs exports using detached JWS (RFC7797) with cosign-compatible headers.
**Export surfaces** (deterministic output, suitable for Offline Kit):
- CLI: `stella auth revoke export --output ./out` writes `revocation-bundle.json`, `.jws`, `.sha256`.
- Verification: `stella auth revoke verify --bundle <path> --signature <path> --key <path>` validates detached JWS signatures before distribution, selecting the crypto provider advertised in the detached header (see `docs/security/revocation-bundle.md`).
- API: `GET /internal/revocations/export` (requires bootstrap API key) returns the same payload.
- Verification: `stella auth revoke verify` validates schema, digest, and detached JWS using cached JWKS or offline keys, automatically preferring the hinted provider (libsodium builds honour `provider=libsodium`; other builds fall back to the managed provider).
**Consumer guidance:**
1. Mirror `revocation-bundle.json*` alongside Concelier exports. Offline agents fetch both over the existing update channel.
2. Use bundle `sequence` and `bundleId` to detect replay or monotonicity regressions. Ignore bundles with older sequence numbers unless `bundleId` changes and `issuedAt` advances.
3. Treat `revokedReason` taxonomy as machine-friendly codes (`compromised`, `rotation`, `policy`, `lifecycle`). Translating to human-readable logs is the consumers responsibility.
## 5. Signing Keys & JWKS Rotation
Authority signs revocation bundles and publishes JWKS entries via the new signing manager:
- **Configuration (`authority.yaml`):**
```yaml
signing:
enabled: true
algorithm: ES256 # Defaults to ES256
keySource: file # Loader identifier (file, vault, etc.)
provider: default # Optional preferred crypto provider
activeKeyId: authority-signing-dev
keyPath: "../certificates/authority-signing-dev.pem"
additionalKeys:
- keyId: authority-signing-dev-2024
path: "../certificates/authority-signing-dev-2024.pem"
source: "file"
```
- **Sources:** The default loader supports PEM files relative to the content root; additional loaders can be registered via `IAuthoritySigningKeySource`.
- **Providers:** Keys are registered against the `ICryptoProviderRegistry`, so alternative implementations (HSM, libsodium) can be plugged in without changing host code.
- **JWKS output:** `GET /jwks` lists every signing key with `status` metadata (`active`, `retired`). Old keys remain until operators remove them from configuration, allowing verification of historical bundles/tokens.
### Rotation SOP (no downtime)
1. Generate a new P-256 private key (PEM) on an offline workstation and place it where the Authority host can read it (e.g., `../certificates/authority-signing-2025.pem`).
2. Call the authenticated admin API:
```bash
curl -sS -X POST https://authority.example.com/internal/signing/rotate \
-H "x-stellaops-bootstrap-key: ${BOOTSTRAP_KEY}" \
-H "Content-Type: application/json" \
-d '{
"keyId": "authority-signing-2025",
"location": "../certificates/authority-signing-2025.pem",
"source": "file"
}'
```
3. Verify the response reports the previous key as retired and fetch `/jwks` to confirm the new `kid` appears with `status: "active"`.
4. Persist the old key path in `signing.additionalKeys` (the rotation API updates in-memory options; rewrite the YAML to match so restarts remain consistent).
5. If you prefer automation, trigger the `.gitea/workflows/authority-key-rotation.yml` workflow with the new `keyId`/`keyPath`; it wraps `ops/authority/key-rotation.sh` and reads environment-specific secrets. The older key will be marked `retired` and appended to `signing.additionalKeys`.
6. Re-run `stella auth revoke export` so revocation bundles are signed with the new key. Downstream caches should refresh JWKS within their configured lifetime (`StellaOpsAuthorityOptions.Signing` + client cache tolerance).
The rotation API leverages the same cryptography abstractions as revocation signing; no restart is required and the previous key is marked `retired` but kept available for verification.
## 6. Bootstrap & Administrative Endpoints
Administrative APIs live under `/internal/*` and require the bootstrap API key plus rate-limiter compliance.
| Endpoint | Method | Description |
| --- | --- | --- |
| `/internal/users` | `POST` | Provision initial administrative accounts through the registered password-capable plug-in. Emits structured audit events. |
| `/internal/clients` | `POST` | Provision OAuth clients (client credentials / device code). |
| `/internal/revocations/export` | `GET` | Export revocation bundle + detached JWS + digest. |
| `/internal/signing/rotate` | `POST` | Promote a new signing key (see SOP above). Request body accepts `keyId`, `location`, optional `source`, `algorithm`, `provider`, and metadata. |
All administrative calls emit `AuthEventRecord` entries enriched with correlation IDs, PII tags, and network metadata for offline SOC ingestion.
## 7. Configuration Reference
| Section | Key | Description | Notes |
| --- | --- | --- | --- |
| Root | `issuer` | Absolute HTTPS issuer advertised to clients. | Required. Loopback HTTP allowed only for development. |
| Tokens | `accessTokenLifetime`, `refreshTokenLifetime`, etc. | Lifetimes for each grant (access, refresh, device, authorization code, identity). | Enforced during issuance; persisted on each token document. |
| Storage | `storage.connectionString` | MongoDB connection string. | Required even for tests; offline kits ship snapshots for seeding. |
| Signing | `signing.enabled` | Enable JWKS/revocation signing. | Disable only for development. |
| Signing | `signing.algorithm` | Signing algorithm identifier. | Currently ES256; additional curves can be wired through crypto providers. |
| Signing | `signing.keySource` | Loader identifier (`file`, `vault`, custom). | Determines which `IAuthoritySigningKeySource` resolves keys. |
| Signing | `signing.keyPath` | Relative/absolute path understood by the loader. | Stored as-is; rotation request should keep it in sync with filesystem layout. |
| Signing | `signing.activeKeyId` | Active JWKS / revocation signing key id. | Exposed as `kid` in JWKS and bundles. |
| Signing | `signing.additionalKeys[].keyId` | Retired key identifier retained for verification. | Manager updates this automatically after rotation; keep YAML aligned. |
| Signing | `signing.additionalKeys[].source` | Loader identifier per retired key. | Defaults to `signing.keySource` if omitted. |
| Security | `security.rateLimiting` | Fixed-window limits for `/token`, `/authorize`, `/internal/*`. | See `docs/security/rate-limits.md` for tuning. |
| Bootstrap | `bootstrap.apiKey` | Shared secret required for `/internal/*`. | Only required when `bootstrap.enabled` is true. |
### 7.1 Sender-constrained clients (DPoP & mTLS)
Authority now understands two flavours of sender-constrained OAuth clients:
- **DPoP proof-of-possession** clients sign a `DPoP` header for `/token` requests. Authority validates the JWK thumbprint, HTTP method/URI, and replay window, then stamps the resulting access token with `cnf.jkt` so downstream services can verify the same key is reused.
- Configure under `security.senderConstraints.dpop`. `allowedAlgorithms`, `proofLifetime`, and `replayWindow` are enforced at validation time.
- `security.senderConstraints.dpop.nonce.enabled` enables nonce challenges for high-value audiences (`requiredAudiences`, normalised to case-insensitive strings). When a nonce is required but missing or expired, `/token` replies with `WWW-Authenticate: DPoP error="use_dpop_nonce"` (and, when available, a fresh `DPoP-Nonce` header). Clients must retry with the issued nonce embedded in the proof.
- `security.senderConstraints.dpop.nonce.store` selects `memory` (default) or `redis`. When `redis` is configured, set `security.senderConstraints.dpop.nonce.redisConnectionString` so replicas share nonce issuance and high-value clients avoid replay gaps during failover.
- Declare client `audiences` in bootstrap manifests or plug-in provisioning metadata; Authority now defaults the token `aud` claim and `resource` indicator from this list, which is also used to trigger nonce enforcement for audiences such as `signer` and `attestor`.
- **Mutual TLS clients** client registrations may declare an mTLS binding (`senderConstraint: mtls`). When enabled via `security.senderConstraints.mtls`, Authority validates the presented client certificate against stored bindings (`certificateBindings[]`), optional chain verification, and timing windows. Successful requests embed `cnf.x5t#S256` into the access token so resource servers can enforce the certificate thumbprint.
- Certificate bindings record the certificate thumbprint, optional SANs, subject/issuer metadata, and activation windows. Operators can enforce subject regexes, SAN type allow-lists (`dns`, `uri`, `ip`), trusted certificate authorities, and rotation grace via `security.senderConstraints.mtls.*`.
Both modes persist additional metadata in `authority_tokens`: `senderConstraint` records the enforced policy, while `senderKeyThumbprint` stores the DPoP JWK thumbprint or mTLS certificate hash captured at issuance. Downstream services can rely on these fields (and the corresponding `cnf` claim) when auditing offline copies of the token store.
## 8. Offline & Sovereign Operation
- **No outbound dependencies:** Authority only contacts MongoDB and local plugins. Discovery and JWKS are cached by clients with offline tolerances (`AllowOfflineCacheFallback`, `OfflineCacheTolerance`). Operators should mirror these responses for air-gapped use.
- **Structured logging:** Every revocation export, signing rotation, bootstrap action, and token issuance emits structured logs with `traceId`, `client_id`, `subjectId`, and `network.remoteIp` where applicable. Mirror logs to your SIEM to retain audit trails without central connectivity.
- **Determinism:** Sorting rules in token and revocation exports guarantee byte-for-byte identical artefacts given the same datastore state. Hashes and signatures remain stable across machines.
## 9. Operational Checklist
- [ ] Protect the bootstrap API key and disable bootstrap endpoints (`bootstrap.enabled: false`) once initial setup is complete.
- [ ] Schedule `stella auth revoke export` (or `/internal/revocations/export`) at the same cadence as Concelier exports so bundles remain in lockstep.
- [ ] Rotate signing keys before expiration; keep at least one retired key until all cached bundles/tokens signed with it have expired.
- [ ] Monitor `/health` and `/ready` plus rate-limiter metrics to detect plugin outages early.
- [ ] Ensure downstream services cache JWKS and revocation bundles within tolerances; stale caches risk accepting revoked tokens.
For plug-in specific requirements, refer to **[Authority Plug-in Developer Guide](dev/31_AUTHORITY_PLUGIN_DEVELOPER_GUIDE.md)**. For revocation bundle validation workflow, see **[Authority Revocation Bundle](security/revocation-bundle.md)**.

View File

@@ -86,16 +86,152 @@ Only enabled when `MONGO_URI` is supplied (for longterm audit).
Schema detail for **policy_versions**:
```jsonc
{
"_id": "6619e90b8c5e1f76",
"yaml": "version: 1.0\nrules:\n - …",
"rego": null, // filled when Rego uploaded
"authorId": "u_1021",
"created": "2025-07-14T08:15:04Z",
"comment": "Imported via API"
}
```
Samples live under `samples/api/scheduler/` (e.g., `schedule.json`, `run.json`, `impact-set.json`, `audit.json`) and mirror the canonical serializer output shown below.
```jsonc
{
"_id": "6619e90b8c5e1f76",
"yaml": "version: 1.0\nrules:\n - …",
"rego": null, // filled when Rego uploaded
"authorId": "u_1021",
"created": "2025-07-14T08:15:04Z",
"comment": "Imported via API"
}
```
###3.1Scheduler Sprints 16 Artifacts
**Collections.** `schedules`, `runs`, `impact_snapshots`, `audit` (modulelocal). All documents reuse the canonical JSON emitted by `StellaOps.Scheduler.Models` so agents and fixtures remain deterministic.
####3.1.1Schedule (`schedules`)
```jsonc
{
"_id": "sch_20251018a",
"tenantId": "tenant-alpha",
"name": "Nightly Prod",
"enabled": true,
"cronExpression": "0 2 * * *",
"timezone": "UTC",
"mode": "analysis-only",
"selection": {
"scope": "by-namespace",
"namespaces": ["team-a", "team-b"],
"repositories": ["app/service-api"],
"includeTags": ["canary", "prod"],
"labels": [{"key": "env", "values": ["prod", "staging"]}],
"resolvesTags": true
},
"onlyIf": {"lastReportOlderThanDays": 7, "policyRevision": "policy@42"},
"notify": {"onNewFindings": true, "minSeverity": "high", "includeKev": true},
"limits": {"maxJobs": 1000, "ratePerSecond": 25, "parallelism": 4},
"subscribers": ["notify.ops"],
"createdAt": "2025-10-18T22:00:00Z",
"createdBy": "svc_scheduler",
"updatedAt": "2025-10-18T22:00:00Z",
"updatedBy": "svc_scheduler"
}
```
*Constraints*: arrays are alphabetically sorted; `selection.tenantId` is optional but when present must match `tenantId`. Cron expressions are validated for newline/length, timezones are validated via `TimeZoneInfo`.
####3.1.2Run (`runs`)
```jsonc
{
"_id": "run_20251018_0001",
"tenantId": "tenant-alpha",
"scheduleId": "sch_20251018a",
"trigger": "feedser",
"state": "running",
"stats": {
"candidates": 1280,
"deduped": 910,
"queued": 624,
"completed": 310,
"deltas": 42,
"newCriticals": 7,
"newHigh": 11,
"newMedium": 18,
"newLow": 6
},
"reason": {"feedserExportId": "exp-20251018-03"},
"createdAt": "2025-10-18T22:03:14Z",
"startedAt": "2025-10-18T22:03:20Z",
"finishedAt": null,
"error": null,
"deltas": [
{
"imageDigest": "sha256:a1b2c3",
"newFindings": 3,
"newCriticals": 1,
"newHigh": 1,
"newMedium": 1,
"newLow": 0,
"kevHits": ["CVE-2025-0002"],
"topFindings": [
{
"purl": "pkg:rpm/openssl@3.0.12-5.el9",
"vulnerabilityId": "CVE-2025-0002",
"severity": "critical",
"link": "https://ui.internal/scans/sha256:a1b2c3"
}
],
"attestation": {"uuid": "rekor-314", "verified": true},
"detectedAt": "2025-10-18T22:03:21Z"
}
]
}
```
Counters are clamped to ≥0, timestamps are converted to UTC, and delta arrays are sorted (critical → info severity precedence, then vulnerability id). Missing `deltas` implies "no change" snapshots.
####3.1.3Impact Snapshot (`impact_snapshots`)
```jsonc
{
"selector": {
"scope": "all-images",
"tenantId": "tenant-alpha"
},
"images": [
{
"imageDigest": "sha256:f1e2d3",
"registry": "registry.internal",
"repository": "app/api",
"namespaces": ["team-a"],
"tags": ["prod"],
"usedByEntrypoint": true,
"labels": {"env": "prod"}
}
],
"usageOnly": true,
"generatedAt": "2025-10-18T22:02:58Z",
"total": 412,
"snapshotId": "impact-20251018-1"
}
```
Images are deduplicated and sorted by digest. Label keys are normalised to lowercase to avoid casesensitive duplicates during reconciliation. `snapshotId` enables run planners to compare subsequent snapshots for drift.
####3.1.4Audit (`audit`)
```jsonc
{
"_id": "audit_169754",
"tenantId": "tenant-alpha",
"category": "scheduler",
"action": "pause",
"occurredAt": "2025-10-18T22:10:00Z",
"actor": {"actorId": "user_admin", "displayName": "Cluster Admin", "kind": "user"},
"scheduleId": "sch_20251018a",
"correlationId": "corr-123",
"metadata": {"details": "schedule paused", "reason": "maintenance"},
"message": "Paused via API"
}
```
Metadata keys are lowercased, firstwriter wins (duplicates with different casing are ignored), and optional IDs (`scheduleId`, `runId`) are trimmed when empty. Use the canonical serializer when emitting events so audit digests remain reproducible.
---
@@ -120,20 +256,66 @@ rules:
action: escalate
```
Validation is performed by `policy:mapping.yaml` JSONSchema embedded in backend.
Validation is performed by `policy:mapping.yaml` JSONSchema embedded in backend.
Canonical schema source: `src/StellaOps.Policy/Schemas/policy-schema@1.json` (embedded into `StellaOps.Policy`).
`PolicyValidationCli` (see `src/StellaOps.Policy/PolicyValidationCli.cs`) provides the reusable command handler that the main CLI wires up; in the interim it can be invoked from a short host like:
```csharp
await new PolicyValidationCli().RunAsync(new PolicyValidationCliOptions
{
Inputs = new[] { "policies/root.yaml" },
Strict = true,
});
```
###4.1Rego Variant (Advanced  TODO)
*Accepted but stored asis in `rego` field.*
Evaluated via internal **OPA** sidecar once feature graduates from TODO list.
---
##5SLSA Attestation Schema 
Planned for Q12026 (kept here for early plugin authors).
```jsonc
###4.1Rego Variant (Advanced  TODO)
*Accepted but stored asis in `rego` field.*
Evaluated via internal **OPA** sidecar once feature graduates from TODO list.
###4.2Policy Scoring Config (JSON)
*Schema id.* `https://schemas.stella-ops.org/policy/policy-scoring-schema@1.json`
*Source.* `src/StellaOps.Policy/Schemas/policy-scoring-schema@1.json` (embedded in `StellaOps.Policy`), default fixture at `src/StellaOps.Policy/Schemas/policy-scoring-default.json`.
```jsonc
{
"version": "1.0",
"severityWeights": {"Critical": 90, "High": 75, "Unknown": 60, "...": 0},
"quietPenalty": 45,
"warnPenalty": 15,
"ignorePenalty": 35,
"trustOverrides": {"vendor": 1.0, "distro": 0.85},
"reachabilityBuckets": {"entrypoint": 1.0, "direct": 0.85, "runtime": 0.45, "unknown": 0.5},
"unknownConfidence": {
"initial": 0.8,
"decayPerDay": 0.05,
"floor": 0.2,
"bands": [
{"name": "high", "min": 0.65},
{"name": "medium", "min": 0.35},
{"name": "low", "min": 0.0}
]
}
}
```
Validation occurs alongside policy binding (`PolicyScoringConfigBinder`), producing deterministic digests via `PolicyScoringConfigDigest`. Bands are ordered descending by `min` so consumers can resolve confidence tiers deterministically. Reachability buckets are case-insensitive keys (`entrypoint`, `direct`, `indirect`, `runtime`, `unreachable`, `unknown`) with numeric multipliers (default ≤1.0).
**Runtime usage**
- `trustOverrides` are matched against `finding.tags` (`trust:<key>`) first, then `finding.source`/`finding.vendor`; missing keys default to `1.0`.
- `reachabilityBuckets` consume `finding.tags` with prefix `reachability:` (fallback `usage:` or `unknown`). Missing buckets fall back to `unknown` weight when present, otherwise `1.0`.
- Policy verdicts expose scoring inputs (`severityWeight`, `trustWeight`, `reachabilityWeight`, `baseScore`, penalties) plus unknown-state metadata (`unknownConfidence`, `unknownAgeDays`, `confidenceBand`) for auditability. See `samples/policy/policy-preview-unknown.json` for an end-to-end preview payload.
- Unknown confidence derives from `unknown-age-days:` (preferred) or `unknown-since:` + `observed-at:` tags; with no hints the engine keeps `initial` confidence. Values decay by `decayPerDay` down to `floor`, then resolve to the first matching `bands[].name`.
---
##5SLSA Attestation Schema 
Planned for Q12026 (kept here for early plugin authors).
```jsonc
{
"id": "prov_0291",
"imageDigest": "sha256:e2b9…",
@@ -153,8 +335,70 @@ Planned for Q12026 (kept here for early plugin authors).
{"uri": "git+https://git…", "digest": {"sha1": "f6a1…"}}
],
"rekorLogIndex": 99817 // entry in local Rekor mirror
}
```
}
```
---
##6NotifyFoundations (Rule·Channel·Event)
*Sprint 15 target* canonically describe the Notify data shapes that UI, workers, and storage consume. JSON Schemas live under `docs/notify/schemas/` and deterministic fixtures under `docs/notify/samples/`.
| Artifact | Schema | Sample |
|----------|--------|--------|
| **Rule** (catalogued routing logic) | `docs/notify/schemas/notify-rule@1.json` | `docs/notify/samples/notify-rule@1.sample.json` |
| **Channel** (delivery endpoint definition) | `docs/notify/schemas/notify-channel@1.json` | `docs/notify/samples/notify-channel@1.sample.json` |
| **Template** (rendering payload) | `docs/notify/schemas/notify-template@1.json` | `docs/notify/samples/notify-template@1.sample.json` |
| **Event envelope** (Notify ingest surface) | `docs/notify/schemas/notify-event@1.json` | `docs/notify/samples/notify-event@1.sample.json` |
###6.1Rule highlights (`notify-rule@1`)
* Keys are lowercased camelCase. `schemaVersion` (`notify.rule@1`), `ruleId`, `tenantId`, `name`, `match`, `actions`, `createdAt`, and `updatedAt` are mandatory.
* `match.eventKinds`, `match.verdicts`, and other array selectors are presorted and casenormalized (e.g. `scanner.report.ready`).
* `actions[].throttle` serialises as ISO8601 duration (`PT5M`), mirroring worker backoff guardrails.
* `vex` gates let operators exclude accepted/notaffected justifications; omit the block to inherit default behaviour.
* Use `StellaOps.Notify.Models.NotifySchemaMigration.UpgradeRule(JsonNode)` when deserialising legacy payloads that might lack `schemaVersion` or retain older revisions.
* Soft deletions persist `deletedAt` in Mongo (and disable the rule); repository queries automatically filter them.
###6.2Channel highlights (`notify-channel@1`)
* `schemaVersion` is pinned to `notify.channel@1` and must accompany persisted documents.
* `type` matches plugin identifiers (`slack`, `teams`, `email`, `webhook`, `custom`).
* `config.secretRef` stores an external secret handle (Authority, Vault, K8s). Notify never persists raw credentials.
* Optional `config.limits.timeout` uses ISO8601 durations identical to rule throttles; concurrency/RPM defaults apply when absent.
* `StellaOps.Notify.Models.NotifySchemaMigration.UpgradeChannel(JsonNode)` backfills the schema version when older documents omit it.
* Channels share the same soft-delete marker (`deletedAt`) so operators can restore prior configuration without purging history.
###6.3Event envelope (`notify-event@1`)
* Aligns with the platform event contract—`eventId` UUID, RFC3339 `ts`, tenant isolation enforced.
* Enumerated `kind` covers the initial Notify surface (`scanner.report.ready`, `scheduler.rescan.delta`, `zastava.admission`, etc.).
* `scope.labels`/`scope.attributes` and top-level `attributes` mirror the metadata dictionaries workers surface for templating and audits.
* Notify workers use the same migration helper to wrap event payloads before template rendering, so schema additions remain additive.
###6.4Template highlights (`notify-template@1`)
* Carries the presentation key (`channelType`, `key`, `locale`) and the raw template body; `schemaVersion` is fixed to `notify.template@1`.
* `renderMode` enumerates supported engines (`markdown`, `html`, `adaptiveCard`, `plainText`, `json`) aligning with `NotifyTemplateRenderMode`.
* `format` signals downstream connector expectations (`slack`, `teams`, `email`, `webhook`, `json`).
* Upgrade legacy definitions with `NotifySchemaMigration.UpgradeTemplate(JsonNode)` to auto-apply the new schema version and ordering.
* Templates also record soft deletes via `deletedAt`; UI/API skip them by default while retaining revision history.
**Validation loop:**
```bash
# Validate Notify schemas and samples (matches Docs CI)
for schema in docs/notify/schemas/*.json; do
npx ajv compile -c ajv-formats -s "$schema"
done
for sample in docs/notify/samples/*.sample.json; do
schema="docs/notify/schemas/$(basename "${sample%.sample.json}").json"
npx ajv validate -c ajv-formats -s "$schema" -d "$sample"
done
```
Integration tests can embed the sample fixtures to guarantee deterministic serialisation from the `StellaOps.Notify.Models` DTOs introduced in Sprint15.
---

View File

@@ -81,7 +81,7 @@ cosign verify \
## 5·Privatefeed mirrors 🌐
The **Feedser (vulnerability ingest/merge/export service)** provides signed JSON and Trivy DB snapshots that merge:
The **Concelier (vulnerability ingest/merge/export service)** provides signed JSON and Trivy DB snapshots that merge:
* OSV + GHSA
* (optional) NVD 2.0, CNNVD, CNVD, ENISA, JVN and BDU regionals

View File

@@ -20,7 +20,7 @@ open a PR and append it alphabetically.*
| **ADR** | *Architecture Decision Record* lightweight Markdown file that captures one irreversible design decision. | ADR template lives at `/docs/adr/` |
| **AIRE** | *AI Risk Evaluator* optional Plus/Pro plugin that suggests mute rules using an ONNX model. | Commercial feature |
| **AzurePipelines** | CI/CD service in Microsoft Azure DevOps. | Recipe in Pipeline Library |
| **BDU** | Russian (FSTEC) national vulnerability database: *База данных уязвимостей*. | Merged with NVD by Feedser (vulnerability ingest/merge/export service) |
| **BDU** | Russian (FSTEC) national vulnerability database: *База данных уязвимостей*. | Merged with NVD by Concelier (vulnerability ingest/merge/export service) |
| **BuildKit** | Modern Docker build engine with caching and concurrency. | Needed for layer cache patterns |
| **CI** | *Continuous Integration* automated build/test pipeline. | Stella integrates via CLI |
| **Cosign** | Opensource Sigstore tool that signs & verifies container images **and files**. | Images & OUK tarballs |
@@ -36,7 +36,7 @@ open a PR and append it alphabetically.*
| **Digest (image)** | SHA256 hash uniquely identifying a container image or layer. | Pin digests for reproducible builds |
| **DockerinDocker (DinD)** | Running Docker daemon inside a CI container. | Used in GitHub / GitLab recipes |
| **DTO** | *Data Transfer Object* C# record serialised to JSON. | Schemas in doc 11 |
| **Feedser** | Vulnerability ingest/merge/export service consolidating OVN, GHSA, NVD 2.0, CNNVD, CNVD, ENISA, JVN and BDU feeds into the canonical MongoDB store and export artifacts. | Cron default `01* * *` |
| **Concelier** | Vulnerability ingest/merge/export service consolidating OVN, GHSA, NVD 2.0, CNNVD, CNVD, ENISA, JVN and BDU feeds into the canonical MongoDB store and export artifacts. | Cron default `01* * *` |
| **FSTEC** | Russian regulator issuing SOBIT certificates. | Pro GA target |
| **Gitea** | Selfhosted Git service mirrors GitHub repo. | OSS hosting |
| **GOST TLS** | TLS ciphersuites defined by Russian GOST R 34.102012 / 34.112012. | Provided by `OpenSslGost` or CryptoPro |

View File

@@ -145,28 +145,28 @@ cosign verify ghcr.io/stellaops/backend@sha256:<DIGEST> \
| Audit events | Redis stream audit; export daily to SIEM |
| Alert rules | Feed age 48h, P95 walltime>5s, Redis used memory>75% |
### 7.1Feedser authorization audits
### 7.1Concelier authorization audits
- Enable the Authority integration for Feedser (`authority.enabled=true`). Keep
- Enable the Authority integration for Concelier (`authority.enabled=true`). Keep
`authority.allowAnonymousFallback` set to `true` only during migration and plan
to disable it before **2025-12-31 UTC** so the `/jobs*` surface always demands
a bearer token.
- Store the Authority client secret using Docker/Kubernetes secrets and point
`authority.clientSecretFile` at the mounted path; the value is read at startup
and never logged.
- Watch the `Feedser.Authorization.Audit` logger. Each entry contains the HTTP
- Watch the `Concelier.Authorization.Audit` logger. Each entry contains the HTTP
status, subject, client ID, scopes, remote IP, and a boolean `bypass` flag
showing whether a network bypass CIDR allowed the request. Configure your SIEM
to alert when unauthenticated requests (`status=401`) appear with
`bypass=true`, or when unexpected scopes invoke job triggers.
Detailed monitoring and response guidance lives in `docs/ops/feedser-authority-audit-runbook.md`.
Detailed monitoring and response guidance lives in `docs/ops/concelier-authority-audit-runbook.md`.
## 8Update & patch strategy
| Layer | Cadence | Method |
| -------------------- | -------------------------------------------------------- | ------------------------------ |
| Backend & CLI images | Monthly or CVEdriven docker pull + docker compose up -d |
| Trivy DB | 24h scheduler via Feedser (vulnerability ingest/merge/export service) | configurable via Feedser scheduler options |
| Trivy DB | 24h scheduler via Concelier (vulnerability ingest/merge/export service) | configurable via Concelier scheduler options |
| Docker Engine | vendor LTS | distro package manager |
| Host OS | security repos enabled | unattendedupgrades |

View File

@@ -16,7 +16,7 @@ contributors who need to extend coverage or diagnose failures.
| **1. Unit** | `xUnit` (<code>dotnet test</code>) | `*.Tests.csproj` | per PR / push |
| **2. Propertybased** | `FsCheck` | `SbomPropertyTests` | per PR |
| **3. Integration (API)** | `Testcontainers` suite | `test/Api.Integration` | per PR + nightly |
| **4. Integration (DB-merge)** | in-memory Mongo + Redis | `Feedser.Integration` (vulnerability ingest/merge/export service) | per PR |
| **4. Integration (DB-merge)** | in-memory Mongo + Redis | `Concelier.Integration` (vulnerability ingest/merge/export service) | per PR |
| **5. Contract (gRPC)** | `Buf breaking` | `buf.yaml` files | per PR |
| **6. Frontend unit** | `Jest` | `ui/src/**/*.spec.ts` | per PR |
| **7. Frontend E2E** | `Playwright` | `ui/e2e/**` | nightly |
@@ -59,17 +59,17 @@ The script spins up MongoDB/Redis via Testcontainers and requires:
---
### Feedser OSV↔GHSA parity fixtures
### Concelier OSV↔GHSA parity fixtures
The Feedser connector suite includes a regression test (`OsvGhsaParityRegressionTests`)
The Concelier connector suite includes a regression test (`OsvGhsaParityRegressionTests`)
that checks a curated set of GHSA identifiers against OSV responses. The fixture
snapshots live in `src/StellaOps.Feedser.Source.Osv.Tests/Fixtures/` and are kept
snapshots live in `src/StellaOps.Concelier.Connector.Osv.Tests/Fixtures/` and are kept
deterministic so the parity report remains reproducible.
To refresh the fixtures when GHSA/OSV payloads change:
1. Ensure outbound HTTPS access to `https://api.osv.dev` and `https://api.github.com`.
2. Run `UPDATE_PARITY_FIXTURES=1 dotnet test src/StellaOps.Feedser.Source.Osv.Tests/StellaOps.Feedser.Source.Osv.Tests.csproj`.
2. Run `UPDATE_PARITY_FIXTURES=1 dotnet test src/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj`.
3. Commit the regenerated `osv-ghsa.*.json` files that the test emits (raw snapshots and canonical advisories).
The regen flow logs `[Parity]` messages and normalises `recordedAt` timestamps so the
@@ -88,7 +88,7 @@ flowchart LR
I1 --> FE[Jest]
FE --> E2E[Playwright]
E2E --> Lighthouse
Lighthouse --> INTEG2[Feedser]
Lighthouse --> INTEG2[Concelier]
INTEG2 --> LOAD[k6]
LOAD --> CHAOS[pumba]
CHAOS --> RELEASE[Attestation diff]

View File

@@ -76,51 +76,57 @@ UI: [https://\&lt;host\&gt;:8443](https://&lt;host&gt;:8443) (selfsigned cert
> `stella-ops:latest` with the immutable digest printed by
> `docker images --digests`.
### 1.1·Feedser authority configuration
> **Repo bundles** Development, staging, and airgapped Compose profiles live
> under `deploy/compose/`, already tied to the release manifests in
> `deploy/releases/`. Helm users can pull the same channel overlays from
> `deploy/helm/stellaops/values-*.yaml` and validate everything with
> `deploy/tools/validate-profiles.sh`.
The Feedser container reads configuration from `etc/feedser.yaml` plus
`FEEDSER_` environment variables. To enable the new Authority integration:
### 1.1·Concelier authority configuration
The Concelier container reads configuration from `etc/concelier.yaml` plus
`CONCELIER_` environment variables. To enable the new Authority integration:
1. Add the following keys to `.env` (replace values for your environment):
```bash
FEEDSER_AUTHORITY__ENABLED=true
FEEDSER_AUTHORITY__ALLOWANONYMOUSFALLBACK=true # temporary rollout only
FEEDSER_AUTHORITY__ISSUER="https://authority.internal"
FEEDSER_AUTHORITY__AUDIENCES__0="api://feedser"
FEEDSER_AUTHORITY__REQUIREDSCOPES__0="feedser.jobs.trigger"
FEEDSER_AUTHORITY__CLIENTID="feedser-jobs"
FEEDSER_AUTHORITY__CLIENTSECRETFILE="/run/secrets/feedser_authority_client"
FEEDSER_AUTHORITY__BYPASSNETWORKS__0="127.0.0.1/32"
FEEDSER_AUTHORITY__BYPASSNETWORKS__1="::1/128"
FEEDSER_AUTHORITY__RESILIENCE__ENABLERETRIES=true
FEEDSER_AUTHORITY__RESILIENCE__RETRYDELAYS__0="00:00:01"
FEEDSER_AUTHORITY__RESILIENCE__RETRYDELAYS__1="00:00:02"
FEEDSER_AUTHORITY__RESILIENCE__RETRYDELAYS__2="00:00:05"
FEEDSER_AUTHORITY__RESILIENCE__ALLOWOFFLINECACHEFALLBACK=true
FEEDSER_AUTHORITY__RESILIENCE__OFFLINECACHETOLERANCE="00:10:00"
CONCELIER_AUTHORITY__ENABLED=true
CONCELIER_AUTHORITY__ALLOWANONYMOUSFALLBACK=true # temporary rollout only
CONCELIER_AUTHORITY__ISSUER="https://authority.internal"
CONCELIER_AUTHORITY__AUDIENCES__0="api://concelier"
CONCELIER_AUTHORITY__REQUIREDSCOPES__0="concelier.jobs.trigger"
CONCELIER_AUTHORITY__CLIENTID="concelier-jobs"
CONCELIER_AUTHORITY__CLIENTSECRETFILE="/run/secrets/concelier_authority_client"
CONCELIER_AUTHORITY__BYPASSNETWORKS__0="127.0.0.1/32"
CONCELIER_AUTHORITY__BYPASSNETWORKS__1="::1/128"
CONCELIER_AUTHORITY__RESILIENCE__ENABLERETRIES=true
CONCELIER_AUTHORITY__RESILIENCE__RETRYDELAYS__0="00:00:01"
CONCELIER_AUTHORITY__RESILIENCE__RETRYDELAYS__1="00:00:02"
CONCELIER_AUTHORITY__RESILIENCE__RETRYDELAYS__2="00:00:05"
CONCELIER_AUTHORITY__RESILIENCE__ALLOWOFFLINECACHEFALLBACK=true
CONCELIER_AUTHORITY__RESILIENCE__OFFLINECACHETOLERANCE="00:10:00"
```
Store the client secret outside source control (Docker secrets, mounted file,
or Kubernetes Secret). Feedser loads the secret during post-configuration, so
or Kubernetes Secret). Concelier loads the secret during post-configuration, so
the value never needs to appear in the YAML template.
Connected sites can keep the retry ladder short (1s,2s,5s) so job triggers fail fast when Authority is down. For airgapped or intermittently connected deployments, extend `RESILIENCE__OFFLINECACHETOLERANCE` (e.g. `00:30:00`) so cached discovery/JWKS data remains valid while the Offline Kit synchronises upstream changes.
2. Redeploy Feedser:
2. Redeploy Concelier:
```bash
docker compose --env-file .env -f docker-compose.stella-ops.yml up -d feedser
docker compose --env-file .env -f docker-compose.stella-ops.yml up -d concelier
```
3. Tail the logs: `docker compose logs -f feedser`. Successful `/jobs*` calls now
emit `Feedser.Authorization.Audit` entries with `route`, `status`, `subject`,
3. Tail the logs: `docker compose logs -f concelier`. Successful `/jobs*` calls now
emit `Concelier.Authorization.Audit` entries with `route`, `status`, `subject`,
`clientId`, `scopes`, `bypass`, and `remote` fields. 401 denials keep the same
shape—watch for `bypass=True`, which indicates a bypass CIDR accepted an anonymous
call. See `docs/ops/feedser-authority-audit-runbook.md` for a full audit/alerting checklist.
call. See `docs/ops/concelier-authority-audit-runbook.md` for a full audit/alerting checklist.
> **Enforcement deadline** keep `FEEDSER_AUTHORITY__ALLOWANONYMOUSFALLBACK=true`
> only while validating the rollout. Set it to `false` (and restart Feedser)
> **Enforcement deadline** keep `CONCELIER_AUTHORITY__ALLOWANONYMOUSFALLBACK=true`
> only while validating the rollout. Set it to `false` (and restart Concelier)
> before **2025-12-31 UTC** to require tokens in production.
---

View File

@@ -18,7 +18,7 @@ completely isolated network:
| **Attested manifest** | `offline-manifest.json` + detached JWS covering bundle metadata, signed during export. |
| **Delta patches** | Daily diff bundles keep size \<350MB |
**RU BDU note:** ship the official Russian Trusted Root/Sub CA bundle (`certificates/russian_trusted_bundle.pem`) inside the kit so `feedser:httpClients:source.bdu:trustedRootPaths` can resolve it when the service runs in an airgapped network. Drop the most recent `vulxml.zip` alongside the kit if operators need a cold-start cache.
**RU BDU note:** ship the official Russian Trusted Root/Sub CA bundle (`certificates/russian_trusted_bundle.pem`) inside the kit so `concelier:httpClients:source.bdu:trustedRootPaths` can resolve it when the service runs in an airgapped network. Drop the most recent `vulxml.zip` alongside the kit if operators need a cold-start cache.
*Scanner core:* C# 12 on **.NET{{ dotnet }}**.
*Imports are idempotent and atomic — no service downtime.*
@@ -110,4 +110,4 @@ See the detailed rules in
* **Install guide:** `/install/#air-gapped`
* **Sovereign mode rationale:** `/sovereign/`
* **Security policy:** `/security/#reporting-a-vulnerability`
* **CERT-Bund snapshots:** `python tools/certbund_offline_snapshot.py --help` (see `docs/ops/feedser-certbund-operations.md`)
* **CERT-Bund snapshots:** `python tools/certbund_offline_snapshot.py --help` (see `docs/ops/concelier-certbund-operations.md`)

View File

@@ -32,7 +32,7 @@ why the system leans *monolithplusplugins*, and where extension points
graph TD
A(API Gateway)
B1(Scanner Core<br/>.NET latest LTS)
B2(Feedser service\n(vuln ingest/merge/export))
B2(Concelier service\n(vuln ingest/merge/export))
B3(Policy Engine OPA)
C1(Redis 7)
C2(MongoDB 7)
@@ -53,7 +53,7 @@ graph TD
| ---------------------------- | --------------------- | ---------------------------------------------------- |
| **API Gateway** | ASP.NET Minimal API | Auth (JWT), quotas, request routing |
| **Scanner Core** | C# 12, Polly | Layer diffing, SBOM generation, vuln correlation |
| **Feedser (vulnerability ingest/merge/export service)** | C# source-gen workers | Consolidate NVD + regional CVE feeds into the canonical MongoDB store and drive JSON / Trivy DB exports |
| **Concelier (vulnerability ingest/merge/export service)** | C# source-gen workers | Consolidate NVD + regional CVE feeds into the canonical MongoDB store and drive JSON / Trivy DB exports |
| **Policy Engine** | OPA (Rego) | admission decisions, custom org rules |
| **Redis 7** | KeyDB compatible | LRU cache, quota counters |
| **MongoDB 7** | WiredTiger | SBOM & findings storage |
@@ -121,7 +121,7 @@ Hotplugging is deferred until after v1.0 for security review.
Although the default deployment is a single container, each subservice can be
extracted:
* Feedser → standalone cron pod.
* Concelier → standalone cron pod.
* Policy Engine → sidecar (OPA) with gRPC contract.
* ResultSink → queue worker (RabbitMQ or Azure Service Bus).

View File

@@ -1,20 +1,20 @@
# Docs & Enablement Guild
## Mission
Produce and maintain offline-friendly documentation for StellaOps modules, covering architecture, configuration, operator workflows, and developer onboarding.
## Scope Highlights
- Authority docs (`docs/dev/31_AUTHORITY_PLUGIN_DEVELOPER_GUIDE.md`, upcoming `docs/11_AUTHORITY.md`).
- Feedser quickstarts, CLI guides, Offline Kit manuals.
- Release notes and migration playbooks.
## Operating Principles
- Keep guides deterministic and in sync with shipped configuration samples.
- Prefer tables/checklists for operator steps; flag security-sensitive actions.
- When work involves a specific `StellaOps.<Component>` project, consult both `docs/07_HIGH_LEVEL_ARCHITECTURE.md` and the matching dossier `docs/ARCHITECTURE_<COMPONENT>.md` before drafting or editing content.
- Update `docs/TASKS.md` whenever work items change status (TODO/DOING/REVIEW/DONE/BLOCKED).
## Coordination
- Authority Core & Plugin teams for auth-related changes.
- Security Guild for threat-model outputs and mitigations.
- DevEx for tooling diagrams and documentation pipeline.
# Docs & Enablement Guild
## Mission
Produce and maintain offline-friendly documentation for StellaOps modules, covering architecture, configuration, operator workflows, and developer onboarding.
## Scope Highlights
- Authority docs (`docs/dev/31_AUTHORITY_PLUGIN_DEVELOPER_GUIDE.md`, upcoming `docs/11_AUTHORITY.md`).
- Concelier quickstarts, CLI guides, Offline Kit manuals.
- Release notes and migration playbooks.
## Operating Principles
- Keep guides deterministic and in sync with shipped configuration samples.
- Prefer tables/checklists for operator steps; flag security-sensitive actions.
- When work involves a specific `StellaOps.<Component>` project, consult both `docs/07_HIGH_LEVEL_ARCHITECTURE.md` and the matching dossier `docs/ARCHITECTURE_<COMPONENT>.md` before drafting or editing content.
- Update `docs/TASKS.md` whenever work items change status (TODO/DOING/REVIEW/DONE/BLOCKED).
## Coordination
- Authority Core & Plugin teams for auth-related changes.
- Security Guild for threat-model outputs and mitigations.
- DevEx for tooling diagrams and documentation pipeline.

View File

@@ -1,6 +1,6 @@
# component_architecture_attestor.md — **StellaOps Attestor** (2025Q4)
> **Scope.** Implementationready architecture for the **Attestor**: the service that **submits** DSSE envelopes to **Rekor v2**, retrieves/validates inclusion proofs, caches results, and exposes verification APIs. It accepts DSSE **only** from the **Signer** over mTLS, enforces chainoftrust to StellaOps roots, and returns `{uuid, index, proof, logURL}` to calling services (Scanner.WebService for SBOMs; backend for final reports; Vexer exports when configured).
> **Scope.** Implementationready architecture for the **Attestor**: the service that **submits** DSSE envelopes to **Rekor v2**, retrieves/validates inclusion proofs, caches results, and exposes verification APIs. It accepts DSSE **only** from the **Signer** over mTLS, enforces chainoftrust to StellaOps roots, and returns `{uuid, index, proof, logURL}` to calling services (Scanner.WebService for SBOMs; backend for final reports; Excititor exports when configured).
---
@@ -200,7 +200,8 @@ Indexes:
* Predicate `predicateType` must be on allowlist (sbom/report/vex-export).
* `subject.digest.sha256` values must be present and wellformed (hex).
* **No public submission** path. **Never** accept bundles from untrusted clients.
* **Rate limits**: per mTLS thumbprint/license (from Signerforwarded claims) to avoid flooding the log.
* **Client certificate allowlists**: optional `security.mtls.allowedSubjects` / `allowedThumbprints` tighten peer identity checks beyond CA pinning.
* **Rate limits**: token-bucket per caller derived from `quotas.perCaller` (QPS/burst) returns `429` + `Retry-After` when exceeded.
* **Redaction**: Attestor never logs secret material; DSSE payloads **should** be public by design (SBOMs/reports). If customers require redaction, enforce policy at Signer (predicate minimization) **before** Attestor.
---
@@ -233,6 +234,10 @@ Indexes:
* `attestor.dedupe_hits_total`
* `attestor.errors_total{type}`
**Correlation**:
* HTTP callers may supply `X-Correlation-Id`; Attestor will echo the header and push `CorrelationId` into the log scope for cross-service tracing.
**Tracing**:
* Spans: `validate`, `rekor.submit`, `rekor.poll`, `persist`, `archive`, `verify`.

View File

@@ -6,7 +6,7 @@
## 0) Mission & boundaries
**Mission.** Provide **fast, local, verifiable** authentication for StellaOps microservices and tools by minting **very shortlived** OAuth2/OIDC tokens that are **senderconstrained** (DPoP or mTLSbound). Support RBAC scopes, multitenant claims, and deterministic validation for APIs (Scanner, Signer, Attestor, Vexer, Feedser, UI, CLI, Zastava).
**Mission.** Provide **fast, local, verifiable** authentication for StellaOps microservices and tools by minting **very shortlived** OAuth2/OIDC tokens that are **senderconstrained** (DPoP or mTLSbound). Support RBAC scopes, multitenant claims, and deterministic validation for APIs (Scanner, Signer, Attestor, Excititor, Concelier, UI, CLI, Zastava).
**Boundaries.**
@@ -43,7 +43,7 @@
```
iss = https://authority.<domain>
sub = <client_id or user_id>
aud = <service audience: signer|scanner|attestor|feedser|vexer|ui|zastava>
aud = <service audience: signer|scanner|attestor|concelier|excititor|ui|zastava>
exp = <unix ts> (<= 300 s from iat)
iat = <unix ts>
nbf = iat - 30
@@ -140,7 +140,7 @@ plan? = <plan name> // optional hint for UIs; not used for e
### 4.1 Audiences
* `signer` — only the **Signer** service should accept tokens with `aud=signer`.
* `attestor`, `scanner`, `feedser`, `vexer`, `ui`, `zastava` similarly.
* `attestor`, `scanner`, `concelier`, `excititor`, `ui`, `zastava` similarly.
Services **must** verify `aud` and **sender constraint** (DPoP/mTLS) per their policy.
@@ -153,8 +153,8 @@ Services **must** verify `aud` and **sender constraint** (DPoP/mTLS) per their p
| `scanner.scan` | Scanner.WebService | Submit scan jobs |
| `scanner.export` | Scanner.WebService | Export SBOMs |
| `scanner.read` | Scanner.WebService | Read catalog/SBOMs |
| `vex.read` / `vex.admin` | Vexer | Query/operate |
| `feedser.read` / `feedser.export` | Feedser | Query/exports |
| `vex.read` / `vex.admin` | Excititor | Query/operate |
| `concelier.read` / `concelier.export` | Concelier | Query/exports |
| `ui.read` / `ui.admin` | UI | View/admin |
| `zastava.emit` / `zastava.enforce` | Scanner/Zastava | Runtime events / admission |
@@ -229,6 +229,8 @@ GET /admin/metrics # Prometheus exposition (token issue rates,
GET /admin/healthz|readyz # health/readiness
```
Declared client `audiences` flow through to the issued JWT `aud` claim and the token request's `resource` indicators. Authority relies on this metadata to enforce DPoP nonce challenges for `signer`, `attestor`, and other high-value services without requiring clients to repeat the audience parameter on every request.
---
## 11) Integration hard lines (what resource servers must enforce)
@@ -286,6 +288,8 @@ authority:
nonce:
enable: true
ttlSeconds: 600
store: redis
redisConnectionString: "redis://authority-redis:6379?ssl=false"
mtls:
enable: true
caBundleFile: /etc/ssl/mtls/clients-ca.pem
@@ -302,6 +306,18 @@ authority:
auth: { type: "mtls" }
senderConstraint: "mtls"
scopes: [ "signer.sign" ]
- clientId: notify-web-dev
grantTypes: [ "client_credentials" ]
audiences: [ "notify.dev" ]
auth: { type: "client_secret", secretFile: "/secrets/notify-web-dev.secret" }
senderConstraint: "dpop"
scopes: [ "notify.read", "notify.admin" ]
- clientId: notify-web
grantTypes: [ "client_credentials" ]
audiences: [ "notify" ]
auth: { type: "client_secret", secretFile: "/secrets/notify-web.secret" }
senderConstraint: "dpop"
scopes: [ "notify.read", "notify.admin" ]
```
---

View File

@@ -1,6 +1,6 @@
# component_architecture_cli.md — **StellaOps CLI** (2025Q4)
> **Scope.** Implementationready architecture for **StellaOps CLI**: command surface, process model, auth (Authority/DPoP), integration with Scanner/Vexer/Feedser/Signer/Attestor, Buildx plugin management, offline kit behavior, packaging, observability, security posture, and CI ergonomics.
> **Scope.** Implementationready architecture for **StellaOps CLI**: command surface, process model, auth (Authority/DPoP), integration with Scanner/Excititor/Concelier/Signer/Attestor, Buildx plugin management, offline kit behavior, packaging, observability, security posture, and CI ergonomics.
---
@@ -18,7 +18,7 @@
* CLI **never** signs; it only calls **Signer**/**Attestor** via backend APIs when needed (e.g., `report --attest`).
* CLI **does not** store longlived credentials beyond OS keychain; tokens are **short** (Authority OpToks).
* Heavy work (scanning, merging, policy) is executed **serverside** (Scanner/Vexer/Feedser).
* Heavy work (scanning, merging, policy) is executed **serverside** (Scanner/Excititor/Concelier).
---
@@ -37,6 +37,8 @@ src/
**Language/runtime**: .NET 10 **Native AOT** for speed/startup; Linux builds use **musl** static when possible.
**Plug-in verbs.** Non-core verbs (Excititor, runtime helpers, future integrations) ship as restart-time plug-ins under `plugins/cli/**` with manifest descriptors. The launcher loads plug-ins on startup; hot reloading is intentionally unsupported.
**OS targets**: linuxx64/arm64, windowsx64/arm64, macOSx64/arm64.
---
@@ -76,8 +78,8 @@ src/
### 2.4 Policy & data
* `policy get/set/apply` — fetch active policy, apply staged policy, compute digest.
* `feedser export` — trigger/export canonical JSON or Trivy DB (admin).
* `vexer export` — trigger/export consensus/raw claims (admin).
* `concelier export` — trigger/export canonical JSON or Trivy DB (admin).
* `excititor export` — trigger/export consensus/raw claims (admin).
### 2.5 Verification
@@ -87,12 +89,12 @@ src/
### 2.6 Runtime (Zastava helper)
* `runtime policy test --images <digest,...> [--ns <name> --labels k=v,...]` — ask backend `/policy/runtime` like the webhook would.
* `runtime policy test --image/-i <digest> [--file <path> --ns <name> --label key=value --json]` — ask backend `/policy/runtime` like the webhook would (accepts multiple `--image`, comma/space lists, or stdin pipelines).
### 2.7 Offline kit
* `offline kit pull` — fetch latest **Feedser JSON + Trivy DB + Vexer exports** as a tarball from a mirror.
* `offline kit import <tar>` — upload the kit to onprem services (Feedser/Vexer).
* `offline kit pull` — fetch latest **Concelier JSON + Trivy DB + Excititor exports** as a tarball from a mirror.
* `offline kit import <tar>` — upload the kit to onprem services (Concelier/Excititor).
* `offline kit status` — list current seed versions.
### 2.8 Utilities
@@ -122,7 +124,7 @@ src/
* `scanner` for scan/export/report/diff
* `signer` (indirect; usually backend calls Signer)
* `attestor` for verify
* `feedser`/`vexer` for admin verbs
* `concelier`/`excititor` for admin verbs
CLI rejects verbs if required scopes are missing.
@@ -167,8 +169,8 @@ cli:
backend:
scanner: "https://scanner-web.internal"
attestor: "https://attestor.internal"
feedser: "https://feedser-web.internal"
vexer: "https://vexer-web.internal"
concelier: "https://concelier-web.internal"
excititor: "https://excititor-web.internal"
auth:
audienceDefault: "scanner"
deviceCode: true
@@ -263,7 +265,7 @@ Exit code: 2
## 13) Admin & advanced flags
* `--authority`, `--scanner`, `--attestor`, `--feedser`, `--vexer` override config URLs.
* `--authority`, `--scanner`, `--attestor`, `--concelier`, `--excititor` override config URLs.
* `--no-color`, `--quiet`, `--json`.
* `--timeout`, `--retries`, `--retry-backoff-ms`.
* `--ca-bundle`, `--insecure` (dev only; prints warning).
@@ -386,4 +388,3 @@ script:
* macOS: 1315 (x64, arm64).
* Windows: 10/11, Server 2019/2022 (x64, arm64).
* Docker engines: Docker Desktop, containerdbased runners.

View File

@@ -1,433 +1,466 @@
# component_architecture_feedser.md — **StellaOps Feedser** (2025Q4)
> **Scope.** Implementationready architecture for **Feedser**: the vulnerability ingest/normalize/merge/export subsystem that produces deterministic advisory data for the Scanner + Policy + Vexer pipeline. Covers domain model, connectors, merge rules, storage schema, exports, APIs, performance, security, and test matrices.
---
## 0) Mission & boundaries
**Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), normalize them into a **canonical model**, reconcile aliases and version ranges, and export **deterministic artifacts** (JSON, Trivy DB) for fast backend joins.
**Boundaries.**
* Feedser **does not** sign with private keys. When attestation is required, the export artifact is handed to the **Signer**/**Attestor** pipeline (outofprocess).
* Feedser **does not** decide PASS/FAIL; it provides data to the **Policy** engine.
* Online operation is **allowlistonly**; airgapped deployments use the **Offline Kit**.
---
## 1) Topology & processes
**Process shape:** single ASP.NET Core service `StellaOps.Feedser.WebService` hosting:
* **Scheduler** with distributed locks (Mongo backed).
* **Connectors** (fetch/parse/map).
* **Merger** (canonical record assembly + precedence).
* **Exporters** (JSON, Trivy DB).
* **Minimal REST** for health/status/trigger/export.
**Scale:** HA by running N replicas; **locks** prevent overlapping jobs per source/exporter.
---
## 2) Canonical domain model
> Stored in MongoDB (database `feedser`), serialized with a **canonical JSON** writer (stable order, camelCase, normalized timestamps).
### 2.1 Core entities
**Advisory**
```
advisoryId // internal GUID
advisoryKey // stable string key (e.g., CVE-2025-12345 or vendor ID)
title // short title (best-of from sources)
summary // normalized summary (English; i18n optional)
published // earliest source timestamp
modified // latest source timestamp
severity // normalized {none, low, medium, high, critical}
cvss // {v2?, v3?, v4?} objects (vector, baseScore, severity, source)
exploitKnown // bool (e.g., KEV/active exploitation flags)
references[] // typed links (advisory, kb, patch, vendor, exploit, blog)
sources[] // provenance for traceability (doc digests, URIs)
```
**Alias**
```
advisoryId
scheme // CVE, GHSA, RHSA, DSA, USN, MSRC, etc.
value // e.g., "CVE-2025-12345"
```
**Affected**
```
advisoryId
productKey // canonical product identity (see 2.2)
rangeKind // semver | evr | nvra | apk | rpm | deb | generic | exact
introduced? // string (format depends on rangeKind)
fixed? // string (format depends on rangeKind)
lastKnownSafe? // optional explicit safe floor
arch? // arch or platform qualifier if source declares (x86_64, aarch64)
distro? // distro qualifier when applicable (rhel:9, debian:12, alpine:3.19)
ecosystem? // npm|pypi|maven|nuget|golang|…
notes? // normalized notes per source
```
**Reference**
```
advisoryId
url
kind // advisory | patch | kb | exploit | mitigation | blog | cvrf | csaf
sourceTag // e.g., vendor/redhat, distro/debian, oss/ghsa
```
**MergeEvent**
```
advisoryKey
beforeHash // canonical JSON hash before merge
afterHash // canonical JSON hash after merge
mergedAt
inputs[] // source doc digests that contributed
```
**ExportState**
```
exportKind // json | trivydb
baseExportId? // last full baseline
baseDigest? // digest of last full baseline
lastFullDigest? // digest of last full export
lastDeltaDigest? // digest of last delta export
cursor // per-kind incremental cursor
files[] // last manifest snapshot (path → sha256)
```
### 2.2 Product identity (`productKey`)
* **Primary:** `purl` (Package URL).
* **OS packages:** RPM (NEVRA→purl:rpm), DEB (dpkg→purl:deb), APK (apk→purl:alpine), with **EVR/NVRA** preserved.
* **Secondary:** `cpe` retained for compatibility; advisory records may carry both.
* **Image/platform:** `oci:<registry>/<repo>@<digest>` for imagelevel advisories (rare).
* **Unmappable:** if a source is nondeterministic, keep native string under `productKey="native:<provider>:<id>"` and mark **nonjoinable**.
---
## 3) Source families & precedence
### 3.1 Families
* **Vendor PSIRTs**: Microsoft, Oracle, Cisco, Adobe, Apple, VMware, Chromium…
* **Linux distros**: Red Hat, SUSE, Ubuntu, Debian, Alpine…
* **OSS ecosystems**: OSV, GHSA (GitHub Security Advisories), PyPI, npm, Maven, NuGet, Go.
* **CERTs / national CSIRTs**: CISA (KEV, ICS), JVN, ACSC, CCCS, KISA, CERTFR/BUND, etc.
### 3.2 Precedence (when claims conflict)
1. **Vendor PSIRT** (authoritative for their product).
2. **Distro** (authoritative for packages they ship, including backports).
3. **Ecosystem** (OSV/GHSA) for library semantics.
4. **CERTs/aggregators** for enrichment (KEV/known exploited).
> Precedence affects **Affected** ranges and **fixed** info; **severity** is normalized to the **maximum** credible severity unless policy overrides. Conflicts are retained with **source provenance**.
---
## 4) Connectors & normalization
### 4.1 Connector contract
```csharp
public interface IFeedConnector {
string SourceName { get; }
Task FetchAsync(IServiceProvider sp, CancellationToken ct); // -> document collection
Task ParseAsync(IServiceProvider sp, CancellationToken ct); // -> dto collection (validated)
Task MapAsync(IServiceProvider sp, CancellationToken ct); // -> advisory/alias/affected/reference
}
```
* **Fetch**: windowed (cursor), conditional GET (ETag/LastModified), retry/backoff, rate limiting.
* **Parse**: schema validation (JSON Schema, XSD/CSAF), content type checks; write **DTO** with normalized casing.
* **Map**: build canonical records; all outputs carry **provenance** (doc digest, URI, anchors).
### 4.2 Version range normalization
* **SemVer** ecosystems (npm, pypi, maven, nuget, golang): normalize to `introduced`/`fixed` semver ranges (use `~`, `^`, `<`, `>=` canonicalized to intervals).
* **RPM EVR**: `epoch:version-release` with `rpmvercmp` semantics; store raw EVR strings and also **computed order keys** for query.
* **DEB**: dpkg version comparison semantics mirrored; store computed keys.
* **APK**: Alpine version semantics; compute order keys.
* **Generic**: if provider uses text, retain raw; do **not** invent ranges.
### 4.3 Severity & CVSS
* Normalize **CVSS v2/v3/v4** where available (vector, baseScore, severity).
* If multiple CVSS sources exist, track them all; **effective severity** defaults to **max** by policy (configurable).
* **ExploitKnown** toggled by KEV and equivalent sources; store **evidence** (source, date).
---
## 5) Merge engine
### 5.1 Keying & identity
* Identity graph: **CVE** is primary node; vendor/distro IDs resolved via **Alias** edges (from connectors and Feedsers alias tables).
* `advisoryKey` is the canonical primary key (CVE if present, else vendor/distro key).
### 5.2 Merge algorithm (deterministic)
1. **Gather** all rows for `advisoryKey` (across sources).
2. **Select title/summary** by precedence source (vendor>distro>ecosystem>cert).
3. **Union aliases** (dedupe by scheme+value).
4. **Merge `Affected`** with rules:
* Prefer **vendor** ranges for vendor products; prefer **distro** for **distroshipped** packages.
* If both exist for same `productKey`, keep **both**; mark `sourceTag` and `precedence` so **Policy** can decide.
* Never collapse range semantics across different families (e.g., rpm EVR vs semver).
5. **CVSS/severity**: record all CVSS sets; compute **effectiveSeverity** = max (unless policy override).
6. **References**: union with type precedence (advisory > patch > kb > exploit > blog); dedupe by URL; preserve `sourceTag`.
7. Produce **canonical JSON**; compute **afterHash**; store **MergeEvent** with inputs and hashes.
> The merge is **pure** given inputs. Any change in inputs or precedence matrices changes the **hash** predictably.
---
## 6) Storage schema (MongoDB)
**Collections & indexes**
* `source` `{_id, type, baseUrl, enabled, notes}`
* `source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}`
* `document` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}`
* Index: `{sourceName:1, uri:1}` unique, `{fetchedAt:-1}`
* `dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}`
* Index: `{sourceName:1, documentId:1}`
* `advisory` `{_id, advisoryKey, title, summary, published, modified, severity, cvss, exploitKnown, sources[]}`
* Index: `{advisoryKey:1}` unique, `{modified:-1}`, `{severity:1}`, text index (title, summary)
* `alias` `{advisoryId, scheme, value}`
* Index: `{scheme:1,value:1}`, `{advisoryId:1}`
* `affected` `{advisoryId, productKey, rangeKind, introduced?, fixed?, arch?, distro?, ecosystem?}`
* Index: `{productKey:1}`, `{advisoryId:1}`, `{productKey:1, rangeKind:1}`
* `reference` `{advisoryId, url, kind, sourceTag}`
* Index: `{advisoryId:1}`, `{kind:1}`
* `merge_event` `{advisoryKey, beforeHash, afterHash, mergedAt, inputs[]}`
* Index: `{advisoryKey:1, mergedAt:-1}`
* `export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}`
* `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (TTL cleans dead locks)
* `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}`
**GridFS buckets**: `fs.documents` for raw payloads.
---
## 7) Exporters
### 7.1 Deterministic JSON (vulnlist style)
* Folder structure mirroring `/<scheme>/<first-two>/<rest>/…` with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace.
* `manifest.json` lists all files with SHA256 and a toplevel **export digest**.
### 7.2 Trivy DB exporter
* Builds Bolt DB archives compatible with Trivy; supports **full** and **delta** modes.
* In delta, unchanged blobs are reused from the base; metadata captures:
```
{
"mode": "delta|full",
"baseExportId": "...",
"baseManifestDigest": "sha256:...",
"changed": ["path1", "path2"],
"removed": ["path3"]
}
```
* Optional ORAS push (OCI layout) for registries.
* Offline kit bundles include Trivy DB + JSON tree + export manifest.
### 7.3 Handoff to Signer/Attestor (optional)
* On export completion, if `attest: true` is set in job args, Feedser **posts** the artifact metadata to **Signer**/**Attestor**; Feedser itself **does not** hold signing keys.
* Export record stores returned `{ uuid, index, url }` from **Rekor v2**.
---
## 8) REST APIs
All under `/api/v1/feedser`.
**Health & status**
```
GET /healthz | /readyz
GET /status → sources, last runs, export cursors
```
**Sources & jobs**
```
GET /sources → list of configured sources
POST /sources/{name}/trigger → { jobId }
POST /sources/{name}/pause | /resume → toggle
GET /jobs/{id} → job status
```
**Exports**
```
POST /exports/json { full?:bool, force?:bool, attest?:bool } → { exportId, digest, rekor? }
POST /exports/trivy { full?:bool, force?:bool, publish?:bool, attest?:bool } → { exportId, digest, rekor? }
GET /exports/{id} → export metadata (kind, digest, createdAt, rekor?)
```
**Search (operator debugging)**
```
GET /advisories/{key}
GET /advisories?scheme=CVE&value=CVE-2025-12345
GET /affected?productKey=pkg:rpm/openssl&limit=100
```
**AuthN/Z:** Authority tokens (OpTok) with roles: `feedser.read`, `feedser.admin`, `feedser.export`.
---
## 9) Configuration (YAML)
```yaml
feedser:
mongo: { uri: "mongodb://mongo/feedser" }
s3:
endpoint: "http://minio:9000"
bucket: "stellaops-feedser"
scheduler:
windowSeconds: 30
maxParallelSources: 4
sources:
- name: redhat
kind: csaf
baseUrl: https://access.redhat.com/security/data/csaf/v2/
signature: { type: pgp, keys: [ "…redhat PGP…" ] }
enabled: true
windowDays: 7
- name: suse
kind: csaf
baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
signature: { type: pgp, keys: [ "…suse PGP…" ] }
- name: ubuntu
kind: usn-json
baseUrl: https://ubuntu.com/security/notices.json
signature: { type: none }
- name: osv
kind: osv
baseUrl: https://api.osv.dev/v1/
signature: { type: none }
- name: ghsa
kind: ghsa
baseUrl: https://api.github.com/graphql
auth: { tokenRef: "env:GITHUB_TOKEN" }
exporters:
json:
enabled: true
output: s3://stellaops-feedser/json/
trivy:
enabled: true
mode: full
output: s3://stellaops-feedser/trivy/
oras:
enabled: false
repo: ghcr.io/org/feedser
precedence:
vendorWinsOverDistro: true
distroWinsOverOsv: true
severity:
policy: max # or 'vendorPreferred' / 'distroPreferred'
```
---
## 10) Security & compliance
* **Outbound allowlist** per connector (domains, protocols); proxy support; TLS pinning where possible.
* **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; **merge** can downweight or ignore them by config.
* **No secrets in logs**; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers.
* **Multitenant**: pertenant DBs or prefixes; pertenant S3 prefixes; tenantscoped API tokens.
* **Determinism**: canonical JSON writer; export digests stable across runs given same inputs.
---
## 11) Performance targets & scale
* **Ingest**: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON).
* **Normalize/map**: ≥ 50k `Affected` rows/min on 4 cores.
* **Merge**: ≤ 10ms P95 per advisory at steadystate updates.
* **Export**: 1M advisories JSON in ≤ 90s (streamed, zstd), Trivy DB in ≤ 60s on 8 cores.
* **Memory**: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes.
**Scale pattern**: add Feedser replicas; Mongo scaling via indices and read/write concerns; GridFS only for oversized docs.
---
## 12) Observability
* **Metrics**
* `feedser.fetch.docs_total{source}`
* `feedser.fetch.bytes_total{source}`
* `feedser.parse.failures_total{source}`
* `feedser.map.affected_total{source}`
* `feedser.merge.changed_total`
* `feedser.export.bytes{kind}`
* `feedser.export.duration_seconds{kind}`
* **Tracing** around fetch/parse/map/merge/export.
* **Logs**: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`.
---
## 13) Testing matrix
* **Connectors:** fixture suites for each provider/format (happy path; malformed; signature fail).
* **Version semantics:** EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, prereleases).
* **Merge:** conflicting sources (vendor vs distro vs OSV); verify precedence & dual retention.
* **Export determinism:** byteforbyte stable outputs across runs; digest equality.
* **Performance:** soak tests with 1M advisories; cap memory; verify backpressure.
* **API:** pagination, filters, RBAC, error envelopes (RFC 7807).
* **Offline kit:** bundle build & import correctness.
---
## 14) Failure modes & recovery
* **Source outages:** scheduler backs off with exponential delay; `source_state.backoffUntil`; alerts on staleness.
* **Schema drifts:** parse stage marks DTO invalid; job fails with clear diagnostics; connector version flags track supported schema ranges.
* **Partial exports:** exporters write to temp prefix; **manifest commit** is atomic; only then move to final prefix and update `export_state`.
* **Resume:** all stages idempotent; `source_state.cursor` supports window resume.
---
## 15) Operator runbook (quick)
* **Trigger all sources:** `POST /api/v1/feedser/sources/*/trigger`
* **Force full export JSON:** `POST /api/v1/feedser/exports/json { "full": true, "force": true }`
* **Force Trivy DB delta publish:** `POST /api/v1/feedser/exports/trivy { "full": false, "publish": true }`
* **Inspect advisory:** `GET /api/v1/feedser/advisories?scheme=CVE&value=CVE-2025-12345`
* **Pause noisy source:** `POST /api/v1/feedser/sources/osv/pause`
---
## 16) Rollout plan
1. **MVP**: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export.
2. **Add**: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export.
3. **Attestation handoff**: integrate with **Signer/Attestor** (optional).
4. **Scale & diagnostics**: provider dashboards, staleness alerts, export cache reuse.
5. **Offline kit**: endtoend verified bundles for airgap.
# component_architecture_concelier.md — **StellaOps Concelier** (2025Q4)
> **Scope.** Implementationready architecture for **Concelier**: the vulnerability ingest/normalize/merge/export subsystem that produces deterministic advisory data for the Scanner + Policy + Excititor pipeline. Covers domain model, connectors, merge rules, storage schema, exports, APIs, performance, security, and test matrices.
---
## 0) Mission & boundaries
**Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), normalize them into a **canonical model**, reconcile aliases and version ranges, and export **deterministic artifacts** (JSON, Trivy DB) for fast backend joins.
**Boundaries.**
* Concelier **does not** sign with private keys. When attestation is required, the export artifact is handed to the **Signer**/**Attestor** pipeline (outofprocess).
* Concelier **does not** decide PASS/FAIL; it provides data to the **Policy** engine.
* Online operation is **allowlistonly**; airgapped deployments use the **Offline Kit**.
---
## 1) Topology & processes
**Process shape:** single ASP.NET Core service `StellaOps.Concelier.WebService` hosting:
* **Scheduler** with distributed locks (Mongo backed).
* **Connectors** (fetch/parse/map).
* **Merger** (canonical record assembly + precedence).
* **Exporters** (JSON, Trivy DB).
* **Minimal REST** for health/status/trigger/export.
**Scale:** HA by running N replicas; **locks** prevent overlapping jobs per source/exporter.
---
## 2) Canonical domain model
> Stored in MongoDB (database `concelier`), serialized with a **canonical JSON** writer (stable order, camelCase, normalized timestamps).
### 2.1 Core entities
**Advisory**
```
advisoryId // internal GUID
advisoryKey // stable string key (e.g., CVE-2025-12345 or vendor ID)
title // short title (best-of from sources)
summary // normalized summary (English; i18n optional)
published // earliest source timestamp
modified // latest source timestamp
severity // normalized {none, low, medium, high, critical}
cvss // {v2?, v3?, v4?} objects (vector, baseScore, severity, source)
exploitKnown // bool (e.g., KEV/active exploitation flags)
references[] // typed links (advisory, kb, patch, vendor, exploit, blog)
sources[] // provenance for traceability (doc digests, URIs)
```
**Alias**
```
advisoryId
scheme // CVE, GHSA, RHSA, DSA, USN, MSRC, etc.
value // e.g., "CVE-2025-12345"
```
**Affected**
```
advisoryId
productKey // canonical product identity (see 2.2)
rangeKind // semver | evr | nvra | apk | rpm | deb | generic | exact
introduced? // string (format depends on rangeKind)
fixed? // string (format depends on rangeKind)
lastKnownSafe? // optional explicit safe floor
arch? // arch or platform qualifier if source declares (x86_64, aarch64)
distro? // distro qualifier when applicable (rhel:9, debian:12, alpine:3.19)
ecosystem? // npm|pypi|maven|nuget|golang|…
notes? // normalized notes per source
```
**Reference**
```
advisoryId
url
kind // advisory | patch | kb | exploit | mitigation | blog | cvrf | csaf
sourceTag // e.g., vendor/redhat, distro/debian, oss/ghsa
```
**MergeEvent**
```
advisoryKey
beforeHash // canonical JSON hash before merge
afterHash // canonical JSON hash after merge
mergedAt
inputs[] // source doc digests that contributed
```
**AdvisoryStatement (event log)**
```
statementId // GUID (immutable)
vulnerabilityKey // canonical advisory key (e.g., CVE-2025-12345)
advisoryKey // merge snapshot advisory key (may reference variant)
statementHash // canonical hash of advisory payload
asOf // timestamp of snapshot (UTC)
recordedAt // persistence timestamp (UTC)
inputDocuments[] // document IDs contributing to the snapshot
payload // canonical advisory document (BSON / canonical JSON)
```
**AdvisoryConflict**
```
conflictId // GUID
vulnerabilityKey // canonical advisory key
conflictHash // deterministic hash of conflict payload
asOf // timestamp aligned with originating statement set
recordedAt // persistence timestamp
statementIds[] // related advisoryStatement identifiers
details // structured conflict explanation / merge reasoning
```
- `AdvisoryEventLog` (Concelier.Core) provides the public API for appending immutable statements/conflicts and querying replay history. Inputs are normalized by trimming and lower-casing `vulnerabilityKey`, serializing advisories with `CanonicalJsonSerializer`, and computing SHA-256 hashes (`statementHash`, `conflictHash`) over the canonical JSON payloads. Consumers can replay by key with an optional `asOf` filter to obtain deterministic snapshots ordered by `asOf` then `recordedAt`.
- Concelier.WebService exposes the immutable log via `GET /concelier/advisories/{vulnerabilityKey}/replay[?asOf=UTC_ISO8601]`, returning the latest statements (with hex-encoded hashes) and any conflict explanations for downstream exporters and APIs.
**ExportState**
```
exportKind // json | trivydb
baseExportId? // last full baseline
baseDigest? // digest of last full baseline
lastFullDigest? // digest of last full export
lastDeltaDigest? // digest of last delta export
cursor // per-kind incremental cursor
files[] // last manifest snapshot (path → sha256)
```
### 2.2 Product identity (`productKey`)
* **Primary:** `purl` (Package URL).
* **OS packages:** RPM (NEVRA→purl:rpm), DEB (dpkg→purl:deb), APK (apk→purl:alpine), with **EVR/NVRA** preserved.
* **Secondary:** `cpe` retained for compatibility; advisory records may carry both.
* **Image/platform:** `oci:<registry>/<repo>@<digest>` for imagelevel advisories (rare).
* **Unmappable:** if a source is nondeterministic, keep native string under `productKey="native:<provider>:<id>"` and mark **nonjoinable**.
---
## 3) Source families & precedence
### 3.1 Families
* **Vendor PSIRTs**: Microsoft, Oracle, Cisco, Adobe, Apple, VMware, Chromium…
* **Linux distros**: Red Hat, SUSE, Ubuntu, Debian, Alpine…
* **OSS ecosystems**: OSV, GHSA (GitHub Security Advisories), PyPI, npm, Maven, NuGet, Go.
* **CERTs / national CSIRTs**: CISA (KEV, ICS), JVN, ACSC, CCCS, KISA, CERTFR/BUND, etc.
### 3.2 Precedence (when claims conflict)
1. **Vendor PSIRT** (authoritative for their product).
2. **Distro** (authoritative for packages they ship, including backports).
3. **Ecosystem** (OSV/GHSA) for library semantics.
4. **CERTs/aggregators** for enrichment (KEV/known exploited).
> Precedence affects **Affected** ranges and **fixed** info; **severity** is normalized to the **maximum** credible severity unless policy overrides. Conflicts are retained with **source provenance**.
---
## 4) Connectors & normalization
### 4.1 Connector contract
```csharp
public interface IFeedConnector {
string SourceName { get; }
Task FetchAsync(IServiceProvider sp, CancellationToken ct); // -> document collection
Task ParseAsync(IServiceProvider sp, CancellationToken ct); // -> dto collection (validated)
Task MapAsync(IServiceProvider sp, CancellationToken ct); // -> advisory/alias/affected/reference
}
```
* **Fetch**: windowed (cursor), conditional GET (ETag/LastModified), retry/backoff, rate limiting.
* **Parse**: schema validation (JSON Schema, XSD/CSAF), content type checks; write **DTO** with normalized casing.
* **Map**: build canonical records; all outputs carry **provenance** (doc digest, URI, anchors).
### 4.2 Version range normalization
* **SemVer** ecosystems (npm, pypi, maven, nuget, golang): normalize to `introduced`/`fixed` semver ranges (use `~`, `^`, `<`, `>=` canonicalized to intervals).
* **RPM EVR**: `epoch:version-release` with `rpmvercmp` semantics; store raw EVR strings and also **computed order keys** for query.
* **DEB**: dpkg version comparison semantics mirrored; store computed keys.
* **APK**: Alpine version semantics; compute order keys.
* **Generic**: if provider uses text, retain raw; do **not** invent ranges.
### 4.3 Severity & CVSS
* Normalize **CVSS v2/v3/v4** where available (vector, baseScore, severity).
* If multiple CVSS sources exist, track them all; **effective severity** defaults to **max** by policy (configurable).
* **ExploitKnown** toggled by KEV and equivalent sources; store **evidence** (source, date).
---
## 5) Merge engine
### 5.1 Keying & identity
* Identity graph: **CVE** is primary node; vendor/distro IDs resolved via **Alias** edges (from connectors and Conceliers alias tables).
* `advisoryKey` is the canonical primary key (CVE if present, else vendor/distro key).
### 5.2 Merge algorithm (deterministic)
1. **Gather** all rows for `advisoryKey` (across sources).
2. **Select title/summary** by precedence source (vendor>distro>ecosystem>cert).
3. **Union aliases** (dedupe by scheme+value).
4. **Merge `Affected`** with rules:
* Prefer **vendor** ranges for vendor products; prefer **distro** for **distroshipped** packages.
* If both exist for same `productKey`, keep **both**; mark `sourceTag` and `precedence` so **Policy** can decide.
* Never collapse range semantics across different families (e.g., rpm EVR vs semver).
5. **CVSS/severity**: record all CVSS sets; compute **effectiveSeverity** = max (unless policy override).
6. **References**: union with type precedence (advisory > patch > kb > exploit > blog); dedupe by URL; preserve `sourceTag`.
7. Produce **canonical JSON**; compute **afterHash**; store **MergeEvent** with inputs and hashes.
> The merge is **pure** given inputs. Any change in inputs or precedence matrices changes the **hash** predictably.
---
## 6) Storage schema (MongoDB)
**Collections & indexes**
* `source` `{_id, type, baseUrl, enabled, notes}`
* `source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}`
* `document` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}`
* Index: `{sourceName:1, uri:1}` unique, `{fetchedAt:-1}`
* `dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}`
* Index: `{sourceName:1, documentId:1}`
* `advisory` `{_id, advisoryKey, title, summary, published, modified, severity, cvss, exploitKnown, sources[]}`
* Index: `{advisoryKey:1}` unique, `{modified:-1}`, `{severity:1}`, text index (title, summary)
* `alias` `{advisoryId, scheme, value}`
* Index: `{scheme:1,value:1}`, `{advisoryId:1}`
* `affected` `{advisoryId, productKey, rangeKind, introduced?, fixed?, arch?, distro?, ecosystem?}`
* Index: `{productKey:1}`, `{advisoryId:1}`, `{productKey:1, rangeKind:1}`
* `reference` `{advisoryId, url, kind, sourceTag}`
* Index: `{advisoryId:1}`, `{kind:1}`
* `merge_event` `{advisoryKey, beforeHash, afterHash, mergedAt, inputs[]}`
* Index: `{advisoryKey:1, mergedAt:-1}`
* `export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}`
* `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (TTL cleans dead locks)
* `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}`
**GridFS buckets**: `fs.documents` for raw payloads.
---
## 7) Exporters
### 7.1 Deterministic JSON (vulnlist style)
* Folder structure mirroring `/<scheme>/<first-two>/<rest>/…` with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace.
* `manifest.json` lists all files with SHA256 and a toplevel **export digest**.
### 7.2 Trivy DB exporter
* Builds Bolt DB archives compatible with Trivy; supports **full** and **delta** modes.
* In delta, unchanged blobs are reused from the base; metadata captures:
```
{
"mode": "delta|full",
"baseExportId": "...",
"baseManifestDigest": "sha256:...",
"changed": ["path1", "path2"],
"removed": ["path3"]
}
```
* Optional ORAS push (OCI layout) for registries.
* Offline kit bundles include Trivy DB + JSON tree + export manifest.
* Mirror-ready bundles: when `concelier.trivy.mirror` defines domains, the exporter emits `mirror/index.json` plus per-domain `manifest.json`, `metadata.json`, and `db.tar.gz` files with SHA-256 digests so Concelier mirrors can expose domain-scoped download endpoints.
### 7.3 Handoff to Signer/Attestor (optional)
* On export completion, if `attest: true` is set in job args, Concelier **posts** the artifact metadata to **Signer**/**Attestor**; Concelier itself **does not** hold signing keys.
* Export record stores returned `{ uuid, index, url }` from **Rekor v2**.
---
## 8) REST APIs
All under `/api/v1/concelier`.
**Health & status**
```
GET /healthz | /readyz
GET /status → sources, last runs, export cursors
```
**Sources & jobs**
```
GET /sources → list of configured sources
POST /sources/{name}/trigger → { jobId }
POST /sources/{name}/pause | /resume → toggle
GET /jobs/{id} → job status
```
**Exports**
```
POST /exports/json { full?:bool, force?:bool, attest?:bool } → { exportId, digest, rekor? }
POST /exports/trivy { full?:bool, force?:bool, publish?:bool, attest?:bool } → { exportId, digest, rekor? }
GET /exports/{id} → export metadata (kind, digest, createdAt, rekor?)
GET /concelier/exports/index.json → mirror index describing available domains/bundles
GET /concelier/exports/mirror/{domain}/manifest.json
GET /concelier/exports/mirror/{domain}/bundle.json
GET /concelier/exports/mirror/{domain}/bundle.json.jws
```
**Search (operator debugging)**
```
GET /advisories/{key}
GET /advisories?scheme=CVE&value=CVE-2025-12345
GET /affected?productKey=pkg:rpm/openssl&limit=100
```
**AuthN/Z:** Authority tokens (OpTok) with roles: `concelier.read`, `concelier.admin`, `concelier.export`.
---
## 9) Configuration (YAML)
```yaml
concelier:
mongo: { uri: "mongodb://mongo/concelier" }
s3:
endpoint: "http://minio:9000"
bucket: "stellaops-concelier"
scheduler:
windowSeconds: 30
maxParallelSources: 4
sources:
- name: redhat
kind: csaf
baseUrl: https://access.redhat.com/security/data/csaf/v2/
signature: { type: pgp, keys: [ "…redhat PGP…" ] }
enabled: true
windowDays: 7
- name: suse
kind: csaf
baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
signature: { type: pgp, keys: [ "…suse PGP…" ] }
- name: ubuntu
kind: usn-json
baseUrl: https://ubuntu.com/security/notices.json
signature: { type: none }
- name: osv
kind: osv
baseUrl: https://api.osv.dev/v1/
signature: { type: none }
- name: ghsa
kind: ghsa
baseUrl: https://api.github.com/graphql
auth: { tokenRef: "env:GITHUB_TOKEN" }
exporters:
json:
enabled: true
output: s3://stellaops-concelier/json/
trivy:
enabled: true
mode: full
output: s3://stellaops-concelier/trivy/
oras:
enabled: false
repo: ghcr.io/org/concelier
precedence:
vendorWinsOverDistro: true
distroWinsOverOsv: true
severity:
policy: max # or 'vendorPreferred' / 'distroPreferred'
```
---
## 10) Security & compliance
* **Outbound allowlist** per connector (domains, protocols); proxy support; TLS pinning where possible.
* **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; **merge** can downweight or ignore them by config.
* **No secrets in logs**; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers.
* **Multitenant**: pertenant DBs or prefixes; pertenant S3 prefixes; tenantscoped API tokens.
* **Determinism**: canonical JSON writer; export digests stable across runs given same inputs.
---
## 11) Performance targets & scale
* **Ingest**: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON).
* **Normalize/map**: ≥ 50k `Affected` rows/min on 4 cores.
* **Merge**: ≤ 10ms P95 per advisory at steadystate updates.
* **Export**: 1M advisories JSON in ≤ 90s (streamed, zstd), Trivy DB in ≤ 60s on 8 cores.
* **Memory**: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes.
**Scale pattern**: add Concelier replicas; Mongo scaling via indices and read/write concerns; GridFS only for oversized docs.
---
## 12) Observability
* **Metrics**
* `concelier.fetch.docs_total{source}`
* `concelier.fetch.bytes_total{source}`
* `concelier.parse.failures_total{source}`
* `concelier.map.affected_total{source}`
* `concelier.merge.changed_total`
* `concelier.export.bytes{kind}`
* `concelier.export.duration_seconds{kind}`
* **Tracing** around fetch/parse/map/merge/export.
* **Logs**: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`.
---
## 13) Testing matrix
* **Connectors:** fixture suites for each provider/format (happy path; malformed; signature fail).
* **Version semantics:** EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, prereleases).
* **Merge:** conflicting sources (vendor vs distro vs OSV); verify precedence & dual retention.
* **Export determinism:** byteforbyte stable outputs across runs; digest equality.
* **Performance:** soak tests with 1M advisories; cap memory; verify backpressure.
* **API:** pagination, filters, RBAC, error envelopes (RFC 7807).
* **Offline kit:** bundle build & import correctness.
---
## 14) Failure modes & recovery
* **Source outages:** scheduler backs off with exponential delay; `source_state.backoffUntil`; alerts on staleness.
* **Schema drifts:** parse stage marks DTO invalid; job fails with clear diagnostics; connector version flags track supported schema ranges.
* **Partial exports:** exporters write to temp prefix; **manifest commit** is atomic; only then move to final prefix and update `export_state`.
* **Resume:** all stages idempotent; `source_state.cursor` supports window resume.
---
## 15) Operator runbook (quick)
* **Trigger all sources:** `POST /api/v1/concelier/sources/*/trigger`
* **Force full export JSON:** `POST /api/v1/concelier/exports/json { "full": true, "force": true }`
* **Force Trivy DB delta publish:** `POST /api/v1/concelier/exports/trivy { "full": false, "publish": true }`
* **Inspect advisory:** `GET /api/v1/concelier/advisories?scheme=CVE&value=CVE-2025-12345`
* **Pause noisy source:** `POST /api/v1/concelier/sources/osv/pause`
---
## 16) Rollout plan
1. **MVP**: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export.
2. **Add**: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export.
3. **Attestation handoff**: integrate with **Signer/Attestor** (optional).
4. **Scale & diagnostics**: provider dashboards, staleness alerts, export cache reuse.
5. **Offline kit**: endtoend verified bundles for airgap.

View File

@@ -42,7 +42,7 @@ Semantic core + calendar tag:
A release is a **bundle** of image digests + charts + manifests. All services in a bundle are **wirecompatible**. Mixed minor versions are allowed within a bounded skew:
* **Web UI ↔ backend**: `±1 minor`.
* **Scanner ↔ Policy/Vexer/Feedser**: `±1 minor`.
* **Scanner ↔ Policy/Excititor/Concelier**: `±1 minor`.
* **Authority/Signer/Attestor triangle**: **must** be same minor (crypto and DPoP/mTLS binding rules).
At startup, services **selfadvertise** their semver & channel; the UI surfaces **mismatch warnings**.
@@ -75,7 +75,7 @@ At startup, services **selfadvertise** their semver & channel; the UI surface
* **Static**: linters, codegen checks, protobuf API freeze (backwardcompat tests).
* **Unit/integration**: percomponent, plus **endtoend** flows (scan→vex→policy→sign→attest).
* **Perf SLOs**: hot paths (SBOM compose, diff, export) measured against budgets.
* **Security**: dependency audit vs Feedser export; container hardening tests; minimal caps.
* **Security**: dependency audit vs Concelier export; container hardening tests; minimal caps.
* **Canary cohort**: internal staging + selected customers; one week on **edge** before **stable** tag.
---
@@ -86,11 +86,12 @@ At startup, services **selfadvertise** their semver & channel; the UI surface
* **Primary**: `registry.stella-ops.org` (OCI v2, supports Referrers API).
* **Mirrors**: GHCR (readonly), regional mirrors for latency.
* Operational runbook: see `docs/ops/concelier-mirror-operations.md` for deployment profiles, CDN guidance, and sync automation.
* **Pull by digest only** in Kubernetes/Compose manifests.
**Gating policy**:
* **Core images** (Authority, Scanner, Feedser, Vexer, Attestor, UI): public **read**.
* **Core images** (Authority, Scanner, Concelier, Excititor, Attestor, UI): public **read**.
* **Enterprise addons** (if any) and **prerelease**: private repos via OAuth2 token service.
> Monetization lever is **signing** (PoE gate), not image pulls, so the core remains simple to consume.
@@ -115,7 +116,7 @@ At startup, services **selfadvertise** their semver & channel; the UI surface
/attest/ DSSE bundles + Rekor proofs
/charts/ Helm charts + values templates
/compose/ docker-compose.yml + .env template
/plugins/ Feedser/Vexer connectors (restart-time)
/plugins/ Concelier/Excititor connectors (restart-time)
/policy/ example policies
/manifest/ release.yaml (see §6.1)
```
@@ -169,8 +170,8 @@ helm install stella stellaops/platform \
--set authority.issuer=https://authority.stella.local \
--set scanner.minio.endpoint=http://minio.stella.local:9000 \
--set scanner.mongo.uri=mongodb://mongo/scanner \
--set feedser.mongo.uri=mongodb://mongo/feedser \
--set vexer.mongo.uri=mongodb://mongo/vexer
--set concelier.mongo.uri=mongodb://mongo/concelier \
--set excititor.mongo.uri=mongodb://mongo/excititor
```
* Postinstall job registers **Authority clients** (Scanner, Signer, Attestor, UI) and prints **bootstrap** URLs and client credentials (sealed secrets).
@@ -185,7 +186,7 @@ helm install stella stellaops/platform \
1. Authority (stateless, dualkey rotation ready)
2. Signer/Attestor (same minor)
3. Scanner WebService & Workers
4. Feedser, then Vexer (schema migrations are expand/contract)
4. Concelier, then Excititor (schema migrations are expand/contract)
5. UI last
* **DB migrations** are **expand/contract**:
@@ -234,6 +235,11 @@ release:
The manifest is **cosignsigned**; UI/CLI can verify a bundle without talking to registries.
> Deployment guardrails The repository keeps channel-aligned Compose bundles
> in `deploy/compose/` and Helm overlays in `deploy/helm/stellaops/`. Both sets
> pull their digests from `deploy/releases/` and are validated by
> `deploy/tools/validate-profiles.sh` to guarantee lint/dry-run cleanliness.
### 6.2 Image labels (release metadata)
Each image sets OCI labels:
@@ -263,10 +269,10 @@ s3://stellaops/
images/<imgDigest>/usage.cdx.pb
diffs/<old>_<new>/diff.json.zst
attest/<artifactSha256>.dsse.json
feedser/
concelier/
json/<exportId>/...
trivy/<exportId>/...
vexer/
excititor/
exports/<exportId>/...
attestor/
dsse/<bundleSha256>.json
@@ -289,14 +295,20 @@ s3://stellaops/
### 7.4 Mongo retention
* **Scanner**: `runtime.events` use TTL (e.g., 3090 days); **catalog** permanent.
* **Feedser/Vexer**: raw docs keep **last N windows**; canonical stores permanent.
* **Concelier/Excititor**: raw docs keep **last N windows**; canonical stores permanent.
* **Attestor**: `entries` permanent; `dedupe` TTL 2448h.
### 7.5 Mongo server baseline
* **Minimum supported server:** MongoDB **4.2+**. Driver 3.5.0 removes compatibility shims for 4.0; upstream has already announced 4.0 support will be dropped in upcoming C# driver releases. citeturn1open1
* **Deploy images:** Compose/Helm defaults stay on `mongo:7.x`. For air-gapped installs, refresh Offline Kit bundles so the packaged `mongod` matches ≥4.2.
* **Upgrade guard:** During rollout, verify replica sets reach FCV `4.2` or above before swapping binaries; automation should hard-stop if FCV is <4.2.
---
## 8) Observability & SLOs (operations)
* **Uptime SLO**: 99.9% for Signer/Authority/Attestor; 99.5% for Scanner WebService; Vexer/Feedser 99.0%.
* **Uptime SLO**: 99.9% for Signer/Authority/Attestor; 99.5% for Scanner WebService; Excititor/Concelier 99.0%.
* **Error budgets**: tracked per month; dashboards show burn rates.
* **Golden signals**:
@@ -324,7 +336,8 @@ Prometheus + OTLP; Grafana dashboards ship in the charts.
* **Vulnerability response**:
* Feedser redflag advisories trigger accelerated **stable** patch rollout; UI/CLI “security patch available” notice.
* Concelier red-flag advisories trigger accelerated **stable** patch rollout; UI/CLI “security patch available” notice.
* 2025-10: Pinned `MongoDB.Driver` **3.5.0** and `SharpCompress` **0.41.0** across services (DEVOPS-SEC-10-301) to eliminate NU1902/NU1903 warnings surfaced during scanner cache/worker test runs; future dependency bumps follow the same central override pattern.
* **Backups/DR**:
@@ -408,10 +421,10 @@ services:
scanner-worker:
image: registry.stella-ops.org/stellaops/scanner-worker@sha256:...
deploy: { replicas: 4 }
feedser:
image: registry.stella-ops.org/stellaops/feedser@sha256:...
vexer:
image: registry.stella-ops.org/stellaops/vexer@sha256:...
concelier:
image: registry.stella-ops.org/stellaops/concelier@sha256:...
excititor:
image: registry.stella-ops.org/stellaops/excititor@sha256:...
web-ui:
image: registry.stella-ops.org/stellaops/web-ui@sha256:...
mongo:
@@ -446,7 +459,7 @@ services:
* `signer.requests_total{result="success"}/minute` > 0 (when scans occur).
* `attestor.submit_latency_seconds{quantile=0.95}` < 0.3.
* `scanner.scan_latency_seconds{quantile=0.95}` < target per image size.
* `feedser.export.duration_seconds` stable; `vexer.consensus.conflicts_total` not exploding after policy changes.
* `concelier.export.duration_seconds` stable; `excititor.consensus.conflicts_total` not exploding after policy changes.
* MinIO `s3_requests_errors_total` near zero; Mongo `opcounters` hit expected baseline.
### Appendix B — Upgrade safety checklist

View File

@@ -0,0 +1,500 @@
# component_architecture_excititor.md — **StellaOps Excititor** (2025Q4)
> **Scope.** This document specifies the **Excititor** service: its purpose, trust model, data structures, APIs, plugin contracts, storage schema, normalization/consensus algorithms, performance budgets, testing matrix, and how it integrates with Scanner, Policy, Concelier, and the attestation chain. It is implementationready.
---
## 0) Mission & role in the platform
**Mission.** Convert heterogeneous **VEX** statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into **canonical, queryable claims**; compute **deterministic consensus** per *(vuln, product)*; preserve **conflicts with provenance**; publish **stable, attestable exports** that the backend uses to suppress nonexploitable findings, prioritize remaining risk, and explain decisions.
**Boundaries.**
* Excititor **does not** decide PASS/FAIL. It supplies **evidence** (statuses + justifications + provenance weights).
* Excititor preserves **conflicting claims** unchanged; consensus encodes how we would pick, but the raw set is always exportable.
* VEX consumption is **backendonly**: Scanner never applies VEX. The backends **Policy Engine** asks Excititor for status evidence and then decides what to show.
---
## 1) Inputs, outputs & canonical domain
### 1.1 Accepted input formats (ingest)
* **OpenVEX** JSON documents (attested or raw).
* **CSAF VEX** 2.x (vendor PSIRTs and distros commonly publish CSAF).
* **CycloneDX VEX** 1.4+ (standalone VEX or embedded VEX blocks).
* **OCIattached attestations** (VEX statements shipped as OCI referrers) — optional connectors.
All connectors register **source metadata**: provider identity, trust tier, signature expectations (PGP/cosign/PKI), fetch windows, rate limits, and time anchors.
### 1.2 Canonical model (normalized)
Every incoming statement becomes a set of **VexClaim** records:
```
VexClaim
- providerId // 'redhat', 'suse', 'ubuntu', 'github', 'vendorX'
- vulnId // 'CVE-2025-12345', 'GHSA-xxxx', canonicalized
- productKey // canonical product identity (see §2.2)
- status // affected | not_affected | fixed | under_investigation
- justification? // for 'not_affected'/'affected' where provided
- introducedVersion? // semantics per provider (range or exact)
- fixedVersion? // where provided (range or exact)
- lastObserved // timestamp from source or fetch time
- provenance // doc digest, signature status, fetch URI, line/offset anchors
- evidence[] // raw source snippets for explainability
- supersedes? // optional cross-doc chain (docDigest → docDigest)
```
### 1.3 Exports (consumption)
* **VexConsensus** per `(vulnId, productKey)` with:
* `rollupStatus` (after policy weights/justification gates),
* `sources[]` (winning + losing claims with weights & reasons),
* `policyRevisionId` (identifier of the Excititor policy used),
* `consensusDigest` (stable SHA256 over canonical JSON).
* **Raw claims** export for auditing (unchanged, with provenance).
* **Provider snapshots** (per source, last N days) for operator debugging.
* **Index** optimized for backend joins: `(productKey, vulnId) → (status, confidence, sourceSet)`.
All exports are **deterministic**, and (optionally) **attested** via DSSE and logged to Rekor v2.
---
## 2) Identity model — products & joins
### 2.1 Vuln identity
* Accepts **CVE**, **GHSA**, vendor IDs (MSRC, RHSA…), distro IDs (DSA/USN/RHSA…) — normalized to `vulnId` with alias sets.
* **Alias graph** maintained (from Concelier) to map vendor/distro IDs → CVE (primary) and to **GHSA** where applicable.
### 2.2 Product identity (`productKey`)
* **Primary:** `purl` (Package URL).
* **Secondary links:** `cpe`, **OS package NVRA/EVR**, NuGet/Maven/Golang identity, and **OS package name** when purl unavailable.
* **Fallback:** `oci:<registry>/<repo>@<digest>` for imagelevel VEX.
* **Special cases:** kernel modules, firmware, platforms → providerspecific mapping helpers (connector captures providers product taxonomy → canonical `productKey`).
> Excititor does not invent identities. If a provider cannot be mapped to purl/CPE/NVRA deterministically, we keep the native **product string** and mark the claim as **nonjoinable**; the backend will ignore it unless a policy explicitly whitelists that provider mapping.
---
## 3) Storage schema (MongoDB)
Database: `excititor`
### 3.1 Collections
**`vex.providers`**
```
_id: providerId
name, homepage, contact
trustTier: enum {vendor, distro, platform, hub, attestation}
signaturePolicy: { type: pgp|cosign|x509|none, keys[], certs[], cosignKeylessRoots[] }
fetch: { baseUrl, kind: http|oci|file, rateLimit, etagSupport, windowDays }
enabled: bool
createdAt, modifiedAt
```
**`vex.raw`** (immutable raw documents)
```
_id: sha256(doc bytes)
providerId
uri
ingestedAt
contentType
sig: { verified: bool, method: pgp|cosign|x509|none, keyId|certSubject, bundle? }
payload: GridFS pointer (if large)
disposition: kept|replaced|superseded
correlation: { replaces?: sha256, replacedBy?: sha256 }
```
**`vex.statements`** (immutable normalized rows; append-only event log)
```
_id: ObjectId
providerId
vulnId
productKey
status
justification?
introducedVersion?
fixedVersion?
lastObserved
docDigest
provenance { uri, line?, pointer?, signatureState }
evidence[] { key, value, locator }
signals? {
severity? { scheme, score?, label?, vector? }
kev?: bool
epss?: double
}
insertedAt
indices:
- {vulnId:1, productKey:1}
- {providerId:1, insertedAt:-1}
- {docDigest:1}
- {status:1}
- text index (optional) on evidence.value for debugging
```
**`vex.consensus`** (rollups)
```
_id: sha256(canonical(vulnId, productKey, policyRevision))
vulnId
productKey
rollupStatus
sources[]: [
{ providerId, status, justification?, weight, lastObserved, accepted:bool, reason }
]
policyRevisionId
evaluatedAt
signals? {
severity? { scheme, score?, label?, vector? }
kev?: bool
epss?: double
}
consensusDigest // same as _id
indices:
- {vulnId:1, productKey:1}
- {policyRevisionId:1, evaluatedAt:-1}
```
**`vex.exports`** (manifest of emitted artifacts)
```
_id
querySignature
format: raw|consensus|index
artifactSha256
rekor { uuid, index, url }?
createdAt
policyRevisionId
cacheable: bool
```
**`vex.cache`**
```
querySignature -> exportId (for fast reuse)
ttl, hits
```
**`vex.migrations`**
* ordered migrations applied at bootstrap to ensure indexes.
* `20251019-consensus-signals-statements` introduces the statements log indexes and the `policyRevisionId + evaluatedAt` lookup for consensus — rerun consensus writers once to hydrate newly persisted signals.
### 3.2 Indexing strategy
* Hot path queries use exact `(vulnId, productKey)` and timebounded windows; compound indexes cover both.
* Providers list view by `lastObserved` for monitoring staleness.
* `vex.consensus` keyed by `(vulnId, productKey, policyRevision)` for deterministic reuse.
---
## 4) Ingestion pipeline
### 4.1 Connector contract
```csharp
public interface IVexConnector
{
string ProviderId { get; }
Task FetchAsync(VexConnectorContext ctx, CancellationToken ct); // raw docs
Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct); // raw -> VexClaim[]
}
```
* **Fetch** must implement: window scheduling, conditional GET (ETag/IfModifiedSince), rate limiting, retry/backoff.
* **Normalize** parses the format, validates schema, maps product identities deterministically, emits `VexClaim` records with **provenance**.
### 4.2 Signature verification (per provider)
* **cosign (keyless or keyful)** for OCI referrers or HTTPserved JSON with Sigstore bundles.
* **PGP** (provider keyrings) for distro/vendor feeds that sign docs.
* **x509** (mutual TLS / providerpinned certs) where applicable.
* Signature state is stored on **vex.raw.sig** and copied into **provenance.signatureState** on claims.
> Claims from sources failing signature policy are marked `"signatureState.verified=false"` and **policy** can downweight or ignore them.
### 4.3 Time discipline
* For each doc, prefer **providers document timestamp**; if absent, use fetch time.
* Claims carry `lastObserved` which drives **tiebreaking** within equal weight tiers.
---
## 5) Normalization: product & status semantics
### 5.1 Product mapping
* **purl** first; **cpe** second; OS package NVRA/EVR mapping helpers (distro connectors) produce purls via canonical tables (e.g., rpm→purl:rpm, deb→purl:deb).
* Where a provider publishes **platformlevel** VEX (e.g., “RHEL 9 not affected”), connectors expand to known product inventory rules (e.g., map to sets of packages/components shipped in the platform). Expansion tables are versioned and kept per provider; every expansion emits **evidence** indicating the rule applied.
* If expansion would be speculative, the claim remains **platformscoped** with `productKey="platform:redhat:rhel:9"` and is flagged **nonjoinable**; backend can decide to use platform VEX only when Scanner proves the platform runtime.
### 5.2 Status + justification mapping
* Canonical **status**: `affected | not_affected | fixed | under_investigation`.
* **Justifications** normalized to a controlled vocabulary (CISAaligned), e.g.:
* `component_not_present`
* `vulnerable_code_not_in_execute_path`
* `vulnerable_configuration_unused`
* `inline_mitigation_applied`
* `fix_available` (with `fixedVersion`)
* `under_investigation`
* Providers with freetext justifications are mapped by deterministic tables; raw text preserved as `evidence`.
---
## 6) Consensus algorithm
**Goal:** produce a **stable**, explainable `rollupStatus` per `(vulnId, productKey)` given possibly conflicting claims.
### 6.1 Inputs
* Set **S** of `VexClaim` for the key.
* **Excititor policy snapshot**:
* **weights** per provider tier and per provider overrides.
* **justification gates** (e.g., require justification for `not_affected` to be acceptable).
* **minEvidence** rules (e.g., `not_affected` must come from ≥1 vendor or 2 distros).
* **signature requirements** (e.g., require verified signature for fixed to be considered).
### 6.2 Steps
1. **Filter invalid** claims by signature policy & justification gates → set `S'`.
2. **Score** each claim:
`score = weight(provider) * freshnessFactor(lastObserved)` where freshnessFactor ∈ [0.8, 1.0] for staleness decay (configurable; small effect).
3. **Aggregate** scores per status: `W(status) = Σ score(claims with that status)`.
4. **Pick** `rollupStatus = argmax_status W(status)`.
5. **Tiebreakers** (in order):
* Higher **max single** provider score wins (vendor > distro > platform > hub).
* More **recent** lastObserved wins.
* Deterministic lexicographic order of status (`fixed` > `not_affected` > `under_investigation` > `affected`) as final tiebreaker.
6. **Explain**: mark accepted sources (`accepted=true; reason="weight"`/`"freshness"`), mark rejected sources with explicit `reason` (`"insufficient_justification"`, `"signature_unverified"`, `"lower_weight"`).
> The algorithm is **pure** given S and policy snapshot; result is reproducible and hashed into `consensusDigest`.
---
## 7) Query & export APIs
All endpoints are versioned under `/api/v1/vex`.
### 7.1 Query (online)
```
POST /claims/search
body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string }
→ { claims[], nextPageToken? }
POST /consensus/search
body: { vulnIds?: string[], productKeys?: string[], policyRevisionId?: string, since?: timestamp, limit?: int, pageToken?: string }
→ { entries[], nextPageToken? }
POST /resolve
body: { purls: string[], vulnIds: string[], policyRevisionId?: string }
→ { results: [ { vulnId, productKey, rollupStatus, sources[] } ] }
```
### 7.2 Exports (cacheable snapshots)
```
POST /exports
body: { signature: { vulnFilter?, productFilter?, providers?, since? }, format: raw|consensus|index, policyRevisionId?: string, force?: bool }
→ { exportId, artifactSha256, rekor? }
GET /exports/{exportId} → bytes (application/json or binary index)
GET /exports/{exportId}/meta → { signature, policyRevisionId, createdAt, artifactSha256, rekor? }
```
### 7.3 Provider operations
```
GET /providers → provider list & signature policy
POST /providers/{id}/refresh → trigger fetch/normalize window
GET /providers/{id}/status → last fetch, doc counts, signature stats
```
**Auth:** servicetoservice via Authority tokens; operator operations via UI/CLI with RBAC.
---
## 8) Attestation integration
* Exports can be **DSSEsigned** via **Signer** and logged to **Rekor v2** via **Attestor** (optional but recommended for regulated pipelines).
* `vex.exports.rekor` stores `{uuid, index, url}` when present.
* **Predicate type**: `https://stella-ops.org/attestations/vex-export/1` with fields:
* `querySignature`, `policyRevisionId`, `artifactSha256`, `createdAt`.
---
## 9) Configuration (YAML)
```yaml
excititor:
mongo: { uri: "mongodb://mongo/excititor" }
s3:
endpoint: http://minio:9000
bucket: stellaops
policy:
weights:
vendor: 1.0
distro: 0.9
platform: 0.7
hub: 0.5
attestation: 0.6
ceiling: 1.25
scoring:
alpha: 0.25
beta: 0.5
providerOverrides:
redhat: 1.0
suse: 0.95
requireJustificationForNotAffected: true
signatureRequiredForFixed: true
minEvidence:
not_affected:
vendorOrTwoDistros: true
connectors:
- providerId: redhat
kind: csaf
baseUrl: https://access.redhat.com/security/data/csaf/v2/
signaturePolicy: { type: pgp, keys: [ "…redhat-pgp-key…" ] }
windowDays: 7
- providerId: suse
kind: csaf
baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
signaturePolicy: { type: pgp, keys: [ "…suse-pgp-key…" ] }
- providerId: ubuntu
kind: openvex
baseUrl: https://…/vex/
signaturePolicy: { type: none }
- providerId: vendorX
kind: cyclonedx-vex
ociRef: ghcr.io/vendorx/vex@sha256:…
signaturePolicy: { type: cosign, cosignKeylessRoots: [ "sigstore-root" ] }
```
### 9.1 WebService endpoints
With storage configured, the WebService exposes the following ingress and diagnostic APIs:
* `GET /excititor/status` returns the active storage configuration and registered artifact stores.
* `GET /excititor/health` simple liveness probe.
* `POST /excititor/statements` accepts normalized VEX statements and persists them via `IVexClaimStore`; use this for migrations/backfills.
* `GET /excititor/statements/{vulnId}/{productKey}?since=` returns the immutable statement log for a vulnerability/product pair.
Run the ingestion endpoint once after applying migration `20251019-consensus-signals-statements` to repopulate historical statements with the new severity/KEV/EPSS signal fields.
* `weights.ceiling` raises the deterministic clamp applied to provider tiers/overrides (range 1.05.0). Values outside the range are clamped with warnings so operators can spot typos.
* `scoring.alpha` / `scoring.beta` configure KEV/EPSS boosts for the Phase1 → Phase2 scoring pipeline. Defaults (0.25, 0.5) preserve prior behaviour; negative or excessively large values fall back with diagnostics.
---
## 10) Security model
* **Input signature verification** enforced per provider policy (PGP, cosign, x509).
* **Connector allowlists**: outbound fetch constrained to configured domains.
* **Tenant isolation**: pertenant DB prefixes or separate DBs; pertenant S3 prefixes; pertenant policies.
* **AuthN/Z**: Authorityissued OpToks; RBAC roles (`vex.read`, `vex.admin`, `vex.export`).
* **No secrets in logs**; deterministic logging contexts include providerId, docDigest, claim keys.
---
## 11) Performance & scale
* **Targets:**
* Normalize 10k VEX claims/minute/core.
* Consensus compute ≤50ms for 1k unique `(vuln, product)` pairs in hot cache.
* Export (consensus) 1M rows in ≤60s on 8 cores with streaming writer.
* **Scaling:**
* WebService handles control APIs; **Worker** background services (same image) execute fetch/normalize in parallel with ratelimits; Mongo writes batched; upserts by natural keys.
* Exports stream straight to S3 (MinIO) with rolling buffers.
* **Caching:**
* `vex.cache` maps query signatures → export; TTL to avoid stampedes; optimistic reuse unless `force`.
---
## 12) Observability
* **Metrics:**
* `vex.ingest.docs_total{provider}`
* `vex.normalize.claims_total{provider}`
* `vex.signature.failures_total{provider,method}`
* `vex.consensus.conflicts_total{vulnId}`
* `vex.exports.bytes{format}` / `vex.exports.latency_seconds`
* **Tracing:** spans for fetch, verify, parse, map, consensus, export.
* **Dashboards:** provider staleness, top conflicting vulns/components, signature posture, export cache hitrate.
---
## 13) Testing matrix
* **Connectors:** golden raw docs → deterministic claims (fixtures per provider/format).
* **Signature policies:** valid/invalid PGP/cosign/x509 samples; ensure rejects are recorded but not accepted.
* **Normalization edge cases:** platformonly claims, freetext justifications, nonpurl products.
* **Consensus:** conflict scenarios across tiers; check tiebreakers; justification gates.
* **Performance:** 1Mrow export timing; memory ceilings; stream correctness.
* **Determinism:** same inputs + policy → identical `consensusDigest` and export bytes.
* **API contract tests:** pagination, filters, RBAC, rate limits.
---
## 14) Integration points
* **Backend Policy Engine** (in Scanner.WebService): calls `POST /resolve` with batched `(purl, vulnId)` pairs to fetch `rollupStatus + sources`.
* **Concelier**: provides alias graph (CVE↔vendor IDs) and may supply VEXadjacent metadata (e.g., KEV flag) for policy escalation.
* **UI**: VEX explorer screens use `/claims/search` and `/consensus/search`; show conflicts & provenance.
* **CLI**: `stellaops vex export --consensus --since 7d --out vex.json` for audits.
---
## 15) Failure modes & fallback
* **Provider unreachable:** stale thresholds trigger warnings; policy can downweight stale providers automatically (freshness factor).
* **Signature outage:** continue to ingest but mark `signatureState.verified=false`; consensus will likely exclude or downweight per policy.
* **Schema drift:** unknown fields are preserved as `evidence`; normalization rejects only on **invalid identity** or **status**.
---
## 16) Rollout plan (incremental)
1. **MVP**: OpenVEX + CSAF connectors for 3 major providers (e.g., Red Hat/SUSE/Ubuntu), normalization + consensus + `/resolve`.
2. **Signature policies**: PGP for distros; cosign for OCI.
3. **Exports + optional attestation**.
4. **CycloneDX VEX** connectors; platform claim expansion tables; UI explorer.
5. **Scale hardening**: export indexes; conflict analytics.
---
## 17) Operational runbooks
* **Statement backfill** — see `docs/dev/EXCITITOR_STATEMENT_BACKFILL.md` for the CLI workflow, required permissions, observability guidance, and rollback steps.
---
## 18) Appendix — canonical JSON (stable ordering)
All exports and consensus entries are serialized via `VexCanonicalJsonSerializer`:
* UTF8 without BOM;
* keys sorted (ASCII);
* arrays sorted by `(providerId, vulnId, productKey, lastObserved)` unless semantic order mandated;
* timestamps in `YYYYMMDDThh:mm:ssZ`;
* no insignificant whitespace.

View File

@@ -0,0 +1,138 @@
# architecture_excititor_mirrors.md — Excititor Mirror Distribution
> **Status:** Draft (Sprint 7). Complements `docs/ARCHITECTURE_EXCITITOR.md` by describing the mirror export surface exposed by `Excititor.WebService` and the configuration hooks used by operators and downstream mirrors.
---
## 0) Purpose
Excititor publishes canonical VEX consensus data. Operators (or StellaOps-managed mirrors) need a deterministic way to sync those exports into downstream environments. Mirror distribution provides:
* A declarative map of export bundles (`json`, `jsonl`, `openvex`, `csaf`) reachable via signed HTTP endpoints under `/excititor/mirror`.
* Thin quota/authentication controls on top of the existing export cache so mirrors cannot starve the web service.
* Stable payload shapes that downstream automation can monitor (index → fetch updates → download artifact → verify signature).
Mirror endpoints are intentionally **read-only**. Write paths (export generation, attestation, cache) remain the responsibility of the export pipeline.
---
## 1) Configuration model
The web service reads mirror configuration from `Excititor:Mirror` (YAML/JSON/appsettings). Each domain groups a set of exports that share rate limits and authentication rules.
```yaml
Excititor:
Mirror:
Domains:
- id: primary
displayName: Primary Mirror
requireAuthentication: false
maxIndexRequestsPerHour: 600
maxDownloadRequestsPerHour: 1200
exports:
- key: consensus
format: json
filters:
vulnId: CVE-2025-0001
productKey: pkg:test/demo
sort:
createdAt: false # descending
limit: 1000
- key: consensus-openvex
format: openvex
filters:
vulnId: CVE-2025-0001
```
### Field reference
| Field | Required | Description |
| --- | --- | --- |
| `id` | ✅ | Stable identifier. Appears in URLs (`/excititor/mirror/domains/{id}`) and download filenames. |
| `displayName` | | Human-friendly label surfaced in the `/domains` listing. Falls back to `id`. |
| `requireAuthentication` | | When `true` the service enforces that the caller is authenticated (Authority token). |
| `maxIndexRequestsPerHour` | | Per-domain quota for index endpoints. `0`/negative disables the guard. |
| `maxDownloadRequestsPerHour` | | Per-domain quota for artifact downloads. |
| `exports` | ✅ | Collection of export projections. |
Export-level fields:
| Field | Required | Description |
| --- | --- | --- |
| `key` | ✅ | Unique key within the domain. Used in URLs (`/exports/{key}`) and filenames. |
| `format` | ✅ | One of `json`, `jsonl`, `openvex`, `csaf`. Maps to `VexExportFormat`. |
| `filters` | | Key/value pairs executed via `VexQueryFilter`. Keys must match export data source columns (e.g., `vulnId`, `productKey`). |
| `sort` | | Key/boolean map (false = descending). |
| `limit`, `offset`, `view` | | Optional query bounds passed through to the export query. |
⚠️ **Misconfiguration:** invalid formats or missing keys cause exports to be flagged with `status` in the index response; they are not exposed downstream.
---
## 2) HTTP surface
Routes are grouped under `/excititor/mirror`.
| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/domains` | Returns configured domains with quota metadata. |
| `GET` | `/domains/{domainId}` | Domain detail (auth/quota + export keys). `404` for unknown domains. |
| `GET` | `/domains/{domainId}/index` | Lists exports with exportId, query signature, format, artifact digest, attestation metadata, and size. Applies index quota. |
| `GET` | `/domains/{domainId}/exports/{exportKey}` | Returns manifest metadata (single export). `404` if unknown/missing. |
| `GET` | `/domains/{domainId}/exports/{exportKey}/download` | Streams export content from the artifact store. Applies download quota. |
Responses are serialized via `VexCanonicalJsonSerializer` ensuring stable ordering. Download responses include a content-disposition header naming the file `<domain>-<export>.<ext>`.
### Error handling
* `401` authentication required (`requireAuthentication=true`).
* `404` domain/export not found or manifest not persisted.
* `429` per-domain quota exceeded (`Retry-After` header set in seconds).
* `503` export misconfiguration (invalid format/query).
---
## 3) Rate limiting
`MirrorRateLimiter` implements a simple rolling 1-hour window using `IMemoryCache`. Each domain has two quotas:
* `index` scope → `maxIndexRequestsPerHour`
* `download` scope → `maxDownloadRequestsPerHour`
`0` or negative limits disable enforcement. Quotas are best-effort (per-instance). For HA deployments, configure sticky routing at the ingress or replace the limiter with a distributed implementation.
---
## 4) Interaction with export pipeline
Mirror endpoints consume manifests produced by the export engine (`MongoVexExportStore`). They do **not** trigger new exports. Operators must configure connectors/exporters to keep targeted exports fresh (see `EXCITITOR-EXPORT-01-005/006/007`).
Recommended workflow:
1. Define export plans at the export layer (JSON/OpenVEX/CSAF).
2. Configure mirror domains mapping to those plans.
3. Downstream mirror automation:
* `GET /domains/{id}/index`
* Compare `exportId` / `consensusRevision`
* `GET /download` when new
* Verify digest + attestation
When the export team lands deterministic mirror bundles (Sprint 7 tasks 01-005/006/007), these configurations can be generated automatically.
---
## 5) Operational guidance
* Track quota utilisation via HTTP 429 metrics (configure structured logging or OTEL counters when rate limiting triggers).
* Mirror domains can be deployed per tenant (e.g., `tenant-a`, `tenant-b`) with different auth requirements.
* Ensure the underlying artifact stores (`FileSystem`, `S3`, offline bundle) retain artefacts long enough for mirrors to sync.
* For air-gapped mirrors, combine mirror endpoints with the Offline Kit (see `docs/24_OFFLINE_KIT.md`).
---
## 6) Future alignment
* Replace manual export definitions with generated mirror bundle manifests once `EXCITITOR-EXPORT-01-007` ships.
* Extend `/index` payload with quiet-provenance when `EXCITITOR-EXPORT-01-006` adds that metadata.
* Integrate domain manifests with DevOps mirror profiles (`DEVOPS-MIRROR-08-001`) so helm/compose overlays can enable or disable domains declaratively.

515
docs/ARCHITECTURE_NOTIFY.md Normal file
View File

@@ -0,0 +1,515 @@
> **Scope.** Implementationready architecture for **Notify**: a rulesdriven, tenantaware notification service that consumes platform events (scan completed, report ready, rescan deltas, attestation logged, admission decisions, etc.), evaluates operatordefined routing rules, renders **channelspecific messages** (Slack/Teams/Email/Webhook), and delivers them **reliably** with idempotency, throttling, and digests. It is UImanaged, auditable, and safe by default (no secrets leakage, no spam storms).
---
## 0) Mission & boundaries
**Mission.** Convert **facts** from StellaOps into **actionable, noisecontrolled** signals where teams already live (chat/email/webhooks), with **explainable** reasons and deep links to the UI.
**Boundaries.**
* Notify **does not make policy decisions** and **does not rescan**; it **consumes** events from Scanner/Scheduler/Vexer/Feedser/Attestor/Zastava and routes them.
* Attachments are **links** (UI/attestation pages); Notify **does not** attach SBOMs or large blobs to messages.
* Secrets for channels (Slack tokens, SMTP creds) are **referenced**, not stored raw in Mongo.
---
## 1) Runtime shape & projects
```
src/
├─ StellaOps.Notify.WebService/ # REST: rules/channels CRUD, test send, deliveries browse
├─ StellaOps.Notify.Worker/ # consumers + evaluators + renderers + delivery workers
├─ StellaOps.Notify.Connectors.* / # channel plug-ins: Slack, Teams, Email, Webhook (v1)
│ └─ *.Tests/
├─ StellaOps.Notify.Engine/ # rules engine, templates, idempotency, digests, throttles
├─ StellaOps.Notify.Models/ # DTOs (Rule, Channel, Event, Delivery, Template)
├─ StellaOps.Notify.Storage.Mongo/ # rules, channels, deliveries, digests, locks
├─ StellaOps.Notify.Queue/ # bus client (Redis Streams/NATS JetStream)
└─ StellaOps.Notify.Tests.* # unit/integration/e2e
```
**Deployables**:
* **Notify.WebService** (stateless API)
* **Notify.Worker** (horizontal scale)
**Dependencies**: Authority (OpToks; DPoP/mTLS), MongoDB, Redis/NATS (bus), HTTP egress to Slack/Teams/Webhooks, SMTP relay for Email.
> **Configuration.** Notify.WebService bootstraps from `notify.yaml` (see `etc/notify.yaml.sample`). Use `storage.driver: mongo` with a production connection string; the optional `memory` driver exists only for tests. Authority settings follow the platform defaults—when running locally without Authority, set `authority.enabled: false` and supply `developmentSigningKey` so JWTs can be validated offline.
>
> `api.rateLimits` exposes token-bucket controls for delivery history queries and test-send previews (`deliveryHistory`, `testSend`). Default values allow generous browsing while preventing accidental bursts; operators can relax/tighten the buckets per deployment.
> **Plug-ins.** All channel connectors are packaged under `<baseDirectory>/plugins/notify`. The ordered load list must start with Slack/Teams before Email/Webhook so chat-first actions are registered deterministically for Offline Kit bundles:
>
> ```yaml
> plugins:
> baseDirectory: "/var/opt/stellaops"
> directory: "plugins/notify"
> orderedPlugins:
> - StellaOps.Notify.Connectors.Slack
> - StellaOps.Notify.Connectors.Teams
> - StellaOps.Notify.Connectors.Email
> - StellaOps.Notify.Connectors.Webhook
> ```
>
> The Offline Kit job simply copies the `plugins/notify` tree into the air-gapped bundle; the ordered list keeps connector manifests stable across environments.
> **Authority clients.** Register two OAuth clients in StellaOps Authority: `notify-web-dev` (audience `notify.dev`) for development and `notify-web` (audience `notify`) for staging/production. Both require `notify.read` and `notify.admin` scopes and use DPoP-bound client credentials (`client_secret` in the samples). Reference entries live in `etc/authority.yaml.sample`, with placeholder secrets under `etc/secrets/notify-web*.secret.example`.
---
## 2) Responsibilities
1. **Ingest** platform events from internal bus with strong ordering per key (e.g., image digest).
2. **Evaluate rules** (tenantscoped) with matchers: severity changes, namespaces, repos, labels, KEV flags, provider provenance (VEX), component keys, admission decisions, etc.
3. **Control noise**: **throttle**, **coalesce** (digest windows), and **dedupe** via idempotency keys.
4. **Render** channelspecific messages using safe templates; include **evidence** and **links**.
5. **Deliver** with retries/backoff; record outcome; expose delivery history to UI.
6. **Test** paths (send test to channel targets) without touching live rules.
7. **Audit**: log who configured what, when, and why a message was sent.
---
## 3) Event model (inputs)
Notify subscribes to the **internal event bus** (produced by services, escaped JSON; gzip allowed with caps):
* `scanner.scan.completed` — new SBOM(s) composed; artifacts ready
* `scanner.report.ready` — analysis verdict (policy+vex) available; carries deltas summary
* `scheduler.rescan.delta` — new findings after Feedser/Vexer deltas (already summarized)
* `attestor.logged` — Rekor UUID returned (sbom/report/vex export)
* `zastava.admission` — admit/deny with reasons, namespace, image digests
* `feedser.export.completed` — new export ready (rarely notified directly; usually drives Scheduler)
* `vexer.export.completed` — new consensus snapshot (ditto)
**Canonical envelope (bus → Notify.Engine):**
```json
{
"eventId": "uuid",
"kind": "scanner.report.ready",
"tenant": "tenant-01",
"ts": "2025-10-18T05:41:22Z",
"actor": "scanner-webservice",
"scope": { "namespace":"payments", "repo":"ghcr.io/acme/api", "digest":"sha256:..." },
"payload": { /* kind-specific fields, see below */ }
}
```
**Examples (payload cores):**
* `scanner.report.ready`:
```json
{
"reportId": "report-3def...",
"verdict": "fail",
"summary": {"total": 12, "blocked": 2, "warned": 3, "ignored": 5, "quieted": 2},
"delta": {"newCritical": 1, "kev": ["CVE-2025-..."]},
"links": {"ui": "https://ui/.../reports/report-3def...", "rekor": "https://rekor/..."},
"dsse": { "...": "..." },
"report": { "...": "..." }
}
```
Payload embeds both the canonical report document and the DSSE envelope so connectors, Notify, and UI tooling can reuse the signed bytes without re-serialising.
* `scanner.scan.completed`:
```json
{
"reportId": "report-3def...",
"digest": "sha256:...",
"verdict": "fail",
"summary": {"total": 12, "blocked": 2, "warned": 3, "ignored": 5, "quieted": 2},
"delta": {"newCritical": 1, "kev": ["CVE-2025-..."]},
"policy": {"revisionId": "rev-42", "digest": "27d2..."},
"findings": [{"id": "finding-1", "severity": "Critical", "cve": "CVE-2025-...", "reachability": "runtime"}],
"dsse": { "...": "..." }
}
```
* `zastava.admission`:
```json
{ "decision":"deny|allow", "reasons":["unsigned image","missing SBOM"],
"images":[{"digest":"sha256:...","signed":false,"hasSbom":false}] }
```
---
## 4) Rules engine — semantics
**Rule shape (simplified):**
```yaml
name: "high-critical-alerts-prod"
enabled: true
match:
eventKinds: ["scanner.report.ready","scheduler.rescan.delta","zastava.admission"]
namespaces: ["prod-*"]
repos: ["ghcr.io/acme/*"]
minSeverity: "high" # min of new findings (delta context)
kev: true # require KEV-tagged or allow any if false
verdict: ["fail","deny"] # filter for report/admission
vex:
includeRejectedJustifications: false # notify only on accepted 'affected'
actions:
- channel: "slack:sec-alerts" # reference to Channel object
template: "concise"
throttle: "5m"
- channel: "email:soc"
digest: "hourly"
template: "detailed"
```
**Evaluation order**
1. **Tenant check** → discard if rule tenant ≠ event tenant.
2. **Kind filter** → discard early.
3. **Scope match** (namespace/repo/labels).
4. **Delta/severity gates** (if event carries `delta`).
5. **VEX gate** (drop if events finding is not affected under policy consensus unless rule says otherwise).
6. **Throttling/dedup** (idempotency key) — skip if suppressed.
7. **Actions** → enqueue perchannel job(s).
**Idempotency key**: `hash(ruleId | actionId | event.kind | scope.digest | delta.hash | day-bucket)`; ensures “same alert” doesnt fire more than once within throttle window.
**Digest windows**: maintain per action a **coalescer**:
* Window: `5m|15m|1h|1d` (configurable); coalesces events by tenant + namespace/repo or by digest group.
* Digest messages summarize top N items and counts, with safe truncation.
---
## 5) Channels & connectors (plugins)
Channel config is **twopart**: a **Channel** record (name, type, options) and a Secret **reference** (Vault/K8s Secret). Connectors are **restart-time plug-ins** discovered on service start (same manifest convention as Concelier/Excititor) and live under `plugins/notify/<channel>/`.
**Builtin v1:**
* **Slack**: Bot token (xoxb…), `chat.postMessage` + `blocks`; rate limit aware (HTTP 429).
* **Microsoft Teams**: Incoming Webhook (or Graph card later); adaptive card payloads.
* **Email (SMTP)**: TLS (STARTTLS or implicit), From/To/CC/BCC; HTML+text alt; DKIM optional.
* **Generic Webhook**: POST JSON with HMAC signature (Ed25519 or SHA256) in headers.
**Connector contract:** (implemented by plug-in assemblies)
```csharp
public interface INotifyConnector {
string Type { get; } // "slack" | "teams" | "email" | "webhook" | ...
Task<DeliveryResult> SendAsync(DeliveryContext ctx, CancellationToken ct);
Task<HealthResult> HealthAsync(ChannelConfig cfg, CancellationToken ct);
}
```
**DeliveryContext** includes **rendered content** and **raw event** for audit.
**Test-send previews.** Plug-ins can optionally implement `INotifyChannelTestProvider` to shape `/channels/{id}/test` responses. Providers receive a sanitised `ChannelTestPreviewContext` (channel, tenant, target, timestamp, trace) and return a `NotifyDeliveryRendered` preview + metadata. When no provider is present, the host falls back to a generic preview so the endpoint always responds.
**Secrets**: `ChannelConfig.secretRef` points to Authoritymanaged secret handle or K8s Secret path; workers load at send-time; plug-in manifests (`notify-plugin.json`) declare capabilities and version.
---
## 6) Templates & rendering
**Template engine**: strongly typed, safe Handlebarsstyle; no arbitrary code. Partial templates per channel. Deterministic outputs (prop order, no locale drift unless requested).
**Variables** (examples):
* `event.kind`, `event.ts`, `scope.namespace`, `scope.repo`, `scope.digest`
* `payload.verdict`, `payload.delta.newCritical`, `payload.links.ui`, `payload.links.rekor`
* `topFindings[]` with `purl`, `vulnId`, `severity`
* `policy.name`, `policy.revision` (if available)
**Helpers**:
* `severity_icon(sev)`, `link(text,url)`, `pluralize(n, "finding")`, `truncate(text, n)`, `code(text)`.
**Channel mapping**:
* Slack: title + blocks, limited to 50 blocks/3000 chars per section; long lists → link to UI.
* Teams: Adaptive Card schema 1.5; fallback text for older channels.
* Email: HTML + text; inline table of top N findings, rest behind UI link.
* Webhook: JSON with `event`, `ruleId`, `actionId`, `summary`, `links`, and raw `payload` subset.
**i18n**: template set per locale (English default; Bulgarian builtin).
---
## 7) Data model (Mongo)
Canonical JSON Schemas for rules/channels/events live in `docs/notify/schemas/`. Sample payloads intended for tests/UI mock responses are captured in `docs/notify/samples/`.
**Database**: `notify`
* `rules`
```
{ _id, tenantId, name, enabled, match, actions, createdBy, updatedBy, createdAt, updatedAt }
```
* `channels`
```
{ _id, tenantId, name:"slack:sec-alerts", type:"slack",
config:{ webhookUrl?:"", channel:"#sec-alerts", workspace?: "...", secretRef:"ref://..." },
createdAt, updatedAt }
```
* `deliveries`
```
{ _id, tenantId, ruleId, actionId, eventId, kind, scope, status:"sent|failed|throttled|digested|dropped",
attempts:[{ts, status, code, reason}],
rendered:{ title, body, target }, // redacted for PII; body hash stored
sentAt, lastError? }
```
* `digests`
```
{ _id, tenantId, actionKey, window:"hourly", openedAt, items:[{eventId, scope, delta}], status:"open|flushed" }
```
* `throttles`
```
{ key:"idem:<hash>", ttlAt } // short-lived, also cached in Redis
```
**Indexes**: rules by `{tenantId, enabled}`, deliveries by `{tenantId, sentAt desc}`, digests by `{tenantId, actionKey}`.
---
## 8) External APIs (WebService)
Base path: `/api/v1/notify` (Authority OpToks; scopes: `notify.admin` for write, `notify.read` for view).
*All* REST calls require the tenant header `X-StellaOps-Tenant` (matches the canonical `tenantId` stored in Mongo). Payloads are normalised via `NotifySchemaMigration` before persistence to guarantee schema version pinning.
Authentication today is stubbed with Bearer tokens (`Authorization: Bearer <token>`). When Authority wiring lands, this will switch to OpTok validation + scope enforcement, but the header contract will remain the same.
Service configuration exposes `notify:auth:*` keys (issuer, audience, signing key, scope names) so operators can wire the Authority JWKS or (in dev) a symmetric test key. `notify:storage:*` keys cover Mongo URI/database/collection overrides. Both sets are required for the new API surface.
Internal tooling can hit `/internal/notify/<entity>/normalize` to upgrade legacy JSON and return canonical output used in the docs fixtures.
* **Channels**
* `POST /channels` | `GET /channels` | `GET /channels/{id}` | `PATCH /channels/{id}` | `DELETE /channels/{id}`
* `POST /channels/{id}/test` → send sample message (no rule evaluation); returns `202 Accepted` with rendered preview + metadata (base keys: `channelType`, `target`, `previewProvider`, `traceId` + connector-specific entries); governed by `api.rateLimits:testSend`.
* `GET /channels/{id}/health` → connector selfcheck
* **Rules**
* `POST /rules` | `GET /rules` | `GET /rules/{id}` | `PATCH /rules/{id}` | `DELETE /rules/{id}`
* `POST /rules/{id}/test` → dryrun rule against a **sample event** (no delivery unless `--send`)
* **Deliveries**
* `POST /deliveries` → ingest worker delivery state (idempotent via `deliveryId`).
* `GET /deliveries?since=...&status=...&limit=...` → list envelope `{ items, count, continuationToken }` (most recent first); base metadata keys match the test-send response (`channelType`, `target`, `previewProvider`, `traceId`); rate-limited via `api.rateLimits.deliveryHistory`. See `docs/notify/samples/notify-delivery-list-response.sample.json`.
* `GET /deliveries/{id}` → detail (redacted body + metadata)
* `POST /deliveries/{id}/retry` → force retry (admin, future sprint)
* **Admin**
* `GET /stats` (per tenant counts, last hour/day)
* `GET /healthz|readyz` (liveness)
* `POST /locks/acquire` | `POST /locks/release` worker coordination primitives (short TTL).
* `POST /digests` | `GET /digests/{actionKey}` | `DELETE /digests/{actionKey}` manage open digest windows.
* `POST /audit` | `GET /audit?since=&limit=` append/query structured audit trail entries.
**Ingestion**: workers do **not** expose public ingestion; they **subscribe** to the internal bus. (Optional `/events/test` for integration testing, adminonly.)
---
## 9) Delivery pipeline (worker)
```
[Event bus] → [Ingestor] → [RuleMatcher] → [Throttle/Dedupe] → [DigestCoalescer] → [Renderer] → [Connector] → [Result]
└────────→ [DeliveryStore]
```
* **Ingestor**: N consumers with perkey ordering (key = tenant|digest|namespace).
* **RuleMatcher**: loads active rules snapshot for tenant into memory; vectorized predicate check.
* **Throttle/Dedupe**: consult Redis + Mongo `throttles`; if hit → record `status=throttled`.
* **DigestCoalescer**: append to open digest window or flush when timer expires.
* **Renderer**: select template (channel+locale), inject variables, enforce length limits, compute `bodyHash`.
* **Connector**: send; handle providerspecific rate limits and backoffs; `maxAttempts` with exponential jitter; overflow → DLQ (deadletter topic) + UI surfacing.
**Idempotency**: per action **idempotency key** stored in Redis (TTL = `throttle window` or `digest window`). Connectors also respect **provider** idempotency where available (e.g., Slack `client_msg_id`).
---
## 10) Reliability & rate controls
* **Pertenant** RPM caps (default 600/min) + **perchannel** concurrency (Slack 14, Teams 12, Email 832 based on relay).
* **Backoff** map: Slack 429 → respect `RetryAfter`; SMTP 4xx → retry; 5xx → retry with jitter; permanent rejects → drop with status recorded.
* **DLQ**: NATS/Redis stream `notify.dlq` with `{event, rule, action, error}` for operator inspection; UI shows DLQ items.
---
## 11) Security & privacy
* **AuthZ**: all APIs require **Authority** OpToks; actions scoped by tenant.
* **Secrets**: `secretRef` only; Notify fetches justintime from Authority Secret proxy or K8s Secret (mounted). No plaintext secrets in Mongo.
* **Egress TLS**: validate SSL; pin domains per channel config; optional CA bundle override for onprem SMTP.
* **Webhook signing**: HMAC or Ed25519 signatures in `X-StellaOps-Signature` + replaywindow timestamp; include canonical body hash in header.
* **Redaction**: deliveries store **hashes** of bodies, not full payloads for chat/email to minimize PII retention (configurable).
* **Quiet hours**: per tenant (e.g., 22:0006:00) route highsev only; defer others to digests.
* **Loop prevention**: Webhook target allowlist + event origin tags; do not ingest own webhooks.
---
## 12) Observability (Prometheus + OTEL)
* `notify.events_consumed_total{kind}`
* `notify.rules_matched_total{ruleId}`
* `notify.throttled_total{reason}`
* `notify.digest_coalesced_total{window}`
* `notify.sent_total{channel}` / `notify.failed_total{channel,code}`
* `notify.delivery_latency_seconds{channel}` (endtoend)
* **Tracing**: spans `ingest`, `match`, `render`, `send`; correlation id = `eventId`.
**SLO targets**
* Event→delivery p95 **≤ 3060s** under nominal load.
* Failure rate p95 **< 0.5%** per hour (excluding provider outages).
* Duplicate rate **≈ 0** (idempotency working).
---
## 13) Configuration (YAML)
```yaml
notify:
authority:
issuer: "https://authority.internal"
require: "dpop" # or "mtls"
bus:
kind: "redis" # or "nats"
streams:
- "scanner.events"
- "scheduler.events"
- "attestor.events"
- "zastava.events"
mongo:
uri: "mongodb://mongo/notify"
limits:
perTenantRpm: 600
perChannel:
slack: { concurrency: 2 }
teams: { concurrency: 1 }
email: { concurrency: 8 }
webhook: { concurrency: 8 }
digests:
defaultWindow: "1h"
maxItems: 100
quietHours:
enabled: true
window: "22:00-06:00"
minSeverity: "critical"
webhooks:
sign:
method: "ed25519" # or "hmac-sha256"
keyRef: "ref://notify/webhook-sign-key"
```
---
## 14) UI touchpoints
* **Notifications → Channels**: add Slack/Teams/Email/Webhook; run **health**; rotate secrets.
* **Notifications → Rules**: create/edit YAML rules with linting; test with sample events; see match rate.
* **Notifications → Deliveries**: timeline with filters (status, channel, rule); inspect last error; retry.
* **Digest preview**: shows current window contents and when it will flush.
* **Quiet hours**: configure per tenant; show overrides.
* **DLQ**: browse deadletters; requeue after fix.
---
## 15) Failure modes & responses
| Condition | Behavior |
| ----------------------------------- | ------------------------------------------------------------------------------------- |
| Slack 429 / Teams 429 | Respect `RetryAfter`, backoff with jitter, reduce concurrency |
| SMTP transient 4xx | Retry up to `maxAttempts`; escalate to DLQ on exhaust |
| Invalid channel secret | Mark channel unhealthy; suppress sends; surface in UI |
| Rule explosion (matches everything) | Safety valve: pertenant RPM caps; autopause rule after X drops; UI alert |
| Bus outage | Buffer to local queue (bounded); resume consuming when healthy |
| Mongo slowness | Fall back to Redis throttles; batch write deliveries; shed lowpriority notifications |
---
## 16) Testing matrix
* **Unit**: matchers, throttle math, digest coalescing, idempotency keys, template rendering edge cases.
* **Connectors**: providerlevel rate limits, payload size truncation, error mapping.
* **Integration**: synthetic event storm (10k/min), ensure p95 latency & duplicate rate.
* **Security**: DPoP/mTLS on APIs; secretRef resolution; webhook signing & replay windows.
* **i18n**: localized templates render deterministically.
* **Chaos**: Slack/Teams API flaps; SMTP greylisting; Redis hiccups; ensure graceful degradation.
---
## 17) Sequences (representative)
**A) New criticals after Feedser delta (Slack immediate + Email hourly digest)**
```mermaid
sequenceDiagram
autonumber
participant SCH as Scheduler
participant NO as Notify.Worker
participant SL as Slack
participant SMTP as Email
SCH->>NO: bus event scheduler.rescan.delta { newCritical:1, digest:sha256:... }
NO->>NO: match rules (Slack immediate; Email hourly digest)
NO->>SL: chat.postMessage (concise)
SL-->>NO: 200 OK
NO->>NO: append to digest window (email:soc)
Note over NO: At window close → render digest email
NO->>SMTP: send email (detailed digest)
SMTP-->>NO: 250 OK
```
**B) Admission deny (Teams card + Webhook)**
```mermaid
sequenceDiagram
autonumber
participant ZA as Zastava
participant NO as Notify.Worker
participant TE as Teams
participant WH as Webhook
ZA->>NO: bus event zastava.admission { decision: "deny", reasons: [...] }
NO->>TE: POST adaptive card
TE-->>NO: 200 OK
NO->>WH: POST JSON (signed)
WH-->>NO: 2xx
```
---
## 18) Implementation notes
* **Language**: .NET 10; minimal API; `System.Text.Json` with canonical writer for body hashing; Channels for pipelines.
* **Bus**: Redis Streams (**XGROUP** consumers) or NATS JetStream for atleastonce with ack; pertenant consumer groups to localize backpressure.
* **Templates**: compile and cache per rule+channel+locale; version with rule `updatedAt` to invalidate.
* **Rules**: store raw YAML + parsed AST; validate with schema + static checks (e.g., nonsensical combos).
* **Secrets**: pluggable secret resolver (Authority Secret proxy, K8s, Vault).
* **Rate limiting**: `System.Threading.RateLimiting` + perconnector adapters.
---
## 19) Roadmap (postv1)
* **PagerDuty/Opsgenie** connectors; **Jira** ticket creation.
* **User inbox** (inapp notifications) + mobile push via webhook relay.
* **Anomaly suppression**: autopause noisy rules with hints (learned thresholds).
* **Graph rules**: “only notify if *not_affected → affected* transition at consensus layer”.
* **Label enrichment**: pluggable taggers (business criticality, data classification) to refine matchers.

View File

@@ -1,6 +1,6 @@
# component_architecture_scanner.md — **StellaOps Scanner** (2025Q4)
> **Scope.** Implementationready architecture for the **Scanner** subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), perlayer caching, threeway diffs, artifact catalog (MinIO+Mongo), attestation handoff, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Vexer, Feedser, UI, CLI).
> **Scope.** Implementationready architecture for the **Scanner** subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), perlayer caching, threeway diffs, artifact catalog (MinIO+Mongo), attestation handoff, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).
---
@@ -10,7 +10,7 @@
**Boundaries.**
* Scanner **does not** produce PASS/FAIL. The backend (Policy + Vexer + Feedser) decides presentation and verdicts.
* Scanner **does not** produce PASS/FAIL. The backend (Policy + Excititor + Concelier) decides presentation and verdicts.
* Scanner **does not** keep thirdparty SBOM warehouses. It may **bind** to existing attestations for exact hashes.
* Core analyzers are **deterministic** (no fuzzy identity). Optional heuristic plugins (e.g., patchpresence) run under explicit flags and never contaminate the core SBOM.
@@ -40,6 +40,37 @@ src/
└─ StellaOps.Scanner.Sbomer.DockerImage/ # CLIdriven scanner container
```
Analyzer assemblies and buildx generators are packaged as **restart-time plug-ins** under `plugins/scanner/**` with manifests; services must restart to activate new plug-ins.
### 1.1 Queue backbone (Redis / NATS)
`StellaOps.Scanner.Queue` exposes a transport-agnostic contract (`IScanQueue`/`IScanQueueLease`) used by the WebService producer and Worker consumers. Sprint 9 introduces two first-party transports:
- **Redis Streams** (default). Uses consumer groups, deterministic idempotency keys (`scanner:jobs:idemp:*`), and supports lease claim (`XCLAIM`), renewal, exponential-backoff retries, and a `scanner:jobs:dead` stream for exhausted attempts.
- **NATS JetStream**. Provisions the `SCANNER_JOBS` work-queue stream + durable consumer `scanner-workers`, publishes with `MsgId` for dedupe, applies backoff via `NAK` delays, and routes dead-lettered jobs to `SCANNER_JOBS_DEAD`.
Metrics are emitted via `Meter` counters (`scanner_queue_enqueued_total`, `scanner_queue_retry_total`, `scanner_queue_deadletter_total`), and `ScannerQueueHealthCheck` pings the active backend (Redis `PING`, NATS `PING`). Configuration is bound from `scanner.queue`:
```yaml
scanner:
queue:
kind: redis # or nats
redis:
connectionString: "redis://queue:6379/0"
streamName: "scanner:jobs"
nats:
url: "nats://queue:4222"
stream: "SCANNER_JOBS"
subject: "scanner.jobs"
durableConsumer: "scanner-workers"
deadLetterSubject: "scanner.jobs.dead"
maxDeliveryAttempts: 5
retryInitialBackoff: 00:00:05
retryMaxBackoff: 00:02:00
```
The DI extension (`AddScannerQueue`) wires the selected transport, so future additions (e.g., RabbitMQ) only implement the same contract and register.
**Runtime formfactor:** two deployables
* **Scanner.WebService** (stateless REST)
@@ -131,6 +162,10 @@ GET /catalog/artifacts/{id} → { meta }
GET /healthz | /readyz | /metrics
```
### Report events
When `scanner.events.enabled = true`, the WebService serialises the signed report (canonical JSON + DSSE envelope) with `NotifyCanonicalJsonSerializer` and publishes two Redis Stream entries (`scanner.report.ready`, `scanner.scan.completed`) to the configured stream (default `stella.events`). The stream fields carry the whole envelope plus lightweight headers (`kind`, `tenant`, `ts`) so Notify and UI timelines can consume the event bus without recomputing signatures. Publish timeouts and bounded stream length are controlled via `scanner:events:publishTimeoutSeconds` and `scanner:events:maxStreamLength`. If the queue driver is already Redis and no explicit events DSN is provided, the host reuses the queue connection and auto-enables event emission so deployments get live envelopes without extra wiring.
---
## 5) Execution flow (Worker)
@@ -155,6 +190,12 @@ GET /healthz | /readyz | /metrics
* **rpm**: `/var/lib/rpm/Packages` (via librpm or parser)
* Record `name`, `version` (epoch/revision), `arch`, source package where present, and **declared file lists**.
> **Data flow note:** Each OS analyzer now writes its canonical output into the shared `ScanAnalysisStore` under
> `analysis.os.packages` (raw results), `analysis.os.fragments` (per-analyzer layer fragments), and contributes to
> `analysis.layers.fragments` (the aggregated view consumed by emit/diff pipelines). Helpers in
> `ScanAnalysisCompositionBuilder` convert these fragments into SBOM composition requests and component graphs so the
> diff/emit stages no longer reach back into individual analyzer implementations.
**B) Language ecosystems (installed state only)**
* **Java**: `META-INF/maven/*/pom.properties`, MANIFEST → `pkg:maven/...`
@@ -171,6 +212,9 @@ GET /healthz | /readyz | /metrics
* **ELF**: parse `PT_INTERP`, `DT_NEEDED`, RPATH/RUNPATH, **GNU symbol versions**; map **SONAMEs** to file paths; link executables → libs.
* **PE/MachO** (planned M2): import table, delayimports; version resources; code signatures.
* Map libs back to **OS packages** if possible (via file lists); else emit `bin:{sha256}` components.
* The exported metadata (`stellaops.os.*` properties, license list, source package) feeds policy scoring and export pipelines
directly Policy evaluates quiet rules against package provenance while Exporters forward the enriched fields into
downstream JSON/Trivy payloads.
**D) EntryTrace (ENTRYPOINT/CMD → terminal program)**
@@ -402,6 +446,26 @@ ResolveEntrypoint(ImageConfig cfg, RootFs fs):
return Unknown(reason)
```
### Appendix A.1 — EntryTrace Explainability
EntryTrace emits structured diagnostics and metrics so operators can quickly understand why resolution succeeded or degraded:
| Reason | Description | Typical Mitigation |
|--------|-------------|--------------------|
| `CommandNotFound` | A command referenced in the script cannot be located in the layered root filesystem or `PATH`. | Ensure binaries exist in the image or extend `PATH` hints. |
| `MissingFile` | `source`/`.`/`run-parts` targets are missing. | Bundle the script or guard the include. |
| `DynamicEnvironmentReference` | Path depends on `$VARS` that are unknown at scan time. | Provide defaults via scan metadata or accept partial usage. |
| `RecursionLimitReached` | Nested includes exceeded the analyzer depth limit (default 64). | Flatten indirection or increase the limit in options. |
| `RunPartsEmpty` | `run-parts` directory contained no executable entries. | Remove empty directories or ignore if intentional. |
| `JarNotFound` / `ModuleNotFound` | Java/Python targets missing, preventing interpreter tracing. | Ship the jar/module with the image or adjust the launcher. |
Diagnostics drive two metrics published by `EntryTraceMetrics`:
- `entrytrace_resolutions_total{outcome}` — resolution attempts segmented by outcome (`resolved`, `partiallyresolved`, `unresolved`).
- `entrytrace_unresolved_total{reason}` — diagnostic counts keyed by reason.
Structured logs include `entrytrace.path`, `entrytrace.command`, `entrytrace.reason`, and `entrytrace.depth`, all correlated with scan/job IDs. Timestamps are normalized to UTC (microsecond precision) to keep DSSE attestations and UI traces explainable.
### Appendix B — BOMIndex sidecar
```
@@ -410,4 +474,3 @@ vector<string> purls
map<purlIndex, roaring_bitmap> components
optional map<purlIndex, roaring_bitmap> usedByEntrypoint
```

View File

@@ -0,0 +1,424 @@
# component_architecture_scheduler.md — **StellaOps Scheduler** (2025Q4)
> **Scope.** Implementationready architecture for **Scheduler**: a service that (1) **reevaluates** alreadycataloged images when intel changes (Feedser/Vexer/policy), (2) orchestrates **nightly** and **adhoc** runs, (3) targets only the **impacted** images using the BOMIndex, and (4) emits **reportready** events that downstream **Notify** fans out. Default mode is **analysisonly** (no image pull); optional **contentrefresh** can be enabled per schedule.
---
## 0) Mission & boundaries
**Mission.** Keep scan results **current** without rescanning the world. When new advisories or VEX claims land, **pinpoint** affected images and ask the backend to recompute **verdicts** against the **existing SBOMs**. Surface only **meaningful deltas** to humans and ticket queues.
**Boundaries.**
* Scheduler **does not** compute SBOMs and **does not** sign. It calls Scanner/WebServices **/reports (analysisonly)** endpoint and lets the backend (Policy + Vexer + Feedser) decide PASS/FAIL.
* Scheduler **may** ask Scanner to **contentrefresh** selected targets (e.g., mutable tags) but the default is **no** image pull.
* Notifications are **not** sent directly; Scheduler emits events consumed by **Notify**.
---
## 1) Runtime shape & projects
```
src/
├─ StellaOps.Scheduler.WebService/ # REST (schedules CRUD, runs, admin)
├─ StellaOps.Scheduler.Worker/ # planners + runners (N replicas)
├─ StellaOps.Scheduler.ImpactIndex/ # purl→images inverted index (roaring bitmaps)
├─ StellaOps.Scheduler.Models/ # DTOs (Schedule, Run, ImpactSet, Deltas)
├─ StellaOps.Scheduler.Storage.Mongo/ # schedules, runs, cursors, locks
├─ StellaOps.Scheduler.Queue/ # Redis Streams / NATS abstraction
├─ StellaOps.Scheduler.Tests.* # unit/integration/e2e
```
**Deployables**:
* **Scheduler.WebService** (stateless)
* **Scheduler.Worker** (scaleout; planners + executors)
**Dependencies**: Authority (OpTok + DPoP/mTLS), Scanner.WebService, Feedser, Vexer, MongoDB, Redis/NATS, (optional) Notify.
---
## 2) Core responsibilities
1. **Timebased** runs: cron windows per tenant/timezone (e.g., “02:00 Europe/Sofia”).
2. **Eventdriven** runs: react to **Feedser export** and **Vexer export** deltas (changed product keys / advisories / claims).
3. **Impact targeting**: map changes to **image sets** using a **global inverted index** built from Scanners perimage **BOMIndex** sidecars.
4. **Run planning**: shard, pace, and ratelimit jobs to avoid thundering herds.
5. **Execution**: call Scanner **/reports (analysisonly)** or **/scans (contentrefresh)**; aggregate **delta** results.
6. **Events**: publish `rescan.delta` and `report.ready` summaries for **Notify** & **UI**.
7. **Control plane**: CRUD schedules, **pause/resume**, dryrun previews, audit.
---
## 3) Data model (Mongo)
**Database**: `scheduler`
* `schedules`
```
{ _id, tenantId, name, enabled, whenCron, timezone,
mode: "analysis-only" | "content-refresh",
selection: { scope: "all-images" | "by-namespace" | "by-repo" | "by-digest" | "by-labels",
includeTags?: ["prod-*"], digests?: [sha256...], resolvesTags?: bool },
onlyIf: { lastReportOlderThanDays?: int, policyRevision?: string },
notify: { onNewFindings: bool, minSeverity: "low|medium|high|critical", includeKEV: bool },
limits: { maxJobs?: int, ratePerSecond?: int, parallelism?: int },
createdAt, updatedAt, createdBy, updatedBy }
```
* `runs`
```
{ _id, scheduleId?, tenantId, trigger: "cron|feedser|vexer|manual",
reason?: { feedserExportId?, vexerExportId?, cursor? },
state: "planning|queued|running|completed|error|cancelled",
stats: { candidates: int, deduped: int, queued: int, completed: int, deltas: int, newCriticals: int },
startedAt, finishedAt, error? }
```
* `impact_cursors`
```
{ _id: tenantId, feedserLastExportId, vexerLastExportId, updatedAt }
```
* `locks` (singleton schedulers, run leases)
* `audit` (CRUD actions, run outcomes)
**Indexes**:
* `schedules` on `{tenantId, enabled}`, `{whenCron}`.
* `runs` on `{tenantId, startedAt desc}`, `{state}`.
* TTL optional for completed runs (e.g., 180 days).
---
## 4) ImpactIndex (global inverted index)
Goal: translate **change keys** → **image sets** in **milliseconds**.
**Source**: Scanner produces perimage **BOMIndex** sidecars (purls, and `usedByEntrypoint` bitmaps). Scheduler ingests/refreshes them to build a **global** index.
**Representation**:
* Assign **image IDs** (dense ints) to catalog images.
* Keep **Roaring Bitmaps**:
* `Contains[purl] → bitmap(imageIds)`
* `UsedBy[purl] → bitmap(imageIds)` (subset of Contains)
* Optionally keep **Owner maps**: `{imageId → {tenantId, namespaces[], repos[]}}` for selection filters.
* Persist in RocksDB/LMDB or Redismodules; cache hot shards in memory; snapshot to Mongo for cold start.
**Update paths**:
* On new/updated image SBOM: **merge** perimage set into global maps.
* On image remove/expiry: **clear** id from bitmaps.
**API (internal)**:
```csharp
IImpactIndex {
ImpactSet ResolveByPurls(IEnumerable<string> purls, bool usageOnly, Selector sel);
ImpactSet ResolveByVulns(IEnumerable<string> vulnIds, bool usageOnly, Selector sel); // optional (vuln->purl precomputed by Feedser)
ImpactSet ResolveAll(Selector sel); // for nightly
}
```
**Selector filters**: tenant, namespaces, repos, labels, digest allowlists, `includeTags` patterns.
---
## 5) External interfaces (REST)
Base path: `/api/v1/scheduler` (Authority OpToks; scopes: `scheduler.read`, `scheduler.admin`).
### 5.1 Schedules CRUD
* `POST /schedules` → create
* `GET /schedules` → list (filter by tenant)
* `GET /schedules/{id}` → details + next run
* `PATCH /schedules/{id}` → pause/resume/update
* `DELETE /schedules/{id}` → delete (soft delete, optional)
### 5.2 Run control & introspection
* `POST /run` — adhoc run
```json
{ "mode": "analysis-only|content-refresh", "selection": {...}, "reason": "manual" }
```
* `GET /runs` — list with paging
* `GET /runs/{id}` — status, stats, links to deltas
* `POST /runs/{id}/cancel` — besteffort cancel
### 5.3 Previews (dryrun)
* `POST /preview/impact` — returns **candidate count** and a small sample of impacted digests for given change keys or selection.
### 5.4 Event webhooks (optional push from Feedser/Vexer)
* `POST /events/feedser-export`
```json
{ "exportId":"...", "changedProductKeys":["pkg:rpm/openssl", ...], "kev": ["CVE-..."], "window": { "from":"...","to":"..." } }
```
* `POST /events/vexer-export`
```json
{ "exportId":"...", "changedClaims":[ { "productKey":"pkg:deb/...", "vulnId":"CVE-...", "status":"not_affected→affected"} ], ... }
```
**Security**: webhook requires **mTLS** or an **HMAC** `X-Scheduler-Signature` (Ed25519 / SHA256) plus Authority token.
---
## 6) Planner → Runner pipeline
### 6.1 Planning algorithm (eventdriven)
```
On Export Event (Feedser/Vexer):
keys = Normalize(change payload) # productKeys or vulnIds→productKeys
usageOnly = schedule/policy hint? # default true
sel = Selector for tenant/scope from schedules subscribed to events
impacted = ImpactIndex.ResolveByPurls(keys, usageOnly, sel)
impacted = ApplyOwnerFilters(impacted, sel) # namespaces/repos/labels
impacted = DeduplicateByDigest(impacted)
impacted = EnforceLimits(impacted, limits.maxJobs)
shards = Shard(impacted, byHashPrefix, n=limits.parallelism)
For each shard:
Enqueue RunSegment (runId, shard, rate=limits.ratePerSecond)
```
**Fairness & pacing**
* Use **leaky bucket** per tenant and per registry host.
* Prioritize **KEVtagged** and **critical** first if oversubscribed.
### 6.2 Nightly planning
```
At cron tick:
sel = resolve selection
candidates = ImpactIndex.ResolveAll(sel)
if lastReportOlderThanDays present → filter by report age (via Scanner catalog)
shard & enqueue as above
```
### 6.3 Execution (Runner)
* Pop **RunSegment** job → for each image digest:
* **analysisonly**: `POST scanner/reports { imageDigest, policyRevision? }`
* **contentrefresh**: resolve tag→digest if needed; `POST scanner/scans { imageRef, attest? false }` then `POST /reports`
* Collect **delta**: `newFindings`, `newCriticals`/`highs`, `links` (UI deep link, Rekor if present).
* Persist perimage outcome in `runs.{id}.stats` (incremental counters).
* Emit `scheduler.rescan.delta` events to **Notify** only when **delta > 0** and matches severity rule.
---
## 7) Event model (outbound)
**Topic**: `rescan.delta` (internal bus → Notify; UI subscribes via backend).
```json
{
"tenant": "tenant-01",
"runId": "324af…",
"imageDigest": "sha256:…",
"newCriticals": 1,
"newHigh": 2,
"kevHits": ["CVE-2025-..."],
"topFindings": [
{ "purl":"pkg:rpm/openssl@3.0.12-...","vulnId":"CVE-2025-...","severity":"critical","link":"https://ui/scans/..." }
],
"reportUrl": "https://ui/.../scans/sha256:.../report",
"attestation": { "uuid":"rekor-uuid", "verified": true },
"ts": "2025-10-18T03:12:45Z"
}
```
**Also**: `report.ready` for “nochange” summaries (digest + zero delta), which Notify can ignore by rule.
---
## 8) Security posture
* **AuthN/Z**: Authority OpToks with `aud=scheduler`; DPoP (preferred) or mTLS.
* **Multitenant**: every schedule, run, and event carries `tenantId`; ImpactIndex filters by tenantvisible images.
* **Webhook** callers (Feedser/Vexer) present **mTLS** or **HMAC** and Authority token.
* **Input hardening**: size caps on changed key lists; reject >100k keys per event; compress (zstd/gzip) allowed with limits.
* **No secrets** in logs; redact tokens and signatures.
---
## 9) Observability & SLOs
**Metrics (Prometheus)**
* `scheduler.events_total{source, result}`
* `scheduler.impact_resolve_seconds{quantile}`
* `scheduler.images_selected_total{mode}`
* `scheduler.jobs_enqueued_total{mode}`
* `scheduler.run_latency_seconds{quantile}` // event → first verdict
* `scheduler.delta_images_total{severity}`
* `scheduler.rate_limited_total{reason}`
**Targets**
* Resolve 10k changed keys → impacted set in **<300ms** (hot cache).
* Event → first rescan verdict in **≤60s** (p95).
* Nightly coverage 50k images in **≤10min** with 10 workers (analysisonly).
**Tracing** (OTEL): spans `plan`, `resolve`, `enqueue`, `report_call`, `persist`, `emit`.
---
## 10) Configuration (YAML)
```yaml
scheduler:
authority:
issuer: "https://authority.internal"
require: "dpop" # or "mtls"
queue:
kind: "redis" # or "nats"
url: "redis://redis:6379/4"
mongo:
uri: "mongodb://mongo/scheduler"
impactIndex:
storage: "rocksdb" # "rocksdb" | "redis" | "memory"
warmOnStart: true
usageOnlyDefault: true
limits:
defaultRatePerSecond: 50
defaultParallelism: 8
maxJobsPerRun: 50000
integrates:
scannerUrl: "https://scanner-web.internal"
feedserWebhook: true
vexerWebhook: true
notifications:
emitBus: "internal" # deliver to Notify via internal bus
```
---
## 11) UI touchpoints
* **Schedules** page: CRUD, enable/pause, next run, last run stats, mode (analysis/content), selector preview.
* **Runs** page: timeline; heatmap of deltas; drilldown to affected images.
* **Dryrun preview** modal: “This Feedser export touches ~3,214 images; projected deltas: ~420 (34 KEV).”
---
## 12) Failure modes & degradations
| Condition | Behavior |
| ------------------------------------ | ---------------------------------------------------------------------------------------- |
| ImpactIndex cold / incomplete | Fall back to **All** selection for nightly; for events, cap to KEV+critical until warmed |
| Feedser/Vexer webhook storm | Coalesce by exportId; debounce 3060s; keep last |
| Scanner under load (429) | Backoff with jitter; respect pertenant/leaky bucket |
| Oversubscription (too many impacted) | Prioritize KEV/critical first; spillover to next window; UI banner shows backlog |
| Notify down | Buffer outbound events in queue (TTL 24h) |
| Mongo slow | Cut batch sizes; samplelog; alert ops; dont drop runs unless critical |
---
## 13) Testing matrix
* **ImpactIndex**: correctness (purl→image sets), performance, persistence after restart, memory pressure with 1M purls.
* **Planner**: dedupe, shard, fairness, limit enforcement, KEV prioritization.
* **Runner**: parallel report calls, error backoff, partial failures, idempotency.
* **Endtoend**: Feedser export → deltas visible in UI in ≤60s.
* **Security**: webhook auth (mTLS/HMAC), DPoP nonce dance, tenant isolation.
* **Chaos**: drop scanner availability; simulate registry throttles (contentrefresh mode).
* **Nightly**: cron tick correctness across timezones and DST.
---
## 14) Implementation notes
* **Language**: .NET 10 minimal API; Channelsbased pipeline; `System.Threading.RateLimiting`.
* **Bitmaps**: Roaring via `RoaringBitmap` bindings; memorymap large shards if RocksDB used.
* **Cron**: Quartzstyle parser with timezone support; clock skew tolerated ±60s.
* **Dryrun**: use ImpactIndex only; never call scanner.
* **Idempotency**: run segments carry deterministic keys; retries safe.
* **Backpressure**: pertenant buckets; perhost registry budgets respected when contentrefresh enabled.
---
## 15) Sequences (representative)
**A) Eventdriven rescan (Feedser delta)**
```mermaid
sequenceDiagram
autonumber
participant FE as Feedser
participant SCH as Scheduler.Worker
participant IDX as ImpactIndex
participant SC as Scanner.WebService
participant NO as Notify
FE->>SCH: POST /events/feedser-export {exportId, changedProductKeys}
SCH->>IDX: ResolveByPurls(keys, usageOnly=true, sel)
IDX-->>SCH: bitmap(imageIds) → digests list
SCH->>SC: POST /reports {imageDigest} (batch/sequenced)
SC-->>SCH: report deltas (new criticals/highs)
alt delta>0
SCH->>NO: rescan.delta {digest, newCriticals, links}
end
```
**B) Nightly rescan**
```mermaid
sequenceDiagram
autonumber
participant CRON as Cron
participant SCH as Scheduler.Worker
participant IDX as ImpactIndex
participant SC as Scanner.WebService
CRON->>SCH: tick (02:00 Europe/Sofia)
SCH->>IDX: ResolveAll(selector)
IDX-->>SCH: candidates
SCH->>SC: POST /reports {digest} (paced)
SC-->>SCH: results
SCH-->>SCH: aggregate, store run stats
```
**C) Contentrefresh (tag followers)**
```mermaid
sequenceDiagram
autonumber
participant SCH as Scheduler
participant SC as Scanner
SCH->>SC: resolve tag→digest (if changed)
alt digest changed
SCH->>SC: POST /scans {imageRef} # new SBOM
SC-->>SCH: scan complete (artifacts)
SCH->>SC: POST /reports {imageDigest}
else unchanged
SCH->>SC: POST /reports {imageDigest} # analysis-only
end
```
---
## 16) Roadmap
* **Vulncentric impact**: prejoin vuln→purl→images to rank by **KEV** and **exploitedinthewild** signals.
* **Policy diff preview**: when a staged policy changes, show projected breakage set before promotion.
* **Crosscluster federation**: one Scheduler instance driving many Scanner clusters (tenant isolation).
* **Windows containers**: integrate Zastava runtime hints for Usage view tightening.
---
**End — component_architecture_scheduler.md**

View File

@@ -223,7 +223,7 @@ Supported **predicate types** (extensible):
* `https://stella-ops.org/attestations/sbom/1` (SBOM emissions)
* `https://stella-ops.org/attestations/report/1` (final PASS/FAIL reports)
* `https://stella-ops.org/attestations/vex-export/1` (Vexer exports; optional)
* `https://stella-ops.org/attestations/vex-export/1` (Excititor exports; optional)
**Validation**:

View File

@@ -1,6 +1,6 @@
# component_architecture_web_ui.md — **StellaOps Web UI** (2025Q4)
> **Scope.** Implementationready architecture for the **Angular SPA** that operators and developers use to drive StellaOps. This document defines UX surfaces, module boundaries, data flows, auth, RBAC, realtime updates, performance targets, i18n/a11y, security headers, testing and deployment. The UI is a *consumer* of backend APIs (Scanner, Policy, Vexer, Feedser, Attestor, Authority) and never performs scanning, merging, or signing on its own.
> **Scope.** Implementationready architecture for the **Angular SPA** that operators and developers use to drive StellaOps. This document defines UX surfaces, module boundaries, data flows, auth, RBAC, realtime updates, performance targets, i18n/a11y, security headers, testing and deployment. The UI is a *consumer* of backend APIs (Scanner, Policy, Excititor, Concelier, Attestor, Authority) and never performs scanning, merging, or signing on its own.
---
@@ -10,7 +10,7 @@
* Scans (status, SBOMs, diffs, EntryTrace, attestation).
* Policy management (rules, exemptions, VEX consumption view).
* Vulnerability intel (Feedser status), VEX consensus exploration (Vexer).
* Vulnerability intel (Concelier status), VEX consensus exploration (Excititor).
* Runtime posture (Zastava observer + admission).
* Admin operations (tenants, tokens, quotas, licensing posture).
@@ -42,7 +42,7 @@
├─ runtime/ # Zastava posture, drift events, admission decisions
├─ policy/ # rules editor (YAML/Rego), exemptions, previews
├─ vex/ # VEX explorer (claims, consensus, conflicts)
├─ feedser/ # source health, export cursors, rebuild/export triggers
├─ concelier/ # source health, export cursors, rebuild/export triggers
├─ attest/ # attestation proofs, verification bundles, Rekor links
├─ admin/ # tenants, roles, clients, quotas, licensing posture
└─ plugins/ # route plug-ins (lazy remote modules, governed)
@@ -86,13 +86,13 @@ Each feature folder builds as a **standalone route** (lazy loaded). All HTTP sha
* **VEX inclusion controls**: weight sliders (visualization only), provider allow/deny toggles.
* **Preview**: select SBOM (or image digest) → show verdict under staged policy.
### 3.5 Vexer
### 3.5 Excititor
* **Claims explorer**: search by vulnId/productKey/provider; show raw claim (status, justification, evidence).
* **Consensus view**: rollup per (vuln, product) with accepted/rejected sources, weights, timestamps.
* **Conflicts**: grid of top conflicts; filters for justification gates failed.
### 3.6 Feedser
### 3.6 Concelier
* **Sources** table: staleness, last run, errors.
* **Advisory search**: by CVE/alias; show normalized affected ranges.
@@ -136,7 +136,7 @@ Each feature folder builds as a **standalone route** (lazy loaded). All HTTP sha
* **`core/http/api-client.ts`** centralizes:
* Base URLs (Scanner, Vexer, Feedser, Attestor).
* Base URLs (Scanner, Excititor, Concelier, Attestor).
* **Retry** policies on idempotent GETs (backoff + jitter).
* **Problem+JSON** parser → uniform error toasts with correlation ID.
* **SSE** helper (EventSource) with autoreconnect & backpressure.
@@ -144,7 +144,7 @@ Each feature folder builds as a **standalone route** (lazy loaded). All HTTP sha
* Typed API clients (DTOs in `core/api/models.ts`):
* `ScannerApi`, `PolicyApi`, `VexerApi`, `FeedserApi`, `AttestorApi`, `AuthorityApi`.
* `ScannerApi`, `PolicyApi`, `ExcititorApi`, `ConcelierApi`, `AttestorApi`, `AuthorityApi`.
**DTO examples (abbrev):**
@@ -166,6 +166,28 @@ export interface VexConsensus {
}
```
*Upcoming:* `NotifyApi` consumes delivery history using the new paginated envelope returned by `/api/v1/notify/deliveries`.
```ts
export interface NotifyDeliveryListResponse {
items: NotifyDelivery[];
count: number;
continuationToken?: string;
}
export interface NotifyDelivery {
deliveryId: string;
ruleId: string;
actionId: string;
status: 'pending'|'sent'|'failed'|'throttled'|'digested'|'dropped';
rendered: NotifyDeliveryRendered;
metadata: Record<string, string>; // includes channelType, target, previewProvider, traceId, and provider-specific entries
createdAt: string;
sentAt?: string;
completedAt?: string;
}
```
---
## 6) State, caching & realtime
@@ -184,7 +206,7 @@ export interface VexConsensus {
* **Huge tables** rendered with **virtual scrolling** (CDK Virtual Scroll); sort/filter performed clientside for ≤ 20k rows; beyond that, serverside queries via BOMIndex endpoints.
* **Component row** shows purl, version, origin (OS pkg / metadata / linker / attested), licenses, and **used** badge (Usage view).
* **Diff**: compact heatmap per layer; clicking opens a rightpane with evidence: introducing paths, file hashes, VEX notes (from Vexer consensus) and links to advisories (Feedser).
* **Diff**: compact heatmap per layer; clicking opens a rightpane with evidence: introducing paths, file hashes, VEX notes (from Excititor consensus) and links to advisories (Concelier).
---

View File

@@ -1,451 +1,451 @@
# component_architecture_zastava.md — **StellaOps Zastava** (2025Q4)
> **Scope.** Implementationready architecture for **Zastava**: the **runtime inspector/enforcer** that watches real workloads, detects drift from the scanned baseline, verifies image/SBOM/attestation posture, and (optionally) **admits/blocks** deployments. Includes Kubernetes & plainDocker topologies, data contracts, APIs, security posture, performance targets, test matrices, and failure modes.
---
## 0) Mission & boundaries
**Mission.** Give operators **groundtruth** from running environments and a **fast guardrail** before workloads land:
* **Observer:** inventory containers, entrypoints actually executed, and DSOs actually loaded; verify **image signature**, **SBOM referrers**, and **attestation** presence; detect **drift** (unexpected processes/paths) and **policy violations**; publish **runtime events** to Scanner.WebService.
* **Admission (optional):** Kubernetes ValidatingAdmissionWebhook that enforces minimal posture (signed images, SBOM availability, known base images, policy PASS) **preflight**.
**Boundaries.**
* Zastava **does not** compute SBOMs and does not sign; it **consumes** Scanner/WebService outputs and **enforces** backend policy verdicts.
* Zastava can **request** a delta scan when the baseline is missing/stale, but scanning is done by **Scanner.Worker**.
* On nonK8s Docker hosts, Zastava runs as a host service with **observeronly** features.
---
## 1) Topology & processes
### 1.1 Components (Kubernetes)
```
stellaops/zastava-observer # DaemonSet on every node (read-only host mounts)
stellaops/zastava-webhook # ValidatingAdmissionWebhook (Deployment, 2+ replicas)
```
### 1.2 Components (Docker/VM)
```
stellaops/zastava-agent # System service; watch Docker events; observer only
```
### 1.3 Dependencies
* **Authority** (OIDC): short OpToks (DPoP/mTLS) for API calls to Scanner.WebService.
* **Scanner.WebService**: `/runtime/events` ingestion; `/policy/runtime` fetch.
* **OCI Registry** (optional): for direct referrers/sig checks if not delegated to backend.
* **Container runtime**: containerd/CRIO/Docker (read interfaces only).
* **Kubernetes API** (watch Pods in cluster; validating webhook).
* **Host mounts** (K8s DaemonSet): `/proc`, `/var/lib/containerd` (or CRIO), `/run/containerd/containerd.sock` (optional, readonly).
---
## 2) Data contracts
### 2.1 Runtime event (observer → Scanner.WebService)
```json
{
"eventId": "9f6a…",
"when": "2025-10-17T12:34:56Z",
"kind": "CONTAINER_START|CONTAINER_STOP|DRIFT|POLICY_VIOLATION|ATTESTATION_STATUS",
"tenant": "tenant-01",
"node": "ip-10-0-1-23",
"runtime": { "engine": "containerd", "version": "1.7.19" },
"workload": {
"platform": "kubernetes",
"namespace": "payments",
"pod": "api-7c9fbbd8b7-ktd84",
"container": "api",
"containerId": "containerd://...",
"imageRef": "ghcr.io/acme/api@sha256:abcd…",
"owner": { "kind": "Deployment", "name": "api" }
},
"process": {
"pid": 12345,
"entrypoint": ["/entrypoint.sh", "--serve"],
"entryTrace": [
{"file":"/entrypoint.sh","line":3,"op":"exec","target":"/usr/bin/python3"},
{"file":"<argv>","op":"python","target":"/opt/app/server.py"}
]
},
"loadedLibs": [
{ "path": "/lib/x86_64-linux-gnu/libssl.so.3", "inode": 123456, "sha256": "…"},
{ "path": "/usr/lib/x86_64-linux-gnu/libcrypto.so.3", "inode": 123457, "sha256": "…"}
],
"posture": {
"imageSigned": true,
"sbomReferrer": "present|missing",
"attestation": { "uuid": "rekor-uuid", "verified": true }
},
"delta": {
"baselineImageDigest": "sha256:abcd…",
"changedFiles": ["/opt/app/server.py"], // optional quick signal
"newBinaries": [{ "path":"/usr/local/bin/helper","sha256":"…" }]
},
"evidence": [
{"signal":"procfs.maps","value":"/lib/.../libssl.so.3@0x7f..."},
{"signal":"cri.task.inspect","value":"pid=12345"},
{"signal":"registry.referrers","value":"sbom: application/vnd.cyclonedx+json"}
]
}
```
### 2.2 Admission decision (webhook → API server)
```json
{
"admissionId": "…",
"namespace": "payments",
"podSpecDigest": "sha256:…",
"images": [
{
"name": "ghcr.io/acme/api:1.2.3",
"resolved": "ghcr.io/acme/api@sha256:abcd…",
"signed": true,
"hasSbomReferrers": true,
"policyVerdict": "pass|warn|fail",
"reasons": ["unsigned base image", "missing SBOM"]
}
],
"decision": "Allow|Deny",
"ttlSeconds": 300
}
```
---
## 3) Observer — node agent (DaemonSet)
### 3.1 Responsibilities
* **Watch** container lifecycle (start/stop) via CRI (`/run/containerd/containerd.sock` gRPC readonly) or `/var/log/containers/*.log` tail fallback.
* **Resolve** container → image digest, mount point rootfs.
* **Trace entrypoint**: attach **shortlived** nsenter/exec to PID 1 in container, parse shell for `exec` chain (bounded depth), record **terminal program**.
* **Sample loaded libs**: read `/proc/<pid>/maps` and `exe` symlink to collect **actually loaded** DSOs; compute **sha256** for each mapped file (bounded count/size).
* **Posture check** (cheap):
* Image signature presence (if cosign policies are local; else ask backend).
* SBOM **referrers** presence (HEAD to registry, optional).
* Rekor UUID known (query Scanner.WebService by image digest).
* **Publish runtime events** to Scanner.WebService `/runtime/events` (batch & compress).
* **Request delta scan** if: no SBOM in catalog OR base differs from known baseline.
### 3.2 Privileges & mounts (K8s)
* **SecurityContext:** `runAsUser: 0`, `readOnlyRootFilesystem: true`, `allowPrivilegeEscalation: false`.
* **Capabilities:** `CAP_SYS_PTRACE` (optional if using nsenter trace), `CAP_DAC_READ_SEARCH`.
* **Host mounts (readonly):**
* `/proc` (host) → `/host/proc`
* `/run/containerd/containerd.sock` (or CRIO socket)
* `/var/lib/containerd/io.containerd.runtime.v2.task` (rootfs paths & pids)
* **Networking:** clusterinternal egress to Scanner.WebService only.
* **Rate limits:** hard caps for bytes hashed and file count per container to avoid noisy tenants.
### 3.3 Event batching
* Buffer NDJSON; flush by **N events** or **2s**.
* Backpressure: local disk ring buffer (50MB default) if Scanner is temporarily unavailable; drop oldest after cap with **metrics** and **warning** event.
---
## 4) Admission Webhook (Kubernetes)
### 4.1 Gate criteria
Configurable policy (fetched from backend and cached):
* **Image signature**: must be cosignverifiable to configured key(s) or keyless identities.
* **SBOM availability**: at least one **CycloneDX** referrer or **Scanner.WebService** catalog entry.
* **Scanner policy verdict**: backend `PASS` required for namespaces/labels matching rules; allow `WARN` if configured.
* **Registry allowlists/denylists**.
* **Tag bans** (e.g., `:latest`).
* **Base image allowlists** (by digest).
### 4.2 Flow
```mermaid
sequenceDiagram
autonumber
participant K8s as API Server
participant WH as Zastava Webhook
participant SW as Scanner.WebService
K8s->>WH: AdmissionReview(Pod)
WH->>WH: Resolve images to digests (remote HEAD/pull if needed)
WH->>SW: POST /policy/runtime { digests, namespace, labels }
SW-->>WH: { per-image: {signed, hasSbom, verdict, reasons}, ttl }
alt All pass
WH-->>K8s: AdmissionResponse(Allow, ttl)
else Any fail (enforce=true)
WH-->>K8s: AdmissionResponse(Deny, message)
end
```
**Caching:** Perdigest result cached `ttlSeconds` (default 300 s). **Failopen** or **failclosed** is configurable per namespace.
### 4.3 TLS & HA
* Webhook has its own **serving cert** signed by cluster CA (or custom cert + CA bundle on configuration).
* Deployment ≥ 2 replicas; **leaderless**; stateless.
---
## 5) Backend integration (Scanner.WebService)
### 5.1 Ingestion endpoint
`POST /api/v1/scanner/runtime/events` *(OpTok + DPoP/mTLS)*
* Validates event schema; enforces rate caps by tenant/node; persists to **Mongo** (`runtime.events` capped collection or regular with TTL).
* Performs **correlation**:
* Attach nearest **image SBOM** (inventory/usage) and **BOMIndex** if known.
* If unknown/missing, schedule **delta scan** and return `202 Accepted`.
* Emits **derived signals** (usedByEntrypoint per component based on `/proc/<pid>/maps`).
### 5.2 Policy decision API (for webhook)
`POST /api/v1/scanner/policy/runtime`
Request:
```json
{
"namespace": "payments",
"labels": { "app": "api", "env": "prod" },
"images": ["ghcr.io/acme/api@sha256:...", "ghcr.io/acme/nginx@sha256:..."]
}
```
Response:
```json
{
"ttlSeconds": 300,
"results": {
"ghcr.io/acme/api@sha256:...": {
"signed": true,
"hasSbom": true,
"policyVerdict": "pass",
"reasons": [],
"rekor": { "uuid": "..." }
},
"ghcr.io/acme/nginx@sha256:...": {
"signed": false,
"hasSbom": false,
"policyVerdict": "fail",
"reasons": ["unsigned", "missing SBOM"]
}
}
}
```
---
## 6) Configuration (YAML)
```yaml
zastava:
mode:
observer: true
webhook: true
authority:
issuer: "https://authority.internal"
aud: ["scanner","zastava"] # tokens for backend and self-id
backend:
url: "https://scanner-web.internal"
connectTimeoutMs: 500
requestTimeoutMs: 1500
retry: { attempts: 3, backoffMs: 200 }
runtime:
engine: "auto" # containerd|cri-o|docker|auto
procfs: "/host/proc"
collect:
entryTrace: true
loadedLibs: true
maxLibs: 256
maxHashBytesPerContainer: 64_000_000
maxDepth: 48
admission:
enforce: true
failOpenNamespaces: ["dev", "test"]
verify:
imageSignature: true
sbomReferrer: true
scannerPolicyPass: true
cacheTtlSeconds: 300
resolveTags: true # do remote digest resolution for tag-only images
limits:
eventsPerSecond: 50
burst: 200
perNodeQueue: 10_000
security:
mounts:
containerdSock: "/run/containerd/containerd.sock:ro"
proc: "/proc:/host/proc:ro"
runtimeState: "/var/lib/containerd:ro"
```
---
## 7) Security posture
* **AuthN/Z**: Authority OpToks (DPoP preferred) to backend; webhook does **not** require client auth from API server (K8s handles).
* **Least privileges**: readonly host mounts; optional `CAP_SYS_PTRACE`; **no** host networking; **no** write mounts.
* **Isolation**: never exec untrusted code; nsenter only to **read** `/proc/<pid>`.
* **Data minimization**: do not exfiltrate env vars or command arguments unless policy explicitly enables diagnostic mode.
* **Rate limiting**: pernode caps; pertenant caps at backend.
* **Hard caps**: bytes hashed, files inspected, depth of shell parsing.
---
## 8) Metrics, logs, tracing
**Observer**
* `zastava.events_emitted_total{kind}`
* `zastava.proc_maps_samples_total{result}`
* `zastava.entrytrace_depth{p99}`
* `zastava.hash_bytes_total`
* `zastava.buffer_drops_total`
**Webhook**
* `zastava.admission_requests_total{decision}`
* `zastava.admission_latency_seconds`
* `zastava.cache_hits_total`
* `zastava.backend_failures_total`
**Logs** (structured): node, pod, image digest, decision, reasons.
**Tracing**: spans for observe→batch→post; webhook request→resolve→respond.
---
## 9) Performance & scale targets
* **Observer**: ≤ **30ms** to sample `/proc/<pid>/maps` and compute quick hashes for ≤ 64 files; ≤ **200ms** for full library set (256 libs).
* **Webhook**: P95 ≤ **8ms** with warm cache; ≤ **50ms** with one backend roundtrip.
* **Throughput**: 1k admission requests/min/replica; 5k runtime events/min/node with batching.
---
## 10) Drift detection model
**Signals**
* **Process drift**: terminal program differs from **EntryTrace** baseline.
* **Library drift**: loaded DSOs not present in **Usage** SBOM view.
* **Filesystem drift**: new executable files under `/usr/local/bin`, `/opt`, `/app` with **mtime** after image creation.
* **Network drift** (optional): listening sockets on unexpected ports (from policy).
**Action**
* Emit `DRIFT` event with evidence; backend can **autoqueue** a delta scan; policy may **escalate** to alert/block (Admission cannot block alreadyrunning pods; rely on K8s policies/PodSecurity or operator action).
---
## 11) Test matrix
* **Engines**: containerd, CRIO, Docker; ensure PID resolution and rootfs mapping.
* **EntryTrace**: bash features (case, if, runparts, `.`/`source`), language launchers (python/node/java).
* **Procfs**: multiple arches, musl/glibc images; static binaries (maps minimal).
* **Admission**: unsigned images, missing SBOM referrers, tagonly images, digest resolution, backend latency, cache TTL.
* **Perf/soak**: 500 Pods/node churn; webhook under HPA growth.
* **Security**: attempt privilege escalation disabled, readonly mounts enforced, ratelimit abuse.
* **Failure injection**: backend down (observer buffers, webhook failopen/closed), registry throttling, containerd socket unavailable.
---
## 12) Failure modes & responses
| Condition | Observer behavior | Webhook behavior |
| ------------------------------- | ---------------------------------------------- | ------------------------------------------------------ |
| Backend unreachable | Buffer to disk; drop after cap; emit metric | **Failopen/closed** per namespace config |
| PID vanished midsample | Retry once; emit partial evidence | N/A |
| CRI socket missing | Fallback to K8s events only (reduced fidelity) | N/A |
| Registry digest resolve blocked | Defer to backend; mark `resolve=unknown` | Deny or allow per `resolveTags` & `failOpenNamespaces` |
| Excessive events | Apply local rate limit, coalesce | N/A |
---
## 13) Deployment notes (K8s)
**DaemonSet (snippet):**
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata: { name: zastava-observer, namespace: stellaops }
spec:
template:
spec:
serviceAccountName: zastava
hostPID: true
containers:
- name: observer
image: stellaops/zastava-observer:2.3
securityContext:
runAsUser: 0
readOnlyRootFilesystem: true
allowPrivilegeEscalation: false
capabilities: { add: ["SYS_PTRACE","DAC_READ_SEARCH"] }
volumeMounts:
- { name: proc, mountPath: /host/proc, readOnly: true }
- { name: containerd-sock, mountPath: /run/containerd/containerd.sock, readOnly: true }
- { name: containerd-state, mountPath: /var/lib/containerd, readOnly: true }
volumes:
- { name: proc, hostPath: { path: /proc } }
- { name: containerd-sock, hostPath: { path: /run/containerd/containerd.sock } }
- { name: containerd-state, hostPath: { path: /var/lib/containerd } }
```
**Webhook (snippet):**
```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
webhooks:
- name: gate.zastava.stella-ops.org
admissionReviewVersions: ["v1"]
sideEffects: None
failurePolicy: Ignore # or Fail
rules:
- operations: ["CREATE","UPDATE"]
apiGroups: [""]
apiVersions: ["v1"]
resources: ["pods"]
clientConfig:
service:
namespace: stellaops
name: zastava-webhook
path: /admit
caBundle: <base64 CA>
```
---
## 14) Implementation notes
* **Language**: Rust (observer) for lowlatency `/proc` parsing; Go/.NET viable too. Webhook can be .NET 10 for parity with backend.
* **CRI drivers**: pluggable (`containerd`, `cri-o`, `docker`). Prefer CRI over parsing logs.
* **Shell parser**: reuse Scanner.EntryTrace grammar for consistent results (compile to WASM if observer is Rust/Go).
* **Hashing**: `BLAKE3` for speed locally, then convert to `sha256` (or compute `sha256` directly when budget allows).
* **Resilience**: never block container start; observer is **passive**; only webhook decides allow/deny.
---
## 15) Roadmap
* **eBPF** option for syscall/library load tracing (kernellevel, optin).
* **Windows containers** support (ETW providers, loaded modules).
* **Network posture** checks: listening ports vs policy.
* **Live **usedbyentrypoint** synthesis**: send compact bitset diff to backend to tighten Usage view.
* **Admission dryrun** dashboards (simulate block lists before enforcing).
# component_architecture_zastava.md — **StellaOps Zastava** (2025Q4)
> **Scope.** Implementationready architecture for **Zastava**: the **runtime inspector/enforcer** that watches real workloads, detects drift from the scanned baseline, verifies image/SBOM/attestation posture, and (optionally) **admits/blocks** deployments. Includes Kubernetes & plainDocker topologies, data contracts, APIs, security posture, performance targets, test matrices, and failure modes.
---
## 0) Mission & boundaries
**Mission.** Give operators **groundtruth** from running environments and a **fast guardrail** before workloads land:
* **Observer:** inventory containers, entrypoints actually executed, and DSOs actually loaded; verify **image signature**, **SBOM referrers**, and **attestation** presence; detect **drift** (unexpected processes/paths) and **policy violations**; publish **runtime events** to Scanner.WebService.
* **Admission (optional):** Kubernetes ValidatingAdmissionWebhook that enforces minimal posture (signed images, SBOM availability, known base images, policy PASS) **preflight**.
**Boundaries.**
* Zastava **does not** compute SBOMs and does not sign; it **consumes** Scanner/WebService outputs and **enforces** backend policy verdicts.
* Zastava can **request** a delta scan when the baseline is missing/stale, but scanning is done by **Scanner.Worker**.
* On nonK8s Docker hosts, Zastava runs as a host service with **observeronly** features.
---
## 1) Topology & processes
### 1.1 Components (Kubernetes)
```
stellaops/zastava-observer # DaemonSet on every node (read-only host mounts)
stellaops/zastava-webhook # ValidatingAdmissionWebhook (Deployment, 2+ replicas)
```
### 1.2 Components (Docker/VM)
```
stellaops/zastava-agent # System service; watch Docker events; observer only
```
### 1.3 Dependencies
* **Authority** (OIDC): short OpToks (DPoP/mTLS) for API calls to Scanner.WebService.
* **Scanner.WebService**: `/runtime/events` ingestion; `/policy/runtime` fetch.
* **OCI Registry** (optional): for direct referrers/sig checks if not delegated to backend.
* **Container runtime**: containerd/CRIO/Docker (read interfaces only).
* **Kubernetes API** (watch Pods in cluster; validating webhook).
* **Host mounts** (K8s DaemonSet): `/proc`, `/var/lib/containerd` (or CRIO), `/run/containerd/containerd.sock` (optional, readonly).
---
## 2) Data contracts
### 2.1 Runtime event (observer → Scanner.WebService)
```json
{
"eventId": "9f6a…",
"when": "2025-10-17T12:34:56Z",
"kind": "CONTAINER_START|CONTAINER_STOP|DRIFT|POLICY_VIOLATION|ATTESTATION_STATUS",
"tenant": "tenant-01",
"node": "ip-10-0-1-23",
"runtime": { "engine": "containerd", "version": "1.7.19" },
"workload": {
"platform": "kubernetes",
"namespace": "payments",
"pod": "api-7c9fbbd8b7-ktd84",
"container": "api",
"containerId": "containerd://...",
"imageRef": "ghcr.io/acme/api@sha256:abcd…",
"owner": { "kind": "Deployment", "name": "api" }
},
"process": {
"pid": 12345,
"entrypoint": ["/entrypoint.sh", "--serve"],
"entryTrace": [
{"file":"/entrypoint.sh","line":3,"op":"exec","target":"/usr/bin/python3"},
{"file":"<argv>","op":"python","target":"/opt/app/server.py"}
]
},
"loadedLibs": [
{ "path": "/lib/x86_64-linux-gnu/libssl.so.3", "inode": 123456, "sha256": "…"},
{ "path": "/usr/lib/x86_64-linux-gnu/libcrypto.so.3", "inode": 123457, "sha256": "…"}
],
"posture": {
"imageSigned": true,
"sbomReferrer": "present|missing",
"attestation": { "uuid": "rekor-uuid", "verified": true }
},
"delta": {
"baselineImageDigest": "sha256:abcd…",
"changedFiles": ["/opt/app/server.py"], // optional quick signal
"newBinaries": [{ "path":"/usr/local/bin/helper","sha256":"…" }]
},
"evidence": [
{"signal":"procfs.maps","value":"/lib/.../libssl.so.3@0x7f..."},
{"signal":"cri.task.inspect","value":"pid=12345"},
{"signal":"registry.referrers","value":"sbom: application/vnd.cyclonedx+json"}
]
}
```
### 2.2 Admission decision (webhook → API server)
```json
{
"admissionId": "…",
"namespace": "payments",
"podSpecDigest": "sha256:…",
"images": [
{
"name": "ghcr.io/acme/api:1.2.3",
"resolved": "ghcr.io/acme/api@sha256:abcd…",
"signed": true,
"hasSbomReferrers": true,
"policyVerdict": "pass|warn|fail",
"reasons": ["unsigned base image", "missing SBOM"]
}
],
"decision": "Allow|Deny",
"ttlSeconds": 300
}
```
---
## 3) Observer — node agent (DaemonSet)
### 3.1 Responsibilities
* **Watch** container lifecycle (start/stop) via CRI (`/run/containerd/containerd.sock` gRPC readonly) or `/var/log/containers/*.log` tail fallback.
* **Resolve** container → image digest, mount point rootfs.
* **Trace entrypoint**: attach **shortlived** nsenter/exec to PID 1 in container, parse shell for `exec` chain (bounded depth), record **terminal program**.
* **Sample loaded libs**: read `/proc/<pid>/maps` and `exe` symlink to collect **actually loaded** DSOs; compute **sha256** for each mapped file (bounded count/size).
* **Posture check** (cheap):
* Image signature presence (if cosign policies are local; else ask backend).
* SBOM **referrers** presence (HEAD to registry, optional).
* Rekor UUID known (query Scanner.WebService by image digest).
* **Publish runtime events** to Scanner.WebService `/runtime/events` (batch & compress).
* **Request delta scan** if: no SBOM in catalog OR base differs from known baseline.
### 3.2 Privileges & mounts (K8s)
* **SecurityContext:** `runAsUser: 0`, `readOnlyRootFilesystem: true`, `allowPrivilegeEscalation: false`.
* **Capabilities:** `CAP_SYS_PTRACE` (optional if using nsenter trace), `CAP_DAC_READ_SEARCH`.
* **Host mounts (readonly):**
* `/proc` (host) → `/host/proc`
* `/run/containerd/containerd.sock` (or CRIO socket)
* `/var/lib/containerd/io.containerd.runtime.v2.task` (rootfs paths & pids)
* **Networking:** clusterinternal egress to Scanner.WebService only.
* **Rate limits:** hard caps for bytes hashed and file count per container to avoid noisy tenants.
### 3.3 Event batching
* Buffer NDJSON; flush by **N events** or **2s**.
* Backpressure: local disk ring buffer (50MB default) if Scanner is temporarily unavailable; drop oldest after cap with **metrics** and **warning** event.
---
## 4) Admission Webhook (Kubernetes)
### 4.1 Gate criteria
Configurable policy (fetched from backend and cached):
* **Image signature**: must be cosignverifiable to configured key(s) or keyless identities.
* **SBOM availability**: at least one **CycloneDX** referrer or **Scanner.WebService** catalog entry.
* **Scanner policy verdict**: backend `PASS` required for namespaces/labels matching rules; allow `WARN` if configured.
* **Registry allowlists/denylists**.
* **Tag bans** (e.g., `:latest`).
* **Base image allowlists** (by digest).
### 4.2 Flow
```mermaid
sequenceDiagram
autonumber
participant K8s as API Server
participant WH as Zastava Webhook
participant SW as Scanner.WebService
K8s->>WH: AdmissionReview(Pod)
WH->>WH: Resolve images to digests (remote HEAD/pull if needed)
WH->>SW: POST /policy/runtime { digests, namespace, labels }
SW-->>WH: { per-image: {signed, hasSbom, verdict, reasons}, ttl }
alt All pass
WH-->>K8s: AdmissionResponse(Allow, ttl)
else Any fail (enforce=true)
WH-->>K8s: AdmissionResponse(Deny, message)
end
```
**Caching:** Perdigest result cached `ttlSeconds` (default 300 s). **Failopen** or **failclosed** is configurable per namespace.
### 4.3 TLS & HA
* Webhook has its own **serving cert** signed by cluster CA (or custom cert + CA bundle on configuration).
* Deployment ≥ 2 replicas; **leaderless**; stateless.
---
## 5) Backend integration (Scanner.WebService)
### 5.1 Ingestion endpoint
`POST /api/v1/scanner/runtime/events` *(OpTok + DPoP/mTLS)*
* Validates event schema; enforces rate caps by tenant/node; persists to **Mongo** (`runtime.events` capped collection or regular with TTL).
* Performs **correlation**:
* Attach nearest **image SBOM** (inventory/usage) and **BOMIndex** if known.
* If unknown/missing, schedule **delta scan** and return `202 Accepted`.
* Emits **derived signals** (usedByEntrypoint per component based on `/proc/<pid>/maps`).
### 5.2 Policy decision API (for webhook)
`POST /api/v1/scanner/policy/runtime`
Request:
```json
{
"namespace": "payments",
"labels": { "app": "api", "env": "prod" },
"images": ["ghcr.io/acme/api@sha256:...", "ghcr.io/acme/nginx@sha256:..."]
}
```
Response:
```json
{
"ttlSeconds": 300,
"results": {
"ghcr.io/acme/api@sha256:...": {
"signed": true,
"hasSbom": true,
"policyVerdict": "pass",
"reasons": [],
"rekor": { "uuid": "..." }
},
"ghcr.io/acme/nginx@sha256:...": {
"signed": false,
"hasSbom": false,
"policyVerdict": "fail",
"reasons": ["unsigned", "missing SBOM"]
}
}
}
```
---
## 6) Configuration (YAML)
```yaml
zastava:
mode:
observer: true
webhook: true
authority:
issuer: "https://authority.internal"
aud: ["scanner","zastava"] # tokens for backend and self-id
backend:
url: "https://scanner-web.internal"
connectTimeoutMs: 500
requestTimeoutMs: 1500
retry: { attempts: 3, backoffMs: 200 }
runtime:
engine: "auto" # containerd|cri-o|docker|auto
procfs: "/host/proc"
collect:
entryTrace: true
loadedLibs: true
maxLibs: 256
maxHashBytesPerContainer: 64_000_000
maxDepth: 48
admission:
enforce: true
failOpenNamespaces: ["dev", "test"]
verify:
imageSignature: true
sbomReferrer: true
scannerPolicyPass: true
cacheTtlSeconds: 300
resolveTags: true # do remote digest resolution for tag-only images
limits:
eventsPerSecond: 50
burst: 200
perNodeQueue: 10_000
security:
mounts:
containerdSock: "/run/containerd/containerd.sock:ro"
proc: "/proc:/host/proc:ro"
runtimeState: "/var/lib/containerd:ro"
```
---
## 7) Security posture
* **AuthN/Z**: Authority OpToks (DPoP preferred) to backend; webhook does **not** require client auth from API server (K8s handles).
* **Least privileges**: readonly host mounts; optional `CAP_SYS_PTRACE`; **no** host networking; **no** write mounts.
* **Isolation**: never exec untrusted code; nsenter only to **read** `/proc/<pid>`.
* **Data minimization**: do not exfiltrate env vars or command arguments unless policy explicitly enables diagnostic mode.
* **Rate limiting**: pernode caps; pertenant caps at backend.
* **Hard caps**: bytes hashed, files inspected, depth of shell parsing.
---
## 8) Metrics, logs, tracing
**Observer**
* `zastava.events_emitted_total{kind}`
* `zastava.proc_maps_samples_total{result}`
* `zastava.entrytrace_depth{p99}`
* `zastava.hash_bytes_total`
* `zastava.buffer_drops_total`
**Webhook**
* `zastava.admission_requests_total{decision}`
* `zastava.admission_latency_seconds`
* `zastava.cache_hits_total`
* `zastava.backend_failures_total`
**Logs** (structured): node, pod, image digest, decision, reasons.
**Tracing**: spans for observe→batch→post; webhook request→resolve→respond.
---
## 9) Performance & scale targets
* **Observer**: ≤ **30ms** to sample `/proc/<pid>/maps` and compute quick hashes for ≤ 64 files; ≤ **200ms** for full library set (256 libs).
* **Webhook**: P95 ≤ **8ms** with warm cache; ≤ **50ms** with one backend roundtrip.
* **Throughput**: 1k admission requests/min/replica; 5k runtime events/min/node with batching.
---
## 10) Drift detection model
**Signals**
* **Process drift**: terminal program differs from **EntryTrace** baseline.
* **Library drift**: loaded DSOs not present in **Usage** SBOM view.
* **Filesystem drift**: new executable files under `/usr/local/bin`, `/opt`, `/app` with **mtime** after image creation.
* **Network drift** (optional): listening sockets on unexpected ports (from policy).
**Action**
* Emit `DRIFT` event with evidence; backend can **autoqueue** a delta scan; policy may **escalate** to alert/block (Admission cannot block alreadyrunning pods; rely on K8s policies/PodSecurity or operator action).
---
## 11) Test matrix
* **Engines**: containerd, CRIO, Docker; ensure PID resolution and rootfs mapping.
* **EntryTrace**: bash features (case, if, runparts, `.`/`source`), language launchers (python/node/java).
* **Procfs**: multiple arches, musl/glibc images; static binaries (maps minimal).
* **Admission**: unsigned images, missing SBOM referrers, tagonly images, digest resolution, backend latency, cache TTL.
* **Perf/soak**: 500 Pods/node churn; webhook under HPA growth.
* **Security**: attempt privilege escalation disabled, readonly mounts enforced, ratelimit abuse.
* **Failure injection**: backend down (observer buffers, webhook failopen/closed), registry throttling, containerd socket unavailable.
---
## 12) Failure modes & responses
| Condition | Observer behavior | Webhook behavior |
| ------------------------------- | ---------------------------------------------- | ------------------------------------------------------ |
| Backend unreachable | Buffer to disk; drop after cap; emit metric | **Failopen/closed** per namespace config |
| PID vanished midsample | Retry once; emit partial evidence | N/A |
| CRI socket missing | Fallback to K8s events only (reduced fidelity) | N/A |
| Registry digest resolve blocked | Defer to backend; mark `resolve=unknown` | Deny or allow per `resolveTags` & `failOpenNamespaces` |
| Excessive events | Apply local rate limit, coalesce | N/A |
---
## 13) Deployment notes (K8s)
**DaemonSet (snippet):**
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata: { name: zastava-observer, namespace: stellaops }
spec:
template:
spec:
serviceAccountName: zastava
hostPID: true
containers:
- name: observer
image: stellaops/zastava-observer:2.3
securityContext:
runAsUser: 0
readOnlyRootFilesystem: true
allowPrivilegeEscalation: false
capabilities: { add: ["SYS_PTRACE","DAC_READ_SEARCH"] }
volumeMounts:
- { name: proc, mountPath: /host/proc, readOnly: true }
- { name: containerd-sock, mountPath: /run/containerd/containerd.sock, readOnly: true }
- { name: containerd-state, mountPath: /var/lib/containerd, readOnly: true }
volumes:
- { name: proc, hostPath: { path: /proc } }
- { name: containerd-sock, hostPath: { path: /run/containerd/containerd.sock } }
- { name: containerd-state, hostPath: { path: /var/lib/containerd } }
```
**Webhook (snippet):**
```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
webhooks:
- name: gate.zastava.stella-ops.org
admissionReviewVersions: ["v1"]
sideEffects: None
failurePolicy: Ignore # or Fail
rules:
- operations: ["CREATE","UPDATE"]
apiGroups: [""]
apiVersions: ["v1"]
resources: ["pods"]
clientConfig:
service:
namespace: stellaops
name: zastava-webhook
path: /admit
caBundle: <base64 CA>
```
---
## 14) Implementation notes
* **Language**: Rust (observer) for lowlatency `/proc` parsing; Go/.NET viable too. Webhook can be .NET 10 for parity with backend.
* **CRI drivers**: pluggable (`containerd`, `cri-o`, `docker`). Prefer CRI over parsing logs.
* **Shell parser**: reuse Scanner.EntryTrace grammar for consistent results (compile to WASM if observer is Rust/Go).
* **Hashing**: `BLAKE3` for speed locally, then convert to `sha256` (or compute `sha256` directly when budget allows).
* **Resilience**: never block container start; observer is **passive**; only webhook decides allow/deny.
---
## 15) Roadmap
* **eBPF** option for syscall/library load tracing (kernellevel, optin).
* **Windows containers** support (ETW providers, loaded modules).
* **Network posture** checks: listening ports vs policy.
* **Live **usedbyentrypoint** synthesis**: send compact bitset diff to backend to tighten Usage view.
* **Admission dryrun** dashboards (simulate block lists before enforcing).

104
docs/EXCITITOR_SCORRING.md Normal file
View File

@@ -0,0 +1,104 @@
## Status
This document tracks the future-looking risk scoring model for Excititor. The calculation below is not active yet; Sprint 7 work will add the required schema fields, policy controls, and services. Until that ships, Excititor emits consensus statuses without numeric scores.
## Scoring model (target state)
**S = Gate(VEX_status) × W_trust(source) × [Severity_base × (1 + α·KEV + β·EPSS)]**
* **Gate(VEX_status)**: `affected`/`under_investigation` → 1, `not_affected`/`fixed` → 0. A trusted “not affected” or “fixed” still zeroes the score.
* **W_trust(source)**: normalized policy weight (baseline 01). Policies may opt into >1 boosts for signed vendor feeds once Phase 1 closes.
* **Severity_base**: canonical numeric severity from Concelier (CVSS or org-defined scale).
* **KEV flag**: 0/1 boost when CISA Known Exploited Vulnerabilities applies.
* **EPSS**: probability [0,1]; bounded multiplier.
* **α, β**: configurable coefficients (default α=0.25, β=0.5) stored in policy.
Safeguards: freeze boosts when product identity is unknown, clamp outputs ≥0, and log every factor in the audit trail.
## Implementation roadmap
| Phase | Scope | Artifacts |
| --- | --- | --- |
| **Phase 1 Schema foundations** | Extend Excititor consensus/claims and Concelier canonical advisories with severity, KEV, EPSS, and expose α/β + weight ceilings in policy. | Sprint 7 tasks `EXCITITOR-CORE-02-001`, `EXCITITOR-POLICY-02-001`, `EXCITITOR-STORAGE-02-001`, `FEEDCORE-ENGINE-07-001`. |
| **Phase 2 Deterministic score engine** | Implement a scoring component that executes alongside consensus and persists score envelopes with hashes. | Planned task `EXCITITOR-CORE-02-002` (backlog). |
| **Phase 3 Surfacing & enforcement** | Expose scores via WebService/CLI, integrate with Concelier noise priors, and enforce policy-based suppressions. | To be scheduled after Phase 2. |
## Policy controls (Phase 1)
Operators tune scoring inputs through the Excititor policy document:
```yaml
excititor:
policy:
weights:
vendor: 1.10 # per-tier weight
ceiling: 1.40 # max clamp applied to tiers and overrides (1.05.0)
providerOverrides:
trusted.vendor: 1.35
scoring:
alpha: 0.30 # KEV boost coefficient (defaults to 0.25)
beta: 0.60 # EPSS boost coefficient (defaults to 0.50)
```
* All weights (tiers + overrides) are clamped to `[0, weights.ceiling]` with structured warnings when a value is out of range or not a finite number.
* `weights.ceiling` itself is constrained to `[1.0, 5.0]`, preserving prior behaviour when omitted.
* `scoring.alpha` / `scoring.beta` accept non-negative values up to 5.0; values outside the range fall back to defaults and surface diagnostics to operators.
## Data model (after Phase 1)
```json
{
"vulnerabilityId": "CVE-2025-12345",
"product": "pkg:name@version",
"consensus": {
"status": "affected",
"policyRevisionId": "rev-12",
"policyDigest": "0D9AEC…"
},
"signals": {
"severity": {"scheme": "CVSS:3.1", "score": 7.5},
"kev": true,
"epss": 0.40
},
"policy": {
"weight": 1.15,
"alpha": 0.25,
"beta": 0.5
},
"score": {
"value": 10.8,
"generatedAt": "2025-11-05T14:12:30Z",
"audit": [
"gate:affected",
"weight:1.15",
"severity:7.5",
"kev:1",
"epss:0.40"
]
}
}
```
## Operational guidance
* **Inputs**: Concelier delivers severity/KEV/EPSS via the advisory event log; Excititor connectors load VEX statements. Policy owns trust tiers and coefficients.
* **Processing**: the scoring engine (Phase 2) runs next to consensus, storing results with deterministic hashes so exports and attestations can reference them.
* **Consumption**: WebService/CLI will return consensus plus score; scanners may suppress findings only when policy-authorized VEX gating and signed score envelopes agree.
## Pseudocode (Phase 2 preview)
```python
def risk_score(gate, weight, severity, kev, epss, alpha, beta, freeze_boosts=False):
if gate == 0:
return 0
if freeze_boosts:
kev, epss = 0, 0
boost = 1 + alpha * kev + beta * epss
return max(0, weight * severity * boost)
```
## FAQ
* **Can operators opt out?** Set α=β=0 or keep weights ≤1.0 via policy.
* **What about missing signals?** Treat them as zero and log the omission.
* **When will this ship?** Phase 1 is planned for Sprint 7; later phases depend on connector coverage and attestation delivery.

View File

@@ -34,28 +34,34 @@ Everything here is opensource and versioned— when you check out a git ta
### Reference & concepts
- **05[System Requirements Specification](05_SYSTEM_REQUIREMENTS_SPEC.md)**
- **07[HighLevel Architecture](07_HIGH_LEVEL_ARCHITECTURE.md)**
- **08[Architecture Decision Records](adr/index.md)**
- **08Module Architecture Dossiers**
- [Scanner](ARCHITECTURE_SCANNER.md)
- [Feedser](ARCHITECTURE_FEEDSER.md)
- [Vexer](ARCHITECTURE_VEXER.md)
- [Signer](ARCHITECTURE_SIGNER.md)
- [Attestor](ARCHITECTURE_ATTESTOR.md)
- [Authority](ARCHITECTURE_AUTHORITY.md)
- [CLI](ARCHITECTURE_CLI.md)
- [WebUI](ARCHITECTURE_UI.md)
- [Zastava Runtime](ARCHITECTURE_ZASTAVA.md)
- [Release & Operations](ARCHITECTURE_DEVOPS.md)
- **09[API&CLI Reference](09_API_CLI_REFERENCE.md)**
- [Concelier](ARCHITECTURE_CONCELIER.md)
- [Excititor](ARCHITECTURE_EXCITITOR.md)
- [Excititor Mirrors](ARCHITECTURE_EXCITITOR_MIRRORS.md)
- [Signer](ARCHITECTURE_SIGNER.md)
- [Attestor](ARCHITECTURE_ATTESTOR.md)
- [Authority](ARCHITECTURE_AUTHORITY.md)
- [Notify](ARCHITECTURE_NOTIFY.md)
- [Scheduler](ARCHITECTURE_SCHEDULER.md)
- [CLI](ARCHITECTURE_CLI.md)
- [WebUI](ARCHITECTURE_UI.md)
- [Zastava Runtime](ARCHITECTURE_ZASTAVA.md)
- [Release & Operations](ARCHITECTURE_DEVOPS.md)
- **09[API&CLI Reference](09_API_CLI_REFERENCE.md)**
- **10[Plugin SDK Guide](10_PLUGIN_SDK_GUIDE.md)**
- **10[Feedser CLI Quickstart](10_FEEDSER_CLI_QUICKSTART.md)**
- **30[Vexer Connector Packaging Guide](dev/30_VEXER_CONNECTOR_GUIDE.md)**
- **30Developer Templates**
- [Vexer Connector Skeleton](dev/templates/vexer-connector/)
- **11[Authority Service](11_AUTHORITY.md)**
- **11[Data Schemas](11_DATA_SCHEMAS.md)**
- **12[Performance Workbook](12_PERFORMANCE_WORKBOOK.md)**
- **13[ReleaseEngineering Playbook](13_RELEASE_ENGINEERING_PLAYBOOK.md)**
- **30[Fixture Maintenance](dev/fixtures.md)**
- **10[Concelier CLI Quickstart](10_CONCELIER_CLI_QUICKSTART.md)**
- **10[BuildX Generator Quickstart](dev/BUILDX_PLUGIN_QUICKSTART.md)**
- **10[Scanner Cache Configuration](dev/SCANNER_CACHE_CONFIGURATION.md)**
- **30[Excititor Connector Packaging Guide](dev/30_EXCITITOR_CONNECTOR_GUIDE.md)**
- **30Developer Templates**
- [Excititor Connector Skeleton](dev/templates/excititor-connector/)
- **11[Authority Service](11_AUTHORITY.md)**
- **11[Data Schemas](11_DATA_SCHEMAS.md)**
- **12[Performance Workbook](12_PERFORMANCE_WORKBOOK.md)**
- **13[ReleaseEngineering Playbook](13_RELEASE_ENGINEERING_PLAYBOOK.md)**
- **30[Fixture Maintenance](dev/fixtures.md)**
### User & operator guides
- **14[Glossary](14_GLOSSARY_OF_TERMS.md)**
@@ -64,18 +70,19 @@ Everything here is opensource and versioned— when you check out a git ta
- **18[Coding Standards](18_CODING_STANDARDS.md)**
- **19[TestSuite Overview](19_TEST_SUITE_OVERVIEW.md)**
- **21[Install Guide](21_INSTALL_GUIDE.md)**
- **22[CI/CD Recipes Library](ci/20_CI_RECIPES.md)**
- **23[FAQ](23_FAQ_MATRIX.md)**
- **22[CI/CD Recipes Library](ci/20_CI_RECIPES.md)**
- **23[FAQ](23_FAQ_MATRIX.md)**
- **24[Offline Update Kit Admin Guide](24_OFFLINE_KIT.md)**
- **25[Feedser Apple Connector Operations](ops/feedser-apple-operations.md)**
- **26[Authority Key Rotation Playbook](ops/authority-key-rotation.md)**
- **27[Feedser CCCS Connector Operations](ops/feedser-cccs-operations.md)**
- **28[Feedser CISA ICS Connector Operations](ops/feedser-icscisa-operations.md)**
- **29[Feedser CERT-Bund Connector Operations](ops/feedser-certbund-operations.md)**
- **30[Feedser MSRC Connector AAD Onboarding](ops/feedser-msrc-operations.md)**
- **25[Mirror Operations Runbook](ops/concelier-mirror-operations.md)**
- **26[Concelier Apple Connector Operations](ops/concelier-apple-operations.md)**
- **27[Authority Key Rotation Playbook](ops/authority-key-rotation.md)**
- **28[Concelier CCCS Connector Operations](ops/concelier-cccs-operations.md)**
- **29[Concelier CISA ICS Connector Operations](ops/concelier-icscisa-operations.md)**
- **30[Concelier CERT-Bund Connector Operations](ops/concelier-certbund-operations.md)**
- **31[Concelier MSRC Connector AAD Onboarding](ops/concelier-msrc-operations.md)**
### Legal & licence
- **31[Legal & Quota FAQ](29_LEGAL_FAQ_QUOTA.md)**
- **32[Legal & Quota FAQ](29_LEGAL_FAQ_QUOTA.md)**
</details>

View File

@@ -3,12 +3,19 @@
| ID | Status | Owner(s) | Depends on | Description | Exit Criteria |
|----|--------|----------|------------|-------------|---------------|
| DOC7.README-INDEX | DONE (2025-10-17) | Docs Guild | — | Refresh index docs (docs/README.md + root README) after architecture dossier split and Offline Kit overhaul. | ✅ ToC reflects new component architecture docs; ✅ root README highlights updated doc set; ✅ Offline Kit guide linked correctly. |
| DOC4.AUTH-PDG | REVIEW | Docs Guild, Plugin Team | PLG6.DOC | Copy-edit `docs/dev/31_AUTHORITY_PLUGIN_DEVELOPER_GUIDE.md`, export lifecycle diagram, add LDAP RFC cross-link. | ✅ PR merged with polish; ✅ Diagram committed; ✅ Slack handoff posted. |
| DOC4.AUTH-PDG | DONE (2025-10-19) | Docs Guild, Plugin Team | PLG6.DOC | Copy-edit `docs/dev/31_AUTHORITY_PLUGIN_DEVELOPER_GUIDE.md`, export lifecycle diagram, add LDAP RFC cross-link. | ✅ PR merged with polish; ✅ Diagram committed; ✅ Slack handoff posted. |
| DOC1.AUTH | DONE (2025-10-12) | Docs Guild, Authority Core | CORE5B.DOC | Draft `docs/11_AUTHORITY.md` covering architecture, configuration, bootstrap flows. | ✅ Architecture + config sections approved by Core; ✅ Samples reference latest options; ✅ Offline note added. |
| DOC3.Feedser-Authority | DONE (2025-10-12) | Docs Guild, DevEx | FSR4 | Polish operator/runbook sections (DOC3/DOC5) to document Feedser authority rollout, bypass logging, and enforcement checklist. | ✅ DOC3/DOC5 updated with audit runbook references; ✅ enforcement deadline highlighted; ✅ Docs guild sign-off. |
| DOC5.Feedser-Runbook | DONE (2025-10-12) | Docs Guild | DOC3.Feedser-Authority | Produce dedicated Feedser authority audit runbook covering log fields, monitoring recommendations, and troubleshooting steps. | ✅ Runbook published; ✅ linked from DOC3/DOC5; ✅ alerting guidance included. |
| FEEDDOCS-DOCS-05-001 | DONE (2025-10-11) | Docs Guild | FEEDMERGE-ENGINE-04-001, FEEDMERGE-ENGINE-04-002 | Publish Feedser conflict resolution runbook covering precedence workflow, merge-event auditing, and Sprint 3 metrics. | ✅ `docs/ops/feedser-conflict-resolution.md` committed; ✅ metrics/log tables align with latest merge code; ✅ Ops alert guidance handed to Feedser team. |
| FEEDDOCS-DOCS-05-002 | DONE (2025-10-16) | Docs Guild, Feedser Ops | FEEDDOCS-DOCS-05-001 | Ops sign-off captured: conflict runbook circulated, alert thresholds tuned, and rollout decisions documented in change log. | ✅ Ops review recorded; ✅ alert thresholds finalised using `docs/ops/feedser-authority-audit-runbook.md`; ✅ change-log entry linked from runbook once GHSA/NVD/OSV regression fixtures land. |
| DOC3.Concelier-Authority | DONE (2025-10-12) | Docs Guild, DevEx | FSR4 | Polish operator/runbook sections (DOC3/DOC5) to document Concelier authority rollout, bypass logging, and enforcement checklist. | ✅ DOC3/DOC5 updated with audit runbook references; ✅ enforcement deadline highlighted; ✅ Docs guild sign-off. |
| DOC5.Concelier-Runbook | DONE (2025-10-12) | Docs Guild | DOC3.Concelier-Authority | Produce dedicated Concelier authority audit runbook covering log fields, monitoring recommendations, and troubleshooting steps. | ✅ Runbook published; ✅ linked from DOC3/DOC5; ✅ alerting guidance included. |
| FEEDDOCS-DOCS-05-001 | DONE (2025-10-11) | Docs Guild | FEEDMERGE-ENGINE-04-001, FEEDMERGE-ENGINE-04-002 | Publish Concelier conflict resolution runbook covering precedence workflow, merge-event auditing, and Sprint 3 metrics. | ✅ `docs/ops/concelier-conflict-resolution.md` committed; ✅ metrics/log tables align with latest merge code; ✅ Ops alert guidance handed to Concelier team. |
| FEEDDOCS-DOCS-05-002 | DONE (2025-10-16) | Docs Guild, Concelier Ops | FEEDDOCS-DOCS-05-001 | Ops sign-off captured: conflict runbook circulated, alert thresholds tuned, and rollout decisions documented in change log. | ✅ Ops review recorded; ✅ alert thresholds finalised using `docs/ops/concelier-authority-audit-runbook.md`; ✅ change-log entry linked from runbook once GHSA/NVD/OSV regression fixtures land. |
| DOCS-ADR-09-001 | DONE (2025-10-19) | Docs Guild, DevEx | — | Establish ADR process (`docs/adr/0000-template.md`) and document usage guidelines. | Template published; README snippet linking ADR process; announcement posted (`docs/updates/2025-10-18-docs-guild.md`). |
| DOCS-EVENTS-09-002 | DONE (2025-10-19) | Docs Guild, Platform Events | SCANNER-EVENTS-15-201 | Publish event schema catalog (`docs/events/`) for `scanner.report.ready@1`, `scheduler.rescan.delta@1`, `attestor.logged@1`. | Schemas validated (Ajv CI hooked); docs/events/README summarises usage; Platform Events notified via `docs/updates/2025-10-18-docs-guild.md`. |
| DOCS-EVENTS-09-003 | DONE (2025-10-19) | Docs Guild | DOCS-EVENTS-09-002 | Add human-readable envelope field references and canonical payload samples for published events, including offline validation workflow. | Tables explain common headers/payload segments; versioned sample payloads committed; README links to validation instructions and samples. |
| DOCS-EVENTS-09-004 | DONE (2025-10-19) | Docs Guild, Scanner WebService | SCANNER-EVENTS-15-201 | Refresh scanner event docs to mirror DSSE-backed report fields, document `scanner.scan.completed`, and capture canonical sample validation. | Schemas updated for new payload shape; README references DSSE reuse and validation test; samples align with emitted events. |
| PLATFORM-EVENTS-09-401 | DONE (2025-10-19) | Platform Events Guild | DOCS-EVENTS-09-003 | Embed canonical event samples into contract/integration tests and ensure CI validates payloads against published schemas. | Notify/Scheduler contract suites exercise samples; CI job validates samples with `ajv-cli`; Platform Events changelog notes coverage. |
| RUNTIME-GUILD-09-402 | DONE (2025-10-19) | Runtime Guild | SCANNER-POLICY-09-107 | Confirm Scanner WebService surfaces `quietedFindingCount` and progress hints to runtime consumers; document readiness checklist. | Runtime verification run captures enriched payload; checklist/doc updates merged; stakeholders acknowledge availability. |
| DOCS-RUNTIME-17-004 | TODO | Docs Guild, Runtime Guild | SCANNER-EMIT-17-701, ZASTAVA-OBS-17-005, DEVOPS-REL-17-002 | Document build-id workflows: SBOM exposure, runtime event payloads, debug-store layout, and operator guidance for symbol retrieval. | Architecture + operator docs updated with build-id sections, examples show `readelf` output + debuginfod usage, references linked from Offline Kit/Release guides. |
> Update statuses (TODO/DOING/REVIEW/DONE/BLOCKED) as progress changes. Keep guides in sync with configuration samples under `etc/`.

34
docs/adr/0000-template.md Normal file
View File

@@ -0,0 +1,34 @@
# ADR-0000: Title
## Status
Proposed
## Date
YYYY-MM-DD
## Authors
- Name (team)
## Deciders
- Names of approvers / reviewers
## Context
- What decision needs to be made?
- What are the forces (requirements, constraints, stakeholders)?
- Why now? What triggers the ADR?
## Decision
- Summary of the chosen option.
- Key rationale points.
## Consequences
- Positive/negative consequences.
- Follow-up actions or tasks.
- Rollback plan or re-evaluation criteria.
## Alternatives Considered
- Option A — pros/cons.
- Option B — pros/cons.
## References
- Links to related ADRs, issues, documents.

41
docs/adr/index.md Normal file
View File

@@ -0,0 +1,41 @@
# Architecture Decision Records (ADRs)
Architecture Decision Records document long-lived choices that shape StellaOps architecture, security posture, and operator experience. They complement RFCs by capturing the final call and the context that led to it.
## When to file an ADR
- Decisions that affect cross-module contracts, persistence models, or external interfaces.
- Security or compliance controls with on-going operational ownership.
- Rollout strategies that require coordination across guilds or sprints.
- Reversals or deprecations of previously accepted ADRs.
Small, module-local refactors that do not modify public behaviour can live in commit messages instead.
## Workflow at a glance
1. Copy `docs/adr/0000-template.md` to `docs/adr/NNNN-short-slug.md` with a zero-padded sequence (see **Numbering**).
2. Fill in context, decision, consequences, and alternatives. Include links to RFCs, issues, benchmarks, or experiments.
3. Request async review from the impacted guilds. Capture sign-offs in the **Deciders** field.
4. Merge the ADR with the code/config changes (or in a preparatory PR).
5. Announce the accepted ADR in the Docs Guild channel or sprint notes so downstream teams can consume it.
## Numbering and status
- Use zero-padded integers (e.g., `0001`, `0002`) in file names and the document header. Increment from the highest existing number.
- Valid statuses: `Proposed`, `Accepted`, `Rejected`, `Deprecated`, `Superseded`. Update the status when follow-up work lands.
- When an ADR supersedes another, link them in both documents **References** sections.
## Review expectations
- Highlight edge-case handling, trade-offs, and determinism requirements.
- Include operational checklists for any new runtime path (quota updates, schema migrations, credential rotation, etc.).
- Attach diagrams under `docs/adr/assets/` when visuals improve comprehension.
- Add TODO tasks for follow-up work in the relevant modules `TASKS.md` and link them from the ADR.
## Verification checklist
- [ ] `Status`, `Date`, `Authors`, and `Deciders` populated.
- [ ] Links to code/config PRs or experiments recorded under **References**.
- [ ] Consequences call out migration or rollback steps.
- [ ] Announcement posted to Docs Guild updates (or sprint log).
## Related resources
- [Docs Guild Task Board](../TASKS.md)
- [High-Level Architecture Overview](../07_HIGH_LEVEL_ARCHITECTURE.md)
- [Coding Standards](../18_CODING_STANDARDS.md)
- [Release Engineering Playbook](../13_RELEASE_ENGINEERING_PLAYBOOK.md)

View File

@@ -0,0 +1,50 @@
# StellaOps BOM Index (`bom-index@1`)
The BOM index is a deterministic, offline-friendly sidecar that accelerates queries for
layer-to-component membership and entrypoint usage. It is emitted alongside CycloneDX
SBOMs and consumed by Scheduler/Notify services.
## File Layout
Binary little-endian encoding, organised as the following sections:
1. **Header**
- `magic` (`byte[7]`): ASCII `"BOMIDX1"` identifier.
- `version` (`uint16`): current value `1`.
- `flags` (`uint16`): bit `0` set when entrypoint usage bitmaps are present.
- `imageDigestLength` (`uint16`) + UTF-8 digest string (e.g. `sha256:...`).
- `generatedAt` (`int64`): microseconds since Unix epoch.
- `layerCount` (`uint32`), `componentCount` (`uint32`), `entrypointCount` (`uint32`).
2. **Layer Table**
- For each layer: `length` (`uint16`) + UTF-8 layer digest (canonical order, base image → top layer).
3. **Component Table**
- For each component: `length` (`uint16`) + UTF-8 identity (CycloneDX purl when available, otherwise canonical key).
4. **Component ↦ Layer Bitmaps**
- For each component (matching table order):
- `bitmapLength` (`uint32`).
- Roaring bitmap payload (`Collections.Special.RoaringBitmap.Serialize`) encoding layer indexes that introduce or retain the component.
5. **Entrypoint Table** *(optional; present when `flags & 0x1 == 1`)*
- For each unique entrypoint/launcher string: `length` (`uint16`) + UTF-8 value (sorted ordinally).
6. **Component ↦ Entrypoint Bitmaps** *(optional)*
- For each component: roaring bitmap whose set bits reference entrypoint indexes used by EntryTrace. Empty bitmap (`length == 0`) indicates the component is not part of any resolved entrypoint closure.
## Determinism Guarantees
* Layer, component, and entrypoint tables are strictly ordered (base → top layer, lexicographically for components and entrypoints).
* Roaring bitmaps are optimised prior to serialisation and always produced from sorted indexes.
* Header timestamp is normalised to microsecond precision using UTC.
## Sample
`sample-index.bin` is generated from the integration fixture used in unit tests. It contains:
* 2 layers: `sha256:layer1`, `sha256:layer2`.
* 3 components: `pkg:npm/a`, `pkg:npm/b`, `pkg:npm/c`.
* Entrypoint bitmaps for `/app/start.sh` and `/app/init.sh`.
The sample can be decoded with the `BomIndexBuilder` unit tests or any RoaringBitmap implementation compatible with `Collections.Special.RoaringBitmap`.

Binary file not shown.

View File

@@ -1,105 +1,105 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1180 400" role="img">
<title>Authority rate limit and lockout flow</title>
<defs>
<style>
.lane { fill: #0f172a; }
.lane text { fill: #e2e8f0; font: bold 18px "Segoe UI", sans-serif; }
.step { fill: #1f2937; stroke: #1e3a8a; stroke-width: 2; rx: 12; ry: 12; }
.step text { fill: #f8fafc; font: 15px "Segoe UI", sans-serif; }
.decision { fill: #0f766e; stroke: #0d9488; stroke-width: 2; rx: 12; ry: 12; }
.decision text { fill: #ecfeff; font: 15px "Segoe UI", sans-serif; }
.note { fill: #1e293b; font: italic 14px "Segoe UI", sans-serif; }
</style>
<marker id="arrow-blue" markerWidth="11" markerHeight="11" refX="10" refY="6" orient="auto">
<path d="M0,0 L11,6 L0,12 z" fill="#2563eb" />
</marker>
<marker id="arrow-green" markerWidth="11" markerHeight="11" refX="10" refY="6" orient="auto">
<path d="M0,0 L11,6 L0,12 z" fill="#0d9488" />
</marker>
<marker id="arrow-red" markerWidth="11" markerHeight="11" refX="10" refY="6" orient="auto">
<path d="M0,0 L11,6 L0,12 z" fill="#dc2626" />
</marker>
</defs>
<rect class="lane" x="40" y="20" width="160" height="40" rx="8" />
<text x="60" y="48">Client / App</text>
<rect class="lane" x="300" y="20" width="200" height="40" rx="8" />
<text x="320" y="48">Authority Host</text>
<rect class="lane" x="560" y="20" width="210" height="40" rx="8" />
<text x="580" y="48">Rate Limiter</text>
<rect class="lane" x="840" y="20" width="150" height="40" rx="8" />
<text x="860" y="48">Standard Plug-in</text>
<rect class="lane" x="1040" y="20" width="120" height="40" rx="8" />
<text x="1060" y="48">Credential Store</text>
<!-- Flow steps -->
<g transform="translate(40,90)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">POST /token request</text>
<text x="20" y="64">client_id + subject creds</text>
</g>
<g transform="translate(300,90)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Authority middleware</text>
<text x="20" y="64">enriches context tags</text>
</g>
<g transform="translate(580,90)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Rate limiter window</text>
<text x="20" y="64">client_id + IP keyed</text>
</g>
<g transform="translate(580,210)">
<rect class="decision" width="220" height="90" />
<text x="20" y="36">Quota exceeded?</text>
<text x="20" y="64">emit Retry-After &amp; tags</text>
</g>
<g transform="translate(580,330)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Quota OK</text>
<text x="20" y="64">pass remaining tokens</text>
</g>
<g transform="translate(840,210)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Verify credentials</text>
<text x="20" y="64">hash compare + audit tags</text>
</g>
<g transform="translate(1040,210)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Load lockout state</text>
<text x="20" y="64">deterministic counters</text>
</g>
<g transform="translate(840,330)">
<rect class="decision" width="220" height="90" />
<text x="20" y="36">Lockout threshold hit?</text>
<text x="20" y="64">follow dedup precedence</text>
</g>
<g transform="translate(300,330)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Issue tokens or errors</text>
<text x="20" y="64">include limiter metadata</text>
</g>
<!-- Arrows -->
<path d="M260,135 H300" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M520,135 H580" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M670,180 V210" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M670,300 V330" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M800,375 H840" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M960,255 H1040" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M960,375 H840" stroke="#0d9488" stroke-width="3" fill="none" marker-end="url(#arrow-green)" />
<path d="M670,255 H520" stroke="#dc2626" stroke-width="3" fill="none" marker-end="url(#arrow-red)" />
<path d="M520,375 H300" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M260,375 H40" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<!-- Notes -->
<text class="note" x="40" y="210">429 path → add `authority.client_id`, `authority.remote_ip` tags for dashboards.</text>
<text class="note" x="40" y="240">Lockout path → reuse precedence strategy from Feedser dedup (see DEDUP_CONFLICTS_RESOLUTION_ALGO.md).</text>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1180 400" role="img">
<title>Authority rate limit and lockout flow</title>
<defs>
<style>
.lane { fill: #0f172a; }
.lane text { fill: #e2e8f0; font: bold 18px "Segoe UI", sans-serif; }
.step { fill: #1f2937; stroke: #1e3a8a; stroke-width: 2; rx: 12; ry: 12; }
.step text { fill: #f8fafc; font: 15px "Segoe UI", sans-serif; }
.decision { fill: #0f766e; stroke: #0d9488; stroke-width: 2; rx: 12; ry: 12; }
.decision text { fill: #ecfeff; font: 15px "Segoe UI", sans-serif; }
.note { fill: #1e293b; font: italic 14px "Segoe UI", sans-serif; }
</style>
<marker id="arrow-blue" markerWidth="11" markerHeight="11" refX="10" refY="6" orient="auto">
<path d="M0,0 L11,6 L0,12 z" fill="#2563eb" />
</marker>
<marker id="arrow-green" markerWidth="11" markerHeight="11" refX="10" refY="6" orient="auto">
<path d="M0,0 L11,6 L0,12 z" fill="#0d9488" />
</marker>
<marker id="arrow-red" markerWidth="11" markerHeight="11" refX="10" refY="6" orient="auto">
<path d="M0,0 L11,6 L0,12 z" fill="#dc2626" />
</marker>
</defs>
<rect class="lane" x="40" y="20" width="160" height="40" rx="8" />
<text x="60" y="48">Client / App</text>
<rect class="lane" x="300" y="20" width="200" height="40" rx="8" />
<text x="320" y="48">Authority Host</text>
<rect class="lane" x="560" y="20" width="210" height="40" rx="8" />
<text x="580" y="48">Rate Limiter</text>
<rect class="lane" x="840" y="20" width="150" height="40" rx="8" />
<text x="860" y="48">Standard Plug-in</text>
<rect class="lane" x="1040" y="20" width="120" height="40" rx="8" />
<text x="1060" y="48">Credential Store</text>
<!-- Flow steps -->
<g transform="translate(40,90)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">POST /token request</text>
<text x="20" y="64">client_id + subject creds</text>
</g>
<g transform="translate(300,90)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Authority middleware</text>
<text x="20" y="64">enriches context tags</text>
</g>
<g transform="translate(580,90)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Rate limiter window</text>
<text x="20" y="64">client_id + IP keyed</text>
</g>
<g transform="translate(580,210)">
<rect class="decision" width="220" height="90" />
<text x="20" y="36">Quota exceeded?</text>
<text x="20" y="64">emit Retry-After &amp; tags</text>
</g>
<g transform="translate(580,330)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Quota OK</text>
<text x="20" y="64">pass remaining tokens</text>
</g>
<g transform="translate(840,210)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Verify credentials</text>
<text x="20" y="64">hash compare + audit tags</text>
</g>
<g transform="translate(1040,210)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Load lockout state</text>
<text x="20" y="64">deterministic counters</text>
</g>
<g transform="translate(840,330)">
<rect class="decision" width="220" height="90" />
<text x="20" y="36">Lockout threshold hit?</text>
<text x="20" y="64">follow dedup precedence</text>
</g>
<g transform="translate(300,330)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Issue tokens or errors</text>
<text x="20" y="64">include limiter metadata</text>
</g>
<!-- Arrows -->
<path d="M260,135 H300" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M520,135 H580" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M670,180 V210" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M670,300 V330" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M800,375 H840" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M960,255 H1040" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M960,375 H840" stroke="#0d9488" stroke-width="3" fill="none" marker-end="url(#arrow-green)" />
<path d="M670,255 H520" stroke="#dc2626" stroke-width="3" fill="none" marker-end="url(#arrow-red)" />
<path d="M520,375 H300" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M260,375 H40" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<!-- Notes -->
<text class="note" x="40" y="210">429 path → add `authority.client_id`, `authority.remote_ip` tags for dashboards.</text>
<text class="note" x="40" y="240">Lockout path → reuse precedence strategy from Concelier dedup (see DEDUP_CONFLICTS_RESOLUTION_ALGO.md).</text>
</svg>

Before

Width:  |  Height:  |  Size: 4.8 KiB

After

Width:  |  Height:  |  Size: 4.9 KiB

View File

@@ -241,7 +241,59 @@ jobs:
---
## 4·Troubleshooting cheatsheet
## 4·Docs CI (Gitea Actions & Offline Mirror)
StellaOps ships a dedicated Docs workflow at `.gitea/workflows/docs.yml`. When mirroring the pipeline offline or running it locally, install the same toolchain so markdown linting, schema validation, and HTML preview stay deterministic.
### 4.1 Toolchain bootstrap
```bash
# Node.js 20.x is required; install once per runner
npm install --no-save \
markdown-link-check \
remark-cli \
remark-preset-lint-recommended \
ajv \
ajv-cli \
ajv-formats
# Python 3.11+ powers the preview renderer
python -m pip install --upgrade pip
python -m pip install markdown pygments
```
**Offline tip.** Add the packages above to your artifact mirror (for example `ops/devops/offline-kit.json`) so runners can install them via `npm --offline` / `pip --no-index`.
### 4.2 Schema validation step
Ajv compiles every event schema to guard against syntax or format regressions. The workflow uses `ajv-formats` for UUID/date-time support.
```bash
for schema in docs/events/*.json; do
npx ajv compile -c ajv-formats -s "$schema"
done
```
Run this loop before committing schema changes. For new references, append `-r additional-file.json` so CI and local runs stay aligned.
### 4.3 Preview build
```bash
python scripts/render_docs.py --source docs --output artifacts/docs-preview --clean
```
Host the resulting bundle via any static file server for review (for example `python -m http.server`).
### 4.4 Publishing checklist
- [ ] Toolchain installs succeed without hitting the public internet (mirror or cached tarballs).
- [ ] Ajv validation passes for `scanner.report.ready@1`, `scheduler.rescan.delta@1`, `attestor.logged@1`.
- [ ] Markdown link check (`npx markdown-link-check`) reports no broken references.
- [ ] Preview bundle archived (or attached) for stakeholders.
---
## 5·Troubleshooting cheatsheet
| Symptom | Root cause | First things to try |
| ------------------------------------- | --------------------------- | --------------------------------------------------------------- |
@@ -253,6 +305,7 @@ jobs:
---
### Change log
* **20250804** Variable cleanup, removed Dockersocket & cache mounts, added Jenkins / CircleCI / Gitea examples, clarified Option B comment.
### Change log
* **20251018** Documented Docs CI toolchain (Ajv validation, static preview) and offline checklist.
* **20250804** Variable cleanup, removed Dockersocket & cache mounts, added Jenkins / CircleCI / Gitea examples, clarified Option B comment.

View File

@@ -1,4 +1,4 @@
# Feedser Connector Research 2025-10-11
# Concelier Connector Research 2025-10-11
Snapshot of direct network checks performed on 2025-10-11 (UTC) for the national/vendor connectors in scope. Use alongside each modules `TASKS.md` notes.
@@ -8,7 +8,7 @@ Snapshot of direct network checks performed on 2025-10-11 (UTC) for the national
## CCCS (Canada)
- JSON endpoint (`https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=<lang>&content_type=cccs_threat`) returns ~5100 records per language; `page=<n>` still works for segmented pulls and the earliest `date_created` seen is 20180608 (EN) / 20180608 (FR). Use an explicit `User-Agent` to avoid 403 responses.
- Follow-up: telemetry, sanitiser coverage, and backfill procedures are documented in `docs/ops/feedser-cccs-operations.md` (20251015). Adjust `maxEntriesPerFetch` when performing historical sweeps so cursor state remains responsive.
- Follow-up: telemetry, sanitiser coverage, and backfill procedures are documented in `docs/ops/concelier-cccs-operations.md` (20251015). Adjust `maxEntriesPerFetch` when performing historical sweeps so cursor state remains responsive.
## CERT-Bund (Germany)
- `https://wid.cert-bund.de/content/public/securityAdvisory/rss` responds 200 without cookies (≈250-item window, German taxonomy). Detail links load an Angular SPA that fetches JSON behind the bootstrap session.
@@ -16,7 +16,7 @@ Snapshot of direct network checks performed on 2025-10-11 (UTC) for the national
- Historical advisories accessible through the SPA search/export endpoints once the `XSRF-TOKEN` cookie (exposed via `GET /portal/api/security/csrf`) is supplied with the `X-XSRF-TOKEN` header:
- `POST /portal/api/securityadvisory/search` (`{"page":N,"size":100,"sort":["published,desc"]}`) pages data back to 2014.
- `GET /portal/api/securityadvisory/export?format=json&from=YYYY-MM-DD` emits JSON bundles suitable for Offline Kit mirrors.
- Locale note: content is German-only; Feedser preserves `language=de` and Docs will publish a CERT-Bund glossary so operators can bridge terminology without machine translation.
- Locale note: content is German-only; Concelier preserves `language=de` and Docs will publish a CERT-Bund glossary so operators can bridge terminology without machine translation.
## KISA / KNVD (Korea)
- `https://knvd.krcert.or.kr/rss/securityInfo.do` and `/rss/securityNotice.do` return UTF-8 RSS (10-item window) with `detailDos.do?IDX=` links. No cookies required for feed fetch.
@@ -24,11 +24,11 @@ Snapshot of direct network checks performed on 2025-10-11 (UTC) for the national
## BDU (Russia / FSTEC)
- Candidate endpoints (`https://bdu.fstec.ru/component/rsform/form/7-bdu?format=xml/json`) return 403/404; TLS chain requires Russian Trusted Sub CA and WAF expects additional headers.
- Next actions: acquire official PEM chain, point `feedser:httpClients:source.bdu:trustedRootPaths` (or `feedser:sources:bdu:http:trustedRootPaths`) at the Offline Kit PEM, keep `allowInvalidCertificates=false`, script session bootstrap, then capture RSS/HTML schema for parser work.
- Next actions: acquire official PEM chain, point `concelier:httpClients:source.bdu:trustedRootPaths` (or `concelier:sources:bdu:http:trustedRootPaths`) at the Offline Kit PEM, keep `allowInvalidCertificates=false`, script session bootstrap, then capture RSS/HTML schema for parser work.
## NKTsKI / cert.gov.ru (Russia)
- `https://cert.gov.ru/rss/advisories.xml` served via Bitrix returns 403/404 even with `Accept-Language: ru-RU`; TLS chain also requires Russian trust anchors.
- Next actions: source trust store, configure `feedser:httpClients:source.nkcki:trustedRootPaths` (Offline Kit root via `feedser:offline:root`), prepare proxy fallback, and once accessible document taxonomy/retention plus attachment handling.
- Next actions: source trust store, configure `concelier:httpClients:source.nkcki:trustedRootPaths` (Offline Kit root via `concelier:offline:root`), prepare proxy fallback, and once accessible document taxonomy/retention plus attachment handling.
## CISA ICS (United States)
- `curl -I https://www.cisa.gov/cybersecurity-advisories/ics-advisories.xml` returns HTTP 403 + `x-reference-error` (Akamai). Same for legacy feed paths.

View File

@@ -0,0 +1,220 @@
# Excititor Connector Packaging Guide
> **Audience:** teams implementing new Excititor provider plugins (CSAF feeds,
> OpenVEX attestations, etc.)
> **Prerequisites:** read `docs/ARCHITECTURE_EXCITITOR.md` and the module
> `AGENTS.md` in `src/StellaOps.Excititor.Connectors.Abstractions/`.
The Excititor connector SDK gives you:
- `VexConnectorBase` deterministic logging, SHA256 helpers, time provider.
- `VexConnectorOptionsBinder` strongly typed YAML/JSON configuration binding.
- `IVexConnectorOptionsValidator<T>` custom validation hooks (offline defaults, auth invariants).
- `VexConnectorDescriptor` & metadata helpers for consistent telemetry.
This guide explains how to package a connector so the Excititor Worker/WebService
can load it via the plugin host.
---
## 1. Project layout
Start from the template under
`docs/dev/templates/excititor-connector/`. It contains:
```
Excititor.MyConnector/
├── src/
│ ├── Excititor.MyConnector.csproj
│ ├── MyConnectorOptions.cs
│ ├── MyConnector.cs
│ └── MyConnectorPlugin.cs
└── manifest/
└── connector.manifest.yaml
```
Key points:
- Target `net10.0`, enable `TreatWarningsAsErrors`, reference the
`StellaOps.Excititor.Connectors.Abstractions` project (or NuGet once published).
- Keep project ID prefix `StellaOps.Excititor.Connectors.<Provider>` so the
plugin loader can discover it with the default search pattern.
### 1.1 csproj snippet
```xml
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
</PropertyGroup>
<ItemGroup>
<ProjectReference Include="..\..\..\src\StellaOps.Excititor.Connectors.Abstractions\StellaOps.Excititor.Connectors.Abstractions.csproj" />
</ItemGroup>
</Project>
```
Adjust the `ProjectReference` for your checkout (or switch to a NuGet package
once published).
---
## 2. Implement the connector
1. **Options model** create an options POCO with data-annotation attributes.
Bind it via `VexConnectorOptionsBinder.Bind<TOptions>` in your connector
constructor or `ValidateAsync`.
2. **Validator** implement `IVexConnectorOptionsValidator<TOptions>` to add
complex checks (e.g., ensure both `clientId` and `clientSecret` are present).
3. **Connector** inherit from `VexConnectorBase`. Implement:
- `ValidateAsync` run binder/validators, log configuration summary.
- `FetchAsync` stream raw documents to `context.RawSink`.
- `NormalizeAsync` convert raw documents into `VexClaimBatch` via
format-specific normalizers (`context.Normalizers`).
4. **Plugin adapter** expose the connector via a plugin entry point so the
host can instantiate it.
### 2.1 Options binding example
```csharp
public sealed class MyConnectorOptions
{
[Required]
[Url]
public string CatalogUri { get; set; } = default!;
[Required]
public string ApiKey { get; set; } = default!;
[Range(1, 64)]
public int MaxParallelRequests { get; set; } = 4;
}
public sealed class MyConnectorOptionsValidator : IVexConnectorOptionsValidator<MyConnectorOptions>
{
public void Validate(VexConnectorDescriptor descriptor, MyConnectorOptions options, IList<string> errors)
{
if (!options.CatalogUri.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
{
errors.Add("CatalogUri must use HTTPS.");
}
}
}
```
Bind inside the connector:
```csharp
private readonly MyConnectorOptions _options;
public MyConnector(VexConnectorDescriptor descriptor, ILogger<MyConnector> logger, TimeProvider timeProvider)
: base(descriptor, logger, timeProvider)
{
// `settings` comes from the orchestrator; validators registered via DI.
_options = VexConnectorOptionsBinder.Bind<MyConnectorOptions>(
descriptor,
VexConnectorSettings.Empty,
validators: new[] { new MyConnectorOptionsValidator() });
}
```
Replace `VexConnectorSettings.Empty` with the actual settings from context
inside `ValidateAsync`.
---
## 3. Plugin adapter & manifest
Create a simple plugin class that implements
`StellaOps.Plugin.IConnectorPlugin`. The Worker/WebService plugin host uses
this contract today.
```csharp
public sealed class MyConnectorPlugin : IConnectorPlugin
{
private static readonly VexConnectorDescriptor Descriptor =
new("excititor:my-provider", VexProviderKind.Vendor, "My Provider VEX");
public string Name => Descriptor.DisplayName;
public bool IsAvailable(IServiceProvider services) => true; // inject feature flags if needed
public IFeedConnector Create(IServiceProvider services)
{
var logger = services.GetRequiredService<ILogger<MyConnector>>();
var timeProvider = services.GetRequiredService<TimeProvider>();
return new MyConnector(Descriptor, logger, timeProvider);
}
}
```
> **Note:** the Excititor Worker currently instantiates connectors through the
> shared `IConnectorPlugin` contract. Once a dedicated Excititor plugin interface
> lands you simply swap the base interface; the descriptor/connector code
> remains unchanged.
Provide a manifest describing the assembly for operational tooling:
```yaml
# manifest/connector.manifest.yaml
id: excititor-my-provider
assembly: StellaOps.Excititor.Connectors.MyProvider.dll
entryPoint: StellaOps.Excititor.Connectors.MyProvider.MyConnectorPlugin
description: >
Official VEX feed for ExampleCorp products (CSAF JSON, daily updates).
tags:
- excititor
- csaf
- vendor
```
Store manifests under `/opt/stella/excititor/plugins/<connector>/manifest/` in
production so the deployment tooling can inventory and verify plugins.
---
## 4. Packaging workflow
1. `dotnet publish -c Release` → copy the published DLLs to
`/opt/stella/excititor/plugins/<Provider>/`.
2. Place `connector.manifest.yaml` next to the binaries.
3. Restart the Excititor Worker or WebService (hot reload not supported yet).
4. Verify logs: `VEX-ConnectorLoader` should list the connector descriptor.
### 4.1 Offline kits
- Add the connector folder (binaries + manifest) to the Offline Kit bundle.
- Include a `settings.sample.yaml` demonstrating offline-friendly defaults.
- Document any external dependencies (e.g., SHA mirrors) in the manifest `notes`
field.
---
## 5. Testing checklist
- Unit tests around options binding & validators.
- Integration tests (future `StellaOps.Excititor.Connectors.Abstractions.Tests`)
verifying deterministic logging scopes:
`logger.BeginScope` should produce `vex.connector.id`, `vex.connector.kind`,
and `vex.connector.operation`.
- Deterministic SHA tests: repeated `CreateRawDocument` calls with identical
content must return the same digest.
---
## 6. Reference template
See `docs/dev/templates/excititor-connector/` for the full quickstart including:
- Sample options class + validator.
- Connector implementation inheriting from `VexConnectorBase`.
- Plugin adapter + manifest.
Copy the directory, rename namespaces/IDs, then iterate on provider-specific
logic.
---
*Last updated: 2025-10-17*

View File

@@ -1,210 +1,212 @@
# Authority Plug-in Developer Guide
> **Status:** Updated 2025-10-11 (AUTHPLUG-DOCS-01-001) with lifecycle + limiter diagrams and refreshed rate-limit guidance aligned to PLG6 acceptance criteria.
## 1. Overview
Authority plug-ins extend the **StellaOps Authority** service with custom identity providers, credential stores, and client-management logic. Unlike Feedser plug-ins (which ingest or export advisories), Authority plug-ins participate directly in authentication flows:
- **Use cases:** integrate corporate directories (LDAP/AD), delegate to external IDPs, enforce bespoke password/lockout policies, or add client provisioning automation.
- **Constraints:** plug-ins load only during service start (no hot-reload), must function without outbound internet access, and must emit deterministic results for identical configuration and input data.
- **Ship targets:** target the same .NET 10 preview as the host, honour offline-first requirements, and provide clear diagnostics so operators can triage issues from `/ready`.
## 2. Architecture Snapshot
Authority hosts follow a deterministic plug-in lifecycle. The flow below can be rendered as a sequence diagram in the final authored documentation, but all touchpoints are described here for offline viewers:
1. **Configuration load** `AuthorityPluginConfigurationLoader` resolves YAML manifests under `etc/authority.plugins/`.
2. **Assembly discovery** the shared `PluginHost` scans `PluginBinaries/Authority` for `StellaOps.Authority.Plugin.*.dll` assemblies.
3. **Registrar execution** each assembly is searched for `IAuthorityPluginRegistrar` implementations. Registrars bind options, register services, and optionally queue bootstrap tasks.
4. **Runtime** the host resolves `IIdentityProviderPlugin` instances, uses capability metadata to decide which OAuth grants to expose, and invokes health checks for readiness endpoints.
![Authority plug-in lifecycle diagram](../assets/authority/authority-plugin-lifecycle.svg)
_Source:_ `docs/assets/authority/authority-plugin-lifecycle.mmd`
**Data persistence primer:** the standard Mongo-backed plugin stores users in collections named `authority_users_<pluginName>` and lockout metadata in embedded documents. Additional plugins must document their storage layout and provide deterministic collection naming to honour the Offline Kit replication process.
## 3. Capability Metadata
Capability flags let the host reason about what your plug-in supports:
- Declare capabilities in your descriptor using the string constants from `AuthorityPluginCapabilities` (`password`, `mfa`, `clientProvisioning`, `bootstrap`). The configuration loader now validates these tokens and rejects unknown values at startup.
- `AuthorityIdentityProviderCapabilities.FromCapabilities` projects those strings into strongly typed booleans (`SupportsPassword`, etc.). Authority Core will use these flags when wiring flows such as the password grant. Built-in plugins (e.g., Standard) will fail fast or force-enable required capabilities if the descriptor is misconfigured, so keep manifests accurate.
- Typical configuration (`etc/authority.plugins/standard.yaml`):
```yaml
plugins:
descriptors:
standard:
assemblyName: "StellaOps.Authority.Plugin.Standard"
capabilities:
- password
- bootstrap
```
- Only declare a capability if the plug-in genuinely implements it. For example, if `SupportsClientProvisioning` is `true`, the plug-in must supply a working `IClientProvisioningStore`.
**Operational reminder:** the Authority host surfaces capability summaries during startup (see `AuthorityIdentityProviderRegistry` log lines). Use those logs during smoke tests to ensure manifests align with expectations.
**Configuration path normalisation:** Manifest-relative paths (e.g., `tokenSigning.keyDirectory: "../keys"`) are resolved against the YAML file location and environment variables are expanded before validation. Plug-ins should expect to receive an absolute, canonical path when options are injected.
**Password policy guardrails:** The Standard registrar logs a warning when a plug-in weakens the default password policy (minimum length or required character classes). Keep overrides at least as strong as the compiled defaults—operators treat the warning as an actionable security deviation.
## 4. Project Scaffold
- Target **.NET 10 preview**, enable nullable, treat warnings as errors, and mark Authority plug-ins with `<IsAuthorityPlugin>true</IsAuthorityPlugin>`.
- Minimum references:
- `StellaOps.Authority.Plugins.Abstractions` (contracts & capability helpers)
- `StellaOps.Plugin` (hosting/DI helpers)
- `StellaOps.Auth.*` libraries as needed for shared token utilities (optional today).
- Example `.csproj` (trimmed from `StellaOps.Authority.Plugin.Standard`):
```xml
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<IsAuthorityPlugin>true</IsAuthorityPlugin>
</PropertyGroup>
<ItemGroup>
<ProjectReference Include="..\StellaOps.Authority.Plugins.Abstractions\StellaOps.Authority.Plugins.Abstractions.csproj" />
<ProjectReference Include="..\..\StellaOps.Plugin\StellaOps.Plugin.csproj" />
</ItemGroup>
</Project>
```
(Add other references—e.g., MongoDB driver, shared auth libraries—according to your implementation.)
## 5. Implementing `IAuthorityPluginRegistrar`
- Create a parameterless registrar class that returns your plug-in type name via `PluginType`.
- Use `AuthorityPluginRegistrationContext` to:
- Bind options (`AddOptions<T>(pluginName).Bind(...)`).
- Register singletons for stores/enrichers using manifest metadata.
- Register any hosted bootstrap tasks (e.g., seed admin users).
- Always validate configuration inside `PostConfigure` and throw meaningful `InvalidOperationException` to fail fast during startup.
- Use the provided `ILoggerFactory` from DI; avoid static loggers or console writes.
- Example skeleton:
```csharp
internal sealed class MyPluginRegistrar : IAuthorityPluginRegistrar
{
public string PluginType => "my-custom";
public void Register(AuthorityPluginRegistrationContext context)
{
var name = context.Plugin.Manifest.Name;
context.Services.AddOptions<MyPluginOptions>(name)
.Bind(context.Plugin.Configuration)
.PostConfigure(opts => opts.Validate(name));
context.Services.AddSingleton<IIdentityProviderPlugin>(sp =>
new MyIdentityProvider(context.Plugin, sp.GetRequiredService<MyCredentialStore>(),
sp.GetRequiredService<MyClaimsEnricher>(),
sp.GetRequiredService<ILogger<MyIdentityProvider>>()));
}
}
```
## 6. Identity Provider Surface
- Implement `IIdentityProviderPlugin` to expose:
- `IUserCredentialStore` for password validation and user CRUD.
- `IClaimsEnricher` to append roles/attributes onto issued principals.
- Optional `IClientProvisioningStore` for machine-to-machine clients.
- `AuthorityIdentityProviderCapabilities` to advertise supported flows.
- Password guidance:
- Standard plug-in hashes via `ICryptoProvider` using Argon2id by default and emits PHC-compliant strings. Successful PBKDF2 logins trigger automatic rehashes so migrations complete gradually. See `docs/security/password-hashing.md` for tuning advice.
- Enforce password policies before hashing to avoid storing weak credentials.
- Health checks should probe backing stores (e.g., Mongo `ping`) and return `AuthorityPluginHealthResult` so `/ready` can surface issues.
- When supporting additional factors (e.g., TOTP), implement `SupportsMfa` and document the enrolment flow for resource servers.
## 7. Configuration & Secrets
- Authority looks for manifests under `etc/authority.plugins/`. Each YAML file maps directly to a plug-in name.
- Support environment overrides using `STELLAOPS_AUTHORITY_PLUGINS__DESCRIPTORS__<NAME>__...`.
- Never store raw secrets in git: allow operators to supply them via `.local.yaml`, environment variables, or injected secret files. Document which keys are mandatory.
- Validate configuration as soon as the registrar runs; use explicit error messages to guide operators. The Standard plug-in now enforces complete bootstrap credentials (username + password) and positive lockout windows via `StandardPluginOptions.Validate`.
- Cross-reference bootstrap workflows with `docs/ops/authority_bootstrap.md` (to be published alongside CORE6) so operators can reuse the same payload formats for manual provisioning.
- `passwordHashing` inherits defaults from `authority.security.passwordHashing`. Override only when hardware constraints differ per plug-in:
```yaml
passwordHashing:
algorithm: Argon2id
memorySizeInKib: 19456
iterations: 2
parallelism: 1
```
Invalid values (≤0) fail fast during startup, and legacy PBKDF2 hashes rehash automatically once the new algorithm succeeds.
### 7.1 Token Persistence Contract
- The host automatically persists every issued principal (access, refresh, device, authorization code) in `authority_tokens`. Plug-in code **must not** bypass this store; use the provided `IAuthorityTokenStore` helpers when implementing custom flows.
- When a plug-in disables a subject or client outside the standard handlers, call `IAuthorityTokenStore.UpdateStatusAsync(...)` for each affected token so revocation bundles stay consistent.
- Supply machine-friendly `revokedReason` codes (`compromised`, `rotation`, `policy`, `lifecycle`, etc.) and optional `revokedMetadata` entries when invalidating credentials. These flow straight into `revocation-bundle.json` and should remain deterministic.
- Token scopes should be normalised (trimmed, unique, ordinal sort) before returning from plug-in verification paths. `TokenPersistenceHandlers` will keep that ordering for downstream consumers.
### 7.2 Claims & Enrichment Checklist
- Authority always sets the OpenID Connect basics: `sub`, `client_id`, `preferred_username`, optional `name`, and `role` (for password flows). Plug-ins must use `IClaimsEnricher` to append additional claims in a **deterministic** order (sort arrays, normalise casing) so resource servers can rely on stable shapes.
- Recommended enrichment keys:
- `stellaops.realm` plug-in/tenant identifier so services can scope policies.
- `stellaops.subject.type` values such as `human`, `service`, `bootstrap`.
- `groups` / `projects` sorted arrays describing operator entitlements.
- Claims visible in tokens should mirror what `/token` and `/userinfo` emit. Avoid injecting sensitive PII directly; mark values with `ClassifiedString.Personal` inside the plug-in so audit sinks can tag them appropriately.
- For client-credential flows, remember to enrich both the client principal and the validation path (`TokenValidationHandlers`) so refresh flows keep the same metadata.
### 7.3 Revocation Bundles & Reasons
- Use `IAuthorityRevocationStore` to record subject/client/token revocations when credentials are deleted or rotated. Stick to the standard categories (`token`, `subject`, `client`, `key`).
- Include a deterministic `reason` string and optional `reasonDescription` so operators understand *why* a subject was revoked when inspecting bundles offline.
- Plug-ins should populate `metadata` with stable keys (e.g., `revokedBy`, `sourcePlugin`, `ticketId`) to simplify SOC correlation. The keys must be lowercase, ASCII, and free of secrets—bundles are mirrored to air-gapped agents.
## 8. Rate Limiting & Lockout Interplay
Rate limiting and account lockouts are complementary controls. Plug-ins must surface both deterministically so operators can correlate limiter hits with credential rejections.
**Baseline quotas** (from `docs/dev/authority-rate-limit-tuning-outline.md`):
| Endpoint | Default policy | Notes |
|----------|----------------|-------|
| `/token` | 30 requests / 60s, queue 0 | Drop to 10/60s for untrusted ranges; raise only with WAF + monitoring. |
| `/authorize` | 60 requests / 60s, queue 10 | Reduce carefully; interactive UX depends on headroom. |
| `/internal/*` | Disabled by default; recommended 5/60s when enabled | Keep queue 0 for bootstrap APIs. |
**Retry metadata:** The middleware stamps `Retry-After` plus tags `authority.client_id`, `authority.remote_ip`, and `authority.endpoint`. Plug-ins should keep these tags intact when crafting responses or telemetry so dashboards remain consistent.
**Lockout counters:** Treat lockouts as **subject-scoped** decisions. When multiple instances update counters, reuse the deterministic tie-breakers documented in `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md` (freshness overrides, precedence, and stable hashes) to avoid divergent lockout states across replicas.
**Alerting hooks:** Emit structured logs/metrics when either the limiter or credential store rejects access. Suggested gauges include `aspnetcore_rate_limiting_rejections_total{limiter="authority-token"}` and any custom `auth.plugins.<pluginName>.lockouts_total` counter.
![Authority rate limit and lockout flow](../assets/authority/authority-rate-limit-flow.svg)
_Source:_ `docs/assets/authority/authority-rate-limit-flow.mmd`
## 9. Logging, Metrics, and Diagnostics
- Always log via the injected `ILogger<T>`; include `pluginName` and correlation IDs where available.
- Activity/metric names should align with `AuthorityTelemetry` constants (`service.name=stellaops-authority`).
- Expose additional diagnostics via structured logging rather than writing custom HTTP endpoints; the host will integrate these into `/health` and `/ready`.
- Emit metrics with stable names (`auth.plugins.<pluginName>.*`) when introducing custom instrumentation; coordinate with the Observability guild to reserve prefixes.
## 10. Testing & Tooling
- Unit tests: use Mongo2Go (or similar) to exercise credential stores without hitting production infrastructure (`StandardUserCredentialStoreTests` is a template).
- Determinism: fix timestamps to UTC and sort outputs consistently; avoid random GUIDs unless stable.
- Smoke tests: launch `dotnet run --project src/StellaOps.Authority/StellaOps.Authority` with your plug-in under `PluginBinaries/Authority` and verify `/ready`.
- Example verification snippet:
```csharp
[Fact]
public async Task VerifyPasswordAsync_ReturnsSuccess()
{
var store = CreateCredentialStore();
await store.UpsertUserAsync(new AuthorityUserRegistration("alice", "Pa55!", null, null, false,
Array.Empty<string>(), new Dictionary<string, string?>()), CancellationToken.None);
var result = await store.VerifyPasswordAsync("alice", "Pa55!", CancellationToken.None);
Assert.True(result.Succeeded);
Assert.True(result.User?.Roles.Count == 0);
}
```
## 11. Packaging & Delivery
- Output assembly should follow `StellaOps.Authority.Plugin.<Name>.dll` so the hosts search pattern picks it up.
- Place the compiled DLL plus dependencies under `PluginBinaries/Authority` for offline deployments; include hashes/signatures in release notes (Security Guild guidance forthcoming).
- Document any external prerequisites (e.g., CA cert bundle) in your plug-in README.
- Update `etc/authority.plugins/<plugin>.yaml` samples and include deterministic SHA256 hashes for optional bootstrap payloads when distributing Offline Kit artefacts.
## 12. Checklist & Handoff
- ✅ Capabilities declared and validated in automated tests.
- ✅ Bootstrap workflows documented (if `bootstrap` capability used) and repeatable.
- ✅ Local smoke test + unit/integration suites green (`dotnet test`).
- ✅ Operational docs updated: configuration keys, secrets guidance, troubleshooting.
- Submit the developer guide update referencing PLG6/DOC4 and tag DevEx + Docs reviewers for sign-off.
---
Mermaid sources for the embedded diagrams live under `docs/assets/authority/`. Regenerate the SVG assets with your preferred renderer before committing future updates so the visuals stay in sync with the `.mmd` definitions.
# Authority Plug-in Developer Guide
> **Status:** Updated 2025-10-11 (AUTHPLUG-DOCS-01-001) with lifecycle + limiter diagrams and refreshed rate-limit guidance aligned to PLG6 acceptance criteria.
## 1. Overview
Authority plug-ins extend the **StellaOps Authority** service with custom identity providers, credential stores, and client-management logic. Unlike Concelier plug-ins (which ingest or export advisories), Authority plug-ins participate directly in authentication flows:
- **Use cases:** integrate corporate directories (LDAP/AD)[^ldap-rfc], delegate to external IDPs, enforce bespoke password/lockout policies, or add client provisioning automation.
- **Constraints:** plug-ins load only during service start (no hot-reload), must function without outbound internet access, and must emit deterministic results for identical configuration input.
- **Ship targets:** build against the hosts .NET 10 preview SDK, honour offline-first requirements, and surface actionable diagnostics so operators can triage issues from `/ready`.
## 2. Architecture Snapshot
Authority hosts follow a deterministic plug-in lifecycle. The exported diagram (`docs/assets/authority/authority-plugin-lifecycle.svg`) mirrors the steps below; regenerate it from the Mermaid source if you update the flow.
1. **Configuration load** `AuthorityPluginConfigurationLoader` resolves YAML manifests under `etc/authority.plugins/`.
2. **Assembly discovery** the shared `PluginHost` scans `StellaOps.Authority.PluginBinaries` for `StellaOps.Authority.Plugin.*.dll` assemblies.
3. **Registrar execution** each assembly is searched for `IAuthorityPluginRegistrar` implementations. Registrars bind options, register services, and optionally queue bootstrap tasks.
4. **Runtime** the host resolves `IIdentityProviderPlugin` instances, uses capability metadata to decide which OAuth grants to expose, and invokes health checks for readiness endpoints.
![Authority plug-in lifecycle diagram](../assets/authority/authority-plugin-lifecycle.svg)
_Source:_ `docs/assets/authority/authority-plugin-lifecycle.mmd`
**Data persistence primer:** the standard Mongo-backed plugin stores users in collections named `authority_users_<pluginName>` and lockout metadata in embedded documents. Additional plugins must document their storage layout and provide deterministic collection naming to honour the Offline Kit replication process.
## 3. Capability Metadata
Capability flags let the host reason about what your plug-in supports:
- Declare capabilities in your descriptor using the string constants from `AuthorityPluginCapabilities` (`password`, `mfa`, `clientProvisioning`, `bootstrap`). The configuration loader now validates these tokens and rejects unknown values at startup.
- `AuthorityIdentityProviderCapabilities.FromCapabilities` projects those strings into strongly typed booleans (`SupportsPassword`, etc.). Authority Core will use these flags when wiring flows such as the password grant. Built-in plugins (e.g., Standard) will fail fast or force-enable required capabilities if the descriptor is misconfigured, so keep manifests accurate.
- Typical configuration (`etc/authority.plugins/standard.yaml`):
```yaml
plugins:
descriptors:
standard:
assemblyName: "StellaOps.Authority.Plugin.Standard"
capabilities:
- password
- bootstrap
```
- Only declare a capability if the plug-in genuinely implements it. For example, if `SupportsClientProvisioning` is `true`, the plug-in must supply a working `IClientProvisioningStore`.
**Operational reminder:** the Authority host surfaces capability summaries during startup (see `AuthorityIdentityProviderRegistry` log lines). Use those logs during smoke tests to ensure manifests align with expectations.
**Configuration path normalisation:** Manifest-relative paths (e.g., `tokenSigning.keyDirectory: "../keys"`) are resolved against the YAML file location and environment variables are expanded before validation. Plug-ins should expect to receive an absolute, canonical path when options are injected.
**Password policy guardrails:** The Standard registrar logs a warning when a plug-in weakens the default password policy (minimum length or required character classes). Keep overrides at least as strong as the compiled defaults—operators treat the warning as an actionable security deviation.
## 4. Project Scaffold
- Target **.NET 10 preview**, enable nullable, treat warnings as errors, and mark Authority plug-ins with `<IsAuthorityPlugin>true</IsAuthorityPlugin>`.
- Minimum references:
- `StellaOps.Authority.Plugins.Abstractions` (contracts & capability helpers)
- `StellaOps.Plugin` (hosting/DI helpers)
- `StellaOps.Auth.*` libraries as needed for shared token utilities (optional today).
- Example `.csproj` (trimmed from `StellaOps.Authority.Plugin.Standard`):
```xml
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<IsAuthorityPlugin>true</IsAuthorityPlugin>
</PropertyGroup>
<ItemGroup>
<ProjectReference Include="..\StellaOps.Authority.Plugins.Abstractions\StellaOps.Authority.Plugins.Abstractions.csproj" />
<ProjectReference Include="..\..\StellaOps.Plugin\StellaOps.Plugin.csproj" />
</ItemGroup>
</Project>
```
(Add other references—e.g., MongoDB driver, shared auth libraries—according to your implementation.)
## 5. Implementing `IAuthorityPluginRegistrar`
- Create a parameterless registrar class that returns your plug-in type name via `PluginType`.
- Use `AuthorityPluginRegistrationContext` to:
- Bind options (`AddOptions<T>(pluginName).Bind(...)`).
- Register singletons for stores/enrichers using manifest metadata.
- Register any hosted bootstrap tasks (e.g., seed admin users).
- Always validate configuration inside `PostConfigure` and throw meaningful `InvalidOperationException` to fail fast during startup.
- Use the provided `ILoggerFactory` from DI; avoid static loggers or console writes.
- Example skeleton:
```csharp
internal sealed class MyPluginRegistrar : IAuthorityPluginRegistrar
{
public string PluginType => "my-custom";
public void Register(AuthorityPluginRegistrationContext context)
{
var name = context.Plugin.Manifest.Name;
context.Services.AddOptions<MyPluginOptions>(name)
.Bind(context.Plugin.Configuration)
.PostConfigure(opts => opts.Validate(name));
context.Services.AddSingleton<IIdentityProviderPlugin>(sp =>
new MyIdentityProvider(context.Plugin, sp.GetRequiredService<MyCredentialStore>(),
sp.GetRequiredService<MyClaimsEnricher>(),
sp.GetRequiredService<ILogger<MyIdentityProvider>>()));
}
}
```
## 6. Identity Provider Surface
- Implement `IIdentityProviderPlugin` to expose:
- `IUserCredentialStore` for password validation and user CRUD.
- `IClaimsEnricher` to append roles/attributes onto issued principals.
- Optional `IClientProvisioningStore` for machine-to-machine clients.
- `AuthorityIdentityProviderCapabilities` to advertise supported flows.
- Password guidance:
- Standard plug-in hashes via `ICryptoProvider` using Argon2id by default and emits PHC-compliant strings. Successful PBKDF2 logins trigger automatic rehashes so migrations complete gradually. See `docs/security/password-hashing.md` for tuning advice.
- Enforce password policies before hashing to avoid storing weak credentials.
- Health checks should probe backing stores (e.g., Mongo `ping`) and return `AuthorityPluginHealthResult` so `/ready` can surface issues.
- When supporting additional factors (e.g., TOTP), implement `SupportsMfa` and document the enrolment flow for resource servers.
## 7. Configuration & Secrets
- Authority looks for manifests under `etc/authority.plugins/`. Each YAML file maps directly to a plug-in name.
- Support environment overrides using `STELLAOPS_AUTHORITY_PLUGINS__DESCRIPTORS__<NAME>__...`.
- Never store raw secrets in git: allow operators to supply them via `.local.yaml`, environment variables, or injected secret files. Document which keys are mandatory.
- Validate configuration as soon as the registrar runs; use explicit error messages to guide operators. The Standard plug-in now enforces complete bootstrap credentials (username + password) and positive lockout windows via `StandardPluginOptions.Validate`.
- Cross-reference bootstrap workflows with `docs/ops/authority_bootstrap.md` (to be published alongside CORE6) so operators can reuse the same payload formats for manual provisioning.
- `passwordHashing` inherits defaults from `authority.security.passwordHashing`. Override only when hardware constraints differ per plug-in:
```yaml
passwordHashing:
algorithm: Argon2id
memorySizeInKib: 19456
iterations: 2
parallelism: 1
```
Invalid values (≤0) fail fast during startup, and legacy PBKDF2 hashes rehash automatically once the new algorithm succeeds.
### 7.1 Token Persistence Contract
- The host automatically persists every issued principal (access, refresh, device, authorization code) in `authority_tokens`. Plug-in code **must not** bypass this store; use the provided `IAuthorityTokenStore` helpers when implementing custom flows.
- When a plug-in disables a subject or client outside the standard handlers, call `IAuthorityTokenStore.UpdateStatusAsync(...)` for each affected token so revocation bundles stay consistent.
- Supply machine-friendly `revokedReason` codes (`compromised`, `rotation`, `policy`, `lifecycle`, etc.) and optional `revokedMetadata` entries when invalidating credentials. These flow straight into `revocation-bundle.json` and should remain deterministic.
- Token scopes should be normalised (trimmed, unique, ordinal sort) before returning from plug-in verification paths. `TokenPersistenceHandlers` will keep that ordering for downstream consumers.
### 7.2 Claims & Enrichment Checklist
- Authority always sets the OpenID Connect basics: `sub`, `client_id`, `preferred_username`, optional `name`, and `role` (for password flows). Plug-ins must use `IClaimsEnricher` to append additional claims in a **deterministic** order (sort arrays, normalise casing) so resource servers can rely on stable shapes.
- Recommended enrichment keys:
- `stellaops.realm` plug-in/tenant identifier so services can scope policies.
- `stellaops.subject.type` values such as `human`, `service`, `bootstrap`.
- `groups` / `projects` sorted arrays describing operator entitlements.
- Claims visible in tokens should mirror what `/token` and `/userinfo` emit. Avoid injecting sensitive PII directly; mark values with `ClassifiedString.Personal` inside the plug-in so audit sinks can tag them appropriately.
- For client-credential flows, remember to enrich both the client principal and the validation path (`TokenValidationHandlers`) so refresh flows keep the same metadata.
### 7.3 Revocation Bundles & Reasons
- Use `IAuthorityRevocationStore` to record subject/client/token revocations when credentials are deleted or rotated. Stick to the standard categories (`token`, `subject`, `client`, `key`).
- Include a deterministic `reason` string and optional `reasonDescription` so operators understand *why* a subject was revoked when inspecting bundles offline.
- Plug-ins should populate `metadata` with stable keys (e.g., `revokedBy`, `sourcePlugin`, `ticketId`) to simplify SOC correlation. The keys must be lowercase, ASCII, and free of secrets—bundles are mirrored to air-gapped agents.
## 8. Rate Limiting & Lockout Interplay
Rate limiting and account lockouts are complementary controls. Plug-ins must surface both deterministically so operators can correlate limiter hits with credential rejections.
**Baseline quotas** (from `docs/dev/authority-rate-limit-tuning-outline.md`):
| Endpoint | Default policy | Notes |
|----------|----------------|-------|
| `/token` | 30 requests / 60s, queue 0 | Drop to 10/60s for untrusted ranges; raise only with WAF + monitoring. |
| `/authorize` | 60 requests / 60s, queue 10 | Reduce carefully; interactive UX depends on headroom. |
| `/internal/*` | Disabled by default; recommended 5/60s when enabled | Keep queue 0 for bootstrap APIs. |
**Retry metadata:** The middleware stamps `Retry-After` plus tags `authority.client_id`, `authority.remote_ip`, and `authority.endpoint`. Plug-ins should keep these tags intact when crafting responses or telemetry so dashboards remain consistent.
**Lockout counters:** Treat lockouts as **subject-scoped** decisions. When multiple instances update counters, reuse the deterministic tie-breakers documented in `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md` (freshness overrides, precedence, and stable hashes) to avoid divergent lockout states across replicas.
**Alerting hooks:** Emit structured logs/metrics when either the limiter or credential store rejects access. Suggested gauges include `aspnetcore_rate_limiting_rejections_total{limiter="authority-token"}` and any custom `auth.plugins.<pluginName>.lockouts_total` counter.
![Authority rate limit and lockout flow](../assets/authority/authority-rate-limit-flow.svg)
_Source:_ `docs/assets/authority/authority-rate-limit-flow.mmd`
## 9. Logging, Metrics, and Diagnostics
- Always log via the injected `ILogger<T>`; include `pluginName` and correlation IDs where available.
- Activity/metric names should align with `AuthorityTelemetry` constants (`service.name=stellaops-authority`).
- Expose additional diagnostics via structured logging rather than writing custom HTTP endpoints; the host will integrate these into `/health` and `/ready`.
- Emit metrics with stable names (`auth.plugins.<pluginName>.*`) when introducing custom instrumentation; coordinate with the Observability guild to reserve prefixes.
## 10. Testing & Tooling
- Unit tests: use Mongo2Go (or similar) to exercise credential stores without hitting production infrastructure (`StandardUserCredentialStoreTests` is a template).
- Determinism: fix timestamps to UTC and sort outputs consistently; avoid random GUIDs unless stable.
- Smoke tests: launch `dotnet run --project src/StellaOps.Authority/StellaOps.Authority` with your plug-in under `StellaOps.Authority.PluginBinaries` and verify `/ready`.
- Example verification snippet:
```csharp
[Fact]
public async Task VerifyPasswordAsync_ReturnsSuccess()
{
var store = CreateCredentialStore();
await store.UpsertUserAsync(new AuthorityUserRegistration("alice", "Pa55!", null, null, false,
Array.Empty<string>(), new Dictionary<string, string?>()), CancellationToken.None);
var result = await store.VerifyPasswordAsync("alice", "Pa55!", CancellationToken.None);
Assert.True(result.Succeeded);
Assert.True(result.User?.Roles.Count == 0);
}
```
## 11. Packaging & Delivery
- Output assembly should follow `StellaOps.Authority.Plugin.<Name>.dll` so the hosts search pattern picks it up.
- Place the compiled DLL plus dependencies under `StellaOps.Authority.PluginBinaries` for offline deployments; include hashes/signatures in release notes (Security Guild guidance forthcoming).
- Document any external prerequisites (e.g., CA cert bundle) in your plug-in README.
- Update `etc/authority.plugins/<plugin>.yaml` samples and include deterministic SHA256 hashes for optional bootstrap payloads when distributing Offline Kit artefacts.
[^ldap-rfc]: Lightweight Directory Access Protocol (LDAPv3) specification — [RFC 4511](https://datatracker.ietf.org/doc/html/rfc4511).
## 12. Checklist & Handoff
- ✅ Capabilities declared and validated in automated tests.
- ✅ Bootstrap workflows documented (if `bootstrap` capability used) and repeatable.
- ✅ Local smoke test + unit/integration suites green (`dotnet test`).
- ✅ Operational docs updated: configuration keys, secrets guidance, troubleshooting.
- Submit the developer guide update referencing PLG6/DOC4 and tag DevEx + Docs reviewers for sign-off.
---
Mermaid sources for the embedded diagrams live under `docs/assets/authority/`. Regenerate the SVG assets with your preferred renderer before committing future updates so the visuals stay in sync with the `.mmd` definitions.

View File

@@ -1,91 +1,91 @@
# StellaOps Auth Client — Integration Guide
> **Status:** Drafted 2025-10-10 as part of LIB5. Consumer teams (Feedser, CLI, Agent) should review before wiring the new options into their configuration surfaces.
The `StellaOps.Auth.Client` library provides a resilient OpenID Connect client for services and tools that talk to **StellaOps Authority**. LIB5 introduced configurable HTTP retry/backoff policies and an offline-fallback window so downstream components stay deterministic even when Authority is briefly unavailable.
This guide explains how to consume the new settings, when to toggle them, and how to test your integration.
## 1. Registering the client
```csharp
services.AddStellaOpsAuthClient(options =>
{
options.Authority = configuration["StellaOps:Authority:Url"]!;
options.ClientId = configuration["StellaOps:Authority:ClientId"]!;
options.ClientSecret = configuration["StellaOps:Authority:ClientSecret"];
options.DefaultScopes.Add("feedser.jobs.trigger");
options.EnableRetries = true;
options.RetryDelays.Clear();
options.RetryDelays.Add(TimeSpan.FromMilliseconds(500));
options.RetryDelays.Add(TimeSpan.FromSeconds(2));
options.AllowOfflineCacheFallback = true;
options.OfflineCacheTolerance = TimeSpan.FromMinutes(5);
});
```
> **Reminder:** `AddStellaOpsAuthClient` binds the options via `IOptionsMonitor<T>` so changes picked up from configuration reloads will be applied to future HTTP calls without restarting the host.
## 2. Resilience options
| Option | Default | Notes |
|--------|---------|-------|
| `EnableRetries` | `true` | When disabled, the shared Polly policy is a no-op and HTTP calls will fail fast. |
| `RetryDelays` | `1s, 2s, 5s` | Edit in ascending order; zero/negative entries are ignored. Clearing the list and leaving it empty keeps the defaults. |
| `AllowOfflineCacheFallback` | `true` | When `true`, stale discovery/JWKS responses are reused within the tolerance window if Authority is unreachable. |
| `OfflineCacheTolerance` | `00:10:00` | Added to the normal cache lifetime. E.g. a 10 minute JWKS cache plus 5 minute tolerance keeps keys for 15 minutes if Authority is offline. |
The HTTP retry policy handles:
- 5xx responses
- 429 responses
- Transient transport failures (`HttpRequestException`, timeouts, aborted sockets)
Retries emit warnings via the `StellaOps.Auth.Client.HttpRetry` logger. Tune the delay values to honour your deployments SLOs.
## 3. Configuration mapping
Suggested configuration keys (coordinate with consuming teams before finalising):
```yaml
StellaOps:
Authority:
Url: "https://authority.stella-ops.local"
ClientId: "feedser"
ClientSecret: "change-me"
AuthClient:
EnableRetries: true
RetryDelays:
- "00:00:01"
- "00:00:02"
- "00:00:05"
AllowOfflineCacheFallback: true
OfflineCacheTolerance: "00:10:00"
```
Environment variable binding follows the usual double-underscore rules, e.g.
```
STELLAOPS__AUTHORITY__AUTHCLIENT__RETRYDELAYS__0=00:00:02
STELLAOPS__AUTHORITY__AUTHCLIENT__OFFLINECACHETOLERANCE=00:05:00
```
CLI and Feedser teams should expose these knobs once they adopt the auth client.
## 4. Testing recommendations
1. **Unit tests:** assert option binding by configuring `StellaOpsAuthClientOptions` via a `ConfigurationBuilder` and ensuring `Validate()` normalises the retry delays and scope list.
2. **Offline fallback:** simulate an unreachable Authority by swapping `HttpMessageHandler` to throw `HttpRequestException` after priming the discovery/JWKS caches. Verify that tokens are still issued until the tolerance expires.
3. **Observability:** watch for `StellaOps.Auth.Client.HttpRetry` warnings in your logs. Excessive retries mean the upstream Authority cluster needs attention.
4. **Determinism:** keep retry delays deterministic. Avoid random jitter—operators can introduce jitter at the infrastructure layer if desired.
## 5. Rollout checklist
- [ ] Update consuming service/CLI configuration schema to include the new settings.
- [ ] Document recommended defaults for offline (air-gapped) versus connected deployments.
- [ ] Extend smoke tests to cover Authority outage scenarios.
- [ ] Coordinate with Docs Guild so user-facing quickstarts reference the new knobs.
Once Feedser and CLI integrate these changes, we can mark LIB5 **DONE**; further packaging work is deferred until the backlog reintroduces it.
# StellaOps Auth Client — Integration Guide
> **Status:** Drafted 2025-10-10 as part of LIB5. Consumer teams (Concelier, CLI, Agent) should review before wiring the new options into their configuration surfaces.
The `StellaOps.Auth.Client` library provides a resilient OpenID Connect client for services and tools that talk to **StellaOps Authority**. LIB5 introduced configurable HTTP retry/backoff policies and an offline-fallback window so downstream components stay deterministic even when Authority is briefly unavailable.
This guide explains how to consume the new settings, when to toggle them, and how to test your integration.
## 1. Registering the client
```csharp
services.AddStellaOpsAuthClient(options =>
{
options.Authority = configuration["StellaOps:Authority:Url"]!;
options.ClientId = configuration["StellaOps:Authority:ClientId"]!;
options.ClientSecret = configuration["StellaOps:Authority:ClientSecret"];
options.DefaultScopes.Add("concelier.jobs.trigger");
options.EnableRetries = true;
options.RetryDelays.Clear();
options.RetryDelays.Add(TimeSpan.FromMilliseconds(500));
options.RetryDelays.Add(TimeSpan.FromSeconds(2));
options.AllowOfflineCacheFallback = true;
options.OfflineCacheTolerance = TimeSpan.FromMinutes(5);
});
```
> **Reminder:** `AddStellaOpsAuthClient` binds the options via `IOptionsMonitor<T>` so changes picked up from configuration reloads will be applied to future HTTP calls without restarting the host.
## 2. Resilience options
| Option | Default | Notes |
|--------|---------|-------|
| `EnableRetries` | `true` | When disabled, the shared Polly policy is a no-op and HTTP calls will fail fast. |
| `RetryDelays` | `1s, 2s, 5s` | Edit in ascending order; zero/negative entries are ignored. Clearing the list and leaving it empty keeps the defaults. |
| `AllowOfflineCacheFallback` | `true` | When `true`, stale discovery/JWKS responses are reused within the tolerance window if Authority is unreachable. |
| `OfflineCacheTolerance` | `00:10:00` | Added to the normal cache lifetime. E.g. a 10 minute JWKS cache plus 5 minute tolerance keeps keys for 15 minutes if Authority is offline. |
The HTTP retry policy handles:
- 5xx responses
- 429 responses
- Transient transport failures (`HttpRequestException`, timeouts, aborted sockets)
Retries emit warnings via the `StellaOps.Auth.Client.HttpRetry` logger. Tune the delay values to honour your deployments SLOs.
## 3. Configuration mapping
Suggested configuration keys (coordinate with consuming teams before finalising):
```yaml
StellaOps:
Authority:
Url: "https://authority.stella-ops.local"
ClientId: "concelier"
ClientSecret: "change-me"
AuthClient:
EnableRetries: true
RetryDelays:
- "00:00:01"
- "00:00:02"
- "00:00:05"
AllowOfflineCacheFallback: true
OfflineCacheTolerance: "00:10:00"
```
Environment variable binding follows the usual double-underscore rules, e.g.
```
STELLAOPS__AUTHORITY__AUTHCLIENT__RETRYDELAYS__0=00:00:02
STELLAOPS__AUTHORITY__AUTHCLIENT__OFFLINECACHETOLERANCE=00:05:00
```
CLI and Concelier teams should expose these knobs once they adopt the auth client.
## 4. Testing recommendations
1. **Unit tests:** assert option binding by configuring `StellaOpsAuthClientOptions` via a `ConfigurationBuilder` and ensuring `Validate()` normalises the retry delays and scope list.
2. **Offline fallback:** simulate an unreachable Authority by swapping `HttpMessageHandler` to throw `HttpRequestException` after priming the discovery/JWKS caches. Verify that tokens are still issued until the tolerance expires.
3. **Observability:** watch for `StellaOps.Auth.Client.HttpRetry` warnings in your logs. Excessive retries mean the upstream Authority cluster needs attention.
4. **Determinism:** keep retry delays deterministic. Avoid random jitter—operators can introduce jitter at the infrastructure layer if desired.
## 5. Rollout checklist
- [ ] Update consuming service/CLI configuration schema to include the new settings.
- [ ] Document recommended defaults for offline (air-gapped) versus connected deployments.
- [ ] Extend smoke tests to cover Authority outage scenarios.
- [ ] Coordinate with Docs Guild so user-facing quickstarts reference the new knobs.
Once Concelier and CLI integrate these changes, we can mark LIB5 **DONE**; further packaging work is deferred until the backlog reintroduces it.

View File

@@ -0,0 +1,119 @@
# BuildX Generator Quickstart
This quickstart explains how to run the StellaOps **BuildX SBOM generator** offline, verify the CAS handshake, and emit OCI descriptors that downstream services can attest.
## 1. Prerequisites
- Docker 25+ with BuildKit enabled (`docker buildx` available).
- .NET 10 (preview) SDK matching the repository `global.json`.
- Optional: network access to a StellaOps Attestor endpoint (the quickstart uses a mock service).
## 2. Publish the plug-in binaries
The BuildX generator publishes as a .NET self-contained executable with its manifest under `plugins/scanner/buildx/`.
```bash
# From the repository root
DOTNET_CLI_HOME="${PWD}/.dotnet" \
dotnet publish src/StellaOps.Scanner.Sbomer.BuildXPlugin/StellaOps.Scanner.Sbomer.BuildXPlugin.csproj \
-c Release \
-o out/buildx
```
- `out/buildx/` now contains `StellaOps.Scanner.Sbomer.BuildXPlugin.dll` and the manifest `stellaops.sbom-indexer.manifest.json`.
- `plugins/scanner/buildx/StellaOps.Scanner.Sbomer.BuildXPlugin/` receives the same artefacts for release packaging.
- The CI pipeline also tars and signs (SHA-256 manifest) the OS analyzer plug-ins located under
`plugins/scanner/analyzers/os/` so they ship alongside the BuildX generator artefacts.
## 3. Verify the CAS handshake
```bash
dotnet out/buildx/StellaOps.Scanner.Sbomer.BuildXPlugin.dll handshake \
--manifest out/buildx \
--cas out/cas
```
The command performs a deterministic probe write (`sha256`) into the provided CAS directory and prints the resolved path.
## 4. Emit a descriptor + provenance placeholder
1. Build or identify the image you want to describe and capture its digest:
```bash
docker buildx build --load -t stellaops/buildx-demo:ci samples/ci/buildx-demo
DIGEST=$(docker image inspect stellaops/buildx-demo:ci --format '{{index .RepoDigests 0}}')
```
2. Generate a CycloneDX SBOM for the built image (any tool works; here we use `docker sbom`):
```bash
docker sbom stellaops/buildx-demo:ci --format cyclonedx-json > out/buildx-sbom.cdx.json
```
3. Invoke the `descriptor` command, pointing at the SBOM file and optional metadata:
```bash
dotnet out/buildx/StellaOps.Scanner.Sbomer.BuildXPlugin.dll descriptor \
--manifest out/buildx \
--image "$DIGEST" \
--sbom out/buildx-sbom.cdx.json \
--sbom-name buildx-sbom.cdx.json \
--artifact-type application/vnd.stellaops.sbom.layer+json \
--sbom-format cyclonedx-json \
--sbom-kind inventory \
--repository git.stella-ops.org/stellaops/buildx-demo \
--build-ref $(git rev-parse HEAD) \
> out/buildx-descriptor.json
```
The output JSON captures:
- OCI artifact descriptor including size, digest, and annotations (`org.stellaops.*`).
- Provenance placeholder (`expectedDsseSha256`, `nonce`, `attestorUri` when provided). `nonce` is derived deterministically from the image + SBOM metadata so repeated runs produce identical placeholders for identical inputs.
- Generator metadata and deterministic timestamps.
## 5. (Optional) Send the placeholder to an Attestor
The plug-in can POST the descriptor metadata to an Attestor endpoint, returning once it receives an HTTP 202.
```bash
python3 - <<'PY' &
from http.server import BaseHTTPRequestHandler, HTTPServer
class Handler(BaseHTTPRequestHandler):
def do_POST(self):
_ = self.rfile.read(int(self.headers.get('Content-Length', 0)))
self.send_response(202); self.end_headers(); self.wfile.write(b'accepted')
def log_message(self, fmt, *args):
return
server = HTTPServer(('127.0.0.1', 8085), Handler)
try:
server.serve_forever()
except KeyboardInterrupt:
pass
finally:
server.server_close()
PY
MOCK_PID=$!
dotnet out/buildx/StellaOps.Scanner.Sbomer.BuildXPlugin.dll descriptor \
--manifest out/buildx \
--image "$DIGEST" \
--sbom out/buildx-sbom.cdx.json \
--attestor http://127.0.0.1:8085/provenance \
--attestor-token "$STELLAOPS_ATTESTOR_TOKEN" \
> out/buildx-descriptor.json
kill $MOCK_PID
```
Set `STELLAOPS_ATTESTOR_TOKEN` (or pass `--attestor-token`) when the Attestor requires bearer authentication. Use `--attestor-insecure` for lab environments with self-signed certificates.
## 6. CI workflow example
A reusable GitHub Actions workflow is provided under `samples/ci/buildx-demo/github-actions-buildx-demo.yml`. It publishes the plug-in, runs the handshake, builds the demo image, emits a descriptor, and uploads both the descriptor and the mock-Attestor request as artefacts.
Add the workflow to your repository (or call it via `workflow_call`) and adjust the SBOM path + Attestor URL as needed. The workflow also re-runs the `descriptor` command and diffs the results (ignoring the `generatedAt` timestamp) so you catch regressions that would break deterministic CI use.
---
For deeper integration guidance (custom SBOM builders, exporting DSSE bundles), track ADRs in `docs/ARCHITECTURE_SCANNER.md` §7 and follow upcoming Attestor API releases.

View File

@@ -0,0 +1,86 @@
# Excititor Statement Backfill Runbook
Last updated: 2025-10-19
## Overview
Use this runbook when you need to rebuild the `vex.statements` collection from historical raw documents. Typical scenarios:
- Upgrading the statement schema (e.g., adding severity/KEV/EPSS signals).
- Recovering from a partial ingest outage where statements were never persisted.
- Seeding a freshly provisioned Excititor deployment from an existing raw archive.
Backfill operates server-side via the Excititor WebService and reuses the same pipeline that powers the `/excititor/statements` ingestion endpoint. Each raw document is normalized, signed metadata is preserved, and duplicate statements are skipped unless the run is forced.
## Prerequisites
1. **Connectivity to Excititor WebService** the CLI uses the backend URL configured in `stellaops.yml` or the `--backend-url` argument.
2. **Authority credentials** the CLI honours the existing Authority client configuration; ensure the caller has permission to invoke admin endpoints.
3. **Mongo replica set** (recommended) causal consistency guarantees rely on majority read/write concerns. Standalone deployment works but skips cross-document transactions.
## CLI command
```
stellaops excititor backfill-statements \
[--retrieved-since <ISO8601>] \
[--force] \
[--batch-size <int>] \
[--max-documents <int>]
```
| Option | Description |
| ------ | ----------- |
| `--retrieved-since` | Only process raw documents fetched on or after the specified timestamp (UTC by default). |
| `--force` | Reprocess documents even if matching statements already exist (useful after schema upgrades). |
| `--batch-size` | Number of raw documents pulled per batch (default `100`). |
| `--max-documents` | Optional hard limit on the number of raw documents to evaluate. |
Example replay the last 48 hours of Red Hat ingest while keeping existing statements:
```
stellaops excititor backfill-statements \
--retrieved-since "$(date -u -d '48 hours ago' +%Y-%m-%dT%H:%M:%SZ)"
```
Example full replay with forced overwrites, capped at 2,000 documents:
```
stellaops excititor backfill-statements --force --max-documents 2000
```
The command returns a summary similar to:
```
Backfill completed: evaluated 450, backfilled 180, claims written 320, skipped 270, failures 0.
```
## Behaviour
- Raw documents are streamed in ascending `retrievedAt` order.
- Each document is normalized using the registered VEX normalizers (CSAF, CycloneDX, OpenVEX).
- Statements are appended through the same `IVexClaimStore.AppendAsync` path that powers `/excititor/statements`.
- Duplicate detection compares `Document.Digest`; duplicates are skipped unless `--force` is specified.
- Failures are logged with the offending digest and continue with the next document.
## Observability
- CLI logs aggregate counts and the backend logs per-digest warnings or errors.
- Mongo writes carry majority write concern; expect backfill throughput to match ingest baselines (≈5 seconds warm, 30 seconds cold).
- Monitor the `excititor.storage.backfill` log scope for detailed telemetry.
## Post-run verification
1. Inspect the `vex.statements` collection for the targeted window (check `InsertedAt`).
2. Re-run the Excititor storage test suite if possible:
```
dotnet test src/StellaOps.Excititor.Storage.Mongo.Tests/StellaOps.Excititor.Storage.Mongo.Tests.csproj
```
3. Optionally, call `/excititor/statements/{vulnerabilityId}/{productKey}` to confirm the expected statements exist.
## Rollback
If a forced run produced incorrect statements, use the standard Mongo rollback procedure:
1. Identify the `InsertedAt` window for the backfill run.
2. Delete affected records from `vex.statements` (and any downstream exports if applicable).
3. Rerun the backfill command with corrected parameters.

View File

@@ -0,0 +1,108 @@
# Scanner Cache Configuration Guide
The scanner cache stores layer-level SBOM fragments and file content that can be reused across scans. This document explains how to configure and operate the cache subsystem introduced in Sprint 10 (Group SP10-G5).
## 1. Overview
- **Layer cache** persists SBOM fragments per layer digest under `<root>/layers/<digest>/` with deterministic metadata (`meta.json`).
- **File CAS** (content-addressable store) keeps deduplicated blobs (e.g., analyzer fixtures, imported SBOM layers) under `<root>/cas/<prefix>/<hash>/`.
- **Maintenance** runs via `ScannerCacheMaintenanceService`, evicting expired entries and compacting the cache to stay within size limits.
- **Metrics** emit on the `StellaOps.Scanner.Cache` meter with counters for hits, misses, evictions, and byte histograms.
- **Offline workflows** use the CAS import/export helpers to package cache warmups inside the Offline Kit.
## 2. Configuration keys (`scanner:cache`)
| Key | Default | Description |
| --- | --- | --- |
| `enabled` | `true` | Globally disable cache if `false`. |
| `rootPath` | `cache/scanner` | Base directory for cache data. Use an SSD-backed path for best warm-scan latency. |
| `layersDirectoryName` | `layers` | Subdirectory for layer cache entries. |
| `fileCasDirectoryName` | `cas` | Subdirectory for file CAS entries. |
| `layerTtl` | `45.00:00:00` | Time-to-live for layer cache entries (`TimeSpan`). `0` disables TTL eviction. |
| `fileTtl` | `30.00:00:00` | Time-to-live for CAS entries. `0` disables TTL eviction. |
| `maxBytes` | `5368709120` (5 GiB) | Hard cap for combined cache footprint. Compaction trims data back to `warmBytesThreshold`. |
| `warmBytesThreshold` | `maxBytes / 5` | Target size after compaction. |
| `coldBytesThreshold` | `maxBytes * 0.8` | Upper bound that triggers compaction. |
| `enableAutoEviction` | `true` | If `false`, callers must invoke `ILayerCacheStore.CompactAsync` / `IFileContentAddressableStore.CompactAsync` manually. |
| `maintenanceInterval` | `00:15:00` | Interval for the maintenance hosted service. |
| `enableFileCas` | `true` | Disable to prevent CAS usage (APIs throw on `PutAsync`). |
| `importDirectory` / `exportDirectory` | `null` | Optional defaults for offline import/export tooling. |
> **Tip:** configure `scanner:cache:rootPath` to a dedicated volume and mount it into worker containers when running in Kubernetes or Nomad.
## 3. Metrics
Instrumentation lives in `ScannerCacheMetrics` on meter `StellaOps.Scanner.Cache`.
| Instrument | Unit | Description |
| --- | --- | --- |
| `scanner.layer_cache_hits_total` | count | Layer cache hit counter. Tag: `layer`. |
| `scanner.layer_cache_misses_total` | count | Layer cache miss counter. Tag: `layer`. |
| `scanner.layer_cache_evictions_total` | count | Layer entries evicted due to TTL or compaction. Tag: `layer`. |
| `scanner.layer_cache_bytes` | bytes | Histogram of per-entry payload size when stored. |
| `scanner.file_cas_hits_total` | count | File CAS hit counter. Tag: `sha256`. |
| `scanner.file_cas_misses_total` | count | File CAS miss counter. Tag: `sha256`. |
| `scanner.file_cas_evictions_total` | count | CAS eviction counter. Tag: `sha256`. |
| `scanner.file_cas_bytes` | bytes | Histogram of CAS payload sizes on insert. |
## 4. Import / Export workflow
1. **Export warm cache**
```bash
dotnet tool run stellaops-cache export --destination ./offline-kit/cache
```
Internally this calls `IFileContentAddressableStore.ExportAsync` which copies each CAS entry (metadata + `content.bin`).
2. **Import on air-gapped hosts**
```bash
dotnet tool run stellaops-cache import --source ./offline-kit/cache
```
The import API merges newer metadata and skips older snapshots automatically.
3. **Layer cache seeding**
Layer cache entries are deterministic and can be packaged the same way (copy `<root>/layers`). For now we keep seeding optional because layers are larger; follow-up tooling can compress directories as needed.
## 5. Hosted maintenance loop
`ScannerCacheMaintenanceService` runs as a background service within Scanner Worker or WebService hosts when `AddScannerCache` is registered. Behaviour:
- At startup it performs an immediate eviction/compaction run.
- Every `maintenanceInterval` it triggers:
- `ILayerCacheStore.EvictExpiredAsync`
- `ILayerCacheStore.CompactAsync`
- `IFileContentAddressableStore.EvictExpiredAsync`
- `IFileContentAddressableStore.CompactAsync`
- Failures are logged at `Error` with preserved stack traces; the next tick continues normally.
Set `enableAutoEviction=false` when hosting the cache inside ephemeral build pipelines that want to drive eviction explicitly.
## 6. API surface summary
```csharp
public interface ILayerCacheStore
{
ValueTask<LayerCacheEntry?> TryGetAsync(string layerDigest, CancellationToken ct = default);
Task<LayerCacheEntry> PutAsync(LayerCachePutRequest request, CancellationToken ct = default);
Task RemoveAsync(string layerDigest, CancellationToken ct = default);
Task<int> EvictExpiredAsync(CancellationToken ct = default);
Task<int> CompactAsync(CancellationToken ct = default);
Task<Stream?> OpenArtifactAsync(string layerDigest, string artifactName, CancellationToken ct = default);
}
public interface IFileContentAddressableStore
{
ValueTask<FileCasEntry?> TryGetAsync(string sha256, CancellationToken ct = default);
Task<FileCasEntry> PutAsync(FileCasPutRequest request, CancellationToken ct = default);
Task<bool> RemoveAsync(string sha256, CancellationToken ct = default);
Task<int> EvictExpiredAsync(CancellationToken ct = default);
Task<int> CompactAsync(CancellationToken ct = default);
Task<int> ExportAsync(string destinationDirectory, CancellationToken ct = default);
Task<int> ImportAsync(string sourceDirectory, CancellationToken ct = default);
}
```
Register both stores via `services.AddScannerCache(configuration);` in WebService or Worker hosts.
---
_Last updated: 2025-10-19_

View File

@@ -0,0 +1,142 @@
# Authority DPoP & mTLS Implementation Plan (2025-10-19)
## Purpose
- Provide the implementation blueprint for AUTH-DPOP-11-001 and AUTH-MTLS-11-002.
- Unify sender-constraint validation across Authority, downstream services, and clients.
- Capture deterministic, testable steps that unblock UI/Signer guilds depending on DPoP/mTLS hardening.
## Scope
- Token endpoint validation, issuance, and storage changes inside `StellaOps.Authority`.
- Shared security primitives consumed by Authority, Scanner, Signer, CLI, and UI.
- Operator-facing configuration, auditing, and observability.
- Out of scope: PoE enforcement (Signer) and CLI/UI client UX; those teams consume the new capabilities.
> **Status update (2025-10-19):** `ValidateDpopProofHandler`, `AuthorityClientCertificateValidator`, and the supporting storage/audit plumbing now live in `src/StellaOps.Authority`. DPoP proofs populate `cnf.jkt`, mTLS bindings enforce certificate thumbprints via `cnf.x5t#S256`, and token documents persist the sender constraint metadata. In-memory nonce issuance is wired (Redis implementation to follow). Documentation and configuration references were updated (`docs/11_AUTHORITY.md`). Targeted unit/integration tests were added; running the broader test suite is currently blocked by pre-existing `StellaOps.Concelier.Storage.Mongo` build errors.
## Design Summary
- Extract the existing Scanner `DpopProofValidator` stack into a shared `StellaOps.Auth.Security` library used by Authority and resource servers.
- Extend Authority configuration (`authority.yaml`) with strongly-typed `senderConstraints.dpop` and `senderConstraints.mtls` sections (map to sample already shown in architecture doc).
- Require DPoP proofs on `/token` when the registered client policy is `senderConstraint=dpop`; bind issued access tokens via `cnf.jkt`.
- Introduce Authority-managed nonce issuance for “high value” audiences (default: `signer`, `attestor`) with Redis-backed persistence and deterministic auditing.
- Enable OAuth 2.0 mTLS (RFC 8705) by storing certificate bindings per client, requesting client certificates at TLS termination, and stamping `cnf.x5t#S256` into issued tokens plus introspection output.
- Surface structured logs and counters for both DPoP and mTLS flows; provide integration tests that cover success, replay, invalid proof, and certificate mismatch cases.
## AUTH-DPOP-11-001 — Proof Validation & Nonce Handling
**Shared validator**
- Move `DpopProofValidator`, option types, and replay cache interfaces from `StellaOps.Scanner.Core` into a new assembly `StellaOps.Auth.Security`.
- Provide pluggable caches: `InMemoryDpopReplayCache` (existing) and new `RedisDpopReplayCache` (leveraging the Authority Redis connection).
- Ensure the validator exposes the validated `SecurityKey`, `jti`, and `iat` so Authority can construct the `cnf` claim and compute nonce expiry.
**Configuration model**
- Extend `StellaOpsAuthorityOptions.Security` with a `SenderConstraints` property containing:
- `Dpop` (`enabled`, `allowedAlgorithms`, `maxAgeSeconds`, `clockSkewSeconds`, `replayWindowSeconds`, `nonce` settings with `enabled`, `ttlSeconds`, `requiredAudiences`, `maxIssuancePerMinute`).
- `Mtls` (`enabled`, `requireChainValidation`, `clientCaBundle`, `allowedSubjectPatterns`, `allowedSanTypes`).
- Bind from YAML (`authority.security.senderConstraints.*`) while preserving backwards compatibility (defaults keep both disabled).
**Token endpoint pipeline**
- Introduce a scoped OpenIddict handler `ValidateDpopProofHandler` inserted before `ValidateClientCredentialsHandler`.
- Determine the required sender constraint from client metadata:
- Add `AuthorityClientMetadataKeys.SenderConstraint` storing `dpop` or `mtls`.
- Optionally allow per-client overrides for nonce requirement.
- When `dpop` is required:
- Read the `DPoP` header from the ASP.NET request, reject with `invalid_token` + `WWW-Authenticate: DPoP error="invalid_dpop_proof"` if absent.
- Call the shared validator with method/URI. Enforce algorithm allowlist and `iat` window from options.
- Persist the `jkt` thumbprint plus replay cache state in the OpenIddict transaction (`AuthorityOpenIddictConstants.DpopKeyThumbprintProperty`, `DpopIssuedAtProperty`).
- When the requested audience intersects `SenderConstraints.Dpop.Nonce.RequiredAudiences`, require `nonce` in the proof; on first failure respond with HTTP 401, `error="use_dpop_nonce"`, and include `DPoP-Nonce` header (see nonce note below). Cache the rejection reason for audit logging.
**Nonce service**
- Add `IDpopNonceStore` with methods `IssueAsync(audience, clientId, jkt)` and `TryConsumeAsync(nonce, audience, clientId, jkt)`.
- Default implementation `RedisDpopNonceStore` storing SHA-256 hashes of nonces keyed by `audience:clientId:jkt`. TTL comes from `SenderConstraints.Dpop.Nonce.Ttl`.
- Create helper `DpopNonceIssuer` used by `ValidateDpopProofHandler` to issue nonces when missing/expired, enforcing issuance rate limits (per options) and tagging audit/log records.
- On successful validation (nonce supplied and consumed) stamp metadata into the transaction for auditing.
- Update `ClientCredentialsHandlers` to observe nonce enforcement: when a nonce challenge was sent, emit structured audit with `nonce_issued`, `audiences`, and `retry`.
**Token issuance**
- In `HandleClientCredentialsHandler`, if the transaction contains a validated DPoP key:
- Build `cnf.jkt` using thumbprint from validator.
- Include `auth_time`/`dpop_jti` as needed for diagnostics.
- Persist the thumbprint alongside token metadata in Mongo (extend `AuthorityTokenDocument` with `SenderConstraint`, `KeyThumbprint`, `Nonce` fields).
**Auditing & observability**
- Emit new audit events:
- `authority.dpop.proof.validated` (success/failure, clientId, audience, thumbprint, nonce status, jti).
- `authority.dpop.nonce.issued` and `authority.dpop.nonce.consumed`.
- Metrics (Prometheus style):
- `authority_dpop_validations_total{result,reason}`.
- `authority_dpop_nonce_issued_total{audience}` and `authority_dpop_nonce_fails_total{reason}`.
- Structured logs include `authority.sender_constraint=dpop`, `authority.dpop_thumbprint`, `authority.dpop_nonce`.
**Testing**
- Unit tests for the handler pipeline using fake OpenIddict transactions.
- Replay/nonce tests with in-memory and Redis stores.
- Integration tests in `StellaOps.Authority.Tests` covering:
- Valid DPoP proof issuing `cnf.jkt`.
- Missing header → challenge with nonce.
- Replayed `jti` rejected.
- Invalid nonce rejected even after issuance.
- Contract tests to ensure `/.well-known/openid-configuration` advertises `dpop_signing_alg_values_supported` and `dpop_nonce_supported` when enabled.
## AUTH-MTLS-11-002 — Certificate-Bound Tokens
**Configuration model**
- Reuse `SenderConstraints.Mtls` described above; include:
- `enforceForAudiences` list (defaults `signer`, `attestor`, `scheduler`).
- `certificateRotationGraceSeconds` for overlap.
- `allowedClientCertificateAuthorities` absolute paths.
**Kestrel/TLS pipeline**
- Configure Kestrel with `ClientCertificateMode.AllowCertificate` globally and implement middleware that enforces certificate presence only when the resolved client requires mTLS.
- Add `IAuthorityClientCertificateValidator` that validates presented certificate chain, SANs (`dns`, `uri`, optional SPIFFE), and thumbprint matches one of the stored bindings.
- Cache validation results per connection id to avoid rehashing on every request.
**Client registration & storage**
- Extend `AuthorityClientDocument` with `List<AuthorityClientCertificateBinding>` containing:
- `Thumbprint`, `SerialNumber`, `Subject`, `NotBefore`, `NotAfter`, `Sans`, `CreatedAt`, `UpdatedAt`, `Label`.
- Provide admin API mutations (`/admin/clients/{id}/certificates`) for ops tooling (deferred implementation but schema ready).
- Update plugin provisioning store (`StandardClientProvisioningStore`) to map descriptors with certificate bindings and `senderConstraint`.
- Persist binding state in Mongo migrations (index on `{clientId, thumbprint}`).
**Token issuance & introspection**
- Add a transaction property capturing the validated client certificate thumbprint.
- `HandleClientCredentialsHandler`:
- When mTLS required, ensure certificate info present; reject otherwise.
- Stamp `cnf` claim: `principal.SetClaim("cnf", JsonSerializer.Serialize(new { x5t#S256 = thumbprint }))`.
- Store binding metadata in issued token document for audit.
- Update `ValidateAccessTokenHandler` and introspection responses to surface `cnf.x5t#S256`.
- Ensure refresh tokens (if ever enabled) copy the binding data.
**Auditing & observability**
- Audit events:
- `authority.mtls.handshake` (success/failure, clientId, thumbprint, issuer, subject).
- `authority.mtls.binding.missing` when a required client posts without a cert.
- Metrics:
- `authority_mtls_handshakes_total{result}`.
- `authority_mtls_certificate_rotations_total`.
- Logs include `authority.sender_constraint=mtls`, `authority.mtls_thumbprint`, `authority.mtls_subject`.
**Testing**
- Unit tests for certificate validation rules (SAN mismatches, expiry, CA trust).
- Integration tests running Kestrel with test certificates:
- Successful token issuance with bound certificate.
- Request without certificate → `invalid_client`.
- Token introspection reveals `cnf.x5t#S256`.
- Rotation scenario (old + new cert allowed during grace window).
## Implementation Checklist
**DPoP work-stream**
1. Extract shared validator into `StellaOps.Auth.Security`; update Scanner references.
2. Introduce configuration classes and bind from YAML/environment.
3. Implement nonce store (Redis + in-memory), handler integration, and OpenIddict transaction plumbing.
4. Stamp `cnf.jkt`, audit events, and metrics; update Mongo documents and migrations.
5. Extend docs: `docs/ARCHITECTURE_AUTHORITY.md`, `docs/security/audit-events.md`, `docs/security/rate-limits.md`, CLI/UI references.
**mTLS work-stream**
1. Extend client document/schema and provisioning stores with certificate bindings + sender constraint flag.
2. Configure Kestrel/middleware for optional client certificates and validation service.
3. Update token issuance/introspection to honour certificate bindings and emit `cnf.x5t#S256`.
4. Add auditing/metrics and integration tests (happy path + failure).
5. Refresh operator documentation (`docs/ops/authority-backup-restore.md`, `docs/ops/authority-monitoring.md`, sample `authority.yaml`) to cover certificate lifecycle.
Both streams should conclude with `dotnet test src/StellaOps.Authority.sln` and documentation cross-links so dependent guilds can unblock UI/Signer work.

View File

@@ -0,0 +1,52 @@
# Authority Plug-in Scoped Service Coordination
> Created: 2025-10-19 — Plugin Platform Guild & Authority Core
> Status: Scheduled (session confirmed for 2025-10-20 15:0016:00UTC)
This document tracks preparation, agenda, and outcomes for the scoped-service workshop required before implementing PLUGIN-DI-08-002.
## Objectives
- Inventory Authority plug-in surfaces that need scoped service lifetimes.
- Confirm session/scope handling for identity-provider registrars and background jobs.
- Assign follow-up tasks/actions with owners and due dates.
## Scheduling Snapshot
- **Meeting time:** 2025-10-20 15:0016:00UTC (10:0011:00 CDT / 08:0009:00 PDT).
- **Facilitator:** Plugin Platform Guild — Alicia Rivera.
- **Attendees (confirmed):** Authority Core — Jasmin Patel; Authority Security Guild — Mohan Singh; Plugin Platform — Alicia Rivera, Leah Chen.
- **Optional invitees:** DevOps liaison — Sofia Ortega (accepted).
- **Logistics:** Invites sent via shared calendar on 2025-10-19 15:30UTC with Teams bridge + offline dial-in. Meeting notes will be captured here.
- **Preparation deadline:** 2025-10-20 12:00UTC — complete checklist below.
## Pre-work Checklist
- Review `ServiceBindingAttribute` contract introduced by PLUGIN-DI-08-001.
- Collect existing Authority plug-in registration code paths to evaluate.
- Audit background jobs that assume singleton lifetimes.
- Identify plug-in health checks/telemetry surfaces impacted by scoped lifetimes.
### Pre-work References
- _Add links, file paths, or notes here prior to the session._
## Draft Agenda
1. Context recap (5 min) — why scoped DI is needed; summary of PLUGIN-DI-08-001 changes.
2. Authority plug-in surfaces (15 min) — registrars, background services, telemetry.
3. Session handling strategy (10 min) — scope creation semantics, cancellation propagation.
4. Action items & owners (10 min) — capture code/docs/test tasks with due dates.
5. Risks & follow-ups (5 min) — dependencies, rollout sequencing.
## Notes
- _Pending coordination session; populate afterwards._
## Action Item Log
| Item | Owner | Due | Status | Notes |
|------|-------|-----|--------|-------|
| Confirm meeting time | Alicia Rivera | 2025-10-19 15:30UTC | DONE | Calendar invite sent; all required attendees accepted |
| Compile Authority plug-in DI entry points | Jasmin Patel | 2025-10-20 | IN PROGRESS | Gather current Authority plug-in registrars, background jobs, and helper factories that assume singleton lifetimes; add the list with file paths to **Pre-work References** in this document before 2025-10-20 12:00UTC. |
| Outline scoped-session pattern for background jobs | Leah Chen | Post-session | BLOCKED | Requires meeting outcomes |

View File

@@ -1,45 +1,45 @@
# Feedser Fixture Maintenance
# Concelier Fixture Maintenance
Feedser uses a handful of deterministic fixtures to keep connector regressions in check. This guide lists the
Concelier uses a handful of deterministic fixtures to keep connector regressions in check. This guide lists the
fixture sets, where they live, and how to regenerate them safely.
---
## GHSA ↔ OSV parity fixtures
- **Location:** `src/StellaOps.Feedser.Source.Osv.Tests/Fixtures/osv-ghsa.*.json`
- **Location:** `src/StellaOps.Concelier.Connector.Osv.Tests/Fixtures/osv-ghsa.*.json`
- **Purpose:** Exercised by `OsvGhsaParityRegressionTests` to ensure OSV + GHSA outputs stay aligned on aliases,
ranges, references, and credits.
- **Regeneration:** Either run the test harness with online regeneration (`UPDATE_PARITY_FIXTURES=1 dotnet test src/StellaOps.Feedser.Source.Osv.Tests/StellaOps.Feedser.Source.Osv.Tests.csproj`)
- **Regeneration:** Either run the test harness with online regeneration (`UPDATE_PARITY_FIXTURES=1 dotnet test src/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj`)
or execute the fixture updater (`dotnet run --project tools/FixtureUpdater/FixtureUpdater.csproj`). Both paths
normalise timestamps and canonical ordering.
- **SemVer provenance:** The regenerated fixtures should show `normalizedVersions[].notes` in the
`osv:{ecosystem}:{advisoryId}:{identifier}` shape emitted by `SemVerRangeRuleBuilder`. Confirm the
constraints and notes line up with GHSA/NVD composites before committing.
- **Verification:** Inspect the diff, then re-run `dotnet test src/StellaOps.Feedser.Source.Osv.Tests/StellaOps.Feedser.Source.Osv.Tests.csproj` to confirm parity.
- **Verification:** Inspect the diff, then re-run `dotnet test src/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj` to confirm parity.
## GHSA credit parity fixtures
- **Location:** `src/StellaOps.Feedser.Source.Ghsa.Tests/Fixtures/credit-parity.{ghsa,osv,nvd}.json`
- **Location:** `src/StellaOps.Concelier.Connector.Ghsa.Tests/Fixtures/credit-parity.{ghsa,osv,nvd}.json`
- **Purpose:** Exercised by `GhsaCreditParityRegressionTests` to guarantee GHSA/NVD/OSV acknowledgements remain in lockstep.
- **Regeneration:** `dotnet run --project tools/FixtureUpdater/FixtureUpdater.csproj` rewrites all three canonical snapshots.
- **Verification:** `dotnet test src/StellaOps.Feedser.Source.Ghsa.Tests/StellaOps.Feedser.Source.Ghsa.Tests.csproj`.
- **Verification:** `dotnet test src/StellaOps.Concelier.Connector.Ghsa.Tests/StellaOps.Concelier.Connector.Ghsa.Tests.csproj`.
> Always commit fixture changes together with the code that motivated them and reference the regression test that guards the behaviour.
## Apple security update fixtures
- **Location:** `src/StellaOps.Feedser.Source.Vndr.Apple.Tests/Apple/Fixtures/*.html` and `.expected.json`.
- **Location:** `src/StellaOps.Concelier.Connector.Vndr.Apple.Tests/Apple/Fixtures/*.html` and `.expected.json`.
- **Purpose:** Exercised by `AppleLiveRegressionTests` to guarantee the Apple HTML parser and mapper stay deterministic while covering Rapid Security Responses and multi-device advisories.
- **Regeneration:** Use the helper scripts (`scripts/update-apple-fixtures.sh` or `scripts/update-apple-fixtures.ps1`). They export `UPDATE_APPLE_FIXTURES=1`, propagate the flag through `WSLENV`, touch `.update-apple-fixtures`, and then run the Apple test project. This keeps WSL/VSCode test invocations in sync while the refresh workflow fetches live Apple support pages, sanitises them, and rewrites both the HTML and expected DTO snapshots with normalised ordering.
- **Verification:** Inspect the generated diffs and re-run `dotnet test src/StellaOps.Feedser.Source.Vndr.Apple.Tests/StellaOps.Feedser.Source.Vndr.Apple.Tests.csproj` without the env var to confirm determinism.
- **Verification:** Inspect the generated diffs and re-run `dotnet test src/StellaOps.Concelier.Connector.Vndr.Apple.Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests.csproj` without the env var to confirm determinism.
> **Tip for other connector owners:** mirror the sentinel + `WSLENV` pattern (`touch .update-<connector>-fixtures`, append the env var via `WSLENV`) when you add fixture refresh scripts so contributors running under WSL inherit the regeneration flag automatically.
## KISA advisory fixtures
- **Location:** `src/StellaOps.Feedser.Source.Kisa.Tests/Fixtures/kisa-{feed,detail}.(xml|json)`
- **Location:** `src/StellaOps.Concelier.Connector.Kisa.Tests/Fixtures/kisa-{feed,detail}.(xml|json)`
- **Purpose:** Used by `KisaConnectorTests` to verify Hangul-aware fetch → parse → map flows and to assert telemetry counters stay wired.
- **Regeneration:** `UPDATE_KISA_FIXTURES=1 dotnet test src/StellaOps.Feedser.Source.Kisa.Tests/StellaOps.Feedser.Source.Kisa.Tests.csproj`
- **Regeneration:** `UPDATE_KISA_FIXTURES=1 dotnet test src/StellaOps.Concelier.Connector.Kisa.Tests/StellaOps.Concelier.Connector.Kisa.Tests.csproj`
- **Verification:** Re-run the same test suite without the env var; confirm advisory content remains NFC-normalised and HTML is sanitised. Metrics assertions will fail if counters drift.
- **Localisation note:** RSS `category` values (e.g. `취약점정보`) remain in Hangul—do not translate them in fixtures; they feed directly into metrics/log tags.

View File

@@ -4,7 +4,7 @@ The KISA/KNVD connector now ships with structured telemetry, richer logging, and
## Telemetry counters
All metrics are emitted from `KisaDiagnostics` (`Meter` name `StellaOps.Feedser.Source.Kisa`).
All metrics are emitted from `KisaDiagnostics` (`Meter` name `StellaOps.Concelier.Connector.Kisa`).
| Metric | Description | Tags |
| --- | --- | --- |
@@ -39,7 +39,7 @@ The messages use structured properties (`Idx`, `Category`, `DocumentId`, `Severi
- Hangul fields (`title`, `summary`, `category`, `reference.label`, product vendor/name) are normalised to NFC before storage. Sample category `취약점정보` roughly translates to “vulnerability information”.
- Advisory HTML is sanitised via `HtmlContentSanitizer`, stripping script/style while preserving inline anchors for translation pipelines.
- Metrics carry Hangul `category` tags and logging keeps Hangul strings intact; this ensures air-gapped operators can validate native-language content without relying on MT.
- Fixtures live under `src/StellaOps.Feedser.Source.Kisa.Tests/Fixtures/`. Regenerate with `UPDATE_KISA_FIXTURES=1 dotnet test src/StellaOps.Feedser.Source.Kisa.Tests/StellaOps.Feedser.Source.Kisa.Tests.csproj`.
- Fixtures live under `src/StellaOps.Concelier.Connector.Kisa.Tests/Fixtures/`. Regenerate with `UPDATE_KISA_FIXTURES=1 dotnet test src/StellaOps.Concelier.Connector.Kisa.Tests/StellaOps.Concelier.Connector.Kisa.Tests.csproj`.
- The regression suite asserts canonical mapping, state cleanup, and telemetry counters (`KisaConnectorTests.Telemetry_RecordsMetrics`) so QA can track instrumentation drift.
For operator docs, link to this brief when documenting Hangul handling or counter dashboards so localisation reviewers have a single reference point.

View File

@@ -1,154 +1,154 @@
# Feedser SemVer Merge Playbook (Sprint 12)
This playbook describes how the merge layer and connector teams should emit the new SemVer primitives introduced in Sprint12, how those primitives become normalized version rules, and how downstream jobs query them deterministically.
## 1. What landed in Sprint12
- `RangePrimitives.SemVer` now infers a canonical `style` (`range`, `exact`, `lt`, `lte`, `gt`, `gte`) and captures `exactValue` when the constraint is a single version.
- `NormalizedVersionRule` documents the analytics-friendly projection of each `AffectedPackage` coverage entry and is persisted alongside legacy `versionRanges`.
- `AdvisoryProvenance.decisionReason` records whether merge resolution favored precedence, freshness, or a tie-breaker comparison.
See `src/StellaOps.Feedser.Models/CANONICAL_RECORDS.md` for the full schema and field descriptions.
## 2. Mapper pattern
Connectors should emit SemVer primitives as soon as they can normalize a vendor constraint. The helper `SemVerPrimitiveExtensions.ToNormalizedVersionRule` turns those primitives into the persisted rules:
```csharp
var primitive = new SemVerPrimitive(
introduced: "1.2.3",
introducedInclusive: true,
fixed: "2.0.0",
fixedInclusive: false,
lastAffected: null,
lastAffectedInclusive: false,
constraintExpression: ">=1.2.3 <2.0.0",
exactValue: null);
var rule = primitive.ToNormalizedVersionRule(notes: "nvd:CVE-2025-1234");
// rule => scheme=semver, type=range, min=1.2.3, minInclusive=true, max=2.0.0, maxInclusive=false
```
If you omit the optional `notes` argument, `ToNormalizedVersionRule` now falls back to the primitives `ConstraintExpression`, ensuring the original comparator expression is preserved for provenance/audit queries.
Emit the resulting rule inside `AffectedPackage.NormalizedVersions` while continuing to populate `AffectedVersionRange.RangeExpression` for backward compatibility.
## 3. Merge dedupe flow
During merge, feed all package candidates through `NormalizedVersionRuleComparer.Instance` prior to persistence. The comparer orders by scheme → type → min → minInclusive → max → maxInclusive → value → notes, guaranteeing consistent document layout and making `$unwind` pipelines deterministic.
If multiple connectors emit identical constraints, the merge layer should:
1. Combine provenance entries (preserving one per source).
2. Preserve a single normalized rule instance (thanks to `NormalizedVersionRuleEqualityComparer.Instance`).
3. Attach `decisionReason="precedence"` if one source overrides another.
## 4. Example Mongo pipeline
Use the following aggregation to locate advisories that affect a specific SemVer:
```javascript
db.advisories.aggregate([
{ $match: { "affectedPackages.type": "semver", "affectedPackages.identifier": "pkg:npm/lodash" } },
{ $unwind: "$affectedPackages" },
{ $unwind: "$affectedPackages.normalizedVersions" },
{ $match: {
$or: [
{ "affectedPackages.normalizedVersions.type": "exact",
"affectedPackages.normalizedVersions.value": "4.17.21" },
{ "affectedPackages.normalizedVersions.type": "range",
"affectedPackages.normalizedVersions.min": { $lte: "4.17.21" },
"affectedPackages.normalizedVersions.max": { $gt: "4.17.21" } },
{ "affectedPackages.normalizedVersions.type": "gte",
"affectedPackages.normalizedVersions.min": { $lte: "4.17.21" } },
{ "affectedPackages.normalizedVersions.type": "lte",
"affectedPackages.normalizedVersions.max": { $gte: "4.17.21" } }
]
}},
{ $project: { advisoryKey: 1, title: 1, "affectedPackages.identifier": 1 } }
]);
```
Pair this query with the indexes listed in [Normalized Versions Query Guide](mongo_indices.md).
## 5. Recommended indexes
| Collection | Index | Purpose |
|------------|-------|---------|
| `advisory` | `{ "affectedPackages.identifier": 1, "affectedPackages.normalizedVersions.scheme": 1, "affectedPackages.normalizedVersions.type": 1 }` (compound, multikey) | Speeds up `$match` on identifier + rule style. |
| `advisory` | `{ "affectedPackages.normalizedVersions.value": 1 }` (sparse) | Optimizes lookups for exact version hits. |
Coordinate with the Storage team when enabling these indexes so deployment windows account for collection size.
## 6. Dual-write rollout
Follow the operational checklist in `docs/ops/migrations/SEMVER_STYLE.md`. The summary:
1. **Dual write (now)** emit both legacy `versionRanges` and the new `normalizedVersions`.
2. **Backfill** follow the storage migration in `docs/ops/migrations/SEMVER_STYLE.md` to rewrite historical advisories before switching consumers.
3. **Verify** run the aggregation above (with `explain("executionStats")`) to ensure the new indexes are used.
4. **Cutover** after consumers switch to normalized rules, mark the old `rangeExpression` as deprecated.
## 7. Checklist for connectors & merge
- [ ] Populate `SemVerPrimitive` for every SemVer-friendly constraint.
- [ ] Call `ToNormalizedVersionRule` and store the result.
- [ ] Emit provenance masks covering both `versionRanges[].primitives.semver` and `normalizedVersions[]`.
- [ ] Ensure merge deduping relies on the canonical comparer.
- [ ] Capture merge decisions via `decisionReason`.
- [ ] Confirm integration tests include fixtures with normalized rules and SemVer styles.
For deeper query examples and maintenance tasks, continue with [Normalized Versions Query Guide](mongo_indices.md).
## 8. Storage projection reference
`NormalizedVersionDocumentFactory` copies each normalized rule into MongoDB using the shape below. Use this as a contract when reviewing connector fixtures or diagnosing merge/storage diffs:
```json
{
"packageId": "pkg:npm/example",
"packageType": "npm",
"scheme": "semver",
"type": "range",
"style": "range",
"min": "1.2.3",
"minInclusive": true,
"max": "2.0.0",
"maxInclusive": false,
"value": null,
"notes": "ghsa:GHSA-xxxx-yyyy",
"decisionReason": "ghsa-precedence-over-nvd",
"constraint": ">= 1.2.3 < 2.0.0",
"source": "ghsa",
"recordedAt": "2025-10-11T00:00:00Z"
}
```
For distro-specific ranges (`nevra`, `evr`) the same envelope applies with `scheme` switched accordingly. Example:
```json
{
"packageId": "bash",
"packageType": "rpm",
"scheme": "nevra",
"type": "range",
"style": "range",
"min": "0:4.4.18-2.el7",
"minInclusive": true,
"max": "0:4.4.20-1.el7",
"maxInclusive": false,
"value": null,
"notes": "redhat:RHSA-2025:1234",
"decisionReason": "rhel-priority-over-nvd",
"constraint": "<= 0:4.4.20-1.el7",
"source": "redhat",
"recordedAt": "2025-10-11T00:00:00Z"
}
```
If a new scheme is required (for example, `apple.build` or `ios.semver`), raise it with the Models team before emitting documents so merge comparers and hashing logic can incorporate the change deterministically.
## 9. Observability signals
- `feedser.merge.normalized_rules` (counter, tags: `package_type`, `scheme`) increments once per normalized rule retained after precedence merge.
- `feedser.merge.normalized_rules_missing` (counter, tags: `package_type`) increments when a merged package still carries version ranges but no normalized rules; watch for spikes to catch connectors that have not emitted normalized arrays yet.
# Concelier SemVer Merge Playbook (Sprint 12)
This playbook describes how the merge layer and connector teams should emit the new SemVer primitives introduced in Sprint12, how those primitives become normalized version rules, and how downstream jobs query them deterministically.
## 1. What landed in Sprint12
- `RangePrimitives.SemVer` now infers a canonical `style` (`range`, `exact`, `lt`, `lte`, `gt`, `gte`) and captures `exactValue` when the constraint is a single version.
- `NormalizedVersionRule` documents the analytics-friendly projection of each `AffectedPackage` coverage entry and is persisted alongside legacy `versionRanges`.
- `AdvisoryProvenance.decisionReason` records whether merge resolution favored precedence, freshness, or a tie-breaker comparison.
See `src/StellaOps.Concelier.Models/CANONICAL_RECORDS.md` for the full schema and field descriptions.
## 2. Mapper pattern
Connectors should emit SemVer primitives as soon as they can normalize a vendor constraint. The helper `SemVerPrimitiveExtensions.ToNormalizedVersionRule` turns those primitives into the persisted rules:
```csharp
var primitive = new SemVerPrimitive(
introduced: "1.2.3",
introducedInclusive: true,
fixed: "2.0.0",
fixedInclusive: false,
lastAffected: null,
lastAffectedInclusive: false,
constraintExpression: ">=1.2.3 <2.0.0",
exactValue: null);
var rule = primitive.ToNormalizedVersionRule(notes: "nvd:CVE-2025-1234");
// rule => scheme=semver, type=range, min=1.2.3, minInclusive=true, max=2.0.0, maxInclusive=false
```
If you omit the optional `notes` argument, `ToNormalizedVersionRule` now falls back to the primitives `ConstraintExpression`, ensuring the original comparator expression is preserved for provenance/audit queries.
Emit the resulting rule inside `AffectedPackage.NormalizedVersions` while continuing to populate `AffectedVersionRange.RangeExpression` for backward compatibility.
## 3. Merge dedupe flow
During merge, feed all package candidates through `NormalizedVersionRuleComparer.Instance` prior to persistence. The comparer orders by scheme → type → min → minInclusive → max → maxInclusive → value → notes, guaranteeing consistent document layout and making `$unwind` pipelines deterministic.
If multiple connectors emit identical constraints, the merge layer should:
1. Combine provenance entries (preserving one per source).
2. Preserve a single normalized rule instance (thanks to `NormalizedVersionRuleEqualityComparer.Instance`).
3. Attach `decisionReason="precedence"` if one source overrides another.
## 4. Example Mongo pipeline
Use the following aggregation to locate advisories that affect a specific SemVer:
```javascript
db.advisories.aggregate([
{ $match: { "affectedPackages.type": "semver", "affectedPackages.identifier": "pkg:npm/lodash" } },
{ $unwind: "$affectedPackages" },
{ $unwind: "$affectedPackages.normalizedVersions" },
{ $match: {
$or: [
{ "affectedPackages.normalizedVersions.type": "exact",
"affectedPackages.normalizedVersions.value": "4.17.21" },
{ "affectedPackages.normalizedVersions.type": "range",
"affectedPackages.normalizedVersions.min": { $lte: "4.17.21" },
"affectedPackages.normalizedVersions.max": { $gt: "4.17.21" } },
{ "affectedPackages.normalizedVersions.type": "gte",
"affectedPackages.normalizedVersions.min": { $lte: "4.17.21" } },
{ "affectedPackages.normalizedVersions.type": "lte",
"affectedPackages.normalizedVersions.max": { $gte: "4.17.21" } }
]
}},
{ $project: { advisoryKey: 1, title: 1, "affectedPackages.identifier": 1 } }
]);
```
Pair this query with the indexes listed in [Normalized Versions Query Guide](mongo_indices.md).
## 5. Recommended indexes
| Collection | Index | Purpose |
|------------|-------|---------|
| `advisory` | `{ "affectedPackages.identifier": 1, "affectedPackages.normalizedVersions.scheme": 1, "affectedPackages.normalizedVersions.type": 1 }` (compound, multikey) | Speeds up `$match` on identifier + rule style. |
| `advisory` | `{ "affectedPackages.normalizedVersions.value": 1 }` (sparse) | Optimizes lookups for exact version hits. |
Coordinate with the Storage team when enabling these indexes so deployment windows account for collection size.
## 6. Dual-write rollout
Follow the operational checklist in `docs/ops/migrations/SEMVER_STYLE.md`. The summary:
1. **Dual write (now)** emit both legacy `versionRanges` and the new `normalizedVersions`.
2. **Backfill** follow the storage migration in `docs/ops/migrations/SEMVER_STYLE.md` to rewrite historical advisories before switching consumers.
3. **Verify** run the aggregation above (with `explain("executionStats")`) to ensure the new indexes are used.
4. **Cutover** after consumers switch to normalized rules, mark the old `rangeExpression` as deprecated.
## 7. Checklist for connectors & merge
- [ ] Populate `SemVerPrimitive` for every SemVer-friendly constraint.
- [ ] Call `ToNormalizedVersionRule` and store the result.
- [ ] Emit provenance masks covering both `versionRanges[].primitives.semver` and `normalizedVersions[]`.
- [ ] Ensure merge deduping relies on the canonical comparer.
- [ ] Capture merge decisions via `decisionReason`.
- [ ] Confirm integration tests include fixtures with normalized rules and SemVer styles.
For deeper query examples and maintenance tasks, continue with [Normalized Versions Query Guide](mongo_indices.md).
## 8. Storage projection reference
`NormalizedVersionDocumentFactory` copies each normalized rule into MongoDB using the shape below. Use this as a contract when reviewing connector fixtures or diagnosing merge/storage diffs:
```json
{
"packageId": "pkg:npm/example",
"packageType": "npm",
"scheme": "semver",
"type": "range",
"style": "range",
"min": "1.2.3",
"minInclusive": true,
"max": "2.0.0",
"maxInclusive": false,
"value": null,
"notes": "ghsa:GHSA-xxxx-yyyy",
"decisionReason": "ghsa-precedence-over-nvd",
"constraint": ">= 1.2.3 < 2.0.0",
"source": "ghsa",
"recordedAt": "2025-10-11T00:00:00Z"
}
```
For distro-specific ranges (`nevra`, `evr`) the same envelope applies with `scheme` switched accordingly. Example:
```json
{
"packageId": "bash",
"packageType": "rpm",
"scheme": "nevra",
"type": "range",
"style": "range",
"min": "0:4.4.18-2.el7",
"minInclusive": true,
"max": "0:4.4.20-1.el7",
"maxInclusive": false,
"value": null,
"notes": "redhat:RHSA-2025:1234",
"decisionReason": "rhel-priority-over-nvd",
"constraint": "<= 0:4.4.20-1.el7",
"source": "redhat",
"recordedAt": "2025-10-11T00:00:00Z"
}
```
If a new scheme is required (for example, `apple.build` or `ios.semver`), raise it with the Models team before emitting documents so merge comparers and hashing logic can incorporate the change deterministically.
## 9. Observability signals
- `concelier.merge.normalized_rules` (counter, tags: `package_type`, `scheme`) increments once per normalized rule retained after precedence merge.
- `concelier.merge.normalized_rules_missing` (counter, tags: `package_type`) increments when a merged package still carries version ranges but no normalized rules; watch for spikes to catch connectors that have not emitted normalized arrays yet.

View File

@@ -1,108 +1,108 @@
# Normalized Versions Query Guide
This guide complements the Sprint12 normalized versions rollout. It documents recommended indexes and aggregation patterns for querying `AffectedPackage.normalizedVersions`.
For a field-by-field look at how normalized rules persist in MongoDB (including provenance metadata), see Section8 of the [Feedser SemVer Merge Playbook](merge_semver_playbook.md).
## 1. Recommended indexes
When `feedser.storage.enableSemVerStyle` is enabled, advisories expose a flattened
`normalizedVersions` array at the document root. Create these indexes in `mongosh`
after the migration completes (adjust collection name if you use a prefix):
```javascript
db.advisories.createIndex(
{
"normalizedVersions.packageId": 1,
"normalizedVersions.scheme": 1,
"normalizedVersions.type": 1
},
{ name: "advisory_normalizedVersions_pkg_scheme_type" }
);
db.advisories.createIndex(
{ "normalizedVersions.value": 1 },
{ name: "advisory_normalizedVersions_value", sparse: true }
);
```
- The compound index accelerates `$match` stages that filter by package identifier and rule style without unwinding `affectedPackages`.
- The sparse index keeps storage costs low while supporting pure exact-version lookups (type `exact`).
The storage bootstrapper creates the same indexes automatically when the feature flag is enabled.
## 2. Query patterns
### 2.1 Determine if a specific version is affected
```javascript
db.advisories.aggregate([
{ $match: { "normalizedVersions.packageId": "pkg:npm/lodash" } },
{ $unwind: "$normalizedVersions" },
{ $match: {
$or: [
{ "normalizedVersions.type": "exact",
"normalizedVersions.value": "4.17.21" },
{ "normalizedVersions.type": "range",
"normalizedVersions.min": { $lte: "4.17.21" },
"normalizedVersions.max": { $gt: "4.17.21" } },
{ "normalizedVersions.type": "gte",
"normalizedVersions.min": { $lte: "4.17.21" } },
{ "normalizedVersions.type": "lte",
"normalizedVersions.max": { $gte: "4.17.21" } }
]
}},
{ $project: { advisoryKey: 1, title: 1, "normalizedVersions.packageId": 1 } }
]);
```
Use this pipeline during Sprint2 staging validation runs. Invoke `explain("executionStats")` to confirm the compound index is selected.
### 2.2 Locate advisories missing normalized rules
```javascript
db.advisories.aggregate([
{ $match: { $or: [
{ "normalizedVersions": { $exists: false } },
{ "normalizedVersions": { $size: 0 } }
] } },
{ $project: { advisoryKey: 1, affectedPackages: 1 } }
]);
```
Run this query after backfill jobs to identify gaps that still rely solely on `rangeExpression`.
### 2.3 Deduplicate overlapping rules
```javascript
db.advisories.aggregate([
{ $unwind: "$normalizedVersions" },
{ $group: {
_id: {
identifier: "$normalizedVersions.packageId",
scheme: "$normalizedVersions.scheme",
type: "$normalizedVersions.type",
min: "$normalizedVersions.min",
minInclusive: "$normalizedVersions.minInclusive",
max: "$normalizedVersions.max",
maxInclusive: "$normalizedVersions.maxInclusive",
value: "$normalizedVersions.value"
},
advisories: { $addToSet: "$advisoryKey" },
notes: { $addToSet: "$normalizedVersions.notes" }
}},
{ $match: { "advisories.1": { $exists: true } } },
{ $sort: { "_id.identifier": 1, "_id.type": 1 } }
]);
```
Use this to confirm the merge dedupe logic keeps only one normalized rule per unique constraint.
## 3. Operational checklist
- [ ] Create the indexes in staging before toggling dual-write in production.
- [ ] Capture explain plans and attach them to the release notes.
- [ ] Notify downstream services that consume advisory snapshots about the new `normalizedVersions` array.
- [ ] Update export fixtures once dedupe verification passes.
Additional background and mapper examples live in [Feedser SemVer Merge Playbook](merge_semver_playbook.md).
# Normalized Versions Query Guide
This guide complements the Sprint12 normalized versions rollout. It documents recommended indexes and aggregation patterns for querying `AffectedPackage.normalizedVersions`.
For a field-by-field look at how normalized rules persist in MongoDB (including provenance metadata), see Section8 of the [Concelier SemVer Merge Playbook](merge_semver_playbook.md).
## 1. Recommended indexes
When `concelier.storage.enableSemVerStyle` is enabled, advisories expose a flattened
`normalizedVersions` array at the document root. Create these indexes in `mongosh`
after the migration completes (adjust collection name if you use a prefix):
```javascript
db.advisories.createIndex(
{
"normalizedVersions.packageId": 1,
"normalizedVersions.scheme": 1,
"normalizedVersions.type": 1
},
{ name: "advisory_normalizedVersions_pkg_scheme_type" }
);
db.advisories.createIndex(
{ "normalizedVersions.value": 1 },
{ name: "advisory_normalizedVersions_value", sparse: true }
);
```
- The compound index accelerates `$match` stages that filter by package identifier and rule style without unwinding `affectedPackages`.
- The sparse index keeps storage costs low while supporting pure exact-version lookups (type `exact`).
The storage bootstrapper creates the same indexes automatically when the feature flag is enabled.
## 2. Query patterns
### 2.1 Determine if a specific version is affected
```javascript
db.advisories.aggregate([
{ $match: { "normalizedVersions.packageId": "pkg:npm/lodash" } },
{ $unwind: "$normalizedVersions" },
{ $match: {
$or: [
{ "normalizedVersions.type": "exact",
"normalizedVersions.value": "4.17.21" },
{ "normalizedVersions.type": "range",
"normalizedVersions.min": { $lte: "4.17.21" },
"normalizedVersions.max": { $gt: "4.17.21" } },
{ "normalizedVersions.type": "gte",
"normalizedVersions.min": { $lte: "4.17.21" } },
{ "normalizedVersions.type": "lte",
"normalizedVersions.max": { $gte: "4.17.21" } }
]
}},
{ $project: { advisoryKey: 1, title: 1, "normalizedVersions.packageId": 1 } }
]);
```
Use this pipeline during Sprint2 staging validation runs. Invoke `explain("executionStats")` to confirm the compound index is selected.
### 2.2 Locate advisories missing normalized rules
```javascript
db.advisories.aggregate([
{ $match: { $or: [
{ "normalizedVersions": { $exists: false } },
{ "normalizedVersions": { $size: 0 } }
] } },
{ $project: { advisoryKey: 1, affectedPackages: 1 } }
]);
```
Run this query after backfill jobs to identify gaps that still rely solely on `rangeExpression`.
### 2.3 Deduplicate overlapping rules
```javascript
db.advisories.aggregate([
{ $unwind: "$normalizedVersions" },
{ $group: {
_id: {
identifier: "$normalizedVersions.packageId",
scheme: "$normalizedVersions.scheme",
type: "$normalizedVersions.type",
min: "$normalizedVersions.min",
minInclusive: "$normalizedVersions.minInclusive",
max: "$normalizedVersions.max",
maxInclusive: "$normalizedVersions.maxInclusive",
value: "$normalizedVersions.value"
},
advisories: { $addToSet: "$advisoryKey" },
notes: { $addToSet: "$normalizedVersions.notes" }
}},
{ $match: { "advisories.1": { $exists: true } } },
{ $sort: { "_id.identifier": 1, "_id.type": 1 } }
]);
```
Use this to confirm the merge dedupe logic keeps only one normalized rule per unique constraint.
## 3. Operational checklist
- [ ] Create the indexes in staging before toggling dual-write in production.
- [ ] Capture explain plans and attach them to the release notes.
- [ ] Notify downstream services that consume advisory snapshots about the new `normalizedVersions` array.
- [ ] Update export fixtures once dedupe verification passes.
Additional background and mapper examples live in [Concelier SemVer Merge Playbook](merge_semver_playbook.md).

View File

@@ -1,11 +1,11 @@
# Normalized Versions Rollout Dashboard (Sprint 2 Feedser)
# Normalized Versions Rollout Dashboard (Sprint 2 Concelier)
_Status date: 2025-10-12 17:05 UTC_
This dashboard tracks connector readiness for emitting `AffectedPackage.NormalizedVersions` arrays and highlights upcoming coordination checkpoints. Use it alongside:
- [`src/StellaOps.Feedser.Merge/RANGE_PRIMITIVES_COORDINATION.md`](../../src/StellaOps.Feedser.Merge/RANGE_PRIMITIVES_COORDINATION.md) for detailed guidance and timelines.
- [Feedser SemVer Merge Playbook](merge_semver_playbook.md) §8 for persisted Mongo document shapes.
- [`src/StellaOps.Concelier.Merge/RANGE_PRIMITIVES_COORDINATION.md`](../../src/StellaOps.Concelier.Merge/RANGE_PRIMITIVES_COORDINATION.md) for detailed guidance and timelines.
- [Concelier SemVer Merge Playbook](merge_semver_playbook.md) §8 for persisted Mongo document shapes.
- [Normalized Versions Query Guide](mongo_indices.md) for index/query validation steps.
## Key milestones
@@ -18,28 +18,28 @@ This dashboard tracks connector readiness for emitting `AffectedPackage.Normaliz
| Connector | Owner team | Normalized versions status | Last update | Next action / link |
|-----------|------------|---------------------------|-------------|--------------------|
| Acsc | BE-Conn-ACSC | ❌ Not started mapper pending | 2025-10-11 | Design DTOs + mapper with normalized rule array; see `src/StellaOps.Feedser.Source.Acsc/TASKS.md`. |
| Cccs | BE-Conn-CCCS | ❌ Not started mapper pending | 2025-10-11 | Add normalized SemVer array in canonical mapper; coordinate fixtures per `TASKS.md`. |
| CertBund | BE-Conn-CERTBUND | ✅ Canonical mapper emitting vendor ranges | 2025-10-14 | Normalized vendor range payloads landed alongside telemetry/docs updates; see `src/StellaOps.Feedser.Source.CertBund/TASKS.md`. |
| CertCc | BE-Conn-CERTCC | ⚠️ In progress fetch pipeline DOING | 2025-10-11 | Implement VINCE mapper with SemVer/NEVRA rules; unblock snapshot regeneration; `src/StellaOps.Feedser.Source.CertCc/TASKS.md`. |
| Kev | BE-Conn-KEV | ✅ Normalized catalog/due-date rules verified | 2025-10-12 | Fixtures reconfirmed via `dotnet test src/StellaOps.Feedser.Source.Kev.Tests`; `src/StellaOps.Feedser.Source.Kev/TASKS.md`. |
| Cve | BE-Conn-CVE | ✅ Normalized SemVer rules verified | 2025-10-12 | Snapshot parity green (`dotnet test src/StellaOps.Feedser.Source.Cve.Tests`); `src/StellaOps.Feedser.Source.Cve/TASKS.md`. |
| Ghsa | BE-Conn-GHSA | ⚠️ DOING normalized rollout task active | 2025-10-11 18:45 UTC | Wire `SemVerRangeRuleBuilder` + refresh fixtures; `src/StellaOps.Feedser.Source.Ghsa/TASKS.md`. |
| Osv | BE-Conn-OSV | ✅ SemVer mapper & parity fixtures verified | 2025-10-12 | GHSA parity regression passing (`dotnet test src/StellaOps.Feedser.Source.Osv.Tests`); `src/StellaOps.Feedser.Source.Osv/TASKS.md`. |
| Ics.Cisa | BE-Conn-ICS-CISA | ❌ Not started mapper TODO | 2025-10-11 | Plan SemVer/firmware scheme selection; `src/StellaOps.Feedser.Source.Ics.Cisa/TASKS.md`. |
| Kisa | BE-Conn-KISA | ✅ Landed 2025-10-14 (mapper + telemetry) | 2025-10-11 | Hangul-aware mapper emits normalized rules; see `docs/dev/kisa_connector_notes.md` for localisation/metric details. |
| Ru.Bdu | BE-Conn-BDU | ✅ Raw scheme emitted | 2025-10-14 | Mapper now writes `ru-bdu.raw` normalized rules with provenance + telemetry; `src/StellaOps.Feedser.Source.Ru.Bdu/TASKS.md`. |
| Ru.Nkcki | BE-Conn-Nkcki | ❌ Not started mapper TODO | 2025-10-11 | Similar to BDU; ensure Cyrillic provenance preserved; `src/StellaOps.Feedser.Source.Ru.Nkcki/TASKS.md`. |
| Vndr.Apple | BE-Conn-Apple | ✅ Shipped emitting normalized arrays | 2025-10-11 | Continue fixture/tooling work; `src/StellaOps.Feedser.Source.Vndr.Apple/TASKS.md`. |
| Vndr.Cisco | BE-Conn-Cisco | ✅ SemVer + vendor extensions emitted | 2025-10-14 | Connector outputs SemVer primitives with `cisco.productId` notes; see `CiscoMapper` and fixtures for coverage. |
| Vndr.Msrc | BE-Conn-MSRC | ✅ Map + normalized build rules landed | 2025-10-15 | `MsrcMapper` emits `msrc.build` normalized rules with CVRF references; see `src/StellaOps.Feedser.Source.Vndr.Msrc/TASKS.md`. |
| Nvd | BE-Conn-NVD | ⚠️ Needs follow-up mapper complete but normalized array MR pending | 2025-10-11 | Align CVE notes + normalized payload flag; `src/StellaOps.Feedser.Source.Nvd/TASKS.md`. |
| Acsc | BE-Conn-ACSC | ❌ Not started mapper pending | 2025-10-11 | Design DTOs + mapper with normalized rule array; see `src/StellaOps.Concelier.Connector.Acsc/TASKS.md`. |
| Cccs | BE-Conn-CCCS | ⚠️ Scheduled helper ready, implementation due 2025-10-21 | 2025-10-19 | Apply Merge-provided trailing-version helper to emit `NormalizedVersions`; update mapper/tests per `src/StellaOps.Concelier.Connector.Cccs/TASKS.md`. |
| CertBund | BE-Conn-CERTBUND | ⚠️ Follow-up translate `versions` strings to normalized rules | 2025-10-19 | Build `bis`/`alle` translator + fixtures before 2025-10-22 per `src/StellaOps.Concelier.Connector.CertBund/TASKS.md`. |
| CertCc | BE-Conn-CERTCC | ⚠️ In progress fetch pipeline DOING | 2025-10-11 | Implement VINCE mapper with SemVer/NEVRA rules; unblock snapshot regeneration; `src/StellaOps.Concelier.Connector.CertCc/TASKS.md`. |
| Kev | BE-Conn-KEV | ✅ Normalized catalog/due-date rules verified | 2025-10-12 | Fixtures reconfirmed via `dotnet test src/StellaOps.Concelier.Connector.Kev.Tests`; `src/StellaOps.Concelier.Connector.Kev/TASKS.md`. |
| Cve | BE-Conn-CVE | ✅ Normalized SemVer rules verified | 2025-10-12 | Snapshot parity green (`dotnet test src/StellaOps.Concelier.Connector.Cve.Tests`); `src/StellaOps.Concelier.Connector.Cve/TASKS.md`. |
| Ghsa | BE-Conn-GHSA | ⚠️ DOING normalized rollout task active | 2025-10-11 18:45 UTC | Wire `SemVerRangeRuleBuilder` + refresh fixtures; `src/StellaOps.Concelier.Connector.Ghsa/TASKS.md`. |
| Osv | BE-Conn-OSV | ✅ SemVer mapper & parity fixtures verified | 2025-10-12 | GHSA parity regression passing (`dotnet test src/StellaOps.Concelier.Connector.Osv.Tests`); `src/StellaOps.Concelier.Connector.Osv/TASKS.md`. |
| Ics.Cisa | BE-Conn-ICS-CISA | ⚠️ Decision pending normalize SemVer exacts or escalate scheme | 2025-10-19 | Promote `SemVerPrimitive` outputs into `NormalizedVersions` or file Models ticket by 2025-10-23 (`src/StellaOps.Concelier.Connector.Ics.Cisa/TASKS.md`). |
| Kisa | BE-Conn-KISA | ⚠️ Proposal required firmware scheme due 2025-10-24 | 2025-10-19 | Draft `kisa.build` (or equivalent) scheme with Models, then emit normalized rules; track in `src/StellaOps.Concelier.Connector.Kisa/TASKS.md`. |
| Ru.Bdu | BE-Conn-BDU | ✅ Raw scheme emitted | 2025-10-14 | Mapper now writes `ru-bdu.raw` normalized rules with provenance + telemetry; `src/StellaOps.Concelier.Connector.Ru.Bdu/TASKS.md`. |
| Ru.Nkcki | BE-Conn-Nkcki | ❌ Not started mapper TODO | 2025-10-11 | Similar to BDU; ensure Cyrillic provenance preserved; `src/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md`. |
| Vndr.Apple | BE-Conn-Apple | ✅ Shipped emitting normalized arrays | 2025-10-11 | Continue fixture/tooling work; `src/StellaOps.Concelier.Connector.Vndr.Apple/TASKS.md`. |
| Vndr.Cisco | BE-Conn-Cisco | ⚠️ Scheduled normalized rule emission due 2025-10-21 | 2025-10-19 | Use Merge helper to persist `NormalizedVersions` alongside SemVer primitives; see `src/StellaOps.Concelier.Connector.Vndr.Cisco/TASKS.md`. |
| Vndr.Msrc | BE-Conn-MSRC | ✅ Map + normalized build rules landed | 2025-10-15 | `MsrcMapper` emits `msrc.build` normalized rules with CVRF references; see `src/StellaOps.Concelier.Connector.Vndr.Msrc/TASKS.md`. |
| Nvd | BE-Conn-NVD | ⚠️ Needs follow-up mapper complete but normalized array MR pending | 2025-10-11 | Align CVE notes + normalized payload flag; `src/StellaOps.Concelier.Connector.Nvd/TASKS.md`. |
Legend: ✅ complete, ⚠️ in progress/partial, ❌ not started.
## Monitoring
- Merge now emits `feedser.merge.normalized_rules` (tags: `package_type`, `scheme`) and `feedser.merge.normalized_rules_missing` (tags: `package_type`). Track these counters to confirm normalized arrays land as connectors roll out.
- Merge now emits `concelier.merge.normalized_rules` (tags: `package_type`, `scheme`) and `concelier.merge.normalized_rules_missing` (tags: `package_type`). Track these counters to confirm normalized arrays land as connectors roll out.
- Expect `normalized_rules_missing` to trend toward zero as each connector flips on normalized output. Investigate any sustained counts by checking the corresponding module `TASKS.md`.
## Implementation tips

View File

@@ -0,0 +1,8 @@
id: excititor-my-provider
assembly: StellaOps.Excititor.Connectors.MyProvider.dll
entryPoint: StellaOps.Excititor.Connectors.MyProvider.MyConnectorPlugin
description: |
Example connector template. Replace metadata before shipping.
tags:
- excititor
- template

View File

@@ -0,0 +1,12 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
</PropertyGroup>
<ItemGroup>
<!-- Adjust the relative path when copying this template into a repo -->
<ProjectReference Include="..\..\..\..\src\StellaOps.Excititor.Connectors.Abstractions\StellaOps.Excititor.Connectors.Abstractions.csproj" />
</ItemGroup>
</Project>

View File

@@ -0,0 +1,72 @@
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Runtime.CompilerServices;
using Microsoft.Extensions.Logging;
using StellaOps.Excititor.Connectors.Abstractions;
using StellaOps.Excititor.Core;
namespace StellaOps.Excititor.Connectors.MyProvider;
public sealed class MyConnector : VexConnectorBase
{
private readonly IEnumerable<IVexConnectorOptionsValidator<MyConnectorOptions>> _validators;
private MyConnectorOptions? _options;
public MyConnector(VexConnectorDescriptor descriptor, ILogger<MyConnector> logger, TimeProvider timeProvider, IEnumerable<IVexConnectorOptionsValidator<MyConnectorOptions>> validators)
: base(descriptor, logger, timeProvider)
{
_validators = validators;
}
public override ValueTask ValidateAsync(VexConnectorSettings settings, CancellationToken cancellationToken)
{
_options = VexConnectorOptionsBinder.Bind(
Descriptor,
settings,
validators: _validators);
LogConnectorEvent(LogLevel.Information, "validate", "MyConnector configuration loaded.",
new Dictionary<string, object?>
{
["catalogUri"] = _options.CatalogUri,
["maxParallelRequests"] = _options.MaxParallelRequests,
});
return ValueTask.CompletedTask;
}
public override IAsyncEnumerable<VexRawDocument> FetchAsync(VexConnectorContext context, CancellationToken cancellationToken)
{
if (_options is null)
{
throw new InvalidOperationException("Connector not validated.");
}
return FetchInternalAsync(context, cancellationToken);
}
private async IAsyncEnumerable<VexRawDocument> FetchInternalAsync(VexConnectorContext context, [EnumeratorCancellation] CancellationToken cancellationToken)
{
LogConnectorEvent(LogLevel.Information, "fetch", "Fetching catalog window...");
// Replace with real HTTP logic.
await Task.Delay(10, cancellationToken);
var metadata = BuildMetadata(builder => builder
.Add("sourceUri", _options!.CatalogUri)
.Add("window", context.Since?.ToString("O") ?? "full"));
yield return CreateRawDocument(
VexDocumentFormat.CsafJson,
new Uri($"{_options.CatalogUri.TrimEnd('/')}/sample.json"),
new byte[] { 0x7B, 0x7D },
metadata);
}
public override ValueTask<VexClaimBatch> NormalizeAsync(VexRawDocument document, CancellationToken cancellationToken)
{
var claims = ImmutableArray<VexClaim>.Empty;
var diagnostics = ImmutableDictionary<string, string>.Empty;
return ValueTask.FromResult(new VexClaimBatch(document, claims, diagnostics));
}
}

View File

@@ -0,0 +1,16 @@
using System.ComponentModel.DataAnnotations;
namespace StellaOps.Excititor.Connectors.MyProvider;
public sealed class MyConnectorOptions
{
[Required]
[Url]
public string CatalogUri { get; set; } = default!;
[Required]
public string ApiKey { get; set; } = default!;
[Range(1, 32)]
public int MaxParallelRequests { get; set; } = 4;
}

View File

@@ -0,0 +1,15 @@
using System.Collections.Generic;
using StellaOps.Excititor.Connectors.Abstractions;
namespace StellaOps.Excititor.Connectors.MyProvider;
public sealed class MyConnectorOptionsValidator : IVexConnectorOptionsValidator<MyConnectorOptions>
{
public void Validate(VexConnectorDescriptor descriptor, MyConnectorOptions options, IList<string> errors)
{
if (!options.CatalogUri.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
{
errors.Add("CatalogUri must use HTTPS.");
}
}
}

View File

@@ -0,0 +1,27 @@
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using StellaOps.Plugin;
using StellaOps.Excititor.Connectors.Abstractions;
using StellaOps.Excititor.Core;
namespace StellaOps.Excititor.Connectors.MyProvider;
public sealed class MyConnectorPlugin : IConnectorPlugin
{
private static readonly VexConnectorDescriptor Descriptor = new(
id: "excititor:my-provider",
kind: VexProviderKind.Vendor,
displayName: "My Provider VEX");
public string Name => Descriptor.DisplayName;
public bool IsAvailable(IServiceProvider services) => true;
public IFeedConnector Create(IServiceProvider services)
{
var logger = services.GetRequiredService<ILogger<MyConnector>>();
var timeProvider = services.GetRequiredService<TimeProvider>();
var validators = services.GetServices<IVexConnectorOptionsValidator<MyConnectorOptions>>();
return new MyConnector(Descriptor, logger, timeProvider, validators);
}
}

65
docs/events/README.md Normal file
View File

@@ -0,0 +1,65 @@
# Event Envelope Schemas
Platform services publish strongly typed events; the JSON Schemas in this directory define those envelopes. File names follow `<event-name>@<version>.json` so producers and consumers can negotiate contracts explicitly.
## Catalog
- `scanner.report.ready@1.json` — emitted by Scanner.WebService once a signed report is persisted (payload embeds the canonical report plus DSSE envelope). Consumers: Notify, UI timeline.
- `scanner.scan.completed@1.json` — emitted alongside the signed report to capture scan outcomes/summary data for downstream automation. Consumers: Notify, Scheduler backfills, UI timelines.
- `scheduler.rescan.delta@1.json` — emitted by Scheduler when BOM-Index diffs require fresh scans. Consumers: Notify, Policy Engine.
- `attestor.logged@1.json` — emitted by Attestor after storing the Rekor inclusion proof. Consumers: UI attestation panel, Governance exports.
Additive payload changes (new optional fields) can stay within the same version. Any breaking change (removing a field, tightening validation, altering semantics) must increment the `@<version>` suffix and update downstream consumers.
## Envelope structure
All event envelopes share the same deterministic header. Use the following table as the quick reference when emitting or parsing events:
| Field | Type | Notes |
|-------|------|-------|
| `eventId` | `uuid` | Must be globally unique per occurrence; producers log duplicates as fatal. |
| `kind` | `string` | Fixed per schema (e.g., `scanner.report.ready`). Downstream services reject unknown kinds or versions. |
| `tenant` | `string` | Multitenant isolation key; mirror the value recorded in queue/Mongo metadata. |
| `ts` | `date-time` | RFC3339 UTC timestamp. Use monotonic clocks or atomic offsets so ordering survives retries. |
| `scope` | `object` | Optional block used when the event concerns a specific image or repository. See schema for required fields (e.g., `repo`, `digest`). |
| `payload` | `object` | Event-specific body. Schemas allow additional properties so producers can add optional hints (e.g., `reportId`, `quietedFindingCount`) without breaking consumers. For scanner events, payloads embed both the canonical report document and the DSSE envelope so consumers can reuse signatures without recomputing them. See `docs/runtime/SCANNER_RUNTIME_READINESS.md` for the runtime consumer checklist covering these hints. |
When adding new optional fields, document the behaviour in the schemas `description` block and update the consumer checklist in the next sprint sync.
## Canonical samples & validation
Reference payloads live under `docs/events/samples/`, mirroring the schema version (`<event-name>@<version>.sample.json`). They illustrate common field combinations, including the optional attributes that downstream teams rely on for UI affordances and audit trails. Scanner samples reuse the exact DSSE envelope checked into `samples/api/reports/report-sample.dsse.json`, and a unit test (`ReportSamplesTests`) guards that the payload/base64 remain canonical.
Run the following loop offline to validate both schemas and samples:
```bash
# Validate schemas (same check as CI)
for schema in docs/events/*.json; do
npx ajv compile -c ajv-formats -s "$schema"
done
# Validate canonical samples against their schemas
for sample in docs/events/samples/*.sample.json; do
schema="docs/events/$(basename "${sample%.sample.json}").json"
npx ajv validate -c ajv-formats -s "$schema" -d "$sample"
done
```
Consumers can copy the samples into integration tests to guarantee backwards compatibility. When emitting new event versions, include a matching sample and update this README so air-gapped operators stay in sync.
## CI validation
The Docs CI workflow (`.gitea/workflows/docs.yml`) installs `ajv-cli` and compiles every schema on pull requests. Run the same check locally before opening a PR:
```bash
for schema in docs/events/*.json; do
npx ajv compile -c ajv-formats -s "$schema"
done
```
Tip: run `npm install --no-save ajv ajv-cli ajv-formats` once per clone so `npx` can resolve the tooling offline.
If a schema references additional files, include `-r` flags so CI and local runs stay consistent.
## Working with schemas
- Producers should validate outbound payloads using the matching schema during unit tests.
- Consumers should pin to a specific version and log when encountering unknown versions to catch missing migrations early.
- Store real payload samples under `docs/events/samples/` (mirrors the schema version) and mirror them into `samples/events/` when you need fixtures in integration repositories.
Contact the Platform Events group in Docs Guild if you need help shaping a new event or version strategy.

View File

@@ -0,0 +1,38 @@
{
"$id": "https://stella-ops.org/schemas/events/attestor.logged@1.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["eventId", "kind", "tenant", "ts", "payload"],
"properties": {
"eventId": {"type": "string", "format": "uuid"},
"kind": {"const": "attestor.logged"},
"tenant": {"type": "string"},
"ts": {"type": "string", "format": "date-time"},
"payload": {
"type": "object",
"required": ["artifactSha256", "rekor", "subject"],
"properties": {
"artifactSha256": {"type": "string"},
"rekor": {
"type": "object",
"required": ["uuid", "url"],
"properties": {
"uuid": {"type": "string"},
"url": {"type": "string", "format": "uri"},
"index": {"type": "integer", "minimum": 0}
}
},
"subject": {
"type": "object",
"required": ["type", "name"],
"properties": {
"type": {"enum": ["sbom", "report", "vex-export"]},
"name": {"type": "string"}
}
}
},
"additionalProperties": true
}
},
"additionalProperties": false
}

View File

@@ -0,0 +1,21 @@
{
"eventId": "1fdcaa1a-7a27-4154-8bac-cf813d8f4f6f",
"kind": "attestor.logged",
"tenant": "tenant-acme-solar",
"ts": "2025-10-18T15:45:27+00:00",
"payload": {
"artifactSha256": "sha256:8927d9151ad3f44e61a9c647511f9a31af2b4d245e7e031fe5cb4a0e8211c5d9",
"dsseEnvelopeDigest": "sha256:51c1dd189d5f16cfe87e82841d67b4fbc27d6fa9f5a09af0cd7e18945fb4c2a9",
"rekor": {
"index": 563421,
"url": "https://rekor.example/api/v1/log/entries/d6d0f897e7244edc9cb0bb2c68b05c96",
"uuid": "d6d0f897e7244edc9cb0bb2c68b05c96"
},
"signer": "cosign-stellaops",
"subject": {
"name": "scanner/report/sha256-0f0a8de5c1f93d6716b7249f6f4ea3a8",
"type": "report"
}
},
"attributes": {}
}

View File

@@ -0,0 +1,70 @@
{
"eventId": "6d2d1b77-f3c3-4f70-8a9d-6f2d0c8801ab",
"kind": "scanner.report.ready",
"tenant": "tenant-alpha",
"ts": "2025-10-19T12:34:56+00:00",
"scope": {
"namespace": "acme/edge",
"repo": "api",
"digest": "sha256:feedface",
"labels": {},
"attributes": {}
},
"payload": {
"delta": {
"kev": ["CVE-2024-9999"],
"newCritical": 1
},
"dsse": {
"payload": "eyJyZXBvcnRJZCI6InJlcG9ydC1hYmMiLCJpbWFnZURpZ2VzdCI6InNoYTI1NjpmZWVkZmFjZSIsImdlbmVyYXRlZEF0IjoiMjAyNS0xMC0xOVQxMjozNDo1NiswMDowMCIsInZlcmRpY3QiOiJibG9ja2VkIiwicG9saWN5Ijp7InJldmlzaW9uSWQiOiJyZXYtNDIiLCJkaWdlc3QiOiJkaWdlc3QtMTIzIn0sInN1bW1hcnkiOnsidG90YWwiOjEsImJsb2NrZWQiOjEsIndhcm5lZCI6MCwiaWdub3JlZCI6MCwicXVpZXRlZCI6MH0sInZlcmRpY3RzIjpbeyJmaW5kaW5nSWQiOiJmaW5kaW5nLTEiLCJzdGF0dXMiOiJCbG9ja2VkIiwic2NvcmUiOjQ3LjUsInNvdXJjZVRydXN0IjoiTlZEIiwicmVhY2hhYmlsaXR5IjoicnVudGltZSJ9XSwiaXNzdWVzIjpbXX0=",
"payloadType": "application/vnd.stellaops.report\u002Bjson",
"signatures": [{
"algorithm": "hs256",
"keyId": "test-key",
"signature": "signature-value"
}]
},
"generatedAt": "2025-10-19T12:34:56+00:00",
"links": {
"ui": "https://scanner.example/ui/reports/report-abc"
},
"quietedFindingCount": 0,
"report": {
"generatedAt": "2025-10-19T12:34:56+00:00",
"imageDigest": "sha256:feedface",
"issues": [],
"policy": {
"digest": "digest-123",
"revisionId": "rev-42"
},
"reportId": "report-abc",
"summary": {
"blocked": 1,
"ignored": 0,
"quieted": 0,
"total": 1,
"warned": 0
},
"verdict": "blocked",
"verdicts": [
{
"findingId": "finding-1",
"status": "Blocked",
"score": 47.5,
"sourceTrust": "NVD",
"reachability": "runtime"
}
]
},
"reportId": "report-abc",
"summary": {
"blocked": 1,
"ignored": 0,
"quieted": 0,
"total": 1,
"warned": 0
},
"verdict": "fail"
},
"attributes": {}
}

View File

@@ -0,0 +1,78 @@
{
"eventId": "08a6de24-4a94-4d14-8432-9d14f36f6da3",
"kind": "scanner.scan.completed",
"tenant": "tenant-alpha",
"ts": "2025-10-19T12:34:56+00:00",
"scope": {
"namespace": "acme/edge",
"repo": "api",
"digest": "sha256:feedface",
"labels": {},
"attributes": {}
},
"payload": {
"delta": {
"kev": ["CVE-2024-9999"],
"newCritical": 1
},
"digest": "sha256:feedface",
"dsse": {
"payload": "eyJyZXBvcnRJZCI6InJlcG9ydC1hYmMiLCJpbWFnZURpZ2VzdCI6InNoYTI1NjpmZWVkZmFjZSIsImdlbmVyYXRlZEF0IjoiMjAyNS0xMC0xOVQxMjozNDo1NiswMDowMCIsInZlcmRpY3QiOiJibG9ja2VkIiwicG9saWN5Ijp7InJldmlzaW9uSWQiOiJyZXYtNDIiLCJkaWdlc3QiOiJkaWdlc3QtMTIzIn0sInN1bW1hcnkiOnsidG90YWwiOjEsImJsb2NrZWQiOjEsIndhcm5lZCI6MCwiaWdub3JlZCI6MCwicXVpZXRlZCI6MH0sInZlcmRpY3RzIjpbeyJmaW5kaW5nSWQiOiJmaW5kaW5nLTEiLCJzdGF0dXMiOiJCbG9ja2VkIiwic2NvcmUiOjQ3LjUsInNvdXJjZVRydXN0IjoiTlZEIiwicmVhY2hhYmlsaXR5IjoicnVudGltZSJ9XSwiaXNzdWVzIjpbXX0=",
"payloadType": "application/vnd.stellaops.report\u002Bjson",
"signatures": [{
"algorithm": "hs256",
"keyId": "test-key",
"signature": "signature-value"
}]
},
"findings": [
{
"cve": "CVE-2024-9999",
"id": "finding-1",
"reachability": "runtime",
"severity": "Critical"
}
],
"policy": {
"digest": "digest-123",
"revisionId": "rev-42"
},
"report": {
"generatedAt": "2025-10-19T12:34:56+00:00",
"imageDigest": "sha256:feedface",
"issues": [],
"policy": {
"digest": "digest-123",
"revisionId": "rev-42"
},
"reportId": "report-abc",
"summary": {
"blocked": 1,
"ignored": 0,
"quieted": 0,
"total": 1,
"warned": 0
},
"verdict": "blocked",
"verdicts": [
{
"findingId": "finding-1",
"status": "Blocked",
"score": 47.5,
"sourceTrust": "NVD",
"reachability": "runtime"
}
]
},
"reportId": "report-abc",
"summary": {
"blocked": 1,
"ignored": 0,
"quieted": 0,
"total": 1,
"warned": 0
},
"verdict": "fail"
},
"attributes": {}
}

View File

@@ -0,0 +1,20 @@
{
"eventId": "51d0ef8d-3a17-4af3-b2d7-4ad3db3d9d2c",
"kind": "scheduler.rescan.delta",
"tenant": "tenant-acme-solar",
"ts": "2025-10-18T15:40:11+00:00",
"payload": {
"impactedDigests": [
"sha256:0f0a8de5c1f93d6716b7249f6f4ea3a8db451dc3f3c3ff823f53c9cbde5d5e8a",
"sha256:ab921f9679dd8d0832f3710a4df75dbadbd58c2d95f26a4d4efb2fa8c3d9b4ce"
],
"reason": "policy-change:scoring/v2",
"scheduleId": "rescan-weekly-critical",
"summary": {
"newCritical": 0,
"newHigh": 1,
"total": 4
}
},
"attributes": {}
}

View File

@@ -0,0 +1,84 @@
{
"$id": "https://stella-ops.org/schemas/events/scanner.report.ready@1.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["eventId", "kind", "tenant", "ts", "scope", "payload"],
"properties": {
"eventId": {"type": "string", "format": "uuid"},
"kind": {"const": "scanner.report.ready"},
"tenant": {"type": "string"},
"ts": {"type": "string", "format": "date-time"},
"scope": {
"type": "object",
"required": ["repo", "digest"],
"properties": {
"namespace": {"type": "string"},
"repo": {"type": "string"},
"digest": {"type": "string"}
}
},
"payload": {
"type": "object",
"required": ["verdict", "delta", "links"],
"properties": {
"reportId": {"type": "string"},
"generatedAt": {"type": "string", "format": "date-time"},
"verdict": {"enum": ["pass", "warn", "fail"]},
"summary": {
"type": "object",
"properties": {
"total": {"type": "integer", "minimum": 0},
"blocked": {"type": "integer", "minimum": 0},
"warned": {"type": "integer", "minimum": 0},
"ignored": {"type": "integer", "minimum": 0},
"quieted": {"type": "integer", "minimum": 0}
},
"additionalProperties": false
},
"delta": {
"type": "object",
"properties": {
"newCritical": {"type": "integer", "minimum": 0},
"newHigh": {"type": "integer", "minimum": 0},
"kev": {"type": "array", "items": {"type": "string"}}
},
"additionalProperties": false
},
"links": {
"type": "object",
"properties": {
"ui": {"type": "string", "format": "uri"},
"rekor": {"type": "string", "format": "uri"}
},
"additionalProperties": false
},
"quietedFindingCount": {"type": "integer", "minimum": 0},
"report": {"type": "object"},
"dsse": {
"type": "object",
"required": ["payloadType", "payload", "signatures"],
"properties": {
"payloadType": {"type": "string"},
"payload": {"type": "string"},
"signatures": {
"type": "array",
"items": {
"type": "object",
"required": ["keyId", "algorithm", "signature"],
"properties": {
"keyId": {"type": "string"},
"algorithm": {"type": "string"},
"signature": {"type": "string"}
},
"additionalProperties": false
}
}
},
"additionalProperties": false
}
},
"additionalProperties": true
}
},
"additionalProperties": false
}

View File

@@ -0,0 +1,97 @@
{
"$id": "https://stella-ops.org/schemas/events/scanner.scan.completed@1.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["eventId", "kind", "tenant", "ts", "scope", "payload"],
"properties": {
"eventId": {"type": "string", "format": "uuid"},
"kind": {"const": "scanner.scan.completed"},
"tenant": {"type": "string"},
"ts": {"type": "string", "format": "date-time"},
"scope": {
"type": "object",
"required": ["repo", "digest"],
"properties": {
"namespace": {"type": "string"},
"repo": {"type": "string"},
"digest": {"type": "string"}
}
},
"payload": {
"type": "object",
"required": ["reportId", "digest", "verdict", "summary"],
"properties": {
"reportId": {"type": "string"},
"digest": {"type": "string"},
"verdict": {"enum": ["pass", "warn", "fail"]},
"summary": {
"type": "object",
"properties": {
"total": {"type": "integer", "minimum": 0},
"blocked": {"type": "integer", "minimum": 0},
"warned": {"type": "integer", "minimum": 0},
"ignored": {"type": "integer", "minimum": 0},
"quieted": {"type": "integer", "minimum": 0}
},
"additionalProperties": false
},
"delta": {
"type": "object",
"properties": {
"newCritical": {"type": "integer", "minimum": 0},
"newHigh": {"type": "integer", "minimum": 0},
"kev": {"type": "array", "items": {"type": "string"}}
},
"additionalProperties": false
},
"policy": {
"type": "object",
"properties": {
"revisionId": {"type": "string"},
"digest": {"type": "string"}
},
"additionalProperties": false
},
"findings": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {"type": "string"},
"severity": {"type": "string"},
"cve": {"type": "string"},
"purl": {"type": "string"},
"reachability": {"type": "string"}
},
"additionalProperties": true
}
},
"report": {"type": "object"},
"dsse": {
"type": "object",
"required": ["payloadType", "payload", "signatures"],
"properties": {
"payloadType": {"type": "string"},
"payload": {"type": "string"},
"signatures": {
"type": "array",
"items": {
"type": "object",
"required": ["keyId", "algorithm", "signature"],
"properties": {
"keyId": {"type": "string"},
"algorithm": {"type": "string"},
"signature": {"type": "string"}
},
"additionalProperties": false
}
}
},
"additionalProperties": false
}
},
"additionalProperties": true
}
},
"additionalProperties": false
}

View File

@@ -0,0 +1,33 @@
{
"$id": "https://stella-ops.org/schemas/events/scheduler.rescan.delta@1.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["eventId", "kind", "tenant", "ts", "payload"],
"properties": {
"eventId": {"type": "string", "format": "uuid"},
"kind": {"const": "scheduler.rescan.delta"},
"tenant": {"type": "string"},
"ts": {"type": "string", "format": "date-time"},
"payload": {
"type": "object",
"required": ["scheduleId", "impactedDigests", "summary"],
"properties": {
"scheduleId": {"type": "string"},
"impactedDigests": {
"type": "array",
"items": {"type": "string"}
},
"summary": {
"type": "object",
"properties": {
"newCritical": {"type": "integer", "minimum": 0},
"newHigh": {"type": "integer", "minimum": 0},
"total": {"type": "integer", "minimum": 0}
}
}
},
"additionalProperties": true
}
},
"additionalProperties": false
}

View File

@@ -0,0 +1,32 @@
{
"schemaVersion": "notify.channel@1",
"channelId": "channel-slack-sec-ops",
"tenantId": "tenant-01",
"name": "slack:sec-ops",
"type": "slack",
"displayName": "SecOps Slack",
"description": "Primary incident response channel.",
"config": {
"secretRef": "ref://notify/channels/slack/sec-ops",
"target": "#sec-ops",
"properties": {
"workspace": "stellaops-sec"
},
"limits": {
"concurrency": 2,
"requestsPerMinute": 60,
"timeout": "PT10S"
}
},
"enabled": true,
"labels": {
"team": "secops"
},
"metadata": {
"createdByTask": "NOTIFY-MODELS-15-102"
},
"createdBy": "ops:amir",
"createdAt": "2025-10-18T17:02:11+00:00",
"updatedBy": "ops:amir",
"updatedAt": "2025-10-18T17:45:00+00:00"
}

View File

@@ -0,0 +1,46 @@
{
"items": [
{
"deliveryId": "delivery-7f3b6c51",
"tenantId": "tenant-acme",
"ruleId": "rule-critical-slack",
"actionId": "slack-secops",
"eventId": "4f6e9c09-01b4-4c2a-8a57-3d06de182d74",
"kind": "scanner.report.ready",
"status": "Sent",
"statusReason": null,
"rendered": {
"channelType": "Slack",
"format": "Slack",
"target": "#sec-alerts",
"title": "Critical findings detected",
"body": "{\"text\":\"Critical findings detected\",\"blocks\":[{\"type\":\"section\",\"text\":{\"type\":\"mrkdwn\",\"text\":\"*Critical findings detected*\\n1 new critical finding across 2 images.\"}},{\"type\":\"context\",\"elements\":[{\"type\":\"mrkdwn\",\"text\":\"Preview generated 2025-10-19T16:23:41.889Z · Trace `trace-58c212`\"}]}]}",
"summary": "1 new critical finding across 2 images.",
"textBody": "1 new critical finding across 2 images.\nTrace: trace-58c212",
"locale": "en-us",
"bodyHash": "febf9b2a630d862b07f4390edfbf31f5e8b836529f5232c491f4b3f6dba4a4b2",
"attachments": []
},
"attempts": [
{
"timestamp": "2025-10-19T16:23:42.112Z",
"status": "Succeeded",
"statusCode": 200,
"reason": null
}
],
"metadata": {
"channelType": "slack",
"target": "#sec-alerts",
"previewProvider": "fallback",
"traceId": "trace-58c212",
"slack.channel": "#sec-alerts"
},
"createdAt": "2025-10-19T16:23:41.889Z",
"sentAt": "2025-10-19T16:23:42.101Z",
"completedAt": "2025-10-19T16:23:42.112Z"
}
],
"count": 1,
"continuationToken": "2025-10-19T16:23:41.889Z|tenant-acme:delivery-7f3b6c51"
}

View File

@@ -0,0 +1,34 @@
{
"eventId": "8a8d6a2f-9315-49fe-9d52-8fec79ec7aeb",
"kind": "scanner.report.ready",
"version": "1",
"tenant": "tenant-01",
"ts": "2025-10-19T03:58:42+00:00",
"actor": "scanner-webservice",
"scope": {
"namespace": "prod-payment",
"repo": "ghcr.io/acme/api",
"digest": "sha256:79c1f9e5...",
"labels": {
"environment": "production"
},
"attributes": {}
},
"payload": {
"delta": {
"kev": [
"CVE-2025-40123"
],
"newCritical": 1,
"newHigh": 2
},
"links": {
"rekor": "https://rekor.stella.local/api/v1/log/entries/1",
"ui": "https://ui.stella.local/reports/sha256-79c1f9e5"
},
"verdict": "fail"
},
"attributes": {
"correlationId": "scan-23a6"
}
}

View File

@@ -0,0 +1,63 @@
{
"schemaVersion": "notify.rule@1",
"ruleId": "rule-secops-critical",
"tenantId": "tenant-01",
"name": "Critical digests to SecOps",
"description": "Escalate KEV-tagged findings to on-call feeds.",
"enabled": true,
"match": {
"eventKinds": [
"scanner.report.ready",
"scheduler.rescan.delta"
],
"namespaces": [
"prod-*"
],
"repositories": [],
"digests": [],
"labels": [],
"componentPurls": [],
"minSeverity": "high",
"verdicts": [],
"kevOnly": true,
"vex": {
"includeAcceptedJustifications": false,
"includeRejectedJustifications": false,
"includeUnknownJustifications": false,
"justificationKinds": [
"component-remediated",
"not-affected"
]
}
},
"actions": [
{
"actionId": "email-digest",
"channel": "email:soc",
"digest": "hourly",
"template": "digest",
"enabled": true,
"metadata": {
"locale": "en-us"
}
},
{
"actionId": "slack-oncall",
"channel": "slack:sec-ops",
"template": "concise",
"throttle": "PT5M",
"metadata": {},
"enabled": true
}
],
"labels": {
"team": "secops"
},
"metadata": {
"source": "sprint-15"
},
"createdBy": "ops:zoya",
"createdAt": "2025-10-19T04:12:27+00:00",
"updatedBy": "ops:zoya",
"updatedAt": "2025-10-19T04:45:03+00:00"
}

View File

@@ -0,0 +1,19 @@
{
"schemaVersion": "notify.template@1",
"templateId": "tmpl-slack-concise",
"tenantId": "tenant-01",
"channelType": "slack",
"key": "concise",
"locale": "en-us",
"body": "{{severity_icon payload.delta.newCritical}} {{summary}}",
"description": "Slack concise message for high severity findings.",
"renderMode": "markdown",
"format": "slack",
"metadata": {
"version": "2025-10-19"
},
"createdBy": "ops:zoya",
"createdAt": "2025-10-19T05:00:00+00:00",
"updatedBy": "ops:zoya",
"updatedAt": "2025-10-19T05:45:00+00:00"
}

View File

@@ -0,0 +1,73 @@
{
"$id": "https://stella-ops.org/schemas/notify/notify-channel@1.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Notify Channel",
"type": "object",
"required": [
"schemaVersion",
"channelId",
"tenantId",
"name",
"type",
"config",
"enabled",
"createdAt",
"updatedAt"
],
"properties": {
"schemaVersion": {"type": "string", "const": "notify.channel@1"},
"channelId": {"type": "string"},
"tenantId": {"type": "string"},
"name": {"type": "string"},
"type": {
"type": "string",
"enum": ["slack", "teams", "email", "webhook", "custom"]
},
"displayName": {"type": "string"},
"description": {"type": "string"},
"config": {"$ref": "#/$defs/channelConfig"},
"enabled": {"type": "boolean"},
"labels": {"$ref": "#/$defs/stringMap"},
"metadata": {"$ref": "#/$defs/stringMap"},
"createdBy": {"type": "string"},
"createdAt": {"type": "string", "format": "date-time"},
"updatedBy": {"type": "string"},
"updatedAt": {"type": "string", "format": "date-time"}
},
"additionalProperties": false,
"$defs": {
"channelConfig": {
"type": "object",
"required": ["secretRef"],
"properties": {
"secretRef": {"type": "string"},
"target": {"type": "string"},
"endpoint": {"type": "string", "format": "uri"},
"properties": {"$ref": "#/$defs/stringMap"},
"limits": {"$ref": "#/$defs/channelLimits"}
},
"additionalProperties": false
},
"channelLimits": {
"type": "object",
"properties": {
"concurrency": {"type": "integer", "minimum": 1},
"requestsPerMinute": {"type": "integer", "minimum": 1},
"timeout": {
"type": "string",
"pattern": "^P(T.*)?$",
"description": "ISO 8601 duration"
},
"maxBatchSize": {"type": "integer", "minimum": 1}
},
"additionalProperties": false
},
"stringMap": {
"type": "object",
"patternProperties": {
".*": {"type": "string"}
},
"additionalProperties": false
}
}
}

View File

@@ -0,0 +1,56 @@
{
"$id": "https://stella-ops.org/schemas/notify/notify-event@1.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Notify Event Envelope",
"type": "object",
"required": ["eventId", "kind", "tenant", "ts", "payload"],
"properties": {
"eventId": {"type": "string", "format": "uuid"},
"kind": {
"type": "string",
"description": "Event kind identifier (e.g. scanner.report.ready).",
"enum": [
"scanner.report.ready",
"scanner.scan.completed",
"scheduler.rescan.delta",
"attestor.logged",
"zastava.admission",
"feedser.export.completed",
"vexer.export.completed"
]
},
"version": {"type": "string"},
"tenant": {"type": "string"},
"ts": {"type": "string", "format": "date-time"},
"actor": {"type": "string"},
"scope": {
"type": "object",
"properties": {
"namespace": {"type": "string"},
"repo": {"type": "string"},
"digest": {"type": "string"},
"component": {"type": "string"},
"image": {"type": "string"},
"labels": {"$ref": "#/$defs/stringMap"},
"attributes": {"$ref": "#/$defs/stringMap"}
},
"additionalProperties": false
},
"payload": {
"type": "object",
"description": "Event specific body; see individual schemas for shapes.",
"additionalProperties": true
},
"attributes": {"$ref": "#/$defs/stringMap"}
},
"additionalProperties": false,
"$defs": {
"stringMap": {
"type": "object",
"patternProperties": {
".*": {"type": "string"}
},
"additionalProperties": false
}
}
}

View File

@@ -0,0 +1,96 @@
{
"$id": "https://stella-ops.org/schemas/notify/notify-rule@1.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Notify Rule",
"type": "object",
"required": [
"schemaVersion",
"ruleId",
"tenantId",
"name",
"enabled",
"match",
"actions",
"createdAt",
"updatedAt"
],
"properties": {
"schemaVersion": {"type": "string", "const": "notify.rule@1"},
"ruleId": {"type": "string"},
"tenantId": {"type": "string"},
"name": {"type": "string"},
"description": {"type": "string"},
"enabled": {"type": "boolean"},
"match": {"$ref": "#/$defs/ruleMatch"},
"actions": {
"type": "array",
"minItems": 1,
"items": {"$ref": "#/$defs/ruleAction"}
},
"labels": {"$ref": "#/$defs/stringMap"},
"metadata": {"$ref": "#/$defs/stringMap"},
"createdBy": {"type": "string"},
"createdAt": {"type": "string", "format": "date-time"},
"updatedBy": {"type": "string"},
"updatedAt": {"type": "string", "format": "date-time"}
},
"additionalProperties": false,
"$defs": {
"ruleMatch": {
"type": "object",
"properties": {
"eventKinds": {"$ref": "#/$defs/stringArray"},
"namespaces": {"$ref": "#/$defs/stringArray"},
"repositories": {"$ref": "#/$defs/stringArray"},
"digests": {"$ref": "#/$defs/stringArray"},
"labels": {"$ref": "#/$defs/stringArray"},
"componentPurls": {"$ref": "#/$defs/stringArray"},
"minSeverity": {"type": "string"},
"verdicts": {"$ref": "#/$defs/stringArray"},
"kevOnly": {"type": "boolean"},
"vex": {"$ref": "#/$defs/ruleMatchVex"}
},
"additionalProperties": false
},
"ruleMatchVex": {
"type": "object",
"properties": {
"includeAcceptedJustifications": {"type": "boolean"},
"includeRejectedJustifications": {"type": "boolean"},
"includeUnknownJustifications": {"type": "boolean"},
"justificationKinds": {"$ref": "#/$defs/stringArray"}
},
"additionalProperties": false
},
"ruleAction": {
"type": "object",
"required": ["actionId", "channel", "enabled"],
"properties": {
"actionId": {"type": "string"},
"channel": {"type": "string"},
"template": {"type": "string"},
"digest": {"type": "string"},
"throttle": {
"type": "string",
"pattern": "^P(T.*)?$",
"description": "ISO 8601 duration"
},
"locale": {"type": "string"},
"enabled": {"type": "boolean"},
"metadata": {"$ref": "#/$defs/stringMap"}
},
"additionalProperties": false
},
"stringArray": {
"type": "array",
"items": {"type": "string"}
},
"stringMap": {
"type": "object",
"patternProperties": {
".*": {"type": "string"}
},
"additionalProperties": false
}
}
}

View File

@@ -0,0 +1,55 @@
{
"$id": "https://stella-ops.org/schemas/notify/notify-template@1.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Notify Template",
"type": "object",
"required": [
"schemaVersion",
"templateId",
"tenantId",
"channelType",
"key",
"locale",
"body",
"renderMode",
"format",
"createdAt",
"updatedAt"
],
"properties": {
"schemaVersion": {"type": "string", "const": "notify.template@1"},
"templateId": {"type": "string"},
"tenantId": {"type": "string"},
"channelType": {
"type": "string",
"enum": ["slack", "teams", "email", "webhook", "custom"]
},
"key": {"type": "string"},
"locale": {"type": "string"},
"body": {"type": "string"},
"description": {"type": "string"},
"renderMode": {
"type": "string",
"enum": ["markdown", "html", "adaptiveCard", "plainText", "json"]
},
"format": {
"type": "string",
"enum": ["slack", "teams", "email", "webhook", "json"]
},
"metadata": {"$ref": "#/$defs/stringMap"},
"createdBy": {"type": "string"},
"createdAt": {"type": "string", "format": "date-time"},
"updatedBy": {"type": "string"},
"updatedAt": {"type": "string", "format": "date-time"}
},
"additionalProperties": false,
"$defs": {
"stringMap": {
"type": "object",
"patternProperties": {
".*": {"type": "string"}
},
"additionalProperties": false
}
}
}

View File

@@ -1,97 +1,97 @@
# Authority Backup & Restore Runbook
## Scope
- **Applies to:** StellaOps Authority deployments running the official `ops/authority/docker-compose.authority.yaml` stack or equivalent Kubernetes packaging.
- **Artifacts covered:** MongoDB (`stellaops-authority` database), Authority configuration (`etc/authority.yaml`), plugin manifests under `etc/authority.plugins/`, and signing key material stored in the `authority-keys` volume (defaults to `/app/keys` inside the container).
- **Frequency:** Run the full procedure prior to upgrades, before rotating keys, and at least once per 24h in production. Store snapshots in an encrypted, access-controlled vault.
## Inventory Checklist
| Component | Location (compose default) | Notes |
| --- | --- | --- |
| Mongo data | `mongo-data` volume (`/var/lib/docker/volumes/.../mongo-data`) | Contains all Authority collections (`AuthorityUser`, `AuthorityClient`, `AuthorityToken`, etc.). |
| Configuration | `etc/authority.yaml` | Mounted read-only into the container at `/etc/authority.yaml`. |
| Plugin manifests | `etc/authority.plugins/*.yaml` | Includes `standard.yaml` with `tokenSigning.keyDirectory`. |
| Signing keys | `authority-keys` volume -> `/app/keys` | Path is derived from `tokenSigning.keyDirectory` (defaults to `../keys` relative to the manifest). |
> **TIP:** Confirm the deployed key directory via `tokenSigning.keyDirectory` in `etc/authority.plugins/standard.yaml`; some installations relocate keys to `/var/lib/stellaops/authority/keys`.
## Hot Backup (no downtime)
1. **Create output directory:** `mkdir -p backup/$(date +%Y-%m-%d)` on the host.
2. **Dump Mongo:**
```bash
docker compose -f ops/authority/docker-compose.authority.yaml exec mongo \
mongodump --archive=/dump/authority-$(date +%Y%m%dT%H%M%SZ).gz \
--gzip --db stellaops-authority
docker compose -f ops/authority/docker-compose.authority.yaml cp \
mongo:/dump/authority-$(date +%Y%m%dT%H%M%SZ).gz backup/
```
The `mongodump` archive preserves indexes and can be restored with `mongorestore --archive --gzip`.
3. **Capture configuration + manifests:**
```bash
cp etc/authority.yaml backup/
rsync -a etc/authority.plugins/ backup/authority.plugins/
```
4. **Export signing keys:** the compose file maps `authority-keys` to a local Docker volume. Snapshot it without stopping the service:
```bash
docker run --rm \
-v authority-keys:/keys \
-v "$(pwd)/backup:/backup" \
busybox tar czf /backup/authority-keys-$(date +%Y%m%dT%H%M%SZ).tar.gz -C /keys .
```
5. **Checksum:** generate SHA-256 digests for every file and store them alongside the artefacts.
6. **Encrypt & upload:** wrap the backup folder using your secrets management standard (e.g., age, GPG) and upload to the designated offline vault.
## Cold Backup (planned downtime)
1. Notify stakeholders and drain traffic (CLI clients should refresh tokens afterwards).
2. Stop services:
```bash
docker compose -f ops/authority/docker-compose.authority.yaml down
```
3. Back up volumes directly using `tar`:
```bash
docker run --rm -v mongo-data:/data -v "$(pwd)/backup:/backup" \
busybox tar czf /backup/mongo-data-$(date +%Y%m%d).tar.gz -C /data .
docker run --rm -v authority-keys:/keys -v "$(pwd)/backup:/backup" \
busybox tar czf /backup/authority-keys-$(date +%Y%m%d).tar.gz -C /keys .
```
4. Copy configuration + manifests as in the hot backup (steps 36).
5. Restart services and verify health:
```bash
docker compose -f ops/authority/docker-compose.authority.yaml up -d
curl -fsS http://localhost:8080/ready
```
## Restore Procedure
1. **Provision clean volumes:** remove existing volumes if youre rebuilding a node (`docker volume rm mongo-data authority-keys`), then recreate the compose stack so empty volumes exist.
2. **Restore Mongo:**
```bash
docker compose exec -T mongo mongorestore --archive --gzip --drop < backup/authority-YYYYMMDDTHHMMSSZ.gz
```
Use `--drop` to replace collections; omit if doing a partial restore.
3. **Restore configuration/manifests:** copy `authority.yaml` and `authority.plugins/*` into place before starting the Authority container.
4. **Restore signing keys:** untar into the mounted volume:
```bash
docker run --rm -v authority-keys:/keys -v "$(pwd)/backup:/backup" \
busybox tar xzf /backup/authority-keys-YYYYMMDD.tar.gz -C /keys
```
Ensure file permissions remain `600` for private keys (`chmod -R 600`).
5. **Start services & validate:**
```bash
docker compose up -d
curl -fsS http://localhost:8080/health
```
6. **Validate JWKS and tokens:** call `/jwks` and issue a short-lived token via the CLI to confirm key material matches expectations. If the restored environment requires a fresh signing key, follow the rotation SOP in [`docs/11_AUTHORITY.md`](../11_AUTHORITY.md) using `ops/authority/key-rotation.sh` to invoke `/internal/signing/rotate`.
## Disaster Recovery Notes
- **Air-gapped replication:** replicate archives via the Offline Update Kit transport channels; never attach USB devices without scanning.
- **Retention:** maintain 30 daily snapshots + 12 monthly archival copies. Rotate encryption keys annually.
- **Key compromise:** if signing keys are suspected compromised, restore from the latest clean backup, rotate via OPS3 (see `ops/authority/key-rotation.sh` and `docs/11_AUTHORITY.md`), and publish a revocation notice.
- **Mongo version:** keep dump/restore images pinned to the deployment version (compose uses `mongo:7`). Restoring across major versions requires a compatibility review.
## Verification Checklist
- [ ] `/ready` reports all identity providers ready.
- [ ] OAuth flows issue tokens signed by the restored keys.
- [ ] `PluginRegistrationSummary` logs expected providers on startup.
- [ ] Revocation manifest export (`dotnet run --project src/StellaOps.Authority`) succeeds.
- [ ] Monitoring dashboards show metrics resuming (see OPS5 deliverables).
# Authority Backup & Restore Runbook
## Scope
- **Applies to:** StellaOps Authority deployments running the official `ops/authority/docker-compose.authority.yaml` stack or equivalent Kubernetes packaging.
- **Artifacts covered:** MongoDB (`stellaops-authority` database), Authority configuration (`etc/authority.yaml`), plugin manifests under `etc/authority.plugins/`, and signing key material stored in the `authority-keys` volume (defaults to `/app/keys` inside the container).
- **Frequency:** Run the full procedure prior to upgrades, before rotating keys, and at least once per 24h in production. Store snapshots in an encrypted, access-controlled vault.
## Inventory Checklist
| Component | Location (compose default) | Notes |
| --- | --- | --- |
| Mongo data | `mongo-data` volume (`/var/lib/docker/volumes/.../mongo-data`) | Contains all Authority collections (`AuthorityUser`, `AuthorityClient`, `AuthorityToken`, etc.). |
| Configuration | `etc/authority.yaml` | Mounted read-only into the container at `/etc/authority.yaml`. |
| Plugin manifests | `etc/authority.plugins/*.yaml` | Includes `standard.yaml` with `tokenSigning.keyDirectory`. |
| Signing keys | `authority-keys` volume -> `/app/keys` | Path is derived from `tokenSigning.keyDirectory` (defaults to `../keys` relative to the manifest). |
> **TIP:** Confirm the deployed key directory via `tokenSigning.keyDirectory` in `etc/authority.plugins/standard.yaml`; some installations relocate keys to `/var/lib/stellaops/authority/keys`.
## Hot Backup (no downtime)
1. **Create output directory:** `mkdir -p backup/$(date +%Y-%m-%d)` on the host.
2. **Dump Mongo:**
```bash
docker compose -f ops/authority/docker-compose.authority.yaml exec mongo \
mongodump --archive=/dump/authority-$(date +%Y%m%dT%H%M%SZ).gz \
--gzip --db stellaops-authority
docker compose -f ops/authority/docker-compose.authority.yaml cp \
mongo:/dump/authority-$(date +%Y%m%dT%H%M%SZ).gz backup/
```
The `mongodump` archive preserves indexes and can be restored with `mongorestore --archive --gzip`.
3. **Capture configuration + manifests:**
```bash
cp etc/authority.yaml backup/
rsync -a etc/authority.plugins/ backup/authority.plugins/
```
4. **Export signing keys:** the compose file maps `authority-keys` to a local Docker volume. Snapshot it without stopping the service:
```bash
docker run --rm \
-v authority-keys:/keys \
-v "$(pwd)/backup:/backup" \
busybox tar czf /backup/authority-keys-$(date +%Y%m%dT%H%M%SZ).tar.gz -C /keys .
```
5. **Checksum:** generate SHA-256 digests for every file and store them alongside the artefacts.
6. **Encrypt & upload:** wrap the backup folder using your secrets management standard (e.g., age, GPG) and upload to the designated offline vault.
## Cold Backup (planned downtime)
1. Notify stakeholders and drain traffic (CLI clients should refresh tokens afterwards).
2. Stop services:
```bash
docker compose -f ops/authority/docker-compose.authority.yaml down
```
3. Back up volumes directly using `tar`:
```bash
docker run --rm -v mongo-data:/data -v "$(pwd)/backup:/backup" \
busybox tar czf /backup/mongo-data-$(date +%Y%m%d).tar.gz -C /data .
docker run --rm -v authority-keys:/keys -v "$(pwd)/backup:/backup" \
busybox tar czf /backup/authority-keys-$(date +%Y%m%d).tar.gz -C /keys .
```
4. Copy configuration + manifests as in the hot backup (steps 36).
5. Restart services and verify health:
```bash
docker compose -f ops/authority/docker-compose.authority.yaml up -d
curl -fsS http://localhost:8080/ready
```
## Restore Procedure
1. **Provision clean volumes:** remove existing volumes if youre rebuilding a node (`docker volume rm mongo-data authority-keys`), then recreate the compose stack so empty volumes exist.
2. **Restore Mongo:**
```bash
docker compose exec -T mongo mongorestore --archive --gzip --drop < backup/authority-YYYYMMDDTHHMMSSZ.gz
```
Use `--drop` to replace collections; omit if doing a partial restore.
3. **Restore configuration/manifests:** copy `authority.yaml` and `authority.plugins/*` into place before starting the Authority container.
4. **Restore signing keys:** untar into the mounted volume:
```bash
docker run --rm -v authority-keys:/keys -v "$(pwd)/backup:/backup" \
busybox tar xzf /backup/authority-keys-YYYYMMDD.tar.gz -C /keys
```
Ensure file permissions remain `600` for private keys (`chmod -R 600`).
5. **Start services & validate:**
```bash
docker compose up -d
curl -fsS http://localhost:8080/health
```
6. **Validate JWKS and tokens:** call `/jwks` and issue a short-lived token via the CLI to confirm key material matches expectations. If the restored environment requires a fresh signing key, follow the rotation SOP in [`docs/11_AUTHORITY.md`](../11_AUTHORITY.md) using `ops/authority/key-rotation.sh` to invoke `/internal/signing/rotate`.
## Disaster Recovery Notes
- **Air-gapped replication:** replicate archives via the Offline Update Kit transport channels; never attach USB devices without scanning.
- **Retention:** maintain 30 daily snapshots + 12 monthly archival copies. Rotate encryption keys annually.
- **Key compromise:** if signing keys are suspected compromised, restore from the latest clean backup, rotate via OPS3 (see `ops/authority/key-rotation.sh` and `docs/11_AUTHORITY.md`), and publish a revocation notice.
- **Mongo version:** keep dump/restore images pinned to the deployment version (compose uses `mongo:7`). Driver 3.5.0 requires MongoDB **4.2+**—clusters still on 4.0 must be upgraded before restore, and future driver releases will drop 4.0 entirely. citeturn1open1
## Verification Checklist
- [ ] `/ready` reports all identity providers ready.
- [ ] OAuth flows issue tokens signed by the restored keys.
- [ ] `PluginRegistrationSummary` logs expected providers on startup.
- [ ] Revocation manifest export (`dotnet run --project src/StellaOps.Authority`) succeeds.
- [ ] Monitoring dashboards show metrics resuming (see OPS5 deliverables).

View File

@@ -1,77 +1,77 @@
# Feedser Apple Security Update Connector Operations
This runbook covers staging and production rollout for the Apple security updates connector (`source:vndr-apple:*`), including observability checks and fixture maintenance.
## 1. Prerequisites
- Network egress (or mirrored cache) for `https://gdmf.apple.com/v2/pmv` and the Apple Support domain (`https://support.apple.com/`).
- Optional: corporate proxy exclusions for the Apple hosts if outbound traffic is normally filtered.
- Updated configuration (environment variables or `feedser.yaml`) with an `apple` section. Example baseline:
```yaml
feedser:
sources:
apple:
softwareLookupUri: "https://gdmf.apple.com/v2/pmv"
advisoryBaseUri: "https://support.apple.com/"
localeSegment: "en-us"
maxAdvisoriesPerFetch: 25
initialBackfill: "120.00:00:00"
modifiedTolerance: "02:00:00"
failureBackoff: "00:05:00"
```
> `softwareLookupUri` and `advisoryBaseUri` must stay absolute and aligned with the HTTP allow-list; Feedser automatically adds both hosts to the connector HttpClient.
## 2. Staging Smoke Test
1. Deploy the configuration and restart the Feedser workers to ensure the Apple connector options are bound.
2. Trigger a full connector cycle:
- CLI: `stella db jobs run source:vndr-apple:fetch --and-then source:vndr-apple:parse --and-then source:vndr-apple:map`
- REST: `POST /jobs/run { "kind": "source:vndr-apple:fetch", "chain": ["source:vndr-apple:parse", "source:vndr-apple:map"] }`
3. Validate metrics exported under meter `StellaOps.Feedser.Source.Vndr.Apple`:
- `apple.fetch.items` (documents fetched)
- `apple.fetch.failures`
- `apple.fetch.unchanged`
- `apple.parse.failures`
- `apple.map.affected.count` (histogram of affected package counts)
4. Cross-check the shared HTTP counters:
- `feedser.source.http.requests_total{feedser_source="vndr-apple"}` should increase for both index and detail phases.
- `feedser.source.http.failures_total{feedser_source="vndr-apple"}` should remain flat (0) during a healthy run.
5. Inspect the info logs:
- `Apple software index fetch … processed=X newDocuments=Y`
- `Apple advisory parse complete … aliases=… affected=…`
- `Mapped Apple advisory … pendingMappings=0`
6. Confirm MongoDB state:
- `raw_documents` store contains the HT article HTML with metadata (`apple.articleId`, `apple.postingDate`).
- `dtos` store has `schemaVersion="apple.security.update.v1"`.
- `advisories` collection includes keys `HTxxxxxx` with normalized SemVer rules.
- `source_states` entry for `apple` shows a recent `cursor.lastPosted`.
## 3. Production Monitoring
- **Dashboards** Add the following expressions to your Feedser Grafana board (OTLP/Prometheus naming assumed):
- `rate(apple_fetch_items_total[15m])` vs `rate(feedser_source_http_requests_total{feedser_source="vndr-apple"}[15m])`
- `rate(apple_fetch_failures_total[5m])` for error spikes (`severity=warning` at `>0`)
- `histogram_quantile(0.95, rate(apple_map_affected_count_bucket[1h]))` to watch affected-package fan-out
- `increase(apple_parse_failures_total[6h])` to catch parser drift (alerts at `>0`)
- **Alerts** Page if `rate(apple_fetch_items_total[2h]) == 0` during business hours while other connectors are active. This often indicates lookup feed failures or misconfigured allow-lists.
- **Logs** Surface warnings `Apple document {DocumentId} missing GridFS payload` or `Apple parse failed`—repeated hits imply storage issues or HTML regressions.
- **Telemetry pipeline** `StellaOps.Feedser.WebService` now exports `StellaOps.Feedser.Source.Vndr.Apple` alongside existing Feedser meters; ensure your OTEL collector or Prometheus scraper includes it.
## 4. Fixture Maintenance
Regression fixtures live under `src/StellaOps.Feedser.Source.Vndr.Apple.Tests/Apple/Fixtures`. Refresh them whenever Apple reshapes the HT layout or when new platforms appear.
1. Run the helper script matching your platform:
- Bash: `./scripts/update-apple-fixtures.sh`
- PowerShell: `./scripts/update-apple-fixtures.ps1`
2. Each script exports `UPDATE_APPLE_FIXTURES=1`, updates the `WSLENV` passthrough, and touches `.update-apple-fixtures` so WSL+VS Code test runs observe the flag. The subsequent test execution fetches the live HT articles listed in `AppleFixtureManager`, sanitises the HTML, and rewrites the `.expected.json` DTO snapshots.
3. Review the diff for localisation or nav noise. Once satisfied, re-run the tests without the env var (`dotnet test src/StellaOps.Feedser.Source.Vndr.Apple.Tests/StellaOps.Feedser.Source.Vndr.Apple.Tests.csproj`) to verify determinism.
4. Commit fixture updates together with any parser/mapping changes that motivated them.
## 5. Known Issues & Follow-up Tasks
- Apple occasionally throttles anonymous requests after bursts. The connector backs off automatically, but persistent `apple.fetch.failures` spikes might require mirroring the HT content or scheduling wider fetch windows.
- Rapid Security Responses may appear before the general patch notes surface in the lookup JSON. When that happens, the fetch run will log `detailFailures>0`. Collect sample HTML and refresh fixtures to confirm parser coverage.
- Multi-locale content is still under regression sweep (`src/StellaOps.Feedser.Source.Vndr.Apple/TASKS.md`). Capture non-`en-us` snapshots once the fixture tooling stabilises.
# Concelier Apple Security Update Connector Operations
This runbook covers staging and production rollout for the Apple security updates connector (`source:vndr-apple:*`), including observability checks and fixture maintenance.
## 1. Prerequisites
- Network egress (or mirrored cache) for `https://gdmf.apple.com/v2/pmv` and the Apple Support domain (`https://support.apple.com/`).
- Optional: corporate proxy exclusions for the Apple hosts if outbound traffic is normally filtered.
- Updated configuration (environment variables or `concelier.yaml`) with an `apple` section. Example baseline:
```yaml
concelier:
sources:
apple:
softwareLookupUri: "https://gdmf.apple.com/v2/pmv"
advisoryBaseUri: "https://support.apple.com/"
localeSegment: "en-us"
maxAdvisoriesPerFetch: 25
initialBackfill: "120.00:00:00"
modifiedTolerance: "02:00:00"
failureBackoff: "00:05:00"
```
> `softwareLookupUri` and `advisoryBaseUri` must stay absolute and aligned with the HTTP allow-list; Concelier automatically adds both hosts to the connector HttpClient.
## 2. Staging Smoke Test
1. Deploy the configuration and restart the Concelier workers to ensure the Apple connector options are bound.
2. Trigger a full connector cycle:
- CLI: `stella db jobs run source:vndr-apple:fetch --and-then source:vndr-apple:parse --and-then source:vndr-apple:map`
- REST: `POST /jobs/run { "kind": "source:vndr-apple:fetch", "chain": ["source:vndr-apple:parse", "source:vndr-apple:map"] }`
3. Validate metrics exported under meter `StellaOps.Concelier.Connector.Vndr.Apple`:
- `apple.fetch.items` (documents fetched)
- `apple.fetch.failures`
- `apple.fetch.unchanged`
- `apple.parse.failures`
- `apple.map.affected.count` (histogram of affected package counts)
4. Cross-check the shared HTTP counters:
- `concelier.source.http.requests_total{concelier_source="vndr-apple"}` should increase for both index and detail phases.
- `concelier.source.http.failures_total{concelier_source="vndr-apple"}` should remain flat (0) during a healthy run.
5. Inspect the info logs:
- `Apple software index fetch … processed=X newDocuments=Y`
- `Apple advisory parse complete … aliases=… affected=…`
- `Mapped Apple advisory … pendingMappings=0`
6. Confirm MongoDB state:
- `raw_documents` store contains the HT article HTML with metadata (`apple.articleId`, `apple.postingDate`).
- `dtos` store has `schemaVersion="apple.security.update.v1"`.
- `advisories` collection includes keys `HTxxxxxx` with normalized SemVer rules.
- `source_states` entry for `apple` shows a recent `cursor.lastPosted`.
## 3. Production Monitoring
- **Dashboards** Add the following expressions to your Concelier Grafana board (OTLP/Prometheus naming assumed):
- `rate(apple_fetch_items_total[15m])` vs `rate(concelier_source_http_requests_total{concelier_source="vndr-apple"}[15m])`
- `rate(apple_fetch_failures_total[5m])` for error spikes (`severity=warning` at `>0`)
- `histogram_quantile(0.95, rate(apple_map_affected_count_bucket[1h]))` to watch affected-package fan-out
- `increase(apple_parse_failures_total[6h])` to catch parser drift (alerts at `>0`)
- **Alerts** Page if `rate(apple_fetch_items_total[2h]) == 0` during business hours while other connectors are active. This often indicates lookup feed failures or misconfigured allow-lists.
- **Logs** Surface warnings `Apple document {DocumentId} missing GridFS payload` or `Apple parse failed`—repeated hits imply storage issues or HTML regressions.
- **Telemetry pipeline** `StellaOps.Concelier.WebService` now exports `StellaOps.Concelier.Connector.Vndr.Apple` alongside existing Concelier meters; ensure your OTEL collector or Prometheus scraper includes it.
## 4. Fixture Maintenance
Regression fixtures live under `src/StellaOps.Concelier.Connector.Vndr.Apple.Tests/Apple/Fixtures`. Refresh them whenever Apple reshapes the HT layout or when new platforms appear.
1. Run the helper script matching your platform:
- Bash: `./scripts/update-apple-fixtures.sh`
- PowerShell: `./scripts/update-apple-fixtures.ps1`
2. Each script exports `UPDATE_APPLE_FIXTURES=1`, updates the `WSLENV` passthrough, and touches `.update-apple-fixtures` so WSL+VS Code test runs observe the flag. The subsequent test execution fetches the live HT articles listed in `AppleFixtureManager`, sanitises the HTML, and rewrites the `.expected.json` DTO snapshots.
3. Review the diff for localisation or nav noise. Once satisfied, re-run the tests without the env var (`dotnet test src/StellaOps.Concelier.Connector.Vndr.Apple.Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests.csproj`) to verify determinism.
4. Commit fixture updates together with any parser/mapping changes that motivated them.
## 5. Known Issues & Follow-up Tasks
- Apple occasionally throttles anonymous requests after bursts. The connector backs off automatically, but persistent `apple.fetch.failures` spikes might require mirroring the HT content or scheduling wider fetch windows.
- Rapid Security Responses may appear before the general patch notes surface in the lookup JSON. When that happens, the fetch run will log `detailFailures>0`. Collect sample HTML and refresh fixtures to confirm parser coverage.
- Multi-locale content is still under regression sweep (`src/StellaOps.Concelier.Connector.Vndr.Apple/TASKS.md`). Capture non-`en-us` snapshots once the fixture tooling stabilises.

View File

@@ -1,150 +1,150 @@
# Feedser Authority Audit Runbook
_Last updated: 2025-10-12_
This runbook helps operators verify and monitor the StellaOps Feedser ⇆ Authority integration. It focuses on the `/jobs*` surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity.
## 1. Prerequisites
- Authority integration is enabled in `feedser.yaml` (or via `FEEDSER_AUTHORITY__*` environment variables) with a valid `clientId`, secret, audience, and required scopes.
- OTLP metrics/log exporters are configured (`feedser.telemetry.*`) or container stdout is shipped to your SIEM.
- Operators have access to the Feedser job trigger endpoints via CLI or REST for smoke tests.
### Configuration snippet
```yaml
feedser:
authority:
enabled: true
allowAnonymousFallback: false # keep true only during initial rollout
issuer: "https://authority.internal"
audiences:
- "api://feedser"
requiredScopes:
- "feedser.jobs.trigger"
bypassNetworks:
- "127.0.0.1/32"
- "::1/128"
clientId: "feedser-jobs"
clientSecretFile: "/run/secrets/feedser_authority_client"
tokenClockSkewSeconds: 60
resilience:
enableRetries: true
retryDelays:
- "00:00:01"
- "00:00:02"
- "00:00:05"
allowOfflineCacheFallback: true
offlineCacheTolerance: "00:10:00"
```
> Store secrets outside source control. Feedser reads `clientSecretFile` on startup; rotate by updating the mounted file and restarting the service.
### Resilience tuning
- **Connected sites:** keep the default 1s / 2s / 5s retry ladder so Feedser retries transient Authority hiccups but still surfaces outages quickly. Leave `allowOfflineCacheFallback=true` so cached discovery/JWKS data can bridge short Pathfinder restarts.
- **Air-gapped/Offline Kit installs:** extend `offlineCacheTolerance` (1530minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (`enableRetries=false`) if infrastructure teams prefer to handle exponential backoff at the network layer; Feedser will fail fast but keep deterministic logs.
- Feedser resolves these knobs through `IOptionsMonitor<StellaOpsAuthClientOptions>`. Edits to `feedser.yaml` are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled.
## 2. Key Signals
### 2.1 Audit log channel
Feedser emits structured audit entries via the `Feedser.Authorization.Audit` logger for every `/jobs*` request once Authority enforcement is active.
```
Feedser authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=feedser-cli scopes=feedser.jobs.trigger bypass=False remote=10.1.4.7
```
| Field | Sample value | Meaning |
|--------------|-------------------------|------------------------------------------------------------------------------------------|
| `route` | `/jobs/definitions` | Endpoint that processed the request. |
| `status` | `200` / `401` / `409` | Final HTTP status code returned to the caller. |
| `subject` | `ops@example.com` | User or service principal subject (falls back to `(anonymous)` when unauthenticated). |
| `clientId` | `feedser-cli` | OAuth client ID provided by Authority ( `(none)` if the token lacked the claim). |
| `scopes` | `feedser.jobs.trigger` | Normalised scope list extracted from token claims; `(none)` if the token carried none. |
| `bypass` | `True` / `False` | Indicates whether the request succeeded because its source IP matched a bypass CIDR. |
| `remote` | `10.1.4.7` | Remote IP recorded from the connection / forwarded header test hooks. |
Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations:
- `status=401 AND bypass=True` bypass network accepted an unauthenticated call (should be temporary during rollout).
- `status=202 AND scopes="(none)"` a token without scopes triggered a job; tighten client configuration.
- Spike in `clientId="(none)"` indicates upstream Authority is not issuing `client_id` claims or the CLI is outdated.
### 2.2 Metrics
Feedser publishes counters under the OTEL meter `StellaOps.Feedser.WebService.Jobs`. Tags: `job.kind`, `job.trigger`, `job.outcome`.
| Metric name | Description | PromQL example |
|-------------------------------|----------------------------------------------------|----------------|
| `web.jobs.triggered` | Accepted job trigger requests. | `sum by (job_kind) (rate(web_jobs_triggered_total[5m]))` |
| `web.jobs.trigger.conflict` | Rejected triggers (already running, disabled…). | `sum(rate(web_jobs_trigger_conflict_total[5m]))` |
| `web.jobs.trigger.failed` | Server-side job failures. | `sum(rate(web_jobs_trigger_failed_total[5m]))` |
> Prometheus/OTEL collectors typically surface counters with `_total` suffix. Adjust queries to match your pipelines generated metric names.
Correlate audit logs with the following global meter exported via `Feedser.SourceDiagnostics`:
- `feedser.source.http.requests_total{feedser_source="jobs-run"}` ensures REST/manual triggers route through Authority.
- If Grafana dashboards are deployed, extend the “Feedser Jobs” board with the above counters plus a table of recent audit log entries.
## 3. Alerting Guidance
1. **Unauthorized bypass attempt**
- Query: `sum(rate(log_messages_total{logger="Feedser.Authorization.Audit", status="401", bypass="True"}[5m])) > 0`
- Action: verify `bypassNetworks` list; confirm expected maintenance windows; rotate credentials if suspicious.
2. **Missing scopes**
- Query: `sum(rate(log_messages_total{logger="Feedser.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0`
- Action: audit Authority client registration; ensure `requiredScopes` includes `feedser.jobs.trigger`.
3. **Trigger failure surge**
- Query: `sum(rate(web_jobs_trigger_failed_total[10m])) > 0` with severity `warning` if sustained for 10 minutes.
- Action: inspect correlated audit entries and `Feedser.Telemetry` traces for job execution errors.
4. **Conflict spike**
- Query: `sum(rate(web_jobs_trigger_conflict_total[10m])) > 5` (tune threshold).
- Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly.
5. **Authority offline**
- Watch `Feedser.Authorization.Audit` logs for `status=503` or `status=500` along with `clientId="(none)"`. Investigate Authority availability before re-enabling anonymous fallback.
## 4. Rollout & Verification Procedure
1. **Pre-checks**
- Confirm `allowAnonymousFallback` is `false` in production; keep `true` only during staged validation.
- Validate Authority issuer metadata is reachable from Feedser (`curl https://authority.internal/.well-known/openid-configuration` from the host).
2. **Smoke test with valid token**
- Obtain a token via CLI: `stella auth login --scope feedser.jobs.trigger`.
- Trigger a read-only endpoint: `curl -H "Authorization: Bearer $TOKEN" https://feedser.internal/jobs/definitions`.
- Expect HTTP 200/202 and an audit log with `bypass=False`, `scopes=feedser.jobs.trigger`.
3. **Negative test without token**
- Call the same endpoint without a token. Expect HTTP 401, `bypass=False`.
- If the request succeeds, double-check `bypassNetworks` and ensure fallback is disabled.
4. **Bypass check (if applicable)**
- From an allowed maintenance IP, call `/jobs/definitions` without a token. Confirm the audit log shows `bypass=True`. Review business justification and expiry date for such entries.
5. **Metrics validation**
- Ensure `web.jobs.triggered` counter increments during accepted runs.
- Exporters should show corresponding spans (`feedser.job.trigger`) if tracing is enabled.
## 5. Troubleshooting
| Symptom | Probable cause | Remediation |
|---------|----------------|-------------|
| Audit log shows `clientId=(none)` for all requests | Authority not issuing `client_id` claim or CLI outdated | Update StellaOps Authority configuration (`StellaOpsAuthorityOptions.Token.Claims.ClientId`), or upgrade the CLI token acquisition flow. |
| Requests succeed with `bypass=True` unexpectedly | Local network added to `bypassNetworks` or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Feedser. |
| HTTP 401 with valid token | `requiredScopes` missing from client registration or token audience mismatch | Verify Authority client scopes (`feedser.jobs.trigger`) and ensure the token audience matches `audiences` config. |
| Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set `feedser.telemetry.enableMetrics=true`, ensure collector includes `StellaOps.Feedser.WebService.Jobs` meter. |
| Sudden spike in `web.jobs.trigger.failed` | Downstream job failure or Authority timeout mid-request | Inspect Feedser job logs, re-run with tracing enabled, validate Authority latency. |
## 6. References
- `docs/21_INSTALL_GUIDE.md` Authority configuration quick start.
- `docs/17_SECURITY_HARDENING_GUIDE.md` Security guardrails and enforcement deadlines.
- `docs/ops/authority-monitoring.md` Authority-side monitoring and alerting playbook.
- `StellaOps.Feedser.WebService/Filters/JobAuthorizationAuditFilter.cs` source of audit log fields.
# Concelier Authority Audit Runbook
_Last updated: 2025-10-12_
This runbook helps operators verify and monitor the StellaOps Concelier ⇆ Authority integration. It focuses on the `/jobs*` surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity.
## 1. Prerequisites
- Authority integration is enabled in `concelier.yaml` (or via `CONCELIER_AUTHORITY__*` environment variables) with a valid `clientId`, secret, audience, and required scopes.
- OTLP metrics/log exporters are configured (`concelier.telemetry.*`) or container stdout is shipped to your SIEM.
- Operators have access to the Concelier job trigger endpoints via CLI or REST for smoke tests.
### Configuration snippet
```yaml
concelier:
authority:
enabled: true
allowAnonymousFallback: false # keep true only during initial rollout
issuer: "https://authority.internal"
audiences:
- "api://concelier"
requiredScopes:
- "concelier.jobs.trigger"
bypassNetworks:
- "127.0.0.1/32"
- "::1/128"
clientId: "concelier-jobs"
clientSecretFile: "/run/secrets/concelier_authority_client"
tokenClockSkewSeconds: 60
resilience:
enableRetries: true
retryDelays:
- "00:00:01"
- "00:00:02"
- "00:00:05"
allowOfflineCacheFallback: true
offlineCacheTolerance: "00:10:00"
```
> Store secrets outside source control. Concelier reads `clientSecretFile` on startup; rotate by updating the mounted file and restarting the service.
### Resilience tuning
- **Connected sites:** keep the default 1s / 2s / 5s retry ladder so Concelier retries transient Authority hiccups but still surfaces outages quickly. Leave `allowOfflineCacheFallback=true` so cached discovery/JWKS data can bridge short Pathfinder restarts.
- **Air-gapped/Offline Kit installs:** extend `offlineCacheTolerance` (1530minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (`enableRetries=false`) if infrastructure teams prefer to handle exponential backoff at the network layer; Concelier will fail fast but keep deterministic logs.
- Concelier resolves these knobs through `IOptionsMonitor<StellaOpsAuthClientOptions>`. Edits to `concelier.yaml` are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled.
## 2. Key Signals
### 2.1 Audit log channel
Concelier emits structured audit entries via the `Concelier.Authorization.Audit` logger for every `/jobs*` request once Authority enforcement is active.
```
Concelier authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=concelier-cli scopes=concelier.jobs.trigger bypass=False remote=10.1.4.7
```
| Field | Sample value | Meaning |
|--------------|-------------------------|------------------------------------------------------------------------------------------|
| `route` | `/jobs/definitions` | Endpoint that processed the request. |
| `status` | `200` / `401` / `409` | Final HTTP status code returned to the caller. |
| `subject` | `ops@example.com` | User or service principal subject (falls back to `(anonymous)` when unauthenticated). |
| `clientId` | `concelier-cli` | OAuth client ID provided by Authority ( `(none)` if the token lacked the claim). |
| `scopes` | `concelier.jobs.trigger` | Normalised scope list extracted from token claims; `(none)` if the token carried none. |
| `bypass` | `True` / `False` | Indicates whether the request succeeded because its source IP matched a bypass CIDR. |
| `remote` | `10.1.4.7` | Remote IP recorded from the connection / forwarded header test hooks. |
Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations:
- `status=401 AND bypass=True` bypass network accepted an unauthenticated call (should be temporary during rollout).
- `status=202 AND scopes="(none)"` a token without scopes triggered a job; tighten client configuration.
- Spike in `clientId="(none)"` indicates upstream Authority is not issuing `client_id` claims or the CLI is outdated.
### 2.2 Metrics
Concelier publishes counters under the OTEL meter `StellaOps.Concelier.WebService.Jobs`. Tags: `job.kind`, `job.trigger`, `job.outcome`.
| Metric name | Description | PromQL example |
|-------------------------------|----------------------------------------------------|----------------|
| `web.jobs.triggered` | Accepted job trigger requests. | `sum by (job_kind) (rate(web_jobs_triggered_total[5m]))` |
| `web.jobs.trigger.conflict` | Rejected triggers (already running, disabled…). | `sum(rate(web_jobs_trigger_conflict_total[5m]))` |
| `web.jobs.trigger.failed` | Server-side job failures. | `sum(rate(web_jobs_trigger_failed_total[5m]))` |
> Prometheus/OTEL collectors typically surface counters with `_total` suffix. Adjust queries to match your pipelines generated metric names.
Correlate audit logs with the following global meter exported via `Concelier.SourceDiagnostics`:
- `concelier.source.http.requests_total{concelier_source="jobs-run"}` ensures REST/manual triggers route through Authority.
- If Grafana dashboards are deployed, extend the “Concelier Jobs” board with the above counters plus a table of recent audit log entries.
## 3. Alerting Guidance
1. **Unauthorized bypass attempt**
- Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", status="401", bypass="True"}[5m])) > 0`
- Action: verify `bypassNetworks` list; confirm expected maintenance windows; rotate credentials if suspicious.
2. **Missing scopes**
- Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0`
- Action: audit Authority client registration; ensure `requiredScopes` includes `concelier.jobs.trigger`.
3. **Trigger failure surge**
- Query: `sum(rate(web_jobs_trigger_failed_total[10m])) > 0` with severity `warning` if sustained for 10 minutes.
- Action: inspect correlated audit entries and `Concelier.Telemetry` traces for job execution errors.
4. **Conflict spike**
- Query: `sum(rate(web_jobs_trigger_conflict_total[10m])) > 5` (tune threshold).
- Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly.
5. **Authority offline**
- Watch `Concelier.Authorization.Audit` logs for `status=503` or `status=500` along with `clientId="(none)"`. Investigate Authority availability before re-enabling anonymous fallback.
## 4. Rollout & Verification Procedure
1. **Pre-checks**
- Confirm `allowAnonymousFallback` is `false` in production; keep `true` only during staged validation.
- Validate Authority issuer metadata is reachable from Concelier (`curl https://authority.internal/.well-known/openid-configuration` from the host).
2. **Smoke test with valid token**
- Obtain a token via CLI: `stella auth login --scope concelier.jobs.trigger`.
- Trigger a read-only endpoint: `curl -H "Authorization: Bearer $TOKEN" https://concelier.internal/jobs/definitions`.
- Expect HTTP 200/202 and an audit log with `bypass=False`, `scopes=concelier.jobs.trigger`.
3. **Negative test without token**
- Call the same endpoint without a token. Expect HTTP 401, `bypass=False`.
- If the request succeeds, double-check `bypassNetworks` and ensure fallback is disabled.
4. **Bypass check (if applicable)**
- From an allowed maintenance IP, call `/jobs/definitions` without a token. Confirm the audit log shows `bypass=True`. Review business justification and expiry date for such entries.
5. **Metrics validation**
- Ensure `web.jobs.triggered` counter increments during accepted runs.
- Exporters should show corresponding spans (`concelier.job.trigger`) if tracing is enabled.
## 5. Troubleshooting
| Symptom | Probable cause | Remediation |
|---------|----------------|-------------|
| Audit log shows `clientId=(none)` for all requests | Authority not issuing `client_id` claim or CLI outdated | Update StellaOps Authority configuration (`StellaOpsAuthorityOptions.Token.Claims.ClientId`), or upgrade the CLI token acquisition flow. |
| Requests succeed with `bypass=True` unexpectedly | Local network added to `bypassNetworks` or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Concelier. |
| HTTP 401 with valid token | `requiredScopes` missing from client registration or token audience mismatch | Verify Authority client scopes (`concelier.jobs.trigger`) and ensure the token audience matches `audiences` config. |
| Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set `concelier.telemetry.enableMetrics=true`, ensure collector includes `StellaOps.Concelier.WebService.Jobs` meter. |
| Sudden spike in `web.jobs.trigger.failed` | Downstream job failure or Authority timeout mid-request | Inspect Concelier job logs, re-run with tracing enabled, validate Authority latency. |
## 6. References
- `docs/21_INSTALL_GUIDE.md` Authority configuration quick start.
- `docs/17_SECURITY_HARDENING_GUIDE.md` Security guardrails and enforcement deadlines.
- `docs/ops/authority-monitoring.md` Authority-side monitoring and alerting playbook.
- `StellaOps.Concelier.WebService/Filters/JobAuthorizationAuditFilter.cs` source of audit log fields.

View File

@@ -1,72 +1,72 @@
# Feedser CCCS Connector Operations
This runbook covers daytoday operation of the Canadian Centre for Cyber Security (`source:cccs:*`) connector, including configuration, telemetry, and historical backfill guidance for English/French advisories.
## 1. Configuration Checklist
- Network egress (or mirrored cache) for `https://www.cyber.gc.ca/` and the JSON API endpoints under `/api/cccs/`.
- Set the Feedser options before restarting workers. Example `feedser.yaml` snippet:
```yaml
feedser:
sources:
cccs:
feeds:
- language: "en"
uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=en&content_type=cccs_threat"
- language: "fr"
uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=fr&content_type=cccs_threat"
maxEntriesPerFetch: 80 # increase temporarily for backfill runs
maxKnownEntries: 512
requestTimeout: "00:00:30"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
```
> The `/api/cccs/threats/v1/get` endpoint returns thousands of records per language (≈5100 rows each as of 20251014). The connector honours `maxEntriesPerFetch`, so leave it low for steadystate and raise it for planned backfills.
## 2. Telemetry & Logging
- **Metrics (Meter `StellaOps.Feedser.Source.Cccs`):**
- `cccs.fetch.attempts`, `cccs.fetch.success`, `cccs.fetch.failures`
- `cccs.fetch.documents`, `cccs.fetch.unchanged`
- `cccs.parse.success`, `cccs.parse.failures`, `cccs.parse.quarantine`
- `cccs.map.success`, `cccs.map.failures`
- **Shared HTTP metrics** via `SourceDiagnostics`:
- `feedser.source.http.requests{feedser.source="cccs"}`
- `feedser.source.http.failures{feedser.source="cccs"}`
- `feedser.source.http.duration{feedser.source="cccs"}`
- **Structured logs**
- `CCCS fetch completed feeds=… items=… newDocuments=… pendingDocuments=…`
- `CCCS parse completed parsed=… failures=…`
- `CCCS map completed mapped=… failures=…`
- Warnings fire when GridFS payloads/DTOs go missing or parser sanitisation fails.
Suggested Grafana alerts:
- `increase(cccs.fetch.failures_total[15m]) > 0`
- `rate(cccs.map.success_total[1h]) == 0` while other connectors are active
- `histogram_quantile(0.95, rate(feedser_source_http_duration_bucket{feedser_source="cccs"}[1h])) > 5s`
## 3. Historical Backfill Plan
1. **Snapshot the source** the API accepts `page=<n>` and `lang=<en|fr>` query parameters. `page=0` returns the full dataset (observed earliest `date_created`: 20180608 for EN, 20180608 for FR). Mirror those responses into Offline Kit storage when operating airgapped.
2. **Stage ingestion**:
- Temporarily raise `maxEntriesPerFetch` (e.g. 500) and restart Feedser workers.
- Run chained jobs until `pendingDocuments` drains:
`stella db jobs run source:cccs:fetch --and-then source:cccs:parse --and-then source:cccs:map`
- Monitor `cccs.fetch.unchanged` growth; once it approaches dataset size the backfill is complete.
3. **Optional pagination sweep** for incremental mirrors, iterate `page=<n>` (0…N) while `response.Count == 50`, persisting JSON to disk. Store alongside metadata (`language`, `page`, SHA256) so repeated runs detect drift.
4. **Language split** keep EN/FR payloads separate to preserve canonical language fields. The connector emits `Language` directly from the feed entry, so mixed ingestion simply produces parallel advisories keyed by the same serial number.
5. **Throttle planning** schedule backfills during maintenance windows; the API tolerates burst downloads but respect the 250ms request delay or raise it if mirrored traffic is not available.
## 4. Selector & Sanitiser Notes
- `CccsHtmlParser` now parses the **unsanitised DOM** (via AngleSharp) and only sanitises when persisting `ContentHtml`.
- Product extraction walks headings (`Affected Products`, `Produits touchés`, `Mesures recommandées`) and consumes nested lists within `div/section/article` containers.
- `HtmlContentSanitizer` allows `<h1>…<h6>` and `<section>` so stored HTML keeps headings for UI rendering and downstream summarisation.
## 5. Fixture Maintenance
- Regression fixtures live in `src/StellaOps.Feedser.Source.Cccs.Tests/Fixtures`.
- Refresh via `UPDATE_CCCS_FIXTURES=1 dotnet test src/StellaOps.Feedser.Source.Cccs.Tests/StellaOps.Feedser.Source.Cccs.Tests.csproj`.
- Fixtures capture both EN/FR advisories with nested lists to guard against sanitiser regressions; review diffs for heading/list changes before committing.
# Concelier CCCS Connector Operations
This runbook covers daytoday operation of the Canadian Centre for Cyber Security (`source:cccs:*`) connector, including configuration, telemetry, and historical backfill guidance for English/French advisories.
## 1. Configuration Checklist
- Network egress (or mirrored cache) for `https://www.cyber.gc.ca/` and the JSON API endpoints under `/api/cccs/`.
- Set the Concelier options before restarting workers. Example `concelier.yaml` snippet:
```yaml
concelier:
sources:
cccs:
feeds:
- language: "en"
uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=en&content_type=cccs_threat"
- language: "fr"
uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=fr&content_type=cccs_threat"
maxEntriesPerFetch: 80 # increase temporarily for backfill runs
maxKnownEntries: 512
requestTimeout: "00:00:30"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
```
> The `/api/cccs/threats/v1/get` endpoint returns thousands of records per language (≈5100 rows each as of 20251014). The connector honours `maxEntriesPerFetch`, so leave it low for steadystate and raise it for planned backfills.
## 2. Telemetry & Logging
- **Metrics (Meter `StellaOps.Concelier.Connector.Cccs`):**
- `cccs.fetch.attempts`, `cccs.fetch.success`, `cccs.fetch.failures`
- `cccs.fetch.documents`, `cccs.fetch.unchanged`
- `cccs.parse.success`, `cccs.parse.failures`, `cccs.parse.quarantine`
- `cccs.map.success`, `cccs.map.failures`
- **Shared HTTP metrics** via `SourceDiagnostics`:
- `concelier.source.http.requests{concelier.source="cccs"}`
- `concelier.source.http.failures{concelier.source="cccs"}`
- `concelier.source.http.duration{concelier.source="cccs"}`
- **Structured logs**
- `CCCS fetch completed feeds=… items=… newDocuments=… pendingDocuments=…`
- `CCCS parse completed parsed=… failures=…`
- `CCCS map completed mapped=… failures=…`
- Warnings fire when GridFS payloads/DTOs go missing or parser sanitisation fails.
Suggested Grafana alerts:
- `increase(cccs.fetch.failures_total[15m]) > 0`
- `rate(cccs.map.success_total[1h]) == 0` while other connectors are active
- `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cccs"}[1h])) > 5s`
## 3. Historical Backfill Plan
1. **Snapshot the source** the API accepts `page=<n>` and `lang=<en|fr>` query parameters. `page=0` returns the full dataset (observed earliest `date_created`: 20180608 for EN, 20180608 for FR). Mirror those responses into Offline Kit storage when operating airgapped.
2. **Stage ingestion**:
- Temporarily raise `maxEntriesPerFetch` (e.g. 500) and restart Concelier workers.
- Run chained jobs until `pendingDocuments` drains:
`stella db jobs run source:cccs:fetch --and-then source:cccs:parse --and-then source:cccs:map`
- Monitor `cccs.fetch.unchanged` growth; once it approaches dataset size the backfill is complete.
3. **Optional pagination sweep** for incremental mirrors, iterate `page=<n>` (0…N) while `response.Count == 50`, persisting JSON to disk. Store alongside metadata (`language`, `page`, SHA256) so repeated runs detect drift.
4. **Language split** keep EN/FR payloads separate to preserve canonical language fields. The connector emits `Language` directly from the feed entry, so mixed ingestion simply produces parallel advisories keyed by the same serial number.
5. **Throttle planning** schedule backfills during maintenance windows; the API tolerates burst downloads but respect the 250ms request delay or raise it if mirrored traffic is not available.
## 4. Selector & Sanitiser Notes
- `CccsHtmlParser` now parses the **unsanitised DOM** (via AngleSharp) and only sanitises when persisting `ContentHtml`.
- Product extraction walks headings (`Affected Products`, `Produits touchés`, `Mesures recommandées`) and consumes nested lists within `div/section/article` containers.
- `HtmlContentSanitizer` allows `<h1>…<h6>` and `<section>` so stored HTML keeps headings for UI rendering and downstream summarisation.
## 5. Fixture Maintenance
- Regression fixtures live in `src/StellaOps.Concelier.Connector.Cccs.Tests/Fixtures`.
- Refresh via `UPDATE_CCCS_FIXTURES=1 dotnet test src/StellaOps.Concelier.Connector.Cccs.Tests/StellaOps.Concelier.Connector.Cccs.Tests.csproj`.
- Fixtures capture both EN/FR advisories with nested lists to guard against sanitiser regressions; review diffs for heading/list changes before committing.

View File

@@ -1,146 +1,146 @@
# Feedser CERT-Bund Connector Operations
_Last updated: 2025-10-17_
Germanys Federal Office for Information Security (BSI) operates the Warn- und Informationsdienst (WID) portal. The Feedser CERT-Bund connector (`source:cert-bund:*`) ingests the public RSS feed, hydrates the portals JSON detail endpoint, and maps the result into canonical advisories while preserving the original German content.
---
## 1. Configuration Checklist
- Allow outbound access (or stage mirrors) for:
- `https://wid.cert-bund.de/content/public/securityAdvisory/rss`
- `https://wid.cert-bund.de/portal/` (session/bootstrap)
- `https://wid.cert-bund.de/portal/api/securityadvisory` (detail/search/export JSON)
- Ensure the HTTP client reuses a cookie container (the connectors dependency injection wiring already sets this up).
Example `feedser.yaml` fragment:
```yaml
feedser:
sources:
cert-bund:
feedUri: "https://wid.cert-bund.de/content/public/securityAdvisory/rss"
portalBootstrapUri: "https://wid.cert-bund.de/portal/"
detailApiUri: "https://wid.cert-bund.de/portal/api/securityadvisory"
maxAdvisoriesPerFetch: 50
maxKnownAdvisories: 512
requestTimeout: "00:00:30"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
```
> Leave `maxAdvisoriesPerFetch` at 50 during normal operation. Raise it only for controlled backfills, then restore the default to avoid overwhelming the portal.
---
## 2. Telemetry & Logging
- **Meter**: `StellaOps.Feedser.Source.CertBund`
- **Counters / histograms**:
- `certbund.feed.fetch.attempts|success|failures`
- `certbund.feed.items.count`
- `certbund.feed.enqueued.count`
- `certbund.feed.coverage.days`
- `certbund.detail.fetch.attempts|success|not_modified|failures{reason}`
- `certbund.parse.success|failures{reason}`
- `certbund.parse.products.count`, `certbund.parse.cve.count`
- `certbund.map.success|failures{reason}`
- `certbund.map.affected.count`, `certbund.map.aliases.count`
- Shared HTTP metrics remain available through `feedser.source.http.*`.
**Structured logs** (all emitted at information level when work occurs):
- `CERT-Bund fetch cycle: … truncated {Truncated}, coverageDays={CoverageDays}`
- `CERT-Bund parse cycle: parsed {Parsed}, failures {Failures}, …`
- `CERT-Bund map cycle: mapped {Mapped}, failures {Failures}, …`
Alerting ideas:
1. `increase(certbund.detail.fetch.failures_total[10m]) > 0`
2. `rate(certbund.map.success_total[30m]) == 0`
3. `histogram_quantile(0.95, rate(feedser_source_http_duration_bucket{feedser_source="cert-bund"}[15m])) > 5s`
The WebService now registers the meter so metrics surface automatically once OpenTelemetry metrics are enabled.
---
## 3. Historical Backfill & Export Strategy
### 3.1 Retention snapshot
- RSS window: ~250 advisories (≈90days at current cadence).
- Older advisories are accessible through the JSON search/export APIs once the anti-CSRF token is supplied.
### 3.2 JSON search pagination
```bash
# 1. Bootstrap cookies (client_config + XSRF-TOKEN)
curl -s -c cookies.txt "https://wid.cert-bund.de/portal/" > /dev/null
curl -s -b cookies.txt -c cookies.txt \
-H "X-Requested-With: XMLHttpRequest" \
"https://wid.cert-bund.de/portal/api/security/csrf" > /dev/null
XSRF=$(awk '/XSRF-TOKEN/ {print $7}' cookies.txt)
# 2. Page search results
curl -s -b cookies.txt \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "X-XSRF-TOKEN: ${XSRF}" \
-X POST \
--data '{"page":4,"size":100,"sort":["published,desc"]}' \
"https://wid.cert-bund.de/portal/api/securityadvisory/search" \
> certbund-page4.json
```
Iterate `page` until the response `content` array is empty. Pages 09 currently cover 2014→present. Persist JSON responses (plus SHA256) for Offline Kit parity.
> **Shortcut** run `python tools/certbund_offline_snapshot.py --output seed-data/cert-bund`
> to bootstrap the session, capture the paginated search responses, and regenerate
> the manifest/checksum files automatically. Supply `--cookie-file` and `--xsrf-token`
> if the portal requires a browser-derived session (see options via `--help`).
### 3.3 Export bundles
```bash
python tools/certbund_offline_snapshot.py \
--output seed-data/cert-bund \
--start-year 2014 \
--end-year "$(date -u +%Y)"
```
The helper stores yearly exports under `seed-data/cert-bund/export/`,
captures paginated search snapshots in `seed-data/cert-bund/search/`,
and generates the manifest + SHA files in `seed-data/cert-bund/manifest/`.
Split ranges according to your compliance window (default: one file per
calendar year). Feedser can ingest these JSON payloads directly when
operating offline.
> When automatic bootstrap fails (e.g. portal introduces CAPTCHA), run the
> manual `curl` flow above, then rerun the helper with `--skip-fetch` to
> rebuild the manifest from the existing files.
### 3.4 Connector-driven catch-up
1. Temporarily raise `maxAdvisoriesPerFetch` (e.g. 150) and reduce `requestDelay`.
2. Run `stella db jobs run source:cert-bund:fetch --and-then source:cert-bund:parse --and-then source:cert-bund:map` until the fetch log reports `enqueued=0`.
3. Restore defaults and capture the cursor snapshot for audit.
---
## 4. Locale & Translation Guidance
- Advisories remain in German (`language: "de"`). Preserve wording for provenance and legal accuracy.
- UI localisation: enable the translation bundles documented in `docs/15_UI_GUIDE.md` if English UI copy is required. Operators can overlay machine or human translations, but the canonical database stores the source text.
- Docs guild is compiling a CERT-Bund terminology glossary under `docs/locale/certbund-glossary.md` so downstream teams can reference consistent English equivalents without altering the stored advisories.
---
## 5. Verification Checklist
1. Observe `certbund.feed.fetch.success` and `certbund.detail.fetch.success` increments after runs; `certbund.feed.coverage.days` should hover near the observed RSS window.
2. Ensure summary logs report `truncated=false` in steady state—`true` indicates the fetch cap was hit.
3. During backfills, watch `certbund.feed.enqueued.count` trend to zero.
4. Spot-check stored advisories in Mongo to confirm `language="de"` and reference URLs match the portal detail endpoint.
5. For Offline Kit exports, validate SHA256 hashes before distribution.
# Concelier CERT-Bund Connector Operations
_Last updated: 2025-10-17_
Germanys Federal Office for Information Security (BSI) operates the Warn- und Informationsdienst (WID) portal. The Concelier CERT-Bund connector (`source:cert-bund:*`) ingests the public RSS feed, hydrates the portals JSON detail endpoint, and maps the result into canonical advisories while preserving the original German content.
---
## 1. Configuration Checklist
- Allow outbound access (or stage mirrors) for:
- `https://wid.cert-bund.de/content/public/securityAdvisory/rss`
- `https://wid.cert-bund.de/portal/` (session/bootstrap)
- `https://wid.cert-bund.de/portal/api/securityadvisory` (detail/search/export JSON)
- Ensure the HTTP client reuses a cookie container (the connectors dependency injection wiring already sets this up).
Example `concelier.yaml` fragment:
```yaml
concelier:
sources:
cert-bund:
feedUri: "https://wid.cert-bund.de/content/public/securityAdvisory/rss"
portalBootstrapUri: "https://wid.cert-bund.de/portal/"
detailApiUri: "https://wid.cert-bund.de/portal/api/securityadvisory"
maxAdvisoriesPerFetch: 50
maxKnownAdvisories: 512
requestTimeout: "00:00:30"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
```
> Leave `maxAdvisoriesPerFetch` at 50 during normal operation. Raise it only for controlled backfills, then restore the default to avoid overwhelming the portal.
---
## 2. Telemetry & Logging
- **Meter**: `StellaOps.Concelier.Connector.CertBund`
- **Counters / histograms**:
- `certbund.feed.fetch.attempts|success|failures`
- `certbund.feed.items.count`
- `certbund.feed.enqueued.count`
- `certbund.feed.coverage.days`
- `certbund.detail.fetch.attempts|success|not_modified|failures{reason}`
- `certbund.parse.success|failures{reason}`
- `certbund.parse.products.count`, `certbund.parse.cve.count`
- `certbund.map.success|failures{reason}`
- `certbund.map.affected.count`, `certbund.map.aliases.count`
- Shared HTTP metrics remain available through `concelier.source.http.*`.
**Structured logs** (all emitted at information level when work occurs):
- `CERT-Bund fetch cycle: … truncated {Truncated}, coverageDays={CoverageDays}`
- `CERT-Bund parse cycle: parsed {Parsed}, failures {Failures}, …`
- `CERT-Bund map cycle: mapped {Mapped}, failures {Failures}, …`
Alerting ideas:
1. `increase(certbund.detail.fetch.failures_total[10m]) > 0`
2. `rate(certbund.map.success_total[30m]) == 0`
3. `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cert-bund"}[15m])) > 5s`
The WebService now registers the meter so metrics surface automatically once OpenTelemetry metrics are enabled.
---
## 3. Historical Backfill & Export Strategy
### 3.1 Retention snapshot
- RSS window: ~250 advisories (≈90days at current cadence).
- Older advisories are accessible through the JSON search/export APIs once the anti-CSRF token is supplied.
### 3.2 JSON search pagination
```bash
# 1. Bootstrap cookies (client_config + XSRF-TOKEN)
curl -s -c cookies.txt "https://wid.cert-bund.de/portal/" > /dev/null
curl -s -b cookies.txt -c cookies.txt \
-H "X-Requested-With: XMLHttpRequest" \
"https://wid.cert-bund.de/portal/api/security/csrf" > /dev/null
XSRF=$(awk '/XSRF-TOKEN/ {print $7}' cookies.txt)
# 2. Page search results
curl -s -b cookies.txt \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "X-XSRF-TOKEN: ${XSRF}" \
-X POST \
--data '{"page":4,"size":100,"sort":["published,desc"]}' \
"https://wid.cert-bund.de/portal/api/securityadvisory/search" \
> certbund-page4.json
```
Iterate `page` until the response `content` array is empty. Pages 09 currently cover 2014→present. Persist JSON responses (plus SHA256) for Offline Kit parity.
> **Shortcut** run `python tools/certbund_offline_snapshot.py --output seed-data/cert-bund`
> to bootstrap the session, capture the paginated search responses, and regenerate
> the manifest/checksum files automatically. Supply `--cookie-file` and `--xsrf-token`
> if the portal requires a browser-derived session (see options via `--help`).
### 3.3 Export bundles
```bash
python tools/certbund_offline_snapshot.py \
--output seed-data/cert-bund \
--start-year 2014 \
--end-year "$(date -u +%Y)"
```
The helper stores yearly exports under `seed-data/cert-bund/export/`,
captures paginated search snapshots in `seed-data/cert-bund/search/`,
and generates the manifest + SHA files in `seed-data/cert-bund/manifest/`.
Split ranges according to your compliance window (default: one file per
calendar year). Concelier can ingest these JSON payloads directly when
operating offline.
> When automatic bootstrap fails (e.g. portal introduces CAPTCHA), run the
> manual `curl` flow above, then rerun the helper with `--skip-fetch` to
> rebuild the manifest from the existing files.
### 3.4 Connector-driven catch-up
1. Temporarily raise `maxAdvisoriesPerFetch` (e.g. 150) and reduce `requestDelay`.
2. Run `stella db jobs run source:cert-bund:fetch --and-then source:cert-bund:parse --and-then source:cert-bund:map` until the fetch log reports `enqueued=0`.
3. Restore defaults and capture the cursor snapshot for audit.
---
## 4. Locale & Translation Guidance
- Advisories remain in German (`language: "de"`). Preserve wording for provenance and legal accuracy.
- UI localisation: enable the translation bundles documented in `docs/15_UI_GUIDE.md` if English UI copy is required. Operators can overlay machine or human translations, but the canonical database stores the source text.
- Docs guild is compiling a CERT-Bund terminology glossary under `docs/locale/certbund-glossary.md` so downstream teams can reference consistent English equivalents without altering the stored advisories.
---
## 5. Verification Checklist
1. Observe `certbund.feed.fetch.success` and `certbund.detail.fetch.success` increments after runs; `certbund.feed.coverage.days` should hover near the observed RSS window.
2. Ensure summary logs report `truncated=false` in steady state—`true` indicates the fetch cap was hit.
3. During backfills, watch `certbund.feed.enqueued.count` trend to zero.
4. Spot-check stored advisories in Mongo to confirm `language="de"` and reference URLs match the portal detail endpoint.
5. For Offline Kit exports, validate SHA256 hashes before distribution.

View File

@@ -1,94 +1,94 @@
# Feedser Cisco PSIRT Connector OAuth Provisioning SOP
_Last updated: 2025-10-14_
## 1. Scope
This runbook describes how Ops provisions, rotates, and distributes Cisco PSIRT openVuln OAuth client credentials for the Feedser Cisco connector. It covers online and air-gapped (Offline Kit) environments, quota-aware execution, and escalation paths.
## 2. Prerequisites
- Active Cisco.com (CCO) account with access to the Cisco API Console.
- Cisco PSIRT openVuln API entitlement (visible under “My Apps & Keys” once granted).citeturn3search0
- Feedser configuration location (typically `/etc/stella/feedser.yaml` in production) or Offline Kit secret bundle staging directory.
## 3. Provisioning workflow
1. **Register the application**
- Sign in at <https://apiconsole.cisco.com>.
- Select **Register a New App** → Application Type: `Service`, Grant Type: `Client Credentials`, API: `Cisco PSIRT openVuln API`.citeturn3search0
- Record the generated `clientId` and `clientSecret` in the Ops vault.
2. **Verify token issuance**
- Request an access token with:
```bash
curl -s https://id.cisco.com/oauth2/default/v1/token \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "grant_type=client_credentials" \
-d "client_id=${CLIENT_ID}" \
-d "client_secret=${CLIENT_SECRET}"
```
- Confirm HTTP 200 and an `expires_in` value of 3600 seconds (tokens live for one hour).citeturn3search0turn3search7
- Preserve the response only long enough to validate syntax; do **not** persist tokens.
3. **Authorize Feedser runtime**
- Update `feedser:sources:cisco:auth` (or the module-specific secret template) with the stored credentials.
- For Offline Kit delivery, export encrypted secrets into `offline-kit/secrets/cisco-openvuln.json` using the platforms sealed secret format.
4. **Connectivity validation**
- From the Feedser control plane, run `stella db jobs run source:vndr-cisco:fetch --dry-run`.
- Ensure the Source HTTP diagnostics record `Bearer` authorization headers and no 401/403 responses.
## 4. Rotation SOP
| Step | Owner | Notes |
| --- | --- | --- |
| 1. Schedule rotation | Ops (monthly board) | Rotate every 90 days or immediately after suspected credential exposure. |
| 2. Create replacement app | Ops | Repeat §3.1 with “-next” suffix; verify token issuance. |
| 3. Stage dual credentials | Ops + Feedser On-Call | Publish new credentials to secret store alongside current pair. |
| 4. Cut over | Feedser On-Call | Restart connector workers during a low-traffic window (<10 min) to pick up the new secret. |
| 5. Deactivate legacy app | Ops | Delete prior app in Cisco API Console once telemetry confirms successful fetch/parse cycles for 2 consecutive hours. |
**Automation hooks**
- Rotation reminders are tracked in OpsRunbookOps board (`OPS-RUN-KEYS` swim lane); add checklist items for Feedser Cisco when opening a rotation task.
- Use the secret management pipeline (`ops/secrets/rotate.sh --connector cisco`) to template vault updates; the script renders a redacted diff for audit.
## 5. Offline Kit packaging
1. Generate the credential bundle using the Offline Kit CLI:
`offline-kit secrets add cisco-openvuln --client-id … --client-secret …`
2. Store the encrypted payload under `offline-kit/secrets/cisco-openvuln.enc`.
3. Distribute via the Offline Kit channel; update `offline-kit/MANIFEST.md` with the credential fingerprint (SHA256 of plaintext concatenated with metadata).
4. Document validation steps for the receiving site (token request from an air-gapped relay or cached token mirror).
## 6. Quota and throttling guidance
- Cisco enforces combined limits of 5 requests/second, 30 requests/minute, and 5000 requests/day per application.citeturn0search0turn3search6
- Feedser fetch jobs must respect `Retry-After` headers on HTTP 429 responses; Ops should monitor for sustained quota saturation and consider paging window adjustments.
- Telemetry to watch: `feedser.source.http.requests{feedser.source="vndr-cisco"}`, `feedser.source.http.failures{...}`, and connector-specific metrics once implemented.
## 7. Telemetry & Monitoring
- **Metrics (Meter `StellaOps.Feedser.Source.Vndr.Cisco`)**
- `cisco.fetch.documents`, `cisco.fetch.failures`, `cisco.fetch.unchanged`
- `cisco.parse.success`, `cisco.parse.failures`
- `cisco.map.success`, `cisco.map.failures`, `cisco.map.affected.packages`
- **Shared HTTP metrics** via `SourceDiagnostics`:
- `feedser.source.http.requests{feedser.source="vndr-cisco"}`
- `feedser.source.http.failures{feedser.source="vndr-cisco"}`
- `feedser.source.http.duration{feedser.source="vndr-cisco"}`
- **Structured logs**
- `Cisco fetch completed date=… pages=… added=…` (info)
- `Cisco parse completed parsed=… failures=…` (info)
- `Cisco map completed mapped=… failures=…` (info)
- Warnings surface when DTO serialization fails or GridFS payload is missing.
- Suggested alerts: non-zero `cisco.fetch.failures` in 15m, or `cisco.map.success` flatlines while fetch continues.
## 8. Incident response
- **Token compromise** revoke the application in the Cisco API Console, purge cached secrets, rotate immediately per §4.
- **Persistent 401/403** confirm credentials in vault, then validate token issuance; if unresolved, open a Cisco DevNet support ticket referencing the application ID.
- **429 spikes** inspect job scheduler cadence and adjust connector options (`maxRequestsPerWindow`) before requesting higher quotas from Cisco.
## 9. References
- Cisco PSIRT openVuln API Authentication Guide.citeturn3search0
- Accessing the openVuln API using curl (token lifetime).citeturn3search7
- openVuln API rate limit documentation.citeturn0search0turn3search6
# Concelier Cisco PSIRT Connector OAuth Provisioning SOP
_Last updated: 2025-10-14_
## 1. Scope
This runbook describes how Ops provisions, rotates, and distributes Cisco PSIRT openVuln OAuth client credentials for the Concelier Cisco connector. It covers online and air-gapped (Offline Kit) environments, quota-aware execution, and escalation paths.
## 2. Prerequisites
- Active Cisco.com (CCO) account with access to the Cisco API Console.
- Cisco PSIRT openVuln API entitlement (visible under “My Apps & Keys” once granted).citeturn3search0
- Concelier configuration location (typically `/etc/stella/concelier.yaml` in production) or Offline Kit secret bundle staging directory.
## 3. Provisioning workflow
1. **Register the application**
- Sign in at <https://apiconsole.cisco.com>.
- Select **Register a New App** → Application Type: `Service`, Grant Type: `Client Credentials`, API: `Cisco PSIRT openVuln API`.citeturn3search0
- Record the generated `clientId` and `clientSecret` in the Ops vault.
2. **Verify token issuance**
- Request an access token with:
```bash
curl -s https://id.cisco.com/oauth2/default/v1/token \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "grant_type=client_credentials" \
-d "client_id=${CLIENT_ID}" \
-d "client_secret=${CLIENT_SECRET}"
```
- Confirm HTTP 200 and an `expires_in` value of 3600 seconds (tokens live for one hour).citeturn3search0turn3search7
- Preserve the response only long enough to validate syntax; do **not** persist tokens.
3. **Authorize Concelier runtime**
- Update `concelier:sources:cisco:auth` (or the module-specific secret template) with the stored credentials.
- For Offline Kit delivery, export encrypted secrets into `offline-kit/secrets/cisco-openvuln.json` using the platforms sealed secret format.
4. **Connectivity validation**
- From the Concelier control plane, run `stella db jobs run source:vndr-cisco:fetch --dry-run`.
- Ensure the Source HTTP diagnostics record `Bearer` authorization headers and no 401/403 responses.
## 4. Rotation SOP
| Step | Owner | Notes |
| --- | --- | --- |
| 1. Schedule rotation | Ops (monthly board) | Rotate every 90 days or immediately after suspected credential exposure. |
| 2. Create replacement app | Ops | Repeat §3.1 with “-next” suffix; verify token issuance. |
| 3. Stage dual credentials | Ops + Concelier On-Call | Publish new credentials to secret store alongside current pair. |
| 4. Cut over | Concelier On-Call | Restart connector workers during a low-traffic window (<10 min) to pick up the new secret. |
| 5. Deactivate legacy app | Ops | Delete prior app in Cisco API Console once telemetry confirms successful fetch/parse cycles for 2 consecutive hours. |
**Automation hooks**
- Rotation reminders are tracked in OpsRunbookOps board (`OPS-RUN-KEYS` swim lane); add checklist items for Concelier Cisco when opening a rotation task.
- Use the secret management pipeline (`ops/secrets/rotate.sh --connector cisco`) to template vault updates; the script renders a redacted diff for audit.
## 5. Offline Kit packaging
1. Generate the credential bundle using the Offline Kit CLI:
`offline-kit secrets add cisco-openvuln --client-id … --client-secret …`
2. Store the encrypted payload under `offline-kit/secrets/cisco-openvuln.enc`.
3. Distribute via the Offline Kit channel; update `offline-kit/MANIFEST.md` with the credential fingerprint (SHA256 of plaintext concatenated with metadata).
4. Document validation steps for the receiving site (token request from an air-gapped relay or cached token mirror).
## 6. Quota and throttling guidance
- Cisco enforces combined limits of 5 requests/second, 30 requests/minute, and 5000 requests/day per application.citeturn0search0turn3search6
- Concelier fetch jobs must respect `Retry-After` headers on HTTP 429 responses; Ops should monitor for sustained quota saturation and consider paging window adjustments.
- Telemetry to watch: `concelier.source.http.requests{concelier.source="vndr-cisco"}`, `concelier.source.http.failures{...}`, and connector-specific metrics once implemented.
## 7. Telemetry & Monitoring
- **Metrics (Meter `StellaOps.Concelier.Connector.Vndr.Cisco`)**
- `cisco.fetch.documents`, `cisco.fetch.failures`, `cisco.fetch.unchanged`
- `cisco.parse.success`, `cisco.parse.failures`
- `cisco.map.success`, `cisco.map.failures`, `cisco.map.affected.packages`
- **Shared HTTP metrics** via `SourceDiagnostics`:
- `concelier.source.http.requests{concelier.source="vndr-cisco"}`
- `concelier.source.http.failures{concelier.source="vndr-cisco"}`
- `concelier.source.http.duration{concelier.source="vndr-cisco"}`
- **Structured logs**
- `Cisco fetch completed date=… pages=… added=…` (info)
- `Cisco parse completed parsed=… failures=…` (info)
- `Cisco map completed mapped=… failures=…` (info)
- Warnings surface when DTO serialization fails or GridFS payload is missing.
- Suggested alerts: non-zero `cisco.fetch.failures` in 15m, or `cisco.map.success` flatlines while fetch continues.
## 8. Incident response
- **Token compromise** revoke the application in the Cisco API Console, purge cached secrets, rotate immediately per §4.
- **Persistent 401/403** confirm credentials in vault, then validate token issuance; if unresolved, open a Cisco DevNet support ticket referencing the application ID.
- **429 spikes** inspect job scheduler cadence and adjust connector options (`maxRequestsPerWindow`) before requesting higher quotas from Cisco.
## 9. References
- Cisco PSIRT openVuln API Authentication Guide.citeturn3search0
- Accessing the openVuln API using curl (token lifetime).citeturn3search7
- openVuln API rate limit documentation.citeturn0search0turn3search6

View File

@@ -1,160 +1,160 @@
# Feedser Conflict Resolution Runbook (Sprint 3)
This runbook equips Feedser operators to detect, triage, and resolve advisory conflicts now that the Sprint 3 merge engine landed (`AdvisoryPrecedenceMerger`, merge-event hashing, and telemetry counters). It builds on the canonical rules defined in `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md` and the metrics/logging instrumentation delivered this sprint.
---
## 1. Precedence Model (recap)
- **Default ranking:** `GHSA -> NVD -> OSV`, with distro/vendor PSIRTs outranking ecosystem feeds (`AdvisoryPrecedenceDefaults`). Use `feedser:merge:precedence:ranks` to override per source when incident response requires it.
- **Freshness override:** if a lower-ranked source is >= 48 hours newer for a freshness-sensitive field (title, summary, affected ranges, references, credits), it wins. Every override stamps `provenance[].decisionReason = freshness`.
- **Tie-breakers:** when precedence and freshness tie, the engine falls back to (1) primary source order, (2) shortest normalized text, (3) lowest stable hash. Merge-generated provenance records set `decisionReason = tie-breaker`.
- **Audit trail:** each merged advisory receives a `merge` provenance entry listing the participating sources plus a `merge_event` record with canonical before/after SHA-256 hashes.
---
## 2. Telemetry Shipped This Sprint
| Instrument | Type | Key Tags | Purpose |
|------------|------|----------|---------|
| `feedser.merge.operations` | Counter | `inputs` | Total precedence merges executed. |
| `feedser.merge.overrides` | Counter | `primary_source`, `suppressed_source`, `primary_rank`, `suppressed_rank` | Field-level overrides chosen by precedence. |
| `feedser.merge.range_overrides` | Counter | `advisory_key`, `package_type`, `primary_source`, `suppressed_source`, `primary_range_count`, `suppressed_range_count` | Package range overrides emitted by `AffectedPackagePrecedenceResolver`. |
| `feedser.merge.conflicts` | Counter | `type` (`severity`, `precedence_tie`), `reason` (`mismatch`, `primary_missing`, `equal_rank`) | Conflicts requiring operator review. |
| `feedser.merge.identity_conflicts` | Counter | `scheme`, `alias_value`, `advisory_count` | Alias collisions surfaced by the identity graph. |
### Structured logs
- `AdvisoryOverride` (EventId 1000) - logs merge suppressions with alias/provenance counts.
- `PackageRangeOverride` (EventId 1001) - logs package-level precedence decisions.
- `PrecedenceConflict` (EventId 1002) - logs mismatched severity or equal-rank scenarios.
- `Alias collision ...` (no EventId) - emitted when `feedser.merge.identity_conflicts` increments.
Expect all logs at `Information`. Ensure OTEL exporters include the scope `StellaOps.Feedser.Merge`.
---
## 3. Detection & Alerting
1. **Dashboard panels**
- `feedser.merge.conflicts` - table grouped by `type/reason`. Alert when > 0 in a 15 minute window.
- `feedser.merge.range_overrides` - stacked bar by `package_type`. Spikes highlight vendor PSIRT overrides over registry data.
- `feedser.merge.overrides` with `primary_source|suppressed_source` - catches unexpected precedence flips (e.g., OSV overtaking GHSA).
- `feedser.merge.identity_conflicts` - single-stat; alert when alias collisions occur more than once per day.
2. **Log based alerts**
- `eventId=1002` with `reason="equal_rank"` - indicates precedence table gaps; page merge owners.
- `eventId=1002` with `reason="mismatch"` - severity disagreement; open connector bug if sustained.
3. **Job health**
- `stellaops-cli db merge` exit code `1` signifies unresolved conflicts. Pipe to automation that captures logs and notifies #feedser-ops.
### Threshold updates (2025-10-12)
- `feedser.merge.conflicts` Page only when ≥ 2 events fire within 30 minutes; the synthetic conflict fixture run produces 0 conflicts, so the first event now routes to Slack for manual review instead of paging.
- `feedser.merge.overrides` Raise a warning when the 30-minute sum exceeds 10 (canonical triple yields exactly 1 summary override with `primary_source=osv`, `suppressed_source=ghsa`).
- `feedser.merge.range_overrides` Maintain the 15-minute alert at ≥ 3 but annotate dashboards that the regression triple emits a single `package_type=semver` override so ops can spot unexpected spikes.
---
## 4. Triage Workflow
1. **Confirm job context**
- `stellaops-cli db merge` (CLI) or `POST /jobs/merge:reconcile` (API) to rehydrate the merge job. Use `--verbose` to stream structured logs during triage.
2. **Inspect metrics**
- Correlate spikes in `feedser.merge.conflicts` with `primary_source`/`suppressed_source` tags from `feedser.merge.overrides`.
3. **Pull structured logs**
- Example (vector output):
```
jq 'select(.EventId.Name=="PrecedenceConflict") | {advisory: .State[0].Value, type: .ConflictType, reason: .Reason, primary: .PrimarySources, suppressed: .SuppressedSources}' stellaops-feedser.log
```
4. **Review merge events**
- `mongosh`:
```javascript
use feedser;
db.merge_event.find({ advisoryKey: "CVE-2025-1234" }).sort({ mergedAt: -1 }).limit(5);
```
- Compare `beforeHash` vs `afterHash` to confirm the merge actually changed canonical output.
5. **Interrogate provenance**
- `db.advisories.findOne({ advisoryKey: "CVE-2025-1234" }, { title: 1, severity: 1, provenance: 1, "affectedPackages.provenance": 1 })`
- Check `provenance[].decisionReason` values (`precedence`, `freshness`, `tie-breaker`) to understand why the winning field was chosen.
---
## 5. Conflict Classification Matrix
| Signal | Likely Cause | Immediate Action |
|--------|--------------|------------------|
| `reason="mismatch"` with `type="severity"` | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. |
| `reason="primary_missing"` | Higher-ranked source lacks the field entirely. | Backfill connector data or temporarily allow lower-ranked source via precedence override. |
| `reason="equal_rank"` | Two feeds share the same precedence rank (custom config or missing entry). | Update `feedser:merge:precedence:ranks` to break the tie; restart merge job. |
| Rising `feedser.merge.range_overrides` for a package type | Vendor PSIRT now supplies richer ranges. | Validate connectors emit `decisionReason="precedence"` and update dashboards to treat registry ranges as fallback. |
| `feedser.merge.identity_conflicts` > 0 | Alias scheme mapping produced collisions (duplicate CVE <-> advisory pairs). | Inspect `Alias collision` log payload; reconcile the alias graph by adjusting connector alias output. |
---
## 6. Resolution Playbook
1. **Connector data fix**
- Re-run the offending connector stages (`stellaops-cli db fetch --source ghsa --stage map` etc.).
- Once fixed, rerun merge and verify `decisionReason` reflects `freshness` or `precedence` as expected.
2. **Temporary precedence override**
- Edit `etc/feedser.yaml`:
```yaml
feedser:
merge:
precedence:
ranks:
osv: 1
ghsa: 0
```
- Restart Feedser workers; confirm tags in `feedser.merge.overrides` show the new ranks.
- Document the override with expiry in the change log.
3. **Alias remediation**
- Update connector mapping rules to weed out duplicate aliases (e.g., skip GHSA aliases that mirror CVE IDs).
- Flush cached alias graphs if necessary (`db.alias_graph.drop()` is destructive-coordinate with Storage before issuing).
4. **Escalation**
- If override metrics spike due to upstream regression, open an incident with Security Guild, referencing merge logs and `merge_event` IDs.
---
## 7. Validation Checklist
- [ ] Merge job rerun returns exit code `0`.
- [ ] `feedser.merge.conflicts` baseline returns to zero after corrective action.
- [ ] Latest `merge_event` entry shows expected hash delta.
- [ ] Affected advisory document shows updated `provenance[].decisionReason`.
- [ ] Ops change log updated with incident summary, config overrides, and rollback plan.
---
## 8. Reference Material
- Canonical conflict rules: `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md`.
- Merge engine internals: `src/StellaOps.Feedser.Merge/Services/AdvisoryPrecedenceMerger.cs`.
- Metrics definitions: `src/StellaOps.Feedser.Merge/Services/AdvisoryMergeService.cs` (identity conflicts) and `AdvisoryPrecedenceMerger`.
- Storage audit trail: `src/StellaOps.Feedser.Merge/Services/MergeEventWriter.cs`, `src/StellaOps.Feedser.Storage.Mongo/MergeEvents`.
Keep this runbook synchronized with future sprint notes and update alert thresholds as baseline volumes change.
---
## 9. Synthetic Regression Fixtures
- **Locations** Canonical conflict snapshots now live at `src/StellaOps.Feedser.Source.Ghsa.Tests/Fixtures/conflict-ghsa.canonical.json`, `src/StellaOps.Feedser.Source.Nvd.Tests/Nvd/Fixtures/conflict-nvd.canonical.json`, and `src/StellaOps.Feedser.Source.Osv.Tests/Fixtures/conflict-osv.canonical.json`.
- **Validation commands** To regenerate and verify the fixtures offline, run:
```bash
dotnet test src/StellaOps.Feedser.Source.Ghsa.Tests/StellaOps.Feedser.Source.Ghsa.Tests.csproj --filter GhsaConflictFixtureTests
dotnet test src/StellaOps.Feedser.Source.Nvd.Tests/StellaOps.Feedser.Source.Nvd.Tests.csproj --filter NvdConflictFixtureTests
dotnet test src/StellaOps.Feedser.Source.Osv.Tests/StellaOps.Feedser.Source.Osv.Tests.csproj --filter OsvConflictFixtureTests
dotnet test src/StellaOps.Feedser.Merge.Tests/StellaOps.Feedser.Merge.Tests.csproj --filter MergeAsync_AppliesCanonicalRulesAndPersistsDecisions
```
- **Expected signals** The triple produces one freshness-driven summary override (`primary_source=osv`, `suppressed_source=ghsa`) and one range override for the npm SemVer package while leaving `feedser.merge.conflicts` at zero. Use these values as the baseline when tuning dashboards or load-testing alert pipelines.
---
## 10. Change Log
| Date (UTC) | Change | Notes |
|------------|--------|-------|
| 2025-10-16 | Ops review signed off after connector expansion (CCCS, CERT-Bund, KISA, ICS CISA, MSRC) landed. Alert thresholds from §3 reaffirmed; dashboards updated to watch attachment signals emitted by ICS CISA connector. | Ops sign-off recorded by Feedser Ops Guild; no additional overrides required. |
# Concelier Conflict Resolution Runbook (Sprint 3)
This runbook equips Concelier operators to detect, triage, and resolve advisory conflicts now that the Sprint 3 merge engine landed (`AdvisoryPrecedenceMerger`, merge-event hashing, and telemetry counters). It builds on the canonical rules defined in `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md` and the metrics/logging instrumentation delivered this sprint.
---
## 1. Precedence Model (recap)
- **Default ranking:** `GHSA -> NVD -> OSV`, with distro/vendor PSIRTs outranking ecosystem feeds (`AdvisoryPrecedenceDefaults`). Use `concelier:merge:precedence:ranks` to override per source when incident response requires it.
- **Freshness override:** if a lower-ranked source is >= 48 hours newer for a freshness-sensitive field (title, summary, affected ranges, references, credits), it wins. Every override stamps `provenance[].decisionReason = freshness`.
- **Tie-breakers:** when precedence and freshness tie, the engine falls back to (1) primary source order, (2) shortest normalized text, (3) lowest stable hash. Merge-generated provenance records set `decisionReason = tie-breaker`.
- **Audit trail:** each merged advisory receives a `merge` provenance entry listing the participating sources plus a `merge_event` record with canonical before/after SHA-256 hashes.
---
## 2. Telemetry Shipped This Sprint
| Instrument | Type | Key Tags | Purpose |
|------------|------|----------|---------|
| `concelier.merge.operations` | Counter | `inputs` | Total precedence merges executed. |
| `concelier.merge.overrides` | Counter | `primary_source`, `suppressed_source`, `primary_rank`, `suppressed_rank` | Field-level overrides chosen by precedence. |
| `concelier.merge.range_overrides` | Counter | `advisory_key`, `package_type`, `primary_source`, `suppressed_source`, `primary_range_count`, `suppressed_range_count` | Package range overrides emitted by `AffectedPackagePrecedenceResolver`. |
| `concelier.merge.conflicts` | Counter | `type` (`severity`, `precedence_tie`), `reason` (`mismatch`, `primary_missing`, `equal_rank`) | Conflicts requiring operator review. |
| `concelier.merge.identity_conflicts` | Counter | `scheme`, `alias_value`, `advisory_count` | Alias collisions surfaced by the identity graph. |
### Structured logs
- `AdvisoryOverride` (EventId 1000) - logs merge suppressions with alias/provenance counts.
- `PackageRangeOverride` (EventId 1001) - logs package-level precedence decisions.
- `PrecedenceConflict` (EventId 1002) - logs mismatched severity or equal-rank scenarios.
- `Alias collision ...` (no EventId) - emitted when `concelier.merge.identity_conflicts` increments.
Expect all logs at `Information`. Ensure OTEL exporters include the scope `StellaOps.Concelier.Merge`.
---
## 3. Detection & Alerting
1. **Dashboard panels**
- `concelier.merge.conflicts` - table grouped by `type/reason`. Alert when > 0 in a 15 minute window.
- `concelier.merge.range_overrides` - stacked bar by `package_type`. Spikes highlight vendor PSIRT overrides over registry data.
- `concelier.merge.overrides` with `primary_source|suppressed_source` - catches unexpected precedence flips (e.g., OSV overtaking GHSA).
- `concelier.merge.identity_conflicts` - single-stat; alert when alias collisions occur more than once per day.
2. **Log based alerts**
- `eventId=1002` with `reason="equal_rank"` - indicates precedence table gaps; page merge owners.
- `eventId=1002` with `reason="mismatch"` - severity disagreement; open connector bug if sustained.
3. **Job health**
- `stellaops-cli db merge` exit code `1` signifies unresolved conflicts. Pipe to automation that captures logs and notifies #concelier-ops.
### Threshold updates (2025-10-12)
- `concelier.merge.conflicts` Page only when ≥ 2 events fire within 30 minutes; the synthetic conflict fixture run produces 0 conflicts, so the first event now routes to Slack for manual review instead of paging.
- `concelier.merge.overrides` Raise a warning when the 30-minute sum exceeds 10 (canonical triple yields exactly 1 summary override with `primary_source=osv`, `suppressed_source=ghsa`).
- `concelier.merge.range_overrides` Maintain the 15-minute alert at ≥ 3 but annotate dashboards that the regression triple emits a single `package_type=semver` override so ops can spot unexpected spikes.
---
## 4. Triage Workflow
1. **Confirm job context**
- `stellaops-cli db merge` (CLI) or `POST /jobs/merge:reconcile` (API) to rehydrate the merge job. Use `--verbose` to stream structured logs during triage.
2. **Inspect metrics**
- Correlate spikes in `concelier.merge.conflicts` with `primary_source`/`suppressed_source` tags from `concelier.merge.overrides`.
3. **Pull structured logs**
- Example (vector output):
```
jq 'select(.EventId.Name=="PrecedenceConflict") | {advisory: .State[0].Value, type: .ConflictType, reason: .Reason, primary: .PrimarySources, suppressed: .SuppressedSources}' stellaops-concelier.log
```
4. **Review merge events**
- `mongosh`:
```javascript
use concelier;
db.merge_event.find({ advisoryKey: "CVE-2025-1234" }).sort({ mergedAt: -1 }).limit(5);
```
- Compare `beforeHash` vs `afterHash` to confirm the merge actually changed canonical output.
5. **Interrogate provenance**
- `db.advisories.findOne({ advisoryKey: "CVE-2025-1234" }, { title: 1, severity: 1, provenance: 1, "affectedPackages.provenance": 1 })`
- Check `provenance[].decisionReason` values (`precedence`, `freshness`, `tie-breaker`) to understand why the winning field was chosen.
---
## 5. Conflict Classification Matrix
| Signal | Likely Cause | Immediate Action |
|--------|--------------|------------------|
| `reason="mismatch"` with `type="severity"` | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. |
| `reason="primary_missing"` | Higher-ranked source lacks the field entirely. | Backfill connector data or temporarily allow lower-ranked source via precedence override. |
| `reason="equal_rank"` | Two feeds share the same precedence rank (custom config or missing entry). | Update `concelier:merge:precedence:ranks` to break the tie; restart merge job. |
| Rising `concelier.merge.range_overrides` for a package type | Vendor PSIRT now supplies richer ranges. | Validate connectors emit `decisionReason="precedence"` and update dashboards to treat registry ranges as fallback. |
| `concelier.merge.identity_conflicts` > 0 | Alias scheme mapping produced collisions (duplicate CVE <-> advisory pairs). | Inspect `Alias collision` log payload; reconcile the alias graph by adjusting connector alias output. |
---
## 6. Resolution Playbook
1. **Connector data fix**
- Re-run the offending connector stages (`stellaops-cli db fetch --source ghsa --stage map` etc.).
- Once fixed, rerun merge and verify `decisionReason` reflects `freshness` or `precedence` as expected.
2. **Temporary precedence override**
- Edit `etc/concelier.yaml`:
```yaml
concelier:
merge:
precedence:
ranks:
osv: 1
ghsa: 0
```
- Restart Concelier workers; confirm tags in `concelier.merge.overrides` show the new ranks.
- Document the override with expiry in the change log.
3. **Alias remediation**
- Update connector mapping rules to weed out duplicate aliases (e.g., skip GHSA aliases that mirror CVE IDs).
- Flush cached alias graphs if necessary (`db.alias_graph.drop()` is destructive-coordinate with Storage before issuing).
4. **Escalation**
- If override metrics spike due to upstream regression, open an incident with Security Guild, referencing merge logs and `merge_event` IDs.
---
## 7. Validation Checklist
- [ ] Merge job rerun returns exit code `0`.
- [ ] `concelier.merge.conflicts` baseline returns to zero after corrective action.
- [ ] Latest `merge_event` entry shows expected hash delta.
- [ ] Affected advisory document shows updated `provenance[].decisionReason`.
- [ ] Ops change log updated with incident summary, config overrides, and rollback plan.
---
## 8. Reference Material
- Canonical conflict rules: `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md`.
- Merge engine internals: `src/StellaOps.Concelier.Merge/Services/AdvisoryPrecedenceMerger.cs`.
- Metrics definitions: `src/StellaOps.Concelier.Merge/Services/AdvisoryMergeService.cs` (identity conflicts) and `AdvisoryPrecedenceMerger`.
- Storage audit trail: `src/StellaOps.Concelier.Merge/Services/MergeEventWriter.cs`, `src/StellaOps.Concelier.Storage.Mongo/MergeEvents`.
Keep this runbook synchronized with future sprint notes and update alert thresholds as baseline volumes change.
---
## 9. Synthetic Regression Fixtures
- **Locations** Canonical conflict snapshots now live at `src/StellaOps.Concelier.Connector.Ghsa.Tests/Fixtures/conflict-ghsa.canonical.json`, `src/StellaOps.Concelier.Connector.Nvd.Tests/Nvd/Fixtures/conflict-nvd.canonical.json`, and `src/StellaOps.Concelier.Connector.Osv.Tests/Fixtures/conflict-osv.canonical.json`.
- **Validation commands** To regenerate and verify the fixtures offline, run:
```bash
dotnet test src/StellaOps.Concelier.Connector.Ghsa.Tests/StellaOps.Concelier.Connector.Ghsa.Tests.csproj --filter GhsaConflictFixtureTests
dotnet test src/StellaOps.Concelier.Connector.Nvd.Tests/StellaOps.Concelier.Connector.Nvd.Tests.csproj --filter NvdConflictFixtureTests
dotnet test src/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj --filter OsvConflictFixtureTests
dotnet test src/StellaOps.Concelier.Merge.Tests/StellaOps.Concelier.Merge.Tests.csproj --filter MergeAsync_AppliesCanonicalRulesAndPersistsDecisions
```
- **Expected signals** The triple produces one freshness-driven summary override (`primary_source=osv`, `suppressed_source=ghsa`) and one range override for the npm SemVer package while leaving `concelier.merge.conflicts` at zero. Use these values as the baseline when tuning dashboards or load-testing alert pipelines.
---
## 10. Change Log
| Date (UTC) | Change | Notes |
|------------|--------|-------|
| 2025-10-16 | Ops review signed off after connector expansion (CCCS, CERT-Bund, KISA, ICS CISA, MSRC) landed. Alert thresholds from §3 reaffirmed; dashboards updated to watch attachment signals emitted by ICS CISA connector. | Ops sign-off recorded by Concelier Ops Guild; no additional overrides required. |

View File

@@ -1,151 +1,151 @@
{
"title": "Feedser CVE & KEV Observability",
"uid": "feedser-cve-kev",
"schemaVersion": 38,
"version": 1,
"editable": true,
"timezone": "",
"time": {
"from": "now-24h",
"to": "now"
},
"refresh": "5m",
"templating": {
"list": [
{
"name": "datasource",
"type": "datasource",
"query": "prometheus",
"refresh": 1,
"hide": 0
}
]
},
"panels": [
{
"type": "timeseries",
"title": "CVE fetch success vs failure",
"gridPos": { "h": 9, "w": 12, "x": 0, "y": 0 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(cve_fetch_success_total[5m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "success"
},
{
"refId": "B",
"expr": "rate(cve_fetch_failures_total[5m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "failure"
}
]
},
{
"type": "timeseries",
"title": "KEV fetch cadence",
"gridPos": { "h": 9, "w": 12, "x": 12, "y": 0 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(kev_fetch_success_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "success"
},
{
"refId": "B",
"expr": "rate(kev_fetch_failures_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "failure"
},
{
"refId": "C",
"expr": "rate(kev_fetch_unchanged_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "unchanged"
}
]
},
{
"type": "table",
"title": "KEV parse anomalies (24h)",
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 9 },
"fieldConfig": {
"defaults": {
"unit": "short"
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "sum by (reason) (increase(kev_parse_anomalies_total[24h]))",
"format": "table",
"datasource": { "type": "prometheus", "uid": "${datasource}" }
}
],
"transformations": [
{
"id": "organize",
"options": {
"renameByName": {
"Value": "count"
}
}
}
]
},
{
"type": "timeseries",
"title": "Advisories emitted",
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 9 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(cve_map_success_total[15m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "CVE"
},
{
"refId": "B",
"expr": "rate(kev_map_advisories_total[24h])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "KEV"
}
]
}
]
}
{
"title": "Concelier CVE & KEV Observability",
"uid": "concelier-cve-kev",
"schemaVersion": 38,
"version": 1,
"editable": true,
"timezone": "",
"time": {
"from": "now-24h",
"to": "now"
},
"refresh": "5m",
"templating": {
"list": [
{
"name": "datasource",
"type": "datasource",
"query": "prometheus",
"refresh": 1,
"hide": 0
}
]
},
"panels": [
{
"type": "timeseries",
"title": "CVE fetch success vs failure",
"gridPos": { "h": 9, "w": 12, "x": 0, "y": 0 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(cve_fetch_success_total[5m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "success"
},
{
"refId": "B",
"expr": "rate(cve_fetch_failures_total[5m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "failure"
}
]
},
{
"type": "timeseries",
"title": "KEV fetch cadence",
"gridPos": { "h": 9, "w": 12, "x": 12, "y": 0 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(kev_fetch_success_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "success"
},
{
"refId": "B",
"expr": "rate(kev_fetch_failures_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "failure"
},
{
"refId": "C",
"expr": "rate(kev_fetch_unchanged_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "unchanged"
}
]
},
{
"type": "table",
"title": "KEV parse anomalies (24h)",
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 9 },
"fieldConfig": {
"defaults": {
"unit": "short"
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "sum by (reason) (increase(kev_parse_anomalies_total[24h]))",
"format": "table",
"datasource": { "type": "prometheus", "uid": "${datasource}" }
}
],
"transformations": [
{
"id": "organize",
"options": {
"renameByName": {
"Value": "count"
}
}
}
]
},
{
"type": "timeseries",
"title": "Advisories emitted",
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 9 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(cve_map_success_total[15m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "CVE"
},
{
"refId": "B",
"expr": "rate(kev_map_advisories_total[24h])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "KEV"
}
]
}
]
}

View File

@@ -1,4 +1,4 @@
# Feedser CVE & KEV Connector Operations
# Concelier CVE & KEV Connector Operations
This playbook equips operators with the steps required to roll out and monitor the CVE Services and CISA KEV connectors across environments.
@@ -7,17 +7,17 @@ This playbook equips operators with the steps required to roll out and monitor t
### 1.1 Prerequisites
- CVE Services API credentials (organisation ID, user ID, API key) with access to the JSON 5 API.
- Network egress to `https://cveawg.mitre.org` (or a mirrored endpoint) from the Feedser workers.
- Updated `feedser.yaml` (or the matching environment variables) with the following section:
- Network egress to `https://cveawg.mitre.org` (or a mirrored endpoint) from the Concelier workers.
- Updated `concelier.yaml` (or the matching environment variables) with the following section:
```yaml
feedser:
concelier:
sources:
cve:
baseEndpoint: "https://cveawg.mitre.org/api/"
apiOrg: "ORG123"
apiUser: "user@example.org"
apiKeyFile: "/var/run/secrets/feedser/cve-api-key"
apiKeyFile: "/var/run/secrets/concelier/cve-api-key"
seedDirectory: "./seed-data/cve"
pageSize: 200
maxPagesPerFetch: 5
@@ -26,44 +26,44 @@ feedser:
failureBackoff: "00:10:00"
```
> Store the API key outside source control. When using `apiKeyFile`, mount the secret file into the container/host; alternatively supply `apiKey` via `FEEDSER_SOURCES__CVE__APIKEY`.
> Store the API key outside source control. When using `apiKeyFile`, mount the secret file into the container/host; alternatively supply `apiKey` via `CONCELIER_SOURCES__CVE__APIKEY`.
> 🪙 When credentials are not yet available, configure `seedDirectory` to point at mirrored CVE JSON (for example, the repos `seed-data/cve/` bundle). The connector will ingest those records and log a warning instead of failing the job; live fetching resumes automatically once `apiOrg` / `apiUser` / `apiKey` are supplied.
### 1.2 Smoke Test (staging)
1. Deploy the updated configuration and restart the Feedser service so the connector picks up the credentials.
1. Deploy the updated configuration and restart the Concelier service so the connector picks up the credentials.
2. Trigger one end-to-end cycle:
- Feedser CLI: `stella db jobs run source:cve:fetch --and-then source:cve:parse --and-then source:cve:map`
- Concelier CLI: `stella db jobs run source:cve:fetch --and-then source:cve:parse --and-then source:cve:map`
- REST fallback: `POST /jobs/run { "kind": "source:cve:fetch", "chain": ["source:cve:parse", "source:cve:map"] }`
3. Observe the following metrics (exported via OTEL meter `StellaOps.Feedser.Source.Cve`):
3. Observe the following metrics (exported via OTEL meter `StellaOps.Concelier.Connector.Cve`):
- `cve.fetch.attempts`, `cve.fetch.success`, `cve.fetch.documents`, `cve.fetch.failures`, `cve.fetch.unchanged`
- `cve.parse.success`, `cve.parse.failures`, `cve.parse.quarantine`
- `cve.map.success`
4. Verify Prometheus shows matching `feedser.source.http.requests_total{feedser_source="cve"}` deltas (list vs detail phases) while `feedser.source.http.failures_total{feedser_source="cve"}` stays flat.
4. Verify Prometheus shows matching `concelier.source.http.requests_total{concelier_source="cve"}` deltas (list vs detail phases) while `concelier.source.http.failures_total{concelier_source="cve"}` stays flat.
5. Confirm the info-level summary log `CVEs fetch window … pages=X detailDocuments=Y detailFailures=Z` appears once per fetch run and shows `detailFailures=0`.
6. Verify the MongoDB advisory store contains fresh CVE advisories (`advisoryKey` prefix `cve/`) and that the source cursor (`source_states` collection) advanced.
### 1.3 Production Monitoring
- **Dashboards** Plot `rate(cve_fetch_success_total[5m])`, `rate(cve_fetch_failures_total[5m])`, and `rate(cve_fetch_documents_total[5m])` alongside `feedser_source_http_requests_total{feedser_source="cve"}` to confirm HTTP and connector counters stay aligned. Keep `feedser.range.primitives{scheme=~"semver|vendor"}` on the same board for range coverage. Example alerts:
- **Dashboards** Plot `rate(cve_fetch_success_total[5m])`, `rate(cve_fetch_failures_total[5m])`, and `rate(cve_fetch_documents_total[5m])` alongside `concelier_source_http_requests_total{concelier_source="cve"}` to confirm HTTP and connector counters stay aligned. Keep `concelier.range.primitives{scheme=~"semver|vendor"}` on the same board for range coverage. Example alerts:
- `rate(cve_fetch_failures_total[5m]) > 0` for 10minutes (`severity=warning`)
- `rate(cve_map_success_total[15m]) == 0` while `rate(cve_fetch_success_total[15m]) > 0` (`severity=critical`)
- `sum_over_time(cve_parse_quarantine_total[1h]) > 0` to catch schema anomalies
- **Logs** Monitor warnings such as `Failed fetching CVE record {CveId}` and `Malformed CVE JSON`, and surface the summary info log `CVEs fetch window … detailFailures=0 detailUnchanged=0` on dashboards. A non-zero `detailFailures` usually indicates rate-limit or auth issues on detail requests.
- **Grafana pack** Import `docs/ops/feedser-cve-kev-grafana-dashboard.json` and filter by panel legend (`CVE`, `KEV`) to reuse the canned layout.
- **Backfill window** Operators can tighten or widen `initialBackfill` / `maxPagesPerFetch` after validating throughput. Update config and restart Feedser to apply changes.
- **Grafana pack** Import `docs/ops/concelier-cve-kev-grafana-dashboard.json` and filter by panel legend (`CVE`, `KEV`) to reuse the canned layout.
- **Backfill window** Operators can tighten or widen `initialBackfill` / `maxPagesPerFetch` after validating throughput. Update config and restart Concelier to apply changes.
### 1.4 Staging smoke log (2025-10-15)
While Ops finalises long-lived CVE Services credentials, we validated the connector end-to-end against the recorded CVE-2024-0001 payloads used in regression tests:
- Command: `dotnet test src/StellaOps.Feedser.Source.Cve.Tests/StellaOps.Feedser.Source.Cve.Tests.csproj -l "console;verbosity=detailed"`
- Command: `dotnet test src/StellaOps.Concelier.Connector.Cve.Tests/StellaOps.Concelier.Connector.Cve.Tests.csproj -l "console;verbosity=detailed"`
- Summary log emitted by the connector:
```
CVEs fetch window 2024-09-01T00:00:00Z->2024-10-01T00:00:00Z pages=1 listSuccess=1 detailDocuments=1 detailFailures=0 detailUnchanged=0 pendingDocuments=0->1 pendingMappings=0->1 hasMorePages=False nextWindowStart=2024-09-15T12:00:00Z nextWindowEnd=(none) nextPage=1
```
- Telemetry captured by `Meter` `StellaOps.Feedser.Source.Cve`:
- Telemetry captured by `Meter` `StellaOps.Concelier.Connector.Cve`:
| Metric | Value |
|--------|-------|
| `cve.fetch.attempts` | 1 |
@@ -72,7 +72,7 @@ While Ops finalises long-lived CVE Services credentials, we validated the connec
| `cve.parse.success` | 1 |
| `cve.map.success` | 1 |
The Grafana pack `docs/ops/feedser-cve-kev-grafana-dashboard.json` has been imported into staging so the panels referenced above render against these counters once the live API keys are in place.
The Grafana pack `docs/ops/concelier-cve-kev-grafana-dashboard.json` has been imported into staging so the panels referenced above render against these counters once the live API keys are in place.
## 2. CISA KEV Connector (`source:kev:*`)
@@ -80,10 +80,10 @@ The Grafana pack `docs/ops/feedser-cve-kev-grafana-dashboard.json` has been impo
- Network egress (or mirrored content) for `https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json`.
- No credentials are required, but the HTTP allow-list must include `www.cisa.gov`.
- Confirm the following snippet in `feedser.yaml` (defaults shown; tune as needed):
- Confirm the following snippet in `concelier.yaml` (defaults shown; tune as needed):
```yaml
feedser:
concelier:
sources:
kev:
feedUri: "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
@@ -105,15 +105,15 @@ Treat repeated schema failures or growing anomaly counts as an upstream regressi
### 2.3 Smoke Test (staging)
1. Deploy the configuration and restart Feedser.
1. Deploy the configuration and restart Concelier.
2. Trigger a pipeline run:
- CLI: `stella db jobs run source:kev:fetch --and-then source:kev:parse --and-then source:kev:map`
- REST: `POST /jobs/run { "kind": "source:kev:fetch", "chain": ["source:kev:parse", "source:kev:map"] }`
3. Verify the metrics exposed by meter `StellaOps.Feedser.Source.Kev`:
3. Verify the metrics exposed by meter `StellaOps.Concelier.Connector.Kev`:
- `kev.fetch.attempts`, `kev.fetch.success`, `kev.fetch.unchanged`, `kev.fetch.failures`
- `kev.parse.entries` (tag `catalogVersion`), `kev.parse.failures`, `kev.parse.anomalies` (tag `reason`)
- `kev.map.advisories` (tag `catalogVersion`)
4. Confirm `feedser.source.http.requests_total{feedser_source="kev"}` increments once per fetch and that the paired `feedser.source.http.failures_total` stays flat (zero increase).
4. Confirm `concelier.source.http.requests_total{concelier_source="kev"}` increments once per fetch and that the paired `concelier.source.http.failures_total` stays flat (zero increase).
5. Inspect the info logs `Fetched KEV catalog document … pendingDocuments=…` and `Parsed KEV catalog document … entries=…`—they should appear exactly once per run and `Mapped X/Y… skipped=0` should match the `kev.map.advisories` delta.
6. Confirm MongoDB documents exist for the catalog JSON (`raw_documents` & `dtos`) and that advisories with prefix `kev/` are written.
@@ -126,7 +126,7 @@ Treat repeated schema failures or growing anomaly counts as an upstream regressi
### 2.5 Known good dashboard tiles
Add the following panels to the Feedser observability board:
Add the following panels to the Concelier observability board:
| Metric | Recommended visualisation |
|--------|---------------------------|
@@ -140,4 +140,4 @@ Add the following panels to the Feedser observability board:
- Record staging/production smoke test results (date, catalog version, advisory counts) in your teams change log.
- Add the CVE/KEV job kinds to the standard maintenance checklist so operators can manually trigger them after planned downtime.
- Keep this document in sync with future connector changes (for example, new anomaly reasons or additional metrics).
- Version-control dashboard tweaks alongside `docs/ops/feedser-cve-kev-grafana-dashboard.json` so operations can re-import the observability pack during restores.
- Version-control dashboard tweaks alongside `docs/ops/concelier-cve-kev-grafana-dashboard.json` so operations can re-import the observability pack during restores.

View File

@@ -1,123 +1,123 @@
# Feedser GHSA Connector Operations Runbook
_Last updated: 2025-10-16_
## 1. Overview
The GitHub Security Advisories (GHSA) connector pulls advisory metadata from the GitHub REST API `/security/advisories` endpoint. GitHub enforces both primary and secondary rate limits, so operators must monitor usage and configure retries to avoid throttling incidents.
## 2. Rate-limit telemetry
The connector now surfaces rate-limit headers on every fetch and exposes the following metrics via OpenTelemetry:
| Metric | Description | Tags |
|--------|-------------|------|
| `ghsa.ratelimit.limit` (histogram) | Samples the reported request quota at fetch time. | `phase` = `list` or `detail`, `resource` (e.g., `core`). |
| `ghsa.ratelimit.remaining` (histogram) | Remaining requests returned by `X-RateLimit-Remaining`. | `phase`, `resource`. |
| `ghsa.ratelimit.reset_seconds` (histogram) | Seconds until `X-RateLimit-Reset`. | `phase`, `resource`. |
| `ghsa.ratelimit.headroom_pct` (histogram) | Percentage of the quota still available (`remaining / limit * 100`). | `phase`, `resource`. |
| `ghsa.ratelimit.headroom_pct_current` (observable gauge) | Latest headroom percentage reported per resource. | `phase`, `resource`. |
| `ghsa.ratelimit.exhausted` (counter) | Incremented whenever GitHub returns a zero remaining quota and the connector delays before retrying. | `phase`. |
### Dashboards & alerts
- Plot `ghsa.ratelimit.remaining` as the latest value to watch the runway. Alert when the value stays below **`RateLimitWarningThreshold`** (default `500`) for more than 5 minutes.
- Use `ghsa.ratelimit.headroom_pct_current` to visualise remaining quota % — paging once it sits below **10%** for longer than a single reset window helps avoid secondary limits.
- Raise a separate alert on `increase(ghsa.ratelimit.exhausted[15m]) > 0` to catch hard throttles.
- Overlay `ghsa.fetch.attempts` vs `ghsa.fetch.failures` to confirm retries are effective.
## 3. Logging signals
When `X-RateLimit-Remaining` falls below `RateLimitWarningThreshold`, the connector emits:
```
GHSA rate limit warning: remaining {Remaining}/{Limit} for {Phase} {Resource} (headroom {Headroom}%)
```
When GitHub reports zero remaining calls, the connector logs and sleeps for the reported `Retry-After`/`X-RateLimit-Reset` interval (falling back to `SecondaryRateLimitBackoff`).
After the quota recovers above the warning threshold the connector writes an informational log with the refreshed remaining/headroom, letting operators clear alerts quickly.
## 4. Configuration knobs (`feedser.yaml`)
```yaml
feedser:
sources:
ghsa:
apiToken: "${GITHUB_PAT}"
pageSize: 50
requestDelay: "00:00:00.200"
failureBackoff: "00:05:00"
rateLimitWarningThreshold: 500 # warn below this many remaining calls
secondaryRateLimitBackoff: "00:02:00" # fallback delay when GitHub omits Retry-After
```
### Recommendations
- Increase `requestDelay` in air-gapped or burst-heavy deployments to smooth token consumption.
- Lower `rateLimitWarningThreshold` only if your dashboards already page on the new histogram; never set it negative.
- For bots using a low-privilege PAT, keep `secondaryRateLimitBackoff` at ≥60 seconds to respect GitHubs secondary-limit guidance.
#### Default job schedule
| Job kind | Cron | Timeout | Lease |
|----------|------|---------|-------|
| `source:ghsa:fetch` | `1,11,21,31,41,51 * * * *` | 6 minutes | 4 minutes |
| `source:ghsa:parse` | `3,13,23,33,43,53 * * * *` | 5 minutes | 4 minutes |
| `source:ghsa:map` | `5,15,25,35,45,55 * * * *` | 5 minutes | 4 minutes |
These defaults spread GHSA stages across the hour so fetch completes before parse/map fire. Override them via `feedser.jobs.definitions[...]` when coordinating multiple connectors on the same runner.
## 5. Provisioning credentials
Feedser requires a GitHub personal access token (classic) with the **`read:org`** and **`security_events`** scopes to pull GHSA data. Store it as a secret and reference it via `feedser.sources.ghsa.apiToken`.
### Docker Compose (stack operators)
```yaml
services:
feedser:
environment:
FEEDSER__SOURCES__GHSA__APITOKEN: /run/secrets/ghsa_pat
secrets:
- ghsa_pat
secrets:
ghsa_pat:
file: ./secrets/ghsa_pat.txt # contains only the PAT value
```
### Helm values (cluster operators)
```yaml
feedser:
extraEnv:
- name: FEEDSER__SOURCES__GHSA__APITOKEN
valueFrom:
secretKeyRef:
name: feedser-ghsa
key: apiToken
extraSecrets:
feedser-ghsa:
apiToken: "<paste PAT here or source from external secret store>"
```
After rotating the PAT, restart the Feedser workers (or run `kubectl rollout restart deployment/feedser`) to ensure the configuration reloads.
When enabling GHSA the first time, run a staged backfill:
1. Trigger `source:ghsa:fetch` manually (CLI or API) outside of peak hours.
2. Watch `feedser.jobs.health` for the GHSA jobs until they report `healthy`.
3. Allow the scheduled cron cadence to resume once the initial backlog drains (typically < 30 minutes).
## 6. Runbook steps when throttled
1. Check `ghsa.ratelimit.exhausted` for the affected phase (`list` vs `detail`).
2. Confirm the connector is delaying—logs will show `GHSA rate limit exhausted...` with the chosen backoff.
3. If rate limits stay exhausted:
- Verify no other jobs are sharing the PAT.
- Temporarily reduce `MaxPagesPerFetch` or `PageSize` to shrink burst size.
- Consider provisioning a dedicated PAT (GHSA permissions only) for Feedser.
4. After the quota resets, reset `rateLimitWarningThreshold`/`requestDelay` to their normal values and monitor the histograms for at least one hour.
## 7. Alert integration quick reference
- Prometheus: `ghsa_ratelimit_remaining_bucket` (from histogram) use `histogram_quantile(0.99, ...)` to trend capacity.
- VictoriaMetrics: `LAST_over_time(ghsa_ratelimit_remaining_sum[5m])` for simple last-value graphs.
- Grafana: stack remaining + used to visualise total limit per resource.
## 8. Canonical metric fallback analytics
When GitHub omits CVSS vectors/scores, the connector now assigns a deterministic canonical metric id in the form `ghsa:severity/<level>` and publishes it to Merge so severity precedence still resolves against GHSA even without CVSS data.
- Metric: `ghsa.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `reason=no_cvss`.
- Monitor the counter alongside Merge parity checks; a sudden spike suggests GitHub is shipping advisories without vectors and warrants cross-checking downstream exporters.
- Because the canonical id feeds Merge, parity dashboards should overlay this metric to confirm fallback advisories continue to merge ahead of downstream sources when GHSA supplies more recent data.
# Concelier GHSA Connector Operations Runbook
_Last updated: 2025-10-16_
## 1. Overview
The GitHub Security Advisories (GHSA) connector pulls advisory metadata from the GitHub REST API `/security/advisories` endpoint. GitHub enforces both primary and secondary rate limits, so operators must monitor usage and configure retries to avoid throttling incidents.
## 2. Rate-limit telemetry
The connector now surfaces rate-limit headers on every fetch and exposes the following metrics via OpenTelemetry:
| Metric | Description | Tags |
|--------|-------------|------|
| `ghsa.ratelimit.limit` (histogram) | Samples the reported request quota at fetch time. | `phase` = `list` or `detail`, `resource` (e.g., `core`). |
| `ghsa.ratelimit.remaining` (histogram) | Remaining requests returned by `X-RateLimit-Remaining`. | `phase`, `resource`. |
| `ghsa.ratelimit.reset_seconds` (histogram) | Seconds until `X-RateLimit-Reset`. | `phase`, `resource`. |
| `ghsa.ratelimit.headroom_pct` (histogram) | Percentage of the quota still available (`remaining / limit * 100`). | `phase`, `resource`. |
| `ghsa.ratelimit.headroom_pct_current` (observable gauge) | Latest headroom percentage reported per resource. | `phase`, `resource`. |
| `ghsa.ratelimit.exhausted` (counter) | Incremented whenever GitHub returns a zero remaining quota and the connector delays before retrying. | `phase`. |
### Dashboards & alerts
- Plot `ghsa.ratelimit.remaining` as the latest value to watch the runway. Alert when the value stays below **`RateLimitWarningThreshold`** (default `500`) for more than 5 minutes.
- Use `ghsa.ratelimit.headroom_pct_current` to visualise remaining quota % — paging once it sits below **10%** for longer than a single reset window helps avoid secondary limits.
- Raise a separate alert on `increase(ghsa.ratelimit.exhausted[15m]) > 0` to catch hard throttles.
- Overlay `ghsa.fetch.attempts` vs `ghsa.fetch.failures` to confirm retries are effective.
## 3. Logging signals
When `X-RateLimit-Remaining` falls below `RateLimitWarningThreshold`, the connector emits:
```
GHSA rate limit warning: remaining {Remaining}/{Limit} for {Phase} {Resource} (headroom {Headroom}%)
```
When GitHub reports zero remaining calls, the connector logs and sleeps for the reported `Retry-After`/`X-RateLimit-Reset` interval (falling back to `SecondaryRateLimitBackoff`).
After the quota recovers above the warning threshold the connector writes an informational log with the refreshed remaining/headroom, letting operators clear alerts quickly.
## 4. Configuration knobs (`concelier.yaml`)
```yaml
concelier:
sources:
ghsa:
apiToken: "${GITHUB_PAT}"
pageSize: 50
requestDelay: "00:00:00.200"
failureBackoff: "00:05:00"
rateLimitWarningThreshold: 500 # warn below this many remaining calls
secondaryRateLimitBackoff: "00:02:00" # fallback delay when GitHub omits Retry-After
```
### Recommendations
- Increase `requestDelay` in air-gapped or burst-heavy deployments to smooth token consumption.
- Lower `rateLimitWarningThreshold` only if your dashboards already page on the new histogram; never set it negative.
- For bots using a low-privilege PAT, keep `secondaryRateLimitBackoff` at ≥60 seconds to respect GitHubs secondary-limit guidance.
#### Default job schedule
| Job kind | Cron | Timeout | Lease |
|----------|------|---------|-------|
| `source:ghsa:fetch` | `1,11,21,31,41,51 * * * *` | 6 minutes | 4 minutes |
| `source:ghsa:parse` | `3,13,23,33,43,53 * * * *` | 5 minutes | 4 minutes |
| `source:ghsa:map` | `5,15,25,35,45,55 * * * *` | 5 minutes | 4 minutes |
These defaults spread GHSA stages across the hour so fetch completes before parse/map fire. Override them via `concelier.jobs.definitions[...]` when coordinating multiple connectors on the same runner.
## 5. Provisioning credentials
Concelier requires a GitHub personal access token (classic) with the **`read:org`** and **`security_events`** scopes to pull GHSA data. Store it as a secret and reference it via `concelier.sources.ghsa.apiToken`.
### Docker Compose (stack operators)
```yaml
services:
concelier:
environment:
CONCELIER__SOURCES__GHSA__APITOKEN: /run/secrets/ghsa_pat
secrets:
- ghsa_pat
secrets:
ghsa_pat:
file: ./secrets/ghsa_pat.txt # contains only the PAT value
```
### Helm values (cluster operators)
```yaml
concelier:
extraEnv:
- name: CONCELIER__SOURCES__GHSA__APITOKEN
valueFrom:
secretKeyRef:
name: concelier-ghsa
key: apiToken
extraSecrets:
concelier-ghsa:
apiToken: "<paste PAT here or source from external secret store>"
```
After rotating the PAT, restart the Concelier workers (or run `kubectl rollout restart deployment/concelier`) to ensure the configuration reloads.
When enabling GHSA the first time, run a staged backfill:
1. Trigger `source:ghsa:fetch` manually (CLI or API) outside of peak hours.
2. Watch `concelier.jobs.health` for the GHSA jobs until they report `healthy`.
3. Allow the scheduled cron cadence to resume once the initial backlog drains (typically < 30 minutes).
## 6. Runbook steps when throttled
1. Check `ghsa.ratelimit.exhausted` for the affected phase (`list` vs `detail`).
2. Confirm the connector is delaying—logs will show `GHSA rate limit exhausted...` with the chosen backoff.
3. If rate limits stay exhausted:
- Verify no other jobs are sharing the PAT.
- Temporarily reduce `MaxPagesPerFetch` or `PageSize` to shrink burst size.
- Consider provisioning a dedicated PAT (GHSA permissions only) for Concelier.
4. After the quota resets, reset `rateLimitWarningThreshold`/`requestDelay` to their normal values and monitor the histograms for at least one hour.
## 7. Alert integration quick reference
- Prometheus: `ghsa_ratelimit_remaining_bucket` (from histogram) use `histogram_quantile(0.99, ...)` to trend capacity.
- VictoriaMetrics: `LAST_over_time(ghsa_ratelimit_remaining_sum[5m])` for simple last-value graphs.
- Grafana: stack remaining + used to visualise total limit per resource.
## 8. Canonical metric fallback analytics
When GitHub omits CVSS vectors/scores, the connector now assigns a deterministic canonical metric id in the form `ghsa:severity/<level>` and publishes it to Merge so severity precedence still resolves against GHSA even without CVSS data.
- Metric: `ghsa.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `reason=no_cvss`.
- Monitor the counter alongside Merge parity checks; a sudden spike suggests GitHub is shipping advisories without vectors and warrants cross-checking downstream exporters.
- Because the canonical id feeds Merge, parity dashboards should overlay this metric to confirm fallback advisories continue to merge ahead of downstream sources when GHSA supplies more recent data.

View File

@@ -1,122 +1,122 @@
# Feedser CISA ICS Connector Operations
This runbook documents how to provision, rotate, and validate credentials for the CISA Industrial Control Systems (ICS) connector (`source:ics-cisa:*`). Follow it before enabling the connector in staging or offline installations.
## 1. Credential Provisioning
1. **Create a service mailbox** reachable by the Ops crew (shared mailbox recommended).
2. Browse to `https://public.govdelivery.com/accounts/USDHSCISA/subscriber/new` and subscribe the mailbox to the following GovDelivery topics:
- `USDHSCISA_16` — ICS-CERT advisories (legacy numbering: `ICSA-YY-###`).
- `USDHSCISA_19` — ICS medical advisories (`ICSMA-YY-###`).
- `USDHSCISA_17` — ICS alerts (`IR-ALERT-YY-###`) for completeness.
3. Complete the verification email. After confirmation, note the **personalised subscription code** included in the “Manage Preferences” link. It has the shape `code=AB12CD34EF`.
4. Store the code in the shared secret vault (or Offline Kit secrets bundle) as `feedser/sources/icscisa/govdelivery/code`.
> GovDelivery does not expose a one-time API key; the personalised code is what authenticates the RSS pull. Never commit it to git.
## 2. Feed Validation
Use the following command to confirm the feed is reachable before wiring it into Feedser (substitute `<CODE>` with the personalised value):
```bash
curl -H "User-Agent: StellaOpsFeedser/ics-cisa" \
"https://content.govdelivery.com/accounts/USDHSCISA/topics/ICS-CERT/feed.rss?format=xml&code=<CODE>"
```
If the endpoint returns HTTP 200 and an RSS payload, record the sample response under `docs/artifacts/icscisa/` (see Task `FEEDCONN-ICSCISA-02-007`). HTTP 403 or 406 usually means the subscription was not confirmed or the code was mistyped.
## 3. Configuration Snippet
Add the connector configuration to `feedser.yaml` (or equivalent environment variables):
```yaml
feedser:
sources:
icscisa:
govDelivery:
code: "${FEEDSER_ICS_CISA_GOVDELIVERY_CODE}"
topics:
- "USDHSCISA_16"
- "USDHSCISA_19"
- "USDHSCISA_17"
rssBaseUri: "https://content.govdelivery.com/accounts/USDHSCISA"
requestDelay: "00:00:01"
failureBackoff: "00:05:00"
```
Environment variable example:
```bash
export FEEDSER_SOURCES_ICSCISA_GOVDELIVERY_CODE="AB12CD34EF"
```
Feedser automatically register the host with the Source.Common HTTP allow-list when the connector assembly is loaded.
Optional tuning keys (set only when needed):
- `proxyUri` — HTTP/HTTPS proxy URL used when Akamai blocks direct pulls.
- `requestVersion` / `requestVersionPolicy` — override HTTP negotiation when the proxy requires HTTP/1.1.
- `enableDetailScrape` — toggle HTML detail fallback (defaults to true).
- `captureAttachments` — collect PDF attachments from detail pages (defaults to true).
- `detailBaseUri` — alternate host for detail enrichment if CISA changes their layout.
## 4. Seeding Without GovDelivery
If credentials are still pending, populate the connector with the community CSV dataset before enabling the live fetch:
1. Run `./scripts/fetch-ics-cisa-seed.sh` (or `.ps1`) to download the latest `CISA_ICS_ADV_*.csv` files into `seed-data/ics-cisa/`.
2. Copy the CSVs (and the generated `.sha256` files) into your Offline Kit staging area so they ship alongside the other feeds.
3. Import the kit as usual. The connector can parse the seed data for historical context, but **live GovDelivery credentials are still required** for fresh advisories.
4. Once credentials arrive, update `feedser:sources:icscisa:govDelivery:code` and re-trigger `source:ics-cisa:fetch` so the connector switches to the authorised feed.
> The CSVs are licensed under ODbL1.0 by the ICS Advisory Project. Preserve the attribution when redistributing them.
## 4. Integration Validation
1. Ensure secrets are in place and restart the Feedser workers.
2. Run a dry-run fetch/parse/map chain against an Akamai-protected topic:
```bash
FEEDSER_SOURCES_ICSCISA_GOVDELIVERY_CODE=... \
FEEDSER_SOURCES_ICSCISA_ENABLEDETAILSCRAPE=1 \
stella db jobs run source:ics-cisa:fetch --and-then source:ics-cisa:parse --and-then source:ics-cisa:map
```
3. Confirm logs contain `ics-cisa detail fetch` entries and that new documents/DTOs include attachments (see `docs/artifacts/icscisa`). Canonical advisories should expose PDF links as `references.kind == "attachment"` and affected packages should surface `primitives.semVer.exactValue` for single-version hits.
4. If Akamai blocks direct fetches, set `feedser:sources:icscisa:proxyUri` to your allow-listed egress proxy and rerun the dry-run.
## 4. Rotation & Incident Response
- Review GovDelivery access quarterly. Rotate the personalised code whenever Ops changes the service mailbox password or membership.
- Revoking the subscription in GovDelivery invalidates the code immediately; update the vault and configuration in the same change.
- If the code leaks, remove the subscription (`https://public.govdelivery.com/accounts/USDHSCISA/subscriber/manage_preferences?code=<CODE>`), resubscribe, and distribute the new value via the vault.
## 5. Offline Kit Handling
Include the personalised code in `offline-kit/secrets/feedser/icscisa.env`:
```
FEEDSER_SOURCES_ICSCISA_GOVDELIVERY_CODE=AB12CD34EF
```
The Offline Kit deployment script copies this file into the container secret directory mounted at `/run/secrets/feedser`. Ensure permissions are `600` and ownership matches the Feedser runtime user.
## 6. Telemetry & Monitoring
The connector emits metrics under the meter `StellaOps.Feedser.Source.Ics.Cisa`. They allow operators to track Akamai fallbacks, detail enrichment health, and advisory fan-out.
- `icscisa.fetch.*` counters for `attempts`, `success`, `failures`, `not_modified`, and `fallbacks`, plus histogram `icscisa.fetch.documents` showing documents added per topic pull (tags: `feedser.source`, `icscisa.topic`).
- `icscisa.parse.*` counters for `success`/`failures` and histograms `icscisa.parse.advisories`, `icscisa.parse.attachments`, `icscisa.parse.detail_fetches` to monitor enrichment workload per feed document.
- `icscisa.detail.*` counters `success` / `failures` per advisory (tagged with `icscisa.advisory`) to alert when Akamai blocks detail pages.
- `icscisa.map.*` counters for `success`/`failures` and histograms `icscisa.map.references`, `icscisa.map.packages`, `icscisa.map.aliases` capturing canonical fan-out.
Suggested alerts:
- `increase(icscisa.fetch.failures_total[15m]) > 0` or `increase(icscisa.fetch.fallbacks_total[15m]) > 5` — sustained Akamai or proxy issues.
- `increase(icscisa.detail.failures_total[30m]) > 0` — detail enrichment breaking (potential HTML layout change).
- `histogram_quantile(0.95, rate(icscisa.map.references_bucket[1h]))` trending sharply higher — sudden advisory reference explosion worth investigating.
- Keep an eye on shared HTTP metrics (`feedser.source.http.*{feedser.source="ics-cisa"}`) for request latency and retry patterns.
## 6. Related Tasks
- `FEEDCONN-ICSCISA-02-009` (GovDelivery credential onboarding) — completed once this runbook is followed and secrets are placed in the vault.
- `FEEDCONN-ICSCISA-02-007` (document inventory) — archive the first successful RSS response and any attachment URL schema under `docs/artifacts/icscisa/`.
# Concelier CISA ICS Connector Operations
This runbook documents how to provision, rotate, and validate credentials for the CISA Industrial Control Systems (ICS) connector (`source:ics-cisa:*`). Follow it before enabling the connector in staging or offline installations.
## 1. Credential Provisioning
1. **Create a service mailbox** reachable by the Ops crew (shared mailbox recommended).
2. Browse to `https://public.govdelivery.com/accounts/USDHSCISA/subscriber/new` and subscribe the mailbox to the following GovDelivery topics:
- `USDHSCISA_16` — ICS-CERT advisories (legacy numbering: `ICSA-YY-###`).
- `USDHSCISA_19` — ICS medical advisories (`ICSMA-YY-###`).
- `USDHSCISA_17` — ICS alerts (`IR-ALERT-YY-###`) for completeness.
3. Complete the verification email. After confirmation, note the **personalised subscription code** included in the “Manage Preferences” link. It has the shape `code=AB12CD34EF`.
4. Store the code in the shared secret vault (or Offline Kit secrets bundle) as `concelier/sources/icscisa/govdelivery/code`.
> GovDelivery does not expose a one-time API key; the personalised code is what authenticates the RSS pull. Never commit it to git.
## 2. Feed Validation
Use the following command to confirm the feed is reachable before wiring it into Concelier (substitute `<CODE>` with the personalised value):
```bash
curl -H "User-Agent: StellaOpsConcelier/ics-cisa" \
"https://content.govdelivery.com/accounts/USDHSCISA/topics/ICS-CERT/feed.rss?format=xml&code=<CODE>"
```
If the endpoint returns HTTP 200 and an RSS payload, record the sample response under `docs/artifacts/icscisa/` (see Task `FEEDCONN-ICSCISA-02-007`). HTTP 403 or 406 usually means the subscription was not confirmed or the code was mistyped.
## 3. Configuration Snippet
Add the connector configuration to `concelier.yaml` (or equivalent environment variables):
```yaml
concelier:
sources:
icscisa:
govDelivery:
code: "${CONCELIER_ICS_CISA_GOVDELIVERY_CODE}"
topics:
- "USDHSCISA_16"
- "USDHSCISA_19"
- "USDHSCISA_17"
rssBaseUri: "https://content.govdelivery.com/accounts/USDHSCISA"
requestDelay: "00:00:01"
failureBackoff: "00:05:00"
```
Environment variable example:
```bash
export CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE="AB12CD34EF"
```
Concelier automatically register the host with the Source.Common HTTP allow-list when the connector assembly is loaded.
Optional tuning keys (set only when needed):
- `proxyUri` — HTTP/HTTPS proxy URL used when Akamai blocks direct pulls.
- `requestVersion` / `requestVersionPolicy` — override HTTP negotiation when the proxy requires HTTP/1.1.
- `enableDetailScrape` — toggle HTML detail fallback (defaults to true).
- `captureAttachments` — collect PDF attachments from detail pages (defaults to true).
- `detailBaseUri` — alternate host for detail enrichment if CISA changes their layout.
## 4. Seeding Without GovDelivery
If credentials are still pending, populate the connector with the community CSV dataset before enabling the live fetch:
1. Run `./scripts/fetch-ics-cisa-seed.sh` (or `.ps1`) to download the latest `CISA_ICS_ADV_*.csv` files into `seed-data/ics-cisa/`.
2. Copy the CSVs (and the generated `.sha256` files) into your Offline Kit staging area so they ship alongside the other feeds.
3. Import the kit as usual. The connector can parse the seed data for historical context, but **live GovDelivery credentials are still required** for fresh advisories.
4. Once credentials arrive, update `concelier:sources:icscisa:govDelivery:code` and re-trigger `source:ics-cisa:fetch` so the connector switches to the authorised feed.
> The CSVs are licensed under ODbL1.0 by the ICS Advisory Project. Preserve the attribution when redistributing them.
## 4. Integration Validation
1. Ensure secrets are in place and restart the Concelier workers.
2. Run a dry-run fetch/parse/map chain against an Akamai-protected topic:
```bash
CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=... \
CONCELIER_SOURCES_ICSCISA_ENABLEDETAILSCRAPE=1 \
stella db jobs run source:ics-cisa:fetch --and-then source:ics-cisa:parse --and-then source:ics-cisa:map
```
3. Confirm logs contain `ics-cisa detail fetch` entries and that new documents/DTOs include attachments (see `docs/artifacts/icscisa`). Canonical advisories should expose PDF links as `references.kind == "attachment"` and affected packages should surface `primitives.semVer.exactValue` for single-version hits.
4. If Akamai blocks direct fetches, set `concelier:sources:icscisa:proxyUri` to your allow-listed egress proxy and rerun the dry-run.
## 4. Rotation & Incident Response
- Review GovDelivery access quarterly. Rotate the personalised code whenever Ops changes the service mailbox password or membership.
- Revoking the subscription in GovDelivery invalidates the code immediately; update the vault and configuration in the same change.
- If the code leaks, remove the subscription (`https://public.govdelivery.com/accounts/USDHSCISA/subscriber/manage_preferences?code=<CODE>`), resubscribe, and distribute the new value via the vault.
## 5. Offline Kit Handling
Include the personalised code in `offline-kit/secrets/concelier/icscisa.env`:
```
CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=AB12CD34EF
```
The Offline Kit deployment script copies this file into the container secret directory mounted at `/run/secrets/concelier`. Ensure permissions are `600` and ownership matches the Concelier runtime user.
## 6. Telemetry & Monitoring
The connector emits metrics under the meter `StellaOps.Concelier.Connector.Ics.Cisa`. They allow operators to track Akamai fallbacks, detail enrichment health, and advisory fan-out.
- `icscisa.fetch.*` counters for `attempts`, `success`, `failures`, `not_modified`, and `fallbacks`, plus histogram `icscisa.fetch.documents` showing documents added per topic pull (tags: `concelier.source`, `icscisa.topic`).
- `icscisa.parse.*` counters for `success`/`failures` and histograms `icscisa.parse.advisories`, `icscisa.parse.attachments`, `icscisa.parse.detail_fetches` to monitor enrichment workload per feed document.
- `icscisa.detail.*` counters `success` / `failures` per advisory (tagged with `icscisa.advisory`) to alert when Akamai blocks detail pages.
- `icscisa.map.*` counters for `success`/`failures` and histograms `icscisa.map.references`, `icscisa.map.packages`, `icscisa.map.aliases` capturing canonical fan-out.
Suggested alerts:
- `increase(icscisa.fetch.failures_total[15m]) > 0` or `increase(icscisa.fetch.fallbacks_total[15m]) > 5` — sustained Akamai or proxy issues.
- `increase(icscisa.detail.failures_total[30m]) > 0` — detail enrichment breaking (potential HTML layout change).
- `histogram_quantile(0.95, rate(icscisa.map.references_bucket[1h]))` trending sharply higher — sudden advisory reference explosion worth investigating.
- Keep an eye on shared HTTP metrics (`concelier.source.http.*{concelier.source="ics-cisa"}`) for request latency and retry patterns.
## 6. Related Tasks
- `FEEDCONN-ICSCISA-02-009` (GovDelivery credential onboarding) — completed once this runbook is followed and secrets are placed in the vault.
- `FEEDCONN-ICSCISA-02-007` (document inventory) — archive the first successful RSS response and any attachment URL schema under `docs/artifacts/icscisa/`.

View File

@@ -1,74 +1,74 @@
# Feedser KISA Connector Operations
Operational guidance for the Korea Internet & Security Agency (KISA / KNVD) connector (`source:kisa:*`). Pair this with the engineering brief in `docs/dev/kisa_connector_notes.md`.
## 1. Prerequisites
- Outbound HTTPS (or mirrored cache) for `https://knvd.krcert.or.kr/`.
- Connector options defined under `feedser:sources:kisa`:
```yaml
feedser:
sources:
kisa:
feedUri: "https://knvd.krcert.or.kr/rss/securityInfo.do"
detailApiUri: "https://knvd.krcert.or.kr/rssDetailData.do"
detailPageUri: "https://knvd.krcert.or.kr/detailDos.do"
maxAdvisoriesPerFetch: 10
requestDelay: "00:00:01"
failureBackoff: "00:05:00"
```
> Ensure the URIs stay absolute—Feedser adds the `feedUri`/`detailApiUri` hosts to the HttpClient allow-list automatically.
## 2. Staging Smoke Test
1. Restart the Feedser workers so the KISA options bind.
2. Run a full connector cycle:
- CLI: `stella db jobs run source:kisa:fetch --and-then source:kisa:parse --and-then source:kisa:map`
- REST: `POST /jobs/run { "kind": "source:kisa:fetch", "chain": ["source:kisa:parse", "source:kisa:map"] }`
3. Confirm telemetry (Meter `StellaOps.Feedser.Source.Kisa`):
- `kisa.feed.success`, `kisa.feed.items`
- `kisa.detail.success` / `.failures`
- `kisa.parse.success` / `.failures`
- `kisa.map.success` / `.failures`
- `kisa.cursor.updates`
4. Inspect logs for structured entries:
- `KISA feed returned {ItemCount}`
- `KISA fetched detail for {Idx} … category={Category}`
- `KISA mapped advisory {AdvisoryId} (severity={Severity})`
- Absence of warnings such as `document missing GridFS payload`.
5. Validate MongoDB state:
- `raw_documents.metadata` has `kisa.idx`, `kisa.category`, `kisa.title`.
- DTO store contains `schemaVersion="kisa.detail.v1"`.
- Advisories include aliases (`IDX`, CVE) and `language="ko"`.
- `source_states` entry for `kisa` shows recent `cursor.lastFetchAt`.
## 3. Production Monitoring
- **Dashboards** Add the following Prometheus/OTEL expressions:
- `rate(kisa_feed_items_total[15m])` versus `rate(feedser_source_http_requests_total{feedser_source="kisa"}[15m])`
- `increase(kisa_detail_failures_total{reason!="empty-document"}[1h])` alert at `>0`
- `increase(kisa_parse_failures_total[1h])` for storage/JSON issues
- `increase(kisa_map_failures_total[1h])` to flag schema drift
- `increase(kisa_cursor_updates_total[6h]) == 0` during active windows → warn
- **Alerts** Page when `rate(kisa_feed_success_total[2h]) == 0` while other connectors are active; back off for maintenance windows announced on `https://knvd.krcert.or.kr/`.
- **Logs** Watch for repeated warnings (`document missing`, `DTO missing`) or errors with reason tags `HttpRequestException`, `download`, `parse`, `map`.
## 4. Localisation Handling
- Hangul categories (for example `취약점정보`) flow into telemetry tags (`category=…`) and logs. Dashboards must render UTF8 and avoid transliteration.
- HTML content is sanitised before storage; translation teams can consume the `ContentHtml` field safely.
- Advisory severity remains as provided by KISA (`High`, `Medium`, etc.). Map-level failures include the severity tag for filtering.
## 5. Fixture & Regression Maintenance
- Regression fixtures: `src/StellaOps.Feedser.Source.Kisa.Tests/Fixtures/kisa-feed.xml` and `kisa-detail.json`.
- Refresh via `UPDATE_KISA_FIXTURES=1 dotnet test src/StellaOps.Feedser.Source.Kisa.Tests/StellaOps.Feedser.Source.Kisa.Tests.csproj`.
- The telemetry regression (`KisaConnectorTests.Telemetry_RecordsMetrics`) will fail if counters/log wiring drifts—treat failures as gating.
## 6. Known Issues
- RSS feeds only expose the latest 10 advisories; long outages require replay via archived feeds or manual IDX seeds.
- Detail endpoint occasionally throttles; the connector honours `requestDelay` and reports failures with reason `HttpRequestException`. Consider increasing delay for weekend backfills.
- If `kisa.category` tags suddenly appear as `unknown`, verify KISA has not renamed RSS elements; update the parser fixtures before production rollout.
# Concelier KISA Connector Operations
Operational guidance for the Korea Internet & Security Agency (KISA / KNVD) connector (`source:kisa:*`). Pair this with the engineering brief in `docs/dev/kisa_connector_notes.md`.
## 1. Prerequisites
- Outbound HTTPS (or mirrored cache) for `https://knvd.krcert.or.kr/`.
- Connector options defined under `concelier:sources:kisa`:
```yaml
concelier:
sources:
kisa:
feedUri: "https://knvd.krcert.or.kr/rss/securityInfo.do"
detailApiUri: "https://knvd.krcert.or.kr/rssDetailData.do"
detailPageUri: "https://knvd.krcert.or.kr/detailDos.do"
maxAdvisoriesPerFetch: 10
requestDelay: "00:00:01"
failureBackoff: "00:05:00"
```
> Ensure the URIs stay absolute—Concelier adds the `feedUri`/`detailApiUri` hosts to the HttpClient allow-list automatically.
## 2. Staging Smoke Test
1. Restart the Concelier workers so the KISA options bind.
2. Run a full connector cycle:
- CLI: `stella db jobs run source:kisa:fetch --and-then source:kisa:parse --and-then source:kisa:map`
- REST: `POST /jobs/run { "kind": "source:kisa:fetch", "chain": ["source:kisa:parse", "source:kisa:map"] }`
3. Confirm telemetry (Meter `StellaOps.Concelier.Connector.Kisa`):
- `kisa.feed.success`, `kisa.feed.items`
- `kisa.detail.success` / `.failures`
- `kisa.parse.success` / `.failures`
- `kisa.map.success` / `.failures`
- `kisa.cursor.updates`
4. Inspect logs for structured entries:
- `KISA feed returned {ItemCount}`
- `KISA fetched detail for {Idx} … category={Category}`
- `KISA mapped advisory {AdvisoryId} (severity={Severity})`
- Absence of warnings such as `document missing GridFS payload`.
5. Validate MongoDB state:
- `raw_documents.metadata` has `kisa.idx`, `kisa.category`, `kisa.title`.
- DTO store contains `schemaVersion="kisa.detail.v1"`.
- Advisories include aliases (`IDX`, CVE) and `language="ko"`.
- `source_states` entry for `kisa` shows recent `cursor.lastFetchAt`.
## 3. Production Monitoring
- **Dashboards** Add the following Prometheus/OTEL expressions:
- `rate(kisa_feed_items_total[15m])` versus `rate(concelier_source_http_requests_total{concelier_source="kisa"}[15m])`
- `increase(kisa_detail_failures_total{reason!="empty-document"}[1h])` alert at `>0`
- `increase(kisa_parse_failures_total[1h])` for storage/JSON issues
- `increase(kisa_map_failures_total[1h])` to flag schema drift
- `increase(kisa_cursor_updates_total[6h]) == 0` during active windows → warn
- **Alerts** Page when `rate(kisa_feed_success_total[2h]) == 0` while other connectors are active; back off for maintenance windows announced on `https://knvd.krcert.or.kr/`.
- **Logs** Watch for repeated warnings (`document missing`, `DTO missing`) or errors with reason tags `HttpRequestException`, `download`, `parse`, `map`.
## 4. Localisation Handling
- Hangul categories (for example `취약점정보`) flow into telemetry tags (`category=…`) and logs. Dashboards must render UTF8 and avoid transliteration.
- HTML content is sanitised before storage; translation teams can consume the `ContentHtml` field safely.
- Advisory severity remains as provided by KISA (`High`, `Medium`, etc.). Map-level failures include the severity tag for filtering.
## 5. Fixture & Regression Maintenance
- Regression fixtures: `src/StellaOps.Concelier.Connector.Kisa.Tests/Fixtures/kisa-feed.xml` and `kisa-detail.json`.
- Refresh via `UPDATE_KISA_FIXTURES=1 dotnet test src/StellaOps.Concelier.Connector.Kisa.Tests/StellaOps.Concelier.Connector.Kisa.Tests.csproj`.
- The telemetry regression (`KisaConnectorTests.Telemetry_RecordsMetrics`) will fail if counters/log wiring drifts—treat failures as gating.
## 6. Known Issues
- RSS feeds only expose the latest 10 advisories; long outages require replay via archived feeds or manual IDX seeds.
- Detail endpoint occasionally throttles; the connector honours `requestDelay` and reports failures with reason `HttpRequestException`. Consider increasing delay for weekend backfills.
- If `kisa.category` tags suddenly appear as `unknown`, verify KISA has not renamed RSS elements; update the parser fixtures before production rollout.

View File

@@ -0,0 +1,196 @@
# Concelier & Excititor Mirror Operations
This runbook describes how StellaOps operates the managed mirrors under `*.stella-ops.org`.
It covers Docker Compose and Helm deployment overlays, secret handling for multi-tenant
authn, CDN fronting, and the recurring sync pipeline that keeps mirror bundles current.
## 1. Prerequisites
- **Authority access** client credentials (`client_id` + secret) authorised for
`concelier.mirror.read` and `excititor.mirror.read` scopes. Secrets live outside git.
- **Signed TLS certificates** wildcard or per-domain (`mirror-primary`, `mirror-community`).
Store them under `deploy/compose/mirror-gateway/tls/` or in Kubernetes secrets.
- **Mirror gateway credentials** Basic Auth htpasswd files per domain. Generate with
`htpasswd -B`. Operators distribute credentials to downstream consumers.
- **Export artifact source** read access to the canonical S3 buckets (or rsync share)
that hold `concelier` JSON bundles and `excititor` VEX exports.
- **Persistent volumes** storage for Concelier job metadata and mirror export trees.
For Helm, provision PVCs (`concelier-mirror-jobs`, `concelier-mirror-exports`,
`excititor-mirror-exports`, `mirror-mongo-data`, `mirror-minio-data`) before rollout.
## 2. Secret & certificate layout
### Docker Compose (`deploy/compose/docker-compose.mirror.yaml`)
- `deploy/compose/env/mirror.env.example` copy to `.env` and adjust quotas or domain IDs.
- `deploy/compose/mirror-secrets/` mount read-only into `/run/secrets`. Place:
- `concelier-authority-client` Authority client secret.
- `excititor-authority-client` (optional) reserve for future authn.
- `deploy/compose/mirror-gateway/tls/` PEM-encoded cert/key pairs:
- `mirror-primary.crt`, `mirror-primary.key`
- `mirror-community.crt`, `mirror-community.key`
- `deploy/compose/mirror-gateway/secrets/` htpasswd files:
- `mirror-primary.htpasswd`
- `mirror-community.htpasswd`
### Helm (`deploy/helm/stellaops/values-mirror.yaml`)
Create secrets in the target namespace:
```bash
kubectl create secret generic concelier-mirror-auth \
--from-file=concelier-authority-client=concelier-authority-client
kubectl create secret generic excititor-mirror-auth \
--from-file=excititor-authority-client=excititor-authority-client
kubectl create secret tls mirror-gateway-tls \
--cert=mirror-primary.crt --key=mirror-primary.key
kubectl create secret generic mirror-gateway-htpasswd \
--from-file=mirror-primary.htpasswd --from-file=mirror-community.htpasswd
```
> Keep Basic Auth lists short-lived (rotate quarterly) and document credential recipients.
## 3. Deployment
### 3.1 Docker Compose (edge mirrors, lab validation)
1. `cp deploy/compose/env/mirror.env.example deploy/compose/env/mirror.env`
2. Populate secrets/tls directories as described above.
3. Sync mirror bundles (see §4) into `deploy/compose/mirror-data/…` and ensure they are mounted
on the host path backing the `concelier-exports` and `excititor-exports` volumes.
4. Run the profile validator: `deploy/tools/validate-profiles.sh`.
5. Launch: `docker compose --env-file env/mirror.env -f docker-compose.mirror.yaml up -d`.
### 3.2 Helm (production mirrors)
1. Provision PVCs sized for mirror bundles (baseline: 20GiB per domain).
2. Create secrets/tls config maps (§2).
3. `helm upgrade --install mirror deploy/helm/stellaops -f deploy/helm/stellaops/values-mirror.yaml`.
4. Annotate the `stellaops-mirror-gateway` service with ingress/LoadBalancer metadata required by
your CDN (e.g., AWS load balancer scheme internal + NLB idle timeout).
## 4. Artifact sync workflow
Mirrors never generate exports—they ingest signed bundles produced by the Concelier and Excititor
export jobs. Recommended sync pattern:
### 4.1 Compose host (systemd timer)
`/usr/local/bin/mirror-sync.sh`:
```bash
#!/usr/bin/env bash
set -euo pipefail
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
aws s3 sync s3://mirror-stellaops/concelier/latest \
/opt/stellaops/mirror-data/concelier --delete --size-only
aws s3 sync s3://mirror-stellaops/excititor/latest \
/opt/stellaops/mirror-data/excititor --delete --size-only
```
Schedule with a systemd timer every 5minutes. The Compose volumes mount `/opt/stellaops/mirror-data/*`
into the containers read-only, matching `CONCELIER__MIRROR__EXPORTROOT=/exports/json` and
`EXCITITOR__ARTIFACTS__FILESYSTEM__ROOT=/exports`.
### 4.2 Kubernetes (CronJob)
Create a CronJob running the AWS CLI (or rclone) in the same namespace, writing into the PVCs:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: mirror-sync
spec:
schedule: "*/5 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: sync
image: public.ecr.aws/aws-cli/aws-cli@sha256:5df5f52c29f5e3ba46d0ad9e0e3afc98701c4a0f879400b4c5f80d943b5fadea
command:
- /bin/sh
- -c
- >
aws s3 sync s3://mirror-stellaops/concelier/latest /exports/concelier --delete --size-only &&
aws s3 sync s3://mirror-stellaops/excititor/latest /exports/excititor --delete --size-only
volumeMounts:
- name: concelier-exports
mountPath: /exports/concelier
- name: excititor-exports
mountPath: /exports/excititor
envFrom:
- secretRef:
name: mirror-sync-aws
restartPolicy: OnFailure
volumes:
- name: concelier-exports
persistentVolumeClaim:
claimName: concelier-mirror-exports
- name: excititor-exports
persistentVolumeClaim:
claimName: excititor-mirror-exports
```
## 5. CDN integration
1. Point the CDN origin at the mirror gateway (Compose host or Kubernetes LoadBalancer).
2. Honour the response headers emitted by the gateway and Concelier/Excititor:
`Cache-Control: public, max-age=300, immutable` for mirror payloads.
3. Configure origin shields in the CDN to prevent cache stampedes. Recommended TTLs:
- Index (`/concelier/exports/index.json`, `/excititor/mirror/*/index`) → 60s.
- Bundle/manifest payloads → 300s.
4. Forward the `Authorization` header—Basic Auth terminates at the gateway.
5. Enforce per-domain rate limits at the CDN (matching gateway budgets) and enable logging
to SIEM for anomaly detection.
## 6. Smoke tests
After each deployment or sync cycle:
```bash
# Index with Basic Auth
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/index.json | jq 'keys'
# Mirror manifest signature
curl -u $PRIMARY_CREDS -I https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/manifest.json
# Excititor consensus bundle metadata
curl -u $COMMUNITY_CREDS https://mirror-community.stella-ops.org/excititor/mirror/community/index \
| jq '.exports[].exportKey'
# Signed bundle + detached JWS (spot check digests)
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/bundle.json.jws \
-o bundle.json.jws
cosign verify-blob --signature bundle.json.jws --key mirror-key.pub bundle.json
```
Watch the gateway metrics (`nginx_vts` or access logs) for cache hits. In Kubernetes, `kubectl logs deploy/stellaops-mirror-gateway`
should show `X-Cache-Status: HIT/MISS`.
## 7. Maintenance & rotation
- **Bundle freshness** alert if sync job lag exceeds 15minutes or if `concelier` logs
`Mirror export root is not configured`.
- **Secret rotation** change Authority client secrets and Basic Auth credentials quarterly.
Update the mounted secrets and restart deployments (`docker compose restart concelier` or
`kubectl rollout restart deploy/stellaops-concelier`).
- **TLS renewal** reissue certificates, place new files, and reload gateway (`docker compose exec mirror-gateway nginx -s reload`).
- **Quota tuning** adjust per-domain `MAXDOWNLOADREQUESTSPERHOUR` in `.env` or values file.
Align CDN rate limits and inform downstreams.
## 8. References
- Deployment profiles: `deploy/compose/docker-compose.mirror.yaml`,
`deploy/helm/stellaops/values-mirror.yaml`
- Mirror architecture dossiers: `docs/ARCHITECTURE_CONCELIER.md`,
`docs/ARCHITECTURE_EXCITITOR_MIRRORS.md`
- Export bundling: `docs/ARCHITECTURE_DEVOPS.md` §3, `docs/ARCHITECTURE_EXCITITOR.md` §7

View File

@@ -1,86 +1,86 @@
# Feedser MSRC Connector Azure AD Onboarding Brief
_Drafted: 2025-10-15_
## 1. App registration requirements
- **Tenant**: shared StellaOps production Azure AD.
- **Application type**: confidential client (web/API) issuing client credentials.
- **API permissions**: `api://api.msrc.microsoft.com/.default` (Application). Admin consent required once.
- **Token audience**: `https://api.msrc.microsoft.com/`.
- **Grant type**: client credentials. Feedser will request tokens via `POST https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token`.
## 2. Secret/credential policy
- Maintain two client secrets (primary + standby) rotating every 90 days.
- Store secrets in the Feedser secrets vault; Offline Kit deployments must mirror the secret payloads in their encrypted store.
- Record rotation cadence in Ops runbook and update Feedser configuration (`FEEDSER__SOURCES__VNDR__MSRC__CLIENTSECRET`) ahead of expiry.
## 3. Feedser configuration sample
```yaml
feedser:
sources:
vndr.msrc:
tenantId: "<azure-tenant-guid>"
clientId: "<app-registration-client-id>"
clientSecret: "<pull from secret store>"
apiVersion: "2024-08-01"
locale: "en-US"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
cursorOverlapMinutes: 10
downloadCvrf: false # set true to persist CVRF ZIP alongside JSON detail
```
## 4. CVRF artefacts
- The MSRC REST payload exposes `cvrfUrl` per advisory. Current connector persists the link as advisory metadata and reference; it does **not** download the ZIP by default.
- Ops should mirror CVRF ZIPs when preparing Offline Kits so air-gapped deployments can reconcile advisories without direct internet access.
- Once Offline Kit storage guidelines are finalised, extend the connector configuration with `downloadCvrf: true` to enable automatic attachment retrieval.
### 4.1 State seeding helper
Use `tools/SourceStateSeeder` to queue historical advisories (detail JSON + optional CVRF artefacts) for replay without manual Mongo edits. Example seed file:
```json
{
"source": "vndr.msrc",
"cursor": {
"lastModifiedCursor": "2024-01-01T00:00:00Z"
},
"documents": [
{
"uri": "https://api.msrc.microsoft.com/sug/v2.0/vulnerability/ADV2024-0001",
"contentFile": "./seeds/adv2024-0001.json",
"contentType": "application/json",
"metadata": { "msrc.vulnerabilityId": "ADV2024-0001" },
"addToPendingDocuments": true
},
{
"uri": "https://download.microsoft.com/msrc/2024/ADV2024-0001.cvrf.zip",
"contentFile": "./seeds/adv2024-0001.cvrf.zip",
"contentType": "application/zip",
"status": "mapped",
"addToPendingDocuments": false
}
]
}
```
Run the helper:
```bash
dotnet run --project tools/SourceStateSeeder -- \
--connection-string "mongodb://localhost:27017" \
--database feedser \
--input seeds/msrc-backfill.json
```
Any documents marked `addToPendingDocuments` will appear in the connector cursor; `DownloadCvrf` can remain disabled if the ZIP artefact is pre-seeded.
## 5. Outstanding items
- Ops to confirm tenant/app names and provide client credentials through the secure channel.
- Connector team monitors token cache health (already implemented); validate instrumentation once Ops supplies credentials.
- Offline Kit packaging: add encrypted blob containing client credentials with rotation instructions.
# Concelier MSRC Connector Azure AD Onboarding Brief
_Drafted: 2025-10-15_
## 1. App registration requirements
- **Tenant**: shared StellaOps production Azure AD.
- **Application type**: confidential client (web/API) issuing client credentials.
- **API permissions**: `api://api.msrc.microsoft.com/.default` (Application). Admin consent required once.
- **Token audience**: `https://api.msrc.microsoft.com/`.
- **Grant type**: client credentials. Concelier will request tokens via `POST https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token`.
## 2. Secret/credential policy
- Maintain two client secrets (primary + standby) rotating every 90 days.
- Store secrets in the Concelier secrets vault; Offline Kit deployments must mirror the secret payloads in their encrypted store.
- Record rotation cadence in Ops runbook and update Concelier configuration (`CONCELIER__SOURCES__VNDR__MSRC__CLIENTSECRET`) ahead of expiry.
## 3. Concelier configuration sample
```yaml
concelier:
sources:
vndr.msrc:
tenantId: "<azure-tenant-guid>"
clientId: "<app-registration-client-id>"
clientSecret: "<pull from secret store>"
apiVersion: "2024-08-01"
locale: "en-US"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
cursorOverlapMinutes: 10
downloadCvrf: false # set true to persist CVRF ZIP alongside JSON detail
```
## 4. CVRF artefacts
- The MSRC REST payload exposes `cvrfUrl` per advisory. Current connector persists the link as advisory metadata and reference; it does **not** download the ZIP by default.
- Ops should mirror CVRF ZIPs when preparing Offline Kits so air-gapped deployments can reconcile advisories without direct internet access.
- Once Offline Kit storage guidelines are finalised, extend the connector configuration with `downloadCvrf: true` to enable automatic attachment retrieval.
### 4.1 State seeding helper
Use `tools/SourceStateSeeder` to queue historical advisories (detail JSON + optional CVRF artefacts) for replay without manual Mongo edits. Example seed file:
```json
{
"source": "vndr.msrc",
"cursor": {
"lastModifiedCursor": "2024-01-01T00:00:00Z"
},
"documents": [
{
"uri": "https://api.msrc.microsoft.com/sug/v2.0/vulnerability/ADV2024-0001",
"contentFile": "./seeds/adv2024-0001.json",
"contentType": "application/json",
"metadata": { "msrc.vulnerabilityId": "ADV2024-0001" },
"addToPendingDocuments": true
},
{
"uri": "https://download.microsoft.com/msrc/2024/ADV2024-0001.cvrf.zip",
"contentFile": "./seeds/adv2024-0001.cvrf.zip",
"contentType": "application/zip",
"status": "mapped",
"addToPendingDocuments": false
}
]
}
```
Run the helper:
```bash
dotnet run --project tools/SourceStateSeeder -- \
--connection-string "mongodb://localhost:27017" \
--database concelier \
--input seeds/msrc-backfill.json
```
Any documents marked `addToPendingDocuments` will appear in the connector cursor; `DownloadCvrf` can remain disabled if the ZIP artefact is pre-seeded.
## 5. Outstanding items
- Ops to confirm tenant/app names and provide client credentials through the secure channel.
- Connector team monitors token cache health (already implemented); validate instrumentation once Ops supplies credentials.
- Offline Kit packaging: add encrypted blob containing client credentials with rotation instructions.

View File

@@ -1,48 +1,48 @@
# NKCKI Connector Operations Guide
## Overview
The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring.
## Configuration
Key options exposed through `feedser:sources:ru-nkcki:http`:
- `maxBulletinsPerFetch` limits new bulletin downloads in a single run (default `5`).
- `maxListingPagesPerFetch` maximum listing pages visited during pagination (default `3`).
- `listingCacheDuration` minimum interval between listing fetches before falling back to cached artefacts (default `00:10:00`).
- `cacheDirectory` optional path for persisted bulletin archives used during offline or failure scenarios.
- `requestDelay` delay inserted between bulletin downloads to respect upstream politeness.
When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/feedser/cache/ru-nkcki`) and pre-populate bulletin archives via the offline kit.
## Telemetry
`RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Feedser.Source.Ru.Nkcki`:
- `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures`
- `nkcki.listing.pages.visited` (histogram, `pages`)
- `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new`
- `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures`
- `nkcki.entries.processed` (histogram, `entries`)
Integrate these counters into standard Feedser observability dashboards to track crawl coverage and cache hit rates.
## Archive Backfill Strategy
Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy:
1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms.
2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked.
3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`.
4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in `docs/feedser-connector-research-20251011.md`).
For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs.
## Failure Handling
- Listing failures mark the source state with exponential backoff while attempting cache replay.
- Bulletin fetches fall back to cached copies before surfacing an error.
- Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros.
Refer to `ru-nkcki` entries in `src/StellaOps.Feedser.Source.Ru.Nkcki/TASKS.md` for outstanding items.
# NKCKI Connector Operations Guide
## Overview
The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring.
## Configuration
Key options exposed through `concelier:sources:ru-nkcki:http`:
- `maxBulletinsPerFetch` limits new bulletin downloads in a single run (default `5`).
- `maxListingPagesPerFetch` maximum listing pages visited during pagination (default `3`).
- `listingCacheDuration` minimum interval between listing fetches before falling back to cached artefacts (default `00:10:00`).
- `cacheDirectory` optional path for persisted bulletin archives used during offline or failure scenarios.
- `requestDelay` delay inserted between bulletin downloads to respect upstream politeness.
When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/concelier/cache/ru-nkcki`) and pre-populate bulletin archives via the offline kit.
## Telemetry
`RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Concelier.Connector.Ru.Nkcki`:
- `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures`
- `nkcki.listing.pages.visited` (histogram, `pages`)
- `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new`
- `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures`
- `nkcki.entries.processed` (histogram, `entries`)
Integrate these counters into standard Concelier observability dashboards to track crawl coverage and cache hit rates.
## Archive Backfill Strategy
Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy:
1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms.
2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked.
3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`.
4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in `docs/concelier-connector-research-20251011.md`).
For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs.
## Failure Handling
- Listing failures mark the source state with exponential backoff while attempting cache replay.
- Bulletin fetches fall back to cached copies before surfacing an error.
- Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros.
Refer to `ru-nkcki` entries in `src/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md` for outstanding items.

View File

@@ -1,24 +1,24 @@
# Feedser OSV Connector Operations Notes
_Last updated: 2025-10-16_
The OSV connector ingests advisories from OSV.dev across OSS ecosystems. This note highlights the additional merge/export expectations introduced with the canonical metric fallback work in Sprint 4.
## 1. Canonical metric fallbacks
- When OSV omits CVSS vectors (common for CVSS v4-only payloads) the mapper now emits a deterministic canonical metric id in the form `osv:severity/<level>` and normalises the advisory severity to the same `<level>`.
- Metric: `osv.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `ecosystem`, `reason=no_cvss`. Watch this alongside merge parity dashboards to catch spikes where OSV publishes severity-only advisories.
- Merge precedence still prefers GHSA over OSV; the shared severity-based canonical id keeps Merge/export parity deterministic even when only OSV supplies severity data.
## 2. CWE provenance
- `database_specific.cwe_ids` now populates provenance decision reasons for every mapped weakness. Expect `decisionReason="database_specific.cwe_ids"` on OSV weakness provenance and confirm exporters preserve the value.
- If OSV ever attaches `database_specific.cwe_notes`, the connector will surface the joined note string in `decisionReason` instead of the default marker.
## 3. Dashboards & alerts
- Extend existing merge dashboards with the new counter:
- Overlay `sum(osv.map.canonical_metric_fallbacks{ecosystem=~".+"})` with Merge severity overrides to confirm fallback advisories are reconciling cleanly.
- Alert when the 1-hour sum exceeds 50 for any ecosystem; baseline volume is currently <5 per day (mostly GHSA mirrors emitting CVSS v4 only).
- Exporters already surface `canonicalMetricId`; no schema change is required, but ORAS/Trivy bundles should be spot-checked after deploying the connector update.
## 4. Runbook updates
- Fixture parity suites (`osv-ghsa.*`) now assert the fallback id and provenance notes. Regenerate via `dotnet test src/StellaOps.Feedser.Source.Osv.Tests/StellaOps.Feedser.Source.Osv.Tests.csproj`.
- When investigating merge severity conflicts, include the fallback counter and confirm OSV advisories carry the expected `osv:severity/<level>` id before raising connector bugs.
# Concelier OSV Connector Operations Notes
_Last updated: 2025-10-16_
The OSV connector ingests advisories from OSV.dev across OSS ecosystems. This note highlights the additional merge/export expectations introduced with the canonical metric fallback work in Sprint 4.
## 1. Canonical metric fallbacks
- When OSV omits CVSS vectors (common for CVSS v4-only payloads) the mapper now emits a deterministic canonical metric id in the form `osv:severity/<level>` and normalises the advisory severity to the same `<level>`.
- Metric: `osv.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `ecosystem`, `reason=no_cvss`. Watch this alongside merge parity dashboards to catch spikes where OSV publishes severity-only advisories.
- Merge precedence still prefers GHSA over OSV; the shared severity-based canonical id keeps Merge/export parity deterministic even when only OSV supplies severity data.
## 2. CWE provenance
- `database_specific.cwe_ids` now populates provenance decision reasons for every mapped weakness. Expect `decisionReason="database_specific.cwe_ids"` on OSV weakness provenance and confirm exporters preserve the value.
- If OSV ever attaches `database_specific.cwe_notes`, the connector will surface the joined note string in `decisionReason` instead of the default marker.
## 3. Dashboards & alerts
- Extend existing merge dashboards with the new counter:
- Overlay `sum(osv.map.canonical_metric_fallbacks{ecosystem=~".+"})` with Merge severity overrides to confirm fallback advisories are reconciling cleanly.
- Alert when the 1-hour sum exceeds 50 for any ecosystem; baseline volume is currently <5 per day (mostly GHSA mirrors emitting CVSS v4 only).
- Exporters already surface `canonicalMetricId`; no schema change is required, but ORAS/Trivy bundles should be spot-checked after deploying the connector update.
## 4. Runbook updates
- Fixture parity suites (`osv-ghsa.*`) now assert the fallback id and provenance notes. Regenerate via `dotnet test src/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj`.
- When investigating merge severity conflicts, include the fallback counter and confirm OSV advisories carry the expected `osv:severity/<level>` id before raising connector bugs.

View File

@@ -1,50 +1,50 @@
# SemVer Style Backfill Runbook
_Last updated: 2025-10-11_
## Overview
The SemVer style migration populates the new `normalizedVersions` field on advisory documents and ensures
provenance `decisionReason` values are preserved during future reads. The migration is idempotent and only
runs when the feature flag `feedser:storage:enableSemVerStyle` is enabled.
## Preconditions
1. **Review configuration** set `feedser.storage.enableSemVerStyle` to `true` on all Feedser services.
2. **Confirm batch size** adjust `feedser.storage.backfillBatchSize` if you need smaller batches for older
deployments (default: `250`).
3. **Back up** capture a fresh snapshot of the `advisory` collection or a full MongoDB backup.
4. **Staging dry-run** enable the flag in a staging environment and observe the migration output before
rolling to production.
## Execution
No manual command is required. After deploying the configuration change, restart the Feedser WebService or
any component that hosts the Mongo migration runner. During startup you will see log entries similar to:
```
Applying Mongo migration 20251011-semver-style-backfill: Populate advisory.normalizedVersions for existing documents when SemVer style storage is enabled.
Mongo migration 20251011-semver-style-backfill applied
```
The migration reads advisories in batches (`feedser.storage.backfillBatchSize`) and writes flattened
`normalizedVersions` arrays. Existing documents without SemVer ranges remain untouched.
## Post-checks
1. Verify the new indexes exist:
```
db.advisory.getIndexes()
```
You should see `advisory_normalizedVersions_pkg_scheme_type` and `advisory_normalizedVersions_value`.
2. Spot check a few advisories to confirm the top-level `normalizedVersions` array exists and matches
the embedded package data.
3. Run `dotnet test` for `StellaOps.Feedser.Storage.Mongo.Tests` (optional but recommended) in CI to confirm
the storage suite passes with the feature flag enabled.
## Rollback
Set `feedser.storage.enableSemVerStyle` back to `false` and redeploy. The migration will be skipped on
subsequent startups. You can leave the populated `normalizedVersions` arrays in place; they are ignored when
the feature flag is off. If you must remove them entirely, restore from the backup captured during
preparation.
# SemVer Style Backfill Runbook
_Last updated: 2025-10-11_
## Overview
The SemVer style migration populates the new `normalizedVersions` field on advisory documents and ensures
provenance `decisionReason` values are preserved during future reads. The migration is idempotent and only
runs when the feature flag `concelier:storage:enableSemVerStyle` is enabled.
## Preconditions
1. **Review configuration** set `concelier.storage.enableSemVerStyle` to `true` on all Concelier services.
2. **Confirm batch size** adjust `concelier.storage.backfillBatchSize` if you need smaller batches for older
deployments (default: `250`).
3. **Back up** capture a fresh snapshot of the `advisory` collection or a full MongoDB backup.
4. **Staging dry-run** enable the flag in a staging environment and observe the migration output before
rolling to production.
## Execution
No manual command is required. After deploying the configuration change, restart the Concelier WebService or
any component that hosts the Mongo migration runner. During startup you will see log entries similar to:
```
Applying Mongo migration 20251011-semver-style-backfill: Populate advisory.normalizedVersions for existing documents when SemVer style storage is enabled.
Mongo migration 20251011-semver-style-backfill applied
```
The migration reads advisories in batches (`concelier.storage.backfillBatchSize`) and writes flattened
`normalizedVersions` arrays. Existing documents without SemVer ranges remain untouched.
## Post-checks
1. Verify the new indexes exist:
```
db.advisory.getIndexes()
```
You should see `advisory_normalizedVersions_pkg_scheme_type` and `advisory_normalizedVersions_value`.
2. Spot check a few advisories to confirm the top-level `normalizedVersions` array exists and matches
the embedded package data.
3. Run `dotnet test` for `StellaOps.Concelier.Storage.Mongo.Tests` (optional but recommended) in CI to confirm
the storage suite passes with the feature flag enabled.
## Rollback
Set `concelier.storage.enableSemVerStyle` back to `false` and redeploy. The migration will be skipped on
subsequent startups. You can leave the populated `normalizedVersions` arrays in place; they are ignored when
the feature flag is off. If you must remove them entirely, restore from the backup captured during
preparation.

View File

@@ -0,0 +1,81 @@
# Scanner Runtime Readiness Checklist
Last updated: 2025-10-19
This runbook confirms that Scanner.WebService now surfaces the metadata Runtime Guild consumers requested: quieted finding counts in the signed report events and progress hints on the scan event stream. Follow the checklist before relying on these fields in production automation.
---
## 1. Prerequisites
- Scanner.WebService release includes **SCANNER-POLICY-09-107** (adds quieted provenance and score inputs to `/reports`).
- Docs repository at commit containing `docs/events/scanner.report.ready@1.json` with `quietedFindingCount`.
- Access to a Scanner environment (staging or sandbox) with an image capable of producing policy verdicts.
---
## 2. Verify quieted finding hints
1. **Trigger a report** run a scan that produces at least one quieted finding (policy with `quiet: true`). After the scan completes, call:
```http
POST /api/v1/reports
Authorization: Bearer <token>
Content-Type: application/json
```
Ensure the JSON response contains `report.summary.quieted` and that the DSSE payload mirrors the same count.
2. **Check emitted event** pull the latest `scanner.report.ready` event (from the queue or sample capture). Confirm the payload includes:
- `quietedFindingCount` equal to the `summary.quieted` value.
- Updated `summary` block with the quieted counter.
3. **Schema validation** optionally validate the payload against `docs/events/scanner.report.ready@1.json` to guarantee downstream compatibility:
```bash
npx ajv validate -c ajv-formats \
-s docs/events/scanner.report.ready@1.json \
-d <payload.json>
```
(Use `npm install --no-save ajv ajv-cli ajv-formats` once per clone.)
> Snapshot fixtures: see `docs/events/samples/scanner.report.ready@1.sample.json` for a canonical event that already carries `quietedFindingCount`.
---
## 3. Verify progress hints (SSE / JSONL)
Scanner streams structured progress messages for each scan. The `data` map inside every frame carries the hints Runtime systems consume (force flag, client metadata, additional stage-specific attributes).
1. **Submit a scan** with custom metadata (for example `pipeline=github`, `build=1234`).
2. **Stream events**:
```http
GET /api/v1/scans/{scanId}/events?format=jsonl
Authorization: Bearer <token>
Accept: application/x-ndjson
```
3. **Confirm payload** each frame should resemble:
```json
{
"scanId": "2f6c17f9b3f548e2a28b9c412f4d63f8",
"sequence": 1,
"state": "Pending",
"message": "queued",
"timestamp": "2025-10-19T03:12:45.118Z",
"correlationId": "2f6c17f9b3f548e2a28b9c412f4d63f8:0001",
"data": {
"force": false,
"meta.pipeline": "github"
}
}
```
Subsequent frames include additional hints as analyzers progress (e.g., `stage`, `meta.*`, or analyzer-provided keys). Ensure newline-delimited JSON consumers preserve the `data` dictionary when forwarding to runtime dashboards.
> The same frame structure is documented in `docs/09_API_CLI_REFERENCE.md` §2.6. Copy that snippet into integration tests to keep compatibility.
---
## 4. Sign-off matrix
| Stakeholder | Checklist | Status | Notes |
|-------------|-----------|--------|-------|
| Runtime Guild | Sections 2 & 3 completed | ☐ | Capture sample payloads for webhook regression tests. |
| Notify Guild | `quietedFindingCount` consumed in notifications | ☐ | Update templates after Runtime sign-off. |
| Docs Guild | Checklist published & linked from updates | ☑ | 2025-10-19 |
Mark the stakeholder boxes as each team completes its validation. Once all checks are green, update `docs/TASKS.md` to reflect task completion.

View File

@@ -0,0 +1,147 @@
# Scanner Core Contracts
The **Scanner Core** library provides shared contracts, observability helpers, and security utilities consumed by `Scanner.WebService`, `Scanner.Worker`, analyzers, and tooling. These primitives guarantee deterministic identifiers, timestamps, and log context for all scanning flows.
## Canonical DTOs
- `ScanJob` & `ScanJobStatus` canonical job metadata (image reference/digest, tenant, correlation ID, timestamps, failure details). Constructors normalise timestamps to UTC microsecond precision and canonicalise image digests. Round-trips with `JsonSerializerDefaults.Web` using `ScannerJsonOptions`.
- `ScanProgressEvent` & `ScanStage`/`ScanProgressEventKind` stage-level progress surface for queue/stream consumers. Includes deterministic sequence numbers, optional progress percentage, attributes, and attached `ScannerError`.
- `ScannerError` & `ScannerErrorCode` shared error taxonomy spanning queue, analyzers, storage, exporters, and signing. Carries severity, retryability, structured details, and microsecond-precision timestamps.
- `ScanJobId` strongly-typed identifier rendered as `Guid` (lowercase `N` format) with deterministic parsing.
### Canonical JSON samples
The golden fixtures consumed by `ScannerCoreContractsTests` document the wire shape shared with downstream services. They live under `src/StellaOps.Scanner.Core.Tests/Fixtures/` and a representative extract is shown below.
```json
{
"id": "8f4cc9c582454b9d9b4f5ae049631b7d",
"status": "running",
"imageReference": "registry.example.com/stellaops/scanner:1.2.3",
"imageDigest": "sha256:abcdef",
"createdAt": "2025-10-18T14:30:15.123456+00:00",
"updatedAt": "2025-10-18T14:30:20.123456+00:00",
"correlationId": "scan-analyzeoperatingsystem-8f4cc9c582454b9d9b4f5ae049631b7d",
"tenantId": "tenant-a",
"metadata": {
"requestId": "req-1234",
"source": "ci"
},
"failure": {
"code": "analyzerFailure",
"severity": "error",
"message": "Analyzer failed to parse layer",
"timestamp": "2025-10-18T14:30:15.123456+00:00",
"retryable": false,
"stage": "AnalyzeOperatingSystem",
"component": "os-analyzer",
"details": {
"layerDigest": "sha256:deadbeef",
"attempt": "1"
}
}
}
```
Progress events follow the same conventions (`jobId`, `stage`, `kind`, `timestamp`, `attributes`, optional embedded `ScannerError`). The fixtures are verified via deterministic JSON comparison in every CI run.
## Deterministic helpers
- `ScannerIdentifiers` derives `ScanJobId`, correlation IDs, and SHA-256 hashes from normalised inputs (image reference/digest, tenant, salt). Ensures case-insensitive stability and reproducible metric keys.
- `ScannerTimestamps` trims to microsecond precision, provides ISO-8601 (`yyyy-MM-ddTHH:mm:ss.ffffffZ`) rendering, and parsing helpers.
- `ScannerJsonOptions` standard JSON options (web defaults, camel-case enums) shared by services/tests.
- `ScanAnalysisStore` & `ScanAnalysisKeys` shared in-memory analysis cache flowing through Worker stages. OS analyzers populate
`analysis.os.packages` (raw output), `analysis.os.fragments` (per-analyzer component fragments), and merge into
`analysis.layers.fragments` so emit/diff stages can compose SBOMs and diffs without knowledge of individual analyzer
implementations.
## Observability primitives
- `ScannerDiagnostics` global `ActivitySource`/`Meter` for scanner components. `StartActivity` seeds deterministic tags (`job_id`, `stage`, `component`, `correlation_id`).
- `ScannerMetricNames` centralises metric prefixes (`stellaops.scanner.*`) and deterministic job/event tag builders.
- `ScannerCorrelationContext` & `ScannerCorrelationContextAccessor` ambient correlation propagation via `AsyncLocal` for log scopes, metrics, and diagnostics.
- `ScannerLogExtensions` `ILogger` scopes for jobs/progress events with automatic correlation context push, minimal allocations, and consistent structured fields.
### Observability overhead validation
A micro-benchmark executed on 2025-10-19 (4vCPU runner, .NET 10.0.100-rc.1) measured the average scope cost across 1000000 iterations:
| Scope | Mean (µs/call) |
|-------|----------------|
| `BeginScanScope` (logger attached) | 0.80 |
| `BeginScanScope` (noop logger) | 0.31 |
| `BeginProgressScope` | 0.57 |
To reproduce, run `dotnet test src/StellaOps.Scanner.Core.Tests -c Release` (see `ScannerLogExtensionsPerformanceTests`) or copy the snippet below into a throwaway `dotnet run` console project and execute it with `dotnet run -c Release`:
```csharp
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.Extensions.Logging;
using StellaOps.Scanner.Core.Contracts;
using StellaOps.Scanner.Core.Observability;
using StellaOps.Scanner.Core.Utility;
var factory = LoggerFactory.Create(builder => builder.AddFilter(static _ => true));
var logger = factory.CreateLogger("bench");
var jobId = ScannerIdentifiers.CreateJobId("registry.example.com/stellaops/scanner:1.2.3", "sha256:abcdef", "tenant-a", "benchmark");
var correlationId = ScannerIdentifiers.CreateCorrelationId(jobId, nameof(ScanStage.AnalyzeOperatingSystem));
var now = ScannerTimestamps.Normalize(new DateTimeOffset(2025, 10, 19, 12, 0, 0, TimeSpan.Zero));
var job = new ScanJob(jobId, ScanJobStatus.Running, "registry.example.com/stellaops/scanner:1.2.3", "sha256:abcdef", now, now, correlationId, "tenant-a", new Dictionary<string, string>(StringComparer.Ordinal) { ["requestId"] = "req-bench" });
var progress = new ScanProgressEvent(jobId, ScanStage.AnalyzeOperatingSystem, ScanProgressEventKind.Progress, 42, now, 10.5, "benchmark", new Dictionary<string, string>(StringComparer.Ordinal) { ["sample"] = "true" });
Console.WriteLine("Scanner Core Observability micro-bench (1,000,000 iterations)");
Report("BeginScanScope (logger)", Measure(static ctx => ctx.Logger.BeginScanScope(ctx.Job, ctx.Stage, ctx.Component), new ScopeContext(logger, job, nameof(ScanStage.AnalyzeOperatingSystem), "os-analyzer")));
Report("BeginScanScope (no logger)", Measure(static ctx => ScannerLogExtensions.BeginScanScope(null, ctx.Job, ctx.Stage, ctx.Component), new ScopeContext(logger, job, nameof(ScanStage.AnalyzeOperatingSystem), "os-analyzer")));
Report("BeginProgressScope", Measure(static ctx => ctx.Logger.BeginProgressScope(ctx.Progress!, ctx.Component), new ScopeContext(logger, job, nameof(ScanStage.AnalyzeOperatingSystem), "os-analyzer", progress)));
static double Measure(Func<ScopeContext, IDisposable> factory, ScopeContext context)
{
const int iterations = 1_000_000;
for (var i = 0; i < 10_000; i++)
{
using var scope = factory(context);
}
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
var sw = Stopwatch.StartNew();
for (var i = 0; i < iterations; i++)
{
using var scope = factory(context);
}
sw.Stop();
return sw.Elapsed.TotalSeconds * 1_000_000 / iterations;
}
static void Report(string label, double microseconds)
=> Console.WriteLine($"{label,-28}: {microseconds:F3} µs");
readonly record struct ScopeContext(ILogger Logger, ScanJob Job, string? Stage, string? Component, ScanProgressEvent? Progress = null);
```
Both guardrails enforce the ≤5µs acceptance target for SP9-G1.
## Security utilities
- `AuthorityTokenSource` caches short-lived OpToks per audience+scope using deterministic keys and refresh skew (default 30s). Integrates with `StellaOps.Auth.Client`.
- `DpopProofValidator` validates DPoP proofs (alg allowlist, `htm`/`htu`, nonce, replay window, signature) backed by pluggable `IDpopReplayCache`. Ships with `InMemoryDpopReplayCache` for restart-only deployments.
- `RestartOnlyPluginGuard` enforces restart-time plug-in registration (deterministic path normalisation; throws if new plug-ins added post-seal).
- `ServiceCollectionExtensions.AddScannerAuthorityCore` DI helper wiring Authority client, OpTok source, DPoP validation, replay cache, and plug-in guard.
## Testing guarantees
Unit tests (`StellaOps.Scanner.Core.Tests`) assert:
- DTO JSON round-trips are stable and deterministic (`ScannerCoreContractsTests` + golden fixtures).
- Identifier/hash helpers ignore case and emit lowercase hex.
- Timestamp normalisation retains UTC semantics.
- Log scopes push/pop correlation context predictably while staying under the 5µs envelope.
- Authority token caching honours refresh skew and invalidation.
- DPoP validator accepts valid proofs, rejects nonce mismatch/replay, and enforces signature validation.
- Restart-only plug-in guard blocks runtime additions post-seal.

View File

@@ -1,106 +1,106 @@
# Authority Threat Model (STRIDE)
> Prepared by Security Guild — 2025-10-12. Scope covers Authority host, Standard plug-in, CLI, bootstrap workflow, and offline revocation distribution.
## 1. Scope & Method
- Methodology: STRIDE applied to primary Authority surfaces (token issuance, bootstrap, revocation, operator tooling, plug-in extensibility).
- Assets in scope: identity credentials, OAuth tokens (access/refresh), bootstrap invites, revocation manifests, signing keys, audit telemetry.
- Out of scope: Third-party IdPs federated via OpenIddict (tracked separately in SEC6 backlog).
## 2. Assets & Entry Points
| Asset / Surface | Description | Primary Actors |
|-----------------|-------------|----------------|
| Token issuance APIs (`/token`, `/authorize`) | OAuth/OIDC endpoints mediated by OpenIddict | CLI, UI, automation agents |
| Bootstrap channel | Initial admin invite + bootstrap CLI workflow | Platform operators |
| Revocation bundle | Offline JSON + detached JWS consumed by agents | Feedser, Agents, Zastava |
| Plug-in manifests | Standard plug-in configuration and password policy overrides | Operators, DevOps |
| Signing keys | ES256 signing keys backing tokens and revocation manifests | Security Guild, HSM/KeyOps |
| Audit telemetry | Structured login/audit stream persisted to Mongo/observability stack | SOC, SecOps |
## 3. Trust Boundaries
| Boundary | Rationale | Controls |
|----------|-----------|----------|
| TB1 — Public network ↔️ Authority ingress | Internet/extranet exposure for `/token`, `/authorize`, `/bootstrap` | TLS 1.3, reverse proxy ACLs, rate limiting (SEC3.A / CORE8.RL) |
| TB2 — Authority host ↔️ Mongo storage | Credential store, revocation state, audit log persistence | Authenticated Mongo, network segmentation, deterministic serializers |
| TB3 — Authority host ↔️ Plug-in sandbox | Plug-ins may override password policy and bootstrap flows | Code signing, manifest validation, restart-time loading only |
| TB4 — Operator workstation ↔️ CLI | CLI holds bootstrap secrets and revocation bundles | OS keychain storage, MFA on workstations, offline kit checksum |
| TB5 — Authority ↔️ Downstream agents | Revocation bundle consumption, token validation | Mutual TLS (planned), detached JWS signatures, bundle freshness checks |
## 4. Data Flow Diagrams
### 4.1 Runtime token issuance
```mermaid
flowchart LR
subgraph Client Tier
CLI[StellaOps CLI]
UI[UI / Automation]
end
subgraph Perimeter
RP[Reverse Proxy / WAF]
end
subgraph Authority
AUTH[Authority Host]
PLGIN[Standard Plug-in]
STORE[(Mongo Credential Store)]
end
CLI -->|OAuth password / client creds| RP --> AUTH
UI -->|OAuth flows| RP
AUTH -->|PasswordHashOptions + Secrets| PLGIN
AUTH -->|Verify / Persist hashes| STORE
STORE -->|Rehash needed| AUTH
AUTH -->|Access / refresh token| RP --> Client Tier
```
### 4.2 Bootstrap & revocation
```mermaid
flowchart LR
subgraph Operator
OPS[Operator Workstation]
end
subgraph Authority
AUTH[Authority Host]
STORE[(Mongo)]
end
subgraph Distribution
OFFKIT[Offline Kit Bundle]
AGENT[Authorized Agent / Feedser]
end
OPS -->|Bootstrap CLI (`stellaops auth bootstrap`)| AUTH
AUTH -->|One-time invite + Argon2 hash| STORE
AUTH -->|Revocation export (`stellaops auth revoke export`)| OFFKIT
OFFKIT -->|Signed JSON + .jws| AGENT
AGENT -->|Revocation ACK / telemetry| AUTH
```
## 5. STRIDE Analysis
| Threat | STRIDE Vector | Surface | Risk (L×I) | Existing Controls | Gaps / Actions | Owner |
|--------|---------------|---------|------------|-------------------|----------------|-------|
| Spoofed revocation bundle | Spoofing | TB5 — Authority ↔️ Agents | Med×High | Detached JWS signature (planned), offline kit checksums | Finalise signing key registry & verification script (SEC4.B/SEC4.HOST); add bundle freshness requirement | Security Guild (follow-up: **SEC5.B**) |
| Parameter tampering on `/token` | Tampering | TB1 — Public ingress | Med×High | ASP.NET model validation, OpenIddict, rate limiter (CORE8.RL) | Tampered requests emit `authority.token.tamper` audit events (`request.tampered`, unexpected parameter names) correlating with `/token` outcomes (SEC5.C) | Security Guild + Authority Core (follow-up: **SEC5.C**) |
| Bootstrap invite replay | Repudiation | TB4 — Operator CLI ↔️ Authority | Low×High | One-time bootstrap tokens, Argon2id hashing on creation | Invites expire automatically and emit audit events on consumption/expiration (SEC5.D) | Security Guild |
| Token replay by stolen agent | Information Disclosure | TB5 | Med×High | Signed revocation bundles, device fingerprint heuristics, optional mTLS | Monitor revocation acknowledgement latency via Zastava and tune replay alerting thresholds | Security Guild + Zastava (follow-up: **SEC5.E**) |
| Privilege escalation via plug-in override | Elevation of Privilege | TB3 — Plug-in sandbox | Med×High | Signed plug-ins, restart-only loading, configuration validation | Add static analysis on manifest overrides + runtime warning when policy weaker than host | Security Guild + DevOps (follow-up: **SEC5.F**) |
| Offline bundle tampering | Tampering | Distribution | Low×High | SHA256 manifest, signed bundles (planned) | Add supply-chain attestation for Offline Kit, publish verification CLI in docs | Security Guild + Ops (follow-up: **SEC5.G**) |
| Failure to log denied tokens | Repudiation | TB2 — Authority ↔️ Mongo | Med×Med | Serilog structured events (partial), Mongo persistence path (planned) | Finalise audit schema (SEC2.A) and ensure `/token` denies include subject/client/IP fields | Security Guild + Authority Core (follow-up: **SEC5.H**) |
Risk scoring uses qualitative scale (Low/Med/High) for likelihood × impact; mitigation priority follows High > Med > Low.
## 6. Follow-up Backlog Hooks
| Backlog ID | Linked Threat | Summary | Target Owners |
|------------|---------------|---------|---------------|
| SEC5.B | Spoofed revocation bundle | Complete libsodium/Core signing integration and ship revocation verification script. | Security Guild + Authority Core |
| SEC5.C | Parameter tampering on `/token` | Finalise audit contract (`SEC2.A`) and add request tamper logging. | Security Guild + Authority Core |
| SEC5.D | Bootstrap invite replay | Implement expiry enforcement + audit coverage for unused bootstrap invites. | Security Guild |
| SEC5.E | Token replay by stolen agent | Coordinate Zastava alerting with the new device fingerprint heuristics and surface stale revocation acknowledgements. | Security Guild + Zastava |
| SEC5.F | Plug-in override escalation | Static analysis of plug-in manifests; warn on weaker password policy overrides. | Security Guild + DevOps |
| SEC5.G | Offline bundle tampering | Extend Offline Kit build to include attested manifest + verification CLI sample. | Security Guild + Ops |
| SEC5.H | Failure to log denied tokens | Ensure audit persistence for all `/token` denials with correlation IDs. | Security Guild + Authority Core |
Update `src/StellaOps.Cryptography/TASKS.md` (Security Guild board) with the above backlog entries to satisfy SEC5.A exit criteria.
# Authority Threat Model (STRIDE)
> Prepared by Security Guild — 2025-10-12. Scope covers Authority host, Standard plug-in, CLI, bootstrap workflow, and offline revocation distribution.
## 1. Scope & Method
- Methodology: STRIDE applied to primary Authority surfaces (token issuance, bootstrap, revocation, operator tooling, plug-in extensibility).
- Assets in scope: identity credentials, OAuth tokens (access/refresh), bootstrap invites, revocation manifests, signing keys, audit telemetry.
- Out of scope: Third-party IdPs federated via OpenIddict (tracked separately in SEC6 backlog).
## 2. Assets & Entry Points
| Asset / Surface | Description | Primary Actors |
|-----------------|-------------|----------------|
| Token issuance APIs (`/token`, `/authorize`) | OAuth/OIDC endpoints mediated by OpenIddict | CLI, UI, automation agents |
| Bootstrap channel | Initial admin invite + bootstrap CLI workflow | Platform operators |
| Revocation bundle | Offline JSON + detached JWS consumed by agents | Concelier, Agents, Zastava |
| Plug-in manifests | Standard plug-in configuration and password policy overrides | Operators, DevOps |
| Signing keys | ES256 signing keys backing tokens and revocation manifests | Security Guild, HSM/KeyOps |
| Audit telemetry | Structured login/audit stream persisted to Mongo/observability stack | SOC, SecOps |
## 3. Trust Boundaries
| Boundary | Rationale | Controls |
|----------|-----------|----------|
| TB1 — Public network ↔️ Authority ingress | Internet/extranet exposure for `/token`, `/authorize`, `/bootstrap` | TLS 1.3, reverse proxy ACLs, rate limiting (SEC3.A / CORE8.RL) |
| TB2 — Authority host ↔️ Mongo storage | Credential store, revocation state, audit log persistence | Authenticated Mongo, network segmentation, deterministic serializers |
| TB3 — Authority host ↔️ Plug-in sandbox | Plug-ins may override password policy and bootstrap flows | Code signing, manifest validation, restart-time loading only |
| TB4 — Operator workstation ↔️ CLI | CLI holds bootstrap secrets and revocation bundles | OS keychain storage, MFA on workstations, offline kit checksum |
| TB5 — Authority ↔️ Downstream agents | Revocation bundle consumption, token validation | Mutual TLS (planned), detached JWS signatures, bundle freshness checks |
## 4. Data Flow Diagrams
### 4.1 Runtime token issuance
```mermaid
flowchart LR
subgraph Client Tier
CLI[StellaOps CLI]
UI[UI / Automation]
end
subgraph Perimeter
RP[Reverse Proxy / WAF]
end
subgraph Authority
AUTH[Authority Host]
PLGIN[Standard Plug-in]
STORE[(Mongo Credential Store)]
end
CLI -->|OAuth password / client creds| RP --> AUTH
UI -->|OAuth flows| RP
AUTH -->|PasswordHashOptions + Secrets| PLGIN
AUTH -->|Verify / Persist hashes| STORE
STORE -->|Rehash needed| AUTH
AUTH -->|Access / refresh token| RP --> Client Tier
```
### 4.2 Bootstrap & revocation
```mermaid
flowchart LR
subgraph Operator
OPS[Operator Workstation]
end
subgraph Authority
AUTH[Authority Host]
STORE[(Mongo)]
end
subgraph Distribution
OFFKIT[Offline Kit Bundle]
AGENT[Authorized Agent / Concelier]
end
OPS -->|Bootstrap CLI (`stellaops auth bootstrap`)| AUTH
AUTH -->|One-time invite + Argon2 hash| STORE
AUTH -->|Revocation export (`stellaops auth revoke export`)| OFFKIT
OFFKIT -->|Signed JSON + .jws| AGENT
AGENT -->|Revocation ACK / telemetry| AUTH
```
## 5. STRIDE Analysis
| Threat | STRIDE Vector | Surface | Risk (L×I) | Existing Controls | Gaps / Actions | Owner |
|--------|---------------|---------|------------|-------------------|----------------|-------|
| Spoofed revocation bundle | Spoofing | TB5 — Authority ↔️ Agents | Med×High | Detached JWS signature (planned), offline kit checksums | Finalise signing key registry & verification script (SEC4.B/SEC4.HOST); add bundle freshness requirement | Security Guild (follow-up: **SEC5.B**) |
| Parameter tampering on `/token` | Tampering | TB1 — Public ingress | Med×High | ASP.NET model validation, OpenIddict, rate limiter (CORE8.RL) | Tampered requests emit `authority.token.tamper` audit events (`request.tampered`, unexpected parameter names) correlating with `/token` outcomes (SEC5.C) | Security Guild + Authority Core (follow-up: **SEC5.C**) |
| Bootstrap invite replay | Repudiation | TB4 — Operator CLI ↔️ Authority | Low×High | One-time bootstrap tokens, Argon2id hashing on creation | Invites expire automatically and emit audit events on consumption/expiration (SEC5.D) | Security Guild |
| Token replay by stolen agent | Information Disclosure | TB5 | Med×High | Signed revocation bundles, device fingerprint heuristics, optional mTLS | Monitor revocation acknowledgement latency via Zastava and tune replay alerting thresholds | Security Guild + Zastava (follow-up: **SEC5.E**) |
| Privilege escalation via plug-in override | Elevation of Privilege | TB3 — Plug-in sandbox | Med×High | Signed plug-ins, restart-only loading, configuration validation | Add static analysis on manifest overrides + runtime warning when policy weaker than host | Security Guild + DevOps (follow-up: **SEC5.F**) |
| Offline bundle tampering | Tampering | Distribution | Low×High | SHA256 manifest, signed bundles (planned) | Add supply-chain attestation for Offline Kit, publish verification CLI in docs | Security Guild + Ops (follow-up: **SEC5.G**) |
| Failure to log denied tokens | Repudiation | TB2 — Authority ↔️ Mongo | Med×Med | Serilog structured events (partial), Mongo persistence path (planned) | Finalise audit schema (SEC2.A) and ensure `/token` denies include subject/client/IP fields | Security Guild + Authority Core (follow-up: **SEC5.H**) |
Risk scoring uses qualitative scale (Low/Med/High) for likelihood × impact; mitigation priority follows High > Med > Low.
## 6. Follow-up Backlog Hooks
| Backlog ID | Linked Threat | Summary | Target Owners |
|------------|---------------|---------|---------------|
| SEC5.B | Spoofed revocation bundle | Complete libsodium/Core signing integration and ship revocation verification script. | Security Guild + Authority Core |
| SEC5.C | Parameter tampering on `/token` | Finalise audit contract (`SEC2.A`) and add request tamper logging. | Security Guild + Authority Core |
| SEC5.D | Bootstrap invite replay | Implement expiry enforcement + audit coverage for unused bootstrap invites. | Security Guild |
| SEC5.E | Token replay by stolen agent | Coordinate Zastava alerting with the new device fingerprint heuristics and surface stale revocation acknowledgements. | Security Guild + Zastava |
| SEC5.F | Plug-in override escalation | Static analysis of plug-in manifests; warn on weaker password policy overrides. | Security Guild + DevOps |
| SEC5.G | Offline bundle tampering | Extend Offline Kit build to include attested manifest + verification CLI sample. | Security Guild + Ops |
| SEC5.H | Failure to log denied tokens | Ensure audit persistence for all `/token` denials with correlation IDs. | Security Guild + Authority Core |
Update `src/StellaOps.Cryptography/TASKS.md` (Security Guild board) with the above backlog entries to satisfy SEC5.A exit criteria.

View File

@@ -1,56 +1,56 @@
{
"$schema": "../../etc/authority/revocation_bundle.schema.json",
"schemaVersion": "1.0.0",
"issuer": "https://auth.stella-ops.example",
"bundleId": "6f9d08bfa0c24a0a9f7f59e6c17d2f8e8bca2ef34215c3d3ba5a9a1f0fbe2d10",
"issuedAt": "2025-10-12T15:00:00Z",
"validFrom": "2025-10-12T15:00:00Z",
"sequence": 42,
"signingKeyId": "authority-signing-20251012",
"revocations": [
{
"id": "7ad4f3d2c21b461d9b3420e1151be9c4",
"category": "token",
"tokenType": "access_token",
"clientId": "feedser-cli",
"subjectId": "user:ops-admin",
"reason": "compromised",
"reasonDescription": "Access token reported by SOC automation run R-2045.",
"revokedAt": "2025-10-12T14:32:05Z",
"scopes": [
"feedser:export",
"feedser:jobs"
],
"fingerprint": "AD35E719C12204D7E7C92ED3F6DEBF0A44642D41AAF94233F9A47E183F4C5F18",
"metadata": {
"reportId": "R-2045",
"source": "soc-automation"
}
},
{
"id": "user:departed-vendor",
"category": "subject",
"subjectId": "user:departed-vendor",
"reason": "lifecycle",
"revokedAt": "2025-10-10T18:15:00Z",
"metadata": {
"ticket": "HR-8821"
}
},
{
"id": "ci-runner-legacy",
"category": "client",
"clientId": "ci-runner-legacy",
"reason": "rotation",
"revokedAt": "2025-10-09T11:00:00Z",
"expiresAt": "2025-11-01T00:00:00Z",
"metadata": {
"replacement": "ci-runner-2025"
}
}
],
"metadata": {
"generator": "stellaops-authority@1.4.0",
"jobId": "revocation-export-20251012T1500Z"
}
}
{
"$schema": "../../etc/authority/revocation_bundle.schema.json",
"schemaVersion": "1.0.0",
"issuer": "https://auth.stella-ops.example",
"bundleId": "6f9d08bfa0c24a0a9f7f59e6c17d2f8e8bca2ef34215c3d3ba5a9a1f0fbe2d10",
"issuedAt": "2025-10-12T15:00:00Z",
"validFrom": "2025-10-12T15:00:00Z",
"sequence": 42,
"signingKeyId": "authority-signing-20251012",
"revocations": [
{
"id": "7ad4f3d2c21b461d9b3420e1151be9c4",
"category": "token",
"tokenType": "access_token",
"clientId": "concelier-cli",
"subjectId": "user:ops-admin",
"reason": "compromised",
"reasonDescription": "Access token reported by SOC automation run R-2045.",
"revokedAt": "2025-10-12T14:32:05Z",
"scopes": [
"concelier:export",
"concelier:jobs"
],
"fingerprint": "AD35E719C12204D7E7C92ED3F6DEBF0A44642D41AAF94233F9A47E183F4C5F18",
"metadata": {
"reportId": "R-2045",
"source": "soc-automation"
}
},
{
"id": "user:departed-vendor",
"category": "subject",
"subjectId": "user:departed-vendor",
"reason": "lifecycle",
"revokedAt": "2025-10-10T18:15:00Z",
"metadata": {
"ticket": "HR-8821"
}
},
{
"id": "ci-runner-legacy",
"category": "client",
"clientId": "ci-runner-legacy",
"reason": "rotation",
"revokedAt": "2025-10-09T11:00:00Z",
"expiresAt": "2025-11-01T00:00:00Z",
"metadata": {
"replacement": "ci-runner-2025"
}
}
],
"metadata": {
"generator": "stellaops-authority@1.4.0",
"jobId": "revocation-export-20251012T1500Z"
}
}

View File

@@ -1,91 +1,91 @@
# Authority Revocation Bundle
The Authority service exports revocation information as an offline-friendly JSON document plus a detached JWS signature. Operators can mirror the bundle alongside Feedser exports to ensure air-gapped scanners receive the latest token, subject, and client revocations.
## File layout
| Artefact | Description |
| --- | --- |
| `revocation-bundle.json` | Canonical JSON document describing revoked entities. Validates against [`etc/authority/revocation_bundle.schema.json`](../../etc/authority/revocation_bundle.schema.json). |
| `revocation-bundle.json.jws` | Detached JWS signature covering the exact UTF-8 bytes of `revocation-bundle.json`. |
| `revocation-bundle.json.sha256` | Hex-encoded SHA-256 digest used by mirror automation (optional but recommended). |
All hashes and signatures are generated after applying the deterministic formatting rules below.
## Deterministic formatting rules
- JSON is serialised with UTF-8 encoding, 2-space indentation, and lexicographically sorted object keys.
- Arrays are sorted by deterministic keys:
- Top-level `revocations` sorted by (`category`, `id`, `revokedAt`).
- Nested arrays (`scopes`) sorted ascending, unique enforced.
- Numeric values (`sequence`) are emitted without leading zeros.
- Timestamps use UTC ISO-8601 format with `Z` suffix.
Consumers MUST treat the combination of `schemaVersion` and `sequence` as a monotonic feed. Bundles with older `sequence` values are ignored unless `bundleId` differs and `issuedAt` is newer (supporting replay detection).
## Revocation entry categories
| Category | Description | Required fields |
| --- | --- | --- |
| `token` | A single OAuth token (access, refresh, device, authorization code). | `tokenType`, `clientId`, `revokedAt`, optional `subjectId` |
| `subject` | All credentials issued to a subject (user/service account). | `subjectId`, `revokedAt` |
| `client` | Entire OAuth client registration is revoked. | `clientId`, `revokedAt` |
| `key` | Signing/encryption key material revoked. | `id`, `revokedAt` |
`reason` is a machine-friendly code (`compromised`, `rotation`, `policy`, `lifecycle`, etc). `reasonDescription` may include a short operator note.
## Detached JWS workflow
1. Serialise `revocation-bundle.json` using the deterministic rules.
2. Compute SHA-256 digest; write to `revocation-bundle.json.sha256`.
3. Sign using ES256 (default) with the configured Authority signing key. The JWS header uses:
```json
{
"alg": "ES256",
"kid": "{signingKeyId}",
"provider": "{providerName}",
"typ": "application/vnd.stellaops.revocation-bundle+jws",
"b64": false,
"crit": ["b64"]
}
```
4. Persist the detached signature payload to `revocation-bundle.json.jws` (per RFC 7797).
Verification steps:
1. Validate `revocation-bundle.json` against the schema.
2. Re-compute SHA-256 and compare with `.sha256` (if present).
3. Resolve the signing key from JWKS (`/.well-known/jwks.json`) or the offline key bundle, preferring the provider declared in the JWS header (`provider` falls back to `default`).
4. Verify the detached JWS using the resolved provider. The CLI mirrors Authority resolution, so builds compiled with `StellaOpsCryptoSodium=true` automatically use the libsodium provider when advertised; otherwise verification downgrades to the managed fallback.
### CLI verification workflow
Use the bundled CLI command before distributing a bundle:
```bash
stellaops auth revoke verify \
--bundle artifacts/revocation-bundle.json \
--signature artifacts/revocation-bundle.json.jws \
--key etc/authority/signing/authority-public.pem \
--verbose
```
The verifier performs three checks:
1. Prints the computed digest in `sha256:<hex>` format. Compare it with the exported `.sha256` artefact.
2. Confirms the detached JWS header advertises `b64: false`, captures the provider hint, and that the algorithm matches the Authority configuration (ES256 unless overridden).
3. Registers the supplied PEM key with the crypto provider registry and validates the signature (falling back to the managed provider when the hinted provider is unavailable).
A zero exit code means the bundle is ready for mirroring/import. Non-zero codes signal missing arguments, malformed JWS payloads, or signature mismatches; regenerate or re-sign the bundle before distribution.
## Example
The repository contains an [example bundle](revocation-bundle-example.json) demonstrating a mixed export of token, subject, and client revocations. Use it as a reference for integration tests and tooling.
## Operations Quick Reference
- `stella auth revoke export` emits a canonical JSON bundle, `.sha256` digest, and detached JWS signature in one command. Use `--output` to write into your mirror staging directory.
- `stella auth revoke verify` validates a bundle using cached JWKS or an offline PEM key, honours the `provider` metadata embedded in the signature, and reports digest mismatches before distribution.
- `POST /internal/revocations/export` provides the same payload for orchestrators that already talk to the bootstrap API.
- `POST /internal/signing/rotate` rotates JWKS material without downtime; always export a fresh bundle afterward so downstream mirrors receive signatures from the new `kid`.
- Offline Kit automation should mirror `revocation-bundle.json*` alongside Feedser exports so agents ingest revocations during the same sync pass.
# Authority Revocation Bundle
The Authority service exports revocation information as an offline-friendly JSON document plus a detached JWS signature. Operators can mirror the bundle alongside Concelier exports to ensure air-gapped scanners receive the latest token, subject, and client revocations.
## File layout
| Artefact | Description |
| --- | --- |
| `revocation-bundle.json` | Canonical JSON document describing revoked entities. Validates against [`etc/authority/revocation_bundle.schema.json`](../../etc/authority/revocation_bundle.schema.json). |
| `revocation-bundle.json.jws` | Detached JWS signature covering the exact UTF-8 bytes of `revocation-bundle.json`. |
| `revocation-bundle.json.sha256` | Hex-encoded SHA-256 digest used by mirror automation (optional but recommended). |
All hashes and signatures are generated after applying the deterministic formatting rules below.
## Deterministic formatting rules
- JSON is serialised with UTF-8 encoding, 2-space indentation, and lexicographically sorted object keys.
- Arrays are sorted by deterministic keys:
- Top-level `revocations` sorted by (`category`, `id`, `revokedAt`).
- Nested arrays (`scopes`) sorted ascending, unique enforced.
- Numeric values (`sequence`) are emitted without leading zeros.
- Timestamps use UTC ISO-8601 format with `Z` suffix.
Consumers MUST treat the combination of `schemaVersion` and `sequence` as a monotonic feed. Bundles with older `sequence` values are ignored unless `bundleId` differs and `issuedAt` is newer (supporting replay detection).
## Revocation entry categories
| Category | Description | Required fields |
| --- | --- | --- |
| `token` | A single OAuth token (access, refresh, device, authorization code). | `tokenType`, `clientId`, `revokedAt`, optional `subjectId` |
| `subject` | All credentials issued to a subject (user/service account). | `subjectId`, `revokedAt` |
| `client` | Entire OAuth client registration is revoked. | `clientId`, `revokedAt` |
| `key` | Signing/encryption key material revoked. | `id`, `revokedAt` |
`reason` is a machine-friendly code (`compromised`, `rotation`, `policy`, `lifecycle`, etc). `reasonDescription` may include a short operator note.
## Detached JWS workflow
1. Serialise `revocation-bundle.json` using the deterministic rules.
2. Compute SHA-256 digest; write to `revocation-bundle.json.sha256`.
3. Sign using ES256 (default) with the configured Authority signing key. The JWS header uses:
```json
{
"alg": "ES256",
"kid": "{signingKeyId}",
"provider": "{providerName}",
"typ": "application/vnd.stellaops.revocation-bundle+jws",
"b64": false,
"crit": ["b64"]
}
```
4. Persist the detached signature payload to `revocation-bundle.json.jws` (per RFC 7797).
Verification steps:
1. Validate `revocation-bundle.json` against the schema.
2. Re-compute SHA-256 and compare with `.sha256` (if present).
3. Resolve the signing key from JWKS (`/.well-known/jwks.json`) or the offline key bundle, preferring the provider declared in the JWS header (`provider` falls back to `default`).
4. Verify the detached JWS using the resolved provider. The CLI mirrors Authority resolution, so builds compiled with `StellaOpsCryptoSodium=true` automatically use the libsodium provider when advertised; otherwise verification downgrades to the managed fallback.
### CLI verification workflow
Use the bundled CLI command before distributing a bundle:
```bash
stellaops auth revoke verify \
--bundle artifacts/revocation-bundle.json \
--signature artifacts/revocation-bundle.json.jws \
--key etc/authority/signing/authority-public.pem \
--verbose
```
The verifier performs three checks:
1. Prints the computed digest in `sha256:<hex>` format. Compare it with the exported `.sha256` artefact.
2. Confirms the detached JWS header advertises `b64: false`, captures the provider hint, and that the algorithm matches the Authority configuration (ES256 unless overridden).
3. Registers the supplied PEM key with the crypto provider registry and validates the signature (falling back to the managed provider when the hinted provider is unavailable).
A zero exit code means the bundle is ready for mirroring/import. Non-zero codes signal missing arguments, malformed JWS payloads, or signature mismatches; regenerate or re-sign the bundle before distribution.
## Example
The repository contains an [example bundle](revocation-bundle-example.json) demonstrating a mixed export of token, subject, and client revocations. Use it as a reference for integration tests and tooling.
## Operations Quick Reference
- `stella auth revoke export` emits a canonical JSON bundle, `.sha256` digest, and detached JWS signature in one command. Use `--output` to write into your mirror staging directory.
- `stella auth revoke verify` validates a bundle using cached JWKS or an offline PEM key, honours the `provider` metadata embedded in the signature, and reports digest mismatches before distribution.
- `POST /internal/revocations/export` provides the same payload for orchestrators that already talk to the bootstrap API.
- `POST /internal/signing/rotate` rotates JWKS material without downtime; always export a fresh bundle afterward so downstream mirrors receive signatures from the new `kid`.
- Offline Kit automation should mirror `revocation-bundle.json*` alongside Concelier exports so agents ingest revocations during the same sync pass.

View File

@@ -0,0 +1,14 @@
# Docs Guild Update — 2025-10-18
**Subject:** ADR process + events schema validation shipped
**Audience:** Docs Guild, DevEx, Platform Events
- Published the ADR contribution guide at `docs/adr/index.md` and enriched the template to capture authorship, deciders, and alternatives. All new cross-module decisions should follow this workflow.
- Linked the ADR hub from `docs/README.md` so operators and engineers can discover the process without digging through directories.
- Extended Docs CI (`.gitea/workflows/docs.yml`) to compile event schemas with Ajv (including `ajv-formats`) and documented the local loop in `docs/events/README.md`.
- Captured the mirror/offline workflow in `docs/ci/20_CI_RECIPES.md` so runners know how to install the Ajv toolchain and publish previews without internet access.
- Validated `scanner.report.ready@1`, `scheduler.rescan.delta@1`, and `attestor.logged@1` schemas locally to unblock Platform Events acknowledgements.
Next steps:
- Platform Events to confirm Notify/Scheduler consumers have visibility into the schema docs.
- DevEx to add ADR announcement blurb to the next sprint recap if broader broadcast is needed.

View File

@@ -0,0 +1,12 @@
# Docs Guild Update — 2025-10-19
**Subject:** Event envelope reference & canonical samples
**Audience:** Docs Guild, Platform Events, Runtime Guild
- Extended `docs/events/README.md` with envelope field tables, offline validation commands, and guidance for optional payload fields.
- Added canonical sample payloads under `docs/events/samples/` for `scanner.report.ready@1`, `scheduler.rescan.delta@1`, and `attestor.logged@1`; validated them with `ajv-cli` to match the published schemas.
- Documented the validation loop so air-gapped operators can mirror the CI checks before rolling new event versions.
Next steps:
- Platform Events to embed the canonical samples into their contract tests.
- Runtime Guild checklist for quieted finding counts & progress hints published in `docs/runtime/SCANNER_RUNTIME_READINESS.md`; gather stakeholder sign-off.

View File

@@ -0,0 +1,10 @@
# Platform Events Update — 2025-10-19
**Subject:** Canonical event samples enforced across tests & CI
**Audience:** Platform Events Guild, Notify Guild, Scheduler Guild, Docs Guild
- Scanner WebService contract tests deserialize `scanner.report.ready@1` and `scanner.scan.completed@1` samples, validating DSSE payloads and canonical ordering via `NotifyCanonicalJsonSerializer`.
- Notify and Scheduler model suites now round-trip the published event samples (including `attestor.logged@1` and `scheduler.rescan.delta@1`) to catch drift in consumer expectations.
- Docs CI (`.gitea/workflows/docs.yml`) validates every sample against its schema with `ajv-cli`, keeping offline bundles and repositories aligned.
No additional follow-ups — downstream teams can rely on the committed samples for integration coverage.

View File

@@ -0,0 +1,5 @@
# 2025-10-19 Scanner ↔ Policy Sync
- Scanner WebService now emits `scanner.report.ready` and `scanner.scan.completed` via Redis Streams when `scanner.events.enabled=true`; DSSE envelopes are embedded verbatim to keep Notify/UI consumers in sync.
- Config plumbing introduces `scanner:events:*` settings (driver, DSN, stream, publish timeout) with validation and Redis-backed publisher wiring.
- Policy Guild coordination task `POLICY-RUNTIME-17-201` opened to track Zastava runtime feed contract; `SCANNER-RUNTIME-17-401` now depends on it so reachability tags stay aligned once runtime endpoints ship.

Some files were not shown because too many files have changed in this diff Show More