audit work, fixed StellaOps.sln warnings/errors, fixed tests, sprints work, new advisories

This commit is contained in:
master
2026-01-07 18:49:59 +02:00
parent 04ec098046
commit 608a7f85c0
866 changed files with 56323 additions and 6231 deletions

View File

@@ -0,0 +1,15 @@
# SBOM→VEX Offline Kit (Stub)
This kit supports sprint task 6 (SBOM-VEX-GAPS-300-013).
Contents (stub):
- `verify.sh` — chain-hash stub for SBOM + DSSE + Rekor + VEX
- `chain-hash-recipe.md` — canonicalisation steps
- `inputs.lock` — pinned tool versions and snapshot
- `proof-manifest.json` — chain-hash placeholder
- ~~`sbom-vex-blueprint.svg`~~ — archived (empty placeholder)
Next steps:
- Add real SBOM/VEX samples and Rekor bundle snapshot.
- Produce DSSE signatures for proof manifest and scripts.
- Include time-anchor and backpressure/error policy notes per BP1–BP10.

View File

@@ -0,0 +1,25 @@
# SBOM→VEX Chain Hash Recipe (Stub)
Use with sprint task 6 (SBOM-VEX-GAPS-300-013).
- Inputs: sorted SBOM documents, VEX statements, DSSE envelopes, Rekor bundle snapshot.
- Hashing: deterministic ordering (UTF-8, LF), SHA-256 over concatenated canonical JSON.
- Chain: derive cumulative hash for (SBOM → DSSE → Rekor → VEX) and store in proof manifest.
- Offline: no network; bundle Rekor root + snapshot; include `inputs.lock` with tool versions.
Example (stub):
```bash
sbom_files=(sbom.json)
vex_files=(vex.json)
dsse=envelope.dsse
rekor=rekor-bundle.json
cat "${sbom_files[@]}" | jq -S . > /tmp/sbom.canon
cat "${vex_files[@]}" | jq -S . > /tmp/vex.canon
cat "$dsse" | jq -S . > /tmp/dsse.canon
cat "$rekor" | jq -S . > /tmp/rekor.canon
cat /tmp/sbom.canon /tmp/dsse.canon /tmp/rekor.canon /tmp/vex.canon | sha256sum | awk '{print $1}' > proof.chainhash
echo "chain-hash: $(cat proof.chainhash)"
```

View File

@@ -0,0 +1,10 @@
{
  "payloadType": "application/vnd.cyclonedx+json",
  "payload": "ewogICJib21Gb3JtYXQiOiAiQ3ljbG9uZURYIiwKICAic3BlY1ZlcnNpb24iOiAiMS41IiwKICAidmVyc2lvbiI6IDEsCiAgImNvbXBvbmVudHMiOiBbCiAgICB7InR5cGUiOiAiY29udGFpbmVyIiwgIm5hbWUiOiAiZXhhbXBsZSIsICJ2ZXJzaW9uIjogIjEuMC4wIn0KICBdCn0K",
  "signatures": [
    {
      "keyid": "stub-key-id",
      "sig": "stub-signature"
    }
  ]
}

View File

@@ -0,0 +1,7 @@
sbom_tool: "syft 1.1.0"
vex_tool: "stella-vex 0.4.2"
dsse_tool: "cosign 2.2.1"
rekor_snapshot: "rekor-snapshot-2025-11-30.json"
chain_hash_alg: "sha256"
tz: "UTC"
notes: "Offline kit; no live Rekor calls"

View File

@@ -0,0 +1,11 @@
{
  "version": "0.1.0-stub",
  "chain_hash": "7d72ed74065e8e359af34c5bb1805fa62629e2444dbe77b89efbebe5c4ddb932",
  "inputs": {
    "sbom": "sbom.json",
    "vex": "vex.json",
    "dsse": "envelope.dsse",
    "rekor_bundle": "rekor-bundle.json"
  },
  "lockfile": "inputs.lock"
}

View File

@@ -0,0 +1,6 @@
{
  "kind": "rekor.bundle",
  "apiVersion": "0.1.0",
  "logIndex": 123456,
  "payloadHash": "stub"
}

View File

@@ -0,0 +1,8 @@
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "version": 1,
  "components": [
    {"type": "container", "name": "example", "version": "1.0.0"}
  ]
}

View File

@@ -0,0 +1,33 @@
#!/usr/bin/env bash
set -euo pipefail
# Offline verifier stub for SBOM -> VEX proof bundles.
# Expected inputs: path to DSSE envelope, Rekor log snapshot, and bundled trust roots.
if [ "$#" -lt 4 ]; then
echo "usage: $0 <sbom.json> <vex.json> <dsse.envelope> <rekor-bundle.json>" >&2
exit 1
fi
SBOM="$1"
VEX="$2"
DSSE="$3"
REKOR="$4"
if ! command -v jq >/dev/null; then
  echo "jq is required (offline-capable)." >&2
  exit 2
fi
echo "[stub] canonicalising inputs..." >&2
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
jq -S . "$SBOM" > "$tmpdir/sbom.canon"
jq -S . "$VEX" > "$tmpdir/vex.canon"
jq -S . "$DSSE" > "$tmpdir/dsse.canon"
jq -S . "$REKOR" > "$tmpdir/rekor.canon"
cat "$tmpdir/sbom.canon" "$tmpdir/dsse.canon" "$tmpdir/rekor.canon" "$tmpdir/vex.canon" | sha256sum | awk '{print $1}' > "$tmpdir/proof.hash"
echo "chain-hash (sbom+dsse+rekor+vex): $(cat "$tmpdir/proof.hash")"
echo "[stub] verify DSSE signatures and Rekor inclusion separately; add manifests to DSSE envelope for full proof"

View File

@@ -0,0 +1,11 @@
{
  "@context": "https://openvex.dev/ns/v0.2.0",
  "statements": [
    {
      "vulnerability": "CVE-2025-0001",
      "products": ["pkg:container/example@1.0.0"],
      "status": "not_affected",
      "justification": "vulnerable_code_not_present"
    }
  ]
}

View File

@@ -1,34 +0,0 @@
# CI Recipes agent guide
## Mission
CI module collects reproducible pipeline recipes for builds, tests, and release promotion across supported platforms.
## Key docs
- [Module README](./README.md)
- [Architecture](./architecture.md)
- [Implementation plan](./implementation_plan.md)
- [Task board](./TASKS.md)
## How to get started
1. Open sprint file `/docs/implplan/SPRINT_*.md` and locate the stories referencing this module.
2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED).
3. Read the architecture and README for domain context before editing code or docs.
4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan.
## Guardrails
- Honour the Aggregation-Only Contract where applicable (see ../../aoc/aggregation-only-contract.md).
- Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts.
- Keep Offline Kit parity in mind—document air-gapped workflows for any new feature.
- Update runbooks/observability assets when operational characteristics change.
## Required Reading
- `docs/modules/ci/README.md`
- `docs/modules/ci/architecture.md`
- `docs/modules/ci/implementation_plan.md`
- `docs/modules/platform/architecture-overview.md`
## Working Agreement
1. Update task status to `DOING`/`DONE` in both the corresponding sprint file `/docs/implplan/SPRINT_*.md` and the local `TASKS.md` when you start or finish work.
2. Review this charter and the Required Reading documents before coding; confirm prerequisites are met.
3. Keep changes deterministic (stable ordering, timestamps, hashes) and align with offline/air-gap expectations.
4. Coordinate doc updates, tests, and cross-guild communication whenever contracts or workflows change.
5. Revert to `TODO` if you pause the task without shipping changes; leave notes in commit/PR descriptions for context.

View File

@@ -1,49 +0,0 @@
# StellaOps CI Recipes
CI module collects reproducible pipeline recipes for builds, tests, and release promotion across supported platforms.
## Responsibilities
- Provide ready-to-use pipeline snippets for ingestion, scanning, policy evaluation, and exports.
- Document required secrets/scopes and deterministic build knobs.
- Highlight offline-compatible workflows and cache strategies.
## Key components
- Recipe catalogue in ./recipes.md.
## Integrations & dependencies
- DevOps release workflows.
- Module-specific test suites referenced in recipes.
## Operational notes
- Encourage reuse through templated YAML/JSON fragments.
## Related resources
- ./recipes.md
- ./TASKS.md (status mirror)
- ../../implplan/SPRINT_0315_0001_0001_docs_modules_ci.md (sprint tracker)
## Backlog references
- CI recipes refresh tracked in ../../TASKS.md under DOCS-CI stories.
## Epic alignment
- **Epic 1 AOC enforcement:** bake ingestion/verifier guardrails into CI recipes.
- **Epic 10 Export Center:** provide pipeline snippets for export packaging, signing, and Offline Kit publication.
- **Epic 11 Notifications Studio:** offer CI hooks for notification previews/tests where relevant.
## Implementation Status
**Epic Milestones:**
- Epic 1 (AOC enforcement) — Ensure pipelines enforce schemas, provenance, and verifier jobs
- Epic 10 (Export Center) — Add export/signing/Offline Kit automation templates
- Epic 11 (Notifications Studio) — Document CI hooks for notification previews and tests
**Key Deliverables:**
- Reproducible pipeline recipes for builds, tests, and release promotion
- Ready-to-use snippets for ingestion, scanning, policy evaluation, and exports
- Documentation of required secrets/scopes and deterministic build knobs
- Offline-compatible workflows and cache strategies
**Operational Focus:**
- Maintain deterministic behavior and offline parity across releases
- Keep documentation, telemetry, and runbooks aligned with sprint outcomes
- Preserve determinism and provenance requirements in all recipe additions

View File

@@ -1,30 +0,0 @@
# CI Recipes architecture
## Scope & responsibilities
- Curate deterministic CI pipeline templates for ingestion, scanning, policy evaluation, export, and notifications.
- Capture provenance for each recipe (inputs, pinned tool versions, checksum manifests) and keep offline/air-gap parity.
- Provide reusable fragments (YAML/JSON) plus guardrails (AOC checks, DSSE attestation hooks, Rekor/Transparency toggles).
## Components
- **Recipe catalogue (`recipes.md`)** — Source of truth for pipeline snippets; sorted deterministically and annotated with required secrets/scopes.
- **Guardrail hooks** — Inline steps for schema validation, SBOM/VEX signing, and attestation verification; reuse Authority/Signer/Export Center helpers.
- **Observability shim** — Optional steps to emit structured logs/metrics to Telemetry Core when allowed; defaults to no-op in sealed/offline mode.
- **Offline bundle path** — Scripts/guides to package recipes and pinned tool archives for air-gapped runners; hashes recorded in release notes.
## Data & determinism
- All generated artifacts (templates, manifests, example outputs) must sort keys and lists, emit UTC ISO-8601 timestamps, and avoid host-specific paths.
- DSSE/attestation helpers should target the platform trust roots defined in Authority/Sigstore docs; prefer BLAKE3 hashing where compatible.
- Keep retry/backoff logic deterministic for reproducible CI runs; avoid time-based jitter unless explicitly documented.
## Integration points
- Authority/Signer for DSSE + Rekor publication; Export Center for bundle assembly; Notify for preview hooks; Telemetry Core for optional metrics.
- Recipes must remain compatible with CLI/SDK surface referenced in `docs/modules/cli/guides/` and devportal snippets.
## Testing lanes and catalog
- CI lane filters are defined by `docs/technical/testing/TEST_CATALOG.yml` and aligned with `docs/technical/testing/testing-strategy-models.md`.
- Standard categories: Unit, Contract, Integration, Security, Performance, Live (opt-in only).
- Any new test gate or lane must update `docs/technical/testing/TEST_SUITE_OVERVIEW.md` and `docs/technical/testing/ci-quality-gates.md`.
## Change process
- Track active work in `docs/implplan/SPRINT_0315_0001_0001_docs_modules_ci.md` and mirror statuses in `./TASKS.md`.
- When adding new recipes, include offline notes, determinism checks, and minimal test harness references in `docs/benchmarks` or `tests/**` as applicable.

View File

@@ -1,353 +0,0 @@
# StellaOps CI Recipes (2025-08-04)
## 0·Key variables (export these once)
| Variable | Meaning | Typical value |
| ------------- | --------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------- |
| `STELLA_URL` | Host that: ① stores the **CLI** & **SBOM builder** images under `/registry` **and** ② receives API calls at `https://$STELLA_URL` | `stella-ops.ci.acme.example` |
| `DOCKER_HOST` | How containers reach your Docker daemon (because we no longer mount `/var/run/docker.sock`) | `tcp://docker:2375` |
| `WORKSPACE` | Directory where the pipeline stores artefacts (SBOM file) | `$(pwd)` |
| `IMAGE` | The image you are building & scanning | `acme/backend:sha-${COMMIT_SHA}` |
| `SBOM_FILE` | Immutable SBOM name `<image-ref>-YYYYMMDDThhmmssZ.sbom.json` | `acme_backend_sha-abc123-20250804T153050Z.sbom.json` |
> **Authority graph scopes note (2025-10-27):** CI stages that spin up the Authority compose profile now rely on the checked-in `etc/authority.yaml`. Before running integration smoke jobs, inject real secrets for every `etc/secrets/*.secret` file (Cartographer, Graph API, Policy Engine, Concelier, Excititor). The repository defaults contain `*-change-me` placeholders and Authority will reject tokens if those secrets are not overridden. Reissue CI tokens that previously used `policy:write`/`policy:submit`/`policy:edit` scopes—new bundles must request `policy:read`, `policy:author`, `policy:review`, `policy:simulate`, and (`policy:approve`/`policy:operate`/`policy:activate` when pipelines promote policies).
```bash
export STELLA_URL="stella-ops.ci.acme.example"
export DOCKER_HOST="tcp://docker:2375" # Jenkins/Circle often expose it like this
export WORKSPACE="$(pwd)"
export IMAGE="acme/backend:sha-${COMMIT_SHA}"
export SBOM_FILE="$(echo "${IMAGE}" | tr '/:+' '__')-$(date -u +%Y%m%dT%H%M%SZ).sbom.json"
```
---
## 1·SBOM creation strategies
### Option A **Buildx attested SBOM** (preferred if you can use BuildKit)
You pass **two build args** so the Dockerfile can run the builder and copy the result out of the build context.
```bash
docker buildx build \
--build-arg STELLA_SBOM_BUILDER="$STELLA_URL/registry/stella-sbom-builder:latest" \
--provenance=true --sbom=true \
--build-arg SBOM_FILE="$SBOM_FILE" \
-t "$IMAGE" .
```
**If you cannot use Buildx, use Option B below.** The older “run a builder stage inside the Dockerfile” pattern is unreliable for producing an SBOM of the final image.
```Dockerfile
ARG STELLA_SBOM_BUILDER
ARG SBOM_FILE
FROM $STELLA_SBOM_BUILDER AS sbom
ARG IMAGE
ARG SBOM_FILE
RUN $STELLA_SBOM_BUILDER build --image $IMAGE --output /out/$SBOM_FILE
# ---- actual build stages … ----
FROM alpine:3.20
# (optional) keep or discard the SBOM in the final image
COPY --from=sbom /out/$SBOM_FILE /
# (rest of your Dockerfile)
```
### Option B **External builder step** (works everywhere; recommended baseline if Buildx isn't available)
*(keep this block if your pipeline already has an image-build step that you can't modify)*
```bash
# DOCKER_HOST lets the builder reach the daemon remotely; the workspace
# mount places the SBOM beside the source code.
docker run --rm \
  -e DOCKER_HOST="$DOCKER_HOST" \
  -v "$WORKSPACE:/workspace" \
  "$STELLA_URL/registry/stella-sbom-builder:latest" \
  build --image "$IMAGE" --output "/workspace/${SBOM_FILE}"
```
---
## 2·Scan the image & upload results
```bash
# DOCKER_HOST points at the remote daemon; the SBOM is mounted under the
# same name at the container root; STELLA_OPS_URL is where the CLI posts findings.
docker run --rm \
  -e DOCKER_HOST="$DOCKER_HOST" \
  -v "$WORKSPACE/${SBOM_FILE}:/${SBOM_FILE}:ro" \
  -e STELLA_OPS_URL="https://${STELLA_URL}" \
  "$STELLA_URL/registry/stella-cli:latest" \
  scan --sbom "/${SBOM_FILE}" "$IMAGE"
```
The CLI returns **exit 0** if policies pass, **>0** if blocked — perfect for failing the job.
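A minimal gating sketch for plain shell runners (it reuses the invocation above; `scan.log` is an illustrative artifact name):
```bash
set -o pipefail   # so the pipeline status reflects the scan, not tee
if ! docker run --rm -e DOCKER_HOST="$DOCKER_HOST" \
    -v "$WORKSPACE/${SBOM_FILE}:/${SBOM_FILE}:ro" \
    -e STELLA_OPS_URL="https://${STELLA_URL}" \
    "$STELLA_URL/registry/stella-cli:latest" \
    scan --sbom "/${SBOM_FILE}" "$IMAGE" | tee scan.log; then
  echo "Stella scan blocked the build; see scan.log" >&2
  exit 1
fi
```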
---
## 3·CI templates
Below are minimal, cut-and-paste snippets.
**Feel free to delete Option B** if you adopt Option A.
### 3.1 Jenkins (Declarative Pipeline)
```groovy
pipeline {
  agent {
    docker {
      image 'docker:25'
      args  '--privileged'   // gives us /usr/bin/docker
    }
  }
  environment {
    STELLA_URL  = 'stella-ops.ci.acme.example'
    DOCKER_HOST = 'tcp://docker:2375'
    IMAGE       = "acme/backend:${env.BUILD_NUMBER}"
    SBOM_FILE   = "acme_backend_${env.BUILD_NUMBER}-${new Date().format('yyyyMMdd\'T\'HHmmss\'Z\'', TimeZone.getTimeZone('UTC'))}.sbom.json"
  }
  stages {
    stage('Build image + SBOM (Option A)') {
      steps {
        sh '''
          docker build \
            --build-arg STELLA_SBOM_BUILDER="$STELLA_URL/registry/stella-sbom-builder:latest" \
            --build-arg SBOM_FILE="$SBOM_FILE" \
            -t "$IMAGE" .
        '''
      }
    }
    /* ---------- Option B fallback (when you must keep the existing build step as-is) ----------
    stage('SBOM builder (Option B)') {
      steps {
        sh '''
          docker run --rm -e DOCKER_HOST="$DOCKER_HOST" \
            -v "$WORKSPACE:/workspace" \
            "$STELLA_URL/registry/stella-sbom-builder:latest" \
            build --image "$IMAGE" --output "/workspace/${SBOM_FILE}"
        '''
      }
    }
    ------------------------------------------------------------------------------------------ */
    stage('Scan & upload') {
      steps {
        sh '''
          docker run --rm -e DOCKER_HOST="$DOCKER_HOST" \
            -v "$WORKSPACE/${SBOM_FILE}:/${SBOM_FILE}:ro" \
            -e STELLA_OPS_URL="https://$STELLA_URL" \
            "$STELLA_URL/registry/stella-cli:latest" \
            scan --sbom "/${SBOM_FILE}" "$IMAGE"
        '''
      }
    }
  }
}
```
---
### 3.2 CircleCI `.circleci/config.yml`
```yaml
version: 2.1
jobs:
  stella_scan:
    docker:
      - image: cimg/base:stable          # bare-metal image with Docker CLI
    environment:
      STELLA_URL: stella-ops.ci.acme.example
      DOCKER_HOST: tcp://docker:2375     # Circle's "remote Docker" socket
    steps:
      - checkout
      - run:
          name: Compute vars
          command: |
            echo 'export IMAGE="acme/backend:${CIRCLE_SHA1}"' >> $BASH_ENV
            echo 'export SBOM_FILE="$(echo acme/backend:${CIRCLE_SHA1} | tr "/:+" "__")-$(date -u +%Y%m%dT%H%M%SZ).sbom.json"' >> $BASH_ENV
      - run:
          name: Build image + SBOM (Option A)
          command: |
            docker build \
              --build-arg STELLA_SBOM_BUILDER="$STELLA_URL/registry/stella-sbom-builder:latest" \
              --build-arg SBOM_FILE="$SBOM_FILE" \
              -t "$IMAGE" .
      # --- Option B fallback (when you must keep the existing build step as-is) ---
      #- run:
      #    name: SBOM builder (Option B)
      #    command: |
      #      docker run --rm -e DOCKER_HOST="$DOCKER_HOST" \
      #        -v "$PWD:/workspace" \
      #        "$STELLA_URL/registry/stella-sbom-builder:latest" \
      #        build --image "$IMAGE" --output "/workspace/${SBOM_FILE}"
      - run:
          name: Scan
          command: |
            docker run --rm -e DOCKER_HOST="$DOCKER_HOST" \
              -v "$PWD/${SBOM_FILE}:/${SBOM_FILE}:ro" \
              -e STELLA_OPS_URL="https://$STELLA_URL" \
              "$STELLA_URL/registry/stella-cli:latest" \
              scan --sbom "/${SBOM_FILE}" "$IMAGE"
workflows:
  stella:
    jobs: [stella_scan]
```
---
### 3.3 Gitea Actions `.gitea/workflows/stella.yml`
*(Gitea 1.22+ ships native Actions compatible with GitHub syntax)*
```yaml
name: Stella Scan
on: [push]
jobs:
  stella:
    runs-on: ubuntu-latest
    env:
      STELLA_URL: ${{ secrets.STELLA_URL }}
      DOCKER_HOST: tcp://docker:2375   # provided by the docker:dind service
    services:
      docker:
        image: docker:dind
        options: >-
          --privileged
    steps:
      - uses: actions/checkout@v4
      - name: Compute vars
        id: vars
        run: |
          echo "IMAGE=ghcr.io/${{ gitea.repository }}:${{ gitea.sha }}" >> $GITEA_OUTPUT
          echo "SBOM_FILE=$(echo ghcr.io/${{ gitea.repository }}:${{ gitea.sha }} | tr '/:+' '__')-$(date -u +%Y%m%dT%H%M%SZ).sbom.json" >> $GITEA_OUTPUT
      - name: Build image + SBOM (Option A)
        run: |
          docker build \
            --build-arg STELLA_SBOM_BUILDER="${STELLA_URL}/registry/stella-sbom-builder:latest" \
            --build-arg SBOM_FILE="${{ steps.vars.outputs.SBOM_FILE }}" \
            -t "${{ steps.vars.outputs.IMAGE }}" .
      # --- Option B fallback (when you must keep the existing build step as-is) ---
      #- name: SBOM builder (Option B)
      #  run: |
      #    docker run --rm -e DOCKER_HOST="$DOCKER_HOST" \
      #      -v "$(pwd):/workspace" \
      #      "${STELLA_URL}/registry/stella-sbom-builder:latest" \
      #      build --image "${{ steps.vars.outputs.IMAGE }}" --output "/workspace/${{ steps.vars.outputs.SBOM_FILE }}"
      - name: Scan
        run: |
          docker run --rm -e DOCKER_HOST="$DOCKER_HOST" \
            -v "$(pwd)/${{ steps.vars.outputs.SBOM_FILE }}:/${{ steps.vars.outputs.SBOM_FILE }}:ro" \
            -e STELLA_OPS_URL="https://${STELLA_URL}" \
            "${STELLA_URL}/registry/stella-cli:latest" \
            scan --sbom "/${{ steps.vars.outputs.SBOM_FILE }}" "${{ steps.vars.outputs.IMAGE }}"
```
---
## 4·Docs CI (Gitea Actions & Offline Mirror)
StellaOps ships a dedicated Docs workflow at `.gitea/workflows/docs.yml`. When mirroring the pipeline offline or running it locally, install the same toolchain so markdown linting, schema validation, and HTML preview stay deterministic.
### 4.1 Toolchain bootstrap
```bash
# Node.js 20.x is required; install once per runner
npm install --no-save \
markdown-link-check \
remark-cli \
remark-preset-lint-recommended \
ajv \
ajv-cli \
ajv-formats
# Python 3.11+ powers the preview renderer
python -m pip install --upgrade pip
python -m pip install markdown pygments
```
> **No `pip` available?** Some hardened Python builds (including the repo's `tmp/docenv`
> interpreter) ship without `pip`/`ensurepip`. In that case download the pure-Python
> sdists (e.g. `Markdown-3.x.tar.gz`, `pygments-2.x.tar.gz`) and extract their
> packages directly into the virtualenv's `lib/python*/site-packages/` folder.
> This keeps the renderer working even when package managers are disabled.
**Offline tip.** Add the packages above to your artifact mirror (for example `ops/devops/offline-kit.json`) so runners can install them via `npm --offline` / `pip --no-index`.
### 4.2 Schema validation step
Ajv compiles every event schema to guard against syntax or format regressions. The workflow uses `ajv-formats` for UUID/date-time support.
```bash
for schema in docs/modules/signals/events/*.json; do
npx ajv compile -c ajv-formats -s "$schema"
done
```
Run this loop before committing schema changes. For new references, append `-r additional-file.json` so CI and local runs stay aligned.
### 4.3 Preview build
```bash
python scripts/render_docs.py --source docs --output artifacts/docs-preview --clean
```
Host the resulting bundle via any static file server for review (for example `python -m http.server`).
### 4.4 Publishing checklist
- [ ] Toolchain installs succeed without hitting the public internet (mirror or cached tarballs).
- [ ] Ajv validation passes for `scanner.report.ready@1`, `scheduler.rescan.delta@1`, `attestor.logged@1`.
- [ ] Markdown link check (`npx markdown-link-check`) reports no broken references.
- [ ] Preview bundle archived (or attached) for stakeholders.
### 4.5 Policy DSL lint stage
Policy Engine v2 pipelines now fail fast if policy documents are malformed. After checkout and dotnet restore, run:
```bash
dotnet run \
--project src/Tools/PolicyDslValidator/PolicyDslValidator.csproj \
-- \
--strict docs/modules/policy/samples/*.yaml
```
- `--strict` treats warnings as errors so missing metadata doesn't slip through.
- The validator accepts globs, so you can point it at tenant policy directories later (`policies/**/*.yaml`).
- Exit codes follow UNIX conventions: `0` success, `1` parse/errors, `2` warnings when `--strict` is set, `64` usage mistakes.
Capture the validator output as part of your build logs; Support uses it when triaging policy rollout issues.
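A sketch of wiring those exit codes into a shell step (bash-specific `PIPESTATUS`; the log file name is illustrative):
```bash
dotnet run --project src/Tools/PolicyDslValidator/PolicyDslValidator.csproj \
  -- --strict docs/modules/policy/samples/*.yaml | tee policy-lint.log
case "${PIPESTATUS[0]}" in
  0)  echo "policy lint: clean" ;;
  1)  echo "policy lint: parse errors" >&2; exit 1 ;;
  2)  echo "policy lint: warnings escalated by --strict" >&2; exit 1 ;;
  64) echo "policy lint: usage error" >&2; exit 64 ;;
esac
```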
### 4.6 Policy simulation smoke
Catch unexpected policy regressions by exercising a small set of golden SBOM findings via the simulation smoke tool:
```bash
dotnet run \
--project src/Tools/PolicySimulationSmoke/PolicySimulationSmoke.csproj \
-- \
--scenario-root samples/policy/simulations \
--output artifacts/policy-simulations
```
- The tool loads each `scenario.json` under `samples/policy/simulations`, evaluates the referenced policy, and fails the build if projected verdicts change.
- In CI the command runs twice (to `run1/` and `run2/`) and `diff -u` compares the summaries—any mismatch signals a determinism regression (see the sketch after this list).
- Artifacts land in `artifacts/policy-simulations/policy-simulation-summary.json`; upload them for later inspection (see CI workflow).
- Expand scenarios by copying real-world findings into the samples directory—ensure expected statuses are recorded so regressions trip the pipeline.
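A sketch of that double-run check; the per-run summary path is an assumption layered on the output directory documented above:
```bash
for run in run1 run2; do
  dotnet run --project src/Tools/PolicySimulationSmoke/PolicySimulationSmoke.csproj \
    -- --scenario-root samples/policy/simulations \
       --output "artifacts/policy-simulations/$run"
done
# Identical summaries are the determinism gate; any diff fails the job.
diff -u artifacts/policy-simulations/run1/policy-simulation-summary.json \
        artifacts/policy-simulations/run2/policy-simulation-summary.json
```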
---
## 5·Troubleshooting cheatsheet
| Symptom | Root cause | First things to try |
| ------------------------------------- | --------------------------- | --------------------------------------------------------------- |
| `no such host $STELLA_URL` | DNS typo or VPN outage | `ping $STELLA_URL` from runner |
| `connection refused` when CLI uploads | Port 443 blocked | open firewall / check ingress |
| `failed to stat /<sbom>.json` | SBOM wasn't produced | Did Option A actually run the builder? If not, enable Option B |
| `registry unauthorized` | Runner lacks registry creds | `docker login $STELLA_URL/registry` (store creds in CI secrets) |
| Non-zero scan exit | Blocking vuln/licence | Open project in Ops UI → triage or waive |
---
### Change log
* **2025-10-18** — Documented Docs CI toolchain (Ajv validation, static preview) and offline checklist.
* **2025-08-04** — Variable cleanup, removed Docker-socket & cache mounts, added Jenkins / CircleCI / Gitea examples, clarified Option B comment.

View File

@@ -0,0 +1,242 @@
# Facet Seal Admission Webhook Configuration
**Sprint:** SPRINT_20260105_002_004_CLI
**Task:** CLI-017 - Admission webhook configuration documentation
## Overview
The StellaOps Zastava admission webhook validates facet seals during Kubernetes pod admission. When enabled, it ensures that container images have valid facet seals and that any drift from the baseline is within acceptable quotas.
## Prerequisites
- Kubernetes cluster with admission webhook support
- StellaOps Zastava webhook deployed
- Certificate management for webhook TLS
- Network access from API server to webhook endpoint
## Enabling Facet Validation
Facet seal validation is opt-in per namespace using annotations.
### Namespace Annotation
Add the following annotation to enable facet validation:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  annotations:
    stellaops.io/facet-seal-required: "true"
```
### Annotation Values
| Value | Behavior |
|-------|----------|
| `"true"` | Facet seal validation enabled |
| `"false"` | Facet seal validation disabled |
| (not set) | Facet seal validation disabled |
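To opt an existing namespace in without re-applying its manifest (standard `kubectl annotate`; the namespace name is illustrative):
```bash
kubectl annotate namespace production \
  stellaops.io/facet-seal-required="true" --overwrite
```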
## Validation Behavior
When facet validation is enabled, the webhook performs these checks:
1. **Seal Lookup**: Load the facet seal for the image digest
2. **Signature Verification**: Verify the seal's DSSE signature (if present)
3. **Drift Computation**: Compare current image state against baseline seal
4. **Quota Evaluation**: Check drift against configured quotas
### Verdict Outcomes
| Verdict | Result | Description |
|---------|--------|-------------|
| `Ok` | Allow | Drift within quotas |
| `Warning` | Allow (with warning) | Approaching quota limits |
| `Blocked` | Deny | Quota exceeded |
| `RequiresVex` | Deny | Requires VEX authorization |
## Configuration Options
### Webhook Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zastava-webhook
  namespace: stellaops-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: zastava-webhook
  template:
    metadata:
      labels:
        app: zastava-webhook
    spec:
      containers:
        - name: webhook
          image: stellaops/zastava-webhook:latest
          ports:
            - containerPort: 8443
          env:
            - name: STELLAOPS_BACKEND_URL
              value: "https://api.stellaops.internal"
            - name: STELLAOPS_FACET_SEAL_STORE
              value: "remote"
          volumeMounts:
            - name: webhook-certs
              mountPath: /etc/webhook/certs
              readOnly: true
      volumes:
        - name: webhook-certs
          secret:
            secretName: zastava-webhook-certs
```
### ValidatingWebhookConfiguration
```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: stellaops-facet-admission
webhooks:
  - name: facet.admission.stellaops.io
    clientConfig:
      service:
        name: zastava-webhook
        namespace: stellaops-system
        path: /validate
      caBundle: ${CA_BUNDLE}
    rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
    namespaceSelector:
      matchExpressions:
        - key: stellaops.io/facet-seal-required
          operator: In
          values: ["true"]
    failurePolicy: Fail
    sideEffects: None
    admissionReviewVersions: ["v1"]
```
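`${CA_BUNDLE}` must hold the base64-encoded CA that signed the webhook's serving certificate. One way to substitute it before applying (a sketch; it assumes the CA lives in the `zastava-webhook-certs` secret under `ca.crt` and that the manifest above is saved as `webhook-config.yaml`):
```bash
CA_BUNDLE=$(kubectl get secret zastava-webhook-certs -n stellaops-system \
  -o jsonpath='{.data.ca\.crt}')   # secret data is already base64-encoded
export CA_BUNDLE
envsubst '${CA_BUNDLE}' < webhook-config.yaml | kubectl apply -f -
```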
## Quota Configuration
Facet drift quotas can be configured per namespace or globally.
### Global Quotas (ConfigMap)
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: stellaops-facet-quotas
  namespace: stellaops-system
data:
  default.yaml: |
    quotas:
      runtime:
        warningThreshold: 10
        blockThreshold: 25
        vexThreshold: 50
      config:
        warningThreshold: 20
        blockThreshold: 40
        vexThreshold: 60
      static:
        warningThreshold: 30
        blockThreshold: 50
        vexThreshold: 75
```
### Per-Namespace Overrides
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: staging
  annotations:
    stellaops.io/facet-seal-required: "true"
    stellaops.io/facet-quota-runtime-warn: "20"
    stellaops.io/facet-quota-runtime-block: "50"
```
## Troubleshooting
### Common Issues
#### Seal Not Found
```
Admission denied: facet.seal.missing
No facet seal found for image sha256:abc123...
```
**Resolution**: Ensure the image was sealed before deployment:
```bash
stella seal sha256:abc123 --store
```
#### Invalid Signature
```
Admission denied: facet.seal.invalid_signature
Facet seal signature verification failed
```
**Resolution**: Verify the seal was signed with a trusted key and the trust roots are configured.
#### Quota Exceeded
```
Admission denied: facet.quota.exceeded
Facet quota exceeded: runtime(45.2%)
```
**Resolution**: Either:
1. Re-seal the image with current state
2. Generate and approve a VEX authorization
3. Adjust quota thresholds
### Debugging
Enable verbose logging:
```yaml
env:
  - name: STELLAOPS_LOG_LEVEL
    value: "Debug"
```
View webhook logs:
```bash
kubectl logs -n stellaops-system -l app=zastava-webhook
```
## Metrics
The webhook exposes Prometheus metrics:
| Metric | Type | Description |
|--------|------|-------------|
| `stellaops_facet_admission_total` | Counter | Total admission requests |
| `stellaops_facet_admission_allowed` | Counter | Allowed admissions |
| `stellaops_facet_admission_denied` | Counter | Denied admissions |
| `stellaops_facet_drift_percent` | Histogram | Drift percentage distribution |
| `stellaops_facet_validation_duration_seconds` | Histogram | Validation latency |
## Related Documentation
- [stella seal Command](../commands/seal.md)
- [stella vex gen --from-drift](../commands/vex.md#stella-vex-gen---from-drift)
- [Facet Drift Analysis](../commands/facet-drift.md)

View File

@@ -0,0 +1,267 @@
# stella drift (Facet Analysis) - Command Guide
**Sprint:** SPRINT_20260105_002_004_CLI
**Task:** CLI-016 - Facet drift command documentation
## Overview
The `stella drift` command analyzes facet drift between a baseline seal and the current state of a container image. Unlike reachability drift (which tracks call paths to vulnerable code), facet drift tracks file-level changes within categorized image layers.
## Commands
### stella drift
Analyze facet drift for an image against a baseline seal.
```bash
stella drift <IMAGE> [OPTIONS]
```
#### Arguments
| Argument | Description |
|----------|-------------|
| `IMAGE` | Image reference or digest to analyze (required) |
#### Options
| Option | Alias | Description | Default |
|--------|-------|-------------|---------|
| `--baseline <ID>` | `-b` | Baseline seal ID for comparison | latest seal |
| `--format <FMT>` | `-f` | Output format: `table`, `json`, `yaml` | `table` |
| `--verbose` | `-v` | Show detailed file changes | `false` |
| `--fail-on-breach` | | Exit with error code if quota breached | `false` |
#### Examples
##### Basic drift analysis
```bash
stella drift sha256:abc123def456...
```
##### With specific baseline
```bash
stella drift myregistry.io/app:v2.0 --baseline seal-xyz789
```
##### JSON output for CI integration
```bash
stella drift sha256:abc123 --format json > drift-report.json
```
##### Fail build on quota breach
```bash
stella drift sha256:abc123 --fail-on-breach
```
##### Verbose output with file details
```bash
stella drift sha256:abc123 --verbose
```
---
## Output Formats
### Table Format (Default)
```
Overall Verdict: Warning
Total Changed Files: 15
+----------+-------+---------+----------+---------+-----------+
| Facet | Added | Removed | Modified | Churn % | Verdict |
+----------+-------+---------+----------+---------+-----------+
| runtime | 2 | 1 | 3 | 12.5% | Warning |
| config | 5 | 0 | 2 | 8.2% | Ok |
| static | 0 | 2 | 0 | 3.1% | Ok |
+----------+-------+---------+----------+---------+-----------+
```
With `--verbose`:
```
File Changes:
  runtime
    + /usr/lib/libcrypto.so.3.0.1
    + /usr/lib/libssl.so.3.0.1
    - /usr/lib/libcrypto.so.3.0.0
    ~ /usr/bin/app (sha256:old -> sha256:new)
    ~ /etc/app/config.yaml
    ~ /var/lib/app/data.db
```
### JSON Format
```json
{
  "imageDigest": "sha256:abc123...",
  "baselineSealId": "seal-xyz789",
  "analyzedAt": "2026-01-05T10:30:00Z",
  "overallVerdict": "warning",
  "totalChangedFiles": 15,
  "facetDrifts": [
    {
      "facetId": "runtime",
      "baselineFileCount": 48,
      "added": [
        {
          "path": "/usr/lib/libcrypto.so.3.0.1",
          "digest": "sha256:new...",
          "sizeBytes": 3145728,
          "modifiedAt": null
        }
      ],
      "removed": [
        {
          "path": "/usr/lib/libcrypto.so.3.0.0",
          "digest": "sha256:old...",
          "sizeBytes": 3145600,
          "modifiedAt": null
        }
      ],
      "modified": [
        {
          "path": "/usr/bin/app",
          "previousDigest": "sha256:prev...",
          "currentDigest": "sha256:curr...",
          "previousSizeBytes": 15728640,
          "currentSizeBytes": 15730000
        }
      ],
      "driftScore": 25.5,
      "churnPercent": 12.5,
      "quotaVerdict": "warning"
    }
  ]
}
```
### YAML Format
```yaml
imageDigest: sha256:abc123...
baselineSealId: seal-xyz789
overallVerdict: warning
totalChangedFiles: 15
facetDrifts:
  - facetId: runtime
    added: 2
    removed: 1
    modified: 3
    churnPercent: 12.50
    verdict: warning
  - facetId: config
    added: 5
    removed: 0
    modified: 2
    churnPercent: 8.20
    verdict: ok
```
---
## Quota Verdicts
| Verdict | Description | Exit Code |
|---------|-------------|-----------|
| `Ok` | Drift within acceptable limits | 0 |
| `Warning` | Approaching quota limits | 0 |
| `Blocked` | Quota exceeded, deployment should be blocked | 2 |
| `RequiresVex` | Significant drift, requires VEX authorization | 2 |
## Exit Codes
| Code | Description |
|------|-------------|
| `0` | Success (no breach, or breach without `--fail-on-breach`) |
| `1` | Error (no baseline seal, image not found, etc.) |
| `2` | Quota breached (with `--fail-on-breach`) |
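A sketch of consuming those codes in a deploy script (variable names are illustrative):
```bash
stella drift "$IMAGE_DIGEST" --fail-on-breach --format json > drift.json
status=$?
case "$status" in
  0) echo "drift within quota; continuing deploy" ;;
  2) echo "quota breached; generate a VEX or re-seal" >&2; exit 2 ;;
  *) echo "drift analysis failed (exit $status)" >&2; exit 1 ;;
esac
```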
---
## CI/CD Integration
### GitHub Actions
```yaml
- name: Check Facet Drift
  run: |
    stella drift ${{ env.IMAGE_DIGEST }} \
      --format json \
      --fail-on-breach > drift.json
  continue-on-error: true
  id: drift-check
- name: Upload Drift Report
  uses: actions/upload-artifact@v4
  with:
    name: facet-drift-report
    path: drift.json
- name: Generate VEX if needed
  # continue-on-error keeps the job green, so gate on the step outcome.
  if: steps.drift-check.outcome == 'failure'
  run: |
    stella vex gen --from-drift \
      --image ${{ env.IMAGE_DIGEST }} \
      --output vex-request.json
```
### GitLab CI
```yaml
facet-drift-check:
  script:
    - stella drift $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA --fail-on-breach --format json > drift.json
  artifacts:
    paths:
      - drift.json
    reports:
      codequality: drift.json
  allow_failure: true
```
---
## Workflow: Handling Drift Breaches
When drift exceeds quotas:
1. **Review the drift report**
   ```bash
   stella drift sha256:abc123 --verbose
   ```
2. **Determine if changes are intentional**
   - Legitimate updates: Generate VEX authorization
   - Unexpected changes: Investigate and remediate
3. **For intentional changes, generate VEX**
   ```bash
   stella vex gen --from-drift --image sha256:abc123 --output vex.json
   ```
4. **Review and sign the VEX**
   ```bash
   stella vex sign --input vex.json --key /path/to/key
   ```
5. **Or re-seal to establish a new baseline**
   ```bash
   stella seal sha256:abc123 --store
   ```
---
## Related Documentation
- [Facet Seal Command](./seal.md)
- [VEX Generation from Drift](./vex.md#stella-vex-gen---from-drift)
- [Reachability Drift (different concept)](./drift.md)
- [Admission Webhook Configuration](../admin/admission-webhook.md)

View File

@@ -0,0 +1,204 @@
# stella seal - Command Guide
**Sprint:** SPRINT_20260105_002_004_CLI
**Task:** CLI-016 - Facet seal command documentation
## Overview
The `stella seal` command creates cryptographic seals for container image facets. A facet seal captures the state of specific file categories (binaries, libraries, configs, etc.) within an image and produces Merkle roots for tamper detection and drift analysis.
## Commands
### stella seal
Create a facet seal for an image.
```bash
stella seal <IMAGE> [OPTIONS]
```
#### Arguments
| Argument | Description |
|----------|-------------|
| `IMAGE` | Image reference or digest to seal (required) |
#### Options
| Option | Alias | Description | Default |
|--------|-------|-------------|---------|
| `--output <PATH>` | `-o` | Output file path for seal | stdout |
| `--store` | `-s` | Store seal in remote API | `true` |
| `--sign` | | Sign seal with DSSE | `true` |
| `--key <PATH>` | `-k` | Private key path for signing | configured key |
| `--facets <LIST>` | `-f` | Specific facets to seal (comma-separated) | all |
| `--format <FMT>` | | Output format: `json`, `yaml`, `compact` | `json` |
| `--verbose` | `-v` | Enable verbose output | `false` |
#### Examples
##### Seal all facets
```bash
stella seal sha256:abc123def456...
```
##### Seal specific facets
```bash
stella seal myregistry.io/app:v1.0 --facets runtime,config
```
##### Output to file
```bash
stella seal myregistry.io/app:v1.0 --output seal.json
```
##### Seal without storing remotely
```bash
stella seal sha256:abc123 --no-store --output local-seal.json
```
##### Seal with custom signing key
```bash
stella seal sha256:abc123 --key /path/to/private.key
```
---
## Built-in Facets
| Facet ID | Name | Description | File Patterns |
|----------|------|-------------|---------------|
| `runtime` | Runtime Binaries | Executable binaries and shared libraries | `*.so`, `*.dll`, `/usr/bin/*` |
| `config` | Configuration | Configuration files | `*.conf`, `*.yaml`, `*.json`, `/etc/*` |
| `static` | Static Assets | Static web assets | `*.css`, `*.js`, `*.html` |
| `scripts` | Scripts | Script files | `*.sh`, `*.py`, `*.rb` |
| `data` | Data Files | Data and cache files | `*.db`, `*.sqlite`, `/var/lib/*` |
---
## Output Formats
### JSON Format (Default)
```json
{
  "imageDigest": "sha256:abc123...",
  "createdAt": "2026-01-05T10:30:00Z",
  "combinedMerkleRoot": "sha256:combined...",
  "facets": [
    {
      "facetId": "runtime",
      "name": "Runtime Binaries",
      "merkleRoot": "sha256:facet...",
      "fileCount": 42,
      "totalBytes": 15728640
    }
  ],
  "signature": {
    "payloadType": "application/vnd.stellaops.facetseal+json",
    "signatures": [...]
  }
}
```
### YAML Format
```yaml
imageDigest: sha256:abc123...
createdAt: 2026-01-05T10:30:00Z
combinedMerkleRoot: sha256:combined...
facets:
  - facetId: runtime
    merkleRoot: sha256:facet...
    fileCount: 42
```
### Compact Format
Single-line format for scripting:
```
sha256:abc123...|sha256:combined...|5
```
Format: `imageDigest|combinedRoot|facetCount`
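A parsing sketch for the compact line (it assumes the command emits exactly the three fields documented above):
```bash
# Split the pipe-delimited compact output into named fields.
IFS='|' read -r image_digest combined_root facet_count \
  <<< "$(stella seal "$IMAGE" --format compact)"
echo "sealed $facet_count facets of $image_digest (root $combined_root)"
```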
---
## Exit Codes
| Code | Description |
|------|-------------|
| `0` | Success |
| `1` | General error |
| `2` | Image resolution failed |
| `3` | Signing failed |
| `4` | Storage failed |
---
## Environment Variables
| Variable | Description |
|----------|-------------|
| `STELLAOPS_BACKEND_URL` | Backend API URL for seal storage |
| `STELLAOPS_SIGNING_KEY` | Default signing key path |
| `STELLAOPS_TRUST_ROOTS` | Trust roots for verification |
---
## CI/CD Integration
### GitHub Actions
```yaml
- name: Seal Container Image
  run: |
    stella seal ${{ env.IMAGE_DIGEST }} \
      --output seal.json \
      --store
- name: Upload Seal Artifact
  uses: actions/upload-artifact@v4
  with:
    name: facet-seal
    path: seal.json
```
### GitLab CI
```yaml
seal-image:
  script:
    - stella seal $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA --output seal.json
  artifacts:
    paths:
      - seal.json
```
---
## Admission Integration
When Kubernetes admission is configured with facet seal validation, the webhook will:
1. Check if namespace has `stellaops.io/facet-seal-required=true` annotation
2. Load the seal for the image being deployed
3. Verify the seal signature
4. Compute drift against current image state
5. Admit/reject based on quota verdicts
See [Admission Webhook Configuration](../admin/admission-webhook.md) for setup details.
---
## Related Documentation
- [Facet Drift Analysis](./facet-drift.md)
- [VEX Generation from Drift](./vex.md#stella-vex-gen---from-drift)
- [Admission Webhook](../admin/admission-webhook.md)

View File

@@ -1,9 +1,11 @@
# stella vex - Command Guide
## Commands
- `stella vex consensus --query <filter> [--output json|ndjson|table] [--offline]`
- `stella vex get --id <consensusId> [--offline]`
- `stella vex simulate --input <vexDocs> --policy <policyConfig> [--offline]`
- `stella vex gen --from-drift --image <IMAGE> [--baseline <SEAL_ID>] [--output <PATH>]`
## Flags (common)
- `--offline`: use cached consensus snapshots; fail with exit code 5 if remote would be hit.
@@ -21,3 +23,126 @@
## Offline/air-gap notes
- Cached snapshots are required when `--offline`; otherwise exit code 5 with remediation message.
- Trust roots for signature verification are loaded from `STELLA_TRUST_ROOTS` when verifying cached snapshots.
---
## stella vex gen --from-drift
**Sprint:** SPRINT_20260105_002_004_CLI
Generate VEX statements from facet drift analysis. This command analyzes drift between a baseline seal and the current image state, then generates OpenVEX documents for facets that require authorization.
### Usage
```bash
stella vex gen --from-drift --image <IMAGE> [OPTIONS]
```
### Required Options
| Option | Alias | Description |
|--------|-------|-------------|
| `--from-drift` | | Enable drift-based VEX generation |
| `--image <REF>` | `-i` | Image reference or digest to analyze |
### Optional Options
| Option | Alias | Description | Default |
|--------|-------|-------------|---------|
| `--baseline <ID>` | `-b` | Baseline seal ID for comparison | latest seal |
| `--output <PATH>` | `-o` | Output file path | stdout |
| `--format <FMT>` | `-f` | VEX format: `openvex`, `csaf` | `openvex` |
| `--status <STATUS>` | `-s` | VEX status: `under_investigation`, `not_affected`, `affected` | `under_investigation` |
| `--verbose` | `-v` | Enable verbose output | `false` |
### Examples
#### Generate VEX from drift
```bash
stella vex gen --from-drift --image sha256:abc123
```
#### Specify baseline seal
```bash
stella vex gen --from-drift --image myregistry.io/app:v2.0 --baseline seal-xyz789
```
#### Output to file with specific status
```bash
stella vex gen --from-drift --image sha256:abc123 \
--output vex-authorization.json \
--status not_affected
```
### Output Format (OpenVEX)
```json
{
  "@context": "https://openvex.dev/ns",
  "@id": "https://stellaops.io/vex/abc123-def456",
  "author": "StellaOps CLI",
  "timestamp": "2026-01-05T10:30:00Z",
  "version": 1,
  "statements": [
    {
      "@id": "vex:statement-1",
      "status": "under_investigation",
      "timestamp": "2026-01-05T10:30:00Z",
      "products": [
        {
          "@id": "sha256:abc123...",
          "identifiers": {
            "facet": "runtime"
          }
        }
      ],
      "justification": "Facet drift authorization for runtime. Churn: 15.50% (3 added, 1 removed, 2 modified)",
      "action_statement": "Review required before deployment"
    }
  ]
}
```
### Exit Codes
| Code | Description |
|------|-------------|
| `0` | Success |
| `1` | Error or no baseline seal found |
| `2` | Image resolution failed |
### Workflow Integration
The `vex gen --from-drift` command is typically used in a deployment pipeline:
1. **Build**: Container image is built
2. **Seal**: `stella seal` creates baseline seal at build time
3. **Deploy**: Deployment triggers admission webhook
4. **Drift Detection**: If drift exceeds quota, deployment is blocked
5. **VEX Generation**: `stella vex gen --from-drift` creates authorization document
6. **Review**: Security team reviews and signs VEX
7. **Retry Deploy**: With VEX in place, deployment proceeds
```bash
# After deployment blocked due to drift
stella vex gen --from-drift --image $IMAGE_DIGEST \
--output vex-authorization.json
# Review and sign the VEX document
stella vex sign --input vex-authorization.json --key $SIGNING_KEY
# Ingest the signed VEX
stella vex ingest --input vex-authorization.signed.json
# Retry deployment (webhook will now accept)
kubectl apply -f deployment.yaml
```
### Related Documentation
- [Facet Seal Command](./seal.md)
- [Facet Drift Analysis](./facet-drift.md)
- [Admission Webhook Configuration](../admin/admission-webhook.md)

View File

@@ -2,9 +2,44 @@
Per SPRINT_8200_0014_0003.
> **Related:** [Bundle Export Format](federation-bundle-export.md) for detailed bundle schema.
## Overview
Federation enables secure, cursor-based synchronization of canonical vulnerability advisories between StellaOps sites. It supports:
- **Delta exports**: Only changed records since the last cursor are included
- **Air-gap transfers**: Bundles can be written to files for offline transfer
- **Multi-site topology**: Multiple sites can synchronize independently
- **Cryptographic verification**: DSSE signatures ensure bundle authenticity
## Bundle Format
Federation bundles are ZST-compressed TAR archives containing:
| File | Description |
|------|-------------|
| `MANIFEST.json` | Bundle metadata, cursor, counts, hash |
| `canonicals.ndjson` | Canonical advisories (one per line) |
| `edges.ndjson` | Source edges linking advisories to sources |
| `deletions.ndjson` | Withdrawn/deleted advisory IDs |
| `SIGNATURE.json` | Optional DSSE signature envelope |
## Cursor Format
Cursors use ISO-8601 timestamp with sequence number:
```
{ISO-8601 timestamp}#{sequence number}
Examples:
2025-01-15T10:00:00.000Z#0001
2025-01-15T10:00:00.000Z#0002
```
- Cursors are site-specific (each site maintains independent cursors)
- Sequence numbers distinguish concurrent exports
- Cursors are monotonically increasing within a site
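Because the timestamp component is ISO-8601 UTC, cursors within a site order lexicographically. A small bash sketch (`LAST_CURSOR` is assumed to hold the previously imported cursor):
```bash
cursor='2025-01-15T10:00:00.000Z#0001'
ts="${cursor%%#*}"    # ISO-8601 timestamp part
seq="${cursor##*#}"   # zero-padded sequence part
# Fixed-width timestamp + zero-padded sequence means plain string
# comparison gives the correct ordering.
if [[ "$cursor" > "$LAST_CURSOR" ]]; then
  echo "bundle cursor advances ours; safe to import"
fi
```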
## Architecture
@@ -384,3 +419,80 @@ stella feedser canonical get sha256:mergehash...
6. **Maintain Key Trust:** Regularly rotate and verify federation signing keys
7. **Document Site Policies:** Keep a registry of trusted sites and their policies
## Multi-Site Topologies
### Hub-and-Spoke Topology
```
          ┌─────────────┐
          │  Hub Site   │
          │  (Primary)  │
          └──────┬──────┘
     ┌───────────┼───────────┐
     ▼           ▼           ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│  Site A  │ │  Site B  │ │  Site C  │
│ (Spoke)  │ │ (Spoke)  │ │ (Spoke)  │
└──────────┘ └──────────┘ └──────────┘
```
### Mesh Topology
Each site can import from multiple sources for redundancy:
```yaml
federation:
  import:
    allowed_sites:
      - "hub-primary"
      - "hub-secondary"   # Redundancy
```
## Verification Details
### Hash Verification
Bundle hash is computed over compressed content:
```
SHA256(compressed bundle content)
```
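A verification sketch (it assumes `EXPECTED_HASH` arrived out-of-band, for example from the export site's preview output):
```bash
# Recompute the hash over the compressed bundle content and compare.
actual=$(sha256sum bundle.zst | awk '{print $1}')
[ "$actual" = "$EXPECTED_HASH" ] || { echo "bundle hash mismatch" >&2; exit 1; }
```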
### DSSE Signature Format
DSSE envelope contains:
```json
{
  "payloadType": "application/stellaops.federation.bundle+json",
  "payload": "base64(bundle_hash + site_id + cursor)",
  "signatures": [
    {
      "keyId": "signing-key-001",
      "algorithm": "ES256",
      "signature": "base64(signature)"
    }
  ]
}
```
## Monitoring Metrics
### Key Prometheus Metrics
- `federation_export_duration_seconds` - Export time
- `federation_import_duration_seconds` - Import time
- `federation_bundle_size_bytes` - Bundle sizes
- `federation_items_processed_total` - Items processed by type
- `federation_conflicts_total` - Merge conflicts encountered
## Security Considerations
1. **Never skip signature verification in production**
2. **Validate allowed_sites whitelist**
3. **Use TLS for API endpoints**
4. **Rotate signing keys periodically**
5. **Audit import events**
6. **Monitor for duplicate bundle imports**

View File

@@ -1,332 +0,0 @@
# Federation Setup and Operations Guide
This guide covers the setup and operation of StellaOps federation for multi-site vulnerability data synchronization.
## Overview
Federation enables secure, cursor-based synchronization of canonical vulnerability advisories between StellaOps sites. It supports:
- **Delta exports**: Only changed records since the last cursor are included
- **Air-gap transfers**: Bundles can be written to files for offline transfer
- **Multi-site topology**: Multiple sites can synchronize independently
- **Cryptographic verification**: DSSE signatures ensure bundle authenticity
## Architecture
```
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   Site A    │─────▶│   Bundle    │─────▶│   Site B    │
│  (Export)   │      │   (.zst)    │      │  (Import)   │
└─────────────┘      └──────┬──────┘      └─────────────┘
                            │
                            ▼
                     ┌───────────┐
                     │  Site C   │
                     │ (Import)  │
                     └───────────┘
```
## Bundle Format
Federation bundles are ZST-compressed TAR archives containing:
| File | Description |
|------|-------------|
| `MANIFEST.json` | Bundle metadata, cursor, counts, hash |
| `canonicals.ndjson` | Canonical advisories (one per line) |
| `edges.ndjson` | Source edges linking advisories to sources |
| `deletions.ndjson` | Withdrawn/deleted advisory IDs |
| `SIGNATURE.json` | Optional DSSE signature envelope |
## Configuration
### Export Site Configuration
```yaml
# concelier.yaml
federation:
  enabled: true
  site_id: "us-west-1"             # Unique site identifier
  export:
    enabled: true
    default_compression_level: 3   # ZST level (1-19)
    sign_bundles: true             # Sign exported bundles
    max_items_per_bundle: 10000    # Maximum items per export
```
### Import Site Configuration
```yaml
# concelier.yaml
federation:
  enabled: true
  site_id: "eu-central-1"
  import:
    enabled: true
    skip_signature_verification: false   # NEVER set true in production
    allowed_sites:                       # Trusted site IDs
      - "us-west-1"
      - "ap-south-1"
    conflict_resolution: "prefer_remote" # prefer_remote | prefer_local | fail
    force_cursor_validation: true        # Reject out-of-order imports
```
## API Endpoints
### Export Endpoints
```bash
# Export delta bundle since cursor
GET /api/v1/federation/export?since_cursor={cursor}
# Preview export (counts only)
GET /api/v1/federation/export/preview?since_cursor={cursor}
# Get federation status
GET /api/v1/federation/status
```
### Import Endpoints
```bash
# Import bundle
POST /api/v1/federation/import
Content-Type: application/zstd
# Validate bundle without importing
POST /api/v1/federation/validate
Content-Type: application/zstd
# List federated sites
GET /api/v1/federation/sites
# Update site policy
PUT /api/v1/federation/sites/{site_id}/policy
```
## CLI Commands
### Export Operations
```bash
# Export full bundle (no cursor = all data)
feedser bundle export --output bundle.zst
# Export delta since last cursor
feedser bundle export --since-cursor "2025-01-15T10:00:00Z#0001" --output delta.zst
# Preview export without creating bundle
feedser bundle preview --since-cursor "2025-01-15T10:00:00Z#0001"
# Export without signing (testing only)
feedser bundle export --no-sign --output unsigned.zst
```
### Import Operations
```bash
# Import bundle
feedser bundle import bundle.zst
# Dry run (validate without importing)
feedser bundle import bundle.zst --dry-run
# Import from stdin (pipe)
cat bundle.zst | feedser bundle import -
# Force import (skip cursor validation)
feedser bundle import bundle.zst --force
```
### Site Management
```bash
# List federated sites
feedser sites list
# Show site details
feedser sites show us-west-1
# Enable/disable site
feedser sites enable ap-south-1
feedser sites disable ap-south-1
```
## Cursor Format
Cursors use ISO-8601 timestamp with sequence number:
```
{ISO-8601 timestamp}#{sequence number}
Examples:
2025-01-15T10:00:00.000Z#0001
2025-01-15T10:00:00.000Z#0002
```
- Cursors are site-specific (each site maintains independent cursors)
- Sequence numbers distinguish concurrent exports
- Cursors are monotonically increasing within a site
## Air-Gap Transfer Workflow
For environments without network connectivity:
```bash
# On Source Site (connected to authority)
feedser bundle export --since-cursor "$LAST_CURSOR" --output /media/usb/bundle.zst
feedser bundle preview --since-cursor "$LAST_CURSOR" > /media/usb/manifest.txt
# Transfer media to target site...
# On Target Site (air-gapped)
feedser bundle import /media/usb/bundle.zst --dry-run # Validate first
feedser bundle import /media/usb/bundle.zst # Import
```
## Multi-Site Synchronization
### Hub-and-Spoke Topology
```
          ┌─────────────┐
          │  Hub Site   │
          │  (Primary)  │
          └──────┬──────┘
     ┌───────────┼───────────┐
     ▼           ▼           ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│  Site A  │ │  Site B  │ │  Site C  │
│ (Spoke)  │ │ (Spoke)  │ │ (Spoke)  │
└──────────┘ └──────────┘ └──────────┘
```
### Mesh Topology
Each site can import from multiple sources:
```yaml
federation:
  import:
    allowed_sites:
      - "hub-primary"
      - "hub-secondary"   # Redundancy
```
## Merge Behavior
### Conflict Resolution
When importing, conflicts are resolved based on configuration:
| Strategy | Behavior |
|----------|----------|
| `prefer_remote` | Remote (bundle) value wins (default) |
| `prefer_local` | Local value preserved |
| `fail` | Import aborts on any conflict |
### Merge Actions
| Action | Description |
|--------|-------------|
| `Created` | New canonical added |
| `Updated` | Existing canonical updated |
| `Skipped` | No change needed (identical) |
## Verification
### Hash Verification
Bundle hash is computed over compressed content:
```
SHA256(compressed bundle content)
```
### Signature Verification
DSSE envelope contains:
```json
{
  "payloadType": "application/stellaops.federation.bundle+json",
  "payload": "base64(bundle_hash + site_id + cursor)",
  "signatures": [
    {
      "keyId": "signing-key-001",
      "algorithm": "ES256",
      "signature": "base64(signature)"
    }
  ]
}
```
## Monitoring
### Key Metrics
- `federation_export_duration_seconds` - Export time
- `federation_import_duration_seconds` - Import time
- `federation_bundle_size_bytes` - Bundle sizes
- `federation_items_processed_total` - Items processed by type
- `federation_conflicts_total` - Merge conflicts encountered
### Health Checks
```bash
# Check federation status
curl http://localhost:5000/api/v1/federation/status
# Response
{
  "site_id": "us-west-1",
  "export_enabled": true,
  "import_enabled": true,
  "last_export": "2025-01-15T10:00:00Z",
  "last_import": "2025-01-15T09:30:00Z",
  "sites_synced": 2
}
```
## Troubleshooting
### Common Issues
**Import fails with "cursor validation failed"**
- Bundle cursor is not after current site cursor
- Use `--force` to override (not recommended)
- Check if bundle was already imported
**Signature verification failed**
- Signing key not trusted on target site
- Key expired or revoked
- Use `--skip-signature` for testing only
**Large bundle timeout**
- Increase `federation.export.timeout`
- Use smaller `max_items_per_bundle`
- Stream directly to file
### Debug Logging
```yaml
logging:
  level:
    StellaOps.Concelier.Federation: Debug
```
## Security Considerations
1. **Never skip signature verification in production**
2. **Validate allowed_sites whitelist**
3. **Use TLS for API endpoints**
4. **Rotate signing keys periodically**
5. **Audit import events**
6. **Monitor for duplicate bundle imports**
## Related Documentation
- [Bundle Export Format](federation-bundle-export.md)
- [Sync Ledger Schema](../db/sync-ledger.md)
- [Signing Configuration](../security/signing.md)

View File

@@ -1,238 +1,238 @@
# Concelier & Excititor Mirror Operations
This runbook describes how StellaOps operates the managed mirrors under `*.stella-ops.org`.
It covers Docker Compose and Helm deployment overlays, secret handling for multi-tenant
authn, CDN fronting, and the recurring sync pipeline that keeps mirror bundles current.
## 1. Prerequisites
- **Authority access** — client credentials (`client_id` + secret) authorised for
  `concelier.mirror.read` and `excititor.mirror.read` scopes. Secrets live outside git.
- **Signed TLS certificates** — wildcard or per-domain (`mirror-primary`, `mirror-community`).
  Store them under `deploy/compose/mirror-gateway/tls/` or in Kubernetes secrets.
- **Mirror gateway credentials** — Basic Auth htpasswd files per domain. Generate with
  `htpasswd -B`. Operators distribute credentials to downstream consumers.
- **Export artifact source** — read access to the canonical S3 buckets (or rsync share)
  that hold `concelier` JSON bundles and `excititor` VEX exports.
- **Persistent volumes** — storage for Concelier job metadata and mirror export trees.
  For Helm, provision PVCs (`concelier-mirror-jobs`, `concelier-mirror-exports`,
  `excititor-mirror-exports`, `mirror-mongo-data`, `mirror-minio-data`) before rollout.
### 1.1 Service configuration quick reference
Concelier.WebService exposes the mirror HTTP endpoints once `CONCELIER__MIRROR__ENABLED=true`.
Key knobs:
- `CONCELIER__MIRROR__EXPORTROOT` — root folder containing export snapshots (`<exportId>/mirror/*`).
- `CONCELIER__MIRROR__ACTIVEEXPORTID` — optional explicit export id; otherwise the service auto-falls back to the `latest/` symlink or newest directory.
- `CONCELIER__MIRROR__REQUIREAUTHENTICATION` — default auth requirement; override per domain with `CONCELIER__MIRROR__DOMAINS__{n}__REQUIREAUTHENTICATION`.
- `CONCELIER__MIRROR__MAXINDEXREQUESTSPERHOUR` — budget for `/concelier/exports/index.json`. Domains inherit this value unless they define `__MAXDOWNLOADREQUESTSPERHOUR`.
- `CONCELIER__MIRROR__DOMAINS__{n}__ID` — domain identifier matching the exporter manifest; additional keys configure display name and rate budgets.
> The service honours Stella Ops Authority when `CONCELIER__AUTHORITY__ENABLED=true` and `ALLOWANONYMOUSFALLBACK=false`. Use the bypass CIDR list (`CONCELIER__AUTHORITY__BYPASSNETWORKS__*`) for in-cluster ingress gateways that terminate Basic Auth. Unauthorized requests emit `WWW-Authenticate: Bearer` so downstream automation can detect token failures.
Mirror responses carry deterministic cache headers: `/index.json` returns `Cache-Control: public, max-age=60`, while per-domain manifests/bundles include `Cache-Control: public, max-age=300, immutable`. Rate limiting surfaces `Retry-After` when quotas are exceeded.
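A minimal environment sketch for a single-domain mirror (values are illustrative; the variable names follow the knobs above):
```bash
export CONCELIER__MIRROR__ENABLED=true
export CONCELIER__MIRROR__EXPORTROOT=/var/lib/concelier/exports
export CONCELIER__MIRROR__MAXINDEXREQUESTSPERHOUR=600
export CONCELIER__MIRROR__DOMAINS__0__ID=primary
export CONCELIER__MIRROR__DOMAINS__0__REQUIREAUTHENTICATION=true
```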
### 1.2 Mirror connector configuration
Downstream Concelier instances ingest published bundles using the `StellaOpsMirrorConnector`. Operators running the connector in air-gapped or limited-connectivity environments can tune the following options (environment prefix `CONCELIER__SOURCES__STELLAOPSMIRROR__`):
- `BASEADDRESS`: absolute mirror root (e.g., `https://mirror-primary.stella-ops.org`).
- `INDEXPATH`: relative path to the mirror index (`/concelier/exports/index.json` by default).
- `DOMAINID`: mirror domain identifier from the index (`primary`, `community`, etc.).
- `HTTPTIMEOUT`: request timeout; raise it when mirrors sit behind slow WAN links.
- `SIGNATURE__ENABLED`: require detached JWS verification for `bundle.json`.
- `SIGNATURE__KEYID` / `SIGNATURE__PROVIDER`: expected signing key metadata.
- `SIGNATURE__PUBLICKEYPATH`: PEM fallback used when the mirror key registry is offline.
The connector keeps a per-export fingerprint (bundle digest + generated-at timestamp) and tracks outstanding document IDs. If a scan is interrupted, the next run resumes parse/map work using the stored fingerprint and pending document lists—no network requests are reissued unless the upstream digest changes.
## 2. Secret & certificate layout
### Docker Compose (`deploy/compose/docker-compose.mirror.yaml`)
- `deploy/compose/env/mirror.env.example`: copy to `.env` and adjust quotas or domain IDs.
- `deploy/compose/mirror-secrets/`: mounted read-only into `/run/secrets`. Place:
  - `concelier-authority-client`: Authority client secret.
  - `excititor-authority-client` (optional): reserved for future authn.
- `deploy/compose/mirror-gateway/tls/`: PEM-encoded cert/key pairs:
  - `mirror-primary.crt`, `mirror-primary.key`
  - `mirror-community.crt`, `mirror-community.key`
- `deploy/compose/mirror-gateway/secrets/`: htpasswd files:
  - `mirror-primary.htpasswd`
  - `mirror-community.htpasswd`
### Helm (`deploy/helm/stellaops/values-mirror.yaml`)
Create secrets in the target namespace:
```bash
kubectl create secret generic concelier-mirror-auth \
--from-file=concelier-authority-client=concelier-authority-client
kubectl create secret generic excititor-mirror-auth \
--from-file=excititor-authority-client=excititor-authority-client
kubectl create secret tls mirror-gateway-tls \
--cert=mirror-primary.crt --key=mirror-primary.key
kubectl create secret generic mirror-gateway-htpasswd \
--from-file=mirror-primary.htpasswd --from-file=mirror-community.htpasswd
```
> Keep Basic Auth lists short-lived (rotate quarterly) and document credential recipients.
## 3. Deployment
### 3.1 Docker Compose (edge mirrors, lab validation)
1. `cp deploy/compose/env/mirror.env.example deploy/compose/env/mirror.env`
2. Populate secrets/tls directories as described above.
3. Sync mirror bundles (see §4) into `deploy/compose/mirror-data/…` and ensure they are mounted
on the host path backing the `concelier-exports` and `excititor-exports` volumes.
4. Run the profile validator: `deploy/tools/validate-profiles.sh`.
5. Launch: `docker compose --env-file env/mirror.env -f docker-compose.mirror.yaml up -d`.
### 3.2 Helm (production mirrors)
1. Provision PVCs sized for mirror bundles (baseline: 20 GiB per domain).
2. Create secrets/tls config maps (§2).
3. `helm upgrade --install mirror deploy/helm/stellaops -f deploy/helm/stellaops/values-mirror.yaml`.
4. Annotate the `stellaops-mirror-gateway` service with ingress/LoadBalancer metadata required by
your CDN (e.g., AWS load balancer scheme internal + NLB idle timeout).
## 4. Artifact sync workflow
Mirrors never generate exports—they ingest signed bundles produced by the Concelier and Excititor
export jobs. Recommended sync pattern:
### 4.1 Compose host (systemd timer)
`/usr/local/bin/mirror-sync.sh`:
```bash
#!/usr/bin/env bash
set -euo pipefail
# Credentials intentionally left blank here; inject them via an
# EnvironmentFile or secrets manager rather than committing values.
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
aws s3 sync s3://mirror-stellaops/concelier/latest \
/opt/stellaops/mirror-data/concelier --delete --size-only
aws s3 sync s3://mirror-stellaops/excititor/latest \
/opt/stellaops/mirror-data/excititor --delete --size-only
```
Schedule with a systemd timer every 5 minutes. The Compose volumes mount `/opt/stellaops/mirror-data/*`
into the containers read-only, matching `CONCELIER__MIRROR__EXPORTROOT=/exports/json` and
`EXCITITOR__ARTIFACTS__FILESYSTEM__ROOT=/exports`.
### 4.2 Kubernetes (CronJob)
Create a CronJob running the AWS CLI (or rclone) in the same namespace, writing into the PVCs:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: mirror-sync
spec:
schedule: "*/5 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: sync
image: public.ecr.aws/aws-cli/aws-cli@sha256:5df5f52c29f5e3ba46d0ad9e0e3afc98701c4a0f879400b4c5f80d943b5fadea
command:
- /bin/sh
- -c
- >
aws s3 sync s3://mirror-stellaops/concelier/latest /exports/concelier --delete --size-only &&
aws s3 sync s3://mirror-stellaops/excititor/latest /exports/excititor --delete --size-only
volumeMounts:
- name: concelier-exports
mountPath: /exports/concelier
- name: excititor-exports
mountPath: /exports/excititor
envFrom:
- secretRef:
name: mirror-sync-aws
restartPolicy: OnFailure
volumes:
- name: concelier-exports
persistentVolumeClaim:
claimName: concelier-mirror-exports
- name: excititor-exports
persistentVolumeClaim:
claimName: excititor-mirror-exports
```
## 5. CDN integration
1. Point the CDN origin at the mirror gateway (Compose host or Kubernetes LoadBalancer).
2. Honour the response headers emitted by the gateway and Concelier/Excititor:
`Cache-Control: public, max-age=300, immutable` for mirror payloads.
3. Configure origin shields in the CDN to prevent cache stampedes. Recommended TTLs:
- Index (`/concelier/exports/index.json`, `/excititor/mirror/*/index`) → 60s.
- Bundle/manifest payloads → 300s.
4. Forward the `Authorization` header—Basic Auth terminates at the gateway.
5. Enforce per-domain rate limits at the CDN (matching gateway budgets) and enable logging
to SIEM for anomaly detection.
## 6. Smoke tests
After each deployment or sync cycle (temporarily set low budgets if you need to observe 429 responses):
```bash
# Index with Basic Auth
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/index.json | jq 'keys'
# Mirror manifest signature and cache headers
curl -u $PRIMARY_CREDS -I https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/manifest.json \
| tee /tmp/manifest-headers.txt
grep -E '^Cache-Control: ' /tmp/manifest-headers.txt # expect public, max-age=300, immutable
# Excititor consensus bundle metadata
curl -u $COMMUNITY_CREDS https://mirror-community.stella-ops.org/excititor/mirror/community/index \
| jq '.exports[].exportKey'
# Signed bundle + detached JWS (spot check digests)
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/bundle.json.jws \
-o bundle.json.jws
cosign verify-blob --signature bundle.json.jws --key mirror-key.pub bundle.json
# Service-level auth check (inside cluster, no gateway credentials)
kubectl exec deploy/stellaops-concelier -- curl -si http://localhost:8443/concelier/exports/mirror/primary/manifest.json \
| head -n 5 # expect HTTP/1.1 401 with WWW-Authenticate: Bearer
# Rate limit smoke (repeat quickly; second call should return 429 + Retry-After)
for i in 1 2; do
curl -s -o /dev/null -D - https://mirror-primary.stella-ops.org/concelier/exports/index.json \
-u $PRIMARY_CREDS | grep -E '^(HTTP/|Retry-After:)'
sleep 1
done
```
Watch the gateway metrics (`nginx_vts` or access logs) for cache hits. In Kubernetes, `kubectl logs deploy/stellaops-mirror-gateway`
should show `X-Cache-Status: HIT/MISS`.
## 7. Maintenance & rotation
- **Bundle freshness**: alert if sync job lag exceeds 15 minutes or if `concelier` logs
  `Mirror export root is not configured`.
- **Secret rotation**: change Authority client secrets and Basic Auth credentials quarterly.
  Update the mounted secrets and restart deployments (`docker compose restart concelier` or
  `kubectl rollout restart deploy/stellaops-concelier`).
- **TLS renewal**: reissue certificates, place new files, and reload the gateway (`docker compose exec mirror-gateway nginx -s reload`).
- **Quota tuning**: adjust per-domain `MAXDOWNLOADREQUESTSPERHOUR` in `.env` or the values file.
  Align CDN rate limits and inform downstreams.
## 8. References
- Deployment profiles: `deploy/compose/docker-compose.mirror.yaml`,
`deploy/helm/stellaops/values-mirror.yaml`
- Mirror architecture dossiers: `docs/modules/concelier/architecture.md`,
`docs/modules/excititor/mirrors.md`
- Export bundling: `docs/modules/devops/architecture.md` §3, `docs/modules/excititor/architecture.md` §7
# Concelier & Excititor Mirror Operations
This runbook describes how StellaOps operates the managed mirrors under `*.stella-ops.org`.
It covers Docker Compose and Helm deployment overlays, secret handling for multi-tenant
authn, CDN fronting, and the recurring sync pipeline that keeps mirror bundles current.
## 1. Prerequisites
- **Authority access**: client credentials (`client_id` + secret) authorised for the
  `concelier.mirror.read` and `excititor.mirror.read` scopes. Secrets live outside git.
- **Signed TLS certificates**: wildcard or per-domain (`mirror-primary`, `mirror-community`).
  Store them under `devops/compose/mirror-gateway/tls/` or in Kubernetes secrets.
- **Mirror gateway credentials**: Basic Auth htpasswd files per domain. Generate with
  `htpasswd -B`. Operators distribute credentials to downstream consumers.
- **Export artifact source**: read access to the canonical S3 buckets (or rsync share)
  that hold `concelier` JSON bundles and `excititor` VEX exports.
- **Persistent volumes**: storage for Concelier job metadata and mirror export trees.
  For Helm, provision PVCs (`concelier-mirror-jobs`, `concelier-mirror-exports`,
  `excititor-mirror-exports`, `mirror-mongo-data`, `mirror-minio-data`) before rollout.
### 1.1 Service configuration quick reference
Concelier.WebService exposes the mirror HTTP endpoints once `CONCELIER__MIRROR__ENABLED=true`.
Key knobs:
- `CONCELIER__MIRROR__EXPORTROOT`: root folder containing export snapshots (`<exportId>/mirror/*`).
- `CONCELIER__MIRROR__ACTIVEEXPORTID`: optional explicit export id; otherwise the service falls back automatically to the `latest/` symlink or the newest directory.
- `CONCELIER__MIRROR__REQUIREAUTHENTICATION`: default auth requirement; override per domain with `CONCELIER__MIRROR__DOMAINS__{n}__REQUIREAUTHENTICATION`.
- `CONCELIER__MIRROR__MAXINDEXREQUESTSPERHOUR`: budget for `/concelier/exports/index.json`. Domains inherit this value unless they define `__MAXDOWNLOADREQUESTSPERHOUR`.
- `CONCELIER__MIRROR__DOMAINS__{n}__ID`: domain identifier matching the exporter manifest; additional keys configure display name and rate budgets.
> The service honours Stella Ops Authority when `CONCELIER__AUTHORITY__ENABLED=true` and `ALLOWANONYMOUSFALLBACK=false`. Use the bypass CIDR list (`CONCELIER__AUTHORITY__BYPASSNETWORKS__*`) for in-cluster ingress gateways that terminate Basic Auth. Unauthorized requests emit `WWW-Authenticate: Bearer` so downstream automation can detect token failures.
Mirror responses carry deterministic cache headers: `/index.json` returns `Cache-Control: public, max-age=60`, while per-domain manifests/bundles include `Cache-Control: public, max-age=300, immutable`. Rate limiting surfaces `Retry-After` when quotas are exceeded.
### 1.2 Mirror connector configuration
Downstream Concelier instances ingest published bundles using the `StellaOpsMirrorConnector`. Operators running the connector in air-gapped or limited-connectivity environments can tune the following options (environment prefix `CONCELIER__SOURCES__STELLAOPSMIRROR__`; a configuration sketch follows the list):
- `BASEADDRESS`: absolute mirror root (e.g., `https://mirror-primary.stella-ops.org`).
- `INDEXPATH`: relative path to the mirror index (`/concelier/exports/index.json` by default).
- `DOMAINID`: mirror domain identifier from the index (`primary`, `community`, etc.).
- `HTTPTIMEOUT`: request timeout; raise it when mirrors sit behind slow WAN links.
- `SIGNATURE__ENABLED`: require detached JWS verification for `bundle.json`.
- `SIGNATURE__KEYID` / `SIGNATURE__PROVIDER`: expected signing key metadata.
- `SIGNATURE__PUBLICKEYPATH`: PEM fallback used when the mirror key registry is offline.
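A minimal configuration sketch, expressed as environment variables for a downstream Compose service. The base address, index path, and domain id reuse the examples above; the timeout format, key id, provider, and key path are illustrative assumptions:

```bash
# Hypothetical air-gapped connector settings; adjust to your mirror domain.
CONCELIER__SOURCES__STELLAOPSMIRROR__BASEADDRESS=https://mirror-primary.stella-ops.org
CONCELIER__SOURCES__STELLAOPSMIRROR__INDEXPATH=/concelier/exports/index.json
CONCELIER__SOURCES__STELLAOPSMIRROR__DOMAINID=primary
# Assumed .NET TimeSpan syntax; raise for slow WAN links.
CONCELIER__SOURCES__STELLAOPSMIRROR__HTTPTIMEOUT=00:02:00
CONCELIER__SOURCES__STELLAOPSMIRROR__SIGNATURE__ENABLED=true
# Key id/provider must match the mirror's signing metadata (placeholders here).
CONCELIER__SOURCES__STELLAOPSMIRROR__SIGNATURE__KEYID=mirror-signing-key
CONCELIER__SOURCES__STELLAOPSMIRROR__SIGNATURE__PROVIDER=cosign
CONCELIER__SOURCES__STELLAOPSMIRROR__SIGNATURE__PUBLICKEYPATH=/run/secrets/mirror-key.pub
```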
The connector keeps a per-export fingerprint (bundle digest + generated-at timestamp) and tracks outstanding document IDs. If a scan is interrupted, the next run resumes parse/map work using the stored fingerprint and pending document lists—no network requests are reissued unless the upstream digest changes.
## 2. Secret & certificate layout
### Docker Compose (`devops/compose/docker-compose.mirror.yaml`)
- `devops/compose/env/mirror.env.example`: copy to `.env` and adjust quotas or domain IDs.
- `devops/compose/mirror-secrets/`: mounted read-only into `/run/secrets`. Place:
  - `concelier-authority-client`: Authority client secret.
  - `excititor-authority-client` (optional): reserved for future authn.
- `devops/compose/mirror-gateway/tls/`: PEM-encoded cert/key pairs:
  - `mirror-primary.crt`, `mirror-primary.key`
  - `mirror-community.crt`, `mirror-community.key`
- `devops/compose/mirror-gateway/secrets/`: htpasswd files (generation sketch below):
  - `mirror-primary.htpasswd`
  - `mirror-community.htpasswd`
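A short sketch for generating those htpasswd files (usernames are placeholders):

```bash
# Create bcrypt-hashed Basic Auth files, one per mirror domain.
htpasswd -B -c devops/compose/mirror-gateway/secrets/mirror-primary.htpasswd primary-downstream
htpasswd -B -c devops/compose/mirror-gateway/secrets/mirror-community.htpasswd community-downstream
# Append further consumers without -c, which would otherwise overwrite the file.
htpasswd -B devops/compose/mirror-gateway/secrets/mirror-primary.htpasswd second-consumer
```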
### Helm (`devops/helm/stellaops/values-mirror.yaml`)
Create secrets in the target namespace:
```bash
kubectl create secret generic concelier-mirror-auth \
--from-file=concelier-authority-client=concelier-authority-client
kubectl create secret generic excititor-mirror-auth \
--from-file=excititor-authority-client=excititor-authority-client
kubectl create secret tls mirror-gateway-tls \
--cert=mirror-primary.crt --key=mirror-primary.key
kubectl create secret generic mirror-gateway-htpasswd \
--from-file=mirror-primary.htpasswd --from-file=mirror-community.htpasswd
```
> Keep Basic Auth lists short-lived (rotate quarterly) and document credential recipients.
## 3. Deployment
### 3.1 Docker Compose (edge mirrors, lab validation)
1. `cp devops/compose/env/mirror.env.example devops/compose/env/mirror.env`
2. Populate secrets/tls directories as described above.
3. Sync mirror bundles (see §4) into `devops/compose/mirror-data/…` and ensure they are mounted
on the host path backing the `concelier-exports` and `excititor-exports` volumes.
4. Run the profile validator: `deploy/tools/validate-profiles.sh`.
5. Launch: `docker compose --env-file env/mirror.env -f docker-compose.mirror.yaml up -d`.
### 3.2 Helm (production mirrors)
1. Provision PVCs sized for mirror bundles (baseline: 20 GiB per domain).
2. Create secrets/tls config maps (§2).
3. `helm upgrade --install mirror devops/helm/stellaops -f devops/helm/stellaops/values-mirror.yaml`.
4. Annotate the `stellaops-mirror-gateway` service with ingress/LoadBalancer metadata required by
your CDN (e.g., AWS load balancer scheme internal + NLB idle timeout).
## 4. Artifact sync workflow
Mirrors never generate exports—they ingest signed bundles produced by the Concelier and Excititor
export jobs. Recommended sync pattern:
### 4.1 Compose host (systemd timer)
`/usr/local/bin/mirror-sync.sh`:
```bash
#!/usr/bin/env bash
set -euo pipefail
# Credentials intentionally left blank here; inject them via an
# EnvironmentFile or secrets manager rather than committing values.
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
aws s3 sync s3://mirror-stellaops/concelier/latest \
/opt/stellaops/mirror-data/concelier --delete --size-only
aws s3 sync s3://mirror-stellaops/excititor/latest \
/opt/stellaops/mirror-data/excititor --delete --size-only
```
Schedule with a systemd timer every 5 minutes. The Compose volumes mount `/opt/stellaops/mirror-data/*`
into the containers read-only, matching `CONCELIER__MIRROR__EXPORTROOT=/exports/json` and
`EXCITITOR__ARTIFACTS__FILESYSTEM__ROOT=/exports`.
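A minimal systemd pairing for that schedule, assuming the script above is installed at `/usr/local/bin/mirror-sync.sh`; unit names and the EnvironmentFile path are illustrative:

```bash
# Sketch: write service + timer units, then activate the timer.
cat >/etc/systemd/system/mirror-sync.service <<'EOF'
[Unit]
Description=Sync StellaOps mirror bundles

[Service]
Type=oneshot
EnvironmentFile=/etc/stellaops/mirror-sync.env
ExecStart=/usr/local/bin/mirror-sync.sh
EOF

cat >/etc/systemd/system/mirror-sync.timer <<'EOF'
[Unit]
Description=Run mirror-sync every 5 minutes

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target
EOF

systemctl daemon-reload
systemctl enable --now mirror-sync.timer
```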
### 4.2 Kubernetes (CronJob)
Create a CronJob running the AWS CLI (or rclone) in the same namespace, writing into the PVCs:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: mirror-sync
spec:
schedule: "*/5 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: sync
image: public.ecr.aws/aws-cli/aws-cli@sha256:5df5f52c29f5e3ba46d0ad9e0e3afc98701c4a0f879400b4c5f80d943b5fadea
command:
- /bin/sh
- -c
- >
aws s3 sync s3://mirror-stellaops/concelier/latest /exports/concelier --delete --size-only &&
aws s3 sync s3://mirror-stellaops/excititor/latest /exports/excititor --delete --size-only
volumeMounts:
- name: concelier-exports
mountPath: /exports/concelier
- name: excititor-exports
mountPath: /exports/excititor
envFrom:
- secretRef:
name: mirror-sync-aws
restartPolicy: OnFailure
volumes:
- name: concelier-exports
persistentVolumeClaim:
claimName: concelier-mirror-exports
- name: excititor-exports
persistentVolumeClaim:
claimName: excititor-mirror-exports
```
## 5. CDN integration
1. Point the CDN origin at the mirror gateway (Compose host or Kubernetes LoadBalancer).
2. Honour the response headers emitted by the gateway and Concelier/Excititor:
`Cache-Control: public, max-age=300, immutable` for mirror payloads.
3. Configure origin shields in the CDN to prevent cache stampedes. Recommended TTLs:
- Index (`/concelier/exports/index.json`, `/excititor/mirror/*/index`) → 60s.
- Bundle/manifest payloads → 300s.
4. Forward the `Authorization` header—Basic Auth terminates at the gateway.
5. Enforce per-domain rate limits at the CDN (matching gateway budgets) and enable logging
to SIEM for anomaly detection.
## 6. Smoke tests
After each deployment or sync cycle (temporarily set low budgets if you need to observe 429 responses):
```bash
# Index with Basic Auth
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/index.json | jq 'keys'
# Mirror manifest signature and cache headers
curl -u $PRIMARY_CREDS -I https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/manifest.json \
| tee /tmp/manifest-headers.txt
grep -E '^Cache-Control: ' /tmp/manifest-headers.txt # expect public, max-age=300, immutable
# Excititor consensus bundle metadata
curl -u $COMMUNITY_CREDS https://mirror-community.stella-ops.org/excititor/mirror/community/index \
| jq '.exports[].exportKey'
# Signed bundle + detached JWS (spot check digests)
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/bundle.json.jws \
-o bundle.json.jws
cosign verify-blob --signature bundle.json.jws --key mirror-key.pub bundle.json
# Service-level auth check (inside cluster, no gateway credentials)
kubectl exec deploy/stellaops-concelier -- curl -si http://localhost:8443/concelier/exports/mirror/primary/manifest.json \
| head -n 5 # expect HTTP/1.1 401 with WWW-Authenticate: Bearer
# Rate limit smoke (repeat quickly; second call should return 429 + Retry-After)
for i in 1 2; do
curl -s -o /dev/null -D - https://mirror-primary.stella-ops.org/concelier/exports/index.json \
-u $PRIMARY_CREDS | grep -E '^(HTTP/|Retry-After:)'
sleep 1
done
```
Watch the gateway metrics (`nginx_vts` or access logs) for cache hits. In Kubernetes, `kubectl logs deploy/stellaops-mirror-gateway`
should show `X-Cache-Status: HIT/MISS`.
## 7. Maintenance & rotation
- **Bundle freshness**: alert if sync job lag exceeds 15 minutes or if `concelier` logs
  `Mirror export root is not configured`.
- **Secret rotation**: change Authority client secrets and Basic Auth credentials quarterly.
  Update the mounted secrets and restart deployments (`docker compose restart concelier` or
  `kubectl rollout restart deploy/stellaops-concelier`).
- **TLS renewal**: reissue certificates, place new files, and reload the gateway (`docker compose exec mirror-gateway nginx -s reload`).
- **Quota tuning**: adjust per-domain `MAXDOWNLOADREQUESTSPERHOUR` in `.env` or the values file.
  Align CDN rate limits and inform downstreams.
## 8. References
- Deployment profiles: `devops/compose/docker-compose.mirror.yaml`,
`devops/helm/stellaops/values-mirror.yaml`
- Mirror architecture dossiers: `docs/modules/concelier/architecture.md`,
`docs/modules/excititor/mirrors.md`
- Export bundling: `docs/modules/devops/architecture.md` §3, `docs/modules/excititor/architecture.md` §7

View File

@@ -1,42 +0,0 @@
# DevOps agent guide
## Mission
The DevOps module captures release, deployment, and migration playbooks that keep StellaOps deterministic across environments.
## Advisory Handling
- Any new/updated advisory triggers immediate doc + sprint updates (no approval).
- Update high-level + detailed docs; inline only short snippets; put runnable/long code in `docs/benchmarks/**` or `tests/**` (deterministic/offline) and link.
- Add tasks + Execution Log entries in relevant `SPRINT_*.md` with doc paths/owners; add risks if schema/feed/transparency caps apply.
- Check archived advisories; mark supersedes/extends if overlapping.
- Defaults: hybrid reachability, deterministic/frozen feeds; act first, report after.
## Key docs
- [Module README](./README.md)
- [Architecture](./architecture.md)
- [Implementation plan](./implementation_plan.md)
- [Task board](./TASKS.md)
- [Task Runner simulation notes](./task-runner-simulation.md)
## How to get started
1. Open sprint file `/docs/implplan/SPRINT_*.md` and locate the stories referencing this module.
2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED).
3. Read the architecture and README for domain context before editing code or docs.
4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan.
## Guardrails
- Honour the Aggregation-Only Contract where applicable (see ../../aoc/aggregation-only-contract.md).
- Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts.
- Keep Offline Kit parity in mind—document air-gapped workflows for any new feature.
- Update runbooks/observability assets when operational characteristics change.
## Required Reading
- `docs/modules/devops/README.md`
- `docs/modules/devops/architecture.md`
- `docs/modules/devops/implementation_plan.md`
- `docs/modules/platform/architecture-overview.md`
## Working Agreement
- 1. Update task status to `DOING`/`DONE` in both the corresponding sprint file `/docs/implplan/SPRINT_*.md` and the local `TASKS.md` when you start or finish work.
- 2. Review this charter and the Required Reading documents before coding; confirm prerequisites are met.
- 3. Keep changes deterministic (stable ordering, timestamps, hashes) and align with offline/air-gap expectations.
- 4. Coordinate doc updates, tests, and cross-guild communication whenever contracts or workflows change.
- 5. Revert to `TODO` if you pause the task without shipping changes; leave notes in commit/PR descriptions for context.

View File

@@ -1,64 +0,0 @@
# StellaOps DevOps
The DevOps module captures release, deployment, and migration playbooks that keep StellaOps deterministic across environments.
## Responsibilities
- Maintain CI pipelines, signing workflows, and release packaging steps.
- Operate shared runbooks for launch readiness, upgrades, and NuGet previews.
- Provide offline kit assembly instructions and tooling integration.
- Wrap observability/telemetry bootstrap flows for platform teams.
## Key components
- Runbooks under ./runbooks/ (launch, deployment, nuget).
- Migration guidance under ./migrations/.
- Architecture overview bridging CI/CD & infrastructure concerns.
## Integrations & dependencies
- Ops pipelines (Gitea, GitHub Actions) and artifact registries.
- Authority/Signer for supply chain signing.
- Telemetry stack bootstrap scripts.
## Operational notes
- Offline bundle packaging guidance in docs/modules/export-center/operations/runbook.md.
- Dashboards for launch cutover rehearsals.
- Coordination with Security for enforced guardrails.
## Related resources
- ./runbooks/launch-readiness.md
- ./runbooks/launch-cutover.md
- ./runbooks/deployment-upgrade.md
- ./runbooks/nuget-preview-bootstrap.md
- ./migrations/semver-style.md
- ./task-runner-simulation.md
## Backlog references
- DEVOPS-LAUNCH-18-001 / 18-900 runbooks in ../../TASKS.md.
- Telemetry bootstrap automation tracked in `ops/devops/TASKS.md`.
## Epic alignment
- **Epic 1 AOC enforcement:** bake AOC verifier steps, CI guards, and schema validation into pipelines.
- **Epic 9 Orchestrator Dashboard:** support operational dashboards, job recovery runbooks, and rate-limit governance.
- **Epic 10 Export Center:** manage signing workflows, Offline Kit packaging, and release promotion for exports.
- **Epic 15 Observability & Forensics:** coordinate telemetry deployment, evidence retention, and forensic automation.
## Implementation Status
### Objectives
- Maintain deterministic behaviour and offline parity across releases
- Keep documentation, telemetry, and runbooks aligned with the latest sprint outcomes
### Key Milestones
- **Epic 1 AOC enforcement:** ensure CI/CD guardrails, schema validation, and verifier pipelines are enforced
- **Epic 9 Orchestrator Dashboard:** deliver dashboards, recovery runbooks, and rate-limit governance
- **Epic 10 Export Center:** manage signing/promotions and Offline Kit bundle publishing
- **Epic 15 Observability & Forensics:** coordinate telemetry deployments, evidence retention, and forensic automation
### Workstreams
- Backlog grooming: reconcile open stories with module roadmap
- Implementation: collaborate with service owners to land feature work
- Validation: extend tests/fixtures to preserve determinism and provenance requirements
### Coordination
- Review ./AGENTS.md before picking up new work
- Sync with cross-cutting teams noted in sprint files
- Update plan whenever scope, dependencies, or guardrails change

View File

@@ -1,489 +0,0 @@
# component_architecture_devops.md — **StellaOps Release & Operations** (2025Q4)
> Draws from the AOC guardrails, Orchestrator, Export Center, and Observability module plans to describe how StellaOps is built, signed, distributed, and operated.
> **Scope.** Implementation-ready blueprint for **how StellaOps is built, versioned, signed, distributed, upgraded, licensed (PoE)**, and operated in customer environments (online and air-gapped). Covers reproducible builds, supply-chain attestations, registries, offline kits, migration/rollback, artifact lifecycle (RustFS default + PostgreSQL, S3 fallback), monitoring SLOs, and customer activation.
---
## 0) Product vision (operations lens)
StellaOps must be **trustable at a glance** and **boringly operable**:
* Every release ships with **first-party SBOMs, provenance, and signatures**; services verify **each other's** integrity at runtime.
* Customers can deploy by **digest** and stay aligned with **LTS/stable/edge** channels.
* Paid customers receive **attestation authority** (Signer accepts their PoE) while the core platform remains **free to run**.
* Air-gapped customers receive **offline kits** with verifiable digests and deterministic import.
* Artifacts expire predictably; operators know what's kept, for how long, and why.
---
## 1) Release trains & versioning
### 1.1 Channels
* **LTS** (12-month support window): quarterly cadence (Q1/Q2/Q3/Q4).
* **Stable** (default): monthly rollup (bug fixes + compatible features).
* **Edge**: weekly; for early adopters, no guarantees.
### 1.2 Version strings
Semantic core + calendar tag:
```
<MAJOR>.<MINOR>.<PATCH> (<YYYY>.<MM>) e.g., 2.4.1 (2027.06)
```
* **MAJOR**: breaking API/DB changes (rare).
* **MINOR**: new features, compatible schema migrations (expand/contract pattern).
* **PATCH**: bug fixes, perf and security updates.
* **Calendar tag** exposes **release year** used by Signer for **PoE window checks**.
### 1.3 Component alignment
A release is a **bundle** of image digests + charts + manifests. All services in a bundle are **wire-compatible**. Mixed minor versions are allowed within a bounded skew:
* **Web UI ↔ backend**: `±1 minor`.
* **Scanner ↔ Policy/Excititor/Concelier**: `±1 minor`.
* **Authority/Signer/Attestor triangle**: **must** be same minor (crypto and DPoP/mTLS binding rules).
At startup, services **self-advertise** their semver & channel; the UI surfaces **mismatch warnings**.
---
## 2) Supplychain pipeline (how a release is built)
### 2.1 Deterministic builds
* **Builders**: isolated **BuildKit** workers with pinned base images (digest only).
* **Pinning**: lock files (`go.mod`, `package-lock.json`, `global.json`, `Directory.Packages.props`) are **frozen** at tag.
* **Reproducibility**: timestamps normalized; source date epoch; deterministic zips/tars.
* **Multi-arch**: linux/amd64 + linux/arm64 (Windows images track the M2 roadmap).
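A hedged sketch of a matching BuildKit invocation; the flags are standard `docker buildx` options, while the tag is illustrative:

```bash
# Normalise embedded timestamps to the last commit time for reproducibility.
export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)

# Multi-arch build with BuildKit-generated SBOM and provenance attestations.
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --provenance=mode=max \
  --sbom=true \
  --tag registry.stella-ops.org/stellaops/scanner-web:2.4.1 \
  --push .
```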
### 2.2 Firstparty SBOMs & provenance
* Each image gets **CycloneDX (JSON+Protobuf) SBOM** and **SLSAstyle provenance** attached as **OCI referrers**.
* Scanner's **Buildx generator** is used to produce SBOMs *during* build; a separate post-build scan verifies parity (red flag if drift).
* **Release manifest** (see §6.1) lists all digests and SBOM/attestation refs.
### 2.3 Signing & transparency
* Images are **cosign-signed** (keyless) with a StellaOps release identity; inclusion in a **transparency log** (Rekor) is required.
* SBOM and provenance attestations are **DSSE** and also transparency-logged.
* Release keys (Fulcio roots or public keys) are embedded in **Signer** policy (for **scanner-release validation** at the customer side).
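Operators can spot-check a release image with stock cosign tooling; a sketch assuming the release public key has been exported locally (the key path is a placeholder, and the digest reuses the doc's `sha256:aa..bb` stand-in):

```bash
# Verify the image signature against the pinned release key.
cosign verify --key stellaops-release.pub \
  registry.stella-ops.org/stellaops/scanner-web@sha256:aa..bb

# Verify the DSSE provenance attestation attached as an OCI referrer.
cosign verify-attestation --key stellaops-release.pub --type slsaprovenance \
  registry.stella-ops.org/stellaops/scanner-web@sha256:aa..bb
```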
### 2.4 Gates & tests
* **Static**: linters, codegen checks, protobuf API freeze (backwardcompat tests).
* **Unit/integration**: per-component, plus **end-to-end** flows (scan→vex→policy→sign→attest).
* **Perf SLOs**: hot paths (SBOM compose, diff, export) measured against budgets.
* **Security**: dependency audit vs Concelier export; container hardening tests; minimal caps.
* **Deployment assets**: the `Build Test Deploy` workflow's `profile-validation` job installs Helm and runs `helm lint` + `helm template` against `deploy/helm/stellaops` for every `values*.yaml`, catching ConfigMap/templating drift before merges.
* **Analyzer smoke**: restart-time language plug-ins (currently Python) verified via `dotnet run --project src/Tools/LanguageAnalyzerSmoke` to ensure manifest integrity plus cold vs warm determinism (<30s / <5s budgets); the harness logs deviations from repository goldens for follow-up.
* **Canary cohort**: internal staging + selected customers; one week on **edge** before **stable** tag.
### 2.5 Debug-store artefacts
* Every release exports stripped debug information for ELF binaries discovered in service images. Debug files follow the GNU build-id layout (`debug/.build-id/<aa>/<rest>.debug`) and are generated via `objcopy --only-keep-debug`.
* `debug/debug-manifest.json` captures build-id component/image/source mappings with SHA-256 checksums so operators can mirror the directory into debuginfod or offline symbol stores. The manifest (and its `.sha256` companion) ships with every release bundle and Offline Kit.
---
## 3) Distribution & activation
### 3.1 Registries
* **Primary**: `registry.stella-ops.org` (OCI v2, supports Referrers API).
* **Mirrors**: GHCR (read-only), regional mirrors for latency.
* Operational runbook: see `docs/modules/concelier/operations/mirror.md` for deployment profiles, CDN guidance, and sync automation.
* **Pull by digest only** in Kubernetes/Compose manifests.
**Gating policy**:
* **Core images** (Authority, Scanner, Concelier, Excititor, Attestor, UI): public **read**.
* **Enterprise add-ons** (if any) and **pre-release**: private repos via the **Registry Token Service** (`src/Registry/StellaOps.Registry.TokenService`) which exchanges Authority-issued OpToks for short-lived Docker registry bearer tokens.
> Monetization lever is **signing** (PoE gate), not image pulls, so the core remains simple to consume.
### 3.2 OAuth2 token service (for private repos)
* Docker Registry's token flow backed by **Authority**:
1. Client hits registry (`401` with `WWW-Authenticate: Bearer realm=…`).
2. Client gets an **access token** from the token service (validated by Authority) with `scope=repository:…:pull`.
3. Registry allows pull for the requested repo.
* Tokens are **short-lived** (60-300s) and **DPoP-bound**.
The token service enforces plan gating via `registry-token.yaml` (see `docs/modules/registry/operations/token-service.md`) and exposes Prometheus metrics (`registry_token_issued_total`, `registry_token_rejected_total`). Revoked licence identifiers halt issuance even when scope requirements are met.
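The exchange follows the standard Docker registry token protocol; a sketch from the client side (the token-service host and the `OPTOK` variable are assumptions):

```bash
# 1. An anonymous pull returns 401 plus a challenge describing realm/service/scope.
curl -si https://registry.stella-ops.org/v2/stellaops/signer/manifests/latest \
  | grep -i '^www-authenticate'

# 2. Exchange an Authority-issued OpTok for a short-lived registry bearer token.
curl -s "https://registry-token.stella-ops.org/token?service=registry.stella-ops.org&scope=repository:stellaops/signer:pull" \
  -H "Authorization: Bearer ${OPTOK}" | jq -r '.token'
```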
### 3.3 Offline kits (airgapped)
* Tarball per release channel:
```
stellaops-kit-<ver>-<channel>.tar.zst
/images/ OCI layout with all first-party images (multi-arch)
/sboms/ CycloneDX JSON+PB for each image
/attest/ DSSE bundles + Rekor proofs
/charts/ Helm charts + values templates
/compose/ docker-compose.yml + .env template
/plugins/ Concelier/Excititor connectors (restart-time)
/policy/ example policies
/manifest/ release.yaml (see §6.1)
```
* Import via the CLI (`stellaops offline kit import`); it checks digests and signatures before load.
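A quick out-of-band check before import; the `.sha256` sidecar is an assumption, while the import command matches §10.2:

```bash
kit=stellaops-kit-2.4.1-stable.tar.zst
sha256sum -c "${kit}.sha256"        # digest sidecar published next to the kit (assumed)
tar --zstd -tf "$kit" | head        # inspect the layout without extracting
stellaops offline kit import "$kit" # verifies digests and signatures before load
```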
---
## 4) Licensing (PoE) & monetization
**Principle**: **Only paid StellaOps issues valid signed attestations.** Running the stack is free; signing requires PoE.
### 4.1 PoE issuance
* Customers purchase a plan and obtain a **PoE artifact** from `www.stella-ops.org`:
* **PoEJWT** (DPoP/mTLSbound) **or** **PoE mTLS client certificate**.
* Contains: `license_id`, `plan`, `valid_release_year`, `max_version`, `exp`, optional `tenant/customer` IDs.
### 4.2 Online enforcement
* **Signer** calls **Licensing /license/introspect** on every signing request (see signer doc).
* If **revoked/expired/out-of-window** → deny with a machine-readable reason.
* All **valid** bundles are DSSE-signed and **Attestor** logs them; the Rekor UUID is returned.
* UI badges: “**Verified by StellaOps**” with link to the public log.
### 4.3 Airgapped / offline
* Customers obtain a **time-boxed PoE lease** (signed JSON, 7-30 days).
* Signer accepts the lease and emits **provisional** attestations (clearly labeled).
* When connectivity returns, a background job **endorses** the provisional entries with the cloud service, updating their status to **verified**.
* Operators can export a **verification bundle** for auditors even before endorsement (contains DSSE + local Rekor proof + lease snapshot).
### 4.4 Stolen/abused PoE
* Customers report theft; **Licensing** flags `license_id` as **revoked**.
* Subsequent Signer requests **deny**; previous attestations remain but can be marked **contested** (UI shows badge, optional re-sign path upon new PoE).
---
## 5) Deployment path (customer side)
### 5.1 First install
* **Helm** (Kubernetes) or **Compose** (VMs). Example (K8s):
```bash
helm repo add stellaops https://charts.stella-ops.org
helm install stella stellaops/platform \
--version 2.4.0 \
--set global.channel=stable \
--set authority.issuer=https://authority.stella.local \
--set scanner.rustfs.endpoint=http://rustfs.stella.local:8080 \
--set global.postgres.connectionString="Host=postgres.stella.local;Database=stellaops_platform;Username=stellaops;Password=<secret>"
```
* A post-install job registers **Authority clients** (Scanner, Signer, Attestor, UI) and prints **bootstrap** URLs and client credentials (sealed secrets).
* UI banner shows **release bundle** and verification state (cosign OK? Rekor OK?).
### 5.2 Updates
* **Blue/green**: pull the new bundle by **digest**; deploy side-by-side; cut traffic.
* **Rolling**: upgrade stateful components in safe order:
1. Authority (stateless, dualkey rotation ready)
2. Signer/Attestor (same minor)
3. Scanner WebService & Workers
4. Concelier, then Excititor (schema migrations are expand/contract)
5. UI last
* **DB migrations** are **expand/contract**:
* Phase A (release N): **add** new fields/indexes, write old+new.
* Phase B (N+1): **read** new fields; **drop** old.
* Rollback is a matter of redeploying previous images and keeping both schemas valid.
### 5.3 Rollback
* Images referenced by **digest**; keep previous release manifest `K` versions back.
* `helm rollback` or compose `docker compose -f release-K.yml up -d`.
* PostgreSQL migrations are additive; **no destructive changes** within a single minor.
---
## 6) Release payloads & manifests
### 6.1 Release manifest (`release.yaml`)
```yaml
release:
version: "2.4.1"
channel: "stable"
date: "2027-06-20T12:00:00Z"
calendar: "2027.06"
components:
- name: scanner-webservice
image: registry.stella-ops.org/stellaops/scanner-web@sha256:aa..bb
sbom: oci://.../referrers/cdx-json@sha256:11..22
provenance: oci://.../attest/provenance@sha256:33..44
signature: { rekorUUID: "…" }
- name: signer
image: registry.stella-ops.org/stellaops/signer@sha256:cc..dd
signature: { rekorUUID: "…" }
charts:
- name: platform
version: "2.4.1"
digest: "sha256:ee..ff"
compose:
file: "docker-compose.yml"
digest: "sha256:77..88"
checksums:
sha256: "… digest of this release.yaml …"
```
The manifest is **cosign-signed**; the UI/CLI can verify a bundle without talking to registries.
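A spot-check sketch, assuming a detached signature and a pinned release public key (both file names are placeholders):

```bash
cosign verify-blob --key stellaops-release.pub \
  --signature release.yaml.sig release.yaml
```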
> Deployment guardrails: the repository keeps channel-aligned Compose bundles
> in `deploy/compose/` and Helm overlays in `deploy/helm/stellaops/`. Both sets
> pull their digests from `deploy/releases/` and are validated by
> `deploy/tools/validate-profiles.sh` to guarantee lint/dry-run cleanliness.
### 6.2 Image labels (release metadata)
Each image sets OCI labels:
```
org.opencontainers.image.version = "2.4.1"
org.opencontainers.image.revision = "<git sha>"
org.opencontainers.image.created = "2027-06-20T12:00:00Z"
org.stellaops.release.calendar = "2027.06"
org.stellaops.release.channel = "stable"
org.stellaops.build.slsaProvenance = "oci://…"
```
Signer validates the **scanner** image's cosign identity + calendar tag for **release window** checks.
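Reading the labels back from a pulled image is a one-liner with standard Docker tooling (the digest reuses the doc's placeholder):

```bash
docker inspect registry.stella-ops.org/stellaops/scanner-web@sha256:aa..bb \
  --format '{{ index .Config.Labels "org.stellaops.release.calendar" }} / {{ index .Config.Labels "org.stellaops.release.channel" }}'
```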
---
## 7) Artifact lifecycle & storage (RustFS/PostgreSQL)
### 7.1 Buckets & prefixes (RustFS)
```
rustfs://stellaops/
scanner/
layers/<sha256>/sbom.cdx.json.zst
images/<imgDigest>/inventory.cdx.pb
images/<imgDigest>/usage.cdx.pb
diffs/<old>_<new>/diff.json.zst
attest/<artifactSha256>.dsse.json
concelier/
json/<exportId>/...
trivy/<exportId>/...
excititor/
exports/<exportId>/...
attestor/
dsse/<bundleSha256>.json
proof/<rekorUuid>.json
```
### 7.2 ILM classes
* **`short`**: working artifacts (diffs, queues) — TTL 7-14 days.
* **`default`**: SBOMs & indexes — TTL 90-180 days (configurable).
* **`compliance`**: signed reports & attested exports — retention enforced via RustFS hold or S3 Object Lock (governance/compliance), 1-7 years.
### 7.3 Artifact Lifecycle Controller (ALC)
* A background worker (part of Scanner.WebService) enforces **TTL** and **reference counting**:
* Artifacts referenced by **reports** or **tickets** are pinned.
* ILM actions are logged; the UI shows per-class usage & upcoming purges.
> **Migration note.** Follow `docs/modules/scanner/operations/rustfs-migration.md` when transitioning existing
> MinIO buckets to RustFS. The provided migrator is idempotent and safe to rerun per prefix.
### 7.4 PostgreSQL retention
* **Scanner**: `runtime.events` use TTL (e.g., 30-90 days); the **catalog** is permanent.
* **Concelier/Excititor**: raw docs keep **last N windows**; canonical stores permanent.
* **Attestor**: `entries` permanent; `dedupe` TTL 24-48h.
### 7.5 PostgreSQL server baseline
* **Minimum supported server:** PostgreSQL **16+**. Earlier versions lack required features (e.g., enhanced JSON functions, performance improvements).
* **Deploy images:** Compose/Helm defaults stay on `postgres:16`. For air-gapped installs, refresh Offline Kit bundles so the packaged PostgreSQL image matches ≥16.
* **Upgrade guard:** During rollout, verify PostgreSQL major version ≥16 before applying schema migrations; automation should hard-stop if version check fails.
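A minimal pre-migration guard, assuming `psql` connectivity via a DSN in `$POSTGRES_DSN`:

```bash
# Hard-stop the rollout when the server is older than PostgreSQL 16.
ver=$(psql "$POSTGRES_DSN" -tAc 'SHOW server_version_num')
if [ "$ver" -lt 160000 ]; then
  echo "PostgreSQL >= 16 required (server_version_num=$ver)" >&2
  exit 1
fi
```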
---
## 8) Observability & SLOs (operations)
* **Uptime SLO**: 99.9% for Signer/Authority/Attestor; 99.5% for Scanner WebService; Excititor/Concelier 99.0%.
* **Error budgets**: tracked per month; dashboards show burn rates.
* **Golden signals**:
* **Latency**: token issuance, sign→attest roundtrip, scan enqueue→emit, export build.
* **Saturation**: queue depth, PostgreSQL write IOPS, RustFS throughput / queue depth (or S3 metrics when in fallback mode).
* **Traffic**: scans/min, attestations/min, webhook admits/min.
* **Errors**: 5xx rates, cosign verification failures, Rekor timeouts.
Prometheus + OTLP; Grafana dashboards ship in the charts.
---
## 9) Security & compliance operations
* **Key rotation**:
* Authority JWKS: 60-day cadence, dual-key overlap.
* Release signing identities: rotate per minor or quarterly.
* Sigstore roots mirrored and pinned; alarms on drift.
* **FIPS mode** (Gov build):
* Enforce `ES256` + KMS/HSM; disable Ed25519; MLS ciphers only.
* Local **Rekor v2** and **Fulcio** alternatives; **air-gapped** CA.
* **Vulnerability response**:
* Concelier red-flag advisories trigger accelerated **stable** patch rollout; UI/CLI “security patch available” notice.
* 2025-10: Pinned `SharpCompress` **0.41.0** across services (DEVOPS-SEC-10-301) to eliminate NU1903 warnings; future bumps follow the central override pattern. MongoDB dependencies were removed in Sprint 4400 (all persistence now uses PostgreSQL).
* **Backups/DR**:
* PostgreSQL nightly snapshots; MinIO versioning + replication (if configured).
* Restore runbooks tested quarterly with synthetic data.
---
## 10) Customer update flow (how versions are fetched & activated)
### 10.1 Online clusters
* **UI** surfaces update banner with **release manifest** diff and risk notes.
* Operator approves → **Controller** pulls new images by digest; health-checks; moves traffic; deprecates the old revision.
* Post-switch, **schema Phase B** migrations (if any) run automatically.
### 10.2 Airgapped clusters
* Operator downloads **offline kit** from a mirror → `stellaops offline kit import`.
* Controller validates bundle checksums and **cosign signatures**; applies charts/compose by digest.
* After install, **verify** page shows green checks: image sigs, SBOMs attached, provenance logged.
### 10.3 CLI selfupdate (optional)
* `stellaops self-update` pulls a **signed release manifest** and verifies the **CLI binary** with cosign before swapping (admin can disable).
---
## 11) Compatibility & deprecation policy
* **APIs** are stable within a **major**; breaking changes imply **MAJOR++** and deprecation period of one minor.
* **Storage**: expand/contract; “drop old fields” only after one minor grace.
* **Config**: feature flags (default off) for risky features (e.g., eBPF).
---
## 12) Runbooks (selected)
### 12.1 Lost PoE
1. Suspend **automatic attestation** jobs.
2. Use CLI `stellaops signer status` to confirm `entitlement_denied`.
3. Obtain new PoE from portal; verify on Signer `/poe/verify`.
4. Re-enable; optionally **re-sign** the last N reports (UI button → batch).
### 12.2 Rekor outage (selfhosted)
* Attestor returns `202 (pending)` with queued proof fetch.
* Keep DSSE bundles locally; resubmit on schedule; UI badge shows **Pending**.
* If outage > SLA, you can switch to a **mirror** log in config; Attestor writes to both when restored.
### 12.3 Emergency downgrade
* Identify prior release manifest (UI → Admin → Releases).
* `helm rollback stella <revision>` (or compose apply previous file).
* Services tolerate skew per §1.3; ensure **Signer/Authority/Attestor** are rolled together.
---
## 13) Example: cluster bootstrap (Compose)
```yaml
version: "3.9"
services:
authority:
image: registry.stella-ops.org/stellaops/authority@sha256:...
env_file: ./env/authority.env
ports: ["8440:8440"]
signer:
image: registry.stella-ops.org/stellaops/signer@sha256:...
depends_on: [authority]
environment:
- SIGNER__POE__LICENSING__INTROSPECTURL=https://www.stella-ops.org/api/v1/license/introspect
attestor:
image: registry.stella-ops.org/stellaops/attestor@sha256:...
depends_on: [signer]
scanner-web:
image: registry.stella-ops.org/stellaops/scanner-web@sha256:...
environment:
- SCANNER__ARTIFACTSTORE__ENDPOINT=http://rustfs:8080
scanner-worker:
image: registry.stella-ops.org/stellaops/scanner-worker@sha256:...
deploy: { replicas: 4 }
concelier:
image: registry.stella-ops.org/stellaops/concelier@sha256:...
excititor:
image: registry.stella-ops.org/stellaops/excititor@sha256:...
web-ui:
image: registry.stella-ops.org/stellaops/web-ui@sha256:...
postgres:
image: postgres:16
valkey:
image: valkey/valkey:8.0
rustfs:
image: registry.stella-ops.org/stellaops/rustfs:2025.10.0-edge
```
---
## 14) Governance & keys (who owns the trust root)
* **Release key policy**: only the Release Engineering group can push signed releases; 4-eyes approval; a TUF-style manifest is possible in future.
* **Signer acceptance policy**: embedded release identities are updated **only** via minor upgrade; emergency CRL supported.
* **Customer keys**: none needed for core use; enterprise add-ons may require per-customer registries and keys.
---
## 15) Roadmap (Ops)
* **Windows containers GA** (Scanner + Zastava).
* **Key Transparency** for Signer certs.
* **Deltakit** (offline) for incremental updates.
* **Operator CRDs** (K8s) to manage policy and ILM declaratively.
* **SBOM protobuf** as the default transport at rest (smaller, faster).
---
### Appendix A — Minimal SLO monitors
* `authority.tokens_issued_total` slope ≈ normal.
* `signer.requests_total{result="success"}/minute` > 0 (when scans occur).
* `attestor.submit_latency_seconds{quantile=0.95}` < 0.3.
* `scanner.scan_latency_seconds{quantile=0.95}` < target per image size.
* `concelier.export.duration_seconds` stable; `excititor.consensus.conflicts_total` not exploding after policy changes.
* RustFS request error rate near zero (or `s3_requests_errors_total` when operating against S3); PostgreSQL `pg_stat_bgwriter` counters hit expected baseline.
### Appendix B — Upgrade safety checklist
* Verify **release manifest** signature.
* Ensure **Signer/Authority/Attestor** are same minor.
* Verify **DB backups** < 24h old.
* Confirm **ILM** wont purge compliance artifacts during upgrade window.
* Roll **one component** at a time; watch SLOs; abort on regression.
---
**End — component_architecture_devops.md**

View File

@@ -1,102 +0,0 @@
# Console CI Contract (DEVOPS-CONSOLE-23-001)
## Scope
Define a deterministic, offline-friendly CI pipeline for the Console web app covering lint, type-check, unit, Storybook a11y, Playwright smoke, Lighthouse perf/a11y, and artifact retention.
## Stages & Gates
1. **Setup**
- Node 20.x, pnpm 9.x from cached tarball (`tools/cache/node20.tgz`, `tools/cache/pnpm-9.tgz`).
- Restore `node_modules` from `.pnpm-store` cache key `console-${{ hashFiles('pnpm-lock.yaml') }}`; fallback to offline tarball `local-npm-cache.tar.zst`.
- Export `PLAYWRIGHT_BROWSERS_PATH=./.playwright` and hydrate from `tools/cache/playwright-browsers.tar.zst`.
2. **Lint/Format/Types** (fail-fast)
- `pnpm lint`
- `pnpm format:check`
- `pnpm typecheck`
3. **Unit Tests**
- `pnpm test -- --runInBand --reporter=junit --outputFile=.artifacts/junit.xml`
- Collect coverage to `.artifacts/coverage` (lcov + summary).
4. **Storybook a11y**
- `pnpm storybook:build` (static export)
- `pnpm storybook:a11y --ci --output .artifacts/storybook-a11y.json`
5. **Playwright Smoke**
- `pnpm playwright test --config=playwright.config.ci.ts --reporter=list,junit=.artifacts/playwright.xml`
- Upload `playwright-report/` and `.artifacts/playwright.xml`.
6. **Lighthouse (CI mode)**
- Serve built app with `pnpm serve --port 4173` and run `pnpm lhci autorun --config=lighthouserc.ci.js --upload.target=filesystem --upload.outputDir=.artifacts/lhci`
- Enforce budgets: performance >= 0.80, accessibility >= 0.90, best-practices >= 0.90, seo >= 0.85.
7. **SBOM/Provenance**
- `pnpm exec syft packages dir:dist --output=spdx-json=.artifacts/console.spdx.json`
- Attach `.artifacts/console.spdx.json` and provenance attestation from release job.
## Determinism & Offline
- No network fetches after cache hydrate; fail if `pnpm install` hits the network (set `PNPM_FETCH_RETRIES=0`, `PNPM_OFFLINE=1`).
- All artifacts written under `.artifacts/` and uploaded as CI artifacts.
- Timestamps normalized via `SOURCE_DATE_EPOCH=${{ github.run_id }}` for reproducible Storybook/LH builds.
## Inputs/Secrets
- Required only for Playwright auth flows: `CONSOLE_E2E_USER`, `CONSOLE_E2E_PASS` (scoped to non-prod tenant). Pipeline must soft-skip auth tests when unset.
- No signing keys required in CI; release handles signing separately.
## Outputs
- `.artifacts/junit.xml` (unit)
- `.artifacts/playwright.xml`, `playwright-report/`
- `.artifacts/storybook-a11y.json`
- `.artifacts/lhci/` (Lighthouse reports)
- `.artifacts/coverage/`
- `.artifacts/console.spdx.json`
## Example Gitea workflow snippet
```yaml
- name: Console CI (DEVOPS-CONSOLE-23-001)
uses: actions/setup-node@v4
with:
node-version: '20'
- name: Prep pnpm
run: |
corepack enable
corepack prepare pnpm@9 --activate
- name: Cache pnpm store
uses: actions/cache@v4
with:
path: |
~/.pnpm-store
./node_modules
key: console-${{ hashFiles('pnpm-lock.yaml') }}
- name: Install (offline)
env:
PNPM_FETCH_RETRIES: 0
PNPM_OFFLINE: 1
run: pnpm install --frozen-lockfile
- name: Lint/Types
run: pnpm lint && pnpm format:check && pnpm typecheck
- name: Unit
run: pnpm test -- --runInBand --reporter=junit --outputFile=.artifacts/junit.xml
- name: Storybook a11y
run: pnpm storybook:build && pnpm storybook:a11y --ci --output .artifacts/storybook-a11y.json
- name: Playwright
run: pnpm playwright test --config=playwright.config.ci.ts --reporter=list,junit=.artifacts/playwright.xml
- name: Lighthouse
run: pnpm serve --port 4173 & pnpm lhci autorun --config=lighthouserc.ci.js --upload.target=filesystem --upload.outputDir=.artifacts/lhci
- name: SBOM
run: pnpm exec syft packages dir:dist --output=spdx-json=.artifacts/console.spdx.json
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: console-ci-artifacts
path: .artifacts
```
## Acceptance to mark blocker cleared
- Pipeline executes fully in a clean runner with network blocked after cache hydrate.
- All artefacts uploaded and budgets enforced; failing budgets fail the job.
- Soft-skip auth-dependent tests when secrets are absent, without failing the pipeline.

View File

@@ -1,41 +0,0 @@
# Export Center CI Contract (DEVOPS-EXPORT-35-001)
Goal: Deterministic, offline-friendly CI for Export Center services (WebService + Worker) with storage fixtures, smoke/perf gates, and observability artefacts.
## Pipeline stages
1) **Setup**
- .NET SDK 10.x (cached); Node 20.x only if UI assets present.
- Restore NuGet from `local-nugets/` + cache; fail on external fetch (configure `RestoreDisableParallel` and source mapping).
- Spin up MinIO (minio/minio:RELEASE.2024-10-08T09-56-18Z) via docker-compose fixture `ops/devops/export/minio-compose.yml` with deterministic creds (`exportci/exportci123`).
2) **Build & Lint**
- `dotnet format --verify-no-changes` on `src/ExportCenter/**`.
- `dotnet build src/ExportCenter/StellaOps.ExportCenter.WebService/StellaOps.ExportCenter.WebService.csproj -c Release /p:ContinuousIntegrationBuild=true`.
3) **Unit/Integration Tests**
- `dotnet test src/ExportCenter/__Tests/StellaOps.ExportCenter.Tests/StellaOps.ExportCenter.Tests.csproj -c Release --logger "trx;LogFileName=export-tests.trx"`
- Tests must use MinIO fixture with bucket `export-ci` and deterministic seed objects (see fixtures below).
4) **Perf/Smoke (optional gated)**
- `dotnet test ... --filter Category=Smoke` against live MinIO; cap runtime < 90s.
5) **Artifacts**
- Publish TRX to `.artifacts/export-tests.trx`.
- Collect coverage to `.artifacts/coverage` (coverlet; lcov + summary).
- Export appsettings used for the run to `.artifacts/appsettings.ci.json`.
- Syft SBOM: `syft dir:./src/ExportCenter -o spdx-json=.artifacts/exportcenter.spdx.json`.
6) **Dashboards (seed)**
- Produce starter Grafana JSON with: request rate, p95 latency, MinIO error rate, queue depth, export job duration histogram. Store under `.artifacts/grafana/export-center-ci.json` for import.
## Fixtures
- MinIO compose file: `ops/devops/export/minio-compose.yml` (add if missing) with:
- Access key: `exportci`
- Secret key: `exportci123`
- Bucket: `export-ci`
- Seed object script: `ops/devops/export/seed-minio.sh` to create the bucket and upload a deterministic sample (`sample-export.ndjson`); a sketch follows below.
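A hedged sketch of what `seed-minio.sh` could contain, using the MinIO client (`mc`); the endpoint, bucket, and credentials mirror the fixture values above:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Point mc at the CI fixture started from minio-compose.yml.
mc alias set exportci http://127.0.0.1:9000 exportci exportci123

# Idempotent bucket creation plus one deterministic seed object.
mc mb --ignore-existing exportci/export-ci
printf '{"id":1,"kind":"sample"}\n' > sample-export.ndjson
mc cp sample-export.ndjson exportci/export-ci/sample-export.ndjson
```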
## Determinism & Offline
- No external network after restore; MinIO uses local image tag pinned above.
- All timestamps emitted as UTC and tests assert deterministic ordering.
- Coverage, SBOM, Grafana seed stored under `.artifacts/` and uploaded.
## Acceptance to clear blocker
- CI run passes on clean runner with network blocked post-restore.
- Artifacts (.trx, coverage, SBOM, Grafana JSON) uploaded and MinIO fixture exercised in tests.
- Smoke perf subset completes < 90s.

View File

@@ -1,24 +0,0 @@
# DevOps Governance Rules Anchor (Sprint 33)
> **Scope** · Exit deliverable for `DEVOPS-RULES-33-001`
> **Audience** · DevOps Guild, Platform leads, service owners
> **Related** · `ops/devops/TASKS.md`, `docs/backlog/2025-10-cleanup.md`, `docs/modules/platform/architecture-overview.md`
This note consolidates the platform governance rules ratified on 30 October 2025.
Each rule captures intent, affected surfaces, enforcement actions, and references to the
source-of-truth backlogs so that subsequent sprints do not reintroduce conflicting work.
| Rule | Intent & Rationale | Enforcement & Ownership | Follow-ups |
|------|--------------------|-------------------------|------------|
| **Gateway is a proxy only; Policy Engine owns overlays/simulations.** | Keep Gateway thin and deterministic: it authenticates, authorises, and forwards requests. All overlay composition, simulation, and policy evaluation stays inside Policy Engine so we avoid duplicated logic and time-of-check drift. | *Owners:* BE-Base Platform Guild + Policy Engine Guild. <br/>*Enforcement:* Gateway PR reviews block embedded overlay code, new endpoints require `Policy Engine` contracts, CI parity checks compare Gateway ↔ Policy overlay schemas. | - Update open tasks referencing “gateway overlay” work to point at `POLICY-ENGINE-20-00x`.<br/>- Close or rewrite backlog items `WEB-POLICY-20-00x` that attempted to compute overlays in Gateway. |
| **AOC ingestion is canonical-only; no merges at ingest.** | Concelier/Excititor persist upstream truth plus provenance. Derived severity, merges, or dedupe belong to downstream Policy workflows. This keeps ingestion auditable and replayable. | *Owners:* Concelier & Excititor guilds, DevOps Guild for CI pipelines. <br/>*Enforcement:* `StellaOps.Aoc` guard library, Mongo validators, Roslyn analyzer backlog (`WEB-AOC-19-003`), CI job `stella aoc verify`. | - Ensure ingestion tasks reference the guard library (`StellaOps.Aoc`).<br/>- Retire legacy tasks that still mention merge-at-ingest (see backlog cleanup note). |
| **Single graph platform: Graph Indexer + Graph API (Cartographer retired).** | Replace the historical Cartographer service with the Graph Indexer + Graph API pairing so graph storage, overlays, and explorer flows share one platform. | *Owners:* Graph Platform Guild, Scheduler Guild, DevOps Guild. <br/>*Enforcement:* New graph work lands in `docs/modules/graph/**` and `src/Graph/**`. Gateway/UI/CLI tickets reference the Graph API endpoints only. | - Archive Cartographer handshake docs and mark Cartographer backlog items as historical.<br/>- Update Scheduler/SBOM/Console tickets to depend on `GRAPH-*` IDs instead of `CARTO-*`. |
## Tracking & documentation
- ✅ Rules recorded in the corresponding sprint file `/docs/implplan/SPRINT_*.md` (Sprint 33) and `/docs/ops/devops/TASKS.md`.
- ✅ Repository-wide references to “Cartographer as active platform” updated (see backlog note amendment and doc banner).
- ✅ Changelog entry (`docs/updates/2025-10-30-devops-governance.md`) captures reviewer acknowledgement.
Future adjustments to these rules must update this file and reference `DEVOPS-RULES-33-001`
when proposing changes so the DevOps Guild can track history.

View File

@@ -1,52 +0,0 @@
# SemVer Style Backfill Runbook
_Last updated: 2025-10-11_
> **Note (2025-12):** This runbook is obsolete. MongoDB was fully removed in Sprint 4400 and replaced with PostgreSQL. The migration functionality described here was executed during the transition period and is no longer applicable. Retained for historical reference only.
## Overview
The SemVer style migration populates the new `normalizedVersions` field on advisory documents and ensures
provenance `decisionReason` values are preserved during future reads. The migration is idempotent and only
runs when the feature flag `concelier:storage:enableSemVerStyle` is enabled.
## Preconditions
1. **Review configuration**: set `concelier.storage.enableSemVerStyle` to `true` on all Concelier services.
2. **Confirm batch size**: adjust `concelier.storage.backfillBatchSize` if you need smaller batches for older
   deployments (default: `250`).
3. **Back up**: capture a fresh snapshot of the `advisory` collection or a full MongoDB backup.
4. **Staging dry-run**: enable the flag in a staging environment and observe the migration output before
   rolling to production.
## Execution
No manual command is required. After deploying the configuration change, restart the Concelier WebService or
any component that hosts the Mongo migration runner. During startup you will see log entries similar to:
```
Applying Mongo migration 20251011-semver-style-backfill: Populate advisory.normalizedVersions for existing documents when SemVer style storage is enabled.
Mongo migration 20251011-semver-style-backfill applied
```
The migration reads advisories in batches (`concelier.storage.backfillBatchSize`) and writes flattened
`normalizedVersions` arrays. Existing documents without SemVer ranges remain untouched.
## Post-checks
1. Verify the new indexes exist:
```
db.advisory.getIndexes()
```
You should see `advisory_normalizedVersions_pkg_scheme_type` and `advisory_normalizedVersions_value`.
2. Spot check a few advisories to confirm the top-level `normalizedVersions` array exists and matches
the embedded package data (see the sketch after this list).
3. Run `dotnet test` for `StellaOps.Concelier.Storage.Mongo.Tests` (optional but recommended) in CI to confirm
the storage suite passes with the feature flag enabled.
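A minimal spot-check sketch for step 2, assuming `mongosh` is available and a database named `concelier` (both the database name and the projected fields are deployment-specific):
```bash
# Hypothetical database name; adjust to your deployment.
mongosh concelier --quiet --eval '
  db.advisory.find(
    { normalizedVersions: { $exists: true } },
    { _id: 1, normalizedVersions: 1 }
  ).limit(3).forEach(doc => printjson(doc))
'
```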
## Rollback
Set `concelier.storage.enableSemVerStyle` back to `false` and redeploy. The migration will be skipped on
subsequent startups. You can leave the populated `normalizedVersions` arrays in place; they are ignored when
the feature flag is off. If you must remove them entirely, restore from the backup captured during
preparation.

View File

@@ -1,27 +0,0 @@
# Policy Schema Export Automation
This utility generates JSON Schema documents for the Policy Engine run contracts.
## Command
```
scripts/export-policy-schemas.sh [output-directory]
```
When no output directory is supplied, schemas are written to `docs/modules/policy/schemas/`.
The exporter builds against `StellaOps.Scheduler.Models` and emits:
- `policy-run-request.schema.json`
- `policy-run-status.schema.json`
- `policy-diff-summary.schema.json`
- `policy-explain-trace.schema.json`
The build pipeline (`.gitea/workflows/build-test-deploy.yml`, job **Export policy run schemas**) runs this script on every push and pull request. Exports land under `artifacts/policy-schemas/<commit>/`, are published as the `policy-schema-exports` artifact, and changes trigger a Slack post to `#policy-engine` via the `POLICY_ENGINE_SCHEMA_WEBHOOK` secret. A unified diff is stored alongside the exports for downstream consumers.
## CI integration checklist
- [x] Invoke the script in the DevOps pipeline (see `DEVOPS-POLICY-20-004`).
- [x] Publish the generated schemas as pipeline artifacts.
- [x] Notify downstream consumers when schemas change (Slack `#policy-engine`, changelog snippet).
- [ ] Gate CLI validation once schema artifacts are available.

View File

@@ -1,151 +0,0 @@
# StellaOps Deployment Upgrade & Rollback Runbook
_Last updated: 2025-10-26 (Sprint 14 DEVOPS-OPS-14-003)._
This runbook describes how to promote a new release across the supported deployment profiles (Helm and Docker Compose), how to roll back safely, and how to keep channels (`edge`, `stable`, `airgap`) aligned. All steps assume you are working from a clean checkout of the release branch/tag.
---
## 1. Channel overview
| Channel | Release manifest | Helm values | Compose profile |
|---------|------------------|-------------|-----------------|
| `edge` | `deploy/releases/2025.10-edge.yaml` | `deploy/helm/stellaops/values-dev.yaml` | `deploy/compose/docker-compose.dev.yaml` |
| `stable` | `deploy/releases/2025.09-stable.yaml` | `deploy/helm/stellaops/values-stage.yaml`, `deploy/helm/stellaops/values-prod.yaml` | `deploy/compose/docker-compose.stage.yaml`, `deploy/compose/docker-compose.prod.yaml` |
| `airgap` | `deploy/releases/2025.09-airgap.yaml` | `deploy/helm/stellaops/values-airgap.yaml` | `deploy/compose/docker-compose.airgap.yaml` |
Infrastructure components (PostgreSQL, Valkey, MinIO, RustFS) are pinned in the release manifests and inherited by the deployment profiles. Supporting dependencies such as `nats` remain on upstream LTS tags; review `deploy/compose/*.yaml` for the authoritative set.
---
## 2. Pre-flight checklist
1. **Refresh release manifest**
Pull the latest manifest for the channel you are promoting (`deploy/releases/<version>-<channel>.yaml`).
2. **Align deployment bundles with the manifest**
Run the alignment checker for every profile that should pick up the release. Pass `--ignore-repo nats` to skip auxiliary services.
```bash
./deploy/tools/check-channel-alignment.py \
--release deploy/releases/2025.10-edge.yaml \
--target deploy/helm/stellaops/values-dev.yaml \
--target deploy/compose/docker-compose.dev.yaml \
--ignore-repo nats
```
Repeat for other channels (`stable`, `airgap`), substituting the manifest and target files.
3. **Lint and template profiles**
```bash
./deploy/tools/validate-profiles.sh
```
4. **Smoke the Offline Kit debug store (edge/stable only)**
When the release pipeline has generated `out/release/debug/.build-id/**`, mirror the assets into the Offline Kit staging tree:
```bash
./ops/offline-kit/mirror_debug_store.py \
--release-dir out/release \
--offline-kit-dir out/offline-kit
```
Archive the resulting `out/offline-kit/metadata/debug-store.json` alongside the kit bundle.
5. **Review compatibility matrix**
Confirm PostgreSQL, Valkey, and RustFS versions in the release manifest match platform SLOs. The default targets are `postgres:16-alpine`, `valkey:8.0`, `rustfs:2025.10.0-edge`.
6. **Create a rollback bookmark**
Record the current Helm revision (`helm history stellaops -n stellaops`) and compose tag (`git describe --tags`) before applying changes.
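A small sketch that captures both values in one bookmark file, assuming `jq` is available (the file name is illustrative):
```bash
# Record the current Helm revision and compose tag before the upgrade.
{
  echo "helm_revision: $(helm history stellaops -n stellaops --max 1 -o json | jq -r '.[0].revision')"
  echo "compose_tag: $(git describe --tags)"
  echo "recorded_at: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
} > rollback-bookmark.txt
```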
---
## 3. Helm upgrade procedure (staging → production)
1. Switch to the deployment branch and ensure secrets/config maps are current.
2. Apply the upgrade in the staging cluster:
```bash
helm upgrade stellaops deploy/helm/stellaops \
-f deploy/helm/stellaops/values-stage.yaml \
--namespace stellaops \
--atomic \
--timeout 15m
```
3. Run smoke tests (`scripts/smoke-tests.sh` or environment-specific checks).
4. Promote to production using the prod values file and the same command.
5. Record the new revision number and Git SHA in the change log.
### Rollback (Helm)
1. Identify the previous revision: `helm history stellaops -n stellaops`.
2. Execute:
```bash
helm rollback stellaops <revision> \
--namespace stellaops \
--wait \
--timeout 10m
```
3. Verify `kubectl get pods` returns healthy workloads; rerun smoke tests.
4. Update the incident/operations log with root cause and rollback details.
---
## 4. Docker Compose upgrade procedure
1. Update environment files (`deploy/compose/env/*.env.example`) with any new settings and sync secrets to hosts.
2. Pull the tagged repository state corresponding to the release (e.g. `git checkout 2025.09.2` for stable).
3. Apply the upgrade:
```bash
docker compose \
--env-file deploy/compose/env/prod.env \
-f deploy/compose/docker-compose.prod.yaml \
pull
docker compose \
--env-file deploy/compose/env/prod.env \
-f deploy/compose/docker-compose.prod.yaml \
up -d
```
4. Tail logs for critical services (`docker compose logs -f authority concelier`).
5. Update monitoring dashboards/alerts to confirm normal operation.
### Rollback (Compose)
1. Check out the previous release tag (e.g. `git checkout 2025.09.1`).
2. Re-run `docker compose pull` and `docker compose up -d` with that profile. Docker will restore the prior digests.
3. If reverting to a known-good snapshot is required, restore volume backups (see `docs/modules/authority/operations/backup-restore.md` and associated service guides).
4. Log the rollback in the operations journal.
---
## 5. Channel promotion workflow
1. Author or update the channel manifest under `deploy/releases/`.
2. Mirror the new digests into Helm/Compose values and run the alignment script for each profile.
3. Commit the changes with a message that references the release version and channel (e.g. `deploy: promote 2025.10.0-edge`).
4. Publish release notes and update `deploy/releases/README.md` (if applicable).
5. Tag the repository when promoting stable or airgap builds.
---
## 6. Upgrade rehearsal & rollback drill log
Maintain rehearsal notes in `docs/modules/devops/runbooks/launch-cutover.md` or the relevant sprint planning document. After each drill capture:
- Release version tested
- Date/time
- Participants
- Issues encountered & fixes
- Rollback duration (if executed)
Attach the log to the sprint retro or operational wiki.
| Date (UTC) | Channel | Outcome | Notes |
|------------|---------|---------|-------|
| 2025-10-26 | Documentation dry-run | Planned | Runbook refreshed; next live drill scheduled for 2025-11 edge → stable promotion. |
---
## 7. References
- `deploy/README.md` structure and validation workflow for deployment bundles.
- `docs/RELEASE_ENGINEERING_PLAYBOOK.md` release automation and signing pipeline.
- `docs/modules/devops/architecture.md` high-level DevOps architecture, SLOs, and compliance requirements.
- `ops/offline-kit/mirror_debug_store.py` debug-store mirroring helper.
- `deploy/tools/check-channel-alignment.py` release vs deployment digest alignment checker.

View File

@@ -1,130 +0,0 @@
# Launch Cutover Runbook - Stella Ops
_Document owner: DevOps Guild (2025-10-26)_
_Scope:_ Full-platform launch from staging to production for release `2025.09.2`.
> **Note (2025-12):** This document reflects the state at initial launch. Since then, MongoDB has been fully removed (Sprint 4400) and replaced with PostgreSQL. MinIO references now use RustFS. Redis references now use Valkey. See current deployment docs in `deploy/` for up-to-date configuration.
## 1. Roles and Communication
| Role | Primary | Backup | Contact |
| --- | --- | --- | --- |
| Cutover lead | DevOps Guild (on-call engineer) | Platform Ops lead | `#launch-bridge` (Mattermost) |
| Authority stack | Authority Core guild rep | Security guild rep | `#authority` |
| Scanner / Queue | Scanner WebService guild rep | Runtime guild rep | `#scanner` |
| Storage | Mongo/MinIO operators | Backup DB admin | Pager escalation |
| Observability | Telemetry guild rep | SRE on-call | `#telemetry` |
| Approvals | Product owner + CTO | DevOps lead | Approval recorded in change ticket |
Set up a bridge call 30 minutes before start and keep `#launch-bridge` updated every 10 minutes.
## 2. Timeline Overview (UTC)
| Time | Activity | Owner |
| --- | --- | --- |
| T-24h | Change ticket approved, prod secrets verified, offline kit build status checked (`DEVOPS-OFFLINE-18-005`). | DevOps lead |
| T-12h | Run `deploy/tools/validate-profiles.sh`; capture logs in ticket. | DevOps engineer |
| T-6h | Freeze non-launch deployments; notify guild leads. | Product owner |
| T-2h | Execute rehearsal in staging (Section 3) using `values-stage.yaml` to verify scripts. | DevOps + module reps |
| T-30m | Final go/no-go with guild leads; confirm monitoring dashboards green. | Cutover lead |
| T0 | Execute production cutover steps (Section 4). | Cutover team |
| T+45m | Smoke tests complete (Section 5); announce success or trigger rollback. | Cutover lead |
| T+4h | Post-cutover metrics review, notify stakeholders, close ticket. | DevOps + product owner |
## 3. Rehearsal (Staging) Checklist
1. `docker network create stellaops_frontdoor || true` (if not present on staging jump host).
2. Run `deploy/tools/validate-profiles.sh` and archive output.
3. Apply staging secrets (`kubectl apply -f secrets/stage/*.yaml` or `helm secrets upgrade`) ensuring `stellaops-stage` credentials align with `values-stage.yaml`.
4. Perform `helm upgrade stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-stage.yaml` in staging cluster.
5. Verify health endpoints: `curl https://authority.stage.../healthz`, `curl https://scanner.stage.../healthz`.
6. Execute smoke CLI: `stellaops-cli scan submit --profile staging --sbom samples/sbom/demo.json` and confirm report status in UI.
7. Document total wall time and any deviations in the rehearsal log.
Rehearsal must complete without manual interventions before proceeding to production.
## 4. Production Cutover Steps
### 4.1 Pre-flight
- Confirm production secrets in the appropriate secret store (`stellaops-prod-core`, `stellaops-prod-mongo`, `stellaops-prod-minio`, `stellaops-prod-notify`) contain the keys referenced in `values-prod.yaml`.
- Ensure the external reverse proxy network exists: `docker network create stellaops_frontdoor || true` on each compose host.
- Back up current configuration and data:
- Mongo snapshot: `mongodump --uri "$MONGO_BACKUP_URI" --out /backups/launch-$(date -Iseconds)`.
- MinIO policy export: `mc mirror --overwrite minio/stellaops minio-backup/stellaops-$(date +%Y%m%d%H%M)`.
### 4.2 Apply Updates (Compose)
1. On each compose node, pull updated images for release `2025.09.2`:
```bash
docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml pull
```
2. Deploy changes:
```bash
docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml up -d
```
3. Confirm containers healthy via `docker compose ps` and `docker logs <service> --tail 50`.
### 4.3 Apply Updates (Helm/Kubernetes)
If using Kubernetes, perform:
```bash
helm upgrade stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml --atomic --timeout 15m
```
Monitor rollout with `kubectl get pods -n stellaops --watch` and `kubectl rollout status deployment/<service>`.
### 4.4 Configuration Validation
- Verify Authority issuer metadata: `curl https://authority.prod.../.well-known/openid-configuration`.
- Validate Signer DSSE endpoint: `stellaops-cli signer verify --base-url https://signer.prod... --bundle samples/dsse/demo.json`.
- Check Scanner queue connectivity: `docker exec stellaops-scanner-web dotnet StellaOps.Scanner.WebService.dll health queue` (returns success).
- Ensure Notify (legacy) is still accessible while the Notifier migration is pending.
## 5. Smoke Tests
| Test | Command / Action | Expected Result |
| --- | --- | --- |
| API health | `curl https://scanner.prod.../healthz` | HTTP 200 with `"status":"Healthy"` |
| Scan submit | `stellaops-cli scan submit --profile prod --sbom samples/sbom/demo.json` | Scan completes < 5 minutes; report accessible with signed DSSE |
| Runtime event ingest | Post sample event from Zastava observer fixture | `/runtime/events` responds 202 Accepted; record visible in Mongo `runtime_events` |
| Signing | `stellaops-cli signer sign --bundle demo.json` | Returns DSSE with matching SHA256 and signer metadata |
| Attestor verify | `stellaops-cli attestor verify --uuid <uuid>` | Verification result `ok=true` |
| Web UI | Manual login, verify dashboards render and latency within budget | UI loads under 2 seconds; policy views consistent |
Log results in the change ticket with timestamps and screenshots where applicable.
## 6. Rollback Procedure
1. Assess failure scope; if systemic, initiate rollback immediately while preserving logs/artifacts.
2. For Compose:
```bash
docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml down
docker compose --env-file stage.env -f deploy/compose/docker-compose.stage.yaml up -d
```
3. For Helm:
```bash
helm rollback stellaops <previous-release-number> --namespace stellaops
```
4. Restore Mongo snapshot if data inconsistency detected: `mongorestore --uri "$MONGO_BACKUP_URI" --drop /backups/launch-<timestamp>`.
5. Restore MinIO mirror if required: `mc mirror minio-backup/stellaops-<timestamp> minio/stellaops`.
6. Notify stakeholders of rollback and capture root cause notes in incident ticket.
## 7. Post-cutover Actions
- Keep heightened monitoring for 4 hours post cutover; track latency, error rates, and queue depth.
- Confirm audit trails: Authority tokens issued, Scanner events recorded, Attestor submissions stored.
- Update `docs/modules/devops/runbooks/launch-readiness.md` if any new gaps or follow-ups discovered.
- Schedule retrospective within 48 hours; include DevOps, module guilds, and product owner.
## 8. Approval Matrix
| Step | Required Approvers | Record Location |
| --- | --- | --- |
| Production deployment plan | CTO + DevOps lead | Change ticket comment |
| Cutover start (T0) | DevOps lead + module reps | `#launch-bridge` summary |
| Post-smoke success | DevOps lead + product owner | Change ticket closure |
| Rollback (if invoked) | DevOps lead + CTO | Incident ticket |
Retain all approvals and logs for audit. Update this runbook after each execution to record actual timings and lessons learned.
## 9. Rehearsal Log
| Date (UTC) | What We Exercised | Outcome | Follow-up |
| --- | --- | --- | --- |
| 2025-10-26 | Dry-run of compose/Helm validation via `deploy/tools/validate-profiles.sh` (dev/stage/prod/airgap/mirror). Network creation simulated (`docker network create stellaops_frontdoor` planned) and stage CLI submission reviewed. | Validation script succeeded; all profiles templated cleanly. Stage deployment apply deferred because no staging cluster is accessible from the current environment. | Schedule full stage rehearsal once staging cluster credentials are available; reuse this log section to capture timings. |

View File

@@ -1,51 +0,0 @@
# Launch Readiness Record - Stella Ops
_Updated: 2025-10-26 (UTC)_
> **Note (2025-12):** This document reflects the state at initial launch. Since then, MongoDB has been fully removed (Sprint 4400) and replaced with PostgreSQL. Redis references now use Valkey. See current deployment docs in `deploy/` for up-to-date configuration.
This document captures production launch sign-offs, deployment readiness checkpoints, and any open risks that must be tracked before GA cutover.
## 1. Sign-off Summary
| Module / Service | Guild / Point of Contact | Evidence (Task or Runbook) | Status | Timestamp (UTC) | Notes |
| --- | --- | --- | --- | --- | --- |
| Authority (Issuer) | Authority Core Guild | `AUTH-AOC-19-001` - scope issuance & configuration complete (DONE 2025-10-26) | READY | 2025-10-26T14:05Z | Tenant scope propagation follow-up (`AUTH-AOC-19-002`) tracked in gaps section. |
| Signer | Signer Guild | `SIGNER-API-11-101` / `SIGNER-REF-11-102` / `SIGNER-QUOTA-11-103` (DONE 2025-10-21) | READY | 2025-10-26T14:07Z | DSSE signing, referrer verification, and quota enforcement validated in CI. |
| Attestor | Attestor Guild | `ATTESTOR-API-11-201` / `ATTESTOR-VERIFY-11-202` / `ATTESTOR-OBS-11-203` (DONE 2025-10-19) | READY | 2025-10-26T14:10Z | Rekor submission/verification pipeline green; telemetry pack published. |
| Scanner Web + Worker | Scanner WebService Guild | `SCANNER-WEB-09-10x`, `SCANNER-RUNTIME-12-30x` (DONE 2025-10-18 -> 2025-10-24) | READY* | 2025-10-26T14:20Z | Orchestrator envelope work (`SCANNER-EVENTS-16-301/302`) still open; see gaps. |
| Concelier Core & Connectors | Concelier Core / Ops Guild | Ops runbook sign-off in `docs/modules/concelier/operations/conflict-resolution.md` (2025-10-16) | READY | 2025-10-26T14:25Z | Conflict resolution & connector coverage accepted; Mongo schema hardening pending (see gaps). |
| Excititor API | Excititor Core Guild | Wave 0 connector ingest sign-offs (Sprint backlog reference) | READY | 2025-10-26T14:28Z | VEX linkset publishing complete for launch datasets. |
| Notify Web (legacy) | Notify Guild | Existing stack carried forward; Notifier program tracked separately (Sprint 38-40) | PENDING | 2025-10-26T14:32Z | Legacy notify web remains operational; migration to Notifier blocked on `SCANNER-EVENTS-16-301`. |
| Web UI | UI Guild | Stable build `registry.stella-ops.org/.../web-ui@sha256:10d9248...` deployed in stage and smoke-tested | READY | 2025-10-26T14:35Z | Policy editor GA items (Sprint 20) outside launch scope. |
| DevOps / Release | DevOps Guild | `deploy/tools/validate-profiles.sh` run (2025-10-26) covering dev/stage/prod/airgap/mirror | READY | 2025-10-26T15:02Z | Compose/Helm lint + docker compose config validated; see Section 2 for details. |
| Offline Kit | Offline Kit Guild | `DEVOPS-OFFLINE-18-004` (Go analyzer) and `DEVOPS-OFFLINE-18-005` (Python analyzer) complete; debug-store mirror pending (`DEVOPS-OFFLINE-17-004`). | PENDING | 2025-11-23T15:05Z | Release workflow now ships `out/release/debug`; run `mirror_debug_store.py` on next release artefact and commit `metadata/debug-store.json`. |
_\* READY with caveat - remaining work noted in Section 3._
## 2. Deployment Readiness Checklist
- **Production profiles committed:** `deploy/compose/docker-compose.prod.yaml` and `deploy/helm/stellaops/values-prod.yaml` added with front-door network hand-off and secret references for Mongo/MinIO/core services.
- **Secrets placeholders documented:** `deploy/compose/env/prod.env.example` enumerates required credentials (`MONGO_INITDB_ROOT_PASSWORD`, `MINIO_ROOT_PASSWORD`, Redis/NATS endpoints, `FRONTDOOR_NETWORK`). Helm values reference Kubernetes secrets (`stellaops-prod-core`, `stellaops-prod-mongo`, `stellaops-prod-minio`, `stellaops-prod-notify`).
- **Static validation executed:** `deploy/tools/validate-profiles.sh` run on 2025-10-26 (docker compose config + helm lint/template) with all profiles passing.
- **Ingress model defined:** Production compose profile introduces external `frontdoor` network; README updated with creation instructions and scope of externally reachable services.
- **Observability hooks:** Authority/Signer/Attestor telemetry packs verified; scanner runtime build-id metrics landed (`SCANNER-RUNTIME-17-401`). Grafana dashboards referenced in component runbooks.
- **Rollback assets:** Stage Compose profile remains aligned (`docker-compose.stage.yaml`), enabling rehearsals before prod cutover; release manifests (`deploy/releases/2025.09-stable.yaml`) map digests for reproducible rollback.
- **Rehearsal status:** 2025-10-26 validation dry-run executed (`deploy/tools/validate-profiles.sh` across dev/stage/prod/airgap/mirror). Full stage Helm rollout pending access to the managed staging cluster; target to complete once credentials are provisioned.
## 3. Outstanding Gaps & Follow-ups
| Item | Owner | Tracking Ref | Target / Next Step | Impact |
| --- | --- | --- | --- | --- |
| Tenant scope propagation and audit coverage | Authority Core Guild | `AUTH-AOC-19-002` (DOING 2025-10-26) | Land enforcement + audit fixtures by Sprint 19 freeze | Medium - required for multi-tenant GA but does not block initial cutover if tenants scoped manually. |
| Orchestrator event envelopes + Notifier handshake | Scanner WebService Guild | `SCANNER-EVENTS-16-301` (BLOCKED), `SCANNER-EVENTS-16-302` (DOING) | Coordinate with Gateway/Notifier owners on preview package replacement or binding redirects; rerun `dotnet test` once patch lands and refresh schema docs. Share envelope samples in `docs/modules/signals/events/` after tests pass. | High — gating Notifier migration; legacy notify path remains functional meanwhile. |
| Offline Kit Python analyzer bundle | Offline Kit Guild + Scanner Guild | `DEVOPS-OFFLINE-18-005` (DONE 2025-10-26) | Monitor for follow-up manifest updates and rerun smoke script when analyzers change. | Medium - ensures language analyzer coverage stays current for offline installs. |
| Offline Kit debug store mirror | Offline Kit Guild + DevOps Guild | `DEVOPS-OFFLINE-17-004` (TODO 2025-11-23) | Release pipeline now publishes `out/release/debug`; run `mirror_debug_store.py`, verify hashes, and commit `metadata/debug-store.json`. | Low - symbol lookup remains accessible from staging assets but required before next Offline Kit tag. |
| Mongo schema validators for advisory ingestion | Concelier Storage Guild | `CONCELIER-STORE-AOC-19-001` (TODO) | Finalize JSON schema + migration toggles; coordinate with Ops for rollout window | Low - current validation handled in app layer; schema guard adds defense-in-depth. |
| Authority plugin telemetry alignment | Security Guild | `SEC2.PLG`, `SEC3.PLG`, `SEC5.PLG` (BLOCKED pending AUTH DPoP/MTLS tasks) | Resume once upstream auth surfacing stabilises | Low - plugin remains optional; launch uses default Authority configuration. |
## 4. Approvals & Distribution
- Record shared in `#launch-readiness` (Mattermost) 2025-10-26 15:15 UTC with DevOps + Guild leads for acknowledgement.
- Updates to this document require dual sign-off from DevOps Guild (owner) and impacted module guild lead; retain change log via Git history.
- Cutover rehearsal and rollback drills are tracked separately in `docs/modules/devops/runbooks/launch-cutover.md` (see associated Task `DEVOPS-LAUNCH-18-001`).

View File

@@ -1,64 +0,0 @@
# NuGet Preview Bootstrap (Offline-Friendly)
The StellaOps build relies on .NET 10 RC2 packages (Microsoft.Extensions.*, JwtBearer 10.0 RC).
`NuGet.config` now wires three sources:
1. `local``./local-nuget` (preferred, air-gapped mirror)
2. `dotnet-public``https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-public/nuget/v3/index.json`
3. `nuget.org` → fallback for everything else
Follow the steps below whenever you refresh the repo or roll a new Offline Kit drop.
## 1. Mirror the preview packages
```bash
./ops/devops/sync-preview-nuget.sh
```
* Reads `ops/devops/nuget-preview-packages.csv`. Each line specifies the package, version, expected SHA-256 hash, and (optionally) the flat-container base URL (we pin to `dotnet-public`). A sample line is shown after this list.
* Downloads the `.nupkg` straight into `./local-nuget/` and re-verifies the checksum. Existing files are skipped when hashes already match.
* Use `NUGET_V2_BASE` if you need to temporarily point at a different mirror.
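A hypothetical manifest line following the column order described above (the version, checksum placeholder, and flat-container URL are illustrative):
```
Microsoft.Extensions.Logging,10.0.0-rc.2.12345.6,<sha256-of-nupkg>,https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-public/nuget/v3/flat2
```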
💡 The script never mutates packages in place—if a checksum changes you will see a “SHA mismatch … refreshing” message.
## 2. Restore using the shared `NuGet.config`
From the repo root:
```bash
DOTNET_NOLOGO=1 dotnet restore src/Excititor/__Libraries/StellaOps.Excititor.Connectors.Abstractions/StellaOps.Excititor.Connectors.Abstractions.csproj \
--configfile NuGet.config
```
The `packageSourceMapping` section keeps `Microsoft.Extensions.*`, `Microsoft.AspNetCore.*`, and `Microsoft.Data.Sqlite` bound to `local`/`dotnet-public`, so `dotnet restore` never has to reach out to nuget.org when mirrors are populated.
Before committing changes (or when wiring up a new environment) run:
```bash
python3 ops/devops/validate_restore_sources.py
```
The validator asserts:
- `NuGet.config` lists `local``dotnet-public``nuget.org` in that order.
- `Directory.Build.props` pins `RestoreSources` so every project prioritises the local mirror.
- No stray `NuGet.config` files shadow the repo root configuration.
CI executes the validator in both the `build-test-deploy` and `release` workflows,
so regressions trip before any restore/build begins.
If you run fully air-gapped, remember to clear the cache between SDK upgrades:
```bash
dotnet nuget locals all --clear
```
## 3. Troubleshooting
| Symptom | Fix |
| --- | --- |
| `dotnet restore` still hits nuget.org for preview packages | Re-run `sync-preview-nuget.sh` to ensure the `.nupkg` exists locally, then delete `~/.nuget/packages/microsoft.extensions.*` so the resolver picks up the mirrored copy. |
| SHA mismatch in the manifest | Update `ops/devops/nuget-preview-packages.csv` with the new version + checksum (from the feed) and re-run the sync script. |
| Azure DevOps feed throttling | Set `DOTNET_PUBLIC_FLAT_BASE` env var and point it at your own mirrored flat-container, then add the URL to the 4th column of the manifest. |
Keep this doc alongside Offline Kit instructions so air-gapped operators know exactly how to refresh the mirror and verify packages before restore.

View File

@@ -1,49 +0,0 @@
# Zastava Deployment Runbook
> **Audience:** DevOps, Zastava Guild
>
> **Purpose:** Provide steps for deploying Zastava Observer + Webhook in connected and air-gapped clusters.
## 1. Prerequisites
- Kubernetes 1.26+ with admission registration permissions.
- Access to StellaOps Container Registry or offline bundle with Zastava images.
- Authority scopes and certificates configured for Zastava identities.
- Surface.FS cache endpoint (RustFS/S3) reachable from nodes.
## 2. Installation Steps
1. **Prepare namespace & secrets**
- Create Kubernetes namespace (default `stellaops-runtime`).
- Provision secrets (`zastava-mtls`, `zastava-op-token`, `surface-secrets`).
2. **Deploy Observer**
- Apply Helm chart `helm/zastava` with values aligning to Surface.Env settings.
- Confirm DaemonSet pods schedule on all nodes; check `/healthz` endpoints.
3. **Deploy Webhook**
- Install ValidatingWebhookConfiguration with CA bundle and service reference.
- Enable dry-run mode first, monitor logs, then switch `enforce=true` once validations pass.
4. **Configure policies**
- Populate admission policies in Policy Engine; ensure tokens contain `runtime:read` scopes.
- Update CLI/Console settings for runtime posture view.
5. **Observability**
- Scrape metrics (`zastava_observer_*`, `zastava_webhook_*`).
- Stream logs to central collector.
## 3. Air-Gapped Deployment Notes
- Use Offline Kit bundle (`offline/zastava/`) to load images and configuration.
- Validate Surface.FS bundles before enabling enforcement.
- Replace webhook CA with offline authority; document rotation schedule.
## 4. Validation
- Run `stella runtime policy test` against sample workloads.
- Trigger deployment denial for unsigned images; verify Notifier emits alerts.
- Check timeline events for observer telemetry.
## 5. References
- `docs/modules/zastava/architecture.md`
- `docs/modules/scanner/architecture.md`
- `docs/airgap/airgap-mode.md`
- `docs/forensics/timeline.md`

View File

@@ -1,48 +0,0 @@
# Task Runner — Simulation & Failure Policy Notes
> **Status:** Draft (2025-11-04) — execution wiring + CLI simulate command landed; docs pending final polish
The Task Runner planning layer now materialises additional runtime metadata to unblock execution and simulation flows:
- **Execution graph builder:** converts `TaskPackPlan` steps (including `map` and `parallel`) into a deterministic graph with preserved enablement flags and per-step metadata (`maxParallel`, `continueOnError`, parameters, approval IDs).
- **Simulation engine:** walks the execution graph and classifies steps as `pending`, `skipped`, `requires-approval`, or `requires-policy`, producing a deterministic preview for CLI/UI consumers while surfacing declared outputs.
- **Failure policy:** pack-level `spec.failure.retries` is normalised into a `TaskPackPlanFailurePolicy` (default: `maxAttempts = 1`, `backoffSeconds = 0`). The new step state machine uses this policy to schedule retries and to determine when a run must abort.
- **Simulation API + Worker:** `POST /v1/task-runner/simulations` returns the deterministic preview; `GET /v1/task-runner/runs/{id}` exposes the run state persisted by the worker, which honours `maxParallel`, `continueOnError`, and retry windows during execution.
## Current behaviour
- Map steps expand into child iterations (`stepId[index]::templateId`) with per-item parameters preserved for runtime reference.
- Parallel blocks honour `maxParallel` (defaults to unlimited) and the worker executes children accordingly, short-circuiting when `continueOnError` is false.
- Simulation output mirrors approvals/policy gates, allowing the WebService/CLI to show which actions must occur before execution resumes.
- File-backed state store persists `PackRunState` snapshots (`nextAttemptAt`, attempts, reasons) so orchestration clients and CLI can resume runs deterministically even in air-gapped environments; see the inspection sketch after this list.
- Step state machine transitions:
- `pending → running → succeeded`
- `running → failed` (abort) once attempts ≥ `maxAttempts`
- `running → pending` with scheduled `nextAttemptAt` when retries remain
- `pending → skipped` for disabled steps (e.g., `when` expressions).
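A minimal sketch for inspecting a persisted snapshot, assuming a hypothetical state-file location and field layout (resolve the real path and shape from the `FilePackRunStateStore` configuration):
```bash
# Hypothetical path and field names; adjust to your FilePackRunStateStore setup.
jq '{runId, steps: [.steps[] | {stepId, status, attempts, nextAttemptAt}]}' \
  /var/lib/stellaops/task-runner/runs/<run-id>/state.json
```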
## CLI usage
Run the simulation without mutating state:
```bash
stella task-runner simulate \
--manifest ./packs/sample-pack.yaml \
--inputs ./inputs.json \
--format table
```
Use `--format json` (or `--output path.json`) to emit the raw payload produced by `POST /api/task-runner/simulations`.
## Follow-up gaps
- Fold the CLI command into the official reference/quickstart guides and capture exit-code conventions.
References:
- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/PackRunExecutionGraphBuilder.cs`
- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/Simulation/PackRunSimulationEngine.cs`
- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/PackRunStepStateMachine.cs`
- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Infrastructure/Execution/FilePackRunStateStore.cs`
- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Worker/Services/PackRunWorkerService.cs`
- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.WebService/Program.cs`

View File

@@ -0,0 +1,40 @@
# DevPortal
> Developer portal for API documentation and SDK access.
## Purpose
DevPortal provides a unified developer experience for StellaOps API consumers. It hosts API documentation, SDK downloads, and developer guides.
## Quick Links
- [Guides](./guides/) - Developer guides and tutorials
## Status
| Attribute | Value |
|-----------|-------|
| **Maturity** | Beta |
| **Last Reviewed** | 2025-12-29 |
| **Maintainer** | Platform Guild |
## Key Features
- **API Documentation**: Interactive OpenAPI documentation
- **SDK Downloads**: Language-specific SDK packages
- **Developer Guides**: Integration tutorials and examples
- **API Playground**: Interactive API testing environment
## Dependencies
### Upstream (this module depends on)
- **Authority** - Developer authentication
- **Gateway** - API proxy and rate limiting
### Downstream (modules that depend on this)
- None (consumer-facing portal)
## Related Documentation
- [API Overview](../../api/overview.md)
- [CLI Reference](../../cli/command-reference.md)

View File

@@ -0,0 +1,496 @@
# Event Envelope Schema
> **Version:** 1.0.0
> **Status:** Draft
> **Sprint:** [SPRINT_20260107_003_001_LB](../../implplan/SPRINT_20260107_003_001_LB_event_envelope_sdk.md)
This document specifies the canonical event envelope schema for the StellaOps Unified Event Timeline.
---
## Overview
The event envelope provides a standardized format for all events emitted across StellaOps services. It enables:
- **Unified Timeline:** Cross-service correlation with HLC ordering
- **Deterministic Replay:** Reproducible event streams for forensics
- **Audit Compliance:** DSSE-signed event bundles for export
- **Causal Analysis:** Stage latency measurement and bottleneck identification
---
## Envelope Schema (v1)
### JSON Schema
```json
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://stellaops.org/schemas/timeline-event.v1.json",
"title": "TimelineEvent",
"description": "Canonical event envelope for StellaOps Unified Event Timeline",
"type": "object",
"required": [
"eventId",
"tHlc",
"tsWall",
"service",
"correlationId",
"kind",
"payload",
"payloadDigest",
"engineVersion",
"schemaVersion"
],
"properties": {
"eventId": {
"type": "string",
"description": "Deterministic event ID: SHA-256(correlationId || tHlc || service || kind)[0:32] hex",
"pattern": "^[a-f0-9]{32}$"
},
"tHlc": {
"type": "string",
"description": "HLC timestamp in sortable string format: <physicalTimeMs>:<logicalCounter>:<nodeId>",
"pattern": "^\\d+:\\d+:[a-zA-Z0-9_-]+$"
},
"tsWall": {
"type": "string",
"format": "date-time",
"description": "Wall-clock time in ISO 8601 format (informational only)"
},
"service": {
"type": "string",
"description": "Service name that emitted the event",
"enum": ["Scheduler", "AirGap", "Attestor", "Policy", "VexLens", "Scanner", "Concelier", "Platform"]
},
"traceParent": {
"type": ["string", "null"],
"description": "W3C Trace Context traceparent header",
"pattern": "^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$"
},
"correlationId": {
"type": "string",
"description": "Correlation ID linking related events (e.g., scanId, jobId, artifactDigest)"
},
"kind": {
"type": "string",
"description": "Event kind/type",
"enum": [
"ENQUEUE", "DEQUEUE", "EXECUTE", "COMPLETE", "FAIL",
"IMPORT", "EXPORT", "MERGE", "CONFLICT",
"ATTEST", "VERIFY",
"EVALUATE", "GATE_PASS", "GATE_FAIL",
"CONSENSUS", "OVERRIDE",
"SCAN_START", "SCAN_COMPLETE",
"EMIT", "ACK", "ERR"
]
},
"payload": {
"type": "string",
"description": "RFC 8785 canonicalized JSON payload"
},
"payloadDigest": {
"type": "string",
"description": "SHA-256 digest of payload as hex string",
"pattern": "^[a-f0-9]{64}$"
},
"engineVersion": {
"type": "object",
"description": "Engine/resolver version for reproducibility",
"required": ["engineName", "version", "sourceDigest"],
"properties": {
"engineName": {
"type": "string",
"description": "Name of the engine/service"
},
"version": {
"type": "string",
"description": "Semantic version string"
},
"sourceDigest": {
"type": "string",
"description": "SHA-256 digest of engine source/binary"
}
}
},
"dsseSig": {
"type": ["string", "null"],
"description": "Optional DSSE signature in format keyId:base64Signature"
},
"schemaVersion": {
"type": "integer",
"description": "Schema version for envelope evolution",
"const": 1
}
}
}
```
### C# Record Definition
```csharp
/// <summary>
/// Canonical event envelope for unified timeline.
/// </summary>
public sealed record TimelineEvent
{
/// <summary>
/// Deterministic event ID: SHA-256(correlationId || tHlc || service || kind)[0:32] hex.
/// NOT a random ULID - ensures replay determinism.
/// </summary>
[Required]
[RegularExpression("^[a-f0-9]{32}$")]
public required string EventId { get; init; }
/// <summary>
/// HLC timestamp from StellaOps.HybridLogicalClock library.
/// </summary>
[Required]
public required HlcTimestamp THlc { get; init; }
/// <summary>
/// Wall-clock time (informational only, not used for ordering).
/// </summary>
[Required]
public required DateTimeOffset TsWall { get; init; }
/// <summary>
/// Service name that emitted the event.
/// </summary>
[Required]
public required string Service { get; init; }
/// <summary>
/// W3C Trace Context traceparent for OpenTelemetry correlation.
/// </summary>
public string? TraceParent { get; init; }
/// <summary>
/// Correlation ID linking related events.
/// </summary>
[Required]
public required string CorrelationId { get; init; }
/// <summary>
/// Event kind (ENQUEUE, EXECUTE, ATTEST, etc.).
/// </summary>
[Required]
public required string Kind { get; init; }
/// <summary>
/// RFC 8785 canonicalized JSON payload.
/// </summary>
[Required]
public required string Payload { get; init; }
/// <summary>
/// SHA-256 digest of Payload.
/// </summary>
[Required]
public required byte[] PayloadDigest { get; init; }
/// <summary>
/// Engine version for reproducibility (per CLAUDE.md Rule 8.2.1).
/// </summary>
[Required]
public required EngineVersionRef EngineVersion { get; init; }
/// <summary>
/// Optional DSSE signature (keyId:base64Signature).
/// </summary>
public string? DsseSig { get; init; }
/// <summary>
/// Schema version (current: 1).
/// </summary>
public int SchemaVersion { get; init; } = 1;
}
public sealed record EngineVersionRef(
string EngineName,
string Version,
string SourceDigest);
```
---
## Field Specifications
### eventId
**Purpose:** Unique, deterministic identifier for each event.
**Computation:**
```csharp
public static string GenerateEventId(
string correlationId,
HlcTimestamp tHlc,
string service,
string kind)
{
using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
hasher.AppendData(Encoding.UTF8.GetBytes(correlationId));
hasher.AppendData(Encoding.UTF8.GetBytes(tHlc.ToSortableString()));
hasher.AppendData(Encoding.UTF8.GetBytes(service));
hasher.AppendData(Encoding.UTF8.GetBytes(kind));
var hash = hasher.GetHashAndReset();
return Convert.ToHexString(hash.AsSpan(0, 16)).ToLowerInvariant();
}
```
**Rationale:** Unlike ULID or UUID, this deterministic approach ensures that:
- The same event produces the same ID across replays
- Duplicate events can be detected and deduplicated
- Event ordering is verifiable
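The same computation sketched in shell, assuming GNU coreutils; for identical inputs it should match the C# helper above:
```bash
correlation_id="scan-abc123"
t_hlc="1704585600000:42:scheduler-node-1"
service="Scheduler"
kind="ENQUEUE"

# SHA-256 over the UTF-8 concatenation; keep the first 16 bytes (32 hex chars).
printf '%s%s%s%s' "$correlation_id" "$t_hlc" "$service" "$kind" \
  | sha256sum | cut -c1-32
```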
### tHlc
**Purpose:** Primary ordering timestamp using Hybrid Logical Clock.
**Format:** `<physicalTimeMs>:<logicalCounter>:<nodeId>`
**Example:** `1704585600000:42:scheduler-node-1`
**Ordering:** Lexicographic comparison produces correct temporal order:
1. Compare physical time (milliseconds since Unix epoch)
2. If equal, compare logical counter
3. If equal, compare node ID (for uniqueness)
**Implementation:** Uses existing `StellaOps.HybridLogicalClock.HlcTimestamp` type.
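A field-aware sort sketch over the sortable form; note that raw byte-wise string comparison additionally requires zero-padded physical/logical fields, which is why the comparison is defined field by field:
```bash
printf '%s\n' \
  '1704585601000:0:airgap-node-2' \
  '1704585600000:42:scheduler-node-1' \
  '1704585600000:7:scheduler-node-1' \
  | sort -t: -k1,1n -k2,2n -k3,3
# -> ...:7:... sorts before ...:42:..., both before the later physical time
```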
### tsWall
**Purpose:** Human-readable wall-clock timestamp for debugging.
**Format:** ISO 8601 with UTC timezone (e.g., `2026-01-07T12:00:00.000Z`)
**Important:** This field is **informational only**. Never use for ordering or comparison. The `tHlc` field is the authoritative timestamp.
### service
**Purpose:** Identifies the StellaOps service that emitted the event.
**Allowed Values:**
| Value | Description |
|-------|-------------|
| `Scheduler` | Job scheduling and queue management |
| `AirGap` | Offline/air-gap sync operations |
| `Attestor` | DSSE attestation and verification |
| `Policy` | Policy engine evaluation |
| `VexLens` | VEX consensus computation |
| `Scanner` | Container scanning |
| `Concelier` | Advisory ingestion |
| `Platform` | Console backend aggregation |
### traceParent
**Purpose:** W3C Trace Context correlation for OpenTelemetry integration.
**Format:** `00-{trace-id}-{span-id}-{trace-flags}`
**Example:** `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`
**Population:** Automatically captured from `Activity.Current?.Id` during event emission.
### correlationId
**Purpose:** Links related events across services.
**Common Patterns:**
| Pattern | Example | Usage |
|---------|---------|-------|
| Scan ID | `scan-abc123` | Container scan lifecycle |
| Job ID | `job-xyz789` | Scheduled job lifecycle |
| Artifact Digest | `sha256:abc...` | Artifact processing |
| Bundle ID | `bundle-def456` | Air-gap bundle operations |
### kind
**Purpose:** Categorizes the event type.
**Event Kinds by Service:**
| Service | Kinds |
|---------|-------|
| Scheduler | `ENQUEUE`, `DEQUEUE`, `EXECUTE`, `COMPLETE`, `FAIL` |
| AirGap | `IMPORT`, `EXPORT`, `MERGE`, `CONFLICT` |
| Attestor | `ATTEST`, `VERIFY` |
| Policy | `EVALUATE`, `GATE_PASS`, `GATE_FAIL` |
| VexLens | `CONSENSUS`, `OVERRIDE` |
| Scanner | `SCAN_START`, `SCAN_COMPLETE` |
| Generic | `EMIT`, `ACK`, `ERR` |
### payload
**Purpose:** Domain-specific event data.
**Requirements:**
1. **RFC 8785 Canonicalization:** Must use `CanonJson.Serialize()` from `StellaOps.Canonical.Json`
2. **No Non-Deterministic Fields:** No random IDs, current timestamps, or environment-specific data
3. **Bounded Size:** Payload should be < 1MB; use references for large data
**Example:**
```json
{
"artifactDigest": "sha256:abc123...",
"jobId": "job-xyz789",
"status": "completed",
"findingsCount": 42
}
```
### payloadDigest
**Purpose:** Integrity verification of payload.
**Computation:**
```csharp
var digest = SHA256.HashData(Encoding.UTF8.GetBytes(payload));
```
**Format:** 64-character lowercase hex string.
### engineVersion
**Purpose:** Records the engine/resolver version for reproducibility verification (per CLAUDE.md Rule 8.2.1).
**Fields:**
| Field | Description | Example |
|-------|-------------|---------|
| `engineName` | Service/engine name | `"Scheduler"` |
| `version` | Semantic version | `"2.5.0"` |
| `sourceDigest` | Build artifact hash | `"sha256:abc..."` |
**Population:** Use `EngineVersionRef.FromAssembly(Assembly.GetExecutingAssembly())`.
### dsseSig
**Purpose:** Optional cryptographic signature for audit compliance.
**Format:** `{keyId}:{base64Signature}`
**Example:** `signing-key-001:MEUCIQD...`
**Integration:** Uses existing `StellaOps.Attestation.DsseHelper` for signature generation.
### schemaVersion
**Purpose:** Enables schema evolution without breaking compatibility.
**Current Value:** `1`
**Migration Strategy:** When schema changes:
1. Increment version number
2. Add migration logic for older versions
3. Document breaking changes
---
## Database Schema
```sql
CREATE SCHEMA IF NOT EXISTS timeline;
CREATE TABLE timeline.events (
event_id TEXT PRIMARY KEY,
t_hlc TEXT NOT NULL,
ts_wall TIMESTAMPTZ NOT NULL,
service TEXT NOT NULL,
trace_parent TEXT,
correlation_id TEXT NOT NULL,
kind TEXT NOT NULL,
payload JSONB NOT NULL,
payload_digest BYTEA NOT NULL,
engine_name TEXT NOT NULL,
engine_version TEXT NOT NULL,
engine_digest TEXT NOT NULL,
dsse_sig TEXT,
schema_version INTEGER NOT NULL DEFAULT 1,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Primary query: events by correlation, HLC ordered
CREATE INDEX idx_events_corr_hlc ON timeline.events (correlation_id, t_hlc);
-- Service-specific queries
CREATE INDEX idx_events_svc_hlc ON timeline.events (service, t_hlc);
-- Payload search (JSONB GIN index)
CREATE INDEX idx_events_payload ON timeline.events USING GIN (payload);
-- Kind filtering
CREATE INDEX idx_events_kind ON timeline.events (kind);
```
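A query sketch that exercises the primary index, assuming a `TIMELINE_DSN` connection string (illustrative):
```bash
# Uses idx_events_corr_hlc; ORDER BY t_hlc relies on the sortable HLC string form.
psql "$TIMELINE_DSN" -c "
  SELECT t_hlc, service, kind, ts_wall
  FROM timeline.events
  WHERE correlation_id = 'scan-abc123'
  ORDER BY t_hlc;"
```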
---
## Usage Examples
### Emitting an Event
```csharp
public class SchedulerService
{
private readonly ITimelineEventEmitter _emitter;
public async Task EnqueueJobAsync(Job job, CancellationToken ct)
{
// Business logic...
await _queue.EnqueueAsync(job, ct);
// Emit timeline event
await _emitter.EmitAsync(
correlationId: job.Id.ToString(),
kind: "ENQUEUE",
payload: new { jobId = job.Id, priority = job.Priority },
ct);
}
}
```
### Querying Timeline
```csharp
public async Task<IReadOnlyList<TimelineEvent>> GetJobTimelineAsync(
string jobId,
CancellationToken ct)
{
return await _timelineService.GetEventsAsync(
correlationId: jobId,
options: new TimelineQueryOptions
{
Services = ["Scheduler", "Attestor"],
Kinds = ["ENQUEUE", "EXECUTE", "COMPLETE", "ATTEST"]
},
ct);
}
```
---
## Compatibility Notes
### Relation to Existing HLC Infrastructure
This schema builds on the existing `StellaOps.HybridLogicalClock` library:
- Uses `HlcTimestamp` type directly
- Integrates with `IHybridLogicalClock.Tick()` for timestamp generation
- Compatible with air-gap merge algorithms
### Relation to Existing Replay Infrastructure
This schema integrates with `StellaOps.Replay.Core`:
- `KnowledgeSnapshot` can include timeline event references
- Replay uses `FakeTimeProvider` with HLC timestamps
- Verification compares payload digests
---
## References
- [SPRINT_20260107_003_000_INDEX](../../implplan/SPRINT_20260107_003_000_INDEX_unified_event_timeline.md) - Parent sprint index
- [SPRINT_20260105_002_000_INDEX](../../implplan/SPRINT_20260105_002_000_INDEX_hlc_audit_safe_ordering.md) - HLC foundation
- [RFC 8785](https://datatracker.ietf.org/doc/html/rfc8785) - JSON Canonicalization Scheme
- [W3C Trace Context](https://www.w3.org/TR/trace-context/) - Distributed tracing
- CLAUDE.md Section 8.2.1 - Engine version tracking
- CLAUDE.md Section 8.7 - RFC 8785 canonicalization

View File

@@ -0,0 +1,171 @@
# Timeline UI Component
> **Module:** Eventing / Timeline
> **Status:** Implemented
> **Last Updated:** 2026-01-07
## Overview
The Timeline UI provides a visual representation of HLC-ordered events across StellaOps services. It enables operators to trace the causal flow of operations, identify bottlenecks, and investigate specific events with full evidence links.
## Features
### Causal Lanes Visualization
Events are displayed in swimlanes organized by service:
```
┌─────────────────────────────────────────────────────────────────────┐
│ HLC Timeline Axis │
│ |-------|-------|-------|-------|-------|-------|-------|-------> │
├─────────────────────────────────────────────────────────────────────┤
│ Scheduler [E]─────────[X]───────────────[C] │
├─────────────────────────────────────────────────────────────────────┤
│ AirGap [I]──────────[M] │
├─────────────────────────────────────────────────────────────────────┤
│ Attestor [A]──────────[V] │
├─────────────────────────────────────────────────────────────────────┤
│ Policy [G] │
└─────────────────────────────────────────────────────────────────────┘
```
Legend:
- **[E]** Enqueue - Job queued for processing
- **[X]** Execute - Job execution started
- **[C]** Complete - Job completed
- **[I]** Import - Data imported (e.g., SBOM, advisory)
- **[M]** Merge - Data merged
- **[A]** Attest - Attestation created
- **[V]** Verify - Attestation verified
- **[G]** Gate - Policy gate evaluated
### Critical Path Analysis
The critical path view shows the longest sequence of dependent operations:
- Color-coded by severity (green/yellow/red)
- Bottleneck stage highlighted
- Percentage of total duration shown
- Clickable stages for drill-down
### Event Detail Panel
Selected events display:
- Event ID and metadata
- HLC timestamp and wall-clock time
- Service and event kind
- JSON payload viewer
- Engine version information
- Evidence links (SBOM, VEX, Policy, Attestation)
### Filtering
Events can be filtered by:
- **Services**: Scheduler, AirGap, Attestor, Policy, Scanner, etc.
- **Event Kinds**: ENQUEUE, EXECUTE, COMPLETE, IMPORT, ATTEST, etc.
- **HLC Range**: From/To timestamps
Filter state is persisted in URL query parameters.
### Export
Timeline data can be exported as:
- **NDJSON**: Newline-delimited JSON (streaming-friendly)
- **JSON**: Standard JSON array
- **DSSE-signed**: Cryptographically signed bundles for auditing
## Usage
### Accessing the Timeline
Navigate to `/timeline/{correlationId}` where `correlationId` is the unique identifier for a scan, job, or workflow.
Example:
```
/timeline/scan-abc123-def456
```
### Keyboard Navigation
| Key | Action |
|-----|--------|
| Tab | Navigate between events |
| Enter/Space | Select focused event |
| Escape | Clear selection |
| Arrow keys | Scroll within panel |
### URL Parameters
| Parameter | Description | Example |
|-----------|-------------|---------|
| `services` | Comma-separated service filter | `?services=Scheduler,AirGap` |
| `kinds` | Comma-separated kind filter | `?kinds=EXECUTE,COMPLETE` |
| `fromHlc` | Start of HLC range | `?fromHlc=1704067200000:0:node1` |
| `toHlc` | End of HLC range | `?toHlc=1704153600000:0:node1` |
## Component Architecture
```
timeline/
├── components/
│ ├── causal-lanes/ # Swimlane visualization
│ ├── critical-path/ # Bottleneck bar chart
│ ├── event-detail-panel/ # Selected event details
│ ├── evidence-links/ # Links to SBOM/VEX/Policy
│ ├── export-button/ # Export dropdown
│ └── timeline-filter/ # Service/kind filters
├── models/
│ └── timeline.models.ts # TypeScript interfaces
├── pages/
│ └── timeline-page/ # Main page component
├── services/
│ └── timeline.service.ts # API client
└── timeline.routes.ts # Lazy-loaded routes
```
## API Integration
The Timeline UI integrates with the Timeline API:
| Endpoint | Description |
|----------|-------------|
| `GET /api/v1/timeline/{correlationId}` | Fetch events |
| `GET /api/v1/timeline/{correlationId}/critical-path` | Fetch critical path |
| `POST /api/v1/timeline/{correlationId}/export` | Initiate export |
| `GET /api/v1/timeline/export/{exportId}` | Check export status |
| `GET /api/v1/timeline/export/{exportId}/download` | Download bundle |
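A request sketch against these endpoints, assuming bearer-token auth and a gateway base URL in `$STELLA_API` (both illustrative, as is the export request body):
```bash
# Fetch events for a correlation ID, filtered to two services.
curl -s -H "Authorization: Bearer $TOKEN" \
  "$STELLA_API/api/v1/timeline/scan-abc123-def456?services=Scheduler,AirGap"

# Initiate an NDJSON export, then poll the returned export ID.
curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"format":"ndjson"}' \
  "$STELLA_API/api/v1/timeline/scan-abc123-def456/export"
```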
## Accessibility
The Timeline UI follows WCAG 2.1 AA guidelines:
- **Keyboard Navigation**: All interactive elements are focusable
- **Screen Readers**: ARIA labels on all regions and controls
- **Color Contrast**: 4.5:1 minimum contrast ratio
- **Focus Indicators**: Visible focus rings on all controls
- **Motion**: Respects `prefers-reduced-motion`
## Performance
- **Virtual Scrolling**: Handles 10K+ events efficiently
- **Lazy Loading**: Events loaded on-demand as user scrolls
- **Caching**: Recent queries cached to reduce API calls
- **Debouncing**: Filter changes debounced to avoid excessive requests
## Screenshots
### Timeline View
![Timeline View](./assets/timeline-view.png)
### Critical Path Analysis
![Critical Path](./assets/critical-path.png)
### Event Detail Panel
![Event Details](./assets/event-details.png)
## Related Documentation
- [Timeline API Reference](../../api/timeline-api.md)
- [HLC Clock Specification](../hlc/architecture.md)
- [Eventing SDK](../eventing/architecture.md)
- [Evidence Model](../../schemas/evidence.md)

View File

@@ -0,0 +1,354 @@
# Evidence Bundle Export Format Specification
> **Version:** 1.0.0
> **Status:** FINAL
> **Sprint Reference:** [SPRINT_20260106_003_003](../../../docs/implplan/SPRINT_20260106_003_003_EVIDENCE_export_bundle.md)
## Overview
This document specifies the standard export format for StellaOps evidence bundles. The export format enables offline verification of software supply chain artifacts including SBOMs, VEX statements, attestations, and policy verdicts.
## Export Archive Format
### Filename Convention
```
evidence-bundle-<bundle-id>.tar.gz
```
Where `<bundle-id>` follows the pattern: `eb-<YYYY-MM-DD>-<unique-suffix>`
Example: `evidence-bundle-eb-2026-01-06-abc123.tar.gz`
### Compression
- **Format:** gzip-compressed tar archive
- **Compression level:** Configurable (1-9, default: 6)
- **Determinism:** Fixed gzip header timestamp (`2026-01-01T00:00:00Z`)
- **Permissions:** All files `0644`, directories `0755`, uid/gid `0:0`
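A sketch of producing such an archive reproducibly with GNU tar and gzip; the flag set is illustrative and the export library remains the authoritative implementation (`gzip -n` zeroes the header timestamp, so embedding the fixed `2026-01-01` value requires post-processing or a library-level writer):
```bash
# Deterministic inputs: sorted entry order, fixed mtime, fixed ownership.
tar --sort=name \
    --mtime='2026-01-01 00:00:00 UTC' \
    --owner=0 --group=0 --numeric-owner \
    -cf - "evidence-bundle-$BUNDLE_ID/" \
  | gzip -n -6 > "evidence-bundle-$BUNDLE_ID.tar.gz"
```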
## Directory Structure
```
evidence-bundle-<id>/
├── manifest.json # Bundle manifest with all artifact refs
├── metadata.json # Bundle metadata (provenance, timestamps)
├── README.md # Human-readable verification instructions
├── verify.sh # Bash verification script
├── verify.ps1 # PowerShell verification script
├── checksums.sha256 # BSD-format SHA256 checksums
├── keys/
│ ├── signing-key-001.pem # Public key(s) for DSSE verification
│ ├── signing-key-002.pem # Additional keys (multi-signature)
│ └── trust-bundle.pem # CA chain (if applicable)
├── sboms/
│ ├── image.cdx.json # Aggregated CycloneDX SBOM
│ ├── image.spdx.json # Aggregated SPDX SBOM
│ └── layers/
│ ├── <layer-digest>.cdx.json # Per-layer CycloneDX
│ └── <layer-digest>.spdx.json # Per-layer SPDX
├── vex/
│ ├── statements/
│ │ └── <statement-id>.openvex.json
│ └── consensus/
│ └── image-consensus.json # VEX consensus result
├── attestations/
│ ├── sbom.dsse.json # SBOM attestation envelope
│ ├── vex.dsse.json # VEX attestation envelope
│ ├── policy.dsse.json # Policy verdict attestation
│ └── rekor-proofs/
│ └── <uuid>.proof.json # Rekor inclusion proofs
├── findings/
│ ├── scan-results.json # Vulnerability findings
│ └── gate-results.json # VEX gate decisions
└── audit/
└── timeline.ndjson # Audit event timeline
```
## Core Artifacts
### manifest.json
The manifest provides a complete inventory of all artifacts in the bundle.
```json
{
"manifestVersion": "1.0.0",
"bundleId": "eb-2026-01-06-abc123",
"createdAt": "2026-01-06T10:30:00.000000Z",
"subject": {
"type": "container-image",
"digest": "sha256:abcdef1234567890...",
"name": "registry.example.com/app:v1.2.3"
},
"artifacts": [
{
"path": "sboms/image.cdx.json",
"type": "sbom",
"format": "cyclonedx-1.7",
"digest": "sha256:...",
"size": 45678
},
{
"path": "attestations/sbom.dsse.json",
"type": "attestation",
"format": "dsse-v1",
"predicateType": "StellaOps.SBOMAttestation@1",
"digest": "sha256:...",
"size": 12345,
"signedBy": ["sha256:keyabc..."]
}
],
"verification": {
"merkleRoot": "sha256:...",
"algorithm": "sha256",
"checksumFile": "checksums.sha256"
}
}
```
### metadata.json
Provides provenance and chain information.
```json
{
"bundleId": "eb-2026-01-06-abc123",
"exportedAt": "2026-01-06T10:35:00.000000Z",
"exportedBy": "stella evidence export",
"exportVersion": "2026.04",
"provenance": {
"tenantId": "tenant-xyz",
"scanId": "scan-abc123",
"pipelineId": "pipeline-def456",
"sourceRepository": "https://github.com/example/app",
"sourceCommit": "abc123def456..."
},
"chainInfo": {
"previousBundleId": "eb-2026-01-05-xyz789",
"sequenceNumber": 42
},
"transparency": {
"rekorLogUrl": "https://rekor.sigstore.dev",
"rekorEntryUuids": ["uuid1", "uuid2"]
}
}
```
### checksums.sha256
BSD-format SHA256 checksums for all artifacts:
```
SHA256 (manifest.json) = abc123...
SHA256 (metadata.json) = def456...
SHA256 (sboms/image.cdx.json) = 789abc...
SHA256 (attestations/sbom.dsse.json) = cde012...
```
## Verification Scripts
### verify.sh (Bash)
Bash script for Unix/Linux/macOS verification:
```bash
#!/bin/bash
set -euo pipefail
BUNDLE_DIR="$(cd "$(dirname "$0")" && pwd)"
MANIFEST="$BUNDLE_DIR/manifest.json"
CHECKSUMS="$BUNDLE_DIR/checksums.sha256"
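# NOTE: compute-merkle-root, verify-dsse, and verify-rekor-proof are helper
# commands assumed to ship alongside the bundle tooling; they are not coreutils.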
echo "=== StellaOps Evidence Bundle Verification ==="
echo "Bundle: $(basename "$BUNDLE_DIR")"
echo ""
# Step 1: Verify checksums
echo "[1/4] Verifying artifact checksums..."
cd "$BUNDLE_DIR"
sha256sum -c "$CHECKSUMS" --quiet
echo " OK: All checksums match"
# Step 2: Verify Merkle root
echo "[2/4] Verifying Merkle root..."
COMPUTED_ROOT=$(compute-merkle-root "$CHECKSUMS")
EXPECTED_ROOT=$(jq -r '.verification.merkleRoot' "$MANIFEST")
if [ "$COMPUTED_ROOT" = "$EXPECTED_ROOT" ]; then
echo " OK: Merkle root verified"
else
echo " FAIL: Merkle root mismatch"
exit 1
fi
# Step 3: Verify DSSE signatures
echo "[3/4] Verifying attestation signatures..."
for dsse in "$BUNDLE_DIR"/attestations/*.dsse.json; do
verify-dsse "$dsse" --keys "$BUNDLE_DIR/keys/"
echo " OK: $(basename "$dsse")"
done
# Step 4: Verify Rekor proofs (if online)
echo "[4/4] Verifying Rekor proofs..."
if [ "${OFFLINE:-false}" = "true" ]; then
echo " SKIP: Offline mode, Rekor verification skipped"
else
for proof in "$BUNDLE_DIR"/attestations/rekor-proofs/*.proof.json; do
verify-rekor-proof "$proof"
echo " OK: $(basename "$proof")"
done
fi
echo ""
echo "=== Verification Complete: PASSED ==="
```
### verify.ps1 (PowerShell)
Windows PowerShell verification script:
```powershell
#Requires -Version 5.1
$ErrorActionPreference = 'Stop'
$BundleDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$Manifest = Join-Path $BundleDir 'manifest.json'
$Checksums = Join-Path $BundleDir 'checksums.sha256'
Write-Host "=== StellaOps Evidence Bundle Verification ===" -ForegroundColor Cyan
Write-Host "Bundle: $(Split-Path -Leaf $BundleDir)"
Write-Host ""
# Step 1: Verify checksums
Write-Host "[1/4] Verifying artifact checksums..." -ForegroundColor Yellow
Push-Location $BundleDir
try {
Get-Content $Checksums | ForEach-Object {
if ($_ -match 'SHA256 \((.+)\) = (.+)') {
$File = $Matches[1]
$Expected = $Matches[2]
$Actual = (Get-FileHash -Path $File -Algorithm SHA256).Hash.ToLower()
if ($Actual -ne $Expected) {
throw "Checksum mismatch: $File"
}
}
}
Write-Host " OK: All checksums match" -ForegroundColor Green
} finally {
Pop-Location
}
# Step 2-4: Continue verification...
Write-Host ""
Write-Host "=== Verification Complete: PASSED ===" -ForegroundColor Green
```
## Determinism Requirements
### Timestamp Handling
- All timestamps MUST be UTC ISO-8601 with microsecond precision
- Format: `YYYY-MM-DDTHH:MM:SS.ffffffZ`
- Archive metadata timestamps are fixed for reproducibility
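As a minimal sketch, the required timestamp shape can be produced in .NET with a custom format string (the shipped exporter may format differently):

```csharp
using System;

// UTC ISO-8601 with microsecond precision, e.g. 2026-01-06T10:30:00.000000Z
var createdAt = DateTime.UtcNow.ToString("yyyy-MM-ddTHH:mm:ss.ffffffZ");
Console.WriteLine(createdAt);
```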
### Ordering
- Manifest artifacts: sorted lexicographically by `path`
- Checksum entries: sorted lexicographically by filename
- JSON object keys: sorted lexicographically (RFC 8785)
- NDJSON records: sorted by primary key (e.g., `observationId`)
### Hash Computation
- Algorithm: SHA-256 (lowercase hex)
- Input: raw file bytes (no BOM, LF line endings)
- Merkle tree: RFC 6962 compliant binary Merkle tree
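A minimal sketch of the RFC 6962 computation over the artifact digests, in leaf order taken from `checksums.sha256` (how the shipped `compute-merkle-root` helper derives leaf bytes is an assumption here):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Assumes at least one leaf; RFC 6962 defines the empty root separately.
static byte[] MerkleRoot(IReadOnlyList<byte[]> leaves)
{
    if (leaves.Count == 1)
        return SHA256.HashData(new byte[] { 0x00 }.Concat(leaves[0]).ToArray()); // leaf: H(0x00 || data)

    // Split at the largest power of two strictly less than the leaf count.
    var k = 1;
    while (k * 2 < leaves.Count) k *= 2;
    var left = MerkleRoot(leaves.Take(k).ToArray());
    var right = MerkleRoot(leaves.Skip(k).ToArray());
    return SHA256.HashData(new byte[] { 0x01 }.Concat(left).Concat(right).ToArray()); // node: H(0x01 || L || R)
}
```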
## Artifact Types
### SBOMs
| Format | File Extension | MIME Type |
|--------|---------------|-----------|
| CycloneDX 1.7 | `.cdx.json` | `application/vnd.cyclonedx+json` |
| SPDX 3.0.1 | `.spdx.json` | `application/spdx+json` |
### Attestations
| Type | Predicate Type | File Pattern |
|------|---------------|--------------|
| SBOM | `StellaOps.SBOMAttestation@1` | `sbom.dsse.json` |
| VEX | `StellaOps.VEXAttestation@1` | `vex.dsse.json` |
| Policy | `StellaOps.PolicyEvaluation@1` | `policy.dsse.json` |
| Gate | `StellaOps.VexGate@1` | `gate.dsse.json` |
### VEX Statements
- Format: OpenVEX 0.2.0+
- File extension: `.openvex.json`
- Location: `vex/statements/`
## Export Options
### CLI Command
```bash
# Basic export
stella evidence export --bundle <bundle-id> --output ./audit-bundle.tar.gz
# With options
stella evidence export --bundle <bundle-id> \
--output ./bundle.tar.gz \
--include-layers \
--include-rekor-proofs \
--compression 9
# Verify exported bundle
stella evidence verify ./audit-bundle.tar.gz
# Verify offline (skip Rekor)
stella evidence verify ./audit-bundle.tar.gz --offline
```
### API Endpoint
```http
POST /api/v1/bundles/{bundleId}/export
Content-Type: application/json
{
"format": "tar.gz",
"compression": "gzip",
"compressionLevel": 6,
"includeRekorProofs": true,
"includeLayerSboms": true
}
```
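A hedged client-side sketch of the same call (host name, bundle id, and auth handling are placeholders):

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;

using var http = new HttpClient { BaseAddress = new Uri("https://stellaops.example.internal/") };
var response = await http.PostAsJsonAsync("api/v1/bundles/eb-2026-01-06-abc123/export", new
{
    format = "tar.gz",
    compression = "gzip",
    compressionLevel = 6,
    includeRekorProofs = true,
    includeLayerSboms = true
});
response.EnsureSuccessStatusCode();
```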
## Offline Verification
For air-gapped environments:
1. Transfer `evidence-bundle-<id>.tar.gz` to isolated system
2. Extract archive: `tar -xzf evidence-bundle-<id>.tar.gz`
3. Run verification: `./verify.sh` (Unix) or `.\verify.ps1` (Windows)
4. Pass `OFFLINE=true` to skip Rekor verification: `OFFLINE=true ./verify.sh`
## Compatibility
- **StellaOps CLI:** 2026.04+
- **Export Library:** StellaOps.EvidenceLocker.Export 1.0.0+
- **Verify scripts:** bash 4.0+ / PowerShell 5.1+
## Related Documentation
- [Bundle Packaging](bundle-packaging.md) - Internal bundle structure
- [Evidence Bundle v1](evidence-bundle-v1.md) - Core bundle contract
- [Verify Offline](verify-offline.md) - Offline verification procedures
- [Attestation Contract](attestation-contract.md) - DSSE envelope format
## Change Log
| Date | Version | Author | Description |
|------|---------|--------|-------------|
| 2026-01-07 | 1.0.0 | StellaOps | Initial specification for Sprint 003_003 |

View File

@@ -0,0 +1,41 @@
# Facet
> Cryptographically sealed manifests for logical slices of container images.
## Purpose
The Facet Sealing subsystem provides cryptographically sealed manifests for logical slices of container images, enabling fine-grained drift detection, per-facet quota enforcement, and deterministic change tracking.
## Quick Links
- [Architecture](./architecture.md) - Technical design and implementation details
## Status
| Attribute | Value |
|-----------|-------|
| **Maturity** | Production |
| **Last Reviewed** | 2025-12-29 |
| **Maintainer** | Scanner Guild, Policy Guild |
## Key Features
- **Facet Types**: OS packages, language dependencies, binaries, configs, custom patterns
- **Cryptographic Sealing**: Each facet can be individually sealed with a cryptographic snapshot
- **Drift Detection**: Monitor changes between seals for compliance enforcement
- **Merkle Tree Structure**: Content-addressed storage with integrity verification
## Dependencies
### Upstream (this module depends on)
- **Scanner** - Facet extraction during image analysis
- **Attestor** - DSSE signing for sealed facets
### Downstream (modules that depend on this)
- **Policy** - Drift detection and quota enforcement
- **Replay** - Facet verification in replay workflows
## Related Documentation
- [Scanner Architecture](../scanner/architecture.md)
- [Replay Architecture](../replay/architecture.md)

View File

@@ -0,0 +1,43 @@
# Feedser
> Evidence collection library for backport detection and binary fingerprinting.
## Purpose
Feedser provides deterministic, cryptographic evidence collection for backport detection. It extracts patch signatures from unified diffs and binary fingerprints from compiled code to enable high-confidence vulnerability status determination for packages where upstream fixes have been backported by distro maintainers.
## Quick Links
- [Architecture](./architecture.md) - Technical design and implementation details
## Status
| Attribute | Value |
|-----------|-------|
| **Maturity** | Production |
| **Last Reviewed** | 2025-12-29 |
| **Maintainer** | Concelier Guild |
## Key Features
- **Patch Signature Extraction**: Parse unified diffs and extract normalized hunk signatures
- **Binary Fingerprinting**: TLSH fuzzy hashing and instruction sequence hashing
- **Four-Tier Proof System**: Supports backport detection at multiple confidence levels
- **Deterministic Outputs**: Canonical JSON serialization with stable hashing
## Dependencies
### Upstream (this module depends on)
- None (library with no external service dependencies)
### Downstream (modules that depend on this)
- **Concelier** - ProofService layer consumes Feedser for backport evidence
- **Attestor** - Evidence storage for generated proofs
## Notes
Feedser is a **library**, not a standalone service. It does not expose REST APIs directly and does not make vulnerability decisions. It provides evidence that feeds into VEX statements and Policy Engine evaluation.
## Related Documentation
- [Concelier Architecture](../concelier/architecture.md)

View File

@@ -17,7 +17,7 @@
1. **Create env files**
```bash
-cp deploy/compose/env/ledger.env.example ledger.env
+cp devops/compose/env/ledger.env.example ledger.env
cp etc/secrets/ledger.postgres.secret.example ledger.postgres.env
# Populate LEDGER__DB__CONNECTIONSTRING, LEDGER__ATTACHMENTS__ENCRYPTIONKEY, etc.
```
@@ -48,7 +48,7 @@
-- --connection "$LEDGER__DB__CONNECTIONSTRING"
docker compose --env-file ledger.env --env-file ledger.postgres.env \
--f deploy/compose/docker-compose.prod.yaml up -d findings-ledger
+-f devops/compose/docker-compose.prod.yaml up -d findings-ledger
```
4. **Smoke test**
```bash
@@ -87,8 +87,8 @@
```
3. **Install/upgrade**
```bash
-helm upgrade --install stellaops deploy/helm/stellaops \
--f deploy/helm/stellaops/values-prod.yaml
+helm upgrade --install stellaops devops/helm/stellaops \
+-f devops/helm/stellaops/values-prod.yaml
```
4. **Verify**
```bash

View File

@@ -1,7 +1,7 @@
# Issuer Directory Backup & Restore
## Scope
-- **Applies to:** Issuer Directory when deployed via Docker Compose (`deploy/compose/docker-compose.*.yaml`) or the Helm chart (`deploy/helm/stellaops`).
+- **Applies to:** Issuer Directory when deployed via Docker Compose (`devops/compose/docker-compose.*.yaml`) or the Helm chart (`devops/helm/stellaops`).
- **Artifacts covered:** PostgreSQL database `issuer_directory`, service configuration (`etc/issuer-directory.yaml`), CSAF seed file (`data/csaf-publishers.json`), and secret material for the PostgreSQL connection string.
- **Frequency:** Take a hot backup before every upgrade and at least daily in production. Keep encrypted copies off-site/air-gapped according to your compliance program.
@@ -23,12 +23,12 @@
```
2. **Dump PostgreSQL tables**
```bash
-docker compose -f deploy/compose/docker-compose.prod.yaml exec postgres \
+docker compose -f devops/compose/docker-compose.prod.yaml exec postgres \
pg_dump --format=custom --compress=9 \
--file=/dump/issuer-directory-$(date +%Y%m%dT%H%M%SZ).dump \
--schema=issuer_directory issuer_directory
-docker compose -f deploy/compose/docker-compose.prod.yaml cp \
+docker compose -f devops/compose/docker-compose.prod.yaml cp \
postgres:/dump/issuer-directory-$(date +%Y%m%dT%H%M%SZ).dump "$BACKUP_DIR/"
```
For Kubernetes, run the same `pg_dump` command inside the `stellaops-postgres` pod and copy the archive via `kubectl cp`.
@@ -53,7 +53,7 @@
1. Notify stakeholders and pause automation calling the API.
2. Stop services:
```bash
-docker compose -f deploy/compose/docker-compose.prod.yaml down issuer-directory
+docker compose -f devops/compose/docker-compose.prod.yaml down issuer-directory
```
(For Helm: `kubectl scale deploy stellaops-issuer-directory --replicas=0`.)
3. Snapshot volumes:

View File

@@ -1,7 +1,7 @@
# Issuer Directory Deployment Guide
## Scope
-- **Applies to:** Issuer Directory WebService (`stellaops/issuer-directory-web`) running via the provided Docker Compose bundles (`deploy/compose/docker-compose.*.yaml`) or the Helm chart (`deploy/helm/stellaops`).
+- **Applies to:** Issuer Directory WebService (`stellaops/issuer-directory-web`) running via the provided Docker Compose bundles (`devops/compose/docker-compose.*.yaml`) or the Helm chart (`devops/helm/stellaops`).
- **Covers:** Environment prerequisites, secret handling, Compose + Helm rollout steps, and post-deploy verification.
- **Audience:** Platform/DevOps engineers responsible for Identity & Signing sprint deliverables.
@@ -16,7 +16,7 @@
## 2 · Deploy with Docker Compose
1. **Prepare environment variables**
```bash
-cp deploy/compose/env/dev.env.example dev.env
+cp devops/compose/env/dev.env.example dev.env
cp etc/secrets/issuer-directory.postgres.secret.example issuer-directory.postgres.env
# Edit dev.env and issuer-directory.postgres.env with production-ready secrets.
```
@@ -26,7 +26,7 @@
docker compose \
--env-file dev.env \
--env-file issuer-directory.postgres.env \
--f deploy/compose/docker-compose.dev.yaml config
+-f devops/compose/docker-compose.dev.yaml config
```
The command confirms the new `issuer-directory` service resolves the port (`${ISSUER_DIRECTORY_PORT:-8447}`) and the PostgreSQL connection string is in place.
@@ -35,7 +35,7 @@
docker compose \
--env-file dev.env \
--env-file issuer-directory.postgres.env \
--f deploy/compose/docker-compose.dev.yaml up -d issuer-directory
+-f devops/compose/docker-compose.dev.yaml up -d issuer-directory
```
Compose automatically mounts `../../etc/issuer-directory.yaml` into the container at `/etc/issuer-directory.yaml`, seeds CSAF publishers, and exposes the API on `https://localhost:8447`.
@@ -70,16 +70,16 @@
2. **Template for validation**
```bash
-helm template issuer-directory deploy/helm/stellaops \
--f deploy/helm/stellaops/values-prod.yaml \
+helm template issuer-directory devops/helm/stellaops \
+-f devops/helm/stellaops/values-prod.yaml \
--set services.issuer-directory.env.ISSUERDIRECTORY__AUTHORITY__ISSUER=https://authority.prod.stella-ops.org \
> /tmp/issuer-directory.yaml
```
3. **Install / upgrade**
```bash
-helm upgrade --install stellaops deploy/helm/stellaops \
--f deploy/helm/stellaops/values-prod.yaml \
+helm upgrade --install stellaops devops/helm/stellaops \
+-f devops/helm/stellaops/values-prod.yaml \
--set services.issuer-directory.env.ISSUERDIRECTORY__AUTHORITY__ISSUER=https://authority.prod.stella-ops.org
```
The chart provisions:

View File

@@ -24,8 +24,8 @@ Include the following artefacts in your Offline Update Kit staging tree:
```
2. Copy Compose artefacts:
```bash
-cp deploy/compose/docker-compose.airgap.yaml .
-cp deploy/compose/env/airgap.env.example airgap.env
+cp devops/compose/docker-compose.airgap.yaml .
+cp devops/compose/env/airgap.env.example airgap.env
cp secrets/issuer-directory/connection.env issuer-directory.mongo.env
```
3. Update `airgap.env` with site-specific values (Authority issuer, tenant, ports) and remove outbound endpoints.
@@ -47,8 +47,8 @@ Include the following artefacts in your Offline Update Kit staging tree:
(Generate this file during packaging with `kubectl create secret generic issuer-directory-secrets ... --dry-run=client -o yaml`.)
3. Install/upgrade the chart:
```bash
-helm upgrade --install stellaops deploy/helm/stellaops \
--f deploy/helm/stellaops/values-airgap.yaml \
+helm upgrade --install stellaops devops/helm/stellaops \
+-f devops/helm/stellaops/values-airgap.yaml \
--set services.issuer-directory.env.ISSUERDIRECTORY__AUTHORITY__ISSUER=https://authority.airgap.local/realms/stellaops
```
4. Confirm `issuer_directory_changes_total` is visible in your offline Prometheus stack.

View File

@@ -0,0 +1,49 @@
# Packs Registry
> Task packs registry and distribution service.
## Purpose
PacksRegistry provides a centralized registry for distributable task packs, policy packs, and analyzer bundles. It enables versioned pack management with integrity verification and air-gap support.
## Quick Links
- [Architecture](./architecture.md) - Technical design and implementation details
- [Guides](./guides/) - Usage and configuration guides
## Status
| Attribute | Value |
|-----------|-------|
| **Maturity** | Production |
| **Last Reviewed** | 2025-12-29 |
| **Maintainer** | Platform Guild |
## Key Features
- **Centralized Registry**: Store and manage task packs, policy packs, and analyzer bundles
- **Versioned Management**: Semantic versioning with upgrade/downgrade support
- **Content-Addressed**: All packs are content-addressed with integrity verification
- **Offline Distribution**: Bundle export for air-gapped environments
## Dependencies
### Upstream (this module depends on)
- **PostgreSQL** - Pack metadata storage
- **RustFS/S3** - Pack content storage
- **Authority** - Authentication and authorization
### Downstream (modules that depend on this)
- **TaskRunner** - Consumes packs for execution
## Configuration
```yaml
packs_registry:
storage_backend: rustfs # or s3
max_pack_size_mb: 100
```
## Related Documentation
- [TaskRunner Architecture](../task-runner/architecture.md)

View File

@@ -8,7 +8,7 @@ This dossier summarises the end-to-end runtime topology after the Aggregation-On
---
-> Need a quick orientation? The [Developer Quickstart](../onboarding/dev-quickstart.md) (29-Nov-2025 advisory) captures the core repositories, determinism checks, DSSE conventions, and starter tasks that explain how the platform pieces fit together.
+> Need a quick orientation? The [Developer Quickstart](../../dev/onboarding/dev-quickstart.md) (29-Nov-2025 advisory) captures the core repositories, determinism checks, DSSE conventions, and starter tasks that explain how the platform pieces fit together.
> Testing strategy models and CI lanes live in `docs/technical/testing/testing-strategy-models.md`, with the source catalog in `docs/technical/testing/TEST_CATALOG.yml`.

View File

@@ -0,0 +1,176 @@
# Determinization Gate API Reference
> **Audience:** Backend integrators, policy operators, and security engineers working with CVE observations.
> **Sprint:** 20260106_001_003_POLICY_determinization_gates
This document describes the Determinization Gate API and the `GuardedPass` verdict status for uncertain CVE observations.
---
## 1. Overview
The Determinization Gate evaluates CVE observations against uncertainty thresholds and produces verdicts based on available evidence (EPSS, VEX, reachability, runtime, backport signals). It introduces the `GuardedPass` status for observations that lack sufficient evidence for a confident determination but do not exceed risk thresholds.
### Key Components
| Component | Purpose |
|-----------|---------|
| `DeterminizationGate` | Policy gate that evaluates uncertainty and produces verdicts |
| `DeterminizationPolicy` | Rule set for allow/quarantine/escalate decisions |
| `SignalUpdateHandler` | Handles signal updates and triggers re-evaluation |
| `DeterminizationGateMetrics` | OpenTelemetry metrics for observability |
---
## 2. PolicyVerdictStatus
Policy evaluations return a `PolicyVerdictStatus` indicating the outcome:
| Status | Code | Description | Monitoring Required |
|--------|------|-------------|---------------------|
| `Pass` | 0 | Finding meets policy requirements. No action needed. | No |
| `Blocked` | 1 | Finding fails policy checks; must be remediated. | No |
| `Ignored` | 2 | Finding deliberately ignored via policy exception. | No |
| `Warned` | 3 | Finding passes but with warnings. | No |
| `Deferred` | 4 | Decision deferred; needs additional evidence. | No |
| `Escalated` | 5 | Decision escalated for human review. | No |
| `RequiresVex` | 6 | VEX statement required to make decision. | No |
| `GuardedPass` | 7 | Finding allowed with runtime monitoring guardrails. | **Yes** |
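A minimal sketch of the enum shape implied by this table (the namespace and attributes of the shipped type are assumptions):

```csharp
public enum PolicyVerdictStatus
{
    Pass = 0,         // meets policy requirements
    Blocked = 1,      // fails policy checks; remediate
    Ignored = 2,      // deliberate policy exception
    Warned = 3,       // passes with warnings
    Deferred = 4,     // needs additional evidence
    Escalated = 5,    // human review required
    RequiresVex = 6,  // VEX statement required
    GuardedPass = 7   // allowed with runtime monitoring guardrails
}
```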
---
## 3. GuardedPass Status
`GuardedPass` is a specialized status for CVE observations with uncertain evidence. It enables "allow with guardrails" semantics.
### 3.1 When Issued
- Observation has insufficient evidence for full determination (high entropy)
- Trust score below thresholds but above blocking level
- Non-production environments with moderate uncertainty
- Reachability status is unknown but not confirmed reachable
### 3.2 GuardRails Object
When `GuardedPass` is returned, the verdict includes a `guardRails` object:
```json
{
"status": "GuardedPass",
"reason": "Uncertain observation allowed with guardrails in staging",
"matchedRule": "GuardedAllowNonProd",
"guardRails": {
"enableRuntimeMonitoring": true,
"reviewInterval": "7.00:00:00",
"epssEscalationThreshold": 0.4,
"escalatingReachabilityStates": ["Reachable", "ObservedReachable"],
"maxGuardedDuration": "30.00:00:00",
"policyRationale": "Auto-allowed: entropy=0.45, trust=0.38, env=staging"
},
"suggestedObservationState": "PendingDeterminization",
"uncertaintyScore": {
"entropy": 0.45,
"completeness": 0.55,
"tier": "Medium",
"missingSignals": ["runtime", "backport"]
}
}
```
### 3.3 Consumer Responsibilities
When receiving `GuardedPass`:
1. **Enable Runtime Monitoring:** Start observing the component for vulnerable code execution
2. **Schedule Review:** Set up periodic review at `reviewInterval`
3. **EPSS Escalation:** Auto-escalate if EPSS score exceeds `epssEscalationThreshold`
4. **Reachability Escalation:** Auto-escalate if reachability transitions to escalating states
5. **Duration Limit:** Convert to `Blocked` if guard duration exceeds `maxGuardedDuration`
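A hedged sketch of these checks on the consumer side (the parameter names mirror the `guardRails` JSON above; the surrounding types are illustrative, not the shipped API):

```csharp
using System;
using System.Collections.Generic;

static PolicyVerdictStatus ReviewGuardedFinding(
    TimeSpan maxGuardedDuration,
    double epssEscalationThreshold,
    IReadOnlySet<string> escalatingReachabilityStates,
    double currentEpss,
    string reachabilityState,
    DateTimeOffset guardedSince,
    DateTimeOffset now)
{
    // 5. Duration limit: convert to Blocked once maxGuardedDuration elapses.
    if (now - guardedSince > maxGuardedDuration)
        return PolicyVerdictStatus.Blocked;

    // 3. EPSS escalation: auto-escalate past the threshold.
    if (currentEpss > epssEscalationThreshold)
        return PolicyVerdictStatus.Escalated;

    // 4. Reachability escalation: auto-escalate on escalating states.
    if (escalatingReachabilityStates.Contains(reachabilityState))
        return PolicyVerdictStatus.Escalated;

    // Otherwise keep monitoring and re-check at each reviewInterval.
    return PolicyVerdictStatus.GuardedPass;
}
```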
---
## 4. Determinization Rules
The gate evaluates rules in priority order:
| Priority | Rule | Condition | Result |
|----------|------|-----------|--------|
| 10 | RuntimeEscalation | Runtime evidence shows vulnerable code loaded | Escalated |
| 20 | EpssQuarantine | EPSS score exceeds threshold | Blocked |
| 25 | ReachabilityQuarantine | Code proven reachable | Blocked |
| 30 | ProductionEntropyBlock | High entropy in production | Blocked |
| 40 | StaleEvidenceDefer | Evidence is stale | Deferred |
| 50 | GuardedAllowNonProd | Uncertain in non-prod | GuardedPass |
| 60 | UnreachableAllow | Unreachable with high confidence | Pass |
| 65 | VexNotAffectedAllow | VEX not_affected from trusted issuer | Pass |
| 70 | SufficientEvidenceAllow | Low entropy, high trust | Pass |
| 80 | GuardedAllowModerateUncertainty | Moderate uncertainty, reasonable trust | GuardedPass |
| 100 | DefaultDefer | No rule matched | Deferred |
---
## 5. Environment Thresholds
Thresholds vary by deployment environment:
| Environment | MinConfidence | MaxEntropy | EPSS Threshold | Require Reachability |
|-------------|---------------|------------|----------------|---------------------|
| Production | 0.75 | 0.3 | 0.3 | Yes |
| Staging | 0.60 | 0.5 | 0.4 | Yes |
| Development | 0.40 | 0.7 | 0.6 | No |
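These defaults can be pictured as a small immutable record (the names are illustrative; the shipped configuration type may differ):

```csharp
public sealed record EnvironmentThresholds(
    double MinConfidence,
    double MaxEntropy,
    double EpssThreshold,
    bool RequireReachability)
{
    public static readonly EnvironmentThresholds Production  = new(0.75, 0.3, 0.3, true);
    public static readonly EnvironmentThresholds Staging     = new(0.60, 0.5, 0.4, true);
    public static readonly EnvironmentThresholds Development = new(0.40, 0.7, 0.6, false);
}
```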
---
## 6. Signal Update Events
The Determinization Gate subscribes to signal updates for automatic re-evaluation:
| Event Type | Description |
|------------|-------------|
| `epss.updated` | EPSS score changed |
| `vex.updated` | VEX statement added/modified |
| `reachability.updated` | Reachability analysis completed |
| `runtime.updated` | Runtime observation recorded |
| `backport.updated` | Backport detection result |
| `observation.state_changed` | Observation state transition |
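A hedged sketch of how `SignalUpdateHandler` could wire these events to re-evaluation (the method and event shapes are assumptions, not the shipped API):

```csharp
using System.Threading;
using System.Threading.Tasks;

public sealed class SignalUpdateHandler
{
    private readonly DeterminizationGate _gate;

    public SignalUpdateHandler(DeterminizationGate gate) => _gate = gate;

    // Invoked for any of the subscribed event types listed above.
    public async Task HandleAsync(string eventType, string observationId, CancellationToken ct)
    {
        // Re-evaluate the observation against the refreshed evidence set;
        // the resulting verdict may transition GuardedPass -> Escalated, etc.
        await _gate.EvaluateAsync(observationId, ct);
    }
}
```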
---
## 7. Metrics
OpenTelemetry metrics for monitoring:
| Metric | Type | Description |
|--------|------|-------------|
| `stellaops_policy_determinization_evaluations_total` | Counter | Total evaluations by status/environment/rule |
| `stellaops_policy_determinization_rule_matches_total` | Counter | Rule matches by rule name/status/environment |
| `stellaops_policy_observation_state_transitions_total` | Counter | State transitions by from/to state/trigger |
| `stellaops_policy_determinization_entropy` | Histogram | Distribution of entropy scores |
| `stellaops_policy_determinization_trust_score` | Histogram | Distribution of trust scores |
| `stellaops_policy_determinization_evaluation_duration_ms` | Histogram | Evaluation latency |
---
## 8. Service Registration
Add determinization services via dependency injection:
```csharp
// Register all determinization services
services.AddDeterminizationEngine();
// Or as part of full Policy Engine registration
services.AddPolicyEngine(); // Includes determinization
```
---
## 9. Related Documentation
- [Determinization Library](./determinization-architecture.md) - Core determinization models
- [Policy Engine Architecture](./architecture.md) - Overall policy engine design
- [Signal Snapshot Models](../../api/signals/reachability-contract.md) - Signal data structures
---
*Last updated: 2026-01-07 (Sprint 20260106_001_003)*

View File

@@ -34,7 +34,7 @@
#### Activation configuration wiring
-- **Helm ConfigMap.** `deploy/helm/stellaops/values*.yaml` now include a `policy-engine-activation` ConfigMap. The chart automatically injects it via `envFrom` into both the Policy Engine and Policy Gateway pods, so overriding the ConfigMap data updates the services with no manifest edits.
+- **Helm ConfigMap.** `devops/helm/stellaops/values*.yaml` now include a `policy-engine-activation` ConfigMap. The chart automatically injects it via `envFrom` into both the Policy Engine and Policy Gateway pods, so overriding the ConfigMap data updates the services with no manifest edits.
- **Type safety.** Quote ConfigMap values (e.g., `"true"`, `"false"`) because Kubernetes ConfigMaps carry string data. This mirrors the defaults checked into the repo and keeps `helm template` deterministic.
- **File-based overrides (optional).** The Policy Engine host already probes `/config/policy-engine/activation.yaml`, `../etc/policy-engine.activation.yaml`, and ambient `policy-engine.activation.yaml` files beside the binary. Mounting the ConfigMap as a file at `/config/policy-engine/activation.yaml` works immediately if/when we add a volume.
- **Offline/Compose.** Compose/offline bundles can continue exporting `STELLAOPS_POLICY_ENGINE__ACTIVATION__*` variables directly; the ConfigMap wiring simply mirrors those keys for Kubernetes clusters.

View File

@@ -0,0 +1,369 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://stellaops.dev/schemas/stellaops.suppression.v1.schema.json",
"title": "StellaOps Suppression Witness v1",
"description": "A DSSE-signable suppression witness documenting why a vulnerability is not exploitable",
"type": "object",
"required": [
"witness_schema",
"witness_id",
"artifact",
"vuln",
"suppression_type",
"evidence",
"confidence",
"observed_at"
],
"properties": {
"witness_schema": {
"type": "string",
"const": "stellaops.suppression.v1",
"description": "Schema version identifier"
},
"witness_id": {
"type": "string",
"pattern": "^sup:sha256:[a-f0-9]{64}$",
"description": "Content-addressed witness ID (e.g., 'sup:sha256:...')"
},
"artifact": {
"$ref": "#/definitions/WitnessArtifact",
"description": "The artifact (SBOM, component) this witness relates to"
},
"vuln": {
"$ref": "#/definitions/WitnessVuln",
"description": "The vulnerability this witness concerns"
},
"suppression_type": {
"type": "string",
"enum": [
"Unreachable",
"LinkerGarbageCollected",
"FeatureFlagDisabled",
"PatchedSymbol",
"GateBlocked",
"CompileTimeExcluded",
"VexNotAffected",
"FunctionAbsent",
"VersionNotAffected",
"PlatformNotAffected"
],
"description": "The type of suppression (unreachable, patched, gate-blocked, etc.)"
},
"evidence": {
"$ref": "#/definitions/SuppressionEvidence",
"description": "Evidence supporting the suppression claim"
},
"confidence": {
"type": "number",
"minimum": 0.0,
"maximum": 1.0,
"description": "Confidence level in this suppression [0.0, 1.0]"
},
"expires_at": {
"type": "string",
"format": "date-time",
"description": "Optional expiration date for time-bounded suppressions (UTC ISO-8601)"
},
"observed_at": {
"type": "string",
"format": "date-time",
"description": "When this witness was generated (UTC ISO-8601)"
},
"justification": {
"type": "string",
"description": "Optional justification narrative"
}
},
"additionalProperties": false,
"definitions": {
"WitnessArtifact": {
"type": "object",
"required": ["sbom_digest", "component_purl"],
"properties": {
"sbom_digest": {
"type": "string",
"pattern": "^sha256:[a-f0-9]{64}$",
"description": "SHA-256 digest of the SBOM"
},
"component_purl": {
"type": "string",
"pattern": "^pkg:",
"description": "Package URL of the vulnerable component"
}
},
"additionalProperties": false
},
"WitnessVuln": {
"type": "object",
"required": ["id", "source", "affected_range"],
"properties": {
"id": {
"type": "string",
"description": "Vulnerability identifier (e.g., 'CVE-2024-12345')"
},
"source": {
"type": "string",
"description": "Vulnerability source (e.g., 'NVD', 'OSV', 'GHSA')"
},
"affected_range": {
"type": "string",
"description": "Affected version range expression"
}
},
"additionalProperties": false
},
"SuppressionEvidence": {
"type": "object",
"required": ["witness_evidence"],
"properties": {
"witness_evidence": {
"$ref": "#/definitions/WitnessEvidence"
},
"unreachability": {
"$ref": "#/definitions/UnreachabilityEvidence"
},
"patched_symbol": {
"$ref": "#/definitions/PatchedSymbolEvidence"
},
"function_absent": {
"$ref": "#/definitions/FunctionAbsentEvidence"
},
"gate_blocked": {
"$ref": "#/definitions/GateBlockedEvidence"
},
"feature_flag": {
"$ref": "#/definitions/FeatureFlagEvidence"
},
"vex_statement": {
"$ref": "#/definitions/VexStatementEvidence"
},
"version_range": {
"$ref": "#/definitions/VersionRangeEvidence"
},
"linker_gc": {
"$ref": "#/definitions/LinkerGcEvidence"
}
},
"additionalProperties": false
},
"WitnessEvidence": {
"type": "object",
"required": ["callgraph_digest"],
"properties": {
"callgraph_digest": {
"type": "string",
"description": "BLAKE3 digest of the call graph used"
},
"surface_digest": {
"type": "string",
"description": "SHA-256 digest of the attack surface manifest"
},
"analysis_config_digest": {
"type": "string",
"description": "SHA-256 digest of the analysis configuration"
},
"build_id": {
"type": "string",
"description": "Build identifier for the analyzed artifact"
}
},
"additionalProperties": false
},
"UnreachabilityEvidence": {
"type": "object",
"required": ["analyzed_entrypoints", "unreachable_symbol", "analysis_method", "graph_digest"],
"properties": {
"analyzed_entrypoints": {
"type": "integer",
"minimum": 0,
"description": "Number of entrypoints analyzed"
},
"unreachable_symbol": {
"type": "string",
"description": "Vulnerable symbol that was confirmed unreachable"
},
"analysis_method": {
"type": "string",
"description": "Analysis method (static, dynamic, hybrid)"
},
"graph_digest": {
"type": "string",
"description": "Graph digest for reproducibility"
}
},
"additionalProperties": false
},
"FunctionAbsentEvidence": {
"type": "object",
"required": ["function_name", "binary_digest", "verification_method"],
"properties": {
"function_name": {
"type": "string",
"description": "Vulnerable function name"
},
"binary_digest": {
"type": "string",
"description": "Binary digest where function was checked"
},
"verification_method": {
"type": "string",
"description": "Verification method (symbol table scan, disassembly, etc.)"
}
},
"additionalProperties": false
},
"GateBlockedEvidence": {
"type": "object",
"required": ["detected_gates", "gate_coverage_percent", "effectiveness"],
"properties": {
"detected_gates": {
"type": "array",
"items": {
"$ref": "#/definitions/DetectedGate"
},
"description": "Detected gates along all paths to vulnerable code"
},
"gate_coverage_percent": {
"type": "integer",
"minimum": 0,
"maximum": 100,
"description": "Minimum gate coverage percentage [0, 100]"
},
"effectiveness": {
"type": "string",
"description": "Gate effectiveness assessment"
}
},
"additionalProperties": false
},
"DetectedGate": {
"type": "object",
"required": ["type", "guard_symbol", "confidence"],
"properties": {
"type": {
"type": "string",
"description": "Gate type (authRequired, inputValidation, rateLimited, etc.)"
},
"guard_symbol": {
"type": "string",
"description": "Symbol that implements the gate"
},
"confidence": {
"type": "number",
"minimum": 0.0,
"maximum": 1.0,
"description": "Confidence level (0.0 - 1.0)"
},
"detail": {
"type": "string",
"description": "Human-readable detail about the gate"
}
},
"additionalProperties": false
},
"PatchedSymbolEvidence": {
"type": "object",
"required": ["vulnerable_symbol", "patched_symbol", "symbol_diff"],
"properties": {
"vulnerable_symbol": {
"type": "string",
"description": "Vulnerable symbol identifier"
},
"patched_symbol": {
"type": "string",
"description": "Patched symbol identifier"
},
"symbol_diff": {
"type": "string",
"description": "Symbol diff showing the patch"
},
"patch_ref": {
"type": "string",
"description": "Patch commit or release reference"
}
},
"additionalProperties": false
},
"VexStatementEvidence": {
"type": "object",
"required": ["vex_id", "vex_author", "vex_status", "vex_digest"],
"properties": {
"vex_id": {
"type": "string",
"description": "VEX statement identifier"
},
"vex_author": {
"type": "string",
"description": "VEX statement author/authority"
},
"vex_status": {
"type": "string",
"enum": ["not_affected", "fixed"],
"description": "VEX statement status"
},
"vex_digest": {
"type": "string",
"description": "Content digest of the VEX document"
}
},
"additionalProperties": false
},
"FeatureFlagEvidence": {
"type": "object",
"required": ["flag_name", "flag_state", "verification_source"],
"properties": {
"flag_name": {
"type": "string",
"description": "Feature flag name/key"
},
"flag_state": {
"type": "string",
"description": "Feature flag state (off, disabled)"
},
"verification_source": {
"type": "string",
"description": "Source of flag verification (config file, runtime)"
}
},
"additionalProperties": false
},
"VersionRangeEvidence": {
"type": "object",
"required": ["actual_version", "affected_range", "comparison_method"],
"properties": {
"actual_version": {
"type": "string",
"description": "Actual version of the component"
},
"affected_range": {
"type": "string",
"description": "Affected version range from advisory"
},
"comparison_method": {
"type": "string",
"description": "Version comparison method used"
}
},
"additionalProperties": false
},
"LinkerGcEvidence": {
"type": "object",
"required": ["removed_symbol", "linker_method", "verification_digest"],
"properties": {
"removed_symbol": {
"type": "string",
"description": "Symbol removed by linker GC"
},
"linker_method": {
"type": "string",
"description": "Linker garbage collection method"
},
"verification_digest": {
"type": "string",
"description": "Digest of final binary for verification"
}
},
"additionalProperties": false
}
}
}

View File

@@ -173,7 +173,7 @@ curl -X POST http://grafana:3000/api/dashboards/db \
# Via Helm (auto-provisioned)
# Dashboard is auto-imported when using StellaOps Helm chart
-helm upgrade stellaops ./deploy/helm/stellaops \
+helm upgrade stellaops ./devops/helm/stellaops \
--set grafana.dashboards.provcache.enabled=true
```

View File

@@ -0,0 +1,51 @@
# Provenance
> Provenance attestation library for SLSA/DSSE compliance.
## Purpose
Provenance provides deterministic, verifiable provenance attestations for all StellaOps artifacts. It enables SLSA compliance through DSSE statement generation, Merkle tree construction, and cryptographic verification.
## Quick Links
- [Architecture](./architecture.md) - Technical design and implementation details
- [Guides](./guides/) - Attestation generation guides
## Status
| Attribute | Value |
|-----------|-------|
| **Maturity** | Production |
| **Last Reviewed** | 2025-12-29 |
| **Maintainer** | Security Guild |
## Key Features
- **DSSE Statement Generation**: Build provenance attestations per DSSE spec
- **SLSA Compliance**: Support for SLSA build predicates
- **Merkle Tree Construction**: Content-addressed integrity verification
- **Promotion Attestations**: Track artifact promotions across environments
- **Verification Harness**: Validate attestation chains
## Dependencies
### Upstream (this module depends on)
- **Signer/KMS** - Key management for signing (delegated)
### Downstream (modules that depend on this)
- **Attestor** - Stores generated attestations
- **EvidenceLocker** - Evidence bundle attestations
- **ExportCenter** - Export attestations
## Notes
Provenance is a **library**, not a standalone service. It does not:
- Store attestations (handled by Attestor and EvidenceLocker)
- Hold signing keys (delegated to Signer/KMS)
All attestation outputs are deterministic with canonical JSON serialization.
## Related Documentation
- [Attestor Architecture](../attestor/architecture.md)
- [DSSE Specification](../../security/trust-and-signing.md)

View File

@@ -0,0 +1,54 @@
# ReachGraph
> Unified store for reachability subgraphs with edge-level explainability.
## Purpose
The ReachGraph module provides a unified store for reachability subgraphs, enabling fast, deterministic, audit-ready answers to "exactly why a dependency is reachable." It consolidates data from Scanner, Signals, and Attestor into content-addressed artifacts with edge-level explainability.
## Quick Links
- [Architecture](./architecture.md) - Technical design and implementation details
- [Guides](./guides/) - Usage and query guides
- [Schemas](./schemas/) - ReachGraph schema definitions
## Status
| Attribute | Value |
|-----------|-------|
| **Maturity** | Production |
| **Last Reviewed** | 2025-12-29 |
| **Maintainer** | Scanner Guild, Signals Guild |
## Key Features
- **Unified Schema**: Extends PoE subgraph format with edge explainability
- **Content-Addressed Store**: All artifacts identified by BLAKE3 digest
- **Slice Query API**: Fast queries by package, CVE, entrypoint, or file
- **Deterministic Replay**: Verify that same inputs produce same graph
- **DSSE Signing**: Offline-verifiable proofs
## Dependencies
### Upstream (this module depends on)
- **Scanner** - CallGraph data source
- **Signals** - ReachabilityFactDocument source
- **Attestor** - PoE JSON source
### Downstream (modules that depend on this)
- **Policy Engine** - Reachability-based policy evaluation
- **Web Console** - Reachability visualization
- **CLI** - Reachability queries
- **ExportCenter** - Reachability data exports
## API Endpoints
- `POST /v1/reachgraphs` - Create new reachgraph
- `GET /v1/reachgraphs/{digest}` - Retrieve reachgraph by digest
- `GET /v1/reachgraphs/{digest}/slice` - Query slice of reachgraph
- `POST /v1/reachgraphs/replay` - Verify deterministic replay
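A hedged sketch of a slice query against the endpoint above (the host and query parameter names are assumptions):

```csharp
using System;
using System.Net.Http;

using var http = new HttpClient { BaseAddress = new Uri("https://reachgraph.example.internal/") };
var digest = "blake3:<graph-digest>"; // placeholder digest
var slice = await http.GetStringAsync($"v1/reachgraphs/{digest}/slice?purl=pkg:npm/example@1.0.0");
Console.WriteLine(slice);
```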
## Related Documentation
- [Scanner Architecture](../scanner/architecture.md)
- [Signals Architecture](../signals/architecture.md)

View File

@@ -197,6 +197,6 @@ Track function-level reachability changes between scans:
- **Daily reachability stand-up** in `#reachability-build`.
- **Fixture sync** every Friday: QA leads run reachbench matrix, post report to Confluence + link in `docs/modules/reach-graph/guides/DELIVERY_GUIDE.md`.
-- **Decision log** Append ADRs under `docs/adr/reachability-*` for schema changes.
+- **Decision log** Append ADRs under `docs/technical/adr/reachability-*` for schema changes.
Keep this guide updated whenever scope shifts or a new sprint is added.

View File

@@ -0,0 +1,51 @@
# Replay
> Deterministic replay engine for vulnerability verdict reproducibility.
## Purpose
Replay enables deterministic reproducibility of vulnerability verdicts. Given identical inputs (SBOM, policy, feeds, toolchain), the system MUST produce identical outputs. Replay provides the infrastructure to capture, store, and verify these deterministic execution chains.
## Quick Links
- [Architecture](./architecture.md) - Technical design and implementation details
- [Guides](./guides/) - Replay verification guides
- [Schemas](./schemas/) - Replay manifest and proof schemas
- [Replay Proof Schema](./replay-proof-schema.md) - Detailed proof format
## Status
| Attribute | Value |
|-----------|-------|
| **Maturity** | Production |
| **Last Reviewed** | 2025-12-29 |
| **Maintainer** | Platform Guild |
## Key Features
- **Replay Tokens**: Cryptographically bound to input digests for verification
- **Replay Manifests**: Capture all inputs required to reproduce a verdict
- **Feed Snapshots**: Point-in-time snapshots of vulnerability feeds
- **Verification Workflows**: Validate that replay produces identical results
## Dependencies
### Upstream (this module depends on)
- **Concelier** - Feed snapshot coordination
- **Attestor** - Replay proof signing
- **Policy** - Policy evaluation replay
### Downstream (modules that depend on this)
- **Attestor** - Stores replay proofs
- **ExportCenter** - Includes replay tokens in exports
## Notes
- Replay does not make vulnerability decisions; it captures inputs and outputs
- Replay does not store SBOMs or vulnerability data; it stores references (digests)
- All timestamps are UTC ISO-8601 with microsecond precision
## Related Documentation
- [Determinism Specification](../../technical/architecture/determinism-specification.md)
- [Facet Architecture](../facet/architecture.md)

View File

@@ -0,0 +1,60 @@
# Risk Engine
> Risk scoring runtime with pluggable providers and explainability.
## Purpose
RiskEngine computes deterministic, explainable risk scores for vulnerabilities by aggregating signals from multiple data sources (EPSS, CVSS, KEV, VEX, reachability). It produces audit trails and explainability payloads for every scoring decision.
## Quick Links
- [Architecture](./architecture.md) - Technical design and implementation details
- [Guides](./guides/) - Scoring configuration guides
- [Samples](./samples/) - Risk profile examples
## Status
| Attribute | Value |
|-----------|-------|
| **Maturity** | Production |
| **Last Reviewed** | 2025-12-29 |
| **Maintainer** | Policy Guild |
## Key Features
- **Pluggable Providers**: EPSS, CVSS+KEV, VEX status, fix availability providers
- **Deterministic Scoring**: Same inputs produce identical scores
- **Explainability**: Audit trails for every scoring decision
- **Offline Support**: Air-gapped operation via factor bundles
## Dependencies
### Upstream (this module depends on)
- **Concelier** - CVSS, KEV data
- **Excititor** - VEX status data
- **Signals** - Reachability data
- **Authority** - Authentication
### Downstream (modules that depend on this)
- **Policy Engine** - Consumes risk scores for policy evaluation
## Configuration
```yaml
risk_engine:
providers:
- epss
- cvss_kev
- vex_gate
- fix_exposure
cache_ttl_minutes: 60
```
## Notes
RiskEngine does not make PASS/FAIL decisions. It provides scores to the Policy Engine which makes enforcement decisions.
## Related Documentation
- [Policy Architecture](../policy/architecture.md)
- [Risk Scoring Contract](../../contracts/risk-scoring.md)

View File

@@ -0,0 +1,56 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://cyclonedx.org/schema/bom-1.7.schema.json",
"$comment": "Placeholder schema for CycloneDX 1.7 - Download full schema from https://raw.githubusercontent.com/CycloneDX/specification/master/schema/bom-1.7.schema.json",
"type": "object",
"title": "CycloneDX Software Bill of Materials Standard",
"properties": {
"bomFormat": {
"type": "string",
"enum": ["CycloneDX"]
},
"specVersion": {
"type": "string"
},
"serialNumber": {
"type": "string"
},
"version": {
"type": "integer"
},
"metadata": {
"type": "object"
},
"components": {
"type": "array"
},
"services": {
"type": "array"
},
"externalReferences": {
"type": "array"
},
"dependencies": {
"type": "array"
},
"compositions": {
"type": "array"
},
"vulnerabilities": {
"type": "array"
},
"annotations": {
"type": "array"
},
"formulation": {
"type": "array"
},
"declarations": {
"type": "object"
},
"definitions": {
"type": "object"
}
},
"required": ["bomFormat", "specVersion"]
}

View File

@@ -0,0 +1,43 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://spdx.org/schema/3.0.1/spdx-json-schema.json",
"$comment": "Placeholder schema for SPDX 3.0.1 JSON-LD - Download full schema from https://spdx.org/schema/3.0.1/spdx-json-schema.json",
"type": "object",
"title": "SPDX 3.0.1 JSON-LD Schema",
"properties": {
"@context": {
"oneOf": [
{ "type": "string" },
{ "type": "object" },
{ "type": "array" }
]
},
"@graph": {
"type": "array"
},
"@type": {
"type": "string"
},
"spdxId": {
"type": "string"
},
"creationInfo": {
"type": "object"
},
"name": {
"type": "string"
},
"element": {
"type": "array"
},
"rootElement": {
"type": "array"
},
"namespaceMap": {
"type": "array"
},
"externalMap": {
"type": "array"
}
}
}

View File

@@ -0,0 +1,311 @@
# SPDX 3.0.1 Profile Support
> **Version:** Draft
> **Status:** Planned
> **Sprint:** [SPRINT_20260107_004](../../implplan/SPRINT_20260107_004_000_INDEX_spdx3_profile_support.md)
This document describes StellaOps support for SPDX 3.0.1 and its profile-based model.
---
## Overview
SPDX 3.0.1 introduces a **profile-based model** that extends the Core specification with domain-specific metadata. StellaOps plans to implement the following profiles:
| Profile | Status | Description | Integration |
|---------|--------|-------------|-------------|
| Core | Planned | Foundation for all elements | Required |
| Software | Planned | Packages, files, snippets | Scanner |
| Lite | Planned | Minimal CI/CD SBOMs | Scanner |
| Build | Planned | Build provenance | Attestor |
| Security | Planned | Vulnerability assessments | VexLens |
| Licensing | Future | License expressions | Policy |
| AI | Future | AI model artifacts | AdvisoryAI |
| Dataset | Future | Dataset metadata | Future |
---
## Current Support
### SPDX 2.x (Current)
StellaOps currently supports SPDX 2.2 and 2.3:
- **Parsing:** Full support via `SpdxParser`
- **Generation:** SPDX 2.3 JSON format
- **Integration:** AirGap importer, Scanner output
### SPDX 3.0.1 (Planned)
Full SPDX 3.0.1 support is planned with:
- **Parsing:** JSON-LD format with profile detection
- **Generation:** Profile-conformant output
- **Integration:** Attestor (Build), VexLens (Security)
---
## Profile Details
### Core Profile
The Core profile is the foundation for all SPDX 3.0.1 documents.
**Key Elements:**
- `Element` - Base for all typed elements
- `Relationship` - Links between elements
- `CreationInfo` - Document metadata
- `ExternalRef` - References to external resources
- `ExternalIdentifier` - PURL, CPE, SWID identifiers
- `IntegrityMethod` - Hash verification
**Required Fields:**
- `spdxId` - Unique IRI identifier
- `creationInfo` - With `specVersion: "3.0.1"`
### Software Profile
The Software profile describes software components.
**Key Elements:**
- `Package` - Software package
- `File` - Individual file
- `Snippet` - Code snippet
- `SpdxDocument` - Document root
**Common Properties:**
- `packageVersion` - Version string
- `packageUrl` - PURL (via ExternalIdentifier)
- `downloadLocation` - Source URL
- `homePage` - Project homepage
**Example:**
```json
{
"@type": "software_Package",
"spdxId": "urn:stellaops:spdx:sha256-abc:Package:xyz",
"name": "example-package",
"software_packageVersion": "1.0.0",
"externalIdentifier": [
{
"@type": "ExternalIdentifier",
"externalIdentifierType": "packageUrl",
"identifier": "pkg:npm/example-package@1.0.0"
}
]
}
```
### Lite Profile
The Lite profile provides minimal SBOMs for CI/CD pipelines.
**Minimal Required Fields:**
- `spdxId`
- `creationInfo` (created, createdBy, specVersion)
- `name`
- `packageVersion` (for packages)
- `downloadLocation` OR `packageUrl`
**Use Cases:**
- CI/CD pipeline artifacts
- Quick compliance checks
- Lightweight transmission
### Build Profile
The Build profile captures build provenance.
**Key Element:**
- `Build` - Build process information
**Properties:**
- `buildType` - Build system URI
- `buildId` - Unique build identifier
- `buildStartTime` / `buildEndTime` - Timing
- `configSourceUri` - Build configuration sources
- `environment` - Build environment
- `parameter` - Build parameters
**Integration with Attestor:**
```
in-toto/SLSA SPDX 3.0.1 Build
----------- ----------------
buildType --> build_buildType
builder.id --> createdBy (Agent)
invocation.config --> build_configSourceUri
materials --> Relationships (GENERATED_FROM)
```
### Security Profile
The Security profile describes vulnerability assessments.
**Key Elements:**
- `Vulnerability` - CVE/vulnerability reference
- `VexAffectedVulnAssessmentRelationship`
- `VexNotAffectedVulnAssessmentRelationship`
- `VexFixedVulnAssessmentRelationship`
- `VexUnderInvestigationVulnAssessmentRelationship`
- `CvssV3VulnAssessmentRelationship`
- `EpssVulnAssessmentRelationship`
**Integration with VexLens:**
```
OpenVEX SPDX 3.0.1 Security
------- -------------------
status: affected --> VexAffectedVulnAssessmentRelationship
status: not_affected --> VexNotAffectedVulnAssessmentRelationship
status: fixed --> VexFixedVulnAssessmentRelationship
justification --> statusNotes
action_statement --> actionStatement
```
---
## JSON-LD Structure
SPDX 3.0.1 uses JSON-LD format:
```json
{
"@context": "https://spdx.org/rdf/3.0.1/spdx-context.jsonld",
"@graph": [
{
"@type": "SpdxDocument",
"spdxId": "urn:example:doc",
"creationInfo": {
"@type": "CreationInfo",
"specVersion": "3.0.1",
"created": "2026-01-07T12:00:00Z",
"createdBy": ["urn:example:tool:stellaops"],
"profile": [
"https://spdx.org/rdf/3.0.1/terms/Core/ProfileIdentifierType/core",
"https://spdx.org/rdf/3.0.1/terms/Software/ProfileIdentifierType/software"
]
},
"rootElement": ["urn:example:pkg:root"],
"element": [
"urn:example:pkg:root",
"urn:example:pkg:dep1"
]
},
{
"@type": "software_Package",
"spdxId": "urn:example:pkg:root",
"name": "my-application",
"software_packageVersion": "2.0.0"
}
]
}
```
---
## API Usage
### Scanner SBOM Generation
```http
GET /api/v1/scan/{id}/sbom?format=spdx3&profile=software
```
**Parameters:**
| Parameter | Values | Default |
|-----------|--------|---------|
| `format` | `spdx3`, `spdx2`, `cyclonedx` | `spdx2` |
| `profile` | `software`, `lite` | `software` |
### Attestor Build Profile
```http
GET /api/v1/attestation/{id}?format=spdx3
```
### VexLens Security Profile
```http
GET /api/v1/vex/consensus/{artifact}?format=spdx3
```
---
## Profile Conformance
Documents declare profile conformance in `CreationInfo.profile`:
```json
{
"profile": [
"https://spdx.org/rdf/3.0.1/terms/Core/ProfileIdentifierType/core",
"https://spdx.org/rdf/3.0.1/terms/Software/ProfileIdentifierType/software",
"https://spdx.org/rdf/3.0.1/terms/Security/ProfileIdentifierType/security"
]
}
```
StellaOps validates documents against declared profiles when:
- Parsing external documents (opt-in validation)
- Generating documents (automatic conformance)
---
## Migration from SPDX 2.x
### Parallel Support
During transition, StellaOps supports both formats:
1. **SPDX 2.x** - Default for backward compatibility
2. **SPDX 3.0.1** - Opt-in via `format=spdx3`
### Key Differences
| Aspect | SPDX 2.x | SPDX 3.0.1 |
|--------|----------|------------|
| Format | Flat JSON | JSON-LD |
| Version field | `spdxVersion` | `specVersion` in CreationInfo |
| Packages | `packages[]` array | `@graph` with `@type` |
| Relationships | `relationships[]` | Relationship elements in `@graph` |
| Checksums | `checksums[]` | `verifiedUsing` IntegrityMethod |
| PURL | `externalRefs` | `externalIdentifier` |
### Version Detection
StellaOps auto-detects SPDX version:
```csharp
using System.Text.Json;

static SpdxVersion DetectVersion(JsonElement root)
{
    // 2.x: flat JSON carries a top-level spdxVersion property
    if (root.TryGetProperty("spdxVersion", out _))
        return SpdxVersion.V2;
    // 3.x: JSON-LD carries a top-level @context property
    return root.TryGetProperty("@context", out _)
        ? SpdxVersion.V3
        : throw new System.FormatException("Not a recognised SPDX document.");
}
```
---
## Air-Gap Considerations
For air-gapped deployments:
1. **Bundle contexts locally** - SPDX JSON-LD contexts must be available offline
2. **Configure local context URIs** - Point to local context files
3. **Validate offline** - Use bundled schemas for validation
```yaml
# etc/spdx3.yaml
Spdx3:
ContextResolution:
Mode: Local # or Remote, Cached
LocalContextPath: /etc/stellaops/spdx3-contexts/
```
---
## References
- [SPDX 3.0.1 Specification](https://spdx.github.io/spdx-spec/v3.0.1/)
- [SPDX 3.0.1 Model Repository](https://github.com/spdx/spdx-3-model)
- [Sprint: SPDX 3.0.1 Profile Support](../../implplan/SPRINT_20260107_004_000_INDEX_spdx3_profile_support.md)

View File

@@ -12,7 +12,7 @@ Align Kubernetes/VM target coverage between Scanner and Zastava so runtime signa
- Standardize labels/annotations for scan jobs and Zastava monitors:
- `stellaops.workload/id`, `tenant`, `project`, `component`, `channel`.
- Container image digest required; tag optional.
-- Shared manifest snippet lives in `deploy/helm/stellaops` overlays; reuse in job templates.
+- Shared manifest snippet lives in `devops/helm/stellaops` overlays; reuse in job templates.
2) **Runtime evidence channels**
- Scanner EntryTrace publishes `runtime.events` with fields: `workloadId`, `namespace`, `node`, `edgeType` (syscall/net/fs), `timestamp` (UTC, ISO-8601), `code_id` (when available).
- Zastava observers mirror the same schema on `zastava.runtime.events`; controller stitches by `workloadId` and `imageDigest`.
@@ -36,5 +36,5 @@ Align Kubernetes/VM target coverage between Scanner and Zastava so runtime signa
- Tests: determinism checks on merged runtime bundle; label presence asserted in integration harness.
## Next Steps
-- Wire labels/flags into `deploy/helm/stellaops` templates and Scanner Worker job manifests.
+- Wire labels/flags into `devops/helm/stellaops` templates and Scanner Worker job manifests.
- Add integration test to ensure EntryTrace and Zastava events with same workload id are coalesced without reordering.

View File

@@ -63,7 +63,7 @@ graph LR
| Artifact | Owner | Location |
|----------|-------|----------|
-| RFC Document | Scanner TL | `docs/adr/` |
+| RFC Document | Scanner TL | `docs/technical/adr/` |
| Mapping CSV | Scanner TL | `docs/modules/scanner/fixtures/adapters/` |
| Golden Fixtures | QA | `docs/modules/scanner/fixtures/cdx17-cbom/` |
| Hash List | QA | `docs/modules/scanner/fixtures/*/hashes.txt` |
@@ -167,7 +167,7 @@ To modify a locked adapter:
| Record | Location | Retention |
|--------|----------|-----------|
-| RFC decisions | `docs/adr/` | Permanent |
+| RFC decisions | `docs/technical/adr/` | Permanent |
| Hash changes | Git history + `CHANGELOG.md` | Permanent |
| Approval records | PR comments | Permanent |
| DSSE envelopes | CAS + offline kit | Permanent |

View File

@@ -0,0 +1,176 @@
# Scheduler HLC Ordering Architecture
This document describes the Hybrid Logical Clock (HLC) based ordering system used by the StellaOps Scheduler for audit-safe job queue operations.
## Overview
The Scheduler uses HLC timestamps instead of wall-clock time to ensure:
1. **Total ordering** of jobs across distributed nodes
2. **Audit-safe sequencing** with cryptographic chain linking
3. **Deterministic merge** when offline nodes reconnect
4. **Clock skew tolerance** in distributed deployments
## HLC Timestamp Format
An HLC timestamp consists of three components:
```
(PhysicalTime, LogicalCounter, NodeId)
```
| Component | Description | Example |
|-----------|-------------|---------|
| PhysicalTime | Unix milliseconds (UTC) | `1704585600000` |
| LogicalCounter | Monotonic counter for same-millisecond events | `0`, `1`, `2`... |
| NodeId | Unique identifier for the node | `scheduler-prod-01` |
**String format:** `{physical}:{logical}:{nodeId}`
Example: `1704585600000:0:scheduler-prod-01`
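A minimal sketch of the send-side tick rule under this format (skew rejection against `maxClockSkew` is omitted, and the shipped `StellaOps.HybridLogicalClock` library may expose a different API):

```csharp
public sealed class HybridLogicalClock
{
    private long _physical;  // Unix ms of the last issued timestamp
    private int _logical;    // counter for same-millisecond events
    private readonly string _nodeId;

    public HybridLogicalClock(string nodeId) => _nodeId = nodeId;

    public string Tick(long nowMs)
    {
        if (nowMs > _physical)
        {
            _physical = nowMs;  // wall clock advanced: reset the counter
            _logical = 0;
        }
        else
        {
            _logical++;         // stalled or skewed clock: bump the counter
        }
        return $"{_physical}:{_logical}:{_nodeId}";
    }
}
```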
## Database Schema
### scheduler_log Table
```sql
CREATE TABLE scheduler.scheduler_log (
id BIGSERIAL PRIMARY KEY,
t_hlc TEXT NOT NULL, -- HLC timestamp
job_id TEXT NOT NULL, -- Job identifier
action TEXT NOT NULL, -- ENQUEUE, DEQUEUE, EXECUTE, COMPLETE, FAIL
prev_chain_link TEXT, -- Hash of previous entry
chain_link TEXT NOT NULL, -- Hash of this entry
payload JSONB NOT NULL, -- Job metadata
tenant_id TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_scheduler_log_hlc ON scheduler.scheduler_log (t_hlc);
CREATE INDEX idx_scheduler_log_tenant_hlc ON scheduler.scheduler_log (tenant_id, t_hlc);
CREATE INDEX idx_scheduler_log_job ON scheduler.scheduler_log (job_id);
```
### batch_snapshot Table
```sql
CREATE TABLE scheduler.batch_snapshot (
id BIGSERIAL PRIMARY KEY,
snapshot_hlc TEXT NOT NULL, -- HLC at snapshot time
from_chain_link TEXT NOT NULL, -- First entry in batch
to_chain_link TEXT NOT NULL, -- Last entry in batch
entry_count INTEGER NOT NULL,
merkle_root TEXT NOT NULL, -- Merkle root of entries
dsse_envelope JSONB, -- DSSE-signed attestation
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
### chain_heads Table
```sql
CREATE TABLE scheduler.chain_heads (
tenant_id TEXT PRIMARY KEY,
head_chain_link TEXT NOT NULL, -- Current chain head
head_hlc TEXT NOT NULL, -- HLC of chain head
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
## Chain Link Computation
Each log entry is cryptographically linked to its predecessor:
```csharp
public static string ComputeChainLink(
string tHlc,
string jobId,
string action,
string? prevChainLink,
string payloadDigest)
{
using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
hasher.AppendData(Encoding.UTF8.GetBytes(tHlc));
hasher.AppendData(Encoding.UTF8.GetBytes(jobId));
hasher.AppendData(Encoding.UTF8.GetBytes(action));
hasher.AppendData(Encoding.UTF8.GetBytes(prevChainLink ?? "genesis"));
hasher.AppendData(Encoding.UTF8.GetBytes(payloadDigest));
return Convert.ToHexString(hasher.GetHashAndReset()).ToLowerInvariant();
}
```
## Configuration Options
```yaml
# etc/scheduler.yaml
scheduler:
hlc:
enabled: true # Enable HLC ordering (default: true)
nodeId: "scheduler-prod-01" # Unique node identifier
maxClockSkew: "00:00:05" # Maximum tolerable clock skew (5 seconds)
persistenceInterval: "00:01:00" # HLC state persistence interval
chain:
enabled: true # Enable chain linking (default: true)
batchSize: 1000 # Entries per batch snapshot
batchInterval: "00:05:00" # Batch snapshot interval
signSnapshots: true # DSSE-sign batch snapshots
keyId: "scheduler-signing-key" # Key for snapshot signing
```
## Operational Considerations
### Clock Skew Handling
The HLC algorithm tolerates clock skew by:
1. Advancing logical counter when physical time hasn't progressed
2. Rejecting events with excessive clock skew (> `maxClockSkew`)
3. Emitting `hlc_clock_skew_rejections_total` metric for monitoring
**Alert:** `HlcClockSkewExceeded` triggers when skew > tolerance.
### Chain Verification
Verify chain integrity on startup and periodically:
```bash
# CLI command
stella scheduler chain verify --tenant-id <tenant>
# API endpoint
GET /api/v1/scheduler/chain/verify?tenantId=<tenant>
```
### Offline Merge
When offline nodes reconnect:
1. Export local job log as bundle
2. Import on connected node
3. HLC-based merge produces deterministic ordering
4. Chain is extended with merged entries
See `docs/operations/airgap-operations-runbook.md` for details.
## Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `hlc_ticks_total` | Counter | Total HLC tick operations |
| `hlc_clock_skew_rejections_total` | Counter | Events rejected due to clock skew |
| `hlc_physical_offset_seconds` | Gauge | Current physical time offset |
| `scheduler_chain_entries_total` | Counter | Total chain log entries |
| `scheduler_chain_verifications_total` | Counter | Chain verification operations |
| `scheduler_chain_verification_failures_total` | Counter | Failed verifications |
| `scheduler_batch_snapshots_total` | Counter | Batch snapshots created |
## Grafana Dashboard
See `devops/observability/grafana/hlc-queue-metrics.json` for the HLC monitoring dashboard.
## Related Documentation
- [HLC Core Library](../../../src/__Libraries/StellaOps.HybridLogicalClock/README.md)
- [HLC Migration Guide](./hlc-migration-guide.md)
- [Air-Gap Operations Runbook](../../operations/airgap-operations-runbook.md)
- [HLC Troubleshooting](../../operations/runbooks/hlc-troubleshooting.md)

View File

@@ -1,51 +0,0 @@
# Snapshot
**Status:** Design/Planning
**Source:** N/A (cross-cutting concept)
**Owner:** Platform Team
## Purpose
Snapshot defines the knowledge snapshot model for deterministic, point-in-time captures of StellaOps data. It enables offline operation, merge preview, replay, and air-gap export with cryptographic integrity.
## Components
**Concept Documentation:**
- `merge-preview.md` - Merge preview specification
- `replay-yaml.md` - Replay YAML format and semantics
**Snapshot Types:**
- Advisory snapshots (Concelier ingestion state)
- VEX snapshots (VexHub distribution state)
- SBOM snapshots (SbomService repository state)
- Policy snapshots (Policy Engine rule state)
- Task pack snapshots (PacksRegistry versions)
## Implementation Locations
Snapshot functionality is implemented across multiple modules:
- **AirGap** - Snapshot export/import orchestration
- **ExportCenter** - Snapshot bundle creation
- **Replay** - Deterministic replay from snapshots
- **Concelier** - Advisory snapshot merge preview
- All data modules (snapshot sources)
## Dependencies
- AirGap (snapshot orchestration)
- ExportCenter (bundle creation)
- Replay (snapshot replay)
- All data modules (snapshot sources)
## Related Documentation
- Merge Preview: `./merge-preview.md`
- Replay YAML: `./replay-yaml.md`
- AirGap: `../airgap/`
- ExportCenter: `../export-center/`
- Replay: `../replay/` (if present)
- Offline Kit: `../../OFFLINE_KIT.md`
## Current Status
Snapshot concepts documented in merge-preview.md and replay-yaml.md. Implementation distributed across AirGap (export/import), ExportCenter (packaging), and Replay (playback) modules. Used for offline/air-gap operation.

View File

@@ -1,250 +0,0 @@
# Policy Merge Preview
## Overview
The **Policy Merge Preview** shows how VEX statements from different sources combine using lattice logic. This visualization helps analysts understand:
1. What each source contributes
2. How conflicts are resolved
3. What evidence is missing
4. The final merged status
## Merge Semantics
StellaOps uses a three-layer merge model:
```
vendor ⊕ distro ⊕ internal = final
```
Where ⊕ represents the lattice join operation.
### Layer Hierarchy
| Layer | Description | Typical Trust |
|-------|-------------|---------------|
| **Vendor** | Software vendor's VEX statements | 0.95-1.0 |
| **Distro** | Distribution maintainer's assessments | 0.85-0.95 |
| **Internal** | Organization's own assessments | 0.70-0.90 |
### Lattice Operations
Using K4 (a Belnap-style four-valued logic):
| Status | Lattice Position |
|--------|-----------------|
| Affected | Top (Both) |
| UnderInvestigation | Unknown (Neither) |
| Fixed | True |
| NotAffected | False |
Join table (⊕):
```
| Affected | Under... | Fixed | NotAffected
---------+----------+----------+-------+------------
Affected | A | A | A | A
Under... | A | U | U | U
Fixed | A | U | F | F
NotAff | A | U | F | N
```
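Because `Affected` dominates, then `UnderInvestigation`, then `Fixed`, the join reduces to a precedence check. A sketch encoding the table above (the enum and naming are illustrative):
```csharp
enum VexStatus { Affected, UnderInvestigation, Fixed, NotAffected }

// Each status absorbs everything below it, exactly as in the join table.
static VexStatus Join(VexStatus a, VexStatus b)
{
    if (a == VexStatus.Affected || b == VexStatus.Affected) return VexStatus.Affected;
    if (a == VexStatus.UnderInvestigation || b == VexStatus.UnderInvestigation) return VexStatus.UnderInvestigation;
    if (a == VexStatus.Fixed || b == VexStatus.Fixed) return VexStatus.Fixed;
    return VexStatus.NotAffected;
}
```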
## API Endpoint
### GET /policy/merge-preview/{cveId}
**Parameters:**
- `cveId`: CVE identifier
- `artifact`: Artifact digest (query param)
**Response:**
```json
{
"cveId": "CVE-2024-1234",
"artifactDigest": "sha256:abc...",
"contributions": [
{
"layer": "vendor",
"sources": ["lodash-security"],
"status": "NotAffected",
"trustScore": 0.95,
"statements": [...],
"mergeTrace": {
"explanation": "Vendor states not affected due to version"
}
},
{
"layer": "distro",
"sources": ["redhat-csaf"],
"status": "Affected",
"trustScore": 0.90,
"statements": [...],
"mergeTrace": {
"explanation": "Distro states affected without context"
}
},
{
"layer": "internal",
"sources": [],
"status": null,
"trustScore": 0,
"statements": [],
"mergeTrace": null
}
],
"finalStatus": "NotAffected",
"finalConfidence": 0.925,
"missingEvidence": [
{
"type": "internal-assessment",
"description": "Add internal security assessment",
"priority": "medium"
}
],
"latticeType": "K4",
"generatedAt": "2024-12-22T12:00:00Z"
}
```
## Conflict Resolution
When sources disagree, resolution follows:
1. **Trust Weight**: Higher trust wins
2. **Lattice Position**: If trust equal, higher lattice position wins
3. **Freshness**: If still tied, more recent statement wins
4. **Tie**: First statement wins
### Resolution Trace
Each merge includes explanation:
```json
{
"leftSource": "vendor:lodash",
"rightSource": "distro:redhat",
"leftStatus": "NotAffected",
"rightStatus": "Affected",
"leftTrust": 0.95,
"rightTrust": 0.90,
"resultStatus": "NotAffected",
"explanation": "'vendor:lodash' has higher trust weight than 'distro:redhat'"
}
```
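A sketch of the four resolution rules as code (the `Statement` shape and `LatticeRank` ordering are assumptions; `VexStatus` is as in the join sketch earlier):
```csharp
record Statement(string Source, VexStatus Status, double Trust, DateTimeOffset IssuedAt);

static int LatticeRank(VexStatus s) => s switch
{
    VexStatus.Affected => 3,
    VexStatus.UnderInvestigation => 2,
    VexStatus.Fixed => 1,
    _ => 0, // NotAffected
};

static Statement Resolve(Statement left, Statement right)
{
    if (left.Trust != right.Trust)
        return left.Trust > right.Trust ? left : right;              // 1. higher trust wins
    if (LatticeRank(left.Status) != LatticeRank(right.Status))
        return LatticeRank(left.Status) > LatticeRank(right.Status)
            ? left : right;                                          // 2. higher lattice position
    if (left.IssuedAt != right.IssuedAt)
        return left.IssuedAt > right.IssuedAt ? left : right;        // 3. more recent statement
    return left;                                                     // 4. tie: first statement wins
}
```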
## UI Components
### Layer Cards
Each layer shows:
- Source names
- Current status badge
- Trust score bar
- Statement count
### Merge Flow Diagram
Visual representation:
```
┌─────────┐ ┌─────────┐ ┌──────────┐ ┌─────────┐
│ Vendor │ ──⊕─│ Distro │ ──⊕─│ Internal │ ──=─│ Final │
│ NotAff. │ │ Affected│ │ — │ │ NotAff. │
│ (0.95) │ │ (0.90) │ │ │ │ (0.925) │
└─────────┘ └─────────┘ └──────────┘ └─────────┘
```
### Missing Evidence CTA
Prominent call-to-action for missing evidence:
```
┌─────────────────────────────────────────┐
│ ⚠ Improve Confidence │
│ │
│ [+] Add internal security assessment │
│ Increases confidence by ~5% │
│ │
│ [+] Add reachability analysis │
│ Determines if code is called │
└─────────────────────────────────────────┘
```
### Merge Traces (Expandable)
Detailed explanation of each merge:
```
▼ Merge Details
Vendor layer:
Sources: lodash-security
Status: NotAffected (trust: 95%)
Distro layer:
Sources: redhat-csaf
Status: Affected (trust: 90%)
Merge: 'lodash-security' has higher trust weight
Internal layer:
No statements
Final: NotAffected (confidence: 92.5%)
```
## Configuration
### Trust Weights
Configure in `trust.yaml`:
```yaml
sources:
vendor: 1.0
distro: 0.9
nvd: 0.8
internal: 0.85
community: 0.5
unknown: 0.3
```
### Lattice Type
Configure lattice in `policy.yaml`:
```yaml
lattice:
type: K4 # or Boolean, 8-state
joinStrategy: trust-weighted
```
## Use Cases
### 1. Understanding Disagreement
When vendor says "not affected" but NVD says "affected":
- View merge preview
- See trust scores
- Understand why vendor wins
- Decide if internal assessment needed
### 2. Adding Internal Context
When external sources lack context:
- View missing evidence
- Click "Add evidence"
- Submit internal VEX statement
- See confidence increase
### 3. Audit Documentation
For compliance:
- Export merge preview
- Include in audit report
- Document decision rationale
## Best Practices
1. **Review Conflicts**: When layers disagree, investigate why
2. **Add Internal Context**: Your reachability data often resolves conflicts
3. **Trust Calibration**: Adjust trust weights based on source accuracy
4. **Document Decisions**: Use merge preview in exception justifications
## Related Documentation
- [VEX Trust Scoring](../excititor/scoring.md)
- [Lattice Configuration](../policy/implementation_plan.md)
- [REPLAY.yaml Specification](./replay-yaml.md)

View File

@@ -1,295 +0,0 @@
# REPLAY.yaml Manifest Specification
## Overview
The **REPLAY.yaml** manifest defines the complete set of inputs required to reproduce a StellaOps evaluation. It is the root document in a `.stella-replay.tgz` bundle.
## File Location
```
.stella-replay.tgz
├── REPLAY.yaml # This manifest
├── sboms/
├── vex/
├── reach/
├── exceptions/
├── policies/
├── feeds/
├── config/
└── SIGNATURE.sig # Optional DSSE signature
```
## Schema Version
Current schema version: `1.0.0`
```yaml
version: "1.0.0"
```
## Complete Example
```yaml
version: "1.0.0"
snapshot:
id: "snap-20241222-abc123"
createdAt: "2024-12-22T12:00:00Z"
artifact: "sha256:abc123..."
previousId: "snap-20241221-xyz789"
inputs:
sboms:
- path: "sboms/cyclonedx.json"
format: "cyclonedx-1.6"
digest: "sha256:def456..."
- path: "sboms/spdx.json"
format: "spdx-3.0.1"
digest: "sha256:ghi789..."
vex:
- path: "vex/vendor-lodash.json"
source: "vendor:lodash"
format: "openvex"
digest: "sha256:jkl012..."
trustScore: 0.95
- path: "vex/redhat-csaf.json"
source: "distro:redhat"
format: "csaf"
digest: "sha256:mno345..."
trustScore: 0.90
reachability:
- path: "reach/api-handler.json"
entryPoint: "/api/handler"
digest: "sha256:pqr678..."
nodeCount: 42
edgeCount: 57
exceptions:
- path: "exceptions/exc-001.json"
exceptionId: "exc-001"
digest: "sha256:stu901..."
policies:
bundlePath: "policies/bundle.tar.gz"
digest: "sha256:vwx234..."
version: "2.1.0"
rulesHash: "sha256:yza567..."
feeds:
- feedId: "nvd"
name: "National Vulnerability Database"
version: "2024-12-22T00:00:00Z"
digest: "sha256:bcd890..."
fetchedAt: "2024-12-22T06:00:00Z"
- feedId: "ghsa"
name: "GitHub Security Advisories"
version: "2024-12-22T01:00:00Z"
digest: "sha256:efg123..."
fetchedAt: "2024-12-22T06:15:00Z"
lattice:
type: "K4"
configDigest: "sha256:hij456..."
trust:
configDigest: "sha256:klm789..."
defaultWeight: 0.5
outputs:
verdictPath: "verdict.json"
verdictDigest: "sha256:nop012..."
findingsPath: "findings.ndjson"
findingsDigest: "sha256:qrs345..."
seeds:
rng: 12345678
sampling: 87654321
environment:
STELLAOPS_POLICY_VERSION: "2.1.0"
STELLAOPS_LATTICE_TYPE: "K4"
signature:
algorithm: "ecdsa-p256"
keyId: "signing-key-prod-2024"
value: "MEUCIQDx..."
```
## Field Reference
### snapshot
Metadata about the snapshot itself.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| id | string | Yes | Unique snapshot identifier |
| createdAt | datetime | Yes | ISO 8601 timestamp |
| artifact | string | Yes | Artifact digest being evaluated |
| previousId | string | No | Previous snapshot for diff |
### inputs.sboms
SBOM documents included in bundle.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| path | string | Yes | Path within bundle |
| format | string | Yes | `cyclonedx-1.6` or `spdx-3.0.1` |
| digest | string | Yes | Content digest |
### inputs.vex
VEX documents from various sources.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| path | string | Yes | Path within bundle |
| source | string | Yes | Source identifier (vendor:*, distro:*, etc.) |
| format | string | Yes | `openvex`, `csaf`, `cyclonedx-vex` |
| digest | string | Yes | Content digest |
| trustScore | number | Yes | Trust weight (0.0-1.0) |
### inputs.reachability
Reachability subgraph data.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| path | string | Yes | Path within bundle |
| entryPoint | string | Yes | Entry point identifier |
| digest | string | Yes | Content digest |
| nodeCount | integer | No | Number of nodes |
| edgeCount | integer | No | Number of edges |
### inputs.exceptions
Active exceptions at snapshot time.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| path | string | Yes | Path within bundle |
| exceptionId | string | Yes | Exception identifier |
| digest | string | Yes | Content digest |
### inputs.policies
Policy bundle reference.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| bundlePath | string | Yes | Path to policy bundle |
| digest | string | Yes | Bundle digest |
| version | string | No | Policy version |
| rulesHash | string | Yes | Hash of compiled rules |
### inputs.feeds
Advisory feed versions at snapshot time.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| feedId | string | Yes | Feed identifier |
| name | string | No | Human-readable name |
| version | string | Yes | Feed version/timestamp |
| digest | string | Yes | Feed content digest |
| fetchedAt | datetime | Yes | When feed was fetched |
### inputs.lattice
Lattice configuration for merge semantics.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| type | string | Yes | `K4`, `Boolean`, `8-state` |
| configDigest | string | Yes | Configuration hash |
### inputs.trust
Trust weight configuration.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| configDigest | string | Yes | Configuration hash |
| defaultWeight | number | No | Default trust weight |
### outputs
Evaluation outputs for verification.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| verdictPath | string | Yes | Path to verdict file |
| verdictDigest | string | Yes | Verdict content digest |
| findingsPath | string | No | Path to findings file |
| findingsDigest | string | No | Findings content digest |
### seeds
Random seeds for deterministic evaluation.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| rng | integer | No | Random number generator seed |
| sampling | integer | No | Sampling algorithm seed |
### environment
Environment variables captured (non-sensitive).
Key-value pairs of environment configuration.
### signature
DSSE signature over manifest.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| algorithm | string | Yes | Signing algorithm |
| keyId | string | Yes | Signing key identifier |
| value | string | Yes | Base64-encoded signature |
## Digest Format
All digests use the format:
```
sha256:<64-char-hex>
```
Example:
```
sha256:a1b2c3d4e5f6...
```
## Validation
Bundle validation checks:
1. REPLAY.yaml exists at bundle root
2. All referenced files exist
3. All digests match content
4. Schema validates against JSON Schema
5. Signature verifies (if present)
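Check 3 is a straightforward recomputation. A minimal sketch, assuming `sha256:<hex>` digests and bundle-relative paths as in the example manifest:
```csharp
using System.IO;
using System.Security.Cryptography;

static bool DigestMatches(string bundleRoot, string relativePath, string expected)
{
    using var sha = SHA256.Create();
    using var stream = File.OpenRead(Path.Combine(bundleRoot, relativePath));
    var actual = "sha256:" + Convert.ToHexString(sha.ComputeHash(stream)).ToLowerInvariant();
    return string.Equals(actual, expected, StringComparison.Ordinal);
}
```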
## CLI Usage
```bash
# Create bundle
stella snapshot export --output snapshot.stella-replay.tgz
# Verify bundle
stella snapshot verify snapshot.stella-replay.tgz
# Replay from bundle
stella replay --bundle snapshot.stella-replay.tgz
# View manifest
stella snapshot manifest snapshot.stella-replay.tgz
```
## Related Documentation
- [Knowledge Snapshot Model](./knowledge-snapshot.md)
- [Merge Preview](./merge-preview.md)
- [Replay Engine](../../modules/policy/implementation_plan.md)

View File

@@ -1,113 +1,113 @@
# Telemetry Collector Deployment Guide
> **Scope:** DevOps Guild, Observability Guild, and operators enabling the StellaOps telemetry pipeline (DEVOPS-OBS-50-001 / DEVOPS-OBS-50-003).
This guide describes how to deploy the default OpenTelemetry Collector packaged with StellaOps, validate its ingest endpoints, and prepare an offline-ready bundle for air-gapped environments.
---
## 1. Overview
The collector terminates OTLP traffic from StellaOps services and exports metrics, traces, and logs.
| Endpoint | Purpose | TLS | Authentication |
| -------- | ------- | --- | -------------- |
| `:4317` | OTLP gRPC ingest | mTLS | Client certificate issued by collector CA |
| `:4318` | OTLP HTTP ingest | mTLS | Client certificate issued by collector CA |
| `:9464` | Prometheus scrape | mTLS | Same client certificate |
| `:13133` | Health check | mTLS | Same client certificate |
| `:1777` | pprof diagnostics | mTLS | Same client certificate |
The default configuration lives at `deploy/telemetry/otel-collector-config.yaml` and mirrors the Helm values in the `stellaops` chart.
---
## 2. Local validation (Compose)
```bash
# 1. Generate dev certificates (CA + collector + client)
./ops/devops/telemetry/generate_dev_tls.sh
# 2. Start the collector overlay
cd deploy/compose
docker compose -f docker-compose.telemetry.yaml up -d
# 3. Start the storage overlay (Prometheus, Tempo, Loki)
docker compose -f docker-compose.telemetry-storage.yaml up -d
# 4. Run the smoke test (OTLP HTTP)
python ../../ops/devops/telemetry/smoke_otel_collector.py --host localhost
```
The smoke test posts sample traces, metrics, and logs and verifies that the collector increments the `otelcol_receiver_accepted_*` counters exposed via the Prometheus exporter. The storage overlay gives you a local Prometheus/Tempo/Loki stack to confirm end-to-end wiring. The same client certificate can be used by local services to weave traces together. See [`Telemetry Storage Deployment`](storage.md) for the storage configuration guidelines used in staging/production.
---
## 3. Kubernetes deployment
Enable the collector in Helm by setting the following values (example shown for the dev profile):
```yaml
telemetry:
collector:
enabled: true
defaultTenant: <tenant>
tls:
secretName: stellaops-otel-tls-<env>
```
Provide a Kubernetes secret named `stellaops-otel-tls-<env>` (for staging: `stellaops-otel-tls-stage`) with the keys `tls.crt`, `tls.key`, and `ca.crt`. The secret must contain the collector certificate, private key, and issuing CA respectively. Example:
```bash
kubectl create secret generic stellaops-otel-tls-stage \
--from-file=tls.crt=collector.crt \
--from-file=tls.key=collector.key \
--from-file=ca.crt=ca.crt
```
Helm renders the collector deployment, service, and config map automatically:
```bash
helm upgrade --install stellaops devops/helm/stellaops -f devops/helm/stellaops/values-dev.yaml
```
Update client workloads to trust `ca.crt` and present client certificates that chain back to the same CA.
---
## 4. Offline packaging (DEVOPS-OBS-50-003)
Use the packaging helper to produce a tarball that can be mirrored inside the Offline Kit or air-gapped sites:
```bash
python ops/devops/telemetry/package_offline_bundle.py --output out/telemetry/telemetry-bundle.tar.gz
```
The script gathers:
- `deploy/telemetry/README.md`
- Collector configuration (`deploy/telemetry/otel-collector-config.yaml` and Helm copy)
- Helm template/values for the collector
- Compose overlay (`devops/compose/docker-compose.telemetry.yaml`)
The tarball ships with a `.sha256` checksum. To attach a Cosign signature, add `--sign` and provide `COSIGN_KEY_REF`/`COSIGN_IDENTITY_TOKEN` env vars (or use the `--cosign-key` flag).
Distribute the bundle alongside certificates generated by your PKI. For air-gapped installs, regenerate certificates inside the enclave and recreate the `stellaops-otel-tls` secret.
---
## 5. Operational checks
1. **Health probes**: `kubectl exec` into the collector pod and run `curl -fsSk --cert client.crt --key client.key --cacert ca.crt https://127.0.0.1:13133/healthz`.
2. **Metrics scrape**: confirm Prometheus ingests `otelcol_receiver_accepted_*` counters.
3. **Trace correlation**: ensure services propagate `trace_id` and `tenant.id` attributes; refer to `docs/modules/telemetry/guides/observability.md` for expected spans.
4. **Certificate rotation**: when rotating the CA, update the secret and restart the collector; roll out new client certificates before enabling `require_client_certificate` if staged.
---
## 6. Related references
- `deploy/telemetry/README.md`: source configuration and local workflow.
- `ops/devops/telemetry/smoke_otel_collector.py`: OTLP smoke test.
- `docs/modules/telemetry/guides/observability.md`: metrics/traces/logs taxonomy.
- `docs/RELEASE_ENGINEERING_PLAYBOOK.md`: release checklist for telemetry assets.

View File

@@ -1,25 +1,25 @@
# Telemetry Storage Deployment (DEVOPS-OBS-50-002)
> **Audience:** DevOps Guild, Observability Guild
>
> **Scope:** Prometheus (metrics), Tempo (traces), Loki (logs) storage backends with tenant isolation, TLS, retention policies, and Authority integration.
---
## 1. Components & Ports
| Service | Port | Purpose | TLS |
|-----------|------|---------|-----|
| Prometheus | 9090 | Metrics API / alerting | Client auth (mTLS) to scrape collector |
| Tempo | 3200 | Trace ingest + API | mTLS (client cert required) |
| Loki | 3100 | Log ingest + API | mTLS (client cert required) |
The collector forwards OTLP traffic to Tempo (traces), Prometheus scrapes the collector's `/metrics` endpoint, and Loki is used for log search.
---
## 2. Local validation (Compose)
```bash
./ops/devops/telemetry/generate_dev_tls.sh
cd deploy/compose
# ... (intermediate compose steps elided in this diff)
python ../../ops/devops/telemetry/tenant_isolation_smoke.py \
```
Configuration files live in `deploy/telemetry/storage/`. Adjust the overrides before shipping to staging/production.
---
## 3. Kubernetes blueprint
Deploy Prometheus, Tempo, and Loki to the `observability` namespace. The Helm values snippet below illustrates the key settings (charts not yet versioned—define them in the observability repo):
```yaml
prometheus:
server:
extraFlags:
- web.enable-lifecycle
persistentVolume:
enabled: true
size: 200Gi
additionalScrapeConfigsSecret: stellaops-prometheus-scrape
extraSecretMounts:
- name: otel-mtls
secretName: stellaops-otel-tls-stage
mountPath: /etc/telemetry/tls
readOnly: true
- name: otel-token
secretName: stellaops-prometheus-token
mountPath: /etc/telemetry/auth
readOnly: true
loki:
auth_enabled: true
singleBinary:
replicas: 2
storage:
type: filesystem
existingSecretForTls: stellaops-otel-tls-stage
runtimeConfig:
configMap:
name: stellaops-loki-tenant-overrides
tempo:
server:
http_listen_port: 3200
storage:
trace:
backend: s3
s3:
endpoint: tempo-minio.observability.svc:9000
bucket: tempo-traces
multitenancyEnabled: true
extraVolumeMounts:
- name: otel-mtls
mountPath: /etc/telemetry/tls
readOnly: true
- name: tempo-tenant-overrides
mountPath: /etc/telemetry/tenants
readOnly: true
```
### Staging bootstrap commands
```bash
kubectl create namespace observability --dry-run=client -o yaml | kubectl apply -f -
# TLS material (generated via ops/devops/telemetry/generate_dev_tls.sh or from PKI)
kubectl -n observability create secret generic stellaops-otel-tls-stage \
--from-file=tls.crt=collector-stage.crt \
--from-file=tls.key=collector-stage.key \
--from-file=ca.crt=collector-ca.crt
# Prometheus bearer token issued by Authority (scope obs:read)
kubectl -n observability create secret generic stellaops-prometheus-token \
--from-file=token=prometheus-stage.token
# Tenant overrides
kubectl -n observability create configmap stellaops-loki-tenant-overrides \
--from-file=overrides.yaml=deploy/telemetry/storage/tenants/loki-overrides.yaml
kubectl -n observability create configmap tempo-tenant-overrides \
--from-file=tempo-overrides.yaml=deploy/telemetry/storage/tenants/tempo-overrides.yaml
# Additional scrape config referencing the collector service
kubectl -n observability create secret generic stellaops-prometheus-scrape \
--from-file=prometheus-additional.yaml=deploy/telemetry/storage/prometheus.yaml
```
Provision the following secrets/configs (names can be overridden via Helm values):
| Name | Type | Notes |
|------|------|-------|
| `stellaops-otel-tls-stage` | Secret | Shared CA + server cert/key for collector/storage mTLS. |
| `stellaops-prometheus-token` | Secret | Bearer token minted by Authority (`obs:read`). |
| `stellaops-loki-tenant-overrides` | ConfigMap | Text from `deploy/telemetry/storage/tenants/loki-overrides.yaml`. |
| `tempo-tenant-overrides` | ConfigMap | Text from `deploy/telemetry/storage/tenants/tempo-overrides.yaml`. |
---
## 4. Authority & tenancy integration
1. Create Authority clients for each backend (`observability-prometheus`, `observability-loki`, `observability-tempo`).
```bash
stella authority client create observability-prometheus \
--scopes obs:read \
--audience observability --description "Prometheus collector scrape"
stella authority client create observability-loki \
--scopes obs:logs timeline:read \
--audience observability --description "Loki ingestion"
stella authority client create observability-tempo \
--scopes obs:traces \
--audience observability --description "Tempo ingestion"
```
2. Mint tokens/credentials and store them in the secrets above (see staging bootstrap commands). Example:
```bash
stella authority token issue observability-prometheus --ttl 30d > prometheus-stage.token
```
3. Update ingress/gateway policies to forward `X-StellaOps-Tenant` into Loki/Tempo so tenant headers propagate end-to-end, and ensure each workload sets `tenant.id` attributes (see `docs/observability/observability.md`).
---
## 5. Retention & isolation
- Adjust `deploy/telemetry/storage/tenants/*.yaml` to set per-tenant retention and ingestion limits.
- Configure object storage (S3, GCS, Azure Blob) when moving beyond filesystem storage.
- For air-gapped deployments, mirror the telemetry bundle using `ops/devops/telemetry/package_offline_bundle.py` and import inside the Offline Kit staging directory.
---
## 6. Operational checklist
- [ ] Certificates rotated and secrets updated.
- [ ] Prometheus scrape succeeds (`curl -sk --cert client.crt --key client.key https://collector:9464`).
- [ ] Tempo and Loki report tenant activity (`/api/status`).
- [ ] Retention policy tested by uploading sample data and verifying expiry.
- [ ] `python ops/devops/telemetry/validate_storage_stack.py` passes before committing updated configs.
- [ ] Alerts wired into SLO evaluator (DEVOPS-OBS-51-001).
- [ ] Component rule packs imported (e.g. `docs/modules/scheduler/operations/worker-prometheus-rules.yaml`).
---
## 7. References
- `deploy/telemetry/storage/README.md`
- `devops/compose/docker-compose.telemetry-storage.yaml`
- `docs/modules/telemetry/operations/collector.md`
- `docs/observability/observability.md`

View File

@@ -1,409 +0,0 @@
# Testing Enhancements Architecture
**Version:** 1.0.0
**Last Updated:** 2026-01-05
**Status:** In Development
## Overview
This document describes the architecture of StellaOps testing enhancements derived from the product advisory "New Testing Enhancements for Stella Ops" (05-Dec-2026). The enhancements address gaps in temporal correctness, policy drift control, replayability, and competitive awareness.
## Problem Statement
> "The next gains for StellaOps testing are no longer about coverage—they're about temporal correctness, policy drift control, replayability, and competitive awareness. Systems that fail now do so quietly, over time, and under sequence pressure."
### Key Gaps Identified
| Gap | Impact | Current State |
|-----|--------|---------------|
| **Temporal Edge Cases** | Silent failures under clock drift, leap seconds, TTL boundaries | TimeProvider exists but no edge case tests |
| **Failure Choreography** | Cascading failures untested | Single-point chaos tests only |
| **Trace Replay** | Assumptions vs. reality mismatch | Replay module underutilized |
| **Policy Drift** | Silent behavior changes | Determinism tests exist but no diff testing |
| **Decision Opacity** | Audit/debug difficulty | Verdicts without explanations |
| **Evidence Gaps** | Test runs not audit-grade | TRX files not in EvidenceLocker |
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Testing Enhancements Architecture │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ Time-Skew │ │ Trace Replay │ │ Failure │ │
│ │ & Idempotency │ │ & Evidence │ │ Choreography │ │
│ └───────┬────────┘ └───────┬────────┘ └───────┬────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ StellaOps.Testing.* Libraries │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌──────────┐ │ │
│ │ │ Temporal │ │ Replay │ │ Chaos │ │ Evidence │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └──────────┘ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌──────────┐ │ │
│ │ │ Policy │ │Explainability│ │ Coverage │ │ConfigDiff│ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └──────────┘ │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ Existing Infrastructure │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌──────────┐ │ │
│ │ │ TestKit │ │Determinism │ │ Postgres │ │ AirGap │ │ │
│ │ │ │ │ Testing │ │ Testing │ │ Testing │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └──────────┘ │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
```
## Component Architecture
### 1. Temporal Testing (`StellaOps.Testing.Temporal`)
**Purpose:** Simulate temporal edge conditions and verify idempotency.
```
┌─────────────────────────────────────────────────────────────┐
│ Temporal Testing │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ SimulatedTimeProvider│ │ IdempotencyVerifier │ │
│ │ - Advance() │ │ - VerifyAsync() │ │
│ │ - JumpTo() │ │ - VerifyWithRetries│ │
│ │ - SetDrift() │ └─────────────────────┘ │
│ │ - JumpBackward() │ │
│ └─────────────────────┘ │
│ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │LeapSecondTimeProvider│ │TtlBoundaryTimeProvider│ │
│ │ - AdvanceThrough │ │ - PositionAtExpiry │ │
│ │ LeapSecond() │ │ - GenerateBoundary │ │
│ └─────────────────────┘ │ TestCases() │ │
│ └─────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ ClockSkewAssertions │ │
│ │ - AssertHandlesClockJumpForward() │ │
│ │ - AssertHandlesClockJumpBackward() │ │
│ │ - AssertHandlesClockDrift() │ │
│ └─────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
```
**Key Interfaces:**
- `SimulatedTimeProvider` - Time progression with drift
- `IdempotencyVerifier<T>` - Retry idempotency verification
- `ClockSkewAssertions` - Clock anomaly assertions
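A hedged sketch of how these pieces might compose in a TTL-boundary test; every signature here is inferred from the diagram above, and the cache under test is hypothetical:
```csharp
using Xunit;

[Fact]
public void Cache_Entry_Expires_Exactly_At_Ttl_Boundary()
{
    // SimulatedTimeProvider and TtlCache shapes are assumptions for illustration.
    var time = new SimulatedTimeProvider(DateTimeOffset.Parse("2026-01-01T00:00:00Z"));
    var cache = new TtlCache<string>(time, ttl: TimeSpan.FromMinutes(5)); // hypothetical cache under test

    cache.Set("op-token", "value");

    time.Advance(TimeSpan.FromMinutes(5) - TimeSpan.FromTicks(1));
    Assert.True(cache.TryGet("op-token", out _));   // one tick before expiry: still valid

    time.Advance(TimeSpan.FromTicks(1));
    Assert.False(cache.TryGet("op-token", out _));  // at the boundary: expired
}
```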
### 2. Trace Replay & Evidence (`StellaOps.Testing.Replay`, `StellaOps.Testing.Evidence`)
**Purpose:** Replay production traces and link test runs to EvidenceLocker.
```
┌─────────────────────────────────────────────────────────────┐
│ Trace Replay & Evidence │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────────┐ │
│ │TraceAnonymizer │ │ TestEvidenceService │ │
│ │ - AnonymizeAsync│ │ - BeginSessionAsync │ │
│ │ - ValidateAnon │ │ - RecordTestResult │ │
│ └────────┬────────┘ │ - FinalizeSession │ │
│ │ └──────────┬──────────┘ │
│ ▼ │ │
│ ┌─────────────────┐ ▼ │
│ │TraceCorpusManager│ ┌─────────────────────┐ │
│ │ - ImportAsync │ │ EvidenceLocker │ │
│ │ - QueryAsync │ │ (immutable storage)│ │
│ └────────┬─────────┘ └─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ ReplayIntegrationTestBase │ │
│ │ - ReplayAndVerifyAsync() │ │
│ │ - ReplayBatchAsync() │ │
│ └─────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
```
**Data Flow:**
```
Production Traces → Anonymization → Corpus → Replay Tests → Evidence Bundle
```
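A sketch of that flow in test code; all types and signatures below are inferred from the component diagram, not a published API:
```csharp
using Xunit;

[Fact]
public async Task Replayed_Trace_Matches_Recorded_Verdict()
{
    // _anonymizer/_corpus/rawTrace are hypothetical fixtures.
    var anonymized = await _anonymizer.AnonymizeAsync(rawTrace); // strip tenant/user identifiers
    await _corpus.ImportAsync(anonymized);                       // TraceCorpusManager

    // From a ReplayIntegrationTestBase-derived test:
    var outcome = await ReplayAndVerifyAsync(anonymized.TraceId); // replay and diff against the recorded verdict
    Assert.True(outcome.Matches);
}
```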
### 3. Failure Choreography (`StellaOps.Testing.Chaos`)
**Purpose:** Orchestrate sequenced, cascading failure scenarios.
```
┌─────────────────────────────────────────────────────────────┐
│ Failure Choreography │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ FailureChoreographer │ │
│ │ - InjectFailure(componentId, failureType) │ │
│ │ - RecoverComponent(componentId) │ │
│ │ - ExecuteOperation(name, action) │ │
│ │ - AssertCondition(name, condition) │ │
│ │ - ExecuteAsync() → ChoreographyResult │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ ┌────────────────┐ ┌────────────┐ ┌────────────────┐ │
│ │DatabaseFailure │ │HttpClient │ │ CacheFailure │ │
│ │ Injector │ │ Injector │ │ Injector │ │
│ └────────────────┘ └────────────┘ └────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ ConvergenceTracker │ │
│ │ - CaptureSnapshotAsync() │ │
│ │ - WaitForConvergenceAsync() │ │
│ │ - VerifyConvergenceAsync() │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ ┌────────────────┐ ┌────────────┐ ┌────────────────┐ │
│ │ DatabaseState │ │ Metrics │ │ QueueState │ │
│ │ Probe │ │ Probe │ │ Probe │ │
│ └────────────────┘ └────────────┘ └────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
```
**Failure Types:**
- `Unavailable` - Component completely down
- `Timeout` - Slow responses
- `Intermittent` - Random failures
- `PartialFailure` - Some operations fail
- `Degraded` - Reduced capacity
- `Flapping` - Alternating up/down
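A sketch of a two-stage cascade against the choreographer surface listed above; the fluent shape, component names, and the `scheduler`/`observer` fixtures are assumptions:
```csharp
using Xunit;

[Fact]
public async Task Cascading_Failure_Converges_After_Recovery()
{
    var choreo = new FailureChoreographer();

    choreo.InjectFailure("postgres", FailureType.Timeout);       // stage 1: database slows down
    choreo.ExecuteOperation("enqueue-scan", () => scheduler.EnqueueAsync(job));
    choreo.InjectFailure("valkey", FailureType.Unavailable);     // stage 2: cache drops while DB is degraded
    choreo.AssertCondition("events-buffered", () => observer.BufferedEventCount > 0);
    choreo.RecoverComponent("valkey");
    choreo.RecoverComponent("postgres");
    choreo.AssertCondition("buffer-drained", () => observer.BufferedEventCount == 0);

    var result = await choreo.ExecuteAsync();                    // returns ChoreographyResult
    Assert.True(result.Succeeded);
}
```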
### 4. Policy & Explainability (`StellaOps.Core.Explainability`, `StellaOps.Testing.Policy`)
**Purpose:** Explain automated decisions and test policy changes.
```
┌─────────────────────────────────────────────────────────────┐
│ Policy & Explainability │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ DecisionExplanation │ │
│ │ - DecisionId, DecisionType, DecidedAt │ │
│ │ - Outcome (value, confidence, summary) │ │
│ │ - Factors[] (type, weight, contribution) │ │
│ │ - AppliedRules[] (id, triggered, impact) │ │
│ │ - Metadata (engine version, input hashes) │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │IExplainableDecision│ │ ExplainabilityAssertions│ │
│ │ <TInput, TOutput> │ │ - AssertHasExplanation │ │
│ │ - EvaluateWith │ │ - AssertExplanation │ │
│ │ ExplanationAsync│ │ Reproducible │ │
│ └─────────────────┘ └─────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ PolicyDiffEngine │ │
│ │ - ComputeDiffAsync(baseline, new, inputs) │ │
│ │ → PolicyDiffResult (changed behaviors, deltas) │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ PolicyRegressionTestBase │ │
│ │ - Policy_Change_Produces_Expected_Diff() │ │
│ │ - Policy_Change_No_Unexpected_Regressions() │ │
│ └─────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
```
**Explainable Services:**
- `ExplainableVexConsensusService`
- `ExplainableRiskScoringService`
- `ExplainablePolicyEngine`
### 5. Cross-Cutting Standards (`StellaOps.Testing.*`)
**Purpose:** Enforce standards across all testing.
```
┌─────────────────────────────────────────────────────────────┐
│ Cross-Cutting Standards │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────┐ │
│ │ BlastRadius Annotations │ │
│ │ - Auth, Scanning, Evidence, Compliance │ │
│ │ - Advisories, RiskPolicy, Crypto │ │
│ │ - Integrations, Persistence, Api │ │
│ └───────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────┐ │
│ │ SchemaEvolutionTestBase │ │
│ │ - TestAgainstPreviousSchemaAsync() │ │
│ │ - TestReadBackwardCompatibilityAsync() │ │
│ │ - TestWriteForwardCompatibilityAsync() │ │
│ └───────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────┐ │
│ │ BranchCoverageEnforcer │ │
│ │ - Validate() → dead paths │ │
│ │ - GenerateDeadPathReport() │ │
│ │ - Exemption mechanism │ │
│ └───────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────┐ │
│ │ ConfigDiffTestBase │ │
│ │ - TestConfigBehavioralDeltaAsync() │ │
│ │ - TestConfigIsolationAsync() │ │
│ └───────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
```
## Library Structure
```
src/__Tests/__Libraries/
├── StellaOps.Testing.Temporal/
│ ├── SimulatedTimeProvider.cs
│ ├── LeapSecondTimeProvider.cs
│ ├── TtlBoundaryTimeProvider.cs
│ ├── IdempotencyVerifier.cs
│ └── ClockSkewAssertions.cs
├── StellaOps.Testing.Replay/
│ ├── ReplayIntegrationTestBase.cs
│ └── IReplayOrchestrator.cs
├── StellaOps.Testing.Evidence/
│ ├── ITestEvidenceService.cs
│ ├── TestEvidenceService.cs
│ └── XunitEvidenceReporter.cs
├── StellaOps.Testing.Chaos/
│ ├── FailureChoreographer.cs
│ ├── ConvergenceTracker.cs
│ ├── Injectors/
│ │ ├── IFailureInjector.cs
│ │ ├── DatabaseFailureInjector.cs
│ │ ├── HttpClientFailureInjector.cs
│ │ └── CacheFailureInjector.cs
│ └── Probes/
│ ├── IStateProbe.cs
│ ├── DatabaseStateProbe.cs
│ └── MetricsStateProbe.cs
├── StellaOps.Testing.Policy/
│ ├── PolicyDiffEngine.cs
│ ├── PolicyRegressionTestBase.cs
│ └── PolicyVersionControl.cs
├── StellaOps.Testing.Explainability/
│ └── ExplainabilityAssertions.cs
├── StellaOps.Testing.SchemaEvolution/
│ └── SchemaEvolutionTestBase.cs
├── StellaOps.Testing.Coverage/
│ └── BranchCoverageEnforcer.cs
└── StellaOps.Testing.ConfigDiff/
└── ConfigDiffTestBase.cs
```
## CI/CD Integration
### Pipeline Structure
```
┌─────────────────────────────────────────────────────────────┐
│ CI/CD Pipelines │
├─────────────────────────────────────────────────────────────┤
│ │
│ PR-Gating: │
│ ├── test-blast-radius.yml (validate annotations) │
│ ├── policy-diff.yml (policy change validation) │
│ ├── dead-path-detection.yml (coverage enforcement) │
│ └── test-evidence.yml (evidence capture) │
│ │
│ Scheduled: │
│ ├── schema-evolution.yml (backward compat tests) │
│ ├── chaos-choreography.yml (failure choreography) │
│ └── trace-replay.yml (production trace replay) │
│ │
│ On-Demand: │
│ └── rollback-lag.yml (rollback timing measurement) │
│ │
└─────────────────────────────────────────────────────────────┘
```
### Workflow Triggers
| Workflow | Trigger | Purpose |
|----------|---------|---------|
| test-blast-radius | PR (test files) | Validate annotations |
| policy-diff | PR (policy files) | Validate policy changes |
| dead-path-detection | Push/PR | Prevent untested code |
| test-evidence | Push (main) | Store test evidence |
| schema-evolution | Daily | Backward compatibility |
| chaos-choreography | Weekly | Cascading failure tests |
| trace-replay | Weekly | Production trace validation |
| rollback-lag | Manual | Measure rollback timing |
## Implementation Roadmap
### Sprint Schedule
| Sprint | Focus | Duration | Key Deliverables |
|--------|-------|----------|------------------|
| 002_001 | Time-Skew & Idempotency | 3 weeks | Temporal libraries, module tests |
| 002_002 | Trace Replay & Evidence | 3 weeks | Anonymization, evidence linking |
| 002_003 | Failure Choreography | 3 weeks | Choreographer, cascade tests |
| 002_004 | Policy & Explainability | 3 weeks | Explanation schema, diff testing |
| 002_005 | Cross-Cutting Standards | 3 weeks | Annotations, CI enforcement |
### Dependencies
```
002_001 (Temporal) ────┐
002_002 (Replay) ──────┼──→ 002_003 (Choreography) ──→ 002_005 (Cross-Cutting)
│ ↑
002_004 (Policy) ──────┘────────────────────────────────────┘
```
## Success Metrics
| Metric | Baseline | Target | Sprint |
|--------|----------|--------|--------|
| Temporal edge case coverage | ~5% | 80%+ | 002_001 |
| Idempotency test coverage | ~10% | 90%+ | 002_001 |
| Replay test coverage | 0% | 50%+ | 002_002 |
| Test evidence capture | 0% | 100% | 002_002 |
| Choreographed failure scenarios | 0 | 15+ | 002_003 |
| Decisions with explanations | 0% | 100% | 002_004 |
| Policy changes with diff tests | 0% | 100% | 002_004 |
| Tests with blast-radius | ~10% | 100% | 002_005 |
| Dead paths (non-exempt) | Unknown | <50 | 002_005 |
## References
- **Sprint Files:**
- `docs/implplan/SPRINT_20260105_002_001_TEST_time_skew_idempotency.md`
- `docs/implplan/SPRINT_20260105_002_002_TEST_trace_replay_evidence.md`
- `docs/implplan/SPRINT_20260105_002_003_TEST_failure_choreography.md`
- `docs/implplan/SPRINT_20260105_002_004_TEST_policy_explainability.md`
- `docs/implplan/SPRINT_20260105_002_005_TEST_cross_cutting.md`
- **Advisory:** `docs/product-advisories/05-Dec-2026 - New Testing Enhancements for Stella Ops.md`
- **Test Infrastructure:** `src/__Tests/AGENTS.md`

View File

@@ -0,0 +1,56 @@
# Timeline Indexer
> Timeline event indexing and query service.
## Purpose
TimelineIndexer provides fast, indexed access to timeline events across all StellaOps services. It enables efficient querying of vulnerability history, scan timelines, and policy evaluation trails.
## Quick Links
- [Architecture](./architecture.md) - Technical design and implementation details
- [Guides](./guides/) - Query and configuration guides
## Status
| Attribute | Value |
|-----------|-------|
| **Maturity** | Production |
| **Last Reviewed** | 2025-12-29 |
| **Maintainer** | Platform Guild |
## Key Features
- **Event Indexing**: Index events from multiple StellaOps services
- **Time-Range Queries**: Efficient time-series queries with filtering
- **Event Stream Integration**: Consume from NATS/Valkey event streams
- **PostgreSQL Storage**: Time-series indexes for fast retrieval
## Dependencies
### Upstream (this module depends on)
- **PostgreSQL** - Event storage with time-series indexes
- **NATS/Valkey** - Event stream consumption
- **Authority** - Authentication
### Downstream (modules that depend on this)
- **Web Console** - Timeline visualization
- **CLI** - Timeline queries
- **ExportCenter** - Timeline data exports
## Configuration
```yaml
timeline_indexer:
event_sources:
- nats://events.stellaops.local
retention_days: 365
```
## Notes
TimelineIndexer indexes events; it does not generate them. Events are received from event streams published by other services.
## Related Documentation
- [Telemetry Architecture](../telemetry/architecture.md)

View File

@@ -1,128 +1,128 @@
# Zastava Runtime Operations Runbook
This runbook covers the runtime plane (Observer DaemonSet + Admission Webhook).
It aligns with `Sprint 12 Runtime Guardrails` and assumes components consume
`StellaOps.Zastava.Core` (`AddZastavaRuntimeCore(...)`).
## 1. Prerequisites
- **Authority client credentials** service principal `zastava-runtime` with scopes
`aud:scanner` and `api:scanner.runtime.write`. Provision DPoP keys and mTLS client
certs before rollout.
- **Scanner/WebService reachability** cluster DNS entry (e.g. `scanner.internal`)
resolvable from every node running Observer/Webhook.
- **Host mounts** read-only access to `/proc`, container runtime state
(`/var/lib/containerd`, `/var/run/containerd/containerd.sock`) and scratch space
(`/var/run/zastava`).
- **Offline kit bundle** operators staging air-gapped installs must download
`offline-kit/zastava-runtime-{version}.tar.zst` containing container images,
Grafana dashboards, and Prometheus rules referenced below.
- **Secrets** Authority OpTok cache dir, DPoP private keys, and webhook TLS secrets
live outside git. For air-gapped installs copy them to the sealed secrets vault.
### 1.1 Telemetry quick reference
| Metric | Description | Notes |
|--------|-------------|-------|
| `zastava.runtime.events.total{tenant,component,kind}` | Rate of observer events sent to Scanner | Expect >0 on busy nodes. |
| `zastava.runtime.backend.latency.ms` | Histogram (ms) for `/runtime/events` and `/policy/runtime` calls | P95 & P99 drive alerting. |
| `zastava.admission.decisions.total{decision}` | Admission verdict counts | Track deny spikes or fail-open fallbacks. |
| `zastava.admission.cache.hits.total` | (future) Cache utilisation once Observer batches land | Placeholder until Observer tasks 12-004 complete. |
## 2. Deployment workflows
### 2.1 Fresh install (Helm overlay)
1. Load offline kit bundle: `oras cp offline-kit/zastava-runtime-*.tar.zst oci:registry.internal/zastava`.
2. Render values:
- `zastava.runtime.tenant`, `environment`, `deployment` (cluster identifier).
- `zastava.runtime.authority` block (issuer, clientId, audience, DPoP toggle).
- `zastava.runtime.metrics.commonTags.cluster` for Prometheus labels.
3. Pre-create secrets:
- `zastava-authority-dpop` (JWK + private key).
- `zastava-authority-mtls` (client cert/key chain).
- `zastava-webhook-tls` (serving cert; CSR bundle if using auto-approval).
4. Deploy Observer DaemonSet and Webhook chart:
```sh
helm upgrade --install zastava-runtime deploy/helm/zastava \
-f values/zastava-runtime.yaml \
--namespace stellaops \
--create-namespace
```
5. Verify:
- `kubectl -n stellaops get pods -l app=zastava-observer` ready.
- `kubectl -n stellaops logs ds/zastava-observer --tail=20` shows
`Issued runtime OpTok` audit line with DPoP token type.
- Admission webhook registered: `kubectl get validatingwebhookconfiguration zastava-webhook`.
### 2.2 Upgrades
1. Scale webhook deployment to `--replicas=3` (rolling).
2. Drain one node per AZ to ensure Observer tolerates disruption.
3. Apply chart upgrade; watch `zastava.runtime.backend.latency.ms` P95 (<250 ms).
4. Post-upgrade, run smoke tests:
- Apply unsigned Pod manifest → expect `deny` (policy fail).
- Apply signed Pod manifest → expect `allow`.
5. Record upgrade in ops log with Git SHA + Helm chart version.
### 2.3 Rollback
1. Use Helm revision history: `helm history zastava-runtime`.
2. Rollback: `helm rollback zastava-runtime <revision>`.
3. Invalidate cached OpToks:
```sh
kubectl -n stellaops exec deploy/zastava-webhook -- \
zastava-webhook invalidate-op-token --audience scanner
```
4. Confirm observers reconnect via metrics (`rate(zastava_runtime_events_total[5m])`).
## 3. Authority & security guardrails
- Tokens must be `DPoP` type when `requireDpop=true`. Logs emit
`authority.token.issue` scope with decision data; absence indicates misconfig.
- `requireMutualTls=true` enforces mTLS during token acquisition. Disable only in
lab clusters; expect warning log `Mutual TLS requirement disabled`.
- Static fallback tokens (`allowStaticTokenFallback=true`) should exist only during
initial bootstrap. Rotate nightly; preference is to disable once Authority reachable.
- Audit every change in `zastava.runtime.authority` through change management.
Use `kubectl get secret zastava-authority-dpop -o jsonpath='{.metadata.annotations.revision}'`
to confirm key rotation.
## 4. Incident response
### 4.1 Authority offline
1. Check Prometheus alert `ZastavaAuthorityTokenStale`.
2. Inspect Observer logs for `authority.token.fallback` scope.
3. If fallback engaged, verify static token validity duration; rotate secret if older than 24 h.
4. Once Authority restored, delete static fallback secret and restart pods to rebind DPoP keys.
### 4.2 Scanner/WebService latency spike
1. Alert `ZastavaRuntimeBackendLatencyHigh` fires at P95 > 750 ms for 5 minutes.
2. Run backend health: `kubectl -n scanner exec deploy/scanner-web -- curl -f localhost:8080/healthz/ready`.
3. If backend degraded, auto buffer may throttle. Confirm disk-backed queue size via
`kubectl logs ds/zastava-observer | grep buffer.drops`.
4. Consider enabling fail-open for namespaces listed in runbook Appendix B (temporary).
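If step 4 applies, the temporary fail-open can be rolled out as a values override. `failOpenNamespaces` is the key referenced in §4.3; its exact chart path is an assumption to confirm, and the change should be reverted once the backend recovers:

```bash
helm -n stellaops upgrade zastava-runtime deploy/helm/zastava \
  -f values/zastava-runtime.yaml \
  --set 'zastava.webhook.failOpenNamespaces={payments,checkout}'   # example namespaces
```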
### 4.3 Admission deny storm
1. Alert `ZastavaAdmissionDenySpike` indicates >20 denies/minute.
2. Pull sample: `kubectl logs deploy/zastava-webhook --since=10m | jq '.decision'`.
3. Cross-check the policy backlog in Scanner (`/policy/runtime` logs). Engage the application
   owner; optionally add the namespace to `failOpenNamespaces` after a risk assessment.
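To see where denies concentrate before engaging owners, extend the step 2 sample into an aggregation. This assumes the webhook log lines are JSON objects carrying `decision` and `namespace` fields; confirm against a live sample. `fromjson?` skips any non-JSON lines:

```bash
kubectl -n stellaops logs deploy/zastava-webhook --since=10m \
  | jq -rR 'fromjson? | select(.decision == "deny") | .namespace' \
  | sort | uniq -c | sort -rn | head
```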
## 5. Offline kit & air-gapped notes
- Bundle contents:
- Observer/Webhook container images (multi-arch).
- `docs/modules/zastava/operations/runtime-prometheus-rules.yaml` + Grafana dashboard JSON.
- Sample `zastava-runtime.values.yaml`.
- Verification:
- Validate signature: `cosign verify-blob offline-kit/zastava-runtime-*.tar.zst --certificate offline-kit/zastava-runtime.cert`.
- Extract Prometheus rules into offline monitoring cluster (`/etc/prometheus/rules.d`).
- Import Grafana dashboard via `grafana-cli --config ...`.
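A sketch of the staging flow after signature verification; the bundle's internal layout (`prometheus/`, `grafana/`) is a placeholder to adjust after inspecting the archive:

```bash
mkdir -p /tmp/zastava-kit
tar --zstd -xf offline-kit/zastava-runtime-*.tar.zst -C /tmp/zastava-kit   # GNU tar 1.31+

# Stage and validate Prometheus rules, then reload.
cp /tmp/zastava-kit/prometheus/runtime-prometheus-rules.yaml /etc/prometheus/rules.d/
promtool check rules /etc/prometheus/rules.d/runtime-prometheus-rules.yaml
curl -X POST http://localhost:9090/-/reload   # requires --web.enable-lifecycle
```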
## 6. Observability assets
- Prometheus alert rules: `docs/modules/zastava/operations/runtime-prometheus-rules.yaml`.
### 6.1 Runtime surface manifests
- Evidence: Observer appends `runtime.surface.manifest{resolved|not_found|fetch_error}` plus `runtime.surface.manifestUri`/`manifestDigest` and up to five artifact metadata keys per manifest; view via drift diagnostics or runtime posture evidence.
- Checklist: ensure `Surface:Manifest:RootDirectory` points to the Scanner cache mount, tenant matches `ZASTAVA_SURFACE_TENANT`, and `cas://` URIs from drift/entrytrace events exist on disk (`<root>/manifests/<hh>/<tt>/<digest>.json`).
- Offline: if missing, sync the manifests directory from the Offline Kit bundle into the Observer node cache and rerun the drift check. Avoid network fetches.
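A sketch of the on-disk check, assuming `<hh>`/`<tt>` are the first two hex-pairs of the manifest digest and that the stored file name drops the `sha256:` prefix; both are assumptions to verify against Observer logs:

```bash
# usage: check-surface-manifest.sh <digest>
root=/var/lib/scanner/surface                  # Surface:Manifest:RootDirectory (placeholder)
hex="${1#sha256:}"                             # accept the digest with or without its prefix
path="$root/manifests/${hex:0:2}/${hex:2:2}/${hex}.json"

if [ -f "$path" ]; then
  echo "manifest present: $path"
else
  echo "missing: $path; sync manifests/ from the Offline Kit bundle" >&2
fi
```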
## 7. Build-id correlation & symbol retrieval
Runtime events emitted by Observer now include `process.buildId` (from the ELF
`NT_GNU_BUILD_ID` note) and Scanner `/policy/runtime` surfaces the most recent
`buildIds` list per digest. Operators can use these hashes to locate debug
artifacts during incident response (a consolidated helper sketch follows the numbered steps):
1. Capture the hash from CLI/webhook/Scanner API—for example:
```bash
stellaops-cli runtime policy test --image <digest> --namespace <ns>
```
Copy one of the `Build IDs` (e.g.
`5f0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789`).
2. Derive the debug path (`<aa>/<rest>` under `.build-id`) and check it exists:
```bash
ls /var/opt/debug/.build-id/5f/0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789.debug
```
3. If the file is missing, rehydrate it from Offline Kit bundles or the
`debug-store` object bucket (mirror of release artefacts):
```bash
oras cp oci://registry.internal/debug-store:latest . --include \
"5f/0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789.debug"
```
4. Confirm the running process advertises the same GNU build-id before
symbolising:
```bash
readelf -n /proc/$(pgrep -f payments-api | head -n1)/exe | grep -i 'Build ID'
```
5. Attach the `.debug` file in `gdb`/`lldb`, feed it to `eu-unstrip`, or cache it
in `debuginfod` for fleet-wide symbol resolution:
```bash
debuginfod-find debuginfo 5f0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789 >/tmp/payments-api.debug
```
6. For musl-based images, expect shorter build-id footprints. Missing hashes in
runtime events indicate stripped binaries without the GNU note—schedule a
rebuild with `-Wl,--build-id` enabled or add the binary to the debug-store
allowlist so the scanner can surface a fallback symbol package.
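The derivation and fetch in steps 2-5 can be consolidated into one helper. A sketch under the same assumptions as above (debug root `/var/opt/debug`, `debug-store` mirror, `oras` invocation as in step 3):

```bash
#!/usr/bin/env bash
# fetch-debug.sh <build-id>: locate or rehydrate a .debug file by GNU build-id.
set -euo pipefail

bid="$1"
root=/var/opt/debug/.build-id
rel="${bid:0:2}/${bid:2}.debug"

if [ ! -f "$root/$rel" ]; then
  # Step 3: rehydrate from the offline debug-store mirror.
  tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT
  (cd "$tmp" && oras cp oci://registry.internal/debug-store:latest . --include "$rel")
  install -D "$tmp/$rel" "$root/$rel"
fi

echo "debug artifact: $root/$rel"
```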
Monitor `scanner.policy.runtime` responses for the `buildIds` field; absence of
data after ZASTAVA-OBS-17-005 implies containers launched before the Observer
upgrade or non-ELF entrypoints (static scripts). Re-run the workload or restart
Observer to trigger a fresh capture if symbol parity is required.