work
This commit is contained in:
42
docs/runbooks/assistant-ops.md
Normal file
42
docs/runbooks/assistant-ops.md
Normal file
@@ -0,0 +1,42 @@
|
||||
# Assistant Ops Runbook (DOCS-AIAI-31-009)
|
||||
|
||||
_Updated: 2025-11-24 · Owners: DevOps Guild · Advisory AI Guild · Sprint 0111_
|
||||
|
||||
This runbook covers day-2 operations for Advisory AI (web + worker) with emphasis on cache priming, guardrail verification, and outage handling in offline/air-gapped installs.
|
||||
|
||||
## 1) Warmup & cache priming
|
||||
- Ensure Offline Kit fixtures are staged:
|
||||
- CLI guardrail bundles: `out/console/guardrails/cli-vuln-29-001/`, `out/console/guardrails/cli-vex-30-001/`.
|
||||
- SBOM context fixtures: copy into `data/advisory-ai/fixtures/sbom/` and record hashes in `SHA256SUMS`.
|
||||
- Profiles/prompts manifests: ensure `profiles.catalog.json` and `prompts.manifest` hashes match `AdvisoryAI:Provenance` settings.
|
||||
- Start services and prime caches using cache-only calls:
|
||||
- `stella advise run summary --advisory-key <id> --timeout 0 --json` (should return cached/empty context, exit 0).
|
||||
- `stella advise run remediation --advisory-key <id> --artifact-id <id> --timeout 0 --json` (verifies SBOM clamps without executing inference).
|
||||
|
||||
## 2) Guardrail & provenance verification
|
||||
- Run guardrail self-test: `dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj --filter Guardrail` (offline-safe).
|
||||
- Validate DSSE bundles:
|
||||
- `slsa-verifier verify-attestation --bundle offline-kit/advisory-ai/provenance/prompts.manifest.dsse --source prompts.manifest`
|
||||
- `slsa-verifier verify-attestation --bundle offline-kit/advisory-ai/provenance/policy-bundle.intoto.jsonl --digest <policy-digest>`
|
||||
- Confirm `AdvisoryAI:Guardrails:BlockedPhrases` file matches the hash captured during pack build; diff against `prompts.manifest`.
|
||||
|
||||
## 3) Scaling & queue health
|
||||
- Defaults: queue capacity 1024, dequeue wait 1s (see `docs/policy/assistant-parameters.md`). For bursty tenants, scale workers horizontally before increasing queue size to preserve determinism.
|
||||
- Metrics to watch: `advisory_ai_queue_depth`, `advisory_ai_latency_seconds`, `advisory_ai_guardrail_blocks_total`.
|
||||
- If queue depth > 75% for 5 minutes, add one worker pod or increase `Queue:Capacity` by 25% (record change in ops log).
|
||||
|
||||
## 4) Outage handling
|
||||
- **SBOM service down**: switch to `NullSbomContextClient` by unsetting `ADVISORYAI__SBOM__BASEADDRESS`; Advisory AI returns deterministic responses with `sbomSummary` counts at 0.
|
||||
- **Policy Engine unavailable**: pin last-known `policyVersion`; set `AdvisoryAI:Guardrails:RequireCitations=true` to avoid drift; raise `advisory.remediation.policyHold` in responses.
|
||||
- **Remote profile disabled**: keep `profile=cloud-openai` blocked; return `advisory.inference.remoteDisabled` with exit code 12 in CLI (see `docs/advisory-ai/cli.md`).
|
||||
|
||||
## 5) Air-gap / offline posture
|
||||
- All external calls are disabled by default. To re-enable remote inference, set `ADVISORYAI__INFERENCE__MODE=Remote` and provide an allowlisted `Remote.BaseAddress`; record the consent in Authority and in the ops log.
|
||||
- Mirror the guardrail artefact folders and `hashes.sha256` into the Offline Kit; re-run the guardrail self-test after mirroring.
|
||||
|
||||
## 6) Checklist before declaring healthy
|
||||
- [ ] Guardrail self-test suite green.
|
||||
- [ ] Cache-only CLI probes return 0 with correct `context.planCacheKey`.
|
||||
- [ ] DSSE verifications logged for prompts, profiles, policy bundle.
|
||||
- [ ] Metrics scrape shows queue depth < 75% and latency within SLO.
|
||||
- [ ] Ops log updated with any config overrides (queue size, clamps, remote inference toggles).
|
||||
44
docs/runbooks/concelier-airgap-bundle-deploy.md
Normal file
44
docs/runbooks/concelier-airgap-bundle-deploy.md
Normal file
@@ -0,0 +1,44 @@
|
||||
# Concelier Air-Gap Bundle Deploy Runbook (CONCELIER-AIRGAP-56-003)
|
||||
|
||||
Status: draft · 2025-11-24
|
||||
Scope: deploy sealed-mode Concelier evidence bundles using deterministic NDJSON + manifest/entry-trace outputs.
|
||||
|
||||
## Inputs
|
||||
- Bundle: `concelier-airgap.ndjson`
|
||||
- Manifest: `bundle.manifest.json`
|
||||
- Entry trace: `bundle.entry-trace.json`
|
||||
- Hashes: SHA256 recorded in manifest and entry-trace; verify before import.
|
||||
|
||||
## Preconditions
|
||||
- Concelier WebService running with `concelier:features:airgap` enabled.
|
||||
- No external egress; only local file system allowed for bundle path.
|
||||
- Mongo indexes applied (`advisory_observations`, `advisory_linksets`).
|
||||
|
||||
## Steps
|
||||
1) Transfer bundle directory to offline controller host.
|
||||
2) Verify hashes:
|
||||
```bash
|
||||
sha256sum concelier-airgap.ndjson | diff - <(jq -r .bundleSha256 bundle.manifest.json)
|
||||
jq -r '.[].sha256' bundle.entry-trace.json | nl | sed 's/\t/:/' > entry.hashes
|
||||
paste -d' ' <(cut -d: -f1 entry.hashes) <(cut -d: -f2 entry.hashes)
|
||||
```
|
||||
3) Import:
|
||||
```bash
|
||||
curl -sSf -X POST \
|
||||
-H 'Content-Type: application/x-ndjson' \
|
||||
--data-binary @concelier-airgap.ndjson \
|
||||
http://localhost:5000/internal/airgap/import
|
||||
```
|
||||
4) Validate import:
|
||||
```bash
|
||||
curl -sSf http://localhost:5000/internal/airgap/status | jq
|
||||
```
|
||||
5) Record evidence:
|
||||
- Store manifest + entry-trace alongside TRX/logs in `artifacts/airgap/<date>/`.
|
||||
|
||||
## Determinism notes
|
||||
- NDJSON ordering is lexicographic; do not re-sort downstream.
|
||||
- Entry-trace hashes must match post-transfer; any mismatch aborts import.
|
||||
|
||||
## Rollback
|
||||
- Delete imported batch by `bundleId` from `advisory_observations` and `advisory_linksets` (requires DBA approval); rerun import after fixing hash.
|
||||
Reference in New Issue
Block a user