# Assistant Ops Runbook (DOCS-AIAI-31-009)

_Updated: 2025-11-24 · Owners: DevOps Guild · Advisory AI Guild · Sprint 0111_

This runbook covers day-2 operations for Advisory AI (web + worker), with emphasis on cache priming, guardrail verification, and outage handling in offline/air-gapped installs.

## 1) Warmup & cache priming

- Ensure Offline Kit fixtures are staged:
  - CLI guardrail bundles: `out/console/guardrails/cli-vuln-29-001/`, `out/console/guardrails/cli-vex-30-001/`.
  - SBOM context fixtures: copy into `data/advisory-ai/fixtures/sbom/` and record hashes in `SHA256SUMS`.
  - Profiles/prompts manifests: ensure `profiles.catalog.json` and `prompts.manifest` hashes match `AdvisoryAI:Provenance` settings.
- Start services and prime caches using cache-only calls:
  - `stella advise run summary --advisory-key --timeout 0 --json` (should return cached/empty context, exit 0).
  - `stella advise run remediation --advisory-key --artifact-id --timeout 0 --json` (verifies SBOM clamps without executing inference).

## 2) Guardrail & provenance verification

- Run the guardrail self-test: `dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj --filter Guardrail` (offline-safe).
- Validate DSSE bundles:
  - `slsa-verifier verify-attestation --bundle offline-kit/advisory-ai/provenance/prompts.manifest.dsse --source prompts.manifest`
  - `slsa-verifier verify-attestation --bundle offline-kit/advisory-ai/provenance/policy-bundle.intoto.jsonl --digest `
- Confirm the `AdvisoryAI:Guardrails:BlockedPhrases` file matches the hash captured during pack build; diff against `prompts.manifest`.

## 3) Scaling & queue health

- Defaults: queue capacity 1024, dequeue wait 1s (see `docs/policy/assistant-parameters.md`). For bursty tenants, scale workers horizontally before increasing queue size to preserve determinism.
- Metrics to watch: `advisory_ai_queue_depth`, `advisory_ai_latency_seconds`, `advisory_ai_guardrail_blocks_total`.
- If queue depth stays above 75% of capacity for 5 minutes, add one worker pod or increase `Queue:Capacity` by 25% (record the change in the ops log).

## 4) Outage handling

- **SBOM service down**: switch to `NullSbomContextClient` by unsetting `ADVISORYAI__SBOM__BASEADDRESS`; Advisory AI returns deterministic responses with `sbomSummary` counts at 0.
- **Policy Engine unavailable**: pin the last-known `policyVersion`; set `AdvisoryAI:Guardrails:RequireCitations=true` to avoid drift; raise `advisory.remediation.policyHold` in responses.
- **Remote profile disabled**: keep `profile=cloud-openai` blocked; return `advisory.inference.remoteDisabled` with exit code 12 in the CLI (see `docs/advisory-ai/cli.md`).

## 5) Air-gap / offline posture

- All external calls are disabled by default. To re-enable remote inference, set `ADVISORYAI__INFERENCE__MODE=Remote` and provide an allowlisted `Remote.BaseAddress`; record the consent in Authority and in the ops log.
- Mirror the guardrail artefact folders and `hashes.sha256` into the Offline Kit; re-run the guardrail self-test after mirroring.

## 6) Checklist before declaring healthy

- [ ] Guardrail self-test suite green.
- [ ] Cache-only CLI probes return 0 with the correct `context.planCacheKey`.
- [ ] DSSE verifications logged for prompts, profiles, and the policy bundle.
- [ ] Metrics scrape shows queue depth < 75% and latency within SLO.
- [ ] Ops log updated with any config overrides (queue size, clamps, remote inference toggles).
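The section-3 scaling rule (queue depth above 75% of capacity for 5 minutes → add a worker pod or grow `Queue:Capacity` by 25%) can be sketched as a small shell probe. This is a minimal illustration, not shipped tooling: the metric read is stubbed with a sample value, whereas in practice `QUEUE_DEPTH` would come from the `advisory_ai_queue_depth` scrape.

```shell
#!/usr/bin/env sh
# Sketch of the section-3 scaling rule. QUEUE_DEPTH is an illustrative
# sample; read it from the advisory_ai_queue_depth metric in practice.
set -eu

CAPACITY=1024          # default Queue:Capacity from section 3
QUEUE_DEPTH=800        # sample value for illustration

THRESHOLD=$(( CAPACITY * 75 / 100 ))          # 75% watermark
if [ "$QUEUE_DEPTH" -gt "$THRESHOLD" ]; then
  # 25% capacity bump to record in the ops log if horizontal scaling
  # is not an option.
  NEW_CAPACITY=$(( CAPACITY + CAPACITY * 25 / 100 ))
  echo "queue depth ${QUEUE_DEPTH} > ${THRESHOLD}: scale workers first, or raise Queue:Capacity to ${NEW_CAPACITY}"
fi
```

With the sample values this flags depth 800 against a watermark of 768 and suggests a capacity of 1280; integer arithmetic keeps the probe deterministic, matching the runbook's preference for recorded, repeatable changes.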