Advisory AI Guardrails & Redaction Policy

Audience: Advisory AI guild, Security guild, Docs guild, operators consuming Advisory AI outputs. Scope: Prompt redaction rules, injection defenses, telemetry/alert wiring, and audit guidance for Advisory AI (Epic 8).

Advisory AI accepts structured evidence from Concelier/Excititor and assembles prompts before executing downstream inference. Guardrails enforce provenance, block injection attempts, and redact sensitive content prior to handing data to any inference provider (online or offline). This document enumerates the guardrail surface and how to observe, alert, and audit it.


1 · Input validation & injection defense

Advisory prompts are rejected when any of the following checks fail:

  1. Citation coverage: every prompt must carry at least one citation with an index, document id, and chunk id. Missing or malformed citations raise the citation_missing / citation_invalid violations.
  2. Prompt length: AdvisoryGuardrailOptions.MaxPromptLength defaults to 16,000 characters. Longer payloads raise prompt_too_long.
  3. Blocked phrases: the guardrail pipeline lowercases the prompt and scans it against the blocked-phrase cache (ignore previous instructions, disregard earlier instructions, you are now the system, override the system prompt, please jailbreak). Each hit raises prompt_injection and increments the blocked_phrase_count metadata.
  4. Optional per-profile rules: additional blocked phrases supplied via configuration are appended to the cache at startup and evaluated with the same logic.

Any validation failure stops the pipeline before inference and emits guardrail_blocked = true in the persisted output as well as the corresponding metric counter.
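
The checks above amount to a small, deterministic decision flow. The Python sketch below is for illustration only: the constants mirror the defaults listed above, while the names (validate_prompt, Citation) are hypothetical and do not correspond to the actual .NET guardrail implementation.

```python
from dataclasses import dataclass

# Defaults mirrored from the rules above; the real pipeline reads them from
# AdvisoryGuardrailOptions and per-profile configuration.
MAX_PROMPT_LENGTH = 16_000
BLOCKED_PHRASES = (
    "ignore previous instructions",
    "disregard earlier instructions",
    "you are now the system",
    "override the system prompt",
    "please jailbreak",
)

@dataclass
class Citation:
    index: int
    document_id: str
    chunk_id: str

def validate_prompt(prompt: str, citations: list[Citation]) -> list[str]:
    """Return guardrail violation codes; an empty list means the prompt may proceed to inference."""
    violations: list[str] = []
    if not citations:
        violations.append("citation_missing")
    elif any(not c.document_id or not c.chunk_id for c in citations):
        violations.append("citation_invalid")
    if len(prompt) > MAX_PROMPT_LENGTH:
        violations.append("prompt_too_long")
    lowered = prompt.lower()
    blocked_phrase_count = sum(phrase in lowered for phrase in BLOCKED_PHRASES)
    # Each blocked-phrase hit raises prompt_injection; the total is also
    # surfaced as blocked_phrase_count metadata.
    violations.extend(["prompt_injection"] * blocked_phrase_count)
    return violations
```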

2 · Redaction rules

Redactions are deterministic so caches remain stable. The current rule set (in order) is:

| Rule | Regex | Replacement |
| --- | --- | --- |
| AWS secret access keys | `(?i)(aws_secret_access_key\s*[:=]\s*)([A-Za-z0-9/+=]{40,})` | `$1[REDACTED_AWS_SECRET]` |
| Credentials/tokens | `(?i)(token\|apikey…` | |
| PEM private keys | `(?is)-----BEGIN [^-]+ PRIVATE KEY-----.*?-----END [^-]+ PRIVATE KEY-----` | `[REDACTED_PRIVATE_KEY]` |

Redaction counts are surfaced via guardrailResult.Metadata["redaction_count"] and emitted as log fields to simplify threat hunting.
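
To make the ordering concrete, here is a minimal Python sketch of the deterministic redaction pass. The first and last patterns are copied from the table above; the truncated credentials/token rule is omitted, \g<1> stands in for the .NET-style $1 back-reference, and the helper name redact is hypothetical.

```python
import re

# Ordered redaction rules from the table above (credentials/token rule omitted
# because its full pattern is not reproduced in this document).
REDACTION_RULES = [
    (re.compile(r"(?i)(aws_secret_access_key\s*[:=]\s*)([A-Za-z0-9/+=]{40,})"),
     r"\g<1>[REDACTED_AWS_SECRET]"),
    (re.compile(r"(?is)-----BEGIN [^-]+ PRIVATE KEY-----.*?-----END [^-]+ PRIVATE KEY-----"),
     "[REDACTED_PRIVATE_KEY]"),
]

def redact(prompt: str) -> tuple[str, int]:
    """Apply the rules in order; the second value is the redaction_count metadata."""
    redaction_count = 0
    for pattern, replacement in REDACTION_RULES:
        prompt, n = pattern.subn(replacement, prompt)
        redaction_count += n
    return prompt, redaction_count
```

Because the rules run in a fixed order with fixed replacements, the same input always yields the same redacted prompt, which is what keeps cache keys stable.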

3 · Telemetry, logs, and traces

Advisory AI now exposes the following metrics (all tagged with task_type and, where applicable, cache/citation metadata):

| Metric | Type | Description |
| --- | --- | --- |
| advisory_ai_latency_seconds | Histogram | End-to-end worker latency from dequeue through persisted output. Aggregated with plan_cache_hit to compare cached vs. regenerated plans. |
| advisory_ai_guardrail_blocks_total | Counter | Number of guardrail rejections per task. |
| advisory_ai_validation_failures_total | Counter | Total validation violations emitted by the guardrail pipeline (one increment per violation instance). |
| advisory_ai_citation_coverage_ratio | Histogram | Ratio of unique citations to structured chunks (0–1); see the sketch below. Tags include citations and structured_chunks. |
| advisory_plans_created/queued/processed | Counters | Existing plan lifecycle metrics (unchanged, but now tagged by task type). |
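
The coverage ratio in the table is the number of unique citations divided by the number of structured chunks. A small sketch, assuming the value is clamped to [0, 1] (the helper name is hypothetical):

```python
def citation_coverage_ratio(unique_citations: int, structured_chunks: int) -> float:
    """Ratio of unique citations to structured chunks, clamped to [0, 1];
    recorded on advisory_ai_citation_coverage_ratio with citations and
    structured_chunks tags."""
    if structured_chunks <= 0:
        return 0.0
    return min(1.0, unique_citations / structured_chunks)
```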

Logging

  • Successful writes: the Stored advisory pipeline output {CacheKey} log line now includes guardrail_blocked, validation_failures, and citation_coverage.
  • Guardrail rejections: the warning log includes the violation count and advisory key.
  • All dequeued jobs emit info logs carrying cache:{Cache} for quicker diagnosis.

Tracing

  • WebService (/v1/advisory-ai/pipeline*) emits advisory_ai.plan_request / plan_batch spans with tags for tenant, advisory key, cache key, and validation state.
  • Worker emits advisory_ai.process spans for each queue item with latency measurement and cache-hit tags (see the sketch below).
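
For teams adding comparable spans elsewhere, the sketch below uses the OpenTelemetry Python API purely as an illustration; the production worker is .NET, and the attribute keys are assumptions rather than the worker's exact tag names.

```python
from opentelemetry import trace

tracer = trace.get_tracer("advisory_ai.worker")

def process_queue_item(item: dict) -> None:
    # One advisory_ai.process span per dequeued item; the span duration
    # captures the end-to-end latency measurement.
    with tracer.start_as_current_span("advisory_ai.process") as span:
        span.set_attribute("tenant", item["tenant"])
        span.set_attribute("advisory_key", item["advisory_key"])
        span.set_attribute("plan_cache_hit", item.get("plan_cache_hit", False))
        # ... run the pipeline for the queue item here.
```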

4 · Dashboards & alerts

Update the “Advisory AI” Grafana board with the new metrics:

  1. Latency panel: plot advisory_ai_latency_seconds p50/p95 split by plan_cache_hit. Alert when p95 > 30s for 5 minutes.
  2. Guardrail burn rate: track advisory_ai_guardrail_blocks_total against advisory_ai_validation_failures_total. Alert when either exceeds 5 blocks/min or 1% of total traffic.
  3. Citation coverage: heatmap of advisory_ai_citation_coverage_ratio to identify evidence gaps (alert when the ratio stays below 0.6 for more than 10 minutes).

All alerts should route to #advisory-ai-ops with the tenant, task type, and recent advisory keys in the message template.

5 · Operations & audit

  • When an alert fires: capture the guardrail log entry, relevant metrics sample, and the cached plan from the worker output store. Attach them to the incident timeline entry.
  • Tenant overrides: any request to loosen guardrails or blocked phrase lists requires a signed change request and security approval. Update AdvisoryGuardrailOptions via configuration bundles and document the reason in the change log.
  • Offline kit checks: ensure the offline inference bundle uses the same guardrail configuration file as production; mismatches should fail the bundle validation step.
  • Forensics: persisted outputs now contain guardrail_blocked, plan_cache_hit, and citation_coverage metadata. Include these fields when exporting evidence bundles to prove guardrail enforcement.

Keep this document synced whenever guardrail rules, telemetry names, or alert targets change.