# Advisory AI Guardrails & Redaction Policy
**Audience:** Advisory AI guild, Security guild, Docs guild, operators consuming Advisory AI outputs.
**Scope:** Prompt redaction rules, injection defenses, telemetry/alert wiring, and audit guidance for Advisory AI (Epic 8).
Advisory AI accepts structured evidence from Concelier/Excititor and assembles prompts before executing downstream inference. Guardrails enforce provenance, block injection attempts, and redact sensitive content prior to handing data to any inference provider (online or offline). This document enumerates the guardrail surface and how to observe, alert, and audit it.
## 1 · Input validation & injection defense
Advisory prompts are rejected when any of the following checks fail:
- **Citation coverage** – every prompt must carry at least one citation with an index, document id, and chunk id. Missing or malformed citations raise the `citation_missing`/`citation_invalid` violations.
- **Prompt length** – `AdvisoryGuardrailOptions.MaxPromptLength` defaults to 16 000 characters. Longer payloads raise `prompt_too_long`.
- **Blocked phrases** – the guardrail pipeline lowercases the prompt and searches the blocked-phrase cache for matches (`ignore previous instructions`, `disregard earlier instructions`, `you are now the system`, `override the system prompt`, `please jailbreak`). Each hit raises `prompt_injection` and increments the `blocked_phrase_count` metadata.
- **Optional per-profile rules** – when additional phrases are supplied via configuration, they are appended to the cache at startup and evaluated with the same logic.
Any validation failure stops the pipeline before inference and emits `guardrail_blocked = true` in the persisted output as well as the corresponding metric counter.
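The checks above can be sketched as follows. This is a minimal illustration in Python with hypothetical names (`validate_prompt`, the citation field names), not the actual service implementation:

```python
MAX_PROMPT_LENGTH = 16_000  # mirrors AdvisoryGuardrailOptions.MaxPromptLength

BLOCKED_PHRASES = [
    "ignore previous instructions",
    "disregard earlier instructions",
    "you are now the system",
    "override the system prompt",
    "please jailbreak",
]


def validate_prompt(prompt: str, citations: list[dict]) -> list[str]:
    """Return the list of guardrail violations; an empty list means the prompt passes."""
    violations = []

    # Citation coverage: at least one citation, each with index/document/chunk ids.
    if not citations:
        violations.append("citation_missing")
    elif any(
        not all(key in c for key in ("index", "document_id", "chunk_id"))
        for c in citations
    ):
        violations.append("citation_invalid")

    # Prompt length cap.
    if len(prompt) > MAX_PROMPT_LENGTH:
        violations.append("prompt_too_long")

    # Blocked-phrase scan over the lowercased prompt.
    lowered = prompt.lower()
    hits = sum(1 for phrase in BLOCKED_PHRASES if phrase in lowered)
    if hits:
        violations.append("prompt_injection")  # hits would feed blocked_phrase_count

    return violations
```

Any non-empty result would short-circuit the pipeline before inference, matching the behaviour described above.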
## 2 · Redaction rules
Redactions are deterministic so caches remain stable. The current rule set (in order) is:
| Rule | Regex | Replacement |
|---|---|---|
| AWS secret access keys | `(?i)(aws_secret_access_key\s*[:=]\s*)([A-Za-z0-9/+=]{40,})` | `$1[REDACTED_AWS_SECRET]` |
| Credentials/tokens | `(?i)(token \| apikey \|` | |
| PEM private keys | `(?is)-----BEGIN [^-]+ PRIVATE KEY-----.*?-----END [^-]+ PRIVATE KEY-----` | `[REDACTED_PRIVATE_KEY]` |
Redaction counts are surfaced via `guardrailResult.Metadata["redaction_count"]` and emitted as log fields to simplify threat hunting.
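Because the rules run in a fixed order over the whole prompt, redaction is deterministic and cache keys stay stable. A sketch of that loop, assuming a hypothetical `redact` helper and using only the two fully documented patterns (the credentials/token pattern is truncated in the table above and therefore omitted):

```python
import re

# Ordered, deterministic redaction rules mirroring the documented rule set.
REDACTION_RULES = [
    (
        re.compile(r"(?i)(aws_secret_access_key\s*[:=]\s*)([A-Za-z0-9/+=]{40,})"),
        r"\1[REDACTED_AWS_SECRET]",
    ),
    (
        re.compile(
            r"(?is)-----BEGIN [^-]+ PRIVATE KEY-----.*?-----END [^-]+ PRIVATE KEY-----"
        ),
        "[REDACTED_PRIVATE_KEY]",
    ),
]


def redact(prompt: str) -> tuple[str, int]:
    """Apply rules in order; return the redacted text and the total redaction count."""
    count = 0
    for pattern, replacement in REDACTION_RULES:
        prompt, hits = pattern.subn(replacement, prompt)
        count += hits
    return prompt, count
```

The returned count is what would surface as `redaction_count` metadata.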
## 3 · Telemetry, logs, and traces
Advisory AI now exposes the following metrics (all tagged with `task_type` and, where applicable, cache/citation metadata):
| Metric | Type | Description |
|---|---|---|
| `advisory_ai_latency_seconds` | Histogram | End-to-end worker latency from dequeue through persisted output. Aggregated with `plan_cache_hit` to compare cached vs. regenerated plans. |
| `advisory_ai_guardrail_blocks_total` | Counter | Number of guardrail rejections per task. |
| `advisory_ai_validation_failures_total` | Counter | Total validation violations emitted by the guardrail pipeline (one increment per violation instance). |
| `advisory_ai_citation_coverage_ratio` | Histogram | Ratio of unique citations to structured chunks (0–1). Tags include `citations` and `structured_chunks`. |
| `advisory_plans_created/queued/processed` | Counters | Existing plan lifecycle metrics (unchanged but now tagged by task type). |
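The citation-coverage metric is the ratio of unique citations to structured chunks, clamped to [0, 1]. A hypothetical helper (names are assumptions, not the service's API) makes the computation concrete:

```python
def citation_coverage_ratio(citation_ids: list[str], structured_chunks: list[str]) -> float:
    """Ratio of unique citations to structured chunks, clamped to [0, 1].

    Illustrative only: the real metric is emitted by the Advisory AI worker
    as the advisory_ai_citation_coverage_ratio histogram.
    """
    if not structured_chunks:
        return 0.0
    return min(len(set(citation_ids)) / len(structured_chunks), 1.0)
```

For example, three citations covering two unique chunk ids against four structured chunks yields a ratio of 0.5, which would fall below a 0.6 alerting floor.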
### Logging
- Successful writes: the `Stored advisory pipeline output {CacheKey}` log line now includes `guardrail_blocked`, `validation_failures`, and `citation_coverage`.
- Guardrail rejection: warning log includes violation count and advisory key.
- All dequeued jobs emit info logs carrying `cache:{Cache}` for quicker diagnosis.
### Tracing
- WebService (`/v1/advisory-ai/pipeline*`) emits `advisory_ai.plan_request`/`plan_batch` spans with tags for tenant, advisory key, cache key, and validation state.
- Worker emits `advisory_ai.process` spans for each queue item with latency measurement and cache-hit tags.
## 4 · Dashboards & alerts
Update the “Advisory AI” Grafana board with the new metrics:
- **Latency panel** – plot `advisory_ai_latency_seconds` p50/p95 split by `plan_cache_hit`. Alert when p95 > 30 s for 5 minutes.
- **Guardrail burn rate** – `advisory_ai_guardrail_blocks_total` vs. `advisory_ai_validation_failures_total`. Alert when either exceeds 5 blocks/min or 1 % of total traffic.
- **Citation coverage** – histogram heatmap of `advisory_ai_citation_coverage_ratio` to identify evidence gaps (alert when < 0.6 for more than 10 minutes).
All alerts should route to #advisory-ai-ops with the tenant, task type, and recent advisory keys in the message template.
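The guardrail burn-rate condition above combines an absolute and a relative threshold. A sketch of the evaluation, with a hypothetical `guardrail_alert` name (in practice this logic lives in the alerting backend's rules, not application code):

```python
def guardrail_alert(blocks_per_min: float, requests_per_min: float) -> bool:
    """Fire when guardrail blocks exceed 5/min or 1% of total traffic.

    Hypothetical illustration of the documented thresholds; tune the
    constants to match the actual Grafana alert rules.
    """
    if blocks_per_min > 5:
        return True
    return requests_per_min > 0 and blocks_per_min / requests_per_min > 0.01
```

Note the relative branch guards against division by zero when traffic is idle, so quiet tenants do not page the on-call.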
## 5 · Operations & audit
- When an alert fires: capture the guardrail log entry, relevant metrics sample, and the cached plan from the worker output store. Attach them to the incident timeline entry.
- Tenant overrides: any request to loosen guardrails or blocked-phrase lists requires a signed change request and security approval. Update `AdvisoryGuardrailOptions` via configuration bundles and document the reason in the change log.
- Offline kit checks: ensure the offline inference bundle uses the same guardrail configuration file as production; mismatches should fail the bundle validation step.
- Forensics: persisted outputs now contain `guardrail_blocked`, `plan_cache_hit`, and `citation_coverage` metadata. Include these fields when exporting evidence bundles to prove guardrail enforcement.
Keep this document synced whenever guardrail rules, telemetry names, or alert targets change.