audit, advisories and doctors/setup work
@@ -12,9 +12,10 @@ Advisory AI accepts structured evidence from Concelier/Excititor and assembles p
Advisory prompts are rejected when any of the following checks fail:

1. **Citation coverage** – every prompt must carry at least one citation with an index, document id, and chunk id. Missing or malformed citations raise the `citation_missing` / `citation_invalid` violations.
2. **Prompt length** – `AdvisoryGuardrailOptions.MaxPromptLength` defaults to 16 000 characters. Longer payloads raise `prompt_too_long`.
3. **Blocked phrases** – the guardrail pipeline lowercases the prompt and searches it for every entry in the blocked phrase cache (`ignore previous instructions`, `disregard earlier instructions`, `you are now the system`, `override the system prompt`, `please jailbreak`). Each hit raises `prompt_injection` and increments the `blocked_phrase_count` metadata.
4. **Optional per-profile rules** – additional phrases supplied via configuration are appended to the cache at startup and evaluated with the same logic.
5. **Token and rate budgets** – per-user and per-org budgets cap prompt size, requests/min, and tool calls/day; overages raise `quota_exceeded`.

Any validation failure stops the pipeline before inference, sets `guardrail_blocked = true` in the persisted output, and increments the corresponding metric counter.
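The first four checks above can be sketched roughly as follows. This is a minimal Python illustration, not the service's actual implementation: the violation codes, blocked phrases, and 16 000-character default come from the text, while `Citation`, `GuardrailResult`, and `evaluate` are hypothetical names, and treating one hit as one matched phrase (not one occurrence) is an assumption. Quota enforcement (`quota_exceeded`) is left out because it depends on external state.

```python
from dataclasses import dataclass, field
from typing import Optional

# Values taken from the document; all identifiers below are hypothetical.
BLOCKED_PHRASES = (
    "ignore previous instructions",
    "disregard earlier instructions",
    "you are now the system",
    "override the system prompt",
    "please jailbreak",
)
MAX_PROMPT_LENGTH = 16_000


@dataclass
class Citation:
    index: Optional[int]
    document_id: Optional[str]
    chunk_id: Optional[str]


@dataclass
class GuardrailResult:
    violations: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    @property
    def blocked(self) -> bool:
        # Any violation blocks the prompt before inference.
        return bool(self.violations)


def evaluate(prompt, citations, extra_phrases=()):
    result = GuardrailResult()

    # 1. Citation coverage: at least one citation, each fully populated.
    if not citations:
        result.violations.append("citation_missing")
    elif any(c.index is None or not c.document_id or not c.chunk_id
             for c in citations):
        result.violations.append("citation_invalid")

    # 2. Prompt length.
    if len(prompt) > MAX_PROMPT_LENGTH:
        result.violations.append("prompt_too_long")

    # 3 and 4. Blocked phrases, including configured per-profile additions.
    lowered = prompt.lower()
    phrases = BLOCKED_PHRASES + tuple(p.lower() for p in extra_phrases)
    for phrase in phrases:
        if phrase in lowered:
            result.violations.append("prompt_injection")
            result.metadata["blocked_phrase_count"] = (
                result.metadata.get("blocked_phrase_count", 0) + 1
            )
    return result
```

A caller would skip inference and persist `guardrail_blocked = true` whenever `evaluate(...).blocked` is true.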
@@ -26,10 +27,17 @@ Redactions are deterministic so caches remain stable. The current rule set (in o
|------|-------|-------------|
| AWS secret access keys | `(?i)(aws_secret_access_key\s*[:=]\s*)([A-Za-z0-9/+=]{40,})` | `$1[REDACTED_AWS_SECRET]` |
| Credentials/tokens | `(?i)(token\|apikey\|password)\s*[:=]\s*([A-Za-z0-9\-_/]{16,})` | `$1: [REDACTED_CREDENTIAL]` |
| High entropy strings | `entropy >= threshold` | `[REDACTED_HIGH_ENTROPY]` |
| PEM private keys | `(?is)-----BEGIN [^-]+ PRIVATE KEY-----.*?-----END [^-]+ PRIVATE KEY-----` | `[REDACTED_PRIVATE_KEY]` |

Redaction counts are surfaced via `guardrailResult.Metadata["redaction_count"]` and emitted as log fields to simplify threat hunting.
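To illustrate how an ordered, deterministic rule set like the one in the table might behave, here is a small Python sketch. The three regular expressions are copied from the table; the `$1` back-reference is written as `\g<1>` in Python. The names `RULES` and `redact` are hypothetical, the entropy rule is omitted because it is not a plain regex, and the returned dictionary only mimics the `redaction_count` metadata key.

```python
import re

# Patterns copied from the table, applied in table order.
# RULES and redact() are hypothetical names used only for this sketch.
RULES = [
    (re.compile(r"(?i)(aws_secret_access_key\s*[:=]\s*)([A-Za-z0-9/+=]{40,})"),
     r"\g<1>[REDACTED_AWS_SECRET]"),
    (re.compile(r"(?i)(token|apikey|password)\s*[:=]\s*([A-Za-z0-9\-_/]{16,})"),
     r"\g<1>: [REDACTED_CREDENTIAL]"),
    (re.compile(r"(?is)-----BEGIN [^-]+ PRIVATE KEY-----.*?-----END [^-]+ PRIVATE KEY-----"),
     "[REDACTED_PRIVATE_KEY]"),
]


def redact(text):
    """Apply every rule in a fixed order and count the replacements."""
    total = 0
    for pattern, replacement in RULES:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, {"redaction_count": total}
```

Because the rules run in a fixed order with no randomness, the same input always yields the same output, which is what keeps cache keys derived from redacted text stable.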
### Allowlist and entropy tuning
- Allowlist patterns bypass redaction for known-safe identifiers (scan IDs, digest prefixes, evidence refs).
- Entropy thresholds are configurable per profile to reduce false positives in long hex IDs.
- Configure scrubber knobs via `AdvisoryAI:Guardrails:EntropyThreshold`, `AdvisoryAI:Guardrails:EntropyMinLength`, `AdvisoryAI:Guardrails:AllowlistFile`, and `AdvisoryAI:Guardrails:AllowlistPatterns`.
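As a rough illustration of how an entropy threshold, a minimum length, and allowlist patterns could interact, consider the Python sketch below. It assumes Shannon entropy over characters, which the document does not confirm; the function names, the candidate-token regex, and the default values (`threshold=4.0`, `min_length=20`) are all hypothetical stand-ins for whatever `EntropyThreshold`, `EntropyMinLength`, and `AllowlistPatterns` resolve to in a real deployment.

```python
import math
import re
from collections import Counter


def shannon_entropy(s):
    """Bits per character of the empirical character distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def redact_high_entropy(text, threshold=4.0, min_length=20, allowlist_patterns=()):
    # Hypothetical defaults; real values would come from configuration.
    allow = [re.compile(p) for p in allowlist_patterns]
    # Assumed definition of a candidate token: a long run of key-like characters.
    candidate = re.compile(r"[A-Za-z0-9+/=_\-]{%d,}" % min_length)

    def replace(match):
        token = match.group(0)
        # Allowlisted identifiers bypass redaction entirely.
        if any(p.fullmatch(token) for p in allow):
            return token
        if shannon_entropy(token) >= threshold:
            return "[REDACTED_HIGH_ENTROPY]"
        return token

    return candidate.sub(replace, text)
```

Raising the threshold or the minimum length trades recall for fewer false positives; a hex string can never exceed 4 bits per character, which is why long hex IDs sit close to a threshold in that range.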
## 3 · Telemetry, logs, and traces
Advisory AI now exposes the following metrics (all tagged with `task_type` and, where applicable, cache/citation metadata):
@@ -67,7 +75,10 @@ All alerts should route to `#advisory-ai-ops` with the tenant, task type, and re
- **When an alert fires:** capture the guardrail log entry, relevant metrics sample, and the cached plan from the worker output store. Attach them to the incident timeline entry.
- **Tenant overrides:** any request to loosen guardrails or blocked phrase lists requires a signed change request and security approval. Update `AdvisoryGuardrailOptions` via configuration bundles and document the reason in the change log.
- **Chat settings overrides:** quotas and tool allowlists can be adjusted via the chat settings endpoints; values set through environment variables remain the defaults.
- **Doctor check:** use `/api/v1/chat/doctor` to confirm quota/tool limits when chat requests are rejected.
- **Offline kit checks:** ensure the offline inference bundle uses the same guardrail configuration file as production; mismatches should fail the bundle validation step.
- **Forensics:** persisted outputs now contain `guardrail_blocked`, `plan_cache_hit`, and `citation_coverage` metadata. Include these fields when exporting evidence bundles to prove guardrail enforcement.
- **Chat audit trail:** retain prompt hashes, redaction metadata, tool call hashes, and policy decisions for post-incident review.

Keep this document synced whenever guardrail rules, telemetry names, or alert targets change.