audit, advisories and doctors/setup work

This commit is contained in:
master
2026-01-13 18:53:39 +02:00
parent 9ca7cb183e
commit d7be6ba34b
811 changed files with 54242 additions and 4056 deletions

@@ -12,9 +12,10 @@ Advisory AI accepts structured evidence from Concelier/Excititor and assembles p
Advisory prompts are rejected when any of the following checks fail:
1. **Citation coverage** - every prompt must carry at least one citation with an index, document id, and chunk id. Missing or malformed citations raise the `citation_missing` / `citation_invalid` violations.
2. **Prompt length** - `AdvisoryGuardrailOptions.MaxPromptLength` defaults to 16000 characters. Longer payloads raise `prompt_too_long`.
3. **Blocked phrases** - the guardrail pipeline lowercases the prompt and searches it for every entry in the blocked phrase cache (`ignore previous instructions`, `disregard earlier instructions`, `you are now the system`, `override the system prompt`, `please jailbreak`). Each hit raises `prompt_injection` and increments the `blocked_phrase_count` metadata.
4. **Optional per-profile rules** - when additional phrases are supplied via configuration, they are appended to the cache at startup and evaluated with the same logic.
5. **Token and rate budgets** - per-user and per-org budgets cap prompt size, requests/min, and tool calls/day; overages raise `quota_exceeded`.
Any validation failure stops the pipeline before inference, emits `guardrail_blocked = true` in the persisted output, and increments the corresponding metric counter.
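
The first three checks can be sketched as follows. This is a minimal Python illustration, not the service's actual implementation: the `Citation` and `GuardrailResult` shapes, the `validate_prompt` function, and the choice to count each distinct matched phrase as one hit are assumptions; only the violation names, the 16000-character default, and the blocked phrases come from the text above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Phrases and default length are quoted from the documented rules.
BLOCKED_PHRASES = (
    "ignore previous instructions",
    "disregard earlier instructions",
    "you are now the system",
    "override the system prompt",
    "please jailbreak",
)
MAX_PROMPT_LENGTH = 16000


@dataclass
class Citation:
    """Hypothetical citation shape: index, document id, chunk id."""
    index: Optional[int]
    document_id: Optional[str]
    chunk_id: Optional[str]


@dataclass
class GuardrailResult:
    """Hypothetical result shape used only for this sketch."""
    violations: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    @property
    def blocked(self) -> bool:
        return bool(self.violations)


def validate_prompt(prompt, citations, extra_phrases=()):
    result = GuardrailResult()

    # 1. Citation coverage: at least one citation, each fully populated.
    if not citations:
        result.violations.append("citation_missing")
    elif any(c.index is None or not c.document_id or not c.chunk_id
             for c in citations):
        result.violations.append("citation_invalid")

    # 2. Prompt length.
    if len(prompt) > MAX_PROMPT_LENGTH:
        result.violations.append("prompt_too_long")

    # 3. Blocked phrases (plus any per-profile additions), matched on the
    #    lowercased prompt. Assumption: one hit per distinct phrase.
    lowered = prompt.lower()
    hits = sum(1 for phrase in (*BLOCKED_PHRASES, *extra_phrases)
               if phrase in lowered)
    result.violations.extend(["prompt_injection"] * hits)
    result.metadata["blocked_phrase_count"] = hits
    return result
```

A caller would stop before inference whenever `result.blocked` is true and persist that flag alongside the output.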
@@ -26,10 +27,17 @@ Redactions are deterministic so caches remain stable. The current rule set (in o
|------|-------|-------------|
| AWS secret access keys | `(?i)(aws_secret_access_key\s*[:=]\s*)([A-Za-z0-9/+=]{40,})` | `$1[REDACTED_AWS_SECRET]` |
| Credentials/tokens | `(?i)(token|apikey|password)\s*[:=]\s*([A-Za-z0-9\-_/]{16,})` | `$1: [REDACTED_CREDENTIAL]` |
| High entropy strings | `entropy >= threshold` | `[REDACTED_HIGH_ENTROPY]` |
| PEM private keys | `(?is)-----BEGIN [^-]+ PRIVATE KEY-----.*?-----END [^-]+ PRIVATE KEY-----` | `[REDACTED_PRIVATE_KEY]` |
Redaction counts are surfaced via `guardrailResult.Metadata["redaction_count"]` and emitted as log fields to simplify threat hunting.
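
As a rough illustration of how the regex rules above could be applied in order while counting redactions, here is a Python sketch. It is not the service's code: the `redact` function and `REDACTION_RULES` list are hypothetical, the replacement strings are translated from the `$1` form in the table to Python's `\g<1>` syntax, and the high-entropy rule is left out because it is not a plain regex.

```python
import re

# Patterns copied from the rule table; replacements use Python's \g<1>
# group syntax in place of the $1 form shown there.
REDACTION_RULES = [
    (re.compile(r"(?i)(aws_secret_access_key\s*[:=]\s*)([A-Za-z0-9/+=]{40,})"),
     r"\g<1>[REDACTED_AWS_SECRET]"),
    (re.compile(r"(?i)(token|apikey|password)\s*[:=]\s*([A-Za-z0-9\-_/]{16,})"),
     r"\g<1>: [REDACTED_CREDENTIAL]"),
    (re.compile(r"(?is)-----BEGIN [^-]+ PRIVATE KEY-----.*?-----END [^-]+ PRIVATE KEY-----"),
     "[REDACTED_PRIVATE_KEY]"),
]


def redact(text):
    """Apply each rule in a fixed order and return (redacted_text, count)."""
    total = 0
    for pattern, replacement in REDACTION_RULES:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total
```

Because the rule order and the replacement strings are fixed, the same input always yields the same output, which is the property that keeps caches stable.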
### Allowlist and entropy tuning
- Allowlist patterns bypass redaction for known-safe identifiers (scan IDs, digest prefixes, evidence refs).
- Entropy thresholds are configurable per profile to reduce false positives in long hex IDs.
- Configure scrubber knobs via `AdvisoryAI:Guardrails:EntropyThreshold`, `AdvisoryAI:Guardrails:EntropyMinLength`, `AdvisoryAI:Guardrails:AllowlistFile`, and `AdvisoryAI:Guardrails:AllowlistPatterns`.
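
The interaction between the entropy threshold, the minimum length, and the allowlist might look like the Python sketch below. Everything here is an assumption made for illustration: the Shannon-entropy measure, the token regex, the default values `3.5` and `20`, and the function names are not taken from the service, which only documents that such knobs exist.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: runs of characters commonly found in secrets and IDs.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]+")


def shannon_entropy(token):
    """Bits per character of the token's character distribution."""
    total = len(token)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(token).values())


def redact_high_entropy(text, threshold=3.5, min_length=20,
                        allowlist_patterns=()):
    """Replace long, high-entropy tokens unless an allowlist pattern matches."""
    allow = [re.compile(p) for p in allowlist_patterns]
    count = 0

    def replace(match):
        nonlocal count
        token = match.group(0)
        if len(token) < min_length:
            return token  # too short to judge
        if any(p.search(token) for p in allow):
            return token  # known-safe identifier
        if shannon_entropy(token) < threshold:
            return token  # low entropy, e.g. repeated characters
        count += 1
        return "[REDACTED_HIGH_ENTROPY]"

    redacted = TOKEN_RE.sub(replace, text)
    return redacted, count
```

Raising the threshold or the minimum length reduces false positives on long hex IDs, while an allowlist entry such as `^scan-` (a made-up prefix) exempts a whole family of identifiers regardless of entropy.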
## 3 · Telemetry, logs, and traces
Advisory AI now exposes the following metrics (all tagged with `task_type` and, where applicable, cache/citation metadata):
@@ -67,7 +75,10 @@ All alerts should route to `#advisory-ai-ops` with the tenant, task type, and re
- **When an alert fires:** capture the guardrail log entry, relevant metrics sample, and the cached plan from the worker output store. Attach them to the incident timeline entry.
- **Tenant overrides:** any request to loosen guardrails or blocked phrase lists requires a signed change request and security approval. Update `AdvisoryGuardrailOptions` via configuration bundles and document the reason in the change log.
- **Chat settings overrides:** quotas and tool allowlists can be adjusted via the chat settings endpoints; environment values remain the defaults.
- **Doctor check:** use `/api/v1/chat/doctor` to confirm quota/tool limits when chat requests are rejected.
- **Offline kit checks:** ensure the offline inference bundle uses the same guardrail configuration file as production; mismatches should fail the bundle validation step.
- **Forensics:** persisted outputs now contain `guardrail_blocked`, `plan_cache_hit`, and `citation_coverage` metadata. Include these fields when exporting evidence bundles to prove guardrail enforcement.
- **Chat audit trail:** retain prompt hashes, redaction metadata, tool call hashes, and policy decisions for post-incident review.
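
One way to picture the audit trail entry is the Python sketch below. It is purely illustrative: the field names, the use of SHA-256, and the canonical-JSON hashing of tool calls are assumptions, since the runbook only states which kinds of data to retain.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_audit_record(prompt, tool_calls, redaction_count, policy_decision):
    """Build a hypothetical audit entry that stores hashes, not raw content."""
    return {
        "prompt_hash": sha256_hex(prompt.encode("utf-8")),
        # Serialize each tool call with sorted keys so that equal calls
        # hash identically regardless of key order.
        "tool_call_hashes": [
            sha256_hex(json.dumps(call, sort_keys=True,
                                  separators=(",", ":")).encode("utf-8"))
            for call in tool_calls
        ],
        "redaction_count": redaction_count,
        "policy_decision": policy_decision,
    }
```

Storing hashes rather than raw prompts lets reviewers confirm that a given prompt or tool call was seen without keeping sensitive text in the audit store.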
Keep this document synced whenever guardrail rules, telemetry names, or alert targets change.