# Feedser Authority Audit Runbook

_Last updated: 2025-10-12_

This runbook helps operators verify and monitor the StellaOps Feedser ⇆ Authority integration. It focuses on the `/jobs*` surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity.

## 1. Prerequisites

- Authority integration is enabled in `feedser.yaml` (or via `FEEDSER_AUTHORITY__*` environment variables) with a valid `clientId`, secret, audience, and required scopes.
- OTLP metrics/log exporters are configured (`feedser.telemetry.*`) or container stdout is shipped to your SIEM.
- Operators have access to the Feedser job trigger endpoints via CLI or REST for smoke tests.

### Configuration snippet

```yaml
feedser:
  authority:
    enabled: true
    allowAnonymousFallback: false  # keep true only during initial rollout
    issuer: "https://authority.internal"
    audiences:
      - "api://feedser"
    requiredScopes:
      - "feedser.jobs.trigger"
    bypassNetworks:
      - "127.0.0.1/32"
      - "::1/128"
    clientId: "feedser-jobs"
    clientSecretFile: "/run/secrets/feedser_authority_client"
    tokenClockSkewSeconds: 60
    resilience:
      enableRetries: true
      retryDelays:
        - "00:00:01"
        - "00:00:02"
        - "00:00:05"
      allowOfflineCacheFallback: true
      offlineCacheTolerance: "00:10:00"
```

> Store secrets outside source control. Feedser reads `clientSecretFile` on startup; rotate by updating the mounted file and restarting the service.

### Resilience tuning

- **Connected sites:** keep the default 1 s / 2 s / 5 s retry ladder so Feedser retries transient Authority hiccups but still surfaces outages quickly. Leave `allowOfflineCacheFallback=true` so cached discovery/JWKS data can bridge short Pathfinder restarts.
- **Air-gapped/Offline Kit installs:** extend `offlineCacheTolerance` (15–30 minutes) to keep the cached metadata valid between manual synchronisations.
  You can also disable retries (`enableRetries=false`) if infrastructure teams prefer to handle exponential backoff at the network layer; Feedser will then fail fast while keeping deterministic logs.
- Feedser resolves these knobs through `IOptionsMonitor`. Edits to `feedser.yaml` are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled.

## 2. Key Signals

### 2.1 Audit log channel

Feedser emits structured audit entries via the `Feedser.Authorization.Audit` logger for every `/jobs*` request once Authority enforcement is active.

```
Feedser authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=feedser-cli scopes=feedser.jobs.trigger bypass=False remote=10.1.4.7
```

| Field      | Sample value           | Meaning |
|------------|------------------------|---------|
| `route`    | `/jobs/definitions`    | Endpoint that processed the request. |
| `status`   | `200` / `401` / `409`  | Final HTTP status code returned to the caller. |
| `subject`  | `ops@example.com`      | User or service principal subject (falls back to `(anonymous)` when unauthenticated). |
| `clientId` | `feedser-cli`          | OAuth client ID provided by Authority (`(none)` if the token lacked the claim). |
| `scopes`   | `feedser.jobs.trigger` | Normalised scope list extracted from token claims; `(none)` if the token carried none. |
| `bypass`   | `True` / `False`       | Indicates whether the request succeeded because its source IP matched a bypass CIDR. |
| `remote`   | `10.1.4.7`             | Remote IP recorded from the connection or from the forwarded-header test hooks. |

Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations:

- `status=401 AND bypass=True` – bypass network accepted an unauthenticated call (should be temporary during rollout).
- `status=202 AND scopes="(none)"` – a token without scopes triggered a job; tighten client configuration.
- Spike in `clientId="(none)"` – indicates upstream Authority is not issuing `client_id` claims or the CLI is outdated.

### 2.2 Metrics

Feedser publishes counters under the OTEL meter `StellaOps.Feedser.WebService.Jobs`. Tags: `job.kind`, `job.trigger`, `job.outcome`.

| Metric name                 | Description                                     | PromQL example |
|-----------------------------|-------------------------------------------------|----------------|
| `web.jobs.triggered`        | Accepted job trigger requests.                  | `sum by (job_kind) (rate(web_jobs_triggered_total[5m]))` |
| `web.jobs.trigger.conflict` | Rejected triggers (already running, disabled…). | `sum(rate(web_jobs_trigger_conflict_total[5m]))` |
| `web.jobs.trigger.failed`   | Server-side job failures.                       | `sum(rate(web_jobs_trigger_failed_total[5m]))` |

> Prometheus/OTEL collectors typically surface counters with a `_total` suffix. Adjust queries to match your pipeline’s generated metric names.

Correlate audit logs with the following global meter exported via `Feedser.SourceDiagnostics`:

- `feedser.source.http.requests_total{feedser_source="jobs-run"}` – confirms that REST/manual triggers route through Authority.
- If Grafana dashboards are deployed, extend the “Feedser Jobs” board with the above counters plus a table of recent audit log entries.

## 3. Alerting Guidance

1. **Unauthorized bypass attempt**
   - Query: `sum(rate(log_messages_total{logger="Feedser.Authorization.Audit", status="401", bypass="True"}[5m])) > 0`
   - Action: verify the `bypassNetworks` list; confirm expected maintenance windows; rotate credentials if suspicious.
2. **Missing scopes**
   - Query: `sum(rate(log_messages_total{logger="Feedser.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0`
   - Action: audit the Authority client registration; ensure `requiredScopes` includes `feedser.jobs.trigger`.
3. **Trigger failure surge**
   - Query: `sum(rate(web_jobs_trigger_failed_total[10m])) > 0`, with severity `warning` if sustained for 10 minutes.
   - Action: inspect correlated audit entries and `Feedser.Telemetry` traces for job execution errors.
4. **Conflict spike**
   - Query: `sum(rate(web_jobs_trigger_conflict_total[10m])) > 5` (tune threshold).
   - Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly.
5. **Authority offline**
   - Watch `Feedser.Authorization.Audit` logs for `status=503` or `status=500` along with `clientId="(none)"`. Investigate Authority availability before re-enabling anonymous fallback.

## 4. Rollout & Verification Procedure

1. **Pre-checks**
   - Confirm `allowAnonymousFallback` is `false` in production; keep `true` only during staged validation.
   - Validate that Authority issuer metadata is reachable from Feedser (`curl https://authority.internal/.well-known/openid-configuration` from the host).
2. **Smoke test with valid token**
   - Obtain a token via CLI: `stella auth login --scope feedser.jobs.trigger`.
   - Call a read-only endpoint: `curl -H "Authorization: Bearer $TOKEN" https://feedser.internal/jobs/definitions`.
   - Expect HTTP 200/202 and an audit log with `bypass=False`, `scopes=feedser.jobs.trigger`.
3. **Negative test without token**
   - Call the same endpoint without a token. Expect HTTP 401, `bypass=False`.
   - If the request succeeds, double-check `bypassNetworks` and ensure fallback is disabled.
4. **Bypass check (if applicable)**
   - From an allowed maintenance IP, call `/jobs/definitions` without a token. Confirm the audit log shows `bypass=True`. Review the business justification and expiry date for such entries.
5. **Metrics validation**
   - Ensure the `web.jobs.triggered` counter increments during accepted runs.
   - Exporters should show corresponding spans (`feedser.job.trigger`) if tracing is enabled.

## 5. Troubleshooting

| Symptom | Probable cause | Remediation |
|---------|----------------|-------------|
| Audit log shows `clientId=(none)` for all requests | Authority not issuing `client_id` claim or CLI outdated | Update StellaOps Authority configuration (`StellaOpsAuthorityOptions.Token.Claims.ClientId`), or upgrade the CLI token acquisition flow. |
| Requests succeed with `bypass=True` unexpectedly | Local network added to `bypassNetworks` or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Feedser. |
| HTTP 401 with valid token | `requiredScopes` missing from client registration or token audience mismatch | Verify Authority client scopes (`feedser.jobs.trigger`) and ensure the token audience matches `audiences` config. |
| Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set `feedser.telemetry.enableMetrics=true`, ensure collector includes `StellaOps.Feedser.WebService.Jobs` meter. |
| Sudden spike in `web.jobs.trigger.failed` | Downstream job failure or Authority timeout mid-request | Inspect Feedser job logs, re-run with tracing enabled, validate Authority latency. |

## 6. References

- `docs/21_INSTALL_GUIDE.md` – Authority configuration quick start.
- `docs/17_SECURITY_HARDENING_GUIDE.md` – Security guardrails and enforcement deadlines.
- `docs/ops/authority-monitoring.md` – Authority-side monitoring and alerting playbook.
- `StellaOps.Feedser.WebService/Filters/JobAuthorizationAuditFilter.cs` – source of audit log fields.
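
As a supplement to the audit log fields in section 2.1, the following is a minimal Python sketch of how an operator might parse an audit line offline and flag the suspicious combinations listed there. It assumes the line keeps the `key=value` layout of the sample entry and that values contain no spaces; the helper names `parse_audit_line` and `flag_suspicious` and the flag labels are hypothetical and not part of Feedser.

```python
import re

# Assumption: fields appear as space-separated key=value tokens, as in the
# sample audit line; values with embedded spaces would need a different parser.
PAIR = re.compile(r"(\w+)=(\S+)")


def parse_audit_line(line: str) -> dict:
    """Extract the key=value fields from one audit log line."""
    return dict(PAIR.findall(line))


def flag_suspicious(fields: dict) -> list:
    """Return hypothetical labels for the combinations named in section 2.1."""
    flags = []
    if fields.get("status") == "401" and fields.get("bypass") == "True":
        flags.append("unauthenticated-bypass")
    if fields.get("status") == "202" and fields.get("scopes") == "(none)":
        flags.append("missing-scopes")
    if fields.get("clientId") == "(none)":
        flags.append("missing-client-id")
    return flags


sample = (
    "Feedser authorization audit route=/jobs/definitions status=200 "
    "subject=ops@example.com clientId=feedser-cli "
    "scopes=feedser.jobs.trigger bypass=False remote=10.1.4.7"
)
fields = parse_audit_line(sample)
print(fields["route"], flag_suspicious(fields))  # prints: /jobs/definitions []
```

In practice the same conditions are better expressed as queries in the logging backend; a script like this is mainly useful for spot-checking exported log files during rollout.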