	Feedser Authority Audit Runbook
Last updated: 2025-10-12
This runbook helps operators verify and monitor the StellaOps Feedser ⇆ Authority integration. It focuses on the /jobs* surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity.
1. Prerequisites
- Authority integration is enabled in feedser.yaml (or via FEEDSER_AUTHORITY__* environment variables) with a valid clientId, secret, audience, and required scopes.
- OTLP metrics/log exporters are configured (feedser.telemetry.*) or container stdout is shipped to your SIEM.
- Operators have access to the Feedser job trigger endpoints via CLI or REST for smoke tests.
Configuration snippet
feedser:
  authority:
    enabled: true
    allowAnonymousFallback: false          # keep true only during initial rollout
    issuer: "https://authority.internal"
    audiences:
      - "api://feedser"
    requiredScopes:
      - "feedser.jobs.trigger"
    bypassNetworks:
      - "127.0.0.1/32"
      - "::1/128"
    clientId: "feedser-jobs"
    clientSecretFile: "/run/secrets/feedser_authority_client"
    tokenClockSkewSeconds: 60
    resilience:
      enableRetries: true
      retryDelays:
        - "00:00:01"
        - "00:00:02"
        - "00:00:05"
      allowOfflineCacheFallback: true
      offlineCacheTolerance: "00:10:00"
Store secrets outside source control. Feedser reads clientSecretFile on startup; rotate by updating the mounted file and restarting the service.
Resilience tuning
- Connected sites: keep the default 1 s / 2 s / 5 s retry ladder so Feedser retries transient Authority hiccups but still surfaces outages quickly. Leave allowOfflineCacheFallback=true so cached discovery/JWKS data can bridge short Pathfinder restarts.
- Air-gapped/Offline Kit installs: extend offlineCacheTolerance (15–30 minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (enableRetries=false) if infrastructure teams prefer to handle exponential backoff at the network layer; Feedser will fail fast but keep deterministic logs.
- Feedser resolves these knobs through IOptionsMonitor<StellaOpsAuthClientOptions>. Edits to feedser.yaml are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled.
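How the retry ladder and the offline cache tolerance interact can be illustrated with a minimal Python sketch. It is not Feedser's implementation (Feedser is a .NET service); fetch_metadata, its parameters, and the (fetched_at, document) cache tuple are hypothetical names, and the sketch assumes the tolerance is measured from the moment the cached document was fetched, which may differ from the real semantics.

```python
from datetime import datetime, timedelta, timezone
import time


def parse_timespan(value):
    """Parse an "hh:mm:ss" string (the format used in feedser.yaml) into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def fetch_metadata(fetch, retry_delays, cache=None, tolerance="00:10:00",
                   enable_retries=True, allow_offline_fallback=True,
                   sleep=time.sleep, now=None):
    """Call fetch(), retrying after each configured delay, then fall back to cache.

    cache is a hypothetical (fetched_at, document) tuple, or None when empty.
    """
    delays = []
    if enable_retries:
        delays = [parse_timespan(d).total_seconds() for d in retry_delays]

    last_error = None
    # One initial attempt plus one attempt per configured delay.
    for attempt in range(len(delays) + 1):
        try:
            return fetch()
        except OSError as exc:
            last_error = exc
            if attempt < len(delays):
                sleep(delays[attempt])

    # All attempts failed: serve cached metadata only if it is fresh enough.
    now = now or datetime.now(timezone.utc)
    if allow_offline_fallback and cache is not None:
        fetched_at, document = cache
        if now - fetched_at <= parse_timespan(tolerance):
            return document
    raise last_error
```

With enable_retries=False the function makes a single attempt and then either serves the cache or fails immediately, which mirrors the fail-fast behaviour described for air-gapped installs.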
2. Key Signals
2.1 Audit log channel
Feedser emits structured audit entries via the Feedser.Authorization.Audit logger for every /jobs* request once Authority enforcement is active.
Feedser authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=feedser-cli scopes=feedser.jobs.trigger bypass=False remote=10.1.4.7
| Field | Sample value | Meaning | 
|---|---|---|
| route | /jobs/definitions | Endpoint that processed the request. | 
| status | 200/401/409 | Final HTTP status code returned to the caller. | 
| subject | ops@example.com | User or service principal subject (falls back to (anonymous) when unauthenticated). | 
| clientId | feedser-cli | OAuth client ID provided by Authority ((none) if the token lacked the claim). | 
| scopes | feedser.jobs.trigger | Normalised scope list extracted from token claims; (none) if the token carried none. | 
| bypass | True/False | Indicates whether the request succeeded because its source IP matched a bypass CIDR. | 
| remote | 10.1.4.7 | Remote IP recorded from the connection / forwarded header test hooks. | 
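For ad-hoc analysis outside a logging backend, an audit line can be split into the fields listed above. The following Python sketch assumes the key=value layout of the sample entry; parse_audit_line is a hypothetical helper, not part of Feedser, and the real message template may change between releases.

```python
import re

# A value runs until the next " key=" token or the end of the line, so values
# that contain spaces (for example several scopes) stay in one piece.
FIELD_PATTERN = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_audit_line(line):
    """Return the key=value pairs of one audit log line as a dict."""
    fields = {key: value.strip() for key, value in FIELD_PATTERN.findall(line)}
    if "status" in fields:
        fields["status"] = int(fields["status"])      # HTTP status as a number
    if "bypass" in fields:
        fields["bypass"] = fields["bypass"] == "True"  # "True"/"False" to bool
    return fields


sample = (
    "Feedser authorization audit route=/jobs/definitions status=200 "
    "subject=ops@example.com clientId=feedser-cli "
    "scopes=feedser.jobs.trigger bypass=False remote=10.1.4.7"
)
entry = parse_audit_line(sample)
```

Placeholder values such as (anonymous) and (none) are kept as plain strings so that they can be matched exactly as they appear in the log.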
Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations:
- status=401 AND bypass=True – bypass network accepted an unauthenticated call (should be temporary during rollout).
- status=202 AND scopes="(none)" – a token without scopes triggered a job; tighten client configuration.
- Spike in clientId="(none)" – indicates upstream Authority is not issuing client_id claims or the CLI is outdated.
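The combinations above can also be checked offline against exported log entries. This Python sketch assumes each entry is already a dict with the audit fields (status as an integer, bypass as a boolean); the finding labels and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def classify(entry):
    """Return the suspicious combinations that one audit entry matches."""
    findings = []
    if entry.get("status") == 401 and entry.get("bypass") is True:
        findings.append("unauthenticated-call-via-bypass")
    if entry.get("status") == 202 and entry.get("scopes") == "(none)":
        findings.append("job-triggered-without-scopes")
    if entry.get("clientId") == "(none)":
        findings.append("missing-client-id")
    return findings


def summarize(entries):
    """Count findings across many entries, e.g. to spot a clientId spike."""
    counts = Counter()
    for entry in entries:
        counts.update(classify(entry))
    return counts
```

A single missing-client-id finding is not alarming on its own; the count returned by summarize over a time window is what indicates a spike.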
2.2 Metrics
Feedser publishes counters under the OTEL meter StellaOps.Feedser.WebService.Jobs. Tags: job.kind, job.trigger, job.outcome.
| Metric name | Description | PromQL example | 
|---|---|---|
| web.jobs.triggered | Accepted job trigger requests. | sum by (job_kind) (rate(web_jobs_triggered_total[5m])) | 
| web.jobs.trigger.conflict | Rejected triggers (already running, disabled…). | sum(rate(web_jobs_trigger_conflict_total[5m])) | 
| web.jobs.trigger.failed | Server-side job failures. | sum(rate(web_jobs_trigger_failed_total[5m])) | 
Prometheus/OTEL collectors typically surface counters with a _total suffix. Adjust queries to match your pipeline’s generated metric names.
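The name mapping used in the PromQL examples (dots replaced by underscores, plus a _total suffix for counters) can be sketched in Python as follows. This reflects a common exporter convention rather than a guarantee; prometheus_counter_name is a hypothetical helper, and some pipelines add prefixes or unit suffixes that it does not model.

```python
def prometheus_counter_name(otel_name, add_total_suffix=True):
    """Translate an OTEL counter name into its likely Prometheus name."""
    name = otel_name.replace(".", "_")  # Prometheus names cannot contain dots
    if add_total_suffix and not name.endswith("_total"):
        name += "_total"                # counters usually gain this suffix
    return name


def prometheus_label_name(otel_tag):
    """Translate an OTEL tag key such as job.kind into a label name."""
    return otel_tag.replace(".", "_")
```

For example, prometheus_counter_name("web.jobs.triggered") yields web_jobs_triggered_total, the name used in the table above.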
Correlate audit logs with the following global meter exported via Feedser.SourceDiagnostics:
- feedser.source.http.requests_total{feedser_source="jobs-run"} – ensures REST/manual triggers route through Authority.
- If Grafana dashboards are deployed, extend the “Feedser Jobs” board with the above counters plus a table of recent audit log entries.
3. Alerting Guidance
- Unauthorized bypass attempt
  - Query: sum(rate(log_messages_total{logger="Feedser.Authorization.Audit", status="401", bypass="True"}[5m])) > 0
  - Action: verify the bypassNetworks list; confirm expected maintenance windows; rotate credentials if suspicious.
- Missing scopes
  - Query: sum(rate(log_messages_total{logger="Feedser.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0
  - Action: audit Authority client registration; ensure requiredScopes includes feedser.jobs.trigger.
- Trigger failure surge
  - Query: sum(rate(web_jobs_trigger_failed_total[10m])) > 0 with severity warning if sustained for 10 minutes.
  - Action: inspect correlated audit entries and Feedser.Telemetry traces for job execution errors.
- Conflict spike
  - Query: sum(rate(web_jobs_trigger_conflict_total[10m])) > 5 (tune threshold).
  - Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly.
- Authority offline
  - Watch Feedser.Authorization.Audit logs for status=503 or status=500 along with clientId="(none)". Investigate Authority availability before re-enabling anonymous fallback.
4. Rollout & Verification Procedure
- Pre-checks
  - Confirm allowAnonymousFallback is false in production; keep true only during staged validation.
  - Validate Authority issuer metadata is reachable from Feedser (curl https://authority.internal/.well-known/openid-configuration from the host).
- Smoke test with valid token
  - Obtain a token via CLI: stella auth login --scope feedser.jobs.trigger.
  - Trigger a read-only endpoint: curl -H "Authorization: Bearer $TOKEN" https://feedser.internal/jobs/definitions.
  - Expect HTTP 200/202 and an audit log with bypass=False, scopes=feedser.jobs.trigger.
- Negative test without token
  - Call the same endpoint without a token. Expect HTTP 401, bypass=False.
  - If the request succeeds, double-check bypassNetworks and ensure fallback is disabled.
- Bypass check (if applicable)
  - From an allowed maintenance IP, call /jobs/definitions without a token. Confirm the audit log shows bypass=True. Review business justification and expiry date for such entries.
- Metrics validation
  - Ensure the web.jobs.triggered counter increments during accepted runs.
  - Exporters should show corresponding spans (feedser.job.trigger) if tracing is enabled.
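The positive and negative smoke tests can be scripted so they run the same way after every rollout. The Python sketch below uses only the standard library; probe and run_smoke_test are hypothetical helper names, the base URL is the example host from this runbook, and the token is assumed to have been obtained separately (for instance with the CLI login step above).

```python
import urllib.error
import urllib.request


def probe(url, token=None, opener=urllib.request.urlopen):
    """Issue a GET and return the HTTP status code, with or without a token."""
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with opener(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the status code is what matters here.
        return error.code


def run_smoke_test(base_url, token, opener=urllib.request.urlopen):
    """Check that a valid token is accepted and an anonymous call is rejected."""
    url = f"{base_url}/jobs/definitions"
    return {
        "with_token_ok": probe(url, token, opener) in (200, 202),
        "without_token_rejected": probe(url, None, opener) == 401,
    }
```

A call such as run_smoke_test("https://feedser.internal", token) returns two booleans; if without_token_rejected is False, the caller's IP probably matches a bypass network or anonymous fallback is still enabled. The opener parameter exists so the check can be exercised against a stub without network access.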
5. Troubleshooting
| Symptom | Probable cause | Remediation | 
|---|---|---|
| Audit log shows clientId=(none) for all requests | Authority not issuing client_id claim or CLI outdated | Update StellaOps Authority configuration (StellaOpsAuthorityOptions.Token.Claims.ClientId), or upgrade the CLI token acquisition flow. | 
| Requests succeed with bypass=True unexpectedly | Local network added to bypassNetworks or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Feedser. | 
| HTTP 401 with valid token | requiredScopes missing from client registration or token audience mismatch | Verify Authority client scopes (feedser.jobs.trigger) and ensure the token audience matches the audiences config. | 
| Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set feedser.telemetry.enableMetrics=true, ensure collector includes the StellaOps.Feedser.WebService.Jobs meter. | 
| Sudden spike in web.jobs.trigger.failed | Downstream job failure or Authority timeout mid-request | Inspect Feedser job logs, re-run with tracing enabled, validate Authority latency. | 
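When a request shows bypass=True unexpectedly, it helps to confirm which configured CIDR the remote address actually falls into. This Python sketch uses the standard ipaddress module; matching_bypass_networks is a hypothetical helper, and it only mirrors plain CIDR containment, not any forwarded-header handling Feedser may apply when recording the remote field.

```python
import ipaddress


def matching_bypass_networks(remote_ip, bypass_networks):
    """Return the configured CIDR entries that contain remote_ip."""
    address = ipaddress.ip_address(remote_ip)
    matches = []
    for cidr in bypass_networks:
        # strict=False tolerates entries whose host bits are set, e.g. 10.1.4.7/24.
        network = ipaddress.ip_network(cidr, strict=False)
        if address.version == network.version and address in network:
            matches.append(cidr)
    return matches
```

Feeding it the remote value from the audit entry and the bypassNetworks list from feedser.yaml shows whether an overly broad range explains the bypass; an empty result points to anonymous fallback instead.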
6. References
- docs/21_INSTALL_GUIDE.md – Authority configuration quick start.
- docs/17_SECURITY_HARDENING_GUIDE.md – Security guardrails and enforcement deadlines.
- docs/ops/authority-monitoring.md – Authority-side monitoring and alerting playbook.
- StellaOps.Feedser.WebService/Filters/JobAuthorizationAuditFilter.cs – source of audit log fields.