Concelier Authority Audit Runbook
Last updated: 2025-10-22
This runbook helps operators verify and monitor the StellaOps Concelier ⇆ Authority integration. It focuses on the /jobs* surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity.
1. Prerequisites
- Authority integration is enabled in concelier.yaml (or via CONCELIER_AUTHORITY__* environment variables) with a valid clientId, secret, audience, and required scopes.
- OTLP metrics/log exporters are configured (concelier.telemetry.*) or container stdout is shipped to your SIEM.
- Operators have access to the Concelier job trigger endpoints via CLI or REST for smoke tests.
- The rollout table in docs/10_CONCELIER_CLI_QUICKSTART.md has been reviewed so stakeholders align on the staged → enforced toggle timeline.
Configuration snippet
concelier:
  authority:
    enabled: true
    allowAnonymousFallback: false          # keep true only during initial rollout
    issuer: "https://authority.internal"
    audiences:
      - "api://concelier"
    requiredScopes:
      - "concelier.jobs.trigger"
      - "advisory:read"
      - "advisory:ingest"
    requiredTenants:
      - "tenant-default"
    bypassNetworks:
      - "127.0.0.1/32"
      - "::1/128"
    clientId: "concelier-jobs"
    clientSecretFile: "/run/secrets/concelier_authority_client"
    tokenClockSkewSeconds: 60
    resilience:
      enableRetries: true
      retryDelays:
        - "00:00:01"
        - "00:00:02"
        - "00:00:05"
      allowOfflineCacheFallback: true
      offlineCacheTolerance: "00:10:00"
Store secrets outside source control. Concelier reads clientSecretFile on startup; rotate by updating the mounted file and restarting the service.
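To make the bypassNetworks entries in the snippet above concrete, here is a minimal Python sketch that checks whether a remote address falls inside any configured CIDR. It uses only the standard-library ipaddress module. The helper name matches_bypass is hypothetical, and Concelier's real matching logic (for example, how it treats forwarded headers) may differ.

```python
import ipaddress

# CIDRs copied from the bypassNetworks sample in the configuration snippet.
BYPASS_NETWORKS = ["127.0.0.1/32", "::1/128"]


def matches_bypass(remote_ip: str, networks: list[str] = BYPASS_NETWORKS) -> bool:
    """Return True when remote_ip belongs to any of the given CIDR ranges.

    Hypothetical helper for reviewing a bypass list; not Concelier's own code.
    """
    address = ipaddress.ip_address(remote_ip)
    for cidr in networks:
        # strict=True raises ValueError for CIDRs with host bits set,
        # e.g. "10.0.0.5/8", which usually indicates a typo in the list.
        network = ipaddress.ip_network(cidr, strict=True)
        # Membership is simply False when the IP versions differ.
        if address in network:
            return True
    return False
```

Running the helper against the remote values seen in audit entries is a quick way to confirm that a bypass=True result is explained by the configured list rather than by a stale fallback setting.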
Resilience tuning
- Connected sites: keep the default 1 s / 2 s / 5 s retry ladder so Concelier retries transient Authority hiccups but still surfaces outages quickly. Leave allowOfflineCacheFallback=true so cached discovery/JWKS data can bridge short Pathfinder restarts.
- Air-gapped/Offline Kit installs: extend offlineCacheTolerance (15–30 minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (enableRetries=false) if infrastructure teams prefer to handle exponential backoff at the network layer; Concelier will fail fast but keep deterministic logs.
- Concelier resolves these knobs through IOptionsMonitor<StellaOpsAuthClientOptions>. Edits to concelier.yaml are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled.
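When tuning the ladder, it helps to see how much waiting the configured delays add up to. The Python sketch below parses the hh:mm:ss strings from the sample configuration and sums them. The function names are illustrative, the parser handles only the plain hh:mm:ss form shown in this runbook, and the total ignores the time each Authority request itself takes.

```python
from datetime import timedelta

# Values copied from the sample configuration in this runbook.
RETRY_DELAYS = ["00:00:01", "00:00:02", "00:00:05"]
OFFLINE_CACHE_TOLERANCE = "00:10:00"


def parse_timespan(value: str) -> timedelta:
    """Parse an 'hh:mm:ss' string such as '00:00:05' into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def total_retry_wait(delays: list[str]) -> timedelta:
    """Sum the configured delays: the waiting added once every retry is used."""
    return sum((parse_timespan(delay) for delay in delays), timedelta())
```

With the default ladder, total_retry_wait(RETRY_DELAYS) comes to 8 seconds, well inside the 10-minute offlineCacheTolerance from the sample.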
2. Key Signals
2.1 Audit log channel
Concelier emits structured audit entries via the Concelier.Authorization.Audit logger for every /jobs* request once Authority enforcement is active.
Concelier authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=concelier-cli scopes=concelier.jobs.trigger advisory:ingest bypass=False remote=10.1.4.7
| Field | Sample value | Meaning | 
|---|---|---|
| route | /jobs/definitions | Endpoint that processed the request. | 
| status | 200/401/409 | Final HTTP status code returned to the caller. | 
| subject | ops@example.com | User or service principal subject (falls back to (anonymous) when unauthenticated). | 
| clientId | concelier-cli | OAuth client ID provided by Authority ((none) if the token lacked the claim). | 
| scopes | concelier.jobs.trigger advisory:ingest advisory:read | Normalised scope list extracted from token claims; (none) if the token carried none. | 
| tenant | tenant-default | Tenant claim extracted from the Authority token ((none) when the token lacked it). | 
| bypass | True/False | Indicates whether the request succeeded because its source IP matched a bypass CIDR. | 
| remote | 10.1.4.7 | Remote IP recorded from the connection / forwarded header test hooks. | 
Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations:
- status=401 AND bypass=True – bypass network accepted an unauthenticated call (should be temporary during rollout).
- status=202 AND scopes="(none)" – a token without scopes triggered a job; tighten client configuration.
- status=202 AND NOT contains(scopes,"advisory:ingest") – ingestion attempted without the new AOC scopes; confirm the Authority client registration matches the sample above.
- tenant!=(tenant-default) – indicates a cross-tenant token was accepted. Ensure Concelier requiredTenants is aligned with Authority client registration.
- Spike in clientId="(none)" – indicates upstream Authority is not issuing client_id claims or the CLI is outdated.
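If your logging backend cannot express these filters directly, the same checks can run over exported log lines. The Python sketch below parses the key=value audit format shown in the sample entry and returns labels for the combinations listed above. It assumes the line layout matches that sample exactly. The function names and finding labels are hypothetical, and the clientId check flags single entries rather than detecting a spike.

```python
import re

# A value runs until the next " key=" or the end of the line, which keeps
# space-separated scope lists such as "a b c" together as one value.
FIELD_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_audit_line(line: str) -> dict[str, str]:
    """Split one audit log line into a field dictionary."""
    return {key: value.strip() for key, value in FIELD_RE.findall(line.strip())}


def flag_suspicious(entry: dict[str, str], expected_tenant: str = "tenant-default") -> list[str]:
    """Return labels for the suspicious combinations described in this section."""
    findings = []
    status = entry.get("status")
    scopes = entry.get("scopes", "(none)")
    if status == "401" and entry.get("bypass") == "True":
        findings.append("bypass-unauthenticated")
    if status == "202":
        if scopes == "(none)":
            findings.append("job-without-scopes")
        elif "advisory:ingest" not in scopes.split():
            findings.append("missing-advisory-ingest")
    # Lines without a tenant field are skipped rather than flagged (assumption).
    tenant = entry.get("tenant")
    if tenant is not None and tenant != expected_tenant:
        findings.append("unexpected-tenant")
    if entry.get("clientId") == "(none)":
        findings.append("missing-client-id")
    return findings
```

Feeding the sample entry from this section through parse_audit_line and flag_suspicious yields no findings, which makes it a convenient baseline when adapting the checks.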
2.2 Metrics
Concelier publishes counters under the OTEL meter StellaOps.Concelier.WebService.Jobs. Tags: job.kind, job.trigger, job.outcome.
| Metric name | Description | PromQL example | 
|---|---|---|
| web.jobs.triggered | Accepted job trigger requests. | sum by (job_kind) (rate(web_jobs_triggered_total[5m])) | 
| web.jobs.trigger.conflict | Rejected triggers (already running, disabled…). | sum(rate(web_jobs_trigger_conflict_total[5m])) | 
| web.jobs.trigger.failed | Server-side job failures. | sum(rate(web_jobs_trigger_failed_total[5m])) | 
Prometheus/OTEL collectors typically surface counters with a _total suffix. Adjust queries to match your pipeline’s generated metric names.
Correlate audit logs with the following global meter exported via Concelier.SourceDiagnostics:
- concelier.source.http.requests_total{concelier_source="jobs-run"} – ensures REST/manual triggers route through Authority.
- If Grafana dashboards are deployed, extend the “Concelier Jobs” board with the above counters plus a table of recent audit log entries.
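The PromQL examples in the table rely on the usual OTEL-to-Prometheus naming translation: dots become underscores and counters gain a _total suffix. The Python sketch below reproduces that convention so you can predict the series name for each meter instrument. The function names are hypothetical, and some exporters also add unit suffixes or namespace prefixes, so treat the output as a starting point and confirm against your collector.

```python
import re


def prometheus_counter_name(otel_name: str) -> str:
    """Approximate the Prometheus series name for an OTEL counter."""
    # Characters outside [a-zA-Z0-9_:] are not valid in Prometheus metric names.
    sanitized = re.sub(r"[^a-zA-Z0-9_:]", "_", otel_name)
    return sanitized if sanitized.endswith("_total") else f"{sanitized}_total"


def prometheus_label_name(otel_tag: str) -> str:
    """Approximate the Prometheus label name for an OTEL tag such as job.kind."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", otel_tag)
```

For example, web.jobs.trigger.conflict maps to web_jobs_trigger_conflict_total and the job.kind tag maps to the job_kind label used in the first query.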
3. Alerting Guidance
- Unauthorized bypass attempt
  - Query: sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", status="401", bypass="True"}[5m])) > 0
  - Action: verify the bypassNetworks list; confirm expected maintenance windows; rotate credentials if suspicious.
- Missing scopes
  - Query: sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0
  - Action: audit Authority client registration; ensure requiredScopes includes concelier.jobs.trigger, advisory:ingest, and advisory:read.
- Trigger failure surge
  - Query: sum(rate(web_jobs_trigger_failed_total[10m])) > 0 with severity warning if sustained for 10 minutes.
  - Action: inspect correlated audit entries and Concelier.Telemetry traces for job execution errors.
- Conflict spike
  - Query: sum(rate(web_jobs_trigger_conflict_total[10m])) > 5 (tune threshold).
  - Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly.
- Authority offline
  - Watch Concelier.Authorization.Audit logs for status=503 or status=500 along with clientId="(none)". Investigate Authority availability before re-enabling anonymous fallback.
4. Rollout & Verification Procedure
- Pre-checks
  - Align with the rollout phases documented in docs/10_CONCELIER_CLI_QUICKSTART.md (validation → rehearsal → enforced) and record the target dates in your change request.
  - Confirm allowAnonymousFallback is false in production; keep true only during staged validation.
  - Validate Authority issuer metadata is reachable from Concelier (curl https://authority.internal/.well-known/openid-configuration from the host).
- Smoke test with valid token
  - Obtain a token via CLI: stella auth login --scope "concelier.jobs.trigger advisory:ingest" --scope advisory:read.
  - Trigger a read-only endpoint: curl -H "Authorization: Bearer $TOKEN" https://concelier.internal/jobs/definitions.
  - Expect HTTP 200/202 and an audit log with bypass=False, scopes=concelier.jobs.trigger advisory:ingest advisory:read, and tenant=tenant-default.
- Negative test without token
  - Call the same endpoint without a token. Expect HTTP 401, bypass=False.
  - If the request succeeds, double-check bypassNetworks and ensure fallback is disabled.
- Bypass check (if applicable)
  - From an allowed maintenance IP, call /jobs/definitions without a token. Confirm the audit log shows bypass=True. Review business justification and expiry date for such entries.
- Metrics validation
  - Ensure the web.jobs.triggered counter increments during accepted runs.
  - Exporters should show corresponding spans (concelier.job.trigger) if tracing is enabled.
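The smoke, negative, and bypass checks above can be scripted so each rollout phase is verified the same way. The Python sketch below builds the /jobs/definitions request with or without a bearer token and encodes the expected outcome of each scenario. The function names and scenario labels are hypothetical, the bypass value is assumed to come from the matching audit log entry, and the bypass scenario checks only the audit flag because the procedure does not state an expected status for it.

```python
import urllib.request
from typing import Optional


def build_jobs_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Prepare a GET for /jobs/definitions, adding a bearer token when given."""
    request = urllib.request.Request(f"{base_url.rstrip('/')}/jobs/definitions")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def smoke_check_passes(scenario: str, status: int, bypass: bool) -> bool:
    """Compare an observed status and audit bypass flag with the procedure."""
    if scenario == "valid-token":
        return status in (200, 202) and not bypass
    if scenario == "no-token":
        return status == 401 and not bypass
    if scenario == "bypass-ip":
        # Only the audit flag is specified for this scenario.
        return bypass
    raise ValueError(f"unknown scenario: {scenario}")
```

To send a prepared request, pass it to urllib.request.urlopen; a 401 response surfaces as urllib.error.HTTPError, whose code attribute holds the status to feed into smoke_check_passes.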
5. Troubleshooting
| Symptom | Probable cause | Remediation | 
|---|---|---|
| Audit log shows clientId=(none) for all requests | Authority not issuing client_id claim or CLI outdated | Update StellaOps Authority configuration (StellaOpsAuthorityOptions.Token.Claims.ClientId), or upgrade the CLI token acquisition flow. | 
| Requests succeed with bypass=True unexpectedly | Local network added to bypassNetworks or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Concelier. | 
| HTTP 401 with valid token | requiredScopes missing from client registration or token audience mismatch | Verify Authority client scopes (concelier.jobs.trigger) and ensure the token audience matches audiences config. | 
| Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set concelier.telemetry.enableMetrics=true, ensure collector includes StellaOps.Concelier.WebService.Jobs meter. | 
| Sudden spike in web.jobs.trigger.failed | Downstream job failure or Authority timeout mid-request | Inspect Concelier job logs, re-run with tracing enabled, validate Authority latency. | 
6. References
- docs/21_INSTALL_GUIDE.md – Authority configuration quick start.
- docs/17_SECURITY_HARDENING_GUIDE.md – Security guardrails and enforcement deadlines.
- docs/ops/authority-monitoring.md – Authority-side monitoring and alerting playbook.
- StellaOps.Concelier.WebService/Filters/JobAuthorizationAuditFilter.cs – source of audit log fields.