Concelier Authority Audit Runbook
Last updated: 2025-10-22
This runbook helps operators verify and monitor the StellaOps Concelier ⇆ Authority integration. It focuses on the /jobs* surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity.
1. Prerequisites
- Authority integration is enabled in concelier.yaml (or via CONCELIER_AUTHORITY__* environment variables) with a valid clientId, secret, audience, and required scopes.
- OTLP metrics/log exporters are configured (concelier.telemetry.*) or container stdout is shipped to your SIEM.
- Operators have access to the Concelier job trigger endpoints via CLI or REST for smoke tests.
- The rollout table in docs/10_CONCELIER_CLI_QUICKSTART.md has been reviewed so stakeholders align on the staged → enforced toggle timeline.
Configuration snippet
concelier:
  authority:
    enabled: true
    allowAnonymousFallback: false # keep true only during initial rollout
    issuer: "https://authority.internal"
    audiences:
      - "api://concelier"
    requiredScopes:
      - "concelier.jobs.trigger"
    bypassNetworks:
      - "127.0.0.1/32"
      - "::1/128"
    clientId: "concelier-jobs"
    clientSecretFile: "/run/secrets/concelier_authority_client"
    tokenClockSkewSeconds: 60
    resilience:
      enableRetries: true
      retryDelays:
        - "00:00:01"
        - "00:00:02"
        - "00:00:05"
      allowOfflineCacheFallback: true
      offlineCacheTolerance: "00:10:00"
Store secrets outside source control. Concelier reads clientSecretFile on startup; rotate by updating the mounted file and restarting the service.
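To reason about which callers the bypassNetworks entries cover, the CIDR match can be sketched in a few lines of Python. This is an illustration only, not Concelier's implementation: the function name matches_bypass is hypothetical, and how Concelier derives the remote address (for example from forwarded headers) is not modelled here.

```python
import ipaddress

# CIDR entries copied from the configuration snippet above.
BYPASS_NETWORKS = ["127.0.0.1/32", "::1/128"]


def matches_bypass(remote_ip, networks=BYPASS_NETWORKS):
    """Return True when remote_ip falls inside any configured CIDR (hypothetical helper)."""
    address = ipaddress.ip_address(remote_ip)
    for cidr in networks:
        network = ipaddress.ip_network(cidr, strict=False)
        # An IPv4 address can never match an IPv6 network, and vice versa.
        if address.version == network.version and address in network:
            return True
    return False


if __name__ == "__main__":
    for candidate in ("127.0.0.1", "::1", "10.1.4.7"):
        print(candidate, matches_bypass(candidate))
```

Running a planned maintenance IP through a check like this before adding a CIDR helps confirm that the range is no wider than intended.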
Resilience tuning
- Connected sites: keep the default 1 s / 2 s / 5 s retry ladder so Concelier retries transient Authority hiccups but still surfaces outages quickly. Leave allowOfflineCacheFallback=true so cached discovery/JWKS data can bridge short Pathfinder restarts.
- Air-gapped/Offline Kit installs: extend offlineCacheTolerance (15–30 minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (enableRetries=false) if infrastructure teams prefer to handle exponential backoff at the network layer; Concelier will fail fast but keep deterministic logs.
- Concelier resolves these knobs through IOptionsMonitor<StellaOpsAuthClientOptions>. Edits to concelier.yaml are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled.
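When tuning these values it helps to see how long the retry ladder waits in total compared with the offline cache tolerance. The sketch below assumes the hh:mm:ss strings shown in the configuration snippet; parse_timespan is a hypothetical helper that handles only that form, not every format the service may accept, and the sum covers only the configured delays, not the duration of each request.

```python
from datetime import timedelta


def parse_timespan(value):
    """Parse an "hh:mm:ss" string into a timedelta (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Values copied from the configuration snippet above.
RETRY_DELAYS = ["00:00:01", "00:00:02", "00:00:05"]
OFFLINE_CACHE_TOLERANCE = "00:10:00"

# Total time spent waiting between attempts if every retry is used.
retry_budget = sum((parse_timespan(d) for d in RETRY_DELAYS), timedelta())
tolerance = parse_timespan(OFFLINE_CACHE_TOLERANCE)

if __name__ == "__main__":
    print("retry wait budget (s):", retry_budget.total_seconds())
    print("offline cache tolerance (s):", tolerance.total_seconds())
```

With the defaults, the ladder adds up to 8 seconds of waiting, while cached metadata stays acceptable for 600 seconds.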
2. Key Signals
2.1 Audit log channel
Concelier emits structured audit entries via the Concelier.Authorization.Audit logger for every /jobs* request once Authority enforcement is active.
Concelier authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=concelier-cli scopes=concelier.jobs.trigger bypass=False remote=10.1.4.7
| Field | Sample value | Meaning |
|---|---|---|
| route | /jobs/definitions | Endpoint that processed the request. |
| status | 200 / 401 / 409 | Final HTTP status code returned to the caller. |
| subject | ops@example.com | User or service principal subject (falls back to (anonymous) when unauthenticated). |
| clientId | concelier-cli | OAuth client ID provided by Authority ((none) if the token lacked the claim). |
| scopes | concelier.jobs.trigger | Normalised scope list extracted from token claims; (none) if the token carried none. |
| bypass | True / False | Indicates whether the request succeeded because its source IP matched a bypass CIDR. |
| remote | 10.1.4.7 | Remote IP recorded from the connection / forwarded header test hooks. |
Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations:
- status=401 AND bypass=True – bypass network accepted an unauthenticated call (should be temporary during rollout).
- status=202 AND scopes="(none)" – a token without scopes triggered a job; tighten client configuration.
- Spike in clientId="(none)" – indicates upstream Authority is not issuing client_id claims or the CLI is outdated.
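Where the logging backend cannot index the fields directly, the same filters can be applied to raw log lines. The following is a minimal sketch that assumes the key=value layout of the sample entry above, with no spaces inside values; parse_audit_line and suspicious_reasons are hypothetical names, not part of Concelier.

```python
import re

# Matches key=value pairs such as "status=200" or "scopes=(none)".
FIELD_PATTERN = re.compile(r"(\w+)=(\S+)")


def parse_audit_line(line):
    """Split an audit log line into a field dictionary (assumes values contain no spaces)."""
    return dict(FIELD_PATTERN.findall(line))


def suspicious_reasons(entry):
    """Return the suspicious combinations listed above that apply to this entry."""
    reasons = []
    if entry.get("status") == "401" and entry.get("bypass") == "True":
        reasons.append("401 with bypass=True")
    if entry.get("status") == "202" and entry.get("scopes") == "(none)":
        reasons.append("job triggered without scopes")
    if entry.get("clientId") == "(none)":
        reasons.append("missing client_id claim")
    return reasons


SAMPLE = (
    "Concelier authorization audit route=/jobs/definitions status=200 "
    "subject=ops@example.com clientId=concelier-cli "
    "scopes=concelier.jobs.trigger bypass=False remote=10.1.4.7"
)

if __name__ == "__main__":
    entry = parse_audit_line(SAMPLE)
    print(entry)
    print(suspicious_reasons(entry))
```

A single clientId=(none) entry is flagged here for simplicity; in practice the signal of interest is a spike, so counting matches over a time window is the more useful follow-up.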
2.2 Metrics
Concelier publishes counters under the OTEL meter StellaOps.Concelier.WebService.Jobs. Tags: job.kind, job.trigger, job.outcome.
| Metric name | Description | PromQL example |
|---|---|---|
| web.jobs.triggered | Accepted job trigger requests. | sum by (job_kind) (rate(web_jobs_triggered_total[5m])) |
| web.jobs.trigger.conflict | Rejected triggers (already running, disabled…). | sum(rate(web_jobs_trigger_conflict_total[5m])) |
| web.jobs.trigger.failed | Server-side job failures. | sum(rate(web_jobs_trigger_failed_total[5m])) |
Prometheus/OTEL collectors typically surface counters with a _total suffix. Adjust queries to match your pipeline's generated metric names.
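The mapping between the meter names and the PromQL names in the table follows a simple pattern: dots become underscores and _total is appended. The helper below is a hypothetical sketch of that pattern only; a given collector may also add namespaces or unit suffixes, so treat the result as a starting point for searching your metric catalogue.

```python
def prometheus_counter_name(otel_name):
    """Derive the Prometheus-style counter name used in the table above (hypothetical helper)."""
    name = otel_name.replace(".", "_")
    # Avoid doubling the suffix when the instrument name already carries it.
    return name if name.endswith("_total") else name + "_total"


if __name__ == "__main__":
    for metric in ("web.jobs.triggered", "web.jobs.trigger.conflict", "web.jobs.trigger.failed"):
        print(metric, "->", prometheus_counter_name(metric))
```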
Correlate audit logs with the following global meter exported via Concelier.SourceDiagnostics:
- concelier.source.http.requests_total{concelier_source="jobs-run"} – ensures REST/manual triggers route through Authority.
- If Grafana dashboards are deployed, extend the “Concelier Jobs” board with the above counters plus a table of recent audit log entries.
3. Alerting Guidance
- Unauthorized bypass attempt
  - Query: sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", status="401", bypass="True"}[5m])) > 0
  - Action: verify the bypassNetworks list; confirm expected maintenance windows; rotate credentials if suspicious.
- Missing scopes
  - Query: sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0
  - Action: audit Authority client registration; ensure requiredScopes includes concelier.jobs.trigger.
- Trigger failure surge
  - Query: sum(rate(web_jobs_trigger_failed_total[10m])) > 0 with severity warning if sustained for 10 minutes.
  - Action: inspect correlated audit entries and Concelier.Telemetry traces for job execution errors.
- Conflict spike
  - Query: sum(rate(web_jobs_trigger_conflict_total[10m])) > 5 (tune threshold).
  - Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly.
- Authority offline
  - Watch Concelier.Authorization.Audit logs for status=503 or status=500 along with clientId="(none)". Investigate Authority availability before re-enabling anonymous fallback.
4. Rollout & Verification Procedure
- Pre-checks
  - Align with the rollout phases documented in docs/10_CONCELIER_CLI_QUICKSTART.md (validation → rehearsal → enforced) and record the target dates in your change request.
  - Confirm allowAnonymousFallback is false in production; keep it true only during staged validation.
  - Validate Authority issuer metadata is reachable from Concelier (curl https://authority.internal/.well-known/openid-configuration from the host).
- Smoke test with valid token
  - Obtain a token via CLI: stella auth login --scope concelier.jobs.trigger.
  - Trigger a read-only endpoint: curl -H "Authorization: Bearer $TOKEN" https://concelier.internal/jobs/definitions.
  - Expect HTTP 200/202 and an audit log with bypass=False, scopes=concelier.jobs.trigger.
- Negative test without token
  - Call the same endpoint without a token. Expect HTTP 401, bypass=False.
  - If the request succeeds, double-check bypassNetworks and ensure fallback is disabled.
- Bypass check (if applicable)
  - From an allowed maintenance IP, call /jobs/definitions without a token. Confirm the audit log shows bypass=True. Review business justification and expiry date for such entries.
- Metrics validation
  - Ensure the web.jobs.triggered counter increments during accepted runs.
  - Exporters should show corresponding spans (concelier.job.trigger) if tracing is enabled.
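The smoke test and the negative test can be scripted so they run identically at each rollout phase. The sketch below uses only the Python standard library and the host and path from the curl example; build_request, status_of, and the EXPECTED table are hypothetical names, and the token is assumed to have been obtained beforehand and passed in by the operator.

```python
import urllib.error
import urllib.request

BASE_URL = "https://concelier.internal"  # host taken from the curl example above

# Expected HTTP statuses per check, as described in the procedure.
EXPECTED = {
    "valid token": {200, 202},
    "no token": {401},
}


def build_request(path, token=None):
    """Build a GET request, adding a bearer token only when one is supplied."""
    request = urllib.request.Request(BASE_URL + path, method="GET")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def status_of(request):
    """Send the request and return the HTTP status, including error statuses."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def passes(check, status):
    """Compare an observed status with the expectation for the named check."""
    return status in EXPECTED[check]
```

A run would call status_of(build_request("/jobs/definitions", token)) for the valid-token check and status_of(build_request("/jobs/definitions")) for the negative test, then confirm the matching audit log entries by hand. The bypass check is left out on purpose, because its outcome depends on the caller's source IP.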
5. Troubleshooting
| Symptom | Probable cause | Remediation |
|---|---|---|
| Audit log shows clientId=(none) for all requests | Authority not issuing client_id claim or CLI outdated | Update StellaOps Authority configuration (StellaOpsAuthorityOptions.Token.Claims.ClientId), or upgrade the CLI token acquisition flow. |
| Requests succeed with bypass=True unexpectedly | Local network added to bypassNetworks or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Concelier. |
| HTTP 401 with valid token | requiredScopes missing from client registration or token audience mismatch | Verify Authority client scopes (concelier.jobs.trigger) and ensure the token audience matches audiences config. |
| Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set concelier.telemetry.enableMetrics=true, ensure collector includes StellaOps.Concelier.WebService.Jobs meter. |
| Sudden spike in web.jobs.trigger.failed | Downstream job failure or Authority timeout mid-request | Inspect Concelier job logs, re-run with tracing enabled, validate Authority latency. |
6. References
- docs/21_INSTALL_GUIDE.md – Authority configuration quick start.
- docs/17_SECURITY_HARDENING_GUIDE.md – Security guardrails and enforcement deadlines.
- docs/ops/authority-monitoring.md – Authority-side monitoring and alerting playbook.
- StellaOps.Concelier.WebService/Filters/JobAuthorizationAuditFilter.cs – source of audit log fields.