Add authority bootstrap flows and Concelier ops runbooks
@@ -15,7 +15,7 @@ Audit events share the `StellaOps.Cryptography.Audit.AuthEventRecord` contract.
|
||||
- `Client` — `AuthEventClient` with client identifier, display name, and originating provider/plugin.
|
||||
- `Scopes` — granted or requested OAuth scopes (sorted before emission).
|
||||
- `Network` — `AuthEventNetwork` with remote address, forwarded headers, and user agent string (all treated as PII).
|
||||
- `Properties` — additional `AuthEventProperty` entries for context-specific details (lockout durations, policy decisions, retries, etc.).
|
||||
- `Properties` — additional `AuthEventProperty` entries for context-specific details (lockout durations, policy decisions, retries, `request.tampered`/`request.unexpected_parameter`, `bootstrap.invite_token`, etc.).
|
||||
|
||||
## Data Classifications
|
||||
|
||||
@@ -33,7 +33,13 @@ Event names follow dotted notation:
|
||||
|
||||
- `authority.password.grant` — password grant handled by OpenIddict.
|
||||
- `authority.client_credentials.grant` — client credential grant handling.
|
||||
- `authority.token.tamper` — suspicious `/token` request detected (unexpected parameters or manipulated payload).
|
||||
- `authority.bootstrap.user` and `authority.bootstrap.client` — bootstrap API operations.
|
||||
- `authority.bootstrap.invite.created` — operator created a bootstrap invite.
|
||||
- `authority.bootstrap.invite.consumed` — invite consumed during user/client provisioning.
|
||||
- `authority.bootstrap.invite.expired` — invite expired without being used.
|
||||
- `authority.bootstrap.invite.rejected` — invite was rejected (invalid, mismatched provider/target, or already consumed).
|
||||
- `authority.token.replay.suspected` — replay heuristics detected a token being used from a new device fingerprint.
|
||||
- Future additions should preserve the `authority.<surface>.<action>` pattern to keep filtering deterministic.
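
The naming rule above can be checked mechanically. The following is a minimal sketch, not part of StellaOps: `split_event_name` and the regular expression are hypothetical, and they assume that segments are lowercase, may contain underscores, and that everything after the surface belongs to the action (as in `authority.bootstrap.invite.created`).

```python
import re
from typing import Tuple

# Assumed shape: "authority." + <surface> + "." + <action>, where the action
# may itself be dotted (for example "invite.created").
EVENT_NAME = re.compile(
    r"^authority\.([a-z][a-z0-9_]*)\.([a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)*)$"
)


def split_event_name(name: str) -> Tuple[str, str]:
    """Return (surface, action) for a well-formed event name, else raise ValueError."""
    match = EVENT_NAME.match(name)
    if match is None:
        raise ValueError(f"not an authority.<surface>.<action> event name: {name!r}")
    return match.group(1), match.group(2)
```

A filter built on this split can group events by surface (`token`, `bootstrap`, `password`) without hard-coding every action.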
|
||||
|
||||
## Persistence
|
||||
|
||||
@@ -82,9 +82,9 @@ flowchart LR
|
||||
| Threat | STRIDE Vector | Surface | Risk (L×I) | Existing Controls | Gaps / Actions | Owner |
|
||||
|--------|---------------|---------|------------|-------------------|----------------|-------|
|
||||
| Spoofed revocation bundle | Spoofing | TB5 — Authority ↔️ Agents | Med×High | Detached JWS signature (planned), offline kit checksums | Finalise signing key registry & verification script (SEC4.B/SEC4.HOST); add bundle freshness requirement | Security Guild (follow-up: **SEC5.B**) |
|
||||
| Parameter tampering on `/token` | Tampering | TB1 — Public ingress | Med×High | ASP.NET model validation, OpenIddict, rate limiter (CORE8.RL) | Add audit coverage for tampered inputs, align correlation IDs with SOC (SEC2.A/SEC2.B) | Security Guild + Authority Core (follow-up: **SEC5.C**) |
|
||||
| Bootstrap invite replay | Repudiation | TB4 — Operator CLI ↔️ Authority | Low×High | One-time bootstrap tokens, Argon2id hashing on creation | Enforce invite expiration + audit trail for unused invites | Security Guild (follow-up: **SEC5.D**) |
|
||||
| Token replay by stolen agent | Information Disclosure | TB5 | Med×High | Planned revocation bundles, optional mTLS | Require agent binding (device fingerprint) and enforce revocation grace window alerts | Security Guild + Zastava (follow-up: **SEC5.E**) |
|
||||
| Parameter tampering on `/token` | Tampering | TB1 — Public ingress | Med×High | ASP.NET model validation, OpenIddict, rate limiter (CORE8.RL) | Tampered requests emit `authority.token.tamper` audit events (`request.tampered`, unexpected parameter names) correlating with `/token` outcomes (SEC5.C) | Security Guild + Authority Core (follow-up: **SEC5.C**) |
|
||||
| Bootstrap invite replay | Repudiation | TB4 — Operator CLI ↔️ Authority | Low×High | One-time bootstrap tokens, Argon2id hashing on creation | Invites expire automatically and emit audit events on consumption/expiration (SEC5.D) | Security Guild |
|
||||
| Token replay by stolen agent | Information Disclosure | TB5 | Med×High | Signed revocation bundles, device fingerprint heuristics, optional mTLS | Monitor revocation acknowledgement latency via Zastava and tune replay alerting thresholds | Security Guild + Zastava (follow-up: **SEC5.E**) |
|
||||
| Privilege escalation via plug-in override | Elevation of Privilege | TB3 — Plug-in sandbox | Med×High | Signed plug-ins, restart-only loading, configuration validation | Add static analysis on manifest overrides + runtime warning when policy weaker than host | Security Guild + DevOps (follow-up: **SEC5.F**) |
|
||||
| Offline bundle tampering | Tampering | Distribution | Low×High | SHA256 manifest, signed bundles (planned) | Add supply-chain attestation for Offline Kit, publish verification CLI in docs | Security Guild + Ops (follow-up: **SEC5.G**) |
|
||||
| Failure to log denied tokens | Repudiation | TB2 — Authority ↔️ Mongo | Med×Med | Serilog structured events (partial), Mongo persistence path (planned) | Finalise audit schema (SEC2.A) and ensure `/token` denies include subject/client/IP fields | Security Guild + Authority Core (follow-up: **SEC5.H**) |
|
||||
@@ -98,7 +98,7 @@ Risk scoring uses qualitative scale (Low/Med/High) for likelihood × impact; mit
|
||||
| SEC5.B | Spoofed revocation bundle | Complete libsodium/Core signing integration and ship revocation verification script. | Security Guild + Authority Core |
|
||||
| SEC5.C | Parameter tampering on `/token` | Finalise audit contract (`SEC2.A`) and add request tamper logging. | Security Guild + Authority Core |
|
||||
| SEC5.D | Bootstrap invite replay | Implement expiry enforcement + audit coverage for unused bootstrap invites. | Security Guild |
|
||||
| SEC5.E | Token replay by stolen agent | Document device binding requirements and create detector for stale revocation acknowledgements. | Security Guild + Zastava |
|
||||
| SEC5.E | Token replay by stolen agent | Coordinate Zastava alerting with the new device fingerprint heuristics and surface stale revocation acknowledgements. | Security Guild + Zastava |
|
||||
| SEC5.F | Plug-in override escalation | Static analysis of plug-in manifests; warn on weaker password policy overrides. | Security Guild + DevOps |
|
||||
| SEC5.G | Offline bundle tampering | Extend Offline Kit build to include attested manifest + verification CLI sample. | Security Guild + Ops |
|
||||
| SEC5.H | Failure to log denied tokens | Ensure audit persistence for all `/token` denials with correlation IDs. | Security Guild + Authority Core |
|
||||
|
||||
76
docs/security/rate-limits.md
Normal file
@@ -0,0 +1,76 @@
|
||||
# StellaOps Authority Rate Limit Guidance
|
||||
|
||||
StellaOps Authority applies fixed-window rate limiting to critical endpoints so that brute-force and burst traffic are throttled before they can exhaust downstream resources. This guide complements the lockout policy documentation and captures the recommended defaults, override scenarios, and monitoring practices for `/token`, `/authorize`, and `/internal/*` routes.
|
||||
|
||||
## Configuration Overview
|
||||
|
||||
Rate limits live under `security.rateLimiting` in `authority.yaml` (and map to the same hierarchy for environment variables). Each endpoint exposes:
|
||||
|
||||
- `enabled`: toggles the limiter.
- `permitLimit`: maximum requests per fixed window.
- `window`: window duration expressed as an `hh:mm:ss` timespan (e.g., `00:01:00` for one minute).
- `queueLimit`: number of requests allowed to queue when the window is exhausted.
|
||||
|
||||
```yaml
security:
  rateLimiting:
    token:
      enabled: true
      permitLimit: 30
      window: 00:01:00
      queueLimit: 0
    authorize:
      enabled: true
      permitLimit: 60
      window: 00:01:00
      queueLimit: 10
    internal:
      enabled: false
      permitLimit: 5
      window: 00:01:00
      queueLimit: 0
```
|
||||
|
||||
When limits trigger, middleware decorates responses with `Retry-After` headers and log tags (`authority.endpoint`, `authority.client_id`, `authority.remote_ip`) so operators can correlate events with clients and source IPs.
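
To make the interaction between `permitLimit`, `window`, and `Retry-After` concrete, here is a simplified sketch of fixed-window accounting. It is not the ASP.NET Core middleware: `FixedWindowLimiter` is a hypothetical class, windows are assumed to align to multiples of the window length, queueing is ignored (as with `queueLimit: 0`), and the retry hint is assumed to be the time left in the current window.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FixedWindowLimiter:
    permit_limit: int       # corresponds to permitLimit
    window_seconds: float   # corresponds to window, in seconds
    window_index: int = -1  # index of the window currently being counted
    used: int = 0           # permits consumed in that window

    def try_acquire(self, now: float) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for a request arriving at `now`."""
        index = int(now // self.window_seconds)
        if index != self.window_index:
            # A new window started: reset the counter.
            self.window_index = index
            self.used = 0
        if self.used < self.permit_limit:
            self.used += 1
            return True, 0
        # Rejected: suggest waiting until the current window ends.
        window_end = (index + 1) * self.window_seconds
        return False, max(1, math.ceil(window_end - now))
```

With `permit_limit=30` and `window_seconds=60`, the 31st request inside one minute is rejected and receives a retry hint equal to the seconds remaining in that minute.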
|
||||
|
||||
Environment overrides follow the same hierarchy. For example:
|
||||
|
||||
```
STELLAOPS_AUTHORITY__SECURITY__RATELIMITING__TOKEN__PERMITLIMIT=60
STELLAOPS_AUTHORITY__SECURITY__RATELIMITING__TOKEN__WINDOW=00:01:00
```
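
The mapping from a YAML path to a variable name can be sketched as follows. `env_var_name` is a hypothetical helper, and the `STELLAOPS_AUTHORITY` prefix, the upper-casing, and the double-underscore separator are inferred from the two examples above rather than taken from a documented rule.

```python
def env_var_name(config_path: str, prefix: str = "STELLAOPS_AUTHORITY") -> str:
    """Translate a dotted config path into the assumed environment variable name."""
    # Each path segment is upper-cased and joined with a double underscore.
    segments = [prefix] + [part.upper() for part in config_path.split(".")]
    return "__".join(segments)


print(env_var_name("security.rateLimiting.authorize.queueLimit"))
```

Under those assumptions the call prints `STELLAOPS_AUTHORITY__SECURITY__RATELIMITING__AUTHORIZE__QUEUELIMIT`.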
|
||||
|
||||
## Recommended Profiles
|
||||
|
||||
| Scenario | permitLimit | window | queueLimit | Notes |
|----------|-------------|--------|------------|-------|
| Default production | 30 | 60s | 0 | Balances anonymous quota (33 scans/day) with headroom for tenant bursts. |
| High-trust clustered IPs | 60 | 60s | 5 | Requires WAF allowlist + alert `aspnetcore_rate_limiting_rejections_total{limiter="authority-token"} <= 1%` sustained. |
| Air-gapped lab | 10 | 120s | 0 | Lower concurrency reduces noise when running from shared bastion hosts. |
| Incident lockdown | 5 | 300s | 0 | Pair with credential lockout limit of 3 attempts and SOC paging for each denial. |
|
||||
|
||||
### Lockout Interplay
|
||||
|
||||
- Rate limiting throttles by IP/client; lockout policies apply per subject. Keep both enabled.
- During lockdown scenarios, reduce `security.lockout.maxFailures` alongside the rate limits above so that subjects face quicker escalation.
- Map support playbooks to the observed `Retry-After` value: anything above 120 seconds should trigger manual investigation before re-enabling clients.
|
||||
|
||||
## Monitoring and Alerts
|
||||
|
||||
1. **Metrics**
   - `aspnetcore_rate_limiting_rejections_total{limiter="authority-token"}` for `/token`.
   - `aspnetcore_rate_limiting_rejections_total{limiter="authority-authorize"}` for `/authorize`.
   - Custom counters derived from the structured log tags (`authority.remote_ip`, `authority.client_id`).
2. **Dashboards**
   - Requests vs. rejections per endpoint.
   - Top offending clients/IP ranges in the current window.
   - Heatmap of `Retry-After` durations to spot persistent throttling.
3. **Alerts**
   - Notify SOC when 429 rates exceed 25% for five consecutive minutes on any limiter.
   - Trigger client-specific alerts when a single `client_id` produces >100 throttle events/hour.
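
The two alert conditions can be expressed as simple predicates. This is an illustrative sketch only: `should_page_soc` and `noisy_clients` are hypothetical names, and the inputs (per-minute request/rejection pairs and hourly throttle counts per client) are assumed to come from whatever metrics pipeline is in use.

```python
from typing import List, Mapping, Sequence, Tuple


def should_page_soc(
    samples: Sequence[Tuple[int, int]], threshold: float = 0.25, minutes: int = 5
) -> bool:
    """samples holds per-minute (total_requests, rejected_requests), oldest first.

    Returns True only when each of the last `minutes` samples has a rejection
    ratio strictly above `threshold`.
    """
    if len(samples) < minutes:
        return False
    recent = samples[-minutes:]
    return all(total > 0 and rejected / total > threshold for total, rejected in recent)


def noisy_clients(throttle_events_per_hour: Mapping[str, int], limit: int = 100) -> List[str]:
    """Return client identifiers whose hourly throttle count exceeds `limit`."""
    return sorted(c for c, count in throttle_events_per_hour.items() if count > limit)
```

A single minute at or below the threshold resets the SOC condition, which matches the "five consecutive minutes" wording.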
|
||||
|
||||
## Operational Checklist
|
||||
|
||||
- Validate updated limits in staging before production rollout; smoke-test with representative workload.
- When raising limits, confirm audit events continue to capture `authority.client_id`, `authority.remote_ip`, and correlation IDs for throttle responses.
- Document any overrides in the change log, including justification and expiry review date.
|
||||
@@ -43,6 +43,7 @@ Consumers MUST treat the combination of `schemaVersion` and `sequence` as a mono
|
||||
{
|
||||
"alg": "ES256",
|
||||
"kid": "{signingKeyId}",
|
||||
"provider": "{providerName}",
|
||||
"typ": "application/vnd.stellaops.revocation-bundle+jws",
|
||||
"b64": false,
|
||||
"crit": ["b64"]
|
||||
@@ -54,8 +55,28 @@ Verification steps:
|
||||
|
||||
1. Validate `revocation-bundle.json` against the schema.
|
||||
2. Re-compute SHA-256 and compare with `.sha256` (if present).
|
||||
3. Resolve the signing key from JWKS (`/.well-known/jwks.json`) or the offline key bundle.
|
||||
4. Verify the detached JWS using the stored signing key (example tooling coming with `stella auth revoke verify`).
|
||||
3. Resolve the signing key from JWKS (`/.well-known/jwks.json`) or the offline key bundle, preferring the provider declared in the JWS header (`provider` falls back to `default`).
|
||||
4. Verify the detached JWS using the resolved provider. The CLI mirrors Authority resolution, so builds compiled with `StellaOpsCryptoSodium=true` automatically use the libsodium provider when advertised; otherwise verification downgrades to the managed fallback.
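
The header inspection that precedes signature verification might look like the sketch below. It makes two assumptions that the text does not confirm: the `.jws` file holds a compact serialization with an empty payload segment (`<header>..<signature>`), and a missing `provider` field resolves to `default`. Both function names are hypothetical, and the sketch does not verify the signature itself.

```python
import base64
import json


def read_protected_header(compact_jws: str) -> dict:
    """Decode the protected header of a detached compact JWS (header..signature)."""
    header_b64, payload, _signature = compact_jws.strip().split(".")
    if payload != "":
        raise ValueError("expected a detached JWS with an empty payload segment")
    # base64url values omit padding, so restore it before decoding.
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def check_header(header: dict, expected_alg: str = "ES256") -> str:
    """Validate the header fields and return the provider name to resolve."""
    if header.get("alg") != expected_alg:
        raise ValueError(f"unexpected alg: {header.get('alg')!r}")
    if header.get("b64") is not False:
        raise ValueError("header must advertise b64: false")
    if "b64" not in header.get("crit", []):
        raise ValueError("b64 must be listed in crit")
    # Assumed fallback when the header carries no provider hint.
    return header.get("provider") or "default"
```

The returned provider name is only a hint; whether that provider is actually available is decided by the verifier's crypto registry.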
|
||||
|
||||
### CLI verification workflow
|
||||
|
||||
Use the bundled CLI command before distributing a bundle:
|
||||
|
||||
```bash
stellaops auth revoke verify \
  --bundle artifacts/revocation-bundle.json \
  --signature artifacts/revocation-bundle.json.jws \
  --key etc/authority/signing/authority-public.pem \
  --verbose
```
|
||||
|
||||
The verifier performs three checks:
|
||||
|
||||
1. Prints the computed digest in `sha256:<hex>` format. Compare it with the exported `.sha256` artefact.
2. Confirms that the detached JWS header advertises `b64: false`, captures the provider hint, and checks that the algorithm matches the Authority configuration (ES256 unless overridden).
3. Registers the supplied PEM key with the crypto provider registry and validates the signature (falling back to the managed provider when the hinted provider is unavailable).
|
||||
|
||||
A zero exit code means the bundle is ready for mirroring/import. Non-zero codes signal missing arguments, malformed JWS payloads, or signature mismatches; regenerate or re-sign the bundle before distribution.
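
The digest comparison from the first check can also be reproduced independently of the CLI. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the `.sha256` artefact is assumed to start with the hex digest, optionally prefixed by `sha256:` and optionally followed by a file name.

```python
import hashlib


def bundle_digest(bundle_bytes: bytes) -> str:
    """Return the digest of the bundle bytes in sha256:<hex> form."""
    return "sha256:" + hashlib.sha256(bundle_bytes).hexdigest()


def matches_recorded_digest(bundle_bytes: bytes, recorded_text: str) -> bool:
    """Compare the computed digest with the contents of a .sha256 artefact."""
    tokens = recorded_text.split()
    if not tokens:
        return False
    # Assumed layout: the first token is the digest, with or without a prefix.
    recorded = tokens[0].lower()
    if recorded.startswith("sha256:"):
        recorded = recorded[len("sha256:"):]
    return bundle_digest(bundle_bytes) == "sha256:" + recorded
```

In practice the arguments would come from reading `revocation-bundle.json` in binary mode and its `.sha256` companion as text; hashing the exact bytes on disk matters because any re-serialization of the JSON changes the digest.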
|
||||
|
||||
## Example
|
||||
|
||||
@@ -64,7 +85,7 @@ The repository contains an [example bundle](revocation-bundle-example.json) demo
|
||||
## Operations Quick Reference
|
||||
|
||||
- `stella auth revoke export` emits a canonical JSON bundle, `.sha256` digest, and detached JWS signature in one command. Use `--output` to write into your mirror staging directory.
|
||||
- `stella auth revoke verify` validates a bundle using cached JWKS or an offline PEM key and reports digest mismatches before distribution.
|
||||
- `stella auth revoke verify` validates a bundle using cached JWKS or an offline PEM key, honours the `provider` metadata embedded in the signature, and reports digest mismatches before distribution.
|
||||
- `POST /internal/revocations/export` provides the same payload for orchestrators that already talk to the bootstrap API.
|
||||
- `POST /internal/signing/rotate` rotates JWKS material without downtime; always export a fresh bundle afterward so downstream mirrors receive signatures from the new `kid`.
|
||||
- Offline Kit automation should mirror `revocation-bundle.json*` alongside Feedser exports so agents ingest revocations during the same sync pass.
|
||||
|
||||