up

This commit is contained in:
2025-10-12 20:37:18 +03:00
parent b97fc7685a
commit 607e72e2a1
306 changed files with 21409 additions and 4449 deletions

View File

@@ -228,9 +228,11 @@ See `docs/dev/32_AUTH_CLIENT_GUIDE.md` for recommended profiles (online vs. air-
| `stellaops-cli db merge` | Run canonical merge reconcile | — | Calls `POST /jobs/merge:reconcile`; exit code `0` on acceptance, `1` on failures/conflicts |
| `stellaops-cli db export` | Kick JSON / Trivy exports | `--format <json\|trivy-db>` (default `json`)<br>`--delta`<br>`--publish-full/--publish-delta`<br>`--bundle-full/--bundle-delta` | Sets `{ delta = true }` parameter when requested and can override ORAS/bundle toggles per run |
| `stellaops-cli auth <login\|logout\|status\|whoami>` | Manage cached tokens for StellaOps Authority | `auth login --force` (ignore cache)<br>`auth status`<br>`auth whoami` | Uses `StellaOps.Auth.Client`; honours `StellaOps:Authority:*` configuration, stores tokens under `~/.stellaops/tokens` by default, and `whoami` prints subject/scope/expiry |
| `stellaops-cli auth revoke export` | Export the Authority revocation bundle | `--output <directory>` (defaults to CWD) | Writes `revocation-bundle.json`, `.json.jws`, and `.json.sha256`; verifies the digest locally and includes key metadata in the log summary. |
| `stellaops-cli auth revoke verify` | Validate a revocation bundle offline | `--bundle <path>` `--signature <path>` `--key <path>`<br>`--verbose` | Verifies detached JWS signatures, reports the computed SHA-256, and can fall back to cached JWKS when `--key` is omitted. |
| `stellaops-cli config show` | Display resolved configuration | — | Masks secret values; helpful for airgapped installs |
When running on an interactive terminal without explicit override flags, the CLI uses Spectre.Console prompts to let you choose per-run ORAS/offline bundle behaviour.
| `stellaops-cli config show` | Display resolved configuration | — | Masks secret values; helpful for airgapped installs |
**Logging & exit codes**
@@ -340,6 +342,30 @@ Drop `appsettings.local.json` or `.yaml` beside the binary to override per envir
---
### 2.6 Authority Admin APIs
Administrative endpoints live under `/internal/*` on the Authority host and require the bootstrap API key (`x-stellaops-bootstrap-key`). Responses are deterministic and audited via `AuthEventRecord`.
| Path | Method | Description |
| ---- | ------ | ----------- |
| `/internal/revocations/export` | GET | Returns the revocation bundle (JSON + detached JWS + digest). Mirrors the output of `stellaops-cli auth revoke export`. |
| `/internal/signing/rotate` | POST | Promotes a new signing key and marks the previous key as retired without restarting the service. |
**Rotate request body**
```json
{
  "keyId": "authority-signing-2025",
  "location": "../certificates/authority-signing-2025.pem",
  "source": "file",
  "provider": "default"
}
```
The API responds with the active `kid`, previous key (if any), and the set of retired key identifiers. Always export a fresh revocation bundle after rotation so downstream mirrors receive signatures from the new key.
---
## 3 First-Party CLI Tools
### 3.1 `stella`

161
docs/11_AUTHORITY.md Normal file
View File

@@ -0,0 +1,161 @@
# StellaOps Authority Service
> **Status:** Drafted 2025-10-12 (CORE5B.DOC / DOC1.AUTH); aligns with the Authority revocation store, JWKS rotation, and bootstrap endpoints delivered in Sprint 1.
## 1. Purpose
The **StellaOps Authority** service issues OAuth2/OIDC tokens for every StellaOps module (Feedser, Backend, Agent, Zastava) and exposes the policy controls required in sovereign/offline environments. Authority is built as a minimal ASP.NET host that:
- brokers password, client-credentials, and device-code flows through pluggable identity providers;
- persists access/refresh/device tokens in MongoDB with deterministic schemas for replay analysis and air-gapped audit copies;
- distributes revocation bundles and JWKS material so downstream services can enforce lockouts without direct database access;
- offers bootstrap APIs for first-run provisioning and key rotation without redeploying binaries.
Authority is deployed alongside Feedser in air-gapped environments and never requires outbound internet access. All trusted metadata (OpenIddict discovery, JWKS, revocation bundles) is cacheable, signed, and reproducible.
## 2. Component Architecture
Authority is composed of five cooperating subsystems:
1. **Minimal API host:** configures OpenIddict endpoints (`/token`, `/authorize`, `/revoke`, `/jwks`) and structured logging/telemetry. Rate limiting hooks (`AuthorityRateLimiter`) wrap every request.
2. **Plugin host:** loads `StellaOps.Authority.Plugin.*.dll` assemblies, applies capability metadata, and exposes password/client provisioning surfaces through dependency injection.
3. **Mongo storage:** persists tokens, revocations, bootstrap invites, and plugin state in deterministic collections indexed for offline sync (`authority_tokens`, `authority_revocations`, etc.).
4. **Cryptography layer:** `StellaOps.Cryptography` abstractions manage password hashing, signing keys, JWKS export, and detached JWS generation.
5. **Offline ops APIs:** internal endpoints under `/internal/*` provide administrative flows (bootstrap users/clients, revocation export) guarded by API keys and deterministic audit events.
A high-level sequence for password logins:
```
Client -> /token (password grant)
       -> Rate limiter & audit hooks
       -> Plugin credential store (Argon2id verification)
       -> Token persistence (Mongo authority_tokens)
       -> Response (access/refresh tokens + deterministic claims)
```
## 3. Token Lifecycle & Persistence
Authority persists every issued token in MongoDB so operators can audit or revoke without scanning distributed caches.
- **Collection:** `authority_tokens`
- **Key fields:**
  - `tokenId`, `type` (`access_token`, `refresh_token`, `device_code`, `authorization_code`)
  - `subjectId`, `clientId`, ordered `scope` array
  - `status` (`valid`, `revoked`, `expired`), `createdAt`, optional `expiresAt`
  - `revokedAt`, machine-readable `revokedReason`, optional `revokedReasonDescription`
  - `revokedMetadata` (string dictionary for plugin-specific context)
- **Persistence flow:** `PersistTokensHandler` stamps missing JWT IDs, normalises scopes, and stores every principal emitted by OpenIddict.
- **Revocation flow:** `AuthorityTokenStore.UpdateStatusAsync` flips status, records the reason metadata, and is invoked by token revocation handlers and plugin provisioning events (e.g., disabling a user).
- **Expiry maintenance:** `AuthorityTokenStore.DeleteExpiredAsync` prunes non-revoked tokens past their `expiresAt` timestamp. Operators should schedule this in maintenance windows if large volumes of tokens are issued.
### Expectations for resource servers
Resource servers (Feedser WebService, Backend, Agent) **must not** assume in-memory caches are authoritative. They should:
- cache `/jwks` and `/revocations/export` responses within configured lifetimes;
- honour `revokedReason` metadata when shaping audit trails;
- treat `status != "valid"` or missing tokens as immediate denial conditions.
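The denial rule in the last bullet can be sketched as a small predicate. This is only an illustration, not Authority's actual client code: `is_token_allowed` is a hypothetical helper, the record is assumed to be a dictionary shaped like an `authority_tokens` document, and the extra `expiresAt` check is an assumption added for completeness.

```python
from datetime import datetime, timezone
from typing import Optional


def is_token_allowed(record: Optional[dict], now: Optional[datetime] = None) -> bool:
    """Return True only for a persisted token whose status is "valid".

    `record` is assumed to mirror an `authority_tokens` document;
    a missing record (None) is treated as an immediate denial.
    """
    if record is None:
        return False
    if record.get("status") != "valid":
        return False
    # Assumption: `expiresAt`, when present, is a timezone-aware datetime.
    expires_at = record.get("expiresAt")
    if expires_at is not None:
        now = now or datetime.now(timezone.utc)
        if expires_at <= now:
            return False
    return True
```

A resource server would call this after looking the token up in its local cache or mirror, and deny the request whenever it returns `False`.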
## 4. Revocation Pipeline
Authority centralises revocation in `authority_revocations` with deterministic categories:
| Category | Meaning | Required fields |
| --- | --- | --- |
| `token` | Specific OAuth token revoked early. | `revocationId` (token id), `tokenType`, optional `clientId`, `subjectId` |
| `subject` | All tokens for a subject disabled. | `revocationId` (= subject id) |
| `client` | OAuth client registration revoked. | `revocationId` (= client id) |
| `key` | Signing/JWE key withdrawn. | `revocationId` (= key id) |
`RevocationBundleBuilder` flattens Mongo documents into canonical JSON, sorts entries by (`category`, `revocationId`, `revokedAt`), and signs exports using detached JWS (RFC 7797) with cosign-compatible headers.
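To illustrate why a fixed sort order plus canonical serialisation gives reproducible digests, here is a minimal sketch. It is not the `RevocationBundleBuilder` implementation: `build_canonical_bundle` and the `revocations` envelope key are hypothetical, `revokedAt` is assumed to be an ISO-8601 UTC string so that string ordering matches time ordering, and the real canonicalisation rules may differ.

```python
import hashlib
import json


def build_canonical_bundle(entries: list[dict]) -> tuple[bytes, str]:
    """Sort entries deterministically and return (payload bytes, SHA-256 hex digest)."""
    ordered = sorted(
        entries,
        key=lambda e: (e["category"], e["revocationId"], e["revokedAt"]),
    )
    # sort_keys + compact separators keep the byte output independent of
    # dictionary insertion order and whitespace choices.
    payload = json.dumps(
        {"revocations": ordered},  # hypothetical envelope key
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return payload, hashlib.sha256(payload).hexdigest()
```

Feeding the same entries in any order yields the same bytes and digest, which is the property a detached signature over the bundle relies on.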
**Export surfaces** (deterministic output, suitable for Offline Kit):
- CLI: `stella auth revoke export --output ./out` writes `revocation-bundle.json`, `revocation-bundle.json.jws`, and `revocation-bundle.json.sha256`.
- API: `GET /internal/revocations/export` (requires bootstrap API key) returns the same payload.
- Verification: `stella auth revoke verify` validates schema, digest, and detached JWS using cached JWKS or offline keys.
**Consumer guidance:**
1. Mirror `revocation-bundle.json*` alongside Feedser exports. Offline agents fetch both over the existing update channel.
2. Use bundle `sequence` and `bundleId` to detect replay or monotonicity regressions. Ignore bundles with older sequence numbers unless `bundleId` changes and `issuedAt` advances.
3. Treat `revokedReason` taxonomy as machine-friendly codes (`compromised`, `rotation`, `policy`, `lifecycle`). Translating to human-readable logs is the consumer's responsibility.
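The replay check in step 2 can be sketched as follows. This is an illustrative reading of the rule, not code from the repository: `should_accept_bundle` is a hypothetical helper, bundles are assumed to be dictionaries with `sequence`, `bundleId`, and `issuedAt` fields, `issuedAt` values are assumed to be ISO-8601 UTC strings in one consistent format, and a non-increasing sequence is treated the same way as an older one.

```python
from typing import Optional


def should_accept_bundle(current: Optional[dict], candidate: dict) -> bool:
    """Decide whether a newly fetched revocation bundle should replace the cached one."""
    if current is None:
        return True  # nothing cached yet
    if candidate["sequence"] > current["sequence"]:
        return True  # normal monotonic progress
    # Older (or equal) sequence: accept only when it is a different bundle
    # that was issued later than the one already held.
    return (
        candidate["bundleId"] != current["bundleId"]
        and candidate["issuedAt"] > current["issuedAt"]
    )
```

Rejected bundles are worth logging, since a stream of stale sequence numbers may indicate a replay attempt or a misconfigured mirror.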
## 5. Signing Keys & JWKS Rotation
Authority signs revocation bundles and publishes JWKS entries via the new signing manager:
- **Configuration (`authority.yaml`):**
```yaml
signing:
  enabled: true
  algorithm: ES256      # Defaults to ES256
  keySource: file       # Loader identifier (file, vault, etc.)
  provider: default     # Optional preferred crypto provider
  activeKeyId: authority-signing-dev
  keyPath: "../certificates/authority-signing-dev.pem"
  additionalKeys:
    - keyId: authority-signing-dev-2024
      path: "../certificates/authority-signing-dev-2024.pem"
      source: "file"
```
- **Sources:** The default loader supports PEM files relative to the content root; additional loaders can be registered via `IAuthoritySigningKeySource`.
- **Providers:** Keys are registered against the `ICryptoProviderRegistry`, so alternative implementations (HSM, libsodium) can be plugged in without changing host code.
- **JWKS output:** `GET /jwks` lists every signing key with `status` metadata (`active`, `retired`). Old keys remain until operators remove them from configuration, allowing verification of historical bundles/tokens.
### Rotation SOP (no downtime)
1. Generate a new P-256 private key (PEM) on an offline workstation and place it where the Authority host can read it (e.g., `../certificates/authority-signing-2025.pem`).
2. Call the authenticated admin API:
```bash
curl -sS -X POST https://authority.example.com/internal/signing/rotate \
  -H "x-stellaops-bootstrap-key: ${BOOTSTRAP_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "keyId": "authority-signing-2025",
    "location": "../certificates/authority-signing-2025.pem",
    "source": "file"
  }'
```
3. Verify the response reports the previous key as retired and fetch `/jwks` to confirm the new `kid` appears with `status: "active"`.
4. Persist the old key path in `signing.additionalKeys` (the rotation API updates in-memory options; rewrite the YAML to match so restarts remain consistent).
5. If you prefer automation, trigger the `.gitea/workflows/authority-key-rotation.yml` workflow with the new `keyId`/`keyPath`; it wraps `ops/authority/key-rotation.sh` and reads environment-specific secrets. The older key will be marked `retired` and appended to `signing.additionalKeys`.
6. Re-run `stella auth revoke export` so revocation bundles are signed with the new key. Downstream caches should refresh JWKS within their configured lifetime (`StellaOpsAuthorityOptions.Signing` + client cache tolerance).
The rotation API leverages the same cryptography abstractions as revocation signing; no restart is required and the previous key is marked `retired` but kept available for verification.
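Step 3 of the SOP can be automated with a small check over the fetched `/jwks` document. The sketch below is an assumption-laden illustration: `rotation_looks_complete` is a hypothetical helper, the document is assumed to follow the usual JWKS shape with a top-level `keys` array, and the `status` value is assumed to sit on each key entry next to `kid` (the exact placement is not specified here).

```python
from typing import Optional


def rotation_looks_complete(
    jwks: dict, new_kid: str, previous_kid: Optional[str] = None
) -> bool:
    """Check that the new key is active and, if given, the previous key is retired."""
    # Assumption: each entry in "keys" carries "kid" and a "status" member.
    status_by_kid = {key.get("kid"): key.get("status") for key in jwks.get("keys", [])}
    if status_by_kid.get(new_kid) != "active":
        return False
    if previous_kid is not None and status_by_kid.get(previous_kid) != "retired":
        return False
    return True
```

Running the check before re-exporting the revocation bundle helps confirm that the export will be signed with the intended key.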
## 6. Bootstrap & Administrative Endpoints
Administrative APIs live under `/internal/*` and require the bootstrap API key plus rate-limiter compliance.
| Endpoint | Method | Description |
| --- | --- | --- |
| `/internal/users` | `POST` | Provision initial administrative accounts through the registered password-capable plug-in. Emits structured audit events. |
| `/internal/clients` | `POST` | Provision OAuth clients (client credentials / device code). |
| `/internal/revocations/export` | `GET` | Export revocation bundle + detached JWS + digest. |
| `/internal/signing/rotate` | `POST` | Promote a new signing key (see SOP above). Request body accepts `keyId`, `location`, optional `source`, `algorithm`, `provider`, and metadata. |
All administrative calls emit `AuthEventRecord` entries enriched with correlation IDs, PII tags, and network metadata for offline SOC ingestion.
## 7. Configuration Reference
| Section | Key | Description | Notes |
| --- | --- | --- | --- |
| Root | `issuer` | Absolute HTTPS issuer advertised to clients. | Required. Loopback HTTP allowed only for development. |
| Tokens | `accessTokenLifetime`, `refreshTokenLifetime`, etc. | Lifetimes for each grant (access, refresh, device, authorization code, identity). | Enforced during issuance; persisted on each token document. |
| Storage | `storage.connectionString` | MongoDB connection string. | Required even for tests; offline kits ship snapshots for seeding. |
| Signing | `signing.enabled` | Enable JWKS/revocation signing. | Disable only for development. |
| Signing | `signing.algorithm` | Signing algorithm identifier. | Currently ES256; additional curves can be wired through crypto providers. |
| Signing | `signing.keySource` | Loader identifier (`file`, `vault`, custom). | Determines which `IAuthoritySigningKeySource` resolves keys. |
| Signing | `signing.keyPath` | Relative/absolute path understood by the loader. | Stored as-is; rotation request should keep it in sync with filesystem layout. |
| Signing | `signing.activeKeyId` | Active JWKS / revocation signing key id. | Exposed as `kid` in JWKS and bundles. |
| Signing | `signing.additionalKeys[].keyId` | Retired key identifier retained for verification. | Manager updates this automatically after rotation; keep YAML aligned. |
| Signing | `signing.additionalKeys[].source` | Loader identifier per retired key. | Defaults to `signing.keySource` if omitted. |
| Security | `security.rateLimiting` | Fixed-window limits for `/token`, `/authorize`, `/internal/*`. | See `docs/security/rate-limits.md` for tuning. |
| Bootstrap | `bootstrap.apiKey` | Shared secret required for `/internal/*`. | Only required when `bootstrap.enabled` is true. |
## 8. Offline & Sovereign Operation
- **No outbound dependencies:** Authority only contacts MongoDB and local plugins. Discovery and JWKS are cached by clients with offline tolerances (`AllowOfflineCacheFallback`, `OfflineCacheTolerance`). Operators should mirror these responses for air-gapped use.
- **Structured logging:** Every revocation export, signing rotation, bootstrap action, and token issuance emits structured logs with `traceId`, `client_id`, `subjectId`, and `network.remoteIp` where applicable. Mirror logs to your SIEM to retain audit trails without central connectivity.
- **Determinism:** Sorting rules in token and revocation exports guarantee byte-for-byte identical artefacts given the same datastore state. Hashes and signatures remain stable across machines.
## 9. Operational Checklist
- [ ] Protect the bootstrap API key and disable bootstrap endpoints (`bootstrap.enabled: false`) once initial setup is complete.
- [ ] Schedule `stella auth revoke export` (or `/internal/revocations/export`) at the same cadence as Feedser exports so bundles remain in lockstep.
- [ ] Rotate signing keys before expiration; keep at least one retired key until all cached bundles/tokens signed with it have expired.
- [ ] Monitor `/health` and `/ready` plus rate-limiter metrics to detect plugin outages early.
- [ ] Ensure downstream services cache JWKS and revocation bundles within tolerances; stale caches risk accepting revoked tokens.
For plug-in specific requirements, refer to **[Authority Plug-in Developer Guide](dev/31_AUTHORITY_PLUGIN_DEVELOPER_GUIDE.md)**. For revocation bundle validation workflow, see **[Authority Revocation Bundle](security/revocation-bundle.md)**.

View File

@@ -159,6 +159,7 @@ cosign verify ghcr.io/stellaops/backend@sha256:<DIGEST> \
showing whether a network bypass CIDR allowed the request. Configure your SIEM
to alert when unauthenticated requests (`status=401`) appear with
`bypass=true`, or when unexpected scopes invoke job triggers.
Detailed monitoring and response guidance lives in `docs/ops/feedser-authority-audit-runbook.md`.
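The two alert conditions above can be sketched as a filter over parsed audit entries. This is a rough illustration, not the runbook's rule set: `should_alert` is a hypothetical helper, each entry is assumed to be a dictionary already parsed from the audit log with `status`, `bypass`, and `scopes` keys, and the set of expected scopes is whatever your deployment defines.

```python
def should_alert(entry: dict, expected_scopes: frozenset) -> bool:
    """Flag 401 responses that came through a bypass CIDR, or unexpected scopes."""
    # The log may render the flag as True/true, so compare case-insensitively.
    bypass = str(entry.get("bypass", "")).lower() == "true"
    if str(entry.get("status")) == "401" and bypass:
        return True
    # Any scope outside the expected set counts as unexpected.
    scopes = set(entry.get("scopes", []))
    return bool(scopes - expected_scopes)
```

In a SIEM the same logic would normally be written as a saved query; the function form is mainly useful for testing the rule against sample log lines.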
## 8 Update & patch strategy

View File

@@ -93,12 +93,20 @@ The Feedser container reads configuration from `etc/feedser.yaml` plus
FEEDSER_AUTHORITY__CLIENTSECRETFILE="/run/secrets/feedser_authority_client"
FEEDSER_AUTHORITY__BYPASSNETWORKS__0="127.0.0.1/32"
FEEDSER_AUTHORITY__BYPASSNETWORKS__1="::1/128"
FEEDSER_AUTHORITY__RESILIENCE__ENABLERETRIES=true
FEEDSER_AUTHORITY__RESILIENCE__RETRYDELAYS__0="00:00:01"
FEEDSER_AUTHORITY__RESILIENCE__RETRYDELAYS__1="00:00:02"
FEEDSER_AUTHORITY__RESILIENCE__RETRYDELAYS__2="00:00:05"
FEEDSER_AUTHORITY__RESILIENCE__ALLOWOFFLINECACHEFALLBACK=true
FEEDSER_AUTHORITY__RESILIENCE__OFFLINECACHETOLERANCE="00:10:00"
```
Store the client secret outside source control (Docker secrets, mounted file,
or Kubernetes Secret). Feedser loads the secret during post-configuration, so
the value never needs to appear in the YAML template.
Connected sites can keep the retry ladder short (1 s, 2 s, 5 s) so job triggers fail fast when Authority is down. For air-gapped or intermittently connected deployments, extend `RESILIENCE__OFFLINECACHETOLERANCE` (e.g. `00:30:00`) so cached discovery/JWKS data remains valid while the Offline Kit synchronises upstream changes.
2. Redeploy Feedser:
```bash
@@ -106,9 +114,10 @@ The Feedser container reads configuration from `etc/feedser.yaml` plus
```
3. Tail the logs: `docker compose logs -f feedser`. Successful `/jobs*` calls now
emit `Feedser.Authorization.Audit` entries listing subject, client ID, scopes,
remote IP, and whether the bypass CIDR allowed the call. 401 denials always log
`bypassAllowed=false` so unauthenticated cron jobs are easy to catch.
emit `Feedser.Authorization.Audit` entries with `route`, `status`, `subject`,
`clientId`, `scopes`, `bypass`, and `remote` fields. 401 denials keep the same
shape—watch for `bypass=True`, which indicates a bypass CIDR accepted an anonymous
call. See `docs/ops/feedser-authority-audit-runbook.md` for a full audit/alerting checklist.
> **Enforcement deadline:** keep `FEEDSER_AUTHORITY__ALLOWANONYMOUSFALLBACK=true`
> only while validating the rollout. Set it to `false` (and restart Feedser)

View File

@@ -40,11 +40,13 @@ Everything here is opensource and versioned— when you check out a git ta
- [`zastava_scanner.md`](08_MODULE_SPECIFICATIONS/zastava_scanner.md)
- [`registry_scanner.md`](08_MODULE_SPECIFICATIONS/registry_scanner.md)
- [`nightly_scheduler.md`](08_MODULE_SPECIFICATIONS/nightly_scheduler.md)
- **09 [API & CLI Reference](09_API_CLI_REFERENCE.md)**
- **10 [Plugin SDK Guide](10_PLUGIN_SDK_GUIDE.md)**
- **11 [Data Schemas](11_DATA_SCHEMAS.md)**
- **12 [Performance Workbook](12_PERFORMANCE_WORKBOOK.md)**
- **13 [Release Engineering Playbook](13_RELEASE_ENGINEERING_PLAYBOOK.md)**
- **09 [API & CLI Reference](09_API_CLI_REFERENCE.md)**
- **10 [Plugin SDK Guide](10_PLUGIN_SDK_GUIDE.md)**
- **11 [Authority Service](11_AUTHORITY.md)**
- **11 [Data Schemas](11_DATA_SCHEMAS.md)**
- **12 [Performance Workbook](12_PERFORMANCE_WORKBOOK.md)**
- **13 [Release Engineering Playbook](13_RELEASE_ENGINEERING_PLAYBOOK.md)**
- **30 [Fixture Maintenance](dev/fixtures.md)**
### User & operator guides
- **14 [Glossary](14_GLOSSARY_OF_TERMS.md)**
@@ -53,9 +55,11 @@ Everything here is opensource and versioned— when you check out a git ta
- **18 [Coding Standards](18_CODING_STANDARDS.md)**
- **19 [Test Suite Overview](19_TEST_SUITE_OVERVIEW.md)**
- **21 [Install Guide](21_INSTALL_GUIDE.md)**
- **22 [CI/CD Recipes Library](ci/20_CI_RECIPES.md)**
- **23 [FAQ](23_FAQ_MATRIX.md)**
- **24 [Offline Update Kit Admin Guide](24_OUK_ADMIN_GUIDE.md)**
- **22 [CI/CD Recipes Library](ci/20_CI_RECIPES.md)**
- **23 [FAQ](23_FAQ_MATRIX.md)**
- **24 [Offline Update Kit Admin Guide](24_OUK_ADMIN_GUIDE.md)**
- **26 [Authority Key Rotation Playbook](ops/authority-key-rotation.md)**
- **25 [Feedser Apple Connector Operations](ops/feedser-apple-operations.md)**
### Legal & licence
- **29 [Legal & Quota FAQ](29_LEGAL_FAQ_QUOTA.md)**

View File

@@ -3,10 +3,10 @@
| ID | Status | Owner(s) | Depends on | Description | Exit Criteria |
|----|--------|----------|------------|-------------|---------------|
| DOC4.AUTH-PDG | REVIEW | Docs Guild, Plugin Team | PLG6.DOC | Copy-edit `docs/dev/31_AUTHORITY_PLUGIN_DEVELOPER_GUIDE.md`, export lifecycle diagram, add LDAP RFC cross-link. | ✅ PR merged with polish; ✅ Diagram committed; ✅ Slack handoff posted. |
| DOC1.AUTH | TODO | Docs Guild, Authority Core | CORE5B.DOC | Draft `docs/11_AUTHORITY.md` covering architecture, configuration, bootstrap flows. | ✅ Architecture + config sections approved by Core; ✅ Samples reference latest options; ✅ Offline note added. |
| DOC3.Feedser-Authority | DOING (2025-10-10) | Docs Guild, DevEx | FSR4 | Polish operator/runbook sections (DOC3/DOC5) to document Feedser authority rollout, bypass logging, and enforcement checklist. | ✅ DOC3/DOC5 updated; ✅ enforcement deadline highlighted; ✅ Docs guild sign-off. |
| DOC5.Feedser-Runbook | TODO | Docs Guild | DOC3.Feedser-Authority | Produce dedicated Feedser authority audit runbook covering log fields, monitoring recommendations, and troubleshooting steps. | ✅ Runbook published; ✅ linked from DOC3/DOC5; ✅ alerting guidance included. |
| DOC1.AUTH | DONE (2025-10-12) | Docs Guild, Authority Core | CORE5B.DOC | Draft `docs/11_AUTHORITY.md` covering architecture, configuration, bootstrap flows. | ✅ Architecture + config sections approved by Core; ✅ Samples reference latest options; ✅ Offline note added. |
| DOC3.Feedser-Authority | DONE (2025-10-12) | Docs Guild, DevEx | FSR4 | Polish operator/runbook sections (DOC3/DOC5) to document Feedser authority rollout, bypass logging, and enforcement checklist. | ✅ DOC3/DOC5 updated with audit runbook references; ✅ enforcement deadline highlighted; ✅ Docs guild sign-off. |
| DOC5.Feedser-Runbook | DONE (2025-10-12) | Docs Guild | DOC3.Feedser-Authority | Produce dedicated Feedser authority audit runbook covering log fields, monitoring recommendations, and troubleshooting steps. | ✅ Runbook published; ✅ linked from DOC3/DOC5; ✅ alerting guidance included. |
| FEEDDOCS-DOCS-05-001 | DONE (2025-10-11) | Docs Guild | FEEDMERGE-ENGINE-04-001, FEEDMERGE-ENGINE-04-002 | Publish Feedser conflict resolution runbook covering precedence workflow, merge-event auditing, and Sprint 3 metrics. | ✅ `docs/ops/feedser-conflict-resolution.md` committed; ✅ metrics/log tables align with latest merge code; ✅ Ops alert guidance handed to Feedser team. |
| FEEDDOCS-DOCS-05-002 | TODO | Docs Guild, Feedser Ops | FEEDDOCS-DOCS-05-001 | Capture ops sign-off: circulate conflict runbook, tune alert thresholds, and document rollout decisions in change log. | ✅ Ops review recorded; ✅ alert thresholds finalised; ✅ change-log entry linked from runbook. |
| FEEDDOCS-DOCS-05-002 | TODO | Docs Guild, Feedser Ops | FEEDDOCS-DOCS-05-001 | Capture ops sign-off: circulate conflict runbook, tune alert thresholds, and document rollout decisions in change log. | ✅ Ops review recorded; ✅ alert thresholds finalised using `docs/ops/feedser-authority-audit-runbook.md`; ✅ change-log entry linked from runbook once GHSA/NVD/OSV regression fixtures land. |
> Update statuses (TODO/DOING/REVIEW/DONE/BLOCKED) as progress changes. Keep guides in sync with configuration samples under `etc/`.

View File

@@ -0,0 +1,17 @@
%% Authority plug-in lifecycle sequence diagram (Mermaid)
flowchart LR
manifest[[Plugin Manifest<br/>etc/authority.plugins/*.yaml]]
loader[AuthorityPluginConfigurationLoader<br/>binds and validates options]
scanner[PluginHost Assembly Scan<br/>StellaOps.Authority.Plugin.*]
registrar[IAuthorityPluginRegistrar<br/>registers services & health checks]
runtime[Identity Provider Plugin<br/>IIdentityProviderPlugin surface]
capabilities{Capability Metadata<br/>password/mfa/bootstrap/clientProvisioning}
storage[(Credential Store<br/>Mongo collections or custom backend)]
telemetry[[Structured Logs & Metrics<br/>authority.*]]
manifest --> loader --> scanner --> registrar --> runtime --> storage
scanner --> capabilities
capabilities --> runtime
runtime --> telemetry
loader -. emits deterministic config hashes .-> telemetry
storage -. readiness probes .-> runtime

View File

@@ -0,0 +1,91 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1200 320" role="img">
<title>Authority plug-in lifecycle: manifest to runtime</title>
<defs>
<style>
.node { fill: #0f172a; stroke: #1e293b; stroke-width: 2; rx: 14; ry: 14; }
.node text { fill: #f1f5f9; font: 16px/1.4 "Segoe UI", sans-serif; }
.accent { fill: #1e3a8a; stroke: #1d4ed8; }
.accent text { fill: #e2e8f0; }
.note { fill: #0f766e; stroke: #134e4a; }
.note text { fill: #ecfeff; }
.caption { fill: #111827; font: bold 18px "Segoe UI", sans-serif; }
.annotation { fill: #475569; font: 14px "Segoe UI", sans-serif; }
</style>
<marker id="arrow" markerWidth="12" markerHeight="12" refX="10" refY="6" orient="auto">
<path d="M0,0 L12,6 L0,12 z" fill="#1d4ed8" />
</marker>
</defs>
<text class="caption" x="40" y="36">Authority plug-in lifecycle</text>
<!-- Nodes -->
<g transform="translate(40,70)">
<rect class="node" width="200" height="110" />
<text x="20" y="32">Manifest YAML</text>
<text x="20" y="58">etc/authority.plugins/*.yaml</text>
<text x="20" y="84">Deterministic hashes</text>
</g>
<g transform="translate(280,70)">
<rect class="node accent" width="220" height="110" />
<text x="20" y="32">AuthorityPluginConfigurationLoader</text>
<text x="20" y="58">binds + validates options</text>
<text x="20" y="84">logs config fingerprints</text>
</g>
<g transform="translate(540,70)">
<rect class="node" width="220" height="110" />
<text x="20" y="32">PluginHost assembly scan</text>
<text x="20" y="58">StellaOps.Authority.Plugin.*</text>
<text x="20" y="84">loads IAuthorityPluginRegistrar</text>
</g>
<g transform="translate(800,70)">
<rect class="node accent" width="220" height="110" />
<text x="20" y="32">IAuthorityPluginRegistrar</text>
<text x="20" y="58">register services &amp; health checks</text>
<text x="20" y="84">publish capability metadata</text>
</g>
<g transform="translate(1060,70)">
<rect class="node" width="220" height="110" />
<text x="20" y="32">Identity Provider plug-in</text>
<text x="20" y="58">IIdentityProviderPlugin</text>
<text x="20" y="84">ready/readiness probes</text>
</g>
<!-- Supporting nodes -->
<g transform="translate(420,210)">
<rect class="node note" width="240" height="96" />
<text x="20" y="32">Capability metadata broadcast</text>
<text x="20" y="58">password / mfa / bootstrap / clientProvisioning</text>
</g>
<g transform="translate(760,210)">
<rect class="node note" width="240" height="96" />
<text x="20" y="32">Credential &amp; audit storage</text>
<text x="20" y="58">Mongo collections or custom backend</text>
<text x="20" y="84">queried in readiness probes</text>
</g>
<g transform="translate(1040,210)">
<rect class="node note" width="180" height="96" />
<text x="20" y="32">Telemetry output</text>
<text x="20" y="58">logs + metrics with correlation IDs</text>
</g>
<!-- Arrows -->
<path d="M240,125 H280" stroke="#1d4ed8" stroke-width="3" fill="none" marker-end="url(#arrow)" />
<path d="M500,125 H540" stroke="#1d4ed8" stroke-width="3" fill="none" marker-end="url(#arrow)" />
<path d="M760,125 H800" stroke="#1d4ed8" stroke-width="3" fill="none" marker-end="url(#arrow)" />
<path d="M1020,125 H1060" stroke="#1d4ed8" stroke-width="3" fill="none" marker-end="url(#arrow)" />
<path d="M650,180 V210" stroke="#22d3ee" stroke-width="3" fill="none" marker-end="url(#arrow)" />
<path d="M920,180 V210" stroke="#22d3ee" stroke-width="3" fill="none" marker-end="url(#arrow)" />
<path d="M1180,180 V210" stroke="#22d3ee" stroke-width="3" fill="none" marker-end="url(#arrow)" />
<text class="annotation" x="320" y="56">1. Configuration</text>
<text class="annotation" x="600" y="56">2. Assembly discovery</text>
<text class="annotation" x="860" y="56">3. Registrar execution</text>
<text class="annotation" x="1120" y="56">4. Runtime surface</text>
</svg>


View File

@@ -0,0 +1,27 @@
%% Rate limit and lockout interplay for Standard plug-in (Mermaid)
sequenceDiagram
autonumber
participant Client as Client/App
participant Host as Authority Host
participant Limiter as Rate Limiter Middleware
participant Plugin as Standard Plugin
participant Store as Credential Store / Lockout State
Client->>Host: POST /token (client_id, credentials)
Host->>Limiter: Check quota (client_id + remote_ip)
alt quota exceeded
Limiter-->>Host: Reject (429, retryAfter)
Host-->>Client: 429 Too Many Requests\nRetry-After header with limiter tags
else quota ok
Limiter-->>Host: Allow (remaining tokens)
Host->>Plugin: VerifyCredentials(subject)
Plugin->>Store: Load hashed password + lockout counters
Store-->>Plugin: Credential result + deterministic counter
alt lockout threshold reached
Plugin-->>Host: Locked (retryAfter=lockoutWindow)
Host-->>Client: 423 Locked\nRetry-After header + `authority.lockout` tag
else valid credentials
Plugin-->>Host: Success (issue tokens)
Host-->>Client: 200 OK + tokens + limiter metadata
end
end

View File

@@ -0,0 +1,105 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1180 400" role="img">
<title>Authority rate limit and lockout flow</title>
<defs>
<style>
.lane { fill: #0f172a; }
.lane text { fill: #e2e8f0; font: bold 18px "Segoe UI", sans-serif; }
.step { fill: #1f2937; stroke: #1e3a8a; stroke-width: 2; rx: 12; ry: 12; }
.step text { fill: #f8fafc; font: 15px "Segoe UI", sans-serif; }
.decision { fill: #0f766e; stroke: #0d9488; stroke-width: 2; rx: 12; ry: 12; }
.decision text { fill: #ecfeff; font: 15px "Segoe UI", sans-serif; }
.note { fill: #1e293b; font: italic 14px "Segoe UI", sans-serif; }
</style>
<marker id="arrow-blue" markerWidth="11" markerHeight="11" refX="10" refY="6" orient="auto">
<path d="M0,0 L11,6 L0,12 z" fill="#2563eb" />
</marker>
<marker id="arrow-green" markerWidth="11" markerHeight="11" refX="10" refY="6" orient="auto">
<path d="M0,0 L11,6 L0,12 z" fill="#0d9488" />
</marker>
<marker id="arrow-red" markerWidth="11" markerHeight="11" refX="10" refY="6" orient="auto">
<path d="M0,0 L11,6 L0,12 z" fill="#dc2626" />
</marker>
</defs>
<rect class="lane" x="40" y="20" width="160" height="40" rx="8" />
<text x="60" y="48">Client / App</text>
<rect class="lane" x="300" y="20" width="200" height="40" rx="8" />
<text x="320" y="48">Authority Host</text>
<rect class="lane" x="560" y="20" width="210" height="40" rx="8" />
<text x="580" y="48">Rate Limiter</text>
<rect class="lane" x="840" y="20" width="150" height="40" rx="8" />
<text x="860" y="48">Standard Plug-in</text>
<rect class="lane" x="1040" y="20" width="120" height="40" rx="8" />
<text x="1060" y="48">Credential Store</text>
<!-- Flow steps -->
<g transform="translate(40,90)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">POST /token request</text>
<text x="20" y="64">client_id + subject creds</text>
</g>
<g transform="translate(300,90)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Authority middleware</text>
<text x="20" y="64">enriches context tags</text>
</g>
<g transform="translate(580,90)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Rate limiter window</text>
<text x="20" y="64">client_id + IP keyed</text>
</g>
<g transform="translate(580,210)">
<rect class="decision" width="220" height="90" />
<text x="20" y="36">Quota exceeded?</text>
<text x="20" y="64">emit Retry-After &amp; tags</text>
</g>
<g transform="translate(580,330)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Quota OK</text>
<text x="20" y="64">pass remaining tokens</text>
</g>
<g transform="translate(840,210)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Verify credentials</text>
<text x="20" y="64">hash compare + audit tags</text>
</g>
<g transform="translate(1040,210)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Load lockout state</text>
<text x="20" y="64">deterministic counters</text>
</g>
<g transform="translate(840,330)">
<rect class="decision" width="220" height="90" />
<text x="20" y="36">Lockout threshold hit?</text>
<text x="20" y="64">follow dedup precedence</text>
</g>
<g transform="translate(300,330)">
<rect class="step" width="220" height="90" />
<text x="20" y="36">Issue tokens or errors</text>
<text x="20" y="64">include limiter metadata</text>
</g>
<!-- Arrows -->
<path d="M260,135 H300" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M520,135 H580" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M670,180 V210" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M670,300 V330" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M800,375 H840" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M960,255 H1040" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M960,375 H840" stroke="#0d9488" stroke-width="3" fill="none" marker-end="url(#arrow-green)" />
<path d="M670,255 H520" stroke="#dc2626" stroke-width="3" fill="none" marker-end="url(#arrow-red)" />
<path d="M520,375 H300" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<path d="M260,375 H40" stroke="#2563eb" stroke-width="3" fill="none" marker-end="url(#arrow-blue)" />
<!-- Notes -->
<text class="note" x="40" y="210">429 path → add `authority.client_id`, `authority.remote_ip` tags for dashboards.</text>
<text class="note" x="40" y="240">Lockout path → reuse precedence strategy from Feedser dedup (see DEDUP_CONFLICTS_RESOLUTION_ALGO.md).</text>
</svg>


View File

@@ -1,6 +1,6 @@
# Authority Plug-in Developer Guide
> **Status:** Ready for Docs/DOC4 editorial review as of 2025-10-10. Content aligns with PLG6 acceptance criteria and references stable Authority primitives.
> **Status:** Updated 2025-10-11 (AUTHPLUG-DOCS-01-001) with lifecycle + limiter diagrams and refreshed rate-limit guidance aligned to PLG6 acceptance criteria.
## 1. Overview
Authority plug-ins extend the **StellaOps Authority** service with custom identity providers, credential stores, and client-management logic. Unlike Feedser plug-ins (which ingest or export advisories), Authority plug-ins participate directly in authentication flows:
@@ -17,6 +17,10 @@ Authority hosts follow a deterministic plug-in lifecycle. The flow below can be
3. **Registrar execution:** each assembly is searched for `IAuthorityPluginRegistrar` implementations. Registrars bind options, register services, and optionally queue bootstrap tasks.
4. **Runtime:** the host resolves `IIdentityProviderPlugin` instances, uses capability metadata to decide which OAuth grants to expose, and invokes health checks for readiness endpoints.
![Authority plug-in lifecycle diagram](../assets/authority/authority-plugin-lifecycle.svg)
_Source:_ `docs/assets/authority/authority-plugin-lifecycle.mmd`
**Data persistence primer:** the standard Mongo-backed plugin stores users in collections named `authority_users_<pluginName>` and lockout metadata in embedded documents. Additional plugins must document their storage layout and provide deterministic collection naming to honour the Offline Kit replication process.
## 3. Capability Metadata
@@ -100,7 +104,7 @@ Capability flags let the host reason about what your plug-in supports:
- Optional `IClientProvisioningStore` for machine-to-machine clients.
- `AuthorityIdentityProviderCapabilities` to advertise supported flows.
- Password guidance:
- Prefer Argon2 (Security Guild upcoming recommendation); Standard plug-in currently ships PBKDF2 with easy swap via `IPasswordHasher`.
- Standard plug-in hashes via `ICryptoProvider` using Argon2id by default and emits PHC-compliant strings. Successful PBKDF2 logins trigger automatic rehashes so migrations complete gradually. See `docs/security/password-hashing.md` for tuning advice.
- Enforce password policies before hashing to avoid storing weak credentials.
- Health checks should probe backing stores (e.g., Mongo `ping`) and return `AuthorityPluginHealthResult` so `/ready` can surface issues.
- When supporting additional factors (e.g., TOTP), implement `SupportsMfa` and document the enrolment flow for resource servers.
@@ -111,14 +115,64 @@ Capability flags let the host reason about what your plug-in supports:
- Never store raw secrets in git: allow operators to supply them via `.local.yaml`, environment variables, or injected secret files. Document which keys are mandatory.
- Validate configuration as soon as the registrar runs; use explicit error messages to guide operators. The Standard plug-in now enforces complete bootstrap credentials (username + password) and positive lockout windows via `StandardPluginOptions.Validate`.
- Cross-reference bootstrap workflows with `docs/ops/authority_bootstrap.md` (to be published alongside CORE6) so operators can reuse the same payload formats for manual provisioning.
- `passwordHashing` inherits defaults from `authority.security.passwordHashing`. Override only when hardware constraints differ per plug-in:
```yaml
passwordHashing:
  algorithm: Argon2id
  memorySizeInKib: 19456
  iterations: 2
  parallelism: 1
```
Invalid values (≤0) fail fast during startup, and legacy PBKDF2 hashes rehash automatically once the new algorithm succeeds.
## 8. Logging, Metrics, and Diagnostics
### 7.1 Token Persistence Contract
- The host automatically persists every issued principal (access, refresh, device, authorization code) in `authority_tokens`. Plug-in code **must not** bypass this store; use the provided `IAuthorityTokenStore` helpers when implementing custom flows.
- When a plug-in disables a subject or client outside the standard handlers, call `IAuthorityTokenStore.UpdateStatusAsync(...)` for each affected token so revocation bundles stay consistent.
- Supply machine-friendly `revokedReason` codes (`compromised`, `rotation`, `policy`, `lifecycle`, etc.) and optional `revokedMetadata` entries when invalidating credentials. These flow straight into `revocation-bundle.json` and should remain deterministic.
- Token scopes should be normalised (trimmed, unique, ordinal sort) before returning from plug-in verification paths. `TokenPersistenceHandlers` will keep that ordering for downstream consumers.
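The scope rule in the last bullet (trim, de-duplicate, ordinal sort) can be sketched in shell. This only illustrates the ordering contract, not the plug-in API: `normalize_scopes` and the sample scope names are hypothetical, and a real plug-in would apply the equivalent logic in C# before returning from its verification path.

```bash
# Hypothetical helper: normalise a whitespace-separated scope string.
# Assumes POSIX tr/sed/sort/paste; LC_ALL=C gives byte-wise (ordinal) ordering.
normalize_scopes() {
  printf '%s\n' "$1" \
    | tr -s '[:space:]' '\n' \
    | sed '/^$/d' \
    | LC_ALL=C sort -u \
    | paste -sd ' ' -
}

# Illustrative call with made-up scope names:
#   normalize_scopes "  scope.b scope.a   scope.b "
```

Byte-wise sorting places uppercase ASCII before lowercase, which is what an ordinal comparison also does; a locale-aware sort could order the same input differently across hosts.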
### 7.2 Claims & Enrichment Checklist
- Authority always sets the OpenID Connect basics: `sub`, `client_id`, `preferred_username`, optional `name`, and `role` (for password flows). Plug-ins must use `IClaimsEnricher` to append additional claims in a **deterministic** order (sort arrays, normalise casing) so resource servers can rely on stable shapes.
- Recommended enrichment keys:
- `stellaops.realm`: plug-in/tenant identifier so services can scope policies.
- `stellaops.subject.type`: values such as `human`, `service`, `bootstrap`.
- `groups` / `projects`: sorted arrays describing operator entitlements.
- Claims visible in tokens should mirror what `/token` and `/userinfo` emit. Avoid injecting sensitive PII directly; mark values with `ClassifiedString.Personal` inside the plug-in so audit sinks can tag them appropriately.
- For client-credential flows, remember to enrich both the client principal and the validation path (`TokenValidationHandlers`) so refresh flows keep the same metadata.
### 7.3 Revocation Bundles & Reasons
- Use `IAuthorityRevocationStore` to record subject/client/token revocations when credentials are deleted or rotated. Stick to the standard categories (`token`, `subject`, `client`, `key`).
- Include a deterministic `reason` string and optional `reasonDescription` so operators understand *why* a subject was revoked when inspecting bundles offline.
- Plug-ins should populate `metadata` with stable keys (e.g., `revokedBy`, `sourcePlugin`, `ticketId`) to simplify SOC correlation. The keys must be lowercase, ASCII, and free of secrets—bundles are mirrored to air-gapped agents.
## 8. Rate Limiting & Lockout Interplay
Rate limiting and account lockouts are complementary controls. Plug-ins must surface both deterministically so operators can correlate limiter hits with credential rejections.
**Baseline quotas** (from `docs/dev/authority-rate-limit-tuning-outline.md`):
| Endpoint | Default policy | Notes |
|----------|----------------|-------|
| `/token` | 30 requests / 60s, queue 0 | Drop to 10/60s for untrusted ranges; raise only with WAF + monitoring. |
| `/authorize` | 60 requests / 60s, queue 10 | Reduce carefully; interactive UX depends on headroom. |
| `/internal/*` | Disabled by default; recommended 5/60s when enabled | Keep queue 0 for bootstrap APIs. |
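To see what "30 requests / 60s, queue 0" means in practice, the sketch below simulates a limiter over a list of request timestamps. It assumes the quotas are enforced as fixed windows, which the table does not state, and `simulate_fixed_window` is a hypothetical helper rather than part of Authority.

```bash
# Hypothetical simulation of a fixed-window limiter with queue 0.
# stdin: one request timestamp (epoch seconds) per line, ascending.
# $1: permit limit per window, $2: window length in seconds.
simulate_fixed_window() {
  awk -v limit="$1" -v window="$2" '
    {
      bucket = int($1 / window)              # which window this request falls in
      if (NR == 1 || bucket != current) {    # new window: reset the counter
        current = bucket
        used = 0
      }
      if (used < limit) { used++; print $1, "allow" }
      else              { print $1, "reject" }   # queue 0: reject immediately
    }'
}

# Example: seq 0 39 | simulate_fixed_window 30 60
```

With queue 0 there is no waiting room, so every request past the permit limit inside a window is rejected immediately and the client has to back off.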
**Retry metadata:** The middleware stamps `Retry-After` plus tags `authority.client_id`, `authority.remote_ip`, and `authority.endpoint`. Plug-ins should keep these tags intact when crafting responses or telemetry so dashboards remain consistent.
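On the client side, the `Retry-After` header can drive back-off. The helper below is a minimal sketch: `retry_after_seconds` is a hypothetical name, it handles only the delta-seconds form of the header (not the HTTP-date form), and the fallback value is an arbitrary assumption.

```bash
# Hypothetical helper: read HTTP response headers on stdin and print the
# Retry-After delay in seconds, or a fallback when the header is absent
# or not in delta-seconds form.
retry_after_seconds() {
  local fallback="${1:-5}"
  awk -v fallback="$fallback" '
    tolower($1) == "retry-after:" && $2 ~ /^[0-9]+\r?$/ {
      gsub(/\r/, "", $2)      # strip the CR left over from CRLF line endings
      print $2
      found = 1
      exit
    }
    END { if (!found) print fallback }
  '
}

# Example (headers captured with curl -D): retry_after_seconds 10 < headers.txt
```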
**Lockout counters:** Treat lockouts as **subject-scoped** decisions. When multiple instances update counters, reuse the deterministic tie-breakers documented in `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md` (freshness overrides, precedence, and stable hashes) to avoid divergent lockout states across replicas.
**Alerting hooks:** Emit structured logs/metrics when either the limiter or credential store rejects access. Suggested metrics include `aspnetcore_rate_limiting_rejections_total{limiter="authority-token"}` and any custom `auth.plugins.<pluginName>.lockouts_total` counter.
![Authority rate limit and lockout flow](../assets/authority/authority-rate-limit-flow.svg)
_Source:_ `docs/assets/authority/authority-rate-limit-flow.mmd`
## 9. Logging, Metrics, and Diagnostics
- Always log via the injected `ILogger<T>`; include `pluginName` and correlation IDs where available.
- Activity/metric names should align with `AuthorityTelemetry` constants (`service.name=stellaops-authority`).
- Expose additional diagnostics via structured logging rather than writing custom HTTP endpoints; the host will integrate these into `/health` and `/ready`.
- Emit metrics with stable names (`auth.plugins.<pluginName>.*`) when introducing custom instrumentation; coordinate with the Observability guild to reserve prefixes.
## 9. Testing & Tooling
## 10. Testing & Tooling
- Unit tests: use Mongo2Go (or similar) to exercise credential stores without hitting production infrastructure (`StandardUserCredentialStoreTests` is a template).
- Determinism: fix timestamps to UTC and sort outputs consistently; avoid random GUIDs unless stable.
- Smoke tests: launch `dotnet run --project src/StellaOps.Authority/StellaOps.Authority` with your plug-in under `PluginBinaries/Authority` and verify `/ready`.
@@ -137,13 +191,13 @@ Capability flags let the host reason about what your plug-in supports:
}
```
## 10. Packaging & Delivery
## 11. Packaging & Delivery
- Output assembly should follow `StellaOps.Authority.Plugin.<Name>.dll` so the host's search pattern picks it up.
- Place the compiled DLL plus dependencies under `PluginBinaries/Authority` for offline deployments; include hashes/signatures in release notes (Security Guild guidance forthcoming).
- Document any external prerequisites (e.g., CA cert bundle) in your plug-in README.
- Update `etc/authority.plugins/<plugin>.yaml` samples and include deterministic SHA256 hashes for optional bootstrap payloads when distributing Offline Kit artefacts.
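One way to produce deterministic SHA256 hashes for a plug-in drop is to hash files in a fixed, locale-independent order. The sketch below assumes GNU coreutils and findutils (`sha256sum`, `sort -z`, `xargs -r`); the function name and the `SHA256SUMS` file name are illustrative, not a StellaOps convention.

```bash
# Hypothetical helper: write a sorted SHA-256 manifest for every file under
# a directory, so repeated runs over identical content give identical output.
write_sha256_manifest() {
  local dir="$1"
  (
    cd "$dir" || exit 1
    # Exclude the manifest itself, sort paths byte-wise, then hash in that order.
    find . -type f ! -name SHA256SUMS -print0 \
      | LC_ALL=C sort -z \
      | xargs -0 -r sha256sum > SHA256SUMS
  )
}

# Verify later with: (cd <dir> && sha256sum -c --quiet SHA256SUMS)
```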
## 11. Checklist & Handoff
## 12. Checklist & Handoff
- ✅ Capabilities declared and validated in automated tests.
- ✅ Bootstrap workflows documented (if `bootstrap` capability used) and repeatable.
- ✅ Local smoke test + unit/integration suites green (`dotnet test`).
@@ -151,7 +205,4 @@ Capability flags let the host reason about what your plug-in supports:
- Submit the developer guide update referencing PLG6/DOC4 and tag DevEx + Docs reviewers for sign-off.
---
**Next documentation actions:**
- Add rendered architectural diagram (PlantUML/mermaid) reflecting the lifecycle above once the Docs toolkit pipeline is ready.
- Reference the LDAP RFC (`docs/rfcs/authority-plugin-ldap.md`) in the capability section once review completes.
- Sync terminology with `docs/11_AUTHORITY.md` when that chapter is published to keep glossary terms consistent.
Mermaid sources for the embedded diagrams live under `docs/assets/authority/`. Regenerate the SVG assets with your preferred renderer before committing future updates so the visuals stay in sync with the `.mmd` definitions.

34
docs/dev/fixtures.md Normal file
View File

@@ -0,0 +1,34 @@
# Feedser Fixture Maintenance
Feedser uses a handful of deterministic fixtures to keep connector regressions in check. This guide lists the
fixture sets, where they live, and how to regenerate them safely.
---
## GHSA ↔ OSV parity fixtures
- **Location:** `src/StellaOps.Feedser.Source.Osv.Tests/Fixtures/osv-ghsa.*.json`
- **Purpose:** Exercised by `OsvGhsaParityRegressionTests` to ensure OSV + GHSA outputs stay aligned on aliases,
ranges, references, and credits.
- **Regeneration:** Either run the test harness with online regeneration (`UPDATE_PARITY_FIXTURES=1 dotnet test src/StellaOps.Feedser.Source.Osv.Tests/StellaOps.Feedser.Source.Osv.Tests.csproj`)
or execute the fixture updater (`dotnet run --project tools/FixtureUpdater/FixtureUpdater.csproj`). Both paths
normalise timestamps and canonical ordering.
- **Verification:** Inspect the diff, then re-run `dotnet test src/StellaOps.Feedser.Source.Osv.Tests/StellaOps.Feedser.Source.Osv.Tests.csproj` to confirm parity.
## GHSA credit parity fixtures
- **Location:** `src/StellaOps.Feedser.Source.Ghsa.Tests/Fixtures/credit-parity.{ghsa,osv,nvd}.json`
- **Purpose:** Exercised by `GhsaCreditParityRegressionTests` to guarantee GHSA/NVD/OSV acknowledgements remain in lockstep.
- **Regeneration:** `dotnet run --project tools/FixtureUpdater/FixtureUpdater.csproj` rewrites all three canonical snapshots.
- **Verification:** `dotnet test src/StellaOps.Feedser.Source.Ghsa.Tests/StellaOps.Feedser.Source.Ghsa.Tests.csproj`.
> Always commit fixture changes together with the code that motivated them and reference the regression test that guards the behaviour.
## Apple security update fixtures
- **Location:** `src/StellaOps.Feedser.Source.Vndr.Apple.Tests/Apple/Fixtures/*.html` and `.expected.json`.
- **Purpose:** Exercised by `AppleLiveRegressionTests` to guarantee the Apple HTML parser and mapper stay deterministic while covering Rapid Security Responses and multi-device advisories.
- **Regeneration:** Use the helper scripts (`scripts/update-apple-fixtures.sh` or `scripts/update-apple-fixtures.ps1`). They export `UPDATE_APPLE_FIXTURES=1`, propagate the flag through `WSLENV`, touch `.update-apple-fixtures`, and then run the Apple test project. This keeps WSL/VSCode test invocations in sync while the refresh workflow fetches live Apple support pages, sanitises them, and rewrites both the HTML and expected DTO snapshots with normalised ordering.
- **Verification:** Inspect the generated diffs and re-run `dotnet test src/StellaOps.Feedser.Source.Vndr.Apple.Tests/StellaOps.Feedser.Source.Vndr.Apple.Tests.csproj` without the env var to confirm determinism.
> **Tip for other connector owners:** mirror the sentinel + `WSLENV` pattern (`touch .update-<connector>-fixtures`, append the env var via `WSLENV`) when you add fixture refresh scripts so contributors running under WSL inherit the regeneration flag automatically.
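The sentinel + `WSLENV` pattern described in the tip could look roughly like the following for another connector. This is a sketch under assumptions: `prepare_fixture_refresh`, the `UPDATE_<CONNECTOR>_FIXTURES` naming for connectors other than Apple, and the sentinel directory argument are hypothetical and not taken from the existing scripts.

```bash
# Hypothetical helper mirroring the Apple refresh scripts for any connector.
# $1: connector name (lowercase, e.g. "apple"); $2: directory for the sentinel.
prepare_fixture_refresh() {
  local connector="$1" dir="${2:-.}"
  local flag
  flag="UPDATE_$(printf '%s' "$connector" | tr '[:lower:]' '[:upper:]')_FIXTURES"

  # Export the regeneration flag for the test run.
  export "$flag=1"

  # Append the flag to WSLENV once so WSL-hosted invocations inherit it.
  case ":${WSLENV:-}:" in
    *":$flag:"* | *":$flag/"*) ;;                       # already listed
    *) export WSLENV="${WSLENV:+$WSLENV:}$flag" ;;
  esac

  # Drop the sentinel file that the test project looks for.
  touch "$dir/.update-${connector}-fixtures"
}

# Example: prepare_fixture_refresh apple . && dotnet test <connector test project>
```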

View File

@@ -29,6 +29,8 @@ var rule = primitive.ToNormalizedVersionRule(notes: "nvd:CVE-2025-1234");
// rule => scheme=semver, type=range, min=1.2.3, minInclusive=true, max=2.0.0, maxInclusive=false
```
If you omit the optional `notes` argument, `ToNormalizedVersionRule` now falls back to the primitive's `ConstraintExpression`, ensuring the original comparator expression is preserved for provenance/audit queries.
Emit the resulting rule inside `AffectedPackage.NormalizedVersions` while continuing to populate `AffectedVersionRange.RangeExpression` for backward compatibility.
## 3. Merge dedupe flow
@@ -97,3 +99,56 @@ Follow the operational checklist in `docs/ops/migrations/SEMVER_STYLE.md`. The s
- [ ] Confirm integration tests include fixtures with normalized rules and SemVer styles.
For deeper query examples and maintenance tasks, continue with [Normalized Versions Query Guide](mongo_indices.md).
## 8. Storage projection reference
`NormalizedVersionDocumentFactory` copies each normalized rule into MongoDB using the shape below. Use this as a contract when reviewing connector fixtures or diagnosing merge/storage diffs:
```json
{
  "packageId": "pkg:npm/example",
  "packageType": "npm",
  "scheme": "semver",
  "type": "range",
  "style": "range",
  "min": "1.2.3",
  "minInclusive": true,
  "max": "2.0.0",
  "maxInclusive": false,
  "value": null,
  "notes": "ghsa:GHSA-xxxx-yyyy",
  "decisionReason": "ghsa-precedence-over-nvd",
  "constraint": ">= 1.2.3 < 2.0.0",
  "source": "ghsa",
  "recordedAt": "2025-10-11T00:00:00Z"
}
```
For distro-specific ranges (`nevra`, `evr`) the same envelope applies with `scheme` switched accordingly. Example:
```json
{
  "packageId": "bash",
  "packageType": "rpm",
  "scheme": "nevra",
  "type": "range",
  "style": "range",
  "min": "0:4.4.18-2.el7",
  "minInclusive": true,
  "max": "0:4.4.20-1.el7",
  "maxInclusive": false,
  "value": null,
  "notes": "redhat:RHSA-2025:1234",
  "decisionReason": "rhel-priority-over-nvd",
  "constraint": "<= 0:4.4.20-1.el7",
  "source": "redhat",
  "recordedAt": "2025-10-11T00:00:00Z"
}
```
If a new scheme is required (for example, `apple.build` or `ios.semver`), raise it with the Models team before emitting documents so merge comparers and hashing logic can incorporate the change deterministically.
## 9. Observability signals
- `feedser.merge.normalized_rules` (counter, tags: `package_type`, `scheme`) increments once per normalized rule retained after precedence merge.
- `feedser.merge.normalized_rules_missing` (counter, tags: `package_type`) increments when a merged package still carries version ranges but no normalized rules; watch for spikes to catch connectors that have not emitted normalized arrays yet.

View File

@@ -2,6 +2,8 @@
This guide complements the Sprint12 normalized versions rollout. It documents recommended indexes and aggregation patterns for querying `AffectedPackage.normalizedVersions`.
For a field-by-field look at how normalized rules persist in MongoDB (including provenance metadata), see Section 8 of the [Feedser SemVer Merge Playbook](merge_semver_playbook.md).
## 1. Recommended indexes
When `feedser.storage.enableSemVerStyle` is enabled, advisories expose a flattened

View File

@@ -0,0 +1,49 @@
# Normalized Versions Rollout Dashboard (Sprint 2 Feedser)
_Status date: 2025-10-12 17:05 UTC_
This dashboard tracks connector readiness for emitting `AffectedPackage.NormalizedVersions` arrays and highlights upcoming coordination checkpoints. Use it alongside:
- [`src/StellaOps.Feedser.Merge/RANGE_PRIMITIVES_COORDINATION.md`](../../src/StellaOps.Feedser.Merge/RANGE_PRIMITIVES_COORDINATION.md) for detailed guidance and timelines.
- [Feedser SemVer Merge Playbook](merge_semver_playbook.md) §8 for persisted Mongo document shapes.
- [Normalized Versions Query Guide](mongo_indices.md) for index/query validation steps.
## Key milestones
- **2025-10-13:** Normalization to finalize `SemVerRangeRuleBuilder` API contract for review.
- **2025-10-17:** Connector owners to post fixture PRs showing `NormalizedVersions` arrays (even if feature-flagged).
- **2025-10-18:** Merge cross-connector review to validate consistent field usage before enabling union logic.
## Connector readiness matrix
| Connector | Owner team | Normalized versions status | Last update | Next action / link |
|-----------|------------|---------------------------|-------------|--------------------|
| Acsc | BE-Conn-ACSC | ❌ Not started: mapper pending | 2025-10-11 | Design DTOs + mapper with normalized rule array; see `src/StellaOps.Feedser.Source.Acsc/TASKS.md`. |
| Cccs | BE-Conn-CCCS | ❌ Not started: mapper pending | 2025-10-11 | Add normalized SemVer array in canonical mapper; coordinate fixtures per `TASKS.md`. |
| CertBund | BE-Conn-CERTBUND | ❌ Not started: mapper pending | 2025-10-11 | Capture firmware-style ranges; emit normalized payload; `src/StellaOps.Feedser.Source.CertBund/TASKS.md`. |
| CertCc | BE-Conn-CERTCC | ⚠️ In progress: fetch pipeline DOING | 2025-10-11 | Implement VINCE mapper with SemVer/NEVRA rules; unblock snapshot regeneration; `src/StellaOps.Feedser.Source.CertCc/TASKS.md`. |
| Kev | BE-Conn-KEV | ✅ Normalized catalog/due-date rules verified | 2025-10-12 | Fixtures reconfirmed via `dotnet test src/StellaOps.Feedser.Source.Kev.Tests`; `src/StellaOps.Feedser.Source.Kev/TASKS.md`. |
| Cve | BE-Conn-CVE | ✅ Normalized SemVer rules verified | 2025-10-12 | Snapshot parity green (`dotnet test src/StellaOps.Feedser.Source.Cve.Tests`); `src/StellaOps.Feedser.Source.Cve/TASKS.md`. |
| Ghsa | BE-Conn-GHSA | ⚠️ DOING: normalized rollout task active | 2025-10-11 18:45 UTC | Wire `SemVerRangeRuleBuilder` + refresh fixtures; `src/StellaOps.Feedser.Source.Ghsa/TASKS.md`. |
| Osv | BE-Conn-OSV | ✅ SemVer mapper & parity fixtures verified | 2025-10-12 | GHSA parity regression passing (`dotnet test src/StellaOps.Feedser.Source.Osv.Tests`); `src/StellaOps.Feedser.Source.Osv/TASKS.md`. |
| Ics.Cisa | BE-Conn-ICS-CISA | ❌ Not started: mapper TODO | 2025-10-11 | Plan SemVer/firmware scheme selection; `src/StellaOps.Feedser.Source.Ics.Cisa/TASKS.md`. |
| Kisa | BE-Conn-KISA | ❌ Not started: mapper TODO | 2025-10-11 | Localisation-aware mapper with normalized rules; `src/StellaOps.Feedser.Source.Kisa/TASKS.md`. |
| Ru.Bdu | BE-Conn-BDU | ❌ Not started: mapper TODO | 2025-10-11 | Emit normalized ranges, capture provenance; `src/StellaOps.Feedser.Source.Ru.Bdu/TASKS.md`. |
| Ru.Nkcki | BE-Conn-Nkcki | ❌ Not started: mapper TODO | 2025-10-11 | Similar to BDU; ensure Cyrillic provenance preserved; `src/StellaOps.Feedser.Source.Ru.Nkcki/TASKS.md`. |
| Vndr.Apple | BE-Conn-Apple | ✅ Shipped: emitting normalized arrays | 2025-10-11 | Continue fixture/tooling work; `src/StellaOps.Feedser.Source.Vndr.Apple/TASKS.md`. |
| Vndr.Cisco | BE-Conn-Cisco | ❌ Not started: mapper TODO | 2025-10-11 | Decide on scheme (`semver` vs custom) before emitting rules; `src/StellaOps.Feedser.Source.Vndr.Cisco/TASKS.md`. |
| Vndr.Msrc | BE-Conn-MSRC | ❌ Not started: mapper TODO | 2025-10-11 | Gather samples, define scheme, emit normalized rules; `src/StellaOps.Feedser.Source.Vndr.Msrc/TASKS.md`. |
| Nvd | BE-Conn-NVD | ⚠️ Needs follow-up: mapper complete but normalized array MR pending | 2025-10-11 | Align CVE notes + normalized payload flag; `src/StellaOps.Feedser.Source.Nvd/TASKS.md`. |
Legend: ✅ complete, ⚠️ in progress/partial, ❌ not started.
## Monitoring
- Merge now emits `feedser.merge.normalized_rules` (tags: `package_type`, `scheme`) and `feedser.merge.normalized_rules_missing` (tags: `package_type`). Track these counters to confirm normalized arrays land as connectors roll out.
- Expect `normalized_rules_missing` to trend toward zero as each connector flips on normalized output. Investigate any sustained counts by checking the corresponding module `TASKS.md`.
## How to use this dashboard
1. Before opening a connector PR, update the module `TASKS.md` entry and drop a short bullet here (status + timestamp).
2. When a connector lands normalized outputs, flip the status to ✅ and note any rollout toggles (feature flags, fixture regenerations).
3. If a dependency or blocker emerges, add it both in the module `TASKS.md` and in this matrix so merge/storage can escalate quickly.

View File

@@ -0,0 +1,39 @@
# Feedser Connector Research 2025-10-11
Snapshot of direct network checks performed on 2025-10-11 (UTC) for the national/vendor connectors in scope. Use alongside each module's `TASKS.md` notes.
## ACSC (Australia)
- Enumerated feed slugs `/acsc/view-all-content/{alerts,advisories,news,publications,threats}/rss`; every endpoint negotiates HTTP/2 then aborts with `INTERNAL_ERROR` (curl exit 92). Forcing HTTP/1.1 hangs for more than 600 s, and sitemap/HTML fetches fail the same way.
- Next actions: prototype `SocketsHttpHandler` settings (`RequestVersionOrLower`, allow fallback to relay), capture successful headers from partner vantage (need retention + cache semantics), and keep `FEEDCONN-SHARED-HTTP2-001` open for downgrade work.
## CCCS (Canada)
- RSS endpoint (`https://cyber.gc.ca/api/cccs/rss/v1/get?...`) 301s to Atom feed (`/api/cccs/atom/v1/get?...`) with 50-entry window, HTML-heavy `<content>` fields, and no cache headers.
- Next actions: enumerate additional `feed` query values, sanitise inline HTML for DTO storage, and track retention depth via HTML pagination (`?page=`).
## CERT-Bund (Germany)
- `https://wid.cert-bund.de/content/public/securityAdvisory/rss` responds 200 without cookies (250-item window, German taxonomy). Detail links load an Angular SPA that fetches JSON behind session cookies.
- Next actions: script SPA cookie/bootstrap, discover JSON detail endpoint, and capture advisory schema for parser planning.
## KISA / KNVD (Korea)
- `https://knvd.krcert.or.kr/rss/securityInfo.do` and `/rss/securityNotice.do` return UTF-8 RSS (10-item window) with `detailDos.do?IDX=` links. No cookies required for feed fetch.
- Next actions: trace SPA detail requests to identify JSON endpoints, normalise Hangul content, and finalise localisation plan.
## BDU (Russia / FSTEC)
- Candidate endpoints (`https://bdu.fstec.ru/component/rsform/form/7-bdu?format=xml/json`) return 403/404; TLS chain requires Russian Trusted Sub CA and WAF expects additional headers.
- Next actions: acquire official PEM chain, point `feedser:httpClients:source.bdu:trustedRootPaths` (or `feedser:sources:bdu:http:trustedRootPaths`) at the Offline Kit PEM, keep `allowInvalidCertificates=false`, script session bootstrap, then capture RSS/HTML schema for parser work.
## NKTsKI / cert.gov.ru (Russia)
- `https://cert.gov.ru/rss/advisories.xml` served via Bitrix returns 403/404 even with `Accept-Language: ru-RU`; TLS chain also requires Russian trust anchors.
- Next actions: source trust store, configure `feedser:httpClients:source.nkcki:trustedRootPaths` (Offline Kit root via `feedser:offline:root`), prepare proxy fallback, and once accessible document taxonomy/retention plus attachment handling.
## CISA ICS (United States)
- `curl -I https://www.cisa.gov/cybersecurity-advisories/ics-advisories.xml` returns HTTP 403 + `x-reference-error` (Akamai). Same for legacy feed paths.
- Next actions: secure GovDelivery access, document token rotation, and build HTML/email fallback with throttling.
## Cisco PSIRT
- `https://api.cisco.com/security/advisories/latest` returns `ERR_596_SERVICE_NOT_FOUND` when unauthenticated. openVuln REST requires Mashery OAuth (client credentials) with quotas of roughly 5 req/s, 30/min, 5000/day; supports `pageIndex/pageSize` pagination.
- Next actions: register OAuth app, capture pagination/delta parameters, and compare API vs RSS coverage.
## Microsoft MSRC
- REST endpoint (`https://api.msrc.microsoft.com/sug/v2.0/en-US/vulnerabilities`) requires Azure AD token + `api-version` (current `2024-08-01`) and supports delta filters (`lastModifiedStartDateTime`). CVRF ZIP remains available for offline use.
- Next actions: finalise AAD app registration, implement token cache, and design combined REST+CVRF ingestion path for determinism.

View File

@@ -80,12 +80,12 @@
docker compose up -d
curl -fsS http://localhost:8080/health
```
6. **Validate JWKS and tokens:** call `/jwks` and issue a short-lived token via the CLI to confirm key material matches expectations.
6. **Validate JWKS and tokens:** call `/jwks` and issue a short-lived token via the CLI to confirm key material matches expectations. If the restored environment requires a fresh signing key, follow the rotation SOP in [`docs/11_AUTHORITY.md`](../11_AUTHORITY.md) using `ops/authority/key-rotation.sh` to invoke `/internal/signing/rotate`.
## Disaster Recovery Notes
- **Air-gapped replication:** replicate archives via the Offline Update Kit transport channels; never attach USB devices without scanning.
- **Retention:** maintain 30 daily snapshots + 12 monthly archival copies. Rotate encryption keys annually.
- **Key compromise:** if signing keys are suspected compromised, restore from the latest clean backup, rotate via OPS3 (key rotation tooling), and publish a revocation notice.
- **Key compromise:** if signing keys are suspected compromised, restore from the latest clean backup, rotate via OPS3 (see `ops/authority/key-rotation.sh` and `docs/11_AUTHORITY.md`), and publish a revocation notice.
- **Mongo version:** keep dump/restore images pinned to the deployment version (compose uses `mongo:7`). Restoring across major versions requires a compatibility review.
## Verification Checklist

View File

@@ -0,0 +1,83 @@
# Authority Signing Key Rotation Playbook
> **Status:** Authored 2025-10-12 as part of OPS3.KEY-ROTATION rollout.
> Use together with `docs/11_AUTHORITY.md` (Authority service guide) and the automation shipped under `ops/authority/`.
## 1. Overview
Authority publishes JWKS and revocation bundles signed with ES256 keys. To rotate those keys without downtime we now provide:
- **Automation script:** `ops/authority/key-rotation.sh`
Shell helper that POSTs to `/internal/signing/rotate`, supports metadata, dry-run, and confirms JWKS afterwards.
- **CI workflow:** `.gitea/workflows/authority-key-rotation.yml`
Manual dispatch workflow that pulls environment-specific secrets, runs the script, and records the result. Works across staging/production by passing the `environment` input.
This playbook documents the repeatable sequence for all environments.
## 2. Pre-requisites
1. **Generate a new PEM key (per environment)**
```bash
openssl ecparam -name prime256v1 -genkey -noout \
-out certificates/authority-signing-<env>-<year>.pem
chmod 600 certificates/authority-signing-<env>-<year>.pem
```
2. **Stash the previous key** under the same volume so it can be referenced in `signing.additionalKeys` after rotation.
3. **Ensure secrets/vars exist in Gitea**
- `<ENV>_AUTHORITY_BOOTSTRAP_KEY`
- `<ENV>_AUTHORITY_URL`
- Optional shared defaults `AUTHORITY_BOOTSTRAP_KEY`, `AUTHORITY_URL`.
## 3. Executing the rotation
### Option A: via CI workflow (recommended)
1. Navigate to **Actions → Authority Key Rotation**.
2. Provide inputs:
- `environment`: `staging`, `production`, etc.
- `key_id`: new `kid` (e.g. `authority-signing-2025-dev`).
- `key_path`: path as seen by the Authority service (e.g. `../certificates/authority-signing-2025-dev.pem`).
- Optional `metadata`: comma-separated `key=value` pairs (for audit trails).
3. Trigger. The workflow:
- Reads the bootstrap key/URL from secrets.
- Runs `ops/authority/key-rotation.sh`.
- Prints the JWKS response for verification.
### Option B: manual shell invocation
```bash
AUTHORITY_BOOTSTRAP_KEY=$(cat /secure/authority-bootstrap.key) \
./ops/authority/key-rotation.sh \
--authority-url https://authority.example.com \
--key-id authority-signing-2025-dev \
--key-path ../certificates/authority-signing-2025-dev.pem \
--meta rotatedBy=ops --meta changeTicket=OPS-1234
```
Use `--dry-run` to inspect the payload before execution.
## 4. Post-rotation checklist
1. Update `authority.yaml` (or environment-specific overrides):
- Set `signing.activeKeyId` to the new key.
- Set `signing.keyPath` to the new PEM.
- Append the previous key into `signing.additionalKeys`.
- Ensure `keySource`/`provider` match the values passed to the script.
2. Run `stellaops-cli auth revoke export` so revocation bundles are re-signed with the new key.
3. Confirm `/jwks` lists the new `kid` with `status: "active"` and the previous one as `retired`.
4. Archive the old key securely; keep it available until all tokens/bundles signed with it have expired.
## 5. Development key state
For the sample configuration (`etc/authority.yaml.sample`) we minted a placeholder dev key:
- Active: `authority-signing-2025-dev` (`certificates/authority-signing-2025-dev.pem`)
- Retired: `authority-signing-dev`
Treat these as examples; real environments must maintain their own PEM material.
## 6. References
- `docs/11_AUTHORITY.md`: Architecture and rotation SOP (Section 5).
- `docs/ops/authority-backup-restore.md`: Recovery flow referencing this playbook.
- `ops/authority/README.md`: CLI usage and examples.


@@ -0,0 +1,77 @@
# Feedser Apple Security Update Connector Operations
This runbook covers staging and production rollout for the Apple security updates connector (`source:vndr-apple:*`), including observability checks and fixture maintenance.
## 1. Prerequisites
- Network egress (or mirrored cache) for `https://gdmf.apple.com/v2/pmv` and the Apple Support domain (`https://support.apple.com/`).
- Optional: corporate proxy exclusions for the Apple hosts if outbound traffic is normally filtered.
- Updated configuration (environment variables or `feedser.yaml`) with an `apple` section. Example baseline:
```yaml
feedser:
  sources:
    apple:
      softwareLookupUri: "https://gdmf.apple.com/v2/pmv"
      advisoryBaseUri: "https://support.apple.com/"
      localeSegment: "en-us"
      maxAdvisoriesPerFetch: 25
      initialBackfill: "120.00:00:00"
      modifiedTolerance: "02:00:00"
      failureBackoff: "00:05:00"
```
> `softwareLookupUri` and `advisoryBaseUri` must stay absolute and aligned with the HTTP allow-list; Feedser automatically adds both hosts to the connector HttpClient.
## 2. Staging Smoke Test
1. Deploy the configuration and restart the Feedser workers to ensure the Apple connector options are bound.
2. Trigger a full connector cycle:
- CLI: `stella db jobs run source:vndr-apple:fetch --and-then source:vndr-apple:parse --and-then source:vndr-apple:map`
- REST: `POST /jobs/run { "kind": "source:vndr-apple:fetch", "chain": ["source:vndr-apple:parse", "source:vndr-apple:map"] }`
3. Validate metrics exported under meter `StellaOps.Feedser.Source.Vndr.Apple`:
- `apple.fetch.items` (documents fetched)
- `apple.fetch.failures`
- `apple.fetch.unchanged`
- `apple.parse.failures`
- `apple.map.affected.count` (histogram of affected package counts)
4. Cross-check the shared HTTP counters:
- `feedser.source.http.requests_total{feedser_source="vndr-apple"}` should increase for both index and detail phases.
- `feedser.source.http.failures_total{feedser_source="vndr-apple"}` should remain flat (0) during a healthy run.
5. Inspect the info logs:
- `Apple software index fetch … processed=X newDocuments=Y`
- `Apple advisory parse complete … aliases=… affected=…`
- `Mapped Apple advisory … pendingMappings=0`
6. Confirm MongoDB state:
- `raw_documents` store contains the HT article HTML with metadata (`apple.articleId`, `apple.postingDate`).
- `dtos` store has `schemaVersion="apple.security.update.v1"`.
- `advisories` collection includes keys `HTxxxxxx` with normalized SemVer rules.
- `source_states` entry for `apple` shows a recent `cursor.lastPosted`.
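The fetch, parse, and map stages follow the same naming pattern for every connector, so the two trigger forms in step 2 can be generated from the source name. A small sketch, assuming the `stella` CLI syntax and REST payload shown in that step; the helper names `job_chain_cli` and `job_chain_json` are illustrative only.

```bash
# Print the CLI command that runs fetch -> parse -> map for one source.
# $1 is the source segment of the job kind, e.g. "vndr-apple".
job_chain_cli() {
  src=$1
  printf 'stella db jobs run source:%s:fetch --and-then source:%s:parse --and-then source:%s:map\n' \
    "$src" "$src" "$src"
}

# Print the equivalent JSON body for POST /jobs/run.
job_chain_json() {
  src=$1
  printf '{ "kind": "source:%s:fetch", "chain": ["source:%s:parse", "source:%s:map"] }\n' \
    "$src" "$src" "$src"
}
```

`job_chain_cli vndr-apple` reproduces the CLI line from step 2, and `job_chain_json vndr-apple` reproduces the REST body.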
## 3. Production Monitoring
- **Dashboards:** Add the following expressions to your Feedser Grafana board (OTLP/Prometheus naming assumed):
- `rate(apple_fetch_items_total[15m])` vs `rate(feedser_source_http_requests_total{feedser_source="vndr-apple"}[15m])`
- `rate(apple_fetch_failures_total[5m])` for error spikes (`severity=warning` at `>0`)
- `histogram_quantile(0.95, rate(apple_map_affected_count_bucket[1h]))` to watch affected-package fan-out
- `increase(apple_parse_failures_total[6h])` to catch parser drift (alerts at `>0`)
- **Alerts:** Page if `rate(apple_fetch_items_total[2h]) == 0` during business hours while other connectors are active. This often indicates lookup feed failures or misconfigured allow-lists.
- **Logs:** Surface warnings `Apple document {DocumentId} missing GridFS payload` or `Apple parse failed`; repeated hits imply storage issues or HTML regressions.
- **Telemetry pipeline:** `StellaOps.Feedser.WebService` now exports `StellaOps.Feedser.Source.Vndr.Apple` alongside existing Feedser meters; ensure your OTEL collector or Prometheus scraper includes it.
## 4. Fixture Maintenance
Regression fixtures live under `src/StellaOps.Feedser.Source.Vndr.Apple.Tests/Apple/Fixtures`. Refresh them whenever Apple reshapes the HT layout or when new platforms appear.
1. Run the helper script matching your platform:
- Bash: `./scripts/update-apple-fixtures.sh`
- PowerShell: `./scripts/update-apple-fixtures.ps1`
2. Each script exports `UPDATE_APPLE_FIXTURES=1`, updates the `WSLENV` passthrough, and touches `.update-apple-fixtures` so WSL+VS Code test runs observe the flag. The subsequent test execution fetches the live HT articles listed in `AppleFixtureManager`, sanitises the HTML, and rewrites the `.expected.json` DTO snapshots.
3. Review the diff for localisation or nav noise. Once satisfied, re-run the tests without the env var (`dotnet test src/StellaOps.Feedser.Source.Vndr.Apple.Tests/StellaOps.Feedser.Source.Vndr.Apple.Tests.csproj`) to verify determinism.
4. Commit fixture updates together with any parser/mapping changes that motivated them.
## 5. Known Issues & Follow-up Tasks
- Apple occasionally throttles anonymous requests after bursts. The connector backs off automatically, but persistent `apple.fetch.failures` spikes might require mirroring the HT content or scheduling wider fetch windows.
- Rapid Security Responses may appear before the general patch notes surface in the lookup JSON. When that happens, the fetch run will log `detailFailures>0`. Collect sample HTML and refresh fixtures to confirm parser coverage.
- Multi-locale content is still under regression sweep (`src/StellaOps.Feedser.Source.Vndr.Apple/TASKS.md`). Capture non-`en-us` snapshots once the fixture tooling stabilises.


@@ -0,0 +1,150 @@
# Feedser Authority Audit Runbook
_Last updated: 2025-10-12_
This runbook helps operators verify and monitor the StellaOps Feedser ⇆ Authority integration. It focuses on the `/jobs*` surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity.
## 1. Prerequisites
- Authority integration is enabled in `feedser.yaml` (or via `FEEDSER_AUTHORITY__*` environment variables) with a valid `clientId`, secret, audience, and required scopes.
- OTLP metrics/log exporters are configured (`feedser.telemetry.*`) or container stdout is shipped to your SIEM.
- Operators have access to the Feedser job trigger endpoints via CLI or REST for smoke tests.
### Configuration snippet
```yaml
feedser:
  authority:
    enabled: true
    allowAnonymousFallback: false # keep true only during initial rollout
    issuer: "https://authority.internal"
    audiences:
      - "api://feedser"
    requiredScopes:
      - "feedser.jobs.trigger"
    bypassNetworks:
      - "127.0.0.1/32"
      - "::1/128"
    clientId: "feedser-jobs"
    clientSecretFile: "/run/secrets/feedser_authority_client"
    tokenClockSkewSeconds: 60
    resilience:
      enableRetries: true
      retryDelays:
        - "00:00:01"
        - "00:00:02"
        - "00:00:05"
      allowOfflineCacheFallback: true
      offlineCacheTolerance: "00:10:00"
```
> Store secrets outside source control. Feedser reads `clientSecretFile` on startup; rotate by updating the mounted file and restarting the service.
### Resilience tuning
- **Connected sites:** keep the default 1s / 2s / 5s retry ladder so Feedser retries transient Authority hiccups but still surfaces outages quickly. Leave `allowOfflineCacheFallback=true` so cached discovery/JWKS data can bridge short Pathfinder restarts.
- **Air-gapped/Offline Kit installs:** extend `offlineCacheTolerance` (15 to 30 minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (`enableRetries=false`) if infrastructure teams prefer to handle exponential backoff at the network layer; Feedser will fail fast but keep deterministic logs.
- Feedser resolves these knobs through `IOptionsMonitor<StellaOpsAuthClientOptions>`. Edits to `feedser.yaml` are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled.
## 2. Key Signals
### 2.1 Audit log channel
Feedser emits structured audit entries via the `Feedser.Authorization.Audit` logger for every `/jobs*` request once Authority enforcement is active.
```
Feedser authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=feedser-cli scopes=feedser.jobs.trigger bypass=False remote=10.1.4.7
```
| Field | Sample value | Meaning |
|--------------|-------------------------|------------------------------------------------------------------------------------------|
| `route` | `/jobs/definitions` | Endpoint that processed the request. |
| `status` | `200` / `401` / `409` | Final HTTP status code returned to the caller. |
| `subject` | `ops@example.com` | User or service principal subject (falls back to `(anonymous)` when unauthenticated). |
| `clientId` | `feedser-cli` | OAuth client ID provided by Authority (`(none)` if the token lacked the claim). |
| `scopes` | `feedser.jobs.trigger` | Normalised scope list extracted from token claims; `(none)` if the token carried none. |
| `bypass` | `True` / `False` | Indicates whether the request succeeded because its source IP matched a bypass CIDR. |
| `remote` | `10.1.4.7` | Remote IP recorded from the connection / forwarded header test hooks. |
Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations:
- `status=401 AND bypass=True`: bypass network accepted an unauthenticated call (should be temporary during rollout).
- `status=202 AND scopes="(none)"`: a token without scopes triggered a job; tighten client configuration.
- Spike in `clientId="(none)"` indicates upstream Authority is not issuing `client_id` claims or the CLI is outdated.
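The same filters can be applied to raw log lines when no log backend is at hand. The sketch below assumes the single-line `key=value` layout of the sample entry and that values contain no spaces (a multi-scope list might break that assumption); `audit_field` and `audit_flag` are hypothetical helper names.

```bash
# Extract the value of one key=value field from an audit log line.
# $1 = field name, $2 = full log line. Assumes values contain no spaces.
audit_field() {
  printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# Classify a line using the suspicious combinations listed above.
audit_flag() {
  line=$1
  status=$(audit_field status "$line")
  bypass=$(audit_field bypass "$line")
  scopes=$(audit_field scopes "$line")
  if [ "$status" = "401" ] && [ "$bypass" = "True" ]; then
    echo "bypass-unauthenticated"
  elif [ "$status" = "202" ] && [ "$scopes" = "(none)" ]; then
    echo "scopeless-trigger"
  else
    echo "ok"
  fi
}
```

Feeding each `Feedser authorization audit` line from a captured log through `audit_flag` and counting the labels gives a quick offline view of the same signals.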
### 2.2 Metrics
Feedser publishes counters under the OTEL meter `StellaOps.Feedser.WebService.Jobs`. Tags: `job.kind`, `job.trigger`, `job.outcome`.
| Metric name | Description | PromQL example |
|-------------------------------|----------------------------------------------------|----------------|
| `web.jobs.triggered` | Accepted job trigger requests. | `sum by (job_kind) (rate(web_jobs_triggered_total[5m]))` |
| `web.jobs.trigger.conflict` | Rejected triggers (already running, disabled…). | `sum(rate(web_jobs_trigger_conflict_total[5m]))` |
| `web.jobs.trigger.failed` | Server-side job failures. | `sum(rate(web_jobs_trigger_failed_total[5m]))` |
> Prometheus/OTEL collectors typically surface counters with `_total` suffix. Adjust queries to match your pipeline's generated metric names.
Correlate audit logs with the following global meter exported via `Feedser.SourceDiagnostics`:
- `feedser.source.http.requests_total{feedser_source="jobs-run"}` ensures REST/manual triggers route through Authority.
- If Grafana dashboards are deployed, extend the “Feedser Jobs” board with the above counters plus a table of recent audit log entries.
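
The `_total` naming note in this section can be made concrete. The following sketch assumes the common OTLP-to-Prometheus translation for counters (dots become underscores and `_total` is appended); histograms and gauges are exported differently, and your collector may apply other rules. `prom_counter_name` is an illustrative helper, not part of Feedser.

```bash
# Translate an OTEL counter name into the Prometheus name used in queries.
# Counters only: histograms expose _bucket/_sum/_count series instead.
prom_counter_name() {
  name=$(printf '%s' "$1" | tr '.' '_')
  case $name in
    *_total) printf '%s\n' "$name" ;;        # suffix already present
    *)       printf '%s_total\n' "$name" ;;
  esac
}
```

For instance, `prom_counter_name web.jobs.triggered` prints `web_jobs_triggered_total`, matching the PromQL examples in the table.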
## 3. Alerting Guidance
1. **Unauthorized bypass attempt**
- Query: `sum(rate(log_messages_total{logger="Feedser.Authorization.Audit", status="401", bypass="True"}[5m])) > 0`
- Action: verify `bypassNetworks` list; confirm expected maintenance windows; rotate credentials if suspicious.
2. **Missing scopes**
- Query: `sum(rate(log_messages_total{logger="Feedser.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0`
- Action: audit Authority client registration; ensure `requiredScopes` includes `feedser.jobs.trigger`.
3. **Trigger failure surge**
- Query: `sum(rate(web_jobs_trigger_failed_total[10m])) > 0` with severity `warning` if sustained for 10 minutes.
- Action: inspect correlated audit entries and `Feedser.Telemetry` traces for job execution errors.
4. **Conflict spike**
- Query: `sum(rate(web_jobs_trigger_conflict_total[10m])) > 5` (tune threshold).
- Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly.
5. **Authority offline**
- Watch `Feedser.Authorization.Audit` logs for `status=503` or `status=500` along with `clientId="(none)"`. Investigate Authority availability before re-enabling anonymous fallback.
## 4. Rollout & Verification Procedure
1. **Pre-checks**
- Confirm `allowAnonymousFallback` is `false` in production; keep `true` only during staged validation.
- Validate Authority issuer metadata is reachable from Feedser (`curl https://authority.internal/.well-known/openid-configuration` from the host).
2. **Smoke test with valid token**
- Obtain a token via CLI: `stella auth login --scope feedser.jobs.trigger`.
- Trigger a read-only endpoint: `curl -H "Authorization: Bearer $TOKEN" https://feedser.internal/jobs/definitions`.
- Expect HTTP 200/202 and an audit log with `bypass=False`, `scopes=feedser.jobs.trigger`.
3. **Negative test without token**
- Call the same endpoint without a token. Expect HTTP 401, `bypass=False`.
- If the request succeeds, double-check `bypassNetworks` and ensure fallback is disabled.
4. **Bypass check (if applicable)**
- From an allowed maintenance IP, call `/jobs/definitions` without a token. Confirm the audit log shows `bypass=True`. Review business justification and expiry date for such entries.
5. **Metrics validation**
- Ensure `web.jobs.triggered` counter increments during accepted runs.
- Exporters should show corresponding spans (`feedser.job.trigger`) if tracing is enabled.
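Steps 2 to 4 can be scripted so the expected status codes are checked rather than eyeballed. This is a sketch under stated assumptions: `http_status` and `expect_status` are hypothetical helpers, the URL is the example host used above, and only standard `curl` options (`-s`, `-o`, `-w '%{http_code}'`, `-H`) are relied on.

```bash
# Return the HTTP status code of a GET request; the bearer token is optional.
http_status() {
  url=$1
  token=${2:-}
  if [ -n "$token" ]; then
    curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $token" "$url"
  else
    curl -s -o /dev/null -w '%{http_code}' "$url"
  fi
}

# Succeed when the actual status equals one of the allowed codes.
# Usage: expect_status ACTUAL ALLOWED [ALLOWED...]
expect_status() {
  actual=$1
  shift
  for allowed in "$@"; do
    [ "$actual" = "$allowed" ] && return 0
  done
  echo "unexpected HTTP status: $actual" >&2
  return 1
}

# Example (not executed here):
#   expect_status "$(http_status https://feedser.internal/jobs/definitions "$TOKEN")" 200 202
#   expect_status "$(http_status https://feedser.internal/jobs/definitions)" 401
```

Keeping the status check separate from the request makes the negative test explicit: a `200` where `401` is expected fails loudly instead of passing unnoticed.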
## 5. Troubleshooting
| Symptom | Probable cause | Remediation |
|---------|----------------|-------------|
| Audit log shows `clientId=(none)` for all requests | Authority not issuing `client_id` claim or CLI outdated | Update StellaOps Authority configuration (`StellaOpsAuthorityOptions.Token.Claims.ClientId`), or upgrade the CLI token acquisition flow. |
| Requests succeed with `bypass=True` unexpectedly | Local network added to `bypassNetworks` or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Feedser. |
| HTTP 401 with valid token | `requiredScopes` missing from client registration or token audience mismatch | Verify Authority client scopes (`feedser.jobs.trigger`) and ensure the token audience matches `audiences` config. |
| Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set `feedser.telemetry.enableMetrics=true`, ensure collector includes `StellaOps.Feedser.WebService.Jobs` meter. |
| Sudden spike in `web.jobs.trigger.failed` | Downstream job failure or Authority timeout mid-request | Inspect Feedser job logs, re-run with tracing enabled, validate Authority latency. |
## 6. References
- `docs/21_INSTALL_GUIDE.md`: Authority configuration quick start.
- `docs/17_SECURITY_HARDENING_GUIDE.md`: Security guardrails and enforcement deadlines.
- `docs/ops/authority-monitoring.md`: Authority-side monitoring and alerting playbook.
- `StellaOps.Feedser.WebService/Filters/JobAuthorizationAuditFilter.cs`: source of audit log fields.


@@ -47,6 +47,12 @@ Expect all logs at `Information`. Ensure OTEL exporters include the scope `Stell
3. **Job health**
- `stellaops-cli db merge` exit code `1` signifies unresolved conflicts. Pipe to automation that captures logs and notifies #feedser-ops.
### Threshold updates (2025-10-12)
- `feedser.merge.conflicts`: Page only when ≥ 2 events fire within 30 minutes; the synthetic conflict fixture run produces 0 conflicts, so the first event now routes to Slack for manual review instead of paging.
- `feedser.merge.overrides`: Raise a warning when the 30-minute sum exceeds 10 (canonical triple yields exactly 1 summary override with `primary_source=osv`, `suppressed_source=ghsa`).
- `feedser.merge.range_overrides`: Maintain the 15-minute alert at ≥ 3 but annotate dashboards that the regression triple emits a single `package_type=semver` override so ops can spot unexpected spikes.
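The `feedser.merge.conflicts` rule amounts to a small routing decision. A sketch, assuming the input is the number of conflict events observed in the trailing 30 minutes; `conflict_route` and its return labels are hypothetical and only illustrate the thresholds stated above.

```bash
# Decide where a conflict signal goes, given the 30-minute event count.
conflict_route() {
  count=$1
  if [ "$count" -ge 2 ]; then
    echo "page"      # two or more events within the window
  elif [ "$count" -eq 1 ]; then
    echo "slack"     # first event: manual review, no page
  else
    echo "none"      # baseline: the synthetic fixture run yields zero
  fi
}
```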
---
## 4. Triage Workflow
@@ -128,3 +134,19 @@ Expect all logs at `Information`. Ensure OTEL exporters include the scope `Stell
- Storage audit trail: `src/StellaOps.Feedser.Merge/Services/MergeEventWriter.cs`, `src/StellaOps.Feedser.Storage.Mongo/MergeEvents`.
Keep this runbook synchronized with future sprint notes and update alert thresholds as baseline volumes change.
---
## 9. Synthetic Regression Fixtures
- **Locations:** Canonical conflict snapshots now live at `src/StellaOps.Feedser.Source.Ghsa.Tests/Fixtures/conflict-ghsa.canonical.json`, `src/StellaOps.Feedser.Source.Nvd.Tests/Nvd/Fixtures/conflict-nvd.canonical.json`, and `src/StellaOps.Feedser.Source.Osv.Tests/Fixtures/conflict-osv.canonical.json`.
- **Validation commands:** To regenerate and verify the fixtures offline, run:
```bash
dotnet test src/StellaOps.Feedser.Source.Ghsa.Tests/StellaOps.Feedser.Source.Ghsa.Tests.csproj --filter GhsaConflictFixtureTests
dotnet test src/StellaOps.Feedser.Source.Nvd.Tests/StellaOps.Feedser.Source.Nvd.Tests.csproj --filter NvdConflictFixtureTests
dotnet test src/StellaOps.Feedser.Source.Osv.Tests/StellaOps.Feedser.Source.Osv.Tests.csproj --filter OsvConflictFixtureTests
dotnet test src/StellaOps.Feedser.Merge.Tests/StellaOps.Feedser.Merge.Tests.csproj --filter MergeAsync_AppliesCanonicalRulesAndPersistsDecisions
```
- **Expected signals:** The triple produces one freshness-driven summary override (`primary_source=osv`, `suppressed_source=ghsa`) and one range override for the npm SemVer package while leaving `feedser.merge.conflicts` at zero. Use these values as the baseline when tuning dashboards or load-testing alert pipelines.


@@ -0,0 +1,151 @@
{
"title": "Feedser CVE & KEV Observability",
"uid": "feedser-cve-kev",
"schemaVersion": 38,
"version": 1,
"editable": true,
"timezone": "",
"time": {
"from": "now-24h",
"to": "now"
},
"refresh": "5m",
"templating": {
"list": [
{
"name": "datasource",
"type": "datasource",
"query": "prometheus",
"refresh": 1,
"hide": 0
}
]
},
"panels": [
{
"type": "timeseries",
"title": "CVE fetch success vs failure",
"gridPos": { "h": 9, "w": 12, "x": 0, "y": 0 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(cve_fetch_success_total[5m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "success"
},
{
"refId": "B",
"expr": "rate(cve_fetch_failures_total[5m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "failure"
}
]
},
{
"type": "timeseries",
"title": "KEV fetch cadence",
"gridPos": { "h": 9, "w": 12, "x": 12, "y": 0 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(kev_fetch_success_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "success"
},
{
"refId": "B",
"expr": "rate(kev_fetch_failures_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "failure"
},
{
"refId": "C",
"expr": "rate(kev_fetch_unchanged_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "unchanged"
}
]
},
{
"type": "table",
"title": "KEV parse anomalies (24h)",
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 9 },
"fieldConfig": {
"defaults": {
"unit": "short"
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "sum by (reason) (increase(kev_parse_anomalies_total[24h]))",
"format": "table",
"datasource": { "type": "prometheus", "uid": "${datasource}" }
}
],
"transformations": [
{
"id": "organize",
"options": {
"renameByName": {
"Value": "count"
}
}
}
]
},
{
"type": "timeseries",
"title": "Advisories emitted",
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 9 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(cve_map_success_total[15m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "CVE"
},
{
"refId": "B",
"expr": "rate(kev_map_advisories_total[24h])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "KEV"
}
]
}
]
}


@@ -34,19 +34,22 @@ feedser:
- Feedser CLI: `stella db jobs run source:cve:fetch --and-then source:cve:parse --and-then source:cve:map`
- REST fallback: `POST /jobs/run { "kind": "source:cve:fetch", "chain": ["source:cve:parse", "source:cve:map"] }`
3. Observe the following metrics (exported via OTEL meter `StellaOps.Feedser.Source.Cve`):
- `cve.fetch.attempts`, `cve.fetch.success`, `cve.fetch.failures`, `cve.fetch.unchanged`
- `cve.fetch.attempts`, `cve.fetch.success`, `cve.fetch.documents`, `cve.fetch.failures`, `cve.fetch.unchanged`
- `cve.parse.success`, `cve.parse.failures`, `cve.parse.quarantine`
- `cve.map.success`
4. Verify the MongoDB advisory store contains fresh CVE advisories (`advisoryKey` prefix `cve/`) and that the source cursor (`source_states` collection) advanced.
4. Verify Prometheus shows matching `feedser.source.http.requests_total{feedser_source="cve"}` deltas (list vs detail phases) while `feedser.source.http.failures_total{feedser_source="cve"}` stays flat.
5. Confirm the info-level summary log `CVEs fetch window … pages=X detailDocuments=Y detailFailures=Z` appears once per fetch run and shows `detailFailures=0`.
6. Verify the MongoDB advisory store contains fresh CVE advisories (`advisoryKey` prefix `cve/`) and that the source cursor (`source_states` collection) advanced.
### 1.3 Production Monitoring
- **Dashboards:** Add the counters above plus `feedser.range.primitives` (filtered by `scheme=semver` or `scheme=vendor`) to the Feedser overview board. Alert when:
- `rate(cve.fetch.failures[5m]) > 0`
- `rate(cve.map.success[15m]) == 0` while fetch attempts continue
- `sum_over_time(cve.parse.quarantine[1h]) > 0`
- **Logs:** Watch for `CveConnector` warnings such as `Failed fetching CVE record` or schema validation errors (`Malformed CVE JSON`). These are emitted with the CVE ID and document identifier for triage.
- **Backfill window:** Operators can tighten or widen the `initialBackfill` / `maxPagesPerFetch` values after validating baseline throughput. Update the config and restart the worker to apply changes.
- **Dashboards:** Plot `rate(cve_fetch_success_total[5m])`, `rate(cve_fetch_failures_total[5m])`, and `rate(cve_fetch_documents_total[5m])` alongside `feedser_source_http_requests_total{feedser_source="cve"}` to confirm HTTP and connector counters stay aligned. Keep `feedser.range.primitives{scheme=~"semver|vendor"}` on the same board for range coverage. Example alerts:
- `rate(cve_fetch_failures_total[5m]) > 0` for 10 minutes (`severity=warning`)
- `rate(cve_map_success_total[15m]) == 0` while `rate(cve_fetch_success_total[15m]) > 0` (`severity=critical`)
- `sum_over_time(cve_parse_quarantine_total[1h]) > 0` to catch schema anomalies
- **Logs:** Monitor warnings such as `Failed fetching CVE record {CveId}` and `Malformed CVE JSON`, and surface the summary info log `CVEs fetch window … detailFailures=0 detailUnchanged=0` on dashboards. A non-zero `detailFailures` usually indicates rate-limit or auth issues on detail requests.
- **Grafana pack:** Import `docs/ops/feedser-cve-kev-grafana-dashboard.json` and filter by panel legend (`CVE`, `KEV`) to reuse the canned layout.
- **Backfill window:** Operators can tighten or widen `initialBackfill` / `maxPagesPerFetch` after validating throughput. Update config and restart Feedser to apply changes.
## 2. CISA KEV Connector (`source:kev:*`)
@@ -67,7 +70,15 @@ feedser:
### 2.2 Schema validation & anomaly handling
From this sprint the connector validates the KEV JSON payload against `Schemas/kev-catalog.schema.json`. Malformed documents are quarantined, and entries missing a CVE ID are dropped with a warning (`reason=missingCveId`). Operators should treat repeated schema failures as an upstream regression and coordinate with CISA or mirror maintainers.
The connector validates each catalog against `Schemas/kev-catalog.schema.json`. Failures increment `kev.parse.failures_total{reason="schema"}` and the document is quarantined (status `Failed`). Additional failure reasons include `download`, `invalidJson`, `deserialize`, `missingPayload`, and `emptyCatalog`. Entry-level anomalies are surfaced through `kev.parse.anomalies_total` with reasons:
| Reason | Meaning |
| --- | --- |
| `missingCveId` | Catalog entry omitted `cveID`; the entry is skipped. |
| `countMismatch` | Catalog `count` field disagreed with the actual entry total. |
| `nullEntry` | Upstream emitted a `null` entry object (rare upstream defect). |
Treat repeated schema failures or growing anomaly counts as an upstream regression and coordinate with CISA or mirror maintainers.
### 2.3 Smoke Test (staging)
@@ -79,13 +90,16 @@ From this sprint the connector validates the KEV JSON payload against `Schemas/k
- `kev.fetch.attempts`, `kev.fetch.success`, `kev.fetch.unchanged`, `kev.fetch.failures`
- `kev.parse.entries` (tag `catalogVersion`), `kev.parse.failures`, `kev.parse.anomalies` (tag `reason`)
- `kev.map.advisories` (tag `catalogVersion`)
4. Confirm MongoDB documents exist for the catalog JSON (`raw_documents` & `dtos`) and that advisories with prefix `kev/` are written.
4. Confirm `feedser.source.http.requests_total{feedser_source="kev"}` increments once per fetch and that the paired `feedser.source.http.failures_total` stays flat (zero increase).
5. Inspect the info logs `Fetched KEV catalog document … pendingDocuments=…` and `Parsed KEV catalog document … entries=…`—they should appear exactly once per run and `Mapped X/Y… skipped=0` should match the `kev.map.advisories` delta.
6. Confirm MongoDB documents exist for the catalog JSON (`raw_documents` & `dtos`) and that advisories with prefix `kev/` are written.
### 2.4 Production Monitoring
- Alert when `kev.fetch.success` goes to zero for longer than the expected daily cadence (default: trigger if `rate(kev.fetch.success[8h]) == 0` during business hours).
- Track anomaly spikes via `kev.parse.anomalies{reason="missingCveId"}`. A sustained non-zero rate means the upstream catalog contains unexpected records.
- The connector logs each validated catalog: `Parsed KEV catalog document … entries=X`. Absence of that log alongside consecutive `kev.fetch.success` counts suggests schema validation failures—correlate with warning-level events in the `StellaOps.Feedser.Source.Kev` logger.
- Alert when `rate(kev_fetch_success_total[8h]) == 0` during working hours (daily cadence breach) and when `increase(kev_fetch_failures_total[1h]) > 0`.
- Page the on-call if `increase(kev_parse_failures_total{reason="schema"}[6h]) > 0`—this usually signals an upstream payload change. Treat repeated `reason="download"` spikes as networking issues to the mirror.
- Track anomaly spikes through `sum_over_time(kev_parse_anomalies_total{reason="missingCveId"}[24h])`. Rising `countMismatch` trends point to catalog publishing bugs.
- Surface the fetch/mapping info logs (`Fetched KEV catalog document …` and `Mapped X/Y KEV advisories … skipped=S`) on dashboards; absence of those logs while metrics show success typically means schema validation short-circuited the run.
### 2.5 Known good dashboard tiles
@@ -93,12 +107,14 @@ Add the following panels to the Feedser observability board:
| Metric | Recommended visualisation |
|--------|---------------------------|
| `kev.fetch.success` | Single-stat (last 24h) with threshold alert |
| `rate(kev.parse.entries[1h])` by `catalogVersion` | Stacked area: highlights daily release size |
| `sum_over_time(kev.parse.anomalies[1d])` by `reason` | Table: anomaly breakdown |
| `rate(kev_fetch_success_total[30m])` | Single-stat (last 24h) with warning threshold `>0` |
| `rate(kev_parse_entries_total[1h])` by `catalogVersion` | Stacked area: highlights daily release size |
| `sum_over_time(kev_parse_anomalies_total[1d])` by `reason` | Table: anomaly breakdown (matches dashboard panel) |
| `rate(cve_map_success_total[15m])` vs `rate(kev_map_advisories_total[24h])` | Comparative timeseries for advisories emitted |
## 3. Runbook updates
- Record staging/production smoke test results (date, catalog version, advisory counts) in your team's change log.
- Add the CVE/KEV job kinds to the standard maintenance checklist so operators can manually trigger them after planned downtime.
- Keep this document in sync with future connector changes (for example, new anomaly reasons or additional metrics).
- Version-control dashboard tweaks alongside `docs/ops/feedser-cve-kev-grafana-dashboard.json` so operations can re-import the observability pack during restores.


@@ -0,0 +1,111 @@
# Feedser GHSA Connector Operations Runbook
_Last updated: 2025-10-12_
## 1. Overview
The GitHub Security Advisories (GHSA) connector pulls advisory metadata from the GitHub REST API `/security/advisories` endpoint. GitHub enforces both primary and secondary rate limits, so operators must monitor usage and configure retries to avoid throttling incidents.
## 2. Rate-limit telemetry
The connector now surfaces rate-limit headers on every fetch and exposes the following metrics via OpenTelemetry:
| Metric | Description | Tags |
|--------|-------------|------|
| `ghsa.ratelimit.limit` (histogram) | Samples the reported request quota at fetch time. | `phase` = `list` or `detail`, `resource` (e.g., `core`). |
| `ghsa.ratelimit.remaining` (histogram) | Remaining requests returned by `X-RateLimit-Remaining`. | `phase`, `resource`. |
| `ghsa.ratelimit.reset_seconds` (histogram) | Seconds until `X-RateLimit-Reset`. | `phase`, `resource`. |
| `ghsa.ratelimit.exhausted` (counter) | Incremented whenever GitHub returns a zero remaining quota and the connector delays before retrying. | `phase`. |
### Dashboards & alerts
- Plot `ghsa.ratelimit.remaining` as the latest value to watch the runway. Alert when the value stays below **`RateLimitWarningThreshold`** (default `500`) for more than 5 minutes.
- Raise a separate alert on `increase(ghsa.ratelimit.exhausted[15m]) > 0` to catch hard throttles.
- Overlay `ghsa.fetch.attempts` vs `ghsa.fetch.failures` to confirm retries are effective.
## 3. Logging signals
When `X-RateLimit-Remaining` falls below `RateLimitWarningThreshold`, the connector emits:
```
GHSA rate limit warning: remaining {Remaining}/{Limit} for {Phase} {Resource}
```
When GitHub reports zero remaining calls, the connector logs and sleeps for the reported `Retry-After`/`X-RateLimit-Reset` interval (falling back to `SecondaryRateLimitBackoff`).
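The delay selection can be sketched as follows. The precedence (`Retry-After` first, then `X-RateLimit-Reset`, then the configured fallback) is an assumption based on the sentence above, not a reading of the connector source, and `ghsa_backoff_seconds` is a hypothetical helper. GitHub reports `X-RateLimit-Reset` as a UTC epoch timestamp in seconds.

```bash
# Pick a backoff in seconds.
# $1 = Retry-After header value in seconds (may be empty)
# $2 = X-RateLimit-Reset epoch seconds (may be empty)
# $3 = current time in epoch seconds
# $4 = fallback in seconds (stands in for SecondaryRateLimitBackoff)
ghsa_backoff_seconds() {
  retry_after=$1
  reset_epoch=$2
  now=$3
  fallback=$4
  if [ -n "$retry_after" ]; then
    echo "$retry_after"
  elif [ -n "$reset_epoch" ] && [ "$reset_epoch" -gt "$now" ]; then
    echo $((reset_epoch - now))
  else
    echo "$fallback"    # no usable header: use the configured fallback
  fi
}
```

With no `Retry-After` header and a reset 90 seconds away, `ghsa_backoff_seconds "" 1700000090 1700000000 120` prints `90`; with neither header present it prints the fallback, `120`.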
## 4. Configuration knobs (`feedser.yaml`)
```yaml
feedser:
  sources:
    ghsa:
      apiToken: "${GITHUB_PAT}"
      pageSize: 50
      requestDelay: "00:00:00.200"
      failureBackoff: "00:05:00"
      rateLimitWarningThreshold: 500 # warn below this many remaining calls
      secondaryRateLimitBackoff: "00:02:00" # fallback delay when GitHub omits Retry-After
```
### Recommendations
- Increase `requestDelay` in air-gapped or burst-heavy deployments to smooth token consumption.
- Lower `rateLimitWarningThreshold` only if your dashboards already page on the new histogram; never set it negative.
- For bots using a low-privilege PAT, keep `secondaryRateLimitBackoff` at ≥60 seconds to respect GitHub's secondary-limit guidance.
#### Default job schedule
| Job kind | Cron | Timeout | Lease |
|----------|------|---------|-------|
| `source:ghsa:fetch` | `1,11,21,31,41,51 * * * *` | 6 minutes | 4 minutes |
| `source:ghsa:parse` | `3,13,23,33,43,53 * * * *` | 5 minutes | 4 minutes |
| `source:ghsa:map` | `5,15,25,35,45,55 * * * *` | 5 minutes | 4 minutes |
These defaults spread GHSA stages across the hour so fetch completes before parse/map fire. Override them via `feedser.jobs.definitions[...]` when coordinating multiple connectors on the same runner.
## 5. Provisioning credentials
Feedser requires a GitHub personal access token (classic) with the **`read:org`** and **`security_events`** scopes to pull GHSA data. Store it as a secret and reference it via `feedser.sources.ghsa.apiToken`.
### Docker Compose (stack operators)
```yaml
services:
  feedser:
    environment:
      FEEDSER__SOURCES__GHSA__APITOKEN: /run/secrets/ghsa_pat
    secrets:
      - ghsa_pat
secrets:
  ghsa_pat:
    file: ./secrets/ghsa_pat.txt # contains only the PAT value
```
### Helm values (cluster operators)
```yaml
feedser:
  extraEnv:
    - name: FEEDSER__SOURCES__GHSA__APITOKEN
      valueFrom:
        secretKeyRef:
          name: feedser-ghsa
          key: apiToken
  extraSecrets:
    feedser-ghsa:
      apiToken: "<paste PAT here or source from external secret store>"
```
After rotating the PAT, restart the Feedser workers (or run `kubectl rollout restart deployment/feedser`) to ensure the configuration reloads.
When enabling GHSA for the first time, run a staged backfill:
1. Trigger `source:ghsa:fetch` manually (CLI or API) outside of peak hours.
2. Watch `feedser.jobs.health` for the GHSA jobs until they report `healthy`.
3. Allow the scheduled cron cadence to resume once the initial backlog drains (typically < 30 minutes).
## 6. Runbook steps when throttled
1. Check `ghsa.ratelimit.exhausted` for the affected phase (`list` vs `detail`).
2. Confirm the connector is delaying; logs will show `GHSA rate limit exhausted...` with the chosen backoff.
3. If rate limits stay exhausted:
- Verify no other jobs are sharing the PAT.
- Temporarily reduce `MaxPagesPerFetch` or `PageSize` to shrink burst size.
- Consider provisioning a dedicated PAT (GHSA permissions only) for Feedser.
4. After the quota resets, restore `rateLimitWarningThreshold`/`requestDelay` to their normal values and monitor the histograms for at least one hour.
## 7. Alert integration quick reference
- Prometheus: `ghsa_ratelimit_remaining_bucket` (from histogram); use `histogram_quantile(0.99, ...)` to trend capacity.
- VictoriaMetrics: `LAST_over_time(ghsa_ratelimit_remaining_sum[5m])` for simple last-value graphs.
- Grafana: stack remaining + used to visualise total limit per resource.

@@ -0,0 +1,47 @@
# StellaOps Authority Audit Events
StellaOps Authority emits structured audit records for every credential flow and bootstrap operation. The goal is to provide deterministic, privacy-aware telemetry that can be persisted offline and replayed for incident response without leaking credentials.
## Contract
Audit events share the `StellaOps.Cryptography.Audit.AuthEventRecord` contract. Key fields:
- `EventType` — canonical identifier such as `authority.password.grant`, `authority.client_credentials.grant`, or `authority.bootstrap.user`.
- `OccurredAt` — UTC timestamp captured at emission time.
- `CorrelationId` — stable identifier propagated across logs and persistence.
- `Outcome` — `Success`, `Failure`, `LockedOut`, `RateLimited`, or `Error`.
- `Reason` — optional failure or policy message.
- `Subject` — `AuthEventSubject` carrying subject identifier, username, display name, and optional realm metadata. All subject fields are tagged as PII.
- `Client` — `AuthEventClient` with client identifier, display name, and originating provider/plugin.
- `Scopes` — granted or requested OAuth scopes (sorted before emission).
- `Network` — `AuthEventNetwork` with remote address, forwarded headers, and user agent string (all treated as PII).
- `Properties` — additional `AuthEventProperty` entries for context-specific details (lockout durations, policy decisions, retries, etc.).
## Data Classifications
Every string value uses `ClassifiedString` to assign a data classification:
- `None` — public or operational metadata (event type, outcome).
- `Personal` — personally identifiable information (PII) such as subject identifiers, usernames, remote IP addresses, and user agents.
- `Sensitive` — secrets or derived credentials (client secrets, retry tokens). Avoid storing raw credentials; emit only hashed or summarised data when the classification is `Sensitive`.
Downstream log sinks and persistence layers can inspect classifications to redact or separate PII before export.
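As an illustration of how a sink could act on these tags, the sketch below models the idea in Python. The real contract is the .NET `ClassifiedString` type, so the `Classification` enum, the dataclass, and the `redact` helper here are hypothetical stand-ins, and the `[redacted]` placeholder is an arbitrary choice.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    NONE = "None"
    PERSONAL = "Personal"
    SENSITIVE = "Sensitive"


@dataclass(frozen=True)
class ClassifiedString:
    """Illustrative stand-in for the .NET ClassifiedString type."""
    value: str
    classification: Classification = Classification.NONE


def redact(fields, allowed=frozenset({Classification.NONE})):
    """Replace every value whose classification is not in `allowed`."""
    return {
        name: field.value if field.classification in allowed else "[redacted]"
        for name, field in fields.items()
    }


event = {
    "eventType": ClassifiedString("authority.password.grant"),
    "username": ClassifiedString("ops-admin", Classification.PERSONAL),
    "retryToken": ClassifiedString("abc123", Classification.SENSITIVE),
}
print(redact(event))
```

A sink that is cleared for PII could pass `allowed={Classification.NONE, Classification.PERSONAL}` while still dropping `Sensitive` values.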
## Event Naming
Event names follow dotted notation:
- `authority.password.grant` — password grant handled by OpenIddict.
- `authority.client_credentials.grant` — client credential grant handling.
- `authority.bootstrap.user` and `authority.bootstrap.client` — bootstrap API operations.
- Future additions should preserve the `authority.<surface>.<action>` pattern to keep filtering deterministic.
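A lint-style check for the naming pattern could be sketched as below. The exact character set is an assumption inferred from the listed names (lowercase letters, digits, and underscores, exactly three dotted segments); the document itself only fixes the `authority.<surface>.<action>` shape.

```python
import re

# Assumed grammar: "authority", then a surface and an action, each made of
# lowercase letters, digits, or underscores and starting with a letter.
EVENT_NAME = re.compile(r"authority\.[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*")


def is_valid_event_name(name):
    """Return True when `name` follows authority.<surface>.<action>."""
    return EVENT_NAME.fullmatch(name) is not None


for candidate in ("authority.password.grant", "authority.bootstrap"):
    print(candidate, is_valid_event_name(candidate))
```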
## Persistence
The Authority host converts audit records into `AuthorityLoginAttemptDocument` rows for MongoDB persistence. Documents must:
- Preserve `CorrelationId`, `SubjectId`, `ClientId`, `Plugin`, `Outcome`, `Reason`, and `OccurredAt`.
- Store remote address in `remoteAddress` only after classification as PII.
- Include summary booleans such as `Successful` to accelerate lockout policy checks.
When exporting to external SIEMs, honour the `ClassifiedString.Classification` tag to avoid shipping PII into restricted environments.

@@ -0,0 +1,106 @@
# Authority Threat Model (STRIDE)
> Prepared by Security Guild — 2025-10-12. Scope covers Authority host, Standard plug-in, CLI, bootstrap workflow, and offline revocation distribution.
## 1. Scope & Method
- Methodology: STRIDE applied to primary Authority surfaces (token issuance, bootstrap, revocation, operator tooling, plug-in extensibility).
- Assets in scope: identity credentials, OAuth tokens (access/refresh), bootstrap invites, revocation manifests, signing keys, audit telemetry.
- Out of scope: Third-party IdPs federated via OpenIddict (tracked separately in SEC6 backlog).
## 2. Assets & Entry Points
| Asset / Surface | Description | Primary Actors |
|-----------------|-------------|----------------|
| Token issuance APIs (`/token`, `/authorize`) | OAuth/OIDC endpoints mediated by OpenIddict | CLI, UI, automation agents |
| Bootstrap channel | Initial admin invite + bootstrap CLI workflow | Platform operators |
| Revocation bundle | Offline JSON + detached JWS consumed by agents | Feedser, Agents, Zastava |
| Plug-in manifests | Standard plug-in configuration and password policy overrides | Operators, DevOps |
| Signing keys | ES256 signing keys backing tokens and revocation manifests | Security Guild, HSM/KeyOps |
| Audit telemetry | Structured login/audit stream persisted to Mongo/observability stack | SOC, SecOps |
## 3. Trust Boundaries
| Boundary | Rationale | Controls |
|----------|-----------|----------|
| TB1 — Public network ↔️ Authority ingress | Internet/extranet exposure for `/token`, `/authorize`, `/bootstrap` | TLS 1.3, reverse proxy ACLs, rate limiting (SEC3.A / CORE8.RL) |
| TB2 — Authority host ↔️ Mongo storage | Credential store, revocation state, audit log persistence | Authenticated Mongo, network segmentation, deterministic serializers |
| TB3 — Authority host ↔️ Plug-in sandbox | Plug-ins may override password policy and bootstrap flows | Code signing, manifest validation, restart-time loading only |
| TB4 — Operator workstation ↔️ CLI | CLI holds bootstrap secrets and revocation bundles | OS keychain storage, MFA on workstations, offline kit checksum |
| TB5 — Authority ↔️ Downstream agents | Revocation bundle consumption, token validation | Mutual TLS (planned), detached JWS signatures, bundle freshness checks |
## 4. Data Flow Diagrams
### 4.1 Runtime token issuance
```mermaid
flowchart LR
subgraph Client Tier
CLI[StellaOps CLI]
UI[UI / Automation]
end
subgraph Perimeter
RP[Reverse Proxy / WAF]
end
subgraph Authority
AUTH[Authority Host]
PLGIN[Standard Plug-in]
STORE[(Mongo Credential Store)]
end
CLI -->|OAuth password / client creds| RP --> AUTH
UI -->|OAuth flows| RP
AUTH -->|PasswordHashOptions + Secrets| PLGIN
AUTH -->|Verify / Persist hashes| STORE
STORE -->|Rehash needed| AUTH
AUTH -->|Access / refresh token| RP --> Client Tier
```
### 4.2 Bootstrap & revocation
```mermaid
flowchart LR
subgraph Operator
OPS[Operator Workstation]
end
subgraph Authority
AUTH[Authority Host]
STORE[(Mongo)]
end
subgraph Distribution
OFFKIT[Offline Kit Bundle]
AGENT[Authorized Agent / Feedser]
end
OPS -->|Bootstrap CLI (`stellaops auth bootstrap`)| AUTH
AUTH -->|One-time invite + Argon2 hash| STORE
AUTH -->|Revocation export (`stellaops auth revoke export`)| OFFKIT
OFFKIT -->|Signed JSON + .jws| AGENT
AGENT -->|Revocation ACK / telemetry| AUTH
```
## 5. STRIDE Analysis
| Threat | STRIDE Vector | Surface | Risk (L×I) | Existing Controls | Gaps / Actions | Owner |
|--------|---------------|---------|------------|-------------------|----------------|-------|
| Spoofed revocation bundle | Spoofing | TB5 — Authority ↔️ Agents | Med×High | Detached JWS signature (planned), offline kit checksums | Finalise signing key registry & verification script (SEC4.B/SEC4.HOST); add bundle freshness requirement | Security Guild (follow-up: **SEC5.B**) |
| Parameter tampering on `/token` | Tampering | TB1 — Public ingress | Med×High | ASP.NET model validation, OpenIddict, rate limiter (CORE8.RL) | Add audit coverage for tampered inputs, align correlation IDs with SOC (SEC2.A/SEC2.B) | Security Guild + Authority Core (follow-up: **SEC5.C**) |
| Bootstrap invite replay | Repudiation | TB4 — Operator CLI ↔️ Authority | Low×High | One-time bootstrap tokens, Argon2id hashing on creation | Enforce invite expiration + audit trail for unused invites | Security Guild (follow-up: **SEC5.D**) |
| Token replay by stolen agent | Information Disclosure | TB5 | Med×High | Planned revocation bundles, optional mTLS | Require agent binding (device fingerprint) and enforce revocation grace window alerts | Security Guild + Zastava (follow-up: **SEC5.E**) |
| Privilege escalation via plug-in override | Elevation of Privilege | TB3 — Plug-in sandbox | Med×High | Signed plug-ins, restart-only loading, configuration validation | Add static analysis on manifest overrides + runtime warning when policy weaker than host | Security Guild + DevOps (follow-up: **SEC5.F**) |
| Offline bundle tampering | Tampering | Distribution | Low×High | SHA256 manifest, signed bundles (planned) | Add supply-chain attestation for Offline Kit, publish verification CLI in docs | Security Guild + Ops (follow-up: **SEC5.G**) |
| Failure to log denied tokens | Repudiation | TB2 — Authority ↔️ Mongo | Med×Med | Serilog structured events (partial), Mongo persistence path (planned) | Finalise audit schema (SEC2.A) and ensure `/token` denies include subject/client/IP fields | Security Guild + Authority Core (follow-up: **SEC5.H**) |
Risk scoring uses a qualitative scale (Low/Med/High) for likelihood × impact; mitigation priority follows High > Med > Low.
## 6. Follow-up Backlog Hooks
| Backlog ID | Linked Threat | Summary | Target Owners |
|------------|---------------|---------|---------------|
| SEC5.B | Spoofed revocation bundle | Complete libsodium/Core signing integration and ship revocation verification script. | Security Guild + Authority Core |
| SEC5.C | Parameter tampering on `/token` | Finalise audit contract (`SEC2.A`) and add request tamper logging. | Security Guild + Authority Core |
| SEC5.D | Bootstrap invite replay | Implement expiry enforcement + audit coverage for unused bootstrap invites. | Security Guild |
| SEC5.E | Token replay by stolen agent | Document device binding requirements and create detector for stale revocation acknowledgements. | Security Guild + Zastava |
| SEC5.F | Plug-in override escalation | Static analysis of plug-in manifests; warn on weaker password policy overrides. | Security Guild + DevOps |
| SEC5.G | Offline bundle tampering | Extend Offline Kit build to include attested manifest + verification CLI sample. | Security Guild + Ops |
| SEC5.H | Failure to log denied tokens | Ensure audit persistence for all `/token` denials with correlation IDs. | Security Guild + Authority Core |
Update `src/StellaOps.Cryptography/TASKS.md` (Security Guild board) with the above backlog entries to satisfy SEC5.A exit criteria.

@@ -0,0 +1,76 @@
# Authority Password Hashing Guidance
> **Status:** Drafted 2025-10-11 alongside SEC1.A / SEC1.PLG rollout. Argon2id is now the default hashing algorithm for the Standard plug-in and recommended for all Authority identity providers.
## 1. Overview
StellaOps Authority issues and verifies credentials through the shared `StellaOps.Cryptography` provider abstraction. As of October 2025:
- **Default algorithm:** Argon2id (PHC format `$argon2id$v=19$m=<mem>,t=<time>,p=<parallelism>$<salt>$<hash>`).
- **Legacy support:** PBKDF2-SHA256 hashes (`PBKDF2.<iterations>.<payload>`) continue to verify, but successful logins are transparently rehashed to Argon2id.
- **Configuration path:** `authority.security.passwordHashing` in the primary Authority configuration controls system-wide defaults. Individual plug-ins may override via `passwordHashing` in their manifests.
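To make the two stored formats concrete, the sketch below classifies an encoded hash by its prefix and extracts the Argon2id cost parameters from the PHC string. The function names are hypothetical, and the rehash rule (only a recognised PBKDF2 hash triggers an upgrade) is a simplified reading of the behaviour described above, not the plug-in's actual logic.

```python
def hash_algorithm(encoded):
    """Classify a stored hash by its prefix."""
    if encoded.startswith("$argon2id$"):
        return "argon2id"
    if encoded.startswith("PBKDF2."):
        return "pbkdf2"
    return "unknown"


def parse_argon2id(encoded):
    """Extract cost parameters from "$argon2id$v=19$m=..,t=..,p=..$salt$hash"."""
    parts = encoded.split("$")
    # Expected pieces: "", "argon2id", "v=19", "m=..,t=..,p=..", salt, hash
    if len(parts) != 6 or parts[1] != "argon2id":
        raise ValueError("not an Argon2id PHC string")
    params = dict(item.split("=", 1) for item in parts[3].split(","))
    return {
        "version": int(parts[2].split("=", 1)[1]),
        "memory_kib": int(params["m"]),
        "iterations": int(params["t"]),
        "parallelism": int(params["p"]),
    }


def needs_rehash(encoded):
    """Legacy PBKDF2 hashes are upgraded after a successful login."""
    return hash_algorithm(encoded) == "pbkdf2"


sample = "$argon2id$v=19$m=19456,t=2,p=1$c2FsdA$aGFzaA"
print(parse_argon2id(sample), needs_rehash(sample))
```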
## 2. Recommended Parameters
| Environment | memorySizeInKib | iterations | parallelism | Notes |
|-------------|-----------------|------------|-------------|-------|
| Production (default) | 19456 | 2 | 1 | Balances CPU with 19 MiB memory cost; ~175 ms on 4 vCPU host. |
| High-security enclave | 32768 | 3 | 1 | Increases memory pressure; confirm capacity on shared hosts. |
| Resource-constrained lab | 8192 | 2 | 1 | Use only for bootstrap/testing; increase once hardware upgraded. |
| PBKDF2 fallback | — | ≥210000 | — | Set `algorithm: Pbkdf2` only when Argon2 hardware support is unavailable. |
> ⚠️ Lowering parameters below these baselines should be a temporary measure. Document any deviations in runbooks and schedule follow-up work to restore defaults.
## 3. Configuring Authority Defaults
`authority.yaml` (or equivalent) accepts the following block:
```yaml
security:
  passwordHashing:
    algorithm: Argon2id       # or Pbkdf2
    memorySizeInKib: 19456    # ~19 MiB
    iterations: 2
    parallelism: 1
```
These values propagate to plug-ins that do not provide explicit overrides. Runtime validation ensures all numbers are > 0 and the algorithm is recognised.
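The validation rule can be sketched as follows. This is an illustrative approximation only: the function name is hypothetical, the recognised algorithm names are taken from the YAML comment above, and checking only the numeric keys that are present is an assumption (the host may instead require all of them).

```python
RECOGNISED_ALGORITHMS = {"Argon2id", "Pbkdf2"}
NUMERIC_KEYS = ("memorySizeInKib", "iterations", "parallelism")


def validate_password_hashing(options):
    """Return validation errors for a passwordHashing block (empty list = valid)."""
    errors = []

    algorithm = options.get("algorithm")
    if algorithm not in RECOGNISED_ALGORITHMS:
        errors.append(f"unrecognised algorithm: {algorithm!r}")

    for key in NUMERIC_KEYS:
        if key not in options:
            continue  # assumption: absent keys fall back to defaults elsewhere
        value = options[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            errors.append(f"{key} must be an integer > 0 (got {value!r})")

    return errors


print(validate_password_hashing({"algorithm": "Argon2id", "memorySizeInKib": 0}))
```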
## 4. Plug-in Overrides
The Standard plug-in inherits the host defaults but can fine-tune parameters per installation:
```yaml
passwordHashing:
  algorithm: Argon2id
  memorySizeInKib: 8192
  iterations: 2
  parallelism: 1
```
- When the plug-in configuration omits `passwordHashing`, the Authority defaults apply.
- Setting `algorithm: Pbkdf2` keeps PBKDF2 active but still upgrades credentials when the host default switches back to Argon2id.
- Invalid overrides (e.g., `memorySizeInKib: 0`) cause startup to fail with a descriptive validation error.
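The inheritance behaviour can be pictured with the sketch below. It assumes a per-key merge, where a plug-in override replaces only the keys it sets; the document does not say whether overrides merge per key or replace the whole block, so treat that, and the function name, as assumptions.

```python
def effective_password_hashing(host_defaults, plugin_manifest):
    """Resolve the options a plug-in would run with.

    Assumption: per-key merge of the plug-in's passwordHashing block over
    the host defaults.
    """
    override = plugin_manifest.get("passwordHashing")
    if override is None:
        # No override in the manifest, so the Authority defaults apply.
        return dict(host_defaults)
    return {**host_defaults, **override}


host = {"algorithm": "Argon2id", "memorySizeInKib": 19456, "iterations": 2, "parallelism": 1}
lab_plugin = {"passwordHashing": {"memorySizeInKib": 8192}}
print(effective_password_hashing(host, lab_plugin))
```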
## 5. Observability & Migration
- Successful PBKDF2 verification logs a **rehash-needed** event and immediately persists an Argon2id hash.
- Metrics emitted: `auth.plugins.standard.password_rehash_total{algorithm="pbkdf2"}` (add dashboards to monitor upgrade progress).
- During migration, expect a gradual decline in PBKDF2 hashes as users authenticate. Use operator scripts to query `authority_users_*` collections for lingering `PBKDF2.` prefixes if you need to track completion.
## 6. Operational Checklist
1. Update Authority configuration with desired defaults; restart the host.
2. Regenerate plug-in manifests (if overrides required) and redeploy.
3. Monitor `password_rehash_total` and login success rates; investigate any spike in failures (likely due to mis-sized limits).
4. Review hardware utilisation; Argon2id increases memory pressure compared to PBKDF2.
5. Archive this document with the change request and notify SOC of the new baseline.
For additional context on tuning trade-offs, consult the OWASP Password Storage Cheat Sheet and the StellaOps Security Guild guidance (to be published in `docs/security/rate-limits.md`).
## 7. Native Argon2 Preview Build Flag
- Set `dotnet build -p:StellaOpsCryptoSodium=true` (or define the MSBuild property in your CI) to enable the `STELLAOPS_CRYPTO_SODIUM` compilation symbol.
- The symbol switches `StellaOps.Cryptography` to use the native-oriented build pipeline so we can wire libsodium/Core bindings without affecting the managed default.
- Until the native implementation lands (SEC1.B follow-up), the flag falls back to the managed Konscious implementation while still validating the alternate compilation path.
- Document any production usage of the flag in your change log so future upgrades can align with the Security Guild rollout plan.

@@ -0,0 +1,56 @@
{
  "$schema": "../../etc/authority/revocation_bundle.schema.json",
  "schemaVersion": "1.0.0",
  "issuer": "https://auth.stella-ops.example",
  "bundleId": "6f9d08bfa0c24a0a9f7f59e6c17d2f8e8bca2ef34215c3d3ba5a9a1f0fbe2d10",
  "issuedAt": "2025-10-12T15:00:00Z",
  "validFrom": "2025-10-12T15:00:00Z",
  "sequence": 42,
  "signingKeyId": "authority-signing-20251012",
  "revocations": [
    {
      "id": "7ad4f3d2c21b461d9b3420e1151be9c4",
      "category": "token",
      "tokenType": "access_token",
      "clientId": "feedser-cli",
      "subjectId": "user:ops-admin",
      "reason": "compromised",
      "reasonDescription": "Access token reported by SOC automation run R-2045.",
      "revokedAt": "2025-10-12T14:32:05Z",
      "scopes": [
        "feedser:export",
        "feedser:jobs"
      ],
      "fingerprint": "AD35E719C12204D7E7C92ED3F6DEBF0A44642D41AAF94233F9A47E183F4C5F18",
      "metadata": {
        "reportId": "R-2045",
        "source": "soc-automation"
      }
    },
    {
      "id": "user:departed-vendor",
      "category": "subject",
      "subjectId": "user:departed-vendor",
      "reason": "lifecycle",
      "revokedAt": "2025-10-10T18:15:00Z",
      "metadata": {
        "ticket": "HR-8821"
      }
    },
    {
      "id": "ci-runner-legacy",
      "category": "client",
      "clientId": "ci-runner-legacy",
      "reason": "rotation",
      "revokedAt": "2025-10-09T11:00:00Z",
      "expiresAt": "2025-11-01T00:00:00Z",
      "metadata": {
        "replacement": "ci-runner-2025"
      }
    }
  ],
  "metadata": {
    "generator": "stellaops-authority@1.4.0",
    "jobId": "revocation-export-20251012T1500Z"
  }
}

@@ -0,0 +1,70 @@
# Authority Revocation Bundle
The Authority service exports revocation information as an offline-friendly JSON document plus a detached JWS signature. Operators can mirror the bundle alongside Feedser exports to ensure air-gapped scanners receive the latest token, subject, and client revocations.
## File layout
| Artefact | Description |
| --- | --- |
| `revocation-bundle.json` | Canonical JSON document describing revoked entities. Validates against [`etc/authority/revocation_bundle.schema.json`](../../etc/authority/revocation_bundle.schema.json). |
| `revocation-bundle.json.jws` | Detached JWS signature covering the exact UTF-8 bytes of `revocation-bundle.json`. |
| `revocation-bundle.json.sha256` | Hex-encoded SHA-256 digest used by mirror automation (optional but recommended). |
All hashes and signatures are generated after applying the deterministic formatting rules below.
## Deterministic formatting rules
- JSON is serialised with UTF-8 encoding, 2-space indentation, and lexicographically sorted object keys.
- Arrays are sorted by deterministic keys:
- Top-level `revocations` sorted by (`category`, `id`, `revokedAt`).
- Nested arrays (`scopes`) sorted ascending, unique enforced.
- Numeric values (`sequence`) are emitted without leading zeros.
- Timestamps use UTC ISO-8601 format with `Z` suffix.
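The formatting rules above can be approximated in a few lines. This sketch is not guaranteed to be byte-compatible with the Authority exporter (whitespace, escaping, and trailing-newline details may differ), and silently de-duplicating `scopes` is an assumption about what "unique enforced" means; the function name `canonicalise` is hypothetical.

```python
import json


def canonicalise(bundle):
    """Serialise a bundle dict following the deterministic formatting rules."""
    doc = dict(bundle)
    entries = []
    for entry in bundle.get("revocations", []):
        entry = dict(entry)
        if "scopes" in entry:
            # Sort ascending and drop duplicates (assumed meaning of "unique enforced").
            entry["scopes"] = sorted(set(entry["scopes"]))
        entries.append(entry)

    # Same-format UTC ISO-8601 strings sort chronologically as plain text.
    doc["revocations"] = sorted(
        entries, key=lambda e: (e["category"], e["id"], e["revokedAt"])
    )

    # 2-space indentation, sorted object keys, UTF-8 bytes.
    text = json.dumps(doc, indent=2, sort_keys=True, ensure_ascii=False)
    return text.encode("utf-8")


bundle = {
    "sequence": 42,
    "revocations": [
        {"id": "b", "category": "token", "revokedAt": "2025-10-12T14:32:05Z",
         "scopes": ["feedser:jobs", "feedser:export", "feedser:jobs"]},
        {"id": "a", "category": "client", "revokedAt": "2025-10-09T11:00:00Z"},
    ],
}
print(canonicalise(bundle).decode("utf-8"))
```

Because the function copies its input and sorts everything, running it twice on the same data yields identical bytes, which is the property the digest and signature depend on.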
Consumers MUST treat the combination of `schemaVersion` and `sequence` as a monotonic feed. Bundles with older `sequence` values are ignored unless `bundleId` differs and `issuedAt` is newer (supporting replay detection).
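One possible reading of this acceptance rule is sketched below. The function name is hypothetical, `schemaVersion` handling is left out for brevity, and treating an equal `sequence` the same way as an older one is an assumption, since the text only covers older values.

```python
from datetime import datetime


def _parse_utc(timestamp):
    """Parse an ISO-8601 UTC timestamp with a trailing "Z"."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def should_accept(candidate, current):
    """Decide whether a consumer should replace `current` with `candidate`."""
    if current is None:
        return True  # nothing ingested yet
    if candidate["sequence"] > current["sequence"]:
        return True  # normal forward progress
    # Older (or, by assumption, equal) sequence: accept only when it is a
    # different bundle that was issued more recently.
    return (
        candidate["bundleId"] != current["bundleId"]
        and _parse_utc(candidate["issuedAt"]) > _parse_utc(current["issuedAt"])
    )


current = {"sequence": 42, "bundleId": "aaa", "issuedAt": "2025-10-12T15:00:00Z"}
stale = {"sequence": 41, "bundleId": "aaa", "issuedAt": "2025-10-11T15:00:00Z"}
print(should_accept(stale, current))
```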
## Revocation entry categories
| Category | Description | Required fields |
| --- | --- | --- |
| `token` | A single OAuth token (access, refresh, device, authorization code). | `tokenType`, `clientId`, `revokedAt`, optional `subjectId` |
| `subject` | All credentials issued to a subject (user/service account). | `subjectId`, `revokedAt` |
| `client` | Entire OAuth client registration is revoked. | `clientId`, `revokedAt` |
| `key` | Signing/encryption key material revoked. | `id`, `revokedAt` |
`reason` is a machine-friendly code (`compromised`, `rotation`, `policy`, `lifecycle`, etc). `reasonDescription` may include a short operator note.
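The required-field table lends itself to a small pre-flight check. The sketch below mirrors the table only; the JSON schema remains the authoritative validator, and the helper name `missing_fields` is hypothetical.

```python
# Required fields per category, copied from the table above.
REQUIRED_FIELDS = {
    "token": ("tokenType", "clientId", "revokedAt"),
    "subject": ("subjectId", "revokedAt"),
    "client": ("clientId", "revokedAt"),
    "key": ("id", "revokedAt"),
}


def missing_fields(entry):
    """Return the required fields that a revocation entry lacks."""
    category = entry.get("category")
    if category not in REQUIRED_FIELDS:
        raise ValueError(f"unknown revocation category: {category!r}")
    return [name for name in REQUIRED_FIELDS[category] if name not in entry]


entry = {"id": "ci-runner-legacy", "category": "client", "revokedAt": "2025-10-09T11:00:00Z"}
print(missing_fields(entry))
```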
## Detached JWS workflow
1. Serialise `revocation-bundle.json` using the deterministic rules.
2. Compute SHA-256 digest; write to `revocation-bundle.json.sha256`.
3. Sign using ES256 (default) with the configured Authority signing key. The JWS header uses:
```json
{
  "alg": "ES256",
  "kid": "{signingKeyId}",
  "typ": "application/vnd.stellaops.revocation-bundle+jws",
  "b64": false,
  "crit": ["b64"]
}
```
4. Persist the detached signature payload to `revocation-bundle.json.jws` (per RFC 7797).
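For readers unfamiliar with RFC 7797, the sketch below shows how the signing input and the detached compact form are assembled when `b64` is `false`. It deliberately leaves out the ES256 signature itself, which needs a cryptography library; the helper names are hypothetical and the compact header encoding is only one valid choice of JSON serialisation.

```python
import base64
import json


def b64url(data):
    """Base64url-encode without padding, as used by JWS."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def protected_segment(header):
    """Encode the protected header as the first JWS segment."""
    return b64url(json.dumps(header, separators=(",", ":")).encode("utf-8"))


def signing_input(protected, payload):
    """With b64=false the payload bytes are signed as-is, not base64url-encoded."""
    return protected.encode("ascii") + b"." + payload


def detached_compact(protected, signature):
    """Detached form: the payload segment between the two dots stays empty."""
    return f"{protected}..{b64url(signature)}"


header = {
    "alg": "ES256",
    "kid": "authority-signing-20251012",
    "typ": "application/vnd.stellaops.revocation-bundle+jws",
    "b64": False,
    "crit": ["b64"],
}
segment = protected_segment(header)
to_sign = signing_input(segment, b'{"sequence": 42}')
print(len(to_sign), detached_compact(segment, b"\x01\x02"))
```

A verifier rebuilds the same signing input from the header segment in the `.jws` file and the exact bytes of `revocation-bundle.json`, which is why the deterministic formatting matters.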
Verification steps:
1. Validate `revocation-bundle.json` against the schema.
2. Re-compute SHA-256 and compare with `.sha256` (if present).
3. Resolve the signing key from JWKS (`/.well-known/jwks.json`) or the offline key bundle.
4. Verify the detached JWS using the stored signing key (example tooling coming with `stella auth revoke verify`).
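Step 2 of the verification can be sketched with the standard library alone. The layout of the `.sha256` file is an assumption here: the code takes the first whitespace-separated token as the hex digest, which also tolerates `sha256sum`-style files that append a filename. The function name is hypothetical.

```python
import hashlib
import hmac


def digest_matches(bundle_bytes, sha256_file_text):
    """Compare the bundle's SHA-256 with the recorded hex digest."""
    tokens = sha256_file_text.split()
    if not tokens:
        return False  # empty digest file
    expected = tokens[0].lower()
    actual = hashlib.sha256(bundle_bytes).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(actual, expected)


bundle_bytes = b'{"sequence": 42}'
recorded = hashlib.sha256(bundle_bytes).hexdigest() + "  revocation-bundle.json\n"
print(digest_matches(bundle_bytes, recorded))
```

The digest only detects accidental corruption or mismatched mirrors; authenticity still depends on the detached JWS check in steps 3 and 4.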
## Example
The repository contains an [example bundle](revocation-bundle-example.json) demonstrating a mixed export of token, subject, and client revocations. Use it as a reference for integration tests and tooling.
## Operations Quick Reference
- `stella auth revoke export` emits a canonical JSON bundle, `.sha256` digest, and detached JWS signature in one command. Use `--output` to write into your mirror staging directory.
- `stella auth revoke verify` validates a bundle using cached JWKS or an offline PEM key and reports digest mismatches before distribution.
- `POST /internal/revocations/export` provides the same payload for orchestrators that already talk to the bootstrap API.
- `POST /internal/signing/rotate` rotates JWKS material without downtime; always export a fresh bundle afterward so downstream mirrors receive signatures from the new `kid`.
- Offline Kit automation should mirror `revocation-bundle.json*` alongside Feedser exports so agents ingest revocations during the same sync pass.