feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules

- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
2025-10-30 00:09:39 +02:00
parent 3154c67978
commit 7b5bdcf4d3
503 changed files with 16136 additions and 54638 deletions


@@ -0,0 +1,159 @@
# Concelier Authority Audit Runbook
_Last updated: 2025-10-22_
This runbook helps operators verify and monitor the StellaOps Concelier ⇆ Authority integration. It focuses on the `/jobs*` surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity.
## 1. Prerequisites
- Authority integration is enabled in `concelier.yaml` (or via `CONCELIER_AUTHORITY__*` environment variables) with a valid `clientId`, secret, audience, and required scopes.
- OTLP metrics/log exporters are configured (`concelier.telemetry.*`) or container stdout is shipped to your SIEM.
- Operators have access to the Concelier job trigger endpoints via CLI or REST for smoke tests.
- The rollout table in `docs/10_CONCELIER_CLI_QUICKSTART.md` has been reviewed so stakeholders align on the staged → enforced toggle timeline.
### Configuration snippet
```yaml
concelier:
authority:
enabled: true
allowAnonymousFallback: false # keep true only during initial rollout
issuer: "https://authority.internal"
audiences:
- "api://concelier"
requiredScopes:
- "concelier.jobs.trigger"
- "advisory:read"
- "advisory:ingest"
requiredTenants:
- "tenant-default"
bypassNetworks:
- "127.0.0.1/32"
- "::1/128"
clientId: "concelier-jobs"
clientSecretFile: "/run/secrets/concelier_authority_client"
tokenClockSkewSeconds: 60
resilience:
enableRetries: true
retryDelays:
- "00:00:01"
- "00:00:02"
- "00:00:05"
allowOfflineCacheFallback: true
offlineCacheTolerance: "00:10:00"
```
> Store secrets outside source control. Concelier reads `clientSecretFile` on startup; rotate by updating the mounted file and restarting the service.
### Resilience tuning
- **Connected sites:** keep the default 1s / 2s / 5s retry ladder so Concelier retries transient Authority hiccups but still surfaces outages quickly. Leave `allowOfflineCacheFallback=true` so cached discovery/JWKS data can bridge short Authority restarts.
- **Air-gapped/Offline Kit installs:** extend `offlineCacheTolerance` (15-30 minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (`enableRetries=false`) if infrastructure teams prefer to handle exponential backoff at the network layer; Concelier will fail fast but keep deterministic logs.
- Concelier resolves these knobs through `IOptionsMonitor<StellaOpsAuthClientOptions>`. Edits to `concelier.yaml` are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled.
## 2. Key Signals
### 2.1 Audit log channel
Concelier emits structured audit entries via the `Concelier.Authorization.Audit` logger for every `/jobs*` request once Authority enforcement is active.
```
Concelier authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=concelier-cli scopes=concelier.jobs.trigger advisory:ingest bypass=False remote=10.1.4.7
```
| Field | Sample value | Meaning |
|--------------|-------------------------|------------------------------------------------------------------------------------------|
| `route` | `/jobs/definitions` | Endpoint that processed the request. |
| `status` | `200` / `401` / `409` | Final HTTP status code returned to the caller. |
| `subject` | `ops@example.com` | User or service principal subject (falls back to `(anonymous)` when unauthenticated). |
| `clientId` | `concelier-cli` | OAuth client ID provided by Authority (`(none)` if the token lacked the claim). |
| `scopes` | `concelier.jobs.trigger advisory:ingest advisory:read` | Normalised scope list extracted from token claims; `(none)` if the token carried none. |
| `tenant` | `tenant-default` | Tenant claim extracted from the Authority token (`(none)` when the token lacked it). |
| `bypass` | `True` / `False` | Indicates whether the request succeeded because its source IP matched a bypass CIDR. |
| `remote` | `10.1.4.7` | Remote IP recorded from the connection / forwarded header test hooks. |
Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations:
- `status=401 AND bypass=True` - bypass network accepted an unauthenticated call (should be temporary during rollout).
- `status=202 AND scopes="(none)"` - a token without scopes triggered a job; tighten client configuration.
- `status=202 AND NOT contains(scopes,"advisory:ingest")` - ingestion attempted without the new AOC scopes; confirm the Authority client registration matches the sample above.
- `tenant!=(tenant-default)` indicates a cross-tenant token was accepted. Ensure Concelier `requiredTenants` is aligned with Authority client registration.
- Spike in `clientId="(none)"` indicates upstream Authority is not issuing `client_id` claims or the CLI is outdated.
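If Loki backs the audit stream, the combinations above can be pulled directly with `logcli`. This is a sketch only: the `{app="concelier"}` selector and label names are assumptions that depend on how your log shipper labels the service.
```bash
# Hypothetical stream selector; adjust labels to match your deployment.
logcli query --since=1h \
  '{app="concelier"} |= "Concelier authorization audit" |= "status=401" |= "bypass=True"'
```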
### 2.2 Metrics
Concelier publishes counters under the OTEL meter `StellaOps.Concelier.WebService.Jobs`. Tags: `job.kind`, `job.trigger`, `job.outcome`.
| Metric name | Description | PromQL example |
|-------------------------------|----------------------------------------------------|----------------|
| `web.jobs.triggered` | Accepted job trigger requests. | `sum by (job_kind) (rate(web_jobs_triggered_total[5m]))` |
| `web.jobs.trigger.conflict` | Rejected triggers (already running, disabled…). | `sum(rate(web_jobs_trigger_conflict_total[5m]))` |
| `web.jobs.trigger.failed` | Server-side job failures. | `sum(rate(web_jobs_trigger_failed_total[5m]))` |
> Prometheus/OTEL collectors typically surface counters with a `_total` suffix. Adjust queries to match your pipeline's generated metric names.
Correlate audit logs with the following global meter exported via `Concelier.SourceDiagnostics`:
- `concelier.source.http.requests_total{concelier_source="jobs-run"}` - ensures REST/manual triggers route through Authority.
- If Grafana dashboards are deployed, extend the “Concelier Jobs” board with the above counters plus a table of recent audit log entries.
## 3. Alerting Guidance
1. **Unauthorized bypass attempt**
- Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", status="401", bypass="True"}[5m])) > 0`
- Action: verify `bypassNetworks` list; confirm expected maintenance windows; rotate credentials if suspicious.
2. **Missing scopes**
- Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0`
- Action: audit Authority client registration; ensure `requiredScopes` includes `concelier.jobs.trigger`, `advisory:ingest`, and `advisory:read`.
3. **Trigger failure surge**
- Query: `sum(rate(web_jobs_trigger_failed_total[10m])) > 0` with severity `warning` if sustained for 10 minutes.
- Action: inspect correlated audit entries and `Concelier.Telemetry` traces for job execution errors.
4. **Conflict spike**
- Query: `sum(rate(web_jobs_trigger_conflict_total[10m])) > 5` (tune threshold).
- Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly.
5. **Authority offline**
- Watch `Concelier.Authorization.Audit` logs for `status=503` or `status=500` along with `clientId="(none)"`. Investigate Authority availability before re-enabling anonymous fallback.
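Any of the queries above can be codified as Prometheus alert rules. The sketch below writes a rule file for the trigger-failure case and validates it with `promtool`; group and alert names are suggestions, and the threshold should be tuned to your baseline.
```bash
cat > concelier-jobs-alerts.yml <<'EOF'
groups:
  - name: concelier-jobs
    rules:
      - alert: ConcelierJobTriggerFailures
        expr: sum(rate(web_jobs_trigger_failed_total[10m])) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Concelier job triggers are failing
EOF
promtool check rules concelier-jobs-alerts.yml
```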
## 4. Rollout & Verification Procedure
1. **Pre-checks**
- Align with the rollout phases documented in `docs/10_CONCELIER_CLI_QUICKSTART.md` (validation → rehearsal → enforced) and record the target dates in your change request.
- Confirm `allowAnonymousFallback` is `false` in production; keep `true` only during staged validation.
- Validate Authority issuer metadata is reachable from Concelier (`curl https://authority.internal/.well-known/openid-configuration` from the host).
2. **Smoke test with valid token**
- Obtain a token via CLI: `stella auth login --scope "concelier.jobs.trigger advisory:ingest" --scope advisory:read`.
- Trigger a read-only endpoint: `curl -H "Authorization: Bearer $TOKEN" https://concelier.internal/jobs/definitions`.
- Expect HTTP 200/202 and an audit log with `bypass=False`, `scopes=concelier.jobs.trigger advisory:ingest advisory:read`, and `tenant=tenant-default`.
3. **Negative test without token**
- Call the same endpoint without a token. Expect HTTP 401, `bypass=False`.
- If the request succeeds, double-check `bypassNetworks` and ensure fallback is disabled.
4. **Bypass check (if applicable)**
- From an allowed maintenance IP, call `/jobs/definitions` without a token. Confirm the audit log shows `bypass=True`. Review business justification and expiry date for such entries.
5. **Metrics validation**
- Ensure `web.jobs.triggered` counter increments during accepted runs.
- Exporters should show corresponding spans (`concelier.job.trigger`) if tracing is enabled.
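The checks above can be bundled into a repeatable script. The sketch assumes `$TOKEN` already holds a freshly issued Authority token and that `concelier.internal` is the Concelier endpoint used in the examples:
```bash
#!/usr/bin/env bash
set -euo pipefail
BASE="https://concelier.internal"

# Authenticated call - expect 200/202 and an audit entry with bypass=False.
auth_status=$(curl -s -o /dev/null -w '%{http_code}' \
  -H "Authorization: Bearer ${TOKEN}" "${BASE}/jobs/definitions")
echo "with token:    ${auth_status}"

# Anonymous call - expect 401 once anonymous fallback is disabled.
anon_status=$(curl -s -o /dev/null -w '%{http_code}' "${BASE}/jobs/definitions")
echo "without token: ${anon_status}"

[[ "${auth_status}" =~ ^20[02]$ && "${anon_status}" == "401" ]] || {
  echo "smoke test failed" >&2
  exit 1
}
```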
## 5. Troubleshooting
| Symptom | Probable cause | Remediation |
|---------|----------------|-------------|
| Audit log shows `clientId=(none)` for all requests | Authority not issuing `client_id` claim or CLI outdated | Update StellaOps Authority configuration (`StellaOpsAuthorityOptions.Token.Claims.ClientId`), or upgrade the CLI token acquisition flow. |
| Requests succeed with `bypass=True` unexpectedly | Local network added to `bypassNetworks` or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Concelier. |
| HTTP 401 with valid token | `requiredScopes` missing from client registration or token audience mismatch | Verify Authority client scopes (`concelier.jobs.trigger`) and ensure the token audience matches `audiences` config. |
| Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set `concelier.telemetry.enableMetrics=true`, ensure collector includes `StellaOps.Concelier.WebService.Jobs` meter. |
| Sudden spike in `web.jobs.trigger.failed` | Downstream job failure or Authority timeout mid-request | Inspect Concelier job logs, re-run with tracing enabled, validate Authority latency. |
## 6. References
- `docs/21_INSTALL_GUIDE.md` - Authority configuration quick start.
- `docs/17_SECURITY_HARDENING_GUIDE.md` - Security guardrails and enforcement deadlines.
- `docs/modules/authority/operations/monitoring.md` - Authority-side monitoring and alerting playbook.
- `StellaOps.Concelier.WebService/Filters/JobAuthorizationAuditFilter.cs` - source of audit log fields.


@@ -0,0 +1,160 @@
# Concelier Conflict Resolution Runbook (Sprint 3)
This runbook equips Concelier operators to detect, triage, and resolve advisory conflicts now that the Sprint 3 merge engine landed (`AdvisoryPrecedenceMerger`, merge-event hashing, and telemetry counters). It builds on the canonical rules defined in `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md` and the metrics/logging instrumentation delivered this sprint.
---
## 1. Precedence Model (recap)
- **Default ranking:** `GHSA -> NVD -> OSV`, with distro/vendor PSIRTs outranking ecosystem feeds (`AdvisoryPrecedenceDefaults`). Use `concelier:merge:precedence:ranks` to override per source when incident response requires it.
- **Freshness override:** if a lower-ranked source is >= 48 hours newer for a freshness-sensitive field (title, summary, affected ranges, references, credits), it wins. Every override stamps `provenance[].decisionReason = freshness`.
- **Tie-breakers:** when precedence and freshness tie, the engine falls back to (1) primary source order, (2) shortest normalized text, (3) lowest stable hash. Merge-generated provenance records set `decisionReason = tie-breaker`.
- **Audit trail:** each merged advisory receives a `merge` provenance entry listing the participating sources plus a `merge_event` record with canonical before/after SHA-256 hashes.
---
## 2. Telemetry Shipped This Sprint
| Instrument | Type | Key Tags | Purpose |
|------------|------|----------|---------|
| `concelier.merge.operations` | Counter | `inputs` | Total precedence merges executed. |
| `concelier.merge.overrides` | Counter | `primary_source`, `suppressed_source`, `primary_rank`, `suppressed_rank` | Field-level overrides chosen by precedence. |
| `concelier.merge.range_overrides` | Counter | `advisory_key`, `package_type`, `primary_source`, `suppressed_source`, `primary_range_count`, `suppressed_range_count` | Package range overrides emitted by `AffectedPackagePrecedenceResolver`. |
| `concelier.merge.conflicts` | Counter | `type` (`severity`, `precedence_tie`), `reason` (`mismatch`, `primary_missing`, `equal_rank`) | Conflicts requiring operator review. |
| `concelier.merge.identity_conflicts` | Counter | `scheme`, `alias_value`, `advisory_count` | Alias collisions surfaced by the identity graph. |
### Structured logs
- `AdvisoryOverride` (EventId 1000) - logs merge suppressions with alias/provenance counts.
- `PackageRangeOverride` (EventId 1001) - logs package-level precedence decisions.
- `PrecedenceConflict` (EventId 1002) - logs mismatched severity or equal-rank scenarios.
- `Alias collision ...` (no EventId) - emitted when `concelier.merge.identity_conflicts` increments.
Expect all logs at `Information`. Ensure OTEL exporters include the scope `StellaOps.Concelier.Merge`.
---
## 3. Detection & Alerting
1. **Dashboard panels**
- `concelier.merge.conflicts` - table grouped by `type/reason`. Alert when > 0 in a 15 minute window.
- `concelier.merge.range_overrides` - stacked bar by `package_type`. Spikes highlight vendor PSIRT overrides over registry data.
- `concelier.merge.overrides` with `primary_source|suppressed_source` - catches unexpected precedence flips (e.g., OSV overtaking GHSA).
- `concelier.merge.identity_conflicts` - single-stat; alert when alias collisions occur more than once per day.
2. **Log based alerts**
- `eventId=1002` with `reason="equal_rank"` - indicates precedence table gaps; page merge owners.
- `eventId=1002` with `reason="mismatch"` - severity disagreement; open connector bug if sustained.
3. **Job health**
- `stellaops-cli db merge` exit code `1` signifies unresolved conflicts. Pipe to automation that captures logs and notifies #concelier-ops.
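A minimal automation hook for that exit code might look like the following; the Slack webhook URL is an assumption and the log path is arbitrary:
```bash
#!/usr/bin/env bash
set -uo pipefail
log_file="concelier-merge-$(date -u +%Y%m%dT%H%M%SZ).log"

# pipefail ensures the merge exit code survives the tee.
if ! stellaops-cli db merge --verbose 2>&1 | tee "${log_file}"; then
  curl -s -X POST -H 'Content-Type: application/json' \
    --data "{\"text\":\"Concelier merge reported unresolved conflicts; see ${log_file}\"}" \
    "${SLACK_WEBHOOK}" > /dev/null
  exit 1
fi
```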
### Threshold updates (2025-10-12)
- `concelier.merge.conflicts` - Page only when ≥ 2 events fire within 30 minutes; the synthetic conflict fixture run produces 0 conflicts, so the first event now routes to Slack for manual review instead of paging.
- `concelier.merge.overrides` - Raise a warning when the 30-minute sum exceeds 10 (the canonical triple yields exactly 1 summary override with `primary_source=osv`, `suppressed_source=ghsa`).
- `concelier.merge.range_overrides` - Maintain the 15-minute alert at ≥ 3, but annotate dashboards that the regression triple emits a single `package_type=semver` override so ops can spot unexpected spikes.
---
## 4. Triage Workflow
1. **Confirm job context**
- `stellaops-cli db merge` (CLI) or `POST /jobs/merge:reconcile` (API) to rehydrate the merge job. Use `--verbose` to stream structured logs during triage.
2. **Inspect metrics**
- Correlate spikes in `concelier.merge.conflicts` with `primary_source`/`suppressed_source` tags from `concelier.merge.overrides`.
3. **Pull structured logs**
- Example (vector output):
```
jq 'select(.EventId.Name=="PrecedenceConflict") | {advisory: .State[0].Value, type: .ConflictType, reason: .Reason, primary: .PrimarySources, suppressed: .SuppressedSources}' stellaops-concelier.log
```
4. **Review merge events**
- `mongosh`:
```javascript
use concelier;
db.merge_event.find({ advisoryKey: "CVE-2025-1234" }).sort({ mergedAt: -1 }).limit(5);
```
- Compare `beforeHash` vs `afterHash` to confirm the merge actually changed canonical output.
5. **Interrogate provenance**
- `db.advisories.findOne({ advisoryKey: "CVE-2025-1234" }, { title: 1, severity: 1, provenance: 1, "affectedPackages.provenance": 1 })`
- Check `provenance[].decisionReason` values (`precedence`, `freshness`, `tie-breaker`) to understand why the winning field was chosen.
---
## 5. Conflict Classification Matrix
| Signal | Likely Cause | Immediate Action |
|--------|--------------|------------------|
| `reason="mismatch"` with `type="severity"` | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. |
| `reason="primary_missing"` | Higher-ranked source lacks the field entirely. | Backfill connector data or temporarily allow lower-ranked source via precedence override. |
| `reason="equal_rank"` | Two feeds share the same precedence rank (custom config or missing entry). | Update `concelier:merge:precedence:ranks` to break the tie; restart merge job. |
| Rising `concelier.merge.range_overrides` for a package type | Vendor PSIRT now supplies richer ranges. | Validate connectors emit `decisionReason="precedence"` and update dashboards to treat registry ranges as fallback. |
| `concelier.merge.identity_conflicts` > 0 | Alias scheme mapping produced collisions (duplicate CVE <-> advisory pairs). | Inspect `Alias collision` log payload; reconcile the alias graph by adjusting connector alias output. |
---
## 6. Resolution Playbook
1. **Connector data fix**
- Re-run the offending connector stages (`stellaops-cli db fetch --source ghsa --stage map` etc.).
- Once fixed, rerun merge and verify `decisionReason` reflects `freshness` or `precedence` as expected.
2. **Temporary precedence override**
- Edit `etc/concelier.yaml`:
```yaml
concelier:
merge:
precedence:
ranks:
osv: 1
ghsa: 0
```
- Restart Concelier workers; confirm tags in `concelier.merge.overrides` show the new ranks.
- Document the override with expiry in the change log.
3. **Alias remediation**
- Update connector mapping rules to weed out duplicate aliases (e.g., skip GHSA aliases that mirror CVE IDs).
- Flush cached alias graphs if necessary (`db.alias_graph.drop()` is destructive; coordinate with Storage before issuing).
4. **Escalation**
- If override metrics spike due to upstream regression, open an incident with Security Guild, referencing merge logs and `merge_event` IDs.
---
## 7. Validation Checklist
- [ ] Merge job rerun returns exit code `0`.
- [ ] `concelier.merge.conflicts` baseline returns to zero after corrective action.
- [ ] Latest `merge_event` entry shows expected hash delta.
- [ ] Affected advisory document shows updated `provenance[].decisionReason`.
- [ ] Ops change log updated with incident summary, config overrides, and rollback plan.
---
## 8. Reference Material
- Canonical conflict rules: `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md`.
- Merge engine internals: `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/AdvisoryPrecedenceMerger.cs`.
- Metrics definitions: `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/AdvisoryMergeService.cs` (identity conflicts) and `AdvisoryPrecedenceMerger`.
- Storage audit trail: `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/MergeEventWriter.cs`, `src/Concelier/__Libraries/StellaOps.Concelier.Storage.Mongo/MergeEvents`.
Keep this runbook synchronized with future sprint notes and update alert thresholds as baseline volumes change.
---
## 9. Synthetic Regression Fixtures
- **Locations** - Canonical conflict snapshots now live at `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ghsa.Tests/Fixtures/conflict-ghsa.canonical.json`, `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Nvd.Tests/Nvd/Fixtures/conflict-nvd.canonical.json`, and `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Osv.Tests/Fixtures/conflict-osv.canonical.json`.
- **Validation commands** - To regenerate and verify the fixtures offline, run:
```bash
dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ghsa.Tests/StellaOps.Concelier.Connector.Ghsa.Tests.csproj --filter GhsaConflictFixtureTests
dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Nvd.Tests/StellaOps.Concelier.Connector.Nvd.Tests.csproj --filter NvdConflictFixtureTests
dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj --filter OsvConflictFixtureTests
dotnet test src/Concelier/__Tests/StellaOps.Concelier.Merge.Tests/StellaOps.Concelier.Merge.Tests.csproj --filter MergeAsync_AppliesCanonicalRulesAndPersistsDecisions
```
- **Expected signals** - The triple produces one freshness-driven summary override (`primary_source=osv`, `suppressed_source=ghsa`) and one range override for the npm SemVer package while leaving `concelier.merge.conflicts` at zero. Use these values as the baseline when tuning dashboards or load-testing alert pipelines.
---
## 10. Change Log
| Date (UTC) | Change | Notes |
|------------|--------|-------|
| 2025-10-16 | Ops review signed off after connector expansion (CCCS, CERT-Bund, KISA, ICS CISA, MSRC) landed. Alert thresholds from §3 reaffirmed; dashboards updated to watch attachment signals emitted by ICS CISA connector. | Ops sign-off recorded by Concelier Ops Guild; no additional overrides required. |


@@ -0,0 +1,77 @@
# Concelier Apple Security Update Connector Operations
This runbook covers staging and production rollout for the Apple security updates connector (`source:vndr-apple:*`), including observability checks and fixture maintenance.
## 1. Prerequisites
- Network egress (or mirrored cache) for `https://gdmf.apple.com/v2/pmv` and the Apple Support domain (`https://support.apple.com/`).
- Optional: corporate proxy exclusions for the Apple hosts if outbound traffic is normally filtered.
- Updated configuration (environment variables or `concelier.yaml`) with an `apple` section. Example baseline:
```yaml
concelier:
sources:
apple:
softwareLookupUri: "https://gdmf.apple.com/v2/pmv"
advisoryBaseUri: "https://support.apple.com/"
localeSegment: "en-us"
maxAdvisoriesPerFetch: 25
initialBackfill: "120.00:00:00"
modifiedTolerance: "02:00:00"
failureBackoff: "00:05:00"
```
> `softwareLookupUri` and `advisoryBaseUri` must stay absolute and aligned with the HTTP allow-list; Concelier automatically adds both hosts to the connector HttpClient.
## 2. Staging Smoke Test
1. Deploy the configuration and restart the Concelier workers to ensure the Apple connector options are bound.
2. Trigger a full connector cycle:
- CLI: `stella db jobs run source:vndr-apple:fetch --and-then source:vndr-apple:parse --and-then source:vndr-apple:map`
- REST: `POST /jobs/run { "kind": "source:vndr-apple:fetch", "chain": ["source:vndr-apple:parse", "source:vndr-apple:map"] }`
3. Validate metrics exported under meter `StellaOps.Concelier.Connector.Vndr.Apple`:
- `apple.fetch.items` (documents fetched)
- `apple.fetch.failures`
- `apple.fetch.unchanged`
- `apple.parse.failures`
- `apple.map.affected.count` (histogram of affected package counts)
4. Cross-check the shared HTTP counters:
- `concelier.source.http.requests_total{concelier_source="vndr-apple"}` should increase for both index and detail phases.
- `concelier.source.http.failures_total{concelier_source="vndr-apple"}` should remain flat (0) during a healthy run.
5. Inspect the info logs:
- `Apple software index fetch … processed=X newDocuments=Y`
- `Apple advisory parse complete … aliases=… affected=…`
- `Mapped Apple advisory … pendingMappings=0`
6. Confirm MongoDB state:
- `raw_documents` store contains the HT article HTML with metadata (`apple.articleId`, `apple.postingDate`).
- `dtos` store has `schemaVersion="apple.security.update.v1"`.
- `advisories` collection includes keys `HTxxxxxx` with normalized SemVer rules.
- `source_states` entry for `apple` shows a recent `cursor.lastPosted`.
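Those Mongo checks can be scripted with `mongosh`; the database name (`concelier`) follows the examples elsewhere in these runbooks, and the `source` field name is an assumption to verify against your deployment:
```bash
mongosh concelier --quiet --eval '
  // Parsed Apple DTOs should exist with the expected schema version.
  print("apple DTOs: " + db.dtos.countDocuments({ schemaVersion: "apple.security.update.v1" }));
  // The connector cursor should show a recent lastPosted value.
  printjson(db.source_states.findOne({ source: "apple" }, { cursor: 1 }));
'
```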
## 3. Production Monitoring
- **Dashboards** - Add the following expressions to your Concelier Grafana board (OTLP/Prometheus naming assumed):
- `rate(apple_fetch_items_total[15m])` vs `rate(concelier_source_http_requests_total{concelier_source="vndr-apple"}[15m])`
- `rate(apple_fetch_failures_total[5m])` for error spikes (`severity=warning` at `>0`)
- `histogram_quantile(0.95, rate(apple_map_affected_count_bucket[1h]))` to watch affected-package fan-out
- `increase(apple_parse_failures_total[6h])` to catch parser drift (alerts at `>0`)
- **Alerts** - Page if `rate(apple_fetch_items_total[2h]) == 0` during business hours while other connectors are active. This often indicates lookup feed failures or misconfigured allow-lists.
- **Logs** - Surface warnings `Apple document {DocumentId} missing GridFS payload` or `Apple parse failed`; repeated hits imply storage issues or HTML regressions.
- **Telemetry pipeline** - `StellaOps.Concelier.WebService` now exports `StellaOps.Concelier.Connector.Vndr.Apple` alongside existing Concelier meters; ensure your OTEL collector or Prometheus scraper includes it.
## 4. Fixture Maintenance
Regression fixtures live under `src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests/Apple/Fixtures`. Refresh them whenever Apple reshapes the HT layout or when new platforms appear.
1. Run the helper script matching your platform:
- Bash: `./scripts/update-apple-fixtures.sh`
- PowerShell: `./scripts/update-apple-fixtures.ps1`
2. Each script exports `UPDATE_APPLE_FIXTURES=1`, updates the `WSLENV` passthrough, and touches `.update-apple-fixtures` so WSL+VS Code test runs observe the flag. The subsequent test execution fetches the live HT articles listed in `AppleFixtureManager`, sanitises the HTML, and rewrites the `.expected.json` DTO snapshots.
3. Review the diff for localisation or nav noise. Once satisfied, re-run the tests without the env var (`dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests.csproj`) to verify determinism.
4. Commit fixture updates together with any parser/mapping changes that motivated them.
## 5. Known Issues & Follow-up Tasks
- Apple occasionally throttles anonymous requests after bursts. The connector backs off automatically, but persistent `apple.fetch.failures` spikes might require mirroring the HT content or scheduling wider fetch windows.
- Rapid Security Responses may appear before the general patch notes surface in the lookup JSON. When that happens, the fetch run will log `detailFailures>0`. Collect sample HTML and refresh fixtures to confirm parser coverage.
- Multi-locale content is still under regression sweep (`src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Vndr.Apple/TASKS.md`). Capture non-`en-us` snapshots once the fixture tooling stabilises.


@@ -0,0 +1,72 @@
# Concelier CCCS Connector Operations
This runbook covers day-to-day operation of the Canadian Centre for Cyber Security (`source:cccs:*`) connector, including configuration, telemetry, and historical backfill guidance for English/French advisories.
## 1. Configuration Checklist
- Network egress (or mirrored cache) for `https://www.cyber.gc.ca/` and the JSON API endpoints under `/api/cccs/`.
- Set the Concelier options before restarting workers. Example `concelier.yaml` snippet:
```yaml
concelier:
sources:
cccs:
feeds:
- language: "en"
uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=en&content_type=cccs_threat"
- language: "fr"
uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=fr&content_type=cccs_threat"
maxEntriesPerFetch: 80 # increase temporarily for backfill runs
maxKnownEntries: 512
requestTimeout: "00:00:30"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
```
> The `/api/cccs/threats/v1/get` endpoint returns thousands of records per language (≈5,100 rows each as of 2025-10-14). The connector honours `maxEntriesPerFetch`, so leave it low for steady state and raise it for planned backfills.
## 2. Telemetry & Logging
- **Metrics (Meter `StellaOps.Concelier.Connector.Cccs`):**
- `cccs.fetch.attempts`, `cccs.fetch.success`, `cccs.fetch.failures`
- `cccs.fetch.documents`, `cccs.fetch.unchanged`
- `cccs.parse.success`, `cccs.parse.failures`, `cccs.parse.quarantine`
- `cccs.map.success`, `cccs.map.failures`
- **Shared HTTP metrics** via `SourceDiagnostics`:
- `concelier.source.http.requests{concelier.source="cccs"}`
- `concelier.source.http.failures{concelier.source="cccs"}`
- `concelier.source.http.duration{concelier.source="cccs"}`
- **Structured logs**
- `CCCS fetch completed feeds=… items=… newDocuments=… pendingDocuments=…`
- `CCCS parse completed parsed=… failures=…`
- `CCCS map completed mapped=… failures=…`
- Warnings fire when GridFS payloads/DTOs go missing or parser sanitisation fails.
Suggested Grafana alerts:
- `increase(cccs.fetch.failures_total[15m]) > 0`
- `rate(cccs.map.success_total[1h]) == 0` while other connectors are active
- `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cccs"}[1h])) > 5s`
## 3. Historical Backfill Plan
1. **Snapshot the source** - the API accepts `page=<n>` and `lang=<en|fr>` query parameters. `page=0` returns the full dataset (observed earliest `date_created`: 2018-06-08 for EN, 2018-06-08 for FR). Mirror those responses into Offline Kit storage when operating air-gapped.
2. **Stage ingestion**:
- Temporarily raise `maxEntriesPerFetch` (e.g. 500) and restart Concelier workers.
- Run chained jobs until `pendingDocuments` drains:
`stella db jobs run source:cccs:fetch --and-then source:cccs:parse --and-then source:cccs:map`
- Monitor `cccs.fetch.unchanged` growth; once it approaches dataset size the backfill is complete.
3. **Optional pagination sweep** - for incremental mirrors, iterate `page=<n>` (0…N) while `response.Count == 50`, persisting JSON to disk. Store alongside metadata (`language`, `page`, SHA256) so repeated runs detect drift; see the sketch after this list.
4. **Language split** - keep EN/FR payloads separate to preserve canonical language fields. The connector emits `Language` directly from the feed entry, so mixed ingestion simply produces parallel advisories keyed by the same serial number.
5. **Throttle planning** - schedule backfills during maintenance windows; the API tolerates burst downloads, but respect the 250 ms request delay or raise it when mirrored content is not available.
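A pagination sweep along those lines might look like the sketch below; the `Count` field name mirrors the wording above and should be verified against a live payload before relying on it:
```bash
#!/usr/bin/env bash
set -euo pipefail
lang="en"; page=0
mkdir -p "cccs-mirror/${lang}"

while :; do
  out="cccs-mirror/${lang}/page-${page}.json"
  curl -s -o "${out}" \
    "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=${lang}&content_type=cccs_threat&page=${page}"
  sha256sum "${out}" >> "cccs-mirror/${lang}/SHA256SUMS"
  count=$(jq -r '.Count? // 0' "${out}")
  echo "lang=${lang} page=${page} count=${count}"
  [[ "${count}" == "50" ]] || break   # stop once a page no longer reports 50 rows
  page=$((page + 1))
  sleep 0.25                          # keep the 250 ms request delay
done
```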
## 4. Selector & Sanitiser Notes
- `CccsHtmlParser` now parses the **unsanitised DOM** (via AngleSharp) and only sanitises when persisting `ContentHtml`.
- Product extraction walks headings (`Affected Products`, `Produits touchés`, `Mesures recommandées`) and consumes nested lists within `div/section/article` containers.
- `HtmlContentSanitizer` allows `<h1>…<h6>` and `<section>` so stored HTML keeps headings for UI rendering and downstream summarisation.
## 5. Fixture Maintenance
- Regression fixtures live in `src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/Fixtures`.
- Refresh via `UPDATE_CCCS_FIXTURES=1 dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/StellaOps.Concelier.Connector.Cccs.Tests.csproj`.
- Fixtures capture both EN/FR advisories with nested lists to guard against sanitiser regressions; review diffs for heading/list changes before committing.


@@ -0,0 +1,146 @@
# Concelier CERT-Bund Connector Operations
_Last updated: 2025-10-17_
Germany's Federal Office for Information Security (BSI) operates the Warn- und Informationsdienst (WID) portal. The Concelier CERT-Bund connector (`source:cert-bund:*`) ingests the public RSS feed, hydrates the portal's JSON detail endpoint, and maps the result into canonical advisories while preserving the original German content.
---
## 1. Configuration Checklist
- Allow outbound access (or stage mirrors) for:
- `https://wid.cert-bund.de/content/public/securityAdvisory/rss`
- `https://wid.cert-bund.de/portal/` (session/bootstrap)
- `https://wid.cert-bund.de/portal/api/securityadvisory` (detail/search/export JSON)
- Ensure the HTTP client reuses a cookie container (the connector's dependency-injection wiring already sets this up).
Example `concelier.yaml` fragment:
```yaml
concelier:
sources:
cert-bund:
feedUri: "https://wid.cert-bund.de/content/public/securityAdvisory/rss"
portalBootstrapUri: "https://wid.cert-bund.de/portal/"
detailApiUri: "https://wid.cert-bund.de/portal/api/securityadvisory"
maxAdvisoriesPerFetch: 50
maxKnownAdvisories: 512
requestTimeout: "00:00:30"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
```
> Leave `maxAdvisoriesPerFetch` at 50 during normal operation. Raise it only for controlled backfills, then restore the default to avoid overwhelming the portal.
---
## 2. Telemetry & Logging
- **Meter**: `StellaOps.Concelier.Connector.CertBund`
- **Counters / histograms**:
- `certbund.feed.fetch.attempts|success|failures`
- `certbund.feed.items.count`
- `certbund.feed.enqueued.count`
- `certbund.feed.coverage.days`
- `certbund.detail.fetch.attempts|success|not_modified|failures{reason}`
- `certbund.parse.success|failures{reason}`
- `certbund.parse.products.count`, `certbund.parse.cve.count`
- `certbund.map.success|failures{reason}`
- `certbund.map.affected.count`, `certbund.map.aliases.count`
- Shared HTTP metrics remain available through `concelier.source.http.*`.
**Structured logs** (all emitted at information level when work occurs):
- `CERT-Bund fetch cycle: … truncated {Truncated}, coverageDays={CoverageDays}`
- `CERT-Bund parse cycle: parsed {Parsed}, failures {Failures}, …`
- `CERT-Bund map cycle: mapped {Mapped}, failures {Failures}, …`
Alerting ideas:
1. `increase(certbund.detail.fetch.failures_total[10m]) > 0`
2. `rate(certbund.map.success_total[30m]) == 0`
3. `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cert-bund"}[15m])) > 5s`
The WebService now registers the meter so metrics surface automatically once OpenTelemetry metrics are enabled.
---
## 3. Historical Backfill & Export Strategy
### 3.1 Retention snapshot
- RSS window: ~250 advisories (≈90 days at current cadence).
- Older advisories are accessible through the JSON search/export APIs once the anti-CSRF token is supplied.
### 3.2 JSON search pagination
```bash
# 1. Bootstrap cookies (client_config + XSRF-TOKEN)
curl -s -c cookies.txt "https://wid.cert-bund.de/portal/" > /dev/null
curl -s -b cookies.txt -c cookies.txt \
-H "X-Requested-With: XMLHttpRequest" \
"https://wid.cert-bund.de/portal/api/security/csrf" > /dev/null
XSRF=$(awk '/XSRF-TOKEN/ {print $7}' cookies.txt)
# 2. Page search results
curl -s -b cookies.txt \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "X-XSRF-TOKEN: ${XSRF}" \
-X POST \
--data '{"page":4,"size":100,"sort":["published,desc"]}' \
"https://wid.cert-bund.de/portal/api/securityadvisory/search" \
> certbund-page4.json
```
Iterate `page` until the response `content` array is empty. Pages 0-9 currently cover 2014→present. Persist JSON responses (plus SHA256) for Offline Kit parity.
> **Shortcut** - run `python src/Tools/certbund_offline_snapshot.py --output seed-data/cert-bund`
> to bootstrap the session, capture the paginated search responses, and regenerate
> the manifest/checksum files automatically. Supply `--cookie-file` and `--xsrf-token`
> if the portal requires a browser-derived session (see options via `--help`).
### 3.3 Export bundles
```bash
python src/Tools/certbund_offline_snapshot.py \
--output seed-data/cert-bund \
--start-year 2014 \
--end-year "$(date -u +%Y)"
```
The helper stores yearly exports under `seed-data/cert-bund/export/`,
captures paginated search snapshots in `seed-data/cert-bund/search/`,
and generates the manifest + SHA files in `seed-data/cert-bund/manifest/`.
Split ranges according to your compliance window (default: one file per
calendar year). Concelier can ingest these JSON payloads directly when
operating offline.
> When automatic bootstrap fails (e.g. portal introduces CAPTCHA), run the
> manual `curl` flow above, then rerun the helper with `--skip-fetch` to
> rebuild the manifest from the existing files.
### 3.4 Connector-driven catch-up
1. Temporarily raise `maxAdvisoriesPerFetch` (e.g. 150) and reduce `requestDelay`.
2. Run `stella db jobs run source:cert-bund:fetch --and-then source:cert-bund:parse --and-then source:cert-bund:map` until the fetch log reports `enqueued=0`.
3. Restore defaults and capture the cursor snapshot for audit.
---
## 4. Locale & Translation Guidance
- Advisories remain in German (`language: "de"`). Preserve wording for provenance and legal accuracy.
- UI localisation: enable the translation bundles documented in `docs/15_UI_GUIDE.md` if English UI copy is required. Operators can overlay machine or human translations, but the canonical database stores the source text.
- Docs guild is compiling a CERT-Bund terminology glossary under `docs/locale/certbund-glossary.md` so downstream teams can reference consistent English equivalents without altering the stored advisories.
---
## 5. Verification Checklist
1. Observe `certbund.feed.fetch.success` and `certbund.detail.fetch.success` increments after runs; `certbund.feed.coverage.days` should hover near the observed RSS window.
2. Ensure summary logs report `truncated=false` in steady state—`true` indicates the fetch cap was hit.
3. During backfills, watch `certbund.feed.enqueued.count` trend to zero.
4. Spot-check stored advisories in Mongo to confirm `language="de"` and reference URLs match the portal detail endpoint.
5. For Offline Kit exports, validate SHA256 hashes before distribution.
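For that final hash check, a verification pass such as the following works when the helper wrote `*.sha256` files under `manifest/` (the relative layout is an assumption; adjust to the actual bundle):
```bash
cd seed-data/cert-bund
find manifest -name '*.sha256' -print0 | xargs -0 -n1 sha256sum -c
```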


@@ -0,0 +1,94 @@
# Concelier Cisco PSIRT Connector OAuth Provisioning SOP
_Last updated: 2025-10-14_
## 1. Scope
This runbook describes how Ops provisions, rotates, and distributes Cisco PSIRT openVuln OAuth client credentials for the Concelier Cisco connector. It covers online and air-gapped (Offline Kit) environments, quota-aware execution, and escalation paths.
## 2. Prerequisites
- Active Cisco.com (CCO) account with access to the Cisco API Console.
- Cisco PSIRT openVuln API entitlement (visible under “My Apps & Keys” once granted).
- Concelier configuration location (typically `/etc/stella/concelier.yaml` in production) or Offline Kit secret bundle staging directory.
## 3. Provisioning workflow
1. **Register the application**
- Sign in at <https://apiconsole.cisco.com>.
- Select **Register a New App** → Application Type: `Service`, Grant Type: `Client Credentials`, API: `Cisco PSIRT openVuln API`.
- Record the generated `clientId` and `clientSecret` in the Ops vault.
2. **Verify token issuance**
- Request an access token with:
```bash
curl -s https://id.cisco.com/oauth2/default/v1/token \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "grant_type=client_credentials" \
-d "client_id=${CLIENT_ID}" \
-d "client_secret=${CLIENT_SECRET}"
```
- Confirm HTTP 200 and an `expires_in` value of 3600 seconds (tokens live for one hour).
- Preserve the response only long enough to validate syntax; do **not** persist tokens.
3. **Authorize Concelier runtime**
- Update `concelier:sources:cisco:auth` (or the module-specific secret template) with the stored credentials.
- For Offline Kit delivery, export encrypted secrets into `offline-kit/secrets/cisco-openvuln.json` using the platform's sealed-secret format.
4. **Connectivity validation**
- From the Concelier control plane, run `stella db jobs run source:vndr-cisco:fetch --dry-run`.
- Ensure the Source HTTP diagnostics record `Bearer` authorization headers and no 401/403 responses.
## 4. Rotation SOP
| Step | Owner | Notes |
| --- | --- | --- |
| 1. Schedule rotation | Ops (monthly board) | Rotate every 90 days or immediately after suspected credential exposure. |
| 2. Create replacement app | Ops | Repeat §3.1 with “-next” suffix; verify token issuance. |
| 3. Stage dual credentials | Ops + Concelier On-Call | Publish new credentials to secret store alongside current pair. |
| 4. Cut over | Concelier On-Call | Restart connector workers during a low-traffic window (<10 min) to pick up the new secret. |
| 5. Deactivate legacy app | Ops | Delete prior app in Cisco API Console once telemetry confirms successful fetch/parse cycles for 2 consecutive hours. |
**Automation hooks**
- Rotation reminders are tracked on the Ops Runbook/Ops board (`OPS-RUN-KEYS` swim lane); add checklist items for Concelier Cisco when opening a rotation task.
- Use the secret management pipeline (`ops/secrets/rotate.sh --connector cisco`) to template vault updates; the script renders a redacted diff for audit.
## 5. Offline Kit packaging
1. Generate the credential bundle using the Offline Kit CLI:
`offline-kit secrets add cisco-openvuln --client-id … --client-secret …`
2. Store the encrypted payload under `offline-kit/secrets/cisco-openvuln.enc`.
3. Distribute via the Offline Kit channel; update `offline-kit/MANIFEST.md` with the credential fingerprint (SHA256 of plaintext concatenated with metadata); see the fingerprint sketch after this list.
4. Document validation steps for the receiving site (token request from an air-gapped relay or cached token mirror).
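One way to derive that fingerprint from step 3 (file names and concatenation order are placeholders to agree with the Offline Kit team):
```bash
# cisco-openvuln.json = plaintext credential payload, metadata.json = bundle metadata
cat cisco-openvuln.json metadata.json | sha256sum | awk '{ print $1 }'
```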
## 6. Quota and throttling guidance
- Cisco enforces combined limits of 5 requests/second, 30 requests/minute, and 5000 requests/day per application.
- Concelier fetch jobs must respect `Retry-After` headers on HTTP 429 responses; Ops should monitor for sustained quota saturation and consider paging window adjustments (a manual probe sketch follows this list).
- Telemetry to watch: `concelier.source.http.requests{concelier.source="vndr-cisco"}`, `concelier.source.http.failures{...}`, and connector-specific metrics once implemented.
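When probing the API manually during a quota investigation, honour `Retry-After` the same way the connector does. A rough sketch, with the endpoint passed as an argument and the token supplied via `CISCO_TOKEN` (never written to disk):
```bash
#!/usr/bin/env bash
set -euo pipefail
url="$1"   # openVuln endpoint under investigation

while :; do
  status=$(curl -s -D headers.txt -o body.json -w '%{http_code}' \
    -H "Authorization: Bearer ${CISCO_TOKEN}" "${url}")
  if [[ "${status}" == "429" ]]; then
    wait=$(awk 'tolower($1) == "retry-after:" { print $2 + 0 }' headers.txt)
    [[ "${wait:-0}" -gt 0 ]] || wait=60   # fall back to 60 s when the header is absent
    echo "throttled; sleeping ${wait}s"
    sleep "${wait}"
    continue
  fi
  echo "HTTP ${status}"
  break
done
```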
## 7. Telemetry & Monitoring
- **Metrics (Meter `StellaOps.Concelier.Connector.Vndr.Cisco`)**
- `cisco.fetch.documents`, `cisco.fetch.failures`, `cisco.fetch.unchanged`
- `cisco.parse.success`, `cisco.parse.failures`
- `cisco.map.success`, `cisco.map.failures`, `cisco.map.affected.packages`
- **Shared HTTP metrics** via `SourceDiagnostics`:
- `concelier.source.http.requests{concelier.source="vndr-cisco"}`
- `concelier.source.http.failures{concelier.source="vndr-cisco"}`
- `concelier.source.http.duration{concelier.source="vndr-cisco"}`
- **Structured logs**
- `Cisco fetch completed date=… pages=… added=…` (info)
- `Cisco parse completed parsed=… failures=…` (info)
- `Cisco map completed mapped=… failures=…` (info)
- Warnings surface when DTO serialization fails or GridFS payload is missing.
- Suggested alerts: non-zero `cisco.fetch.failures` in 15m, or `cisco.map.success` flatlines while fetch continues.
## 8. Incident response
- **Token compromise** - revoke the application in the Cisco API Console, purge cached secrets, rotate immediately per §4.
- **Persistent 401/403** - confirm credentials in vault, then validate token issuance; if unresolved, open a Cisco DevNet support ticket referencing the application ID.
- **429 spikes** - inspect job scheduler cadence and adjust connector options (`maxRequestsPerWindow`) before requesting higher quotas from Cisco.
## 9. References
- Cisco PSIRT openVuln API Authentication Guide.
- Accessing the openVuln API using curl (token lifetime).
- openVuln API rate limit documentation.


@@ -0,0 +1,151 @@
{
"title": "Concelier CVE & KEV Observability",
"uid": "concelier-cve-kev",
"schemaVersion": 38,
"version": 1,
"editable": true,
"timezone": "",
"time": {
"from": "now-24h",
"to": "now"
},
"refresh": "5m",
"templating": {
"list": [
{
"name": "datasource",
"type": "datasource",
"query": "prometheus",
"refresh": 1,
"hide": 0
}
]
},
"panels": [
{
"type": "timeseries",
"title": "CVE fetch success vs failure",
"gridPos": { "h": 9, "w": 12, "x": 0, "y": 0 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(cve_fetch_success_total[5m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "success"
},
{
"refId": "B",
"expr": "rate(cve_fetch_failures_total[5m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "failure"
}
]
},
{
"type": "timeseries",
"title": "KEV fetch cadence",
"gridPos": { "h": 9, "w": 12, "x": 12, "y": 0 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(kev_fetch_success_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "success"
},
{
"refId": "B",
"expr": "rate(kev_fetch_failures_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "failure"
},
{
"refId": "C",
"expr": "rate(kev_fetch_unchanged_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "unchanged"
}
]
},
{
"type": "table",
"title": "KEV parse anomalies (24h)",
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 9 },
"fieldConfig": {
"defaults": {
"unit": "short"
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "sum by (reason) (increase(kev_parse_anomalies_total[24h]))",
"format": "table",
"datasource": { "type": "prometheus", "uid": "${datasource}" }
}
],
"transformations": [
{
"id": "organize",
"options": {
"renameByName": {
"Value": "count"
}
}
}
]
},
{
"type": "timeseries",
"title": "Advisories emitted",
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 9 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(cve_map_success_total[15m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "CVE"
},
{
"refId": "B",
"expr": "rate(kev_map_advisories_total[24h])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "KEV"
}
]
}
]
}


@@ -0,0 +1,143 @@
# Concelier CVE & KEV Connector Operations
This playbook equips operators with the steps required to roll out and monitor the CVE Services and CISA KEV connectors across environments.
## 1. CVE Services Connector (`source:cve:*`)
### 1.1 Prerequisites
- CVE Services API credentials (organisation ID, user ID, API key) with access to the JSON 5 API.
- Network egress to `https://cveawg.mitre.org` (or a mirrored endpoint) from the Concelier workers.
- Updated `concelier.yaml` (or the matching environment variables) with the following section:
```yaml
concelier:
sources:
cve:
baseEndpoint: "https://cveawg.mitre.org/api/"
apiOrg: "ORG123"
apiUser: "user@example.org"
apiKeyFile: "/var/run/secrets/concelier/cve-api-key"
seedDirectory: "./seed-data/cve"
pageSize: 200
maxPagesPerFetch: 5
initialBackfill: "30.00:00:00"
requestDelay: "00:00:00.250"
failureBackoff: "00:10:00"
```
> Store the API key outside source control. When using `apiKeyFile`, mount the secret file into the container/host; alternatively supply `apiKey` via `CONCELIER_SOURCES__CVE__APIKEY`.
> 🪙 When credentials are not yet available, configure `seedDirectory` to point at mirrored CVE JSON (for example, the repo's `seed-data/cve/` bundle). The connector will ingest those records and log a warning instead of failing the job; live fetching resumes automatically once `apiOrg` / `apiUser` / `apiKey` are supplied.
### 1.2 Smoke Test (staging)
1. Deploy the updated configuration and restart the Concelier service so the connector picks up the credentials.
2. Trigger one end-to-end cycle:
- Concelier CLI: `stella db jobs run source:cve:fetch --and-then source:cve:parse --and-then source:cve:map`
- REST fallback: `POST /jobs/run { "kind": "source:cve:fetch", "chain": ["source:cve:parse", "source:cve:map"] }`
3. Observe the following metrics (exported via OTEL meter `StellaOps.Concelier.Connector.Cve`):
- `cve.fetch.attempts`, `cve.fetch.success`, `cve.fetch.documents`, `cve.fetch.failures`, `cve.fetch.unchanged`
- `cve.parse.success`, `cve.parse.failures`, `cve.parse.quarantine`
- `cve.map.success`
4. Verify Prometheus shows matching `concelier.source.http.requests_total{concelier_source="cve"}` deltas (list vs detail phases) while `concelier.source.http.failures_total{concelier_source="cve"}` stays flat.
5. Confirm the info-level summary log `CVEs fetch window … pages=X detailDocuments=Y detailFailures=Z` appears once per fetch run and shows `detailFailures=0`.
6. Verify the MongoDB advisory store contains fresh CVE advisories (`advisoryKey` prefix `cve/`) and that the source cursor (`source_states` collection) advanced.
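Step 6 can be spot-checked from the shell; the database, collection, and field names below mirror the wording in this runbook and may need adjusting to your deployment:
```bash
mongosh concelier --quiet --eval '
  // Advisories mapped from CVE Services carry the cve/ prefix.
  print("cve advisories: " + db.advisories.countDocuments({ advisoryKey: { $regex: "^cve/" } }));
  // The source cursor should have advanced past the last fetch window.
  printjson(db.source_states.findOne({ source: "cve" }, { cursor: 1 }));
'
```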
### 1.3 Production Monitoring
- **Dashboards** - Plot `rate(cve_fetch_success_total[5m])`, `rate(cve_fetch_failures_total[5m])`, and `rate(cve_fetch_documents_total[5m])` alongside `concelier_source_http_requests_total{concelier_source="cve"}` to confirm HTTP and connector counters stay aligned. Keep `concelier.range.primitives{scheme=~"semver|vendor"}` on the same board for range coverage. Example alerts:
- `rate(cve_fetch_failures_total[5m]) > 0` for 10 minutes (`severity=warning`)
- `rate(cve_map_success_total[15m]) == 0` while `rate(cve_fetch_success_total[15m]) > 0` (`severity=critical`)
- `sum_over_time(cve_parse_quarantine_total[1h]) > 0` to catch schema anomalies
- **Logs** - Monitor warnings such as `Failed fetching CVE record {CveId}` and `Malformed CVE JSON`, and surface the summary info log `CVEs fetch window … detailFailures=0 detailUnchanged=0` on dashboards. A non-zero `detailFailures` usually indicates rate-limit or auth issues on detail requests.
- **Grafana pack** - Import `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` and filter by panel legend (`CVE`, `KEV`) to reuse the canned layout.
- **Backfill window** - Operators can tighten or widen `initialBackfill` / `maxPagesPerFetch` after validating throughput. Update config and restart Concelier to apply changes.
### 1.4 Staging smoke log (2025-10-15)
While Ops finalises long-lived CVE Services credentials, we validated the connector end-to-end against the recorded CVE-2024-0001 payloads used in regression tests:
- Command: `dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Cve.Tests/StellaOps.Concelier.Connector.Cve.Tests.csproj -l "console;verbosity=detailed"`
- Summary log emitted by the connector:
```
CVEs fetch window 2024-09-01T00:00:00Z->2024-10-01T00:00:00Z pages=1 listSuccess=1 detailDocuments=1 detailFailures=0 detailUnchanged=0 pendingDocuments=0->1 pendingMappings=0->1 hasMorePages=False nextWindowStart=2024-09-15T12:00:00Z nextWindowEnd=(none) nextPage=1
```
- Telemetry captured by `Meter` `StellaOps.Concelier.Connector.Cve`:
| Metric | Value |
|--------|-------|
| `cve.fetch.attempts` | 1 |
| `cve.fetch.success` | 1 |
| `cve.fetch.documents` | 1 |
| `cve.parse.success` | 1 |
| `cve.map.success` | 1 |
The Grafana pack `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` has been imported into staging so the panels referenced above render against these counters once the live API keys are in place.
## 2. CISA KEV Connector (`source:kev:*`)
### 2.1 Prerequisites
- Network egress (or mirrored content) for `https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json`.
- No credentials are required, but the HTTP allow-list must include `www.cisa.gov`.
- Confirm the following snippet in `concelier.yaml` (defaults shown; tune as needed):
```yaml
concelier:
sources:
kev:
feedUri: "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
requestTimeout: "00:01:00"
failureBackoff: "00:05:00"
```
### 2.2 Schema validation & anomaly handling
The connector validates each catalog against `Schemas/kev-catalog.schema.json`. Failures increment `kev.parse.failures_total{reason="schema"}` and the document is quarantined (status `Failed`). Additional failure reasons include `download`, `invalidJson`, `deserialize`, `missingPayload`, and `emptyCatalog`. Entry-level anomalies are surfaced through `kev.parse.anomalies_total` with reasons:
| Reason | Meaning |
| --- | --- |
| `missingCveId` | Catalog entry omitted `cveID`; the entry is skipped. |
| `countMismatch` | Catalog `count` field disagreed with the actual entry total. |
| `nullEntry` | Upstream emitted a `null` entry object (rare upstream defect). |
Treat repeated schema failures or growing anomaly counts as an upstream regression and coordinate with CISA or mirror maintainers.
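To pre-flight a mirrored catalog offline before ingestion, the same schema can be run through a generic validator. The sketch assumes `ajv-cli` is available and that the schema draft it targets is supported by the tool's defaults:
```bash
curl -sO https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json
# Schema path is relative to the KEV connector project mentioned above.
npx ajv-cli validate \
  -s Schemas/kev-catalog.schema.json \
  -d known_exploited_vulnerabilities.json
```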
### 2.3 Smoke Test (staging)
1. Deploy the configuration and restart Concelier.
2. Trigger a pipeline run:
- CLI: `stella db jobs run source:kev:fetch --and-then source:kev:parse --and-then source:kev:map`
- REST: `POST /jobs/run { "kind": "source:kev:fetch", "chain": ["source:kev:parse", "source:kev:map"] }`
3. Verify the metrics exposed by meter `StellaOps.Concelier.Connector.Kev`:
- `kev.fetch.attempts`, `kev.fetch.success`, `kev.fetch.unchanged`, `kev.fetch.failures`
- `kev.parse.entries` (tag `catalogVersion`), `kev.parse.failures`, `kev.parse.anomalies` (tag `reason`)
- `kev.map.advisories` (tag `catalogVersion`)
4. Confirm `concelier.source.http.requests_total{concelier_source="kev"}` increments once per fetch and that the paired `concelier.source.http.failures_total` stays flat (zero increase).
5. Inspect the info logs `Fetched KEV catalog document … pendingDocuments=…` and `Parsed KEV catalog document … entries=…`—they should appear exactly once per run and `Mapped X/Y… skipped=0` should match the `kev.map.advisories` delta.
6. Confirm MongoDB documents exist for the catalog JSON (`raw_documents` & `dtos`) and that advisories with prefix `kev/` are written.
### 2.4 Production Monitoring
- Alert when `rate(kev_fetch_success_total[8h]) == 0` during working hours (daily cadence breach) and when `increase(kev_fetch_failures_total[1h]) > 0`.
- Page the on-call if `increase(kev_parse_failures_total{reason="schema"}[6h]) > 0`—this usually signals an upstream payload change. Treat repeated `reason="download"` spikes as networking issues to the mirror.
- Track anomaly spikes through `sum_over_time(kev_parse_anomalies_total{reason="missingCveId"}[24h])`. Rising `countMismatch` trends point to catalog publishing bugs.
- Surface the fetch/mapping info logs (`Fetched KEV catalog document …` and `Mapped X/Y KEV advisories … skipped=S`) on dashboards; absence of those logs while metrics show success typically means schema validation short-circuited the run.
### 2.5 Known good dashboard tiles
Add the following panels to the Concelier observability board:
| Metric | Recommended visualisation |
|--------|---------------------------|
| `rate(kev_fetch_success_total[30m])` | Single-stat (last 24h) with warning threshold `>0` |
| `rate(kev_parse_entries_total[1h])` by `catalogVersion` | Stacked area highlights daily release size |
| `sum_over_time(kev_parse_anomalies_total[1d])` by `reason` | Table anomaly breakdown (matches dashboard panel) |
| `rate(cve_map_success_total[15m])` vs `rate(kev_map_advisories_total[24h])` | Comparative timeseries for advisories emitted |
## 3. Runbook updates
- Record staging/production smoke test results (date, catalog version, advisory counts) in your team's change log.
- Add the CVE/KEV job kinds to the standard maintenance checklist so operators can manually trigger them after planned downtime.
- Keep this document in sync with future connector changes (for example, new anomaly reasons or additional metrics).
- Version-control dashboard tweaks alongside `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` so operations can re-import the observability pack during restores.


@@ -0,0 +1,123 @@
# Concelier GHSA Connector Operations Runbook
_Last updated: 2025-10-16_
## 1. Overview
The GitHub Security Advisories (GHSA) connector pulls advisory metadata from the GitHub REST API `/security/advisories` endpoint. GitHub enforces both primary and secondary rate limits, so operators must monitor usage and configure retries to avoid throttling incidents.
## 2. Rate-limit telemetry
The connector now surfaces rate-limit headers on every fetch and exposes the following metrics via OpenTelemetry:
| Metric | Description | Tags |
|--------|-------------|------|
| `ghsa.ratelimit.limit` (histogram) | Samples the reported request quota at fetch time. | `phase` = `list` or `detail`, `resource` (e.g., `core`). |
| `ghsa.ratelimit.remaining` (histogram) | Remaining requests returned by `X-RateLimit-Remaining`. | `phase`, `resource`. |
| `ghsa.ratelimit.reset_seconds` (histogram) | Seconds until `X-RateLimit-Reset`. | `phase`, `resource`. |
| `ghsa.ratelimit.headroom_pct` (histogram) | Percentage of the quota still available (`remaining / limit * 100`). | `phase`, `resource`. |
| `ghsa.ratelimit.headroom_pct_current` (observable gauge) | Latest headroom percentage reported per resource. | `phase`, `resource`. |
| `ghsa.ratelimit.exhausted` (counter) | Incremented whenever GitHub returns a zero remaining quota and the connector delays before retrying. | `phase`. |
### Dashboards & alerts
- Plot `ghsa.ratelimit.remaining` as the latest value to watch the runway. Alert when the value stays below **`RateLimitWarningThreshold`** (default `500`) for more than 5 minutes.
- Use `ghsa.ratelimit.headroom_pct_current` to visualise remaining quota % — paging once it sits below **10%** for longer than a single reset window helps avoid secondary limits.
- Raise a separate alert on `increase(ghsa.ratelimit.exhausted[15m]) > 0` to catch hard throttles.
- Overlay `ghsa.fetch.attempts` vs `ghsa.fetch.failures` to confirm retries are effective.
## 3. Logging signals
When `X-RateLimit-Remaining` falls below `RateLimitWarningThreshold`, the connector emits:
```
GHSA rate limit warning: remaining {Remaining}/{Limit} for {Phase} {Resource} (headroom {Headroom}%)
```
When GitHub reports zero remaining calls, the connector logs and sleeps for the reported `Retry-After`/`X-RateLimit-Reset` interval (falling back to `SecondaryRateLimitBackoff`).
After the quota recovers above the warning threshold the connector writes an informational log with the refreshed remaining/headroom, letting operators clear alerts quickly.
## 4. Configuration knobs (`concelier.yaml`)
```yaml
concelier:
sources:
ghsa:
apiToken: "${GITHUB_PAT}"
pageSize: 50
requestDelay: "00:00:00.200"
failureBackoff: "00:05:00"
rateLimitWarningThreshold: 500 # warn below this many remaining calls
secondaryRateLimitBackoff: "00:02:00" # fallback delay when GitHub omits Retry-After
```
### Recommendations
- Increase `requestDelay` in air-gapped or burst-heavy deployments to smooth token consumption.
- Lower `rateLimitWarningThreshold` only if your dashboards already page on the new histogram; never set it negative.
- For bots using a low-privilege PAT, keep `secondaryRateLimitBackoff` at ≥60 seconds to respect GitHub's secondary-limit guidance.
#### Default job schedule
| Job kind | Cron | Timeout | Lease |
|----------|------|---------|-------|
| `source:ghsa:fetch` | `1,11,21,31,41,51 * * * *` | 6 minutes | 4 minutes |
| `source:ghsa:parse` | `3,13,23,33,43,53 * * * *` | 5 minutes | 4 minutes |
| `source:ghsa:map` | `5,15,25,35,45,55 * * * *` | 5 minutes | 4 minutes |
These defaults spread GHSA stages across the hour so fetch completes before parse/map fire. Override them via `concelier.jobs.definitions[...]` when coordinating multiple connectors on the same runner.
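If the default cadence clashes with other connectors, a hypothetical override could look like the sketch below. The exact shape of `concelier.jobs.definitions` is not reproduced in this runbook, so treat the key names (`cronExpression`, `timeout`, `leaseDuration`) as assumptions to verify against your deployment's schema before applying:

```yaml
concelier:
  jobs:
    definitions:
      # Key names are assumptions; confirm against the Concelier jobs schema.
      "source:ghsa:fetch":
        cronExpression: "7,37 * * * *"   # example: twice per hour instead of every ten minutes
        timeout: "00:06:00"
        leaseDuration: "00:04:00"
```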
## 5. Provisioning credentials
Concelier requires a GitHub personal access token (classic) with the **`read:org`** and **`security_events`** scopes to pull GHSA data. Store it as a secret and reference it via `concelier.sources.ghsa.apiToken`.
### Docker Compose (stack operators)
```yaml
services:
concelier:
environment:
CONCELIER__SOURCES__GHSA__APITOKEN: /run/secrets/ghsa_pat
secrets:
- ghsa_pat
secrets:
ghsa_pat:
file: ./secrets/ghsa_pat.txt # contains only the PAT value
```
### Helm values (cluster operators)
```yaml
concelier:
extraEnv:
- name: CONCELIER__SOURCES__GHSA__APITOKEN
valueFrom:
secretKeyRef:
name: concelier-ghsa
key: apiToken
extraSecrets:
concelier-ghsa:
apiToken: "<paste PAT here or source from external secret store>"
```
After rotating the PAT, restart the Concelier workers (or run `kubectl rollout restart deployment/concelier`) to ensure the configuration reloads.
When enabling GHSA the first time, run a staged backfill:
1. Trigger `source:ghsa:fetch` manually (CLI or API) outside of peak hours.
2. Watch `concelier.jobs.health` for the GHSA jobs until they report `healthy`.
3. Allow the scheduled cron cadence to resume once the initial backlog drains (typically < 30 minutes).
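For the manual trigger in step 1, the CLI invocation follows the same chained pattern used by other connectors in this guide:

```bash
# Kick off the GHSA backfill; parse and map run once fetch completes.
stella db jobs run source:ghsa:fetch --and-then source:ghsa:parse --and-then source:ghsa:map
```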
## 6. Runbook steps when throttled
1. Check `ghsa.ratelimit.exhausted` for the affected phase (`list` vs `detail`).
2. Confirm the connector is delaying; logs will show `GHSA rate limit exhausted...` with the chosen backoff.
3. If rate limits stay exhausted:
- Verify no other jobs are sharing the PAT.
- Temporarily reduce `MaxPagesPerFetch` or `PageSize` to shrink burst size.
- Consider provisioning a dedicated PAT (GHSA permissions only) for Concelier.
4. After the quota resets, reset `rateLimitWarningThreshold`/`requestDelay` to their normal values and monitor the histograms for at least one hour.
## 7. Alert integration quick reference
- Prometheus: `ghsa_ratelimit_remaining_bucket` (from histogram); use `histogram_quantile(0.99, ...)` to trend capacity.
- VictoriaMetrics: `last_over_time(ghsa_ratelimit_remaining_sum[5m])` for simple last-value graphs.
- Grafana: stack remaining + used to visualise total limit per resource.
## 8. Canonical metric fallback analytics
When GitHub omits CVSS vectors/scores, the connector now assigns a deterministic canonical metric id in the form `ghsa:severity/<level>` and publishes it to Merge so severity precedence still resolves against GHSA even without CVSS data.
- Metric: `ghsa.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `reason=no_cvss`.
- Monitor the counter alongside Merge parity checks; a sudden spike suggests GitHub is shipping advisories without vectors and warrants cross-checking downstream exporters.
- Because the canonical id feeds Merge, parity dashboards should overlay this metric to confirm fallback advisories continue to merge ahead of downstream sources when GHSA supplies more recent data.

# Concelier CISA ICS Connector Operations
This runbook documents how to provision, rotate, and validate credentials for the CISA Industrial Control Systems (ICS) connector (`source:ics-cisa:*`). Follow it before enabling the connector in staging or offline installations.
## 1. Credential Provisioning
1. **Create a service mailbox** reachable by the Ops crew (shared mailbox recommended).
2. Browse to `https://public.govdelivery.com/accounts/USDHSCISA/subscriber/new` and subscribe the mailbox to the following GovDelivery topics:
- `USDHSCISA_16` — ICS-CERT advisories (legacy numbering: `ICSA-YY-###`).
- `USDHSCISA_19` — ICS medical advisories (`ICSMA-YY-###`).
- `USDHSCISA_17` — ICS alerts (`IR-ALERT-YY-###`) for completeness.
3. Complete the verification email. After confirmation, note the **personalised subscription code** included in the “Manage Preferences” link. It has the shape `code=AB12CD34EF`.
4. Store the code in the shared secret vault (or Offline Kit secrets bundle) as `concelier/sources/icscisa/govdelivery/code`.
> GovDelivery does not expose a one-time API key; the personalised code is what authenticates the RSS pull. Never commit it to git.
## 2. Feed Validation
Use the following command to confirm the feed is reachable before wiring it into Concelier (substitute `<CODE>` with the personalised value):
```bash
curl -H "User-Agent: StellaOpsConcelier/ics-cisa" \
"https://content.govdelivery.com/accounts/USDHSCISA/topics/ICS-CERT/feed.rss?format=xml&code=<CODE>"
```
If the endpoint returns HTTP 200 and an RSS payload, record the sample response under `docs/artifacts/icscisa/` (see Task `FEEDCONN-ICSCISA-02-007`). HTTP 403 or 406 usually means the subscription was not confirmed or the code was mistyped.
## 3. Configuration Snippet
Add the connector configuration to `concelier.yaml` (or equivalent environment variables):
```yaml
concelier:
sources:
icscisa:
govDelivery:
code: "${CONCELIER_ICS_CISA_GOVDELIVERY_CODE}"
topics:
- "USDHSCISA_16"
- "USDHSCISA_19"
- "USDHSCISA_17"
rssBaseUri: "https://content.govdelivery.com/accounts/USDHSCISA"
requestDelay: "00:00:01"
failureBackoff: "00:05:00"
```
Environment variable example:
```bash
export CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE="AB12CD34EF"
```
Concelier automatically registers the host with the Source.Common HTTP allow-list when the connector assembly is loaded.
Optional tuning keys (set only when needed):
- `proxyUri` — HTTP/HTTPS proxy URL used when Akamai blocks direct pulls.
- `requestVersion` / `requestVersionPolicy` — override HTTP negotiation when the proxy requires HTTP/1.1.
- `enableDetailScrape` — toggle HTML detail fallback (defaults to true).
- `captureAttachments` — collect PDF attachments from detail pages (defaults to true).
- `detailBaseUri` — alternate host for detail enrichment if CISA changes their layout.
## 4. Seeding Without GovDelivery
If credentials are still pending, populate the connector with the community CSV dataset before enabling the live fetch:
1. Run `./scripts/fetch-ics-cisa-seed.sh` (or `.ps1`) to download the latest `CISA_ICS_ADV_*.csv` files into `seed-data/ics-cisa/`.
2. Copy the CSVs (and the generated `.sha256` files) into your Offline Kit staging area so they ship alongside the other feeds.
3. Import the kit as usual. The connector can parse the seed data for historical context, but **live GovDelivery credentials are still required** for fresh advisories.
4. Once credentials arrive, update `concelier:sources:icscisa:govDelivery:code` and re-trigger `source:ics-cisa:fetch` so the connector switches to the authorised feed.
> The CSVs are licensed under ODbL 1.0 by the ICS Advisory Project. Preserve the attribution when redistributing them.
## 5. Integration Validation
1. Ensure secrets are in place and restart the Concelier workers.
2. Run a dry-run fetch/parse/map chain against an Akamai-protected topic:
```bash
CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=... \
CONCELIER_SOURCES_ICSCISA_ENABLEDETAILSCRAPE=1 \
stella db jobs run source:ics-cisa:fetch --and-then source:ics-cisa:parse --and-then source:ics-cisa:map
```
3. Confirm logs contain `ics-cisa detail fetch` entries and that new documents/DTOs include attachments (see `docs/artifacts/icscisa`). Canonical advisories should expose PDF links as `references.kind == "attachment"` and affected packages should surface `primitives.semVer.exactValue` for single-version hits.
4. If Akamai blocks direct fetches, set `concelier:sources:icscisa:proxyUri` to your allow-listed egress proxy and rerun the dry-run.
## 6. Rotation & Incident Response
- Review GovDelivery access quarterly. Rotate the personalised code whenever Ops changes the service mailbox password or membership.
- Revoking the subscription in GovDelivery invalidates the code immediately; update the vault and configuration in the same change.
- If the code leaks, remove the subscription (`https://public.govdelivery.com/accounts/USDHSCISA/subscriber/manage_preferences?code=<CODE>`), resubscribe, and distribute the new value via the vault.
## 7. Offline Kit Handling
Include the personalised code in `offline-kit/secrets/concelier/icscisa.env`:
```
CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=AB12CD34EF
```
The Offline Kit deployment script copies this file into the container secret directory mounted at `/run/secrets/concelier`. Ensure permissions are `600` and ownership matches the Concelier runtime user.
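A quick way to enforce that on the staging host before import (the `stellaops` account name is an assumption; substitute the user your Concelier containers actually run as):

```bash
# Assumes the Concelier runtime user/group is "stellaops"; adjust to your deployment.
chown stellaops:stellaops offline-kit/secrets/concelier/icscisa.env
chmod 600 offline-kit/secrets/concelier/icscisa.env
```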
## 8. Telemetry & Monitoring
The connector emits metrics under the meter `StellaOps.Concelier.Connector.Ics.Cisa`. They allow operators to track Akamai fallbacks, detail enrichment health, and advisory fan-out.
- `icscisa.fetch.*` counters for `attempts`, `success`, `failures`, `not_modified`, and `fallbacks`, plus histogram `icscisa.fetch.documents` showing documents added per topic pull (tags: `concelier.source`, `icscisa.topic`).
- `icscisa.parse.*` counters for `success`/`failures` and histograms `icscisa.parse.advisories`, `icscisa.parse.attachments`, `icscisa.parse.detail_fetches` to monitor enrichment workload per feed document.
- `icscisa.detail.*` counters `success` / `failures` per advisory (tagged with `icscisa.advisory`) to alert when Akamai blocks detail pages.
- `icscisa.map.*` counters for `success`/`failures` and histograms `icscisa.map.references`, `icscisa.map.packages`, `icscisa.map.aliases` capturing canonical fan-out.
Suggested alerts:
- `increase(icscisa.fetch.failures_total[15m]) > 0` or `increase(icscisa.fetch.fallbacks_total[15m]) > 5` — sustained Akamai or proxy issues.
- `increase(icscisa.detail.failures_total[30m]) > 0` — detail enrichment breaking (potential HTML layout change).
- `histogram_quantile(0.95, rate(icscisa.map.references_bucket[1h]))` trending sharply higher — sudden advisory reference explosion worth investigating.
- Keep an eye on shared HTTP metrics (`concelier.source.http.*{concelier.source="ics-cisa"}`) for request latency and retry patterns.
## 9. Related Tasks
- `FEEDCONN-ICSCISA-02-009` (GovDelivery credential onboarding) — completed once this runbook is followed and secrets are placed in the vault.
- `FEEDCONN-ICSCISA-02-007` (document inventory) — archive the first successful RSS response and any attachment URL schema under `docs/artifacts/icscisa/`.

# Concelier KISA Connector Operations
Operational guidance for the Korea Internet & Security Agency (KISA / KNVD) connector (`source:kisa:*`). Pair this with the engineering brief in `docs/dev/kisa_connector_notes.md`.
## 1. Prerequisites
- Outbound HTTPS (or mirrored cache) for `https://knvd.krcert.or.kr/`.
- Connector options defined under `concelier:sources:kisa`:
```yaml
concelier:
sources:
kisa:
feedUri: "https://knvd.krcert.or.kr/rss/securityInfo.do"
detailApiUri: "https://knvd.krcert.or.kr/rssDetailData.do"
detailPageUri: "https://knvd.krcert.or.kr/detailDos.do"
maxAdvisoriesPerFetch: 10
requestDelay: "00:00:01"
failureBackoff: "00:05:00"
```
> Ensure the URIs stay absolute—Concelier adds the `feedUri`/`detailApiUri` hosts to the HttpClient allow-list automatically.
## 2. Staging Smoke Test
1. Restart the Concelier workers so the KISA options bind.
2. Run a full connector cycle:
- CLI: `stella db jobs run source:kisa:fetch --and-then source:kisa:parse --and-then source:kisa:map`
- REST: `POST /jobs/run { "kind": "source:kisa:fetch", "chain": ["source:kisa:parse", "source:kisa:map"] }`
3. Confirm telemetry (Meter `StellaOps.Concelier.Connector.Kisa`):
- `kisa.feed.success`, `kisa.feed.items`
- `kisa.detail.success` / `.failures`
- `kisa.parse.success` / `.failures`
- `kisa.map.success` / `.failures`
- `kisa.cursor.updates`
4. Inspect logs for structured entries:
- `KISA feed returned {ItemCount}`
- `KISA fetched detail for {Idx} … category={Category}`
- `KISA mapped advisory {AdvisoryId} (severity={Severity})`
- Absence of warnings such as `document missing GridFS payload`.
5. Validate MongoDB state:
- `raw_documents.metadata` has `kisa.idx`, `kisa.category`, `kisa.title`.
- DTO store contains `schemaVersion="kisa.detail.v1"`.
- Advisories include aliases (`IDX`, CVE) and `language="ko"`.
- `source_states` entry for `kisa` shows recent `cursor.lastFetchAt`.
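A `mongosh` spot-check covering those assertions might look like this (the database name `concelier` and the exact field paths are assumptions drawn from the checks above; adjust the connection string to your deployment):

```bash
mongosh "mongodb://localhost:27017/concelier" --quiet --eval '
  // Raw document metadata should carry the KISA idx/category/title keys.
  printjson(db.raw_documents.findOne({ "metadata.kisa.idx": { $exists: true } }, { metadata: 1 }));
  // DTO store entries for KISA use schemaVersion "kisa.detail.v1".
  print(db.dtos.countDocuments({ schemaVersion: "kisa.detail.v1" }));
  // Cursor state should show a recent lastFetchAt.
  printjson(db.source_states.findOne({ source: "kisa" }, { "cursor.lastFetchAt": 1 }));
'
```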
## 3. Production Monitoring
- **Dashboards:** Add the following Prometheus/OTEL expressions:
- `rate(kisa_feed_items_total[15m])` versus `rate(concelier_source_http_requests_total{concelier_source="kisa"}[15m])`
- `increase(kisa_detail_failures_total{reason!="empty-document"}[1h])` alert at `>0`
- `increase(kisa_parse_failures_total[1h])` for storage/JSON issues
- `increase(kisa_map_failures_total[1h])` to flag schema drift
- `increase(kisa_cursor_updates_total[6h]) == 0` during active windows → warn
- **Alerts:** Page when `rate(kisa_feed_success_total[2h]) == 0` while other connectors are active; back off for maintenance windows announced on `https://knvd.krcert.or.kr/`.
- **Logs:** Watch for repeated warnings (`document missing`, `DTO missing`) or errors with reason tags `HttpRequestException`, `download`, `parse`, `map`.
## 4. Localisation Handling
- Hangul categories (for example `취약점정보`) flow into telemetry tags (`category=…`) and logs. Dashboards must render UTF-8 and avoid transliteration.
- HTML content is sanitised before storage; translation teams can consume the `ContentHtml` field safely.
- Advisory severity remains as provided by KISA (`High`, `Medium`, etc.). Map-level failures include the severity tag for filtering.
## 5. Fixture & Regression Maintenance
- Regression fixtures: `src/Concelier/__Tests/StellaOps.Concelier.Connector.Kisa.Tests/Fixtures/kisa-feed.xml` and `kisa-detail.json`.
- Refresh via `UPDATE_KISA_FIXTURES=1 dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Kisa.Tests/StellaOps.Concelier.Connector.Kisa.Tests.csproj`.
- The telemetry regression (`KisaConnectorTests.Telemetry_RecordsMetrics`) will fail if counters/log wiring drifts—treat failures as gating.
## 6. Known Issues
- RSS feeds only expose the latest 10 advisories; long outages require replay via archived feeds or manual IDX seeds.
- Detail endpoint occasionally throttles; the connector honours `requestDelay` and reports failures with reason `HttpRequestException`. Consider increasing delay for weekend backfills.
- If `kisa.category` tags suddenly appear as `unknown`, verify KISA has not renamed RSS elements; update the parser fixtures before production rollout.

# Concelier MSRC Connector Azure AD Onboarding Brief
_Drafted: 2025-10-15_
## 1. App registration requirements
- **Tenant**: shared StellaOps production Azure AD.
- **Application type**: confidential client (web/API) issuing client credentials.
- **API permissions**: `api://api.msrc.microsoft.com/.default` (Application). Admin consent required once.
- **Token audience**: `https://api.msrc.microsoft.com/`.
- **Grant type**: client credentials. Concelier will request tokens via `POST https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token`.
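Before wiring credentials into Concelier, you can verify the registration with a manual token request (a sketch using the values above; substitute the real tenant, client id, and secret):

```bash
curl -s -X POST "https://login.microsoftonline.com/<azure-tenant-guid>/oauth2/v2.0/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=<app-registration-client-id>" \
  -d "client_secret=<client-secret>" \
  -d "scope=api://api.msrc.microsoft.com/.default" \
  | jq -r '.access_token' | cut -c1-40   # a JWT prefix here confirms the app registration works
```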
## 2. Secret/credential policy
- Maintain two client secrets (primary + standby) rotating every 90 days.
- Store secrets in the Concelier secrets vault; Offline Kit deployments must mirror the secret payloads in their encrypted store.
- Record rotation cadence in Ops runbook and update Concelier configuration (`CONCELIER__SOURCES__VNDR__MSRC__CLIENTSECRET`) ahead of expiry.
## 3. Concelier configuration sample
```yaml
concelier:
sources:
vndr.msrc:
tenantId: "<azure-tenant-guid>"
clientId: "<app-registration-client-id>"
clientSecret: "<pull from secret store>"
apiVersion: "2024-08-01"
locale: "en-US"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
cursorOverlapMinutes: 10
downloadCvrf: false # set true to persist CVRF ZIP alongside JSON detail
```
## 4. CVRF artefacts
- The MSRC REST payload exposes `cvrfUrl` per advisory. The current connector persists the link as advisory metadata and a reference; it does **not** download the ZIP by default.
- Ops should mirror CVRF ZIPs when preparing Offline Kits so air-gapped deployments can reconcile advisories without direct internet access.
- Once Offline Kit storage guidelines are finalised, extend the connector configuration with `downloadCvrf: true` to enable automatic attachment retrieval.
### 4.1 State seeding helper
Use `src/Tools/SourceStateSeeder` to queue historical advisories (detail JSON + optional CVRF artefacts) for replay without manual Mongo edits. Example seed file:
```json
{
"source": "vndr.msrc",
"cursor": {
"lastModifiedCursor": "2024-01-01T00:00:00Z"
},
"documents": [
{
"uri": "https://api.msrc.microsoft.com/sug/v2.0/vulnerability/ADV2024-0001",
"contentFile": "./seeds/adv2024-0001.json",
"contentType": "application/json",
"metadata": { "msrc.vulnerabilityId": "ADV2024-0001" },
"addToPendingDocuments": true
},
{
"uri": "https://download.microsoft.com/msrc/2024/ADV2024-0001.cvrf.zip",
"contentFile": "./seeds/adv2024-0001.cvrf.zip",
"contentType": "application/zip",
"status": "mapped",
"addToPendingDocuments": false
}
]
}
```
Run the helper:
```bash
dotnet run --project src/Tools/SourceStateSeeder -- \
--connection-string "mongodb://localhost:27017" \
--database concelier \
--input seeds/msrc-backfill.json
```
Any documents marked `addToPendingDocuments` will appear in the connector cursor; `DownloadCvrf` can remain disabled if the ZIP artefact is pre-seeded.
## 5. Outstanding items
- Ops to confirm tenant/app names and provide client credentials through the secure channel.
- Connector team monitors token cache health (already implemented); validate instrumentation once Ops supplies credentials.
- Offline Kit packaging: add encrypted blob containing client credentials with rotation instructions.

# NKCKI Connector Operations Guide
## Overview
The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring.
## Configuration
Key options exposed through `concelier:sources:ru-nkcki:http`:
- `maxBulletinsPerFetch`: limits new bulletin downloads in a single run (default `5`).
- `maxListingPagesPerFetch`: maximum listing pages visited during pagination (default `3`).
- `listingCacheDuration`: minimum interval between listing fetches before falling back to cached artefacts (default `00:10:00`).
- `cacheDirectory`: optional path for persisted bulletin archives used during offline or failure scenarios.
- `requestDelay`: delay inserted between bulletin downloads to respect upstream politeness.
When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/concelier/cache/ru-nkcki`) and pre-populate bulletin archives via the offline kit.
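Putting those options together, a configuration sketch for an offline-first deployment might look like this (values mirror the defaults above; `requestDelay` is an illustrative choice):

```yaml
concelier:
  sources:
    ru-nkcki:
      http:
        maxBulletinsPerFetch: 5
        maxListingPagesPerFetch: 3
        listingCacheDuration: "00:10:00"
        cacheDirectory: "/var/lib/concelier/cache/ru-nkcki"
        requestDelay: "00:00:01"   # illustrative; tune for upstream politeness
```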
## Telemetry
`RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Concelier.Connector.Ru.Nkcki`:
- `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures`
- `nkcki.listing.pages.visited` (histogram, `pages`)
- `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new`
- `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures`
- `nkcki.entries.processed` (histogram, `entries`)
Integrate these counters into standard Concelier observability dashboards to track crawl coverage and cache hit rates.
## Archive Backfill Strategy
Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy:
1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms.
2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked.
3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`.
4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in `docs/concelier-connector-research-20251011.md`).
For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs.
## Failure Handling
- Listing failures mark the source state with exponential backoff while attempting cache replay.
- Bulletin fetches fall back to cached copies before surfacing an error.
- Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`src/Tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros.
Refer to `ru-nkcki` entries in `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md` for outstanding items.

# Concelier OSV Connector Operations Notes
_Last updated: 2025-10-16_
The OSV connector ingests advisories from OSV.dev across OSS ecosystems. This note highlights the additional merge/export expectations introduced with the canonical metric fallback work in Sprint 4.
## 1. Canonical metric fallbacks
- When OSV omits CVSS vectors (common for CVSS v4-only payloads) the mapper now emits a deterministic canonical metric id in the form `osv:severity/<level>` and normalises the advisory severity to the same `<level>`.
- Metric: `osv.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `ecosystem`, `reason=no_cvss`. Watch this alongside merge parity dashboards to catch spikes where OSV publishes severity-only advisories.
- Merge precedence still prefers GHSA over OSV; the shared severity-based canonical id keeps Merge/export parity deterministic even when only OSV supplies severity data.
## 2. CWE provenance
- `database_specific.cwe_ids` now populates provenance decision reasons for every mapped weakness. Expect `decisionReason="database_specific.cwe_ids"` on OSV weakness provenance and confirm exporters preserve the value.
- If OSV ever attaches `database_specific.cwe_notes`, the connector will surface the joined note string in `decisionReason` instead of the default marker.
## 3. Dashboards & alerts
- Extend existing merge dashboards with the new counter:
- Overlay `sum(osv.map.canonical_metric_fallbacks{ecosystem=~".+"})` with Merge severity overrides to confirm fallback advisories are reconciling cleanly.
- Alert when the 1-hour sum exceeds 50 for any ecosystem; baseline volume is currently <5 per day (mostly GHSA mirrors emitting CVSS v4 only).
- Exporters already surface `canonicalMetricId`; no schema change is required, but ORAS/Trivy bundles should be spot-checked after deploying the connector update.
## 4. Runbook updates
- Fixture parity suites (`osv-ghsa.*`) now assert the fallback id and provenance notes. Regenerate via `dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj`.
- When investigating merge severity conflicts, include the fallback counter and confirm OSV advisories carry the expected `osv:severity/<level>` id before raising connector bugs.

# Concelier & Excititor Mirror Operations
This runbook describes how StellaOps operates the managed mirrors under `*.stella-ops.org`.
It covers Docker Compose and Helm deployment overlays, secret handling for multi-tenant
authn, CDN fronting, and the recurring sync pipeline that keeps mirror bundles current.
## 1. Prerequisites
- **Authority access**: client credentials (`client_id` + secret) authorised for
  `concelier.mirror.read` and `excititor.mirror.read` scopes. Secrets live outside git.
- **Signed TLS certificates**: wildcard or per-domain (`mirror-primary`, `mirror-community`).
  Store them under `deploy/compose/mirror-gateway/tls/` or in Kubernetes secrets.
- **Mirror gateway credentials**: Basic Auth htpasswd files per domain. Generate with
  `htpasswd -B`. Operators distribute credentials to downstream consumers.
- **Export artifact source**: read access to the canonical S3 buckets (or rsync share)
  that hold `concelier` JSON bundles and `excititor` VEX exports.
- **Persistent volumes**: storage for Concelier job metadata and mirror export trees.
  For Helm, provision PVCs (`concelier-mirror-jobs`, `concelier-mirror-exports`,
  `excititor-mirror-exports`, `mirror-mongo-data`, `mirror-minio-data`) before rollout.
### 1.1 Service configuration quick reference
Concelier.WebService exposes the mirror HTTP endpoints once `CONCELIER__MIRROR__ENABLED=true`.
Key knobs:
- `CONCELIER__MIRROR__EXPORTROOT`: root folder containing export snapshots (`<exportId>/mirror/*`).
- `CONCELIER__MIRROR__ACTIVEEXPORTID`: optional explicit export id; otherwise the service falls back to the `latest/` symlink or the newest directory.
- `CONCELIER__MIRROR__REQUIREAUTHENTICATION`: default auth requirement; override per domain with `CONCELIER__MIRROR__DOMAINS__{n}__REQUIREAUTHENTICATION`.
- `CONCELIER__MIRROR__MAXINDEXREQUESTSPERHOUR`: budget for `/concelier/exports/index.json`. Domains inherit this value unless they define `__MAXDOWNLOADREQUESTSPERHOUR`.
- `CONCELIER__MIRROR__DOMAINS__{n}__ID`: domain identifier matching the exporter manifest; additional keys configure the display name and rate budgets.
> The service honours Stella Ops Authority when `CONCELIER__AUTHORITY__ENABLED=true` and `ALLOWANONYMOUSFALLBACK=false`. Use the bypass CIDR list (`CONCELIER__AUTHORITY__BYPASSNETWORKS__*`) for in-cluster ingress gateways that terminate Basic Auth. Unauthorized requests emit `WWW-Authenticate: Bearer` so downstream automation can detect token failures.
Mirror responses carry deterministic cache headers: `/index.json` returns `Cache-Control: public, max-age=60`, while per-domain manifests/bundles include `Cache-Control: public, max-age=300, immutable`. Rate limiting surfaces `Retry-After` when quotas are exceeded.
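Taken together, a minimal single-domain environment snippet might look like the following (rate budgets and the domain id are illustrative; `ACTIVEEXPORTID` is omitted so the service falls back to `latest/`):

```bash
CONCELIER__MIRROR__ENABLED=true
CONCELIER__MIRROR__EXPORTROOT=/exports/json
CONCELIER__MIRROR__REQUIREAUTHENTICATION=true
CONCELIER__MIRROR__MAXINDEXREQUESTSPERHOUR=600                    # illustrative budget
CONCELIER__MIRROR__DOMAINS__0__ID=primary
CONCELIER__MIRROR__DOMAINS__0__MAXDOWNLOADREQUESTSPERHOUR=1200    # illustrative budget
```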
### 1.2 Mirror connector configuration
Downstream Concelier instances ingest published bundles using the `StellaOpsMirrorConnector`. Operators running the connector in air-gapped or limited-connectivity environments can tune the following options (environment prefix `CONCELIER__SOURCES__STELLAOPSMIRROR__`):
- `BASEADDRESS`: absolute mirror root (e.g., `https://mirror-primary.stella-ops.org`).
- `INDEXPATH`: relative path to the mirror index (`/concelier/exports/index.json` by default).
- `DOMAINID`: mirror domain identifier from the index (`primary`, `community`, etc.).
- `HTTPTIMEOUT`: request timeout; raise it when mirrors sit behind slow WAN links.
- `SIGNATURE__ENABLED`: require detached JWS verification for `bundle.json`.
- `SIGNATURE__KEYID` / `SIGNATURE__PROVIDER`: expected signing key metadata.
- `SIGNATURE__PUBLICKEYPATH`: PEM fallback used when the mirror key registry is offline.
The connector keeps a per-export fingerprint (bundle digest + generated-at timestamp) and tracks outstanding document IDs. If a scan is interrupted, the next run resumes parse/map work using the stored fingerprint and pending document lists—no network requests are reissued unless the upstream digest changes.
## 2. Secret & certificate layout
### Docker Compose (`deploy/compose/docker-compose.mirror.yaml`)
- `deploy/compose/env/mirror.env.example`: copy to `.env` and adjust quotas or domain IDs.
- `deploy/compose/mirror-secrets/`: mount read-only into `/run/secrets`. Place:
  - `concelier-authority-client`: Authority client secret.
  - `excititor-authority-client` (optional): reserve for future authn.
- `deploy/compose/mirror-gateway/tls/`: PEM-encoded cert/key pairs:
  - `mirror-primary.crt`, `mirror-primary.key`
  - `mirror-community.crt`, `mirror-community.key`
- `deploy/compose/mirror-gateway/secrets/`: htpasswd files:
  - `mirror-primary.htpasswd`
  - `mirror-community.htpasswd`
### Helm (`deploy/helm/stellaops/values-mirror.yaml`)
Create secrets in the target namespace:
```bash
kubectl create secret generic concelier-mirror-auth \
--from-file=concelier-authority-client=concelier-authority-client
kubectl create secret generic excititor-mirror-auth \
--from-file=excititor-authority-client=excititor-authority-client
kubectl create secret tls mirror-gateway-tls \
--cert=mirror-primary.crt --key=mirror-primary.key
kubectl create secret generic mirror-gateway-htpasswd \
--from-file=mirror-primary.htpasswd --from-file=mirror-community.htpasswd
```
> Keep Basic Auth lists short-lived (rotate quarterly) and document credential recipients.
## 3. Deployment
### 3.1 Docker Compose (edge mirrors, lab validation)
1. `cp deploy/compose/env/mirror.env.example deploy/compose/env/mirror.env`
2. Populate secrets/tls directories as described above.
3. Sync mirror bundles (see §4) into `deploy/compose/mirror-data/…` and ensure they are mounted
on the host path backing the `concelier-exports` and `excititor-exports` volumes.
4. Run the profile validator: `deploy/tools/validate-profiles.sh`.
5. Launch: `docker compose --env-file env/mirror.env -f docker-compose.mirror.yaml up -d`.
### 3.2 Helm (production mirrors)
1. Provision PVCs sized for mirror bundles (baseline: 20GiB per domain).
2. Create secrets/tls config maps (§2).
3. `helm upgrade --install mirror deploy/helm/stellaops -f deploy/helm/stellaops/values-mirror.yaml`.
4. Annotate the `stellaops-mirror-gateway` service with ingress/LoadBalancer metadata required by
your CDN (e.g., AWS load balancer scheme internal + NLB idle timeout).
## 4. Artifact sync workflow
Mirrors never generate exports—they ingest signed bundles produced by the Concelier and Excititor
export jobs. Recommended sync pattern:
### 4.1 Compose host (systemd timer)
`/usr/local/bin/mirror-sync.sh`:
```bash
#!/usr/bin/env bash
set -euo pipefail
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
aws s3 sync s3://mirror-stellaops/concelier/latest \
/opt/stellaops/mirror-data/concelier --delete --size-only
aws s3 sync s3://mirror-stellaops/excititor/latest \
/opt/stellaops/mirror-data/excititor --delete --size-only
```
Schedule with a systemd timer every 5 minutes. The Compose volumes mount `/opt/stellaops/mirror-data/*`
into the containers read-only, matching `CONCELIER__MIRROR__EXPORTROOT=/exports/json` and
`EXCITITOR__ARTIFACTS__FILESYSTEM__ROOT=/exports`.
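A matching pair of systemd units for that cadence could look like the following sketch (unit and file names are illustrative; install them under `/etc/systemd/system/` and enable with `systemctl enable --now mirror-sync.timer`):

```ini
# /etc/systemd/system/mirror-sync.service  (illustrative name)
[Unit]
Description=Sync StellaOps mirror bundles from object storage

[Service]
Type=oneshot
ExecStart=/usr/local/bin/mirror-sync.sh

# /etc/systemd/system/mirror-sync.timer  (illustrative name)
[Unit]
Description=Run mirror-sync every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
```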
### 4.2 Kubernetes (CronJob)
Create a CronJob running the AWS CLI (or rclone) in the same namespace, writing into the PVCs:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: mirror-sync
spec:
schedule: "*/5 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: sync
image: public.ecr.aws/aws-cli/aws-cli@sha256:5df5f52c29f5e3ba46d0ad9e0e3afc98701c4a0f879400b4c5f80d943b5fadea
command:
- /bin/sh
- -c
- >
aws s3 sync s3://mirror-stellaops/concelier/latest /exports/concelier --delete --size-only &&
aws s3 sync s3://mirror-stellaops/excititor/latest /exports/excititor --delete --size-only
volumeMounts:
- name: concelier-exports
mountPath: /exports/concelier
- name: excititor-exports
mountPath: /exports/excititor
envFrom:
- secretRef:
name: mirror-sync-aws
restartPolicy: OnFailure
volumes:
- name: concelier-exports
persistentVolumeClaim:
claimName: concelier-mirror-exports
- name: excititor-exports
persistentVolumeClaim:
claimName: excititor-mirror-exports
```
## 5. CDN integration
1. Point the CDN origin at the mirror gateway (Compose host or Kubernetes LoadBalancer).
2. Honour the response headers emitted by the gateway and Concelier/Excititor:
`Cache-Control: public, max-age=300, immutable` for mirror payloads.
3. Configure origin shields in the CDN to prevent cache stampedes. Recommended TTLs:
- Index (`/concelier/exports/index.json`, `/excititor/mirror/*/index`) → 60s.
- Bundle/manifest payloads → 300s.
4. Forward the `Authorization` header—Basic Auth terminates at the gateway.
5. Enforce per-domain rate limits at the CDN (matching gateway budgets) and enable logging
to SIEM for anomaly detection.
## 6. Smoke tests
After each deployment or sync cycle (temporarily set low budgets if you need to observe 429 responses):
```bash
# Index with Basic Auth
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/index.json | jq 'keys'
# Mirror manifest signature and cache headers
curl -u $PRIMARY_CREDS -I https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/manifest.json \
| tee /tmp/manifest-headers.txt
grep -E '^Cache-Control: ' /tmp/manifest-headers.txt # expect public, max-age=300, immutable
# Excititor consensus bundle metadata
curl -u $COMMUNITY_CREDS https://mirror-community.stella-ops.org/excititor/mirror/community/index \
| jq '.exports[].exportKey'
# Signed bundle + detached JWS (spot check digests)
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/bundle.json.jws \
-o bundle.json.jws
cosign verify-blob --signature bundle.json.jws --key mirror-key.pub bundle.json
# Service-level auth check (inside the cluster, no gateway credentials)
kubectl exec deploy/stellaops-concelier -- curl -si http://localhost:8443/concelier/exports/mirror/primary/manifest.json \
| head -n 5 # expect HTTP/1.1 401 with WWW-Authenticate: Bearer
# Rate limit smoke (repeat quickly; second call should return 429 + Retry-After)
for i in 1 2; do
curl -s -o /dev/null -D - https://mirror-primary.stella-ops.org/concelier/exports/index.json \
-u $PRIMARY_CREDS | grep -E '^(HTTP/|Retry-After:)'
sleep 1
done
```
Watch the gateway metrics (`nginx_vts` or access logs) for cache hits. In Kubernetes, `kubectl logs deploy/stellaops-mirror-gateway`
should show `X-Cache-Status: HIT/MISS`.
## 7. Maintenance & rotation
- **Bundle freshness**: alert if sync job lag exceeds 15 minutes or if `concelier` logs
  `Mirror export root is not configured`.
- **Secret rotation**: change Authority client secrets and Basic Auth credentials quarterly.
  Update the mounted secrets and restart deployments (`docker compose restart concelier` or
  `kubectl rollout restart deploy/stellaops-concelier`).
- **TLS renewal**: reissue certificates, place the new files, and reload the gateway (`docker compose exec mirror-gateway nginx -s reload`).
- **Quota tuning**: adjust per-domain `MAXDOWNLOADREQUESTSPERHOUR` in `.env` or the values file.
  Align CDN rate limits and inform downstreams.
## 8. References
- Deployment profiles: `deploy/compose/docker-compose.mirror.yaml`,
`deploy/helm/stellaops/values-mirror.yaml`
- Mirror architecture dossiers: `docs/modules/concelier/architecture.md`,
`docs/modules/excititor/mirrors.md`
- Export bundling: `docs/modules/devops/architecture.md` §3, `docs/modules/excititor/architecture.md` §7