Resolve Concelier/Excititor merge conflicts
@@ -1,97 +1,97 @@
# Authority Backup & Restore Runbook

## Scope
- **Applies to:** StellaOps Authority deployments running the official `ops/authority/docker-compose.authority.yaml` stack or equivalent Kubernetes packaging.
- **Artifacts covered:** MongoDB (`stellaops-authority` database), Authority configuration (`etc/authority.yaml`), plugin manifests under `etc/authority.plugins/`, and signing key material stored in the `authority-keys` volume (defaults to `/app/keys` inside the container).
- **Frequency:** Run the full procedure prior to upgrades, before rotating keys, and at least once per 24 h in production. Store snapshots in an encrypted, access-controlled vault.

## Inventory Checklist
| Component | Location (compose default) | Notes |
| --- | --- | --- |
| Mongo data | `mongo-data` volume (`/var/lib/docker/volumes/.../mongo-data`) | Contains all Authority collections (`AuthorityUser`, `AuthorityClient`, `AuthorityToken`, etc.). |
| Configuration | `etc/authority.yaml` | Mounted read-only into the container at `/etc/authority.yaml`. |
| Plugin manifests | `etc/authority.plugins/*.yaml` | Includes `standard.yaml` with `tokenSigning.keyDirectory`. |
| Signing keys | `authority-keys` volume -> `/app/keys` | Path is derived from `tokenSigning.keyDirectory` (defaults to `../keys` relative to the manifest). |

> **TIP:** Confirm the deployed key directory via `tokenSigning.keyDirectory` in `etc/authority.plugins/standard.yaml`; some installations relocate keys to `/var/lib/stellaops/authority/keys`.

## Hot Backup (no downtime)
1. **Create output directory:** `mkdir -p backup/$(date +%Y-%m-%d)` on the host.
2. **Dump Mongo:**
   ```bash
   STAMP="$(date +%Y%m%dT%H%M%SZ)"   # capture the timestamp once so the dump and the copy reference the same archive
   docker compose -f ops/authority/docker-compose.authority.yaml exec mongo \
     mongodump --archive=/dump/authority-"$STAMP".gz \
     --gzip --db stellaops-authority
   docker compose -f ops/authority/docker-compose.authority.yaml cp \
     mongo:/dump/authority-"$STAMP".gz backup/
   ```
   The `mongodump` archive preserves indexes and can be restored with `mongorestore --archive --gzip`.
3. **Capture configuration + manifests:**
   ```bash
   cp etc/authority.yaml backup/
   rsync -a etc/authority.plugins/ backup/authority.plugins/
   ```
4. **Export signing keys:** the compose file maps `authority-keys` to a local Docker volume. Snapshot it without stopping the service:
   ```bash
   docker run --rm \
     -v authority-keys:/keys \
     -v "$(pwd)/backup:/backup" \
     busybox tar czf /backup/authority-keys-$(date +%Y%m%dT%H%M%SZ).tar.gz -C /keys .
   ```
5. **Checksum:** generate SHA-256 digests for every file and store them alongside the artefacts.
6. **Encrypt & upload:** wrap the backup folder using your secrets management standard (e.g., age, GPG) and upload to the designated offline vault.

## Cold Backup (planned downtime)
1. Notify stakeholders and drain traffic (CLI clients should refresh tokens afterwards).
2. Stop services:
   ```bash
   docker compose -f ops/authority/docker-compose.authority.yaml down
   ```
3. Back up volumes directly using `tar`:
   ```bash
   docker run --rm -v mongo-data:/data -v "$(pwd)/backup:/backup" \
     busybox tar czf /backup/mongo-data-$(date +%Y%m%d).tar.gz -C /data .
   docker run --rm -v authority-keys:/keys -v "$(pwd)/backup:/backup" \
     busybox tar czf /backup/authority-keys-$(date +%Y%m%d).tar.gz -C /keys .
   ```
4. Copy configuration + manifests as in the hot backup (steps 3–6).
5. Restart services and verify health:
   ```bash
   docker compose -f ops/authority/docker-compose.authority.yaml up -d
   curl -fsS http://localhost:8080/ready
   ```

## Restore Procedure
1. **Provision clean volumes:** remove existing volumes if you’re rebuilding a node (`docker volume rm mongo-data authority-keys`), then recreate the compose stack so empty volumes exist.
2. **Restore Mongo:**
   ```bash
   docker compose exec -T mongo mongorestore --archive --gzip --drop < backup/authority-YYYYMMDDTHHMMSSZ.gz
   ```
   Use `--drop` to replace collections; omit it if doing a partial restore.
3. **Restore configuration/manifests:** copy `authority.yaml` and `authority.plugins/*` into place before starting the Authority container.
4. **Restore signing keys:** untar into the mounted volume:
   ```bash
   docker run --rm -v authority-keys:/keys -v "$(pwd)/backup:/backup" \
     busybox tar xzf /backup/authority-keys-YYYYMMDD.tar.gz -C /keys
   ```
   Ensure file permissions remain `600` for private keys (`chmod -R 600`).
5. **Start services & validate:**
   ```bash
   docker compose up -d
   curl -fsS http://localhost:8080/health
   ```
6. **Validate JWKS and tokens:** call `/jwks` and issue a short-lived token via the CLI to confirm key material matches expectations. If the restored environment requires a fresh signing key, follow the rotation SOP in [`docs/11_AUTHORITY.md`](../11_AUTHORITY.md) using `ops/authority/key-rotation.sh` to invoke `/internal/signing/rotate`.

## Disaster Recovery Notes
- **Air-gapped replication:** replicate archives via the Offline Update Kit transport channels; never attach USB devices without scanning.
- **Retention:** maintain 30 daily snapshots + 12 monthly archival copies (a pruning sketch follows this list). Rotate encryption keys annually.
- **Key compromise:** if signing keys are suspected compromised, restore from the latest clean backup, rotate via OPS3 (see `ops/authority/key-rotation.sh` and `docs/11_AUTHORITY.md`), and publish a revocation notice.
- **Mongo version:** keep dump/restore images pinned to the deployment version (compose uses `mongo:7`). Restoring across major versions requires a compatibility review.
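
The retention bullet above can be enforced with a small scheduled job. A minimal sketch, assuming daily archives land in a flat directory such as `/srv/backups/authority/daily` (path and file naming are illustrative; monthly copies are assumed to live elsewhere and are left untouched):

```bash
#!/usr/bin/env bash
# Prune daily Authority backup artefacts older than 30 days.
# DAILY_DIR is an assumed layout; adjust it to your vault structure.
set -euo pipefail

DAILY_DIR="/srv/backups/authority/daily"

find "$DAILY_DIR" -maxdepth 1 -type f \
  \( -name 'authority-*.gz' -o -name 'authority-keys-*.tar.gz' -o -name 'mongo-data-*.tar.gz' \) \
  -mtime +30 -print -delete
```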

## Verification Checklist
- [ ] `/ready` reports all identity providers ready.
- [ ] OAuth flows issue tokens signed by the restored keys.
- [ ] `PluginRegistrationSummary` logs expected providers on startup.
- [ ] Revocation manifest export (`dotnet run --project src/StellaOps.Authority`) succeeds.
- [ ] Monitoring dashboards show metrics resuming (see OPS5 deliverables).

# Authority Backup & Restore Runbook

## Scope
- **Applies to:** StellaOps Authority deployments running the official `ops/authority/docker-compose.authority.yaml` stack or equivalent Kubernetes packaging.
- **Artifacts covered:** MongoDB (`stellaops-authority` database), Authority configuration (`etc/authority.yaml`), plugin manifests under `etc/authority.plugins/`, and signing key material stored in the `authority-keys` volume (defaults to `/app/keys` inside the container).
- **Frequency:** Run the full procedure prior to upgrades, before rotating keys, and at least once per 24 h in production. Store snapshots in an encrypted, access-controlled vault.

## Inventory Checklist
| Component | Location (compose default) | Notes |
| --- | --- | --- |
| Mongo data | `mongo-data` volume (`/var/lib/docker/volumes/.../mongo-data`) | Contains all Authority collections (`AuthorityUser`, `AuthorityClient`, `AuthorityToken`, etc.). |
| Configuration | `etc/authority.yaml` | Mounted read-only into the container at `/etc/authority.yaml`. |
| Plugin manifests | `etc/authority.plugins/*.yaml` | Includes `standard.yaml` with `tokenSigning.keyDirectory`. |
| Signing keys | `authority-keys` volume -> `/app/keys` | Path is derived from `tokenSigning.keyDirectory` (defaults to `../keys` relative to the manifest). |

> **TIP:** Confirm the deployed key directory via `tokenSigning.keyDirectory` in `etc/authority.plugins/standard.yaml`; some installations relocate keys to `/var/lib/stellaops/authority/keys`.

## Hot Backup (no downtime)
1. **Create output directory:** `mkdir -p backup/$(date +%Y-%m-%d)` on the host.
2. **Dump Mongo:**
   ```bash
   STAMP="$(date +%Y%m%dT%H%M%SZ)"   # capture the timestamp once so the dump and the copy reference the same archive
   docker compose -f ops/authority/docker-compose.authority.yaml exec mongo \
     mongodump --archive=/dump/authority-"$STAMP".gz \
     --gzip --db stellaops-authority
   docker compose -f ops/authority/docker-compose.authority.yaml cp \
     mongo:/dump/authority-"$STAMP".gz backup/
   ```
   The `mongodump` archive preserves indexes and can be restored with `mongorestore --archive --gzip`.
3. **Capture configuration + manifests:**
   ```bash
   cp etc/authority.yaml backup/
   rsync -a etc/authority.plugins/ backup/authority.plugins/
   ```
4. **Export signing keys:** the compose file maps `authority-keys` to a local Docker volume. Snapshot it without stopping the service:
   ```bash
   docker run --rm \
     -v authority-keys:/keys \
     -v "$(pwd)/backup:/backup" \
     busybox tar czf /backup/authority-keys-$(date +%Y%m%dT%H%M%SZ).tar.gz -C /keys .
   ```
5. **Checksum:** generate SHA-256 digests for every file and store them alongside the artefacts.
6. **Encrypt & upload:** wrap the backup folder using your secrets management standard (e.g., age, GPG) and upload to the designated offline vault (a combined checksum and encryption sketch follows this list).
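
A minimal sketch covering steps 5–6, assuming the `backup/<date>` folder from step 1 and an `age` recipient key (`AGE_RECIPIENT` is a placeholder; substitute GPG or your own tooling as your standard requires):

```bash
#!/usr/bin/env bash
# Step 5: write SHA-256 digests next to the artefacts.
# Step 6: encrypt the whole folder as a single archive before upload.
set -euo pipefail

BACKUP_DIR="backup/$(date +%Y-%m-%d)"
AGE_RECIPIENT="age1exampleexampleexample"   # placeholder public key for your vault

# Digest every artefact except the digest file itself.
( cd "$BACKUP_DIR" && find . -maxdepth 1 -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS )

# Bundle and encrypt; upload the resulting .age file via your approved channel.
tar czf - -C "$BACKUP_DIR" . | age -r "$AGE_RECIPIENT" -o "$BACKUP_DIR.tar.gz.age"
```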

## Cold Backup (planned downtime)
1. Notify stakeholders and drain traffic (CLI clients should refresh tokens afterwards).
2. Stop services:
   ```bash
   docker compose -f ops/authority/docker-compose.authority.yaml down
   ```
3. Back up volumes directly using `tar`:
   ```bash
   docker run --rm -v mongo-data:/data -v "$(pwd)/backup:/backup" \
     busybox tar czf /backup/mongo-data-$(date +%Y%m%d).tar.gz -C /data .
   docker run --rm -v authority-keys:/keys -v "$(pwd)/backup:/backup" \
     busybox tar czf /backup/authority-keys-$(date +%Y%m%d).tar.gz -C /keys .
   ```
4. Copy configuration + manifests as in the hot backup (steps 3–6).
5. Restart services and verify health:
   ```bash
   docker compose -f ops/authority/docker-compose.authority.yaml up -d
   curl -fsS http://localhost:8080/ready
   ```

## Restore Procedure
1. **Provision clean volumes:** remove existing volumes if you’re rebuilding a node (`docker volume rm mongo-data authority-keys`), then recreate the compose stack so empty volumes exist.
2. **Restore Mongo:**
   ```bash
   docker compose exec -T mongo mongorestore --archive --gzip --drop < backup/authority-YYYYMMDDTHHMMSSZ.gz
   ```
   Use `--drop` to replace collections; omit it if doing a partial restore.
3. **Restore configuration/manifests:** copy `authority.yaml` and `authority.plugins/*` into place before starting the Authority container.
4. **Restore signing keys:** untar into the mounted volume:
   ```bash
   docker run --rm -v authority-keys:/keys -v "$(pwd)/backup:/backup" \
     busybox tar xzf /backup/authority-keys-YYYYMMDD.tar.gz -C /keys
   ```
   Ensure file permissions remain `600` for private keys (`chmod -R 600`).
5. **Start services & validate:**
   ```bash
   docker compose up -d
   curl -fsS http://localhost:8080/health
   ```
6. **Validate JWKS and tokens:** call `/jwks` and issue a short-lived token via the CLI to confirm key material matches expectations. If the restored environment requires a fresh signing key, follow the rotation SOP in [`docs/11_AUTHORITY.md`](../11_AUTHORITY.md) using `ops/authority/key-rotation.sh` to invoke `/internal/signing/rotate`.
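
A quick smoke check for step 6, assuming the stack publishes Authority on `localhost:8080` as in the compose file (the scope below is a placeholder; use one your client is actually registered for):

```bash
# Confirm the restored keys are being served and note their key IDs.
curl -fsS http://localhost:8080/jwks | jq -r '.keys[] | "\(.kid) \(.alg)"'

# Issue a short-lived token with the CLI and confirm it is accepted downstream.
stella auth login --scope <your-scope>
```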

## Disaster Recovery Notes
- **Air-gapped replication:** replicate archives via the Offline Update Kit transport channels; never attach USB devices without scanning.
- **Retention:** maintain 30 daily snapshots + 12 monthly archival copies. Rotate encryption keys annually.
- **Key compromise:** if signing keys are suspected compromised, restore from the latest clean backup, rotate via OPS3 (see `ops/authority/key-rotation.sh` and `docs/11_AUTHORITY.md`), and publish a revocation notice.
- **Mongo version:** keep dump/restore images pinned to the deployment version (compose uses `mongo:7`). Driver 3.5.0 requires MongoDB **4.2+**; clusters still on 4.0 must be upgraded before restore, and future driver releases will drop 4.0 support entirely.

## Verification Checklist
- [ ] `/ready` reports all identity providers ready.
- [ ] OAuth flows issue tokens signed by the restored keys.
- [ ] `PluginRegistrationSummary` logs expected providers on startup.
- [ ] Revocation manifest export (`dotnet run --project src/StellaOps.Authority`) succeeds.
- [ ] Monitoring dashboards show metrics resuming (see OPS5 deliverables).

@@ -1,77 +1,77 @@

# Feedser Apple Security Update Connector Operations

This runbook covers staging and production rollout for the Apple security updates connector (`source:vndr-apple:*`), including observability checks and fixture maintenance.

## 1. Prerequisites

- Network egress (or mirrored cache) for `https://gdmf.apple.com/v2/pmv` and the Apple Support domain (`https://support.apple.com/`).
- Optional: corporate proxy exclusions for the Apple hosts if outbound traffic is normally filtered.
- Updated configuration (environment variables or `feedser.yaml`) with an `apple` section. Example baseline:

```yaml
feedser:
  sources:
    apple:
      softwareLookupUri: "https://gdmf.apple.com/v2/pmv"
      advisoryBaseUri: "https://support.apple.com/"
      localeSegment: "en-us"
      maxAdvisoriesPerFetch: 25
      initialBackfill: "120.00:00:00"
      modifiedTolerance: "02:00:00"
      failureBackoff: "00:05:00"
```

> ℹ️  `softwareLookupUri` and `advisoryBaseUri` must stay absolute and aligned with the HTTP allow-list; Feedser automatically adds both hosts to the connector HttpClient.

## 2. Staging Smoke Test

1. Deploy the configuration and restart the Feedser workers to ensure the Apple connector options are bound.
2. Trigger a full connector cycle:
   - CLI: `stella db jobs run source:vndr-apple:fetch --and-then source:vndr-apple:parse --and-then source:vndr-apple:map`
   - REST: `POST /jobs/run { "kind": "source:vndr-apple:fetch", "chain": ["source:vndr-apple:parse", "source:vndr-apple:map"] }`
3. Validate metrics exported under meter `StellaOps.Feedser.Source.Vndr.Apple`:
   - `apple.fetch.items` (documents fetched)
   - `apple.fetch.failures`
   - `apple.fetch.unchanged`
   - `apple.parse.failures`
   - `apple.map.affected.count` (histogram of affected package counts)
4. Cross-check the shared HTTP counters:
   - `feedser.source.http.requests_total{feedser_source="vndr-apple"}` should increase for both index and detail phases.
   - `feedser.source.http.failures_total{feedser_source="vndr-apple"}` should remain flat (0) during a healthy run.
5. Inspect the info logs:
   - `Apple software index fetch … processed=X newDocuments=Y`
   - `Apple advisory parse complete … aliases=… affected=…`
   - `Mapped Apple advisory … pendingMappings=0`
6. Confirm MongoDB state:
   - `raw_documents` store contains the HT article HTML with metadata (`apple.articleId`, `apple.postingDate`).
   - `dtos` store has `schemaVersion="apple.security.update.v1"`.
   - `advisories` collection includes keys `HTxxxxxx` with normalized SemVer rules.
   - `source_states` entry for `apple` shows a recent `cursor.lastPosted`.

## 3. Production Monitoring

- **Dashboards** – Add the following expressions to your Feedser Grafana board (OTLP/Prometheus naming assumed):
  - `rate(apple_fetch_items_total[15m])` vs `rate(feedser_source_http_requests_total{feedser_source="vndr-apple"}[15m])`
  - `rate(apple_fetch_failures_total[5m])` for error spikes (`severity=warning` at `>0`)
  - `histogram_quantile(0.95, rate(apple_map_affected_count_bucket[1h]))` to watch affected-package fan-out
  - `increase(apple_parse_failures_total[6h])` to catch parser drift (alerts at `>0`)
- **Alerts** – Page if `rate(apple_fetch_items_total[2h]) == 0` during business hours while other connectors are active. This often indicates lookup feed failures or misconfigured allow-lists.
- **Logs** – Surface warnings `Apple document {DocumentId} missing GridFS payload` or `Apple parse failed`—repeated hits imply storage issues or HTML regressions.
- **Telemetry pipeline** – `StellaOps.Feedser.WebService` now exports `StellaOps.Feedser.Source.Vndr.Apple` alongside existing Feedser meters; ensure your OTEL collector or Prometheus scraper includes it.

## 4. Fixture Maintenance

Regression fixtures live under `src/StellaOps.Feedser.Source.Vndr.Apple.Tests/Apple/Fixtures`. Refresh them whenever Apple reshapes the HT layout or when new platforms appear.

1. Run the helper script matching your platform:
   - Bash: `./scripts/update-apple-fixtures.sh`
   - PowerShell: `./scripts/update-apple-fixtures.ps1`
2. Each script exports `UPDATE_APPLE_FIXTURES=1`, updates the `WSLENV` passthrough, and touches `.update-apple-fixtures` so WSL+VS Code test runs observe the flag. The subsequent test execution fetches the live HT articles listed in `AppleFixtureManager`, sanitises the HTML, and rewrites the `.expected.json` DTO snapshots.
3. Review the diff for localisation or nav noise. Once satisfied, re-run the tests without the env var (`dotnet test src/StellaOps.Feedser.Source.Vndr.Apple.Tests/StellaOps.Feedser.Source.Vndr.Apple.Tests.csproj`) to verify determinism.
4. Commit fixture updates together with any parser/mapping changes that motivated them.

## 5. Known Issues & Follow-up Tasks

- Apple occasionally throttles anonymous requests after bursts. The connector backs off automatically, but persistent `apple.fetch.failures` spikes might require mirroring the HT content or scheduling wider fetch windows.
- Rapid Security Responses may appear before the general patch notes surface in the lookup JSON. When that happens, the fetch run will log `detailFailures>0`. Collect sample HTML and refresh fixtures to confirm parser coverage.
- Multi-locale content is still under regression sweep (`src/StellaOps.Feedser.Source.Vndr.Apple/TASKS.md`). Capture non-`en-us` snapshots once the fixture tooling stabilises.

# Concelier Apple Security Update Connector Operations

This runbook covers staging and production rollout for the Apple security updates connector (`source:vndr-apple:*`), including observability checks and fixture maintenance.

## 1. Prerequisites

- Network egress (or mirrored cache) for `https://gdmf.apple.com/v2/pmv` and the Apple Support domain (`https://support.apple.com/`).
- Optional: corporate proxy exclusions for the Apple hosts if outbound traffic is normally filtered.
- Updated configuration (environment variables or `concelier.yaml`) with an `apple` section. Example baseline:

```yaml
concelier:
  sources:
    apple:
      softwareLookupUri: "https://gdmf.apple.com/v2/pmv"
      advisoryBaseUri: "https://support.apple.com/"
      localeSegment: "en-us"
      maxAdvisoriesPerFetch: 25
      initialBackfill: "120.00:00:00"
      modifiedTolerance: "02:00:00"
      failureBackoff: "00:05:00"
```

> ℹ️  `softwareLookupUri` and `advisoryBaseUri` must stay absolute and aligned with the HTTP allow-list; Concelier automatically adds both hosts to the connector HttpClient.

## 2. Staging Smoke Test

1. Deploy the configuration and restart the Concelier workers to ensure the Apple connector options are bound.
2. Trigger a full connector cycle:
   - CLI: `stella db jobs run source:vndr-apple:fetch --and-then source:vndr-apple:parse --and-then source:vndr-apple:map`
   - REST: `POST /jobs/run { "kind": "source:vndr-apple:fetch", "chain": ["source:vndr-apple:parse", "source:vndr-apple:map"] }` (see the curl sketch at the end of this section)
3. Validate metrics exported under meter `StellaOps.Concelier.Connector.Vndr.Apple`:
   - `apple.fetch.items` (documents fetched)
   - `apple.fetch.failures`
   - `apple.fetch.unchanged`
   - `apple.parse.failures`
   - `apple.map.affected.count` (histogram of affected package counts)
4. Cross-check the shared HTTP counters:
   - `concelier.source.http.requests_total{concelier_source="vndr-apple"}` should increase for both index and detail phases.
   - `concelier.source.http.failures_total{concelier_source="vndr-apple"}` should remain flat (0) during a healthy run.
5. Inspect the info logs:
   - `Apple software index fetch … processed=X newDocuments=Y`
   - `Apple advisory parse complete … aliases=… affected=…`
   - `Mapped Apple advisory … pendingMappings=0`
6. Confirm MongoDB state:
   - `raw_documents` store contains the HT article HTML with metadata (`apple.articleId`, `apple.postingDate`).
   - `dtos` store has `schemaVersion="apple.security.update.v1"`.
   - `advisories` collection includes keys `HTxxxxxx` with normalized SemVer rules.
   - `source_states` entry for `apple` shows a recent `cursor.lastPosted`.
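
For the REST trigger in step 2, a curl sketch of the chained job request (the host name and bearer token are placeholders for your environment):

```bash
curl -fsS -X POST "https://concelier.internal/jobs/run" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "kind": "source:vndr-apple:fetch",
        "chain": ["source:vndr-apple:parse", "source:vndr-apple:map"]
      }'
```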

## 3. Production Monitoring

- **Dashboards** – Add the following expressions to your Concelier Grafana board (OTLP/Prometheus naming assumed):
  - `rate(apple_fetch_items_total[15m])` vs `rate(concelier_source_http_requests_total{concelier_source="vndr-apple"}[15m])`
  - `rate(apple_fetch_failures_total[5m])` for error spikes (`severity=warning` at `>0`)
  - `histogram_quantile(0.95, rate(apple_map_affected_count_bucket[1h]))` to watch affected-package fan-out
  - `increase(apple_parse_failures_total[6h])` to catch parser drift (alerts at `>0`)
- **Alerts** – Page if `rate(apple_fetch_items_total[2h]) == 0` during business hours while other connectors are active. This often indicates lookup feed failures or misconfigured allow-lists. (A sample rule sketch follows this list.)
- **Logs** – Surface warnings `Apple document {DocumentId} missing GridFS payload` or `Apple parse failed`—repeated hits imply storage issues or HTML regressions.
- **Telemetry pipeline** – `StellaOps.Concelier.WebService` now exports `StellaOps.Concelier.Connector.Vndr.Apple` alongside existing Concelier meters; ensure your OTEL collector or Prometheus scraper includes it.
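
A sample Prometheus rule for the paging alert above, shown as a sketch (group and label names are illustrative; tune `for:` and route business-hours silencing in Alertmanager):

```yaml
groups:
  - name: concelier-apple-connector
    rules:
      - alert: ConcelierAppleFetchStalled
        expr: rate(apple_fetch_items_total[2h]) == 0
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Apple connector has fetched no advisories for 2h"
          description: "Check the lookup feed, the HTTP allow-list, and apple.fetch.failures."
```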

## 4. Fixture Maintenance

Regression fixtures live under `src/StellaOps.Concelier.Connector.Vndr.Apple.Tests/Apple/Fixtures`. Refresh them whenever Apple reshapes the HT layout or when new platforms appear.

1. Run the helper script matching your platform:
   - Bash: `./scripts/update-apple-fixtures.sh`
   - PowerShell: `./scripts/update-apple-fixtures.ps1`
2. Each script exports `UPDATE_APPLE_FIXTURES=1`, updates the `WSLENV` passthrough, and touches `.update-apple-fixtures` so WSL+VS Code test runs observe the flag. The subsequent test execution fetches the live HT articles listed in `AppleFixtureManager`, sanitises the HTML, and rewrites the `.expected.json` DTO snapshots.
3. Review the diff for localisation or nav noise. Once satisfied, re-run the tests without the env var (`dotnet test src/StellaOps.Concelier.Connector.Vndr.Apple.Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests.csproj`) to verify determinism.
4. Commit fixture updates together with any parser/mapping changes that motivated them.

## 5. Known Issues & Follow-up Tasks

- Apple occasionally throttles anonymous requests after bursts. The connector backs off automatically, but persistent `apple.fetch.failures` spikes might require mirroring the HT content or scheduling wider fetch windows.
- Rapid Security Responses may appear before the general patch notes surface in the lookup JSON. When that happens, the fetch run will log `detailFailures>0`. Collect sample HTML and refresh fixtures to confirm parser coverage.
- Multi-locale content is still under regression sweep (`src/StellaOps.Concelier.Connector.Vndr.Apple/TASKS.md`). Capture non-`en-us` snapshots once the fixture tooling stabilises.

@@ -1,150 +1,150 @@

# Feedser Authority Audit Runbook

_Last updated: 2025-10-12_

This runbook helps operators verify and monitor the StellaOps Feedser ⇆ Authority integration. It focuses on the `/jobs*` surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity.

## 1. Prerequisites

- Authority integration is enabled in `feedser.yaml` (or via `FEEDSER_AUTHORITY__*` environment variables) with a valid `clientId`, secret, audience, and required scopes.
- OTLP metrics/log exporters are configured (`feedser.telemetry.*`) or container stdout is shipped to your SIEM.
- Operators have access to the Feedser job trigger endpoints via CLI or REST for smoke tests.

### Configuration snippet

```yaml
feedser:
  authority:
    enabled: true
    allowAnonymousFallback: false          # keep true only during initial rollout
    issuer: "https://authority.internal"
    audiences:
      - "api://feedser"
    requiredScopes:
      - "feedser.jobs.trigger"
    bypassNetworks:
      - "127.0.0.1/32"
      - "::1/128"
    clientId: "feedser-jobs"
    clientSecretFile: "/run/secrets/feedser_authority_client"
    tokenClockSkewSeconds: 60
    resilience:
      enableRetries: true
      retryDelays:
        - "00:00:01"
        - "00:00:02"
        - "00:00:05"
      allowOfflineCacheFallback: true
      offlineCacheTolerance: "00:10:00"
```

> Store secrets outside source control. Feedser reads `clientSecretFile` on startup; rotate by updating the mounted file and restarting the service.

### Resilience tuning

- **Connected sites:** keep the default 1 s / 2 s / 5 s retry ladder so Feedser retries transient Authority hiccups but still surfaces outages quickly. Leave `allowOfflineCacheFallback=true` so cached discovery/JWKS data can bridge short Pathfinder restarts.
- **Air-gapped/Offline Kit installs:** extend `offlineCacheTolerance` (15–30 minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (`enableRetries=false`) if infrastructure teams prefer to handle exponential backoff at the network layer; Feedser will fail fast but keep deterministic logs.
- Feedser resolves these knobs through `IOptionsMonitor<StellaOpsAuthClientOptions>`. Edits to `feedser.yaml` are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled.

## 2. Key Signals

### 2.1 Audit log channel

Feedser emits structured audit entries via the `Feedser.Authorization.Audit` logger for every `/jobs*` request once Authority enforcement is active.

```
Feedser authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=feedser-cli scopes=feedser.jobs.trigger bypass=False remote=10.1.4.7
```

| Field      | Sample value           | Meaning |
|------------|------------------------|---------|
| `route`    | `/jobs/definitions`    | Endpoint that processed the request. |
| `status`   | `200` / `401` / `409`  | Final HTTP status code returned to the caller. |
| `subject`  | `ops@example.com`      | User or service principal subject (falls back to `(anonymous)` when unauthenticated). |
| `clientId` | `feedser-cli`          | OAuth client ID provided by Authority (`(none)` if the token lacked the claim). |
| `scopes`   | `feedser.jobs.trigger` | Normalised scope list extracted from token claims; `(none)` if the token carried none. |
| `bypass`   | `True` / `False`       | Indicates whether the request succeeded because its source IP matched a bypass CIDR. |
| `remote`   | `10.1.4.7`             | Remote IP recorded from the connection / forwarded header test hooks. |

Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations:

- `status=401 AND bypass=True` – bypass network accepted an unauthenticated call (should be temporary during rollout).
- `status=202 AND scopes="(none)"` – a token without scopes triggered a job; tighten client configuration.
- Spike in `clientId="(none)"` – indicates upstream Authority is not issuing `client_id` claims or the CLI is outdated.

### 2.2 Metrics

Feedser publishes counters under the OTEL meter `StellaOps.Feedser.WebService.Jobs`. Tags: `job.kind`, `job.trigger`, `job.outcome`.

| Metric name                 | Description                                     | PromQL example |
|-----------------------------|-------------------------------------------------|----------------|
| `web.jobs.triggered`        | Accepted job trigger requests.                  | `sum by (job_kind) (rate(web_jobs_triggered_total[5m]))` |
| `web.jobs.trigger.conflict` | Rejected triggers (already running, disabled…). | `sum(rate(web_jobs_trigger_conflict_total[5m]))` |
| `web.jobs.trigger.failed`   | Server-side job failures.                       | `sum(rate(web_jobs_trigger_failed_total[5m]))` |

> Prometheus/OTEL collectors typically surface counters with `_total` suffix. Adjust queries to match your pipeline’s generated metric names.

Correlate audit logs with the following global meter exported via `Feedser.SourceDiagnostics`:

- `feedser.source.http.requests_total{feedser_source="jobs-run"}` – ensures REST/manual triggers route through Authority.
- If Grafana dashboards are deployed, extend the “Feedser Jobs” board with the above counters plus a table of recent audit log entries.

## 3. Alerting Guidance

1. **Unauthorized bypass attempt**
   - Query: `sum(rate(log_messages_total{logger="Feedser.Authorization.Audit", status="401", bypass="True"}[5m])) > 0`
   - Action: verify `bypassNetworks` list; confirm expected maintenance windows; rotate credentials if suspicious.

2. **Missing scopes**
   - Query: `sum(rate(log_messages_total{logger="Feedser.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0`
   - Action: audit Authority client registration; ensure `requiredScopes` includes `feedser.jobs.trigger`.

3. **Trigger failure surge**
   - Query: `sum(rate(web_jobs_trigger_failed_total[10m])) > 0` with severity `warning` if sustained for 10 minutes.
   - Action: inspect correlated audit entries and `Feedser.Telemetry` traces for job execution errors.

4. **Conflict spike**
   - Query: `sum(rate(web_jobs_trigger_conflict_total[10m])) > 5` (tune threshold).
   - Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly.

5. **Authority offline**
   - Watch `Feedser.Authorization.Audit` logs for `status=503` or `status=500` along with `clientId="(none)"`. Investigate Authority availability before re-enabling anonymous fallback.

## 4. Rollout & Verification Procedure

1. **Pre-checks**
   - Confirm `allowAnonymousFallback` is `false` in production; keep `true` only during staged validation.
   - Validate Authority issuer metadata is reachable from Feedser (`curl https://authority.internal/.well-known/openid-configuration` from the host).

2. **Smoke test with valid token**
   - Obtain a token via CLI: `stella auth login --scope feedser.jobs.trigger`.
   - Trigger a read-only endpoint: `curl -H "Authorization: Bearer $TOKEN" https://feedser.internal/jobs/definitions`.
   - Expect HTTP 200/202 and an audit log with `bypass=False`, `scopes=feedser.jobs.trigger`.

3. **Negative test without token**
   - Call the same endpoint without a token. Expect HTTP 401, `bypass=False`.
   - If the request succeeds, double-check `bypassNetworks` and ensure fallback is disabled.

4. **Bypass check (if applicable)**
   - From an allowed maintenance IP, call `/jobs/definitions` without a token. Confirm the audit log shows `bypass=True`. Review business justification and expiry date for such entries.

5. **Metrics validation**
   - Ensure `web.jobs.triggered` counter increments during accepted runs.
   - Exporters should show corresponding spans (`feedser.job.trigger`) if tracing is enabled.

## 5. Troubleshooting

| Symptom | Probable cause | Remediation |
|---------|----------------|-------------|
| Audit log shows `clientId=(none)` for all requests | Authority not issuing `client_id` claim or CLI outdated | Update StellaOps Authority configuration (`StellaOpsAuthorityOptions.Token.Claims.ClientId`), or upgrade the CLI token acquisition flow. |
| Requests succeed with `bypass=True` unexpectedly | Local network added to `bypassNetworks` or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Feedser. |
| HTTP 401 with valid token | `requiredScopes` missing from client registration or token audience mismatch | Verify Authority client scopes (`feedser.jobs.trigger`) and ensure the token audience matches `audiences` config. |
| Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set `feedser.telemetry.enableMetrics=true`, ensure collector includes `StellaOps.Feedser.WebService.Jobs` meter. |
| Sudden spike in `web.jobs.trigger.failed` | Downstream job failure or Authority timeout mid-request | Inspect Feedser job logs, re-run with tracing enabled, validate Authority latency. |

## 6. References

- `docs/21_INSTALL_GUIDE.md` – Authority configuration quick start.
- `docs/17_SECURITY_HARDENING_GUIDE.md` – Security guardrails and enforcement deadlines.
- `docs/ops/authority-monitoring.md` – Authority-side monitoring and alerting playbook.
- `StellaOps.Feedser.WebService/Filters/JobAuthorizationAuditFilter.cs` – source of audit log fields.

# Concelier Authority Audit Runbook

_Last updated: 2025-10-12_

This runbook helps operators verify and monitor the StellaOps Concelier ⇆ Authority integration. It focuses on the `/jobs*` surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity.

## 1. Prerequisites

- Authority integration is enabled in `concelier.yaml` (or via `CONCELIER_AUTHORITY__*` environment variables) with a valid `clientId`, secret, audience, and required scopes.
- OTLP metrics/log exporters are configured (`concelier.telemetry.*`) or container stdout is shipped to your SIEM.
- Operators have access to the Concelier job trigger endpoints via CLI or REST for smoke tests.

### Configuration snippet

```yaml
concelier:
  authority:
    enabled: true
    allowAnonymousFallback: false          # keep true only during initial rollout
    issuer: "https://authority.internal"
    audiences:
      - "api://concelier"
    requiredScopes:
      - "concelier.jobs.trigger"
    bypassNetworks:
      - "127.0.0.1/32"
      - "::1/128"
    clientId: "concelier-jobs"
    clientSecretFile: "/run/secrets/concelier_authority_client"
    tokenClockSkewSeconds: 60
    resilience:
      enableRetries: true
      retryDelays:
        - "00:00:01"
        - "00:00:02"
        - "00:00:05"
      allowOfflineCacheFallback: true
      offlineCacheTolerance: "00:10:00"
```

> Store secrets outside source control. Concelier reads `clientSecretFile` on startup; rotate by updating the mounted file and restarting the service.
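
A docker-compose sketch of the secret wiring described above (the service and secret names are assumptions; the only requirement is that the file lands at the path configured in `clientSecretFile`):

```yaml
services:
  concelier:
    # image, ports, and volumes elided for brevity
    secrets:
      - concelier_authority_client   # mounted at /run/secrets/concelier_authority_client

secrets:
  concelier_authority_client:
    file: ./secrets/concelier_authority_client   # plaintext client secret, kept out of git
```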

### Resilience tuning

- **Connected sites:** keep the default 1 s / 2 s / 5 s retry ladder so Concelier retries transient Authority hiccups but still surfaces outages quickly. Leave `allowOfflineCacheFallback=true` so cached discovery/JWKS data can bridge short Pathfinder restarts.
- **Air-gapped/Offline Kit installs:** extend `offlineCacheTolerance` (15–30 minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (`enableRetries=false`) if infrastructure teams prefer to handle exponential backoff at the network layer; Concelier will fail fast but keep deterministic logs.
- Concelier resolves these knobs through `IOptionsMonitor<StellaOpsAuthClientOptions>`. Edits to `concelier.yaml` are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled.

## 2. Key Signals

### 2.1 Audit log channel

Concelier emits structured audit entries via the `Concelier.Authorization.Audit` logger for every `/jobs*` request once Authority enforcement is active.

```
Concelier authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=concelier-cli scopes=concelier.jobs.trigger bypass=False remote=10.1.4.7
```

| Field      | Sample value              | Meaning |
|------------|---------------------------|---------|
| `route`    | `/jobs/definitions`       | Endpoint that processed the request. |
| `status`   | `200` / `401` / `409`     | Final HTTP status code returned to the caller. |
| `subject`  | `ops@example.com`         | User or service principal subject (falls back to `(anonymous)` when unauthenticated). |
| `clientId` | `concelier-cli`           | OAuth client ID provided by Authority (`(none)` if the token lacked the claim). |
| `scopes`   | `concelier.jobs.trigger`  | Normalised scope list extracted from token claims; `(none)` if the token carried none. |
| `bypass`   | `True` / `False`          | Indicates whether the request succeeded because its source IP matched a bypass CIDR. |
| `remote`   | `10.1.4.7`                | Remote IP recorded from the connection / forwarded header test hooks. |

Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations:

- `status=401 AND bypass=True` – bypass network accepted an unauthenticated call (should be temporary during rollout).
- `status=202 AND scopes="(none)"` – a token without scopes triggered a job; tighten client configuration.
- Spike in `clientId="(none)"` – indicates upstream Authority is not issuing `client_id` claims or the CLI is outdated.
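
If Loki is the backend, the first two combinations can be queried with `logcli` along these lines (the `app` label and the logfmt-style matching are assumptions about your ingestion pipeline; adjust selectors to your labels):

```bash
# Bypass network accepted an unauthenticated call in the last hour.
logcli query --since=1h '{app="concelier"} |= "Concelier authorization audit" |= "status=401" |= "bypass=True"'

# A scope-less token triggered a job in the last hour.
logcli query --since=1h '{app="concelier"} |= "Concelier authorization audit" |= "status=202" |= "scopes=(none)"'
```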

### 2.2 Metrics

Concelier publishes counters under the OTEL meter `StellaOps.Concelier.WebService.Jobs`. Tags: `job.kind`, `job.trigger`, `job.outcome`.

| Metric name                 | Description                                     | PromQL example |
|-----------------------------|-------------------------------------------------|----------------|
| `web.jobs.triggered`        | Accepted job trigger requests.                  | `sum by (job_kind) (rate(web_jobs_triggered_total[5m]))` |
| `web.jobs.trigger.conflict` | Rejected triggers (already running, disabled…). | `sum(rate(web_jobs_trigger_conflict_total[5m]))` |
| `web.jobs.trigger.failed`   | Server-side job failures.                       | `sum(rate(web_jobs_trigger_failed_total[5m]))` |

> Prometheus/OTEL collectors typically surface counters with `_total` suffix. Adjust queries to match your pipeline’s generated metric names.

Correlate audit logs with the following global meter exported via `Concelier.SourceDiagnostics`:

- `concelier.source.http.requests_total{concelier_source="jobs-run"}` – ensures REST/manual triggers route through Authority.
- If Grafana dashboards are deployed, extend the “Concelier Jobs” board with the above counters plus a table of recent audit log entries.

## 3. Alerting Guidance

1. **Unauthorized bypass attempt**
   - Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", status="401", bypass="True"}[5m])) > 0`
   - Action: verify `bypassNetworks` list; confirm expected maintenance windows; rotate credentials if suspicious.

2. **Missing scopes**
   - Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0`
   - Action: audit Authority client registration; ensure `requiredScopes` includes `concelier.jobs.trigger`.

3. **Trigger failure surge**
   - Query: `sum(rate(web_jobs_trigger_failed_total[10m])) > 0` with severity `warning` if sustained for 10 minutes.
   - Action: inspect correlated audit entries and `Concelier.Telemetry` traces for job execution errors.

4. **Conflict spike**
   - Query: `sum(rate(web_jobs_trigger_conflict_total[10m])) > 5` (tune threshold).
   - Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly.

5. **Authority offline**
   - Watch `Concelier.Authorization.Audit` logs for `status=503` or `status=500` along with `clientId="(none)"`. Investigate Authority availability before re-enabling anonymous fallback.

## 4. Rollout & Verification Procedure

1. **Pre-checks**
   - Confirm `allowAnonymousFallback` is `false` in production; keep `true` only during staged validation.
   - Validate Authority issuer metadata is reachable from Concelier (`curl https://authority.internal/.well-known/openid-configuration` from the host).

2. **Smoke test with valid token**
   - Obtain a token via CLI: `stella auth login --scope concelier.jobs.trigger`.
   - Trigger a read-only endpoint: `curl -H "Authorization: Bearer $TOKEN" https://concelier.internal/jobs/definitions`.
   - Expect HTTP 200/202 and an audit log with `bypass=False`, `scopes=concelier.jobs.trigger`.

3. **Negative test without token**
   - Call the same endpoint without a token. Expect HTTP 401, `bypass=False`.
   - If the request succeeds, double-check `bypassNetworks` and ensure fallback is disabled.

4. **Bypass check (if applicable)**
   - From an allowed maintenance IP, call `/jobs/definitions` without a token. Confirm the audit log shows `bypass=True`. Review business justification and expiry date for such entries.

5. **Metrics validation**
   - Ensure `web.jobs.triggered` counter increments during accepted runs.
   - Exporters should show corresponding spans (`concelier.job.trigger`) if tracing is enabled.

## 5. Troubleshooting

| Symptom | Probable cause | Remediation |
|---------|----------------|-------------|
| Audit log shows `clientId=(none)` for all requests | Authority not issuing `client_id` claim or CLI outdated | Update StellaOps Authority configuration (`StellaOpsAuthorityOptions.Token.Claims.ClientId`), or upgrade the CLI token acquisition flow. |
| Requests succeed with `bypass=True` unexpectedly | Local network added to `bypassNetworks` or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Concelier. |
| HTTP 401 with valid token | `requiredScopes` missing from client registration or token audience mismatch | Verify Authority client scopes (`concelier.jobs.trigger`) and ensure the token audience matches `audiences` config. |
| Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set `concelier.telemetry.enableMetrics=true`, ensure collector includes `StellaOps.Concelier.WebService.Jobs` meter. |
| Sudden spike in `web.jobs.trigger.failed` | Downstream job failure or Authority timeout mid-request | Inspect Concelier job logs, re-run with tracing enabled, validate Authority latency. |

## 6. References

- `docs/21_INSTALL_GUIDE.md` – Authority configuration quick start.
- `docs/17_SECURITY_HARDENING_GUIDE.md` – Security guardrails and enforcement deadlines.
- `docs/ops/authority-monitoring.md` – Authority-side monitoring and alerting playbook.
- `StellaOps.Concelier.WebService/Filters/JobAuthorizationAuditFilter.cs` – source of audit log fields.

@@ -1,72 +1,72 @@

# Feedser CCCS Connector Operations

This runbook covers day‑to‑day operation of the Canadian Centre for Cyber Security (`source:cccs:*`) connector, including configuration, telemetry, and historical backfill guidance for English/French advisories.

## 1. Configuration Checklist

- Network egress (or mirrored cache) for `https://www.cyber.gc.ca/` and the JSON API endpoints under `/api/cccs/`.
- Set the Feedser options before restarting workers. Example `feedser.yaml` snippet:

```yaml
feedser:
  sources:
    cccs:
      feeds:
        - language: "en"
          uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=en&content_type=cccs_threat"
        - language: "fr"
          uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=fr&content_type=cccs_threat"
      maxEntriesPerFetch: 80        # increase temporarily for backfill runs
      maxKnownEntries: 512
      requestTimeout: "00:00:30"
      requestDelay: "00:00:00.250"
      failureBackoff: "00:05:00"
```

> ℹ️  The `/api/cccs/threats/v1/get` endpoint returns thousands of records per language (≈5 100 rows each as of 2025‑10‑14). The connector honours `maxEntriesPerFetch`, so leave it low for steady‑state and raise it for planned backfills.

## 2. Telemetry & Logging

- **Metrics (Meter `StellaOps.Feedser.Source.Cccs`):**
  - `cccs.fetch.attempts`, `cccs.fetch.success`, `cccs.fetch.failures`
  - `cccs.fetch.documents`, `cccs.fetch.unchanged`
  - `cccs.parse.success`, `cccs.parse.failures`, `cccs.parse.quarantine`
  - `cccs.map.success`, `cccs.map.failures`
- **Shared HTTP metrics** via `SourceDiagnostics`:
  - `feedser.source.http.requests{feedser.source="cccs"}`
  - `feedser.source.http.failures{feedser.source="cccs"}`
  - `feedser.source.http.duration{feedser.source="cccs"}`
- **Structured logs**
  - `CCCS fetch completed feeds=… items=… newDocuments=… pendingDocuments=…`
  - `CCCS parse completed parsed=… failures=…`
  - `CCCS map completed mapped=… failures=…`
  - Warnings fire when GridFS payloads/DTOs go missing or parser sanitisation fails.

Suggested Grafana alerts:
- `increase(cccs.fetch.failures_total[15m]) > 0`
- `rate(cccs.map.success_total[1h]) == 0` while other connectors are active
- `histogram_quantile(0.95, rate(feedser_source_http_duration_bucket{feedser_source="cccs"}[1h])) > 5s`

## 3. Historical Backfill Plan

1. **Snapshot the source** – the API accepts `page=<n>` and `lang=<en|fr>` query parameters. `page=0` returns the full dataset (observed earliest `date_created`: 2018‑06‑08 for EN, 2018‑06‑08 for FR). Mirror those responses into Offline Kit storage when operating air‑gapped.
2. **Stage ingestion**:
   - Temporarily raise `maxEntriesPerFetch` (e.g. 500) and restart Feedser workers.
   - Run chained jobs until `pendingDocuments` drains:
     `stella db jobs run source:cccs:fetch --and-then source:cccs:parse --and-then source:cccs:map`
   - Monitor `cccs.fetch.unchanged` growth; once it approaches dataset size the backfill is complete.
3. **Optional pagination sweep** – for incremental mirrors, iterate `page=<n>` (0…N) while `response.Count == 50`, persisting JSON to disk. Store alongside metadata (`language`, `page`, SHA256) so repeated runs detect drift. (A sketch follows this list.)
4. **Language split** – keep EN/FR payloads separate to preserve canonical language fields. The connector emits `Language` directly from the feed entry, so mixed ingestion simply produces parallel advisories keyed by the same serial number.
5. **Throttle planning** – schedule backfills during maintenance windows; the API tolerates burst downloads but respect the 250 ms request delay or raise it if mirrored traffic is not available.
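
A sketch of the pagination sweep from step 3, assuming a local mirror directory, the 50-row page size observed on the API, and a JSON-array response shape (paths and stop condition are illustrative):

```bash
#!/usr/bin/env bash
# Mirror the CCCS threat feed page by page, recording a SHA-256 per page so
# repeated runs can detect drift. OUT_DIR and LANGS are illustrative.
set -euo pipefail

OUT_DIR="cccs-mirror"
LANGS=("en" "fr")

mkdir -p "$OUT_DIR"
for lang in "${LANGS[@]}"; do
  page=0
  while :; do
    out="$OUT_DIR/cccs-$lang-page$page.json"
    curl -fsS "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=$lang&content_type=cccs_threat&page=$page" -o "$out"
    sha256sum "$out" >> "$OUT_DIR/SHA256SUMS"
    count=$(jq 'length' "$out")     # assumes the response is a JSON array of rows
    sleep 0.25                      # respect the connector's 250 ms request delay
    [ "$count" -eq 50 ] || break    # the last page returns fewer than 50 rows
    page=$((page + 1))
  done
done
```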
 | 
			
		||||
 | 
			
		||||
## 4. Selector & Sanitiser Notes
 | 
			
		||||
 | 
			
		||||
- `CccsHtmlParser` now parses the **unsanitised DOM** (via AngleSharp) and only sanitises when persisting `ContentHtml`.
 | 
			
		||||
- Product extraction walks headings (`Affected Products`, `Produits touchés`, `Mesures recommandées`) and consumes nested lists within `div/section/article` containers.
 | 
			
		||||
- `HtmlContentSanitizer` allows `<h1>…<h6>` and `<section>` so stored HTML keeps headings for UI rendering and downstream summarisation.
 | 
			
		||||
 | 
			
		||||
## 5. Fixture Maintenance
 | 
			
		||||
 | 
			
		||||
- Regression fixtures live in `src/StellaOps.Feedser.Source.Cccs.Tests/Fixtures`.
 | 
			
		||||
- Refresh via `UPDATE_CCCS_FIXTURES=1 dotnet test src/StellaOps.Feedser.Source.Cccs.Tests/StellaOps.Feedser.Source.Cccs.Tests.csproj`.
 | 
			
		||||
- Fixtures capture both EN/FR advisories with nested lists to guard against sanitiser regressions; review diffs for heading/list changes before committing.
 | 
			
		||||

# Concelier CCCS Connector Operations

This runbook covers day‑to‑day operation of the Canadian Centre for Cyber Security (`source:cccs:*`) connector, including configuration, telemetry, and historical backfill guidance for English/French advisories.

## 1. Configuration Checklist

- Network egress (or mirrored cache) for `https://www.cyber.gc.ca/` and the JSON API endpoints under `/api/cccs/`.
- Set the Concelier options before restarting workers. Example `concelier.yaml` snippet:

```yaml
concelier:
  sources:
    cccs:
      feeds:
        - language: "en"
          uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=en&content_type=cccs_threat"
        - language: "fr"
          uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=fr&content_type=cccs_threat"
      maxEntriesPerFetch: 80        # increase temporarily for backfill runs
      maxKnownEntries: 512
      requestTimeout: "00:00:30"
      requestDelay: "00:00:00.250"
      failureBackoff: "00:05:00"
```

> ℹ️  The `/api/cccs/threats/v1/get` endpoint returns thousands of records per language (≈5 100 rows each as of 2025‑10‑14). The connector honours `maxEntriesPerFetch`, so leave it low for steady‑state and raise it for planned backfills.

## 2. Telemetry & Logging

- **Metrics (Meter `StellaOps.Concelier.Connector.Cccs`):**
  - `cccs.fetch.attempts`, `cccs.fetch.success`, `cccs.fetch.failures`
  - `cccs.fetch.documents`, `cccs.fetch.unchanged`
  - `cccs.parse.success`, `cccs.parse.failures`, `cccs.parse.quarantine`
  - `cccs.map.success`, `cccs.map.failures`
- **Shared HTTP metrics** via `SourceDiagnostics`:
  - `concelier.source.http.requests{concelier.source="cccs"}`
  - `concelier.source.http.failures{concelier.source="cccs"}`
  - `concelier.source.http.duration{concelier.source="cccs"}`
- **Structured logs**
  - `CCCS fetch completed feeds=… items=… newDocuments=… pendingDocuments=…`
  - `CCCS parse completed parsed=… failures=…`
  - `CCCS map completed mapped=… failures=…`
  - Warnings fire when GridFS payloads/DTOs go missing or parser sanitisation fails.

Suggested Grafana alerts:

- `increase(cccs.fetch.failures_total[15m]) > 0`
- `rate(cccs.map.success_total[1h]) == 0` while other connectors are active
- `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cccs"}[1h])) > 5s`
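
For a quick manual check of the first alert outside Grafana, the expression can be evaluated against the Prometheus HTTP API. This is a sketch: the exported metric name (`cccs_fetch_failures_total`) and the `PROM_URL` endpoint are assumptions that depend on how your OTEL/Prometheus pipeline normalises the meter names.

```bash
# Spot-check the fetch-failure alert expression (assumed metric name/endpoint).
PROM_URL="${PROM_URL:-http://prometheus:9090}"
curl -sG "${PROM_URL}/api/v1/query" \
  --data-urlencode 'query=increase(cccs_fetch_failures_total[15m])' \
  | jq '.data.result[] | {labels: .metric, failures: .value[1]}'
```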

## 3. Historical Backfill Plan

1. **Snapshot the source** – the API accepts `page=<n>` and `lang=<en|fr>` query parameters. `page=0` returns the full dataset (observed earliest `date_created`: 2018‑06‑08 for EN, 2018‑06‑08 for FR). Mirror those responses into Offline Kit storage when operating air‑gapped.
2. **Stage ingestion**:
   - Temporarily raise `maxEntriesPerFetch` (e.g. 500) and restart Concelier workers.
   - Run chained jobs until `pendingDocuments` drains:  
     `stella db jobs run source:cccs:fetch --and-then source:cccs:parse --and-then source:cccs:map`
   - Monitor `cccs.fetch.unchanged` growth; once it approaches dataset size the backfill is complete.
3. **Optional pagination sweep** – for incremental mirrors, iterate `page=<n>` (0…N) while `response.Count == 50`, persisting JSON to disk. Store alongside metadata (`language`, `page`, SHA256) so repeated runs detect drift; a sketch of this loop follows the list.
4. **Language split** – keep EN/FR payloads separate to preserve canonical language fields. The connector emits `Language` directly from the feed entry, so mixed ingestion simply produces parallel advisories keyed by the same serial number.
5. **Throttle planning** – schedule backfills during maintenance windows; the API tolerates burst downloads, but keep the 250 ms request delay (or raise it) whenever you cannot fetch from a local mirror.
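
A minimal sketch of the pagination sweep in step 3, assuming the endpoint returns a plain JSON array of up to 50 entries per page (adjust the `jq` filter if the payload nests items under a `Count` or `results` field):

```bash
# Hypothetical mirror loop for one language; repeat with lang="fr".
lang="en"
page=0
while : ; do
  out="cccs-${lang}-page${page}.json"
  curl -s "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=${lang}&content_type=cccs_threat&page=${page}" -o "${out}"
  sha256sum "${out}" >> "cccs-${lang}-manifest.sha256"   # drift-detection metadata
  count=$(jq 'length' "${out}")
  [ "${count}" -lt 50 ] && break                         # last page reached
  page=$((page + 1))
  sleep 0.25                                             # honour the 250 ms request delay
done
```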

## 4. Selector & Sanitiser Notes

- `CccsHtmlParser` now parses the **unsanitised DOM** (via AngleSharp) and only sanitises when persisting `ContentHtml`.
- Product extraction walks headings (`Affected Products`, `Produits touchés`, `Mesures recommandées`) and consumes nested lists within `div/section/article` containers.
- `HtmlContentSanitizer` allows `<h1>…<h6>` and `<section>` so stored HTML keeps headings for UI rendering and downstream summarisation.

## 5. Fixture Maintenance

- Regression fixtures live in `src/StellaOps.Concelier.Connector.Cccs.Tests/Fixtures`.
- Refresh via `UPDATE_CCCS_FIXTURES=1 dotnet test src/StellaOps.Concelier.Connector.Cccs.Tests/StellaOps.Concelier.Connector.Cccs.Tests.csproj`.
- Fixtures capture both EN/FR advisories with nested lists to guard against sanitiser regressions; review diffs for heading/list changes before committing.

@@ -1,146 +1,146 @@

# Concelier CERT-Bund Connector Operations

_Last updated: 2025-10-17_

Germany’s Federal Office for Information Security (BSI) operates the Warn- und Informationsdienst (WID) portal. The Concelier CERT-Bund connector (`source:cert-bund:*`) ingests the public RSS feed, hydrates each advisory from the portal’s JSON detail endpoint, and maps the result into canonical advisories while preserving the original German content.

---

## 1. Configuration Checklist

- Allow outbound access (or stage mirrors) for:
  - `https://wid.cert-bund.de/content/public/securityAdvisory/rss`
  - `https://wid.cert-bund.de/portal/` (session/bootstrap)
  - `https://wid.cert-bund.de/portal/api/securityadvisory` (detail/search/export JSON)
- Ensure the HTTP client reuses a cookie container (the connector’s dependency injection wiring already sets this up).

Example `concelier.yaml` fragment:

```yaml
concelier:
  sources:
    cert-bund:
      feedUri: "https://wid.cert-bund.de/content/public/securityAdvisory/rss"
      portalBootstrapUri: "https://wid.cert-bund.de/portal/"
      detailApiUri: "https://wid.cert-bund.de/portal/api/securityadvisory"
      maxAdvisoriesPerFetch: 50
      maxKnownAdvisories: 512
      requestTimeout: "00:00:30"
      requestDelay: "00:00:00.250"
      failureBackoff: "00:05:00"
```

> Leave `maxAdvisoriesPerFetch` at 50 during normal operation. Raise it only for controlled backfills, then restore the default to avoid overwhelming the portal.

---

## 2. Telemetry & Logging

- **Meter**: `StellaOps.Concelier.Connector.CertBund`
- **Counters / histograms**:
  - `certbund.feed.fetch.attempts|success|failures`
  - `certbund.feed.items.count`
  - `certbund.feed.enqueued.count`
  - `certbund.feed.coverage.days`
  - `certbund.detail.fetch.attempts|success|not_modified|failures{reason}`
  - `certbund.parse.success|failures{reason}`
  - `certbund.parse.products.count`, `certbund.parse.cve.count`
  - `certbund.map.success|failures{reason}`
  - `certbund.map.affected.count`, `certbund.map.aliases.count`
- Shared HTTP metrics remain available through `concelier.source.http.*`.

**Structured logs** (all emitted at information level when work occurs):

- `CERT-Bund fetch cycle: … truncated {Truncated}, coverageDays={CoverageDays}`
- `CERT-Bund parse cycle: parsed {Parsed}, failures {Failures}, …`
- `CERT-Bund map cycle: mapped {Mapped}, failures {Failures}, …`

Alerting ideas:

1. `increase(certbund.detail.fetch.failures_total[10m]) > 0`
2. `rate(certbund.map.success_total[30m]) == 0`
3. `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cert-bund"}[15m])) > 5s`

The WebService now registers the meter, so metrics surface automatically once OpenTelemetry metrics are enabled.

---

## 3. Historical Backfill & Export Strategy

### 3.1 Retention snapshot

- RSS window: ~250 advisories (≈90 days at current cadence).
- Older advisories are accessible through the JSON search/export APIs once the anti-CSRF token is supplied.

### 3.2 JSON search pagination

```bash
# 1. Bootstrap cookies (client_config + XSRF-TOKEN)
curl -s -c cookies.txt "https://wid.cert-bund.de/portal/" > /dev/null
curl -s -b cookies.txt -c cookies.txt \
     -H "X-Requested-With: XMLHttpRequest" \
     "https://wid.cert-bund.de/portal/api/security/csrf" > /dev/null

XSRF=$(awk '/XSRF-TOKEN/ {print $7}' cookies.txt)

# 2. Page search results
curl -s -b cookies.txt \
     -H "Content-Type: application/json" \
     -H "Accept: application/json" \
     -H "X-XSRF-TOKEN: ${XSRF}" \
     -X POST \
     --data '{"page":4,"size":100,"sort":["published,desc"]}' \
     "https://wid.cert-bund.de/portal/api/securityadvisory/search" \
     > certbund-page4.json
```

Iterate `page` until the response `content` array is empty. Pages 0–9 currently cover 2014→present. Persist JSON responses (plus SHA256) for Offline Kit parity.
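
The full sweep can be scripted as a small loop around the request above. This is a sketch that reuses `cookies.txt` and `${XSRF}` from the bootstrap step; the stop condition assumes the search response keeps exposing a `content` array.

```bash
# Page through the search API until an empty page is returned.
page=0
while : ; do
  out="certbund-page${page}.json"
  curl -s -b cookies.txt \
       -H "Content-Type: application/json" \
       -H "Accept: application/json" \
       -H "X-XSRF-TOKEN: ${XSRF}" \
       -X POST \
       --data "{\"page\":${page},\"size\":100,\"sort\":[\"published,desc\"]}" \
       "https://wid.cert-bund.de/portal/api/securityadvisory/search" > "${out}"
  sha256sum "${out}" >> certbund-search.sha256        # Offline Kit parity
  [ "$(jq '.content | length' "${out}")" -eq 0 ] && break
  page=$((page + 1))
done
```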

> **Shortcut** – run `python tools/certbund_offline_snapshot.py --output seed-data/cert-bund`
> to bootstrap the session, capture the paginated search responses, and regenerate
> the manifest/checksum files automatically. Supply `--cookie-file` and `--xsrf-token`
> if the portal requires a browser-derived session (see options via `--help`).

### 3.3 Export bundles

```bash
python tools/certbund_offline_snapshot.py \
  --output seed-data/cert-bund \
  --start-year 2014 \
  --end-year "$(date -u +%Y)"
```

The helper stores yearly exports under `seed-data/cert-bund/export/`, captures paginated search snapshots in `seed-data/cert-bund/search/`, and generates the manifest + SHA files in `seed-data/cert-bund/manifest/`. Split ranges according to your compliance window (default: one file per calendar year). Concelier can ingest these JSON payloads directly when operating offline.

> When automatic bootstrap fails (e.g. portal introduces CAPTCHA), run the
> manual `curl` flow above, then rerun the helper with `--skip-fetch` to
> rebuild the manifest from the existing files.

### 3.4 Connector-driven catch-up

1. Temporarily raise `maxAdvisoriesPerFetch` (e.g. 150) and reduce `requestDelay`.
2. Run `stella db jobs run source:cert-bund:fetch --and-then source:cert-bund:parse --and-then source:cert-bund:map` until the fetch log reports `enqueued=0`.
3. Restore defaults and capture the cursor snapshot for audit.

---

## 4. Locale & Translation Guidance

- Advisories remain in German (`language: "de"`). Preserve wording for provenance and legal accuracy.
- UI localisation: enable the translation bundles documented in `docs/15_UI_GUIDE.md` if English UI copy is required. Operators can overlay machine or human translations, but the canonical database stores the source text.
- The Docs guild is compiling a CERT-Bund terminology glossary under `docs/locale/certbund-glossary.md` so downstream teams can reference consistent English equivalents without altering the stored advisories.

---

## 5. Verification Checklist

1. Observe `certbund.feed.fetch.success` and `certbund.detail.fetch.success` increments after runs; `certbund.feed.coverage.days` should hover near the observed RSS window.
2. Ensure summary logs report `truncated=false` in steady state; `true` indicates the fetch cap was hit.
3. During backfills, watch `certbund.feed.enqueued.count` trend to zero.
4. Spot-check stored advisories in Mongo to confirm `language="de"` and reference URLs match the portal detail endpoint (a mongosh sketch follows this list).
5. For Offline Kit exports, validate SHA256 hashes before distribution.
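
A possible spot-check for item 4, using the same `mongosh` approach as the other runbooks. The connection string, collection name, and the `provenance.source` filter are assumptions; adjust them to match your deployment and schema.

```bash
# Pull a handful of CERT-Bund advisories and eyeball language + reference URLs.
mongosh "mongodb://localhost:27017/concelier" --quiet --eval '
  db.advisories.find(
    { "provenance.source": "cert-bund" },               // assumed provenance filter
    { advisoryKey: 1, language: 1, "references.url": 1 }
  ).limit(5).forEach(doc => printjson(doc));
'
```
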
@@ -1,94 +1,94 @@

# Concelier Cisco PSIRT Connector – OAuth Provisioning SOP

_Last updated: 2025-10-14_

## 1. Scope

This runbook describes how Ops provisions, rotates, and distributes Cisco PSIRT openVuln OAuth client credentials for the Concelier Cisco connector. It covers online and air-gapped (Offline Kit) environments, quota-aware execution, and escalation paths.

## 2. Prerequisites

- Active Cisco.com (CCO) account with access to the Cisco API Console.
- Cisco PSIRT openVuln API entitlement (visible under “My Apps & Keys” once granted).
- Concelier configuration location (typically `/etc/stella/concelier.yaml` in production) or Offline Kit secret bundle staging directory.

## 3. Provisioning workflow

1. **Register the application**
   - Sign in at <https://apiconsole.cisco.com>.
   - Select **Register a New App** → Application Type: `Service`, Grant Type: `Client Credentials`, API: `Cisco PSIRT openVuln API`.
   - Record the generated `clientId` and `clientSecret` in the Ops vault.
2. **Verify token issuance**
   - Request an access token with:
     ```bash
     curl -s https://id.cisco.com/oauth2/default/v1/token \
       -H "Content-Type: application/x-www-form-urlencoded" \
       -d "grant_type=client_credentials" \
       -d "client_id=${CLIENT_ID}" \
       -d "client_secret=${CLIENT_SECRET}"
     ```
   - Confirm HTTP 200 and an `expires_in` value of 3600 seconds (tokens live for one hour).
   - Preserve the response only long enough to validate syntax; do **not** persist tokens.
3. **Authorize Concelier runtime**
   - Update `concelier:sources:cisco:auth` (or the module-specific secret template) with the stored credentials.
   - For Offline Kit delivery, export encrypted secrets into `offline-kit/secrets/cisco-openvuln.json` using the platform’s sealed secret format.
4. **Connectivity validation**
   - From the Concelier control plane, run `stella db jobs run source:vndr-cisco:fetch --dry-run`.
   - Ensure the Source HTTP diagnostics record `Bearer` authorization headers and no 401/403 responses.

## 4. Rotation SOP

| Step | Owner | Notes |
| --- | --- | --- |
| 1. Schedule rotation | Ops (monthly board) | Rotate every 90 days or immediately after suspected credential exposure. |
| 2. Create replacement app | Ops | Repeat §3.1 with “-next” suffix; verify token issuance. |
| 3. Stage dual credentials | Ops + Concelier On-Call | Publish new credentials to secret store alongside current pair. |
| 4. Cut over | Concelier On-Call | Restart connector workers during a low-traffic window (<10 min) to pick up the new secret. |
| 5. Deactivate legacy app | Ops | Delete prior app in Cisco API Console once telemetry confirms successful fetch/parse cycles for 2 consecutive hours. |

**Automation hooks**
- Rotation reminders are tracked on the OpsRunbookOps board (`OPS-RUN-KEYS` swim lane); add checklist items for Concelier Cisco when opening a rotation task.
- Use the secret management pipeline (`ops/secrets/rotate.sh --connector cisco`) to template vault updates; the script renders a redacted diff for audit.

## 5. Offline Kit packaging

1. Generate the credential bundle using the Offline Kit CLI:  
   `offline-kit secrets add cisco-openvuln --client-id … --client-secret …`
2. Store the encrypted payload under `offline-kit/secrets/cisco-openvuln.enc`.
3. Distribute via the Offline Kit channel; update `offline-kit/MANIFEST.md` with the credential fingerprint (SHA256 of plaintext concatenated with metadata); a possible recipe is sketched after this list.
4. Document validation steps for the receiving site (token request from an air-gapped relay or cached token mirror).
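
The exact fingerprint recipe is not pinned down here, so the following is only an assumption for illustration: hash the decrypted payload together with a metadata descriptor and record the digest in `offline-kit/MANIFEST.md`. The file names are hypothetical.

```bash
# Hypothetical fingerprint: SHA256 over plaintext secret + metadata descriptor.
cat cisco-openvuln.json cisco-openvuln.meta | sha256sum | awk '{print $1}'
```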

## 6. Quota and throttling guidance

- Cisco enforces combined limits of 5 requests/second, 30 requests/minute, and 5 000 requests/day per application.
- Concelier fetch jobs must respect `Retry-After` headers on HTTP 429 responses; Ops should monitor for sustained quota saturation and consider paging window adjustments (see the sketch below).
- Telemetry to watch: `concelier.source.http.requests{concelier.source="vndr-cisco"}`, `concelier.source.http.failures{...}`, and connector-specific metrics once implemented.
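
A minimal sketch of the expected 429 behaviour when probing quota manually. The endpoint URL is left to the operator and `TOKEN` comes from the §3 token request; both are assumptions of this example rather than fixed values.

```bash
# Probe one openVuln endpoint and honour Retry-After before retrying.
OPENVULN_URL="${OPENVULN_URL:?set to the openVuln endpoint you are probing}"
headers=$(mktemp)
status=$(curl -s -o /dev/null -D "${headers}" -w '%{http_code}' \
  -H "Authorization: Bearer ${TOKEN}" "${OPENVULN_URL}")
if [ "${status}" = "429" ]; then
  wait=$(awk 'tolower($1)=="retry-after:" {print $2}' "${headers}" | tr -d '\r')
  sleep "${wait:-60}"   # fall back to 60 s when the header is absent
fi
```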

## 7. Telemetry & Monitoring

- **Metrics (Meter `StellaOps.Concelier.Connector.Vndr.Cisco`)**
  - `cisco.fetch.documents`, `cisco.fetch.failures`, `cisco.fetch.unchanged`
  - `cisco.parse.success`, `cisco.parse.failures`
  - `cisco.map.success`, `cisco.map.failures`, `cisco.map.affected.packages`
- **Shared HTTP metrics** via `SourceDiagnostics`:
  - `concelier.source.http.requests{concelier.source="vndr-cisco"}`
  - `concelier.source.http.failures{concelier.source="vndr-cisco"}`
  - `concelier.source.http.duration{concelier.source="vndr-cisco"}`
- **Structured logs**
  - `Cisco fetch completed date=… pages=… added=…` (info)
  - `Cisco parse completed parsed=… failures=…` (info)
  - `Cisco map completed mapped=… failures=…` (info)
  - Warnings surface when DTO serialization fails or GridFS payload is missing.
- Suggested alerts: non-zero `cisco.fetch.failures` in 15m, or `cisco.map.success` flatlines while fetch continues.

## 8. Incident response

- **Token compromise** – revoke the application in the Cisco API Console, purge cached secrets, rotate immediately per §4.
- **Persistent 401/403** – confirm credentials in vault, then validate token issuance; if unresolved, open a Cisco DevNet support ticket referencing the application ID.
- **429 spikes** – inspect job scheduler cadence and adjust connector options (`maxRequestsPerWindow`) before requesting higher quotas from Cisco.

## 9. References

- Cisco PSIRT openVuln API Authentication Guide.
- Accessing the openVuln API using curl (token lifetime).
- openVuln API rate limit documentation.

@@ -1,160 +1,160 @@

# Concelier Conflict Resolution Runbook (Sprint 3)

This runbook equips Concelier operators to detect, triage, and resolve advisory conflicts now that the Sprint 3 merge engine landed (`AdvisoryPrecedenceMerger`, merge-event hashing, and telemetry counters). It builds on the canonical rules defined in `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md` and the metrics/logging instrumentation delivered this sprint.

---

## 1. Precedence Model (recap)

- **Default ranking:** `GHSA -> NVD -> OSV`, with distro/vendor PSIRTs outranking ecosystem feeds (`AdvisoryPrecedenceDefaults`). Use `concelier:merge:precedence:ranks` to override per source when incident response requires it.
- **Freshness override:** if a lower-ranked source is >= 48 hours newer for a freshness-sensitive field (title, summary, affected ranges, references, credits), it wins. Every override stamps `provenance[].decisionReason = freshness`.
- **Tie-breakers:** when precedence and freshness tie, the engine falls back to (1) primary source order, (2) shortest normalized text, (3) lowest stable hash. Merge-generated provenance records set `decisionReason = tie-breaker`.
- **Audit trail:** each merged advisory receives a `merge` provenance entry listing the participating sources plus a `merge_event` record with canonical before/after SHA-256 hashes.

---

## 2. Telemetry Shipped This Sprint

| Instrument | Type | Key Tags | Purpose |
|------------|------|----------|---------|
| `concelier.merge.operations` | Counter | `inputs` | Total precedence merges executed. |
| `concelier.merge.overrides` | Counter | `primary_source`, `suppressed_source`, `primary_rank`, `suppressed_rank` | Field-level overrides chosen by precedence. |
| `concelier.merge.range_overrides` | Counter | `advisory_key`, `package_type`, `primary_source`, `suppressed_source`, `primary_range_count`, `suppressed_range_count` | Package range overrides emitted by `AffectedPackagePrecedenceResolver`. |
| `concelier.merge.conflicts` | Counter | `type` (`severity`, `precedence_tie`), `reason` (`mismatch`, `primary_missing`, `equal_rank`) | Conflicts requiring operator review. |
| `concelier.merge.identity_conflicts` | Counter | `scheme`, `alias_value`, `advisory_count` | Alias collisions surfaced by the identity graph. |

### Structured logs

- `AdvisoryOverride` (EventId 1000) - logs merge suppressions with alias/provenance counts.
- `PackageRangeOverride` (EventId 1001) - logs package-level precedence decisions.
- `PrecedenceConflict` (EventId 1002) - logs mismatched severity or equal-rank scenarios.
- `Alias collision ...` (no EventId) - emitted when `concelier.merge.identity_conflicts` increments.

Expect all logs at `Information`. Ensure OTEL exporters include the scope `StellaOps.Concelier.Merge`.

---

## 3. Detection & Alerting

1. **Dashboard panels**
   - `concelier.merge.conflicts` - table grouped by `type/reason`. Alert when > 0 in a 15 minute window.
   - `concelier.merge.range_overrides` - stacked bar by `package_type`. Spikes highlight vendor PSIRT overrides over registry data.
   - `concelier.merge.overrides` with `primary_source|suppressed_source` - catches unexpected precedence flips (e.g., OSV overtaking GHSA).
   - `concelier.merge.identity_conflicts` - single-stat; alert when alias collisions occur more than once per day.
2. **Log-based alerts**
   - `eventId=1002` with `reason="equal_rank"` - indicates precedence table gaps; page merge owners.
   - `eventId=1002` with `reason="mismatch"` - severity disagreement; open connector bug if sustained.
3. **Job health**
   - `stellaops-cli db merge` exit code `1` signifies unresolved conflicts. Pipe it into automation that captures logs and notifies #concelier-ops; a wrapper sketch follows this list.
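
A minimal wrapper sketch for that job-health hook. The log path and the Slack webhook variable are illustrative assumptions; wire them into whatever automation you already run.

```bash
# Rerun the merge job, keep the logs, and notify on a non-zero exit code.
stellaops-cli db merge --verbose > merge-run.log 2>&1
rc=$?
if [ "${rc}" -ne 0 ]; then
  curl -s -X POST -H 'Content-Type: application/json' \
    --data "{\"text\":\"concelier merge exited ${rc}; unresolved conflicts, see merge-run.log\"}" \
    "${SLACK_WEBHOOK_URL}"
fi
```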
 | 
			
		||||
### Threshold updates (2025-10-12)
 | 
			
		||||
 | 
			
		||||
- `concelier.merge.conflicts` – Page only when ≥ 2 events fire within 30 minutes; the synthetic conflict fixture run produces 0 conflicts, so the first event now routes to Slack for manual review instead of paging.
 | 
			
		||||
- `concelier.merge.overrides` – Raise a warning when the 30-minute sum exceeds 10 (canonical triple yields exactly 1 summary override with `primary_source=osv`, `suppressed_source=ghsa`).
 | 
			
		||||
- `concelier.merge.range_overrides` – Maintain the 15-minute alert at ≥ 3 but annotate dashboards that the regression triple emits a single `package_type=semver` override so ops can spot unexpected spikes.
 | 
			
		||||
---

## 4. Triage Workflow

1. **Confirm job context**
   - `stellaops-cli db merge` (CLI) or `POST /jobs/merge:reconcile` (API) to rehydrate the merge job. Use `--verbose` to stream structured logs during triage.
2. **Inspect metrics**
   - Correlate spikes in `concelier.merge.conflicts` with the `primary_source`/`suppressed_source` tags from `concelier.merge.overrides`.
3. **Pull structured logs**
   - Example (vector output):

     ```
     jq 'select(.EventId.Name=="PrecedenceConflict") | {advisory: .State[0].Value, type: .ConflictType, reason: .Reason, primary: .PrimarySources, suppressed: .SuppressedSources}' stellaops-concelier.log
     ```
4. **Review merge events**
   - `mongosh`:

     ```javascript
     use concelier;
     db.merge_event.find({ advisoryKey: "CVE-2025-1234" }).sort({ mergedAt: -1 }).limit(5);
     ```
   - Compare `beforeHash` vs `afterHash` to confirm the merge actually changed canonical output.
5. **Interrogate provenance**
   - `db.advisories.findOne({ advisoryKey: "CVE-2025-1234" }, { title: 1, severity: 1, provenance: 1, "affectedPackages.provenance": 1 })`
   - Check `provenance[].decisionReason` values (`precedence`, `freshness`, `tie-breaker`) to understand why the winning field was chosen.
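
When several advisories are implicated at once, grouping recent merge events is faster than querying one key at a time. A `mongosh` sketch, using only the `merge_event` fields shown above (`advisoryKey`, `mergedAt`, `beforeHash`, `afterHash`):

```javascript
// Group the last 24 h of merge events by advisory and count how many actually
// changed canonical output (beforeHash != afterHash).
const since = new Date(Date.now() - 24 * 60 * 60 * 1000);
db.merge_event.aggregate([
  { $match: { mergedAt: { $gte: since } } },
  { $project: {
      advisoryKey: 1,
      mergedAt: 1,
      changed: { $ne: ["$beforeHash", "$afterHash"] }
  } },
  { $group: {
      _id: "$advisoryKey",
      merges: { $sum: 1 },
      changedMerges: { $sum: { $cond: ["$changed", 1, 0] } },
      lastMergedAt: { $max: "$mergedAt" }
  } },
  { $sort: { lastMergedAt: -1 } },
  { $limit: 20 }
]);
```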
---

## 5. Conflict Classification Matrix

| Signal | Likely Cause | Immediate Action |
|--------|--------------|------------------|
| `reason="mismatch"` with `type="severity"` | Upstream feeds disagree on the CVSS vector/severity. | Verify which feed is freshest; if the correct value is known, adjust the connector mapping or add a precedence override. |
| `reason="primary_missing"` | The higher-ranked source lacks the field entirely. | Backfill connector data or temporarily allow the lower-ranked source via a precedence override. |
| `reason="equal_rank"` | Two feeds share the same precedence rank (custom config or missing entry). | Update `concelier:merge:precedence:ranks` to break the tie; restart the merge job. |
| Rising `concelier.merge.range_overrides` for a package type | Vendor PSIRT now supplies richer ranges. | Validate that connectors emit `decisionReason="precedence"` and update dashboards to treat registry ranges as fallback. |
| `concelier.merge.identity_conflicts` > 0 | Alias scheme mapping produced collisions (duplicate CVE <-> advisory pairs). | Inspect the `Alias collision` log payload; reconcile the alias graph by adjusting connector alias output. |
---

## 6. Resolution Playbook

1. **Connector data fix**
   - Re-run the offending connector stages (`stellaops-cli db fetch --source ghsa --stage map`, etc.).
   - Once fixed, rerun merge and verify `decisionReason` reflects `freshness` or `precedence` as expected.
2. **Temporary precedence override**
   - Edit `etc/concelier.yaml`:

     ```yaml
     concelier:
       merge:
         precedence:
           ranks:
             osv: 1
             ghsa: 0
     ```
   - Restart the Concelier workers; confirm the tags in `concelier.merge.overrides` show the new ranks.
   - Document the override, with an expiry date, in the change log.
3. **Alias remediation**
   - Update connector mapping rules to weed out duplicate aliases (e.g., skip GHSA aliases that mirror CVE IDs).
   - Flush cached alias graphs if necessary (`db.alias_graph.drop()` is destructive; coordinate with Storage before issuing it).
4. **Escalation**
   - If override metrics spike due to an upstream regression, open an incident with the Security Guild, referencing merge logs and `merge_event` IDs.
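
Steps 1 and 2 typically run as a single pass. A hedged shell sketch, assuming a compose-managed deployment with a service named `concelier` and using only the CLI commands referenced in this runbook:

```bash
# Apply the temporary precedence override (edit etc/concelier.yaml first),
# then restart the workers so the new ranks load. Service name is an assumption.
docker compose restart concelier

# Re-run the offending connector stage, then the merge job.
stellaops-cli db fetch --source ghsa --stage map
if ! stellaops-cli db merge; then
  # Exit code 1 still signals unresolved conflicts; capture logs for escalation.
  echo "merge still reporting conflicts - check EventId 1002 logs" >&2
fi
```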
---

## 7. Validation Checklist

- [ ] Merge job rerun returns exit code `0`.
- [ ] `concelier.merge.conflicts` returns to its zero baseline after corrective action.
- [ ] The latest `merge_event` entry shows the expected hash delta.
- [ ] The affected advisory document shows updated `provenance[].decisionReason`.
- [ ] The ops change log is updated with the incident summary, config overrides, and rollback plan.
---

## 8. Reference Material

- Canonical conflict rules: `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md`.
- Merge engine internals: `src/StellaOps.Concelier.Merge/Services/AdvisoryPrecedenceMerger.cs`.
- Metrics definitions: `src/StellaOps.Concelier.Merge/Services/AdvisoryMergeService.cs` (identity conflicts) and `AdvisoryPrecedenceMerger`.
- Storage audit trail: `src/StellaOps.Concelier.Merge/Services/MergeEventWriter.cs`, `src/StellaOps.Concelier.Storage.Mongo/MergeEvents`.

Keep this runbook synchronized with future sprint notes and update alert thresholds as baseline volumes change.
---

## 9. Synthetic Regression Fixtures

- **Locations** – Canonical conflict snapshots now live at `src/StellaOps.Concelier.Connector.Ghsa.Tests/Fixtures/conflict-ghsa.canonical.json`, `src/StellaOps.Concelier.Connector.Nvd.Tests/Nvd/Fixtures/conflict-nvd.canonical.json`, and `src/StellaOps.Concelier.Connector.Osv.Tests/Fixtures/conflict-osv.canonical.json`.
- **Validation commands** – To regenerate and verify the fixtures offline, run:

```bash
dotnet test src/StellaOps.Concelier.Connector.Ghsa.Tests/StellaOps.Concelier.Connector.Ghsa.Tests.csproj --filter GhsaConflictFixtureTests
dotnet test src/StellaOps.Concelier.Connector.Nvd.Tests/StellaOps.Concelier.Connector.Nvd.Tests.csproj --filter NvdConflictFixtureTests
dotnet test src/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj --filter OsvConflictFixtureTests
dotnet test src/StellaOps.Concelier.Merge.Tests/StellaOps.Concelier.Merge.Tests.csproj --filter MergeAsync_AppliesCanonicalRulesAndPersistsDecisions
```

- **Expected signals** – The triple produces one freshness-driven summary override (`primary_source=osv`, `suppressed_source=ghsa`) and one range override for the npm SemVer package while leaving `concelier.merge.conflicts` at zero. Use these values as the baseline when tuning dashboards or load-testing alert pipelines.
---

## 10. Change Log

| Date (UTC) | Change | Notes |
|------------|--------|-------|
| 2025-10-16 | Ops review signed off after connector expansion (CCCS, CERT-Bund, KISA, ICS CISA, MSRC) landed. Alert thresholds from §3 reaffirmed; dashboards updated to watch attachment signals emitted by the ICS CISA connector. | Ops sign-off recorded by the Concelier Ops Guild; no additional overrides required. |
@@ -1,151 +1,151 @@
{
  "title": "Concelier CVE & KEV Observability",
  "uid": "concelier-cve-kev",
  "schemaVersion": 38,
  "version": 1,
  "editable": true,
  "timezone": "",
  "time": { "from": "now-24h", "to": "now" },
  "refresh": "5m",
  "templating": {
    "list": [
      { "name": "datasource", "type": "datasource", "query": "prometheus", "refresh": 1, "hide": 0 }
    ]
  },
  "panels": [
    {
      "type": "timeseries",
      "title": "CVE fetch success vs failure",
      "gridPos": { "h": 9, "w": 12, "x": 0, "y": 0 },
      "fieldConfig": {
        "defaults": { "unit": "ops", "custom": { "drawStyle": "line", "lineWidth": 2, "fillOpacity": 10 } },
        "overrides": []
      },
      "targets": [
        { "refId": "A", "expr": "rate(cve_fetch_success_total[5m])", "datasource": { "type": "prometheus", "uid": "${datasource}" }, "legendFormat": "success" },
        { "refId": "B", "expr": "rate(cve_fetch_failures_total[5m])", "datasource": { "type": "prometheus", "uid": "${datasource}" }, "legendFormat": "failure" }
      ]
    },
    {
      "type": "timeseries",
      "title": "KEV fetch cadence",
      "gridPos": { "h": 9, "w": 12, "x": 12, "y": 0 },
      "fieldConfig": {
        "defaults": { "unit": "ops", "custom": { "drawStyle": "line", "lineWidth": 2, "fillOpacity": 10 } },
        "overrides": []
      },
      "targets": [
        { "refId": "A", "expr": "rate(kev_fetch_success_total[30m])", "datasource": { "type": "prometheus", "uid": "${datasource}" }, "legendFormat": "success" },
        { "refId": "B", "expr": "rate(kev_fetch_failures_total[30m])", "datasource": { "type": "prometheus", "uid": "${datasource}" }, "legendFormat": "failure" },
        { "refId": "C", "expr": "rate(kev_fetch_unchanged_total[30m])", "datasource": { "type": "prometheus", "uid": "${datasource}" }, "legendFormat": "unchanged" }
      ]
    },
    {
      "type": "table",
      "title": "KEV parse anomalies (24h)",
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 9 },
      "fieldConfig": { "defaults": { "unit": "short" }, "overrides": [] },
      "targets": [
        { "refId": "A", "expr": "sum by (reason) (increase(kev_parse_anomalies_total[24h]))", "format": "table", "datasource": { "type": "prometheus", "uid": "${datasource}" } }
      ],
      "transformations": [
        { "id": "organize", "options": { "renameByName": { "Value": "count" } } }
      ]
    },
    {
      "type": "timeseries",
      "title": "Advisories emitted",
      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 9 },
      "fieldConfig": {
        "defaults": { "unit": "ops", "custom": { "drawStyle": "line", "lineWidth": 2, "fillOpacity": 10 } },
        "overrides": []
      },
      "targets": [
        { "refId": "A", "expr": "rate(cve_map_success_total[15m])", "datasource": { "type": "prometheus", "uid": "${datasource}" }, "legendFormat": "CVE" },
        { "refId": "B", "expr": "rate(kev_map_advisories_total[24h])", "datasource": { "type": "prometheus", "uid": "${datasource}" }, "legendFormat": "KEV" }
      ]
    }
  ]
}
@@ -1,4 +1,4 @@
# Concelier CVE & KEV Connector Operations

This playbook equips operators with the steps required to roll out and monitor the CVE Services and CISA KEV connectors across environments.

### 1.1 Prerequisites

- CVE Services API credentials (organisation ID, user ID, API key) with access to the JSON 5 API.
- Network egress to `https://cveawg.mitre.org` (or a mirrored endpoint) from the Concelier workers.
- Updated `concelier.yaml` (or the matching environment variables) with the following section:
```yaml
concelier:
  sources:
    cve:
      baseEndpoint: "https://cveawg.mitre.org/api/"
      apiOrg: "ORG123"
      apiUser: "user@example.org"
      apiKeyFile: "/var/run/secrets/concelier/cve-api-key"
      seedDirectory: "./seed-data/cve"
      pageSize: 200
      maxPagesPerFetch: 5
      failureBackoff: "00:10:00"
```

> ℹ️  Store the API key outside source control. When using `apiKeyFile`, mount the secret file into the container/host; alternatively supply `apiKey` via `CONCELIER_SOURCES__CVE__APIKEY`.

> 🪙  When credentials are not yet available, configure `seedDirectory` to point at mirrored CVE JSON (for example, the repo’s `seed-data/cve/` bundle). The connector will ingest those records and log a warning instead of failing the job; live fetching resumes automatically once `apiOrg` / `apiUser` / `apiKey` are supplied.
### 1.2 Smoke Test (staging)

1. Deploy the updated configuration and restart the Concelier service so the connector picks up the credentials.
2. Trigger one end-to-end cycle:
   - Concelier CLI: `stella db jobs run source:cve:fetch --and-then source:cve:parse --and-then source:cve:map`
   - REST fallback: `POST /jobs/run { "kind": "source:cve:fetch", "chain": ["source:cve:parse", "source:cve:map"] }`
3. Observe the following metrics (exported via OTEL meter `StellaOps.Concelier.Connector.Cve`):
   - `cve.fetch.attempts`, `cve.fetch.success`, `cve.fetch.documents`, `cve.fetch.failures`, `cve.fetch.unchanged`
   - `cve.parse.success`, `cve.parse.failures`, `cve.parse.quarantine`
   - `cve.map.success`
4. Verify Prometheus shows matching `concelier.source.http.requests_total{concelier_source="cve"}` deltas (list vs detail phases) while `concelier.source.http.failures_total{concelier_source="cve"}` stays flat.
5. Confirm the info-level summary log `CVEs fetch window … pages=X detailDocuments=Y detailFailures=Z` appears once per fetch run and shows `detailFailures=0`.
6. Verify the MongoDB advisory store contains fresh CVE advisories (`advisoryKey` prefix `cve/`) and that the source cursor (`source_states` collection) advanced.
### 1.3 Production Monitoring

- **Dashboards** – Plot `rate(cve_fetch_success_total[5m])`, `rate(cve_fetch_failures_total[5m])`, and `rate(cve_fetch_documents_total[5m])` alongside `concelier_source_http_requests_total{concelier_source="cve"}` to confirm HTTP and connector counters stay aligned. Keep `concelier.range.primitives{scheme=~"semver|vendor"}` on the same board for range coverage. Example alerts:
  - `rate(cve_fetch_failures_total[5m]) > 0` for 10 minutes (`severity=warning`)
  - `rate(cve_map_success_total[15m]) == 0` while `rate(cve_fetch_success_total[15m]) > 0` (`severity=critical`)
  - `sum_over_time(cve_parse_quarantine_total[1h]) > 0` to catch schema anomalies
- **Logs** – Monitor warnings such as `Failed fetching CVE record {CveId}` and `Malformed CVE JSON`, and surface the summary info log `CVEs fetch window … detailFailures=0 detailUnchanged=0` on dashboards. A non-zero `detailFailures` usually indicates rate-limit or auth issues on detail requests.
- **Grafana pack** – Import `docs/ops/concelier-cve-kev-grafana-dashboard.json` and filter by panel legend (`CVE`, `KEV`) to reuse the canned layout.
- **Backfill window** – Operators can tighten or widen `initialBackfill` / `maxPagesPerFetch` after validating throughput. Update the config and restart Concelier to apply changes.
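
The example alerts above can also be version-controlled as a Prometheus rule group. A sketch, reusing the metric names from the Grafana pack; tune the `for` windows and severities to your paging policy:

```yaml
groups:
  - name: concelier-cve-connector
    rules:
      - alert: CveFetchFailures
        expr: rate(cve_fetch_failures_total[5m]) > 0
        for: 10m
        labels:
          severity: warning
      - alert: CveMappingStalled
        expr: rate(cve_map_success_total[15m]) == 0 and rate(cve_fetch_success_total[15m]) > 0
        labels:
          severity: critical
      - alert: CveParseQuarantine
        expr: sum_over_time(cve_parse_quarantine_total[1h]) > 0
        labels:
          severity: warning
```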
### 1.4 Staging smoke log (2025-10-15)

While Ops finalises long-lived CVE Services credentials, we validated the connector end-to-end against the recorded CVE-2024-0001 payloads used in regression tests:

- Command: `dotnet test src/StellaOps.Concelier.Connector.Cve.Tests/StellaOps.Concelier.Connector.Cve.Tests.csproj -l "console;verbosity=detailed"`
- Summary log emitted by the connector:
  ```
  CVEs fetch window 2024-09-01T00:00:00Z->2024-10-01T00:00:00Z pages=1 listSuccess=1 detailDocuments=1 detailFailures=0 detailUnchanged=0 pendingDocuments=0->1 pendingMappings=0->1 hasMorePages=False nextWindowStart=2024-09-15T12:00:00Z nextWindowEnd=(none) nextPage=1
  ```
- Telemetry captured by `Meter` `StellaOps.Concelier.Connector.Cve`:

  | Metric | Value |
  |--------|-------|
  | `cve.fetch.attempts` | 1 |
  | `cve.parse.success` | 1 |
  | `cve.map.success` | 1 |

The Grafana pack `docs/ops/concelier-cve-kev-grafana-dashboard.json` has been imported into staging so the panels referenced above render against these counters once the live API keys are in place.
## 2. CISA KEV Connector (`source:kev:*`)

- Network egress (or mirrored content) for `https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json`.
- No credentials are required, but the HTTP allow-list must include `www.cisa.gov`.
- Confirm the following snippet in `concelier.yaml` (defaults shown; tune as needed):

```yaml
concelier:
  sources:
    kev:
      feedUri: "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
```
### 2.3 Smoke Test (staging)

1. Deploy the configuration and restart Concelier.
2. Trigger a pipeline run:
   - CLI: `stella db jobs run source:kev:fetch --and-then source:kev:parse --and-then source:kev:map`
   - REST: `POST /jobs/run { "kind": "source:kev:fetch", "chain": ["source:kev:parse", "source:kev:map"] }`
3. Verify the metrics exposed by meter `StellaOps.Concelier.Connector.Kev`:
   - `kev.fetch.attempts`, `kev.fetch.success`, `kev.fetch.unchanged`, `kev.fetch.failures`
   - `kev.parse.entries` (tag `catalogVersion`), `kev.parse.failures`, `kev.parse.anomalies` (tag `reason`)
   - `kev.map.advisories` (tag `catalogVersion`)
4. Confirm `concelier.source.http.requests_total{concelier_source="kev"}` increments once per fetch and that the paired `concelier.source.http.failures_total` stays flat (zero increase).
5. Inspect the info logs `Fetched KEV catalog document … pendingDocuments=…` and `Parsed KEV catalog document … entries=…`; they should appear exactly once per run, and `Mapped X/Y… skipped=0` should match the `kev.map.advisories` delta.
6. Confirm MongoDB documents exist for the catalog JSON (`raw_documents` & `dtos`) and that advisories with prefix `kev/` are written.
### 2.5 Known good dashboard tiles

Add the following panels to the Concelier observability board:

| Metric | Recommended visualisation |
|--------|---------------------------|

- Record staging/production smoke test results (date, catalog version, advisory counts) in your team’s change log.
- Add the CVE/KEV job kinds to the standard maintenance checklist so operators can manually trigger them after planned downtime.
- Keep this document in sync with future connector changes (for example, new anomaly reasons or additional metrics).
- Version-control dashboard tweaks alongside `docs/ops/concelier-cve-kev-grafana-dashboard.json` so operations can re-import the observability pack during restores.
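
For restores or air-gapped re-imports, the dashboard JSON can be pushed through Grafana's standard HTTP API instead of the UI. A sketch; the Grafana URL and `GRAFANA_API_TOKEN` are placeholders for your environment:

```bash
# Wrap the dashboard JSON in the payload shape Grafana's /api/dashboards/db endpoint expects.
jq '{dashboard: ., overwrite: true}' docs/ops/concelier-cve-kev-grafana-dashboard.json \
  | curl -sS -X POST "https://grafana.example.internal/api/dashboards/db" \
      -H "Authorization: Bearer ${GRAFANA_API_TOKEN}" \
      -H "Content-Type: application/json" \
      --data-binary @-
```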
@@ -1,123 +1,123 @@
# Concelier GHSA Connector – Operations Runbook

_Last updated: 2025-10-16_

## 1. Overview
The GitHub Security Advisories (GHSA) connector pulls advisory metadata from the GitHub REST API `/security/advisories` endpoint. GitHub enforces both primary and secondary rate limits, so operators must monitor usage and configure retries to avoid throttling incidents.

## 2. Rate-limit telemetry
The connector now surfaces rate-limit headers on every fetch and exposes the following metrics via OpenTelemetry:

| Metric | Description | Tags |
|--------|-------------|------|
| `ghsa.ratelimit.limit` (histogram) | Samples the reported request quota at fetch time. | `phase` = `list` or `detail`, `resource` (e.g., `core`). |
| `ghsa.ratelimit.remaining` (histogram) | Remaining requests returned by `X-RateLimit-Remaining`. | `phase`, `resource`. |
| `ghsa.ratelimit.reset_seconds` (histogram) | Seconds until `X-RateLimit-Reset`. | `phase`, `resource`. |
| `ghsa.ratelimit.headroom_pct` (histogram) | Percentage of the quota still available (`remaining / limit * 100`). | `phase`, `resource`. |
| `ghsa.ratelimit.headroom_pct_current` (observable gauge) | Latest headroom percentage reported per resource. | `phase`, `resource`. |
| `ghsa.ratelimit.exhausted` (counter) | Incremented whenever GitHub returns a zero remaining quota and the connector delays before retrying. | `phase`. |
### Dashboards & alerts
- Plot `ghsa.ratelimit.remaining` as the latest value to watch the runway. Alert when the value stays below **`RateLimitWarningThreshold`** (default `500`) for more than 5 minutes.
- Use `ghsa.ratelimit.headroom_pct_current` to visualise the remaining quota percentage; paging once it sits below **10 %** for longer than a single reset window helps avoid secondary limits.
- Raise a separate alert on `increase(ghsa.ratelimit.exhausted[15m]) > 0` to catch hard throttles.
- Overlay `ghsa.fetch.attempts` vs `ghsa.fetch.failures` to confirm retries are effective.
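
As a starting point, the first and third alerts above can be expressed as Prometheus rules. A sketch, assuming the histogram reaches Prometheus as `ghsa_ratelimit_remaining_bucket` (as referenced in §7) and the counter as `ghsa_ratelimit_exhausted_total`:

```yaml
groups:
  - name: concelier-ghsa-ratelimit
    rules:
      - alert: GhsaRateLimitLow
        # Trend the p99 of remaining quota per resource; warn when it stays under the
        # configured RateLimitWarningThreshold (default 500) for five minutes.
        expr: histogram_quantile(0.99, sum(rate(ghsa_ratelimit_remaining_bucket[5m])) by (le, resource)) < 500
        for: 5m
        labels:
          severity: warning
      - alert: GhsaRateLimitExhausted
        expr: increase(ghsa_ratelimit_exhausted_total[15m]) > 0
        labels:
          severity: critical
```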
## 3. Logging signals
When `X-RateLimit-Remaining` falls below `RateLimitWarningThreshold`, the connector emits:
```
GHSA rate limit warning: remaining {Remaining}/{Limit} for {Phase} {Resource} (headroom {Headroom}%)
```
When GitHub reports zero remaining calls, the connector logs and sleeps for the reported `Retry-After`/`X-RateLimit-Reset` interval (falling back to `SecondaryRateLimitBackoff`).

After the quota recovers above the warning threshold, the connector writes an informational log with the refreshed remaining/headroom values, letting operators clear alerts quickly.
## 4. Configuration knobs (`concelier.yaml`)
```yaml
concelier:
  sources:
    ghsa:
      apiToken: "${GITHUB_PAT}"
      pageSize: 50
      requestDelay: "00:00:00.200"
      failureBackoff: "00:05:00"
      rateLimitWarningThreshold: 500    # warn below this many remaining calls
      secondaryRateLimitBackoff: "00:02:00"  # fallback delay when GitHub omits Retry-After
```

### Recommendations
- Increase `requestDelay` in air-gapped or burst-heavy deployments to smooth token consumption.
- Lower `rateLimitWarningThreshold` only if your dashboards already page on the new histogram; never set it negative.
- For bots using a low-privilege PAT, keep `secondaryRateLimitBackoff` at ≥60 seconds to respect GitHub’s secondary-limit guidance.

#### Default job schedule

| Job kind | Cron | Timeout | Lease |
|----------|------|---------|-------|
| `source:ghsa:fetch` | `1,11,21,31,41,51 * * * *` | 6 minutes | 4 minutes |
| `source:ghsa:parse` | `3,13,23,33,43,53 * * * *` | 5 minutes | 4 minutes |
| `source:ghsa:map` | `5,15,25,35,45,55 * * * *` | 5 minutes | 4 minutes |

These defaults spread the GHSA stages across the hour so fetch completes before parse/map fire. Override them via `concelier.jobs.definitions[...]` when coordinating multiple connectors on the same runner (see the sketch below).
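
The exact `jobs.definitions` schema is not reproduced in this runbook, so treat the following override as an illustrative sketch only and confirm the key names against your deployed `concelier.yaml`:

```yaml
# Illustrative only - the nested key names (cron/timeout/lease) are assumptions.
concelier:
  jobs:
    definitions:
      "source:ghsa:fetch":
        cron: "7,17,27,37,47,57 * * * *"   # shift fetch away from other connectors
        timeout: "00:06:00"
        lease: "00:04:00"
```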
## 5. Provisioning credentials

Concelier requires a GitHub personal access token (classic) with the **`read:org`** and **`security_events`** scopes to pull GHSA data. Store it as a secret and reference it via `concelier.sources.ghsa.apiToken`.

### Docker Compose (stack operators)
```yaml
services:
  concelier:
    environment:
      CONCELIER__SOURCES__GHSA__APITOKEN: /run/secrets/ghsa_pat
    secrets:
      - ghsa_pat

secrets:
  ghsa_pat:
    file: ./secrets/ghsa_pat.txt  # contains only the PAT value
```

### Helm values (cluster operators)
```yaml
concelier:
  extraEnv:
    - name: CONCELIER__SOURCES__GHSA__APITOKEN
      valueFrom:
        secretKeyRef:
          name: concelier-ghsa
          key: apiToken

extraSecrets:
  concelier-ghsa:
    apiToken: "<paste PAT here or source from external secret store>"
```

After rotating the PAT, restart the Concelier workers (or run `kubectl rollout restart deployment/concelier`) to ensure the configuration reloads.

When enabling GHSA for the first time, run a staged backfill (see the sketch after this list):

1. Trigger `source:ghsa:fetch` manually (CLI or API) outside of peak hours.
2. Watch `concelier.jobs.health` for the GHSA jobs until they report `healthy`.
3. Allow the scheduled cron cadence to resume once the initial backlog drains (typically < 30 minutes).
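
A minimal backfill command, chaining the stages with the CLI syntax used elsewhere in this document:

```bash
# Run the initial GHSA backfill outside peak hours; parse and map follow fetch.
stella db jobs run source:ghsa:fetch --and-then source:ghsa:parse --and-then source:ghsa:map
```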
## 6. Runbook steps when throttled
1. Check `ghsa.ratelimit.exhausted` for the affected phase (`list` vs `detail`).
2. Confirm the connector is delaying: the logs will show `GHSA rate limit exhausted...` with the chosen backoff.
3. If rate limits stay exhausted:
   - Verify no other jobs are sharing the PAT.
   - Temporarily reduce `MaxPagesPerFetch` or `PageSize` to shrink burst size.
   - Consider provisioning a dedicated PAT (GHSA permissions only) for Concelier.
4. After the quota resets, return `rateLimitWarningThreshold`/`requestDelay` to their normal values and monitor the histograms for at least one hour.

## 7. Alert integration quick reference
- Prometheus: `ghsa_ratelimit_remaining_bucket` (from the histogram) – use `histogram_quantile(0.99, ...)` to trend capacity.
- VictoriaMetrics: `LAST_over_time(ghsa_ratelimit_remaining_sum[5m])` for simple last-value graphs.
- Grafana: stack remaining + used to visualise the total limit per resource.

## 8. Canonical metric fallback analytics
When GitHub omits CVSS vectors/scores, the connector now assigns a deterministic canonical metric id of the form `ghsa:severity/<level>` and publishes it to Merge so severity precedence still resolves against GHSA even without CVSS data.

- Metric: `ghsa.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `reason=no_cvss`.
- Monitor the counter alongside Merge parity checks; a sudden spike suggests GitHub is shipping advisories without vectors and warrants cross-checking downstream exporters.
- Because the canonical id feeds Merge, parity dashboards should overlay this metric to confirm fallback advisories continue to merge ahead of downstream sources when GHSA supplies more recent data.
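
For that parity overlay, a single query is usually enough; this assumes the counter is exported to Prometheus as `ghsa_map_canonical_metric_fallbacks_total`:

```
sum by (severity) (increase(ghsa_map_canonical_metric_fallbacks_total[1h]))
```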
@@ -1,122 +1,122 @@
# Feedser CISA ICS Connector Operations
 | 
			
		||||
 | 
			
		||||
This runbook documents how to provision, rotate, and validate credentials for the CISA Industrial Control Systems (ICS) connector (`source:ics-cisa:*`). Follow it before enabling the connector in staging or offline installations.
 | 
			
		||||
 | 
			
		||||
## 1. Credential Provisioning
 | 
			
		||||
 | 
			
		||||
1. **Create a service mailbox** reachable by the Ops crew (shared mailbox recommended).  
 | 
			
		||||
2. Browse to `https://public.govdelivery.com/accounts/USDHSCISA/subscriber/new` and subscribe the mailbox to the following GovDelivery topics:
 | 
			
		||||
   - `USDHSCISA_16` — ICS-CERT advisories (legacy numbering: `ICSA-YY-###`).
 | 
			
		||||
   - `USDHSCISA_19` — ICS medical advisories (`ICSMA-YY-###`).
 | 
			
		||||
   - `USDHSCISA_17` — ICS alerts (`IR-ALERT-YY-###`) for completeness.
 | 
			
		||||
3. Complete the verification email. After confirmation, note the **personalised subscription code** included in the “Manage Preferences” link. It has the shape `code=AB12CD34EF`.
 | 
			
		||||
4. Store the code in the shared secret vault (or Offline Kit secrets bundle) as `feedser/sources/icscisa/govdelivery/code`.
 | 
			
		||||
 | 
			
		||||
> ℹ️  GovDelivery does not expose a one-time API key; the personalised code is what authenticates the RSS pull. Never commit it to git.
 | 
			
		||||
 | 
			
		||||
## 2. Feed Validation
 | 
			
		||||
 | 
			
		||||
Use the following command to confirm the feed is reachable before wiring it into Feedser (substitute `<CODE>` with the personalised value):
 | 
			
		||||
 | 
			
		||||
```bash
 | 
			
		||||
curl -H "User-Agent: StellaOpsFeedser/ics-cisa" \
 | 
			
		||||
     "https://content.govdelivery.com/accounts/USDHSCISA/topics/ICS-CERT/feed.rss?format=xml&code=<CODE>"
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
If the endpoint returns HTTP 200 and an RSS payload, record the sample response under `docs/artifacts/icscisa/` (see Task `FEEDCONN-ICSCISA-02-007`). HTTP 403 or 406 usually means the subscription was not confirmed or the code was mistyped.
 | 
			
		||||
 | 
			
		||||
## 3. Configuration Snippet
 | 
			
		||||
 | 
			
		||||
Add the connector configuration to `feedser.yaml` (or equivalent environment variables):
 | 
			
		||||
 | 
			
		||||
```yaml
 | 
			
		||||
feedser:
 | 
			
		||||
  sources:
 | 
			
		||||
    icscisa:
 | 
			
		||||
      govDelivery:
 | 
			
		||||
        code: "${FEEDSER_ICS_CISA_GOVDELIVERY_CODE}"
 | 
			
		||||
        topics:
 | 
			
		||||
          - "USDHSCISA_16"
 | 
			
		||||
          - "USDHSCISA_19"
 | 
			
		||||
          - "USDHSCISA_17"
 | 
			
		||||
      rssBaseUri: "https://content.govdelivery.com/accounts/USDHSCISA"
 | 
			
		||||
      requestDelay: "00:00:01"
 | 
			
		||||
      failureBackoff: "00:05:00"
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Environment variable example:
 | 
			
		||||
 | 
			
		||||
```bash
 | 
			
		||||
export FEEDSER_SOURCES_ICSCISA_GOVDELIVERY_CODE="AB12CD34EF"
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Feedser automatically register the host with the Source.Common HTTP allow-list when the connector assembly is loaded.
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
Optional tuning keys (set only when needed):
 | 
			
		||||
 | 
			
		||||
- `proxyUri` — HTTP/HTTPS proxy URL used when Akamai blocks direct pulls.
 | 
			
		||||
- `requestVersion` / `requestVersionPolicy` — override HTTP negotiation when the proxy requires HTTP/1.1.
 | 
			
		||||
- `enableDetailScrape` — toggle HTML detail fallback (defaults to true).
 | 
			
		||||
- `captureAttachments` — collect PDF attachments from detail pages (defaults to true).
 | 
			
		||||
- `detailBaseUri` — alternate host for detail enrichment if CISA changes their layout.
 | 
			
		||||
 | 
			
		||||
## 4. Seeding Without GovDelivery
 | 
			
		||||
 | 
			
		||||
If credentials are still pending, populate the connector with the community CSV dataset before enabling the live fetch:
 | 
			
		||||
 | 
			
		||||
1. Run `./scripts/fetch-ics-cisa-seed.sh` (or `.ps1`) to download the latest `CISA_ICS_ADV_*.csv` files into `seed-data/ics-cisa/`.
 | 
			
		||||
2. Copy the CSVs (and the generated `.sha256` files) into your Offline Kit staging area so they ship alongside the other feeds.
 | 
			
		||||
3. Import the kit as usual. The connector can parse the seed data for historical context, but **live GovDelivery credentials are still required** for fresh advisories.
 | 
			
		||||
4. Once credentials arrive, update `feedser:sources:icscisa:govDelivery:code` and re-trigger `source:ics-cisa:fetch` so the connector switches to the authorised feed.
 | 
			
		||||
 | 
			
		||||
> The CSVs are licensed under ODbL 1.0 by the ICS Advisory Project. Preserve the attribution when redistributing them.
 | 
			
		||||
 | 
			
		||||
## 4. Integration Validation
 | 
			
		||||
 | 
			
		||||
1. Ensure secrets are in place and restart the Feedser workers.
 | 
			
		||||
2. Run a dry-run fetch/parse/map chain against an Akamai-protected topic:
 | 
			
		||||
   ```bash
 | 
			
		||||
   FEEDSER_SOURCES_ICSCISA_GOVDELIVERY_CODE=... \ 
 | 
			
		||||
   FEEDSER_SOURCES_ICSCISA_ENABLEDETAILSCRAPE=1 \ 
 | 
			
		||||
   stella db jobs run source:ics-cisa:fetch --and-then source:ics-cisa:parse --and-then source:ics-cisa:map
 | 
			
		||||
   ```
 | 
			
		||||
3. Confirm logs contain `ics-cisa detail fetch` entries and that new documents/DTOs include attachments (see `docs/artifacts/icscisa`). Canonical advisories should expose PDF links as `references.kind == "attachment"` and affected packages should surface `primitives.semVer.exactValue` for single-version hits.
 | 
			
		||||
4. If Akamai blocks direct fetches, set `feedser:sources:icscisa:proxyUri` to your allow-listed egress proxy and rerun the dry-run.
 | 
			
		||||
 | 
			
		||||
## 4. Rotation & Incident Response
 | 
			
		||||
 | 
			
		||||
- Review GovDelivery access quarterly. Rotate the personalised code whenever Ops changes the service mailbox password or membership.  
 | 
			
		||||
- Revoking the subscription in GovDelivery invalidates the code immediately; update the vault and configuration in the same change.  
 | 
			
		||||
- If the code leaks, remove the subscription (`https://public.govdelivery.com/accounts/USDHSCISA/subscriber/manage_preferences?code=<CODE>`), resubscribe, and distribute the new value via the vault.
 | 
			
		||||
 | 
			
		||||
## 5. Offline Kit Handling
 | 
			
		||||
 | 
			
		||||
Include the personalised code in `offline-kit/secrets/feedser/icscisa.env`:
 | 
			
		||||
 | 
			
		||||
```
 | 
			
		||||
FEEDSER_SOURCES_ICSCISA_GOVDELIVERY_CODE=AB12CD34EF
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
The Offline Kit deployment script copies this file into the container secret directory mounted at `/run/secrets/feedser`. Ensure permissions are `600` and ownership matches the Feedser runtime user.
 | 
			
		||||
 | 
			
		||||
## 6. Telemetry & Monitoring
 | 
			
		||||
 | 
			
		||||
The connector emits metrics under the meter `StellaOps.Feedser.Source.Ics.Cisa`. They allow operators to track Akamai fallbacks, detail enrichment health, and advisory fan-out.
 | 
			
		||||
 | 
			
		||||
- `icscisa.fetch.*` – counters for `attempts`, `success`, `failures`, `not_modified`, and `fallbacks`, plus histogram `icscisa.fetch.documents` showing documents added per topic pull (tags: `feedser.source`, `icscisa.topic`).
 | 
			
		||||
- `icscisa.parse.*` – counters for `success`/`failures` and histograms `icscisa.parse.advisories`, `icscisa.parse.attachments`, `icscisa.parse.detail_fetches` to monitor enrichment workload per feed document.
 | 
			
		||||
- `icscisa.detail.*` – counters `success` / `failures` per advisory (tagged with `icscisa.advisory`) to alert when Akamai blocks detail pages.
 | 
			
		||||
- `icscisa.map.*` – counters for `success`/`failures` and histograms `icscisa.map.references`, `icscisa.map.packages`, `icscisa.map.aliases` capturing canonical fan-out.
 | 
			
		||||
 | 
			
		||||
Suggested alerts:
 | 
			
		||||
 | 
			
		||||
- `increase(icscisa.fetch.failures_total[15m]) > 0` or `increase(icscisa.fetch.fallbacks_total[15m]) > 5` — sustained Akamai or proxy issues.
 | 
			
		||||
- `increase(icscisa.detail.failures_total[30m]) > 0` — detail enrichment breaking (potential HTML layout change).
 | 
			
		||||
- `histogram_quantile(0.95, rate(icscisa.map.references_bucket[1h]))` trending sharply higher — sudden advisory reference explosion worth investigating.
 | 
			
		||||
- Keep an eye on shared HTTP metrics (`feedser.source.http.*{feedser.source="ics-cisa"}`) for request latency and retry patterns.
 | 
			
		||||
 | 
			
		||||
## 9. Related Tasks
 | 
			
		||||
 | 
			
		||||
- `FEEDCONN-ICSCISA-02-009` (GovDelivery credential onboarding) — completed once this runbook is followed and secrets are placed in the vault.
 | 
			
		||||
- `FEEDCONN-ICSCISA-02-007` (document inventory) — archive the first successful RSS response and any attachment URL schema under `docs/artifacts/icscisa/`.
 | 
			
		||||
# Concelier CISA ICS Connector Operations
 | 
			
		||||
 | 
			
		||||
This runbook documents how to provision, rotate, and validate credentials for the CISA Industrial Control Systems (ICS) connector (`source:ics-cisa:*`). Follow it before enabling the connector in staging or offline installations.
 | 
			
		||||
 | 
			
		||||
## 1. Credential Provisioning
 | 
			
		||||
 | 
			
		||||
1. **Create a service mailbox** reachable by the Ops crew (shared mailbox recommended).  
 | 
			
		||||
2. Browse to `https://public.govdelivery.com/accounts/USDHSCISA/subscriber/new` and subscribe the mailbox to the following GovDelivery topics:
 | 
			
		||||
   - `USDHSCISA_16` — ICS-CERT advisories (legacy numbering: `ICSA-YY-###`).
 | 
			
		||||
   - `USDHSCISA_19` — ICS medical advisories (`ICSMA-YY-###`).
 | 
			
		||||
   - `USDHSCISA_17` — ICS alerts (`IR-ALERT-YY-###`) for completeness.
 | 
			
		||||
3. Complete the verification email. After confirmation, note the **personalised subscription code** included in the “Manage Preferences” link. It has the shape `code=AB12CD34EF`.
 | 
			
		||||
4. Store the code in the shared secret vault (or Offline Kit secrets bundle) as `concelier/sources/icscisa/govdelivery/code`.
 | 
			
		||||
 | 
			
		||||
> ℹ️  GovDelivery does not expose a one-time API key; the personalised code is what authenticates the RSS pull. Never commit it to git.
 | 
			
		||||
 | 
			
		||||
## 2. Feed Validation
 | 
			
		||||
 | 
			
		||||
Use the following command to confirm the feed is reachable before wiring it into Concelier (substitute `<CODE>` with the personalised value):
 | 
			
		||||
 | 
			
		||||
```bash
 | 
			
		||||
curl -H "User-Agent: StellaOpsConcelier/ics-cisa" \
 | 
			
		||||
     "https://content.govdelivery.com/accounts/USDHSCISA/topics/ICS-CERT/feed.rss?format=xml&code=<CODE>"
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
If the endpoint returns HTTP 200 and an RSS payload, record the sample response under `docs/artifacts/icscisa/` (see Task `FEEDCONN-ICSCISA-02-007`). HTTP 403 or 406 usually means the subscription was not confirmed or the code was mistyped.
 | 
			
		||||
 | 
			
		||||
## 3. Configuration Snippet
 | 
			
		||||
 | 
			
		||||
Add the connector configuration to `concelier.yaml` (or equivalent environment variables):
 | 
			
		||||
 | 
			
		||||
```yaml
 | 
			
		||||
concelier:
 | 
			
		||||
  sources:
 | 
			
		||||
    icscisa:
 | 
			
		||||
      govDelivery:
 | 
			
		||||
        code: "${CONCELIER_ICS_CISA_GOVDELIVERY_CODE}"
 | 
			
		||||
        topics:
 | 
			
		||||
          - "USDHSCISA_16"
 | 
			
		||||
          - "USDHSCISA_19"
 | 
			
		||||
          - "USDHSCISA_17"
 | 
			
		||||
      rssBaseUri: "https://content.govdelivery.com/accounts/USDHSCISA"
 | 
			
		||||
      requestDelay: "00:00:01"
 | 
			
		||||
      failureBackoff: "00:05:00"
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Environment variable example:
 | 
			
		||||
 | 
			
		||||
```bash
 | 
			
		||||
export CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE="AB12CD34EF"
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Concelier automatically registers the host with the Source.Common HTTP allow-list when the connector assembly is loaded.
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
Optional tuning keys (set only when needed; an example snippet follows the list):
 | 
			
		||||
 | 
			
		||||
- `proxyUri` — HTTP/HTTPS proxy URL used when Akamai blocks direct pulls.
 | 
			
		||||
- `requestVersion` / `requestVersionPolicy` — override HTTP negotiation when the proxy requires HTTP/1.1.
 | 
			
		||||
- `enableDetailScrape` — toggle HTML detail fallback (defaults to true).
 | 
			
		||||
- `captureAttachments` — collect PDF attachments from detail pages (defaults to true).
 | 
			
		||||
- `detailBaseUri` — alternate host for detail enrichment if CISA changes their layout.
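
For reference, a hedged sketch of how these optional keys could sit under the connector block; every value shown (proxy host, HTTP version strings, detail host) is illustrative rather than prescriptive:

```yaml
concelier:
  sources:
    icscisa:
      proxyUri: "http://egress-proxy.internal:3128"   # placeholder allow-listed proxy
      requestVersion: "1.1"                           # force HTTP/1.1 through the proxy
      requestVersionPolicy: "RequestVersionExact"
      enableDetailScrape: true                        # default
      captureAttachments: true                        # default
      detailBaseUri: "https://www.cisa.gov"           # only if CISA relocates detail pages
```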
 | 
			
		||||
 | 
			
		||||
## 4. Seeding Without GovDelivery
 | 
			
		||||
 | 
			
		||||
If credentials are still pending, populate the connector with the community CSV dataset before enabling the live fetch:
 | 
			
		||||
 | 
			
		||||
1. Run `./scripts/fetch-ics-cisa-seed.sh` (or `.ps1`) to download the latest `CISA_ICS_ADV_*.csv` files into `seed-data/ics-cisa/`.
 | 
			
		||||
2. Copy the CSVs (and the generated `.sha256` files) into your Offline Kit staging area so they ship alongside the other feeds (a staging sketch follows this list).
 | 
			
		||||
3. Import the kit as usual. The connector can parse the seed data for historical context, but **live GovDelivery credentials are still required** for fresh advisories.
 | 
			
		||||
4. Once credentials arrive, update `concelier:sources:icscisa:govDelivery:code` and re-trigger `source:ics-cisa:fetch` so the connector switches to the authorised feed.
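
A minimal staging sketch for step 2, assuming the kit staging area lives under `offline-kit/` (the exact path is an assumption) and that each `.sha256` file references its CSV by name:

```bash
# Copy the seed CSVs plus their digests into the kit staging area (path is illustrative).
OFFLINE_KIT_STAGING="offline-kit/feeds/ics-cisa"
mkdir -p "${OFFLINE_KIT_STAGING}"
cp seed-data/ics-cisa/CISA_ICS_ADV_*.csv "${OFFLINE_KIT_STAGING}/"
cp seed-data/ics-cisa/*.sha256 "${OFFLINE_KIT_STAGING}/"

# Re-verify the digests after copying so the kit ships known-good artefacts.
(cd "${OFFLINE_KIT_STAGING}" && sha256sum -c ./*.sha256)
```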
 | 
			
		||||
 | 
			
		||||
> The CSVs are licensed under ODbL 1.0 by the ICS Advisory Project. Preserve the attribution when redistributing them.
 | 
			
		||||
 | 
			
		||||
## 5. Integration Validation
 | 
			
		||||
 | 
			
		||||
1. Ensure secrets are in place and restart the Concelier workers.
 | 
			
		||||
2. Run a dry-run fetch/parse/map chain against an Akamai-protected topic:
 | 
			
		||||
   ```bash
 | 
			
		||||
   CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=... \
 | 
			
		||||
   CONCELIER_SOURCES_ICSCISA_ENABLEDETAILSCRAPE=1 \
 | 
			
		||||
   stella db jobs run source:ics-cisa:fetch --and-then source:ics-cisa:parse --and-then source:ics-cisa:map
 | 
			
		||||
   ```
 | 
			
		||||
3. Confirm logs contain `ics-cisa detail fetch` entries and that new documents/DTOs include attachments (see `docs/artifacts/icscisa`). Canonical advisories should expose PDF links as `references.kind == "attachment"` and affected packages should surface `primitives.semVer.exactValue` for single-version hits.
 | 
			
		||||
4. If Akamai blocks direct fetches, set `concelier:sources:icscisa:proxyUri` to your allow-listed egress proxy and rerun the dry-run.
 | 
			
		||||
 | 
			
		||||
## 6. Rotation & Incident Response
 | 
			
		||||
 | 
			
		||||
- Review GovDelivery access quarterly. Rotate the personalised code whenever Ops changes the service mailbox password or membership.  
 | 
			
		||||
- Revoking the subscription in GovDelivery invalidates the code immediately; update the vault and configuration in the same change.  
 | 
			
		||||
- If the code leaks, remove the subscription (`https://public.govdelivery.com/accounts/USDHSCISA/subscriber/manage_preferences?code=<CODE>`), resubscribe, and distribute the new value via the vault.
 | 
			
		||||
 | 
			
		||||
## 7. Offline Kit Handling
 | 
			
		||||
 | 
			
		||||
Include the personalised code in `offline-kit/secrets/concelier/icscisa.env`:
 | 
			
		||||
 | 
			
		||||
```
 | 
			
		||||
CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=AB12CD34EF
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
The Offline Kit deployment script copies this file into the container secret directory mounted at `/run/secrets/concelier`. Ensure permissions are `600` and ownership matches the Concelier runtime user.
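
A short hardening sketch; the `concelier` user and group names are assumptions, so substitute whatever account your deployment runs the service under:

```bash
# Tighten the secret file after the deployment script has copied it into place.
chmod 600 /run/secrets/concelier/icscisa.env
chown concelier:concelier /run/secrets/concelier/icscisa.env   # assumed runtime account

# Quick verification of mode and ownership.
stat -c '%a %U:%G %n' /run/secrets/concelier/icscisa.env
```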
 | 
			
		||||
 | 
			
		||||
## 8. Telemetry & Monitoring
 | 
			
		||||
 | 
			
		||||
The connector emits metrics under the meter `StellaOps.Concelier.Connector.Ics.Cisa`. They allow operators to track Akamai fallbacks, detail enrichment health, and advisory fan-out.
 | 
			
		||||
 | 
			
		||||
- `icscisa.fetch.*` – counters for `attempts`, `success`, `failures`, `not_modified`, and `fallbacks`, plus histogram `icscisa.fetch.documents` showing documents added per topic pull (tags: `concelier.source`, `icscisa.topic`).
 | 
			
		||||
- `icscisa.parse.*` – counters for `success`/`failures` and histograms `icscisa.parse.advisories`, `icscisa.parse.attachments`, `icscisa.parse.detail_fetches` to monitor enrichment workload per feed document.
 | 
			
		||||
- `icscisa.detail.*` – counters `success` / `failures` per advisory (tagged with `icscisa.advisory`) to alert when Akamai blocks detail pages.
 | 
			
		||||
- `icscisa.map.*` – counters for `success`/`failures` and histograms `icscisa.map.references`, `icscisa.map.packages`, `icscisa.map.aliases` capturing canonical fan-out.
 | 
			
		||||
 | 
			
		||||
Suggested alerts (a Prometheus rules sketch follows the list):
 | 
			
		||||
 | 
			
		||||
- `increase(icscisa.fetch.failures_total[15m]) > 0` or `increase(icscisa.fetch.fallbacks_total[15m]) > 5` — sustained Akamai or proxy issues.
 | 
			
		||||
- `increase(icscisa.detail.failures_total[30m]) > 0` — detail enrichment breaking (potential HTML layout change).
 | 
			
		||||
- `histogram_quantile(0.95, rate(icscisa.map.references_bucket[1h]))` trending sharply higher — sudden advisory reference explosion worth investigating.
 | 
			
		||||
- Keep an eye on shared HTTP metrics (`concelier.source.http.*{concelier.source="ics-cisa"}`) for request latency and retry patterns.
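
Those thresholds could be captured in an alerting rules file along these lines. The expressions are copied from the list above; adjust the metric names to the exact series your OTEL exporter publishes:

```yaml
groups:
  - name: concelier-icscisa
    rules:
      - alert: IcsCisaFetchDegraded
        expr: increase(icscisa.fetch.failures_total[15m]) > 0 or increase(icscisa.fetch.fallbacks_total[15m]) > 5
        labels:
          severity: warning
        annotations:
          summary: "ICS-CISA fetch failing or falling back (Akamai/proxy issues)"
      - alert: IcsCisaDetailEnrichmentBroken
        expr: increase(icscisa.detail.failures_total[30m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "ICS-CISA detail enrichment failing (possible HTML layout change)"
```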
 | 
			
		||||
 | 
			
		||||
## 9. Related Tasks
 | 
			
		||||
 | 
			
		||||
- `FEEDCONN-ICSCISA-02-009` (GovDelivery credential onboarding) — completed once this runbook is followed and secrets are placed in the vault.
 | 
			
		||||
- `FEEDCONN-ICSCISA-02-007` (document inventory) — archive the first successful RSS response and any attachment URL schema under `docs/artifacts/icscisa/`.
 | 
			
		||||
@@ -1,74 +1,74 @@
 | 
			
		||||
# Feedser KISA Connector Operations
 | 
			
		||||
 | 
			
		||||
Operational guidance for the Korea Internet & Security Agency (KISA / KNVD) connector (`source:kisa:*`). Pair this with the engineering brief in `docs/dev/kisa_connector_notes.md`.
 | 
			
		||||
 | 
			
		||||
## 1. Prerequisites
 | 
			
		||||
 | 
			
		||||
- Outbound HTTPS (or mirrored cache) for `https://knvd.krcert.or.kr/`.
 | 
			
		||||
- Connector options defined under `feedser:sources:kisa`:
 | 
			
		||||
 | 
			
		||||
```yaml
 | 
			
		||||
feedser:
 | 
			
		||||
  sources:
 | 
			
		||||
    kisa:
 | 
			
		||||
      feedUri: "https://knvd.krcert.or.kr/rss/securityInfo.do"
 | 
			
		||||
      detailApiUri: "https://knvd.krcert.or.kr/rssDetailData.do"
 | 
			
		||||
      detailPageUri: "https://knvd.krcert.or.kr/detailDos.do"
 | 
			
		||||
      maxAdvisoriesPerFetch: 10
 | 
			
		||||
      requestDelay: "00:00:01"
 | 
			
		||||
      failureBackoff: "00:05:00"
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
> Ensure the URIs stay absolute—Feedser adds the `feedUri`/`detailApiUri` hosts to the HttpClient allow-list automatically.
 | 
			
		||||
 | 
			
		||||
## 2. Staging Smoke Test
 | 
			
		||||
 | 
			
		||||
1. Restart the Feedser workers so the KISA options bind.
 | 
			
		||||
2. Run a full connector cycle:
 | 
			
		||||
   - CLI: `stella db jobs run source:kisa:fetch --and-then source:kisa:parse --and-then source:kisa:map`
 | 
			
		||||
   - REST: `POST /jobs/run { "kind": "source:kisa:fetch", "chain": ["source:kisa:parse", "source:kisa:map"] }`
 | 
			
		||||
3. Confirm telemetry (Meter `StellaOps.Feedser.Source.Kisa`):
 | 
			
		||||
   - `kisa.feed.success`, `kisa.feed.items`
 | 
			
		||||
   - `kisa.detail.success` / `.failures`
 | 
			
		||||
   - `kisa.parse.success` / `.failures`
 | 
			
		||||
   - `kisa.map.success` / `.failures`
 | 
			
		||||
   - `kisa.cursor.updates`
 | 
			
		||||
4. Inspect logs for structured entries:
 | 
			
		||||
   - `KISA feed returned {ItemCount}`
 | 
			
		||||
   - `KISA fetched detail for {Idx} … category={Category}`
 | 
			
		||||
   - `KISA mapped advisory {AdvisoryId} (severity={Severity})`
 | 
			
		||||
   - Absence of warnings such as `document missing GridFS payload`.
 | 
			
		||||
5. Validate MongoDB state:
 | 
			
		||||
   - `raw_documents.metadata` has `kisa.idx`, `kisa.category`, `kisa.title`.
 | 
			
		||||
   - DTO store contains `schemaVersion="kisa.detail.v1"`.
 | 
			
		||||
   - Advisories include aliases (`IDX`, CVE) and `language="ko"`.
 | 
			
		||||
   - `source_states` entry for `kisa` shows recent `cursor.lastFetchAt`.
 | 
			
		||||
 | 
			
		||||
## 3. Production Monitoring
 | 
			
		||||
 | 
			
		||||
- **Dashboards** – Add the following Prometheus/OTEL expressions:
 | 
			
		||||
  - `rate(kisa_feed_items_total[15m])` versus `rate(feedser_source_http_requests_total{feedser_source="kisa"}[15m])`
 | 
			
		||||
  - `increase(kisa_detail_failures_total{reason!="empty-document"}[1h])` alert at `>0`
 | 
			
		||||
  - `increase(kisa_parse_failures_total[1h])` for storage/JSON issues
 | 
			
		||||
  - `increase(kisa_map_failures_total[1h])` to flag schema drift
 | 
			
		||||
  - `increase(kisa_cursor_updates_total[6h]) == 0` during active windows → warn
 | 
			
		||||
- **Alerts** – Page when `rate(kisa_feed_success_total[2h]) == 0` while other connectors are active; back off for maintenance windows announced on `https://knvd.krcert.or.kr/`.
 | 
			
		||||
- **Logs** – Watch for repeated warnings (`document missing`, `DTO missing`) or errors with reason tags `HttpRequestException`, `download`, `parse`, `map`.
 | 
			
		||||
 | 
			
		||||
## 4. Localisation Handling
 | 
			
		||||
 | 
			
		||||
- Hangul categories (for example `취약점정보`) flow into telemetry tags (`category=…`) and logs. Dashboards must render UTF‑8 and avoid transliteration.
 | 
			
		||||
- HTML content is sanitised before storage; translation teams can consume the `ContentHtml` field safely.
 | 
			
		||||
- Advisory severity remains as provided by KISA (`High`, `Medium`, etc.). Map-level failures include the severity tag for filtering.
 | 
			
		||||
 | 
			
		||||
## 5. Fixture & Regression Maintenance
 | 
			
		||||
 | 
			
		||||
- Regression fixtures: `src/StellaOps.Feedser.Source.Kisa.Tests/Fixtures/kisa-feed.xml` and `kisa-detail.json`.
 | 
			
		||||
- Refresh via `UPDATE_KISA_FIXTURES=1 dotnet test src/StellaOps.Feedser.Source.Kisa.Tests/StellaOps.Feedser.Source.Kisa.Tests.csproj`.
 | 
			
		||||
- The telemetry regression (`KisaConnectorTests.Telemetry_RecordsMetrics`) will fail if counters/log wiring drifts—treat failures as gating.
 | 
			
		||||
 | 
			
		||||
## 6. Known Issues
 | 
			
		||||
 | 
			
		||||
- RSS feeds only expose the latest 10 advisories; long outages require replay via archived feeds or manual IDX seeds.
 | 
			
		||||
- Detail endpoint occasionally throttles; the connector honours `requestDelay` and reports failures with reason `HttpRequestException`. Consider increasing delay for weekend backfills.
 | 
			
		||||
- If `kisa.category` tags suddenly appear as `unknown`, verify KISA has not renamed RSS elements; update the parser fixtures before production rollout.
 | 
			
		||||
# Concelier KISA Connector Operations
 | 
			
		||||
 | 
			
		||||
Operational guidance for the Korea Internet & Security Agency (KISA / KNVD) connector (`source:kisa:*`). Pair this with the engineering brief in `docs/dev/kisa_connector_notes.md`.
 | 
			
		||||
 | 
			
		||||
## 1. Prerequisites
 | 
			
		||||
 | 
			
		||||
- Outbound HTTPS (or mirrored cache) for `https://knvd.krcert.or.kr/`.
 | 
			
		||||
- Connector options defined under `concelier:sources:kisa`:
 | 
			
		||||
 | 
			
		||||
```yaml
 | 
			
		||||
concelier:
 | 
			
		||||
  sources:
 | 
			
		||||
    kisa:
 | 
			
		||||
      feedUri: "https://knvd.krcert.or.kr/rss/securityInfo.do"
 | 
			
		||||
      detailApiUri: "https://knvd.krcert.or.kr/rssDetailData.do"
 | 
			
		||||
      detailPageUri: "https://knvd.krcert.or.kr/detailDos.do"
 | 
			
		||||
      maxAdvisoriesPerFetch: 10
 | 
			
		||||
      requestDelay: "00:00:01"
 | 
			
		||||
      failureBackoff: "00:05:00"
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
> Ensure the URIs stay absolute—Concelier adds the `feedUri`/`detailApiUri` hosts to the HttpClient allow-list automatically.
 | 
			
		||||
 | 
			
		||||
## 2. Staging Smoke Test
 | 
			
		||||
 | 
			
		||||
1. Restart the Concelier workers so the KISA options bind.
 | 
			
		||||
2. Run a full connector cycle:
 | 
			
		||||
   - CLI: `stella db jobs run source:kisa:fetch --and-then source:kisa:parse --and-then source:kisa:map`
 | 
			
		||||
   - REST: `POST /jobs/run { "kind": "source:kisa:fetch", "chain": ["source:kisa:parse", "source:kisa:map"] }`
 | 
			
		||||
3. Confirm telemetry (Meter `StellaOps.Concelier.Connector.Kisa`):
 | 
			
		||||
   - `kisa.feed.success`, `kisa.feed.items`
 | 
			
		||||
   - `kisa.detail.success` / `.failures`
 | 
			
		||||
   - `kisa.parse.success` / `.failures`
 | 
			
		||||
   - `kisa.map.success` / `.failures`
 | 
			
		||||
   - `kisa.cursor.updates`
 | 
			
		||||
4. Inspect logs for structured entries:
 | 
			
		||||
   - `KISA feed returned {ItemCount}`
 | 
			
		||||
   - `KISA fetched detail for {Idx} … category={Category}`
 | 
			
		||||
   - `KISA mapped advisory {AdvisoryId} (severity={Severity})`
 | 
			
		||||
   - Absence of warnings such as `document missing GridFS payload`.
 | 
			
		||||
5. Validate MongoDB state (a mongosh sketch follows this list):
 | 
			
		||||
   - `raw_documents.metadata` has `kisa.idx`, `kisa.category`, `kisa.title`.
 | 
			
		||||
   - DTO store contains `schemaVersion="kisa.detail.v1"`.
 | 
			
		||||
   - Advisories include aliases (`IDX`, CVE) and `language="ko"`.
 | 
			
		||||
   - `source_states` entry for `kisa` shows recent `cursor.lastFetchAt`.
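
A mongosh sketch for step 5; the database name `concelier` and the `source` field on `source_states` are assumptions, so adjust both to match your deployment:

```bash
mongosh concelier --quiet --eval '
  // Newest KISA raw document metadata (expects kisa.idx / kisa.category / kisa.title).
  const raw = db.raw_documents.findOne({ "metadata.kisa.idx": { $exists: true } });
  printjson(raw ? raw.metadata : "no KISA raw documents yet");

  // Connector cursor state (expects a recent cursor.lastFetchAt).
  printjson(db.source_states.findOne({ source: "kisa" }));
'
```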
 | 
			
		||||
 | 
			
		||||
## 3. Production Monitoring
 | 
			
		||||
 | 
			
		||||
- **Dashboards** – Add the following Prometheus/OTEL expressions:
 | 
			
		||||
  - `rate(kisa_feed_items_total[15m])` versus `rate(concelier_source_http_requests_total{concelier_source="kisa"}[15m])`
 | 
			
		||||
  - `increase(kisa_detail_failures_total{reason!="empty-document"}[1h])` alert at `>0`
 | 
			
		||||
  - `increase(kisa_parse_failures_total[1h])` for storage/JSON issues
 | 
			
		||||
  - `increase(kisa_map_failures_total[1h])` to flag schema drift
 | 
			
		||||
  - `increase(kisa_cursor_updates_total[6h]) == 0` during active windows → warn
 | 
			
		||||
- **Alerts** – Page when `rate(kisa_feed_success_total[2h]) == 0` while other connectors are active; back off for maintenance windows announced on `https://knvd.krcert.or.kr/`.
 | 
			
		||||
- **Logs** – Watch for repeated warnings (`document missing`, `DTO missing`) or errors with reason tags `HttpRequestException`, `download`, `parse`, `map`.
 | 
			
		||||
 | 
			
		||||
## 4. Localisation Handling
 | 
			
		||||
 | 
			
		||||
- Hangul categories (for example `취약점정보`) flow into telemetry tags (`category=…`) and logs. Dashboards must render UTF‑8 and avoid transliteration.
 | 
			
		||||
- HTML content is sanitised before storage; translation teams can consume the `ContentHtml` field safely.
 | 
			
		||||
- Advisory severity remains as provided by KISA (`High`, `Medium`, etc.). Map-level failures include the severity tag for filtering.
 | 
			
		||||
 | 
			
		||||
## 5. Fixture & Regression Maintenance
 | 
			
		||||
 | 
			
		||||
- Regression fixtures: `src/StellaOps.Concelier.Connector.Kisa.Tests/Fixtures/kisa-feed.xml` and `kisa-detail.json`.
 | 
			
		||||
- Refresh via `UPDATE_KISA_FIXTURES=1 dotnet test src/StellaOps.Concelier.Connector.Kisa.Tests/StellaOps.Concelier.Connector.Kisa.Tests.csproj`.
 | 
			
		||||
- The telemetry regression (`KisaConnectorTests.Telemetry_RecordsMetrics`) will fail if counters/log wiring drifts—treat failures as gating.
 | 
			
		||||
 | 
			
		||||
## 6. Known Issues
 | 
			
		||||
 | 
			
		||||
- RSS feeds only expose the latest 10 advisories; long outages require replay via archived feeds or manual IDX seeds.
 | 
			
		||||
- Detail endpoint occasionally throttles; the connector honours `requestDelay` and reports failures with reason `HttpRequestException`. Consider increasing delay for weekend backfills.
 | 
			
		||||
- If `kisa.category` tags suddenly appear as `unknown`, verify KISA has not renamed RSS elements; update the parser fixtures before production rollout.
 | 
			
		||||
							
								
								
									
docs/ops/concelier-mirror-operations.md (new file, 196 lines)
@@ -0,0 +1,196 @@
 | 
			
		||||
# Concelier & Excititor Mirror Operations
 | 
			
		||||
 | 
			
		||||
This runbook describes how Stella Ops operates the managed mirrors under `*.stella-ops.org`.
 | 
			
		||||
It covers Docker Compose and Helm deployment overlays, secret handling for multi-tenant
 | 
			
		||||
authn, CDN fronting, and the recurring sync pipeline that keeps mirror bundles current.
 | 
			
		||||
 | 
			
		||||
## 1. Prerequisites
 | 
			
		||||
 | 
			
		||||
- **Authority access** – client credentials (`client_id` + secret) authorised for
 | 
			
		||||
  `concelier.mirror.read` and `excititor.mirror.read` scopes. Secrets live outside git.
 | 
			
		||||
- **Signed TLS certificates** – wildcard or per-domain (`mirror-primary`, `mirror-community`).
 | 
			
		||||
  Store them under `deploy/compose/mirror-gateway/tls/` or in Kubernetes secrets.
 | 
			
		||||
- **Mirror gateway credentials** – Basic Auth htpasswd files per domain. Generate with
 | 
			
		||||
  `htpasswd -B`. Operators distribute credentials to downstream consumers.
 | 
			
		||||
- **Export artifact source** – read access to the canonical S3 buckets (or rsync share)
 | 
			
		||||
  that hold `concelier` JSON bundles and `excititor` VEX exports.
 | 
			
		||||
- **Persistent volumes** – storage for Concelier job metadata and mirror export trees.
 | 
			
		||||
  For Helm, provision PVCs (`concelier-mirror-jobs`, `concelier-mirror-exports`,
 | 
			
		||||
  `excititor-mirror-exports`, `mirror-mongo-data`, `mirror-minio-data`) before rollout.
 | 
			
		||||
 | 
			
		||||
## 2. Secret & certificate layout
 | 
			
		||||
 | 
			
		||||
### Docker Compose (`deploy/compose/docker-compose.mirror.yaml`)
 | 
			
		||||
 | 
			
		||||
- `deploy/compose/env/mirror.env.example` – copy to `.env` and adjust quotas or domain IDs.
 | 
			
		||||
- `deploy/compose/mirror-secrets/` – mount read-only into `/run/secrets`. Place:
 | 
			
		||||
  - `concelier-authority-client` – Authority client secret.
 | 
			
		||||
  - `excititor-authority-client` (optional) – reserve for future authn.
 | 
			
		||||
- `deploy/compose/mirror-gateway/tls/` – PEM-encoded cert/key pairs:
 | 
			
		||||
  - `mirror-primary.crt`, `mirror-primary.key`
 | 
			
		||||
  - `mirror-community.crt`, `mirror-community.key`
 | 
			
		||||
- `deploy/compose/mirror-gateway/secrets/` – htpasswd files:
 | 
			
		||||
  - `mirror-primary.htpasswd`
 | 
			
		||||
  - `mirror-community.htpasswd`
 | 
			
		||||
 | 
			
		||||
### Helm (`deploy/helm/stellaops/values-mirror.yaml`)
 | 
			
		||||
 | 
			
		||||
Create secrets in the target namespace:
 | 
			
		||||
 | 
			
		||||
```bash
 | 
			
		||||
kubectl create secret generic concelier-mirror-auth \
 | 
			
		||||
  --from-file=concelier-authority-client=concelier-authority-client
 | 
			
		||||
 | 
			
		||||
kubectl create secret generic excititor-mirror-auth \
 | 
			
		||||
  --from-file=excititor-authority-client=excititor-authority-client
 | 
			
		||||
 | 
			
		||||
kubectl create secret tls mirror-gateway-tls \
 | 
			
		||||
  --cert=mirror-primary.crt --key=mirror-primary.key
 | 
			
		||||
 | 
			
		||||
kubectl create secret generic mirror-gateway-htpasswd \
 | 
			
		||||
  --from-file=mirror-primary.htpasswd --from-file=mirror-community.htpasswd
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
> Keep Basic Auth lists short-lived (rotate quarterly) and document credential recipients.
 | 
			
		||||
 | 
			
		||||
## 3. Deployment
 | 
			
		||||
 | 
			
		||||
### 3.1 Docker Compose (edge mirrors, lab validation)
 | 
			
		||||
 | 
			
		||||
1. `cp deploy/compose/env/mirror.env.example deploy/compose/env/mirror.env`
 | 
			
		||||
2. Populate secrets/tls directories as described above.
 | 
			
		||||
3. Sync mirror bundles (see §4) into `deploy/compose/mirror-data/…` and ensure they are mounted
 | 
			
		||||
   on the host path backing the `concelier-exports` and `excititor-exports` volumes.
 | 
			
		||||
4. Run the profile validator: `deploy/tools/validate-profiles.sh`.
 | 
			
		||||
5. Launch: `docker compose --env-file env/mirror.env -f docker-compose.mirror.yaml up -d`.
 | 
			
		||||
 | 
			
		||||
### 3.2 Helm (production mirrors)
 | 
			
		||||
 | 
			
		||||
1. Provision PVCs sized for mirror bundles (baseline: 20 GiB per domain).
 | 
			
		||||
2. Create secrets/tls config maps (§2).
 | 
			
		||||
3. `helm upgrade --install mirror deploy/helm/stellaops -f deploy/helm/stellaops/values-mirror.yaml`.
 | 
			
		||||
4. Annotate the `stellaops-mirror-gateway` service with ingress/LoadBalancer metadata required by
 | 
			
		||||
   your CDN (e.g., AWS load balancer scheme internal + NLB idle timeout).
 | 
			
		||||
 | 
			
		||||
## 4. Artifact sync workflow
 | 
			
		||||
 | 
			
		||||
Mirrors never generate exports—they ingest signed bundles produced by the Concelier and Excititor
 | 
			
		||||
export jobs. Recommended sync pattern:
 | 
			
		||||
 | 
			
		||||
### 4.1 Compose host (systemd timer)
 | 
			
		||||
 | 
			
		||||
`/usr/local/bin/mirror-sync.sh`:
 | 
			
		||||
 | 
			
		||||
```bash
 | 
			
		||||
#!/usr/bin/env bash
 | 
			
		||||
set -euo pipefail
 | 
			
		||||
export AWS_ACCESS_KEY_ID=…
 | 
			
		||||
export AWS_SECRET_ACCESS_KEY=…
 | 
			
		||||
 | 
			
		||||
aws s3 sync s3://mirror-stellaops/concelier/latest \
 | 
			
		||||
  /opt/stellaops/mirror-data/concelier --delete --size-only
 | 
			
		||||
 | 
			
		||||
aws s3 sync s3://mirror-stellaops/excititor/latest \
 | 
			
		||||
  /opt/stellaops/mirror-data/excititor --delete --size-only
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Schedule with a systemd timer every 5 minutes. The Compose volumes mount `/opt/stellaops/mirror-data/*`
 | 
			
		||||
into the containers read-only, matching `CONCELIER__MIRROR__EXPORTROOT=/exports/json` and
 | 
			
		||||
`EXCITITOR__ARTIFACTS__FILESYSTEM__ROOT=/exports`.
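
One way to wire the timer, sketched with hypothetical unit names; adjust descriptions, paths, and intervals to local policy:

```bash
cat >/etc/systemd/system/mirror-sync.service <<'EOF'
[Unit]
Description=Sync Concelier/Excititor mirror bundles

[Service]
Type=oneshot
ExecStart=/usr/local/bin/mirror-sync.sh
EOF

cat >/etc/systemd/system/mirror-sync.timer <<'EOF'
[Unit]
Description=Run mirror-sync every 5 minutes

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target
EOF

systemctl daemon-reload
systemctl enable --now mirror-sync.timer
```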
 | 
			
		||||
 | 
			
		||||
### 4.2 Kubernetes (CronJob)
 | 
			
		||||
 | 
			
		||||
Create a CronJob running the AWS CLI (or rclone) in the same namespace, writing into the PVCs:
 | 
			
		||||
 | 
			
		||||
```yaml
 | 
			
		||||
apiVersion: batch/v1
 | 
			
		||||
kind: CronJob
 | 
			
		||||
metadata:
 | 
			
		||||
  name: mirror-sync
 | 
			
		||||
spec:
 | 
			
		||||
  schedule: "*/5 * * * *"
 | 
			
		||||
  jobTemplate:
 | 
			
		||||
    spec:
 | 
			
		||||
      template:
 | 
			
		||||
        spec:
 | 
			
		||||
          containers:
 | 
			
		||||
          - name: sync
 | 
			
		||||
            image: public.ecr.aws/aws-cli/aws-cli@sha256:5df5f52c29f5e3ba46d0ad9e0e3afc98701c4a0f879400b4c5f80d943b5fadea
 | 
			
		||||
            command:
 | 
			
		||||
              - /bin/sh
 | 
			
		||||
              - -c
 | 
			
		||||
              - >
 | 
			
		||||
                aws s3 sync s3://mirror-stellaops/concelier/latest /exports/concelier --delete --size-only &&
 | 
			
		||||
                aws s3 sync s3://mirror-stellaops/excititor/latest /exports/excititor --delete --size-only
 | 
			
		||||
            volumeMounts:
 | 
			
		||||
              - name: concelier-exports
 | 
			
		||||
                mountPath: /exports/concelier
 | 
			
		||||
              - name: excititor-exports
 | 
			
		||||
                mountPath: /exports/excititor
 | 
			
		||||
            envFrom:
 | 
			
		||||
              - secretRef:
 | 
			
		||||
                  name: mirror-sync-aws
 | 
			
		||||
          restartPolicy: OnFailure
 | 
			
		||||
          volumes:
 | 
			
		||||
            - name: concelier-exports
 | 
			
		||||
              persistentVolumeClaim:
 | 
			
		||||
                claimName: concelier-mirror-exports
 | 
			
		||||
            - name: excititor-exports
 | 
			
		||||
              persistentVolumeClaim:
 | 
			
		||||
                claimName: excititor-mirror-exports
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
## 5. CDN integration
 | 
			
		||||
 | 
			
		||||
1. Point the CDN origin at the mirror gateway (Compose host or Kubernetes LoadBalancer).
 | 
			
		||||
2. Honour the response headers emitted by the gateway and Concelier/Excititor:
 | 
			
		||||
   `Cache-Control: public, max-age=300, immutable` for mirror payloads.
 | 
			
		||||
3. Configure origin shields in the CDN to prevent cache stampedes. Recommended TTLs:
 | 
			
		||||
   - Index (`/concelier/exports/index.json`, `/excititor/mirror/*/index`) → 60 s.
 | 
			
		||||
   - Bundle/manifest payloads → 300 s.
 | 
			
		||||
4. Forward the `Authorization` header—Basic Auth terminates at the gateway.
 | 
			
		||||
5. Enforce per-domain rate limits at the CDN (matching gateway budgets) and enable logging
 | 
			
		||||
   to SIEM for anomaly detection.
 | 
			
		||||
 | 
			
		||||
## 6. Smoke tests
 | 
			
		||||
 | 
			
		||||
After each deployment or sync cycle:
 | 
			
		||||
 | 
			
		||||
```bash
 | 
			
		||||
# Index with Basic Auth
 | 
			
		||||
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/index.json | jq 'keys'
 | 
			
		||||
 | 
			
		||||
# Mirror manifest signature
 | 
			
		||||
curl -u $PRIMARY_CREDS -I https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/manifest.json
 | 
			
		||||
 | 
			
		||||
# Excititor consensus bundle metadata
 | 
			
		||||
curl -u $COMMUNITY_CREDS https://mirror-community.stella-ops.org/excititor/mirror/community/index \
 | 
			
		||||
  | jq '.exports[].exportKey'
 | 
			
		||||
 | 
			
		||||
# Signed bundle + detached JWS (spot check digests)
 | 
			
		||||
curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/bundle.json.jws \
 | 
			
		||||
  -o bundle.json.jws
 | 
			
		||||
cosign verify-blob --signature bundle.json.jws --key mirror-key.pub bundle.json
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Watch the gateway metrics (`nginx_vts` or access logs) for cache hits. In Kubernetes, `kubectl logs deploy/stellaops-mirror-gateway`
 | 
			
		||||
should show `X-Cache-Status: HIT/MISS`.
 | 
			
		||||
 | 
			
		||||
## 7. Maintenance & rotation
 | 
			
		||||
 | 
			
		||||
- **Bundle freshness** – alert if sync job lag exceeds 15 minutes or if `concelier` logs
 | 
			
		||||
  `Mirror export root is not configured`.
 | 
			
		||||
- **Secret rotation** – change Authority client secrets and Basic Auth credentials quarterly.
 | 
			
		||||
  Update the mounted secrets and restart deployments (`docker compose restart concelier` or
 | 
			
		||||
  `kubectl rollout restart deploy/stellaops-concelier`).
 | 
			
		||||
- **TLS renewal** – reissue certificates, place new files, and reload gateway (`docker compose exec mirror-gateway nginx -s reload`).
 | 
			
		||||
- **Quota tuning** – adjust per-domain `MAXDOWNLOADREQUESTSPERHOUR` in `.env` or values file.
 | 
			
		||||
  Align CDN rate limits and inform downstreams.
 | 
			
		||||
 | 
			
		||||
## 8. References
 | 
			
		||||
 | 
			
		||||
- Deployment profiles: `deploy/compose/docker-compose.mirror.yaml`,
 | 
			
		||||
  `deploy/helm/stellaops/values-mirror.yaml`
 | 
			
		||||
- Mirror architecture dossiers: `docs/ARCHITECTURE_CONCELIER.md`,
 | 
			
		||||
  `docs/ARCHITECTURE_EXCITITOR_MIRRORS.md`
 | 
			
		||||
- Export bundling: `docs/ARCHITECTURE_DEVOPS.md` §3, `docs/ARCHITECTURE_EXCITITOR.md` §7
 | 
			
		||||
@@ -1,86 +1,86 @@
 | 
			
		||||
# Feedser MSRC Connector – Azure AD Onboarding Brief
 | 
			
		||||
 | 
			
		||||
_Drafted: 2025-10-15_
 | 
			
		||||
 | 
			
		||||
## 1. App registration requirements
 | 
			
		||||
 | 
			
		||||
- **Tenant**: shared StellaOps production Azure AD.
 | 
			
		||||
- **Application type**: confidential client (web/API) issuing client credentials.
 | 
			
		||||
- **API permissions**: `api://api.msrc.microsoft.com/.default` (Application). Admin consent required once.
 | 
			
		||||
- **Token audience**: `https://api.msrc.microsoft.com/`.
 | 
			
		||||
- **Grant type**: client credentials. Feedser will request tokens via `POST https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token`.
 | 
			
		||||
 | 
			
		||||
## 2. Secret/credential policy
 | 
			
		||||
 | 
			
		||||
- Maintain two client secrets (primary + standby) rotating every 90 days.
 | 
			
		||||
- Store secrets in the Feedser secrets vault; Offline Kit deployments must mirror the secret payloads in their encrypted store.
 | 
			
		||||
- Record rotation cadence in Ops runbook and update Feedser configuration (`FEEDSER__SOURCES__VNDR__MSRC__CLIENTSECRET`) ahead of expiry.
 | 
			
		||||
 | 
			
		||||
## 3. Feedser configuration sample
 | 
			
		||||
 | 
			
		||||
```yaml
 | 
			
		||||
feedser:
 | 
			
		||||
  sources:
 | 
			
		||||
    vndr.msrc:
 | 
			
		||||
      tenantId: "<azure-tenant-guid>"
 | 
			
		||||
      clientId: "<app-registration-client-id>"
 | 
			
		||||
      clientSecret: "<pull from secret store>"
 | 
			
		||||
      apiVersion: "2024-08-01"
 | 
			
		||||
      locale: "en-US"
 | 
			
		||||
      requestDelay: "00:00:00.250"
 | 
			
		||||
      failureBackoff: "00:05:00"
 | 
			
		||||
      cursorOverlapMinutes: 10
 | 
			
		||||
      downloadCvrf: false  # set true to persist CVRF ZIP alongside JSON detail
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
## 4. CVRF artefacts
 | 
			
		||||
 | 
			
		||||
- The MSRC REST payload exposes `cvrfUrl` per advisory. Current connector persists the link as advisory metadata and reference; it does **not** download the ZIP by default.
 | 
			
		||||
- Ops should mirror CVRF ZIPs when preparing Offline Kits so air-gapped deployments can reconcile advisories without direct internet access.
 | 
			
		||||
- Once Offline Kit storage guidelines are finalised, extend the connector configuration with `downloadCvrf: true` to enable automatic attachment retrieval.
 | 
			
		||||
 | 
			
		||||
### 4.1 State seeding helper
 | 
			
		||||
 | 
			
		||||
Use `tools/SourceStateSeeder` to queue historical advisories (detail JSON + optional CVRF artefacts) for replay without manual Mongo edits. Example seed file:
 | 
			
		||||
 | 
			
		||||
```json
 | 
			
		||||
{
 | 
			
		||||
  "source": "vndr.msrc",
 | 
			
		||||
  "cursor": {
 | 
			
		||||
    "lastModifiedCursor": "2024-01-01T00:00:00Z"
 | 
			
		||||
  },
 | 
			
		||||
  "documents": [
 | 
			
		||||
    {
 | 
			
		||||
      "uri": "https://api.msrc.microsoft.com/sug/v2.0/vulnerability/ADV2024-0001",
 | 
			
		||||
      "contentFile": "./seeds/adv2024-0001.json",
 | 
			
		||||
      "contentType": "application/json",
 | 
			
		||||
      "metadata": { "msrc.vulnerabilityId": "ADV2024-0001" },
 | 
			
		||||
      "addToPendingDocuments": true
 | 
			
		||||
    },
 | 
			
		||||
    {
 | 
			
		||||
      "uri": "https://download.microsoft.com/msrc/2024/ADV2024-0001.cvrf.zip",
 | 
			
		||||
      "contentFile": "./seeds/adv2024-0001.cvrf.zip",
 | 
			
		||||
      "contentType": "application/zip",
 | 
			
		||||
      "status": "mapped",
 | 
			
		||||
      "addToPendingDocuments": false
 | 
			
		||||
    }
 | 
			
		||||
  ]
 | 
			
		||||
}
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Run the helper:
 | 
			
		||||
 | 
			
		||||
```bash
 | 
			
		||||
dotnet run --project tools/SourceStateSeeder -- \
 | 
			
		||||
  --connection-string "mongodb://localhost:27017" \
 | 
			
		||||
  --database feedser \
 | 
			
		||||
  --input seeds/msrc-backfill.json
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Any documents marked `addToPendingDocuments` will appear in the connector cursor; `DownloadCvrf` can remain disabled if the ZIP artefact is pre-seeded.
 | 
			
		||||
 | 
			
		||||
## 5. Outstanding items
 | 
			
		||||
 | 
			
		||||
- Ops to confirm tenant/app names and provide client credentials through the secure channel.
 | 
			
		||||
- Connector team monitors token cache health (already implemented); validate instrumentation once Ops supplies credentials.
 | 
			
		||||
- Offline Kit packaging: add encrypted blob containing client credentials with rotation instructions.
 | 
			
		||||
# Concelier MSRC Connector – Azure AD Onboarding Brief
 | 
			
		||||
 | 
			
		||||
_Drafted: 2025-10-15_
 | 
			
		||||
 | 
			
		||||
## 1. App registration requirements
 | 
			
		||||
 | 
			
		||||
- **Tenant**: shared StellaOps production Azure AD.
 | 
			
		||||
- **Application type**: confidential client (web/API) issuing client credentials.
 | 
			
		||||
- **API permissions**: `api://api.msrc.microsoft.com/.default` (Application). Admin consent required once.
 | 
			
		||||
- **Token audience**: `https://api.msrc.microsoft.com/`.
 | 
			
		||||
- **Grant type**: client credentials. Concelier will request tokens via `POST https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token` (a token-request sketch follows this list).
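
As a sanity check before wiring the credentials into Concelier, the standard v2.0 client-credentials request can be exercised directly; the variable values are placeholders:

```bash
TENANT_ID="<azure-tenant-guid>"
CLIENT_ID="<app-registration-client-id>"
CLIENT_SECRET="<pull from secret store>"

# Request an application token scoped to the MSRC API.
curl -s -X POST "https://login.microsoftonline.com/${TENANT_ID}/oauth2/v2.0/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=${CLIENT_ID}" \
  -d "client_secret=${CLIENT_SECRET}" \
  -d "scope=api://api.msrc.microsoft.com/.default" | jq -r '.access_token'
```

A successful call returns a bearer token whose audience is `https://api.msrc.microsoft.com/`; failures here usually indicate missing admin consent or an expired secret.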
 | 
			
		||||
 | 
			
		||||
## 2. Secret/credential policy
 | 
			
		||||
 | 
			
		||||
- Maintain two client secrets (primary + standby) rotating every 90 days.
 | 
			
		||||
- Store secrets in the Concelier secrets vault; Offline Kit deployments must mirror the secret payloads in their encrypted store.
 | 
			
		||||
- Record rotation cadence in Ops runbook and update Concelier configuration (`CONCELIER__SOURCES__VNDR__MSRC__CLIENTSECRET`) ahead of expiry.
 | 
			
		||||
 | 
			
		||||
## 3. Concelier configuration sample
 | 
			
		||||
 | 
			
		||||
```yaml
 | 
			
		||||
concelier:
 | 
			
		||||
  sources:
 | 
			
		||||
    vndr.msrc:
 | 
			
		||||
      tenantId: "<azure-tenant-guid>"
 | 
			
		||||
      clientId: "<app-registration-client-id>"
 | 
			
		||||
      clientSecret: "<pull from secret store>"
 | 
			
		||||
      apiVersion: "2024-08-01"
 | 
			
		||||
      locale: "en-US"
 | 
			
		||||
      requestDelay: "00:00:00.250"
 | 
			
		||||
      failureBackoff: "00:05:00"
 | 
			
		||||
      cursorOverlapMinutes: 10
 | 
			
		||||
      downloadCvrf: false  # set true to persist CVRF ZIP alongside JSON detail
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
## 4. CVRF artefacts
 | 
			
		||||
 | 
			
		||||
- The MSRC REST payload exposes `cvrfUrl` per advisory. Current connector persists the link as advisory metadata and reference; it does **not** download the ZIP by default.
 | 
			
		||||
- Ops should mirror CVRF ZIPs when preparing Offline Kits so air-gapped deployments can reconcile advisories without direct internet access.
 | 
			
		||||
- Once Offline Kit storage guidelines are finalised, extend the connector configuration with `downloadCvrf: true` to enable automatic attachment retrieval.
 | 
			
		||||
 | 
			
		||||
### 4.1 State seeding helper
 | 
			
		||||
 | 
			
		||||
Use `tools/SourceStateSeeder` to queue historical advisories (detail JSON + optional CVRF artefacts) for replay without manual Mongo edits. Example seed file:
 | 
			
		||||
 | 
			
		||||
```json
 | 
			
		||||
{
 | 
			
		||||
  "source": "vndr.msrc",
 | 
			
		||||
  "cursor": {
 | 
			
		||||
    "lastModifiedCursor": "2024-01-01T00:00:00Z"
 | 
			
		||||
  },
 | 
			
		||||
  "documents": [
 | 
			
		||||
    {
 | 
			
		||||
      "uri": "https://api.msrc.microsoft.com/sug/v2.0/vulnerability/ADV2024-0001",
 | 
			
		||||
      "contentFile": "./seeds/adv2024-0001.json",
 | 
			
		||||
      "contentType": "application/json",
 | 
			
		||||
      "metadata": { "msrc.vulnerabilityId": "ADV2024-0001" },
 | 
			
		||||
      "addToPendingDocuments": true
 | 
			
		||||
    },
 | 
			
		||||
    {
 | 
			
		||||
      "uri": "https://download.microsoft.com/msrc/2024/ADV2024-0001.cvrf.zip",
 | 
			
		||||
      "contentFile": "./seeds/adv2024-0001.cvrf.zip",
 | 
			
		||||
      "contentType": "application/zip",
 | 
			
		||||
      "status": "mapped",
 | 
			
		||||
      "addToPendingDocuments": false
 | 
			
		||||
    }
 | 
			
		||||
  ]
 | 
			
		||||
}
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Run the helper:
 | 
			
		||||
 | 
			
		||||
```bash
 | 
			
		||||
dotnet run --project tools/SourceStateSeeder -- \
 | 
			
		||||
  --connection-string "mongodb://localhost:27017" \
 | 
			
		||||
  --database concelier \
 | 
			
		||||
  --input seeds/msrc-backfill.json
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Any documents marked `addToPendingDocuments` will appear in the connector cursor; `DownloadCvrf` can remain disabled if the ZIP artefact is pre-seeded.
 | 
			
		||||
 | 
			
		||||
## 5. Outstanding items
 | 
			
		||||
 | 
			
		||||
- Ops to confirm tenant/app names and provide client credentials through the secure channel.
 | 
			
		||||
- Connector team monitors token cache health (already implemented); validate instrumentation once Ops supplies credentials.
 | 
			
		||||
- Offline Kit packaging: add encrypted blob containing client credentials with rotation instructions.
 | 
			
		||||
@@ -1,48 +1,48 @@
 | 
			
		||||
# NKCKI Connector Operations Guide
 | 
			
		||||
 | 
			
		||||
## Overview
 | 
			
		||||
 | 
			
		||||
The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring.
 | 
			
		||||
 | 
			
		||||
## Configuration
 | 
			
		||||
 | 
			
		||||
Key options exposed through `feedser:sources:ru-nkcki:http`:
 | 
			
		||||
 | 
			
		||||
- `maxBulletinsPerFetch` – limits new bulletin downloads in a single run (default `5`).
 | 
			
		||||
- `maxListingPagesPerFetch` – maximum listing pages visited during pagination (default `3`).
 | 
			
		||||
- `listingCacheDuration` – minimum interval between listing fetches before falling back to cached artefacts (default `00:10:00`).
 | 
			
		||||
- `cacheDirectory` – optional path for persisted bulletin archives used during offline or failure scenarios.
 | 
			
		||||
- `requestDelay` – delay inserted between bulletin downloads to respect upstream politeness.
 | 
			
		||||
 | 
			
		||||
When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/feedser/cache/ru-nkcki`) and pre-populate bulletin archives via the offline kit.
 | 
			
		||||
 | 
			
		||||
## Telemetry
 | 
			
		||||
 | 
			
		||||
`RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Feedser.Source.Ru.Nkcki`:
 | 
			
		||||
 | 
			
		||||
- `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures`
 | 
			
		||||
- `nkcki.listing.pages.visited` (histogram, `pages`)
 | 
			
		||||
- `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new`
 | 
			
		||||
- `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures`
 | 
			
		||||
- `nkcki.entries.processed` (histogram, `entries`)
 | 
			
		||||
 | 
			
		||||
Integrate these counters into standard Feedser observability dashboards to track crawl coverage and cache hit rates.
 | 
			
		||||
 | 
			
		||||
## Archive Backfill Strategy
 | 
			
		||||
 | 
			
		||||
Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy:
 | 
			
		||||
 | 
			
		||||
1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms.
 | 
			
		||||
2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked.
 | 
			
		||||
3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`.
 | 
			
		||||
4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in `docs/feedser-connector-research-20251011.md`).
 | 
			
		||||
 | 
			
		||||
For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs.
 | 
			
		||||
 | 
			
		||||
## Failure Handling
 | 
			
		||||
 | 
			
		||||
- Listing failures mark the source state with exponential backoff while attempting cache replay.
 | 
			
		||||
- Bulletin fetches fall back to cached copies before surfacing an error.
 | 
			
		||||
- Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros.
 | 
			
		||||
 | 
			
		||||
Refer to `ru-nkcki` entries in `src/StellaOps.Feedser.Source.Ru.Nkcki/TASKS.md` for outstanding items.
 | 
			
		||||
# NKCKI Connector Operations Guide
 | 
			
		||||
 | 
			
		||||
## Overview
 | 
			
		||||
 | 
			
		||||
The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring.
 | 
			
		||||
 | 
			
		||||
## Configuration
 | 
			
		||||
 | 
			
		||||
Key options exposed through `concelier:sources:ru-nkcki:http`:
 | 
			
		||||
 | 
			
		||||
- `maxBulletinsPerFetch` – limits new bulletin downloads in a single run (default `5`).
 | 
			
		||||
- `maxListingPagesPerFetch` – maximum listing pages visited during pagination (default `3`).
 | 
			
		||||
- `listingCacheDuration` – minimum interval between listing fetches before falling back to cached artefacts (default `00:10:00`).
 | 
			
		||||
- `cacheDirectory` – optional path for persisted bulletin archives used during offline or failure scenarios.
 | 
			
		||||
- `requestDelay` – delay inserted between bulletin downloads to respect upstream politeness.
 | 
			
		||||
 | 
			
		||||
When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/concelier/cache/ru-nkcki`) and pre-populate bulletin archives via the offline kit.
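
Putting the options together, a hedged example (the `requestDelay` value and cache path are illustrative; the remaining values show the documented defaults):

```yaml
concelier:
  sources:
    ru-nkcki:
      http:
        maxBulletinsPerFetch: 5            # default
        maxListingPagesPerFetch: 3         # default
        listingCacheDuration: "00:10:00"   # default
        requestDelay: "00:00:01"           # illustrative politeness delay
        cacheDirectory: "/var/lib/concelier/cache/ru-nkcki"
```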
 | 
			
		||||
 | 
			
		||||
## Telemetry
 | 
			
		||||
 | 
			
		||||
`RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Concelier.Connector.Ru.Nkcki`:
 | 
			
		||||
 | 
			
		||||
- `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures`
 | 
			
		||||
- `nkcki.listing.pages.visited` (histogram, `pages`)
 | 
			
		||||
- `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new`
 | 
			
		||||
- `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures`
 | 
			
		||||
- `nkcki.entries.processed` (histogram, `entries`)
 | 
			
		||||
 | 
			
		||||
Integrate these counters into standard Concelier observability dashboards to track crawl coverage and cache hit rates.
 | 
			
		||||
 | 
			
		||||
## Archive Backfill Strategy
 | 
			
		||||
 | 
			
		||||
Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy:
 | 
			
		||||
 | 
			
		||||
1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms.
 | 
			
		||||
2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked.
 | 
			
		||||
3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`.
 | 
			
		||||
4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in `docs/concelier-connector-research-20251011.md`).
 | 
			
		||||
 | 
			
		||||
For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs.
 | 
			
		||||
 | 
			
		||||
## Failure Handling
 | 
			
		||||
 | 
			
		||||
- Listing failures mark the source state with exponential backoff while attempting cache replay.
 | 
			
		||||
- Bulletin fetches fall back to cached copies before surfacing an error.
 | 
			
		||||
- Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros.
 | 
			
		||||
 | 
			
		||||
Refer to `ru-nkcki` entries in `src/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md` for outstanding items.
 | 
			
		||||
@@ -1,24 +1,24 @@
 | 
			
		||||
# Feedser OSV Connector – Operations Notes
 | 
			
		||||
 | 
			
		||||
_Last updated: 2025-10-16_
 | 
			
		||||
 | 
			
		||||
The OSV connector ingests advisories from OSV.dev across OSS ecosystems. This note highlights the additional merge/export expectations introduced with the canonical metric fallback work in Sprint 4.
 | 
			
		||||
 | 
			
		||||
## 1. Canonical metric fallbacks
 | 
			
		||||
- When OSV omits CVSS vectors (common for CVSS v4-only payloads) the mapper now emits a deterministic canonical metric id in the form `osv:severity/<level>` and normalises the advisory severity to the same `<level>`.
 | 
			
		||||
- Metric: `osv.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `ecosystem`, `reason=no_cvss`. Watch this alongside merge parity dashboards to catch spikes where OSV publishes severity-only advisories.
 | 
			
		||||
- Merge precedence still prefers GHSA over OSV; the shared severity-based canonical id keeps Merge/export parity deterministic even when only OSV supplies severity data.
 | 
			
		||||
 | 
			
		||||
## 2. CWE provenance
 | 
			
		||||
- `database_specific.cwe_ids` now populates provenance decision reasons for every mapped weakness. Expect `decisionReason="database_specific.cwe_ids"` on OSV weakness provenance and confirm exporters preserve the value.
 | 
			
		||||
- If OSV ever attaches `database_specific.cwe_notes`, the connector will surface the joined note string in `decisionReason` instead of the default marker.
 | 
			
		||||
 | 
			
		||||
## 3. Dashboards & alerts
 | 
			
		||||
- Extend existing merge dashboards with the new counter:
 | 
			
		||||
  - Overlay `sum(osv.map.canonical_metric_fallbacks{ecosystem=~".+"})` with Merge severity overrides to confirm fallback advisories are reconciling cleanly.
 | 
			
		||||
  - Alert when the 1-hour sum exceeds 50 for any ecosystem; baseline volume is currently <5 per day (mostly GHSA mirrors emitting CVSS v4 only).
 | 
			
		||||
- Exporters already surface `canonicalMetricId`; no schema change is required, but ORAS/Trivy bundles should be spot-checked after deploying the connector update.
 | 
			
		||||
 | 
			
		||||
## 4. Runbook updates
 | 
			
		||||
- Fixture parity suites (`osv-ghsa.*`) now assert the fallback id and provenance notes. Regenerate via `dotnet test src/StellaOps.Feedser.Source.Osv.Tests/StellaOps.Feedser.Source.Osv.Tests.csproj`.
 | 
			
		||||
- When investigating merge severity conflicts, include the fallback counter and confirm OSV advisories carry the expected `osv:severity/<level>` id before raising connector bugs.
 | 
			
		||||
# Concelier OSV Connector – Operations Notes
 | 
			
		||||
 | 
			
		||||
_Last updated: 2025-10-16_
 | 
			
		||||
 | 
			
		||||
The OSV connector ingests advisories from OSV.dev across OSS ecosystems. This note highlights the additional merge/export expectations introduced with the canonical metric fallback work in Sprint 4.
 | 
			
		||||
 | 
			
		||||
## 1. Canonical metric fallbacks
 | 
			
		||||
- When OSV omits CVSS vectors (common for CVSS v4-only payloads) the mapper now emits a deterministic canonical metric id in the form `osv:severity/<level>` and normalises the advisory severity to the same `<level>`.
 | 
			
		||||
- Metric: `osv.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `ecosystem`, `reason=no_cvss`. Watch this alongside merge parity dashboards to catch spikes where OSV publishes severity-only advisories.
 | 
			
		||||
- Merge precedence still prefers GHSA over OSV; the shared severity-based canonical id keeps Merge/export parity deterministic even when only OSV supplies severity data.
 | 
			
		||||
 | 
			
		||||
## 2. CWE provenance
 | 
			
		||||
- `database_specific.cwe_ids` now populates provenance decision reasons for every mapped weakness. Expect `decisionReason="database_specific.cwe_ids"` on OSV weakness provenance and confirm exporters preserve the value.
 | 
			
		||||
- If OSV ever attaches `database_specific.cwe_notes`, the connector will surface the joined note string in `decisionReason` instead of the default marker.
 | 
			
		||||
 | 
			
		||||
## 3. Dashboards & alerts
 | 
			
		||||
- Extend existing merge dashboards with the new counter:
 | 
			
		||||
  - Overlay `sum(osv.map.canonical_metric_fallbacks{ecosystem=~".+"})` with Merge severity overrides to confirm fallback advisories are reconciling cleanly.
 | 
			
		||||
  - Alert when the 1-hour sum exceeds 50 for any ecosystem; baseline volume is currently <5 per day (mostly GHSA mirrors emitting CVSS v4 only).
 | 
			
		||||
- Exporters already surface `canonicalMetricId`; no schema change is required, but ORAS/Trivy bundles should be spot-checked after deploying the connector update.
 | 
			
		||||
 | 
			
		||||
## 4. Runbook updates
 | 
			
		||||
- Fixture parity suites (`osv-ghsa.*`) now assert the fallback id and provenance notes. Regenerate via `dotnet test src/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj`.
 | 
			
		||||
- When investigating merge severity conflicts, include the fallback counter and confirm OSV advisories carry the expected `osv:severity/<level>` id before raising connector bugs.
 | 
			
		||||
@@ -1,50 +1,50 @@
# SemVer Style Backfill Runbook

_Last updated: 2025-10-11_

## Overview

The SemVer style migration populates the new `normalizedVersions` field on advisory documents and ensures
provenance `decisionReason` values are preserved during future reads. The migration is idempotent and only
runs when the feature flag `feedser:storage:enableSemVerStyle` is enabled.

## Preconditions

1. **Review configuration** – set `feedser.storage.enableSemVerStyle` to `true` on all Feedser services.
2. **Confirm batch size** – adjust `feedser.storage.backfillBatchSize` if you need smaller batches for older
   deployments (default: `250`).
3. **Back up** – capture a fresh snapshot of the `advisory` collection or a full MongoDB backup.
4. **Staging dry-run** – enable the flag in a staging environment and observe the migration output before
   rolling to production.

## Execution

No manual command is required. After deploying the configuration change, restart the Feedser WebService or
any component that hosts the Mongo migration runner. During startup you will see log entries similar to:

```
Applying Mongo migration 20251011-semver-style-backfill: Populate advisory.normalizedVersions for existing documents when SemVer style storage is enabled.
Mongo migration 20251011-semver-style-backfill applied
```

The migration reads advisories in batches (`feedser.storage.backfillBatchSize`) and writes flattened
`normalizedVersions` arrays. Existing documents without SemVer ranges remain untouched.

## Post-checks

1. Verify the new indexes exist:
   ```
   db.advisory.getIndexes()
   ```
   You should see `advisory_normalizedVersions_pkg_scheme_type` and `advisory_normalizedVersions_value`.
2. Spot check a few advisories to confirm the top-level `normalizedVersions` array exists and matches
   the embedded package data.
3. Run `dotnet test` for `StellaOps.Feedser.Storage.Mongo.Tests` (optional but recommended) in CI to confirm
   the storage suite passes with the feature flag enabled.

## Rollback

Set `feedser.storage.enableSemVerStyle` back to `false` and redeploy. The migration will be skipped on
subsequent startups. You can leave the populated `normalizedVersions` arrays in place; they are ignored when
the feature flag is off. If you must remove them entirely, restore from the backup captured during
preparation.

# SemVer Style Backfill Runbook

_Last updated: 2025-10-11_

## Overview

The SemVer style migration populates the new `normalizedVersions` field on advisory documents and ensures
provenance `decisionReason` values are preserved during future reads. The migration is idempotent and only
runs when the feature flag `concelier:storage:enableSemVerStyle` is enabled.

## Preconditions

1. **Review configuration** – set `concelier.storage.enableSemVerStyle` to `true` on all Concelier services
   (a configuration sketch follows this list).
2. **Confirm batch size** – adjust `concelier.storage.backfillBatchSize` if you need smaller batches for older
   deployments (default: `250`).
3. **Back up** – capture a fresh snapshot of the `advisory` collection or a full MongoDB backup.
4. **Staging dry-run** – enable the flag in a staging environment and observe the migration output before
   rolling to production.
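
A minimal sketch of the configuration step, assuming the services bind settings through the standard .NET configuration providers (so `concelier:storage:*` keys can also be supplied as double-underscore environment variables); adapt it to whatever mechanism your deployment actually uses (appsettings, Helm values, compose `environment:` blocks):

```bash
# Assumes standard .NET configuration binding; the env-var names below mirror
# concelier:storage:enableSemVerStyle and concelier:storage:backfillBatchSize.
export CONCELIER__STORAGE__ENABLESEMVERSTYLE=true
export CONCELIER__STORAGE__BACKFILLBATCHSIZE=250   # default; lower it for older deployments
```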

## Execution

No manual command is required. After deploying the configuration change, restart the Concelier WebService or
any component that hosts the Mongo migration runner (a compose-based restart sketch follows the log excerpt).
During startup you will see log entries similar to:

```
Applying Mongo migration 20251011-semver-style-backfill: Populate advisory.normalizedVersions for existing documents when SemVer style storage is enabled.
Mongo migration 20251011-semver-style-backfill applied
```
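
If Concelier runs from a compose stack, redeploying the configuration and recreating the WebService container might look like the sketch below; the compose file path and service name are placeholders, not shipped values.

```bash
# Placeholder compose file and service name; substitute your deployment's values.
# `up -d` recreates the service so the new configuration is actually applied.
docker compose -f ops/concelier/docker-compose.concelier.yaml up -d concelier-webservice

# Confirm the migration log lines shown above appear during startup.
docker compose -f ops/concelier/docker-compose.concelier.yaml logs --since 10m concelier-webservice \
  | grep -i "semver-style-backfill"
```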

The migration reads advisories in batches (`concelier.storage.backfillBatchSize`) and writes flattened
`normalizedVersions` arrays. Existing documents without SemVer ranges remain untouched.

## Post-checks

1. Verify the new indexes exist:
   ```
   db.advisory.getIndexes()
   ```
   You should see `advisory_normalizedVersions_pkg_scheme_type` and `advisory_normalizedVersions_value`.
2. Spot check a few advisories to confirm the top-level `normalizedVersions` array exists and matches
   the embedded package data (a scripted spot-check sketch follows this list).
3. Run `dotnet test` for `StellaOps.Concelier.Storage.Mongo.Tests` (optional but recommended) in CI to confirm
   the storage suite passes with the feature flag enabled.
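
The index and spot checks can be scripted. A minimal sketch, assuming shell access to the advisory store; the connection string and database name below are placeholders:

```bash
# Placeholder connection string/database; point mongosh at the Concelier advisory store.
mongosh "mongodb://localhost:27017/concelier" --quiet --eval '
  printjson(db.advisory.getIndexes().map(ix => ix.name));
  db.advisory.find({ normalizedVersions: { $exists: true, $ne: [] } })
    .limit(3)
    .forEach(doc => printjson({ _id: doc._id, normalizedVersions: doc.normalizedVersions }));
'
```

The index listing should include the two `advisory_normalizedVersions_*` names from step 1, and the sampled documents should mirror the embedded package data described in step 2.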

## Rollback

Set `concelier.storage.enableSemVerStyle` back to `false` and redeploy. The migration will be skipped on
subsequent startups. You can leave the populated `normalizedVersions` arrays in place; they are ignored when
the feature flag is off. If you must remove them entirely, restore from the backup captured during
preparation.