Resolve Concelier/Excititor merge conflicts

root
2025-10-20 14:19:25 +03:00
2687 changed files with 212646 additions and 85913 deletions

@@ -0,0 +1,146 @@
# Concelier CERT-Bund Connector Operations
_Last updated: 2025-10-17_
Germany's Federal Office for Information Security (BSI) operates the Warn- und Informationsdienst (WID) portal. The Concelier CERT-Bund connector (`source:cert-bund:*`) ingests the public RSS feed, hydrates each item from the portal's JSON detail endpoint, and maps the result into canonical advisories while preserving the original German content.
---
## 1. Configuration Checklist
- Allow outbound access (or stage mirrors) for:
  - `https://wid.cert-bund.de/content/public/securityAdvisory/rss`
  - `https://wid.cert-bund.de/portal/` (session/bootstrap)
  - `https://wid.cert-bund.de/portal/api/securityadvisory` (detail/search/export JSON)
- Ensure the HTTP client reuses a cookie container (the connector's dependency injection wiring already sets this up).
Example `concelier.yaml` fragment:
```yaml
concelier:
  sources:
    cert-bund:
      feedUri: "https://wid.cert-bund.de/content/public/securityAdvisory/rss"
      portalBootstrapUri: "https://wid.cert-bund.de/portal/"
      detailApiUri: "https://wid.cert-bund.de/portal/api/securityadvisory"
      maxAdvisoriesPerFetch: 50
      maxKnownAdvisories: 512
      requestTimeout: "00:00:30"
      requestDelay: "00:00:00.250"
      failureBackoff: "00:05:00"
```
> Leave `maxAdvisoriesPerFetch` at 50 during normal operation. Raise it only for controlled backfills, then restore the default to avoid overwhelming the portal.
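
The three duration settings in the fragment use an `hh:mm:ss[.fff]` form. As a minimal sketch, assuming that form holds for every value you configure, the following converts them to `timedelta` objects so they can be sanity-checked before deployment. The helper name `parse_timespan` is hypothetical and not part of Concelier.

```python
import re
from datetime import timedelta

# Hypothetical helper (not part of Concelier): accepts "hh:mm:ss" with an
# optional fractional-seconds part, as used in the fragment above.
_TIMESPAN = re.compile(r"^(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,7}))?$")


def parse_timespan(value: str) -> timedelta:
    match = _TIMESPAN.match(value)
    if match is None:
        raise ValueError(f"not an hh:mm:ss[.fff] duration: {value!r}")
    hours, minutes, seconds, fraction = match.groups()
    # Right-pad the fraction to 6 digits so ".250" becomes 250000 microseconds.
    micros = int((fraction or "0").ljust(6, "0")[:6])
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(seconds), microseconds=micros)


# Values copied from the example fragment.
settings = {
    "requestTimeout": "00:00:30",
    "requestDelay": "00:00:00.250",
    "failureBackoff": "00:05:00",
}
parsed = {name: parse_timespan(raw) for name, raw in settings.items()}
```

A check such as `parsed["requestDelay"] < parsed["requestTimeout"]` can then catch swapped or mistyped values early.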
---
## 2. Telemetry & Logging
- **Meter**: `StellaOps.Concelier.Connector.CertBund`
- **Counters / histograms**:
  - `certbund.feed.fetch.attempts|success|failures`
  - `certbund.feed.items.count`
  - `certbund.feed.enqueued.count`
  - `certbund.feed.coverage.days`
  - `certbund.detail.fetch.attempts|success|not_modified|failures{reason}`
  - `certbund.parse.success|failures{reason}`
  - `certbund.parse.products.count`, `certbund.parse.cve.count`
  - `certbund.map.success|failures{reason}`
  - `certbund.map.affected.count`, `certbund.map.aliases.count`
- Shared HTTP metrics remain available through `concelier.source.http.*`.
**Structured logs** (all emitted at information level when work occurs):
- `CERT-Bund fetch cycle: … truncated {Truncated}, coverageDays={CoverageDays}`
- `CERT-Bund parse cycle: parsed {Parsed}, failures {Failures}, …`
- `CERT-Bund map cycle: mapped {Mapped}, failures {Failures}, …`
Alerting ideas:
1. `increase(certbund_detail_fetch_failures_total[10m]) > 0`
2. `rate(certbund_map_success_total[30m]) == 0`
3. `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cert-bund"}[15m])) > 5s`
The WebService now registers the meter so metrics surface automatically once OpenTelemetry metrics are enabled.
---
## 3. Historical Backfill & Export Strategy
### 3.1 Retention snapshot
- RSS window: ~250 advisories (≈90 days at current cadence).
- Older advisories are accessible through the JSON search/export APIs once the anti-CSRF token is supplied.
### 3.2 JSON search pagination
```bash
# 1. Bootstrap cookies (client_config + XSRF-TOKEN)
curl -s -c cookies.txt "https://wid.cert-bund.de/portal/" > /dev/null
curl -s -b cookies.txt -c cookies.txt \
  -H "X-Requested-With: XMLHttpRequest" \
  "https://wid.cert-bund.de/portal/api/security/csrf" > /dev/null
XSRF=$(awk '/XSRF-TOKEN/ {print $7}' cookies.txt)
# 2. Page search results
curl -s -b cookies.txt \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "X-XSRF-TOKEN: ${XSRF}" \
  -X POST \
  --data '{"page":4,"size":100,"sort":["published,desc"]}' \
  "https://wid.cert-bund.de/portal/api/securityadvisory/search" \
  > certbund-page4.json
```
Iterate `page` until the response `content` array is empty. Pages 0–9 currently cover 2014→present. Persist each JSON response together with its SHA256 digest for Offline Kit parity.
> **Shortcut:** run `python tools/certbund_offline_snapshot.py --output seed-data/cert-bund`
> to bootstrap the session, capture the paginated search responses, and regenerate
> the manifest/checksum files automatically. Supply `--cookie-file` and `--xsrf-token`
> if the portal requires a browser-derived session (see options via `--help`).
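
The pagination loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the logic of `certbund_offline_snapshot.py`. The function name `snapshot_search_pages`, the `fetch_page` callable, the file names, and the sidecar `.sha256` layout are all hypothetical. `fetch_page(n)` is expected to perform the authenticated POST from the `curl` example and return the decoded JSON body.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def snapshot_search_pages(fetch_page: Callable[[int], dict], out_dir: Path,
                          max_pages: int = 1000) -> list[Path]:
    """Save each search page plus a SHA256 sidecar until `content` is empty."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for page in range(max_pages):  # max_pages guards against an endless loop
        body = fetch_page(page)
        if not body.get("content"):
            break  # an empty `content` array marks the end of the results
        # Note: the body is re-serialised here, so the digest covers the file
        # as written, not the raw bytes the portal sent.
        raw = json.dumps(body, ensure_ascii=False, sort_keys=True).encode("utf-8")
        target = out_dir / f"certbund-page{page}.json"
        target.write_bytes(raw)
        digest = hashlib.sha256(raw).hexdigest()
        sidecar = out_dir / f"{target.name}.sha256"
        sidecar.write_text(f"{digest}  {target.name}\n")
        written.append(target)
    return written
```

Passing the fetcher in as a callable keeps session and XSRF handling separate from the loop, so the same function works with a live HTTP client or with canned responses in a test.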
### 3.3 Export bundles
```bash
python tools/certbund_offline_snapshot.py \
  --output seed-data/cert-bund \
  --start-year 2014 \
  --end-year "$(date -u +%Y)"
```
The helper stores yearly exports under `seed-data/cert-bund/export/`,
captures paginated search snapshots in `seed-data/cert-bund/search/`,
and generates the manifest + SHA files in `seed-data/cert-bund/manifest/`.
Split ranges according to your compliance window (default: one file per
calendar year). Concelier can ingest these JSON payloads directly when
operating offline.
> When automatic bootstrap fails (e.g. the portal introduces a CAPTCHA), run the
> manual `curl` flow above, then rerun the helper with `--skip-fetch` to
> rebuild the manifest from the existing files.
### 3.4 Connector-driven catch-up
1. Temporarily raise `maxAdvisoriesPerFetch` (e.g. 150) and reduce `requestDelay`.
2. Run `stella db jobs run source:cert-bund:fetch --and-then source:cert-bund:parse --and-then source:cert-bund:map` until the fetch log reports `enqueued=0`.
3. Restore defaults and capture the cursor snapshot for audit.
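
Step 2 can be automated with a small wrapper. The sketch below assumes that the CLI echoes the fetch summary to stdout or stderr and that the summary contains a literal `enqueued=<n>` token, as the wording of step 2 suggests. Both are assumptions to verify against your deployment. The names `run_cli` and `catch_up` are hypothetical.

```python
import re
import subprocess
from typing import Callable

# Command copied verbatim from step 2 above.
CATCH_UP = ["stella", "db", "jobs", "run", "source:cert-bund:fetch",
            "--and-then", "source:cert-bund:parse",
            "--and-then", "source:cert-bund:map"]


def run_cli() -> str:
    # Assumption: the fetch summary is visible in the CLI's own output.
    done = subprocess.run(CATCH_UP, capture_output=True, text=True, check=True)
    return done.stdout + done.stderr


def catch_up(run: Callable[[], str] = run_cli, max_runs: int = 20) -> int:
    """Repeat the job chain until the output reports enqueued=0.

    Returns the number of runs that were needed.
    """
    for attempt in range(1, max_runs + 1):
        match = re.search(r"enqueued=(\d+)", run())
        if match and int(match.group(1)) == 0:
            return attempt
    raise RuntimeError(f"still enqueuing after {max_runs} runs")
```

The `max_runs` cap is a deliberate safety net: if the log format differs from the assumption, the loop stops with an error instead of hammering the portal indefinitely.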
---
## 4. Locale & Translation Guidance
- Advisories remain in German (`language: "de"`). Preserve wording for provenance and legal accuracy.
- UI localisation: enable the translation bundles documented in `docs/15_UI_GUIDE.md` if English UI copy is required. Operators can overlay machine or human translations, but the canonical database stores the source text.
- The Docs guild is compiling a CERT-Bund terminology glossary under `docs/locale/certbund-glossary.md` so downstream teams can reference consistent English equivalents without altering the stored advisories.
---
## 5. Verification Checklist
1. Observe `certbund.feed.fetch.success` and `certbund.detail.fetch.success` increments after runs; `certbund.feed.coverage.days` should hover near the observed RSS window.
2. Ensure summary logs report `truncated=false` in steady state; `true` indicates the fetch cap was hit.
3. During backfills, watch `certbund.feed.enqueued.count` trend to zero.
4. Spot-check stored advisories in Mongo to confirm `language="de"` and reference URLs match the portal detail endpoint.
5. For Offline Kit exports, validate SHA256 hashes before distribution.
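
For item 5, a hash check might look like the sketch below. It assumes a manifest in the common `sha256sum` layout (`<hex digest>  <file name>` per line, with paths relative to the manifest's directory). The actual manifest format produced by the helper may differ, so treat the function name `verify_checksums` and the parsing as illustrative only.

```python
import hashlib
from pathlib import Path


def verify_checksums(manifest: Path) -> list[str]:
    """Return the names of files that are missing or whose SHA256 differs."""
    mismatched = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading "*".
        name = name.strip().lstrip("*")
        path = manifest.parent / name
        if not path.is_file():
            mismatched.append(name)
            continue
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        if actual != expected.lower():
            mismatched.append(name)
    return mismatched
```

An empty return value means every listed file matched; anything else should block distribution until the export is regenerated.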