# Concelier CCCS Connector Operations
This runbook covers day-to-day operation of the Canadian Centre for Cyber Security (`source:cccs:*`) connector, including configuration, telemetry, and historical backfill guidance for English/French advisories.
## 1. Configuration Checklist
- Network egress (or mirrored cache) for `https://www.cyber.gc.ca/` and the JSON API endpoints under `/api/cccs/`.
- Set the Concelier options before restarting workers. Example `concelier.yaml` snippet:
```yaml
concelier:
  sources:
    cccs:
      feeds:
        - language: "en"
          uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=en&content_type=cccs_threat"
        - language: "fr"
          uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=fr&content_type=cccs_threat"
      maxEntriesPerFetch: 80      # increase temporarily for backfill runs
      maxKnownEntries: 512
      requestTimeout: "00:00:30"
      requestDelay: "00:00:00.250"
      failureBackoff: "00:05:00"
```
> The `/api/cccs/threats/v1/get` endpoint returns thousands of records per language (≈5,100 rows each as of 2025-10-14). The connector honours `maxEntriesPerFetch`, so leave it low for steady-state and raise it for planned backfills.
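Before raising `maxEntriesPerFetch` for a backfill, it can help to confirm egress and gauge the current dataset size per language from the operator host. A minimal sketch, assuming the endpoints above are reachable and that the rows sit either at the top level of the JSON payload or under a `response` key (the payload shape is an assumption to adjust); this is an ad-hoc check, not part of the connector:

```python
# Ad-hoc dataset size check for the CCCS feeds (illustrative, not part of the connector).
# Assumption: each endpoint returns JSON whose rows are either a top-level list or live
# under a "response" key; adjust the extraction if the real payload shape differs.
import json
import urllib.request

FEEDS = {
    "en": "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=en&content_type=cccs_threat",
    "fr": "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=fr&content_type=cccs_threat",
}

for language, uri in FEEDS.items():
    with urllib.request.urlopen(uri, timeout=30) as resp:
        payload = json.load(resp)
    rows = payload if isinstance(payload, list) else payload.get("response", [])
    print(f"{language}: {len(rows)} rows")
```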
## 2. Telemetry & Logging
- **Metrics (Meter `StellaOps.Concelier.Connector.Cccs`):**
- `cccs.fetch.attempts`, `cccs.fetch.success`, `cccs.fetch.failures`
- `cccs.fetch.documents`, `cccs.fetch.unchanged`
- `cccs.parse.success`, `cccs.parse.failures`, `cccs.parse.quarantine`
- `cccs.map.success`, `cccs.map.failures`
- **Shared HTTP metrics** via `SourceDiagnostics`:
- `concelier.source.http.requests{concelier.source="cccs"}`
- `concelier.source.http.failures{concelier.source="cccs"}`
- `concelier.source.http.duration{concelier.source="cccs"}`
- **Structured logs**
- `CCCS fetch completed feeds=… items=… newDocuments=… pendingDocuments=…`
- `CCCS parse completed parsed=… failures=…`
- `CCCS map completed mapped=… failures=…`
- Warnings fire when GridFS payloads/DTOs go missing or parser sanitisation fails.
Suggested Grafana alerts:
- `increase(cccs.fetch.failures_total[15m]) > 0`
- `rate(cccs.map.success_total[1h]) == 0` while other connectors are active
- `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cccs"}[1h])) > 5s`
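To spot-check these expressions before wiring them into Grafana, the Prometheus HTTP API (`/api/v1/query`) can evaluate them directly. A minimal sketch, assuming a reachable Prometheus instance at `PROM_URL` and that the meters are exported with dots normalised to underscores (both the URL and the exported metric names are assumptions to adapt to your environment):

```python
# Spot-check the suggested alert expressions against Prometheus (illustrative only).
# Assumptions: Prometheus is reachable at PROM_URL, and the Concelier meters are exported
# with dots normalised to underscores; adjust the names to match your exporter.
import json
import urllib.parse
import urllib.request

PROM_URL = "http://prometheus:9090"  # assumption: adjust to your environment

EXPRESSIONS = [
    'increase(cccs_fetch_failures_total[15m])',
    'rate(cccs_map_success_total[1h])',
    'histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cccs"}[1h]))',
]

for expr in EXPRESSIONS:
    url = f"{PROM_URL}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=10) as resp:
        result = json.load(resp)["data"]["result"]
    values = [sample["value"][1] for sample in result]
    print(f"{expr}\n  -> {values or 'no series yet'}")
```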
## 3. Historical Backfill Plan
1. **Snapshot the source**: the API accepts `page=<n>` and `lang=<en|fr>` query parameters. `page=0` returns the full dataset (observed earliest `date_created`: 2018-06-08 for EN, 2018-06-08 for FR). Mirror those responses into Offline Kit storage when operating air-gapped.
2. **Stage ingestion**:
- Temporarily raise `maxEntriesPerFetch` (e.g. 500) and restart Concelier workers.
- Run chained jobs until `pendingDocuments` drains:
`stella db jobs run source:cccs:fetch --and-then source:cccs:parse --and-then source:cccs:map`
- Monitor `cccs.fetch.unchanged` growth; once it approaches dataset size the backfill is complete.
3. **Optional pagination sweep**: for incremental mirrors, iterate `page=<n>` (0…N) while `response.Count == 50`, persisting JSON to disk. Store each page alongside metadata (`language`, `page`, SHA-256) so repeated runs detect drift (see the sketch after this list).
4. **Language split**: keep EN/FR payloads separate to preserve canonical language fields. The connector emits `Language` directly from the feed entry, so mixed ingestion simply produces parallel advisories keyed by the same serial number.
5. **Throttle planning**: schedule backfills during maintenance windows; the API tolerates burst downloads, but respect the 250 ms request delay, or raise it when pulling from the live site rather than a mirror.
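A minimal sketch of the pagination sweep from step 3, assuming `page=<n>` returns up to 50 rows per call and that the rows sit either at the top level or under a `response` key; the mirror layout and constants below are illustrative, not the connector's own logic:

```python
# Illustrative pagination sweep for incremental CCCS mirrors (see step 3 above).
# Assumptions: page=<n> returns up to 50 rows, rows live at the top level or under a
# "response" key, and the local mirror layout below is free to change.
import hashlib
import json
import pathlib
import time
import urllib.request

BASE = "https://www.cyber.gc.ca/api/cccs/threats/v1/get"
MIRROR = pathlib.Path("cccs-mirror")   # hypothetical output directory
PAGE_SIZE = 50
REQUEST_DELAY_SECONDS = 0.25           # mirrors the connector's requestDelay

for language in ("en", "fr"):
    page = 0
    while True:
        uri = f"{BASE}?lang={language}&content_type=cccs_threat&page={page}"
        with urllib.request.urlopen(uri, timeout=30) as resp:
            raw = resp.read()
        payload = json.loads(raw)
        rows = payload if isinstance(payload, list) else payload.get("response", [])

        # Persist the raw page plus metadata so repeated runs can detect drift.
        out_dir = MIRROR / language
        out_dir.mkdir(parents=True, exist_ok=True)
        (out_dir / f"page-{page:04d}.json").write_bytes(raw)
        metadata = {
            "language": language,
            "page": page,
            "sha256": hashlib.sha256(raw).hexdigest(),
            "rows": len(rows),
        }
        (out_dir / f"page-{page:04d}.meta.json").write_text(json.dumps(metadata, indent=2))

        if len(rows) < PAGE_SIZE:
            break  # a short page means the sweep reached the end of the dataset
        page += 1
        time.sleep(REQUEST_DELAY_SECONDS)
```

Diffing the stored SHA-256 values between runs is enough to detect drift without re-parsing the payloads.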
## 4. Selector & Sanitiser Notes
- `CccsHtmlParser` now parses the **unsanitised DOM** (via AngleSharp) and only sanitises when persisting `ContentHtml`.
- Product extraction walks headings (`Affected Products`, `Produits touchés`, `Mesures recommandées`) and consumes nested lists within `div/section/article` containers.
- `HtmlContentSanitizer` allows `<h1>…<h6>` and `<section>` so stored HTML keeps headings for UI rendering and downstream summarisation.
## 5. Fixture Maintenance
- Regression fixtures live in `src/StellaOps.Concelier.Connector.Cccs.Tests/Fixtures`.
- Refresh via `UPDATE_CCCS_FIXTURES=1 dotnet test src/StellaOps.Concelier.Connector.Cccs.Tests/StellaOps.Concelier.Connector.Cccs.Tests.csproj`.
- Fixtures capture both EN/FR advisories with nested lists to guard against sanitiser regressions; review diffs for heading/list changes before committing.