Add authority bootstrap flows and Concelier ops runbooks
docs/ops/feedser-cccs-operations.md
							| @@ -0,0 +1,72 @@ | ||||
| # Feedser CCCS Connector Operations | ||||
|  | ||||
| This runbook covers day‑to‑day operation of the Canadian Centre for Cyber Security (`source:cccs:*`) connector, including configuration, telemetry, and historical backfill guidance for English/French advisories. | ||||
|  | ||||
| ## 1. Configuration Checklist | ||||
|  | ||||
| - Allow network egress (or provide a mirrored cache) for `https://www.cyber.gc.ca/` and the JSON API endpoints under `/api/cccs/`. | ||||
| - Set the Feedser options before restarting workers. Example `feedser.yaml` snippet: | ||||
|  | ||||
| ```yaml | ||||
| feedser: | ||||
|   sources: | ||||
|     cccs: | ||||
|       feeds: | ||||
|         - language: "en" | ||||
|           uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=en&content_type=cccs_threat" | ||||
|         - language: "fr" | ||||
|           uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=fr&content_type=cccs_threat" | ||||
|       maxEntriesPerFetch: 80        # increase temporarily for backfill runs | ||||
|       maxKnownEntries: 512 | ||||
|       requestTimeout: "00:00:30" | ||||
|       requestDelay: "00:00:00.250" | ||||
|       failureBackoff: "00:05:00" | ||||
| ``` | ||||
|  | ||||
| > ℹ️  The `/api/cccs/threats/v1/get` endpoint returns thousands of records per language (≈5 100 rows each as of 2025‑10‑14). The connector honours `maxEntriesPerFetch`, so leave it low for steady‑state and raise it for planned backfills. | ||||
|  | ||||
| ## 2. Telemetry & Logging | ||||
|  | ||||
| - **Metrics (Meter `StellaOps.Feedser.Source.Cccs`):** | ||||
|   - `cccs.fetch.attempts`, `cccs.fetch.success`, `cccs.fetch.failures` | ||||
|   - `cccs.fetch.documents`, `cccs.fetch.unchanged` | ||||
|   - `cccs.parse.success`, `cccs.parse.failures`, `cccs.parse.quarantine` | ||||
|   - `cccs.map.success`, `cccs.map.failures` | ||||
| - **Shared HTTP metrics** via `SourceDiagnostics`: | ||||
|   - `feedser.source.http.requests{feedser.source="cccs"}` | ||||
|   - `feedser.source.http.failures{feedser.source="cccs"}` | ||||
|   - `feedser.source.http.duration{feedser.source="cccs"}` | ||||
| - **Structured logs** | ||||
|   - `CCCS fetch completed feeds=… items=… newDocuments=… pendingDocuments=…` | ||||
|   - `CCCS parse completed parsed=… failures=…` | ||||
|   - `CCCS map completed mapped=… failures=…` | ||||
|   - Warnings fire when GridFS payloads/DTOs go missing or parser sanitisation fails. | ||||
|  | ||||
| Suggested Grafana alerts: | ||||
| - `increase(cccs_fetch_failures_total[15m]) > 0` | ||||
| - `rate(cccs_map_success_total[1h]) == 0` while other connectors are active | ||||
| - `histogram_quantile(0.95, rate(feedser_source_http_duration_bucket{feedser_source="cccs"}[1h])) > 5s` | ||||
|  | ||||
| ## 3. Historical Backfill Plan | ||||
|  | ||||
| 1. **Snapshot the source** – the API accepts `page=<n>` and `lang=<en|fr>` query parameters. `page=0` returns the full dataset (earliest observed `date_created`: 2018‑06‑08 for both EN and FR). Mirror those responses into Offline Kit storage when operating air‑gapped. | ||||
| 2. **Stage ingestion**: | ||||
|    - Temporarily raise `maxEntriesPerFetch` (e.g. 500) and restart Feedser workers. | ||||
|    - Run chained jobs until `pendingDocuments` drains:   | ||||
|      `stella db jobs run source:cccs:fetch --and-then source:cccs:parse --and-then source:cccs:map` | ||||
|    - Monitor `cccs.fetch.unchanged` growth; once it approaches dataset size the backfill is complete. | ||||
| 3. **Optional pagination sweep** – for incremental mirrors, iterate `page=<n>` (0…N) while `response.Count == 50`, persisting JSON to disk. Store alongside metadata (`language`, `page`, SHA256) so repeated runs detect drift. | ||||
| 4. **Language split** – keep EN/FR payloads separate to preserve canonical language fields. The connector emits `Language` directly from the feed entry, so mixed ingestion simply produces parallel advisories keyed by the same serial number. | ||||
| 5. **Throttle planning** – schedule backfills during maintenance windows. The API tolerates burst downloads, but keep the 250 ms request delay, or raise it when no mirror is available to absorb the traffic. | ||||
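
The pagination sweep in step 3 can be sketched in Python as follows. This is a minimal illustration, not part of Feedser: `sweep_pages`, `drifted_pages`, the file naming, and the `fetch_page` callable are all hypothetical, and the sketch assumes each page decodes to a JSON array. It follows the stated rule literally (keep going while a page holds exactly 50 items) and hashes a canonical re-serialisation of the JSON rather than the raw response bytes.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, List

PAGE_SIZE = 50  # full-page size implied by the `response.Count == 50` check


def sweep_pages(language: str,
                fetch_page: Callable[[str, int], list],
                out_dir: Path,
                first_page: int = 0,
                max_pages: int = 1000) -> List[Dict]:
    """Persist pages until a non-full page appears; return per-page metadata.

    `fetch_page(language, page)` is a hypothetical callable that returns the
    decoded JSON array for one page. A real one would issue the HTTP request
    and honour the configured request delay.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = []
    for page in range(first_page, first_page + max_pages):
        items = fetch_page(language, page)
        # Canonical serialisation so the hash is stable across runs.
        body = json.dumps(items, ensure_ascii=False, sort_keys=True).encode("utf-8")
        (out_dir / f"cccs-{language}-page{page:04d}.json").write_bytes(body)
        manifest.append({
            "language": language,
            "page": page,
            "count": len(items),
            "sha256": hashlib.sha256(body).hexdigest(),
        })
        if len(items) != PAGE_SIZE:
            break  # short (or oversized) page: stop sweeping
    manifest_path = out_dir / f"cccs-{language}-manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def drifted_pages(old: List[Dict], new: List[Dict]) -> List[int]:
    """Return page numbers whose hash differs from (or is absent in) `old`."""
    old_hashes = {(m["language"], m["page"]): m["sha256"] for m in old}
    return [m["page"] for m in new
            if old_hashes.get((m["language"], m["page"])) != m["sha256"]]
```

Running the sweep once per language into separate directories keeps the EN and FR payloads apart, and comparing the manifest from a previous run with `drifted_pages` shows which pages changed upstream.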
|  | ||||
| ## 4. Selector & Sanitiser Notes | ||||
|  | ||||
| - `CccsHtmlParser` now parses the **unsanitised DOM** (via AngleSharp) and only sanitises when persisting `ContentHtml`. | ||||
| - Product extraction walks headings (`Affected Products`, `Produits touchés`, `Mesures recommandées`) and consumes nested lists within `div/section/article` containers. | ||||
| - `HtmlContentSanitizer` allows `<h1>…<h6>` and `<section>` so stored HTML keeps headings for UI rendering and downstream summarisation. | ||||
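
To make the heading walk concrete, here is a rough Python stand-in using only the standard library's `html.parser`. It is not the `CccsHtmlParser` implementation (which is C# on AngleSharp); the class name, function name, and matching rules are assumptions for illustration. It starts collecting `<li>` items after a heading whose text matches one of the listed titles and stops at the next heading, keeping nested list items in document order whatever container wraps them.

```python
from html.parser import HTMLParser

# Heading titles taken from the note above; lower-cased for matching.
TARGET_HEADINGS = {"affected products", "produits touchés", "mesures recommandées"}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ProductListExtractor(HTMLParser):
    """Collects list-item text that follows a matching heading."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.products = []
        self._heading_parts = None  # text buffer while inside a heading
        self._collecting = False    # True after a matching heading
        self._open_items = []       # stack of (slot index, text parts) per open <li>

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._heading_parts = []
        elif tag == "li" and self._collecting:
            self.products.append("")  # reserve a slot to keep document order
            self._open_items.append((len(self.products) - 1, []))

    def handle_data(self, data):
        if self._heading_parts is not None:
            self._heading_parts.append(data)
        elif self._open_items:
            # Text belongs to the innermost open list item only.
            self._open_items[-1][1].append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._heading_parts is not None:
            title = " ".join("".join(self._heading_parts).split()).lower()
            self._collecting = title in TARGET_HEADINGS
            self._heading_parts = None
        elif tag == "li" and self._open_items:
            slot, parts = self._open_items.pop()
            self.products[slot] = " ".join("".join(parts).split())


def extract_products(html: str) -> list:
    parser = ProductListExtractor()
    parser.feed(html)
    parser.close()
    return [p for p in parser.products if p]
```

The sketch expects explicitly closed `<li>` elements. A DOM library such as AngleSharp normalises unclosed tags before the walk, which is one reason to parse the unsanitised DOM rather than scan the markup as text.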
|  | ||||
| ## 5. Fixture Maintenance | ||||
|  | ||||
| - Regression fixtures live in `src/StellaOps.Feedser.Source.Cccs.Tests/Fixtures`. | ||||
| - Refresh via `UPDATE_CCCS_FIXTURES=1 dotnet test src/StellaOps.Feedser.Source.Cccs.Tests/StellaOps.Feedser.Source.Cccs.Tests.csproj`. | ||||
| - Fixtures capture both EN and FR advisories with nested lists to guard against sanitiser regressions; review diffs for heading/list changes before committing. | ||||