								docs/ops/feedser-ghsa-operations.md
									
									
									
									
									
# Feedser GHSA Connector – Operations Runbook

_Last updated: 2025-10-12_

## 1. Overview

The GitHub Security Advisories (GHSA) connector pulls advisory metadata from the GitHub REST API `/security/advisories` endpoint. GitHub enforces both primary and secondary rate limits, so operators must monitor usage and configure retries to avoid throttling incidents.

## 2. Rate-limit telemetry

The connector now surfaces rate-limit headers on every fetch and exposes the following metrics via OpenTelemetry:

| Metric | Description | Tags |
|--------|-------------|------|
| `ghsa.ratelimit.limit` (histogram) | Samples the reported request quota at fetch time. | `phase` = `list` or `detail`, `resource` (e.g., `core`). |
| `ghsa.ratelimit.remaining` (histogram) | Remaining requests returned by `X-RateLimit-Remaining`. | `phase`, `resource`. |
| `ghsa.ratelimit.reset_seconds` (histogram) | Seconds until `X-RateLimit-Reset`. | `phase`, `resource`. |
| `ghsa.ratelimit.exhausted` (counter) | Incremented whenever GitHub returns a zero remaining quota and the connector delays before retrying. | `phase`. |

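To make the table concrete, the following is a minimal sketch of how the three histogram samples could be derived from a GitHub response. It is not the connector's actual code: `RateLimitSnapshot` and `snapshot_from_headers` are hypothetical names, and the only facts relied on are GitHub's documented `X-RateLimit-*` headers, where `X-RateLimit-Reset` is a UTC epoch timestamp.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitSnapshot:
    """Hypothetical container for the values behind the ghsa.ratelimit.* metrics."""
    limit: Optional[int]
    remaining: Optional[int]
    reset_seconds: Optional[float]  # seconds until the quota window resets
    resource: Optional[str]


def _to_int(value: Optional[str]) -> Optional[int]:
    """Return the header as an int, or None when it is absent or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def snapshot_from_headers(headers: Mapping[str, str], now_epoch: float) -> RateLimitSnapshot:
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    reset_epoch = _to_int(lowered.get("x-ratelimit-reset"))
    reset_seconds = None
    if reset_epoch is not None:
        # Convert the absolute reset time into a duration; clamp at zero if it has passed.
        reset_seconds = max(0.0, reset_epoch - now_epoch)
    return RateLimitSnapshot(
        limit=_to_int(lowered.get("x-ratelimit-limit")),
        remaining=_to_int(lowered.get("x-ratelimit-remaining")),
        reset_seconds=reset_seconds,
        resource=lowered.get("x-ratelimit-resource"),
    )
```

Passing the current time in as `now_epoch` rather than reading the clock inside the function keeps the conversion easy to test.
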
### Dashboards & alerts
- Plot `ghsa.ratelimit.remaining` as the latest value to watch the runway. Alert when the value stays below **`RateLimitWarningThreshold`** (default `500`) for more than 5 minutes.
- Raise a separate alert on `increase(ghsa.ratelimit.exhausted[15m]) > 0` to catch hard throttles.
- Overlay `ghsa.fetch.attempts` vs `ghsa.fetch.failures` to confirm retries are effective.

## 3. Logging signals
When `X-RateLimit-Remaining` falls below `RateLimitWarningThreshold`, the connector emits:
```
GHSA rate limit warning: remaining {Remaining}/{Limit} for {Phase} {Resource}
```
When GitHub reports zero remaining calls, the connector logs and sleeps for the reported `Retry-After`/`X-RateLimit-Reset` interval (falling back to `SecondaryRateLimitBackoff`).

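The delay selection described above can be sketched as follows. This is an illustration under stated assumptions, not the connector's implementation: `choose_backoff` is a hypothetical helper, the precedence (`Retry-After` first, then `X-RateLimit-Reset`, then the configured fallback) is assumed from the order in which the text lists them, and only the delta-seconds form of `Retry-After` is handled.

```python
from datetime import timedelta
from typing import Mapping


def choose_backoff(headers: Mapping[str, str], now_epoch: float, fallback: timedelta) -> timedelta:
    """Pick how long to sleep after GitHub reports an exhausted quota."""
    lowered = {key.lower(): value.strip() for key, value in headers.items()}

    # 1. Retry-After given as whole seconds (the HTTP-date form is ignored in this sketch).
    retry_after = lowered.get("retry-after", "")
    if retry_after.isdigit():
        return timedelta(seconds=int(retry_after))

    # 2. X-RateLimit-Reset is a UTC epoch timestamp; wait until it is reached.
    reset = lowered.get("x-ratelimit-reset", "")
    if reset.isdigit():
        wait = int(reset) - now_epoch
        if wait > 0:
            return timedelta(seconds=wait)

    # 3. Neither header is usable: use the configured secondaryRateLimitBackoff.
    return fallback
```

A reset timestamp that already lies in the past falls through to the fallback, so the sketch never retries with a zero delay.
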
## 4. Configuration knobs (`feedser.yaml`)
```yaml
feedser:
  sources:
    ghsa:
      apiToken: "${GITHUB_PAT}"
      pageSize: 50
      requestDelay: "00:00:00.200"
      failureBackoff: "00:05:00"
      rateLimitWarningThreshold: 500    # warn below this many remaining calls
      secondaryRateLimitBackoff: "00:02:00"  # fallback delay when GitHub omits Retry-After
```

### Recommendations
- Increase `requestDelay` in air-gapped or burst-heavy deployments to smooth token consumption.
- Lower `rateLimitWarningThreshold` only if your dashboards already page on the new histogram; never set it negative.
- For bots using a low-privilege PAT, keep `secondaryRateLimitBackoff` at ≥60 seconds to respect GitHub’s secondary-limit guidance.

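These recommendations can be checked before a configuration change is rolled out. The sketch below is a hypothetical pre-flight check, not part of Feedser: `parse_timespan` and `check_ghsa_settings` are illustrative names, the duration strings are assumed to follow the `hh:mm:ss[.fraction]` shape used in the sample above, and the 60-second floor is applied unconditionally even though the recommendation targets low-privilege PATs.

```python
import re
from datetime import timedelta
from typing import Dict, List, Union

# Matches values such as "00:05:00" or "00:00:00.200".
_TIMESPAN = re.compile(r"^(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,7}))?$")


def parse_timespan(text: str) -> timedelta:
    """Parse an hh:mm:ss[.fraction] string into a timedelta."""
    match = _TIMESPAN.match(text)
    if match is None:
        raise ValueError(f"not an hh:mm:ss[.fraction] value: {text!r}")
    hours, minutes, seconds, fraction = match.groups()
    fractional = float(f"0.{fraction}") if fraction else 0.0
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds) + fractional)


def check_ghsa_settings(settings: Dict[str, Union[int, str]]) -> List[str]:
    """Return human-readable problems; an empty list means the values look sane."""
    problems: List[str] = []
    if int(settings["rateLimitWarningThreshold"]) < 0:
        problems.append("rateLimitWarningThreshold must not be negative")
    backoff = parse_timespan(str(settings["secondaryRateLimitBackoff"]))
    if backoff < timedelta(seconds=60):
        problems.append("secondaryRateLimitBackoff is below 60 seconds")
    return problems
```

With the sample values from `feedser.yaml` (`500` and `"00:02:00"`), the check returns no problems.
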
#### Default job schedule

| Job kind | Cron | Timeout | Lease |
|----------|------|---------|-------|
| `source:ghsa:fetch` | `1,11,21,31,41,51 * * * *` | 6 minutes | 4 minutes |
| `source:ghsa:parse` | `3,13,23,33,43,53 * * * *` | 5 minutes | 4 minutes |
| `source:ghsa:map` | `5,15,25,35,45,55 * * * *` | 5 minutes | 4 minutes |

These defaults stagger the GHSA stages across the hour: each ten-minute cycle starts fetch first, then parse and map at two-minute offsets, so that fetch normally completes before parse and map fire. Override them via `feedser.jobs.definitions[...]` when coordinating multiple connectors on the same runner.

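When overriding the schedule, it can help to confirm that the stages still start in order. The following sketch computes the offset between two stages; `minutes_of` and `stagger` are hypothetical helpers that only understand a plain comma-separated minute field, as used in the table above, not full cron syntax.

```python
from typing import List


def minutes_of(cron: str) -> List[int]:
    """Expand the minute field of a five-field cron expression (comma lists only)."""
    minute_field = cron.split()[0]
    return sorted(int(part) for part in minute_field.split(","))


def stagger(earlier: str, later: str) -> List[int]:
    """Minutes between each start of `earlier` and the matching start of `later`."""
    return [b - a for a, b in zip(minutes_of(earlier), minutes_of(later))]


fetch = "1,11,21,31,41,51 * * * *"
parse = "3,13,23,33,43,53 * * * *"
map_stage = "5,15,25,35,45,55 * * * *"

print(stagger(fetch, parse))      # offsets from fetch to parse in each cycle
print(stagger(parse, map_stage))  # offsets from parse to map in each cycle
```

With the defaults, every offset is two minutes, so a fetch that runs longer than two minutes is still in progress when parse starts; keep that in mind when choosing custom offsets.
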
## 5. Provisioning credentials

Feedser requires a GitHub personal access token (classic) with the **`read:org`** and **`security_events`** scopes to pull GHSA data. Store it as a secret and reference it via `feedser.sources.ghsa.apiToken`.

### Docker Compose (stack operators)
```yaml
services:
  feedser:
    environment:
      FEEDSER__SOURCES__GHSA__APITOKEN: /run/secrets/ghsa_pat
    secrets:
      - ghsa_pat

secrets:
  ghsa_pat:
    file: ./secrets/ghsa_pat.txt  # contains only the PAT value
```

### Helm values (cluster operators)
```yaml
feedser:
  extraEnv:
    - name: FEEDSER__SOURCES__GHSA__APITOKEN
      valueFrom:
        secretKeyRef:
          name: feedser-ghsa
          key: apiToken

extraSecrets:
  feedser-ghsa:
    apiToken: "<paste PAT here or source from external secret store>"
```

After rotating the PAT, restart the Feedser workers (or run `kubectl rollout restart deployment/feedser`) to ensure the configuration reloads.

When enabling GHSA for the first time, run a staged backfill:

1. Trigger `source:ghsa:fetch` manually (CLI or API) outside of peak hours.
2. Watch `feedser.jobs.health` for the GHSA jobs until they report `healthy`.
3. Allow the scheduled cron cadence to resume once the initial backlog drains (typically < 30 minutes).

## 6. Runbook steps when throttled
1. Check `ghsa.ratelimit.exhausted` for the affected phase (`list` vs `detail`).
2. Confirm the connector is delaying: the logs will show `GHSA rate limit exhausted...` with the chosen backoff.
3. If rate limits stay exhausted:
   - Verify no other jobs are sharing the PAT.
   - Temporarily reduce `MaxPagesPerFetch` or `PageSize` to shrink burst size.
   - Consider provisioning a dedicated PAT (GHSA permissions only) for Feedser.
4. After the quota resets, restore `rateLimitWarningThreshold`/`requestDelay` to their normal values and monitor the histograms for at least one hour.

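To judge how far `MaxPagesPerFetch` or `PageSize` needs to come down, a rough burst estimate can help. The sketch below rests on two loudly stated assumptions that this runbook does not confirm: each fetch issues one `list` call per page plus one `detail` call per advisory, and `requestDelay` is applied between consecutive requests. The function names and the page count of 5 are illustrative only.

```python
from datetime import timedelta


def worst_case_requests(max_pages: int, page_size: int) -> int:
    # Assumption: one `list` call per page and one `detail` call per advisory returned.
    return max_pages * (1 + page_size)


def minimum_duration(requests: int, request_delay: timedelta) -> timedelta:
    # Assumption: the delay is inserted between consecutive requests, not before the first.
    return request_delay * max(0, requests - 1)


# Hypothetical values: 5 pages per fetch, with the pageSize of 50 from feedser.yaml.
before = worst_case_requests(max_pages=5, page_size=50)
after = worst_case_requests(max_pages=5, page_size=25)
print(before, after)
print(minimum_duration(before, timedelta(milliseconds=200)))
```

Under these assumptions, halving `PageSize` roughly halves the number of requests a single fetch can spend, which is the effect step 3 aims for.
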
## 7. Alert integration quick reference
- Prometheus: `ghsa_ratelimit_remaining_bucket` (from histogram) – use `histogram_quantile(0.99, ...)` to trend capacity.
- VictoriaMetrics: `last_over_time(ghsa_ratelimit_remaining_sum[5m])` for simple last-value graphs.
- Grafana: stack remaining + used to visualise total limit per resource.