# Feedser GHSA Connector Operations Runbook

_Last updated: 2025-10-12_

## 1. Overview

The GitHub Security Advisories (GHSA) connector pulls advisory metadata from the GitHub REST API `/security/advisories` endpoint. GitHub enforces both primary and secondary rate limits, so operators must monitor usage and configure retries to avoid throttling incidents.

## 2. Rate-limit telemetry

The connector now surfaces rate-limit headers on every fetch and exposes the following metrics via OpenTelemetry:

| Metric | Description | Tags |
| --- | --- | --- |
| `ghsa.ratelimit.limit` (histogram) | Samples the reported request quota at fetch time. | `phase` = `list` or `detail`; `resource` (e.g., `core`). |
| `ghsa.ratelimit.remaining` (histogram) | Remaining requests reported by `X-RateLimit-Remaining`. | `phase`, `resource`. |
| `ghsa.ratelimit.reset_seconds` (histogram) | Seconds until `X-RateLimit-Reset`. | `phase`, `resource`. |
| `ghsa.ratelimit.exhausted` (counter) | Incremented whenever GitHub returns a zero remaining quota and the connector delays before retrying. | `phase`. |

### Dashboards & alerts

- Plot `ghsa.ratelimit.remaining` as the latest value to watch the remaining runway. Alert when the value stays below `RateLimitWarningThreshold` (default 500) for more than 5 minutes (see the rule sketch after this list).
- Raise a separate alert on `increase(ghsa.ratelimit.exhausted[15m]) > 0` to catch hard throttles.
- Overlay `ghsa.fetch.attempts` vs `ghsa.fetch.failures` to confirm retries are effective.
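
A minimal Prometheus alerting-rule sketch for the first two bullets, assuming the OpenTelemetry metrics land in Prometheus under the conventional exported names (`ghsa_ratelimit_remaining_bucket`, `ghsa_ratelimit_exhausted_total`); adjust names and thresholds to match your exporter:

```yaml
groups:
  - name: feedser-ghsa
    rules:
      - alert: GhsaRateLimitLow
        # p99 of the remaining-quota histogram has stayed below the
        # warning threshold (default 500) for five minutes.
        expr: histogram_quantile(0.99, sum by (le, phase, resource) (rate(ghsa_ratelimit_remaining_bucket[5m]))) < 500
        for: 5m
        labels:
          severity: warning
      - alert: GhsaRateLimitExhausted
        # Any hard throttle in the last 15 minutes should page.
        expr: increase(ghsa_ratelimit_exhausted_total[15m]) > 0
        labels:
          severity: critical
```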

## 3. Logging signals

When `X-RateLimit-Remaining` falls below `RateLimitWarningThreshold`, the connector emits:

```
GHSA rate limit warning: remaining {Remaining}/{Limit} for {Phase} {Resource}
```

When GitHub reports zero remaining calls, the connector logs the exhaustion and sleeps for the reported `Retry-After`/`X-RateLimit-Reset` interval (falling back to `SecondaryRateLimitBackoff`).

## 4. Configuration knobs (`feedser.yaml`)

```yaml
feedser:
  sources:
    ghsa:
      apiToken: "${GITHUB_PAT}"
      pageSize: 50
      requestDelay: "00:00:00.200"
      failureBackoff: "00:05:00"
      rateLimitWarningThreshold: 500    # warn below this many remaining calls
      secondaryRateLimitBackoff: "00:02:00"  # fallback delay when GitHub omits Retry-After
```

### Recommendations

- Increase `requestDelay` in air-gapped or burst-heavy deployments to smooth token consumption (see the sketch after this list).
- Lower `rateLimitWarningThreshold` only if your dashboards already page on the new histogram; never set it negative.
- For bots using a low-privilege PAT, keep `secondaryRateLimitBackoff` at ≥ 60 seconds to respect GitHub's secondary-limit guidance.
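
As an illustration, a burst-heavy deployment might pace requests more conservatively while keeping the default warning threshold. This sketch uses only the knobs documented above, with example values rather than recommendations:

```yaml
feedser:
  sources:
    ghsa:
      apiToken: "${GITHUB_PAT}"
      pageSize: 25                           # smaller pages shrink each burst
      requestDelay: "00:00:01.000"           # 1 s between calls smooths token spend
      failureBackoff: "00:05:00"
      rateLimitWarningThreshold: 500         # keep the default paging threshold
      secondaryRateLimitBackoff: "00:05:00"  # wait longer when GitHub omits Retry-After
```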

### Default job schedule

| Job kind | Cron | Timeout | Lease |
| --- | --- | --- | --- |
| `source:ghsa:fetch` | `1,11,21,31,41,51 * * * *` | 6 minutes | 4 minutes |
| `source:ghsa:parse` | `3,13,23,33,43,53 * * * *` | 5 minutes | 4 minutes |
| `source:ghsa:map` | `5,15,25,35,45,55 * * * *` | 5 minutes | 4 minutes |

These defaults spread the GHSA stages across the hour so fetch completes before parse/map fire. Override them via `feedser.jobs.definitions[...]` when coordinating multiple connectors on the same runner.
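
The exact override schema depends on your deployment; as a purely hypothetical sketch, assuming `feedser.jobs.definitions` keys entries by job kind and accepts the cron/timeout/lease fields from the table above:

```yaml
feedser:
  jobs:
    definitions:
      "source:ghsa:fetch":
        cron: "2,12,22,32,42,52 * * * *"  # shift fetch off another connector's slots
        timeout: "00:06:00"               # matches the 6-minute default above
        lease: "00:04:00"                 # matches the 4-minute default above
```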

## 5. Provisioning credentials

Feedser requires a GitHub personal access token (classic) with the `read:org` and `security_events` scopes to pull GHSA data. Store it as a secret and reference it via `feedser.sources.ghsa.apiToken`.

### Docker Compose (stack operators)

```yaml
services:
  feedser:
    environment:
      FEEDSER__SOURCES__GHSA__APITOKEN: /run/secrets/ghsa_pat
    secrets:
      - ghsa_pat

secrets:
  ghsa_pat:
    file: ./secrets/ghsa_pat.txt  # contains only the PAT value
```

### Helm values (cluster operators)

```yaml
feedser:
  extraEnv:
    - name: FEEDSER__SOURCES__GHSA__APITOKEN
      valueFrom:
        secretKeyRef:
          name: feedser-ghsa
          key: apiToken

extraSecrets:
  feedser-ghsa:
    apiToken: "<paste PAT here or source from external secret store>"
```

After rotating the PAT, restart the Feedser workers (or run `kubectl rollout restart deployment/feedser`) to ensure the configuration reloads.

When enabling GHSA for the first time, run a staged backfill:

1. Trigger `source:ghsa:fetch` manually (CLI or API) outside of peak hours.
2. Watch `feedser.jobs.health` for the GHSA jobs until they report healthy.
3. Allow the scheduled cron cadence to resume once the initial backlog drains (typically < 30 minutes).

## 6. Runbook steps when throttled

1. Check `ghsa.ratelimit.exhausted` for the affected phase (`list` vs `detail`).
2. Confirm the connector is delaying: logs will show `GHSA rate limit exhausted...` with the chosen backoff.
3. If rate limits stay exhausted (a temporary mitigation profile is sketched after this list):
   - Verify no other jobs are sharing the PAT.
   - Temporarily reduce `MaxPagesPerFetch` or `PageSize` to shrink burst size.
   - Consider provisioning a dedicated PAT (GHSA permissions only) for Feedser.
4. After the quota resets, restore `rateLimitWarningThreshold`/`requestDelay` to their normal values and monitor the histograms for at least one hour.
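
A possible mitigation profile for step 3, assuming `maxPagesPerFetch` follows the same camelCase key convention as the knobs in section 4 (verify the key name in your build before applying):

```yaml
feedser:
  sources:
    ghsa:
      pageSize: 25                  # halve the page size while throttled
      maxPagesPerFetch: 2           # assumed key name; shrinks each fetch burst
      requestDelay: "00:00:02.000"  # stretch out the remaining calls
```

Revert these values once step 4 confirms the quota has recovered.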

## 7. Alert integration quick reference

- Prometheus: use `histogram_quantile(0.99, ...)` over `ghsa_ratelimit_remaining_bucket` (derived from the histogram) to trend capacity.
- VictoriaMetrics: `last_over_time(ghsa_ratelimit_remaining_sum[5m])` for simple last-value graphs.
- Grafana: stack remaining + used to visualise the total limit per resource.