Files
git.stella-ops.org/docs/deploy/containers.md
master 8bbfe4d2d2 feat(rate-limiting): Implement core rate limiting functionality with configuration, decision-making, metrics, middleware, and service registration
- Add RateLimitConfig for configuration management with YAML binding support.
- Introduce RateLimitDecision to encapsulate the result of rate limit checks.
- Implement RateLimitMetrics for OpenTelemetry metrics tracking.
- Create RateLimitMiddleware for enforcing rate limits on incoming requests.
- Develop RateLimitService to orchestrate instance and environment rate limit checks.
- Add RateLimitServiceCollectionExtensions for dependency injection registration.
2025-12-17 18:02:37 +02:00

159 lines
6.8 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Container Deployment Guide — AOC Update
> **Audience:** DevOps Guild, platform operators deploying StellaOps services.
> **Scope:** Deployment configuration changes required by the Aggregation-Only Contract (AOC), including schema validators, guard environment flags, and verifier identities.
This guide supplements existing deployment manuals with AOC-specific configuration. It assumes familiarity with the base Compose/Helm manifests described in `ops/deployment/` and `docs/modules/devops/architecture.md`.
---
## 1 · Schema constraint enablement
### 1.1 PostgreSQL constraints
- Apply CHECK constraints and NOT NULL rules to `advisory_raw` and `vex_raw` tables before enabling AOC guards.
- Before enabling constraints or the idempotency index, run the duplicate audit helper to confirm no conflicting raw advisories remain:
```bash
psql -d concelier -f ops/devops/scripts/check-advisory-raw-duplicates.sql -v LIMIT=200
```
Resolve any reported rows prior to rollout.
- Use the migration script provided in `ops/devops/scripts/apply-aoc-constraints.sql`:
```bash
kubectl exec -n concelier deploy/concelier-postgres -- \
psql -d concelier -f ops/devops/scripts/apply-aoc-constraints.sql
kubectl exec -n excititor deploy/excititor-postgres -- \
psql -d excititor -f ops/devops/scripts/apply-aoc-constraints.sql
```
- Constraints enforce required fields (`tenant`, `source`, `upstream`, `linkset`) and reject forbidden keys at DB level.
- Rollback plan: constraints can be dropped via the same script with `--remove` if required.
### 1.2 Migration order
1. Deploy constraints in maintenance window.
2. Roll out Concelier/Excititor images with guard middleware enabled (`AOC_GUARD_ENABLED=true`).
3. Run smoke tests (`stella sources ingest --dry-run` fixtures) before resuming production ingestion.
### 1.3Supersedes backfill verification
1. **Duplicate audit:** Confirm `psql -d concelier -f ops/devops/scripts/check-advisory-raw-duplicates.sql -v LIMIT=200` reports no conflicts before restarting Concelier with the new migrations.
2. **Post-migration check:** After the service restarts, validate that the `advisory` view points to `advisory_backup_20251028`:
```bash
psql -d concelier -c "SELECT viewname, definition FROM pg_views WHERE viewname = 'advisory';"
```
The definition should reference `advisory_backup_20251028`.
3. **Supersedes chain spot-check:** Inspect a sample set to ensure deterministic chaining:
```bash
psql -d concelier -c "
SELECT id, supersedes FROM advisory_raw
WHERE upstream_id IS NOT NULL
ORDER BY tenant, source_vendor, upstream_id, retrieved_at
LIMIT 5;"
```
Each revision should reference the previous `id` (or `null` for the first revision). Record findings in the change ticket before proceeding to production.
---
## 2·Container environment flags
Add the following environment variables to Concelier/Excititor deployments:
| Variable | Default | Description |
|----------|---------|-------------|
| `AOC_GUARD_ENABLED` | `true` | Enables `AOCWriteGuard` interception. Set `false` only for controlled rollback. |
| `AOC_ALLOW_SUPERSEDES_RETROFIT` | `false` | Allows temporary supersedes backfill during migration. Remove after cutover. |
| `AOC_METRICS_ENABLED` | `true` | Emits `ingestion_write_total`, `aoc_violation_total`, etc. |
| `AOC_TENANT_HEADER` | `X-Stella-Tenant` | Header name expected from Gateway. |
| `AOC_VERIFIER_USER` | `stella-aoc-verify` | Read-only service user used by UI/CLI verification. |
Compose snippet:
```yaml
environment:
- AOC_GUARD_ENABLED=true
- AOC_ALLOW_SUPERSEDES_RETROFIT=false
- AOC_METRICS_ENABLED=true
- AOC_TENANT_HEADER=X-Stella-Tenant
- AOC_VERIFIER_USER=stella-aoc-verify
```
Ensure `AOC_VERIFIER_USER` exists in Authority with `aoc:verify` scope and no write permissions.
---
## 3·Verifier identity
- Create a dedicated client (`stella-aoc-verify`) via Authority bootstrap:
```yaml
clients:
- clientId: stella-aoc-verify
grantTypes: [client_credentials]
scopes: [aoc:verify, advisory:read, vex:read]
tenants: [default]
```
- Store credentials in secret store (`Kubernetes Secret`, `Docker swarm secret`).
- Bind credentials to `stella aoc verify` CI jobs and Console verification service.
- Rotate quarterly; document in `ops/authority-key-rotation.md`.
---
## 4·Deployment steps
1. **Pre-checks:** Confirm database backups, alerting in maintenance mode, and staging environment validated.
2. **Apply validators:** Run scripts per §1.1.
3. **Update manifests:** Inject environment variables (§2) and mount guard configuration configmaps.
4. **Redeploy services:** Rolling restart Concelier/Excititor pods. Monitor `ingestion_write_total` for steady throughput.
5. **Seed verifier:** Deploy read-only verifier user and store credentials.
6. **Run verification:** Execute `stella aoc verify --since 24h` and ensure exit code `0`.
7. **Update dashboards:** Point Grafana panels to new metrics (`aoc_violation_total`).
8. **Record handoff:** Capture console screenshots and verification logs for release notes.
---
## 5·Offline Kit updates
- Ship validator scripts with Offline Kit (`offline-kit/scripts/apply-aoc-validators.js`).
- Include pre-generated verification reports for air-gapped deployments.
- Document offline CLI workflow in bundle README referencing `docs/modules/cli/guides/cli-reference.md`.
- Ensure `stella-aoc-verify` credentials are scoped to offline tenant and rotated during bundle refresh.
---
## 6·Rollback plan
1. Disable guard via `AOC_GUARD_ENABLED=false` on Concelier/Excititor and rollout.
2. Remove validators with the migration script (`--remove`).
3. Pause verification jobs to prevent noise.
4. Investigate and remediate upstream issues before re-enabling guards.
---
## 7·References
- [Aggregation-Only Contract reference](../ingestion/aggregation-only-contract.md)
- [Authority scopes & tenancy](../security/authority-scopes.md)
- [Observability guide](../observability/observability.md)
- [CLI AOC commands](../modules/cli/guides/cli-reference.md)
- [Concelier architecture](../modules/concelier/architecture.md)
- [Excititor architecture](../modules/excititor/architecture.md)
---
## 8·Compliance checklist
- [ ] Validators documented and scripts referenced for online/offline deployments.
- [ ] Environment variables cover guard enablement, metrics, and tenant header.
- [ ] Read-only verifier user installation steps included.
- [ ] Offline kit instructions align with validator/verification workflow.
- [ ] Rollback procedure captured.
- [ ] Cross-links to AOC docs, Authority scopes, and observability guides present.
- [ ] DevOps Guild sign-off tracked (owner: @devops-guild, due 2025-10-29).
---
*Last updated: 2025-10-26 (Sprint19).*