# 17 · Security Hardening Guide — **Stella Ops**

*(v2.0, 12 Jul 2025)*

> **Audience:** Site‑reliability and platform teams deploying **the open‑source Core** in production or restricted networks.

---

## 0 Table of Contents

1. Threat model (summary)
2. Host‑OS baseline checklist
3. Container & runtime hardening
4. Network‑plane guidance
5. Secrets & key management
6. Image, SBOM & plug‑in supply‑chain controls
7. Logging, monitoring & audit
8. Update & patch strategy
9. Incident‑response workflow
10. Pen‑testing & continuous assurance
11. Vulnerability disclosure & contact
12. Change log

---

## 1 Threat model (summary)

| Asset                | Threats               | Mitigations                                                            |
| -------------------- | --------------------- | ---------------------------------------------------------------------- |
| SBOMs & scan results | Disclosure, tampering | TLS‑in‑transit, read‑only Redis volume, RBAC, Cosign‑verified plug‑ins |
| Backend container    | RCE, code‑injection   | Distroless image, non‑root UID, read‑only FS, seccomp + `CAP_DROP:ALL` |
| Update artefacts     | Supply‑chain attack   | Cosign‑signed images & SBOMs, enforced by admission controller         |
| Admin credentials    | Phishing, brute force | OAuth 2.0 with 12‑h token TTL, optional mTLS                           |

---

## 2 Host‑OS baseline checklist

| Item          | Recommended setting                                       |
| ------------- | --------------------------------------------------------- |
| OS            | Ubuntu 22.04 LTS (kernel ≥ 5.15) or AlmaLinux 9           |
| Patches       | `unattended-upgrades` or vendor‑equivalent enabled        |
| Filesystem    | `noexec,nosuid` on `/tmp`, `/var/tmp`                     |
| Docker Engine | v24.*, API socket root‑owned (`0660`)                     |
| Auditd        | Watch `/etc/docker`, `/usr/bin/docker*` and Compose files |
| Time sync     | `chrony` or `systemd-timesyncd`                           |

---

## 3 Container & runtime hardening
### 3.1 Docker Compose reference (`compose-core.yml`)

```yaml
services:
  backend:
    image: registry.stella-ops.org/stella-ops/stella-ops:<PINNED_TAG_OR_DIGEST>
    user: "101:101"           # non-root
    read_only: true
    security_opt:
      - "no-new-privileges:true"
      - "seccomp:./seccomp-backend.json"
    cap_drop: [ALL]
    tmpfs:
      - /tmp:size=64m,exec
    environment:
      - ASPNETCORE_URLS=https://+:8080
      - TLSPROVIDER=OpenSslGost
    depends_on: [redis]
    networks: [core-net]
    healthcheck:
      test: ["CMD", "wget", "-qO-", "https://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7.2-alpine
    command: ["redis-server", "--requirepass", "${REDIS_PASS}", "--rename-command", "FLUSHALL", ""]
    user: "redis"
    read_only: true
    cap_drop: [ALL]
    tmpfs:
      - /data
    networks: [core-net]

networks:
  core-net:
    driver: bridge
```

No dedicated "Redis" or "PostgreSQL" sub-nets are declared; the single bridge network suffices for the default stack.

### 3.2 Kubernetes deployment highlights

* Use a separate NetworkPolicy that allows egress from the backend only to Redis on port 6379.
* Set `securityContext` to `runAsNonRoot: true`, `readOnlyRootFilesystem: true`, `allowPrivilegeEscalation: false`, and drop all capabilities.
* Define a PodDisruptionBudget with `minAvailable: 1`.
* Optionally add a `CosignVerified=true` label enforced by an admission controller (e.g. Kyverno or Connaisseur).

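The highlights above can be sketched as three manifests. This is a minimal illustration, not the project's shipped configuration: the namespace `stella-ops`, the resource names, and the `app: backend` / `app: redis` labels are hypothetical placeholders; the UID and image reference are taken from the Compose example.

```yaml
# Sketch only: namespace, names, and labels are hypothetical placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-egress-redis
  namespace: stella-ops
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Egress
  egress:
    # Only Redis on TCP 6379 is reachable from backend pods.
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: stella-ops
spec:
  replicas: 2                 # more than one replica so the PDB below does not block node drains
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 101        # same UID/GID as the Compose example
        runAsGroup: 101
      containers:
        - name: backend
          image: registry.stella-ops.org/stella-ops/stella-ops:<PINNED_TAG_OR_DIGEST>
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: backend-pdb
  namespace: stella-ops
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: backend
```

Because the policy permits egress only to Redis, it also blocks DNS lookups; if the backend resolves Redis by Service name, an additional egress rule for the cluster DNS (UDP/TCP 53) is likely needed.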
## 4 Network‑plane guidance

| Plane              | Recommendation                                                                  |
| ------------------ | ------------------------------------------------------------------------------- |
| North‑south        | Terminate TLS 1.2+ (OpenSSL‑GOST default). Use Let's Encrypt or an internal CA. |
| East‑west          | Compose bridge or K8s ClusterIP only; no public Redis/PostgreSQL ports.         |
| Ingress controller | Limit methods to GET, POST, PATCH (no TRACE).                                   |
| Rate‑limits        | 40 rps default; tune `ScannerPool.Workers` and ingress limit‑req to match.      |

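As one possible way to apply the north‑south and rate‑limit rows on Kubernetes, the sketch below assumes the ingress-nginx controller, which the guide does not mandate. The host name, TLS secret, Service name, and namespace are hypothetical; port 8080 and the HTTPS backend follow the `ASPNETCORE_URLS` setting in the Compose example.

```yaml
# Sketch only: assumes ingress-nginx; host, secret, Service, and namespace are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: stella-backend
  namespace: stella-ops
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "40"            # requests per second, per client IP
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"  # backend serves TLS on 8080
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - stella.example.internal
      secretName: stella-backend-tls
  rules:
    - host: stella.example.internal
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend
                port:
                  number: 8080
```

Note that `limit-rps` is enforced per client IP rather than globally, so it approximates the 40 rps default only for a single caller. Restricting HTTP methods is controller‑specific and is not shown here.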
## 5 Secrets & key management

| Secret                            | Storage                                                        | Rotation                          |
| --------------------------------- | -------------------------------------------------------------- | --------------------------------- |
| **Client‑JWT (offline)**          | `/var/lib/stella/tokens/client.jwt` (owner `root`, mode `600`) | **30 days**, provided by each OUK |
| `REDIS_PASS`                      | Docker/K8s secret                                              | 90 days                           |
| OAuth signing key                 | `/keys/jwt.pem` (read‑only mount)                              | 180 days                          |
| Cosign public key                 | `/keys/cosign.pub`, baked into image                           | Change on every major release     |
| Trivy DB mirror token (if remote) | Secret + read‑only                                             | 30 days                           |

Never bake secrets into images; always inject at runtime.

> **Operational tip:** schedule a cron job that reminds ops 5 days before
> `client.jwt` expires. The backend also emits a Prometheus metric,
> `stella_quota_token_days_remaining`.

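As an alternative to a cron reminder, the `stella_quota_token_days_remaining` metric can drive an alert. The rule below is a minimal sketch in the standard Prometheus rule-file format; the group name, alert name, `for` duration, and severity label are assumptions, and the metric is assumed to be a gauge counting whole days.

```yaml
# Sketch only: group/alert names and the severity label are hypothetical.
groups:
  - name: stella-token-expiry
    rules:
      - alert: StellaClientJwtExpiringSoon
        # Fires once 5 or fewer days remain, matching the reminder window in the tip.
        expr: stella_quota_token_days_remaining <= 5
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "client.jwt expires in {{ $value }} day(s); rotate it from the next OUK"
```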
## 6 Image, SBOM & plug‑in supply‑chain controls

* Images: pull by digest, not `latest`, and verify the signature:

```bash
cosign verify ghcr.io/stellaops/backend@sha256:<DIGEST> \
  --key https://stella-ops.org/keys/cosign.pub
```

* SBOM: each release ships an SPDX file; store it alongside the images for audit.
* Third‑party plug‑ins: place them in `/plugins/`; the backend will:
  * Validate the Cosign signature.
  * Check `[StellaPluginVersion("major.minor")]`.
  * Refuse to start if `Security.DisablePluginUnsigned=false` (default).

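On Kubernetes, the image check can also be enforced at admission time. The sketch below assumes Kyverno is the admission controller and uses its `verifyImages` rule; the policy name is hypothetical, the image patterns are derived from the references used in this guide, and the key body is a placeholder for the contents of `cosign.pub`. Field names can differ between Kyverno releases, so check the version you run.

```yaml
# Sketch only: assumes Kyverno; policy name is hypothetical, key body is a placeholder.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-stella-images
spec:
  validationFailureAction: Enforce   # reject pods whose images fail verification
  background: false
  webhookTimeoutSeconds: 30
  rules:
    - name: require-cosign-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "ghcr.io/stellaops/*"
            - "registry.stella-ops.org/stella-ops/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <contents of cosign.pub>
                      -----END PUBLIC KEY-----
```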
## 7 Logging, monitoring & audit

| Control      | Implementation                                                        |
| ------------ | --------------------------------------------------------------------- |
| Log format   | Serilog JSON; ship via Fluent‑Bit to ELK or Loki                      |
| Metrics      | Prometheus `/metrics` endpoint; default Grafana dashboard in `infra/` |
| Audit events | Redis stream `audit`; export daily to SIEM                            |
| Alert rules  | Feed age ≥ 48 h, P95 wall‑time > 5 s, Redis used memory > 75 %        |

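The three alert thresholds in the table could be expressed as Prometheus rules along the following lines. This is an illustrative sketch only: `stella_feed_age_hours` and `stella_scan_duration_seconds_bucket` are hypothetical metric names standing in for whatever the backend actually exports, and the Redis rule assumes a separately deployed redis_exporter, which is not part of the default stack.

```yaml
# Sketch only: the stella_* metric names are HYPOTHETICAL placeholders;
# redis_* metrics assume redis_exporter is deployed.
groups:
  - name: stella-core-alerts
    rules:
      - alert: StellaFeedStale
        expr: stella_feed_age_hours >= 48
        for: 15m
        labels:
          severity: warning
      - alert: StellaScanP95Slow
        # P95 wall-time above 5 s, computed from a histogram over 5 minutes.
        expr: histogram_quantile(0.95, sum by (le) (rate(stella_scan_duration_seconds_bucket[5m]))) > 5
        for: 10m
        labels:
          severity: warning
      - alert: RedisMemoryHigh
        # The "> 0" filter skips instances without maxmemory set, avoiding division by zero.
        expr: redis_memory_used_bytes / (redis_memory_max_bytes > 0) > 0.75
        for: 10m
        labels:
          severity: warning
```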
### 7.1 Concelier authorization audits

- Enable the Authority integration for Concelier (`authority.enabled=true`). Keep
  `authority.allowAnonymousFallback` set to `true` only during migration and plan
  to disable it before **2025-12-31 UTC** so the `/jobs*` surface always demands
  a bearer token.
- Store the Authority client secret using Docker/Kubernetes secrets and point
  `authority.clientSecretFile` at the mounted path; the value is read at startup
  and never logged.
- Watch the `Concelier.Authorization.Audit` logger. Each entry contains the HTTP
  status, subject, client ID, scopes, remote IP, and a boolean `bypass` flag
  showing whether a network bypass CIDR allowed the request. Configure your SIEM
  to alert when unauthenticated requests (`status=401`) appear with
  `bypass=true`, or when unexpected scopes invoke job triggers.

Detailed monitoring and response guidance lives in `docs/modules/concelier/operations/authority-audit-runbook.md`.

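For orientation, the three `authority.*` settings named above might look like this once migration is complete. The fragment is a sketch: the YAML layout, the file it lives in, and the secret mount path are assumptions; only the key names come from this guide.

```yaml
# Sketch only: file layout and the mount path are hypothetical; key names are from the text above.
authority:
  enabled: true
  allowAnonymousFallback: false   # keep true only during migration
  clientSecretFile: /run/secrets/concelier_authority_client_secret
```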
## 8 Update & patch strategy

| Layer                | Cadence                                                                  | Method                                       |
| -------------------- | ------------------------------------------------------------------------ | -------------------------------------------- |
| Backend & CLI images | Monthly or CVE‑driven                                                    | `docker pull` + `docker compose up -d`       |
| Trivy DB             | 24 h scheduler via Concelier (vulnerability ingest/merge/export service) | Configurable via Concelier scheduler options |
| Docker Engine        | Vendor LTS                                                               | Distro package manager                       |
| Host OS              | Security repos enabled                                                   | `unattended-upgrades`                        |

## 9 Incident‑response workflow

* Detect: PagerDuty alert from Prometheus or SIEM.
* Contain: stop the affected Backend container; isolate the Redis RDB snapshot.
* Eradicate: pull verified images, redeploy, rotate secrets.
* Recover: restore the RDB; replay SBOMs if history is lost.
* Review: post‑mortem within 72 h; create follow‑up issues.
* Escalate P1 incidents to <security@stella-ops.org> (24 × 7).

## 10 Pen‑testing & continuous assurance

| Control              | Frequency             | Tool/Runner                               |
|----------------------|-----------------------|-------------------------------------------|
| OWASP ZAP baseline   | Each merge to `main`  | GitHub Action `zap-baseline-scan`         |
| Dependency scanning  | Per pull request      | Trivy FS + Dependabot                     |
| External red‑team    | Annual or pre‑GA      | CREST‑accredited third‑party              |

## 11 Vulnerability disclosure & contact

* Preferred channel: security@stella-ops.org (GPG key on website).
* Coordinated disclosure reward: public credit and swag (no monetary bounty at this time).

## 12 Change log

| Version | Date       | Notes                                                                                                                            |
| ------- | ---------- | -------------------------------------------------------------------------------------------------------------------------------- |
| v2.0    | 2025‑07‑12 | Full overhaul: host‑OS baseline, supply‑chain signing, removal of unnecessary sub‑nets, role‑based contact e‑mail, K8s guidance. |
| v1.1    | 2025‑07‑09 | Minor fence fixes.                                                                                                               |
| v1.0    | 2025‑07‑09 | Original draft.                                                                                                                  |