
#  17 · Security Hardening Guide — **Stella Ops**
*(v2.0 — 12 Jul 2025)*

> **Audience** — Site‑reliability and platform teams deploying **the open‑source Core** in production or restricted networks.
---

##  0 Table of Contents

1. Threat model (summary)
2. Host‑OS baseline checklist
3. Container & runtime hardening
4. Network‑plane guidance
5. Secrets & key management
6. Image, SBOM & plug‑in supply‑chain controls
7. Logging, monitoring & audit
8. Update & patch strategy
9. Incident‑response workflow
10. Pen‑testing & continuous assurance
11. Vulnerability disclosure & contact
12. Change log

---

##  1 Threat model (summary)

| Asset                | Threats               | Mitigations                                                            |
| -------------------- | --------------------- | ---------------------------------------------------------------------- |
| SBOMs & scan results | Disclosure, tamper    | TLS‑in‑transit, read‑only Redis volume, RBAC, Cosign‑verified plug‑ins |
| Backend container    | RCE, code‑injection   | Distroless image, non‑root UID, read‑only FS, seccomp + `CAP_DROP:ALL` |
| Update artefacts     | Supply‑chain attack   | Cosign‑signed images & SBOMs, enforced by admission controller         |
| Admin credentials    | Phishing, brute force | OAuth 2.0 with 12‑h token TTL, optional mTLS                           |

---

##  2 Host‑OS baseline checklist

| Item          | Recommended setting                                       |
| ------------- | --------------------------------------------------------- |
| OS            | Ubuntu 22.04 LTS (kernel ≥ 5.15) or Alma 9                |
| Patches       | `unattended-upgrades` or vendor‑equivalent enabled        |
| Filesystem    | `noexec,nosuid` on `/tmp`, `/var/tmp`                     |
| Docker Engine | v24.*, API socket root‑owned (`0660`)                     |
| Auditd        | Watch `/etc/docker`, `/usr/bin/docker*` and Compose files |
| Time sync     | `chrony` or `systemd-timesyncd`                           |

---

##  3 Container & runtime hardening

###  3.1 Docker Compose reference (`compose-core.yml`)

```yaml
services:
  backend:
    image: registry.stella-ops.org/stella-ops/stella-ops:<PINNED_TAG_OR_DIGEST>
    user: "101:101"              # non‑root
    read_only: true
    security_opt:
      - "no-new-privileges:true"
      - "seccomp:./seccomp-backend.json"
    cap_drop: [ALL]
    tmpfs:
      - /tmp:size=64m,exec   # prefer noexec unless the runtime needs executable temp files
    environment:
      - ASPNETCORE_URLS=https://+:8080
      - TLSPROVIDER=OpenSslGost
    depends_on: [redis]
    networks: [core-net]
    healthcheck:
      test: ["CMD", "wget", "-qO-", "https://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7.2-alpine
    command: ["redis-server", "--requirepass", "${REDIS_PASS}", "--rename-command", "FLUSHALL", ""]
    user: "redis"
    read_only: true
    cap_drop: [ALL]
    tmpfs:
      - /data
    networks: [core-net]

networks:
  core-net:
    driver: bridge
```

No dedicated "Redis" or "PostgreSQL" sub-nets are declared; the single bridge network suffices for the default stack.

###  3.2 Kubernetes deployment highlights

* Use a separate `NetworkPolicy` that allows egress from the backend only to Redis on port 6379.
* Set the `securityContext`: `runAsNonRoot: true`, `readOnlyRootFilesystem: true`, `allowPrivilegeEscalation: false`, and drop all capabilities.
* Add a `PodDisruptionBudget` with `minAvailable: 1`.
* Optionally add a `CosignVerified=true` label enforced by an admission controller (e.g. Kyverno or Connaisseur).
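
The highlights above could be expressed roughly as follows. This is a minimal sketch rather than the project's shipped manifests: the namespace `stella-ops`, the labels `app: stella-backend` and `app: stella-redis`, and the object names are assumptions, and UID 101 is borrowed from the Compose reference in 3.1.

```yaml
# Sketch only: namespace, labels, and object names are hypothetical.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-egress-redis
  namespace: stella-ops
spec:
  podSelector:
    matchLabels:
      app: stella-backend
  policyTypes: [Egress]
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: stella-redis
      ports:
        - protocol: TCP
          port: 6379
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: backend-pdb
  namespace: stella-ops
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: stella-backend
---
# Fragment: container-level securityContext for the backend pod template.
securityContext:
  runAsNonRoot: true
  runAsUser: 101                  # assumed to match the Compose "user" setting
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
```

Because this policy permits egress to Redis only, DNS lookups from the backend are blocked as well. If the backend resolves Redis by Service name, an additional egress rule for the cluster DNS (port 53) is likely needed.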

##  4 Network‑plane guidance

| Plane              | Recommendation                                                             |
| ------------------ | -------------------------------------------------------------------------- |
| North‑south        | Terminate TLS 1.2+ (OpenSSL‑GOST default). Use Let's Encrypt or an internal CA. |
| East-west          | Compose bridge or K8s ClusterIP only; no public Redis/PostgreSQL ports.    |
| Ingress controller | Limit methods to GET, POST, PATCH (no TRACE).                              |
| Rate‑limits        | 40 rps default; tune ScannerPool.Workers and ingress limit‑req to match.   |
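
As an illustration of the ingress rows above, the sketch below assumes the community ingress-nginx controller; the host name, Service name, and TLS secret name are hypothetical. Note that `limit-rps` is applied per client IP, so it only approximates a global 40 rps budget and should be tuned together with `ScannerPool.Workers`.

```yaml
# Sketch only: assumes ingress-nginx; host, Service, and secret names are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: stella-backend
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "40"           # requests per second, per client IP
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS" # backend listens on https://+:8080
spec:
  ingressClassName: nginx
  tls:
    - hosts: [stella.example.internal]
      secretName: stella-tls
  rules:
    - host: stella.example.internal
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: stella-backend
                port:
                  number: 8080
```

Method filtering (GET, POST, PATCH only) is not covered by a dedicated annotation here and would have to be handled by controller-specific configuration.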

##  5 Secrets & key management

| Secret                            | Storage                                                         | Rotation                          |
| --------------------------------- | --------------------------------------------------------------- | --------------------------------- |
| **Client‑JWT (offline)**          | `/var/lib/stella/tokens/client.jwt` (owner `root`, mode `0600`) | **30 days**; provided by each OUK |
| `REDIS_PASS`                      | Docker/K8s secret                                               | 90 days                           |
| OAuth signing key                 | `/keys/jwt.pem` (read‑only mount)                               | 180 days                          |
| Cosign public key                 | `/keys/cosign.pub`, baked into the image                        | On every major release            |
| Trivy DB mirror token (if remote) | Secret, mounted read‑only                                       | 30 days                           |

Never bake secrets into images; always inject at runtime.
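
One way to inject `REDIS_PASS` at runtime with Compose is a file-based secret. The sketch below is an assumption-laden variant of the reference file in 3.1: the secret name `redis_pass` and the path `./secrets/redis_pass.txt` are hypothetical, and the secret file must be readable by the container user.

```yaml
# Sketch only: secret name and file path are hypothetical.
services:
  redis:
    image: redis:7.2-alpine
    # Read the password from the mounted secret file at start-up.
    # "$$" escapes "$" so Compose does not interpolate it on the host.
    command:
      - sh
      - -c
      - exec redis-server --requirepass "$$(cat /run/secrets/redis_pass)"
    secrets:
      - redis_pass

secrets:
  redis_pass:
    file: ./secrets/redis_pass.txt   # keep this file out of version control
```

The password still ends up in the `redis-server` argument list inside the container; pointing Redis at a mounted configuration file instead would avoid that.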

> **Operational tip:** schedule a cron job that reminds ops 5 days before
> `client.jwt` expires. The backend also emits the Prometheus metric
> `stella_quota_token_days_remaining`.
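
The metric mentioned in the tip can drive the reminder directly. The rule below is a minimal sketch that assumes `stella_quota_token_days_remaining` is a gauge counting whole days; the group name, alert name, and severity label are hypothetical.

```yaml
# Sketch only: assumes the metric is a gauge; names and labels are hypothetical.
groups:
  - name: stella-token-expiry
    rules:
      - alert: StellaClientJwtExpiringSoon
        expr: stella_quota_token_days_remaining <= 5
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "client.jwt expires in {{ $value }} day(s)"
          description: "Replace /var/lib/stella/tokens/client.jwt with the token from the next OUK."
```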

##  6 Image, SBOM & plug‑in supply‑chain controls

* Images: pull by digest, not `latest`, and verify the signature:

```bash
cosign verify ghcr.io/stellaops/backend@sha256:<DIGEST> \
  --key https://stella-ops.org/keys/cosign.pub
```

* SBOM: each release ships an SPDX file; store it alongside the images for audit.
* Third‑party plug‑ins: place them in `/plugins/`. The backend will:
  * Validate the Cosign signature.
  * Check `[StellaPluginVersion("major.minor")]`.
  * Refuse to start if a check fails and `Security.DisablePluginUnsigned=false` (the default).

##  7 Logging, monitoring & audit

| Control      | Implementation                                                    |
| ------------ | ----------------------------------------------------------------- |
| Log format   | Serilog JSON; ship via Fluent‑Bit to ELK or Loki                  |
| Metrics      | Prometheus /metrics endpoint; default Grafana dashboard in infra/ |
| Audit events | Redis stream audit; export daily to SIEM                          |
| Alert rules  | Feed age ≥ 48 h, P95 wall‑time > 5 s, Redis used memory > 75 %    |
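
The alert thresholds in the table could be written as Prometheus rules along these lines. This is a sketch under stated assumptions: `stella_feed_last_update_timestamp_seconds` and `stella_scan_duration_seconds` are hypothetical metric names (check the backend's `/metrics` output for the real ones), and the Redis metrics are those exposed by the commonly used `redis_exporter`.

```yaml
# Sketch only: the two stella_* metric names are hypothetical.
groups:
  - name: stella-core
    rules:
      - alert: StellaFeedStale
        # Fires when the last feed update is 48 h old or older.
        expr: time() - stella_feed_last_update_timestamp_seconds >= 48 * 3600
        labels:
          severity: warning
      - alert: StellaScanSlowP95
        # P95 wall-time above 5 s, computed from a histogram.
        expr: histogram_quantile(0.95, sum by (le) (rate(stella_scan_duration_seconds_bucket[5m]))) > 5
        for: 10m
        labels:
          severity: warning
      - alert: RedisMemoryHigh
        # Only meaningful when Redis maxmemory is set (max_bytes > 0).
        expr: (redis_memory_used_bytes / redis_memory_max_bytes > 0.75) and (redis_memory_max_bytes > 0)
        for: 10m
        labels:
          severity: critical
```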

###  7.1 Concelier authorization audits

- Enable the Authority integration for Concelier (`authority.enabled=true`). Keep
  `authority.allowAnonymousFallback` set to `true` only during migration and plan
  to disable it before **2025-12-31 UTC** so the `/jobs*` surface always demands
  a bearer token.
- Store the Authority client secret using Docker/Kubernetes secrets and point
  `authority.clientSecretFile` at the mounted path; the value is read at startup
  and never logged.
- Watch the `Concelier.Authorization.Audit` logger. Each entry contains the HTTP
  status, subject, client ID, scopes, remote IP, and a boolean `bypass` flag
  showing whether a network bypass CIDR allowed the request. Configure your SIEM
  to alert when unauthenticated requests (`status=401`) appear with
  `bypass=true`, or when unexpected scopes invoke job triggers.
  Detailed monitoring and response guidance lives in `docs/modules/concelier/operations/authority-audit-runbook.md`.
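
For orientation, the three settings named above might look like this in a YAML configuration file. Only the key names come from this section; the nesting, the file format, and the mount path are assumptions.

```yaml
# Sketch only: layout and mount path are hypothetical; key names are from the text above.
authority:
  enabled: true
  allowAnonymousFallback: false   # keep true only while migrating
  clientSecretFile: /run/secrets/concelier_authority_client_secret
```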

##  8 Update & patch strategy

| Layer                | Cadence                                                                  | Method                                       |
| -------------------- | ------------------------------------------------------------------------ | -------------------------------------------- |
| Backend & CLI images | Monthly or CVE‑driven                                                    | `docker pull` + `docker compose up -d`       |
| Trivy DB             | 24 h scheduler via Concelier (vulnerability ingest/merge/export service) | configurable via Concelier scheduler options |
| Docker Engine        | vendor LTS                                                               | distro package manager                       |
| Host OS              | security repos enabled                                                   | `unattended-upgrades`                        |

##  9 Incident‑response workflow

* Detect — PagerDuty alert from Prometheus or SIEM.
* Contain — Stop affected Backend container; isolate Redis RDB snapshot.
* Eradicate — Pull verified images, redeploy, rotate secrets.
* Recover — Restore RDB, replay SBOMs if history lost.
* Review — Post‑mortem within 72 h; create follow‑up issues.
* Escalate P1 incidents to <security@stella-ops.org> (24 × 7).


##  10 Pen‑testing & continuous assurance

| Control              | Frequency             | Tool/Runner                               |
|----------------------|-----------------------|-------------------------------------------|
| OWASP ZAP baseline   | Each merge to `main`  | GitHub Action `zap-baseline-scan`         |
| Dependency scanning  | Per pull request      | Trivy FS + Dependabot                     |
| External red‑team    | Annual or pre‑GA      | CREST‑accredited third‑party              |

##  11 Vulnerability disclosure & contact

* Preferred channel: <security@stella-ops.org> (GPG key on website).
* Coordinated disclosure reward: public credit and swag (no monetary bounty at this time).

##  12 Change log

| Version | Date       | Notes                                                                                                                            |
| ------- | ---------- | -------------------------------------------------------------------------------------------------------------------------------- |
| v2.0    | 2025‑07‑12 | Full overhaul: host‑OS baseline, supply‑chain signing, removal of unnecessary sub‑nets, role‑based contact e‑mail, K8s guidance. |
| v1.1    | 2025‑07‑09 | Minor fence fixes.                                                                                                               |
| v1.0    | 2025‑07‑09 | Original draft.                                                                                                                  |